Representing Numbers and Letters with Binary: Crash Course Computer Science #4

Video Statistics and Information

Reddit Comments

Smash the binary!

... wait, no, that's a different thing.

u/_InexplicablySo_ · Mar 16 2017 · 3 upvotes
Captions
Hi, I’m Carrie Anne, this is Crash Course Computer Science and today we’re going to talk about how computers store and represent numerical data. Which means we’ve got to talk about Math! But don’t worry. Every single one of you already knows exactly what you need to know to follow along. So, last episode we talked about how transistors can be used to build logic gates, which can evaluate boolean statements. And in boolean algebra, there are only two binary values: true and false. But if we only have two values, how in the world do we represent information beyond just these two values? That’s where the Math comes in.
[INTRO]
So, as we mentioned last episode, a single binary value can be used to represent a number. Instead of true and false, we can call these two states 1 and 0, which is actually incredibly useful. And if we want to represent larger things we just need to add more binary digits. This works exactly the same way as the decimal numbers that we’re all familiar with. With decimal numbers there are "only" 10 possible values a single digit can be, 0 through 9, and to get numbers larger than 9 we just start adding more digits to the front. We can do the same with binary.
For example, let’s take the number two hundred and sixty three. What does this number actually represent? Well, it means we’ve got 2 one-hundreds, 6 tens, and 3 ones. If you add those all together, we’ve got 263. Notice how each column has a different multiplier. In this case, it’s 100, 10, and 1. Each multiplier is ten times larger than the one to the right. That's because each column has ten possible digits to work with, 0 through 9, after which you have to carry one to the next column. For this reason, it’s called base-ten notation, also called decimal since deci means ten.
And binary works exactly the same way, it’s just base-two. That’s because there are only two possible digits in binary – 1 and 0. This means that each multiplier has to be two times larger than the column to its right.
Instead of hundreds, tens, and ones, we now have fours, twos and ones. Take for example the binary number: 101. This means we have 1 four, 0 twos, and 1 one. Add those all together and we’ve got the number 5 in base ten. But to represent larger numbers, binary needs a lot more digits. Take this number in binary: 10110111. We can convert it to decimal in the same way. We have 1 x 128, 0 x 64, 1 x 32, 1 x 16, 0 x 8, 1 x 4, 1 x 2, and 1 x 1. Which all adds up to 183.
Math with binary numbers isn’t hard either. Take for example decimal addition of 183 plus 19. First we add 3 + 9, that’s 12, so we put 2 as the sum and carry 1 to the tens column. Now we add 8 plus 1 plus the 1 we carried, that’s 10, so the sum is 0 carry 1. Finally we add 1 plus the 1 we carried, which equals 2. So the total sum is 202. Here’s the same sum but in binary. Just as before, we start with the ones column. Adding 1+1 results in 2, even in binary. But there is no symbol "2", so we use 10 and put 0 as our sum and carry the 1. Just like in our decimal example. 1 plus 1, plus the 1 carried, equals 3 or 11 in binary, so we put the sum as 1 and we carry 1 again, and so on. We end up with 11001010, which is the same as the number 202 in base ten.
Each of these binary digits, 1 or 0, is called a “bit”. So in these last few examples, we were using 8-bit numbers, with a lowest value of zero and a highest value of 255, which requires all 8 bits to be set to 1. That’s 256 different values, or 2 to the 8th power. You might have heard of 8-bit computers, or 8-bit graphics or audio. These were computers that did most of their operations in chunks of 8 bits. But 256 different values isn’t a lot to work with, so it meant things like 8-bit games were limited to 256 different colors for their graphics.
And 8 bits is such a common size in computing, it has a special word: a byte. A byte is 8 bits. If you’ve got 10 bytes, it means you’ve really got 80 bits. You’ve heard of kilobytes, megabytes, gigabytes and so on.
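The column-by-column conversion and the carry-based addition walked through above can be sketched in a few lines of Python. This is only an illustration: the function names `binary_to_decimal` and `add_binary` are made up here, and the binary numbers are written as text strings so each column stays visible.

```python
def binary_to_decimal(bits):
    """Sum each digit times its column multiplier (1, 2, 4, 8, ...)."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * (2 ** position)
    return total


def add_binary(a, b):
    """Add two binary strings column by column, carrying like decimal addition."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        column_sum = int(digit_a) + int(digit_b) + carry
        result.append(str(column_sum % 2))  # digit that stays in this column
        carry = column_sum // 2             # 1 if the column overflowed
    if carry:
        result.append("1")
    return "".join(reversed(result))


print(binary_to_decimal("101"))            # 5
print(binary_to_decimal("10110111"))       # 183
print(add_binary("10110111", "00010011"))  # 11001010, i.e. 183 + 19 = 202
```

Here 00010011 is 19 written with 8 bits, so the last line repeats the 183 plus 19 example from the transcript.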
These prefixes denote different scales of data. Just like one kilogram is a thousand grams, 1 kilobyte is a thousand bytes… or really 8000 bits. Mega is a million bytes (MB), and giga is a billion bytes (GB). Today you might even have a hard drive that has 1 terabyte (TB) of storage. That's 8 trillion ones and zeros. But hold on! That’s not always true. In binary, a kilobyte has two to the power of 10 bytes, or 1024. 1000 is also right when talking about kilobytes, but we should acknowledge it isn’t the only correct definition.
You’ve probably also heard the term 32-bit or 64-bit computers – you’re almost certainly using one right now. What this means is that they operate in chunks of 32 or 64 bits. That’s a lot of bits! The largest number you can represent with 32 bits is just under 4.3 billion. Which is thirty-two 1's in binary. This is why our Instagram photos are so smooth and pretty – they are composed of millions of colors, because computers today use 32-bit color graphics.
Of course, not everything is a positive number - like my bank account in college. So we need a way to represent positive and negative numbers. Most computers use the first bit for the sign: 1 for negative, 0 for positive numbers, and then use the remaining 31 bits for the number itself. That gives us a range of roughly plus or minus two billion. While this is a pretty big range of numbers, it’s not enough for many tasks. There are 7 billion people on the earth, and the US national debt is almost 20 trillion dollars after all. This is why 64-bit numbers are useful. The largest value a 64-bit number can represent is around 9.2 quintillion! That’s a lot of possible numbers and will hopefully stay above the US national debt for a while!
Most importantly, as we’ll discuss in a later episode, computers must label locations in their memory, known as addresses, in order to store and retrieve values.
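The size limits quoted above follow directly from the number of bits, and a short Python sketch can reproduce them. The function names are invented for this example, and `sign_bit_range` models only the simple "first bit is the sign" scheme the transcript describes. Real hardware generally uses a related scheme called two's complement, which fits in one extra negative value.

```python
def unsigned_max(bits):
    """Largest value when every one of `bits` bits is set to 1."""
    return 2 ** bits - 1


def sign_bit_range(bits):
    """Range when the first bit holds the sign and the rest hold the number."""
    largest = 2 ** (bits - 1) - 1
    return -largest, largest


print(unsigned_max(8))         # 255
print(unsigned_max(32))        # 4294967295, just under 4.3 billion
print(sign_bit_range(32))      # (-2147483647, 2147483647), roughly +/- 2 billion
print(sign_bit_range(64)[1])   # 9223372036854775807, about 9.2 quintillion
print(2 ** 10)                 # 1024, the binary definition of a kilobyte
```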
As computer memory has grown to gigabytes and terabytes – that’s trillions of bytes – it was necessary to have 64-bit memory addresses as well.
In addition to negative and positive numbers, computers must deal with numbers that are not whole numbers, like 12.7 and 3.14, or maybe even stardate: 43989.1. These are called “floating point” numbers, because the decimal point can float around in the middle of a number. Several methods have been developed to represent floating point numbers. The most common of which is the IEEE 754 standard. And you thought historians were the only people bad at naming things! In essence, this standard stores decimal values sort of like scientific notation. For example, 625.9 can be written as 0.6259 x 10^3. There are two important numbers here: the .6259 is called the significand. And 3 is the exponent. In a 32-bit floating point number, the first bit is used for the sign of the number -- positive or negative. The next 8 bits are used to store the exponent and the remaining 23 bits are used to store the significand.
Ok, we’ve talked a lot about numbers, but your name is probably composed of letters, so it’s really useful for computers to also have a way to represent text. However, rather than have a special form of storage for letters, computers simply use numbers to represent letters. The most straightforward approach might be to simply number the letters of the alphabet: A being 1, B being 2, C 3, and so on. In fact, Francis Bacon, the famous English writer, used five-bit sequences to encode all 26 letters of the English alphabet to send secret messages back in the 1600s. And five bits can store 32 possible values – so that’s enough for the 26 letters, but not enough for punctuation, digits, and upper and lower case letters.
Enter ASCII, the American Standard Code for Information Interchange. Invented in 1963, ASCII was a 7-bit code, enough to store 128 different values.
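Stepping back to the 32-bit floating point layout described above (1 sign bit, 8 exponent bits, 23 significand bits), the following Python sketch splits a number into those three fields using the standard `struct` module. The function name `float32_fields` is made up for this example. Two details go beyond the transcript: IEEE 754 works in powers of 2 rather than powers of 10, and it stores the 8-bit exponent with 127 added to it.

```python
import struct


def float32_fields(value):
    """Return (sign, exponent, significand) bit fields of a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer so the bits can be sliced apart.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31                 # first bit
    exponent = (raw >> 23) & 0xFF    # next 8 bits
    significand = raw & 0x7FFFFF     # remaining 23 bits
    return sign, exponent, significand


# 625.9 lies between 2^9 and 2^10, so the stored exponent is 9 + 127 = 136.
print(float32_fields(625.9))
print(float32_fields(-625.9))  # same fields, but the sign bit is 1
```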
With this expanded range, it could encode capital letters, lowercase letters, digits 0 through 9, and symbols like the @ sign and punctuation marks. For example, a lowercase ‘a’ is represented by the number 97, while a capital ‘A’ is 65. A colon is 58 and a closed parenthesis is 41. ASCII even had a selection of special command codes, such as a newline character to tell the computer where to wrap a line to the next row. In older computer systems, the line of text would literally continue off the edge of the screen if you didn’t include a newline character!
Because ASCII was such an early standard, it became widely used, and critically, allowed different computers built by different companies to exchange data. This ability to universally exchange information is called “interoperability”. However, it did have a major limitation: it was really only designed for English.
Fortunately, there are 8 bits in a byte, not 7, and it soon became popular to use codes 128 through 255, previously unused, for "national" characters. In the US, those extra numbers were largely used to encode additional symbols, like mathematical notation, graphical elements, and common accented characters. On the other hand, while the Latin characters were used universally, Russian computers used the extra codes to encode Cyrillic characters, and Greek computers, Greek letters, and so on. And national character codes worked pretty well for most countries.
The problem was, if you opened an email written in Latvian on a Turkish computer, the result was completely incomprehensible. And things totally broke with the rise of computing in Asia, as languages like Chinese and Japanese have thousands of characters. There was no way to encode all those characters in 8 bits! In response, each country invented multi-byte encoding schemes, all of which were mutually incompatible.
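Both ideas above, the ASCII codes and the clash between national character codes, can be tried out in Python. This is a rough sketch: the helper name `reopen` is invented here, and the Windows code pages `cp1257` (Baltic) and `cp1254` (Turkish) are simply one plausible pair chosen to mimic the Latvian-on-a-Turkish-computer example, not encodings named in the video.

```python
# Character codes quoted in the transcript, plus the newline command code.
for char in ["a", "A", ":", ")", "\n"]:
    print(repr(char), ord(char))


def reopen(text, written_with, read_with):
    """Encode text under one national code page, then decode it under another."""
    return text.encode(written_with).decode(read_with)


latvian = "Rīga"
print(reopen(latvian, "cp1257", "cp1257"))  # round-trips unchanged
print(reopen(latvian, "cp1257", "cp1254"))  # same bytes, different letters
```

The plain ASCII letters survive the mismatch because both code pages agree on codes 0 through 127. Only the character stored above 127 changes.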
The Japanese were so familiar with this encoding problem that they had a special name for it: "mojibake", which means "scrambled text".
And so it was born – Unicode – one format to rule them all. Devised in 1992 to finally do away with all of the different international schemes, it replaced them with one universal encoding scheme. The most common version of Unicode uses 16 bits with space for over a million codes - enough for every single character from every language ever used – more than 120,000 of them in over 100 types of script, plus space for mathematical symbols and even graphical characters like Emoji.
And in the same way that ASCII defines a scheme for encoding letters as binary numbers, other file formats – like MP3s or GIFs – use binary numbers to encode sounds or colors of a pixel in our photos, movies, and music. Most importantly, under the hood it all comes down to long sequences of bits. Text messages, this YouTube video, every webpage on the internet, and even your computer’s operating system, are nothing but long sequences of 1s and 0s.
So next week, we’ll start talking about how your computer starts manipulating those binary sequences, for our first true taste of computation. Thanks for watching. See you next week.
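As a small illustration of Unicode giving every character a number that is then stored as bytes, here is a Python sketch. The sample characters, the helper name `describe`, and the choice of the UTF-8 and UTF-16 encodings are assumptions made for this example and are not named in the video. One detail worth noting: code points above 65,535 do not fit in a single 16-bit unit, so UTF-16 stores them as two units.

```python
# Sample characters: A, e with acute accent, a Japanese kanji, an emoji.
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]


def describe(char):
    """Report a character's Unicode number and its size in two encodings."""
    return {
        "code_point": ord(char),
        "utf8_bytes": len(char.encode("utf-8")),
        "utf16_bytes": len(char.encode("utf-16-le")),
    }


for char in samples:
    print(f"U+{ord(char):04X}", describe(char))
```

The letter A keeps its ASCII number, 65, and still takes a single byte in UTF-8, while the emoji needs four bytes in either encoding.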
Info
Channel: CrashCourse
Views: 1,294,909
Rating: 4.9347768 out of 5
Keywords: John Green, Hank Green, vlogbrothers, Crash Course, crashcourse, education, binary, unicode, ascii, boolean, computers, computing, 8-bit, 32-bit, 64-bit, byte, megabyte, gigabyte, kilobyte, emoji, memory, interoperability, mojibake, true, false, computer science, adding binary, encoding schemes
Id: 1GSjbWt0c9M
Length: 10min 45sec (645 seconds)
Published: Wed Mar 15 2017