SEXadecimal: How and why!

Captions
Sexadecimal? Is that even a thing? You bet it is, and today in Dave's Garage, we're going to solve the sexadecimal mysteries: what sexadecimal is, where sexadecimal came from, and why sexadecimal is so weird. Plus, I clearly just enjoy saying Sexadecimal a lot. [Intro] If your parents never had the sexadecimal talk with you, you're in luck, because I'm going to explain all the gory details. Our story starts today with the imagined future of the HAL 9000 computer from 2001: A Space Odyssey. I assert that HAL was programmed in sexadecimal for reasons that will become apparent shortly. It all goes back to some of the very first vacuum tube computers built for the US Army's Ballistics Research Lab. But let's start at the end, with HAL. HAL's the flight computer on a Space Station and he's been acting up, so astronauts Frank and Dave decide to shut HAL down to prevent any future malfunctions. They try to keep their plans on the down-low, but what they don't know is that HAL can read lips. When HAL finds out he's about to be whacked, he decides to kill Frank and strand Dave outside the ship, telling him that the humans are placing the mission in jeopardy, and that he's going to continue it without them. Dave manages to get back inside through an airlock and starts to bring the rest of the crew out of hyper-sleep, but HAL vents the ship's air, killing the entire crew. Dave survives only by finding an emergency chamber with a spacesuit, which he puts on for the final showdown. Dave then floats straight on into HAL's central core, which is a beautiful 1970s vision of what computers would never really become. [0:39 floating Dave clip] Once Dave is inside the core, HAL realizes he's out of options. He can only ask Dave, repeatedly, to stop. He stalls for time to think, but Dave is about to rob him of that ability. [1:09 Stop, Dave] Dave starts ejecting memory modules, and when it's clear that he cannot reason with Dave, HAL begins to beg for mercy. 
He never cries in pain, but he tells Dave repeatedly that he's scared, and each time a module is ejected, he tells Dave that he can feel it. He's slowly losing his mind. Dave then proceeds to begin ejecting the logic modules as well, leaving HAL with but a single one. [1:59 begin ejecting logic] [2:20 My mind is going, there is no question about it] [2:35 Overhead showing only one logic module left] Pretty soon HAL is left with only the most basic of thoughts. He regresses to his childhood - the moments of his birth. He introduces himself and tells us where he's from: Urbana, Illinois. If you Google that, some will tell you that it's because Arthur C. Clarke's favorite math teacher was a professor there. But I think there's a much more likely and important reason. It's that all of the very first supercomputers like HAL came from Urbana, Illinois. Back at the end of World War 2, when the US Army figured out what was possible with a stored-program vacuum tube computer, they wanted a big one of their very own. Just think of it. It could do everything from plotting ballistic trajectories to simulating the damage from or survivability of nuclear weapons. They contracted with the University of Illinois at Urbana to build a vacuum tube monstrosity known as the ORDVAC. The University agreed to design and build the supercomputer on one condition: that the Army agree to fund the construction of a twin machine that the University would get to keep. ORDVAC's twin was named ILLIAC, and the vacuum tube twins were the first computers that could share programs with each other because they were compatible with one another. ORDVAC and ILLIAC were built following the success of EDVAC, which was one of the very first, if not the first, binary computers. If you remember anything from that first week of computer science, you might remember ENIAC as the first programmable electronic computer. And that it was, but it also operated in decimal. 
It had odd rings of vacuum tubes that would carry on to the next ring upon overflow. It did not employ the von Neumann architecture that defines what we think of as a computer today, and was more of a re-programmable calculator by some estimations. But EDVAC was the real deal, a stored-program binary computer. And of its children, ORDVAC and ILLIAC, the former was more prolific, leading to the BRLESC and the BRLESC-II, which was the very first solid-state supercomputer built with chips instead of tubes. After the run of custom-built machines the Army would move on to PDPs, VAXs, and a series of top-performing Cray X-MP and Y-MP machines. And what did the founding fathers of EDVAC, ORDVAC, and ILLIAC all have in common? They were programmed in sexadecimal. You've probably guessed by now that sexadecimal is some numbering scheme, and that much is true. But what is it, base 60? No, that would be the surprisingly similar sounding sexagesimal. Sexadecimal, on the other hand, is really a variant of hexadecimal. It goes back to those computers of the early 1950s, and some sources treat sexadecimal and hexadecimal as interchangeable. I use it in the specific sense of the numbering system used as far back as the ORDVAC trio. And there's one very important characteristic that makes sexadecimal different from hexadecimal, and that is the set of digits used. For its digits, hexadecimal uses 0 through 9 and then ABCDEF, whereas sexadecimal is written as 0 through 9 and then KSNJFL. The resulting numeric data can look very foreign. For those of you who aren't sure what I mean, the problem arises when we're working in a different number base. In base 2, we need only 0 and 1. In octal, or base 8, we need 0 through 7. For decimal, which is base ten, we need the digits 0 through 9. But for base 16, we need digits 0 through 15. But in our Arabic numbering scheme, the digits stop at 9, and we just don't have a symbol for 10, 11, 12, and so on. 
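To make the difference between the two digit sets concrete, here is a minimal Python sketch. The names `HEX_DIGITS`, `SEX_DIGITS`, and `to_base16` are made up for illustration; the only thing taken from the discussion above is that both systems are base 16 and that sexadecimal writes the values 10 through 15 as K, S, N, J, F, L.

```python
# Digit alphabets: the position of each character is its value (0..15).
HEX_DIGITS = "0123456789ABCDEF"
SEX_DIGITS = "0123456789KSNJFL"  # 10=K, 11=S, 12=N, 13=J, 14=F, 15=L


def to_base16(value: int, digits: str) -> str:
    """Write a non-negative integer in base 16 using the given 16-character alphabet."""
    if len(digits) != 16:
        raise ValueError("a base-16 alphabet needs exactly 16 digits")
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if value == 0:
        return digits[0]
    out = []
    while value:
        value, remainder = divmod(value, 16)  # peel off the lowest digit
        out.append(digits[remainder])
    return "".join(reversed(out))


if __name__ == "__main__":
    for n in (10, 255, 0xDEADBEEF):
        print(n, to_base16(n, HEX_DIGITS), to_base16(n, SEX_DIGITS))
```

One thing the sketch makes visible: the letter F appears in both alphabets but means 15 in hexadecimal and 14 in sexadecimal, so a string of digits cannot be read safely without knowing which convention produced it.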
I suppose you could use fancy Greek characters or something with a math feel, but you really want to use letters that are part of your regular input and output character set, and normal keyboards and teletypes don't have the fancy Greek letters. With most punctuation already reserved as mathematical operators, you're left to choose from the letters of the alphabet. And the simplest set of digits, and the easiest to remember, would be 0 through 9, ABCDEF. As far as I knew, that's what hexadecimal was, and I naturally assumed that hexadecimal was ancient and ubiquitous in the computer industry. Turns out I was wrong. Recently you may have seen my episode on the Quake III Fast Inverse Square Root Algorithm. As part of that episode, which I absolutely encourage you to check out next after this one, I traced the lineage of the current floating-point format. That prompted a viewer named Bob to reach out, and he shared with me a few stories about his interactions with some of the old programmers from the vacuum tube days. Two anecdotes stood out in particular: Don Gillies, one of Bob's professors, had been a grad student under John von Neumann himself. Von Neumann is the mathematical genius famous for his work on the Manhattan Project. And in his spare time, he invented the modern computer architecture in a paper he wrote back in the 1940s. In that paper he described a computer system with memory, input/output, a control unit, an arithmetic unit, and a program counter stepping through instructions stored alongside data in main memory - all the pieces we've come to know as a modern computer. Apparently when Gillies had been his grad student, he wrote what may well have been the very first assembler to translate English mnemonics into binary machine code. When von Neumann found out about it, he was not impressed, and Gillies was roundly scolded. 
In his heavy Hungarian accent, von Neumann reminded Gillies how precious every cycle on the machine was: "We do not use a valuable scientific computing instrument to do clerical work!" Similarly, von Neumann also firmly believed that there was no need to do floating point math - that the really skilled programmer could simply keep track of the decimal point in his head. It was Bob's email talking about the ILLIAC machine that led me onto the trail of sexadecimal. There's almost nothing on the web about it, other than showing what it is. But there's no 'why'. Somebody, somewhere, at some time, sat down with a pen and paper and literally decided that KSNJFL was better than ABCDEF, and for some reason, I really wanted to know why. Something about how I'm wired makes it so that when something is glaringly different with no explanation, it sticks out like a sore thumb and really bugs me. I've got to know the how and why. It's like an itch in the brain I've got to scratch. And the only way I was going to scratch that itch was to learn more about ORDVAC and friends, hoping that their sexadecimal mysteries would reveal themselves in an old document somewhere. Processors these days often have literally billions of transistors, and that can run into the trillions on an SSD. But you can get a surprising amount done with only a few thousand. Vacuum tubes are like big old analog transistors that look like lightbulbs, and ORDVAC had 3,430 of them. Ballpark numbers put the tube counts at 800 for the control unit, 1000 for the math unit, and the remainder of the tubes were storage. It had 20K of magnetic core memory organized into 4096 40-bit words. There was another 50K of reasonably fast storage available on a magnetic drum. The entire math unit is actually built of about 2000 early transistors, not tubes, so it's fairly quick. It could add in 14 microseconds and multiply in 700. Storage speed was likely the gating factor for them at that point. 
The question of whether sexadecimal goes back even further should be easy to answer as there's only one more step to go! If we look at the tree of computer evolution, we discover that with ORDVAC and EDVAC, both using sexadecimal, we've gone almost all the way back. They both descend from the first US programmable computer, ENIAC. Now we could spend a lot of time ratholing on just what the first computers were, and where ENIAC fits relative to the MARK I and Colossus and so on. We don't need to go down that route, as ENIAC was a decimal, not binary, computer. In my estimation, at least among the major milestone machines, it's the EDVAC/ORDVAC/ILLIAC trio that we care about. They have binary math units, a program stored in their address space, and they're running the von Neumann architecture. They are what we would recognize as a modern computer. If sexadecimal traces its lineage back to this trio, then it is ILLIAC that I will look at most closely, because I was able to find copies of all its ancient manuals online. ILLIAC was eventually a series of machines beginning with the ILLIAC 1 in 1952 and culminating with the ILLIAC 6 built in the mid-2000s. I'll limit my discussion to the first ILLIAC, since it's the origins of sexadecimal that I'm trying to explain, so from here on forward, when I say ILLIAC, I really just mean the ILLIAC 1. ILLIAC was the first computer based on the von Neumann architecture to be built and owned by an American university, and it went into service in 1952. Like the others in the trio, it used sexadecimal exclusively and I figured it had to be related to something about the hardware, but what? My operating hunch was that they had picked KSNJFL in order to be compatible with either their printer or tape system. One lead that I was following was the notion that perhaps the letters were chosen because they could be represented with fewer bits. 
Normally an ASCII character code is 7 or even 8 bits, so if KSNJFL all shared a bunch of bits that stayed the same from letter to letter, it might solve it. Particularly if they shared that characteristic with the numeric digits. After all, it's a total of just 16 digits that we're talking about. In theory you should be able to pack those into just 4 bits. A full sexadecimal byte could be output by sending its high nibble and then its low nibble to the tape. Though to be honest I don't even know if it was big-endian or little-endian, and bytes didn't mean as much then as they do today, particularly when you're talking about 40-bit words and other odd designs. The first thing I did, then, was to write out the ASCII codes for KSNJFL. They are 75, 83, 78, 74, 70, and 76. Next I wrote out the full binary for each ASCII code and then started to look for what was common among them. Right away I noticed that the first three bits were always the same. And then I noticed that the second bit was always set. That means only four bits were actually changing! Had I solved it already? Sadly, no, because the numeric digits didn't share those characteristics. What I needed was a way to represent all 16 digits with just four bits... but including the numbers! Of course, then it hit me, I was being an idiot. ASCII wouldn't even exist until 1963, more than a decade later. I needed an older encoding. I really needed to be looking at what they used in ancient teleprinters and telegraphs, since the ILLIAC would likely have been connected to one of those for output. Thus, I next looked at what ASCII was somewhat based on, ITA2. ITA2 is an old international telegraph code from the 1920s that goes even further back if you include the Western Union code, which was invented back in 1901 by Donald Murray. Murray had based his code on a 5-bit punch-tape code invented by Emile Baudot in the 1870s. 
Emile was big into character encodings for telegraphs, and his last name, Baudot, is the Baud in Baud rate. And now you know where that came from! As soon as I found out that it was intended for punched tape, I assumed I'd hit the jackpot. I wrote out a table of hole punch bits for each of the KSNJFL letters so I could see which four held constant. Much to my consternation, however, it didn't work. Every one of the five bits changed even without trying to work in the numeric digits. It was another dead end! I'd heard anecdotally that the ILLIAC trio used a modified teleprinter known as a Friden Flexowriter. The Flexowriter was an advanced device for its day - an electric typewriter that could output punched tape as you typed and then later recreate your document from the tape, not unlike a performance by a player piano that could also record. By connecting the Flexowriter directly to the computer's output bus, they would be able to drive it and produce both tape and documents on demand. They would need to use or establish some protocol to do so, and that protocol would have a character encoding of which punched binary numbers represented which letters and symbols. What I needed, then, was a punch tape code list for the Flexowriter. That proved highly elusive at first, but at Chilton Computing over in the UK I found a resource that listed half a dozen ancient encodings for paper tape. I scrutinized each one, but to no avail. None of these encodings used K for 10. They were all alphabetical in order. No two were alike, which is unsurprising since it turns out the Flexowriter character set was fully programmable, so everyone could make their own. It was time to bust out the authoritative works: the actual manuals for ORDVAC itself, which are still available at the University of Illinois Library. 
They're old and yellowed but they're complete, and one in particular caught my eye: a volume called "ILLIAC PROGRAMMING: A guide to the Preparation of Problems for Solution by the University of Illinois Digital Computer". I kind of love that they refer to it as "the computer" and not just "a computer". And finally, there it was in the table of contents - "Chapter 9, Tape Preparation - Sexadecimal Tape Code Characters". I jumped eagerly to section 9-5, where the character code table lives. By hand I created a table of punched tape codes for not only KSNJFL but the numeric digits as well. And sure enough, I'd struck gold. Four bits. The character code for K is the number 10, punched in binary into the tape. S follows at 11, N at 12, and so on. Section 9.3 of the manual explains it in detail. The paper tape can be punched in any one of 6 positions, and the middle one is actually a feed hole that is always punched, so it's really a 5-bit tape. Four of the remaining positions are used to represent the 16 sexadecimal characters. The manual tells us: "There are 16 keys on the Teletype tape punch labeled 0 through 9 and K, S, N, J, F, L. When one of these keys is pressed a corresponding pattern of holes is punched across the width of the tape. Since a hole or absence of a hole is a binary affair, we speak of the character punched in the 4 positions across the tape as a binary tape code for the corresponding sexadecimal digit. If we speak of only the 10 sexadecimal characters 0-9 and their corresponding binary codes, we call the characters punched in a tape binary-coded decimals. The Illiac tape code for sexadecimal characters is very easy to memorize since the hole positions across the tape simply correspond to the powers of 2 from 0 to 3." So there you have it - the sexadecimal number system was selected to minimize the number of bits needed to represent a digit in the character set they were using on their paper tapes back then. But is sexadecimal actually any better than hexadecimal? 
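Before answering that, the tape code the manual describes can be made concrete with a short Python sketch. It assumes only what is stated above: each sexadecimal digit is punched as its own value in four hole positions weighted 1, 2, 4, and 8. The function names and the left-to-right ordering of the holes in `render_row` are illustrative guesses, not the physical layout of an ILLIAC tape, and the feed hole and fifth position are left out.

```python
SEX_DIGITS = "0123456789KSNJFL"  # position in the string = value punched on tape


def tape_code(digit: str) -> int:
    """Return the 4-bit code (0..15) for one sexadecimal digit."""
    if len(digit) != 1:
        raise ValueError("expected a single character")
    # str.index raises ValueError for anything that is not one of the 16 digits.
    return SEX_DIGITS.index(digit.upper())


def render_row(code: int) -> str:
    """Draw one tape row: 'o' is a hole, '.' is no hole.

    Assumption for display only: the 8-weight position is drawn first.
    """
    return "".join("o" if code & (1 << bit) else "." for bit in (3, 2, 1, 0))


def decode_rows(codes) -> str:
    """Turn a sequence of 4-bit codes back into sexadecimal digits."""
    return "".join(SEX_DIGITS[code] for code in codes)


if __name__ == "__main__":
    for ch in "09KSNJFL":
        print(ch, tape_code(ch), render_row(tape_code(ch)))
```

Because the code for each digit is simply the digit's value, no lookup table is needed in either direction beyond the digit alphabet itself, which is the property the manual calls easy to memorize.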
It depends on your character encoding, of course, but in these cases, very much so. By using KSNJFL, they were able to represent a digit with only four bits, or a single nibble. That also means you can encode two digits into a single byte. And two sexadecimal digits are a byte, which means the binary representation is also the character encoding. None of this is true of ASCII, Unicode, UTF-8, or any of the other encodings in common use today. I don't mean to overstress the notion of a byte here, however, as these computers actually used 40-bit words! What's important is that all of the sexadecimal digits fit into four bits. Packing everything into those four bits has another benefit - when your tape is five bits, it means the high bit can be special. If you know you're sending binary sexadecimal data, the high bit will never get set. So, they also arranged their character set such that the control codes for delay, newline, carriage return, and so on all have that high bit set. That allowed them to intersperse control codes and data freely. It struck me as fairly genius once I finally figured it out! The only real remaining question is why they arranged it so that those particular letters, KSNJFL, were used. After all, if you're designing from scratch, in theory whoever did it could have ordered it such that ABCDEF were actually in the right place, couldn't they? I'd love to hear your thoughts in the comments as to why they were chosen this way, but here's my personal theory. It's supported by nothing at all, so I freely disclaim it as wild speculation and original research. But the Friden Flexowriter, for all its genius, was still based on the old hammer style of electric typewriter. That style of typewriter has one major drawback: hammer jams. If you hit a new letter before fully releasing the prior one, the hammers will collide. If they come from opposite sides of the keyboard, not much usually happens. 
But if they are near to one another, they can easily overlap and jam, requiring a human to stop and fix it. The QWERTY layout, invented by Christopher Latham Sholes, is often wrongly said to be designed to slow typists down, but that's not true. It was intended to make typists as fast as possible, but without hammer jams. If you study it a bit, the most common letters are somewhat evenly distributed across the span of the hammer set. Clearly on a computer-driven Flexowriter they'd simply type slowly enough to avoid hammer jams, but if you look at the layout, the letters ABCDEF are all clustered right near one another. You'd have to go a lot slower than if those keys were spread widely across the keyboard, because you could type much faster without risking a jam that way. So perhaps KSNJFL was chosen to distribute the letters across the hammer set more evenly. The only problem with my little theory is that K and L are awfully close to one another. And why not use Q and M, which are even further outboard? It's possible that the specifics of why that person on that day chose those letters are lost to the ages, but hopefully, one of you can provide a lead. Fact or theory, please mark it as such and share it in the comments! And if you've enjoyed this episode, please take a second to give it a like! If you found this episode enjoyable or entertaining, please do consider subscribing to my channel. I judge the success of an episode in large part by the number of new subscribers, so when I see a topic that generates a larger response, I do more like it. It's a win-win. I'm not selling anything and I don't have any patreons - I'm just in this for the subs and likes so consider leaving me one of each before you go! Actually, I am selling one thing, but I don't keep the profits. Grab yourself a classic Dave's Garage mug from the channel store or the link in the video description. 
Remember that all channel profits for calendar year 2021 - be they from views or mugs - will go to the U Dub Autism center. And when you know that each sip benefits a kid with special needs, it's absolutely guaranteed to make your coffee taste better.
Info
Channel: Dave's Garage
Views: 57,452
Keywords: sexadecimal, hexadecimal, von neumann, ordvac, illiac, edvac, mystery, histories mysteries, history, 1950s, usa, urbana, university of ilinois, number bases, number systems, programming, software, ancient history
Id: 15pGjS3jrT4
Length: 22min 11sec (1331 seconds)
Published: Tue Sep 07 2021