Putting Video on a Floppy Disk

Captions
This is the music video to Toxic being played entirely off a 3.5" 1.44MB floppy. Seriously. You can just take this disk, pop it into a computer running Windows, run the program, and watch the ever-iconic music video get played out right in your terminal. Really. But, I pretend to hear you ask, how does one play a video from a floppy disk? Well, even if it is old, a floppy is still a storage device. All you need to do is take a video and copy it onto the disk. Well, so long as that video is smaller than 1.44MB. And therein lies the problem. Floppy disks are small, and video is not. My videos now generally render at around 20Mbps, which means that a floppy couldn't even hold a second of HD video, let alone three minutes. That's at 60FPS 1080p though, and as you might expect, lower quality implies lower bitrate implies a smaller file size.

When it comes to video encoding, there are a bunch of dials to control resolution, framerate, audio sample rate, the number of audio channels, what codec to use, and so on. Turn these knobs all down, and we get a video that easily fits on a floppy, but it looks like this. Redditor RichB93 from the LGR subreddit used these encoding settings, along with a better disk formatting scheme, and was able to encode an entire five-minute episode of LGR down to a file small enough to be played off of a floppy disk. Clint even made a Blerb about it, good stuff. Turns out, that's all you need to fit a video onto a floppy.

Except, well, I dunno, that feels too easy in my opinion. The quality isn't bad, but it's all dependent on modern video codecs like H.265, and if the computer you stick it in doesn't have the hardware or software to decode that, the video file is as good as useless. I thought maybe I could try to include a playback program on the disk, but most video players like VLC here are already way too big to fit on a floppy, even before we throw the video file in.
So, how do we actually get a video onto a floppy that we can play back on almost any computer? We're going to want something compact, efficient to decode, and portable. And immediately my thoughts went to text. We can take every frame of the video and turn it into ASCII art, and that's perfect. Text is relatively easy to compress, almost every computer has some type of terminal output, and as a rule, using a terminal-based interface over a graphical one automatically increases your hacker cred by at least a few points.

Converting video into text isn't that new of a concept. In fact, there's a really cool demo from 1997 called BB, which plays a procedurally generated animation using only text in your terminal. Neat stuff, but by far the coolest thing about the BB project is that they released their image-to-text conversion tools under the name aalib. That means that converting a video into ASCII art is as simple as loading each frame in from the mp4 and pushing them all through aalib. But, something spoke to me. Maybe it was the naive programmer's desire to "build everything all by myself," maybe it was just me skipping ahead a bit and realizing that a more complex character set would kill any attempts at compression we might need to use. Whatever the case, I decided not to use aalib, and instead make the video-to-text conversion all by myself.

So… how does one make a video-to-ASCII converter? Well, it's important to remember that a video is just a sequence of frames. And each frame is a set of three monochrome images. And each image is a 2D grid of intensity values, one for each pixel. In ASCII art, for our purposes anyways, each character is essentially one pixel, so our first step is to scale down each frame to the resolution the text output is going to have. Then, for each pixel, we need to decide what character we're going to use to represent that color, and here is where things get interesting.
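The scale-then-pick-a-character step can be sketched in a few lines of Python. This is a minimal sketch of my own: the eight-character ramp and the nearest-neighbor downscaling here are illustrative stand-ins, not the exact choices used in the video (which hand-picks seven characters, as described below).

```python
# A dark-to-bright character ramp; each character stands in for one pixel.
RAMP = " .:-=+*#"

def frame_to_ascii(pixels, out_w, out_h):
    """pixels: 2D list of floats in [0, 1] (an already-grayscale frame)."""
    in_h, in_w = len(pixels), len(pixels[0])
    lines = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Nearest-neighbor downscale: sample one source pixel per text cell.
            v = pixels[y * in_h // out_h][x * in_w // out_w]
            # Map the intensity onto the ramp.
            row.append(RAMP[min(int(v * len(RAMP)), len(RAMP) - 1)])
        lines.append("".join(row))
    return "\n".join(lines)

# A tiny 2x2 "frame": dark on the left, bright on the right.
print(frame_to_ascii([[0.0, 1.0], [0.1, 0.9]], 4, 2))
```

Real frames would come from a decoder like OpenCV or ffmpeg rather than hand-written lists; the interesting part is only the per-cell sampling and the intensity-to-character lookup.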
You've probably already noticed from some of the ASCII art examples I've shown, some characters like spaces or commas work well at representing dark areas, since the character doesn't have a lot of white pixels, and similarly, hashes and at signs are good for bright colors since they're mostly filled in with white pixels. But what about everything else? There are a lot of colors between black and white. Well, rule #1: for now, I'm not going to be dealing with RGB color. Don't get me wrong, I could totally do it, but color makes things more complicated, it'll take up more space on the floppy, and it makes my program much harder to port between Linux and Windows, so for now, we're just gonna change every frame to grayscale.

Grayscale ends up being nice to work in anyhow. Each color exists on a brightness scale from zero to one, and we can pick which character to use based on which range it falls in. To get the ranges, I made a program to print out all the characters to the terminal, took their average brightness, plotted them out, and selected seven characters whose brightnesses were pretty evenly spaced apart. And, when we run the video through this process, this is what we get.

It's at least recognizable, but it does come with a slight problem. We only have seven brightness levels here, which means that we lose detail between the levels, especially on the darkest one. But this is a problem dithering can easily fix. If pixels alternate between two brightness levels in a checkerboard-type pattern, we can give the illusion of more colors than we have, since our eyes blend the colors together. I'm using Floyd-Steinberg dithering here, which distributes any error accumulated when picking the color of a pixel into its neighbors. It's a simple algorithm, but this step single-handedly makes our text render of the video frames look a lot better, especially in the darker frames.
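The Floyd-Steinberg step just described can be sketched as follows. This is my own minimal reconstruction, not the video's code: each pixel is snapped to the nearest of the seven levels, and the rounding error is pushed onto the unvisited neighbors with the algorithm's classic 7/16, 3/16, 5/16, 1/16 weights.

```python
def floyd_steinberg(img, levels=7):
    """img: 2D list of floats in [0, 1], modified in place.
    Returns the quantized level index (0..levels-1) for each pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            # Snap to the nearest of the evenly spaced brightness levels.
            idx = min(levels - 1, max(0, round(old * (levels - 1))))
            out[y][x] = idx
            err = old - idx / (levels - 1)
            # Diffuse the rounding error to right, below-left, below, below-right.
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out

# A flat gray that sits between two of the seven levels dithers into a
# mix of level-2 and level-3 cells instead of snapping to one flat value.
flat = [[0.4] * 8 for _ in range(8)]
print(floyd_steinberg(flat))
```

Running this before the character lookup is what produces the alternating patterns that fake the in-between brightnesses.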
As it turns out, a scaling step, a grayscale step, and a dithering step are all you need to convert a frame of video into a block of ASCII text, and those steps are all easy enough to work into a short Python script.

Before I move on to the next section, I should probably answer the question I'm sure a number of you are wondering: "Why Toxic?" Originally the idea was just to use ANY music video, since I wanted to use something recognizable that was under 4 minutes. Originally, I was thinking of using the Rickroll video, or Take on Me, but those videos don't have a lot of contrast. When you try to convert them to text, they come out looking mostly gray, and it's hard to make out much in terms of details. On the complete opposite side of this problem is the Bad Apple video, which, barring some anti-aliasing, is entirely monochrome. Pretty boring for this project, in my opinion, plus just about every possible port of it has been made by programmers far more talented than me. Which brings us to the music video for Toxic. Turns out, this video is perfect for this kind of text rendering. Almost all the scenes have strong contrast between bright lights and dark shadows, and there are lots of closeups on people's faces to show off the detail I'm able to preserve. Plus, I just like the song, ok? Free Britney.

I did tease color video-to-text a minute ago, so I should probably talk about that a bit. First off, how is this even possible? Most terminals, at least in Linux-land, support a set of formatting codes called the ANSI escape codes. These are a sequence of bytes that appear in the output stream. When the terminal sees them, it goes "oh, this must be formatting information," and it follows the encoded command to move the cursor, or switch the text color, or start writing in bold text.
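To make those escape codes concrete, here is a small sketch. The byte sequences themselves and the xterm 256-color cube values are standard; the helper names are my own, and the per-channel nearest match is a simplification of the brute-force table-building described in the next section.

```python
ESC = "\x1b["  # every ANSI escape sequence starts with ESC then '['

def bold_red(text):
    return ESC + "1;31m" + text + ESC + "0m"  # bold + red, then reset

def cursor_home():
    return ESC + "H"                          # move cursor to top-left

# Colors 16-231 of the standard xterm 256-color palette form a 6x6x6 RGB
# cube, with per-channel levels 0, 95, 135, 175, 215, 255.
CUBE = (0, 95, 135, 175, 215, 255)

def nearest_cube_color(r, g, b):
    def level(c):
        # Index of the closest of the six cube levels for one channel.
        return min(range(6), key=lambda i: abs(CUBE[i] - c))
    return 16 + 36 * level(r) + 6 * level(g) + level(b)

def colored_space(r, g, b):
    """One 'pixel': a space with its background set to the nearest cube color."""
    return ESC + "48;5;%dm " % nearest_cube_color(r, g, b) + ESC + "0m"

print(bold_red("warning"))
print(colored_space(255, 0, 0))  # background palette index 196, pure red
```

Printed in a terminal, these strings render as formatting rather than visible characters; redirected to a file, you can see the raw escape bytes that get stored.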
When it comes to setting colors, you get to change the foreground color and the background color, and depending on how advanced your terminal is, you have either 8, 16, or 256 colors to choose from. The cool thing about the 256-color set is that the standard palette includes a set of RGB colors with 6 brightness levels per channel. Generating color video is as simple as printing a bunch of spaces, and formatting the background to whatever terminal color is the closest to our pixel.

The result is recognizable, albeit a bit pixelated. But, if we're able to blend black to white using text, we can blend between the colors we're given to get an even better color depth. I wrote a brute-force program to try every foreground, background, and character combination in the ANSI palette, and determine the best combination for each RGB color from 0-255. In a sense, I generated a much more advanced 3D version of the grayscale bounds we talked about before. The result being a table you can give any RGB color, and it tells you the exact foreground, background, and character combination to match that color the closest in an ANSI terminal. And, it works. I'm able to draw absolutely beautiful color images in my terminal, still using nothing but colored text. The only drawback for video is that the Windows terminal isn't great at parsing these ANSI codes, and with multiple formatting sequences per character, it's no longer able to keep up with our framerate. But that's something I can fix through the magic of editing.

Alright, that's been enough tangents, let's get down to the real business of this video. How do we get this text encoding of Toxic onto a floppy disk? If your answer was drag and drop the text onto the floppy, you are unfortunately incorrect. While the source video is 18MB, the text encoding of Toxic, rendered at 160 columns like I've been showing, comes out to just about 45MB.
That's about 30 floppies' worth of data. So, first things first: let's drop our resolution down by half in each dimension. I needed to do this anyways to match the width of the old 80-column terminal in Windows, but that step alone gets us to 11MB, which is now only eight-ish floppies, but still more than one. To bring this down further, we'll need some form of compression.

I want this thing to be as simple as inserting the floppy and clicking the program, so ZIP archives aren't gonna cut it. Instead, I tried a few different methods for compressing images with small color palettes, like the classic Run Length Encoding. The idea is, in most drawn images, you'll find long horizontal strips of the exact same color, so, what's easier to write down? 50 individual blue pixels, or one blue pixel times 50. In our case, since we only have 7 characters, there are really only 3 bits of significant data per byte, and we can use the spare 5 bits to contain the run length for each character. In the optimal case, one byte can stand for 32 pixels. In practice, it's not even close. The dithering patterns kill the RLE here, since on average, each run only covers two or three characters before we need to start another.

That didn't work, so here's another approach more suited for text: Huffman coding. In this scheme, we build up a decision tree to decode bits from the compressed output. Most importantly though, different characters can have different encoding lengths, and we can exploit this to give the most common characters the shortest encodings. It's a brilliant technique, but in the case where letters are used fairly evenly, the results are only as good as the 3-bit encoding we started with.

One thing I started to play around with was constructing a Markov matrix of the transitions between characters as we print out a line. Basically, what are the probabilities of one character following another in our image?
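Before getting to that, the byte-packed run-length scheme above can be sketched like this. This is my own reconstruction; the exact field layout within the byte (high 5 bits for the run length, low 3 bits for the character) is an assumption.

```python
def rle_encode(symbols):
    """symbols: iterable of ints 0-6 (our 7 characters).
    Packs each run into one byte: 5 bits of (length - 1), 3 bits of symbol."""
    out = []
    run_sym, run_len = None, 0
    for s in symbols:
        if s == run_sym and run_len < 32:   # a byte can hold at most a 32-run
            run_len += 1
        else:
            if run_sym is not None:
                out.append(((run_len - 1) << 3) | run_sym)
            run_sym, run_len = s, 1
    if run_sym is not None:
        out.append(((run_len - 1) << 3) | run_sym)
    return out

def rle_decode(data):
    out = []
    for b in data:
        out.extend([b & 0b111] * ((b >> 3) + 1))
    return out

row = [3] * 50 + [5, 5, 1]
packed = rle_encode(row)
assert rle_decode(packed) == row
print(len(packed))  # 4 bytes: runs of 32 and 18 threes, two fives, one one
```

On a long flat run this gets close to the ideal 32 pixels per byte; on a dithered checkerboard, runs collapse to length one or two and the scheme loses, which is exactly the failure described above.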
As it turns out, for each character, almost always, it's highly likely that we stay on that character or go to a character of similar brightness due to dithering. Given the previous character, we can rank each choice of what comes next based on its likelihood, and I ended up encoding the image this way. Now, instead of one representing brightness level one, it means "pick the most likely next character." Two means pick the second most likely, and so on. In this encoding, most of the pixels turn into ones or twos, and since one is way more common than the other numbers, it encodes nicely on a Huffman tree as a single bit. Theoretically, this algorithm can produce an 8x compression rate, and in practice it gets about 5x. This method gets us all the way down to 2MB, and the egotist in me would like to point out, that's actually better than ZIP compression.

2MB is still too big for a formatted floppy, but I feel like I've worked enough here, so I'm gonna cut one last corner and switch the playback frame rate from 30 FPS to 15. I'm sure there are going to be a bunch of 240FPS pro gamers in the comments saying otherwise, but honestly, when rendering to text, I can barely tell the difference, and it cuts our data in half. With that, the entire 18MB video, rendered in text, can be compressed all the way down to just 1.1MB of data. Now we just have to get it off of the floppy.

This step just involves a C program to extract and print our compressed text. Writing it was mostly straightforward, though getting it to run well on older hardware required a few fixes. To improve speed, I needed to cut my program down to just one printf call per frame. The older terminal also had a hard time scrolling all the text my program put out, so now instead of starting a new line when drawing a frame, I just put the cursor back at the top left and draw over the last one.

With those beautification steps in place, let's see it.
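Before the demo, the rank transform from the compression section can be sketched as follows. This is my own reconstruction, using zero-based ranks where the narration counts from one, and the successor orderings here are a hypothetical stand-in (same character first, then similar brightnesses) for the real measured Markov statistics.

```python
# Hypothetical per-character orderings: for each of the 7 characters,
# list its likely successors, most likely first.
ORDER = {c: sorted(range(7), key=lambda n: (n != c, abs(n - c))) for c in range(7)}

def to_ranks(symbols):
    """Replace each symbol with its rank in the previous symbol's ordering."""
    ranks, prev = [], 0
    for s in symbols:
        ranks.append(ORDER[prev].index(s))
        prev = s
    return ranks

def from_ranks(ranks):
    """Invert the transform: pick the rank-th most likely successor."""
    symbols, prev = [], 0
    for r in ranks:
        s = ORDER[prev][r]
        symbols.append(s)
        prev = s
    return symbols

# Dithered lines mostly stay on or hop between neighboring brightnesses,
# so the ranks skew heavily toward 0 and 1 - exactly what a Huffman
# coder wants, since the most common rank can then cost a single bit.
line = [2, 2, 2, 3, 2, 2, 6]
assert from_ranks(to_ranks(line)) == line
print(to_ranks(line))
```

The transform itself saves nothing; it only reshapes the symbol distribution so that a subsequent Huffman pass compresses far better than it would on the raw characters.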
Here I have my TC1100 tablet running Windows XP, and the Toxic floppy. All we have to do is pop it in and run the program. It takes a little time to load the program in, but once it's done, we see the Toxic video. Pretty slick, if I do say so myself. There's no sound, since I was focusing mostly on the video for this project, but I'm sure with the right bitrate, and maybe some more compression on the whole disk, you could probably squeeze in some audio data to get the full Toxic experience.

The cool thing is, since this is a 32-bit executable, it should be able to run on Windows versions as far back as Windows 95. Testing in a virtual machine, I can confirm that, yes, it definitely still runs on Windows 95. But the output does have a hard time keeping up. Even so, that is still definitely ASCII Spears on the screen, so I'll take it.

My original goal with this project was to be able to fit a reasonable-length video, and the software to play it back, all on a single floppy disk, and that's exactly what I have in my hand here. All it took was converting the video into a stream of ASCII characters, then figuring out a way to compress all that text. I compiled it here for Windows, but this video should be displayable on just about anything with an 80-column terminal. If anyone wants to make a port to some really obscure hardware, or make some improvements to my program, I'd love to see it. Like always, I'll have a link in the description to the GitHub repo with the code necessary to textify and floppify any video you want, so be sure to check that out. Anyways, there now exists a floppy disk port of the Toxic video. I'm not sure who was asking for this, but you're welcome.
Info
Channel: The Science Elf
Views: 375,814
Id: uGoR3ZYZqjc
Length: 12min 55sec (775 seconds)
Published: Thu Mar 10 2022