Putting Video on a Floppy Disk

Captions
This is the music video to Toxic being played entirely off a 3.5" 1.44MB floppy. Seriously. You can just take this disk, pop it into a computer running Windows, run the program, and watch the ever-iconic music video get played out right in your terminal. Really. But, I pretend to hear you ask, how does one play a video from a floppy disk? Well, even if it is old, a floppy is still a storage device. All you need to do is take a video and copy it onto the disk. Well, so long as that video is smaller than 1.44MB. And therein lies the problem. Floppy disks are small, and video is not. My videos now generally render at around 20Mbps, which means that a floppy couldn't even hold a second of HD video, let alone three minutes. That's at 60FPS 1080p though, and as you might expect, lower quality implies lower bitrate implies a smaller file size.

When it comes to video encoding, there are a bunch of dials to control resolution, framerate, audio sample rate, the number of audio channels, what codec to use, and so on. Turn these knobs all down, and we get a video that easily fits on a floppy, but it looks like this. Redditor RichB93 from the LGR subreddit used these encoding settings, along with a better disk formatting scheme, and was able to encode an entire five-minute episode of LGR down to a file small enough to be played off of a floppy disk. Clint even made a Blerb about it, good stuff. Turns out, that's all you need to fit a video onto a floppy.

Except, well, I dunno, that feels too easy in my opinion. The quality isn't bad, but it's all dependent on modern video codecs like H.265, and if the computer you stick it in doesn't have the hardware or software to decode that, the video file is as good as useless. I thought maybe I could try to include a playback program on the disk, but most video players like VLC here are already way too big to fit on a floppy, even before we throw the video file in.
So, how do we actually get a video onto a floppy that we can play back on almost any computer? We're going to want something compact, efficient to decode, and portable. And immediately my thoughts went to text. We can take every frame of the video and turn it into ASCII art, and that's perfect. Text is relatively easy to compress, almost every computer has some type of terminal output, and as a rule, using a terminal-based interface over a graphical one automatically increases your hacker cred by at least a few points.

Converting video into text isn't that new of a concept. In fact, there's a really cool demo from 1997 called BB, which plays a procedurally generated animation using only text in your terminal. Neat stuff, but by far the coolest thing about the BB project is that they released their image-to-text conversion tools under the name aalib. That means that converting a video into ASCII art is as simple as loading each frame in from the mp4 and pushing them all through aalib. But, something spoke to me. Maybe it was the naive programmer's desire to "build everything all by myself," maybe it was just me skipping ahead a bit and realizing that a more complex character set would kill any attempts at compression we might need to use. Whatever the case, I decided not to use aalib, and instead make the video-to-text conversion all by myself.

So… how does one make a video-to-ASCII converter? Well, it's important to remember that a video is just a sequence of frames. And each frame is a set of three monochrome images. And each image is a 2D grid of intensity values, one for each pixel. In ASCII art, for our purposes anyways, each character is essentially one pixel, so our first step is to scale down each frame to the resolution the text output is going to have. Then, for each pixel, we need to decide what character we're going to use to represent that color, and here is where things get interesting.
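The scale-then-pick-a-character step can be sketched in a few lines of Python. This is a minimal sketch of my own: the eight-character ramp and the nearest-neighbor downscaling here are illustrative stand-ins, not the exact choices used in the video (which hand-picks seven characters, as described below).

```python
# A dark-to-bright character ramp; each character stands in for one pixel.
RAMP = " .:-=+*#"

def frame_to_ascii(pixels, out_w, out_h):
    """pixels: 2D list of floats in [0, 1] (an already-grayscale frame)."""
    in_h, in_w = len(pixels), len(pixels[0])
    lines = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Nearest-neighbor downscale: sample one source pixel per text cell.
            v = pixels[y * in_h // out_h][x * in_w // out_w]
            # Map the intensity onto the ramp.
            row.append(RAMP[min(int(v * len(RAMP)), len(RAMP) - 1)])
        lines.append("".join(row))
    return "\n".join(lines)

# A tiny 2x2 "frame": dark on the left, bright on the right.
print(frame_to_ascii([[0.0, 1.0], [0.1, 0.9]], 4, 2))
```

Real frames would come from a decoder like OpenCV or ffmpeg rather than hand-written lists; the interesting part is only the per-cell sampling and the intensity-to-character lookup.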
You've probably already noticed from some of the ASCII art examples I've shown, some characters like spaces or commas work well at representing dark areas, since the character doesn't have a lot of white pixels, and similarly, hashes and at signs are good for bright colors since they're mostly filled in with white pixels. But what about everything else? There are a lot of colors between black and white. Well, rule #1: for now, I'm not going to be dealing with RGB color. Don't get me wrong, I could totally do it, but color makes things more complicated, it'll take up more space on the floppy, and it makes my program much harder to port between Linux and Windows, so for now, we're just gonna change every frame to grayscale.

Grayscale ends up being nice to work in anyhow. Each color exists on a brightness scale from zero to one, and we can pick which character to use based on which range it falls in. To get the ranges, I made a program to print out all the characters to the terminal, took their average brightness, plotted them out, and selected seven characters whose brightnesses were pretty evenly spaced apart. And, when we run the video through this process, this is what we get.

It's at least recognizable, but it does come with a slight problem. We only have seven brightness levels here, which means that we lose detail between the levels, especially on the darkest one. But this is a problem dithering can easily fix. If pixels alternate between two brightness levels in a checkerboard-type pattern, we can give the illusion of more colors than we have, since our eyes blend the colors together. I'm using Floyd-Steinberg dithering here, which distributes any error accumulated when picking the color of a pixel into its neighbors. It's a simple algorithm, but this step single-handedly makes our text render of the video frames look a lot better, especially in the darker frames.
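The Floyd-Steinberg step just described can be sketched as follows. This is my own minimal reconstruction, not the video's code: each pixel is snapped to the nearest of the seven levels, and the rounding error is pushed onto the unvisited neighbors with the algorithm's classic 7/16, 3/16, 5/16, 1/16 weights.

```python
def floyd_steinberg(img, levels=7):
    """img: 2D list of floats in [0, 1], modified in place.
    Returns the quantized level index (0..levels-1) for each pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            # Snap to the nearest of the evenly spaced brightness levels.
            idx = min(levels - 1, max(0, round(old * (levels - 1))))
            out[y][x] = idx
            err = old - idx / (levels - 1)
            # Diffuse the rounding error to right, below-left, below, below-right.
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out

# A flat gray that sits between two of the seven levels dithers into a
# mix of level-2 and level-3 cells instead of snapping to one flat value.
flat = [[0.4] * 8 for _ in range(8)]
print(floyd_steinberg(flat))
```

Running this before the character lookup is what produces the alternating patterns that fake the in-between brightnesses.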
As it turns out, a scaling step, a grayscale step, and a dithering step are all you need to convert a frame of video into a block of ASCII text, and those steps are all easy enough to work into a short Python script.

Before I move on to the next section, I should probably answer the question I'm sure a number of you are wondering: "Why Toxic?" Originally the idea was just to use ANY music video, since I wanted to use something recognizable that was under 4 minutes. Originally, I was thinking of using the Rickroll video, or Take on Me, but those videos don't have a lot of contrast. When you try to convert them to text, they come out looking mostly gray, and it's hard to make out much in terms of details. On the complete opposite side of this problem is the Bad Apple video, which, barring some anti-aliasing, is entirely monochrome. Pretty boring for this project, in my opinion, plus just about every possible port of it has been made by programmers far more talented than me. Which brings us to the music video for Toxic. Turns out, this video is perfect for this kind of text rendering. Almost all the scenes have strong contrast between bright lights and dark shadows, and there are lots of closeups on people's faces to show off the detail I'm able to preserve. Plus, I just like the song, ok? Free Britney.

I did tease color video-to-text a minute ago, so I should probably talk about that a bit. First off, how is this even possible? Most terminals, at least in Linux-land, support a set of formatting codes called the ANSI escape codes. These are a sequence of bytes that appear in the output stream. When the terminal sees them, it goes "oh, this must be formatting information," and it follows the encoded command to move the cursor, or switch the text color, or start writing in bold text.
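To make those escape codes concrete, here is a small sketch. The byte sequences themselves and the xterm 256-color cube values are standard; the helper names are my own, and the per-channel nearest match is a simplification of the brute-force table-building described in the next section.

```python
ESC = "\x1b["  # every ANSI escape sequence starts with ESC then '['

def bold_red(text):
    return ESC + "1;31m" + text + ESC + "0m"  # bold + red, then reset

def cursor_home():
    return ESC + "H"                          # move cursor to top-left

# Colors 16-231 of the standard xterm 256-color palette form a 6x6x6 RGB
# cube, with per-channel levels 0, 95, 135, 175, 215, 255.
CUBE = (0, 95, 135, 175, 215, 255)

def nearest_cube_color(r, g, b):
    def level(c):
        # Index of the closest of the six cube levels for one channel.
        return min(range(6), key=lambda i: abs(CUBE[i] - c))
    return 16 + 36 * level(r) + 6 * level(g) + level(b)

def colored_space(r, g, b):
    """One 'pixel': a space with its background set to the nearest cube color."""
    return ESC + "48;5;%dm " % nearest_cube_color(r, g, b) + ESC + "0m"

print(bold_red("warning"))
print(colored_space(255, 0, 0))  # background palette index 196, pure red
```

Printed in a terminal, these strings render as formatting rather than visible characters; redirected to a file, you can see the raw escape bytes that get stored.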
When it comes to setting colors, you get to change the foreground color and the background color, and depending on how advanced your terminal is, you have either 8, 16, or 256 colors to choose from. The cool thing about the 256-color set is that the standard palette includes a set of RGB colors with 6 brightness levels per channel. Generating color video is as simple as printing a bunch of spaces, and formatting the background to whatever terminal color is the closest to our pixel.

The result is recognizable, albeit a bit pixelated. But, if we're able to blend black to white using text, we can blend between the colors we're given to get an even better color depth. I wrote a brute-force program to try every foreground, background, and character combination in the ANSI palette, and determine the best combination for each RGB color from 0-255. In a sense, I generated a much more advanced 3D version of the grayscale bounds we talked about before. The result being a table you can give any RGB color, and it tells you the exact foreground, background, and character combination to match that color the closest in an ANSI terminal. And, it works. I'm able to draw absolutely beautiful color images in my terminal, still using nothing but colored text. The only drawback for video is that the Windows terminal isn't great at parsing these ANSI codes, and with multiple formatting sequences per character, it's no longer able to keep up with our framerate. But that's something I can fix through the magic of editing.

Alright, that's been enough tangents, let's get down to the real business of this video. How do we get this text encoding of Toxic onto a floppy disk? If your answer was drag and drop the text onto the floppy, you are unfortunately incorrect. While the source video is 18MB, the text encoding of Toxic, rendered at 160 columns like I've been showing, comes out to just about 45MB.
That's about 30 floppies' worth of data. So, first things first: let's drop our resolution down by half in each dimension. I needed to do this anyways to match the width of the old 80-column terminal in Windows, but that step alone gets us to 11MB, which is now only eight-ish floppies, but still more than one. To bring this down further, we'll need some form of compression.

I want this thing to be as simple as inserting the floppy and clicking the program, so ZIP archives aren't gonna cut it. Instead, I tried a few different methods for compressing images with small color palettes, like the classic Run Length Encoding. The idea is, in most drawn images, you'll find long horizontal strips of the exact same color, so, what's easier to write down? 50 individual blue pixels, or one blue pixel times 50. In our case, since we only have 7 characters, there are really only 3 bits of significant data per byte, and we can use the spare 5 bits to contain the run length for each character. In the optimal case, one byte can stand for 32 pixels. In practice, it's not even close. The dithering patterns kill the RLE here, since on average, each run only covers two or three characters before we need to start another.

That didn't work, so here's another approach more suited for text: Huffman coding. In this scheme, we build up a decision tree to decode bits from the compressed output. Most importantly though, different characters can have different encoding lengths, and we can exploit this to give the most common characters the shortest encodings. It's a brilliant technique, but in the case where letters are used fairly evenly, the results are only as good as the 3-bit encoding we started with.

One thing I started to play around with was constructing a Markov matrix of the transitions between characters as we print out a line. Basically, what are the probabilities of one character following another in our image?
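Before getting to that, the byte-packed run-length scheme above can be sketched like this. This is my own reconstruction; the exact field layout within the byte (high 5 bits for the run length, low 3 bits for the character) is an assumption.

```python
def rle_encode(symbols):
    """symbols: iterable of ints 0-6 (our 7 characters).
    Packs each run into one byte: 5 bits of (length - 1), 3 bits of symbol."""
    out = []
    run_sym, run_len = None, 0
    for s in symbols:
        if s == run_sym and run_len < 32:   # a byte can hold at most a 32-run
            run_len += 1
        else:
            if run_sym is not None:
                out.append(((run_len - 1) << 3) | run_sym)
            run_sym, run_len = s, 1
    if run_sym is not None:
        out.append(((run_len - 1) << 3) | run_sym)
    return out

def rle_decode(data):
    out = []
    for b in data:
        out.extend([b & 0b111] * ((b >> 3) + 1))
    return out

row = [3] * 50 + [5, 5, 1]
packed = rle_encode(row)
assert rle_decode(packed) == row
print(len(packed))  # 4 bytes: runs of 32 and 18 threes, two fives, one one
```

On a long flat run this gets close to the ideal 32 pixels per byte; on a dithered checkerboard, runs collapse to length one or two and the scheme loses, which is exactly the failure described above.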
As it turns out, for each character, almost always, it's highly likely that we stay on that character or go to a character of similar brightness due to dithering. Given the previous character, we can rank each choice of what comes next based on its likelihood, and I ended up encoding the image this way. Now, instead of one representing brightness level one, it means "pick the most likely next character." Two means pick the second most likely, and so on. In this encoding, most of the pixels turn into ones or twos, and since one is way more common than the other numbers, it encodes nicely on a Huffman tree as a single bit. Theoretically, this algorithm can produce an 8x compression rate, and in practice it gets about 5x. This method gets us all the way down to 2MB, and the egotist in me would like to point out, that's actually better than ZIP compression.

2MB is still too big for a formatted floppy, but I feel like I've worked enough here, so I'm gonna cut one last corner and switch the playback frame rate from 30 FPS to 15. I'm sure there are going to be a bunch of 240FPS pro gamers in the comments saying otherwise, but honestly, when rendering to text, I can barely tell the difference, and it cuts our data in half. With that, the entire 18MB video, rendered in text, can be compressed all the way down to just 1.1MB of data. Now we just have to get it off of the floppy.

This step just involves a C program to extract and print our compressed text. Writing it was mostly straightforward, though getting it to run well on older hardware required a few fixes. To improve speed, I needed to cut my program down to just one printf call per frame. The older terminal also had a hard time scrolling all the text my program put out, so now instead of starting a new line when drawing a frame, I just put the cursor back at the top left and draw over the last one.

With those beautification steps in place, let's see it.
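Before the demo, the rank transform from the compression section can be sketched as follows. This is my own reconstruction, using zero-based ranks where the narration counts from one, and the successor orderings here are a hypothetical stand-in (same character first, then similar brightnesses) for the real measured Markov statistics.

```python
# Hypothetical per-character orderings: for each of the 7 characters,
# list its likely successors, most likely first.
ORDER = {c: sorted(range(7), key=lambda n: (n != c, abs(n - c))) for c in range(7)}

def to_ranks(symbols):
    """Replace each symbol with its rank in the previous symbol's ordering."""
    ranks, prev = [], 0
    for s in symbols:
        ranks.append(ORDER[prev].index(s))
        prev = s
    return ranks

def from_ranks(ranks):
    """Invert the transform: pick the rank-th most likely successor."""
    symbols, prev = [], 0
    for r in ranks:
        s = ORDER[prev][r]
        symbols.append(s)
        prev = s
    return symbols

# Dithered lines mostly stay on or hop between neighboring brightnesses,
# so the ranks skew heavily toward 0 and 1 - exactly what a Huffman
# coder wants, since the most common rank can then cost a single bit.
line = [2, 2, 2, 3, 2, 2, 6]
assert from_ranks(to_ranks(line)) == line
print(to_ranks(line))
```

The transform itself saves nothing; it only reshapes the symbol distribution so that a subsequent Huffman pass compresses far better than it would on the raw characters.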
Here I have my TC1100 tablet running Windows XP, and the Toxic floppy. All we have to do is pop it in and run the program. It takes a little time to load the program in, but once it's done, we see the Toxic video. Pretty slick, if I do say so myself. There's no sound, since I was focusing mostly on the video for this project, but I'm sure with the right bitrate, and maybe some more compression on the whole disk, you could probably squeeze in some audio data to get the full Toxic experience.

The cool thing is, since this is a 32-bit executable, it should be able to run on Windows versions as far back as Windows 95. Testing in a virtual machine, I can confirm that, yes, it definitely still runs on Windows 95. But the output does have a hard time keeping up. Even so, that is still definitely ASCII Spears on the screen, so I'll take it.

My original goal with this project was to be able to fit a reasonable-length video, and the software to play it back, all on a single floppy disk, and that's exactly what I have in my hand here. All it took was converting the video into a stream of ASCII characters, then figuring out a way to compress all that text. I compiled it here for Windows, but this video should be displayable on just about anything with an 80-column terminal. If anyone wants to make a port to some really obscure hardware, or make some improvements to my program, I'd love to see it. Like always, I'll have a link in the description to the GitHub repo with the code necessary to textify and floppify any video you want, so be sure to check that out. Anyways, there now exists a floppy disk port of the Toxic video. I'm not sure who was asking for this, but you're welcome.
Info
Channel: The Science Elf
Views: 375,814
Id: uGoR3ZYZqjc
Length: 12min 55sec (775 seconds)
Published: Thu Mar 10 2022