Hello, Assembly! Retrocoding the World's Smallest Windows App in x86 ASM

Reddit Comments

I'm really surprised at how similar it is to a C program. I was expecting all kinds of weird addresses and all of this, but it's actually pretty clear

👍 11 👤 u/Sl3dge78 📅 Mar 22 2021

Just an update! I'm down to 1448 bytes now. Sonicmouse pointed out that I wasn't merging my sections which was wasting space!

👍 4 👤 u/daveplreddit 📅 Mar 23 2021

I just don't believe he's typing in real time. It's way too fast. But it's still really cool!

👍 7 👤 u/s4lt3d 📅 Mar 22 2021

Nice video. What resources would you recommend to learn more about assembly?

👍 2 👤 u/a_false_vacuum 📅 Mar 22 2021

Hello world in x86-64 assembly for a saner operating system: https://jameshfisher.com/2018/03/10/linux-assembly-hello-world/

👍 1 👤 u/holgerschurig 📅 Mar 23 2021
Captions
Hey, I'm Dave, Welcome to my Shop! I'm Dave Plummer, a retired operating systems developer from Microsoft going back to the MS-DOS and Windows 95 days, and today we're going on an adventure back in time to Retrocode some x86 assembly language. Come along for a few minutes as we ride the lightning to build the world's smallest fully functional Windows application. I won't be hiding the details behind some fancy IDE: I'll do it live and raw right in a simple text editor, and I'll explain how every detail works. And when it's time to build it, I'll shell out to the command line to do it like it's 1989. And did I mention that I'd be doing it all live, without a net? [Intro] Yes, indeed, it's true. As you can tell by my long, flowing grey beard and colorful robes, I'm one of the old wizards that actually wrote products like MS-DOS and I can still program in real assembly language. That's because I grew up in the 80s and 90s working on video games and operating systems back when wringing the last few drops of speed out of a CPU simply required that you spoke to it in its native tongue. Writing an operating system such as MS-DOS demanded that you work in assembly lest the bloat of a compiled language consume too much of the precious 640K of lower memory available. For size and speed, there's simply no beating assembly language. It's a bit of a lost art, however, as programmers don't use it much these days. Modern processors are fast enough and today's compilers effective enough that the C++ code written for them, if done diligently, will perform almost as well. But there is still always some certain portion of speed and size left to be found by the precision blade that is assembly language. Let me take a moment here to clear up one common point of confusion, and that's the difference between assembly language and machine language.
In assembly language the instructions represent the fundamental operations that the CPU can perform, and they are exceedingly basic, like add, subtract, and shift. But naturally, the CPU doesn't use names for its instructions, it numbers them. Whereas we represent the addition opcode with the three English letters "A-D-D", the computer simply numbers its instructions and represents the addition opcode with the number 4. That would be a bit cumbersome for humans to work with all day, so assembly language uses English mnemonics instead. It's a direct translation from one to the other and they produce the same code. Because make no mistake, you CAN code in machine language, and it's actually how I started because I didn't even have access to a symbolic assembler like MASM to do the translation for me. As a teen I wrote a small game reminiscent of Galaga, and I did it all in machine language using what's known as a machine language monitor. It allows you to key in the instructions and data manually in hex bytes. I can still do it today from memory, even 40 years later, or at least some of it. For example, without looking it up I still know that A9 00 loads the accumulator with the color code for black and that 8D 21 D0 stores it in the screen color register. I'll likely just always know that stuff. The only catch is that you can't move code around, at least not with the 6502 CPU I was using back then. The addresses are all hard coded once an instruction is in place and so if you wanted to insert a new piece of code, you'd have to place it at the end, jump out to it from the middle, execute it, and then jump back. It was a truly hideous way to work that generated complete spaghetti code, but it was quite an education. As a rule of thumb, then, if you're dealing with the human readable form like ADD, it's assembly language. If you're dealing with the hex number 04, that's the machine language.
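To make the mnemonic-to-number mapping concrete, here is a minimal sketch in C of a two-instruction 6502 assembler and disassembler. It covers only the two opcodes quoted above (A9 and 8D). The function names and the "LDA#" spelling for immediate mode are made up for this illustration and do not come from any real tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assemble: mnemonic + operand -> machine bytes.
   Returns the number of bytes written, or 0 for an unknown mnemonic. */
size_t asm_6502(const char *mnemonic, unsigned operand, uint8_t out[3])
{
    assert(mnemonic != NULL && out != NULL);
    if (strcmp(mnemonic, "LDA#") == 0) {      /* load accumulator, immediate */
        out[0] = 0xA9;
        out[1] = (uint8_t)operand;
        return 2;
    }
    if (strcmp(mnemonic, "STA") == 0) {       /* store accumulator, absolute */
        out[0] = 0x8D;
        out[1] = (uint8_t)(operand & 0xFF);   /* low address byte first */
        out[2] = (uint8_t)(operand >> 8);     /* then the high byte */
        return 3;
    }
    return 0;
}

/* Disassemble: machine bytes -> text.
   Returns the number of bytes consumed, or 0 if the opcode is not in this
   deliberately tiny table or the buffer is too short. */
size_t disasm_6502(const uint8_t *code, size_t len, char *out, size_t out_size)
{
    assert(code != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    if (len == 0)
        return 0;

    switch (code[0]) {
    case 0xA9:
        if (len < 2) return 0;
        snprintf(out, out_size, "LDA #$%02X", (unsigned)code[1]);
        return 2;
    case 0x8D:
        if (len < 3) return 0;
        snprintf(out, out_size, "STA $%04X",
                 (unsigned)code[1] | ((unsigned)code[2] << 8));
        return 3;
    default:
        return 0;
    }
}
```

Feeding the five bytes A9 00 8D 21 D0 through `disasm_6502` gives `LDA #$00` followed by `STA $D021`, and assembling those two lines produces the same five bytes again, which is the round trip the rule of thumb describes.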
It's a direct one to one mapping that can go back and forth without change, however, and the assembly language is just a convenience to make it more readable to humans. In my last Retrocoding episode, I wrote Hello World for Windows in C. It was about the most basic app possible in terms of functionality, with a main window, menu, about dialog, and so on, all written directly to the classic user32 and gdi32 APIs. And yet when I called it "bare metal programming", a few folks balked at the idea because C is technically an intermediate level language and thus isn't as low as assembly. That misses the point that we were writing to the lowest level possible APIs, however, which is what I meant. Still, the point was well taken - as Dr. Feynman once said, there's plenty of room at the bottom, so let's eliminate any remaining fat and see just how small we can make this thing. The C binary was a tad over 100K, but you can't make a direct comparison since it included icon resources. I imagine that if we backed all of the resources and runtimes out of the image, we'd be somewhere around 16K of code. And that's actually pretty tiny. In fact, if you're storing it on a disk formatted with the FAT filesystem, odds are it'll take up a full 32K cluster no matter how small you can make it, so does it even matter at that point? And besides, how small is small, anyway? How small can you go in theory? Well, a single page of memory in Windows is 4K, and that's the smallest amount of memory that a process can occupy, so perhaps that's the ultimate goal. But can you really write a working Windows program that will fit in 4K? A Windows app that would fit into memory on the original Commodore PET? Let's find out right now. One thing we need to decide on is what we mean by a working Windows program. As a definition, I'm going to build an application very much like the Hello Windows app from last episode but without any graphical resources such as icons or menus that take up extra space. 
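The "how small is small" question comes down to rounding up to an allocation unit. As a rough sketch in C, with the 32K cluster and 4K page taken from the figures quoted above (real cluster sizes depend on how the volume was formatted), and with `round_up` being a helper name invented here:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of unit. unit must be non-zero. */
size_t round_up(size_t size, size_t unit)
{
    assert(unit != 0);
    return (size + unit - 1) / unit * unit;
}
```

With these numbers, `round_up(16 * 1024, 32 * 1024)` and `round_up(4096, 32 * 1024)` both come out at 32768, so on such a disk the two binaries cost the same space. In memory the picture differs: `round_up(4096, 4096)` is one page, while a 16K image needs four.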
We'll have a main window with a caption bar, system menu, close button, and minimize and maximize widgets, and everything should work properly. It should even custom paint its client area and render text and respond to resizing appropriately. That's the basics I expect. While I'm getting the old nano editor ready, I should let you know that I'm now offering the classic Dave's Garage mug for sale online in the channel store. It's the same one as designed by my daughter for my Father's Day present this year and features my son's original logo design, and available in four colors. Visit the channel page and get yours today, and rest assured that I'm still just in this for the subs and likes: any net profits from merchandise sales will go to Autism research. As you likely know, I wrote the original Windows Task manager, and I've also got a few of these super limited edition Task Manager enamel pins. How limited are they? I just had a small batch made for myself, but I've decided to give away a pair of the autographed pins to (a) the subscribed user with the comment that receives the most upvotes on this video without actually soliciting upvotes, and (b) one to a randomly selected subscribed user who makes any useful comment on the video. That way I reward success and yet everyone still has a shot! But you've got to be subscribed to win, so be sure to comment, and right now we'll dive straight on into the editor and start building our app. Are you sitting comfortably? Then we'll begin. [Editor] I'm going to use nano again as my editor of choice today. When it's time to assemble the code into a working binary, I'll shell out and do that on the command line so you can see me do so in all of the nitty gritty detail. Our entire project will be included within a single source file, HelloAssembly.asm. That's it, that's all, no other include files or version files or manifests or other nonsense. 
Since we'll be omitting icons and other resources, there's no RC file to worry about either. This single .asm file will assemble directly into a working windows application with no further intervention: assembler, linker, program: run it. I love that part. No mystery, every byte has a purpose and every byte makes sense. To get started, we'll now create that single file, call it HelloAssembly.asm, and start coding. Before we can enter any actual code, however, as usual there's some housekeeping to do. To wit, we need to configure the assembler and include the required headers and libs that our project will need in order to build. This is for most of us the least interesting part of any assembly project, but it is an important one, so let me quickly bang that out and get it done. [Go to end of option casemap] Our first lines tell the assembler that we'll be working in the 386-instruction set on a standard flat memory model. In the olden days of segmented 16-bit code you had options like large, small, and even compact and tiny, but once you're 32 bits, it's all just the flat model. The 'casemap' option ensures that our code will be insensitive to case unless referring to a system identifier. [Go to end of .inc includes] Next, we need to include our .inc files which are to assembly language as .h files are to a C project. They include things like structure definitions and the function prototypes for system calls in User32 and GDI. It's how the assembler and linker know what arguments those functions take, for example. [Go to the end of the .lib includes] In assembly language we can also specify the .lib files that will be ultimately used to link the binary to libraries and DLLs. These would include things like the import descriptor tables, strings, and other binary data that is needed to glue your code to the system APIs.
If you want to call CreateWindow in User32, for example, the linker needs to know where in the DLL it's located, how many arguments of what size, and so on. [End of Forward declarations] If I'm not mistaken, MASM is what's known as a two-pass assembler. On the first pass everything is measured up and the assembler figures out how many bytes the instructions will take up and therefore where all the labels will land. It also has to know the signature of any functions that we will be jumping to or calling later, so we will provide one for the single function we jump forward to: our WinMain function that will be launched from the main code entry point. [End of Appname] Here we have a couple of numeric constants to be used for the window size as well as two initialized string constants. Just like code, since these do not change, they will go into a read-only segment. Finally, we can start writing some actual code. Our main entry point will be unimaginatively called MainEntry, and the first two things it needs to do will be to grab a copy of the program's instance handle and the text of the command line. [End of Command Line] My plan is to have MainEntry call our WinMain in much the same way as Windows does itself if we were writing in C. There is a certain amount of preparatory C runtime code that helps prepare the startup that we'll need to do on our own when coding in assembly. [End of mov CommandLine, eax] There's one significant difference between how I'm going to pass the command line to my own WinMain and how the runtime code normally does it: normally, the program name is removed before you get the command line. If we call GetCommandLine directly like this, however, it still includes the program name as the first argument, so it's a minor difference but you may need to keep it in mind for compatibility. [End of call Exitprocess] There's technically a bug here, or at least a shortcoming.
I'm not looking at my process's STARTUPINFO structure to see if the window start mode, such as maximized, minimized, or default, has actually been specified there. I'm just going to ask for the default, which ignores the wishes of the parent process. It's a minor thing, but technically you should check to see if the STARTUPINFO structure specifies the STARTF_USESHOWWINDOW option flag, and if so, then use the wShowWindow setting from there instead. But I'll be content with defaults in all cases. [Go to end of LOCAL blocks] Here is our WinMain entry. It takes the same parameters as we saw in our C version of Hello World: essentially, the instance handle, the command line, and an indication of what manner we should display our main application window in. The first thing our WinMain does is to reserve enough stack space for three important local variables: a WNDCLASS structure, a MSG structure, and a window handle. In an assembly language program you can simply reserve space for your local variables on the stack at the top of any function and they will be around for the lifetime of that function call and that's it. Keep in mind, of course, that simply reserving space doesn't clear or initialize that memory - if you need it clear, you'll have to do that yourself. Remember, this is all pure ASM and there are no helpers or runtimes running around behind you to tidy things up! By and large, if you don't do something yourself, it's not happening. The only help you do get is that the assembler will use the function signature to figure out the right number of bytes to pop back off the base or stack pointer when it returns at the end of it. [End of WNDCLASS setup, mov to hIconSm] As with our C version, the first thing our assembly version of Hello World will do is to register the window class type of the main window that we plan to create.
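The STARTUPINFO check described above as the missing step is a single flag test. The sketch below is a portable C model of that decision and is not the code from the video: the two constants carry their Windows SDK values, while the cut-down struct and the function name `pick_show_command` are stand-ins invented here so the example builds without the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirrored from the Windows SDK headers. */
#define STARTF_USESHOWWINDOW 0x00000001u
#define SW_SHOWDEFAULT       10

/* Hypothetical subset of STARTUPINFO: only the two fields that matter here. */
struct startup_info_subset {
    uint32_t dwFlags;      /* STARTF_USESHOWWINDOW marks wShowWindow as valid */
    uint16_t wShowWindow;  /* requested SW_* value, meaningful only with the flag */
};

/* Decide which show command to hand to WinMain. */
int pick_show_command(const struct startup_info_subset *si)
{
    assert(si != NULL);
    if (si->dwFlags & STARTF_USESHOWWINDOW)
        return si->wShowWindow;   /* honor what the parent process asked for */
    return SW_SHOWDEFAULT;        /* nothing requested: use the default */
}
```

In a real program the structure would be filled in by a call to GetStartupInfo before the test. With the flag set and wShowWindow equal to 3 (SW_SHOWMAXIMIZED) the function returns 3, and with the flag clear it returns 10 regardless of the field's contents.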
Remember that the Win32 API doesn't know if it's being called by assembly or C - you still fill out the structure, including setting that first DWORD to be the structure size, in exactly the same manner. We specify that we want to be redrawn for any vertical or horizontal changes, what our instance handle is, what the window background color should be, our title, class name, and so on. Then we select the IDI_APPLICATION icon, which is simply the default system app icon, to be our icon. That avoids us having to create and store any resources of our own inside our application. [End of Call RegisterClassEx] Our next step is to specify that we want the standard arrow cursor and we pass our completed WNDCLASS structure off to RegisterClass. [End of createwindow, mov hwnd, eax] After the window class has been fully registered we can create our main window using that class name. CreateWindow takes a whopping 12 parameters all of which must be pushed onto the stack in the correct order. So in assembly language, how do you pass arguments to a function call? Any way you'd like, as long as the caller and the callee agree. The fastest and easiest way might be to simply pass them in registers, but no matter where you draw the line, at some point you'd run out of registers, and if you're also preserving their old values on the stack, that's a lot of stack and memory work as well. The catch is that the caller and the callee absolutely must agree, because when we use the standard calling convention, for example, everything is passed on the stack. But not only does the receiving function need to know that the arguments are passed in a right to left order, but it also has to pop them back off the stack before it can return to the caller. So, the callee has to know the exact signature of how it was called, because that determines how many bytes would have been pushed onto the stack for it to clean up. Any mismatch here, of course, causes a catastrophic crash of the process or similar.
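The agreement between caller and callee described above can be modeled with a toy stack. This is a simplified sketch in C and not real calling-convention code: it ignores the return address that `call` also pushes, and every name in it (`toy_stack`, `toy_push`, and so on) is invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64

/* A toy downward-growing stack of 32-bit slots, in the spirit of ESP. */
struct toy_stack {
    uint32_t slot[STACK_SLOTS];
    size_t   top;               /* index of the most recently pushed slot */
};

void toy_init(struct toy_stack *s)
{
    s->top = STACK_SLOTS;       /* empty: top sits just past the last slot */
}

void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->top > 0);
    s->slot[--s->top] = value;
}

/* Caller side: push right to left, so the first argument ends up on top. */
void push_args_right_to_left(struct toy_stack *s, const uint32_t *args, size_t count)
{
    for (size_t i = count; i > 0; i--)
        toy_push(s, args[i - 1]);
}

/* Callee side: argument n (0-based) sits n slots above the top. */
uint32_t toy_arg(const struct toy_stack *s, size_t n)
{
    assert(s->top + n < STACK_SLOTS);
    return s->slot[s->top + n];
}

/* Callee cleanup, the job of `ret N`: discard the arguments and
   report how many bytes that was (4 per 32-bit argument). */
size_t callee_cleanup(struct toy_stack *s, size_t arg_count)
{
    assert(s->top + arg_count <= STACK_SLOTS);
    s->top += arg_count;
    return arg_count * sizeof(uint32_t);
}
```

For a 12-argument call the cleanup comes to 48 bytes. If the callee believed it had 11 arguments it would discard only 44, leaving the stack pointer off by one slot for the caller, which is the kind of mismatch that crashes the process.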
We're going to let the system position our window by passing CW_USEDEFAULT, which basically means "wherever you'd like it". The only little space saving trick I'm using here is to create the window with the visible style on right from the get-go, rather than creating and then separately showing the window. It just saves a step and an API call. After calling CreateWindow we check the value of the window handle that comes back. If for any reason it's null, we know our window creation failed and we exit the application. In the normal success case, we next call UpdateWindow to force our first paint and then continue on to start pumping messages in our MessageLoop. [Call UpdateWindow] As soon as our window is successfully created, we call UpdateWindow. That API will directly call our window procedure with a paint message, bypassing anything else in the queue. [End of je DoneMessages] Our message loop is very straightforward - it simply calls GetMessage until that function returns 0. That's when and how it knows that it's time to exit the program. [End of jmp MessageLoop] As long as messages continue to come into our message loop, we translate and dispatch them. Because we don't have resources, we therefore do not have an accelerator table and that's why you might notice that I'm not calling TranslateAccelerator and so on. [End of WinMain endp] Here's the end of our WinMain procedure, and as you can see when it's run out of messages to handle it will ultimately return the WPARAM result of the last message successfully processed, which is nominally WM_QUIT. So, your program could return a value all the way back out to the caller on the command line simply by setting the WPARAM of WM_QUIT. [End of LOCALS for WndProc] Our Window Procedure is what really defines how our application will behave, because it's what in turn defines how each window message is handled.
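The control flow of the loop and its hand-off to the window procedure can be modeled without Windows at all. The following is a hedged simulation in C: the three message numbers are the real SDK values, but the queue, the `toy_` functions, and the paint counter are all hypothetical stand-ins, and a real GetMessage blocks when the queue is empty and can also return -1 on error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Message numbers mirrored from the Windows SDK headers. */
#define WM_DESTROY 0x0002
#define WM_PAINT   0x000F
#define WM_QUIT    0x0012

#define QUEUE_CAPACITY 16

struct toy_msg { uint32_t message; uint32_t wParam; };

/* Hypothetical fixed-size stand-in for the thread's message queue. */
struct toy_queue {
    struct toy_msg items[QUEUE_CAPACITY];
    size_t head, tail;
    unsigned paints;              /* how many WM_PAINT messages were handled */
};

void toy_post(struct toy_queue *q, uint32_t message, uint32_t wParam)
{
    assert(q->tail < QUEUE_CAPACITY);
    q->items[q->tail].message = message;
    q->items[q->tail].wParam = wParam;
    q->tail++;
}

/* Like GetMessage: returns 0 once WM_QUIT is retrieved, non-zero otherwise. */
int toy_get_message(struct toy_queue *q, struct toy_msg *out)
{
    assert(q->head < q->tail);    /* the real call would block instead */
    *out = q->items[q->head++];
    return out->message != WM_QUIT;
}

/* Like the window procedure: act on the two messages the app cares about. */
void toy_wnd_proc(struct toy_queue *q, const struct toy_msg *m)
{
    if (m->message == WM_DESTROY)
        toy_post(q, WM_QUIT, 0);  /* the effect of PostQuitMessage(0) */
    else if (m->message == WM_PAINT)
        q->paints++;
}

/* Like the loop in WinMain: dispatch until WM_QUIT, then return its wParam. */
uint32_t toy_message_loop(struct toy_queue *q)
{
    struct toy_msg m;
    while (toy_get_message(q, &m))
        toy_wnd_proc(q, &m);
    return m.wParam;
}
```

Posting WM_PAINT and then WM_DESTROY to an empty queue makes the loop paint once, see the quit message that the destroy handler queued, and return 0. Posting WM_QUIT with a wParam of 7 makes the loop return 7, which is how an exit code travels back out of WinMain.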
There are really only two messages we care about for our very simple application: the WM_PAINT that we will be sent whenever it's time to paint the window and the WM_DESTROY message that signals it's time to exit. [End of NotWMDestroy] Handling WM_DESTROY is quite straightforward. As soon as we see that message, we simply call PostQuitMessage with a zero argument and our application's message queue will shut down, because inside our message loop, our GetMessage call will then return 0 and that's our cue to exit and return. [call SetBkMode] This is the case handler for our WM_PAINT messages. You'll notice you can't just push the address of a stack structure like the PAINTSTRUCT. That's why we use lea, or load effective address, to get the structure's address into a register, eax, and then push that. We save away the device context handle that comes back from BeginPaint before we set the background mode for it to be TRANSPARENT, which will affect how our text draws. With it set to transparent, we won't get that white box background around our text. [call DrawText] Our painting amounts to simply centering the text Hello, Windows in our main client area. To do that, we push the address of our temporary RECT structure along with the window handle and call GetClientRect, which fills out the structure right in place for us. To actually render the text, we load up our option flags first. We'll indicate that we want a single line of text centered both horizontally and vertically. We'll push the address of the text and the HDC itself before calling DrawText, which as you can guess, does the actual drawing of the text. You might notice here I'm adding the flags together whereas in a previous example I used the bitwise OR operator. Either works, so I figured I'd show you both! [NotWMPaint] To finish up our rendering we simply pass our PAINTSTRUCT and window handle off to the EndPaint call. We then return zero because we need no further processing on this message.
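The remark above that adding the DrawText flags and OR-ing them both work holds because each flag occupies its own bit. A small C sketch shows where the two agree and where they part ways. The three flag values are the real SDK constants, and the helper names `combine_or` and `combine_add` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DrawText format flags, mirrored from the Windows SDK headers. */
#define DT_CENTER     0x00000001u
#define DT_VCENTER    0x00000004u
#define DT_SINGLELINE 0x00000020u

/* Combine flags with bitwise OR: safe even if a flag is listed twice. */
uint32_t combine_or(const uint32_t *flags, size_t count)
{
    assert(flags != NULL || count == 0);
    uint32_t result = 0;
    for (size_t i = 0; i < count; i++)
        result |= flags[i];
    return result;
}

/* Combine flags with addition: correct only when no bit is repeated. */
uint32_t combine_add(const uint32_t *flags, size_t count)
{
    assert(flags != NULL || count == 0);
    uint32_t result = 0;
    for (size_t i = 0; i < count; i++)
        result += flags[i];
    return result;
}
```

For DT_SINGLELINE, DT_CENTER and DT_VCENTER both helpers give 0x25. Listing DT_CENTER twice is where they differ: OR still gives 0x1, while addition gives 0x2, which is the value of DT_RIGHT. That is why OR is the safer habit when flags are assembled from several places.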
You'll notice I'm creating a zero in the EAX register by XOR'ing whatever is in there now with itself, which by definition will clear the register. It's actually just a tad more efficient than directly loading an immediate zero. [END MainEntry] And finally, for any messages other than the two we specifically handle, we simply pass them off to DefWindowProc for the system to do its default processing on. We then end the WndProc procedure and the MainEntry block, which completes our entire program! We can now shell out and build it and ideally, run it and test it. [Shell] To build a working binary on the command line, we can do it all with MASM. The assembler itself is called ML.exe, and the only flag we need to specify is to let it know what kind of header we want on our output binary - in this case, we want COFF format. We simply compile our assembly by giving the assembler the /coff switch and the name of our asm file, and the rest is automatic. Out pops an application. Did you know why the first two bytes of every Windows PE program in the world are and always have been the letters MZ? Just like the first two bytes of every MS-DOS program were MZ? It's because Mark Zbikowski said so, that's why! [Build the application] It's important to know that your program is not linked with any startup or runtime code. No stubs, no loaders. That's why we had to take some manual steps like requesting the instance handle and command line. Normally, the C compiler's runtime would set those things up for you, but with assembly language, there is no runtime. It's just you and the CPU. Our first attempt yields a binary that is 4096 bytes - precisely the tiny target we were aiming for! It further turns out that most of that space is not actually our own code - it is the important tables and string constants you pick up by linking to user, gdi, and kernel, the three DLLs that I rely on.
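The MZ signature mentioned above is easy to check by hand. As a minimal sketch in C (function names invented here), the code below tests for "MZ" at offset 0 and then follows the 32-bit little-endian field at offset 0x3C, which the PE format defines as the file offset of the "PE\0\0" signature. It validates nothing beyond those two markers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf starts with the DOS "MZ" signature, else 0. */
int has_mz_signature(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 2 && buf[0] == 'M' && buf[1] == 'Z';
}

/* Returns 1 if buf looks like a PE image: "MZ" at offset 0 and the
   field at offset 0x3C pointing at the 4-byte signature "PE\0\0". */
int looks_like_pe(const uint8_t *buf, size_t len)
{
    if (!has_mz_signature(buf, len) || len < 0x40)
        return 0;

    /* Read the offset byte by byte so host endianness does not matter. */
    uint32_t pe_offset = (uint32_t)buf[0x3C]
                       | (uint32_t)buf[0x3D] << 8
                       | (uint32_t)buf[0x3E] << 16
                       | (uint32_t)buf[0x3F] << 24;

    if (pe_offset > len || len - pe_offset < 4)
        return 0;                 /* signature would fall outside the buffer */
    return memcmp(buf + pe_offset, "PE\0\0", 4) == 0;
}
```

Reading the first 4096 bytes of the assembled binary into a buffer and passing it to `looks_like_pe` should be enough for this check, since the headers sit at the very start of the file.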
That made me think those tables might be fairly compressible, so I ran the UPX packer on my binary, which brings it down to a total of 3072 bytes. I'm pretty satisfied with that, but can anyone go smaller while preserving the functionality? There were a number of optimizations that I didn't take, such as tail call elimination, smaller strings, eliminating some error checks, and so on. To me, anything under 4K smells like victory. But I'd be curious to see if anyone can go smaller than 3072. [Run the app] If we run the app, we find that it indeed works perfectly. It paints our greeting dead center in the main client area, it does it transparently over the grey background, and it repaints properly when we resize the window in either dimension. If we click on the close widget or select Close from the system menu, the application shuts down just as it was designed to do. And that's that! A complete working Windows application in 3K. Is it the world's smallest windows application? I believe it is, and unless and until someone shows me a working demo that is less than 3072 bytes, I stand by it! Notify Steve Gibson that there's a new king in town, and bring me his crown and scepter! I hope you've enjoyed this episode of Retrocoding in X86 assembly for Windows. If you did, please be certain to leave me a thumbs up and to make sure you're subscribed to the channel. I'm not certain how these programming adventures are going to be received, but if I see a bunch of new subscriptions then I know I'm going in the right direction. I'll then make more like it, and if you turn on the bell icon, you'll even be notified of them when I do. It's a win-win. Besides mugs I'm not selling anything and I don't have any Patreons, I'm just in this for the subs and likes, so I'd sure be appreciative if you left me one of each before you left! That's all the time I have today, so in the meantime and in between time, I hope to see you next time, right here in Dave's Garage.
Info
Channel: Dave's Garage
Views: 159,540
Rating: 4.9771929 out of 5
Keywords: x86, windows programming, assembly language, visual studio code c++, task manager, reverse engineering, learn to code, visual studio, software (industry), visual studio code, visual studio 2019, reverse engineering tutorial, reverse engineering software, ethical hacking, 80386, davepl, Original author, machine code tutorial, machine code instructions, machine code programming, machine code explained, assembly language for beginners, assembly language tutorial, machine code
Id: b0zxIfJJLAY
Length: 29min 38sec (1778 seconds)
Published: Mon Mar 22 2021