Hey, I'm Dave, welcome to my Shop! I'm Dave Plummer, a retired operating systems
developer from Microsoft going back to the MS-DOS and Windows 95 days, and today we're
going on an adventure back in time to Retrocode some x86 assembly language. Come along for a few minutes as we ride the
lightning to build the world's smallest fully functional Windows application. I won't be hiding the details behind some
fancy IDE: I'll do it live and raw right in a simple text editor, and I'll explain how
every detail works. And when it's time to build it, I'll shell
out to the command line to do it like it's 1989. And did I mention that I'd be doing it all
live, without a net? [Intro]
Yes, indeed, it's true. As you can tell by my long, flowing grey beard
and colorful robes, I'm one of the old wizards that actually wrote products like MS-DOS and
I can still program in real assembly language. That's because I grew up in the 80s and 90s
working on video games and operating systems back when wringing the last few drops of speed
out of a CPU simply required that you spoke to it in its native tongue. Writing an operating system such as MS-DOS
demanded that you work in assembly lest the bloat of a compiled language consume too much
of the precious 640K of lower memory available. For size and speed, there's simply no beating
assembly language. It's a bit of a lost art, however, as programmers
don't use it much these days. Modern processors are fast enough and today's
compilers effective enough that the C++ code written for them, if done diligently, will
perform almost as well. But there's always some portion
of speed and size left to be found by the precision blade that is assembly language. Let me take a moment here to clear up one
common point of confusion, and that's the difference between assembly language and machine
language. In assembly language the instructions represent
the fundamental operations that the CPU can perform, and they are exceedingly basic, like
add, subtract, and shift. But naturally, the CPU doesn't use names for
its instructions, it numbers them. Whereas we represent the addition operation with
the three English letters "A-D-D", the computer represents one form of it, adding an immediate byte to the AL register,
with the opcode number 4. That would be a bit cumbersome for humans
to work with all day, so assembly language uses English mnemonics instead. It's a direct translation from one to the
other and they produce the same code. Because make no mistake, you CAN code in machine
language, and it's actually how I started because I didn't even have access to a symbolic
assembler like MASM to do the translation for me. As a teen I wrote a small game reminiscent
of Galaga, and I did it all in machine language using what's known as a machine language monitor. It allows you to key in the instructions and
data manually in hex bytes. I can still do it today from memory, even
40 years later, or at least some of it. For example, without looking it up I still
know that A9 00 loads the accumulator with the color code for black and that 8D 21 D0
stores it in the screen color register. I'll likely just always know that stuff. The only catch is that you can't move code
around, at least not with the 6502 CPU I was using back then. The addresses are all hard coded once an instruction
is in place and so if you wanted to insert a new piece of code, you'd have to place it
at the end, jump out to it from the middle, execute it, and then jump back. It was a truly hideous way to work that generated
complete spaghetti code, but it was quite an education. As a rule of thumb, then, if you're dealing
with the human readable form like ADD, it's assembly language. If you're dealing with the hex number 04,
that's the machine language. It's a direct one to one mapping that can
go back and forth without change, however, and the assembly language is just a convenience
to make it more readable to humans. In my last Retrocoding episode, I wrote Hello
World for Windows in C. It was about the most basic app possible in terms of functionality,
with a main window, menu, about dialog, and so on, all written directly to the classic
user32 and gdi32 APIs. And yet when I called it "bare metal programming",
a few folks balked at the idea because C is technically an intermediate level language
and thus isn't as low as assembly. That misses the point that we were writing
to the lowest level possible APIs, however, which is what I meant. Still, the point was well taken - as Dr. Feynman
once said, there's plenty of room at the bottom, so let's eliminate any remaining fat and see
just how small we can make this thing. The C binary was a tad over 100K, but you
can't make a direct comparison since it included icon resources. I imagine that if we backed all of the resources
and runtimes out of the image, we'd be somewhere around 16K of code. And that's actually pretty tiny. In fact, if you're storing it on a disk formatted
with the FAT filesystem, odds are it'll take up a full 32K cluster no matter how small
you can make it, so does it even matter at that point? And besides, how small is small, anyway? How small can you go in theory? Well, a single page of memory in Windows is
4K, and that's the smallest amount of memory that a process can occupy, so perhaps that's
the ultimate goal. But can you really write a working Windows
program that will fit in 4K? A Windows app that would fit into memory on
the original Commodore PET? Let's find out right now. One thing we need to decide on is what we
mean by a working Windows program. As a definition, I'm going to build an application
very much like the Hello Windows app from last episode but without any graphical resources
such as icons or menus that take up extra space. We'll have a main window with a caption bar,
system menu, close button, and minimize and maximize widgets, and everything should work
properly. It should even custom paint its client area
and render text and respond to resizing appropriately. Those are the basics I expect. While I'm getting the old nano editor ready,
I should let you know that I'm now offering the classic Dave's Garage mug for sale online
in the channel store. It's the same one as designed by my daughter
for my Father's Day present this year and features my son's original logo design, and
is available in four colors. Visit the channel page and get yours today,
and rest assured that I'm still just in this for the subs and likes: any net profits from
merchandise sales will go to autism research. As you likely know, I wrote the original Windows
Task Manager, and I've also got a few of these super limited edition Task Manager enamel
pins. How limited are they? I just had a small batch made for myself,
but I've decided to give away a pair of the autographed pins: one to (a) the subscribed user
with the comment that receives the most upvotes on this video without actually soliciting
upvotes, and (b) one to a randomly selected subscribed user who makes any useful comment
on the video. That way I reward success and yet everyone
still has a shot! But you've got to be subscribed to win, so
be sure to comment, and right now we'll dive straight on into the editor and start building
our app. Are you sitting comfortably? Then we'll begin. [Editor]
I'm going to use nano again as my editor of choice today. When it's time to assemble the code into a
working binary, I'll shell out and do that on the command line so you can see me do so
in all of the nitty gritty detail. Our entire project will be included within
a single source file, HelloAssembly.asm. That's it, that's all, no other include files
or version files or manifests or other nonsense. Since we'll be omitting icons and other resources,
there's no RC file to worry about either. This single .asm file will assemble directly
into a working Windows application with no further intervention: assembler, linker, program:
run it. I love that part. No mystery, every byte has a purpose and every
byte makes sense. To get started, we'll now create that single
file, call it HelloAssembly.asm, and start coding. Before we can enter any actual code, however,
as usual there's some housekeeping to do. To wit, we need to configure the assembler
and include the required headers and libs that our project will need in order to build. This is for most of us the least interesting
part of any assembly project, but it is an important one, so let me quickly bang that
out and get it done. [Go to end of option casemap]
Our first lines tell the assembler that we'll be working in the 386 instruction set on a
standard flat memory model. In the olden days of segmented 16-bit code
you had options like large, small, and even compact and tiny, but once you're 32 bits,
it's all just the flat model. The 'casemap' option ensures that our code
will be insensitive to case unless referring to a system identifier. [Go to end of .inc includes]
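For a feel of what the flat model frees us from, here's a small C sketch of 16-bit real-mode addressing, where a pointer is a segment:offset pair and the physical address is the segment shifted left four bits plus the offset. The helper name real_mode_linear is made up for illustration, and the 20-bit mask models the 8086's wraparound at 1MB.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode 8086 addressing: physical = segment * 16 + offset.
 * The result is masked to 20 bits to mimic the 8086's 1MB wraparound.
 * Hypothetical helper for illustration only. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Note that 1234:0010 and 1235:0000 name the same byte. Memory models such as small, compact, and large decided whether code and data pointers carried their own segment or just an offset; in the 32-bit flat model a pointer is simply a 32-bit address, and that question goes away.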
Next, we need to include our .inc files, which are to assembly language as .h files are
to a C project. They include things like structure definitions
and the function prototypes for system calls in User32 and GDI. It's how the assembler and linker know what
arguments those functions take, for example. [Go to the end of the .lib includes]
In assembly language we can also specify the .lib files that will be ultimately used to
link the binary to libraries and DLLs. These would include things like the import
descriptor tables, strings, and other binary data that is needed to glue your code to the
system APIs. If you want to call CreateWindow in User32,
for example, the linker needs to know which DLL exports it, under what name, how many arguments
of what size it takes, and so on. [End of Forward declarations]
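On 32-bit x86, the argument size is baked right into the symbol name. Win32 APIs use the stdcall convention, whose decorated names take the form underscore, name, at sign, total bytes of arguments, which is how the linker matches a call to _CreateWindowExA@48 against the import library. This C sketch builds such a name, assuming every argument is a 4-byte DWORD; stdcall_decorate is a hypothetical helper, not a system API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a 32-bit stdcall decorated name: _Name@N, where N is the total
 * number of argument bytes. Assumes every argument occupies 4 bytes.
 * Returns 1 if the name fit in the buffer, 0 otherwise. */
static int stdcall_decorate(char *out, size_t out_size,
                            const char *name, unsigned arg_count)
{
    assert(out != NULL && name != NULL);
    int written = snprintf(out, out_size, "_%s@%u", name, arg_count * 4u);
    return written >= 0 && (size_t)written < out_size;
}
```

With 12 DWORD parameters, CreateWindowExA decorates to _CreateWindowExA@48, while ExitProcess, which takes a single UINT, becomes _ExitProcess@4.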
If I'm not mistaken, MASM is what's known as a two-pass assembler. On the first pass everything is measured up
and the assembler figures out how many bytes the instructions will take up and therefore
where all the labels will land. It also has to know the signature of any functions
that we'll be jumping to or calling later, so we will provide one for the single function
we jump forward to: our WinMain function that will be launched from the main code entry
point. [End of Appname]
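The two-pass idea is easy to sketch. Below, a toy "assembler" in C walks a list of lines twice: pass one only adds up instruction sizes to learn where each label lands, and pass two can then resolve a jump to a label that appears later in the source. The Line and Symbol types and the instruction sizes are made up for illustration and have nothing to do with real x86 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One source line: an optional label plus an instruction of known size
 * (0 for a bare label) that may jump to a named label. */
typedef struct {
    const char *label;   /* label defined on this line, or NULL */
    unsigned size;       /* encoded size in bytes */
    const char *target;  /* label this instruction jumps to, or NULL */
} Line;

#define MAX_LABELS 16

typedef struct {
    const char *name;
    unsigned address;
} Symbol;

/* Pass 1: assign each label the address where it lands. */
static size_t pass_one(const Line *prog, size_t n, unsigned origin,
                       Symbol *syms)
{
    size_t count = 0;
    unsigned pc = origin;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].label != NULL) {
            assert(count < MAX_LABELS);
            syms[count].name = prog[i].label;
            syms[count].address = pc;
            count++;
        }
        pc += prog[i].size;
    }
    return count;
}

/* Look up a label; returns 0 and sets *addr on success, -1 if undefined. */
static int resolve(const Symbol *syms, size_t count, const char *name,
                   unsigned *addr)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            *addr = syms[i].address;
            return 0;
        }
    }
    return -1;
}

/* Pass 2: every label is known now, so forward references resolve.
 * targets[i] gets the jump address for line i (0 if it has no target). */
static int pass_two(const Line *prog, size_t n, const Symbol *syms,
                    size_t count, unsigned *targets)
{
    for (size_t i = 0; i < n; i++) {
        targets[i] = 0;
        if (prog[i].target != NULL &&
            resolve(syms, count, prog[i].target, &targets[i]) != 0)
            return -1;
    }
    return 0;
}
```

A one-pass tool reaching the first line of a program like { jump to done, 5-byte instruction, done: } wouldn't yet know where done lands; the extra pass is what makes the forward jump possible.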
Here we have a couple of numeric constants to be used for the window size as well as
two initialized string constants. Just like code, since these do not change,
they will go into a read-only segment. Finally, we can start writing some actual
code. Our main entry point will be unimaginatively
called MainEntry, and the first two things it needs to do will be to grab a copy of the
program's instance handle and the text of the command line. [End of Command Line]
My plan is to have MainEntry call our WinMain in much the same way as Windows does itself
if we were writing in C. There is a certain amount of preparatory C
runtime code that normally handles startup, and we'll need to do that work on our own when coding in assembly. [End of mov CommandLine, eax]
There's one significant difference between how I'm going to pass the command line to
my own WinMain and how the runtime code normally does it: normally, the program name is removed
before you get the command line. If we call GetCommandLine directly like this,
however, it still includes the program name as the first argument, so it's a minor difference
but you may need to keep it in mind for compatibility. [End of call Exitprocess]
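If you do want WinMain-style behavior, skipping the program name takes only a few lines. Here's a C sketch of the usual approach, assuming the string came from GetCommandLine: if the first token is quoted, skip past the closing quote; otherwise skip to the first space or tab; then skip the whitespace that follows. skip_program_name is a hypothetical helper that roughly mirrors what the C runtime does before calling WinMain, not a Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the arguments that follow the program name in a
 * Windows-style command line such as "\"C:\\My Apps\\hello.exe\" /x". */
static const char *skip_program_name(const char *cmd)
{
    assert(cmd != NULL);
    if (*cmd == '"') {
        /* Quoted program name: skip everything up to the closing quote. */
        cmd++;
        while (*cmd != '\0' && *cmd != '"')
            cmd++;
        if (*cmd == '"')
            cmd++;
    } else {
        /* Unquoted: the program name ends at the first space or tab. */
        while (*cmd != '\0' && *cmd != ' ' && *cmd != '\t')
            cmd++;
    }
    /* Skip the whitespace separating the name from the arguments. */
    while (*cmd == ' ' || *cmd == '\t')
        cmd++;
    return cmd;
}
```

The quoted case matters because paths with spaces, like "C:\My Apps\hello.exe", are quoted on the command line.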
There's technically a bug here, or at least a shortcoming. I'm not looking at my process's STARTUPINFO
structure to see if the window start mode, such as maximized, minimized, or default, has
actually been specified there. I'm just going to ask for the default, which
ignores the wishes of the parent process. It's a minor thing, but technically you should
check to see if the STARTUPINFO structure's dwFlags field includes the STARTF_USESHOWWINDOW flag, and
if so, then use the wShowWindow setting from there instead. But I'll be content with defaults in all cases. [Go to end of LOCAL blocks]
Here is our WinMain entry. It takes the same parameters as we saw in
our C version of Hello World: essentially, the instance handle, the command line, and
an indication of the manner in which we should display our main application window. The first thing our WinMain does is to reserve
enough stack space for three important local variables: a WNDCLASSEX structure, a MSG structure,
and a window handle. In an assembly language program you can simply
reserve space for your local variables on the stack at the top of any function and they
will be around for the lifetime of that function call and that's it. Keep in mind, of course, that simply reserving
space doesn't clear or initialize that memory - if you need it clear, you'll have to do
that yourself. Remember, this is all pure ASM and there are
no helpers or runtimes running around behind you to tidy things up! By and large, if you don't do something yourself,
it's not happening. The only help you do get is that the assembler
will use the function signature to figure out the right number of bytes to pop back
off the stack when the function returns. [End of WNDCLASS setup, mov to hIconSm]
As with our C version, the first thing our assembly version of Hello World will do is
to register the window class type of the main window that we plan to create. Remember that the Win32 API doesn't know if
it's being called by assembly or C - you still fill out the structure, including setting
that first DWORD to be the structure size, in exactly the same manner. We specify that we want to be redrawn for
any vertical or horizontal changes, what our instance handle is, what the window background
color should be, our title, class name, and so on. Then we select the IDI_APPLICATION icon, which
is simply the default system app icon, to be our icon. That avoids us having to create and store
any resources of our own inside our application. [End of Call RegisterClassEx]
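To make the "structure size first" rule concrete, here's a C mirror of the WNDCLASSEX layout as it appears in a 32-bit process, with every pointer and handle modeled as a 32-bit value. The type name WndClassEx32 and init_wndclass are hypothetical; the field order and the CS_ style values follow the Windows headers. Twelve 4-byte fields give a cbSize of 48, and because stack memory isn't cleared for us, the sketch zeroes the structure before filling it in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CS_VREDRAW 0x0001u
#define CS_HREDRAW 0x0002u

/* Hypothetical mirror of WNDCLASSEX as laid out in a 32-bit process:
 * every field, including pointers and handles, is 4 bytes wide. */
typedef struct {
    uint32_t cbSize;        /* must be set to the size of this structure */
    uint32_t style;
    uint32_t lpfnWndProc;
    int32_t  cbClsExtra;
    int32_t  cbWndExtra;
    uint32_t hInstance;
    uint32_t hIcon;
    uint32_t hCursor;
    uint32_t hbrBackground;
    uint32_t lpszMenuName;
    uint32_t lpszClassName;
    uint32_t hIconSm;
} WndClassEx32;

/* Zero the structure, then set its size and the redraw-on-resize styles. */
static void init_wndclass(WndClassEx32 *wc)
{
    assert(wc != NULL);
    memset(wc, 0, sizeof *wc);
    wc->cbSize = (uint32_t)sizeof *wc;
    wc->style = CS_HREDRAW | CS_VREDRAW;
}
```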
Our next step is to specify that we want the standard arrow cursor and we pass our completed
WNDCLASSEX structure off to RegisterClassEx. [End of createwindow, mov hwnd, eax]
After the window class has been fully registered we can create our main window using that class
name. CreateWindowEx takes a whopping 12 parameters
all of which must be pushed onto the stack in the correct order. So in assembly language, how do you pass arguments
to a function call? Any way you'd like, as long as the caller
and the callee agree. The fastest and easiest way might be to simply
pass them in registers, but no matter where you draw the line, at some point you'd run
out of registers, and if you're also preserving their old values on the stack, that's a lot
of stack and memory work as well. The catch is that the caller and the callee
absolutely must agree, because when we use the standard calling convention, for example,
everything is passed on the stack. Not only does the receiving function need
to know that the arguments are pushed in right-to-left order, it also has to pop
them back off the stack when it returns to the caller. So, the callee has to know the exact signature
of how it was called, because that determines how many bytes would have been pushed onto
the stack for it to clean up. Any mismatch here, of course, causes a catastrophic
crash of the process or similar. We're going to let the system position our
window by passing CW_USEDEFAULT, which basically means "wherever you'd like it". The only little space saving trick I'm using
here is to create the window with the visible style on right from the get-go, rather than
creating and then separately showing the window. It just saves a step and an API call. After calling CreateWindow we check the value
of the window handle that comes back. If for any reason it's null, we know our window
creation failed and we exit the application. In the normal success case, we next call UpdateWindow
to force our first paint and then continue on to start pumping messages in our MessageLoop. [Call UpdateWindow]
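The caller/callee agreement described above boils down to arithmetic on the stack pointer. This C sketch models it: the caller pushes 4 bytes per DWORD argument plus a 4-byte return address (the stack grows downward, so ESP decreases), and a stdcall callee's "ret N" pops the return address plus N bytes, where N is what the callee believes it was passed. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Caller side: push arg_count DWORDs, then CALL pushes a 4-byte return
 * address. Returns the simulated ESP on entry to the callee. */
static uint32_t caller_pushes(uint32_t esp, unsigned arg_count)
{
    return esp - 4u * arg_count - 4u;
}

/* Callee side under stdcall: RET N pops the return address plus N bytes
 * of arguments, based on the signature the callee was built with. */
static uint32_t callee_ret(uint32_t esp, unsigned believed_arg_count)
{
    return esp + 4u + 4u * believed_arg_count;
}
```

When both sides agree on 12 arguments (48 bytes, as with CreateWindowEx), ESP comes back exactly where the caller left it. If the callee believes there are only 11, ESP comes back 4 bytes off, and every stack-relative access the caller makes afterward is wrong.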
As soon as our window is successfully created, we call UpdateWindow. That API directly calls our window procedure
with a paint message, bypassing anything else in the queue. [End of je DoneMessages]
Our message loop is very straightforward - it simply calls GetMessage until that function
returns 0. That's when and how it knows that it's time
to exit the program. [End of jmp MessageLoop]
As long as messages continue to come into our message loop, we translate and dispatch
them. Because we don't have resources, we therefore
do not have an accelerator table and that's why you might notice that I'm not calling
TranslateAccelerator and so on. [End of WinMain endp]
Here's the end of our WinMain procedure, and as you can see when it's run out of messages
to handle it will ultimately return the WPARAM of the last message retrieved,
which is the WM_QUIT message. So, your program could return a value all
the way back out to the caller on the command line simply by setting the WPARAM of WM_QUIT. [End of LOCALS for WndProc]
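Here's the shape of that loop as a self-contained C simulation. The Queue and Msg types and get_message are hypothetical stand-ins, not the Win32 API; the point is that the fetch function returns 0 only when it pulls WM_QUIT (0x0012 in the Windows headers), and the loop then hands that message's wParam back as the exit code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WM_QUIT 0x0012u

typedef struct {
    uint32_t message;
    uint32_t wParam;
} Msg;

/* Hypothetical message queue: a fixed array consumed front to back. */
typedef struct {
    const Msg *items;
    size_t count;
    size_t next;
} Queue;

/* Mimics GetMessage: copy out the next message and return nonzero,
 * or return 0 when the message retrieved is WM_QUIT. */
static int get_message(Queue *q, Msg *out)
{
    assert(q->next < q->count); /* a real GetMessage would block instead */
    *out = q->items[q->next++];
    return out->message != WM_QUIT;
}

/* WinMain-style pump: loop until get_message returns 0, counting the
 * messages that would be translated and dispatched, then return the
 * wParam carried by WM_QUIT. */
static uint32_t run_message_loop(Queue *q, unsigned *dispatched)
{
    Msg msg;
    *dispatched = 0;
    while (get_message(q, &msg))
        (*dispatched)++;   /* TranslateMessage/DispatchMessage go here */
    return msg.wParam;
}
```

A WM_QUIT carrying a wParam of 7 would make this loop, and thus the program, exit with code 7.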
Our Window Procedure is what really defines how our application will behave, because it's
what in turn defines how each window message is handled. There are really only two messages we care
about for our very simple application: the WM_PAINT that we will be sent whenever it's
time to paint the window and the WM_DESTROY message that signals it's time to exit. [End of NotWMDestroy]
Handling WM_DESTROY is quite straightforward. As soon as we see that message, we simply
call PostQuitMessage with a zero argument and our application's message queue will shut
down, because inside our message loop, our GetMessage call will then return 0 and that's
our cue to exit and return. [call SetBkMode]
This is the case handler for our WM_PAINT messages. You'll notice you can't just push the address
of a stack structure like the PAINTSTRUCT. That's why we use lea, or load effective address,
to get the structure's address into a register, eax, and then push that. We save away the device context handle that
comes back from BeginPaint before we set the background mode for it to be TRANSPARENT,
which will affect how our text draws. With it set to transparent, we won't get that
white box background around our text. [call DrawText]
Our painting amounts to simply centering the text Hello, Windows in our main client area. To do that, we push the address of our temporary
RECT structure along with the window handle and call GetClientRect, which fills out the
structure right in place for us. To actually render the text, we load up our
option flags first. We'll indicate that we want a single line
of text centered both horizontally and vertically. We'll push the address of the text and the
HDC itself before calling DrawText, which as you can guess, does the actual drawing
of the text. You might notice here I'm adding the flags
together whereas in a previous example I used the bitwise OR operator. Either works, so I figured I'd show you both! [NotWMPaint]
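Addition only works here because each DT_ flag is a distinct single bit, so there are no carries. This C sketch, with the values copied from the Windows headers, shows the two forms agree for our three flags; the function names are just for illustration.

```c
#include <assert.h>

/* DrawText format flags, values as defined in the Windows headers. */
#define DT_CENTER     0x00000001u
#define DT_VCENTER    0x00000004u
#define DT_SINGLELINE 0x00000020u

/* Combining distinct single-bit flags: + and | give the same result. */
static unsigned flags_by_add(void) { return DT_SINGLELINE + DT_CENTER + DT_VCENTER; }
static unsigned flags_by_or(void)  { return DT_SINGLELINE | DT_CENTER | DT_VCENTER; }
```

The catch with addition is repeating a flag: DT_CENTER | DT_CENTER is still DT_CENTER, but DT_CENTER + DT_CENTER carries into 0x2, which happens to be DT_RIGHT. OR is the safer habit.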
To finish up our rendering we simply pass our PAINTSTRUCT and window handle off to the
EndPaint call. We then return zero because we need no further
processing on this message. You'll notice I'm creating a zero in the EAX
register by XOR'ing whatever is in there now with itself, which by definition will clear
the register. It's actually just a tad more compact than
loading an immediate zero, since that zero has to be encoded as four bytes inside the instruction itself. [END MainEntry]
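The size difference shows up directly in the machine code. This C sketch just holds the bytes for the two 32-bit forms: "mov eax, 0" is B8 followed by a 4-byte immediate, while "xor eax, eax" is an opcode plus a ModRM byte (MASM typically emits 33 C0; other assemblers may emit the equivalent 31 C0). It also checks the x ^ x == 0 identity the trick relies on. One side effect worth knowing: unlike MOV, XOR also updates the flags.

```c
#include <assert.h>
#include <stdint.h>

/* Machine code for two ways of zeroing EAX in 32-bit x86. */
static const uint8_t mov_eax_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* mov eax, 0 */
static const uint8_t xor_eax_eax[] = { 0x33, 0xC0 };                   /* xor eax, eax */

/* XOR of any value with itself is zero, whatever was in the register. */
static uint32_t xor_self(uint32_t eax)
{
    return eax ^ eax;
}
```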
And finally, for any messages other than the two we specifically handle, we simply pass
them off to DefWindowProc for the system to do its default processing on. We then end the WndProc procedure and the
MainEntry block, which completes our entire program! We can now shell out and build it and ideally,
run it and test it. [Shell]
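Stripped of the Win32 calls, the whole window procedure is a switch with a default case. Here's a C sketch of that dispatch logic. wnd_proc, default_proc, and the two counters are hypothetical stand-ins for the real WndProc, DefWindowProc, PostQuitMessage, and the painting code; the message values (WM_DESTROY 0x0002, WM_PAINT 0x000F) come from the Windows headers.

```c
#include <assert.h>
#include <stdint.h>

#define WM_DESTROY 0x0002u
#define WM_PAINT   0x000Fu

static int quit_posted;   /* stands in for PostQuitMessage(0) */
static int paint_count;   /* stands in for BeginPaint/DrawText/EndPaint */

/* Stand-in for DefWindowProc; returns a marker value so the caller can
 * tell the message was passed through. */
static intptr_t default_proc(uint32_t msg)
{
    (void)msg;
    return -1;
}

static intptr_t wnd_proc(uint32_t msg)
{
    switch (msg) {
    case WM_PAINT:
        paint_count++;
        return 0;             /* handled: no further processing needed */
    case WM_DESTROY:
        quit_posted = 1;
        return 0;
    default:
        return default_proc(msg);
    }
}
```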
To build a working binary on the command line, we can do it all with MASM. The assembler itself is called ML.exe, and
the only flag we need to specify is the one that tells it what object file format to produce. In this case,
we want COFF format. We simply assemble our code by giving the
assembler the /coff switch and the name of our asm file, and the rest is automatic, since ML hands the object file straight to the linker. Out pops an application. Do you know why the first two bytes of every
Windows PE program in the world are and always have been the letters MZ? Just like the first two bytes of every MS-DOS
program were MZ? It's because Mark Zbikowski said so, that's
why! [Build the application]
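You can check this on any executable. "M" and "Z" are the bytes 0x4D and 0x5A, and the same MS-DOS header holds, at offset 0x3C, a 32-bit little-endian offset (the e_lfanew field) to the "PE\0\0" signature where the Windows headers begin. Here's a C sketch that validates both on an in-memory buffer; looks_like_pe and read_le32 are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if buf starts with "MZ" and e_lfanew (at offset 0x3C) points
 * to a "PE\0\0" signature inside the buffer, else 0. */
static int looks_like_pe(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    uint32_t pe = read_le32(buf + 0x3C);
    if (pe > len - 4)
        return 0;
    return buf[pe] == 'P' && buf[pe + 1] == 'E' &&
           buf[pe + 2] == 0 && buf[pe + 3] == 0;
}
```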
It's important to know that your program is not linked with any startup or runtime code. No stubs, no loaders. That's why we had to take some manual steps
like requesting the instance handle and command line. Normally, the C compiler's runtime would set
those things up for you, but with assembly language, there is no runtime. It's just you and the CPU. Our first attempt yields a binary that is
4096 bytes - precisely the tiny target we were aiming for! It further turns out that most of that space
is not actually our own code - it is the import tables and string constants you pick up by
linking to user, gdi, and kernel, the three DLLs that I rely on. That made me think those tables might be fairly
compressible, so I ran the UPX packer on my binary, which brings it down to a total of
3072 bytes. I'm pretty satisfied with that, but can anyone
go smaller while preserving the functionality? There were a number of optimizations that
I didn't take, such as tail call elimination, smaller strings, eliminating some error checks,
and so on. To me, anything under 4K smells like victory. But I'd be curious to see if anyone can go
smaller than 3072. [Run the app]
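Part of why these sizes come out in round numbers is alignment: in a PE file the headers and each section are padded on disk to the FileAlignment value, commonly 512 bytes, so a handful of bytes in a section can still cost a full 512. This C sketch computes a simplified on-disk size under that assumption; the section sizes in the usage below are made up for illustration, and a real linker applies more rules than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of align (align must be a power of 2). */
static uint32_t round_up(uint32_t size, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Approximate on-disk size of a PE image: headers plus every section,
 * each padded to file_align. A simplified model, not a full linker. */
static uint32_t image_file_size(uint32_t header_bytes,
                                const uint32_t *section_bytes, size_t n,
                                uint32_t file_align)
{
    uint32_t total = round_up(header_bytes, file_align);
    for (size_t i = 0; i < n; i++)
        total += round_up(section_bytes[i], file_align);
    return total;
}
```

With 400 bytes of headers and three hypothetical sections of 300, 200, and 100 bytes, each piece pads to 512 for 2048 bytes in total; merge the same 600 bytes into one section and it comes to 1536, which is why merging sections is a classic size-reduction trick.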
If we run the app, we find that it indeed works perfectly. It paints our greeting dead center in the
main client area, it does it transparently over the grey background, and it repaints
properly when we resize the window in either dimension. If we click on the close widget or select
Close from the system menu, the application shuts down just as it was designed to do. And that's that! A complete working Windows application in
3K. Is it the world's smallest Windows application? I believe it is, and unless and until someone
shows me a working demo that is less than 3072 bytes, I stand by it! Notify Steve Gibson that there's a new king
in town, and bring me his crown and scepter! I hope you've enjoyed this episode of Retrocoding
in x86 assembly for Windows. If you did, please be certain to leave me
a thumbs up and to make sure you're subscribed to the channel. I'm not certain how these programming adventures
are going to be received, but if I see a bunch of new subscriptions then I know I'm going
in the right direction. I'll then make more like it, and if you turn
on the bell icon, you'll even be notified of them when I do. It's a win-win. Besides mugs I'm not selling anything and
I don't have any Patreons, I'm just in this for the subs and likes, so I'd sure be appreciative
if you left me one of each before you left! That's all the time I have today, so in the
meantime and in between time, I hope to see you next time, right here in Dave's Garage.
Just an update! I'm down to 1448 bytes now. Sonicmouse pointed out that I wasn't merging my sections which was wasting space!