- Good morning everybody,
I think we're nearly in. So, I'm gonna start right on time because I have far too
much content to fit in. And we may not have too much
time for questions at the end. Don't be afraid to come
and find me afterwards and ask me questions if you have any. So, my talk this year is The Bits Between The Bits: How We Get To main, and it's one of those topics where, as programmers, we typically don't think about it too much. I mean, certainly, I'm most famous for that website, and that website shows you assembly code. You type in a bit of C and you see a bit of assembly and you're like, that's cool, but it doesn't really show you the wider picture, how everything fits together, how the program actually starts up and how your code
actually starts executing. And there's a lot of
stuff that goes on before you even get to main, and
that's what we're gonna be talking about, and
some of the tools behind the scenes and why some
of the things you may've learned along the way
about "don't do this" are so. So, first of all, the name,
The Bits Between The Bits. How many people have ever heard
of the band Ozric Tentacles? One, two, ole Brits, alright. So, the bits between the bits was one of their least-inspired albums. For some reason, the name
stuck with me as being a name for like the stuff
between the important things that you don't really think about, and that's the bits between the bits. Now, I happened to Google for this picture to do these slides,
and I found the website progrock or progarchive or something, and it described the album as good, but non-essential, and I just hope that this is not inauspicious for me, but anyway, we're gonna
start with a program. This is about the simplest
program you could write. I don't think there are
many, short of the white space between the parens
and the squiggly bracket things, you couldn't
write a smaller program. What does this compile to? I'm gonna use gcc and, for
the whole of this talk, actually, we're gonna be
talking about Linux, mainly. Just a quick note on
that, I don't know much about Windows nowadays, so
I'm not really qualified to talk about it; I assume
that similar mechanisms to the ones I'm gonna be describing happen inside the Windows linker
in runtime, but I'm not the right person to
talk to you about them. So, I'm gonna use GCC;
this is GCC 8.1, I think, just the one I happen
to have on my laptop, and I'm optimizing for size. What is your idea about
how this program compiles? How big will this program compile down to? Any kind of thing, sorry what was that? (audience member calls) 5K, anyone higher than 5K? - [Audience Member] 13. - 13K, anymore? - [Audience Member] 42. - 42K, 42 bytes, no 42K, yeah. You're in the right ballpark. The C version here, if I compile that as a C program, sorry, 7,976 bytes, okay. So, I think we award the prize over there. How much do you think the
C++ program compiles to? (audience members calling) 100K, 100K exactly the same. (audience laughing) Turns out, if you're
not using the features, you don't pay for them; thematic, I think. I was surprised as well; the binaries are different. There's some padding going on, which means the slack in one is taken up by the other, so, anyway. That's where we're
starting, so what is in it, though, because 8K represents almost all of the programmable space
in the first computer I owned, to do nothing, right? What on earth is going on here? So, I'm gonna use some of the tools of the trade to sort of
poke through and work out what is in there, and
I'm gonna use objdump. Does anyone use objdump out there? Yeah, there's a whole, oh my gosh, and you're gonna be telling me. There's more people out there that know what is going on than I do. And I'm gonna say to objdump, hey, can you disassemble that because, presumably, there's code in there, and that's what's taking up all the space. So, the minus lowercase d in there means disassemble. The capital C means undo the horrible mangling so I get to see the C++ names, and --no-show-raw-insn means I'm not interested in seeing all the raw instruction bytes. I just would like to have an
idea about what the code is. And this is what we see, and this is, now I have to slide around
a bit, first of all, there's code that I didn't write. There's this _init function. Hmm, interesting. Here is a function called main, and it has xor eax, eax,
retq, now that's what I was expecting to see,
three bytes, right? Xor eax, eax is two, and
retq is one, brilliant. But what the heck is all
the rest of this stuff? _start, pages and pages, and pages, and pages, but, you know,
I didn't write this. But it's actually not that much. Let's work it out. The address of the last
instruction's 40054c, and at the beginning
it is, oops, excuse me, 4003b0, so it looks like
200 hex from beginning to end which is 512 bytes, okay. Well, what the heck's
the rest of it, then? So, let's just dump everything. Let's dump everything
that's in this executable, and now we start to see
what's going on here. There is a header with a load of stuff. There's some section headers
with loads more things. There's program headers; there's strings. There's more sections;
there's all this stuff. So, what is going on here? And that's what we're gonna talk about. We're gonna say what on earth does it take to make something executable and runnable. So you probably know
that on Linux the file format for executables is called ELF. ELF is the Executable and Linkable Format. And it comprises a header at the top, which explains, hey, I'm an ELF file, and these are ways you can find extra data about me, what architecture I
am, that kind of stuff. It has a program header
table which explains to load me as a program, you
need to do these operations. It has chunks of binary data
which constitute sections. We'll talk about those in a second, and then at the end there's a section header table that explains where all those sections are, what their names are, what properties they have. It's not a great picture, by the way, but it was the easy one
to steal from Wikipedia. So, sections, what are sections? They're blocks of code or data in the executable, or maybe not even in the executable, we'll get to that. For example, the code itself is stored in a section with the name .text. Why it's called text is beyond me, but I don't know, for those of you if you've been using UNIX for a while, you've probably at some point tried to overwrite a program that's running and you've seen the error: text file busy. And so, in my mind since
I was an undergraduate, I was like what is this, is it like some kind of tome that the processor is leafing through page by page reading; this is the text from which it reads, but for some reason, code is text; fine. The read-only data is separated out. It's separated out because then it can be loaded somewhere and
marked by the processor as read-only; any attempt to write to it would cause a fault. The data section contains
data, is readable and writable, and then there's bss. And so, there's a lot of strange things. We've already said that
text was a strange name; bss is even stranger, it's
the block started by symbol. Now, I think somewhere along the line from the mainframe
technology that this was developed on, the meaning
behind that has been lost, but bss now to everybody
means zero-initialized data. So, if you've written
anything that's a global scope or explicitly or
otherwise set to zero, it gets put into this bss section which is gonna be cleared for you. The cool thing about
that is that that doesn't have to be stored in your executable. There just has to be
something in the executable that says I need 16K's
worth of zeros, please, over here, and that way
we don't have to store all of that zero-initialized data. I know it would compress really well, but you know, why store
it if you don't need to. So, let's start, no let's
go onto a more interesting than an empty program
program, and in this one we're gonna do something
which you should not do. That is, we're gonna have a global object which has behavior that
runs at construction time. I mean don't have globals, if
you can avoid them, obviously. Although there are definitely
some globals out there. Now who can think of a
global that exists in C++ that's... oh yes, thank
you, yes, std::cout is a global variable, right,
so, presumably, it has a constructor and it has to do something quite interesting during its construction. So, we're gonna look at it from the point of view of our own objects, but again, don't do this; this is not best practice. I am not Jason Turner. (laughs) So, we've got a Foo class; it counts the number of Foos that are in existence. We've probably all done
something like this before to sort of diagnose
problems with leaking objects and stuff like
that, and I'm gonna use a static counter to
count the number of Foos that live, and C++ being what it is, I have to say it up here and then say it again down here to define it. And then, in the same
file on the right hand side, I've got it as a global object. Who would like to say what they
think this program outputs? Undefined, someone's saying over there. Anyone? One, numFoos equals one. Zero, alright, what does it print? Well, if you run it, and I'm gonna run it compiled at -O0 here just because the code later on that we're gonna look at, it's gonna
be that much easier. numFoos equals one and it is my belief that this is well-defined behavior because by not specifying
numFoos with a value, it is a global or a statically initialized thingymajig which gets
put into that BSS section. I beg your pardon? (audience member speaks) Right, there's a discussion; I don't have time, unfortunately to go into that bit, but, numFoos is equal to one. Okay, so, somehow before we get to that cout, numFoos has become one. Now there are a number of ways that that could have been achieved, right? The compiler could read through my code, and go, hey, this must be one. There's no reason for it to ever be zero. I'm just gonna write a one in
there, and it's one forever. The compiler could be inserting code at the top of main that
calls the constructor of the global ahead of me, to sort of ensure that it gets updated,
but which one of those is happening or is
something else happening? I'd love to know because that's the kind of weird spooky thing that happens outside of the domain of the normal program. The way that I went
around finding this out is to bust out the debugger. So, it occurred to me while I was doing this that I didn't
actually really understand this myself, and that I'd agreed
to do a presentation on it. So, what I thought would
be more interesting than me reading up on it
and then just regurgitating it is showing you how I
discovered how this stuff works. So, I have inside this live directory that global example and
I'm gonna run gdb on it. Well, I meant to do GDMSQ,
so let me just list. There is my program; I've wrote it in, and what we can do is,
I haven't even started executing the program
yet but I can ask gdb, hey, what is the value of Foo numFoos. It's zero, okay, so
clearly my first hypothesis, that the compiler somehow reasoned it could just bake a one in there, is not true. It's zero when the program starts up. So, let's put a break
point on main and run. So, we're now on the
line that's about to cout the number of Foos and
I'm gonna disassemble it. And for those who don't
speak gdb, or even assembly, that little equals
thingymajig there is the next instruction we're about
to run, and this is the beginning of the main function itself, and all that's happened before where I am right now is some stack manipulation. So, there's nothing up my sleeve. There's no tricky instructions that are about to do anything other than that. What is the value of numFoos here? Clearly, there's no code that's had a chance to look at the number of Foos. So, that's good, Foo, numFoos. One, okay, so somewhere
in-between me loading the program and the first line of main being called, that Foo got incremented. Now we could get clever and do some write break points and all that kind of stuff, but it's much easier just to put a break point on the constructor itself. So, I'm gonna put a break
point on the Foo constructor, and I'm gonna restart the whole thing. So, run. Yes, yes, yes, yes. Okay, so we're on the line
that's incrementing numFoos. Where did we come from? Backtrace. Okay, so there's my code, and then there's some funny looking things
there that I didn't write. So, that's enough of live gdbing. This is what the call stack looked like with nice colors so we can
sort of look through it. So, there's code I wrote, Foo::Foo. And it's attributed to
line six of global.cpp. Here are two functions I did not write. And, in fact, if I had written, I would be wandering off
into undefined behavior land because they have
two underscores in them, and you can't put two consecutive underscores anywhere in a name, and I think the other rule is you can't use a leading underscore followed by a capital letter. So, these are symbols that
I'm not allowed to write. But they've both been
attributed to global.cpp, my file at various points,
so that's interesting. The next two are libc functions. Libc, not libc++ or
anything to do with C++. This is the C runtime
starting up, and ultimately, it was called from _start, no main at all. We're nowhere near main at the moment. So, something else is going
on before we get to main. Where on earth do those
functions come from, those funny __static_initialization_and_destruction ones? Well, there's a website
for some of this stuff. Let's go and have a quick look. Sorry. (chuckles) So, we can go and find them over here in the right hand side. Here's static initialization
and destruction, and it is attributed like the thing says. You can't really see all
that easily on there. The line 18 on the far left is showing that somehow this static initialization and destruction function was generated at the same point of the end of the file, or maybe the closing brace of main is actually the end of the file. We can't tell in this example. And then some code down
here, which is calling Foo::Foo globalFoo. So, this is the global constructor Foo is in the middle of that function, and that is attributed to
the actual definition of Foo. So, again, something seems to have happened when I define a global variable, and the compiler emitted
some code somewhere into some magical function
that it's writing for me. And then there's another;
this is this other global sub_I ::numFoos
function which just, all it does is call static
initialization and destruction. Okay, but who calls this function, oh dear, yes, no, I was gonna say, how does it get to my function, that's a bad title for this slide. How does it get to this function? Well, at this point, I've been doing a lot of this stuff on my commute in and out of work and so I have very spotty cell coverage,
but at this point, in my spelunking I discovered that I had decent enough internet to Google for that libc function name, and I was able to find the libc source code, pull it down and have a look. Oh, darn it, sorry,
forgot about that slide. So, if you've ever
looked at the libc code, it is beautiful; it is wonderful. It is a testament to engineering skill, but it is not the most readable code in the world, mainly because it has to support pretty much every platform going, and every architecture going, and a lot of different compilers, too. So, you'll forgive me
for having paraphrased this down to the very small
bit of code that's going on. So, this is that __libc_csu_init. It was one of the functions that had been called from the _start
that ultimately called my function through those funny
underscore named functions. And so, I don't know
about anyone else here, I can't read function
pointer syntax nakedly. So, I have to typedef
it or I should've done a using here, I suppose,
but this is a C file. So, what this is doing, for those like me who need to look at it
every time, is defining a type called init_func, and that type is a function, a pointer to
a function, I should say, that returns void, takes
an int and two char**'s. Int and two char stars, stah, chah, sorry, try saying that, two char**'s, that's sort of strange, that sounds a little bit like the signature to main, right? Okay, so that's the init_func. It then also defines
two externally defined symbols, and they've
called __init_array_start, and __init_array_end, and they are defined as being arrays that live somewhere else out in space somewhere, and then, we subtract the end from the beginning. That tells us how many
things are in this apparent array of things, and we call each in turn, and we pass in argc, argv, and envp. Who's forgotten that envp can be put on the end of main? I had, before I looked at this slide here. There's the third
parameter you can use here. Okay, but where do these init_array_start and init_array_end come from? It's clear that something is going on where an array of function pointers to things I would like to run at startup has been constructed, but I'm compiling my one file at the moment, and presumably, every file I compile that has a global would like to have its
functions initialized as well, but how do I build an array of contiguous function pointers when I'm compiling a whole bunch
of things separately. If you try and think about
how to do that in C++, I don't know if you could across multiple translation units combine together something like statically. Jason's now starting
to think and see if he can come up with a solution, obviously. Well, there's a clue; if we go back to the example we looked at before, and I know, probably, this is all noise to you whenever you use compilers probably but these funny things at the top. If we take out the filter which turns off all of the assembly directives, then we get to see all of the other things that the assembler is
told by the compiler. And if we scroll far
enough, actually, let me just zoom down to it, somewhere after all of the code has been emitted, and there is that funny function, the
global initialization function, there's this section, sorry, overloading the word section here. There's this part of the code which has a .section directive
in it and a whole bunch of things, so there's a clue there. Oops, sorry, forgot to close things. Ah, there we go, so it's this bit of code. And so what this is
doing, is it's a directive to the assembler to say,
you're currently assembling code and you're putting it into a bucket called the text bucket, right, we know that the text is where the executable stuff goes, can you now switch over and start putting things
into a different bucket, and that bucket I'm gonna call init_array, section is probably a
better name than bucket 'cause that's what they really are. Align to eight byte
boundaries, and then I want to put a quad word, an eight-byte pointer to you and me, that
points to the function, that function that we defined, the one that has the initialization in it, and then the .text is
shorthand for .section, go back to putting stuff
into the text bucket. So, we've kind of bifurcated
the program at this point. We've got code that's
going into one block, and we've just put
something into another thing called a section and
we've given that section a .init_array, which
was not one of the ones that I mentioned earlier
when we were talking about .text and bss, and all that stuff. Hmm, interesting, well, in order to work out a little bit more about what's going on, we're gonna have to
talk about the linker. What does a linker do? What do we think of when
we think of the linker? - [Audience Member] Magic. - Magic, that is a good answer, actually. I mean for the longest time the linker always just seemed like that annoying long step that couldn't
really be parallelized at the end of my build, and that seemed to take up tons of I/O and all that, but no, it has a lot of work to do, and if you look into what the linker's doing behind the scenes, there's
a deep, deep rabbit hole. So, trivially, the
linker collects together all of your object files that you gave it, resolves references between them. We know that in one file you may refer to something that was defined in another file, so there must be some way of tracking between those two. One thing that I hadn't really thought about until I prepared this talk was that it also is
responsible for determining the layout of the executable, like where the bits go in the actual binary, like does main happen at the beginning of the block of code or is it at the end, or anything like that,
and in conversations with people prior to this
talk, it's also doing something like graph theoretical stuff where it's following
dependency nodes of like this symbol needs this thing which means it needs that thing which means it needs that thing, and stuff
like that, and then it writes the metadata, as well, which says, okay, I've finished with your program, here's the program header, here's all the bits, and this is where I put them. We've got a slightly more
representative program now that's actually in
two files so that we can see how this link process
is gonna come together. It's the hello world,
and on the left hand side we've got the bit that does the printing, and on the right hand
side we've got a pluggable getMessage function which
we've written somewhere else. main calls greet; greet calls getMessage. getMessage is defined in a different file. We compile them in the obvious way, link them, and it does what you'd imagine. No surprises there. What are those files
that we generated though? I kind of glossed over it. Normally we just type
make or we run CMake or we do whatever it is, and we know that somewhere there's an object file, and at the end there's an executable, and the linker kind of
brings them together, and what are those object files? Are they just big assembly
files or something? I don't know. Well, this is where you
bring out the Unix tool file which is like a
great, I have no idea what this is, please tell
me what this is, tool. I don't know if anyone
uses this regularly. It's just one of my
favorite things to run. It's amazing what you'll
discover, like files that you just find on your hard disc. And it turns out that
well hello, no surprises, the ELF executable we
were expecting it to be. But hello.o and message.o
are also ELF files. I guess the L in ELF is for linkable, so, Executable and Linkable Format,
so, not too surprising. And we just saw that the assembler is able to put things into
differently named buckets, and those are sections and so why not use the same format that we store our sections in for our executable for
our intermediate files, too. That also means that
we can run those tools that we were doing
before, objdump and readelf, sorry, I'd forgotten the name, and the other tools which will appear shortly. So, here we're gonna dump the hello.o. So this is the one that has main in it and it calls greet which
calls the getMessage which is defined somewhere else. And we can see here's the greet function. And we know that greet
is calling getMessage, and then it's gonna call
operator chevrony thing, streaming operator to stream out to cout. And so the first call is
going to be to getMessage, but apparently it's a call to greet, which is weird, because this is greet. Is this like some crazy
recursion thing going on? But, more interestingly,
if you look at what address it's calling, it's calling
the next instruction. Why on earth would you
call the next instruction? Well, obviously, we don't
know where getMessage is yet; it's somewhere else;
we have yet to define it. When the compiler was emitting this code, it said I know that somewhere exists getMessage, but I don't know where so it leaves it blank, and you can see that. So, in this instance, I
have shown the op codes. So, e8 is the op code for
call, and then there's an offset here, an offset to where I would like you to call, and because the compiler and the assembler don't know, they've put zeros there, and then said, okay, linker, you figure this out, and you put it here afterwards. And it just so happens that this offset is relative to the next instruction, which is why 00 00 actually looks like a call to the next instruction. And it's actually relative, yeah, we'll go into that in a second. So, look, there's a whole bunch of these, and interestingly as well, even though main here is calling greet in the same file, it's also just calling itself, and it's letting the linker determine where that is gonna end up. You'd think
that because they're in the same file, it could actually just call between them directly. I know where I put this
so I'm gonna call it. But it doesn't, it chooses not to. There's a subtlety to this because main is actually tagged to be
in a slightly different section name from all the other things. It's in like a .text.init section and that prohibits the compiler from doing some
optimizations that it would otherwise do, unless you're
on the highest settings. And interestingly, as
we'll see in a second, the sections are kind of
like the primitive unit that the linker has to work with. It doesn't know what's
inside each of those sections, so it has to move
them around chunk by chunk. And so, by putting things
into more sections, you're giving the linker more flexibility in where it puts stuff,
but obviously, then it needs to patch the code
to refer to, sorry. I'm gonna go to the next slide and where I actually explain relocations. So, we're gonna talk about patching. Those zero, zero, zeros
that were in the middle of the op code need to
be turned into something by the linker, but we
need to tell the linker what it is we would like to put there. So, here is me dumping it
with relocation information. So, I've used objdump again, and I've used a different flag to say I'd like to show the contents of the relocation section. So there's actually a separately named section inside the object
file that describes all of the things that need to be done in order to link my executable. As it happens, objdump is good enough, nice enough to interleave
these even though they're in separate blocks of the file. So, here we can see that the push and the call are defined here. And then there's this funny thing at apparently address five which says, hey, linker, can you
go and find getMessage, wherever getMessage is
defined, go and find that address, then subtract four from it, and then please, can you poke it in using this particular kind of poke, the R_X86_64_PLT32 at address five. This is the five here. So, address five, of course, is, well, there's four, so this one's five. So, this it saying write 32 bits worth of data that refer to getMessage and put them here in the middle
of this instruction. And similarly, for the
other calls down here, and in fact we didn't notice or I didn't point out that here is
a reference to the cout global object which
also needs to be patched up in a particular way,
but you notice there are different types of patches here. This is a PLT32 which
is a procedure linkage table, which we'll get to if we have time at the end. And this one here is a GOT, global offset table, pc-relative thing. So, there's some different
things going on here. The takeaways, really,
here is that the linker needs to be told how to find the symbols, which symbols to find, and then when it's found them, where to put that information inside the binary blobs that
represent the assembled code. There are different types of relocations. We saw two of them just there. It's worth noting that those types are dependent on the kind of instruction that it's patching, and the architecture that the linker is working on; patching an ARM instruction's very different from patching an x86 instruction, and in fact, there may be some things like where constants that can't be expressed in one instruction in ARM have to be burst into two parts and OR'd together, and so there are different relocation types that push the top 16 bits into this bit, and the bottom 16 bits over here, things like that. They're also used within
the same object file. We saw that between main and greet. So, if the compiler has decided to elect to put things into different sections, it can still refer to
things within the same translation unit, the same object file and let the linker do the hard work of working out how to put things together. So, let's talk about the symbols. I just sort of said,
hey, go find getMessage, but where is this
getMessage bit coming from? How do we know where getMessage is? Well, there's another section in the file that says, this is where I'm defining all of the symbols that
I need, or I provide. So, if we use objdump again, and say, what symbols do you provide, hello.o. Again, hello.o is the main function. We can see that there is a whole bunch of symbols being brought in. This left hand column
here, l means it's local, whereas g means it's global; so, these are local things that are
less interesting to us. Although, you'll notice that the static_initialization_and_destruction and the GLOBAL_sub_I magic functions are listed in there, although
this isn't our global example. Our greet function is here,
and our main function is here. This F means it's defined in this file, and you can see there's a whole bunch of undefined symbols which are like, hey, I need these somehow. Someone else has to provide these for me. So, obviously, if we were to dump a message.o which just contains that getMessage function, we'll see that, yes, it defines
getMessage as a global symbol. So, then, the linker, sort of, reads all of the inputs; you
give it all the obj files. It identifies all of the symbols that each of those object file
provides; it works out, then, which symbols
provide which relocations, and it lays out the file in some yet to be determined way, but you
start with maybe with main. We put main down, and
then we say, okay, main, oh, we need to find this
thing; okay, go find it. Put it here; now patch
the references into it, and so on; so we can
see how that could work. Let's just go through it
in a sort of pictorial way. So message.o here has two
sections, the blue sections. We've got getMessage
and, of course, there's actually some read-only data in there. The actual string,
hello world, is a string constant and that needs,
the bytes for that need to go somewhere, and then hello.o has a greet function and main. And the linker effectively is gonna do pretty much as I
described, collect together those things, output them
in some kind of order, and then we're gonna have
a .text on the way out, and a .ro-data and some program headers that say, hey, this is
where to find the things you need, Mr. Operating
System, or whatever, whatever process loads and runs it. And you'll notice here
that the greet and main are defined in two different sections, but they've ended up in
the same section in hello. So, you might just think to yourself, okay, so the linker has a hard coded set of rules that say collect together everything called this,
and emit them over here. All the sections named
this, bring them, pick them up, put them over here, and then link the things between them,
and do a little relocations, but of course, that's not
the case because linkers can be used for more than
just boring executables. The Linux kernel, for example, is linked with a linker, and so, it doesn't have a regular layout like you might imagine. For those of you that've
worked with embedded systems, you probably
know that there are some sort of magical addresses, like, I just need my code to start at address one, two, three, four, because that's where the CPU is gonna start jumping in once it powers on, and so I need to be able to lay out things in a much more structured way, and the way that you do that is
through a linker script. And so here I'm dumping, I'm just running gcc and saying don't do anything, but run the linker with verbose, and, oh, well, for some reason, oop, oh blast, sorry. This is the problem with these things. Linker, yeah, okay, so we're gonna quickly just skim through this. So, the linker at the top here, if we dump it out in verbose mode, it prints out the linker script which is effectively a programming
language in which you tell the linker what
you would like it to do. You set the output format,
and the architecture. There is an ENTRY; now this is a hint to the linker, sorry, a directive to the linker, to say: when you write out that program
header table that tells the operating system what
to do, this symbol is where execution should start. So, there will be a field
somewhere in the header that says, this is the first instruction of my program, and it
should be called _start. So that's setting up that metadata. And then the most interesting
part is the sections table. So this sections table is going to explain how to take all of the sections that are coming in from the input and
put them into the output. And so, for example, we'll just pick one at random at the top here, .interp, whatever .interp is, in the output what we're gonna do is we're gonna take every source file, sorry, every object file, and find its interp section, and just basically
concatenate them together. So this is a way of
picking up all the interp sections from all the files, collecting them together, and then putting them in a section in the output called interp. And you can see there's
an awful lot of this. It's pretty complicated, but in the middle of it all, interestingly, we see a reference back to init_array. You remember the global thing we were looking at right at the beginning? This is interesting. So, here it is syntax highlighted. Now we can actually see how
that init_array was populated. So, what it's saying is that in the output I want you to create a
section called .init_array. We're gonna ignore this
top line for a second. We're just gonna look at this KEEP. Inside of that init_array,
we're gonna pick up everything called .init_array
from all of the source files. So here, every global we would've defined would've made its own
.init_array section, and then this thing
says, pick them all up, and just plunk them
all together, one after another inside a section
called .init_array. Cool, okay, so now we actually get to see how those apparently disparate pointers that were defined in different files get put together into a
single contiguous array. I mean, it gets very
complicated with the C runtime here, also using the same mechanism. Interestingly, up here
you'll see that there's a set of things, .init_array.something, which can be sorted by the
linker by some level of priority. Now, I've observed that
the C++ system by default does not use this
technique but if you delved deep inside the
implementations of std::cout and things like that,
it's possible that they're using some extra tagging and
sort of magic annotations to say, no, no, no, please
sort me to the front of the initialization so
that cout is ready early. I know that libstdc++ doesn't do this. We were talking about this earlier, but it's possible that
other implementations do. And certainly, if you look
at the documentation of, say, gcc, there's an
attribute you can give to functions which says
put it into this section with this priority and
this sort key, effectively. The other thing that's defined
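As a hedged sketch of that attribute (GCC/Clang on a glibc-style Linux toolchain; the function names here are made up for illustration): lower priority numbers run earlier, prioritised constructors go into sorted .init_array.NNN sections, and unprioritised ones land in the plain .init_array that gets placed after them.

```c
#include <assert.h>

/* Each of these becomes a pointer in .init_array.101, .init_array.102,
   or the plain .init_array; the linker's SORT_BY_INIT_PRIORITY puts the
   numbered sections first, and the runtime walks the array in order. */
static char order[4];
static int n;

__attribute__((constructor(101))) static void runs_first(void)  { order[n++] = 'a'; }
__attribute__((constructor(102))) static void runs_second(void) { order[n++] = 'b'; }
__attribute__((constructor))      static void runs_later(void)  { order[n++] = 'c'; }
```

Priorities 0 to 100 are reserved for the implementation, which is why the example starts at 101.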
here is these PROVIDE_HIDDEN. So, it turns out that the linker script can make up new symbols as it's running, which
is pretty cool, right? So, what this is saying is please, create a new symbol called __init_array_start and assign it the value
of dot and dot is a magic thing that says where I am
currently outputting to. So this is the address of where I am right now in my link process. So we've effectively put a marker down called __init_array_start,
then we've gathered together all of the, first of all those things that have a priority,
sorted by their priority, and then the everything else bucket that all of your global constructors will have been put into,
and then at the end there in that last bit, we
provide an __init_array_end. So we now have bounded
everything that we need to run with a start and
end, that's pretty cool. And it's also, you'll
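Pieced together, the stanza being described looks roughly like this; a simplified sketch of GNU ld's default script, not the verbatim text:

```ld
.init_array :
{
  /* Put a symbol down at the current output location ('.')... */
  PROVIDE_HIDDEN (__init_array_start = .);
  /* ...then the prioritised entries, sorted by priority... */
  KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*)))
  /* ...then the everything-else bucket... */
  KEEP (*(.init_array))
  /* ...and a bookend symbol at the end. */
  PROVIDE_HIDDEN (__init_array_end = .);
}
```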
notice, why you should never use global variables
because you don't get to control which
order they get plunked down in, and I've had
all sorts of horrible bugs where we've
inadvertently relied upon that, and it's been like the inode order on the filesystem that has
actually ultimately decided which way 'round things get initialized. Just don't go
there; don't use globals. This is not a best practices talk. Oh, I forgot that I highlighted it. So now we know how that
global process works. So, just to recap, the compiler makes a static initialization function in every translation unit which calls out to all the constructors for the objects that are global to that particular
translation unit. It puts a pointer to this function into a section called .init_array. The linker then gathers
together all of those init_arrays and puts
them one after another, and the script kind of
bookends it by putting a tagged symbol name at the
beginning and the end of that. And then, finally, the C runtime walks through that init_array
and calls each in turn. So now we know how global
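Those bookends are visible from C, too; a sketch that assumes glibc and the default GNU ld script (which is what provides the __init_array_start and __init_array_end symbols):

```c
/* The bookend symbols PROVIDE_HIDDEN'd by the default linker script.
   Hidden visibility just means they don't leak out of the final link;
   from inside the same executable they resolve fine. */
typedef void (*init_fn)(void);
extern init_fn __init_array_start[];
extern init_fn __init_array_end[];

/* Register at least one entry of our own. */
static int ran;
__attribute__((constructor)) static void mark(void) { ran = 1; }

/* Morally what the C runtime does before main:
     for (init_fn *p = __init_array_start; p != __init_array_end; ++p) (*p)();
   Here we just count the entries rather than re-run them. */
static long init_array_len(void) {
    return __init_array_end - __init_array_start;
}
```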
constructors work, hmm. Things that are interesting about this is that those linker scripts
aren't just for the compiler. There are some situations
in which you want to write linker scripts yourself. Again, if you're an embedded system, or if you are writing
a kernel or something like that, you might need to control very, very carefully the output
of these, of the order or of the addresses that
things get assembled to. It's also interesting,
and I noted while I was looking at this, that
some dynamic objects, some DSOs that you're linking against or referring to aren't actually DSOs. They are linker scripts, and the linker, when it sees something to link against, will look at it, and
if it looks like a text file, it will interpret it as a linker script, and will follow
the instructions in that. So you can have some actual link time behavior defined in those which can be used for versioning tricks. The linker as well, like I said, has like a graph theory thing in it where it can actually work out
which sections are unused, and then it can throw them away. Oh, I didn't show you, but back in here, the KEEP part, so the
KEEP that it's saying here is a hint to say, even if you think this isn't being used
by anyone else, keep it because, of course,
there's no way it can tell that these things are actually
being used by the C runtime. So, let me just go back to here. So, yeah, you can tell the linker to garbage collect
sections that are unused. It's not on by default,
and I'm not quite sure why. One thing to note here
is that, apart from the relocations
that poke into it and change the instructions, the sections are essentially opaque binary blobs to the compiler, sorry, to the linker, which means that it
can't discard an unused function inside a section
because it doesn't know that a function exists inside of it. It's only if a single object file has a section for which
there are no references pointing into it that the
linker can discard it. So, if you've ever wondered
why your executables maybe contain functions
for which you think well, why has this not been thrown away, it's because the linker
couldn't throw away the section that that
function was defined in. There are flags to the compiler to say, well, if I put every function in its own uniquely named section, and every block of data in its own uniquely named section, I'm giving the linker the ability to have a much more fine grained ability to throw things away, and you can turn those on, but it starts to prevent optimizations between functions that you
would otherwise be able to do. So, these are use-them-advisedly things. If you really think you can squeeze out a bit of size, it's worth testing this both before and afterwards, and being totally sure that they're the right thing for you, but they exist, and it's interesting to know. Alright, how we doing,
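A sketch of those flags in action (file names made up; assumes gcc and binutils are available):

```shell
cat > /tmp/gc_demo.c <<'EOF'
int used(void)   { return 1; }
int unused(void) { return 2; }
int main(void)   { return used() - 1; }
EOF

# -ffunction-sections gives each function its own section
# (.text.used, .text.unused) instead of one opaque .text blob:
gcc -c -ffunction-sections -fdata-sections /tmp/gc_demo.c -o /tmp/gc_demo.o
readelf -S /tmp/gc_demo.o | grep -E '\.text\.(used|unused)'

# Now the linker can see nothing references .text.unused and drop it:
gcc /tmp/gc_demo.o -o /tmp/gc_demo -Wl,--gc-sections
nm /tmp/gc_demo | grep -w used        # still here
! nm /tmp/gc_demo | grep -qw unused   # garbage-collected away
```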
goosh, goosh, goosh? Good, excuse me. So now we get to the thing that I was most interested in working
out, dynamic linking. So the 7K executable we saw earlier doesn't have the whole
of the C runtime in it. I think that's clear, right? We saw bits of it, that libc stuff, but it's not like I saw loads and loads of bits of code that
referred to the operator overloads of ostream
and that kind of stuff, and that's because the
code isn't in my binary. It's somewhere else, and in fact, this is the level of which we're talking. If I do a dynamic link
of my hello executable, it's just over 8K, so
the 7K was for the empty case, if you remember; this is
just for the hello world one. But if I were to statically link it, it comes in at 2.5MB,
which is quite big, right? I mean there's a lot of
stuff going on in that. There's a lot of C++
runtime stuff in there, and probably for all the reasons I was just describing, the linker can't see that I'm not using bits of it, and throw them away, so, I'm stuck with it all. So, dynamic linking is gonna help me here. So let's just rephrase our hello world and see what it looks like if we use a dynamic link aspect
to it, and obviously, the C runtime is far too complicated for me to delve into right now. So, I'm gonna split my hello program into the main and the other bit which returned the getMessage, and I'm gonna make the getMessage a DLL. Sorry, for all those who are Linux people, I'm sorry, DLL is
what I'm used to saying, and DSO, I know, is the right term for it. But I think you know what I mean. So this is just the
relatively straightforward business of linking that as
shared, and then saying, please find getMessage in the libhello.so, and it works, and by works
I mean it doesn't work because DLLs are a pain in the backside. (audience member speaking) Oh, the question was why didn't I provide -fPIC, and it's because I forgot to put it on the slides, thank you very much. Yes, in order for this to have worked, the code must have been
compiled with position independence which means that it can be moved around a bit;
there's more latitude in where the linker can lay things out. So, I did do that, I just
haven't put it on these slides. Thank you for the comment. Also, I didn't put in here anything to do with like the rpath
catastrophe that you have to do in order to make it actually work meaningfully all the time,
but that's a whole other talk. So, let's have a look at what happened. We linked the hello executable and I'm now saying, well, I did
readelf --help and I looked through it and I went,
oh, these things sound interesting, what is the dynamic section, and what are the program headers here? So, the program headers include this thing called interp which you'll remember we actually saw when we looked earlier at how the sections had been laid out, and there's this
interesting annotation here. Requesting program interpreter,
blah blah blah blah. Hmm, interesting, okay, we'll note that, and we'll come back to it. Some mappings, and then here. There's a section called
the dynamic section, and here is a load of metadata and you can see that libhello.so is
mentioned, so this is where somehow I'm
communicating to the operating system that I need to find libhello.so and load it in before I can be run, and then there's the dreaded rpath. We won't talk about that. So, let's do some more
archaeology, and see what happens. I have another example
here, and we have this, the hello one, okay, live, hello. So, let's just list that. Ah, there's our function again. I'm gonna put a break point on greet, and I'm gonna run it, and then, so we're about to call getMessage, and we know that getMessage is defined in the DLL, and if I disassemble, we're seeing we're about to call _Z10getMessagev@plt, huh, there's that plt word that we saw earlier. If you'll remember,
that was one of the sort of mystical sets of
letters that were inside the R relocation, and I just
sort of glossed over it, saying it stands for the
procedure linkage table. Well, this is how the
dynamism bit comes in. So, that's not calling
directly to getMessage because we haven't worked
out where getMessage is yet. Linux, by default, is lazy about looking symbols up; we'll see why in a second. So, this call goes to
a thunk or a trampoline or any number of those funny things. So I'm gonna actually break on 40060, and actually I'm gonna do stepi. Okay, so here I am now,
I'm actually in that call. I just stepped over to it calling, and oh, that's interesting,
I stepped too far. Sorry, let me do that again. Live demos. I'm gonna just disassemble it directly. 4006b0. Oh, this worked before. I'm sorry, ha. Oh, that's meant to be break on that. Hurray, continue, disassemble. Are you gonna work now? Yay, okay, phew; I can
use a debugger, honest. Alright, so, this is the function. This _Z10getMessagev@plt,
and it doesn't look like my getMessage
because I didn't write it. In fact, the plt is a
section that is generated by the linker and every
relocation to a function which is defined in a
DLL, in a DSO (grumbles), is given an entry and all of the calls to those functions that
are defined elsewhere come through the plt
entry instead, and it's a very weird looking thing, right? We've got a jump here,
and then after it, there's a pushq and another jump;
that's really weird. And this jump, this syntax
is an indirect jump. It's saying, hey, look up some other piece of memory, get an address
out of it and then jump to wherever that address
told me where to go. Hmm, right, well, I'm not gonna debug through that 'cause we haven't got time, but here's one I did earlier. So this is what it looks like. The reference to the
memory address of where we're going is that 601018
that's at the bottom there, and at the moment it has
a value of quad 4006b6. What do we notice about that address? Where are we gonna jump to? We're gonna jump to the next instruction. This is a really, really, really, really complicated way of going
to the next instruction. Weird, right? I mean we saw the call
to the next instruction, and we kind of, well, that's
odd, but then we found it, but this is absurd,
we're calling a function, and we jump off somewhere which comes back to the next instruction which then pushes and goes somewhere else, and now if we were to follow this through, we'd see that would actually happen one more time for very
complicated reasons, but ultimately, what
happens is, that 4006a0 jump at the end goes off
into the dynamic loading subsystem which looks up
what getMessage should point to, and then ultimately
goes to that address. Okay, right, so, presumably
that's an expensive process. I've got to look through
all the symbol tables of all the DLLs that are currently loaded, maybe I even have to load them off disc if they aren't already loaded into memory, and then, I jump to it. That's a really expensive thing. We would not tolerate
it if all of our calls had to go through something
as expensive as this, right? So, this ultimately resolves symbol zero, that's what that pushq is by the way. It's pushing the ordinal
in that symbol table that it's looking at, but it has to use push because every other register has something important to the function you're supposed to be
calling, right, so it's just kind of having to use the stack as a back channel or a front channel. I don't know. The cool thing is that once it's done that resolution, once it's worked out where getMessage really is, it writes it back into that address at 601018. And so the next time getMessage is called, that first jump goes
directly to getMessage. So that's pretty cool, right? We actually kind of patch the code as we go. Every time we call a
function that's in a DLL, the first time we call
it, there's this expensive process where the lookup happens, and after that, it's
free, cheap, I should say. It's not free, an indirect
call is a little bit more expensive, the
branch predictor will pick that up, but you should
stay and watch Chandler's talk as to why that is
probably not gonna be forever. Okay, so, why does it do it lazily? Why does it not just look these
things up at the beginning? You'd think that you'd
load your executable, and it would just do the
resolution there and then, and it's got the set of
symbols; it's gonna just go through them all and it
should find out where they go. Well, there's a whole
bunch of reasons why. The first thing is the C runtime has a ton of functions and hardly any
of them are called by anyone. If you imagine every
time you type ls to list a directory on Unix, you're
firing up a new executable, running it and coming
back out again, and so, if you were to do the
work of looking up all of the functions that it didn't call, you would slow down the startup time. So, it's like a lazy optimization about how starting up your application should be fast, and then you only pay for the functions you use. Hey, hey, see, only pay for what you use. This is not always what you want, though, for example, if you work in the finance industry and, I mean
you shouldn't be using dynamic libraries
anyway, but if you happen to be, and you wanted
to make sure that every time you called a function, even if it was the first time, it's
quick, you need to ensure that that happens ahead of time, and you can force it to happen ahead of time. So you can set an environment variable, LD_BIND_NOW, and if it's
set, then that's a hint to the system that it
should just straightaway apply all of the relocations
and fill in that plt with the actual addresses
rather than the one that goes through the resolver. You can also specify it as a flag in the linker which marks a bit that says please do
this, and incidentally, what is the thing that
is doing this, right? I sort of waved my hands and said, oh, the dynamic linking system, and I even deliberately said, oh and
the kernel or whatever. It's not the kernel. The interesting thing here is that that interpreter, that
I glossed over before, is actually what is doing that work. So the kernel's job is to load in the program, read through the program header, which loads in just a few blocks of your program, and if
there's no interpreter set, it jumps to the entry address. But if there's an interpreter set, it also loads in the interpreter, and puts it over here, and then jumps to the interpreter, giving
it the entry address as, like, a parameter. This means that now the kernel can go away and say I've done all of the things I needed to do to set up a process to start executing your code, and now I'm out of kernel mode and I don't need to be touching the
kernel to change this. I've given it over to user mode entirely, and as an executable, that interpreter, the ld.so thingymajig, is now responsible for starting
the whole DLL process. It will ensure that
the PLTs are set up; it will follow the
dynamic section and make sure that each of these .so's that you need can be found and are
mapped, and then it provides, effectively, the resolver service
that they jump through. That's why that first, I
pointed out in the slide, this is like the fourth instruction jumped to something which then did
the same dynamic lookup again. It's because the interpreter
itself is effectively mapping, yeah, anyway, magic
happens, magic happens. I haven't thought this
through and I'm thinking it through as I'm here
on stage and I'm thinking this is not a good time to
be thinking up new content. Anyway, so that's what
happens, the interpreter is responsible for doing all
of the clever machinations and reinterpreting the sections that say what dynamic stuff needs to happen. And if you've ever had
problems with, for example, your rpath and you've
wondered, hey, what is happening inside this dynamic system. It's really complicated;
I run my executable, and I've got a stale result;
I'm sure I changed my code. We've all done this, right? I edited my code and
the bug's still there. What did I do wrong? And normally you've either
edited it on the wrong computer or a wrong copy of the code, or you forgot to make or
your makefile failed, or maybe you just
recompiled a DLL, and it's in the wrong place and
so you're not actually loading the DLLs you think you're loading, those kinds of things. Now, to debug that, you can run LDD on your executable
which says, hey, this is what I'm gonna resolve these things to. (coughs) Excuse me. Or you can set the
environment variable LD_DEBUG, which the interpreter
then uses to sort of print out to stderr loads and
loads and loads of things that it's doing, and
it's fascinating to turn that on and just look at what the heck is going on behind the scenes. You can do LD_DEBUG=help
and then you get help. Otherwise I typically use LD_DEBUG=ALL, and then trawl through it all
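For example (a sketch; any dynamically linked program will do, since the loader itself reads the variable):

```shell
# LD_DEBUG is read by the dynamic loader (ld.so), not by the program,
# and its output goes to stderr.
LD_DEBUG=help /bin/true 2>&1            # list the available categories

# Watch symbols being resolved lazily as the program runs:
LD_DEBUG=bindings /bin/ls / 2>&1 | head -n 10
```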
to work out what's going on. And, for example, you can see that lazy loading and symbol resolution thing. I actually did it on
ls and you could pause ls in the debugger and
you could continue it, and step through, and you'd see all these functions it's calling being resolved and being output out; it's quite fun. This leads us to another thing you can do which is, because it's lazily done, I can interpose myself
into the whole proceedings and say, I'd like you to
do something different, actually; you can set
LD_PRELOAD, the environment variable LD_PRELOAD, and
as long as what it points at is itself a shared
object, the shared object that you specify will
be loaded ahead of time and its symbols will be injected right at the front of the
symbol resolution process which means that you can
steal any dynamically referred to symbol
that's in an executable. You don't have to have the source code
to the executable anymore. Maybe you lost it; maybe you never had it. But you'd be really
interested in instrumenting calls to, say, open or
write, things like that. Well, you write your own open and write. You compile them into a
dynamic library, and you LD_PRELOAD them, and then
you run your executable. So you do LD_PRELOAD=my.so
./the-executable-I'd-like-to-look-at. And that allows you to actually interpose and steal those things
away from a pre-built executable which is kinda cool, right? I mean, for example, if you run a website which allows users to
arbitrarily run compilers and do anything, you
probably wanna make sure that they're not opening files that they shouldn't be opening,
you know, #including /etc/shadow or something like that. So, you could do this on open and say, if the file matches some blacklist, then return an error, which is
what compiler explorer does. Another thing that a
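A minimal interposer along those lines might look like this; it's a sketch, not Compiler Explorer's actual code. Built as a shared object and LD_PRELOADed, this open is found first during symbol resolution, and dlsym(RTLD_NEXT, ...) fetches the real one hiding behind it:

```c
/* shim.c - build: gcc -shared -fPIC shim.c -o shim.so
   use:             LD_PRELOAD=./shim.so some_program */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>

int open(const char *path, int flags, ...) {
    /* RTLD_NEXT: "the next definition of this symbol after me",
       i.e. the real libc open. */
    int (*real_open)(const char *, int, ...) =
        (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    if (strstr(path, "/etc/shadow")) {  /* crude blacklist */
        errno = EACCES;
        return -1;
    }
    int mode = 0;
    if (flags & O_CREAT) {              /* mode argument only exists with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }
    return real_open(path, flags, mode);
}
```

The same trick works even when the wrapper is linked straight into an executable, since a strong symbol in the executable wins over libc's. (On glibc older than 2.34 you'd also need -ldl for dlsym.)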
friend of mine told me which is hilarious is that
they had a mathematical analysis system, and
he suspected that the mathematicians who
provided the service weren't really up to much programming-wise,
and that although the simulations they were
running were taking many hours,
actually the problem was probably that
various transcendental math functions were being called over and over and over again with the same numbers. So he instrumented some
of the hyperbolic sine functions and things
like that and replaced them with something which
just kept a histogram of what input values
they had, and it turned out that, yes, 97 odd percent were calling like sine with the same
thing over and over and over and over again, so he was able to go to them and say,
well, a) fix your code, but b) I can fix it here for you. I can put some rudimentary cache in front of your sine, and now I
make your program run faster. So that's quite cool. It's also used by some networking systems to replace the traditional
networking layers inside the operating
system so that you can do direct access to cards
that provide kernel bypass. There's a load of cool
things you can do with this. If you're interested in
that, talk to me afterwards. Okay, I have very little time left. I really wanted to get
to some of the other cool things that are
much more C plus plusy. So, weak references, you
know that if you define Foo, the function Foo in
two places, and then try and link, you're gonna get
multiple definitions, right? And that's mea culpa, I
shouldn't have defined it in two places, but when
I make an inline function, I'm kind of defining a
function in two places, and sometimes that function
is genuinely inline, like in terms of the
optimization processes inline, but oftentimes
it isn't, which means that if I've used my
getFoo function, my inline getFoo function in translation unit A, and in translation unit B,
there's two copies of it. Why does that not cause an error? And the answer is it gets marked as a weak symbol which is another tag that I didn't have time to show, and the weak symbol, the linker says it's okay to have as many of these as you like, but you get to pick whichever one you like
as the implementation of getFoo which is great provided they are all actually the same, and
again, the linker has no idea. These are just bags of bits
as far as it's concerned. So, it can't look at the implementation of getFoo or whatever and say, oh, this is the same across all
translation units, that's fine. So this is where ODR violations show up. If you've kind of got two implementations that are marked inline
of the same function defined to be slightly
different in two areas, disparate areas of your code base and then the only time
that they actually come together is in the linker,
you're in for a shock. The last thing I would've liked to have talked about would be
link time optimization where the linker actually
starts collaborating a lot, lot more closely with the compiler, and there's a sort of
two-way relationship. The compiler generates
an intermediate form rather than just assembly
that is a bag of bits. It has a much richer
intermediate representation that's then passed to
the linker, and then during the process by
which the linker decides which things are visible
or can be reached, it calls back to the compiler and says, hey, I need the code for this now, and the compiler gets to see the whole world as the linker has laid it out, and some amazing optimizations can happen there, and actually, some ODR violation checking can happen there that
you would otherwise not be able to catch, which is great. So that's the way I've
encountered some subtle ODR problems before:
by just turning on -flto. So I recommend you all do that anyway. There's a whole bunch of stuff. So, Ian Lance Taylor is
responsible for writing the gold linker which is the one I've been using most; it's pretty amazing how fast and how much sophistication goes into it because linking is something you prefer not to have to do, just
like compiling, right? So, and it's the only
step that's inevitable in your whole build
process unless nothing's changed at all, so, making the linker fast is a super important thing. For the link time
optimization Honza's blog has got a whole bunch of stuff about how that works behind the scenes. Yeah, I guess that's it, so I'm sure you have questions; I'd just like to point out that Jason Turner,
myself and Charley Bay will probably be getting
together some time next year and doing a training event. So, you're welcome to ask us about that, but I invite your questions. I actually have two minutes left. (audience clapping) Thank you. Hello. - [Audience Member] During
the linking process, or during the layout
process when it's, I guess my question's really about the resolution. You said the kernel has some role it plays by laying everything out; it's starting the process, it lays out parts of memory, and then it turns everything over to the interpreter script, and does it lay out every dynamical object that is there, or does it kind of go, okay, I'm gonna lay out some of them, and
the interpreter can go back and go, hey, I need that thing, can you do that for me kernel? - It's more like the
latter, as I understand it the program headers have
a few slabs that they say, load in executable text
and put it at address, you know, this offset
of my file, put it at this address of memory;
put read-only things over here; put the other bits over here, and then, that's when it will jump to the interpreter and all the dll stuff, then, will happen in the interpreter where it will read a specially named section that's tagged as being this is the name of all the things I need, and that will continue on from there; it will mmap those in and mark them
in the appropriate way and then do the link resolution. So, more like the latter. Cool, thank you. Over this side. - [Audience Member] Hey,
could you just clarify? At some point you said you should never actually use dynamically linked libraries. - Ah ha, right, yes, so
dynamic linked libraries, I say if you're writing
highly performant code, you should avoid dynamic linked libraries. They're a barrier to
optimization across units. Essentially the compiler
can't see across them even with all the clever, fancy link time optimization things that I'm starting, personally, to rely upon, and there's this resolution cost, even if you do turn on the LD by now, there's still a jump through the plt to get to your function, so, it's a little bit
of a bump in the way. That's really what I'm saying there. - [Audience Member] Okay,
so if you statically link, you never get the plt... - Correct, if you statically
link, then the linker will just write into the instruction, jump to the actual location
'cause I know where it is. Thank you, over here. - [Audience Member] In Windows, in portal executable format, there is a problem that every executable has an address that they want to be loaded at, and if it doesn't happen, there is basically similar process, but happening at runtime; there should
be like a relocation happening; is it also something that happens in ELF or Linux world? - So, I'm not actually sure about that. I know that there are some preferred addresses that are
marked inside the program header table itself; I don't know if they are required or not, and I'm not sure if there's a problem 'cause you're, I think, specifically with
DLLs in the PE format you would like this is
a DLL that would like to be at like four
million in RAM, and that's where it's, everything's
expecting it to be. Is that what... - [Audience Member] Yes, funnily enough, like in Windows world if your project is built of many, many DLLs, there is an optimization technique that allows you to come up with some algorithm how you assign those addresses based on alphabet, or something. - I see, so, I don't
think that's happening anymore and mainly
because of address space layout randomization
which means that every symbol wants to be shoved into some random place just to try and make it harder for the bad guys to get in, so, I'm pretty sure everything now tries to be put into, like, make it as easy as
possible to map it anywhere. Thank you, oh, another one over here? - [Audience Member] Actually, thank you, following on that question; how different will be all this in Windows? - Yes, good question, I
think at the beginning I put my massive disclaimer that it's been about 20 years since I've touched Windows, so I think I will defer and say there are probably experts you can find here. If there's anyone who considers themselves an expert, and clearly I'm not an expert at any of this, as you
saw from me just like discovering process kind of by myself, then I invite you to raise your hand and speak to my friend here. James McNellis, yes, if we can find him, then, yeah, he's a good person to find. Thank you, sorry I can't be more help. Another question there behind you. - [Audience Member]
Could you explain what's the difference between a
fix-up and a relocation? - Ahh, no. (laughs) No, I don't know,
actually, is there anybody in the audience who would know the difference? I would assume that they're synonyms, but these things are subtle. - [Audience Member] It's something like: the compiler emits a fix-up, which the linker then turns into a relocation, or something like that? - [Audience Member] I'm
not sure how you're using the word fix-up here, but
I do know all about ELF and stuff 'cause... - Right well, I defer
to the right hand side of the room in this instance; so the short answer's I don't know, there, oh, this one is having a conversation, but you've got a comment or question, or
you can help me out here? - [Audience Member] I
was just going to try and answer the question. - Oh, I see, thank you. - [Audience Member] I don't
understand the question, so. - Right, right, I mean there's a ton of interesting terminology
like trampolines and thunks and other things that get used in this scenario. - [Audience Member] In
ARM the linker will emit sort of trampolines to
deal with out of range branches 'cause on ARM
there's a certain distance that a branch that a call
instruction can go to. So, the linker will see
that, oh, I'm trying to go too far, and then it'll emit a little bit of code between sections that it can jump to, which'll then do a farther jump to the
target location-y thing. - Ohh, I've often wondered
how that happened. That's clever; I don't
know if people got that, but the encoding and format of ARM instructions is such that... everything's a 32-bit instruction in ARM, if you're not in some mode, and so there's only a small amount of
space in the instruction encoding to put the
address of where you'd like to branch to which means
you've got like a plus or minus some amount... - [Audience Member] I
think it's two megabytes. - Two megabytes, right, which when I was doing ARM, two megabytes was more memory than I had in my computer,
so that was great. But nowadays, obviously,
two megabytes doesn't get you very far, so
the linker has to know when the destination
of a branch is too far for it to reach, and then it has to put in a function whose only job is to be like an intermediate post along the way to get to the final destination. Cool, thank you, I learned something. There's some heated debate over there. Have you got any comments
or thoughts there? - [Audience Member] No, it's
fine; good job; thank you. - Thank you very much,
thank you, I'm E&OE. (audience clapping)
https://github.com/CppCon/CppCon2018/blob/master/Presentations/the_bits_between_the_bits/the_bits_between_the_bits__matt_godbolt__cppcon_2018.pdf