Hey everybody! Our topic for today is
libraries. If you are a programmer you use software libraries all the time, but
you may not think about it. And, many of you probably have never made your own.
Today we're gonna change that and write some of our own libraries in C. A
library is a collection of pieces of software that you bundle together so that
you can reuse them in different programs or distribute them. Maybe you have a
favorite data structure (hash table, linked list, queue, whatever) and you want
to be able to use it all over the place, or give it to your friends. That's a
good candidate for something you might want to put in a library. The library
that you use all the time but probably don't think about is LibC,
otherwise known as the C standard library. LibC is home to malloc, calloc,
realloc, free, printf, and all the other favorite functions that you call all the
time but you didn't write and you didn't really think about where they came from.
They're mostly all in LibC. But today we're interested in making our
own libraries so let's make a library in C for Linux. So let's start with a header
file. This is the header file that programmers will include when they want
to use your library. Let's add the usual boilerplate stuff, and let's add a
function that I'm going to put in my library, and we're going to need a .c
file. Okay, that's where I'm going to put the function's code. Okay. So, the function
of the day is reverse. It's going to take a string and reverse it in place. It
doesn't copy the string. It's destructive. It just reverses the bytes in
order: it takes the last byte and swaps it with the first byte, then works its way inward. It also returns a pointer
to the string, and that's really just for convenience. So, this is the function I'm
going to play around with today. I could really use any function though, and I
could have more than one function in this library. I'm also going to make
another test file that's going to test my library. It's going to call this reverse
function, so that we can see whether or not it actually works. Okay. And, this test
program is just going to print out the first argument and the reverse of the
first argument. So, it just takes that argument I pass to the test program and
it reverses it, and prints it both ways. I've also made a little Makefile to
compile my code, and—first off—I'm compiling my library code into a .o
file. Now you've probably seen .o files before. We usually think of a .o
file as an intermediate step in compilation before you get your final
binary—we usually link together a bunch of .o files, but you can really think of a
.o file as a simple...library...for lack of a better term. I could take that .o
file, copy it into another directory, into another project, or I could send it to a
friend of mine, and they could use it in their projects. So, let's do that. Let's
link our .o file with our test program, and it works. Ok, now where do we
go from here? Well, for one I'm going to add a clean target to my Makefile, so
that I can clear out past compiles. That's just for convenience, and then I'm
going to add another rule to compile my library another way, as a .so file—aka
shared object or a dynamically linked library. If you're on Windows and see a
.dll file, that's what we're talking about. Now, shared objects or shared libraries are a
little different. They still hold code. But while the
linker actually put that .o file into my final compiled binary, a .so file
is separate. It's designed to be separate, and it's designed to be loaded at
runtime. And, when we build our .so file we need a few options you may not have
seen before. First, -fPIC just means the compiler is going to
generate position-independent code.
That's code that can be placed anywhere in memory and still run correctly. That matters because at runtime the loader is going to load this library into memory, and we don't
know ahead of time where in memory it's going to be put. The other option is "-shared". All that means is I want a shared
library, and we've already talked about what that means. OK, and then I'm going to
add my new shared library to the default "all:" rule and we can compile it. OK. Now, I
want to use my new shared library. So, let's make a new program. It's actually
just my old program but I'm going to compile it differently. The first
difference is that I'm not going to pass my .o file to my compiler. Instead, I'm
going to add a -L option telling the compiler to look in the current
directory for libraries, and then I'm going to add a -l (little L) option to tell it
that I want to link the program with libmycode. Now, this might be a good time
to mention that this -lmycode is shorthand for libmycode. My compiler
assumes that all library names begin with the letters "lib". So, LibC would just be -lc, and libmycode is just -lmycode. This is just
telling it I want to link this program with this library, and once we specify
that that linkage is supposed to happen then the compiler can figure out the
rest. OK, so compile that. Good. OK. And, then
I try to run it—not so good. The problem is the program loader is
looking for libraries and it can't find our new library. So, we're going to have to
help it. We can tell the loader where to find our new library by adding it to the
LD_LIBRARY_PATH environment variable. Now this variable tells the loader where to
look for libraries. So, I'm just going to add my directory to the front, and then I can
run my program, and it works. But what a pain! I don't want to have to type that in
every time I run my program. So, the other option is, I can install my library to
one of the directories that the program loader automatically searches for
libraries at runtime, like /usr/lib, for example. If I put our new library in one
of these directories then I won't need all that LD_LIBRARY_PATH business. I can just run my program and it will find it. OK, but this still seems
like a little bit of a hassle. Why would I want to use a shared library? The
reason is code size. If I use object dump (objdump) to look at the symbol table, you can
see that the first program assigns an address to my reverse function, but with
the second one—the one that uses the shared library—the address is all zeros
and the section is undefined. That's because it's going to be assigned when
the program runs. And, if we look at the two different programs, you'll notice
that the one that uses the shared library is smaller. Now in this example
it's not a huge difference. It's only about 600 bytes, and that's because the
amount of code in the library is really small, but when you're dealing with large
libraries and large code bases with a lot of code, it can make a big difference and
save you a lot of space. So, think of it this way. On the machine I'm currently
using, LibC takes up about 2 megabytes of space. Now, two megabytes is not that
big of a deal, but keep in mind that every program on this machine is linking
to LibC. So, if I don't use a shared library that means that every program on
my machine could be up to 2 megabytes larger, and it also means that for every
one of those programs, that could be up to 2 megabytes more that I would
have to load into memory every time I run a
program. So, that could really add up. The other advantage of using a shared
library is, let's say that we find a bug in LibC. We can patch that bug by just
installing a new version of LibC on the machine, and I don't have to patch every
program on the machine that uses LibC. So, that's a huge advantage in terms of
maintenance. But, all those advantages aside, let's say you still don't want to
go the shared route, and you really want that code from your library to be
part of the binary, so that you don't need to worry about whether the shared
library is there and installed properly. Then what you want is a static
library. As I mentioned before, a .o file can kind of be thought of as a
static library, but usually when we're
packaging up code that's going to be linked statically, the more typical
approach is to use a .a file. A .a file is made with the "ar" command (that
stands for "archive"). So, let's add one more option to our Makefile, and this is
going to compile our code into a static library. Now, I'm going to give it a
different name so we don't confuse the linker. If I didn't have the shared one
in here we could just name it "libmycode.a", but we do have the shared library
in here with the same name (different extension). So, I'm going to use a different name. And, then we can
just use the ar command to make this .a file using the following options. "r"
means replace: it's going to replace any existing files in
the archive that have the same name. "c" means create: we're going to
create the archive if it doesn't already exist. And "s" means we're gonna generate
an index that's going to be used by the linker to make sense of this library.
Why is "s" for index? I have no idea. In this example, I'm giving it one .o
file, but if I had a bunch of .o files I could just list them at the end and
then they would all be bundled up in this new static library. So, let's compile
it, and there it is—our beautiful new static library. Let's also add a rule
to our Makefile that compiles our program with the new static library. It's
basically the same as it was with our shared library. The linker just looks for
what kind of library you're using and then if it's a static library it stuffs
all that code into the final binary, and if it's a shared library then it won't.
OK. So, let's add our new static library to the list of things we want to make...
and compile...and there it is. Notice again that the static
version is bigger. The dynamic version is smaller, but the bigger static
version doesn't need the library anymore. All the code is inside of it. So, I could
just throw the library away at this point, and the static binary is still
going to work just fine. And if we run it... OK, it works.
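
For reference, here is a minimal sketch of what the library code described earlier might look like. The transcript doesn't include the actual source, so the file names (mycode.h, mycode.c), the include guard, and the NULL check are assumptions for illustration. Only the function name reverse, its in-place behavior, and the returned pointer come from the description. Both files are shown in one listing, with comments marking where each would start.

```c
/* ---- mycode.h (hypothetical file name): what users of the library include ---- */
#ifndef MYCODE_H
#define MYCODE_H

/* Reverses the bytes of the NUL-terminated string s in place and returns s. */
char *reverse(char *s);

#endif /* MYCODE_H */

/* ---- mycode.c (hypothetical file name): the library's implementation ---- */
#include <stddef.h>
#include <string.h>

char *reverse(char *s)
{
    /* Defensive check; an assumption, not something the video mentions. */
    if (s == NULL)
        return NULL;

    size_t len = strlen(s);

    /* Swap the first byte with the last, the second with the
       second-to-last, and so on until the middle is reached. */
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = s[i];
        s[i] = s[len - 1 - i];
        s[len - 1 - i] = tmp;
    }

    /* Return the same pointer that was passed in, for convenience. */
    return s;
}
```

Under those assumed names, `gcc -c mycode.c` would give the .o file, `gcc -fPIC -shared -o libmycode.so mycode.c` the shared library, and `ar rcs libmystaticcode.a mycode.o` a static one (the static library's name here is made up, since the video only says it is different from the shared one). A test program would include the header, print argv[1], call reverse(argv[1]), and print it again. Because the reversal happens in place, the original has to be printed before the call.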
And, now you know how to write static and shared libraries in C for Linux. The
process on macOS and Windows is going to be a little bit different. You're going to
have different compilers and different compiler flags, and the extensions are going to be
different. You're going to have .dll or .dylib, but the idea is the same. The
concepts are the same. Really, what you're doing here is the same. All of these
libraries are just different ways to fundamentally accomplish the same thing—
which is help you to package up code so that you can reuse it, and you can share
it. And, I hope that helps, because that's all I got for you today. Tune in next
time for my next video when I...well I don't know what it's going to be about, but
I'm sure it will change your life. So, happy coding, and I'll see you later.