Steven Rostedt - Learning the Linux Kernel with tracing

So Steven here is a kernel developer, a Linux kernel developer, and we rarely have people like him here in Bulgaria. I was really hoping that we would have him for OpenFest this year, but unfortunately he couldn't stay, so the next best thing was to invite him here to Sofia University to show you a little bit of love for the Linux kernel and try to excite you about it. We were discussing what he should talk about here, and for me this talk is the best thing he can show you to help you understand why you should join me.

All right. OK, let's make sure... yep. OK, so let's do it. Hey. So my custom, my M.O., what I always do, is I have this camera. How many of you know what this is? It's like an ancient thing. And this is the way you make real selfies. Smile! Perfect. I'll upload it.

As Marina said, I'm Steve Rostedt. I'm one of the Linux kernel developers. I first played with Linux in 1996, probably before some of you were born. In '98 I started playing around with the Linux kernel as part of my master's thesis; I actually did the Linux kernel for my master's thesis. [inaudible aside about advertising Coca-Cola and VMware] Then in 2001 I got my first job working professionally on the Linux kernel, porting the TimeSys real-time Linux kernel to various board support packages, you know, different architectures like MIPS, PowerPC, ARM, fun stuff like that. Then I became a contractor, and then I got hired by Red Hat. I'm one of the original developers of the real-time patch, which makes Linux into a real-time operating system. So that's what I've got right over here. Long story; that's not what I'm here to talk about.

I'm basically here to introduce you to the Linux kernel via tracing, or ftrace. How many people have played with the Linux kernel? OK, good, so we've got a few people. Some of this I kind of broke down for people that haven't done much. How many people have
programmed in C? Most people. OK, good.

So first I'm going to go through an overview of the computer. One thing I try to tell people when they get into the lower aspects of programming, not just web development or application development but actually getting down to the kernel, is that a computer is nothing more than a Turing machine. You all know what a Turing machine is; you're taking classes, you talk about the Turing machine. You have the infinite amount of tape and just state machines, that's all. Anything a Turing machine can do, a computer can do, and vice versa, so they're equal. And I try to tell people that a computer is extremely simple. It just takes a simple command: add, subtract, shift, compare. It will jump to a different location. It will move that tape, which is usually what we call memory, to something else.

So everyone's used to the green part on top, the application. That's where you might be programming, and you might do something like printf, which calls into a library function that's going to do some magic for you. Or file open, file read, file write: for those operations the library is going to do some work for you, so it makes it easier and you don't have to interact directly with the kernel. But you can bypass the library and talk directly to the kernel, and you can even bypass the kernel and talk directly to the hardware, but that's beyond the scope.

So what a lot of people are used to is talking to the library, but I'm not going to talk about that today; that's not this talk. If you want to use gdb, it will show you your interface into the libraries. Everyone's used debuggers, gdb. I'm more interested in talking to the kernel. And like I said, I don't want to talk about the hardware; that's beyond the scope.

So everyone's familiar with this program, I hope. It's one of the most famous programs in the world. It's my favorite; I use it all the time. It's useful
especially for talks like this. And gcc, you know, the GNU... I call it the GNU C compiler, but I guess it's not that; I can't remember the exact acronym. You compile your hello.c (I call it hello.c, not helloworld.c), and when you run it, it says "Hello world". Very obvious. How many people have done this? Oh, OK, a lot of people. How many people have not? So basically I'm talking for you guys, because I'm going to focus down into this.

This is what you see when you do an objdump. The compiler creates what's called an ELF file, Executable and Linkable Format. It's just a binary file written on disk. When you execute it (you type it in at your bash command line, or however it gets executed), it goes into the kernel. The kernel will read the file header and see that it knows how to parse ELF files, and it's going to load parts of the file into various segments of memory, and then it's going to jump to a location. In fact, there's the start location that it's going to jump to. If you run readelf, which is another application, it'll tell you where the starting point is.

But look, there's a lot of code before that start, and that's added by gcc. There's no "hello world" right there. Before I go to the next page: there's the _start function. Notice right there it calls into libc, so _start jumps in, and then it halts after that. So that's where your actual program starts: it jumps into libc, which is going to do even more magical stuff, which will call these other functions and then jump to main; it will look for where main is.

Here's some more. This is the next page; if you're in less and you hit Enter, this is what you see: a lot more junk that you don't care about. Then finally, this is the last page. It's only three pages; hello world is three pages when you do an objdump. And up here at the top, that's your program, that's main. If you zoom
in, this is what you see. These are the addresses, this is the machine code, and this is the assembly. Now, a lot of times I talk about machine code and people freak out. I actually have to know machine code sometimes for some of the things I do in the kernel. That's because I do some strange things in the kernel that most kernel developers don't do, so I'm kind of the oddball among kernel developers. I need to look at the machine code; I actually have to know things like e8 is a call. When you call a function, it uses the opcode e8. That's very important for this, and you see it right there: e8 followed by a four-byte offset. That's the machine code for this, so this is what the machine will actually read. That's in hex; I didn't have enough room on my slides to make it binary.

You notice something here: puts. We called printf and you have puts, but we didn't type puts, we typed printf. So why is puts there? Now, I didn't compile with optimization on, but gcc will automatically optimize this. It will look and say, "Hey, there's just a format string with no parameters; there's nothing to parse." It knows that at compile time: you just printf'd a string. So it says, "Why call printf? This is really a puts. All I'm doing is writing whatever is in that string, which was 'hello world', straight out." Why spend computer cycles reading and parsing through the format to find parameters to inject? That's expensive. puts doesn't do that logic; it just writes out whatever you give it. So gcc optimized.

But I want printf; I don't want puts. So I have to go back to my program, and this time I'm going to add a parameter. Now, I'm going to do something not normal. Instead of just putting in any old parameter, a variable (that's boring), I could put in an int x, assign x to something, and
then call it. Then gcc might even optimize that: if it notices that int x is a local variable that doesn't get modified, it might actually put it into the string, modify the string, and then call puts again. I don't want that. I want to do something different: I want to see main, what address this program is actually running at.

So what we had before was this puts. I recompiled it, did an objdump, and now I got this: I got my printf. So your printf is here. It moves an offset from the instruction pointer into RSI. The RSI register, which you have to remember for this, is actually the second parameter, so this must be the address of this guy. In fact, minus 0xb from the RIP from here probably jumps right to there. So this is the address RIP minus 0xb, which is 11; minus 11 will bring you back 11 bytes to main. That's main. The reason why RSI is the second argument, we'll get back to that later. RDI is the first argument, which is the string, and then it calls printf.

So let's look at main here: 1135. Remember that. What do we expect when we recompile and run our hello with main? We expect 1135, right? Makes sense; that's the address. This is what I got. On older kernels I would have gotten 1135, but on this kernel I got this crazy mess. The funniest part is I ran it multiple times and got a different output every single time. This was confusing me. (By the way, I wrote these slides yesterday.) So I had to go look: why did I get a different value every single time? You can try this yourself, as long as you have a new kernel. I ran this on the 4.19 kernel, by the way, 4.19-rc5. But one thing I noticed: these are all the same. 135 is the same every single time. So I looked into the code and I'm like, this is a
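A minimal sketch of the arithmetic behind the relative addressing described above: the e8 call with its four-byte offset, the RIP minus 0xb that lands on main, and the low digits 135 that stay the same between runs. Only 0x1135, the -0xb displacement, and the stable 135 come from the talk. The instruction addresses 0x1139 and 0x114c, the instruction lengths, the call target 0x1030, and the randomized address 0x564a3c2d1135 are assumptions picked for illustration, and the function names are hypothetical.

```python
import struct

PAGE_SIZE = 4096  # x86-64 base page size


def rel_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Resolve an x86-64 relative operand.

    The displacement is measured from the address of the *next*
    instruction, that is insn_addr + insn_len.
    """
    return insn_addr + insn_len + disp


def decode_call_rel32(code: bytes, addr: int) -> int:
    """Return the target of an `e8 <rel32>` call located at `addr`."""
    if len(code) < 5 or code[0] != 0xE8:
        raise ValueError("expected opcode e8 followed by a 4-byte offset")
    (disp,) = struct.unpack("<i", code[1:5])  # signed, little-endian
    return rel_target(addr, 5, disp)


def page_offset(addr: int) -> int:
    """Low 12 bits: the part that page-granular randomization leaves alone."""
    return addr % PAGE_SIZE


if __name__ == "__main__":
    # Assumed layout: main at 0x1135 and a 7-byte RIP-relative lea at 0x1139.
    print(hex(rel_target(0x1139, 7, -0xB)))
    # Assumed call instruction at 0x114c with a hypothetical target of 0x1030.
    print(hex(decode_call_rel32(bytes.fromhex("e8dffeffff"), 0x114C)))
    # Assumed randomized load address; only the low 12 bits are compared.
    print(hex(page_offset(0x564A3C2D1135)))
```

Under these assumptions the lea resolves back to 0x1135, and a randomized address keeps the same page offset as the address seen in objdump, which is consistent with the last three hex digits never changing.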
security feature. And I found out you can turn it off if you echo 0 into a control file in the kernel; it's a kind of option that you have. So I echo 0 into /proc/sys/kernel/randomize_va_space (VA space: virtual address space). By the way, don't do this; this is a security feature. Basically, every time you load anything into the kernel, like execute something, it's going to randomly place it in the virtual address space. ELF files are relocatable; they're made so they can be executed anywhere. That's why you had the offset from the instruction pointer and not just a hard-coded address for those parameters, the string and the values. So when I ran it after that, it still printed some strange number, but every time I ran it, it was the exact same thing.

I'm sure you've seen these things. This is what page tables look like inside the kernel. If you ever do anything inside the kernel, you have to be very much aware of this. This is how virtual address space gets mapped to physical address space. Physical address space, that's right here. I didn't have much room on the slide, so I made this 32-bit, although this is 64-bit; just pretend it's 64-bit, it just made it easier to put on the slide. This is the whole address space. Now, don't think it's memory. It's not, because address space also includes access to devices or something else. It's basically a space where you can tell the CPU, "Go look at something at a location," and something's mapped there. At boot-up the BIOS will actually map things; it'll map memory to certain locations. So address zero may not even be memory. That's why, with a null pointer, zero a lot of times isn't mapped to anything, so if you write there it will actually fail and cause a fault.

So the way we do this: this is a virtual address, and if I split this
up, which I did... This is 564, so the 564 I put down here, and I broke it up by fours, but I split it in colors because these are 9 bits, so I had to break it down into binary. I translated the hex into binary: these are 9 bits, 9 bits, 9 bits, 9 bits, followed by 12 bits. So this is an index; 9 bits is 512. The first one is your global page directory. You don't need to care about this, by the way. I'm just telling you things so you're aware of them. This is not a lecture; there's no test, there's no exam. I don't expect you to remember any of this or learn it. It's just to get you interested; if you want to know more, all you have to do is go search the web. I'm just showing an overview.

So this is the page descriptor table. This is the start of your virtual address space for your application, which on x86 is held in the CR3 register. There's a special register on the CPU called CR3. You load something in there, and the CPU goes and looks for it; it's going to be a physical address. So what you put in the CR3 register is actually mapped somewhere here. It's going to jump there and expect this format, and this is going to be a table.

If I write to a virtual address, it's going to take the very first piece (down below is the hex of the first nine bits, which is AC), so the index AC goes down here. Then it reads this physical address, which is somewhere here, jumps to it, and expects another table, I guess the page upper descriptor. Then it does the same thing: zero and seven jumps down here, looks up the page middle descriptor, comes down here, jumps up to your page table entries, and then the last one jumps to the actual physical space where this finally maps, right where you are. So that's just an overview of physical addresses. You don't
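The 9/9/9/9/12 split described above can be sketched in a few lines of Python. The talk only gives the leading digits 564 and the first index AC; the full address 0x564a3c2d1135 below is a made-up example chosen to start with 564, and the field names (pgd, pud, pmd, pte, following the usual Linux abbreviations for the four levels) and `split_va` are illustrative, not taken from the slides.

```python
from typing import NamedTuple

INDEX_BITS = 9    # 2**9 = 512 entries per table
OFFSET_BITS = 12  # 2**12 = 4096-byte pages
INDEX_MASK = (1 << INDEX_BITS) - 1


class VirtualAddress(NamedTuple):
    pgd: int     # index into the top-level table (the one CR3 points at)
    pud: int     # index into the second-level table
    pmd: int     # index into the third-level table
    pte: int     # index into the page table itself
    offset: int  # byte offset inside the 4 KiB page


def split_va(va: int) -> VirtualAddress:
    """Split a 48-bit x86-64 virtual address into four indexes and an offset.

    Bits 48-63 are sign extension and are ignored in this sketch.
    """
    offset = va & ((1 << OFFSET_BITS) - 1)
    indexes = [
        (va >> (OFFSET_BITS + INDEX_BITS * level)) & INDEX_MASK
        for level in (3, 2, 1, 0)  # top-level table first
    ]
    return VirtualAddress(*indexes, offset)


if __name__ == "__main__":
    parts = split_va(0x564A3C2D1135)  # hypothetical address starting with 564
    print([hex(p) for p in parts])
```

For any address whose top twelve bits are 0x564 and whose next bit is 0, the first index comes out as 0xAC, which matches the value read off the slide.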
really have to worry about this, but just know that what you write in virtual address space is not always going to map to the same spot in physical address space. In fact, it may not map to anything, and when you write to it, it may send a fault to the kernel. The kernel will say, "Oh, this memory is actually in your swap partition," and then it will read the swap partition, load it into real memory, fill in the page table entry, and go back, and you just happily move on. That's how swap works, if you ever wondered about your swap partitions. It's just mapped out: all you do is remove entries here, or put a flag there saying this doesn't exist, and it's going to fault, and the kernel is going to say, "OK," [inaudible], and it can put it someplace else. So you can move these things around anywhere you want, as long as it's mapped correctly here.

If you have two applications, the same virtual address could be mapped to two different locations. So with application A and application B, in application A this points here, and in application B that same address points here. Two different locations, same virtual address.

So in address space, usually (again, I'm still doing 32-bit because it's easier; I don't want to spread it out) you have zero to some number as your user space, and anything higher than that will be your kernel space. So your application, your main, is going to be mapped in user space, but the question is what's in the kernel, and that's what we want to find out.

User space and kernel space: there's a special flag set. In the x86 world there's ring 0, ring 1, ring 2, ring 3. I guess 1 and 2 are not used; 3 is user space, 0 is kernel space. It's a mode, basically just a bit in the CPU, and when you're not in ring 0 you don't have access to anything in the page tables that say this is ring
0 only. So if you try to access anything in this ring 0 kernel area, it will kill your process. Although with Spectre and Meltdown you can get around that, but that's another lecture.

Anyway, in user space you have your applications. And yes, I know I put Word and Outlook; it was the first thing to come up on the Google search for images and I didn't have much time. I wrote this yesterday. Each one has its own virtual address space mapped by page tables, and they don't have access to the kernel. Then say they share a library like libc, a dynamic shared object library. That one file is mapped somewhere (it doesn't have to be the same place) in the virtual address space of each of these applications.

When these applications need to do something special, like read a file, do networking, read or send a packet, show something on the screen, take interrupts, or read your keyboard, they need to talk to the kernel, because all of that is done by the kernel. The kernel is the service provider for anything the applications want, and they do that through system calls. Everyone here should hopefully be familiar with system calls. System calls are the way to access the kernel to get something accomplished that you normally can't accomplish yourself, because you have to access something else: you open a file, read, network, files. It's an application programming interface, an API, so it doesn't change.

How many people have seen or heard about Linus Torvalds yelling at someone? He's actually a really nice guy now. But one of the things he'd yell about was when people break applications. You don't break user space. If you modify one of these system calls, you will break user space. Although the whole thing is: if you break user space and no one notices, did you break it? And the answer is no. Linus actually
said that: as long as no one complains, you can change user space. So some system calls have changed, no one noticed, and nothing ever broke.

So how many are familiar with strace? OK. I should probably say: how many people are not familiar with strace? You know, you're too lazy to raise your hand, or you don't want to be embarrassed, something like that. Anyway, strace is the system call tracer. It's a very, very useful tool. It's very slow, because it uses ptrace. ptrace is a facility in the kernel for hooking into other applications: you become its parent and you can monitor it, stop it, and step through it. gdb, your debugger, uses ptrace to stop an application. Everything you do in gdb, where you can read memory, change memory, have something move step by step, or run to a certain point and stop, is all handled by ptrace. It's a horrible interface; don't ever use it. We're trying to find other ways, to remove ptrace. In fact, we're trying to work with perf: if you go to Linux Plumbers in a couple of weeks, we're going to have a session on trying to get strace to use the perf infrastructure so that it doesn't need ptrace and can be much, much faster.

Although they said the percentage by which strace slows things down, compared to running without strace, has actually gotten much smaller. How applications run normally and how they run under strace is now much closer, and strace never changed. What happened was that the fixes for Meltdown and Spectre slowed down the interface to the kernel so much, but that didn't affect strace. So it's like, "Hey, we went from being 30% slower to only 8% slower," not realizing that the whole thing just got a lot slower.

So I ran strace on this. This is actually a full screen; this is every single system call called by hello world. Hello world does a lot, doesn't it? There's
a lot of things going on when you run hello world. At the beginning, here's the exec. brk is where it creates the memory address space for you, and then it's going to check the dynamic linker to load things. This is a library, and these things don't even exist, so it gives you a "nope". Oh, we found something; let's check the status of it, memory map it, OK, close it. access, more dynamic linker stuff. Oh, that doesn't exist. Do some other things, memory map a whole bunch of stuff over here, close it. arch_prctl: basically I have no idea what that does. I looked at the code; it's a little thing, it's just some info. I'm like, what the heck is this? It's useless. Anyway, then it protects the memory for security purposes; it does mprotect, fstat, brk, and boom, there's our hello world. We finally got to something where we actually write "hello world".

You know, if I wrote hello world in assembly, it's probably five lines. If I just wrote it with one system call, I don't need all that. Well, actually you'd still get some of it; the brk would still happen, because I think that's one of the things that happens just by loading it. But I could probably have done it much quicker. I didn't have time to try it.

So that's strace. What about ftrace? This is the official tracer of the Linux kernel. It's what I developed and I maintain it. This is kind of what I do; I'm constantly making it better. The way to get to it... it's probably on any laptop or any box you have that runs Linux. I can almost guarantee it. It's been there since 2008, 2009; 2009, I think, is when it got into most distributions, so it's been almost ten years, nine years. Everyone has it. It's in /sys/kernel/debug/tracing. Actually, I moved it to /sys/kernel/tracing, but since I can't break user space, when you mount debugfs it will automatically mount tracefs right on top. And if you want to access this, if you
ever get into embedded programming and you want to use BusyBox... does anyone know what BusyBox is? It's basically a very, very minimal user space environment. It's very small, about as small as hello world, as you can see. You don't have much: there are no libraries, everything's linked as one big blob of memory, everything you do, ls and all that, is just one thing. You can use ftrace with only BusyBox. That's why I wrote it that way, because I was an embedded developer way back when and I've always had a soft heart for embedded development. So I kept it so you can do everything using just echo and cat.

So, the ftrace control directory. To get there, if you need to mount it, it's this command. I don't expect you to remember it, and I'll send out the slides so they're available. You just do: mount -t tracefs nodev /sys/kernel/tracing. The /sys/kernel/tracing directory is a pseudo directory that the kernel creates for you; if ftrace is enabled, you'll see that directory, so you can just look there to see if ftrace is enabled. That's in newer kernels, which create the tracing directory; otherwise you have to look under debugfs. Then, once I mount it, I just do ls, and here are all the things you can do for tracing. I highlighted some of them.

Now, what's really special: Greg... I hope you guys know who Greg Kroah-Hartman is; he's the maintainer of stable. He was so impressed by ftrace because we're the only ones that created a file system with a README. Thank you, thank you. The funniest thing about the file (I removed this, because it was kind of a joke but it takes up memory; it was a great joke) was that the first thing in the README was how to mount the tracefs directory, which is kind of pointless, because if you're reading it you've already mounted it.

So if you go into that directory and just cat trace, you see
a header and nothing. That's boring. We want to see some action. So what do we do? We echo function into current_tracer. This will enable tracing on pretty much all functions in your kernel, and this will go on forever.

There are two files: trace and trace_pipe. When you read trace, that's an iterator; it's a non-consuming read. So if you pause tracing, you can read the trace file over and over again and it's always going to be the same. But to do that, when you read the trace file it actually pauses tracing while you're reading it, so you can do it over and over again. If you want a producer/consumer, where it's not pausing tracing and tracing never stops, there's trace_pipe in the directory. You read that, it's a consuming read, and it doesn't stop tracing. So if you just did echo function and cat trace_pipe, it will go forever. It will never stop, because the cat itself reading it causes events to go into the trace buffer, and it just reads them back, feeding itself. To disable tracing, you just echo nop into current_tracer. Done.

So what we have now is trace-cmd, and there are some people back there helping me develop this. trace-cmd is a command-line utility so you don't need to do all that work yourself. Because yeah, BusyBox, great; I still make sure everything I do is functional with BusyBox echo and all that. But for more normal users, like myself too, I don't actually want to go in and type things, and I could make scripts, but that's not useful. So I wrote trace-cmd. By the way, I hate the name. There's a historical reason behind it that I'm not going to explain here, and I was going to change it. I even had a poll on Google+ (if you remember what that is; it still exists, I think) about what I should rename it to, and
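The echo-and-cat workflow described above is only file reads and writes, so it can be sketched from any language. The Python below is an illustration under assumptions: the helper names `set_tracer` and `read_trace` are made up for this example, the default path is the /sys/kernel/tracing mount point named in the talk, and writing there requires root on a kernel with ftrace enabled.

```python
from pathlib import Path

# Mount point named in the talk; older setups expose the same files
# under the debugfs tracing directory instead.
TRACEFS = Path("/sys/kernel/tracing")


def set_tracer(name: str, tracing_dir: Path = TRACEFS) -> None:
    """Equivalent of `echo <name> > current_tracer`."""
    (Path(tracing_dir) / "current_tracer").write_text(name + "\n")


def read_trace(tracing_dir: Path = TRACEFS) -> str:
    """Equivalent of `cat trace`: a non-consuming read of the buffer."""
    return (Path(tracing_dir) / "trace").read_text()
```

A session following the talk would call `set_tracer("function")`, then `read_trace()`, then `set_tracer("nop")` to stop. The trace_pipe file is deliberately left out of this sketch because reading it blocks and never reaches end of file while tracing is running.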
the number one thing people picked was "keep it the same". "My scripts are using it." Can't break user space. And I was like, but I hate the name.

So you have to be root to do anything useful, because tracing is really... The funny part is my friend Kees Cook, who is like the head of security for Linux. I always laugh that he and I are friends, because he's always trying to secure the Linux kernel and I'm always trying to crack it open. I want to see what's in there; I want to modify it. I'm dynamically modifying the kernel. Function tracing: think about how heavy that is. You know how function tracing works? It puts nops at the start of every function, which is very quick: it just reads the nop and goes on. But when I enable function tracing, I dynamically modify the code so it calls a tracer.

In fact, live kernel patching, have you ever heard of that? It's where you can patch your running kernel without ever shutting it down. That actually uses the ftrace infrastructure because of the dynamic modification. Once you put in that jump to a trampoline at the entry of a function, you can hijack that function, because when you go to the trampoline, nothing says you have to go back to the function that was called. You can go back to a different function. That's how live kernel patching works: it takes a function that needs a fix, puts a fixed function into memory, and when you jump to the old function, it calls the ftrace infrastructure, which then returns to the new function, the fixed one, and the old function just sits there in memory taking up space.

Well, anyway, trace-cmd you can download from up there, like I said, and it's simple: you do make, make doc. By the way, I try to keep all the man pages up to date; they're pretty close. So instead of using the Busy
Box way, where you mount your file system, you cd to the tracing directory, you echo function into current_tracer, and you cat trace, you do this from your normal directory: trace-cmd start -p function. The -p... OK, it's a misnomer. It means "plugin", but it's not a plugin; it's really a tracer. It's still -p. So -p function, then trace-cmd show, and boom, you see the exact same thing. If your tracefs directory is not mounted, it will mount it for you; you don't need to worry about that. It takes all the work away from you. So I'm recommending people use trace-cmd, and I'm trying to do fewer talks about accessing the ftrace control directory directly (it's a tongue twister). I'd rather use trace-cmd because it makes things a lot easier.

So let's look at the syscalls from ftrace's point of view. Instead of doing strace, I'm going to run trace-cmd record, which records into a data file. That's another thing that's great about trace-cmd: you don't have to worry about the trace; you actually stream the data right into a file that you can analyze later, and it can be as big as you want it to be. You can fill up all your disk space if you want; it's good at crashing machines. Then you do trace-cmd report, and you see this. These are all the things it does: all the sys_enters and sys_exits. I cut this so I could get to the bottom to show you: here's the sys_write. This is the buffer; this is the count (I don't know, 16 plus 13), and that's the hello world address. So this is the buffer with "hello world".

So back to this whole idea of system calls. We have strace up here, which does everything from user space with help from the kernel. But ftrace doesn't need user space; it's in the kernel and views things from there, so it's much more powerful. The whole point of the talk is: I want to learn the kernel. I want to know what write does. OK, we know printf did a syscall to
write. I want to follow that; let's see what write actually does. So I trace our hello program with the function graph tracer, which is kind of cool, but here I use --max-graph-depth 1. The function graph tracer will show you all the functions that are called, in a C-like format, all the way down and all the way back up. It's pretty cool; I didn't take a snapshot of it. But if you give it a max depth of one, it only traces the first function that goes into the kernel and comes back. That's it; it doesn't trace anything else. That's all I'm interested in right now: I want to see what this hello world does and how it gets into the kernel.

And it shows you something that strace doesn't: page faults. You didn't see those in strace. Remember the page tables from before. When you execute a program, it doesn't load that ELF file into memory. It just sets up the page tables to say, "This memory location here represents this part of the ELF file," and it doesn't actually read the ELF file. It's much quicker. [inaudible] Think of when you open up Chrome: how big is Chrome? Does it have to load all of that into memory in one shot? No, that would take forever. Instead it only loads on demand (unless you do mlockall, but don't do that).

So here you actually see it in action. It did some system calls here, and then it took a page fault: it trapped into the kernel, not by a system call, but because it tried to read or execute memory that wasn't there. It took a page fault, and the kernel said, "Oh, this memory area belongs to this page," loaded the page of memory in, and went back. Then it goes on to the next page fault. So it slowly, lazily faults pages in on demand. That's what's happening right there. And then it's doing
syscalls. I see, okay, here's my sys_enter, access. Let me keep punching. I jump down; this does a lot more functions because I had a lot of page faults. And finally I get to: here's my write, here's my sys_write, here's my sys_enter. And I get, oh, there's a do_page_fault before, and then I get do_syscall_64. Well, I want to know more about what functions are being called, but do_syscall_64 isn't very useful, because I want to know only about the write function; that's all I care about. But all syscalls seem to go through this function called do_syscall_64, so that's actually the first thing that gets called. Then it probably reads, I think it's eax, one of the registers has the information of what system call you're doing, because a system call is usually made with the value of the system call, which is just an index into a table. Then it calls do_syscall, then jumps; there's a jump table. So I need to find out more. Am I talking too fast? No? So what I did instead was max depth two: I want to see not just the function I called, but the functions that it calls. So you can actually see do_page_fault does a trylock, then find_vma. find_vma is basically looking up where this memory is located; it actually finds your memory mapping, because it's got to load something. Right there you can kind of see the things: handle_mm_fault. So it finds the VMA, then it calls handle_mm_fault, which is going to pull the memory in for you to where it has to go, and then it does the unlocks and comes back. So this is a little thing where you can see a little more of what it does without overwhelming you with all the data. And here, oh look, it called something unique. I want that: that __x64_sys_write. So let's look at that function, and only at that function. So I ran it, and now this is a function graph
tracer, the tracer in its true glory. I put in the function graph tracer with -g, which means graph this function only and ignore everything else, because there'll be too much data and I can't read it. So I said I only want this function, and it graphed it. You can see all the functions that are called. And one of the first things: you know, it's writing into the kernel, and the kernel is very, very paranoid. It does not trust user space. So any time you go and write into the kernel, it's got to verify: are you okay to write? But honestly I don't care about that; I want to see where the printf goes. So here's all the verify functions. I want to ignore this; I don't care about the verify area. So I put in -n rw_verify_area, which tells it to ignore that, and actually the whole graph under it disappears. It doesn't trace into it, so that function actually disappeared; you don't see it. So now I get to look in here and, oh look, I got an interrupt. So I put in an option, don't worry about this. Actually it would've been easier just to say, for the SMP interrupt function, don't trace that, but I put in an option, and this sometimes works, sometimes it doesn't. It gets most things out of the way, but every so often an interrupt still shows up. It's buggy; I have to fix that code. And I ran through it. [inaudible] Okay, so vfs_write. This guy calls this thing called vfs_write. VFS stands for virtual file system, which basically is a handler for anything about file systems; whether it's, you know, ext3, Btrfs, or a virtual file system, they all go through the VFS layer. So the VFS layer will do all this stuff; it does some paranoia checks. Then this is the tty_write, since the console is part of the tty. Don't worry about it: tty is teletype, whatever it's called, old ancient things. We've got to rewrite it, we've got to get rid of it, but anyway it's still there. So you get to see what that does; it does all these things, and
this is n_tty_write. That's an interesting function, by the way. A little trivia: I think in 2003, my very, very first patch I ever got into the Linux kernel was for the n_tty_write function. I had to go look it up again. So I go to the next page, and it goes down further and does this pty_write, and goes and inserts it into some flip string, does some flip buffer thing, and then queues some work, and then calls some queue-work-on something. So basically what I'm assuming is it's putting data somewhere and calling something else to process it. It goes further down, and then there's the ttwu; this is a try_to_wake_up, so it's actually waking something up. Whatever it's waking up, I don't know. And then it goes on further and, okay, wait, oh, here's do_output_char. So that must be important. I don't know what it does, but it must be outputting some character. And then it flips the flip buffer some more and pushes, does some more insert work, adds some more stuff, and now it looks like I'm coming out of things. There, that's it. That's all of the write system call. You thought it was easy? That was a lot. So what did we learn? First of all, we learned the system call write is way too big. So it writes [inaudible] and puts it in some buffer, and it wakes something up to do something with that buffer. But we still don't know what it did with our buffer. Okay, I wrote to the buffer, so the write system call went somewhere. What do we do? We need to dig deeper. So let's say I want to start seeing what else is there. Oh, by the way, let me just go back; one thing I didn't tell you: that -F, capital F, that's there, means follow this program and don't trace anything else. So it actually filters everything else out and only follows that one program. So now when I go back up here, I removed it: I want to see all programs, but I'm only tracing certain events anyway. I trace when something gets woken up and when something gets scheduled out. So it's
CPU, you know, it's the scheduler. A CPU can only run one task at a time, even though you see your Chrome and your clock and a lot of those things look like they're running at the same time. On multiprocessors, yes, they are, but not always. On a single processor it may still look like they're all running at once, but it's just basically doing a little bit, then a little bit, then a little bit, and in your mind it's going so quick it looks simultaneous, when it's really just serialized very quickly. So I want the scheduler, so I can see how things actually switch. This is a lot of data. Everyone understands everything that's in that slide, right? Good. TMI: it's too much info. So when you do trace-cmd record you get a lot of information, and that was just on, what, three events that I traced? Yeah, three events: sched wakeup, sched switch... and these are not even the most common events. I mean, they're common, but there are other events that are even busier, so to speak. So it's too much information; we need a way to visualize it. It takes an expert to dig into something that complex, and we don't want to be experts, we just want to learn. So, the introduction to KernelShark, which has got a new facelift thanks to Yordan over there, for rewriting it in Qt. I started it a long time ago and I never had time to work on it. Thanks to VMware, we got funding to hire someone full-time to rewrite it from scratch, and it's going to be doing a lot more in the future. But right now the first thing we have to do is get KernelShark 1.0 out. We're still at 0.9 because we're still tweaking and fixing bugs, but once we get that out, it's going to be equal to and a bit better than the original, and then we're going to do a lot more after that. So instead of this horrible lot of text that makes me go blind, we have a little bit more visualization. You still have your text, the switch here, and here's the information. You see four CPUs on this box, and over
here I put the mouse, which shows you that this is actually the hello world. You can see it. So then I went through, and I'm like, okay, I'm interested in only certain tasks. The sched_switch showed me that these were the only things that actually ran during hello world. So something else did some work for me, and whatever did that work for me must have run. So now I can look at it and do a task filter, the selected tasks; it'll show you all the tasks. And I'm like, interesting. Well, trace-cmd, obviously that's running; we don't worry about it, so we ignore those. rcu_preempt: you might not know what that is, you might have selected it and looked at it. I know better; I know what rcu_preempt does, so I'm like, nope, I can ignore that. Idle task: idle does nothing. Oh, well, look, we have our terminal. By the way, I use Xfce, if you haven't noticed, so it's my terminal. The terminal is important because most likely that's what displays the printf. And then we have two worker threads, and those are kernel threads. They're not user space threads; the kernel actually created them, so you can't get rid of them, you can't kill them. So now you'll notice that it shows the CPUs. Horrible resolution, but squint: that's the terminal, these are the two worker threads here, and here's where all the work happened. But there's not much information there. And by the way, at first I thought this was a bug, and in fact I called Yordan up saying there's a bug here. Then I was like, wait a minute. This here on the left, you can't really see it, but that's the actual write, that's the write system call that the hello world did. But that's not the first thing it did. We saw all the code before that; even strace showed you it did a lot of work before it did that first write. So why did it start here? And
then I just realized: oh, because we only traced sched wakeup, sched switch, and write. And the way trace-cmd works is it does a fork and exec, but after it does the fork and before it does the exec, it enables tracing, then does the exec. So there never was a sched switch: it enabled tracing and started executing, and the first event to show up from hello world was that sys_write. So according to the data we got, that's where it started. But since we have visualization, screw it, let's record everything. So I just said, give me everything, run hello. Oh, I got a lot more information there. It could be overwhelming. And actually, you don't see this, but this was a debug kernel I ran this on, and I had the irq-disabling and preemption-disabling events enabled, and basically all you saw was irq and preempt disable; you didn't see anything else. I'm like, that's all I had? So I turned off those events and ran everything. So this is what KernelShark looks like with that data. Now, you see this here is hello. I've marked where the write was, and you see it does a lot of other things. By the way, the preempt on and offs are enabled here. So I want to zoom in: I take my mouse, slide it over, boom, it shows you this. So here's where the write is, and when you move over here, this guy, my hello world, woke up this kernel thread. So that's the wakeup, and this is where it actually started executing. I can see here there's a transfer of information: this guy woke up and executed. But notice, I didn't look at the trace, but just looking at KernelShark I noticed one more thing: this guy woke up something else, and it woke up, you can't really tell, the terminal. So somewhere in between there's a handoff: my hello world passed something to a worker thread, which passed something to the terminal. What is this terminal doing? So I went back and
enabled all system calls and ran the hello world, only looking at what the terminal was doing. Believe it or not, this is all it did: it did a few reads. The first thing it did actually was exit poll, because obviously it's in a poll loop. If you know select and poll, it just sleeps on a file descriptor, and when some information comes in, it wakes up and runs. So it's in a poll, it wakes out of there, and it wrote something. By the way, I've looked at that before and it's just garbage; I don't know what it wrote. But then it read something. Hmm, interesting, it read a lot. [inaudible] Then it read some more, wrote some more, then it did a recvmsg, and then went back into poll. That's interesting. Well, let's run the function graph tracer on sys_read, because it read something and I want to see what it read. So it goes in, there's all this normal stuff, it does more, and wait, I said, ooh, look: copy_from_read_buf. That looks like an interesting function. Let me go dig a little deeper into that. So then I looked at the code. This is the Linux kernel source code, at 4.19-rc5; I didn't download the latest and greatest, which is 4.19, already released. We're in a new merge window, so 4.20, or maybe he'll call it 5.0, because Linus can't count higher than his fingers and his toes. So I saw this function, I was looking at the code, and this is actually the full function, and it fits on a slide. I love functions that fit on a slide. I didn't delete anything from it; the only thing I cut was a huge comment above it. I like huge comments; they usually let you know what the function is doing. And it's reading something from user space, because you'll notice it has that little unsigned char __user. So it's reading something from user space, or it's reading into user space. Actually this is copy_from_read_buf, but it puts it into user space. And notice down here this thing, read_buf_addr, some crazy thing with some tail. So there's some, look at the head, [inaudible]. I'm
like, hey, this is some sort of buffer. Maybe this is where my printf went to, and it's reading from it. Is it? Well, I want to know what this character string is. So I look at this function, and the second parameter is the thing I want to see. Okay, now let's get into something a little more magical: let's create our own event. Remember in the beginning of the talk I said the RSI register is the second parameter? Remember that, si. This is a kprobe. A kprobe is a dynamic probe. What does it do? It modifies the kernel code: you can actually put a trace point anywhere in the kernel, well, almost anywhere, and then read information from it. I put it at the start of that tty_audit_add_data function, and I said, okay, buf equals... this is where you look at that register and convert it into a string, and that's nasty code to write. We're trying to make this look better; that's one of the things I'm working on. I gave a talk in Edinburgh last week about how ugly this is. So this is the second register, as a string, and I put it into /sys/kernel/tracing/kprobe_events, and it created the event. Then I ran trace-cmd record -e data hello and trace-cmd report. This actually was a lot bigger, a lot fuller of crap, but look what I have here: hello world [inaudible]. It worked. The problem is this was a buffer and I said to call it a string, and this is why I said don't do this at home. Because if you go back to what this is, this guy is actually a void pointer, that 'from', and it's actually passing n, which is how much to read. So it's a void pointer plus how much to read; this string has no null character at the end. So when I told the kprobe to read everything, it just read a bunch of memory until it found a null character. So this actually was much bigger, but it made the slide so ugly that I got rid of it. But there, I found the full path from where my hello world sent something to where it was read. You'll never look at hello world the same again. Thank you. Questions? Oh, we throw, and by
the way, okay, to encourage questions, we'll give you one of these. No questions? That must mean my talk was so good that no one... Oh, I got one. Sorry, I will repeat it for them. 'If I want to start with system programming, where can I start?' With system programming, like programming for the kernel? If you want to program for the kernel, where's a good starting point? That's a good question, because it's been so long since I did it. My first introduction was actually in school. It was for the 2.0 kernel, so that tells you how old I am. I had to replace the TCP/IP stack, going from a protocol that does send-and-acknowledge to a credit, negative-acknowledge protocol, where we rip out all the things, because TCP/IP is very much a bloated protocol. It's made to run over an unreliable network, so it's waiting for ACKs, and it's got windows changing and all this to make sure things move nicely. But when you have this box connected to that box going through a single switch, packets will never come out of order, there's no rerouting, it's just here to here, go fast. And my job in school was simply to write the negative-acknowledge protocol and remove everything else. The credit, negative-acknowledge scheme is just: I say I want to send you data, and this guy will say, okay, I've got a megabyte of buffer, just start sending. So I send everything, and when I'm done I say I'm done, and this guy just receives everything. If it misses a packet, it just sends a negative acknowledgment that says, I missed something, give me this again, so I resend it. That's basically all the protocol is. I sped up FTP by like 40 percent, almost doubled the speed when transferring large data. So that's how I got involved with it. Once I started doing this I fell in love; I said I had to find something to do with this, and I looked for jobs doing this. So one way is,
basically, notice what I just did. This whole talk was just about how something as simple as hello world is so complex. Think about that. So what I tell people when they want to learn the kernel and they want to become a kernel developer: find something that simple and see what the kernel does with it. Follow it, see if there's anything else that's interesting, and see if maybe you could do something better. You have to find your own itch to scratch. That's the problem: people come to me like, do you have a task for me to do? And I tell them, start playing with my code, or whatever, start playing with the code and find something that you don't like. The guy who wrote the function graph tracer, Frederic Weisbecker, started when he was in his early 20s, 21, 22, and he didn't like his major. Now, a lot of people are like that. You know, Yordan was a physicist; he did a lot of work at CERN, programming, and said he likes programming better than physics. I don't blame him. And so he switched over. Frederic Weisbecker was in theater and acting; that was his major. He said, oh, I want to be a programmer, started looking at the Linux kernel, reading it, figured it out, started writing code. It's really up to you. So do the little things. ftrace is great because ftrace is a way you can see. I wish I had ftrace when I was learning the kernel. I actually had to use grep. Seriously: you hit a function pointer, grep, and then you find a function, and then you hit another function pointer. So basically you're following the line, you get to a thing, and it says, okay, this guy's calling this function, which is off of some data structure that has a bunch of function pointers, and it's one of the functions in that data structure. And I search for that data structure, and there's 70 or 80 of those data structures all over, with different functions. And I'm like, oh, okay, so I would put a
printk, which is like the printf inside the kernel, to print what that guy called. Run my program again, see a printk come out saying, oh, that's the function. Now go and trace it; two steps later, another function pointer off of another structure, with like a hundred different things it might be. I spent weeks trying to follow the flow, where today you just turn on the function tracer and see which functions are being called. [inaudible] 'because I tried to debug the tool...' Okay, the tool's broken? No, I write perfect code. I'm sorry, it's not perfect. '...and what I saw is that it actually interacts with this tracefs file system. So my question is: is it possible to include some Linux kernel header in some user space program, to be able to receive all of that data directly?' So basically taking a kernel header and including it into... I mean, I guess that's what bpftrace kind of does: it takes a kernel header and includes it. Now the question is, okay, you could do that. Got a little feedback on the other mic. Okay. It takes the header file from the kernel, and you compile it into the other one; you get the same header files. Now what if you run on a different kernel, where the header file is different? Then it doesn't work. What I would like instead, and what we're working on... by the way, KernelShark and ftrace are going to be wrappers around libraries, so all the functionality that we have is going to be in an LGPL library, and you'll be able to attach that to any application you want. So once it gets shipped out to the various distributions, if you write a program and you want to do tracing, you'll have to find a way to become root, but we're having ways to help you switch to root, though you have to type the password, and then execute the trace, record the data in a file, and analyze it in
any tool you want. It even works with, like, Python tools, so Python will do this. But that doesn't answer your question about the structures and the headers. To find out the data inside of the kernel, what we want someone to work on is getting a DWARF parser. So all you need is access to a kernel built with DWARF information, and then you have a DWARF parser. That ELF file that I talked about earlier has the debug DWARF; you know, ELF, DWARF, people are obviously Lord of the Rings fans. DWARF is a thing inside the ELF file that tells you where all the variables are, the structure layouts, and everything else. So if you have a parser you could say, okay, give me that function, give me that second parameter, and DWARF will tell you where those are, how to look at them, what registers they're in, and then you could add the trace points and do all that stuff dynamically. We're working on that. That's our goal. Actually my goal in KernelShark, and Yordan probably doesn't know this yet, he's finding out about it now, one of my goals is to bring up a file of the kernel. So say you go into the kernel directory of the currently running kernel, pop up a file, and say, click on this, I want this variable recorded. It will go and, when you say start and record, create a trace point using a kprobe on the code, get the variable by reading DWARF to find out where it is, and when you run your code you actually see that variable pop up. So you don't have to figure out where the variable is; you just look at the code and kind of point and click. That's where I want to get to. So, okay, anything else? Oops. [Music] That's not a tracing question. Yes, I know. Okay, so the code of conduct. I was in the maintainer summit. Now, how many people have heard of the code of conduct and Linus, this whole thing? I'm on the Linux Foundation Technical Advisory Board. Maybe we should stop the
recording? No, that's fine, you can record it; basically this is public knowledge. What was said was basically: a lot of people are upset, or, you know, afraid of everything, and right now we're going to say, let's see how things go, and let's not panic and do things over these hypothetical problems. Let's see what actually happens. If there is a problem, we're going to address the issues when the problem happens. Because really, if you go back two years, seriously, it took going about two years back for a lot of the people that criticize Linus and all that to find something that would break the code of conduct. Two years. So that means for the last two years we've been following the code of conduct; we just didn't really [inaudible] people ever realize that. And the code of conduct is basically to show everyone we actually have changed. We're not much different than we were a year ago; I mean, we are actually better, but for the last two years the Linux kernel development community has been rather tame. We are very much more professional. I think we just got older, we have kids, you know, we've learned how to deal with things. That's my own personal opinion. 'Can we expect [inaudible] code of conduct and developers, I mean, the Linux kernel before and the new code of conduct?' Not this one. Like I said, what we're doing is to see where the issues are. 'Excuse me, we noticed that the commits in GitHub are smaller these days, and we heard that committers will take their work back and will set up a new project.' That's new to me. Rumors only, like most rumors, you know. I don't see that changing. I haven't heard that; that's the first I've heard of it. I don't know anyone that's pulling out and doing things. I mean, we have an interpretation document that very much explains the way we're going, and it's one of those things where
nobody likes it, which means it must be written right. And it really comes down to: the real problem is a perception problem. Like I said, the Linux kernel community has been really, really good, and we're not changing, but there's a perception that we need to change. The code of conduct helps change the perception; it's telling people we've been like this for two years. And Linus may not be swearing as much, but he'll still turn things down. He's already said, no, you can't do it this way. And yeah, he won't be as colorful, he won't be as quotable, and that's a good thing. The one thing I would like everyone to know is the one problem with the Linux kernel community that no other open-source project really has: we have a celebrity. Linus Torvalds was even invited to the Oscars. Okay, so he is a celebrity. People watch everything he says. Whenever he sends out an email, there's at least a hundred reporters reading every single email, and we are in a glass bowl. No other project has a hundred or so reporters reading the project leader's email. So when he goes off on something, it gets into the headlines, and that makes us look bad. And the problem is we are trying to be inclusive. We want people from China, India, the Asian countries, lots of countries, and we can do a lot without having the big, you know, 'retroactively aborted' comments. It's clever. So maybe if you see Linus or something, just, you know, tell him one time, hey, come on, insult me, please. Thank you. Yeah, anything else? Okay, well, thank you. [inaudible] first come, first served. You guys want any dice? There's a mistake on it; you have to figure out what the mistake is. I'll give you one.
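Going back to the kprobe demo in the talk: the traced buffer was passed as a pointer plus a length, but the probe fetched it as a NUL-terminated string, so it kept reading past the end of "hello world". The sketch below is only an illustration of that over-read under stated assumptions. It models memory as a Python `bytes` object rather than real kernel memory, and `read_cstring` and `read_counted` are hypothetical helper names invented for this example, not part of ftrace, kprobes, or trace-cmd.

```python
# Illustrative model only: "memory" is a bytes object, not real kernel memory.
# read_cstring and read_counted are hypothetical helpers for this sketch.

def read_cstring(memory: bytes, offset: int) -> bytes:
    """Read from offset until the first NUL byte, like a string-typed fetch."""
    end = memory.find(b"\x00", offset)
    if end == -1:  # no terminator anywhere: read to the end of memory
        end = len(memory)
    return memory[offset:end]


def read_counted(memory: bytes, offset: int, n: int) -> bytes:
    """Read exactly n bytes, which is what a (pointer, length) pair intends."""
    return memory[offset:offset + n]


# The buffer holds the program's output followed by unrelated leftover bytes;
# the first NUL only appears after those leftovers.
payload = b"Hello World\n"
memory = payload + b"stale-data" + b"\x00" + b"more"

print(read_cstring(memory, 0))                # runs on into the stale bytes
print(read_counted(memory, 0, len(payload)))  # stops at the requested length
```

The string-style read returns the payload plus everything up to the first NUL, which matches the extra junk described on the slide; honoring the length argument avoids it.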
Info
Channel: Openfest Bulgaria
Views: 30,291
Rating: 4.9559903 out of 5
Id: JRyrhsx-L5Y
Length: 67min 24sec (4044 seconds)
Published: Wed Oct 31 2018