eBPFTrace - Finally Dtrace Replacement on Linux

Today we will talk about bpftrace. First let me ask: is anybody here familiar with DTrace? Wow, a lot of hands, looks like I'm in trouble. And what about familiarity with BPF? OK, a fair amount as well. So here is what we are going to talk about: I'll do a quick refresher on DTrace and eBPF for those who are not familiar with these technologies, then we'll look more closely at the eBPF tool landscape, and specifically at bpftrace and how it compares to DTrace.

Let me start with instrumentation basics. As you've seen at the entrance, we are in the observability track. What is observability? It is really the ability for us to see inside the running system, so we can be observers: to see when it breaks, or to make it better even if it doesn't break. That is pretty well understood these days as something essential, especially as we deal with increasingly complicated systems: microservices, multi-layer architectures and so on. Observability is achieved through various kinds of instrumentation, which you can think of as little holes and windows in the system through which you can see what is going on inside.

You can slice and dice instrumentation in different ways, but one way is to look at static and dynamic instrumentation. What is static instrumentation? It means there are some very specific counters, or other facilities, which exist in the system. For example, we are all familiar with Linux here, and for years it has had a wonderful thing called procfs, which exposes all kinds of wonderful counters and so on. Those are essentially static instrumentation. Dynamic instrumentation is when you as a user can choose exactly what you want to instrument, right now, right here.

Another difference in instrumentation approaches is tracing versus sampling. With tracing, you have instrumentation points in the code which emit some sort of event when the point is triggered, and then something happens with this event. When I mentioned static instrumentation in procfs, you can think of it this way: in many places in the code we have some very simple program, something as simple as incrementing a counter when it is called. You can think of that as an event being generated and processed all in one place, just at a very simple point in the code flow. Sampling is another approach: instead of placing points in the code, we sample and observe the state of the system, for example many times a second, and from the state of the system (the program stack, the kernel stack, all kinds of CPU diagnostic registers) we can also understand what is going on. That is another approach to instrumentation, which works pretty well especially for performance analysis.

Here is a quick comparison between static and dynamic instrumentation. Static instrumentation is of course very helpful, because it provides a lot of data in a standard format, ready to use and always available. But it only goes so far, because if we add too much static instrumentation we create a huge overhead. With dynamic instrumentation you can choose what to instrument based on certain conditions. For example, if you need to dive deeper when something is going wrong, you can enable that. Or if you have multiple systems, you can use one of them as a canary: route a smaller portion of traffic to that system, which has more diagnostics enabled through dynamic instrumentation, so you can understand what is going on. And with dynamic instrumentation you can go very, very deep.

OK, now DTrace. What is DTrace and why is it cool? As the name implies, it is a dynamic tracing framework, and it has been around for almost 20 years now. Development started at Sun Microsystems in 2001 and it was released in Solaris 10 in 2005, fifteen years ago. The approach is that we have specific trace points defined in userland and in the kernel, which simplify instrumentation, and you can also do free-form dynamic tracing, such as tracing any function call, and some other things. What is very cool and important about DTrace, which is marked in red here, is the design goal: if a trace point is not enabled, there is no overhead. It does something pretty smart. When you enable a trace point, it essentially rewrites some of the code to enable that instrumentation. But if the event is not enabled, the program just proceeds with no overhead: no "if trace enabled, then call that function" or other overhead like that.

For defining the program attached to an instrumentation point, DTrace has a special language called the D language, which is something like C and awk getting married and having a beautiful baby. Maybe not so beautiful, depending on your point of view, but that is what it is. DTrace has gained popularity over the years well beyond Solaris. It was integrated into macOS, FreeBSD and NetBSD; Oracle Linux has supported DTrace for a long time; and the code was even relicensed relatively recently under GPLv2.
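The "no overhead when disabled" idea described above can be sketched with a small Python analogy. This is only an illustration, not how DTrace is implemented: DTrace patches machine code, while the hypothetical `attach_probe` and `detach_probe` helpers below (names invented for this sketch) swap a Python method for a wrapper at runtime and put the original back afterwards.

```python
import functools

events = []  # trace events collected while the probe is attached


class Service:
    def handle(self, request):
        return request.upper()


def attach_probe(cls, name, on_entry):
    """Replace cls.<name> with a wrapper that fires on_entry; return the original."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        on_entry(name, args[1:])  # args[0] is self, so skip it
        return original(*args, **kwargs)

    setattr(cls, name, wrapper)
    return original


def detach_probe(cls, name, original):
    """Restore the untouched function, so no tracing code runs any more."""
    setattr(cls, name, original)


svc = Service()
svc.handle("a")  # probe not attached: the original function runs as written
original = attach_probe(Service, "handle", lambda n, a: events.append((n, a)))
svc.handle("b")  # probe attached: an event is recorded
detach_probe(Service, "handle", original)
svc.handle("c")  # probe detached again: nothing is recorded
print(events)
```

The point of the analogy is that `Service.handle` contains no "if tracing is enabled" check at all; the cost exists only between attach and detach.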
Oracle was essentially saying: hey guys, we now want everybody to include it. More recently there was also work to bring DTrace to Windows.

DTrace on Linux has an interesting story. DTrace is not available in the upstream Linux kernel. Oracle has provided patches and said you can include them, but it was, sorry, too little, too late. If it had been done a decade ago, or five years ago, perhaps things would be different. By now the Linux community has pretty much standardized on eBPF and related technology for tracing. So with DTrace on Linux, you can get it in Oracle Linux and you can perhaps apply the patches, but it is not readily available in the major Linux distributions, last time I looked.

Let's take a look at tracing in Linux specifically. What is interesting is that Linux has, through the years, accumulated a large number of instrumentation and tracing frameworks, and the landscape has been getting pretty complicated. I'm not going to review all of that; I'll just send you to this wonderful blog post, Linux tracing in pictures, by Julia Evans. I think it is a fantastic representation of what exists out there.

If you think about tracing, there are several components which are important. You can break it down by the type of kernel interface, essentially what you can connect to as a probe, and by the type of program which you can attach to that probe. That can range from, say, a built-in kernel buffer: some tracing frameworks say that all events will be logged into a buffer and you can then consume them from user space as you like. Another approach, which has existed for a while, for example in SystemTap, sysdig and some other approaches in the past, is a kernel module. You have a kernel module, and that is how you attach the program which does the processing. That is good, because you can do anything you want in a kernel module. It is very powerful, with no restrictions. But that also comes with a problem: you can do anything you want. If you insert that kind of instrumentation module into your kernel to debug something and there is a bug in the module, you can pretty much screw up the whole kernel and crash. You also have the interesting challenge that as the kernel version changes you need to recompile, and what if you forgot? All kinds of risks come with that, so it wasn't loved very much.

The last piece is the frontend tools. Even for very basic static instrumentation like procfs, we have a whole bunch of tools such as sar, top and vmstat, which consume that information and present it in a convenient form for the user. The same applies to dynamic tracing: a whole bunch of frontends have been developed through the years, some of which I have listed here.

Now let's take a look at eBPF. As I mentioned, a lot of different approaches have been tried in Linux, but over the last few years a lot of standardization has happened, specifically around the eBPF framework. What is eBPF and where does it come from? It is interesting that the name BPF stands for Berkeley Packet Filter, which has nothing to do with observability, monitoring, instrumentation or whatnot. And indeed that is where it came from: it was developed in 1992 as an efficient virtual machine for packet filtering, pretty much for firewalls and things like that. The extended Berkeley Packet Filter, the new and improved version, is what exists in Linux now. So you will often hear people drop the "e" and talk simply about BPF in Linux; they are referring to exactly the same thing. I haven't seen the original classic BPF being used in the Linux context.

It has developed into a general event-processing framework. It is used for observability, but it has other uses too: in the network space, for example, people use eBPF for routing, for DDoS protection and some other things. Here we are focusing on observability in particular. Through the years a JIT compiler was also developed for eBPF, which means that even though eBPF uses its own kind of bytecode rather than native instructions, it can achieve very high efficiency.

How does eBPF compare to classic BPF? This is probably only of historical interest, but classic BPF had only 32-bit registers, a very limited number of them, and a scratch pad. In eBPF you have 64-bit registers, more of them, a concept of a stack, and, most importantly, maps. Maps are different data structures which a program can use to store information, like a hash map. If I want to compute some sort of distribution, or keep histograms, all this kind of stuff, maps are what allow us to do very powerful data processing.

eBPF has been in the Linux kernel since 2014, and this is great, because it means you can now find eBPF in pretty much all modern Linux distributions. It typically takes time for distributions to adopt a kernel, and then it often takes companies time to migrate to a particular version. Five years ago, for example, while it was in the Linux kernel, it was not readily available for most of us in production. You could install something newer and play with it, but the majority of kernels we were running in production did not include it. That has now changed. It is still being actively developed and improved, and what is also very cool is that it is now integrated with the Linux perf system, so some basic dynamic instrumentation can be achieved using that toolchain. Here are some examples of what has been done in the latest Linux kernels; you can see a whole bunch of improvements continuing.

I have mentioned that eBPF is a specific bytecode. When we refer to an eBPF program, that is a program compiled to this custom bytecode, ready to be loaded into the kernel. As I mentioned, that is safer than taking some random C code and loading it as a module. But it actually gets better: eBPF programs are verified before they are loaded. There is some code analysis which prevents misuse; for example, you cannot have a never-ending loop in eBPF code, and things like that. It doesn't mean you cannot screw up your kernel, since people are ingenious when it comes to screwing up, but the likelihood is significantly lower, at least by accident.

While you can check out some eBPF code, you are probably not going to write eBPF assembly in your life. Why? Because LLVM and Clang can now compile to the BPF target, so you can write a limited subset of C which compiles to eBPF. What is also important to note is that this compilation is kernel dependent. When you use eBPF programs you often want to have the kernel headers, because when you compile probes the compiler needs to know the types and the offsets in kernel structures and so on, which can change between kernel versions. That is why the headers are needed.

How do kernel space and user space play together? From user space, you use a user program, like bpftrace for example, to generate the bytecode. It then issues a load call to the kernel, which verifies the program. If the program is valid and you have all the permissions, it is loaded. The program can attach to all the different data sources: kprobes, uprobes, tracepoints, perf events. As it runs, it stores data in maps, which are shared with user space, and this data can be read asynchronously by the frontend to retrieve the statistics. There can also be output through the perf output buffer directly if needed. That is how it all works, in a nutshell.

Here is another fancy graph, which highlights which versions of Linux have gotten which instrumentation points. This is maintained by Brendan Gregg. Anybody heard of Brendan Gregg? OK, yes. He has a fantastic blog, writes a lot about eBPF, and makes those fantastic illustrations. I would suggest you check it out if you want to learn more. He has also recently released a book about eBPF, so if you are really into this topic, that is a great read.

Another project you want to know about is IO Visor. IO Visor is where a lot of tools like BCC and bpftrace live; they are part of that project. While a lot of those tools have versions which may exist in the latest Linux distributions, they are often relatively outdated, because packaging takes time and because this is such an actively developing ecosystem and framework. If you want to use the best features, it is often best to grab your tools from IO Visor rather than use what comes with your operating system.

Now some information about eBPF overhead. This is some data from Cloudflare. They use eBPF extensively and wanted to look at the overhead. What was measured in this case is no probe at all, to see how much the operation itself takes, then a simple program, and then more complicated programs with some updates and lookups in maps, and so on. What is very interesting to see is that you should expect a single eBPF program call to take somewhere on the order of hundreds of nanoseconds, and of course probes can be executed concurrently on multiple CPU threads. That gives you an idea of how frequent an event you can trace with bpftrace. You can probably trace something up to a few million events a second, especially if you have multiple CPU cores, with tolerable overhead, which is a pretty high number. So if something happens, say, 10,000 or 50,000 times a second, it can usually be profiled with bpftrace with no visible performance overhead, which I think is very cool.

Now let's look at eBPF frontends: what exists out there and how they can be used. Again let me steal a picture from Brendan Gregg, which lays out a bunch of different tools by scope and capability along the bottom, and by ease of use on the other axis. The reason we are talking about bpftrace is right here: it sits in this upper corner. It allows you to do a lot, and at the same time it is pretty user-friendly. Going by this bar, it is also pretty good in terms of stage of development; it is a pretty stable product at this point.

Another very cool set of tools, right below, is BCC. Has anybody used that? BCC is, I would say, a previous-generation framework which lets you write code in a combination of C and Python to develop profiling tools. What is great about BCC is the BCC tools collection, which provides, I don't know, maybe 50 tools for the most important instrumentation needs. You want to look at your I/O latency distribution? There is a tool for that. You want to see which connections cause retransmits? There is a tool for that. You want to look into your file system cache efficiency? There is a tool for that. But if you wanted something of your own, writing your own was relatively brutal. Let me show exactly how brutal. This compares the code for the same program implemented in BCC and in bpftrace. In BCC there is a lot of pretty complicated stuff, while bpftrace gives us something which is pretty readable and quite simple.

As I mentioned, BCC is fantastic because it has a very large number of tools available to instrument all kinds of different things. This again is a very good map, which shows which tools give us insight into which areas. There is lots of fantastic stuff out there. What is also interesting is that now that bpftrace has become mature enough, many of those tools have been rewritten in the bpftrace language, because that makes them simpler, easier to maintain, and so on.
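To put the overhead figures quoted earlier into perspective, here is a back-of-envelope calculation. It is only a sketch: the 300 ns per probe call is an assumed value picked from the "hundreds of nanoseconds" range mentioned in the talk, not a measurement, and `probe_cpu_fraction` is a name made up for this example.

```python
def probe_cpu_fraction(events_per_second, cost_ns):
    """Fraction of one CPU core spent executing the probe."""
    return events_per_second * cost_ns / 1e9


ASSUMED_COST_NS = 300  # assumption: "hundreds of nanoseconds" per probe call

for rate in (10_000, 50_000, 1_000_000):
    pct = 100 * probe_cpu_fraction(rate, ASSUMED_COST_NS)
    print(f"{rate:>9} events/s -> {pct:.2f}% of one core")
```

Under that assumption, tens of thousands of events per second cost on the order of one percent of a single core, while millions of events per second only stay tolerable when the work is spread across several cores.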
In this talk we are looking at DTrace and bpftrace, so let's compare the two. If you think about the landscape, we can put our instrumentation programs into two buckets on a sliding scale: very simple one-liners, and more complicated tools which do more complicated analysis. In the DTrace framework, you would typically use DTrace for simple one-liners and more or less simple things, and if you need very complicated stuff you use DTrace plus shell, or DTrace plus whatever programming language you want, to process the data it generates. Compare that with the eBPF framework: there is bpftrace, which can do a lot of what DTrace can, although you can see that DTrace can do slightly more. But if you need more complicated stuff you can use BCC, which is pretty much a Python interface for writing BPF bytecode, and you can implement pretty much anything you want in that framework. You can see that the BCC framework is slightly more powerful than what you can do with shell and DTrace.

In terms of compatibility, there is no direct compatibility. bpftrace doesn't support the D programming language, so you cannot use the same tools and just replace the back end. That wasn't the goal. The goal was to have a tool which is similar in spirit, with the expectation that a lot of programs will need to be ported. That makes sense, because between different operating systems, and even between different versions of one operating system, chances are your diagnostic program will need changes anyway, because system calls change, or whatever.

Here is a comparison of some of the most important functions and functionality. You can see that in many cases it is not a direct match, but it really has been heavily inspired by DTrace. Here we can compare two scripts side by side. I recognize it is hard to see from the back, but you can see that the whole concept of the script is again relatively similar. So if you have been programming for DTrace using D, you should be able to get started with bpftrace in a relatively short time. I would say that if you are moving some of your programs from Solaris or FreeBSD with DTrace to Linux, figuring out which probes to attach to and what their names are will probably take more time than actually converting the syntax.

Now let's look at bpftrace itself. What are the requirements? The requirements depend a lot on exactly which probes and interfaces you are going to use. If you are only using kprobes, for example, you can run it on a kernel as early as 4.1. If you want to use more advanced stuff, for example hardware events such as tracing cache misses, then you want kernel 4.9. Here is a similar image again, a very good one, which highlights what kinds of probes you can use for different parts of your system.

Now let's look at how bpftrace works. You can think of it this way: there is bpftrace, which is your basic program, and then there is what operates at the kernel level. When you run bpftrace, you have the program, which goes through the parser and so on, and in the end, after all the code generation, we get the BPF bytecode I mentioned before, which is loaded into the kernel. Again, similar to what we saw for eBPF in general, you see the maps, buffers and so on being available to bpftrace for processing and printing. So it is very similar to the eBPF framework, just a little more complicated because more things are going on.

If you want to go ahead and use bpftrace today, what can you do? First, not all distributions currently have packages; it is relatively new. And as I mentioned, because of the high pace of development, even if you have bpftrace in your distribution you may want to check the packages from IO Visor for the latest version. Another way to install bpftrace is to use snap, which will give you the latest versions. But as I will show, while that is a recommended way in the documentation, you may have some surprises: because of how snap operates, you may not have full functionality in the tool, depending on what you want to do.

Now let's look at a simple bpftrace program, a relatively simple one-liner. What is going on here? We are calling bpftrace and providing -e to execute, and the rest is the program which is passed in. The program says that we want to attach to this kprobe and, for the given tid, store the current time in nanoseconds. Then you see the kretprobe, which says we also want to instrument when vfs_read returns, and there we essentially put the elapsed time into a histogram and delete what we had stored at the start. So it is a relatively simple program: store the current time on entry, then put the difference into a histogram on return, so that we know the distribution of the times.

When you run this program, this is what we find: for each process it gives us the distribution as a histogram. In this case I am showing one for the MySQL process. Now a question for you: why do you think we have such an interesting distribution, with a lot of items here, very few there, and then more again here? Any ideas? Anyone else? Yeah. If you look at a lot of I/O, it very frequently has what is called a bimodal distribution. This is vfs_read, a read from the file system. A lot of the reads here are very fast; those are served from the file system cache. And those over there are physical disk reads. So the reads are roughly split by how long the read takes. A bimodal distribution is very common when you look at file I/O or anything else which involves caching.

OK, a remark? Oh, thanks. Well, yes, these are K and M suffixes. The histogram doesn't know what you are storing. What you see here is pretty much the generic output for histograms. It is designed that way because bpftrace just has a function to output a histogram, and it doesn't know whether you store bytes or seconds. So in this case it looks like this is 256K, like a size, but it is actually a number of nanoseconds, which corresponds to roughly half a millisecond, which is what you would expect from an SSD. So yes, that is a little bit confusing. They will probably clean it up in the future. You would want to have histograms for time and histograms for data sizes; that currently is not there. But thank you for catching that.

As I mentioned, you can also save the program as a script file. No surprises here, just slightly different formatting; it does the same thing. You can then run it pretty much as you would with DTrace, or even put a proper header line in the file, make it executable, and run bpftrace programs the way you would run Python or whatever.

If you look at the concepts at a very high level: you have a probe, or several probes, which you attach to. Then you can apply a filter; for example, you can say that you want to attach to this probe but only for a given process, PID 123. And then in braces you have the action. Together that gives us a program. In this case there is no filter: this is the probe we attach to, and this is the action which is performed, which is pretty much what you do in the program. In the same program you can have multiple of those pieces. In the previous program we attached to this probe and to that probe and then combined the data in a simple way, and you can obviously do more complicated things.

OK, tools. There are a number of tools available with bpftrace which essentially copy or replace BCC tools' functionality. Take biolatency: BCC also has biolatency, which shows you the block device latency distribution. And, as I mentioned, if you are interested in the bytecode for bpftrace programs (I don't think there is a particular reason why you would want that; it is typically for developers who may want to debug it), there is the -v flag in bpftrace, which gives you a fantastic representation of the bytecode. Maybe you want to get some more idea of what the hell is really going on.

Now let me show an example. As you may know, I'm a MySQL guy, so I wanted to see how we can use bpftrace to trace MySQL. There is this function called dispatch_command; every MySQL query is run from that function. At first I was confused, because it gave me this error, "no such file or directory", while the file was obviously there. That is where snap packaging becomes a problem: a snap package operates pretty much in its own file system namespace and will not see the same files by default. So I installed bpftrace using apt instead, and then I had another problem: the symbol cannot be found. Does anybody have an idea why? No, actually the binary works fine with bpftrace; that is not the particular problem. If you look at what exactly exists in the mysqld binary, which you can do with nm, you see that we do not really have dispatch_command. It has been mangled: it has a kind of real name, through what is called C++ name mangling. In reality I have to use that real function name to attach to it. So finally I can attach to that function and say that I want to print the string of the second argument, and it finally shows me the queries MySQL is running. Obviously I could now do the same thing as I did before: attach to the start of the call and the end of the call, measure the time it takes to run the queries, and whatever else I may want. OK.
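The entry-and-return timing pattern used for vfs_read, and suggested above for query timing, can be sketched as a plain Python simulation. This is not bpftrace code and does not touch the kernel: the event list is synthetic, and `on_entry`, `on_return` and `bucket_floor` are names invented for this sketch. It mirrors the idea of a map keyed by thread id plus a power-of-two histogram, which is how bpftrace's `hist()` groups values.

```python
from collections import defaultdict

start = {}                                     # per-thread entry timestamps
hist = defaultdict(lambda: defaultdict(int))   # per-process power-of-two histogram


def bucket_floor(value):
    """Lower bound of the power-of-two bucket containing value (value >= 1)."""
    return 1 << (value.bit_length() - 1)


def on_entry(tid, now_ns):
    start[tid] = now_ns


def on_return(tid, comm, now_ns):
    began = start.pop(tid, None)  # filter: skip returns without a recorded entry
    if began is None:
        return
    hist[comm][bucket_floor(now_ns - began)] += 1


# Synthetic events: (kind, tid, process name, timestamp in ns)
events = [
    ("entry", 1, "mysqld", 1_000), ("return", 1, "mysqld", 3_000),       # 2,000 ns
    ("entry", 2, "mysqld", 5_000), ("return", 2, "mysqld", 305_000),     # 300,000 ns
    ("entry", 1, "mysqld", 400_000), ("return", 1, "mysqld", 401_500),   # 1,500 ns
    ("return", 9, "mysqld", 500_000),                                    # no entry seen
]
for kind, tid, comm, ts in events:
    if kind == "entry":
        on_entry(tid, ts)
    else:
        on_return(tid, comm, ts)

for lo in sorted(hist["mysqld"]):
    print(f"[{lo}, {2 * lo}) ns: {hist['mysqld'][lo]}")
```

The 300,000 ns sample lands in the bucket starting at 262,144, which is the kind of bucket a histogram would label "256K": the label is a plain number, and only the program's author knows that it means nanoseconds rather than bytes.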
Uh-huh, yes. Well, I haven't looked into that, or whether you still have to use it this way. The reality is, if you look at C++, you can potentially have multiple functions or methods with the same name but different parameter signatures. That makes it impossible in C++ to identify a function by name alone, and that is why name mangling is needed to begin with. Does that make sense? Okay.

So with that, as I mentioned, I would encourage you to check out what I would call the eBPF bible: Brendan Gregg's website has a lot of material which goes much, much further than I can in this talk. Also, I'm going to distribute the slides. When I was preparing this presentation I put together some links which I found helpful, so maybe they will be helpful for you as well.

Yeah? The maps, yes. Are they per processor, or are they system-wide? Well, again, I would say I am not very experienced in terms of the implementation. Semantically, you can create maps, and these are really designed to share data from all the traces, independent of what processor they are on or what process they belong to. For example, I mentioned the biolatency tool, which gives you a histogram of the response time distribution for all I/O requests, independent of what process they ran in, and obviously they can run on all the different CPUs. Now, if you look at the example I showed, I can also create a histogram per process if I want to, or measure, let's say, performance per device. I can do that as well. I don't have to have just one global histogram. So there is a bunch of different data structures available, and the concept is that you can get information from your whole system and slice and dice it by whatever data you need. Okay.

And then finally I also want to highlight that, if you folks are interested in open source databases, we have the Percona Live conference later in the spring in Austin, Texas, coronavirus permitting, as with any conference these days. And with that, I am happy to take questions if we have any.

Yes? Yeah, well, if a tracepoint is disabled, then there is no overhead. No, no, what I'm saying is about static tracing: if you write a program without bpftrace and you want to be able to enable and disable some tracing at runtime, you often have to put in the code "if logging is enabled, then log something". With dynamic instrumentation in eBPF or DTrace you do not need that, because you just jump to the probe. That is what I meant, if I wasn't clear. Thank you for the clarification.

Okay, where else was I wrong? With your Red Hat shirt, you can probably find a whole bunch of other things I wasn't correct about. Okay, thank you. Any other questions? Corrections? Yeah, in the back.

How do I get the function which allocates memory? Well, look, I do not know exactly. From what I've seen in some examples, you can actually look at the allocations by function, even the call chain. But for memory allocation debugging, I haven't seen it used for that particular purpose. I think for memory leak checking there are probably some better tools, because it becomes quite complicated in this case to work out which data is allocated and may still be needed and what is definitely lost. Okay?

Yes, I mean, good point. In this case what I showed are command line tools, but you can also use, for example, the eBPF exporter, which allows you to basically take those eBPF programs and get the output in the Prometheus exposition format, which you can then load into your monitoring system. I have seen some other tools too. I think Sysdig also has it in its monitoring interface, and I think many monitoring tools now support eBPF as well. So that is what we have now.

Okay, yeah, you had a question? Yeah, well, so if you look at ply: in my opinion ply is a very cool tool, because it provides a lot of bpftrace-like functionality and is relatively easy to use. I think in terms of development momentum, if you look at GitHub right now, it is probably much more on the bpftrace side than on the ply side. I hope you're not a ply developer and I'm not offending you. But yeah, there are those two tools, and I think that is what is wonderful in the open source ecosystem: there are often many tools to do the same stuff, and through the years there is a kind of Darwinian evolution, where some of them die and others continue.

Okay, well, if there are no more questions, then thank you for stopping by, and enjoy our great sponsors.
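Returning to the maps answer from the Q&A, the difference between one global histogram and a histogram per process can be sketched in plain Python. This is only an illustration of the data structure idea, not how BPF maps are implemented. The power-of-two bucket layout here is an assumption for the sketch and is not guaranteed to match bpftrace's own histogram output, and the process names and latency values are made up.

```python
from collections import defaultdict


def log2_bucket(value: int) -> int:
    """Return k such that value falls in the bucket [2**k, 2**(k+1)).

    Values below 1 are placed in bucket 0.
    """
    return max(value, 1).bit_length() - 1


class KeyedHistogram:
    """Power-of-two histograms kept separately for each key (e.g. process name)."""

    def __init__(self) -> None:
        # key -> bucket index -> count
        self.buckets = defaultdict(lambda: defaultdict(int))

    def record(self, key: str, value: int) -> None:
        """Count one observation for the given key."""
        self.buckets[key][log2_bucket(value)] += 1

    def merged(self) -> dict:
        """Collapse all keys into one global histogram."""
        total = defaultdict(int)
        for per_key in self.buckets.values():
            for bucket, count in per_key.items():
                total[bucket] += count
        return dict(total)


if __name__ == "__main__":
    hist = KeyedHistogram()
    # Made-up I/O latencies in microseconds, keyed by process name.
    hist.record("mysqld", 3)
    hist.record("mysqld", 1000)
    hist.record("cp", 2)
    print(dict(hist.buckets["mysqld"]))  # per-process view
    print(hist.merged())                 # system-wide view
```

Keeping the key as part of the map is what allows the same recorded data to be sliced per process, per device, or collapsed into a single system-wide distribution.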
Info
Channel: Southern California Linux Expo
Views: 368
Rating: 5 out of 5
Id: HPDuc3ze_UY
Length: 57min 41sec (3461 seconds)
Published: Thu Mar 26 2020