Exploring distributed tracing with ASP NET Core 6

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

hey folks welcome to another video today let's discuss a bit about distributed tracing namely in.net and net six so it's not a new topic but there are some new interesting things going on so let's take a look so just as a quick intro distributed tracing it's very useful when we are in this world that we are right now building microservices distributed systems and all that kinds of stuff and we want to relate the things that are happening in the system so for example a user makes a request to our web application and that causes a bunch of requests to other services stuff like that and it's interesting for us to help us debug and find issues on the applications we can relate everything from the start of that request from to everything that it impacted so as i said this is not new if you are building applications for example in azure there was already application insights even in other there are other vendor-specific offerings like i used in the past elastic stack which has an apm application performance monitoring solution which includes distributed tracing so it's not something new but today i want to talk specifically about some new standards that have emerged in the last couple of years namely uh w3c trace context which is like a standard for propagating the that context that tracing context as it's the name between all the services so when we start one service calls the other so this is a standardized way to pass information and basically uh i'm not going to go into a lot of detail but imagine that we have like a an id for the trace itself and then we have like an id for each step of the trace which normally is called span not not in the standard itself but in other cases it's typically called spend so traces the whole thing and spend are the uh various things like a span is within one service and then the other service as another spend that because it was called by the first one it's like a child span of the first one so if i go somewhere i will find in here talking about the stuff that go inside of it and we have the trace id and the parent id so the parent span so this is one of the things that is kind of new like recommendation from 2020 and also another thing that's recent is open telemetry which is a standardized way to implement this distributed tracing collection of information and then representing it so these are things that are rather new and for example telemetry.net reached 1.0 in march this year so it's kind of new and what's interesting is that it's built into the framework and we'll see how this support we need to add some external packages but the thing itself uh the way to collect this information has been baked into the the framework so it's easier to integrate not only with open telemetry but eventually with other things it's probably also possible and yeah and what's great about this is standard so no vendor locking so you don't need to be locked to application insights or elastic or wherever you can use open telemetry and you can use it anywhere so for example if you use application insights as far as i'm aware i think you can import open telemetry data that you collect into application insights so if you are in azure you can create your services like using open telemetry to make them not vendor logged in but then use it with application insights and then at some point you want to go somewhere else some other cloud or on-prem or whatever you don't need to change your your application or for the for that reason at least because it's open telemetry you can just then export it somewhere else and it will work so let's see how with this supporting.net in the asp.net car six now to use these things more in terms of open telemetry not so much in trace context because it's kind of it's like underpinning all the things we don't really see it a lot but i'll show you in a bit so let's take a look at that i have a bunch of things but i will show you along the way so let me hide the picture in picture and let's take a look at the demo application and the code let's go here so the sample application is the simplest thing ever it's just an excuse to see things in action so let's start by seeing it in action we just have a text box we put the name and we get hello name so this is not interesting what's interesting is that to get to this i over engineered the hell out of this and we have four different services that are working so we have the web client which is this application that we saw we have a web api which this web client calls it's just a web api we call with a get or post but remember we'll see then the web api calls a grpc service just because of reasons and it also sends a message through rabbitmq that the worker eventually processes so it's a bunch of stuff going on that uh it's completely unnecessary but good for demo so let's quickly start with the application just let's go with the program cs just for the first thing is that the first time i did this um this code i prepared it it was still in asp.net core 3 1 and at that point the w3c trace context format of the storing things was not the default at this point i don't know if it was in five or in six but now it is the default so i add this code setting the default id format to wwc but now it's commented because it is default now nowadays so it's no longer needed but i prepared this a long time ago i have this demo in github for like two years maybe but only now i thought about uh creating a video so this is for w3c trace context and before getting into all the distributed tracing stuff let's take a look at the w3c where we can see it so i'm just here configuring logging and i'm using seek for for logging for no particular reason other than it's it was the easiest i found to create a like a docker container to host because if i wanted to go with the elastic stack i wanted to need to host elasticsearch and kibana and all that stuff and i just wanted one container that i could shove all the the logs into and seek worked so it's the first time i tried it seems okay so maybe i need to investigate more but in this case it was just for that reason i just wanted to have a quick way to send logs to a centralized location so let's take a look at seek just for us to to see this w3c trace context show up a bit so i already made this request so if we go to seek we have a bunch of logs i can for example open the this log and let me put here let's find all related to this trace id so this choice id are all the logs correlated to what request so we can go to the first one you can see that for example parent id is zero because we are in the first part we we will see this better when we see the diagrams generated by the distributed tracing but for now let's just take a look at the logs so it has no parent id it has a span id which is which will eventually become the parent id of the thing that's invoked by this one and we have the trace id that's the whole end-to-end request so this one then made a request to the other service still parent id is still zero because we're still in that that context so you see span id is still the same so start processing sending request and then we have executing endpoint so in this case we are this log is from the the web api and in this case we should have a parent id which should match hopefully i think so span id no no we will see in the distributed trace we will see it better i think we'll see how it works because sometimes there are spans created so a span is not mandatory like something that happens for example the whole request within the web client it's not mandatorily the same span we can internally create more spans so sometimes when looking at the logs we might find weird that this parent id here is not the immediately before span id but when we look at the at the distributed tracing diagram it will make more sense but yeah you can see how choice id span and the parent appear here so this is the w3c trace context being stored here and being part of the logs but now let's dive into the distributed tracing part which is probably more interesting let me go back here so this is the application open telemetry i will open it here so it's crash platform standard all of that pretty cool works with all the things it's part it's a cncf project and we have the opendlammer3.net which is in version i don't know what so a bunch of things but let's immediately get to look at it so this is just open telemetry is just to capture and export this this information but then we need to use tools to look at the at this data again as i mentioned before you could use application insights and probably other things that support it in this case i'll use a couple of tools just to show more than one because they are mostly the same from what i saw so far which is jaeger and open zipkin so these are tools that support exporting traces in more than open telemetry format but support open telemetry format which is what is interesting for us right now so let's see for the same request as we saw in the logs let's take a look at how it shows in jaeger if i go here i select you can see we have a bunch of all the services and jaeger query i don't know what this is but let's select the web client which is the base for everything and just find traces so there are a bunch of things here because the every request that we make is logged by default so by default after we configure open telemetry in the sort of thing in the application of course but the interesting thing that has all those interactions is this one and we can see now all these services working together and how they correlate so we can see we have the web client which gets the allo request and then we are still in the web client but it's in the new span we have a bunch of information here so this this one i think is the webcl the http client support for open telemetry or for tracing in general that creates this part and this one is when the request reaches the web api this one is when the web api is making a request a grpc service request this one is in the grpc service when it starts to handle the request this one is like a i will show you but this is a custom one that i created and then back to the web api making a database call this is the postgres endpoint that it's making a query and then web api again is sending a message to rabbitmq and it's also a custom implementation id and finally the worker receiving the request and it's taking five seconds because i put task delay in there just to take longer to see what happened if what if this task took a long time but as we can see it's all here so let's take a look for example over here in the web api you should have a bunch of information here like hosts http methods stuff like that process i don't know what this is and there's probably more information somewhere we can for example see this again this is assuming this let's say this is zipkin so it's similar web client let's say when query i want more results are these the only results if these are the only results something went wrong last two hours okay there's this pan here should be the same let's see the url c88 c8 so it's the same thing it's just in a different place so if we see web api here a bunch of stuff show all annotations and see a bunch of stuff we can see the span id here and the parent id so let's see let's see so this parent id here is ends in 8 f3 if we go to seek when it gets here parent 8 f3 so we are seeing the same thing and now we can see that this one which is a support for the web client should have here so the span id here is eight f3 so it's the parent of this one and this one is the one that we didn't saw in the logs because it's something it's the the support from the http client for tracing and probably doesn't log and we didn't see in seed in the log so in the logs we might not see everything but in here we should see all the connections so you can see everything working together and i think you can already imagine why this is helpful i found bugs in applications because of tools like these ones because i saw that for example in one request we were making more database queries than were supposed to so we detected these kinds of bugs so that's very helpful even in development time so even more when it's in the server but even in development time it's already pretty pretty interesting to use so okay so now we saw what we get now let's take a look at how to get this working back to visual studio code let's see the configuration so the configuration for to get this to work in for most of the things that we see here is just configuring some stuff in the startup uh importing some new get packages and configuring the startup because it's all like built in so let me just show you for example the cs proj and using here like open telemetry instrumentation asp.net core http so these are the parts for so the http is what gives us this view and the asp.cor is what gives us this so whenever we get a request asp.net core open telemetry is connected this way and then this way and then the same for other parts in the other projects we'll have something for the grpc part something for the database or in this case i think it's specific for entity framework but yeah we can plug it in in any way so if we have a specific library we can add support for our library yet as well so we import this these packages also extensions hosting is to to give us the extension methods to use with di esp.net car sdi and then we have these exporters for jager and zipkin just to have built-in support to shove everything in there so back here in the program we can see so add open telemetry tracing and then we can configure stuff so in the set resource builder we can for example create something if we don't pass in anything it will just infer some name and that's why i use the create default and then add service name web client so i can easily distinguish all the the services then we add the asp.net core instrumentation we can add stuff like this if there are some things that we don't want to be part of the trace we can filter that's an example i would i put here so for example when using i don't know if it's just i think it's because of how to reload in net six it was always making requests to this one and i was like let's ignore it and another thing interesting is that besides we the possibility we have to create our own spans for things for example like we will see for to add support for rabbit mq we can also hook up to the existing ones so i went on debugging and saw that we had like an event called onstartactivity and that's a topic for a few minutes so let's just continue i'll tell you about this part but there's an event called onstartactivity and we saw and we can check if the raw object is a request message if it is and we see that it's an http get request i'm adding a tag just to see that this works so if we go back here and we see here in the web client here there should be a tag here random demo tag adding some random demo so i was able to hook up to the http client tracing capabilities and just shove things in there because of reasons so it's possible we don't need to like customize everything we can just slightly tweak or we can customize if we want then i just have the add zipkin exporter and the jager exporter you should when you're in production and you have an actual endpoint for the zipkin and jager apis configure the uri but it already defaults to the default so it defaults to localhost and the right port so don't need to put this but i just left it here to remember the the values but these are just the default when i created the docker containers it's the default is this so and that's all in here let's just take a look for example in the pages i think it's in the request is being maintained hello and as you can see i didn't do anything this is just a client i'm getting a client making a get request and getting the output of the response which is just an output model and this is because i'm ignoring the nullable reference type stuff i should i should either disable it or respect it but it's a demo code so i don't care but as you see there's nothing related to tracing here because it's all using the built-in stuff that was configured in the in the program cs so in this application nothing else going on let's take a look at the api where much more things are happening so web api program so the web api if you recall we can see here the web api gets the request from the web client and then makes a request to grpc service makes a database access and sends a message to the worker and only after that it responds to the web client so we have a bunch more stuff happening here so we have the grpc client add the b context and then configuring open telemetry stuff and in here again set resource builder just to give it a name add asp.net core instrumentation http client instrumentation then at grpc client instrumentation and this one is an interesting thing so suppress downstream instrumentation true why is this because grpc client also uses the http client so if we don't suppress this we'll have the same request twice because it's the http and the grpc was showing in different ways so when we are using grpc we don't care about the underlying http because this gives us enough information so we can suppress it after this we have entity framework instrumentation if i show you the cs proj so we have here instrumentation instrumentation lots of instrumentation hp.net card grpc stuff and entity framework which is not like official so it's a country package but it works so all good which means you can do it as well for other packages uh and over here we can configure set db statement for text equals true what does this mean we can show the query and this again is interesting to see if the query for example maybe we are checking that we have a slow request handling we can see that there's a query and maybe the query is taking too long we can immediately see the query and see why it's taking so long so i put true so if we go here see the query and we see the statement insert into hello entries stuff so some basic thing but this is interesting maybe we can we can figure out issues just by looking at this without even debugging just by looking at the tracing we can see oh that's why it's slow because ef generated a bad query or something like that just a side note if you are using orms it's a good practice to check the queries that are generated so even if the rms are good and your application doesn't have a big performance constraints it's always a good thing to to check if the queries that you are creating are not some weird things that may cause problems particularly in the future when the database grows so just a quick side note so back here empty framework then add source this is when we want to create a source for traces for spans we need to put it here and we'll see why this is the message publisher so this is to implement the the support for rabbitmq and then add zipkin inspirator and the jager exporter okay so we need to go to the message publisher to see it in action but before let's go to the [Music] let's go before yeah even before that so just for you to see i'm using the minimal api so web the api gets a yellow request gets the username then call the grpc service with the username insert that username into the database over here and finally publish the message to rabbitmq using the message publisher class that i created here okay but before the publisher class this is a bit more complex so let's start with the simpler variation which is in the grpc client i created a custom implementation of of a span so let's start with that one before going to rabbit mq so i'll keep this open grpc service service implementation is here you can see the program as well but it's more of the same add exp net core stuff then add source for the twitter greater service which is manually so let's go to the greater service so i think this is like the default thing that the template generates i just added the the custom stuff so this is what i wanted to mention before the all this thing for the distributed tracing is built into the framework so this activity source if we see is system diagnostics activity source so it's part of net that's cool because for example if you are building a library and you want to [Music] make your library a good citizen in this distributed tracing world you just need to use this and even if the persons are not using if the users are not using the distributed tracing you don't care you you have the hooks if they activate stuff it will work if they don't don't care so this is good the only kind of bad part is that you get used to the naming conventions of open telemetry like the trace and the span and stuff like that and then in dot net it has other names so a span is an activity so that's a bit strange you need to map the namings but besides that it works well that's the most important part but it's strange because normally if you are using um telemetry you would expect to for it to have the same names but yeah whatever it works so if you want to create a custom activity or span or call it whatever you go here you get the activity source with the name and you start activity and it's disposable so when you when it's finished it stops the it's disposed and stops the activity so it ends the span that you see in the over here so this is a custom made one and this is how do i put this down yeah so it's this one see this is the custom span doesn't have good information because i just created that and this say hello if we go here it's say hello so this is the that custom thing task delay 250 milliseconds for reasons and return the result it's done so for a simple custom span slash activity it's just this now for the for the rabbit stuff it's a bit more complex and let's take a look at that back to the web api program message publisher so the message publisher is just an abstraction i created to abstract rabbitmq so i'm using easynet q which is a bit simple simpler to use than the the default grabbit library but besides that it's not very interesting so i'll ignore some of these parts which are rabbit specific and let's go directly to publish async which is the interesting part mixing together the rabbit stuff with the open telemetry and stuff like that so as we saw before for the grpc service we want to create a new activity so we start an activity we called it message sent let me just close this part and we can pass in an activity kind in this case producer we have more options like client consumer stuff i don't know if this is very important to pass or not but just tried it and i got this information when i was going through the samples in the opentelemetry.net repository so i used it so we start the activity we can see here in the activity if i collapse this and we see like message send so web api message sent is this one so we are starting a new span or activity whatever we're creating message properties this is uh rabbit stuff so if i f12 is in that queue so this will be the way we will store the telemetry related things in the rabbit messages so that the consumer can get them on the other side so create these properties then we grab the activity context so from what i understood from this part so the activity context has like the the information about the current trace like the the trace id and parent id and stuff like that and we are grabbing it from the the activity that we created grab the context if there isn't one we are going to activity current which is supposed to be this one as when we do start activity it's supposed to be this one but just in case otherwise default and we are doing this because it says that it can be if we check activity when we start activity it might be null because maybe we don't if there are no listeners maybe no activity is created from what i understood so we can we need to take care of this in case no activity started so we need to consider for this scenario so the activity context should be like the w3c part then there is also this thing here the baggage which is where we can store more things we are not really using it so far but it's used here for the propagation context which is what we will use like to propagate the information from this publisher to the consumer on the other side and we have this propagator.inject that will inject the information in this propagation context which is created from the activity context and the baggage and it will inject them into the message properties which remember is from rabbit mq and to do this it will use this inject trace context we can see the over here so because diagnostics and dopa telemetry stuff like that don't know anything about revit so this is generic so we get a t carrier and then we get the setter that will set the values that we get into that carrier so our carrier will be the message properties and we pass in a function to set those properties this case we got string key string value and we have the message properties again this is for rabbit and we will use the headers to put there the information so if the others are not there just create a new one and put the key value so it's easy because it's a dictionary and it's done so we put this in the message properties and then when we are publishing the message to revit we pass the message properties along with the message itself i'll just pause or stop the worker so that we can see that if we send another message and we go here to the queues and we get the messages we see in your in the others this trace parent that if you recall if you go here to w3c to this context transparent so it's the that thing from w3c trace context that takes the trace id parent id stuff like that and because we didn't add anything else to the baggage namely we have nothing else but this is where things would end up so let me just restart this so we see the message being delivered and it's there so that's it from on this side so we use those things we created the activity so it shows up over here so it's this activity it's very fast but it's this one uh and we propagate all those this context the activity context package using the message properties so on the other side in the worker we'll have a handler that can extract this from the message properties so we can start an activity on the other side so let's go here and worker so for the program nothing really interesting just open telemetry search resource again add source because it's a custom one adding the exporters so then we have a message handler which is like a background service just for handling these to be running in the background of the dot net application and again ignoring the rabbit specific things we have to consume method where we can pass in a function to consume the messages and now we do the opposite of we did of what we did on the other side so if you remember over here we have a propagator inject and on this side we have propagator.extract so we get the provocation context using default because we don't really have anything so on the other side we were already in the context of an activity so we were creating like an um a child activity so we had something to get from but in this case we are in the worker in the background service and there's no activity going on here in on the other cases the activity is like created when the request to asp.net starts to be handled but in this case we have nothing so we need we are doing the same work that asp.net does on the other side we are doing for the rabbit the case so we are the default no need we have nothing so we grab the message properties because those we receive this is part of the rabbitmq library we got the properties and we have an extract trace context which is like the opposite of inject trace context where we go through the properties headers and try get the values and we decode those values if there are nothing we just return empty it's all good so this does the opposite work then as we extracted this par this propagation context we can grab the baggage and put it in the current baggage which should be empty and then we start an activity on this side and in this case the other cases because we already had a parent activity we didn't really have to pass anything but in this case we didn't have a parent activity so we need to pass besides like the the the label we are saying that it's a consumer and then we are passing the activity context so it because it can't grab the activity context that is like omnipresent i would say so the activity context that's implied because it doesn't exist we are grabbing the activity context that we extracted from the message information and then i just felt like adding some more information so i added some tags so just to see server machine name so we can see this activity here it's this one that takes five seconds because there's a task delay here five seconds and we take a look at the tags we see the server jd home zero which is the name of this computer and then we have spent kind which is that other thing and these things is just some other information that could that we could ex put so add messaging tags it's been called over here so just adding some extra tags directly to the activity so we can either pass them here when creating the activity as i did here or we can add them afterwards as i'm doing over here and that's mostly it so in the other in message publisher we put things in the headers so we can extract them in this side and we use this propagator stuff to help us doing this so not very hard some things are a bit weird like this propagator inject is a bit strange all this thing maybe it probably makes sense it was probably the best way to do it but still kind of aware that we pass in a propagation context some message properties and then a function to insert things but yeah besides at first lens being a bit strange it's not that hard to understand and yeah so these kinds of things probably we would do let's say if you are implementing a library for example and you want your library to work well in this context and maybe besides in the library if you are creating your application and you see that there are some spots where you feel it's more likely that it may become a bottleneck or you are not sure or something like that so for example in the past i worked on the project and we were doing like document transformations and starting reading from file system writing stuff like that and we were not sure maybe that place could be like might be a problem so we put some activities in this case we were using another technology but still we wrapped that up in this sort of pattern so that we could see what was going on so and it was helpful as i think i briefly mentioned we found some bugs due to these like because more queries were happening than we expected and more time was taken right into these stuff like that so it's very helpful even in development time but even more when your application is is in production so that's it for this one i hope it was helpful to understand how this works and why this is very useful let me know in the comments if you are already using these kinds of things open telemetry specific or other ones let me know any ideas any comments as always and yeah these tools are cool jaeger zipkin probably maybe there are some more more pleasing to the eyes or something like that but these seem to work well and open telemetry as well seems like a good choice so hope you enjoyed it and see you in the next one see yes

Info

Channel: Coding Militia

Views: 694

Rating: undefined out of 5

Keywords: dotnet, dotnetcore, coding, programming, c#, csharp, aspnet, aspnetcore, distributed tracing, w3c trace context, opentelemetry

Id: l1_i8p2hVlE

Channel Id: undefined

Length: 45min 59sec (2759 seconds)

Published: Tue Oct 26 2021