Introduction to Tracing: OpenTelemetry & OpenTracing

Captions
Distributed systems and microservice architectures can be really complicated in terms of networking. When a customer clicks around on your website, there are many web requests going from the client's browser to your web server, and on the back end there are network requests flowing all over the place between microservices, databases, proxies and more. With the customer clicking around on your website and traffic flowing all over the place to back-end systems, it's often difficult for engineers to find out where the bottlenecks are and where all the time is spent. Time could be spent on the network, in a database serving up all the data, or in applications processing business logic.

Today we're going to run a microservice architecture. The user will go to our website in the browser, and the website will show our YouTube playlists, each playlist having videos. The website gets the playlists from a playlist API, and that API gets its data from its own Redis database. For each of the videos in a playlist, the playlist API needs to gather the video data, so it calls the videos API, which has its own Redis database; it makes a call for every video in the playlist before sending the complete video catalog to the browser. If the system becomes slow, how do we find the bottleneck? Is the problem the videos API, the playlist API, or the Redis database? Today we're going to be taking a look at distributed tracing and OpenTelemetry. Tracing helps us instrument our system so we can figure out where time is spent. We've got a lot to cover, so without further ado, let's go.

If we take a look at our GitHub repo, we have a tracing folder, and inside the tracing folder I have an applications folder, which holds all the applications that make up my microservice architecture. I have a docker-compose file to start up all the applications, and I have a readme. This readme is our introduction to distributed tracing, covering the example microservice architecture, what it looks like, the traffic flow, how to build and run the applications, and how to access the application in the browser, as well as the tracers. Be sure to check out the link down below to the source code so you can follow along.

So let's take a look at our microservice architecture. What we're going to do is run an application that returns playlists with videos in the browser. Our first application is called videos web; it runs in the browser and is a simple static HTML page served by an nginx web server. Then we have a simple playlist API: the videos web makes an API call to the playlist API to grab a list of playlists. A playlist contains a title, a description and a list of videos. The playlist API makes a call to the playlist database, which is running Redis and holds all the playlist information. And to add a little complexity to our architecture, we're also going to add a videos API, which holds all the data regarding videos. Here's an example of what a playlist looks like: it has an id, it has a title, and it has a list of videos.
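As a rough sketch of that shape (the field names here are my assumptions for illustration, not the repo's exact schema), the playlist and video data could be modelled in Go like this:

```go
package model

// Playlist is a rough model of the data the playlist API returns.
type Playlist struct {
	ID     string  `json:"id"`
	Name   string  `json:"name"`
	Videos []Video `json:"videos"`
}

// Video holds the per-video data served by the videos API.
type Video struct {
	ID          string `json:"id"`
	Title       string `json:"title"`
	Description string `json:"description"`
	URL         string `json:"url"`
}
```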
Now for the playlist API to get the video data, it has to make a call to the videos API, and the videos API has its own Redis database with all the data about videos. So if we take a look at the full application architecture: the videos web makes a call to the playlist API, which loads all the playlists from its database, and for each one of the videos in a playlist it makes a subsequent call to the videos API, which loads the data from the videos database.

Now if you want to dive into the source code, it's very simple. On the left here I have the tracing folder, and I have the applications Go folder, as most of the applications are written in Go. We have the videos web, which is a simple index.html page and a Dockerfile describing how to run it; the Dockerfile simply uses nginx, copies in all the HTML and CSS, and serves it up. The videos web makes a call to the playlist API, and if we take a look at that one, it's a simple Go application with a Dockerfile. The playlist API makes a call to the playlist database; this is simply a Redis database, and I have a database file here that contains pre-built playlists. The playlist API then makes a call to the videos API, which is again a simple Go application with a Dockerfile describing how to build and run it. The videos API also has its own videos database, as you can see here, with another pre-built database file that contains all our videos. Then we also have a docker-compose file, which helps us build and start all the applications on the same network: in the services section I have the videos web, the playlist API and its database, the videos API and its database, and Jaeger, which I'm going to show you in this example.
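As a rough sketch of what that compose file might look like (service names, build paths and image tags here are my assumptions for illustration, not copied from the repo):

```yaml
# docker-compose.yaml -- illustrative sketch, not the repo's exact file
version: "3"
services:
  videos-web:
    build: ./applications/videos-web
    ports:
      - "80:80"
  playlist-api:
    build: ./applications/playlist-api
  playlist-db:
    image: redis
  videos-api:
    build: ./applications/videos-api
    environment:
      - FLAKY=false   # demo flag we'll flip later to inject errors
  videos-db:
    image: redis
  jaeger:
    container_name: jaeger
    image: jaegertracing/all-in-one
    ports:
      - "16686:16686"   # Jaeger UI
```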
Now feel free to try this out and have a look at the source code. To run it, the readme shows how simple it is: all we do is change directory to the tracing folder, and then we run docker compose build. If I run that, it'll go ahead and build all my container images, and to run all of it, all I do is simply say docker compose up. If you're running this locally, just make sure that port 80 is available. So I'm going to go ahead and run that; that'll start up all my applications, and it'll also tell me how to access the application, so I can just go to this URL on localhost. I can paste that in the browser, hit enter, and notice there's a bit of a delay before our application comes up. You can see here's a bunch of playlists, and in each of the playlists we have videos. Now notice when I refresh this page that there is a couple of seconds of delay.

From an engineering perspective, I would like to know where the time is being spent and what is causing this delay. We have many services: the videos web, the playlist API, Redis, the videos API, another Redis, and possibly more. In reality these services could be written in different languages, perhaps managed by different teams in different parts of the world. Therefore we need a standard way of doing tracing, and this is where OpenTelemetry comes in: it provides a standardized way of instrumenting our systems and producing traces, and that helps us measure application performance and find out where time is spent and where the bottlenecks are.

So what is OpenTelemetry? OpenTelemetry is a set of APIs, SDKs, tooling and integrations that are designed for the creation and management of telemetry data such as traces, metrics and logs. So why do we need OpenTelemetry? I think we've covered most of the reasoning, but there are some good points here. Firstly, it's vendor agnostic, so you're implementing an open standard: if your company is using provider A for metrics and tracing and decides to move to provider B, you don't want to have to rewrite all your applications to change your instrumentation code. OpenTelemetry provides a standard that companies can follow, making it easy to move between vendors. It's a single collector binary that can be deployed in many ways, so it's very flexible and dynamic. It provides an end-to-end implementation, so you can generate, emit, collect, process and export telemetry data. It gives you full control, with the ability to send data to multiple destinations, and it provides an open standard semantic convention to ensure vendor-agnostic data collection. It's also important to know that OpenTelemetry is not an observability back end like Jaeger or Prometheus; it simply provides standard, open source practices for dealing with telemetry data. So in this demo we'll follow OpenTelemetry practices to instrument our microservice architecture, and then we'll choose a supported back-end system like Jaeger or Zipkin to collect and visualize our traces.

Now in order to get traces from our applications, we have to instrument them using code. If we take a look at the OpenTelemetry documentation, they have an instrumenting section which talks about the facilities for instrumenting applications. You can see OpenTelemetry provides a bunch of repositories for different programming languages. They provide a core repository, which is the implementation of the OpenTelemetry API and SDK; that is used to manually instrument an application, which is something we'll be doing today. They also have an instrumentation repository, which contains all the core functionality plus automatic instrumentation for a variety of libraries and frameworks. For example, if you're running a .NET web server, you might have automatic instrumentation capabilities for instrumenting the web server itself, and then you can also use the core library to manually instrument your own functions and business logic. In this documentation they talk a bit about automatic instrumentation as well as manual instrumentation, and on the left-hand side you can see all the different programming languages supported. If we take a look at Go, they point you exactly to where the current release of the library is. In a bit we'll take a look at how we instrument our code.
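To give a feel for what that manual instrumentation looks like, here's a minimal sketch using the OpenTelemetry Go SDK (go.opentelemetry.io/otel) with a stdout exporter. Note the demo apps in this video actually use the older OpenTracing client, so this block is purely illustrative:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Export spans to stdout; a real setup would export to a collector.
	exp, err := stdouttrace.New()
	if err != nil {
		log.Fatal(err)
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	defer tp.Shutdown(context.Background())
	otel.SetTracerProvider(tp)

	// Manually instrument a unit of work with a span.
	tracer := otel.Tracer("playlist-api")
	ctx, span := tracer.Start(context.Background(), "get-playlists")
	defer span.End()
	_ = ctx // pass ctx down so child operations can create child spans
}
```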
Now let's talk a bit about collection. Once we've instrumented our application and we're producing traces, we have to send them somewhere, and that somewhere is a collector. The collector provides a vendor-agnostic way to receive, process and export telemetry data, and it can be deployed in multiple ways; it has some very good documentation. If we take a look at the getting started section, we can see that it supports two styles of deployment. We can deploy the collector as a single instance running as a sidecar, a binary on the same host, or a daemonset, and the applications will then point to that collector and send all their traces there. Or we can deploy it as a gateway, which is basically a standalone service, such as a container or a deployment, meaning you can have one per cluster, one per data center, or have it running in some other region. So it provides a flexible way to deploy the agent, and they provide good documentation on getting started with running it in Docker or on Kubernetes. If we click on the documentation link here, we can see they provide a reference architecture, and here we can see an example of the two types of collector deployments: the application sends its traces either to a collector on the same host, which could be a sidecar container, a daemonset, or a binary installed on the machine, or to an external service running on another node, maybe one per cluster or region. So we can deploy the collector as an agent on the node or as a standalone service. They also talk about greenfield and brownfield environments: if you're testing this in a greenfield environment, you can use the OpenTelemetry collector as the deployed agent, whereas if you're going into more of a brownfield environment, you should deploy a collector which supports the popular open source wire formats, like Jaeger, Prometheus or Fluent Bit. In this example I'll be taking a look at a Jaeger deployment.

Now we spoke briefly about instrumentation, which creates traces from our application, as well as collection, where we receive and process those traces. But what is a trace, and what does a trace look like? When a service A calls a service B, service A will look in the request headers for the trace context. The trace context holds tracing information and acts as the glue that links all the spans together. Service A can create a new span, which has a name and a timestamp, for the entire network call; it adds the span to the trace context and makes the call to service B. Service B receives the request, gets the trace context from the headers, and may create its own spans for whatever work it needs to do, like calling a database: it creates a span to record its overall work, then creates a child span to record the database network call, and when it gets its records back it creates another span to track the data processing. Finally it adds the spans back to the trace context and sends them to a collector. All of these spans together form what's called a trace, which paints the picture of the workflow from service A to service B.

So the trace context is the glue that holds all the trace data together: it links all the spans, and the services pass it to each other in the request headers, so it provides a common way for network calls to be traced. Usually each service adds its own spans to the trace, and a span is a unit of work that has a name, a start time and an end time. It could be a database call, a call to another service, or the processing of some business logic. The task of doing instrumentation is basically the task of creating spans to track any operation you want in your code. So let's take a look at our code to see how we set up all these trace contexts and spans, and then see how we can visualize it all in a system like Jaeger.

Taking a look at a code example, let's look at our GitHub repo, under the tracing folder in the applications Go folder, and let's start with the playlist API, since that is the first API that receives traffic: web requests come from the videos web in the browser and hit the playlist API. So let's take a look at app.go, where we have our code inside of a main function, and briefly run through it. I basically set up and instantiate a connection to the collector: I create a configuration object that connects to Jaeger on this port, I then create a tracer object called tracer, and I set it up, using the OpenTracing library, as the global tracer. Then I proceed to define my business logic, and I have a router to handle GET requests on the root path, so any request to this playlist API runs this GET request code. Now the first thing I do regarding tracing is create a span context, where I read the tracing context from the headers. I do this because there could be services that do some instrumentation before the request hits my service, so it's always good to read the tracing context from the headers first and then extend it further. This is just in case the videos web is doing some instrumentation, or there's some proxy in front of my service that also does instrumentation, so I don't assume that my service is the first one to receive a request.
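As a rough sketch of that server-side pattern with the OpenTracing Go client (the handler and span names are illustrative; the repo's actual code will differ in detail):

```go
package api

import (
	"context"
	"net/http"

	opentracing "github.com/opentracing/opentracing-go"
)

func playlistsHandler(w http.ResponseWriter, r *http.Request) {
	tracer := opentracing.GlobalTracer()

	// Read any incoming trace context from the request headers, so we
	// extend an existing trace instead of assuming we started one.
	spanCtx, _ := tracer.Extract(
		opentracing.HTTPHeaders,
		opentracing.HTTPHeadersCarrier(r.Header),
	)

	// Start a span covering this whole GET request, linked to the
	// caller's span when a trace context was present.
	opts := []opentracing.StartSpanOption{}
	if spanCtx != nil {
		opts = append(opts, opentracing.ChildOf(spanCtx))
	}
	span := tracer.StartSpan("/ GET", opts...)
	defer span.Finish()

	// Put the span into a context so downstream calls can add child spans.
	ctx := opentracing.ContextWithSpan(r.Context(), span)
	_ = ctx // pass ctx down to getPlaylist(ctx), etc.
}
```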
So the first thing I do is create the span context, reading the context from the headers, and the span context is something we pass down to subsequent functions that we want to instrument. You can see here I create a span, and this is a span that will trace the entire operation of this GET request. We start the span and give it a name, / GET, so we can identify it in the Jaeger UI, and I run a deferred Finish; this ensures that the span finishes whenever this function completes. I then proceed to add the span to the context, and I pass the context down to a function called getPlaylist. If we collapse this main function, we can see here's the getPlaylist function, and this is where the playlist API makes its call to the Redis database. Here I'd like to instrument that call to see how long it takes, so I start a new span from the context that we passed in, and I also do a deferred span Finish to ensure that this newly created span finishes whenever the getPlaylist function finishes; that's the reason for the defer statement. Then I go ahead and pass the context down to the Redis GET call as well, so this child span will track the call to Redis and will be appended to the parent span.

Then if we continue with the main function and scroll down, we can see that once the playlists have been retrieved, we loop through each of the playlists and through each of the videos inside each playlist, and we then make a call to the videos API to get each video's data and append it to the video catalog that we pass back. I'd like to create another child span to instrument that work, to see if there's any network latency between the playlist API and the videos API. So I create another span, by passing in our context, and I call this one videos-api GET, since it describes the call that we're about to make. I then proceed to make the network request, and what I do is actually inject the trace context into the headers and pass it along when I make the network call to the videos API. That means the videos API can read the trace context and continue the instrumentation from there, so we'll have a view of the entire workflow. And then we say span.Finish.
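A rough sketch of that client-side pattern (the URL, port and span name are assumptions for illustration):

```go
package api

import (
	"context"
	"net/http"

	opentracing "github.com/opentracing/opentracing-go"
)

func getVideo(ctx context.Context, id string) (*http.Response, error) {
	// Child span tracking the outbound call to the videos API; it is
	// linked to the parent span carried in ctx.
	span, _ := opentracing.StartSpanFromContext(ctx, "videos-api GET")
	defer span.Finish()

	req, err := http.NewRequest("GET", "http://videos-api:10010/"+id, nil)
	if err != nil {
		return nil, err
	}

	// Inject the trace context into the outgoing headers so the videos
	// API can continue the same trace.
	opentracing.GlobalTracer().Inject(
		span.Context(),
		opentracing.HTTPHeaders,
		opentracing.HTTPHeadersCarrier(req.Header),
	)
	return http.DefaultClient.Do(req)
}
```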
So that's the playlist API. If we go ahead and take a look at the videos API, at its app.go, we'll see a similar implementation over here. Firstly we import all the OpenTracing libraries as well as the Jaeger client. If we scroll down, we can see our instantiation of the configuration, where we tell it where the Jaeger instance is; similarly, we create a new tracer object and set it as the global tracer. And here we have a separate route defined for handling the GET call of the videos API. Now the videos API basically takes in the id of a video, calls Redis to get the video data for that video, and returns it as JSON. The first thing we do here is create a new span context, getting the information from the headers; remember that the playlist API will have injected the trace context into the headers, so we receive that context over here, and we're able to append new spans to it. Then we start a new span to track our overall work for this web request: I start a new span called /id GET, and I say defer span.Finish, which will finish the span as soon as this entire GET request is done. Then I create a new context from that span, so I pass the span in, and I pass the context to this video function, which is the function that's going to be making the Redis call to get the video. If we collapse this main function, we can see here's our video function, the one that makes the Redis call to get the video data. You can see it creates a new span from the context that we pass in, and we call this one redis-get, and then we also finish the span: we use a defer here so the span will automatically finish whenever the video function finishes. And what we do here is just a simple Redis GET, passing the video id to Redis and getting the video back. If there are no results, we simply inject the span context back into the headers and return no results; but if there are results, we return that video data, and we also inject the span context into the headers.
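A rough sketch of that child span around the Redis call (assuming the go-redis client for illustration; the address and names are mine, not the repo's):

```go
package api

import (
	"context"

	"github.com/go-redis/redis"
	opentracing "github.com/opentracing/opentracing-go"
)

// rdb is the client for the videos database; the address is illustrative.
var rdb = redis.NewClient(&redis.Options{Addr: "videos-db:6379"})

// video fetches one video's JSON from Redis, wrapping the call in a
// child span so the time spent in Redis shows up in the trace.
func video(ctx context.Context, id string) string {
	span, _ := opentracing.StartSpanFromContext(ctx, "redis-get")
	defer span.Finish()

	val, err := rdb.Get(id).Result()
	if err != nil {
		return ""
	}
	return val
}
```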
So this is how we trace the code. Now, if we go to our tracing folder, go to one of the APIs, like the playlist API, and scroll up to the main function, we can see that this is where we connect to Jaeger: we connect to Jaeger on the address jaeger and this port. If we take a look at our docker-compose file, we can see that we're also running Jaeger here, and we give it the container name jaeger, which is how the services can connect to it; we've also defined the public ports for connecting to the Jaeger user interface. You can see in the docker-compose file that I'm running the Jaeger tracing all-in-one demo image, and this image contains the Jaeger collector and the UI and all the stuff we need inside one image. Now this is not a Jaeger introduction video, but if you want me to do a deep dive on Jaeger, please let me know in the comments down below and we'll proceed to do so. If we look at the ports on this Jaeger image, we're exposing port 16686, which is the port for the Jaeger UI, and if we take a look at our readme, I've also put the link in there, so you can copy it and go to the browser to access the Jaeger UI.

So if we do that, we hit the Jaeger UI, and we can search for our traces, and you can see we're getting two services here already: the playlist API and the videos API. We know that the playlist API is the one that receives the request first, so we select that, we hit "find traces", and we can see here we have our playlist API trace. This is one single trace of a single network request, and that is me going into the video catalog and hitting the enter button; and notice that there was a delay, so we want to find out where this delay is coming from. If I click into this trace, we can see the entire length and all the network calls that happened as part of this trace; every single span is on the left-hand side. If we go to the service-and-operation side here and collapse all, we can see that this is our main span: the first span we defined in the playlist API, the main parent span. If I expand this one, we can see that next up is the playlist API redis-get; this is the call that goes from the playlist API to Redis, as a child span, and we can see that it only took 2.6 milliseconds to return data from Redis. Next we can see that the playlist API makes subsequent calls to the videos API: these are all the videos-api GET calls that are happening, and you can see they're all relatively quick. If we expand one of them, we can see the videos API appending its spans to the trace, and this is the operation of the videos API: we can see that the entire videos API operation took 2.7 milliseconds, but the playlist API's surrounding call took 5.8 milliseconds, so you can see where time is being spent by expanding all these little spans. And if we expand that videos-api GET, we can see the call to Redis: Redis takes 1.6 milliseconds to return the result to the videos API, and the remainder of the time is spent processing that data; the difference between those two times is spent by the playlist API processing the data before sending it back to the browser.

So we can see each subsequent call to the videos API, and here is where the alarming bit comes in: we can see one videos-api GET request happening here, and if we expand it, we can see that it's taking very long to get data back, almost six seconds; and if we expand it further, we can see the Redis call is really quick. So we're getting data from Redis really quickly, but what is the videos API actually doing on that GET call? Now we can jump back to our code and look at the videos API: we go to app.go, and if we scroll down, we can see I've injected some dodgy code here that basically says if the video contains this id and delay is true, then sleep for six seconds. So this is the delay we're seeing inside the tracing UI, and if I go to the docker-compose file and go to the videos API, I basically read that delay flag from an environment variable. This is code that I've injected to show you how you can track delays.

Similarly, we can also track errors. If I comment the delay out and set flaky equals true, and we go back to the videos API app.go, I also have an environment variable that I read called flaky, and basically what I do is generate a random number, and if it's less than 30, I return a "flaky error has occurred" message, which causes a 500 server error. And if you go to the playlist API, into its app.go, and look at where we create this web request, you can see that if the error is not equal to nil, we set a tag called error on that span; that tells the OpenTracing library that an error has occurred, so it'll mark that span as an error.
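A rough sketch of both sides of that (the environment variable name and the roughly 30% failure rate follow the description in the video; the function names and details are my assumptions):

```go
package api

import (
	"math/rand"
	"net/http"
	"os"

	opentracing "github.com/opentracing/opentracing-go"
)

// In the videos API: fail roughly 30% of requests when FLAKY=true.
func maybeFlaky(w http.ResponseWriter) bool {
	if os.Getenv("FLAKY") == "true" && rand.Intn(100) < 30 {
		http.Error(w, "flaky error occurred", http.StatusInternalServerError)
		return true
	}
	return false
}

// In the playlist API: mark the span as errored when the call fails.
func tagError(span opentracing.Span, err error) {
	if err != nil {
		// The "error" tag tells the tracer to flag this span in the UI.
		span.SetTag("error", true)
	}
}
```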
So what I'm going to do is stop everything for a second and run docker compose up again, because we've now enabled that flaky environment variable. Then I head back to my video catalog and refresh it a couple of times, and we can see that an error has occurred here: it's not rendering properly. If we go back to our Jaeger UI, refresh, select the playlist API and hit "find traces", we can now see that traces with errors have come back, and if we click on one of those traces, we can see there are three errors here, and it highlights exactly where those errors occurred. So this gives us good visibility into the network traces, to look for delays as well as errors.

Hopefully you now have a better understanding of what OpenTracing is and how distributed tracing can help you find bottlenecks in your microservice architecture. Remember, all the source code is down below, so be sure to take it for a spin and learn about tracing. If you liked the video, be sure to like and subscribe and hit the bell, and also check out the link down below to the community page. And if you want to support the channel even further, be sure to hit the join button below to become a member. As always, thanks for watching, and until next time: peace!
Info
Channel: That DevOps Guy
Views: 11,606
Rating: 4.9388146 out of 5
Keywords: devops, infrastructure, as, code, azure, aks, kubernetes, k8s, cloud, training, course, cloudnative, az, github, development, deployment, containers, docker, rabbitmq, messagequeues, messagebroker, messge, broker, queues, servicebus, aws, amazon, web, services, google, gcp
Id: idDu_jXqf4E
Length: 23min 38sec (1418 seconds)
Published: Mon Apr 05 2021