Golang Microservices: Observability using OpenTelemetry

Captions
When building microservices we need to care about the internals of our system: we need to measure what is happening behind the scenes, record that data, and react to those values to understand the bottlenecks and perhaps make changes. Hey, my name is Mario, and today I'm sharing with you how to do that using OpenTelemetry, Jaeger, Zap, and Prometheus.

So, what is OpenTelemetry? OpenTelemetry is a set of SDKs, APIs, tooling, and integrations designed for the management of telemetry data such as traces, metrics, and logs; those are the three pillars of OpenTelemetry. The whole point of the project is that it is a vendor-agnostic implementation: it doesn't provide a backend, but it does provide the specification, the configuration, and the libraries and packages you can use with many different programming languages. Because we're building microservices in Go, we're going to be using the OpenTelemetry Go packages.

So what is the current support in Go? Tracing is beta, metrics are alpha, and logging is not implemented yet, but that shouldn't discourage you from using these packages. For Go there are a bunch of exporters, instrumentation packages, and even external vendors that work with the packages currently available on GitHub.

The cool thing about this specification is that all of the different vendors are supposed to follow the same specification and the same way of defining metrics, so you can use different tools for different things while relying on the same documentation to define values such as key-value attributes. In the slides (the link will be in the description if you want to check them out) you can see how this is implemented: OpenTelemetry even defines conventions for different types of calls. Take databases as an example: the convention tells you to use db.system to identify the database you're using, which could be PostgreSQL, Redshift, and so forth. The same idea applies to other attributes, such as the statement or the operation you're executing, and I'm pretty sure you get the idea. All of this makes sense when we define the different segments, when we define the interactions with our own API, and I will show you that next.

So let's jump into the code. As usual, the code for this demo will be linked in the description; you can clone it and play with it. I'm using the most recent OpenTelemetry version, which I believe is 0.19.0. One thing to keep in mind is that, depending on the version you're using, it's likely that the examples won't even compile, which means you really need to look at the source code. That's one of the problems I have when using OpenTelemetry: because it changes so frequently, you may need to change the way you call it, and that could be a turn-off for some people, because if you upgrade to a new version it will literally fail to compile. So that's something to keep in mind.
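Before moving on, here is a minimal sketch of what the database conventions mentioned above look like in code. The function name and the attribute values are mine, purely for illustration; the attribute names (db.system, db.operation) come from the OpenTelemetry semantic conventions.

```go
package postgresql

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// tagDatabaseSpan decorates the span already stored in the context with the
// attribute names the OpenTelemetry semantic conventions define for database
// clients. The values are illustrative.
func tagDatabaseSpan(ctx context.Context) {
	span := trace.SpanFromContext(ctx)
	span.SetAttributes(
		attribute.String("db.system", "postgresql"),
		attribute.String("db.operation", "INSERT"),
	)
}
```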
Now, the changes I made specifically for this video, for covering OpenTelemetry, are related to importing everything we need: exporting metrics to Prometheus, tracing data using gorilla/mux, logging some data — which, like I said before, is not actually supported by OpenTelemetry, but I still want to show you how to log data using Zap — and finally Jaeger, where we're going to be doing some distributed tracing across the requests we receive, and I will show you the user interface so you can get an idea of what's happening.

After you import all of these packages, what you have to do is look at the two functions I have defined here: one of them instantiates the exporter, which happens to use Prometheus, and the other one is for Jaeger — the tracer and the configuration for that tracer. For the metrics it's literally just what you see on the screen: you instantiate the Prometheus exporter and set it, and then for Jaeger you specify the tracer. What I added as well, for some extra metrics, are metrics for the runtime — the runtime of the service I'm running locally. I have been running it for a while, so you can actually look at those values in the Prometheus user interface and see how cool that is. Other than that, everything else is the same.

Oh, one thing I forgot: because we're going to be adding segments — or rather, specifying a few different ways to clearly call out what we're using when interacting between the different layers of our implementation — we get to define those configurations when using OpenTelemetry. An example: if we jump to the service package and look at task, you will notice that we have a span right here. When we get a request — in this example the create request, the POST — it creates a segment, and then we can define spans: think of it as one segment, and for each of the different calls we make we can define a span. The idea is that you can measure those spans and, depending on how fast or slow they are, make changes. Let's say the database is taking too long; perhaps it makes sense to add a caching layer in between, and you can determine that by looking at the actual results coming from Jaeger and all the tracing we're doing. So we're doing this in the service layer, and if we go to postgresql and look at tasks you will notice something similar: here we are defining a new span that comes from the context. If you're not familiar with the context, it is a nice way to pass values between different layers, sort of, and I will definitely cover how context works behind the scenes; but in the case of OpenTelemetry, the context parameter that is passed in is the one that actually passes down the value for the span or trace created at the beginning, so that we can refer to the same one and chain each new span onto the one created before.
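A minimal sketch of that chaining, with made-up type and field names rather than the repo's exact code: each layer starts a child span from the context it receives, which is what lets Jaeger stitch them into one trace.

```go
package service

import (
	"context"

	"go.opentelemetry.io/otel"
)

// TaskRepository is a stand-in for the persistence layer (e.g. postgresql).
type TaskRepository interface {
	Create(ctx context.Context, description string) error
}

// Task is the service-layer type; the real repo uses its own names.
type Task struct {
	repo TaskRepository
}

// Create opens a span as a child of whatever span is already in ctx (the HTTP
// middleware's span) and passes the enriched context down to the repository,
// which starts its own child span the same way.
func (s *Task) Create(ctx context.Context, description string) error {
	ctx, span := otel.Tracer("service").Start(ctx, "Task.Create")
	defer span.End()

	return s.repo.Create(ctx, description)
}
```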
Now, if you look at all of this, you will notice that in main I'm also defining the middleware for the gorilla/mux instrumentation I was telling you about. So if we look at the user interface that we built previously — all right, we didn't build a user interface, we're using the... what is the name? Did I forget to add it here? Yeah, I forgot, so let's add it: the Swagger UI. If you remember, we set up a Swagger UI in a previous video, and again the link will be in the description if you haven't seen that one. What I want to show you is that if we look at one of the tasks we have right here and call it through the Swagger user interface and execute it, you will notice that everything works as expected. I didn't actually change anything in the domain or in the repository or any of that; we are just adding new values to define the different spans, a way to identify where the calls are being made.

Now if we look at Jaeger — where are you? it should be right here — and I refresh, you will notice that something just happened. If I jump into the task specifically (because we're doing the GET request), you will notice there is a request that just happened right here, and it has three spans. This is what I was telling you previously: the request happens at the handler layer, the HTTP layer, then goes to the service layer, and finally goes into the database, and you can measure how slow or fast each one is and how long it takes just by looking at how the user interface presents it. In this case this one is taking about three milliseconds, the one for the service is also about three milliseconds — oh, actually those are microseconds — but you get the idea: this is the time taken by each one of the calls in the request.

The same idea applies if I go and create a new task; let's call it "new example" — oh, that's a typo, apparently I don't know how to write "example" — with high priority. If I run it, it gives me a 201, which is expected. If I jump into Jaeger and find traces, it's not going to be there, because the operation I selected is not the right one; but if I look at tasks and find traces, you will notice it's right here, and again it has three spans, which are the different calls we previously defined: the call to the service Task Create and the Task Create in the postgresql package, which is defined right here. And if you paid attention to the way the task is implemented, there is an attribute called db.system that indicates the database system being used, in this case postgresql, and you can see it right here as a tag that comes through Jaeger via the OpenTelemetry SDK and is displayed back in the Jaeger user interface. This is really cool, right?
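For reference, the gorilla/mux instrumentation mentioned above boils down to registering one middleware on the router. A rough sketch, assuming the contrib otelmux package and a made-up service name and port:

```go
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/mux"
	"go.opentelemetry.io/contrib/instrumentation/github.com/gorilla/mux/otelmux"
)

func main() {
	r := mux.NewRouter()

	// Every request gets a server span named after the route; the span is put
	// into the request context, which is what lets the service and database
	// layers chain their child spans onto it.
	r.Use(otelmux.Middleware("rest-server"))

	// ... register the /tasks handlers here, as in the repo ...

	log.Fatal(http.ListenAndServe(":9234", r))
}
```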
Now, in the case of Prometheus, because we're doing some metrics, you can see that I decided to use the runtime instrumentation, which is an implementation available as part of the official repositories coming from OpenTelemetry. The way the OpenTelemetry team has organized the exporters and instrumentations is that you can go look at the official ones and start using them: they have one for the net/http package, one for the gomemcache package from Brad Fitzpatrick, a few for Kafka, the one for gorilla/mux, as you can see, and a few others for Jaeger, Zipkin, and Prometheus. So there are already a few different ways to interact with the OpenTelemetry API without committing to a specific, concrete backend system.

Going back to the one I was telling you about, the runtime instrumentation, which is right here: as you can guess, it literally collects the runtime values — how many goroutines are running, how much memory is in use, stack and heap figures, that kind of thing — and you can use those values to render different graphs. So if I query, I don't know, goroutines — how many goroutines are running — and execute, at the moment it looks like I'm running eight goroutines. Or let's look at heap allocation: there is some heap allocation happening on my server right now. All of this is super cool, and not only that: keep in mind we have an SDK, coming from OpenTelemetry, that is not tied to a backend — it's vendor agnostic — so you can literally connect this to different providers. A few that exist at the moment: Lightstep, for example (all of these will be linked in the description if you're curious), and I know that Datadog and New Relic also have packages that let you interact with OpenTelemetry.

And if you look at how this is implemented, not only can you do all of that, it also lets you measure metrics across different services when they connect and interact with each other. Specifically, the thing I'm referring to is this: if you remember, a few videos ago we discussed building a CLI using the OpenAPI 3 client — again, the link to that video will be in the description — and it basically uses the API that we built before: it creates a new task, updates it, and then gets it. So it performs a few different actions against our REST API through its OpenAPI 3 definition. If we go and look at the user interface, what you're going to see is a new CLI app that I executed right here, with this argument and this environment variable, posting to our own REST API. And this is the cool thing about the tracing that OpenTelemetry gives us: you can trace and connect different services — in this case a CLI interacting with our own API — so you can determine, hey, this request went through these different steps. You can see here that the CLI is interacting with the to-do API: it hits the tasks endpoint with a POST, which then calls the service create, and finally makes the database call that creates the record in the database.
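That cross-service view works because the CLI's HTTP client injects the trace context into its outgoing requests. A minimal sketch of that, assuming the contrib net/http instrumentation; the URL, payload, and propagator setup are illustrative, and the tracer provider is assumed to be configured elsewhere, mirroring the server:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"strings"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func main() {
	// The CLI needs a propagator so the trace context ends up in the request
	// headers (W3C Trace Context here).
	otel.SetTextMapPropagator(propagation.TraceContext{})

	// Wrapping the transport makes the client create a span per request and
	// forward the trace context, so the server's three spans show up as
	// children of the CLI's span in Jaeger.
	client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}

	req, err := http.NewRequestWithContext(context.Background(), http.MethodPost,
		"http://localhost:9234/tasks", strings.NewReader(`{"description":"new example"}`))
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```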
So when we are working with distributed systems, just consider how they connect different things: this is a fantastic way to determine how all of that happens. And it's not the only thing. Let's say there are errors, because there are always going to be errors: if I set a priority value — let's call it "invalid" — that is going to trigger an error, an invalid request, and all of those values are also saved in our Jaeger user interface, because we're tracing them. It's right here: there is an error we are recording as well, and it says it's a converter unknown value, which is what we're expecting. Super amazing.

Now, how cool is all of this? Well, if we think about it, we have a way to — oh, I forgot something. The other thing is that, because logging is clearly not implemented in OpenTelemetry for Go just yet: there is a specification for determining and indicating how to log values properly, depending on the fields and on what we're trying to save, but sadly the Go package doesn't support it at the moment; it's not implemented at all. So the way I'm trying to give you an idea of how this will work in the near future is by using the Zap package that comes from Uber: I'm literally just defining a new logger, and what I did for this simple use case is define a middleware that logs and records the requests that are coming in; that's why you see this.

Now, you might be wondering, what is this metrics endpoint? Well, the way I implemented the integration with Prometheus is that I define a new handler right here, which happens to be the exporter that was defined down here next to the tracer, and that one allows Prometheus to pull the values from our instance. If you look at the code — and I highly encourage you to look at it — you will notice how it's configured: Prometheus is pointing at our local server on port 9234 at /metrics, and that's how Prometheus pulls the values from our server and how it can save and render that data on its end. So that is really cool. That's how we cover metrics, we cover traces, and we cover logs.

Going back a little bit to logs: again, sadly there is no way to use them in OpenTelemetry yet, but consider that if you're using something like Logstash, or maybe something like Splunk, you can receive the data, ingest those logs, and do something with them — perhaps build some dashboards, or filter the values depending on the level you're using, or depending on the message (in this case I'm using the message for the method, but you could define a different field to indicate different things) and react to those values. The cool thing about Zap and logging is that it allows you to do sampling, so you don't have to save all the records all the time; you can sample the logs depending on what you're trying to do, and again, this depends on what we're trying to achieve with the logs.
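Putting the pieces from this section together — the Prometheus exporter, the runtime metrics, the /metrics handler, and the sampled Zap logger — here is a hedged sketch assuming the v0.19-era import paths (they have moved in later releases); the function name and the sampling numbers are mine, not the repo's exact code.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/gorilla/mux"
	"go.opentelemetry.io/contrib/instrumentation/runtime"
	"go.opentelemetry.io/otel/exporters/metric/prometheus"
	"go.uber.org/zap"
)

func wireObservability(r *mux.Router) (*zap.Logger, error) {
	// Prometheus exporter registered as the global meter provider.
	exporter, err := prometheus.InstallNewPipeline(prometheus.Config{})
	if err != nil {
		return nil, err
	}

	// Runtime metrics: goroutines, heap/stack, GC, collected about once a second.
	if err := runtime.Start(runtime.WithMinimumReadMemStatsInterval(time.Second)); err != nil {
		return nil, err
	}

	// The exporter doubles as an http.Handler, so exposing it at /metrics is
	// all Prometheus needs to scrape this service on localhost:9234.
	r.Handle("/metrics", exporter)

	// Zap with sampling: keep the first 50 entries per second for a given
	// message, then one out of every 100 after that.
	cfg := zap.NewProductionConfig()
	cfg.Sampling = &zap.SamplingConfig{Initial: 50, Thereafter: 100}
	logger, err := cfg.Build()
	if err != nil {
		return nil, err
	}

	// Request-logging middleware, standing in for the OpenTelemetry logging
	// support that doesn't exist in Go yet; the method becomes the message.
	r.Use(func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
			logger.Info(req.Method, zap.String("path", req.URL.Path))
			next.ServeHTTP(w, req)
		})
	})

	return logger, nil
}

func main() {
	r := mux.NewRouter()
	if _, err := wireObservability(r); err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.ListenAndServe(":9234", r))
}
```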
Most of the time logging is not really necessary unless you're trying to look at something that is happening frequently and you need more information about it — and even then, maybe instrumentation covers those cases as well, so it depends. Still, we have the option to log alongside OpenTelemetry.

Now, with all of that being said, what are my final thoughts about OpenTelemetry? I think OpenTelemetry is a fantastic idea, because it allows you to specify and define things that multiple services can use to communicate with each other. Think of cases where I'm using Go here, somebody somewhere else is using Java, and somebody else is using C#, and all of the different microservices can communicate with each other and pass down those values for tracing purposes, to indicate how the transaction travelled through the different requests. That, I think, is the whole point of all of this: having a concrete, specific way to define those kinds of metrics, and a way to visualize those values as well. And because it's vendor agnostic, you can literally build your own tool, or use something like Prometheus or Jaeger, really easily. Now, the downside is that the support that currently exists is not as good as it could be — it definitely works, but it is still a work in progress; that's what I'm trying to say, and that's something you need to consider as well.

That's all I have for now. Thank you for watching, and again, if you have any questions just let me know. I will talk to you next time. Take care.
Info
Channel: Mario Carrion
Views: 1,874
Keywords: golang, microservices, golang microservices, build microservices, golang web development, golang rest api, golang tutorial, building microservices golang, golang microservices tutorial, golang newrelic, golang opentelemetry, golang error wrapping, golang jaeger, golang zap, golang prometheus, golang opentelemetry tutorial, opentelemetry golang, golang lightstep, golang tutorial opentelemetry, golang observability, golang monitoring, golang logstash
Id: bytCFQJ43DE
Length: 21min 45sec (1305 seconds)
Published: Fri Mar 26 2021