Monitoring, the Prometheus Way

Captions
So yeah, I'm Julius, one of the creators of the Prometheus project. This is not really a black belt talk, it's more of an overview and intro talk to Prometheus, with some Docker stuff tacked onto the end. I'm going to talk a bit about what Prometheus is and why it does things the way it does, and then at the end show you how you can monitor stuff on Docker with Prometheus.

First of all, what is Prometheus? It is a systems monitoring tool with a built-in time series database, and we care about the entire path of monitoring, not just storing data. We give people client libraries to instrument their services, or other ways of getting metrics out of the things you care about; we collect those metrics, store them, and then let you do useful things with them: query the metrics for ad-hoc debugging, do alerting based on the metrics, or of course dashboarding, having nice graphs on the wall. We care about all levels of the stack: it's not just a network monitoring or host metrics monitoring system, it goes all the way up to service- and application-level metrics. And Prometheus was especially made for dynamic cloud environments where things are moving around a lot; that theme will come up a couple of times.

People are sometimes confused about what Prometheus is and what it isn't. We very explicitly do not do logging or tracing, where you care about individual events, or about what path, with what timings, a request took through your entire stack. We only do numeric time series metrics. If you want logging, use something like Elasticsearch; if you want tracing, maybe OpenTracing with Zipkin. Typically you want all three, metrics as well. We also don't do any kind of automatic anomaly detection, where the system just looks at the collected data and automatically alerts you when something looks odd; instead we let you define very explicit, possibly complex alerting rules. Also, Prometheus itself only has local storage, which of course is not infinitely durable or scalable, but that's by design.

Prometheus was started in 2012. Matt Proud and I had both come from Google to SoundCloud back then. We started it as open source from the beginning, but only really published it in 2015, and shortly after, last year, we joined the Cloud Native Computing Foundation as the second project after Kubernetes. Since then everything has just been growing and growing.

So why did we build Prometheus? SoundCloud back then was in a fairly unique situation: it had an in-house, self-built cluster scheduler, before Docker, Kubernetes and so on existed, which already used containers and already scheduled services around dynamically. SoundCloud already had hundreds of microservices and thousands of instances, and these were moving around between hosts and ports every day. The monitoring tools SoundCloud used back then, Graphite, Ganglia, StatsD, Nagios and so on, were really from a different age. They weren't made for this kind of dynamic world; they were made for a world where one host runs one type of application, you know where things are, and you can configure things pretty statically. Graphite's data model and scalability also weren't enough anymore to pinpoint exactly where a problem like a latency spike is happening in your services: is it only happening in one service instance, or is it happening across your entire service? Having that ability to really drill down, but also to aggregate up and do fancy computations on your time series data, was missing. For those kinds of reasons, and a couple more, we eventually decided to build Prometheus.
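To make the instrumentation side mentioned above a bit more concrete, here is a minimal sketch of a Go service exposing a counter for Prometheus to scrape, using the official client library. The metric and label names are just illustrative, not taken from the talk.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Example counter tracking served HTTP requests, partitioned by path and status.
var httpRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total number of HTTP requests served.",
	},
	[]string{"path", "status"},
)

func main() {
	prometheus.MustRegister(httpRequests)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequests.WithLabelValues(r.URL.Path, "200").Inc()
		w.Write([]byte("hello"))
	})

	// Prometheus scrapes this endpoint over HTTP in its text exposition format.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```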
The architecture of Prometheus looks roughly like this. Typically you run one or multiple central Prometheus servers in your organization, depending on your scaling needs or your organizational needs. The Prometheus server is the heart of everything, it's where the magic happens. It knows where the things are that you care about; we call these targets, and Prometheus scrapes metrics from those targets over HTTP, in a format that we define. These targets can either be your own web applications that you can directly instrument to expose metrics, or things that you can't stick metrics code into directly, like the Linux kernel or a MySQL daemon. For those we have what we call exporters: little sidecar jobs that you run next to the things you care about, which translate that thing's metrics format into the Prometheus format. Prometheus then stores the data, and you can query it through Grafana, the HTTP API, or the built-in UI. You can also configure Prometheus to calculate alerts based on the collected data, and any alerts that are firing get sent to a component we call the Alertmanager, which does the final aggregation, routing and dispatching to email, Slack, PagerDuty and so on. And how does Prometheus know where the things are that you want to scrape? The answer is service discovery; more on that later.

I would say the main selling points of Prometheus are these four: a dimensional data model that allows you to track in detail what each metric and time series is about and where it's coming from; a real query language to work with the data collected in that data model; the simplicity and efficiency of running a single node; and, again in the theme of handling dynamic systems, service discovery integration. Let's go through these four one by one.

First, the data model. We store time series, and that's nothing really new: a time series has some kind of identifier and then timestamp/value pairs, where the timestamp only goes up and the value can go up or down. In our case the timestamps are int64 values with millisecond precision and the values are just float64, because that turns out to work pretty well for operational systems monitoring. The big differentiator from previous systems is how we identify time series. Looking back at Graphite or StatsD, for example, you would typically see metrics that look like a single, long metric name with dot-separated components, in this case the total number of HTTP requests served by nginx on different hosts and paths, with different status code responses. The problem with this is that user-level semantics are encoded into the metric name, and it implies a hierarchy that doesn't really exist: why should the status code be lower in the hierarchy than the path, or the host IP lower than nginx? It's artificial, you have to squeeze your dimensions into it, and it's hard to extend: if you want to add another dimension, do you tack it onto the end, and does that break existing queries? So what you really want, and what Prometheus starts from, is a label-based dimensional data model.
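As a rough illustration of the difference (the exact metric and label names here are made up for the example):

```
# Graphite/StatsD style: one long dotted name with an implied hierarchy
nginx.host-10-0-0-1.index.http_requests.500.count

# Prometheus style: a metric name plus explicit key/value labels
http_requests_total{job="nginx", instance="10.0.0.1:80", path="/index", status="500"}
```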
In this model you have a metric name and then just key/value pairs attached to that metric name, and every unique set of key/value pairs gives you one time series. This is the same kind of label-based data model that Docker has with labels and that Kubernetes has with labels, so it fits together nicely.

If we look at how we query this, it becomes even clearer. In the Graphite case, if you wanted to select only those HTTP requests with status code 500 served by nginx, you would have to know exactly which dimensions to wildcard out, and you would also have to know the meaning of the remaining ones. In the Prometheus model, the simplest query is just a metric name, which gives you all time series with that metric name. Typically you then want to filter, so in this case we filter by job="nginx" and status="500", giving you the same result but in a much more explicit way. So you really want these kinds of first-class key/value labels.

Now that we've collected all that nice data, what useful things can we do with it? Prometheus has its own query language, and it's not SQL-style, which sometimes trips people up at first; it's more of a functional language. We actually think it is better at the kinds of time series computations we typically want to do than any of the SQL dialects we have seen out there for that purpose.

Let's look at some examples. Say you have a node exporter metric that tells you, for every filesystem partition in your infrastructure, how big it is in bytes, with a couple of labels attached, like the device, the mount point, and so on. Now you want to ask the monitoring system: give me all of the partitions that have a capacity greater than 100 gigabytes, but that are not mounted on root. First we query for the metric name, then we add a negative filter on the "/" mount point, then we divide all of the resulting time series by a billion to get from bytes to gigabytes, and then we filter the result set further to keep only the series whose value is greater than 100. We end up with a fully labeled result list of the partitions not mounted at root that are bigger than 100 gigabytes.

A different example: typically you want to know your error rate ratio. Say you have a metric that tracks all HTTP requests. It's a counter, it just goes up, and that by itself is not very useful, so let's first take a rate over it. On the left-hand side of the operation we take only the requests with status code 500, take a rate averaged over five minutes, and sum up all the time series that come out of that; then we divide by the same thing, but for all requests, not just the 500s. Out of that you get a single number that tells you the overall error ratio. Typically, though, you'll want to preserve some of the dimensions. Say you want to see this split out by the path dimension, which is also attached to this metric: you can just add a "by (path)" modifier to both of those sums, and that dimension is preserved. The division operator, or really any binary operator in Prometheus, knows how to match vector elements on the left-hand side and the right-hand side based on equal label values, and you can do fancier things there too, like grouping from one side to the other, depending on which dimensions you have where.
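Written out in PromQL, those two examples look roughly like this; the metric names are the current node_exporter name and an assumed http_requests_total, so the exact spellings used in the talk may differ:

```promql
# Partitions not mounted on "/" that are larger than 100 GB
node_filesystem_size_bytes{mountpoint!="/"} / 1e9 > 100

# Ratio of 500-status requests to all requests, preserved per path
  sum by (path) (rate(http_requests_total{status="500"}[5m]))
/
  sum by (path) (rate(http_requests_total[5m]))
```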
You can go fancier: you can collect histograms of latencies and then, at query time, calculate quantiles from them, saying "I want this quantile, at this aggregation level, over this much time". These are all things you decide at query time. The language takes maybe a while to learn, but it really pays off.

Once you know the language, you can use it in different places. I usually start out building expressions in the built-in expression browser in Prometheus, where you can experiment with expressions a bit and always get the current value of all the resulting time series. Once you've narrowed that down, you typically want to graph it, and there's a very simple graphing interface built in. At some point you'll want to share nice graphs with your colleagues, so you switch to Grafana for that. Grafana has native Prometheus support: you just add a Prometheus data source and then you can use the exact same PromQL expressions in Grafana; this example dashboard here is actually from Weaveworks.

You can also use PromQL for alerting. The way this works is that you define an alert name, in this case Many500Errors, and then you provide any PromQL expression that outputs a list of time series, in this case all the paths that have an error ratio larger than 5% (I'm multiplying by 100 to get from a ratio to a percentage, and then filtering by greater than 5). Each of the resulting time series becomes an alert, and each inherits the labels of its time series. You can add extra labels, like severity="critical", that can be used later in the Alertmanager to route the alert to, say, a pager instead of an email, and you can add annotations for the little message snippets that get put into your notifications.
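As a sketch, a rule along the lines of that Many500Errors example might look like this in the current YAML rule format (the 1.x rule syntax in use at the time of the talk looked different, and the metric name is assumed):

```yaml
groups:
  - name: example
    rules:
      - alert: Many500Errors
        # Fire for every path where more than 5% of requests return status 500.
        expr: |
          (
              sum by (path) (rate(http_requests_total{status="500"}[5m]))
            /
              sum by (path) (rate(http_requests_total[5m]))
          ) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High 500 error ratio on path {{ $labels.path }}"
```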
Cool. On to the third point, operational simplicity. Prometheus itself is a single static Go binary, I don't know how often you've heard that already, and it has only local storage. We explicitly avoid clustering in Prometheus, because we think clustering is hard, and it's also the first thing that might break exactly when you really need your monitoring system, during a network partition or something like that. The way you get high availability instead is by running two identical Prometheus servers that pull in exactly the same data but know nothing about each other; they compute alerts independently and send them independently to a highly available Alertmanager, which then deduplicates them.

The local storage is often enough, at least for smaller organizations, even with a single big Prometheus server: we currently do around a million ingested samples per second, and with the new storage engine that is coming soon that number will probably get quite a bit larger. If you want to track in dimensional detail where metrics come from, it's also important to support many time series on a single host; we support a couple of million series currently, and that will also go up in the future. We implement a variant of Facebook's Gorilla time series encoding and end up using roughly 1.3 bytes per sample. So the local storage is great for keeping maybe a couple of weeks, maybe even months, of data, but don't treat it as durable or infinitely scalable storage; it's not meant for that.

Instead, if you want that, we recently introduced a remote read/write protocol with Prometheus 1.6. Prometheus sends every sample that it scrapes to a remote endpoint that you specify, and you can implement any kind of adapter at the other end that writes, for example, into InfluxDB or OpenTSDB or whatever. If you want, you can also implement the read-back path, so that you can read back from your long-term storage system through PromQL. We have an example adapter that already does this for InfluxDB, OpenTSDB and Graphite on the write side; for reading back we only have InfluxDB at the moment. There's also Cortex from Weaveworks, a hosted Prometheus in the cloud, which uses exactly these protocols, so it is a full native Prometheus long-term storage read/write implementation; I encourage you to take a look at that.

And on to the fourth point, service discovery. As I said in the beginning, dynamic environments nowadays pose some new challenges: VMs are being scaled up and down, on top of those you have cluster schedulers moving things around very dynamically, and microservices mean that in general you have more services than before, and more service instances. So how do you make sense of all this mess? Prometheus's answer, in general, is to use whatever service discovery mechanism your environment gives you to discover targets, because you can't really configure Prometheus statically to know about everything anymore; you can do that in small environments, of course, but it's not realistic in a cluster environment. Prometheus then uses the service discovery data, the discovered targets, first to know what the world should look like, because a monitoring system should know that, and second to know which HTTP endpoints to actually pull metrics from. That automatically gives you a health signal: if Prometheus cannot pull from somewhere, it already knows something is wrong there, and you can alert on that: the target is down. If the service discovery mechanism is a good one, it will also give you some metadata about the objects you've discovered; it might tell you, for example, that this nginx over there is an environment="production" instance. If you so choose, you can map those metadata labels into your time series, or you can even say: anything with this label, don't monitor it, drop it, and so on.
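Two of the things from the last few paragraphs, sketched as prometheus.yml configuration; the adapter URL is hypothetical, and EC2 is used only as an example of a discovery mechanism that exposes metadata tags:

```yaml
# Ship every scraped sample to a remote adapter, and read back through it.
remote_write:
  - url: "http://remote-storage-adapter:9201/write"
remote_read:
  - url: "http://remote-storage-adapter:9201/read"

scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - region: eu-west-1
    relabel_configs:
      # Only keep targets whose "environment" tag is "production" ...
      - source_labels: [__meta_ec2_tag_environment]
        regex: production
        action: keep
      # ... and carry that metadata over as a label on the scraped series.
      - source_labels: [__meta_ec2_tag_environment]
        target_label: environment
```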
Prometheus has built-in service discovery support for a couple of VM providers for discovering nodes, for some of the cluster schedulers, such as Kubernetes and Marathon, and for some generic mechanisms like DNS or Consul. And if Prometheus does not have exactly the mechanism that you need, you can write your own and plug it in through a file-based interface.

But of course we are at DockerCon, and you might be interested in how you can discover your Docker things with Prometheus. Unfortunately there is no settled answer to this yet, it's still early days and we're talking about it, but there are some approaches, and one already works pretty well today. Among its service discovery mechanisms, Prometheus supports discovering things by DNS A records, and it just so happens that if you have a Docker Swarm cluster and run services on it, Docker will expose a tasks.<service-name> A record for you that gives you the IPs of all the individual instances, so that you can go and query them. This works today. The downsides are that it doesn't give you any port metadata, DNS just gives you an IP address, and I don't think Docker gives me any SRV records yet that could include that information; it also doesn't give me any other metadata, like labels. And the big thing it doesn't solve yet is scraping across Docker networks: typically, if you run many microservices, you might want to put them on different Docker networks so that they are all isolated from each other, but your monitoring system really needs to be able to reach all of them to pull in the metrics. Still, this approach is already good for host- and container-based metrics, by which I mean externally observable metrics like the CPU or memory usage of a container, but not really service-specific metrics, because for those you need to reach all the actual microservices.

There is also a proof of concept by the people from Container Solutions, in collaboration with people from Weaveworks. It's a little daemon that has to run on a Docker Swarm manager node so it can talk to the Docker socket and to the Swarm API over it; it discovers the targets for you that way, writes out a target file for Prometheus to consume with the file-based plugin mechanism (Prometheus needs to run on the same node to use that file), and it also handles attaching all the required Docker networks to your Prometheus. This is obviously a bit of a hack and not the best solution yet, but it already gives you some metadata: during the service discovery phase it provides all the Docker and service labels for the discovered tasks. Ideally, we'd be able to reach some kind of centralized service discovery API from Prometheus, the way it works in Kubernetes, where we can just talk to the API server and it gives us all of that, and we also need to find a viable way for Prometheus to reach all the metrics endpoints. There is a Docker issue about this, 27307, so feel free to chime in there.

Finally, Docker 1.13 added native Prometheus metrics to the Docker engine. This means that Docker will serve an HTTP endpoint providing metrics about its own internal state, so not really metrics about the containers it is running, but about things like which Docker engine actions have been taken. To enable this you still need to set the experimental flag to true, and there is another flag that specifies the address and port on which to serve this metrics endpoint.
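In daemon.json terms, that configuration might look roughly like this (9323 is just the conventionally used metrics port, not something mandated by the talk):

```json
{
  "experimental": true,
  "metrics-addr": "0.0.0.0:9323"
}
```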
You'll then get metrics that look something like this.

So now I'm going to attempt a demo; I hope the network is with me. Basically I'm going to go with the DNS A record based service discovery and only monitor the cluster- and host-level metrics components. I'm going to run three global services: a little socat TCP forwarder that, on each machine, lets us reach the metrics endpoint of the Docker engine; the node exporter, which is a Prometheus component that gives us node metrics, host metrics; and cAdvisor, a tool from Google that gives us cgroup metrics, so basically container metrics, memory usage, CPU usage and so on. We're going to use A records: because these exporters run as global services, we can discover one of each on every node via the tasks.engine, tasks.node-exporter and so on A records. And then we're going to run Grafana and Prometheus and play with the data a bit.

So let's start with that. Jerome was kind enough to provide me with a five-node Docker Swarm cluster on AWS, and right now it's just empty. Ha, that's funny, because it just worked a moment ago: "no route to host". Should I reconnect? Let me try. Well, the uplink on the hub for the wired connection isn't working, maybe I need to kill NetworkManager, does anyone know what's going on here? Ah, maybe that's the problem. Okay, systemd. Okay, let's hope this works now.

docker service ls: nothing there. But I do have, what was it, docker node ls: I have five hosts, and they're part of a Docker Swarm. I also have this directory where I have a prometheus.yml prepared. I scrape everything at five-second intervals; Prometheus also scrapes itself, just for good measure, and then it scrapes the node exporter, which runs as a global service, cAdvisor, and this little proxy to the Docker engine. Simple enough, that's the entire config.

Now we need to somehow get that config into a container with Prometheus, and as a hack for this I have a tiny Dockerfile here that uses the upstream Prometheus Docker image and copies in our custom prometheus.yml. To use that, I'll bring up a Docker registry in the Swarm cluster, then build my custom Prometheus image and push it to my own registry. So, that failed, the build fails, but it should be okay because I built it before. Ta-da, I have a registry. Well, we will see whether Prometheus comes up; I just got a comment from over there that the build fails, but cool.

What else do we have here? We have a compose YAML, a Docker Compose file with the services defined. I define a network, the Prometheus network, on which all of my services will live. I have Prometheus itself on there, I have Grafana on there, and I have the node exporter, and you'll notice that I have to mount some of the system directories, /proc, /sys and so on, from the host into that container, because that's how the node exporter gets its data about the host system. It's similar for cAdvisor, which also needs some of these directories mounted in. And then, to reach the Docker engine, I have a little socat image here that just allows me to proxy through to the Docker engine's metrics port.
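Reconstructed from that description, the prometheus.yml for this setup probably looked roughly like the following; the service names and ports are assumptions based on the defaults of the tools involved, not copied from the talk:

```yaml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.node-exporter']  # Swarm's per-service DNS A records
        type: A
        port: 9100

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: A
        port: 8080

  - job_name: 'docker-engine'
    dns_sd_configs:
      - names: ['tasks.docker-exporter']  # the socat proxy to the engine's metrics port
        type: A
        port: 4998
```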
Cool, so let's bring that up: docker stack deploy, and I'll name my stack prometheus. And hopefully... let's say I go to host one. Which port should I take? Maybe I'll look at node metrics first. This is the metrics endpoint of the node exporter, it gives me host metrics about that host. On a different port, for example 4998, it gives me Docker engine metrics, and so on. These are the kinds of metrics endpoints that Prometheus pulls from. Prometheus itself we have at port 9090, and in the targets section we should be seeing that everything looks good: we have discovered five targets for each of these services, because there's exactly one instance of each running on every host.

That's cool, so we can actually play around with the data a bit. Let's start with node CPU usage. This is the total number of seconds spent in each CPU, in each mode, on every host, so you have this dimensional data going on here, and it gives you the current value for each of them. These are counters, so you're not really interested in the actual value, only in the rate of increase, so let's take a rate averaged over one minute. We might also not be interested in all of the dimensions; I don't care about the individual CPU, so let's sum the rates without the CPU dimension. I'm also not interested in the mode dimension, but I have to be a bit careful there: I need to filter out mode="idle" first, because I don't want to count idle time as usage. And now I get, for every node, the actual total CPU usage. That's pretty cool.

Let's check whether our Grafana also came up; again not localhost, but host number one. Super secret username, super secret password: admin, admin. Good thing I didn't give anyone the IP. First step, we need to add a data source. We conveniently select Prometheus here, call it Prometheus, and for the URL the service name is just prometheus, so this will be prometheus on port 9090, proxied through Grafana. Add, test the data source, it's working. So let's create a dashboard and add a graph. I like the expression I built already, I could actually graph it over time in Prometheus to see what happens, they are so pretty, but I'm going to take it and plug it into the query field here, and I get it in Grafana. We don't have that much data yet, so I'll select 15 minutes for now. And maybe you're not really interested in the exact legend format, only in the instance label, so you can put a format string into the legend format field, and now we only get the actual host. We'll give this a nice title, host CPU usage, go back to the dashboard, and save this demo dashboard.

We have some other metrics we could look at; there are container metrics. Let's maybe go to the console first: container CPU usage in seconds again, but this time for the individual containers running on these Docker hosts. Again we get a CPU dimension and some other dimensions in here. First of all, I only want to see containers that are actually Docker containers and not just some random other cgroups, so I filter on, what was it, I think the id label: if the id matches /docker/ followed by anything, I get only the Docker ones. And again I don't care about the CPU dimension, and I want to take a rate, because it's a counter and I don't care about the total number of CPU seconds ever spent.
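Put together, the two demo expressions probably looked roughly like this; note that node_cpu was the metric name in 2017-era node_exporter, while newer versions call it node_cpu_seconds_total:

```promql
# Per-host CPU usage, all modes except idle
sum without (cpu, mode) (rate(node_cpu{mode!="idle"}[1m]))

# Per-container CPU usage from cAdvisor, restricted to Docker cgroups
sum without (cpu) (rate(container_cpu_usage_seconds_total{id=~"/docker/.*"}[1m]))
```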
This now gives me the rate of CPU usage for every container on my cluster. Graph that, boom, plug it in, and again there are way too many labels, so in the legend I'll say: on this instance we have a container with the following name. And this looks nice already, right? You probably wouldn't want to graph it quite like this in reality, but this is per-container CPU usage.

Now the last one is going to be interesting: we're going to have a look at the Docker engine metrics. They all start with engine, and there's one metric, for example, that gives you the total number of actions the Docker engine has taken so far, split up by various dimensions, for example the delete action, commit, create, and so on. As it's a counter metric again, we want to take the rate over it; right now there's nothing going on, so it's all zero. And maybe we also want to sum that up without the instance dimension, to get the total number of actions of each type over the entire cluster, not split up by instance. That's pretty cool, and then we're going to create some Docker events to see whether anything actually shows up there. Again we only want to show the action in the legend, and we'll call this Docker engine action rates, something like this. Cool.

Now, if I actually go to one of these machines and see what's there, I'll just kill one of these containers. I don't want to kill Prometheus itself, because that container has my data, but I can, for example, kill the socat one: boom, docker kill, and Swarm will bring it back up, so that's not really a problem. I can also kill cAdvisor, just for good measure. And if I refresh my dashboard, I should start seeing some events. Yep, let's zoom in a bit more: the red is a start action, the blue is a create action, and if I had first stopped and then docker rm'd a container, I would also get a delete action. So this way I now see the sum of all Docker engine actions in my cluster.

That is it for the demo, and I want to point out one thing. Oh, where did I put that window? I think I killed all my Chrome windows, so I will now have to display my Gmail in front of everyone; I'll just click this button. There is a Prometheus birds-of-a-feather session tomorrow at 6 p.m., where people are going to discuss how we want to do Prometheus monitoring properly for Docker, and I would like to invite everyone to join that. Luke Marsden from Weaveworks is going to be there, and hopefully some others. I'm going to give a short Prometheus intro talk there, others are also going to talk, and I think there's going to be a discussion. So yeah, please join that. Thank you. [Applause]
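For reference, the engine-actions expression from the demo would look roughly like this; the metric name is assumed from the engine's current /metrics output and may differ from what was shown on stage:

```promql
# Rate of Docker engine actions by type, summed across the cluster
sum without (instance) (rate(engine_daemon_container_actions_seconds_count[1m]))
```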
Info
Channel: Docker
Views: 167,911
Rating: 4.9351788 out of 5
Keywords: docker, containers, prometheus, docker swarm, virtualization, cloud computing, linux, Black Belt
Id: PDxcEzu62jk
Length: 40min 5sec (2405 seconds)
Published: Mon May 08 2017