Lonestar ElixirConf 2019 - Monitoring Your Elixir Application with Prometheus - Eric Oestrich

Video Statistics and Information

Captions
[Music] All right, so yeah, this is Monitoring Your Elixir Application with Prometheus. I'm from SmartLogic; we build web and mobile applications. If you have a project, we can help. We've been around for 13 years, and I've been doing Elixir for three of those.

So we're gonna first cover a little context, see how we're gonna set up Prometheus, how to set up metrics and instrument them, and then take a look at Prometheus and Grafana and how to get some alerts out.

So you might be asking, what is Prometheus? Their slogan is "From metrics to insight." It's a Go application that polls your application, so it scrapes metrics out of your application. It's a puller instead of a pusher. The other thing we're going to look at is Grafana. This is "the open platform for beautiful analytics and monitoring." This is the thing that will actually be doing your dashboards. Some other options out there instead of Prometheus, you might have heard of these as well: there's New Relic, CloudWatch, Datadog, StatsD and Graphite, and that list kind of goes on forever.

The other thing we're gonna look at is the application. It's open source, it's MIT. This is the thing we're gonna be looking at just so you can see what it looks like for a real Elixir application to be instrumented with Prometheus. This is at grapevine.haus, that's the German spelling of house. It's on GitHub at oestrich/grapevine.

So we can take a look through this. This is the website, this is Grapevine, where you heard it. We can see that this is my game, MidMUD, online. The thing that we're gonna want to see is one metric go through the whole pipeline, right? So someone comes here and registers an account, and my super-secret password. All right, so I just registered, I can verify my email. All right, so now I can come here, click Play. We'll see what that
looks like at the end. So I can look at a game, click Play, start playing. I kind of showed this off last night. This is ExVenture, the other application you can take a peek at. So this is what the application is doing: this is connecting through a WebSocket down into the server, which then does a telnet connection out to the remote game. Telnet is still alive. Don't use it, though. I forgot to enable this, but there should be a gauge down here, and we could see messages being sent back and forth. So that's Grapevine, this is the basics of it. You can move around; it's just a web client for the game.

Right, so that's the application we're going to look at. So what do we do to get metrics out of your application? These are the three basic libraries that you need. Prometheus.ex, that's going to set up all our counters and gauges and histograms. Prometheus Plugs is how you're gonna actually export that data. And then Telemetry is how we're going to instrument it.

All right, so what do we do? Let's start with trying to export our metrics. So we have Metrics.PlugExporter, which just uses Prometheus.PlugExporter. This sets up a whole bunch of stuff for us. Then you go to your web endpoint and put in the plug, somewhere towards the top, towards the bottom, it doesn't super matter. Then we're gonna have a setup module, so this is Metrics.Setup with a single function in there, Metrics.PlugExporter.setup. That's going to call the thing that actually sets up all of the counters, which are set up as ETS tables under the hood. And you're going to call that function at the start of your application, so sometime before you boot the supervision tree, you're gonna call that function. So this gets us just a whole bunch of... this is what the
Prometheus metrics format looks like. There's a few comments, so TYPE and HELP, then this is the metric name, and then this is the actual value. We can actually take a look at what this looks like. It just kind of goes on. There's a whole bunch of VM ones, there's Erlang VM allocators, a bunch of those, along with some of the other ones we have for our application.

Right, so now we have metrics getting exported. Now let's actually instrument our application. So this is Metrics.AccountInstrumenter, and these are all real modules; you can go find them in Grapevine under lib, lib/metrics/account_instrumenter, so you can see the real full thing. We're gonna use Prometheus.Metric at the top. This is also gonna have a setup function. We're gonna have our series of events: an account is created, someone signed in, someone signed out. And then we're going to loop over each of those events to deal with them. Inside of this we're going to take the event list, we're going to join them all together with an underscore, and then we have grapevine accounts and then that name.

So this naming scheme is kind of important. You want to start with your application, and then I've been doing kind of a sub-module, then the event name, and then you can see here the actual counter name is underscore total. So you want to start with your application name, then the name of whatever you want, and then you want to finish it with, if it's a counter, total, or if it's specifically dealing with bytes you want to do underscore bytes, or if it's anything with time you want to do seconds. So there's base units in Prometheus. No... there we go. You can take a look at the Prometheus docs. I don't have internet, but there's a link. Right, so: prefix with your application, use base units, and then suffix with the unit. So now we add our
instrumenter into that setup, and now we have those metrics. So if we zoom in on this thing, we will see Grapevine account session logout. Right, so there it is.

All right, so we have our metric set up. These are the function calls that you drop throughout your code base at specific spots that matter to you. So in this case we have telemetry execute: grapevine, accounts, create. The version of Telemetry that you're probably about to use if you're gonna try this at home is 0.3. Between starting this talk and now they've also put out 0.4, so it looks like this. Hopefully they didn't just put out a 0.5. Before this it was the event name, comma, the count, and that was specifically a number. You can also do comma metadata, but if you don't need it you don't have to send it. So now the count is a count map, so you can pass extra things. It doesn't just have to be value, it could be bytes, whatever name matters to you, you can send.

So now we have the execute lines for Telemetry, and now we have to actually hook into them. This extends the setup event from before. We're going to take our event name and we're gonna prepend it with grapevine, accounts. So that's our full event that we have: we have grapevine, accounts, and then if you remember the first list was just create, so that just gets appended: grapevine, accounts, create. Then we have telemetry attach. We're gonna use that name we generated up here. This name has to be unique for your whole system. We're already kind of namespacing this metric, so it's good to just reuse it. So: the name; the event, this is the thing that is triggering; and then we're gonna pass it a callback function that this will call; and then the last one is a config that you can
push in. So this is just a static config for every time it gets called. In this case we don't need it, so it's just nil.

One small difference you might see is attach and attach_many. I had previously been using attach_many, since there was a big list of events. So you could previously just say grapevine accounts as your name and pass it a huge list and then a single function. But the thing to note is that if any of your handle functions fail, Telemetry will catch the error and then just boot out the handler, so it will never run again until your application restarts or you reattach it. So if you had a list of fifteen and one in the middle started failing, it's gonna eject all fifteen of them. You don't want that. If you do each one separately, then just that one gets evicted and the rest can carry on.

So here's what our handle_event function looks like. I just have the handy function head at the top. We have event, count, metadata (metadata is the things you pass in for each telemetry execute), and then the config is the static one passed above. For this case we don't actually care about anything in the count, so those are all just underscores. We're just gonna pattern match on the event, and then this is where you can do whatever you want. So you can do a Logger info, "account created", so you see in your logs a nice little happy message. And then this is gonna use Counter increment with the name. This is what Prometheus.ex looks like. It's just incrementing the counter.

And then a small aside on concurrency. Telemetry execute runs in the process that calls it, so it's not sending, it's not casting a message to a single GenServer that's going to get bottlenecked. This is spread out. Prometheus counters, gauges, histograms, et al. are tracked in a public writable ETS table, so it will be fast; nothing is,
nothing should bottleneck behind a GenServer.

So now we can view our metrics. We have /metrics, we have a Prometheus server, and then Grafana. So here's our Prometheus metrics; that's really zoomed in. I guess first is our basic Prometheus setup: how do we tell this to actually start pulling from our application? They have what's called scrape configs, which have jobs. The job name is grapevine; this is going to get added as a label to the actual final metric, so you can have a bunch of jobs that all pull into the same Prometheus scraper. Then you have static configs. You can set it up with, I think, Consul and all kinds of things; I've only ever needed the static configs. You give it all the targets that it needs, in this case localhost 4000. There's an assumed http:// at the front and then a /metrics. You can configure all that. Highly recommend in actual production to use TLS.

So that's what that looks like. Here's the actual target. That's kind of small, so we can take a look at the actual web page. Here's the actual endpoint it's hitting; we can see that it's up, and then some additional labels, and you can see when it's scraping, et cetera. So if we want to look at this create, we can go ahead and see when I actually made that. So that was there; this is the counter going from zero to one. This is kind of boring, but you can do functions, which we'll see in a bit, to do better ones.

And then you can set up rules for your application. This is a Prometheus function expression, I guess they're called. This checks for the increase of this counter over the last five minutes; as long as it's bigger than zero, this rule will start triggering. You can add annotations; these annotations will show up in the emails that you can get sent from Alertmanager, and we can see what that looks
like. I left this web client open, so you should have an alert firing. Yeah, so we can see that this is just a really stupid version of it: the telnet client count is bigger than zero. So you can see what this looks like, and, okay, it's that one. All right, so you can see that that is firing.

And then we can connect Grafana to this. For Grafana you set up dashboards. That's a tiny picture; we can look at the real full one. This is the dashboard that I have for my instance of Grapevine. You can see the number of attached games, the number of sockets, the number of open web clients, and then this is what I was hoping would have data, but apparently I forgot to actually set it up. But you can sort of see here, I wonder if... yeah, that'll get bigger. So this merges together a whole bunch of different metrics. Telnet has something called IAC, interpret as command, which is the byte 255. That is not an actual ASCII character, so that's a signal to the binary stream that something after this is about to be a command, so do something with it, don't display it. In this case you can do negotiation, so the server can say "I want to do this" and then the client can say no, don't do that, or sorry, won't do that. So there's DO, DON'T, WILL, WON'T; something called the MUD stats server protocol, sorry, MUD server stats protocol, one of those two; we can see when line mode's turning on, when someone requests the terminal type. The thing I was hoping to see is GMCP messages, which is an out-of-band protocol for telnet, I guess specifically for MUDs, that passes JSON around.

And then the other thing to note is that you can push some of these metrics to the other side of the y-axis. That way, if the GMCP one should have been at like 60, this web client open count doesn't just completely get squished at the bottom, so that one
can be off to the side.

All right, and then we have our demo as well. Sorry, oh no, okay. The other thing I was hoping is that we should have an email here. Sweet. So this is from Alertmanager; this is actually MailHog, so it's just local. So we can see what an alert actually looks like. We have the web client alert firing. You can set this up for any of these rules, so any of these alerts that you have in the rule file can get triggered. This one also should have triggered but it didn't. This is the increase, to try and capture that someone signed up. It apparently didn't trigger, so something's wrong. You can also silence alerts in metrics, so you can give it any kind of labels and matchers and whatnot. You can be pretty smart about this.

Let's see. Oh right, I'm signed in as the wrong person, that would be why that's not. So let's try and get the gauge showing. Hopefully this will pre-fill for me. All right, so we have a purple health gauge. So we authorized our connection; that's over telnet, but we're not asking for a password because that's been saying. Yeah, so here's our health gauge. So we can go fight that goblin, and that's here. So he's gonna start beating us up, so we can start to see some messages flowing. So there's our spike. Let's change this to the last five minutes. So every time a new GMCP message comes in, we increment the counter, which then goes up; we can watch it happen. And hopefully I didn't die, so let's heal myself a few times to let that keep going. Oh yeah, so the counter is actually happening. If this goblin will keep beating us up, we might actually be able to trigger an alert for that as well. I have it so that if it goes over 60, that's like one a second roughly. That's mostly to
just tell me that someone's using it, since this is still a very niche market, so knowing when this is being used is good. It looks like we're at 60 and then it peaked, so it's just the once a second. Oh, and I died, that would be why it did that. So let's see what else will trigger that. We can shoot a magic missile at the goblin, maybe a frost ray. See, so we did indeed trigger it, we're at 71. The alerts only fire every, I think, 15 seconds. You can configure this; this is just to make sure that Prometheus isn't doing so much work that it's gonna fall over. So there's our "six seconds ago," and if it worked... right, so this is an instantaneous rate. "No data points found." That's okay, I remember why that happens. This is actually a clustered application and I forgot to scrape the second node that's sitting there. So targets are important.

I think that's pretty much everything. So to wrap it up: I've been streaming on Twitch every Monday at 12:00 Eastern for an hour. This is all Elixir development, so you can actually watch me do Grapevine or ExVenture. This is at twitch.tv/smartlogictv; make sure to follow and all that fun stuff. And as Justus has not let you not know about, we've been doing a new podcast called Smart Software, a podcast at SmartLogic. Do go check it out. And then, any questions? And I guess while we're waiting, this is my cat, Odo. If anyone is interested in this sticker, I have
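
Editor's sketch of the instrumenter described in the talk, for readers following along without the slides. It is a hedged reconstruction from the spoken description, not the code shown on stage: it assumes the prometheus_ex and telemetry (0.4 or later) packages are dependencies, the helper names `handler_name/1` and `counter_name/1` are hypothetical, and the help and log strings are made up. The real module lives in the Grapevine repository.

```elixir
# Sketch only: assumes prometheus_ex and telemetry (>= 0.4) in mix.exs.
defmodule Metrics.AccountInstrumenter do
  use Prometheus.Metric
  require Logger

  # Events to count, relative to the [:grapevine, :accounts] prefix.
  @events [[:create], [:session, :login], [:session, :logout]]

  # Call once at application start, before the supervision tree boots.
  def setup do
    Enum.each(@events, &setup_event/1)
  end

  # Hypothetical helper: [:session, :logout] -> "grapevine_accounts_session_logout".
  # Reused as the telemetry handler id, which must be unique system-wide.
  def handler_name(event), do: "grapevine_accounts_" <> Enum.join(event, "_")

  # Hypothetical helper: counter names end in _total per Prometheus naming conventions.
  def counter_name(event), do: String.to_atom(handler_name(event) <> "_total")

  defp setup_event(event) do
    Counter.declare(
      name: counter_name(event),
      help: "Total count of #{handler_name(event)} events"
    )

    # One attach per event: a failing handler is detached on its own
    # instead of taking every other event handler with it.
    :telemetry.attach(
      handler_name(event),
      [:grapevine, :accounts | event],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Handlers run in the process that called :telemetry.execute.
  def handle_event([:grapevine, :accounts, :create], _measurements, _metadata, _config) do
    Logger.info("New account created")
    Counter.inc(name: :grapevine_accounts_create_total)
  end

  def handle_event([:grapevine, :accounts, :session, :login], _measurements, _metadata, _config) do
    Counter.inc(name: :grapevine_accounts_session_login_total)
  end

  def handle_event([:grapevine, :accounts, :session, :logout], _measurements, _metadata, _config) do
    Counter.inc(name: :grapevine_accounts_session_logout_total)
  end
end
```

With this in place, application code would emit an event at the spot that matters, for example `:telemetry.execute([:grapevine, :accounts, :create], %{count: 1}, %{})` right after an account is saved. The measurements map is the 0.4 shape mentioned in the talk; under 0.3 the second argument was a plain number.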
Info
Channel: Lonestar ElixirConf
Views: 1,170
Rating: 4.7333331 out of 5
Id: ETUD9SaRCjY
Length: 22min 12sec (1332 seconds)
Published: Wed Mar 06 2019