Using Jsonnet to Package Together Dashboards, Alerts and Exporters - Tom Wilkie

Captions
Hello everybody, my name is Tom. I work for Grafana Labs now, although every KubeCon I seem to be at a different company. I'm still working on Prometheus, and today I've come to talk about Prometheus monitoring mixins. This is a way of packaging together dashboards and alerts in a reusable, redistributable way, and I'm here to talk to you about why.

The story behind this is that we were helping people adopt Prometheus and Grafana and doing their monitoring setups for them. We found we were copying and pasting the alerts we defined for Kubernetes, for Cassandra, for Consul, for etcd; we were copying and pasting them between customers and doing slight tweaks, and when we got to the sixth or seventh time we were like, this has got to stop. So a bunch of us have been working over the past few months on a way of making this much more reusable. Hopefully we can avoid this copy-and-paste nightmare, and hopefully we can use this to enforce best practices and iteratively improve the status quo of how monitoring is done.

Just a bit of housekeeping: I am currently on 980, 990 followers on Twitter, so if 10 of you could follow me, that would make me really happy.

So there is already a way of sharing dashboards, and this is Grafana Labs, the company I work for. They have an online system that allows you to upload dashboards, and then other people can really easily download them. I can't say nasty things because they're now my employer, but there are some challenges with this approach. Most of the dashboards that I've got from grafana.com give me things that look like this: they basically just haven't quite worked. (Wow, they're bright, aren't they? He's really lit up the screen.) Every time I've got them I've had to go in and tweak them and play with them to make them work for my install of Kubernetes or my install of anything. I'm going to turn that off. And normally
the problems we've had are that the queries they've used have made assumptions about the labels that are being used for the targets and for the time series. So this one is trying to tell me how much memory I'm using in my entire cluster: the sum of all of the container memory usage divided by the sum of all the machine memory. It's trying to do it for a particular node, but in my Prometheus Kubernetes setup I don't put the hostname of the pods and containers as a label, because that's actually a bad practice and you shouldn't do that; you should be doing it through joins and other methods. This is really common: for me at least, most of the dashboards I've downloaded this way haven't worked. So the take-home here is that, to fix this problem, dashboards should not be having opinions about the label sets you use. This should be configuration that I can inject after I've downloaded the dashboard.

And that's not the only problem. We want to encourage people who write software to also tell us how to monitor it. Normally the author of the software is the best person to do this: they know how it can fail, they know what the error cases are. So really we want people to distribute alert definitions and dashboards with their code. CoreOS and etcd do this, which is really good and really should be rewarded. But the problem is, I come in and I copy and paste this and I put it into my cluster, and then inevitably it doesn't quite work. So I fix a few things and I make a PR and I share them, and everyone's doing this, but there's not really a way to consume the iterative improvements that are happening on open-sourced monitoring configurations. There's no easy way: I'd have to keep going and checking and copying and pasting and reapplying my label sets to this config, which just doesn't happen. So we want a way that we can configure these labels, and we want a way that we can extend
them. So Jack here gave me a really good example just earlier: he wants to inject into the alerts the ability to add URLs to his playbook. The etcd alerts they distribute don't have that, obviously, because it's his playbook, and so he wants to be able to extend them. I want them to be reusable, I want them to be easy to install and, more importantly, easy to stay up to date with. I want to treat all of this like code. Code is constantly improving, and if you're vendoring, or if you're using dependencies properly, then you're consuming those updates and getting those improvements. So I want to do this for my monitoring config. And then the final thing: I want this to be deployment agnostic, because Prometheus and Grafana support way more than just Kubernetes. So how do we do this? I don't want it to be too Kubernetes specific.

So let's look at the first one, configurable and extensible. Turns out Grafana actually does have a way of doing this: Grafana has templates. I didn't know about this for a really long time. I've been using Grafana for maybe three or four years and I only found out about this in the last year, so maybe I'm slow. You can define templates at the top of your dashboard, and then from there you can use dropdowns to select values and substitute them into the queries. This actually solves most of the problems for Grafana dashboards. Unfortunately grafana.com doesn't support PRs, otherwise I'd be going through and making these changes to the dashboards on there, which comes back to the second point I was making. But also this approach doesn't work for Prometheus recording rules and Prometheus alerts, because there's no UI for those; they're files on disk.

The other thing is I'm a big fan of my dashboards being read-only in Grafana. They should really be stored in my config management, in GitHub, and they should be deployed whenever I change them, and if a change
needs to be made, I want it to be PR'd into that and I want it to be code reviewed, and I'm going to use this mechanism to slowly and gradually improve the quality of my dashboards. As for building dashboards that way: if you've ever exported a dashboard from Grafana, you'll see it's a massive blob of JSON, and it's really hard to code review and really hard to enforce standards on. So we're not going to do it like this.

So we're going to use a config language, and then you get into the big debate of which is the best config language. So this is the next 30 minutes of the talk: which is the best config language? I'm going to review all of them. No, not really; I'm going to jump straight to the answer. You can do YAML and JSON, but they're not really dynamic enough. They're just data structure definitions, effectively, and there's no way of substituting. I know YAML has some extensions to do this, but we're not going to do that. Actually, quite commonly in most of the places I see, at least for doing their Kubernetes deployments and so on, they're using YAML with Jinja to template out the differences between different environments. This is fine, this is better than just having the raw YAML, but the problem here is that it doesn't give you any kind of abstraction. You can't define some defaults for all deployments or, in this world, you can't define defaults for all dashboards. What's more, I want to be able to build functions which allow me to, say, generate recording rules for my histograms.

I probably should have asked at the very beginning: is everyone familiar with Prometheus? I'm using terms like recording rules and histograms. Hands up if you're using Prometheus. Okay, and you're all happy with these terms? Good. And hands up if you're using Grafana. Yeah, okay.

So I want functions, basically; I want some sort of abstraction. And what I think is actually becoming quite popular is using real
programming languages for our configuration. A company called Improbable in London uses Go for all of their configuration, for their monitoring and for the deployment objects in Kubernetes. I think this is a project called playwright; I couldn't find it online, so I don't know whether it's open source, but it's quite a cool system. At my previous company, Weaveworks, we used Python to define our dashboards, and this is way easier to read than the YAML or the JSON you use in Grafana. You can see how doing a PR and a code review on this would be pretty straightforward, and it would also be pretty straightforward to substitute in the job= bits. But I don't like this approach. I don't like it because this is a full-blown programming language. This feels like I'm writing a program; it doesn't feel like I'm writing config. It's a bit intangible, but also Python has global state. I hate global state. I don't want side effects in my config, that's really strange to me, and I want a bit more predictability.

And I worked at Google, so what I want is Jsonnet. Jsonnet is a configuration language. It comes out of Google, but it's not actually used by Google as far as I've been told. It's based on JSON and it adds a whole bunch of extra power. In the same way Prometheus is inspired by Borgmon, Jsonnet is inspired by BCL, the Borg configuration language. This is the language used internally within Google to describe jobs running on Borg, which is, I guess, the predecessor to Kubernetes. What's more, Jsonnet can manifest out, can render out, YAML files, INI files, JSON files, you name it; they're all just strings and just normal files. It was written by a chap called Dave Cunningham, who I don't think is here, but he's a good guy.

To give you an idea of what Jsonnet adds, I've just taken this straight off their website. JSON, as I said, is just the data definition, and Jsonnet adds variables,
conditionals, arithmetic, functions (you can read the list), but mainly it adds functions and string substitution.

So I'm now going to go to his website and give you a bit of a demo, if I can fiddle with my screens. Good. The website's recently been revamped and is pretty good, and he's included a little Jsonnet JavaScript compiler, so you can edit this and it compiles out. We'll start from scratch. Okay, an empty file is an error, which is good. In Jsonnet you've got JSON objects. You don't need the quotes around the keys; I don't know whether you need them in JSON, I haven't written JSON for a while. And if you do that, you'll see it renders an object with hello: world. This is trivial, but if I don't want to just say hello to the world, I can do things like hello, substitution, name, and then I can do local name equals Tom. Okay, "hello Tom". Even better, it's getting personal.

So now I actually want to use this, and I want to make the person I say hello to configurable, as you might expect. So I'm going to put name in here and I'm going to leave it empty, and then here I'm going to refer to name. This is the first bit of magic, which is called self; it refers to the dictionary you're in, basically. You see this is saying I must supply a name. So I'm going to merge this. This is the second bit of magic in Jsonnet, which is the merge operator, and I'm going to have to quote that. Okay, so now I can say hello to Tom, but the output is spitting out name: Tom. Well, I don't want that in the output, because that's kind of a hidden variable. So if we do a double colon there (double colon means hide), it goes away. You'll see this a lot later, which is why I'm putting it in. You can also put arbitrary stuff in here; you could do b: goodbye, and you'll see how the merge operator works.

Okay, so this is the basics of Jsonnet, and you can see that from these building blocks you can build up quite a lot of complexity. I'm just
going to go through some other stuff I wanted to show you. Done string substitution, done hidden fields, done merging. There's syntactic sugar around the merge operator, which is kind of cool. So I'll call this local Hello (yep, definitely spelt right), and then I'll do Hello. Okay, and you can see how I don't have to put the plus in there, because it's syntactic sugar, basically. This kind of makes it feel like you're instantiating a Hello, if you see what I mean. You're not; you're just merging the dictionary name equals Tom with the Hello dictionary, and it becomes this thing.

Okay, another example: references. There's one extra special reference that I'm going to show. So b is going to be another object, and inside b we're going to put c, and we're going to make that a list, and inside here we're going to put a string. Okay, so this is what we expect. And then we can sub the name in, but self.name wouldn't work here, right, because we're in a string, we're in a list. Right: "field does not exist: name". So what you can do is you can do super... nope, that's wrong. You can do dollar: $.name. Dollar refers to the outermost object, so in this case dollar is referring to this object here. Fingers crossed it works. Yes, I seem to know what I'm doing.

Anything else to show you? Yes, there's one more, and this is where it starts to get really powerful. You can do things like a: super.a plus, I don't know, "dude". And so super refers to, in this case, the thing you're being merged with. No it doesn't... oh yeah, it does, it's my name. Yeah: "hello Tom dude". Excellent.

So that's basically Jsonnet. It's really straightforward in my opinion. It's got a standard library for doing all sorts of functions and things, it's got maths and so on, but really it's that merge operator that is the important thing, because that's what you use for composition, which is what you use to build up your monitoring config. There's a few extra examples on here
to encourage you: it shows how to do a function, which is just how you'd expect it to work, and then down here it shows you how to manifest out INI files and other types of files. They're less interesting in my opinion. So, unmirror. There used to be a way for Keynote to do that automatically, I'm pretty sure, or I just dreamt it. Okay.

So that's Jsonnet, and that's the language we're going to use, and now I'm going to show you how we're going to use it. It's not enough for everyone to go off and just dive in and start using Jsonnet to do this, because the idea is that these mixins we're going to produce are going to be reusable by people who didn't write them. So we want some kind of standard, a lightweight standard at least, to say what the keys are going to be. So this is it. This is the standard, all five lines of it.

You put configuration in a thing called _config, and it's hidden. The reason we say _config is just to make it really clear that this thing's reserved and that you shouldn't use it; the underscore is not meaningful. The plus here means that when this whole dictionary is merged with another dictionary, that merge will be recursive. The plus is optional, but if it isn't there, this config would overwrite the config of anything you're merging with, which is not what you want. You basically always want plus-colon when you're defining these things. Then we've got a dictionary full of dashboards, and then we've got a list full of alerts and a list full of recording rules. They could be the same thing, I guess, for the pedants out there.

The config, normally... this is Frederic, who I've been working with on this. I should have mentioned him earlier, sorry. Is he here? Oh, he's doing the deep dive. Okay, well, he's been working with me on this; it's a real shame he's not here. But we've settled on this format for providing the config: you basically have to provide the key-value pair that gets substituted into
your Prometheus query. You put it in config, call it a selector, and then when you define an alert (it gets pretty code heavy from here on in, I'm afraid, guys) you substitute in the selector you want with the config, like that. This is string substitution in Jsonnet; it's pretty good. The triple bars are just for doing big multi-line strings, which I had to do to fit them on the slides. And this is a KubePodCrashLooping alert, which you can consume and run against any Kubernetes cluster and any kube-state-metrics; you just have to provide the label that selects your kube-state-metrics. So for instance, I think if you're deploying with CoreOS this is called name equals kube-state-metrics, whereas on my clusters at least I tend to include the namespace in the job name, so I don't accidentally aggregate jobs across namespaces, which is almost never what I want to do. You can see when you render this out it comes out like this, which, for everyone here who's ever written a Prometheus alert definition, is a Prometheus alert definition. I've made it nice and bold so you can see where the substitutions come in. So that's the Jsonnet and that's the rendered YAML, and I'll show you how to render it next.

So, reusable. Here we're going to look at how we satisfy the second bit of the requirements, which is being able to download, easily install and stay up to date. It's really key that we can stay up to date, otherwise all those drive-by commits on GitHub are for nothing. So how do we do that? Well, this is actually where I was hoping Frederic would join me and say this, but it turns out he can't. We've written a tool called jsonnet-bundler. You don't have to type jsonnet-bundler every time you want to use it, because jb was not taken on most of our machines, so it's just called jb. This is a package manager for Jsonnet, basically. It's super simple right now; Frederic will admit it's been
written very quickly, let's say, but it works, and we're using it, and we're using it with a couple of other people now. This was inspired by Bundler from Ruby. I have no idea what that is because I don't use Ruby, but I do know Golang, and we've tried to avoid the horror show that is Go vendoring and package management. You install it with go, because it's written in Go, and then you have to do an init, which just drops a couple of empty files there. Then, to install a package: we've got this kubernetes-monitoring GitHub org, and inside there is the kubernetes-mixin, which we've slowly been working on together. This will dump it into your vendor directory, and inside you'll notice it's installed this thing called grafonnet. Grafonnet is a Jsonnet library for building Grafana dashboards, and it's also a mouthful to say. It's noticed that that is a transitive dependency of kubernetes-mixin and installed it that way. And that's really the minimum viable product, we thought, for a package manager: you can install stuff, it'll install dependencies, and if you run jb install without any parameters it'll update everything. Those are the three things we wanted.

The next thing you can do, if you want to consume this and start customizing these mixins and, say, combine three or four of them: you drop a little file in your directory called monitoring.jsonnet, which can be called whatever you want, actually. Inside there you can override the config, and this is where we get into the Jsonnet. So we're taking the kubernetes-mixin, we're importing it, and we're merging it with this dictionary that contains a config dict, which is merged into the other config and overrides the cAdvisor selector and the kube-state-metrics selector. Here we've changed it to job equals kube-state-metrics instead of job equals default/kube-state-metrics. And then if you render it out with this incredibly long and complicated command, you can see the
job is changed to kube-state-metrics, and there we go, it's done. That's how you supply your opinions about the labeling in your cluster to the mixins. So I think that's pretty cool. One of the things we really wanted to do, but didn't get time for, is to package up all of these commands into some kind of script or little mix tool that allows you to lint and check that your mixins work and allows you to deploy them a bit more easily. But these will basically render out into the YAML files that you can then load into Prometheus, or the JSON files you can load into Grafana.

And the third thing I really wanted was deployment agnostic. We said we want to support more than just Kubernetes, although we want a really first-class experience on Kubernetes. So we need a way of building normal files which you can integrate into your config management system. So here's the complete commands; it's just jsonnet. This one's quite cool: the -m flag will generate a subdirectory full of dashboards.

So I'm just going to give that a go and see if it works. And my screen... okay, well, oh, you can't see it, that's what's going on. How does computer work? It's not just me, right? No one else can see it. This is fun, isn't it? Right then. So we'll go to kubecon. There's loads of stuff in here, so let's just remove it all (don't put a slash). And now, empty directory. We'll install jb: go get jb. I already had it, that was quick. There we go, we can jb init, and then we can install kubernetes-mixin. I know I'm cheating, I'm using my bash history, but I type like a five-year-old. So there we go, we've installed it. If we do tree on vendor, you can see all the stuff that's been included and the transitive dependencies. Then we touch monitoring.jsonnet, go back to the slides and copy and paste what's in there. I already have it here; that's what I was supposed to do. Now if I save that, you can see monitoring.jsonnet has
what I wanted. And if I do jsonnet... so this one is including the vendor directory, manifesting everything out into a folder called dashboards, and then importing, sorry, executing the Grafana dashboards. So if I run that... oh, I need to create the directory. You can tell it's a real demo because you get errors. And if we run it one more time, there we go, we can see we have a set of dashboards. If we look in dashboards, there's a set of JSON dashboards. If we go and have a look in one of these dashboards, cluster, and look for kube-state-metrics... nope, not found. It's a bit annoying. Oh, because these are the dashboards; a bad example. So if we manifest out the alerts, will this work first time? No, because I'm using the wrong command. The command is here; this will work first time. So close. And then it's gone and munged the quotes. Live demos, why do we put ourselves through this? Right, fingers crossed. Yay, there's some alerts, and you should be able to see that in all of the job names in these alerts they've removed the namespace prefix that I specify on my clusters, because, let's say, this cluster doesn't have them.

So, almost done. There is one more thing, in good presentation style, and this is ksonnet. I touched on ksonnet earlier; this is the tool Brian down here works on. It's a library and a tool for building Kubernetes objects with Jsonnet, and it's really good. We use it; we use it with all our customers. And the actual reason we chose Jsonnet for the mixins is that once you've got the same language for your Kubernetes deployments and objects and for your dashboards, you can start sharing the configuration between them. So this is really cool. We've built (well, I think it's cool at least) prometheus-ksonnet, which is a ksonnet library that contains deployments for Prometheus, Grafana, node exporter and kube-state-metrics, and then it depends on those keys that are included in the
mixins to manifest out all of the dashboards and alerts into config maps and then apply them into your cluster, all in one step. You don't have to compile them or anything. It's worth noting that we do it in a particular, opinionated way here. Frederic is working on getting the Prometheus Operator and kube-prometheus and kube-grafana, his projects, to also support mixins, so that there are multiple options and multiple ways of deploying these to Kubernetes, if you use the Prometheus Operator for instance. I think this is super cool. I'm going to just give you a quick example... no, I'm not actually going to have time. Come find me afterwards and I'll give you a quick example. But it effectively means there's a single layer, a single namespace, a single object domain for all of our configuration. We can just add stuff in, and it becomes incredibly easy to manage.

So that's Prometheus monitoring mixins. I haven't got my notes because I didn't turn mirroring off; give me a second. There it is, mirroring. That's monitoring mixins.

We have built a whole bunch of them already. This has basically taken the stuff we've worked on with our customers over the past six months and turned it into mixins. We've got the one going into CoreOS for etcd, and Frederic's super supportive of this; loads of improvements are going in there. I've built one for Consul. It's in my old company's public monorepo, but we're hoping to merge that into the Consul exporter, if they let me. I own the Vault exporter, and the company Grapeshot sponsored all my work on that, so we've already put the Vault exporter mixin in with that. The kubernetes-mixin, as we discussed, is in this new organization. And we've also got a semi Prometheus mixin, for self-monitoring Prometheus, with alerts and dashboards, as part of prometheus-ksonnet; I'm going to break that out and put it somewhere a bit more official. I'm working on a node exporter one, so this is effectively going to take a whole bunch of the alerts that
we've defined in the Kubernetes one, but that are more generic and would apply to pretty much any Linux machine, and we're going to hopefully include that in the node exporter. Internally I'm working on ones for Cassandra, ones for MySQL, you name it: basically any system to monitor.

Mixins: configurable and extensible, reusable, hopefully easy to install, deployment agnostic. I've got a little bit of work to do. I want to put URLs in for playbooks; currently we haven't figured out a way of doing that. I want to build this tool so you don't have to have those cumbersome commands. We did this all in the open, Frederic and I, so if you've been on the Prometheus mailing list, or if you've been on, I think, the ksonnet Slack, a whole bunch of different places, we've been posting the design docs. They're out there, publicly readable and commentable; please tell me if you think this is a good idea or not. Similarly, Frederic's put the package management proposal up there. I'll put these slides online, so you don't have to do this.

I wanted to call out a few people who have really helped me on this project. This is Dave, looking very thoughtful. You can find him at sparkprime on GitHub as well. He's a really cool guy, and he's been really supportive of our work here. Privately he told me (I probably shouldn't repeat this, I guess; well, no) that this is really what he envisioned for Jsonnet when he built it years ago, and that he never managed to get this far because he got bogged down in other things, but that he's really happy that Jsonnet is this configuration language that should be able to combine multiple different domains. Frederic, obviously, has been my partner on this. He's built the package management; he owns all of this with me. Julius, I don't know whether he's here, but he had a previous design doc, and a lot of the problems we've tried to solve were motivated by the design doc that he circulated,
and again, super great feedback from him. David: I stole his slides, because he gave this talk, or a version of this talk, at the Berlin cloud native meetup a while ago, when this was much earlier on. And Grapeshot: I wanted to give a shout-out to Grapeshot, who sponsored a lot of work on this. Yes, thank you very much, thanks for listening. Any questions?

No questions? No, there's one there. Go on. Shall I repeat? That's a great question. So the question is: what do I do with my existing dashboards, how do I translate them, and is there a tool? The nice thing is that Jsonnet is a superset of JSON, so you can just include your old dashboards as big JSON blobs in your mixins, and then put a little bit of effort into maybe tightening them up and adding in the substitutions. But Jsonnet is just JSON, so there's no migration necessary, effectively. Obviously they're static and you don't get the benefits of this. We don't have a tool. I think it's a great suggestion; would you like to write one? I'd encourage it, I'd help. I think I remember John at Weaveworks was saying he's working on a tool, which is more like a set of regexes I think, for translating them into grafonnet... sorry, into grafanalib. If that approach works, we can potentially use the same tool to translate them into grafonnet and Jsonnet. Maybe, I don't know, but I think it's a great suggestion, thank you.

Other questions? It's a big audience, so I'm going to go over there at the back. Are we going to have any curated lists? I don't know. I want this to be a big open thing. I don't want to rule this with an iron fist; I've got to let it fly if it's going to flourish. I kind of like how with Go there's no central Go repository and you can just publish anything on GitHub, and that's why we went with that model for jb: packages can just be anywhere. I suspect, like with
Prometheus there is a list of exporters, and it's relatively easy to add to that. We don't stop you adding to it; we just stop you having two or three entries that are the same. So, as I said, maybe we'll do something like that, but we want to encourage this to grow. I'm not going to have a central list that you must add a PR to in order to have an official mixin or anything like that. If you want to write one, just write one. Maybe we'll see. If you think there should be one, I'd love to hear it.

Anyone down at the front? Bob. Okay, so it does have a package manager. We've been talking to Brian about what we're going to do about that, because they're very similar and there are slight differences. I would hope we can not have two, and we're both being flexible about it. And Brian said something to Bob that I didn't hear. "We're looking at jb." Great. So hopefully we can work together; we're talking about it.

Anyone else? Over here, yep. Ah, where's Frederic? So the question was: is there a good solution for dashboard JSONs being too large for config maps? There is a tool, maybe in the Prometheus Operator, maybe in kube-grafana; it's in one of Frederic's repos. It doesn't work with this yet, but it's on our radar to have a version that works with this. I would actually hope that we could just do it in ksonnet, because it's quite a powerful language and there's no reason why we shouldn't be able to write some sort of tool to do that. And I guess, especially with the ksonnet support, we could just have a config map per dashboard; it would be pretty straightforward to do. But yes, we're aware of this problem, and we've got some thoughts around it, and we would love help.

Up there. I'm not going to answer questions on Helm, sorry; I've been told to be nice. Now, I realize Helm has a lot more use than ksonnet, and I know Brian was telling me
that they want to find a way that these can be more compatible together. I guess I should probably look at a way of consuming mixins in Helm; as Helm is so widely used, it would make sense really, wouldn't it? But I don't use Helm.

Anyone else? No? Oh, one more. Hi. This is an old-fashioned, I think, and I don't drink them, because I can't take alcohol that strong. This is why I thanked David, because he sent me his slides and that was his front page, and it was much better than the one I had. So I still look cool, but no, I don't drink spirits. Beer for me.

Bob, last one. Would you use this to set up your relabeling rules? Well, prometheus-ksonnet, which is my opinionated set of deployments for Prometheus on Kubernetes, has embedded in it a set of relabeling rules, and they're the best practices that I've learned in the two or three years I've been doing this. They're different from the Prometheus Operator rules. So relabeling rules are definitely outside of the scope of the mixins, and the whole point of the mixins, I guess, is to be able to support people with different relabeling rules. So I guess, to answer the question: no, relabeling rules are beyond the scope of this, and really the way you deploy it would dictate your relabeling rules.

Cool. I think we're out of time. If you've got any questions, feel free to come up front and we'll answer them later. Thank you very much.
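The playground demo described in the captions (string substitution, hidden fields, the merge operator, `$` and `super`) can be sketched roughly as follows. This is a reconstruction from the spoken description, not the code that was on screen: the local `Hello` and the fields `name`, `hello`, `b` and `c` follow what is said aloud, while the error message and the exact strings are assumptions.

```jsonnet
// A reusable "Hello" object (names reconstructed from the talk).
// `name` is hidden (::), so it can be referenced but is never emitted.
// The error only fires if nobody supplies a name before evaluation.
local Hello = {
  name:: error 'must supply a name',
  // `self` is late-bound: it sees whatever `name` ends up being after merging.
  hello: 'hello %s' % self.name,
};

// `Hello { ... }` is sugar for `Hello + { ... }`. It looks like instantiation,
// but it is just a merge of two objects.
Hello {
  name:: 'Tom',
  // Inside `b`, `self` would be the nested object, which has no `name`;
  // `$` refers to the outermost object instead.
  b: { c: ['hello %s' % $.name] },
  // `super` is the object on the left-hand side of the merge.
  hello: super.hello + ', dude',
}
```

Evaluated with the `jsonnet` command, this should produce an object containing only `b` (with `c` holding "hello Tom") and `hello` ("hello Tom, dude"); `name` stays out of the output because it is hidden.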
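A minimal sketch of the mixin layout and the selector substitution described in the captions, assuming a stand-in mixin defined inline rather than a vendored import. Only `_config` is named in the talk; the key `prometheusAlerts`, the `groups` nesting, the config key `kubeStateMetricsSelector`, the group name and the alert expression are assumptions made for illustration and are not taken from the slides.

```jsonnet
// Hypothetical stand-in for a mixin published by someone else
// (in practice this would be an `import` from the vendor directory).
local mixin = {
  // Hidden, reserved configuration. `+::` makes the merge recursive, so a
  // consumer can override one key without wiping out the others.
  _config+:: {
    kubeStateMetricsSelector: 'job="default/kube-state-metrics"',
  },

  // Alert definitions (key name and nesting are assumptions).
  prometheusAlerts+:: {
    groups+: [{
      name: 'kubernetes-apps',
      rules: [{
        alert: 'KubePodCrashLooping',
        // ||| starts a multi-line string; %(key)s is filled from $._config.
        expr: |||
          rate(kube_pod_container_status_restarts_total{%(kubeStateMetricsSelector)s}[15m]) > 0
        ||| % $._config,
        'for': '1h',  // `for` is a Jsonnet keyword, so the field name is quoted
        labels: { severity: 'critical' },
      }],
    }],
  },
};

// What a consumer might put in monitoring.jsonnet: merge the mixin with an
// object that overrides the selector to match their own job labels.
local monitoring = mixin {
  _config+:: {
    kubeStateMetricsSelector: 'job="kube-state-metrics"',
  },
};

// Hidden fields can still be selected explicitly for rendering.
monitoring.prometheusAlerts
```

Because `$._config` is resolved late, the rendered `expr` should pick up `job="kube-state-metrics"` from the consumer's override rather than the default shipped in the mixin, which is the mechanism the talk relies on for injecting label opinions.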
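The dashboard rendering step from the live demo can be sketched in the same spirit. The captions describe running `jsonnet` with the vendor directory on the search path and the `-m` flag so that each dashboard lands in its own file. In the sketch below, the `grafanaDashboards` key, the file names and the dashboard contents are placeholders, and the local object stands in for importing `monitoring.jsonnet`.

```jsonnet
// Hypothetical stand-in for `import 'monitoring.jsonnet'`, kept inline so the
// sketch evaluates on its own. Real dashboards would come from the mixins.
local monitoring = {
  grafanaDashboards+:: {
    'cluster.json': { title: 'Cluster', panels: [] },
    'pods.json': { title: 'Pods', panels: [] },
  },
};

// With multi-file output (`jsonnet -m dashboards <this file>`), each
// top-level key of the result is written as a separate file in dashboards/.
// In the talk's demo that directory had to be created beforehand.
monitoring.grafanaDashboards
```

Keeping the dashboards in a hidden field and selecting it only at render time is what lets the same object also carry alerts and recording rules without them ending up in the dashboard output.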
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 6,337
Id: b7-DtFfsL6E
Length: 35min 2sec (2102 seconds)
Published: Mon May 07 2018