Red Hat OpenShift Twitch: Playing with Prometheus

Captions
[Music] Good morning, good afternoon, good evening, everyone. I am very excited to join you today. Hello, good afternoon, good evening, good morning, everyone: be everything to everyone. I am happily joining this Twitch train today. It's a chaos of deliveries and other issues, so I am very happy to introduce Josh Wood, who is joining us today, and then Eric Jacobs will be joining us here in a few minutes; I believe he had some deliveries show up at the same time as well. So, yeah. Christian is complaining of echo. I am not hearing any echo, Christian. All right, sorry. So, Josh Wood: he is known for a book. Believe it or not, they let Josh Wood write a book. It's actually one of my favorite books, to be honest with you, because it has helped me learn so much about Kubernetes operators. I don't have a physical copy, Josh, because, you know, the times we're in, but I did print out a cover for everyone to see. You can just Google "Kubernetes Operators ebook"; it's on redhat.com. I will drop a link in chat here in a moment. But yes, Jason Dobies, who was on yesterday, and Joshua Wood wrote this wonderful book on Kubernetes operators. Josh, can you tell us a little bit about the experience of writing a technical book like this? Oh, wow, okay. Yeah, I can, and thanks for mentioning it, Chris. Before I dive in, I'll say one thing: if you shoot me an email sometime with address info, I can make sure you get a book. Oh, well, thank you very much, I appreciate that. I'll even scribble in it for you. So, yeah, the process: we had lots and lots of help from, of course, a ton of people at Red Hat. Dobies and I are both developer advocates, and while I have a long history with operators, originating from my time at CoreOS, and this has been a focus for me for what is an accumulating number of years, as hard as that is to believe, Dobies was fairly new. That gave us a really good platform: a good knowledge of the basics, and a good knowledge of what
somebody who's just coming to it needs to know to get the basics down, to understand how to start using and how to start writing Kubernetes operators. So the book focuses on Red Hat's Operator Framework and SDK tools as the mechanism for writing them. As far as what writing the book was like, we had a ton of help from the folks at O'Reilly, who published it and started the project, and from a lot of the folks at Red Hat who are in operators project management or working directly on building operators and the operator lifecycle management framework that's such a big part of how we deliver the features we add on top of Kubernetes and OpenShift. So we had recourse to a lot of expertise, and we weren't trying to write it entirely out of our own empty heads. Probably the most challenging thing in doing all of this is that we're dealing with something that's still really rapidly evolving, especially in the framework and the SDK tools. While a lot of the thinking and the conceptual model of operators has been in place for a good amount of time, and we see maturity around it in partners and vendors who are delivering operators into OperatorHub, the tools that we use to build operators, and the underlying Kubernetes abstractions that operators rely on and leverage, continue to change rapidly. So throughout the course of the book we'd get a chapter drafted and into place, and we would constantly have to walk back through, reviewing what we were doing and trying to make sure that, at least at the moment we delivered the book to the presses, it was as up to date as it could be. So, for example, I already know of a couple of instances in the SDK where we need to make updates, which will be noted in errata pages for the book, and those have happened even in the month and a half since the book printed in March. Yeah, I was about to say, the fact that, hey, you wanted, you
wanted to write this book knowing full good and well that the pattern is, well, it's an established pattern, it's a good practice, we've established that, but the actual underneath bits are going to be changing very rapidly, and that includes the SDK, Kubernetes, the whole nine yards. And also, as we're evolving here, we're donating the framework and potentially OLM, and OperatorHub is out there for now, but the framework itself, the Operator SDK framework, is being donated to the CNCF, so this is not just a Red Hat thing. Yeah. And you can use Kubernetes operators on any Kubernetes cluster, right? We sponsored the book, and we're giving it away for free on the website I just dropped in chat, but you can use these operators, or the operator pattern in general, on any cluster. Yeah, it's a really good point, and it even extends to one of the things that I think is the neatest part of what we've built out of the pattern, and that is this idea of an Operator Lifecycle Manager. What is OLM? Well, operators automate and manage the software they run; OLM automates and manages operators for a cluster. And OLM, while it is again essential to how we're building features on top of the Kubernetes core of our OpenShift distribution, is also available as a bolt-on that you can add to any bog-standard Kubernetes cluster and then begin consuming operators from OperatorHub.io and other catalog sources where vendors have mature operators already in the market, consumable in one way or another. So I think that's a really important note you make: while this is intrinsic to a lot of how we deliver OpenShift, it's been designed in a way to be modular, for any Kubernetes cluster, which is certainly what the book focuses on. In the exercises in the book we mostly use minikube to illustrate this very fact, so if you look in the later chapters you'll see, you know, OLM
bolted on top of minikube, and we use that as our build and deployment environment in the hands-on examples. Cool, that's awesome. So, Eric just said he was on his way. To be honest, he had a delivery right as we were joining, so that's why we were a little delayed: his doorbell rang, he had to sign for it, and all that fun stuff. So now he is on his walking treadmill, it looks like, joining us from a far-off land. Hey, Eric, how's it going? Well, he says he's good. I can't hear him, though. Eric with the audio issue. You can do it, Eric. I believe I can fly. I just want to believe we can hear him. Now? Eric, got nothing. Nope, nothing. Unplug it and plug it back in again. Or is it unplugged completely today? Yesterday he joined and I was like, I can't hear you, what's going on? Normally we have no problem, and he had to plug in his headset. Still not working, though, Eric. No, nothing. I got nothing, man, I'm sorry. Eric, the hint I could maybe offer is that Zoom made unexpected choices for both my input and output sources. I've used it before, but I had unexpected choices for those two items when I opened it up today. I did too. Actually, I had to adjust all my settings and re-plug stuff in and everything, so yes, some Zoom update somewhere has potentially changed our defaults. So, Josh, we're talking about Prometheus today? Yeah, and so, having gotten the shameless plug for my literary output out of the way, what we are going to talk about today, and why we brought Eric as the SME for this particular feature, is the new user workload monitoring features, previewed in OpenShift 4.3 and GA in OpenShift 4.4, that allow you to (I can hear him now. Yeah.) leverage OpenShift's Prometheus monitoring and alerting system with your own applications in a fairly easy, plug-and-play way. I think we can hear Eric now. So we can hear Eric now, but can I
share a screen? Test. Maybe this will work. Oh, I see a screen. I am very happy. Yay, Eric! All right, yeah, that's working; we see your login page. All right, so, yeah, I caught myself by surprise because I forgot what we were going to do. You didn't forget, you just... No, I mean, I've been talking to Josh about this, and it was like, Josh doesn't really know how this stuff works, and we had talked, and I was like, oh, I thought he was going to drive, but then I realized that I probably need to drive, because I'm the one who's actually going to write the code. Maybe. Also, yeah, I was trying to get metrics and monitoring set up, and we were just getting started. Also, if you could turn your input up a little bit, that'd be great. Volume? Yes, your volume; you are very quiet. We get to watch my settings, which are hard to find while I'm sharing my screen. No, I got it, I hear you. You go to that little Zoom window, the audio thing that hangs off there. There you go. The volume is all the way up, so let me see... This is the joy of live streaming, folks, right? Nothing goes perfectly. We will get to a point in this process where we have this down, we think. Yeah, that volume is definitely... There we go. No, that was all the way up too. Okay, well, I guess that's as good as we get; I don't know how to get any louder. You sound a little bit better now. I think just moving your mic helped. Okay. So, yeah, let's keep going. Sounds good. So, hear that from the hilltops: Prometheus. Yeah, so I've got an OpenShift 4.4 cluster here, and I'm not sure if it's the GA release software, because I don't know what our demo system gave me, but such is life.
To be clear for the audience, what Eric is referring to there is what we were hinting at in that little intro Chris and I were doing while Eric was coming online, which is that the features we're going to be looking at today were a technology preview in OpenShift 4.3 and have reached a GA state in OpenShift 4.4. So even if we're looking at a slightly pre-baked version of 4.4, this is where the feature goes GA, and that is the user workload monitoring feature, with OpenShift's built-in Prometheus, that we're going to be using to monitor our own application in today's session. Cool, so let's do a couple of things here. I'll get a folder set up here for our... So, more or less, I came to Eric and said, hey, I'm an OpenShift dev advocate, I'm generally well educated, this is a new feature, Prometheus is not something I'm deeply familiar with, and Prometheus has its own DSL, PromQL, for writing queries, and an alerting system for triggering alerts, and I know that a lot of these features are maturing in OpenShift's delivery of them. So what we wanted to do is literally have him walk me through learning how to use these features. Yeah. So I'm making a folder called Sinatra metrics. I'm a weird person and I like Ruby as a programming language. I mean, I like a lot of different languages. Does that make me weird? No, but most people think that Ruby people are weird, so such is life. Ah, 'this repository has too many active changes.' Yeah, that's fine. Well, I have my entire home folder as a git repo, but I have most of it ignored. Anyway, we'll just make a new repo here to make VS Code less angry. So we'll try to go into our OpenShift environment, and let's create a project for our metrics. I'll call this metrics playground. Right, and for folks who might not know, or who are tuning in to learn about OpenShift and OpenShift facilities, I'll briefly say projects are kind of OpenShift's version of namespaces on steroids, a way of safely isolating teams, and the work of individual teams or individual developers, from one another on single-cluster deployments. Hey, can you zoom in or increase the size of your browser when you get a second? Yeah, absolutely. Thank you. You got it.
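
For orientation before the code appears on screen: the app Eric opens next answers a GET request on / with the text hello world. Sinatra apps are Rack apps under the hood, so the sketch below uses a plain Rack-style handler that runs without any gems installed. The name hello_app is hypothetical, and the Sinatra version in the comment is an approximation of what is described in the stream, not a copy of Eric's file.

```ruby
# Hypothetical stand-in for the demo app, written as a Rack-style handler.
# A Rack app is anything that responds to call(env) and returns
# [status, headers, body]; method(:hello_app) would satisfy that interface.
#
# With the sinatra gem installed, the equivalent is roughly:
#   require "sinatra"
#   get("/") { "hello world" }
def hello_app(env)
  if env["REQUEST_METHOD"] == "GET" && env["PATH_INFO"] == "/"
    [200, { "content-type" => "text/plain" }, ["hello world"]]
  else
    [404, { "content-type" => "text/plain" }, ["not found"]]
  end
end

# Call the handler directly with a minimal request environment.
status, _headers, body = hello_app({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" })
puts "#{status} #{body.join}"
```

Calling the handler directly like this is only a way to see the request and response shape; in the stream the real app is served from a container built by source-to-image.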
Okay, so here's a simple Sinatra Ruby application. Sinatra is basically one of these Ruby... well, it's not really a framework, it's more just like a server, if you will, and it's one of these really basic ones where you've got to tell it everything you want it to do. So, in this case... So, me writing Go code is what you're saying. So in this case I have to tell it, okay, you're going to respond to slash get... oh, sorry, you're going to respond to a GET request on slash with just the text hello world. Right? All right, pretty basic. So the first thing we're going to do is... I've got this code, so let me reopen this folder to make VS Code happier now that I have a repo in here. Sinatra metrics. Increase the font size of VS Code too? Even more? Yeah. Wow, I feel like a very old person right now. You can maximize the window to have more space. I can, but it still doesn't make me feel any less old. Well, you know, it makes me feel younger, because I can sit further back now. I'm just fighting VS Code; it still has this stupid main... Oh, I get it, I didn't commit anything. This is the fun of live streaming, right? So, git status. All right, git add and commit. Okay, fine. Well, I think you have your project open for your whole home folder rather than the project for... Yeah, but I just literally reopened this folder. Okay, I'll open it again; maybe it'll behave better now, but it keeps wanting to... Let's do this: new window, close this window, File, Open Recent, folder. Oh, much happier. Okay, there we go. All right, so I have this basic thing, and what I'm going to do... yes, thank you. Okay, so I'm going to take advantage, or try to take advantage, of the source-to-image framework, which OpenShift knows how to use. Source-to-image is just a way to combine an existing base image with application code. So what I'm going to do is make an actual repo on GitHub, because that's public. Metrics. It's a public repository, that's fine. We will do this, and we will git add remote... oops, need to give
that a name: origin. git remote add. Yep. Pair programming at its finest. Whoo. So, push that code to master. Okay, great, so we finally have some code in public. It's just a Sinatra app. So I think a quick summary there is probably useful: you're going to use a Ruby builder within OpenShift to generate a container to run this code in, and in order to do that you're making your repo available publicly, so you can aim your OpenShift cluster at it and get your code. Yep, we're off to pick up code. So we have our metrics playground project. I'll switch to the developer view. Let me chime in there: there's an administrator view that we're all familiar with, I think, on my team. We've introduced a developer view that makes it a lot easier for a developer to get up and running as fast as they can and not have to see all this stuff on the side about cluster metrics and everything else, right? That's for the ops people or the admins of the cluster. Over here in the developer pane, this is where we start deploying code and getting crazy with metrics and everything else. Right. I would say the developer perspective is intensely focused on application code and on a topological representation of the components of your app running on the cluster, as opposed to the admin view, which is a lot about, like, how many nodes are in the cluster and what's the load average on each of them, the ops-team sort of concerns, as you mentioned. Right, so me personally, I spend a lot of my time in the administrator view, but when I'm on these live streams I spend a lot of time in the developer view, and it's fun; I get both. Oh, thanks. Okay, tweet... stuff scheduled. [Music] Well, I have friends on Facebook who do technology stuff, man. No, I have friends on Facebook too, but on the browser, opening Facebook, I know what that's going to do to the quality of the stream. URL... but we don't have SSH, so I need the regular HTTP URL. Paste that in here. Show advanced options: I don't need to do anything fancy, because
it's all just in master. And then the base... whatever. 'Unable to detect the builder image.' I'm pretty confident this is going to be a problem, and I know why it's going to be a problem, but we're going to break it anyway, just for funsies. So it's Ruby. Okay, 2.5 sounds good enough for me, sure, why not. What do we want to call this? So there are two different names: there's the name of the application, which is really the name of the grouping, right, and then there's the name of the resources. I'm going to call them both the same thing; I could call them something different, whatever. Do I want a Deployment or a DeploymentConfig? You'll hear us talking about this one a lot. Deployment is the standard Kubernetes capital-D Deployment, so if you know about that and how it uses ReplicaSets, that's great. DeploymentConfig gives us a couple of extra bells and whistles, and one of them is automatically redeploying when things change, and since we're definitely going to be changing this code a whole lot, we're going to want to take advantage of that. So we're going to use a DeploymentConfig. To give a quick overview and summary, and to extend the one Eric just touched on a little: this is analogous to what we were talking about with projects as a sort of accentuated version of what namespaces do in default, standard, plain-vanilla Kubernetes. DeploymentConfigs are analogous to Deployments in the same way: they are additional OpenShift features that we built on top of the fundamental abstractions to enable some developer convenience features, like the triggered rebuilds that Eric is specifically using in this scenario. Sweet. And then lastly we're going to create a route. Routes are similar to Ingress, but again we've got some extra cool stuff that goes along with routes. This is going to expose our application. Go ahead and create this. Whoa, that's big. And so what's happening right now is that there's a build running, or starting, or getting
ready to get fired off, right? And what we see here is OpenShift has spun up the builder image, the Ruby-based builder base image (all your base), and then it pulls in the code, and then it runs a build process. But you'll see it actually didn't do anything, and this is going to explode in a ball of fire in a moment, because I left something out, but I sort of did it on purpose. So, anyway, a question in chat, and it's from Christian: does it assume Rails, is that why? So you're kind of on the right path, but not totally on the right path. So here we go: it starts to pull in the image, and then it starts doing all kinds of build-y stuff, and then it runs the assemble script, and the assemble script builds my Ruby application. If we think about Java and building, usually that's Maven, and if we think about Python and building, that's kind of like pip install and doing some other stuff. Ruby uses a Gemfile, but I forgot to actually have a Gemfile, so that's why it couldn't detect what my source code was, because it's looking for key files in the repo that help identify the language that should be used. So when I put in the source repo URL and it introspected my repo, it looked at all the files, and it's like, well, I don't see anything: I don't see a Maven file, I don't see a Gemfile, I don't see a pip requirements.txt, I have no idea what the heck this language is. It doesn't just assume that because there's a .rb file that it's Ruby, right? So anyway, this is going to finish building and combining all the things, it's going to push the image into the internal registry, and then it's going to try to deploy it, and it's going to fail, because there's nothing to run. It doesn't even know what to do: it runs the run script, but there's nothing to run. So, okay, let's go back to our code. What is wrong... actually, go ahead. What is wrong with our code? Well, sort of, but what I'm actually going to do here is try to do something silly, slash not silly. Here's the build configuration, and
then what I think I can do is... where is the... hey, look at that. So, GitHub... sorry, let's back up. Builds in OpenShift have the ability to be integrated with webhooks, so I can tell GitHub to call my cluster anytime the repo changes. As soon as I upload code, it hits OpenShift and says, hey, something happened, and OpenShift goes, oh, that means I'm supposed to do a new build. So we're already doing CI with almost no effort. So let me come in here to GitHub (I'll make this bigger): settings on the repo, webhooks, add a webhook. Payload URL. URL form-encoded is fine. The secret is embedded in the URL itself, so I don't need to put anything in here. I need to disable SSL verification, because my OpenShift cluster does not use any known... Yeah, so the SSL cert that is exposed by my cluster is not known to GitHub. Well, the CA is not known to GitHub. So if you know anything about how all that junk works, basically... Well, I know you know how, but I don't want to bother. Do we want to go into PKI infrastructure? We can, if you want. So anyway, we've got to disable SSL verification. Just the push event, I think, is fine. We'll leave it active. Okay, great. I don't know why that was not successful. So many windows, so little time. [Music] Can you... at the end of the line? I don't get that. All right, well, that means I'm, what, 2 for 3? I don't know why. What was the error here? Ping. Oh, that's fine, it actually worked; it's just that it didn't like whatever answer it got back. Hmm. Okay, I think that's the headers. Oh, the response, sorry. Yeah, that's okay: failure, unsupported content. Totally cool, we're good. Do we just have the type wrong? No, I don't think it matters. Okay. I think it just didn't like what we told it, but that's fine. It wasn't that it couldn't find it; it just, you know, wasn't supposed to work anyway. You know what, let's ask the docs. To the docs! When in doubt, that's why we write them, right? It says... Oh, the question is if we have the
type wrong, because that's what the error seemed to indicate. I mean, he wrote a book. You are too... Eric, but you didn't write a book. Ain't nobody got time for that. It's fine, we're good. So in our Gemfile we are going to need to define the Sinatra gem, so I think I can just do... well, you know, the syntax I don't remember. Christian says you need to click update. I need to click update? I did click update. I'll click it again. And yes, I think he disabled SSL. Look, yeah, that's disabled. When I click update it just takes me back. Okay, all right. Gemfile syntax: source rubygems.org. So we will make a file called Gemfile, and we will add the source, rubygems.org, and I don't care about the Ruby version, and then we want gem sinatra. We will add our Gemfile to our repo, right? And as Eric moves along here, I'll sort of give a preview of the Prometheus part of the thing: as we build this app out, this Gemfile actually becomes key, because it's how we'll add an exporter for the Prometheus protocol to our demo app here. This is weird: build configs, but it doesn't show me... it says builds, but it doesn't show me the build. Anyway, whatever. Yeah, look, it's running, ta-da. We didn't do anything, because we triggered a new build by sending that hook from GitHub to OpenShift. Exactly. When I pushed the code, it caused the webhook to fire, which told OpenShift to do the thing, which happens to be a build. So that's what's happening. Okay, here we go: it's copying the source code, installing, running bundle install, which is good, which is hopefully going to pull in the Sinatra dependencies. Hey, Sinatra, hey! Okay, so maybe this might actually work, and it will fly us to the moon. Will it? Fly me to the moon. Yeah, fly me to the new deployment. Hey. Oh, wait, something... oh, it's doing something. It says running. Oh, 'you might consider adding Puma.' I might not. So, Sinatra with a wild cat here. You know, you could put Puma in it, but, like, it shouldn't need... no,
joking, you should. Oh, it's using... how's it doing... I don't know. No, it's trying to use Rack. So, Christian's question about... yeah, so Rack is, like, Ruby middleware. It's hard to explain. 'Rack explained for Ruby developers,' here we go. So it sits between Rails and the web server. But the way that the builder image is configured... Huge mistake. So, the way that the Ruby builder image is written, I think it expects... and if you look at the error message that we're getting, it's complaining there's no config.ru found. That's a rackup file. Okay, so my question there is going to be: did we fail to generate this rackup file, or did we generate it and it's not in the expected path? No, this is going to be, like, an s2i run problem. Whatever that source-to-image image is... sclorg Ruby, pretty sure it's this one. So now we're getting into some of the bowels of source-to-image. If we look in the bin, we look at run: if Puma is installed, which it isn't; otherwise, 'you might consider adding Puma,' okay, fine; if bundle exec rackup is null, then exec this, otherwise exec this other thing. So I think what it's doing is this, but that's actually failing, so that's why it's crashing. 'Rack is not installed in the image': sorry, we're not getting that error, right? It just says 'you might consider adding Puma.' It's this: configuration config.ru not found. So I think if we have a blank file, that will be okay. So would that be a blank file for Puma? No, it'll be a config... so rackup -E is... yeah, it's just defaulting to look for config.ru. I mean, actually, Puma might make our lives easier. Maybe we're putting in too much, I don't know. Should we add Puma, or should we try to fix whatever's going on? Whatever you think is best, because you're the real expert, but this is probably going to blow up without a Puma config. Maybe it wouldn't. Well, let's try adding a config.ru
that's just empty. By default we'll only set up... Sinatra config.ru... I'm going to do a rackup, require my app... yeah, I think we need to do this. So let's try class MyApp, Sinatra::Base. Is that what we have? See, we do not. This is annoying. Someone literally just texted me a picture of them watching Twitch. That's cool, that's very cool. Thank you, Mr. Nix. Will Nix? Yeah. Hey, thanks, buddy. Something about serving the modular application, I don't care, but we do want the config file to require my app.rb. Okay, so this is probably... you know what, there's probably, like, an OpenShift Sinatra example somewhere. That'd be cheating, though. Let's see, so it's referring to my app, okay, cool, and then MyApp, which we have in here. Yes. Hey, look, we're going to get another build. Builds, builds, builds. Number three. Wow, that was fast. Here we go. That wasn't a real name; somebody, Will Nix, was in, but it was like a chain. So, again, to contextualize some of this a little bit: while we haven't yet had a running server as a result, what we are illustrating is s2i builds in OpenShift assembling all our components, building them into a container, depositing that container into a container registry accessible to OpenShift, and then deploying that new build onto the cluster. So we're seeing a lot of developer convenience steps assembled by this process that we're running through. We got more gems now. Maybe all the gems. We may have gotten those last time; I just wasn't paying attention. Perhaps this should have been planned more. No, no, failure is part of the fun. Exactly: learning in public. If you just want to see a demo, go to YouTube; we learn in public here. Well, we do things that should probably be learning, but anyway. Okay, fingers crossed, maybe it worked this time. Ooh, still blue. Details. Rocks! Hey, good job. 'You might consider...' no, I don't want to. Okay, so we have a route for this thing, which we can see
this button here, Open URL. So when I click this I get my hello world. Yeah, and I think it is again really important to underline that OpenShift assembled all of these pieces along the line for us, up to and including giving us a URL where external clients can access this contrived service that we're using. Yes. Okay, awesome. All right, now if we look at the details on this pod, we see some pretty basic stuff: memory usage 32 megabytes, CPU almost nothing, network basically nothing, right? But this isn't necessarily all that interesting or valuable or useful, right? Because none of it is particularly application specific; these are just statistics about the pod in which our application is running. Right. So if we want to drill down and know more about the behavior of the internals of our code, we need a way to instrument that. Yes. So if we switch back to the administrator view real quickly... actually, wait, whoa, that's a new thing, and it is beautiful. It's certainly colorful. Rate of received packets, wow, that's exciting. It's a lot of metrics. Yeah, and again, these are metrics that, out of the box, we have on OpenShift for just about any particular Deployment or DeploymentConfig we've made. I may not have to... this looks like it might be turned on already. [Music] Well, we'll see what happens; maybe it'll work, or maybe it won't. Okay, if we go to the documentation again and find the monitoring... services... by default... Actually, I'm going to go back to the OpenShift UI. If we go back to the administrator view, we go into Monitoring, then into Metrics. We have a lot of interesting... can I see... no... here we go. Okay, so the cluster is already configured to fetch all kinds of metrics about stuff. Essentially, when we build OpenShift, and then when we install it, we preconfigure the cluster to be doing lots of metrics exposition and ingestion, right? And the way that we do that is with PrometheusRules and ServiceMonitors. I think I got that right. So if I go to... actually, I'll do this in the UI, because
it'll be fun and it'll be an experience. If you look at CRDs... So I'll give a little background here: this thing called a ServiceMonitor is a CRD, a custom resource defined in this cluster's Kubernetes API, that describes an additional monitoring data point, or an application with a set of monitoring points, that we want to be able to describe to the cluster and have it start fetching those metrics. Right, yeah, and if we look at the instances of ServiceMonitors, you'll see that we have all these existing ServiceMonitors. These are the built-in ones inside OpenShift that tell the cluster monitoring to look at, for example, the API server, or to look at the marketplace operator and collect metrics on that marketplace operator. Right, and architecturally, even, these are a really cool thing to look at, because they give you an idea of how we're extending Kubernetes features in OpenShift in Kubernetes terms. A CRD, because it has a known format with a standard way of expressing some set of data, means that other developers and other communities within Kubernetes have a way to describe and talk about these ideas of ServiceMonitors intelligibly with one another, and that we can bolt them onto any Kubernetes cluster, because they're built in terms of extending the Kubernetes API itself. Yes. So we've been shipping this built-in cluster monitoring for a while now, but we have... anyway, not important... but we haven't been... Sorry, I just realized: I wear, like, a sleep ring, and I'm not wearing it right now, and I had no idea what I possibly could have done with it. Oh, I washed my hands and I took it off. Okay, it's downstairs by the kitchen sink. Never mind, sorry about that. My brain just went over there and it was gone; there was no coming back until I went through that whole train of thought. I almost had to walk away. Anyway, so the cluster monitoring has been there for a while, and we didn't really
give you a way to monitor your own applications. You basically had to self-install your own Prometheus instance, which is a pain in the butt, right? So with 4.3, OpenShift 4.3, and still now in OpenShift 4.4, we give you a Tech Preview ability to tell OpenShift to use the existing monitoring stack to look at your services that you define. And so it is looking throughout the cluster for service monitors and Prometheus rules to pick up. So what we're gonna do is look at the docs and figure out how to turn this on, so that it will look for our own exported metrics, and then we'll try and turn it on. Right on. So, the prerequisites: "make sure you have the ConfigMap object" with a thing, so let's see. Someone in chat is saying you're looking for a service monitor; we've already gone over that. Yeah, it was a while back. Sorry, it's okay. cluster-monitoring-config ConfigMap. Okay, oc get configmap cluster-monitoring-config in the openshift-monitoring namespace... does not exist. Did I spell it wrong? cluster-mon-i-tor-ing, I don't think I spelled it wrong. Maybe we're gonna create it. Still on Tech Preview, yeah. But here's the funny thing, right, it's like "prerequisites: make sure you have it". Right, admittedly it's worded badly, it's fine. Yeah, I would agree with that about the doc, but it seems to indicate that we create it and then edit it to have these contents, which ought to be inferable from here, I guess. Does this need to be brighter? Is the dark theme making it... I think it's fine, I think it's fine. So we're gonna create a new file, we're gonna paste the YAML content into this, we're gonna save it in /tmp as cm-config. Name is cluster-monitoring-config, it goes in openshift-monitoring, techPreviewUserWorkload enabled true. Okay, so we will oc create this file. Success. One thing to note: only a cluster administrator can do this thing, right? Right, so in theory if you're not the person who owns the OpenShift cluster you need to talk to that
person to ask them to turn this on. So we are enabling a feature in the monitoring solution. Save the file; "monitoring your own services is now enabled automatically". You can then check if the prometheus-user-workload pods were created. So if we run this command... maybe it hasn't succeeded yet. So yes, but... no, please go ahead. Sorry. Okay, it started. So here's user workload monitoring, oops, there we go. Okay, at this point, via the config map, we told the cluster monitoring solution, well, technically we told the monitoring operator stack, hey, we want to monitor user workloads, which resulted in this new operator and other stuff getting deployed by the existing operator. It's very Inception-esque, but basically this is the Prometheus stack that's gonna monitor our user workloads specifically, right, as opposed to the stack that's already running in the cluster that's monitoring the default statistics and metrics that we were looking at. "Deploying a sample service": to test your monitoring services you can deploy a sample service. We don't want to do that because we are writing the sample service. You can check that it's running. "Creating a role for setting up metrics collection": I don't know that we actually have to do this. This enables a user to set up metrics collection, but I think we can get around that. Same for the role binding, whatever; we'll go backwards if we have to do it. Okay, so "Setting up metrics collection": it says to use the metrics exposed by your service you need to configure OpenShift monitoring to scrape metrics from the /metrics endpoint. Wait, we don't have a metrics endpoint, so how do we actually do the metrics endpoint? How do we make one, and what is that even supposed to look like? Right. If we think about Prometheus, prometheus.io, and we look at their docs, I think it talks about... yeah, data model, that's not what I wanted. First steps, overview... where does it tell me
where's the thing? The thing... service discovery... I'm looking for a better diagram here, but anyway, at that metrics endpoint that it's talking about, Prometheus expects to see a JSON payload. So I'll go back here to look at the data model. Metric names and labels: Prometheus fundamentally stores data as time series; every time series is identified by a name and some key-value pair labels, whatever. And so when you visit that metrics endpoint, it's expecting to see this, like, weird JSON, not weird, it's very... a JSON payload in a specific format that tells it about what's going on in the application. We could write this ourselves. "The payload's not actually JSON", says Metal Mates. That's true, yes, it's something that includes both JSON and other things... I think I might be doing this wrong. Close enough, right? We'll actually see exactly what it is as soon as I figure it out. So anyway, we need our app to send the message. We could write something to make it do that, but let's see, Prometheus exporter, let's see if there's already a metrics exporter for Ruby. Oh look at that, there's a Prometheus Ruby client: "a suite of instrumentation metric primitives for Ruby that can be exposed through an HTTP interface". So yeah, just to give everybody some background on Prometheus: it was developed around the same time as Kubernetes was, so it has been around a while, and it has libraries for pretty much every language I've come across. It's very handy and developer friendly as well as operations friendly, in my opinion. Okay, so, gem prometheus-client. I'm actually gonna do this locally first just for giggles, cuz why not screw up my own laptop. cd metrics, okay, bundle install, and what this should do is install the Prometheus Ruby gem. "It's the Prometheus exposition format." There you go. Yeah, and so in a really short summary form, if we want Prometheus to discover and begin to ingest metrics from an arbitrary
application, we need to provide Prometheus with this /metrics endpoint that can respond with this payload. These libraries, which exist, as Chris mentioned, for a great number of languages and runtimes, give us that metrics endpoint and allow us to focus just on defining what metrics we want that endpoint to export, so that Prometheus can discover and represent them. If I move the video box from Zoom over here, do you see it? I do not, not on Zoom. Now I can actually look at you and it will look like I'm looking at you. Okay, and then what I can do is I can look down so it looks like we're doing Brady Bunch stuff, like Chris. Okay, so the Prometheus client got installed, we're good, and now if I ruby my app locally it should work. Nope, it exits, because now I have to do the rackup thing. You said you were a Ruby developer. This is Ruby development. I know, I used to work in a Ruby shop, I get it. Sure, yeah. Okay, 9292, so if I go locally to localhost 9292 I should see hello world, and now if I do /metrics maybe it does something. Sinatra doesn't do this. All right, let's figure this out. Oh, do you have to somehow register that one URL, do you have to tell it at some point... Yeah, so in my app, I added it in my Gemfile, so it's here, but my app doesn't know anything about it, right? Right now my app's only loading Sinatra, so we need to require the Prometheus client in the application. I think I'll just paste it over, that's exactly what I did. And then: returns a default registry, create a new counter metric, register http_requests, okay. And then increment actually would do it whenever it's called, so let's do this: we'll create the client, we'll do all... actually we'll do all these things just for giggles, see what happens. Never done this before, like legitimately never done this before. All right, so we have our registry, we have our metrics, we're registering the metrics, we have this helper function for accessing http_requests, and then increment the counter. But we only want to
increment the counter where the counter should be incremented, so we'll put it here in the slash action, so when anybody visits the application at slash we would increment the metric, maybe. We'll see if this actually does something. I need to restart the application, and it explodes: http_requests has already been registered. Oh, "equivalent helper function". It would help if I actually read. Copying and pasting from Stack Overflow for dummies, here we go. By the way, this is legitimate; I shouldn't say developer, because I really am not one, I'm a crappy hacker at best. All right, a crappy hacker at everything. Okay, here we go, so if I refresh my hello world it blows up also, great: undefined local variable or method http_requests. Well, that's because we defined it here. I totally get the Stack Overflow joke. My girlfriend is in the middle of some programming classes, which I was trying to help her with on Sunday evening, and we're working through this Python app to draw some charts, and so I go and I google how to ingest JSON in Python, and she's like, is this programming? And I'm like, it is what I do. Well, I know we're getting a little off topic, but the thing is there's no excuse for not... I mean sure, there's excuses, but legitimately what's the excuse for not learning programming? You can legitimately sit in front of Google and 400 other free services and build an app doing nothing but searching, and there's free training online. I mean, this is awesome, right? Anyway, so it is working now, I just refreshed. Yes, it is awesome. And now if we go to metrics, hopefully something... fingers crossed. So I think I understand the hint that it shows at the bottom of that error page, which is, somehow or another, either with a slash-star that has a handler for every URL under there, or by explicitly defining /metrics, there's got to be something that tells the HTTP server what to do. Yep, or we gotta keep rolling, or we got to keep reading
the directions. "There are two Rack middlewares available": one to expose the metrics endpoint, hey, that's what we want, and one to trace HTTP requests. Ooh, that would be cool if we were doing like Jaeger, service mesh stuff. "It's highly recommended to enable gzip compression", we are totally not doing that. All right, so now we need rack, and the middleware collector, and the middleware exporter. We're not gonna use the deflater, but we are gonna use these two things. Where does this go? Does this go in my config.ru rackup file? Oh duh, it's telling me right here. It would be great if I would read what I'm actually copying and pasting from Stack Overflow. Well, you're not walking right now, right? No, I'm just standing. Okay, so you're standing and programming, that is two difficult things. I was walking the other day, but my wife is like, you know, I tried to watch your stream for a little bit but I couldn't deal with your head going back and forth, driving me crazy. I don't know that this is gonna work, right, but... "you need those use lines" is what Ryan is saying. All right, Ryan, Charlie, Jarvan and Mike, what about the use lines? Right, yeah. Ryan, would you like me to DM you the Zoom link? [Laughter] Ryan, are you a Rubyist? Save me from myself. All right, I followed slightly more directions, and now we will go to the metrics endpoint, and okay, but no, look, there are no HTTP requests, right? Like everything's commented out. But I think that's because since I started the server I have not visited the regular slash URL. So if I go to the regular slash URL I should see the hello world, which I do, and now if I refresh this page I should hopefully see that there was an HTTP request recorded. Oh wow, data! Look at all that data, that's a lot of data. Yeah, http_requests 1.0, because you know, fractional requests are a thing, apparently. Oh, and so now if I refresh my page, 2. This is a worthwhile point to make: if you observe the data model that we were looking at for Prometheus, they're all floats, so that
there's no way to just export an integer, which is why you... And so Lily Kozik says that these are not comments, these are HELP and TYPE metadata. Smarter than we are. Well, Lily is from the Prometheus team. Yes, you are probably doing that thing where you're sitting and watching the television and you want to strangle someone, just watch the life leave from their eyes. I get it, that's cool. I'm sorry this is so painful, but whatever. So anyway, look, we've got requests, this is awesome, but we haven't made it work in OpenShift quite yet, because first we have to commit our code. Wow, what? Apparently we have files. Oh, he says no worries, by the way. Everybody's worried, that's all right. I need a .gitignore because I want to ignore the vendor folder. Sweet. Okay, I can't ever remember if I'm supposed to ignore Gemfile.lock or not, but we're gonna go with it. So this is "adds Prometheus support", because that's a cool commit message. If we go back to OpenShift, if we go to the developer view, go back to Builds, go back to Sinatra, look at the builds, here's build number four. Look at the logs, wait for the logs... takes a long time to come around. It's cool that it has my commit message in there, that's very nice. It also has your email, so look out for that. I'm pretty sure that if somebody wanted to figure out my email... this is why I just put it online, that way I know everything is gonna get blocked at some point. Okay, what's the problem? So the Gemfile.lock, which is like the state of what's going on, is there because I used bundle to install this thing. So I actually want to remove this from the repo. This is another artifact of how OpenShift wants to do the builds, I think; I'm not totally sure, but whatever, we'll go with it. So make it ignored as well. I'm sure that somewhere buried in the S2I... okay, S2I Ruby. We go and we look at the assemble script. Okay, so what does it do? For those applications that are using Rack, it puts them in production mode. It has bundle
installed already in the base image, so it does this thing, "installing application source", "building your application from source", whatever, and then it does bundle install. But I think there's something that it does when there's already a... my LinkedIn profile? Where was I? I deleted the file. What was the error message? "Could not find bundler 2.1.4". Yeah, so I'm using a different version of bundler and that got baked in, and so basically it was like, well, I'm trying to do this thing, oh, but I'm not the right version, so it blew up. So I add the Gemfile.lock to the .gitignore, remove it from the repo, push. Build five, logs, so exciting. By the way, as somebody mentioned, my email address is available through my LinkedIn profile. Feel free to send me a LinkedIn connection request, I'm happy to entertain all technologists. I do many things alongside my Red Hat work that require a good network of people. Mm-hmm. Happy to connect. I will find your LinkedIn profile and drop it in chat. [inaudible] [Laughter] It's funny, I'm more selective about Facebook than I am about LinkedIn. There are virtually no LinkedIn connection requests that I will reject unless they are super obviously spammy, like the only reason this person is connecting with me is because they're probably going to immediately try to sell me something, right? And even then I usually accept anyway, and then my response is just immediately "no thanks". But yeah, no, I get the "hey, I wanted to talk to you about blah blah blah" and it's like, disconnect, sorry. Oh yeah, okay, cool, so that built fine, so that fixed the problem, removing the lock file. So if we go back to our topology view, we have an app deployed. We see it's the fourth one, so this is gonna confuse people, right? There's a -4 in there, like the number four, but this was the fifth build. The reason that this is -4 is because the fourth build failed, so there was no
fourth deployment with the fourth build, so now the fourth deployment of this thing actually is the fifth build. Nobody's really gonna be looking at this stuff that closely, but if you're wondering why the numbers mismatch, it's because this is the fourth time it was actually deployed. Anyway, so we have this open, I think, already somewhere, this app, maybe, no. Okay, we visited the app, this is good, and now if we go to the metrics endpoint, fingers crossed everybody... all right, nice, we have somewhat liftoff. Liftoff! Okay, cool. Don't worry about that being small, you don't really need to read it. Okay, now where are we at? So we have the cluster; OpenShift has been told to look for user workloads, look for user stuff, exactly, yeah. And we now have an app that makes stuff happen, exports metrics. So now we need to tie the two things together and tell the cluster Prometheus to actually look for the metrics. So let's go back to the documentation. We need to create a service monitor, which tells Prometheus what endpoints to actually consume, and so this is where it's gonna get a little weird. So let me copy, copy pasta, as Chris sure likes to say, copy pasta. All right, new file, close this, we're going to paste this, all right, turn off the terminal here, write this file in /tmp as the service monitor YAML, and then we get cool formatting. Okay, it is kind ServiceMonitor. I don't think it needs to be labeled, so what does it say... of course not, so we're gonna delete that for now. We're gonna call this our sinatra-monitor and it lives in the, oh boy, metrics-playground namespace. Endpoints, selector, matchLabels... this is all weird, right? So what we're going to do is we're actually going to look at the Prometheus documentation to explain better what... where's the... labels... yeah, what the service monitor spec basically does. Where's Ryan? He probably has a link bookmarked somewhere, right? Absolutely. He's probably just laughing at us now, eating
popcorn and, like, yeah, throwing stuff at our monitor. "ServiceMonitor" tells it which services... so here, "ServiceMonitor"... no, this is the Prometheus Operator, not Prometheus itself, right? Yes, yes, Lily. So the key word here is service monitor. It's not pod monitor, it's not deployment monitor, it's not replica set monitor, it's service monitor. "The resource includes a field called serviceMonitorSelector, which defines a selection of service..." So the key here is that there's a Kubernetes service that Prometheus is gonna look for to monitor, right? That service will be durable in the face of rebuilds, redeployments, pods dying, pods being scaled; that gives us a reliable endpoint to reach the implementers of that service behind it. So a service is an in-cluster load balancer among a group of pods that implement some arbitrary service. Yep. And so if we look at the YAML for the service, we see that it has a label of app: sinatra-metrics, so the matchLabels is gonna be app: sinatra-metrics. Well, I think it'll actually just be sinatra-metrics, I think the app is... you're right, sorry. In the docs the key was app and the value is... And then the important thing is this section of endpoints is looking for the name of a port defined in the service. If we look at the service, the service has a port named 8080-tcp, so the port that we're looking for is 8080-tcp, that's the name of the port. We're gonna query it every 30 seconds, that's fine. The scheme is HTTP, which I believe is as opposed to like gRPC or something. I lost my docs, where's the docs? "Services discovered by ServiceMonitor", endpoints, port... we could probably have left it out. Okay, we're probably gonna have to do some RBAC-y stuff, but it'll be fun. Okay, so scheme HTTP, yes. So we will save this and then we will create this file, oc create with the svc-monitor YAML. Okay, oc get pods -A, grep prometheus, and what I'm gonna do is, for giggles, let's look at the logs for that pod and see what it says... prometheus-user-workload... "deprecated: spec image"... just
looking to see if there's anything interesting in here. "Get namespace does not exist", that's fine. Okay, cool, let's look at the logs for this one just to see, does it say anything interesting? Gosh. Okay, rules-configmap-reloader... basically I'm looking to see if it has figured out that it's supposed to do something. "Completed loading of configuration file", that's not interesting. Never mind. paulfantom says... and it is here, yes, that is the service monitor... "just go to the targets endpoint of the user workload Prometheus UI", or does it have a route? There's no routes in that user workload monitoring... "and query your metrics". Well yeah, I know I can query, I was trying to prove that the metrics were there, but sure, let's query them. Okay, Metrics, it's like a query... custom PromQL is what we want. Well, what, I guess http_requests... oh yeah, stuff. http_requests_total. How do I run the query? Just hit enter. Oh, there, you got it. No data points. Okay, well, that's cuz we haven't hit the thing yet. I'm watching chat for Paul to say yeah, we did it wrong. Okay, so I just said it, but Lily says just go to the /targets endpoint of the user workload Prometheus UI. Yeah, I'm not sure... http_requests_total, enter. "No routes yet, coming soon, use port-forward." Okay, so the metric is not there yet, but I mean it's working in the sense that it's collecting them. Oh wait, http_requests_total, that's not an actual... yeah, you're not generating that, are you? What's in there? Hey, metrics! So cool. Open the thing up to the world so we can get more metrics. [Music] Start typing, yeah. So to kind of reiterate, the key idea here is we deployed a really simple HTTP server app, we added to it a library that exports some counters at /metrics, and we connected that, by describing it in a service monitor, to the onboard user workload monitoring in OpenShift. So we've got facilities to draw graphs with it and dig into it and analyze it right here in the OpenShift developer console. Well, people have done the thing
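To make the recap above concrete, here is a minimal sketch, in plain Ruby with no gems, of the kind of text payload a /metrics endpoint returns in the Prometheus text exposition format. It is not the prometheus-client gem's implementation: the `render_metric` helper and the sample values are hypothetical, and the metric names only echo the ones used on the stream. It illustrates two points raised in chat: the `# HELP` and `# TYPE` lines are metadata rather than ordinary comments, and sample values are floats.

```ruby
# Hypothetical helper: renders one metric family in the Prometheus text
# exposition format. An illustration only, not the prometheus-client gem.
def render_metric(name, type, docstring, samples)
  lines = []
  lines << "# HELP #{name} #{docstring}" # metadata, not an ordinary comment
  lines << "# TYPE #{name} #{type}"
  samples.each do |labels, value|
    label_text = labels.map { |key, val| %(#{key}="#{val}") }.join(",")
    label_text = "{#{label_text}}" unless label_text.empty?
    # Sample values are floats, so an integer 1 is rendered as 1.0.
    lines << "#{name}#{label_text} #{value.to_f}"
  end
  lines.join("\n")
end

# Made-up sample values, for illustration only.
METRICS_PAYLOAD = [
  render_metric("http_requests", :counter, "Number of HTTP requests served",
                { {} => 1 }),
  render_metric("viewers", :gauge, "Number of stream viewers",
                { { service: "twitch" } => 35 })
].join("\n") + "\n"

puts METRICS_PAYLOAD
```

In the real app the client library builds this payload from the registry, so the application code only has to define and update the metrics.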
I've done the thing, sorry. I want metrics, give me metrics. Metrics, oh cool. So now we have an interesting metric, right? I mean, it's not an interesting metric, http_requests. It's a useful metric. It is, but let's come up with some kind of imaginary metric, right? So where's the documentation for the exporter? Okay, so what are the different types of metrics? We've got counters, we've got gauges. What is a gauge? A gauge is just like a different continuous value... oh, we just said it, remember? Yeah. A histogram provides a sum... that's a lot of numeric data. Labels: all metrics can have labels, values... all right, so let's do something interesting, right? We're gonna create a gauge in here, and you'll understand why I want to do this. room_temperature_celsius, docstring, labels room, set a value, labels room kitchen, got it. Okay, so what do we want to do here? We're gonna go back to the app, we are going to create a gauge. Room temperature's not very interesting. What kind of a thing do we want to measure? And the reason I'm doing this is so that we can try to create alerts, right? Because we don't want to alert on the number of requests. I mean, we might want to, like maybe we have a really terrible app that after a thousand requests we need to reboot it, but that's not interesting, right? So we want a gauge of something, maybe a gauge of Twitch viewers. Create a gauge called... I mean, if we want to go totally wacko we can try to figure out how to tie this into the Twitch API, you know. So we're gonna create a gauge of viewers, the name is viewers, with... no docstring? I'm gonna put a docstring in there just for giggles. Eight years later... What do labels do? Labels: "all metrics can have labels, allowing grouping of related time series". Okay, so we'll call this service, and you'll see why in a second. Okay, so we have a gauge, what do we do with the gauge? Gauge: set a value,
gauge get a value, gauge increment a value, gauge decrement a value. Okay, let's create a new endpoint in our application called twitchy. And when twitchy gets visited, hmm, I don't know, set it to random. How do I do a random number in Ruby? "How to get a random number in Ruby": use rand and a range. Okay, one plus... okay, sure, so it's zero to whatever. So we'll say num_viewers equals rand, I don't know, we're not that popular, so 50. This will give us a number from 0 to 50, and then we will set the gauge. Go back to the docs. I'm slightly offended by the "we're not popular". [inaudible] Yeah, well, yeah. So that's our gauge, gauge set, just copy this. Too many parentheses. Okay, so we're gonna set this to the number of viewers, we are gonna label it as... I think I said the label was service, we're gonna call this twitch. Okay, let's try to run this locally, see if it works. Come on back up. Okay, if we visit our localhost version of this, there's no metrics yet. Apparently getting metrics counts as... oh sure, it's a server request, but it doesn't actually increment the metrics counter, but that's okay, we did that on purpose. All right, so if we go to, what did I call it, twitchy... nothing happens, but that's okay, nothing really was supposed to happen because we didn't tell Sinatra to return anything. So now if we go to localhost 9292 /metrics we should see twitchy... yeah, but I don't see the gauge. Maybe because I forgot to register it? Well, create gauge... yeah, I have to register it. prometheus register, viewers gauge. Okay, so we have to reboot our app server real quick. HELP viewers, TYPE viewers. So it's cool, we're in there. If we visit twitchy and then visit our metrics, we see that we have 35 viewers on service twitch. Huh, interesting, cool. And if I hit that page again, twitchy, and I refresh the metrics, now we only have 21 viewers. Well, so I'm pretty sure that's the algorithm, like you know how YouTube does the weird... you never know how many views a thing actually
has. I'm pretty sure if you keep pinging it you'll get a different number every time. Okay, so we can push this code live now. So this "adds a twitchy endpoint", push it, and then if we go back to our builds we'll see that it's building; build number six is running. We'll wait for this to finish. Actually, while that's getting ready to finish, let's go find the documentation for alerting rules. "Creating alerting rules". So the difference between a service monitor and an alerting rule, right: Prometheus is the thing that alerts, and Alertmanager is the thing that delivers the alert. So when you're configuring alerting, what you're doing is you're telling Prometheus at what condition to tell Alertmanager to deliver an alert. So Prometheus collects metrics, collects all the things, and then if the metric exceeds a condition for which an alert is defined, it goes and calls Alertmanager and says, tell somebody about this thing. We will create an alerting rule for the number of Twitch viewers. Oh, actually, oh yeah, sure, we're gonna be okay. Here's our PrometheusRule. Why is this VersionAlert? "This configuration creates an alerting rule named example-alert, which fires an alert when the version metric..." Oh, I guess this is the name of the alert. Wait, I need to file some docs bugs here, it would be nice if these were clearer. I have a friend in docs now. We have lots of friends in docs. Well, I mean, I'm sure Ally's not watching, but I should definitely probably do something nice to make up for this. We're gonna call this, in /tmp, the service alert YAML, which doesn't make any sense, but whatever. So we are gonna call this too-popular, that's gonna be our alert. "metrics example"... so this is gonna be twitch-rules. VersionAlert... I gotta look that up because I want to understand how this actually makes any sense. Come on. There we go: "ServiceMonitor" tells it to monitor services; "PrometheusRule"... alerting... describes what... maybe I want the syntax of kind PrometheusRule.
"Deploying"... where's my PrometheusRule? Here we go. Well, that's not useful. Explain to me the alert. "Configuring Alertmanager"... no, I want to understand what that name is. Is Lily still watching? Lily, where's the documentation for the actual PrometheusRule alerting thing? "Getting started" tells me... huh. I just saw "alerting". Well yeah, but we were just looking at that. Yeah, I want to understand the PrometheusRule syntax. So here's the PrometheusRule, fine, this is... I don't even understand. So "alert", like "alert: ExampleAlert", it's just the name of the alert, I'm guessing, so I'm gonna go with that, we're gonna find out. So we're gonna call this... "It won't work in 4.3; alerting on custom user metrics is 4.5 and onwards", as it says below. So this is 4.4; is it gonna work? 4.5 and onwards is what she said. We'll find out. Paul sent me a link on the... yeah, he said at the bottom of that page, for alerting. Yeah, that's where we were. I mean, it's sort of... yes, it doesn't explain the syntax of the alerting. No, no, the actual alerting.md file. This? Yeah, but so the syntax, yes, here it is, but it's not clear that this defines the name of the alert. Whatever, we're gonna try it anyway, and if it doesn't work, so be it. Expression: version, job equals... I think we're gonna need some code too. Okay, let's see, so the build completed, we're good. If we go to topology view, we're cool, here's our app, hello world, that's great. If we go to the metrics we see... how many viewers? Nothing's there. If we go to twitchy nothing happens, but that's fine, and then we go to metrics and we should see something. Metrics: viewers 8. Wow, people are really bored. Okay, they're not bored, it says 35 over here. So here we go, here's the syntax, right: viewers, service equals twitch. So expression: viewers, service twitch, greater than 40, do something, send an alert, right? I don't think this is actually gonna do anything, thanks Paul, because it sounds like somebody says this doesn't actually do
anything, but you know what, we're gonna try it anyway. There's more links, I'm not sure if you saw: "content of rules follows the Prometheus format". "alert: AwakeViewers", thanks Ryan, clearly you're awake, so I guess we're doing well and we're doing a good job here. All right, so I created this PrometheusRule. I have no idea how to figure out whether it's working or not. Do alerts show up as events? I don't know that I can see them here, I might have to go to the admin view to see them, maybe, but I think Lily had said that it doesn't work. Oh yeah, right. And so I think if an alert were fired it does show up in events. I mean, of course most of the alerting machinery is oriented around emailing or ringing a pager, doing actual alert things on the outbound side, but I think that we would have an alert notice in the events. But I don't think alerts are actually going to work, based on what Lily has told us. I don't think alerts are events. Yeah, I think you're right, Eric, alerts are their own thing. What is this, actually? Sure, yeah, Alertmanager's the thing that controls what gets broadcast, right? Show me everything, right? If it's not grouped, click the plus sign next to "Not grouped", a little bit down, a little down, yeah. What is that, Watchdog? Oh, this was showing me all the alerts; I'm totally looking at it and not understanding what it's telling me. So these are alerts that are currently happening, so for example alert name ImagePruningDisabled. Great. Okay, so currently we only have 8 viewers, so we need more viewers than that to make the alert happen. So let me go to the twitchy... Now, if everybody goes to that URL that's gonna screw it all up, so please be... oh wow, now we're down to three viewers, it got worse. Oh, I just totally did something bad. There's a weird beeping in the background, oh, this is gonna be real bad. Do you still see what you're supposed to see? Yeah, why, what's up? You know, I have
everything on a KVM, and double Page Down changes to the other thing, but that monitor is going through the KVM, and I've got some things on my laptop screen and my main screen in front of me, and so it looks like my webcam has stopped. Your webcam's fine. Oh, you see it? Yeah. It's not moving... never mind. So let me... how do I get back to the settings here? That is exactly what I have, 15, and hopefully that's gonna work. Is it gonna work? It's not working, that's a bummer. Yeah, nope, there you are on a different... well, a different webcam, so side view now, that's fine. Oh, there we go, [inaudible], thank goodness. Hey, I gotta be more gentle with the Page Down. Yeah, no kidding. All right, what did we set the alert for? 40-something. 40, I think. It's at 47. All right, so now we're at 47, which is good as long as nobody goes to it and wrecks us. Don't visit that URL. Monitoring, Metrics, we want to look at viewers. Cool. Whoops, somebody went to it, boo. [inaudible] We got bumped. Yeah, we're back at four viewers. 29. Yeah, 36. Come on baby. 43, all right, whoo! But I don't think Alertmanager is gonna figure any of that out. I think this one's still got 43 viewers. Yeah, we're just waiting for... oh, cuz the... there we go. Okay, so now we're back up, it has detected that we have lots of viewers, but I don't think Alertmanager is gonna do anything. Why is that? Because I don't think it works. Well, I don't believe alerts exist yet. Yeah, honestly I arrived today thinking that it existed, but I've learned that it doesn't yet. Yeah, so maybe Lily can tell us: is it that it's just not looking at the alert rules, like the Prometheus that's configured for user workload monitoring just doesn't care about PrometheusRule alerts, or is it that the Prometheus user workload thing doesn't know how to find the cluster Alertmanager, or what's the... Yes, we understand that it sounds like it's coming in 4.5. The question is what part isn't wired together, right? Yeah, like what is the missing piece, kind of in pursuit of finding
out what we might wire together to make it work manually. Right. But, I mean, we can keep going and trying to do other interesting things, but at this point we've created interesting custom metrics that are showing up in the metrics UI. Yeah. And Eric, if you want to bring that back up on the screen, that's the coolest part: we've added arbitrary measuring. Which view, sorry? I meant the metrics view or the alerting view, back over in the OpenShift console. Because what we've been able to do, remarkably, and actually in a fairly short span of time, is take a little app, arbitrarily measure different parts of its internal state, and present them back in graphs right in the OpenShift web console without really having to build out any of the visualization part. All we've had to do is identify an endpoint and have Prometheus go scrape it for us, and we only had to learn a little bit of Ruby to do it. So hopefully whoever was disgruntled about us doing software development and app deployment is happier now. Don't be disparaging. I'm not trying to be disparaging; somebody was disgruntled, legitimately. Well, they're gruntled now, the opposite, right? I don't know, but I'm good at making up words. So many words.
I think we've sort of achieved the goal for today. Yeah, and it's a little bit funny that we didn't have an absolutely correct understanding of the alerting piece, but we actually did illustrate a pretty useful process for anybody who's building applications on the platform and needs to measure the internal state of that application, with the knowledge that the upcoming feature is wiring automatic alerts based on these custom counters of internal application state, in the very near OpenShift version future. OK, so Lily has provided us a comment. Sorry we're not looking at the camera. Yeah, I'm trying
to, I'm trying to say for it. I understand the global view and creating alerting rules, but just on your own custom metrics, not in Alertmanager. Yeah, no, right, Alertmanager is the thing that routes the alerts, totally got that. So the question is: technically, right now, because the number of Twitch viewers is too high, and the rule exists for the alert, is there somewhere that I can see the alert being fired or triggered? Because I understand that Alertmanager doesn't do anything; was there somewhere else? Yeah, or to put the question another way: would we have to go all the way to configuring Alertmanager with a known set of alert targets to be able to observe it?
So Ryan asks, in SRE terminology, did we establish a new or custom service reliability indicator? "You can see it in the Prometheus UI for your user workload monitoring." Well, I can see the value, but how do I see... is there a query that's like "alert"? Would you see it in local? Oh no, because the application doesn't know about it. So I think what Paul is suggesting is that in the admin perspective, the Prometheus UI... wow. So this is the Prometheus UI for user workload monitoring, which we never exposed. So I think what he's saying is that I would have to expose the Prometheus UI for one of those user workload monitoring pods, and then we could go to it, Paul, and see it. So, no: port-forward the Prometheus user workload. Eric, port-forward the Prometheus user workload, and I could be wrong. Exposing the service is the same thing. Sorry, in the admin UI, I think we have a Prometheus UI. Yeah, we do, but I don't think that's the same one, because there's a different Prometheus, right? We have this other Prometheus of pods and deployments running to do our user workload monitoring. Right, yeah. If I click to go to it from this view, this takes me to the cluster one. And Paul and Lily have also further clarified that you're correct, Eric: what they're
referring to is specifically the user workload monitoring Prometheus, and they're talking about port-forwarding to its UI endpoint. The service, the prometheus-user-workload service, right here. So couldn't I do oc expose? We're gonna do it, we're gonna try to do it. Yes, that's a service. OK, oc get route. I'm gonna do it; it's either gonna work or it's not. "Client sent an HTTP request to an HTTPS server." OK, yes, that's curious. Fine. Because I think it's OAuth proxy. What about, do you need to be logged in? No. Internally it is, I guess it's 9091, 9090, I think. Oh, look at the screen. I mean, yeah, I know, it says its target port is metrics. Hello, sorry, Lily is here, she's actually doing this stuff live. She's checking her bash history now. OK, that'd be a little easier. So if we look at the pod, the pod should define ports that it does something with. 9091, here's the port that it exposes. So oc port-forward [Music] OK, says it's doing it. So now if we do localhost 9091... oh, it's thinking. We've been here before, which I think is probably gonna take us towards Lily's comment about OAuth proxy. Yeah, yes, because I would need to send some kind of token. Lily dropped a comment in here: kube port-forward. Yeah, right. So I don't think this is gonna work. No, but could we fix it? Because it's configured to want to... But 9090 is not exposed in the pod. 9091 goes to kube-rbac-proxy; 9090 is Prometheus. I guess the RBAC proxy in turn... oh, maybe I didn't scroll up far enough. Just do a find. I think I see what they're saying. Scroll up. Well, that's a URL, sorry. Anyways, I'm just looking for 9090, but there's probably some awful jq command to make that work. So Paul says this is why you need port-forward. Uh-oh, connection closed. I know what that means. Well, because it's probably HTTP. Alerts,
which... too popular, yay! Yeah, look at that: too many viewers. Look at that, kids, we did it. Oh, but I didn't; Eric and Lily did it, with special guest Paul. By the way, folks, at least Lily is in Germany, and it's probably, I don't know, very late, or early in the morning, right? Yeah, that's true. It is now like 9 p.m., maybe 10. But the fact that they're willing to hang out with us, absolutely, thank you so much. And that neither of them have managed to kill me somehow through my computer yet. I think that doesn't mean they're not planning to; it's only 9 p.m. I will end up dead at some point. They're writing that program iteratively and they just haven't got it all the way through. So Lily says all this magic is now done for you from 4.5 onwards. Now it'll be cool when it's been port-forwarded. Oh, it says it's not active. Wait, did somebody change the number of viewers? This is weird. Well, the number of viewers is greater than 40, but the alert, I see the alert, but it says it's not active. So shouldn't it be active? Unless... did we go down again? Can you go look at the graph over in the UI? So it should be firing, in theory. viewers, namespace metrics-playground, service... not there. Is that the right greater-than or less-than symbol? I never get those right. It should be; it points at the little one, Chris. Yes, I have to look it up every time. I used to have a sticky note. Execute... no data. Yes, the alligator, this is true. Yeah, well, I was always told... oh, here we go. OK, so this is the way I defined my rule: viewers, service twitch. Oh, what's the metric, though? viewers, service twitch... exported_service twitch. Oh, I think I'm overloading a word, because look at the actual data: exported_service. Sorry, right words. So if we look here, the service that's being monitored is... we don't want to measure that. Yeah, so
I need to call this alert with exported_service twitch. Yeah, there we go. Because if I change my viewers query to exported_service twitch, yeah, it works. OK, so once I change this rule, hopefully Prometheus will pick it back up at some point. I don't know what the interval is; I think it's 30 or 60 seconds. Well, that's the interval for polling that we defined in the ServiceMonitor, right? What I'm saying is we just edited metrics-playground too popular, we just edited this definition, so the question is: when does Prometheus reload the PrometheusRules that are defined? Yeah, it still has not loaded. I'm still over 40, right? Yeah. Well, it still says service equals... oh yeah. There we go. So maybe now it'll be firing. There is a firing alert, yeah! Bang, bang. All right, so now we have actually, finally, succeeded in doing the thing that we set out to do. It's almost completely... huh. Yeah, now what?
And as Lily points out, that recycle time was basically the Kubernetes refresh cycle time. What we were waiting on is for Kubernetes to go through and reload ConfigMaps that had changed. Very cool. Whenever K8s reloads ConfigMaps, zero to five minutes, is the answer to that question. Because my assumption is, and Lily or Paul can confirm, the PrometheusRule is a CRD, and so when the CR changes, some operator... sorry, PrometheusRule is a CRD; too popular is a custom resource instance of a PrometheusRule. And there's an operator somewhere that's looking at PrometheusRule instances, and I'm guessing it manipulates a ConfigMap eventually once the PrometheusRule instance changes, which then eventually gets reloaded by K8s into the Prometheus pod, at which point it HUPs itself and knows about the new rule definition. Does that make sense? It's a convoluted description, but yeah. At that point I think you need a picture. So if we look at the YAML for that... OK, here's our twitch, right. So, OK, so there's a
PrometheusRule called too popular in my project. Mm-hmm. The Prometheus user workload operator, this thing, is looking at all of the projects to find PrometheusRules, and so it found my PrometheusRule, and then it found that my PrometheusRule changed, and so it updated the ConfigMap with the new rule definition. And what Lily said about reloading ConfigMaps: at some point K8s says, oh, the ConfigMap is different than what is actually in the pod, and it fixes that somehow, at which point the Prometheus pod is like, oh, I have different rule definitions, and then it loads the new rules, and that's when our alert finally fired. So if we look at the logs for this pod, I wonder if we'll see where it picked up the rule or something. No, it doesn't, actually. That'd be nice if it did, helpful for troubleshooting. Yes.
So, just to scroll back up to your question: you said, did we just establish a new service reliability indicator? The answer to the question is yes, with an asterisk. It depends on the context of the indicator. So if you could create a gauge in your application that was a metric of health, that was derived from other things going on in the application, absolutely, you could now have this be an SLI. The measurement would be the SLI, and the high-water or low-water mark... right, we define an alert based on the SLI. And so the other thing that you can do with PrometheusRules, which you have to be careful of, is you can do derived mathematical whatevers. Let me see if I can pull up the docs for it, but there's basically a way to do stuff like, you could do math, basically. Yeah, which might be a combination of two counters, or a function of one counter by the other counter, or other ways that you want to massage two sets or three sets of data points to make them useful as a high- or low-water-mark trigger. Right. Yes, that would be called a recording rule. Right, sorry, not necessarily... where's the... yeah, so you can define, you know, like an example of this that, when we talked
about this before, Eric, an example that always comes to mind for me is: if I have some kind of a sensor that, say, measures temperature, it may not be giving me degrees Fahrenheit or degrees Celsius. It may be giving me a raw count from the sensor that I could then apply simple math to, to get a degrees Fahrenheit number. It may be giving me seconds when I want minutes. I'll give you a perfect example of that in the real world. So I'm a car person; I like motorsports and car stuff, and I have a programmable engine computer for tuning the performance of the engine, and it outputs sensor values. There's a messaging bus where it basically spits out all these numbers, but it doesn't do pressure in psi or bar; it does it in kilopascals. But it's a scaled number, right? Yeah, but not only is it a raw number, it's a raw number that's in reference to atmospheric pressure, so technically it's already 14 psi, or whatever the current barometer is, above the actual value. So if I want to display the oil pressure on my dashboard in the car, I have to convert from kilopascals to psi and then subtract out 14 pounds, because atmosphere, and then that's the number I can show. So in Prometheus terminology that's a recording rule; that would be whatever the scale factor is for kilopascals to psi, minus 14. And so instead of having to alert on a thousand and 96 kilopascals, I can have a different thing that's like pressure psi as a metric, and then I can build an alert off of pressure psi, but that's a derived metric. Right. And so I'd like to summarize, not intending to be perfectly correct, but the idea here being we could have a function that first subtracts a sense of ambient atmospheric pressure, yep, and then does a conversion from this to something more humanly recognizable or more easily painted on a gauge in a car dashboard, like you're talking about. Yeah, mm-hm. And so if you had a bunch of different metrics that
you could mathematically combine in some way to get a single health score, if you will, then you could have an alert that fires when health is below some threshold, or above, whatever. Right. The one caveat to this is compute. So if you're doing crazy math, and you're doing lots of crazy math across lots and lots and lots of rules... well, crazy math is math where n is large, despite the fact it is whatever. But the more complex your recording rules get, the more horsepower you're asking Prometheus to use every time it has to calculate, so it is entirely possible that you can crush it. Just be careful if you start to do these sums with square roots, with exponentiation, if you start throwing in a lot of this metric with this metric with this metric and all the comparison of it, and then you get this algorithmic thing, and then boom, that's computational time, right? Prometheus doesn't magically solve that problem; you still have to account for that. Yep.
So as a quick background aside for our viewers, Eric, what is a race car? Does it have square-tube frame rails and a naturally aspirated V8? Are you an SCCA kind of car guy? I am an SCCA slash NASA kind of race car person. And for those who don't know, SCCA is like yacht racing for people who like race cars. Those, you know, NASCAR, which my father actually, that's what he did for a living, was drive NASCAR race cars. Yeah, if you didn't know this about Josh. My only real engineering background, my degree is in journalism; everything I know about engineering was from doing trig and setting chassis up. What's that? What's my engineering background? My father's name is bobby rust would, right out of Talladega Nights. Here, I'll find you a link. Hey, look at that, career stats. Yeah, so we raced what's referred to here as the Southeast Series, and Dad was Rookie of the Year in what was then the NASCAR Slim
Jim All Pro Series in 1994. Yeah, at the time we were like the third-level NASCAR series. What were the Craftsman Trucks, and are now like the Camping World Trucks, are actually now that third level. So it's like Double-A baseball. We'll leave this after this, but since we brought it up I've got to say this one thing: if you go down a few rows and you see that Kendall Oil GT-1, yeah, that was my first professional action in my entire life. I was about 14 at the time and I did that sponsor deal with Kendall Oil. That's awesome. One of my roles in the race... super cool, dude... negotiations for billboarding, essentially. Look at that. We were racing Oldsmobiles, a car brand there are probably people on this stream who have never heard of. But it was Cutlass, they were all Cutlasses. Yeah, I guess the Chevy was later; that might have been '97, '98, or something like that. I don't know, whatever. Anywho, cool beans. All right, well, that's cool. We did it, we got alerts and everything. Not at all. Thanks. And we got a little car talk out of it. Yeah, you never have to worry about car talk with me, I'm always happy to talk about cars. Cool beans. All right, well, I got nothing. Chris, what you got? I got nothing. I would like to invite everybody tomorrow morning, first thing, bright and early, 0900 Eastern Time, 1300 UTC: we are going to talk about OpenShift virtualization, which I'm super excited about. And then later in the day I will be co-streaming an event with OpenShift Commons, where they have our Global Transformation Office, which is Andrew Clay Shafer and John Willis and Jabe Bloom and that whole team. Jabe, I forget his last name, Bloom, but there's two names. Anyways, they're all gonna be on chatting with Diane, and we'll livestream that here as well, tomorrow at noon Eastern, which is 1600 UTC. So you've got two shows tomorrow. Looking for a packed house. Stay tuned; we'll get more schedule and info out as we can set up more
infrastructure to get that out to people. We are literally doing this as live as we can. Well, at least it's plugged in today. So thank you, thank you so much, Josh Wood; thank you so much, Eric Jacobs; thank you, everyone in chat, Lily, Paul, Ryan, thank you all so much. Have a great afternoon, have a great evening, wherever you are. Stay safe out there; we want to see you back here tomorrow. So thank you, everyone. Thanks, good night. [Music]
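The label mix-up debugged near the end of the stream can be sketched roughly in Python. The alert selected service twitch and returned no data, while the scraped series carried exported_service twitch; Prometheus adds the exported_ prefix when a scraped label collides with one attached by the scrape target. This is only a model of label matching and a greater-than-40 threshold, not Prometheus code. The metric name, the twitch value, the threshold, and the count of 43 come from the stream; the helper names and the value of the target's own service label are assumptions.

```python
from typing import Dict, List, Tuple

# One series = (labels, latest value). The label set below is a hypothetical
# reconstruction for illustration, not output copied from the cluster.
Series = Tuple[Dict[str, str], float]


def select(series: List[Series], name: str, matchers: Dict[str, str]) -> List[Series]:
    """Keep series with the given metric name whose labels equal every matcher."""
    return [
        (labels, value)
        for labels, value in series
        if labels.get("__name__") == name
        and all(labels.get(key) == want for key, want in matchers.items())
    ]


def firing(series: List[Series], name: str, matchers: Dict[str, str],
           threshold: float) -> List[Dict[str, str]]:
    """Return the label sets of selected series whose value exceeds the threshold."""
    return [labels for labels, value in select(series, name, matchers) if value > threshold]


scraped: List[Series] = [
    (
        {
            "__name__": "viewers",
            "namespace": "metrics-playground",
            "service": "metrics-playground",  # assumed: label attached by the scrape target
            "exported_service": "twitch",     # the app's own label, renamed on conflict
        },
        43.0,
    )
]

# The original rule matched on service, so nothing was selected and nothing fired.
wrong = firing(scraped, "viewers", {"service": "twitch"}, 40)
# Matching on exported_service selects the series, and 43 > 40 fires.
right = firing(scraped, "viewers", {"exported_service": "twitch"}, 40)
print(len(wrong), len(right))  # prints "0 1"
```

An empty selection is not an error, which is why the rule looked valid but stayed inactive until the matcher was changed.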
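The oil pressure example from the recording rule discussion works out as simple arithmetic, sketched here in Python rather than as an actual Prometheus recording rule. It assumes the sensor reports absolute pressure in kilopascals and uses standard atmosphere (101.325 kPa, about 14.7 psi) where the stream rounds to 14 psi. The function names and the low-water threshold are hypothetical.

```python
KPA_PER_PSI = 6.894757             # one psi expressed in kilopascals
STANDARD_ATMOSPHERE_KPA = 101.325  # about 14.7 psi; the stream rounds this to 14


def gauge_psi(absolute_kpa: float, ambient_kpa: float = STANDARD_ATMOSPHERE_KPA) -> float:
    """Derived metric: remove ambient pressure, then convert kPa to psi."""
    return (absolute_kpa - ambient_kpa) / KPA_PER_PSI


def low_pressure_alert(absolute_kpa: float, low_water_psi: float) -> bool:
    """Alert on the derived psi value instead of on the raw kPa reading."""
    return gauge_psi(absolute_kpa) < low_water_psi


# The raw reading mentioned in the stream, expressed as gauge psi.
print(round(gauge_psi(1096.0), 1))
# Hypothetical low-water mark of 20 psi checked against a raw reading of 200 kPa.
print(low_pressure_alert(200.0, low_water_psi=20.0))
```

Subtracting ambient pressure in kilopascals before converting gives the same result as converting first and then subtracting the ambient value in psi, so either order described in the stream works.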
Info
Channel: OpenShift
Views: 1,617
Rating: 4.8666668 out of 5
Id: zCPHyLWt7Ew
Length: 136min 53sec (8213 seconds)
Published: Sat May 16 2020