These days I hear the terms Observability, Monitoring, and APM, or Application Performance Management,
thrown around seemingly interchangeably, but these terms actually mean quite different things.
So let's dive in head first and look at an example of how exactly they differ. To start,
let's take a Java EE application, kind of old school, going back maybe a decade. And let's say
we've got some components in this Java EE app that actually power it. Something important to
remember here: although we might be using a SOA, or service oriented architecture, this is not
exactly microservices, so the components are not communicating over REST APIs. That gives you
some inherent advantages. For example, you can take advantage of the Java EE framework to output
log files, which will probably all end up in the same directory with timestamps that match up,
so things are good.
In addition, you could take advantage of something like an APM solution, which is kind of a
one-size-fits-all, set-and-forget tool: you install it and it gathers rich analytics, data, and
metrics about the running services within the application. So essentially what we've done is
make our system observable, so that our Ops teams were able to look into it, identify problems,
and figure out whether anything needed to be done. For the business objectives back then, this
was essentially good enough, but it tends to fall apart very quickly when you start to move to a
more cloud-native approach, where you have multiple runtimes and multiple layers to the
architecture. So let's say we have an example app here.
We'll start with Node as the front end. Let's say we also have a Java backend application, and
finally a Python app that's doing some data processing. Now let's see how these things work with
each other: the front-end app probably talks to the Java app, and also to the Python app for some
data processing. The Java app probably communicates with a database, and the Python app probably
talks to the Java app for CRUD operations. So this is my quick sketch, a dummy layout for a
microservices-based application. You can take it a step further and say that this is all running
within Kubernetes, so we've got these container-based applications running in a cluster.
So immediately, the first problem I can see here is that with multiple runtimes we now have to
think about multiple different agents, or multiple ways to collect data. Instead of just one APM
tool, we might have to start pulling in several, so how would we consolidate all that data?
That's a challenge. In addition, let's think about things like logging. Each of these runtimes is
probably outputting logs in a different place, and we have to figure out how to consolidate all
of those. Maybe we use a log streaming service.
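One common way to make that consolidation easier, sketched below under the assumption that each
service can write JSON to stdout, is to emit structured log lines with consistent fields and
timestamps so whatever streaming service picks them up can correlate them. The service name here
is made up.

```python
# Minimal sketch: emit structured JSON logs with consistent fields so a log
# streaming service can consolidate output from different runtimes.
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),  # one timestamp format everywhere
            "service": "node-frontend",   # hypothetical service name, for illustration only
            "level": record.levelname,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("frontend")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("handled /checkout request in 42 ms")
```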
Regardless, you can see the complexity starts to grow. And finally, as you add more services and
microservices components to this architecture, say a user comes in to try to access one of these
services and they run into an error: you need to trace that request through the multiple
services. Unless you have the right infrastructure in place, something like trace headers on
requests, and maybe a way to handle WebSockets, things are going to start to get messy, and you
can see how the technical complexity grows quite large.
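As a rough illustration of what "headers on requests" means here, the sketch below, with made-up
service and endpoint names, shows a Python service reusing and forwarding a trace ID header so
one user request can be followed across services. A real setup would typically use a standard
such as the W3C traceparent header rather than this simplified X-Trace-Id.

```python
# Minimal sketch: forward a trace ID header so one request can be followed
# across the Node, Java, and Python services. Endpoint names are made up.
import uuid
from flask import Flask, request
import requests

app = Flask(__name__)

@app.route("/process")
def process():
    # Reuse the caller's trace ID if present, otherwise start a new trace.
    trace_id = request.headers.get("X-Trace-Id", str(uuid.uuid4()))

    # Forward the same ID on the outgoing call so the downstream service's
    # logs can be correlated with this request.
    resp = requests.get(
        "http://java-backend:8080/records",   # hypothetical downstream service
        headers={"X-Trace-Id": trace_id},
        timeout=5,
    )
    return {"trace_id": trace_id, "status": resp.status_code}
```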
So here's where Observability comes in and actually differs from standard APM tools: it takes a
more holistic, cloud-native approach to things like logging and monitoring. I'll say there are
three major steps for any sort of Observability solution. We'll start with the first one, which
we'll call collect, because we need to collect data. Then we'll go to monitor, since this is,
after all, part of monitoring. And finally we'll end with analyze: actually doing something with
the data we have.
So with the collect step, the first thing is, let's say we've actually made our system
observable. The great thing is that with Kubernetes you get some CPU and memory data
automatically. So let's say we get some of that, we get logs from the application all streaming
to the same location, and let's say we even get some other things like availability numbers or
average latency, the things we want to be able to track and monitor.
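To make the collect step a bit more concrete, here's a minimal sketch, assuming a
Prometheus-style scrape setup (one common option, not the only one), of an application exposing
request counts and latency while Kubernetes keeps supplying CPU and memory data at the cluster
layer. The metric names are illustrative.

```python
# Minimal sketch: expose request latency and error counts so a collector
# (e.g. a Prometheus-style scraper) can pull them; Kubernetes already covers
# CPU and memory at the cluster layer.
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["status"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

def handle_request():
    with LATENCY.time():                        # record how long the request took
        time.sleep(random.uniform(0.01, 0.2))   # stand-in for real work
    REQUESTS.labels(status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)   # metrics served at :8000/metrics for scraping
    while True:
        handle_request()
```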
That brings me to my next step. Once we have this data available, we need to be able to actually
do something with it, at least visualizing it, even if we're not solving problems yet. What do we
do with this data? Well, maybe we create some dashboards to monitor the health of our
application, and say we create multiple dashboards to track different services or different
business objectives: high availability versus latency, that kind of thing. Now, the final thing I
want to talk about here is what we do next. Say we found some bug in the application by looking
at our monitoring dashboards, and we need to dive in deeper and fix the problem with the Node
app. The great thing is that an Observability solution should allow you to do just that; it lets
you take things even a step further, because these days you're getting a lot of that information
from the Kubernetes layer. This is something I want to quickly pause and talk about. APM tools in
the past were really focused on resource constraints: CPU usage, memory usage, that kind of
thing. These days that's been offloaded to the Kubernetes layer, so Observability took APM and
evolved it to the next stage. It pulled things up a level and enables users to focus on SLOs and
SLIs, Service Level Objectives and Service Level Indicators. These let you focus on the things
that matter to your business, like making sure that latencies are low or that application uptime
is high.
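To show the SLI/SLO idea with some made-up numbers, the sketch below turns raw request counts
into an availability SLI and compares it against a "three nines" SLO target to see how much of
the error budget has been used.

```python
# Minimal sketch with made-up numbers: turn raw counts into an availability
# SLI and compare it to an SLO target.
total_requests = 1_000_000
failed_requests = 420

sli = (total_requests - failed_requests) / total_requests   # availability SLI
slo_target = 0.999                                          # "three nines" objective

error_budget = 1 - slo_target                               # fraction of requests allowed to fail
budget_used = (failed_requests / total_requests) / error_budget

print(f"Availability SLI:  {sli:.5f}")
print(f"SLO target:        {slo_target}")
print(f"Error budget used: {budget_used:.0%}")
```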
So I think those are the crucial three steps for any sort of Observability solution. Let's take a
step back again. These things can be hard to set up on your own with open source projects,
pulling all the different pieces together, so you might be looking at an Enterprise Observability
solution. When you're comparing vendors and looking at building out your enterprise observability
capability, I would look at three main things. Let's start with automation. Every step of the
way, we need to make sure automation is there to make things easier. So let's say our dev team
pushes out a new version of the Node app and goes from v1 to v2, and let's say they inadvertently
introduce a bug: instead of making a bulk API call, they now make individual API calls to the
Python app. In our monitoring dashboard, our Ops team says, hey, something's wrong, the database
is getting a lot of requests, what's going on? Well, you need to be able to automatically go back
and trace through the requests and identify what happened.
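As a rough sketch of the kind of regression described here, with hypothetical endpoints and
payloads, v1 sends one bulk call while v2 accidentally sends one call per item, which is exactly
the sort of request spike the Ops team would notice downstream.

```python
# Hypothetical endpoints, illustrating the v1 vs v2 behavior described above.
import requests

items = [{"id": i} for i in range(500)]

def sync_items_v1(items):
    # v1: one bulk call carrying all items.
    requests.post("http://python-app:5000/items/bulk", json=items, timeout=5)

def sync_items_v2(items):
    # v2 (the bug): one call per item, so the downstream services suddenly
    # see hundreds of requests where they used to see one.
    for item in items:
        requests.post("http://python-app:5000/items", json=item, timeout=5)
```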
That actually brings me to my second point as well, which is context. It's always important to
have that context. Automation matters here because when upgrading to the new version of the Node
app, you want to make sure the right agent is automatically installed and the instrumentation is
in place, so your dev team doesn't have to do that by hand, and as new services get added you
want your monitoring dashboards to be automatically updated as well.
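As one illustration of what that automatic instrumentation can look like, here's a minimal
sketch assuming an OpenTelemetry-style setup, which is my own assumption rather than any specific
vendor's agent: the instrumentation is attached once, and outgoing HTTP calls get traced without
the dev team writing any tracing code by hand.

```python
# Minimal sketch, assuming an OpenTelemetry-style setup: attach the
# instrumentation once and outgoing HTTP calls emit trace spans automatically.
import requests
from opentelemetry.instrumentation.requests import RequestsInstrumentor

RequestsInstrumentor().instrument()   # patches the requests library once

# Application code stays unchanged; no hand-written tracing needed.
requests.get("http://java-backend:8080/records", timeout=5)   # hypothetical service
```

In practice an agent or an auto-instrumentation wrapper would do this attachment for you at
deploy time, which is the whole point of the automation.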
And that context is extremely crucial, because in this example we needed to be able to trace that
request back to the source of the problem. Once we've traced it back to the source with the
context we have, we get to the third step, and I think probably one of the most important:
action. What do we actually do now? That brings me back to the analyze phase, which, remember, we
talked about as an evolution from traditional APM tools to the way Observability tools work
today. When you get to this step, you'll probably want to look at the SLIs within the Node app,
and maybe dive in deeper. Maybe you identify that you need to look at the application trace logs,
and in those trace logs you spot some problems and figure out what the fix is. You hand that to
your dev team, so maybe the last step here is fix, and then you rinse and repeat for any other
issues that come up in the future.
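As a small illustration of that analyze step, using made-up trace log records, the sketch below
filters for errors and groups them by service and endpoint so you can see where the fix belongs.

```python
# Made-up trace log records, illustrating the analyze step: find the errors
# and see which service and endpoint they cluster in.
from collections import Counter

trace_logs = [
    {"trace_id": "a1", "service": "node-frontend", "endpoint": "/checkout", "status": 200},
    {"trace_id": "a1", "service": "python-app", "endpoint": "/items", "status": 500},
    {"trace_id": "b2", "service": "python-app", "endpoint": "/items", "status": 500},
    {"trace_id": "c3", "service": "java-backend", "endpoint": "/records", "status": 200},
]

errors = [r for r in trace_logs if r["status"] >= 500]
by_source = Counter((r["service"], r["endpoint"]) for r in errors)

for (service, endpoint), count in by_source.most_common():
    print(f"{count} errors in {service} {endpoint}")   # points the dev team at the fix
```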
So I think Enterprise Observability is extremely crucial when we're looking at the bigger
picture, because it's not just about having the individual pieces, which, again, might be quite
hard to set up with purely open source approaches. You want automation, to make sure things are
set up seamlessly and to reduce the overhead on your side. You want context, to be able to see
how services work with each other, maybe even generating things like dependency graphs to see the
broader view, because you won't always have a lightboard like this to see the architecture so
cleanly. And finally, you want to be able to take action when you do find a problem: making sure
your Observability solution has a way to automatically pull together data from multiple sources
and multiple services, and then figure out what's valid and necessary for you to make that fix
happen. IBM is invested in making sure our clients can effectively set up Enterprise
Observability with the recent acquisition of Instana. To learn more about the acquisition, or to
see a showcase of the capabilities, be sure to check out the links in the description below. As
always, thanks for watching our videos. If you liked the video or have any questions or comments,
be sure to drop a like, question, or comment below. Be sure to subscribe and stay tuned for more
videos like this in the future. Thank you.