Observability vs. APM vs. Monitoring

Captions
These days I hear the terms Observability, Monitoring, and APM, or Application Performance Management, thrown around seemingly interchangeably, but these terms actually mean quite different things. So let's dive in head first and see an example of how exactly these things differ.
So to start, I'm going to begin with a Java EE application. It's kind of old school; we'll go back maybe a decade. And let's say that we've got some components in this Java EE app that actually power it. Something important to remember here: although we might be using a SOA, or service-oriented architecture, this is not exactly microservices, so the components are not communicating over REST APIs. That gives you some inherent advantages. For example, you can take advantage of the Java EE framework to output log files, which will probably all come out into the same directory, and the timestamps match up, so things are good. In addition, you could take advantage of something like an APM solution, which is kind of a one-size-fits-all, set-and-forget tool: you install it and it'll gather rich analytics, data, and metrics about the running services within the application.
So essentially what we've done is we've made our system observable, so that our Ops teams were then able to look into it, identify problems, and figure out if anything needed to be done. For the business objectives back then this was essentially good enough, but it tends to fall apart very quickly when you start to move to a more cloud-native approach where you have multiple runtimes and multiple layers to the architecture.
So let's say we have an example app here. We'll start with Node as a front end. Let's say we also have a Java backend application. And then finally let's say we also have a Python app which is doing some data processing.
So let's see how these things work with each other. The front-end app probably talks to the Java app and also to the Python app for some data processing. The Java app probably communicates with a database, and then the Python app probably talks to the Java app for CRUD operations. So this is my quick sketch, kind of a dummy layout for a microservices-based application. You can take it a step further and even say that this is all running within Kubernetes, so we've got these container-based applications running in a cluster.
So immediately the first problem I can see here is that with multiple runtimes we now have to think about multiple different agents or ways to collect data. Instead of just one APM tool we might have to start thinking about pulling in multiple, so how would we consolidate all that data? That's a challenge.
In addition, let's think about things like logging. Each of these runtimes is probably outputting logs in a different place, and we have to figure out how we consolidate all of those. Maybe we use a log streaming service. Regardless, you can see the complexity starts to grow.
And finally, as you add more services and microservices components to this architecture, say a user comes in, tries to access one of these services, and runs into an error. You need to trace that request through the multiple services. Well, unless you have the right architecture and infrastructure in place, something like headers on requests, maybe a way to handle web sockets, things are going to start to get messy, and you can see how the technical complexity grows quite large.
So here's where Observability comes in and actually differentiates itself from standard APM tools. It takes a more holistic, cloud-native approach to things like logging and monitoring.
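The "headers on requests" idea can be made concrete with a minimal sketch. It assumes the services pass along a `traceparent` header in the layout defined by the W3C Trace Context specification; the helper names (`new_traceparent`, `child_traceparent`, `outgoing_headers`, `trace_id_of`) and the simulated hop chain are hypothetical and are not taken from the video or from any particular APM product.

```python
import secrets

# Header name from the W3C Trace Context spec; everything else here is illustrative.
TRACE_HEADER = "traceparent"


def new_traceparent() -> str:
    """Start a new trace: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming: str) -> str:
    """Keep the trace ID, but mint a new span ID for the outgoing call."""
    version, trace_id, _parent_span, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: dict) -> dict:
    """Headers a service would attach to its downstream request."""
    incoming = incoming_headers.get(TRACE_HEADER)
    value = child_traceparent(incoming) if incoming else new_traceparent()
    return {TRACE_HEADER: value}


def trace_id_of(headers: dict) -> str:
    """Extract the trace ID that ties all hops of one request together."""
    return headers[TRACE_HEADER].split("-")[1]


# Simulated hop chain (assumed): front end -> Python app -> Java app
frontend_out = outgoing_headers({})
python_out = outgoing_headers(frontend_out)
java_out = outgoing_headers(python_out)

# Every hop shares one trace ID, so logs and spans can be joined on it.
print(trace_id_of(frontend_out) == trace_id_of(java_out))
```

Because each service reuses the incoming trace ID and only changes the span ID, an error seen in any one runtime can be matched to the same user request in the others.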
So I'll say there are three major steps for any sort of Observability solution. We'll start with the first one and call it collect, because we need to collect data. Then we'll go to monitor, and we'll talk about this because it is part of monitoring. And finally we'll end with analyze, doing something with the actual data that you have.
So with the collect step, first let's say that we actually made our system observable. The great thing is that with Kubernetes you get some CPU and memory data automatically. So let's say we get some of that, we get some logs from the application all streaming to the same location, and let's say we even get some other stuff like high availability numbers or average latency, things that we want to be able to track and monitor.
So that brings me to my next step. Once we have this data available we need to be able to actually do something with it, at least visualize it, even if we're not solving problems yet. What do we do with this data? Well, maybe we create some dashboards to monitor the health of our application, and say we create multiple dashboards to track different services or different business objectives, high availability versus latency, that kind of thing.
Now the final thing that I want to talk about here is what we do next. Say we found some bug in the application by looking at our monitoring dashboards, and we need to dive in deeper and fix the problem with the Node app. Well, the great thing is that an Observability solution should allow you to do just that. It allows you to take it even a step further, because these days with Kubernetes you're getting a lot of that information from the Kubernetes layer. So this is something I want to quickly pause and talk about.
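As a rough illustration of going from the collect step to the monitor step, the sketch below rolls collected request records up into the per-service availability and average latency figures a dashboard might show. The `RequestRecord` shape, the service names, and the sample values are all assumptions made for the example; a real setup would pull this data from whatever collectors are in place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One collected data point (hypothetical shape)."""
    service: str
    latency_ms: float
    ok: bool


def summarize(records):
    """Group records by service and compute dashboard-style numbers."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.service].append(record)

    summary = {}
    for service, items in grouped.items():
        total = len(items)
        summary[service] = {
            "requests": total,
            # Share of requests that succeeded.
            "availability": sum(1 for r in items if r.ok) / total,
            # Arithmetic mean latency in milliseconds.
            "avg_latency_ms": sum(r.latency_ms for r in items) / total,
        }
    return summary


# Made-up sample data standing in for the "collect" step.
SAMPLE = [
    RequestRecord("node-frontend", 120.0, True),
    RequestRecord("node-frontend", 80.0, True),
    RequestRecord("java-backend", 40.0, True),
    RequestRecord("java-backend", 60.0, False),
]

dashboard = summarize(SAMPLE)
for service, stats in sorted(dashboard.items()):
    print(service, stats)
```

Keeping the summary keyed by service makes it straightforward to build one dashboard per service or per business objective, as described above.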
So APM tools in the past were really focused on resource constraints: CPU usage, memory usage, that kind of thing. These days that's been offloaded to the Kubernetes layer, so Observability took APM and evolved it to the next stage, pulled it a step up, and enables our users to focus on things like SLOs and SLIs, Service Level Objectives and Service Level Indicators. These will enable you to focus on things that matter to your business, things like making sure that latencies are low or that application uptime is high. So I think those are the crucial three steps for any sort of observability solution.
Let's take a step back again. These things can be hard to set up on your own with open source projects and capabilities, pulling all the different things together, so you might be looking at an Enterprise Observability solution. When you're comparing competitors and looking at building out your enterprise observability capability, I would look at three main things.
Let's start with automation. Every step of the way we need to make sure that automation is there to make things easier. So let's say that our dev team pushes out a new version of the Node app and goes from v1 to v2. Now let's say they inadvertently introduced a bug: instead of making a bulk API call, they now make individual API calls to the Python app. So in our monitoring dashboard our Ops team is like, "Oh guys, something's wrong, the DB is getting a lot of requests, what's going on?" Well, you need to be able to automatically go back, trace through the requests, and identify what happened.
That actually brings me to my second point as well, which is context. It's always important to have that context.
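The v1-to-v2 regression described above (individual calls replacing one bulk call) is the kind of thing trace context can surface. The sketch below is one hypothetical way to do it: given span records tagged with a trace ID and the caller's version, it averages how many calls each request makes from the Node app to the Python app. The tuple layout, the `calls_per_trace` helper, and the sample spans are invented for illustration and do not reflect any specific tool's data model.

```python
from collections import Counter, defaultdict
from statistics import mean

# Assumed span shape: (trace_id, caller, caller_version, callee)
SPANS = [
    ("t1", "node", "v1", "python"),
    ("t2", "node", "v1", "python"),
    ("t3", "node", "v2", "python"),
    ("t3", "node", "v2", "python"),
    ("t3", "node", "v2", "python"),
    ("t4", "node", "v2", "python"),
    ("t4", "node", "v2", "python"),
    ("t4", "node", "v2", "python"),
]


def calls_per_trace(spans, caller, callee):
    """Average number of caller -> callee calls per trace, by caller version."""
    per_trace = Counter()
    version_of = {}
    for trace_id, src, version, dst in spans:
        if src == caller and dst == callee:
            per_trace[trace_id] += 1
            version_of[trace_id] = version

    by_version = defaultdict(list)
    for trace_id, count in per_trace.items():
        by_version[version_of[trace_id]].append(count)
    return {version: mean(counts) for version, counts in by_version.items()}


fanout = calls_per_trace(SPANS, "node", "python")
print(fanout)
```

A jump in calls per request between versions points straight at the release that introduced the extra traffic, which is the context the Ops team needs before anyone starts reading logs.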
So automation is important here because when upgrading to the new version of Node you want to make sure that the right agent is automatically installed and the instrumentation is in place, so your dev team doesn't have to do that, and as new services get added you want your monitoring dashboards to be automatically updated as well. And that context is extremely crucial, because in this example we needed to be able to trace that request back to the source of the problem.
So once we've traced that request back to the source with the context that we have, the third step here, and I think probably one of the most important, is action. What do we actually do now? And that brings me to my last step here, the analyze phase, which, remember, we talked about as an evolution from traditional APM tools to the way that Observability tools implement that today. When you get to this step you'll probably want to look at the SLIs within the Node app. Maybe dive in deeper, right? So maybe you look in and you identify that you need to look at application trace logs. You look in the trace logs, you identify some problems, you figure out what the fix is, and you tell it to your dev team. Maybe the last step here is fix, and then rinse and repeat for any other issues that might come up in the future.
So I think Enterprise Observability is extremely crucial here when we're looking at the bigger picture, because it's not just about having the individual pieces, which, like I said, might be quite hard to set up with purely open source approaches, but you want to think about automation to make sure things are set up seamlessly to reduce the overhead on your side.
Make sure you have context to be able to see how services work with each other, maybe even generate things like dependency graphs to see the broader view, because you might not always have a light board like this to see the architecture so cleanly. And finally, being able to take action when you do find a problem: making sure that your Observability solution has a way to automatically pull together data from multiple sources and multiple services, and then figure out what's valid and necessary for you to be able to make that fix happen.
So IBM is invested in making sure our clients can effectively set up Enterprise Observability with the recent acquisition of Instana. To learn more about the acquisition, or to get a showcase of the capabilities, be sure to check out the links in the description below.
As always, thanks for watching our videos. If you liked the video or have any questions or comments, be sure to drop a like and a question or comment below. Be sure to subscribe and stay tuned for more videos like this in the future. Thank you.
Info
Channel: IBM Technology
Views: 150,378
Keywords: Observability, Application Performance Management, APM, Monitoring, API, Application Performance Monitoring, IBM, IBM Cloud, Instana
Id: CAQ_a2-9UOI
Length: 9min 40sec (580 seconds)
Published: Fri Feb 05 2021