Webinar: How OpenTelemetry is Eating the World

Captions
Hi, I'd like to thank everyone who is joining us today. Welcome to today's CNCF webinar, How OpenTelemetry is Eating the World. I'm Kristy Tan, marketing communications manager at CNCF, and I'll be moderating today's webinar. We'd like to welcome our presenter today, Steve Flanders, director of engineering at Splunk. A few housekeeping items before we get started: during the webinar you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen; please feel free to drop your questions in there and we'll get to as many as we can throughout the presentation and at the end. This is an official webinar of the CNCF and as such is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of that code of conduct; basically, please be respectful of all of your fellow participants and presenters. I'd also like to remind folks that the webinar slides and recording will be available later today on the CNCF website at cncf.io/webinars. With that, I'll hand it over to Steve to kick off today's presentation.

Thanks so much, and thanks for having me. Today we're going to talk about OpenTelemetry and how it, like many open source projects, is driving broad adoption and changing the landscape of how you can solve observability problems. I actually have a lot of material to go through today. The goal is that if you're not familiar with the project, you'll learn all about it; but even if you are familiar with it, the project is actually quite broad in scope, so hopefully you'll learn something new either way. I'm also hoping to have time for a demo, but that'll depend on the number of questions. I definitely want to make sure people are getting what they expect from this presentation, and we definitely encourage questions throughout; we'll try to get as many of them answered as we can, and I can also touch base offline afterwards.

With that, just a quick introduction: my name is Steve Flanders, and I'm a director of engineering at Splunk. I'm also an OpenTelemetry Collector approver and a chair nominee for CNCF SIG Observability. Previously I was at a company called Omnition, which was acquired by Splunk; at Omnition we worked on the OpenCensus project, which is now part of OpenTelemetry. Prior to that I was at VMware leading engineering efforts for their log initiatives, so I've been in the observability and monitoring space for almost a decade. I've provided some links to other material if you're interested in learning more about the things I talk about.

So what is OpenTelemetry? Hopefully this one's not too much of a surprise for folks: it's actually the joining of two other projects. There's OpenTracing, which is in the CNCF today as an incubating project, and there's OpenCensus. These two projects had a lot in common, but they also had some significant differences, and so the goal was to bring both projects together and provide a single solution that people could rely on. The idea here is to standardize and to make it easy, so you don't have to choose between multiple projects that are doing similar things. All investment going forward by the contributors of both OpenTracing and OpenCensus is now going into OpenTelemetry, so you can think of it as the next major version of both of these projects, and the goal is to sunset both OpenTracing and OpenCensus. Going forward, OpenTelemetry is really the future. That said, people are already using OpenCensus and OpenTracing, and the goal is not to leave people stranded, so OpenTelemetry actually has shims that are completely backwards compatible with both of these projects. It provides a transition plan for you as well; it's not like you're going to have to change everything now in order to take advantage of some of the things in OpenTelemetry;
there's actually a path forward for you as well.

So let's talk about cloud native telemetry. If you're familiar with cloud native terminology, you've heard of the three pillars of observability. Here I'm going to refer to them as telemetry verticals: things like traces, metrics, and logs. Sometimes these are called signals or data sources. It's basically information you can collect from your application or from your infrastructure in order to figure out what's going on, and the end goal is to collect this data so you can answer any questions you have and solve both availability and performance problems. Now, while you may have heard of these three pillars of observability, there are actually multiple layers for each of these verticals that you need to consider from an implementation perspective. You have the APIs themselves and the canonical implementations and SDKs; you have the data infrastructure, which includes things like data collection agents, collectors, and services; and then you have interop formats like data wire protocols, plus other standards such as W3C. All of this is very relevant, and usually these layers are specific to the vertical you're referring to. In addition, they are often language specific, and as you look at cloud native workloads moving to more microservices-based deployments, you see a lot of polyglot architectures where you're leveraging multiple languages. So the question becomes: how can you have consistency across all of this as you're collecting it?

Now, where does the OpenTelemetry project fall in regard to these different verticals and layers? The goal is actually to cover all of it, and the way you should think about this is twofold. One is that when you look at these verticals and layers, it's really about the instrumentation and data collection aspects; you can think of it as the "what" you're deploying in your environments in order to get the telemetry data out. What OpenTelemetry is not looking to solve is the backend. It supports sending to a variety of different backends, and we'll talk more about that later in this webinar, but it doesn't provide a backend, so you're responsible for plugging this in. You can of course use open source backends (Jaeger, Prometheus, what have you), and you can also use commercial third-party vendors.

Another thing worth noting is the priorities from an OpenTelemetry perspective. There's already broad support for tracing, and I'll spend a fair amount of time talking about that today. Metric support is coming in right now; in fact, the OpenTelemetry project just announced its beta a couple of weeks ago, and that includes both traces and metrics. Logs are starting to be discussed: there's actually a log SIG forming, but there's no native support for logging today other than, say, adding trace context (trace ID and span ID) into some of the logs; we'll talk more about that in a little bit. Long term, the vision is to handle all these different verticals and provide an open source, vendor agnostic, open-standards-based approach to solving this, with the end goal being that it's provided out of the box. There isn't much that you as a developer need to do to take advantage of it: just by building your app, hopefully you'll have everything you need to emit that telemetry data, and then you can collect it in the platform or platforms you care about in order to analyze that data and get insight back out.

I wanted to provide a few stats. I'm a very data-driven person, and there are actually some pretty cool numbers here. CNCF has something called DevStats; you might have heard of it, it's online and it's backed by Grafana. As of right now there are a hundred and four members that are part of OpenTelemetry, but there's also a broad range of actually active contributors, with many companies involved from across the globe and lots of contributions. So this project is very much active, very much growing, and maturing extremely rapidly. One of the really cool things is that there is end-to-end support for this project. What I mean by that is you're seeing cloud providers like Azure and GCP, big-name vendors in the monitoring and observability space, as well as end users that are both consuming this and contributing back. That's quite unique: you're basically seeing the community as a whole come together and embrace this, which means there actually is a problem here that needs to be solved and a lot of value that can come out of making this project successful. That's really exciting in my mind. It's nice to be involved in a project where a lot of people feel this pain, a lot of people want to help make it better, and everyone's coming together to do the right thing.

But it's not just about the OpenTelemetry project either; there's actually pretty broad communication and collaboration happening with other CNCF projects. For example, we work very closely with the Jaeger team, and more recently with the Fluent Bit folks, given some of the log conversations happening in the log SIG. We're trying to make sure we're engaging others and getting their insights so that any solution being provided is applicable to the broader use cases. One really cool stat I wanted to share: according to CNCF DevStats, as of right now OpenTelemetry is actually the second most active project in the CNCF. That's a pretty astounding number given that number one is Kubernetes and OpenTelemetry is currently a sandbox project, though it does build on the OpenTracing and OpenCensus communities as well. That's a pretty cool milestone, and again I think it shows the importance of this project: people are really interested in this problem space and really want it to be better.

Okay, so with that introduction, what I'd like to do next is jump quickly into the architecture, to provide a high-level view of what is provided by OpenTelemetry, and then drill down into more of the specifics so you can understand how to consume it. There are basically three primary components you can think of here. First, there's the specification. This is actually super important; it's the foundation on which everything is built, and it's broken up into three areas: the API, the SDK, and then some data aspects including semantic conventions, which we'll talk more about later. The goal is to make sure that whatever is developed from a specification perspective is broadly applicable, because the specification shouldn't change very much without significant consideration, especially given that all the other components rely heavily on it and making changes requires changing everything end to end. The next component is around data collection. The OpenTelemetry project provides a collector that can be deployed as an agent or as a standalone service, is completely vendor agnostic, and handles some pretty cool things like translating into and out of different formats. The idea is that you have a single solution you can leverage without being locked into one particular vendor's choice, which is pretty cool. And then there is the instrumentation aspect. This is for traces and metrics, and eventually logs. The goal here is to provide a vendor agnostic way to instrument that broadly supports all the different languages, libraries, and versions you care about, and ideally does so in a friction-free way, or at least provides enough flexibility that you can control what is being instrumented, what is being sent, how you're enhancing that data, and so on. As
I mentioned, logging is incubating right now. It is a goal of OpenTelemetry, but it's still early days for that; a lot of the focus has been around tracing and metrics. As I mentioned, the project as a whole is in beta status. We cut a beta release a couple of weeks back that didn't include all of the languages that are part of OpenTelemetry today; it was a subset, as listed here: Erlang, Go, Java, JavaScript, and Python. There are some pretty cool aspects: Java includes both manual and auto instrumentation (we'll talk about that more in a little bit; that's for the tracing side), and JavaScript includes the web component as well, which can help with real user monitoring, another very common use case. There are plans to add broader auto-instrumentation support as well as additional languages; .NET and Ruby are actually pretty close to making it to beta status. Going forward, the goal is to get the entire project and all of its subcomponents into beta status and then get to GA. I'll talk more about the roadmap toward the end of the session.

I wanted to walk you through a high-level architecture of OpenTelemetry: where the components fit and how they would be stitched together. Again, this is meant to be super high level. Basically everything is plug and play; there's a lot of flexibility here. This isn't meant to dictate how you must use the project; it's just one of the ways we would typically recommend stitching things together. Given that traces and metrics are the primary focus right now, I'll talk about them specifically. Let's assume you have your application, probably multiple of them, probably microservices running throughout your environment. They run on one or more hosts, and the net result is you want to collect the telemetry data you care about and then send it to one or potentially multiple backends. How would you do that using the components available in OpenTelemetry today?

The first step would typically be deploying the OpenTelemetry Collector as an agent. The "collector" name is a little bit of a misnomer here; it can actually be deployed in a variety of different ways: as a standalone binary, a sidecar, or a DaemonSet acting as an agent, or as a standalone service. But you want to have something as close to the application as possible, and you want it to handle a few things. One is that if it's running on, say, a host, you want it to collect some of the host metric information so you can do infrastructure correlation; the second is that you want it close to the application to get the application metrics and trace information as well. You might also have scenarios where you want a subset of data to go to one backend and all of the data to go to another, so there's flexibility in how you configure the collector to send that data. The other aspect is adding instrumentation to your application itself. OpenTelemetry provides a variety of client libraries, and those can be configured; typically, out of the box, they support sending locally to the OpenTelemetry Collector running as an agent. So from the application you get both metric and trace information, whereas from the host you would typically just be collecting metric information. That's what it looks like at a high level. Maybe you only have one backend; maybe you only want to collect traces and not metrics, or metrics and not traces. Again, you can configure this in different ways.

For more enterprise, production-grade deployments, we also support a model where, similarly, you still have an agent running locally, but you add a standalone service that aggregates the data. Use cases for that include limiting the number of egress points out of your environment; controlling, say, API tokens if you're sending to a third-party SaaS vendor; and more advanced use cases like tail-based sampling, where you need all spans for a given trace to go to the same collector instance. So again, there's flexibility in how you can deploy this, but at a high level that's what it would look like.

And you don't have to take our word in OpenTelemetry for this; other projects are already adopting it as well. Jaeger recently announced (there's a blog post here) that they are in the process of supporting the OpenTelemetry Collector as a replacement for the Jaeger collector. It's a very similar notion; the primary difference is that today they're still using the Jaeger client libraries, though they're talking about switching over to the OpenTelemetry ones in the future. The other aspect is that they support Jaeger natively: the default config for the OTel Collector supports the OpenTelemetry protocol, whereas Jaeger wants to support the Jaeger protocol. This is basically just a configuration change, but they offer a distribution of the collector that is pre-configured to natively support Jaeger.

Hey Steve, we have a question in the Q&A that I think makes sense to answer here. Andre is asking: does OpenTelemetry include adoption of W3C Trace Context as the standard propagation header format?

Yes, that's a very good question. Let me get to that in the client libraries section, where I actually cover it explicitly. The short answer is yes, but there's more I want to talk about with context propagation, so give me a little bit and we'll cover it in more depth.

Cool. So now you understand a little bit about the project and its high-level architecture. I want to jump into the specification, because that lays the groundwork for the data collection and client library aspects. We actually just talked a little bit about context propagation, so I guess that question was a good segue. From a tracing perspective there are many different concepts, and I can't get into the entire specification given the amount of time we have, but one of the big things to be aware of is this notion of context. In the distributed tracing world this is super critical; it's what allows you to get context and correlation throughout your infrastructure. To directly answer the question (I didn't realize it was on the next slide): W3C Trace Context is natively supported by all of the OpenTelemetry client libraries, so the answer is yes, absolutely. Many of these client libraries also support other formats, because not everyone has moved to W3C yet; it's a very new standard, but it really is the future of context propagation, so it's good that you're looking at it. B3 is commonly used, and there are other context propagation formats in common use; you're going to see support for those in the client libraries as well. What's also kind of cool, and I'll talk a little more about it later, is that you can support multiple context propagation formats running in parallel, so you can receive multiple different ones. This is going to be important because many people who already have tracing in their environment are probably using something like B3 and will need to transition to W3C. OpenTelemetry is going to make that pretty easy: you just enable multiple context propagators and then turn one off when you're ready.

Some of the other aspects to be aware of: you have this notion of a tracer, which is basically how you pass context around and generate your spans. A span is a standard distributed tracing concept; it's basically one call in your request path, and it's made up of multiple different components. Then you have more advanced topics like sampling and how you handle exporting. But there's a lot of flexibility here; there isn't a single prescribed way to do it.
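To make the W3C Trace Context discussion above concrete, the `traceparent` header that gets propagated between services has a simple fixed layout. Here's a small, self-contained sketch of parsing it by hand; this is an illustration of the version-00 header format, not the OpenTelemetry propagator API, and in practice the client library does this for you:

```python
import re

# W3C Trace Context "traceparent" header, version 00:
#   version "-" trace-id "-" parent-id "-" trace-flags
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str):
    """Return (trace_id, parent_id, sampled) or None if the header is invalid."""
    m = TRACEPARENT_RE.match(header.strip())
    if m is None:
        return None
    trace_id, parent_id = m.group("trace_id"), m.group("parent_id")
    # All-zero trace or parent IDs are explicitly invalid per the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = bool(int(m.group("flags"), 16) & 0x01)  # bit 0 = "sampled"
    return trace_id, parent_id, sampled

header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(header))
# → ('4bf92f3577b34da6a3ce929d0e0e4736', '00f067aa0ba902b7', True)
```

The point is just that the context tying spans together across services is nothing more than a trace ID, a parent span ID, and flags such as the sampled bit, which is also why running W3C and B3 propagators in parallel during a migration is straightforward: they carry the same information in different header shapes.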
As you can see, for almost all of these there are multiple options: multiple context propagation formats, multiple samplers, processors, and exporters; so, a lot of flexibility. One thing I will call out: in the tracing world, what you may know as key-value pairs, tags, or metadata is, in OpenTelemetry, called attributes. So if you hear that term, it's the same idea; the name today is just attributes.

One other thing I want to call out about tracing is semantic conventions. As it turns out, OpenTelemetry is not very prescriptive about how you denote your spans; it's fairly freeform, and it's up to you to add the data that is important to you. That said, it does make recommendations on how to name well-known things consistently. HTTP calls are a good example of this; databases are a good example of this. This is actually super important, because these semantic conventions are what make a vendor agnostic solution possible: everyone knows what a database is, no matter which vendor or open source project you're leveraging. It doesn't force you to do it, but you're encouraged to start taking advantage of it, and some of these conventions are built into the client libraries today; I'll show some of that with the auto-instrumentation work that exists.

Let me give you an example of the power of semantic conventions. Let's say I have an application; it doesn't matter what language it is. Let's say it's leveraging the OpenTelemetry library, or really any tracing or metrics library. And let's say the application calls out to a database. Now, there are a few different situations here. Maybe it's a database you control, which means you could add instrumentation to the database as well. But more often than not, you're probably leveraging someone else's database, maybe some third-party thing, maybe a cloud provider's managed one, so you don't have direct access to add instrumentation; you're just consuming the database. This is where semantic conventions can be extremely powerful. Say your application calls out to the database and I'm leveraging the semantic conventions for that call for traces. Then I can denote in my span: this is a client span, which means I'm calling out to some other service. I can denote the DB type, whatever this thing is, maybe it's MongoDB. I can say the DB instance is, say, mongodb01, the first instance I happen to be calling. Maybe it has the DB statement information, so I know I'm running a SELECT query on some table. All of that can be tagged as metadata onto the spans generated from the application itself. And because of that, even though the database is not instrumented, I can now infer that a database exists, and I can calculate information like the number of calls I'm making. I can compute my RED metrics (requests, errors, duration) from an application-level perspective, and I can also infer what the relationship is, say the latency between my application and my database, because I have instrumentation on this side and I know what I'm calling. This is extremely powerful, and as you leverage more cloud providers or third-party services, taking advantage of these semantic conventions can really give you more insight into how your environment is behaving.

On the other side of the coin we have metric basics. Same idea; as it turns out, context is important here as well. Typically there is no notion of context with metrics, but in the OpenTelemetry world context is added: span and correlation information is added in as metadata, so now I can enhance these metrics and actually understand how behaviors relate. The terminology here is slightly different: instead of traces and spans, we have things like meters, metrics, and measurements, and you use things slightly differently, like aggregations and timing versus, say, sampling. But at a high level the concepts are very similar. Metrics are typically a little easier to understand because most people use them today, whereas tracing hasn't received broad adoption yet. Hopefully that will change, because especially as you look at microservices architectures, without the right context and correlation, metrics and logs are really just symptoms, and it's very hard to get to root cause without some sort of trace information, or at least the context that trace information provides throughout your environment.

One other really cool thing OpenTelemetry provides is the notion of a resource SDK, and this resource SDK also has its own semantic conventions. The idea here is: how can I identify the source of the object that is generating this telemetry data? This is super important if I want to do, say, infrastructure correlation, or if I want to do problem isolation and identify where in my environment a problem is occurring. The example provided here makes a lot of sense, especially given the cloud native focus. Think about some process running some microservice; let's say it's producing telemetry. It happens to be in a container that runs in Kubernetes, which means it has a pod name, because that's how Kubernetes works. It will be running in a namespace, again because that's how Kubernetes works. And it might be part of a deployment; it could be another kind of object, but let's say it's a deployment. All three of those things (the pod name, the namespace name, and the deployment name) can be added as attributes on this resource. So now I can identify where this is happening in my environment, and it's immutable, so I know its state indefinitely. Some semantic conventions have been defined here, and there is the ability to tag resource information onto both traces and metrics, so now I have even more visibility: not only can I answer application-level questions, I can also answer some of the infrastructure ones that come along with them.

Okay, so that's the specification, at least at a high level. If you're really interested, go take a look at the specification itself; it's pretty in-depth and there's a lot of conversation going on there. I can't cover all the aspects, but I think that gives you enough of a foundation to understand what's available. Now I want to jump into the collector real quick, and then we'll talk about the client libraries. What is the objective of the collector? The idea is to provide an implementation for you, so that not only do you have your telemetry data, you also have a way of collecting it and sending it in a completely vendor agnostic way. This vendor agnostic aspect is super critical. One of the things you commonly see is that most vendors provide their own agent or collector (or both), but it's proprietary to them; even if it's open source, it only works with their backends, it's very hard to extend, and it's very hard to make open-standards based. The collector is looking to solve that problem by offering a way to receive telemetry data, process that telemetry data in case you want to make changes to it, and then export it to one or more backends. This also includes transformations or translations of the data: for example, I can receive in Jaeger but export in Zipkin; that's totally possible in the collector today. There are some very high-level objectives here, end goals in terms of usability and performance and providing a single solution end to end, but it might be more helpful to understand this context
by by drilling into it a little bit more what I do want to cover is the the but why because it does come up from time to time look there are agents and collectors out there or why can't I just have my client library instrumentation send directly to the the back end that I care about and at a high level there are kind of two bullet points that I think answered this question one is the goal of generating this telemetry data it should be to ensure that you are not adding a significant overhead to your application you can't impact application performance so it needs to be as lightweight as possible and especially as we look at more micro services and polyglot architectures you're going to be solving this for every single language so any feature that you add into a client library needs to be added to every single language any bug that you find is probably applicable to most of the languages and needs to be fixed end-to-end so one of the things that you should be looking to do is to offload as much of that responsibility from the client library as possible so that you're not impacting the app location this can include things like compression encryption retry logic it can include things like adding additional metadata or handling like redaction for like PII it can also include like supporting multiple exporters maybe I'm using vendor a today but I want to use vendor B tomorrow that would mean having multiple exporters configured in my client library or even rebuilding my app to add that support the other side of the the coin here in regards to why the collector would be time to value if I offer and offload these responsibilities into a collector I can solve it in one language and not in multiple which makes life a lot easier and I can move to more of a config based updates it's usually pretty trivial to update a configuration file or even update an agent it is not as trivial to go update your application code and to go push that through and get that rolled out throughout 
your entire environments in addition the goal should be to kind of set it and forget it the idea is that your instrumentation is configured once and then you don't have to touch it and out of the box basically it'll support sending locally to the collector running as an agent which means if you deploy it then no configuration change is going to be necessary in your instrumentation it'll automatically get picked up and then finally of course as I mentioned vendor agnostic and easily extensible so out-of-the-box it provides support for a variety of popular open source solutions like Jaeger and Prometheus but it also offers a very flexible pluggable architecture so that vendors or really anyone can add additional support and capabilities as well so let's look at the architecture of the collector and this is going to be a bit of an eye chart so I apologize but I was I was trying to think of a way to make this understandable basically there's a notion of receivers this is how you get data into the collector this could be push or pull based it doesn't matter but basically it's it's telemetry data that's going to enter the collector this works for both traces and metrics so like there's a Jaeger receiver today there's a Prometheus one and open symmetry actually has its own protocol as well on the other side of the collector would be exporters which is how you send data out of the collector and again you're going to support like the same thing on both ends at least when it comes to some of these open source solutions so I can export in Jaeger or Prometheus or the open telemetry protocol and then in the middle here you have this notion of processors and its ways that you can kind of massage manipulate change the data as its flowing through the collector so this could include things like I want to batch the data before I send it out or I want to retry in case for whatever reason the exporting fails this could also include things like adding metadata redacting tags that may 
contain PII, or doing things like tail-based sampling. All of that would be processors, things that happen in the middle, and you can actually define multiple instances of the same type of processor as well, so I can have two different batch processors. What you do with this architecture is build what we call pipelines. A pipeline says: I want this set of receivers to talk to this set of processors and then this set of exporters, and I want it to happen in this order. So for example, maybe I have a pipeline where the OpenTelemetry receiver is configured to go to the batch processor and the queued-retry processor, and then I want it to export in Jaeger; that would be a pipeline. As I mentioned, the collector is capable of doing transformations, so receiving in OpenTelemetry and exporting in Jaeger is fully supported. And maybe I want a second pipeline defined where, again, I'm receiving in OpenTelemetry, but it goes through a different set of processors, so a separate batch processor so that the batches between the pipelines are different, a different queued retry, and maybe in this case I want to export in both Prometheus and the OpenTelemetry protocol. That's totally possible, so there's a lot of flexibility in configuration here depending on your use cases, and out of the box it provides a pretty consistent experience to get you started. Then finally, on top of all of this, we have the notion of extensions, which includes things like health information, pprof, and zPages, which is a concept of sampling the data that's going through the collector and looking for potential problems. So out of the box there are a fixed number of receivers, processors, exporters, and extensions, but anyone can also write their own, because this architecture is very pluggable. There is a notion of core components: things that the maintainers of this project actually maintain, built in out of the box.
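The receiver-to-processor-to-exporter flow described above can be sketched as a toy model. To be clear, this is a conceptual illustration in Python, not the collector's actual code or API; all names here are invented:

```python
# Toy model of a collector pipeline: receivers feed spans through an
# ordered list of processors, and the result goes to an exporter.

def add_env_tag(spans):
    """Attach metadata to every span, like an attributes processor would."""
    return [dict(span, env="test") for span in spans]

def batch(spans):
    """Group spans into batches of up to 2 before export."""
    return [spans[i:i + 2] for i in range(0, len(spans), 2)]

class Pipeline:
    """Wires an ordered list of processors to an exporter."""
    def __init__(self, processors, exporter):
        self.processors = processors
        self.exporter = exporter

    def receive(self, spans):
        for processor in self.processors:
            spans = processor(spans)
        return self.exporter(spans)

# A stand-in exporter: it just returns what it would send, and to where.
jaeger_export = lambda batches: ("jaeger", batches)

pipeline = Pipeline([add_env_tag, batch], jaeger_export)
backend, batches = pipeline.receive(
    [{"op": "GET /owners"}, {"op": "GET /vets"}, {"op": "POST /pets"}]
)
print(backend, len(batches))  # jaeger 2
```

A second pipeline would simply be another `Pipeline` instance with its own processor instances and exporters, which mirrors why each pipeline in the real collector can batch and retry independently.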
These are all going to be open source; there's no vendor-proprietary stuff in core, and the goal is to keep core as minimal as possible. I won't cover all of these, but it's all covered in the documentation. A cool thing we also have is a notion of contrib: a separate repository with more community-based extensions being written, processors and receivers, and this is where, say, vendor-specific components could go. So if you have a vendor-specific exporter, it would live in this contrib repository, and we have a way of building core and contrib combined as you see fit, so you get the components that you really care about. Again, I won't drill into the specifics; go check the documentation on that. Next up, let's talk about client libraries. I'm going to focus on Java, but this is applicable to all the client libraries. Basically what you need to do, whether you're doing traces or metrics, is instantiate a tracer or a meter, basically a way of collecting context and submitting your telemetry data; you need to generate the data that you care about, so spans in the case of traces or metrics in the case of metrics, enhance them, and then send them back out. I'll walk you through a quick-start example; I'll highlight it so you don't have to read all the code. Basically you instantiate a tracer, so you can say: this is my service, in this case it's called "instrumentation library name," and it's running a certain version. Then I can go ahead and generate my span. One thing worth noting: make sure you close your spans, because at least in the case of Java they do not close automatically, so you need to tell it when you're done. And then perhaps add additional metadata that you care about; I added another version here, even though the version is in the tracer itself. This could be data center information, it could be garbage collection time, it could be whatever you want, something you want to use to enhance the span and provide additional information.
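The tracer-and-span lifecycle just described, including the "make sure you close your spans" caveat, can be sketched as a toy model. This is not the actual OpenTelemetry API (in Java you would use the SDK's tracer and `span.end()` or try-with-resources); all classes here are invented for illustration:

```python
# Toy sketch of the tracer/span lifecycle. A span records a start time,
# attributes, and an end time; it only completes when explicitly ended.
import time

class Span:
    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.start = time.monotonic()
        self.end_time = None  # stays None if you forget to close the span

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def end(self):
        self.end_time = time.monotonic()

    # Context-manager support, so the span cannot be left open by accident
    # (analogous to try-with-resources in Java).
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.end()

class Tracer:
    """Named, versioned tracer, like getting a tracer for your service."""
    def __init__(self, name, version):
        self.name, self.version = name, version

    def start_span(self, name):
        return Span(name)

tracer = Tracer("my-service", "1.0.0")
with tracer.start_span("handle-request") as span:
    # Extra metadata: could be a data center, GC time, whatever you want.
    span.set_attribute("environment", "test")

assert span.end_time is not None  # the context manager closed it for us
```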
So this will basically generate a span, and you can do this in each of your functions, or you can do it for service-to-service calls, say the different RPCs between your microservices; this generates the span information that you care about. Then on the flip side, you need to configure the SDK so you can actually export this data out. So again, you get your tracer, you tell it how you want to sample, so for example you can say I want to sample all of the spans, or I only want to keep maybe 50% of them with a probabilistic sampler; again, flexibility depending on your use cases. And then, how do you want to export it? In this case I'm leveraging the Jaeger exporter, but it could be the OpenTelemetry one, it could be Zipkin, whichever one you care about, and you build basically a processor to export that data out. I understand that there's a lot of information on these slides, and given our time here I don't have a lot of time to go through it, so you may not still be with me, which is okay, because manual instrumentation may be complex if you're not familiar with it. This is really for people who have familiarity with instrumenting their app manually and are comfortable doing so. So OpenTelemetry is also looking to ease this and provide a quicker time to value, and that's going to be done through more of an easy-button type of approach. What if I told you that instead of all the code I just showed you to manually instrument a single span, I could do it with zero code changes whatsoever? I'm sure a lot of people would be interested in this approach, especially if they don't have traces instrumented in their environment today. So in addition to manual instrumentation for traces, there's also auto, or automatic, instrumentation as well. This is done with no code changes, only runtime changes; basically, in the case of Java here, I specify a couple of other parameters during runtime. Java has this notion of a Java
agent, and basically we provide a jar for you that does bytecode manipulation, and then you can configure your exporter or other configuration parameters that you care about. What's cool about this is that it will instrument all libraries that it's aware of, it will ensure that it adheres to semantic conventions, it doesn't require you to modify your code or do it in multiple different languages, and it's pretty flexible: not only can I pass in parameters here, but (I'm not showing an example) you can do this through environment variables as well instead of passing parameters directly into the command that you're running. One caveat I will call out: many people offer auto instrumentation, and you should not run two on the same service, or you will most likely have a bad time. So if you're using auto instrumentation, only use one at a time per service. As it turns out, auto instrumentation is library specific, so not only do you have to have auto instrumentation, you have to make sure it supports the libraries and versions that you care about. In the case of Java for OpenTelemetry there is broad support already; here's a list of many of them, and again, it's an extensible system, so it's pretty easy to add additional integrations or additional library support if need be. One other thing worth noting is that Java is the only thing that has auto instrumentation today in OpenTelemetry; Python and Ruby are just getting started, and I think .NET is also in the process of getting started, so those will be coming, and the other client libraries need to be modified as well. But the goal will be to offer both manual and automatic from a tracing perspective. On the metrics side, I'm going to keep this pretty high level; it's similar to tracing, it's just that you're using metrics, so you have to have a meter, you have to give it a name, you can give it version information, you need to create your metrics, and you need to observe those metrics, maybe on some sort of cadence.
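That meter-and-instrument flow can be sketched with the same kind of toy model as the tracer. Again, this is illustrative Python, not the real OpenTelemetry metrics API; names are invented:

```python
# Toy sketch of the metrics flow: a named meter creates instruments,
# values are recorded, and everything is collected (emitted) on a cadence.

class Counter:
    def __init__(self, name):
        self.name = name
        self.value = 0

    def add(self, amount):
        self.value += amount

class Meter:
    """Named, versioned meter, analogous to the tracer on the trace side."""
    def __init__(self, name, version):
        self.name, self.version = name, version
        self.instruments = []

    def create_counter(self, name):
        counter = Counter(name)
        self.instruments.append(counter)
        return counter

    def collect(self):
        """What an exporter would send on each collection interval."""
        return {inst.name: inst.value for inst in self.instruments}

meter = Meter("my-service", "1.0.0")
requests = meter.create_counter("http.requests")
requests.add(1)
requests.add(1)
print(meter.collect())  # {'http.requests': 2}
```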
Then you emit that data out. So the syntax is different here, but the concepts are generally the same, and basically the specification ensures that everything is pretty consistent between both. Given our time... Steve? Yes? Oh hey. Sorry, we have a quick question: an anonymous attendee is asking, will there be any exporters provided from Kamon (kamon.io) to OpenTelemetry, or do we have to completely migrate away from Kamon to OpenTelemetry? Yeah, so regarding exporting in a format: the architecture is extremely flexible, so there is not a Kamon exporter today, but there's no reason why one could not be written. That could be submitting an issue and someone from the community picking it up; we of course accept pull requests, so if you're interested in doing that work, we would love that as well. But there's no reason why you'd have to migrate off; whether it's an open source destination or a proprietary one, it doesn't really matter, as long as you can write an exporter, and the collector and the client libraries today provide the flexibility of writing any exporter that you want. So on the surface, without actually drilling into the specifics of Kamon, it should work, but there's nothing there natively; you'd have to get an exporter written. Okay, I want to jump into a demo real quick after showing off the different components, so you can see this working end to end. What I have here is the PetClinic. It's a Spring application that's microservices based, it's open source on GitHub, and they provide some docker-compose type setup, so I basically picked this one up after doing a quick Google search and thought, hey, I would be interested in getting OpenTelemetry to work with this project. This project today does not have OpenTelemetry; from what I can tell, it doesn't have any distributed tracing information. It does have a Zipkin server for log information, but it's not actually instrumented, at least as far as I can
tell. So I took this application, and I actually have it running here on my system, and I modified it by adding OpenTelemetry to it. Given that it's Spring, it's Java based, so I can leverage the Java auto instrumentation. I also threw in the OpenTelemetry collector, and if people are curious, I actually put up a pull request into the repo saying, hey, let me know what you folks think of how to do this. Because it's a docker-compose setup, many files have been touched to update the docker-compose, but on the surface the change was actually pretty minimal. What I needed to do was pull in the jars applicable to the OpenTelemetry Java auto instrumentation, so that would be the auto-instrumentation jar as well as the exporter; I chose to use the Jaeger exporter for this example. And then I have to explicitly tell it that I want it to run the Java agent and which additional parameters I want set. Those are really the only changes I made to this app; as you can see, I didn't actually change its code. These are runtime-type things, or in the case of Docker, pulling down the Docker dependencies, but I didn't modify the Spring application at all, and I have that running here. The other thing that I did was add the OpenTelemetry collector in. Just to show that off, I built a quick collector YAML file, so the configuration is YAML based, and I basically said: hey, I want to have a Jaeger receiver, so I want to receive Jaeger in, because that's what I told this PetClinic service to export in, and I want to export to Zipkin, because the PetClinic example actually has a Zipkin server that is running. And I basically built a pipeline around that: I have this trace pipeline that says take in Jaeger; I do have some processors, so I have a batch processor, attributes, and retry, which isn't required but is kind of a best practice; and then go ahead and send that data to Zipkin. So I'm actually going to translate from the Jaeger format into the Zipkin format and have Zipkin accept it.
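A collector configuration along the lines described might look roughly like the following. This is a sketch, not the exact file from the demo, and field names can differ between collector versions, so check the collector documentation; the Zipkin endpoint shown is an assumption based on the docker-compose setup:

```yaml
receivers:
  jaeger:
    protocols:
      grpc:

processors:
  batch:
  attributes:
    actions:
      - key: environment
        value: test
        action: insert
  # a queued-retry processor could be added here as well

exporters:
  zipkin:
    endpoint: http://zipkin:9411/api/v2/spans

service:
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [batch, attributes]
      exporters: [zipkin]
```

The `service.pipelines.traces` section is what wires the named receiver, processors, and exporter together, which is the pipeline concept from earlier in the talk.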
And so that collector is actually running in this docker-compose setup. There's a docker-compose file; I went ahead and modified it to have an OpenTelemetry collector here, so basically I pulled in the collector, it's using the collector config file, and then I exposed the Jaeger gRPC port so that I can send data to it. Those are really the only changes I made to this thing, and then I started running it. This is what the PetClinic app looks like when you actually fire it up. You can go ahead and see, for example, the different owners that are built into the system; there's a list of veterinarians, and this should populate as well. I can actually register another owner, so I can register myself; I don't think it actually does validation... there we go, so it adds me in, and you can add pet information, pretty cool. And as that's happening, you'll actually see that there are spans being generated. This app is auto instrumented and now generating span data that wouldn't be there otherwise, because the collector and the auto instrumentation were not there before. And if I fire up the Zipkin server that's built into this docker-compose, you'll actually see all the different microservices and the spans being generated from those calls that I just made, and I can pull up any one of those traces and see the information about the calls being made: the operations, the duration, and any associated metadata. I actually added a tag for environment that I called "test." So it's pretty easy to get started, especially with the auto instrumentation aspects, and it doesn't really matter whether you're using Zipkin or Jaeger or a commercial vendor; you can modify the collector config pretty easily to get an end-to-end setup going where you see data flowing throughout your system. So definitely take a look at this; we'll be providing more examples, the documentation has some of these, the quick-start guides have some, this PR is up, and
I'll share the links in the slides. This shows that you can take an existing app, add the necessary hooks to get instrumentation out, deploy the collector very easily, and even have it receive in one format, send out in another, and tag in additional information. And this is set up in, I don't know, five or ten minutes, pretty quickly; it probably takes longer to start the Spring application and have it be fully running, which takes about four or five minutes right now, than it does to actually instrument it in a way that gets OpenTelemetry data exported, which is pretty cool. So definitely take a look at that. What you will notice is that, given the architecture diagram, these services will be represented inside of the Zipkin server: you'll see all the different calls, the different microservices that exist, and any errors and things that are generated. Pretty cool overall, I think. All right, so with the remaining time I did want to cover a few other aspects, and then I'll definitely open it up broadly to questions as well. A few other things that the project already has in place today, even though we're only in sandbox: there is governance in place, there is a code of conduct, there is a technical steering committee, so there's a lot of oversight here, and there's actually pretty good representation from many companies, just to ensure that we're not building something in one specific direction and we're actually taking broad community input here. One cool thing that we have, which I think is more unique, and I haven't really seen in another CNCF project, though maybe it exists: we have what we call OpenTelemetry Enhancement Proposals, or OTEPs. You can think of them sort of like design docs; it's a way of vetting an idea and ensuring that there's alignment, and generally even doing proofs of concept, before actually submitting a PR and getting something built in. This can be valuable for an
individual project, so you'll see some OTEPs that are specific to, say, the collector and wouldn't be applicable to the client libraries, or they can be more generic, like a proposal that would impact every single client library, and so we want to have an OTEP to ensure there's agreement before we ask the maintainers to take on the work of making those changes. The logs SIG is also following this OTEP process: since it's just getting started and it's unclear exactly what the work streams are going to be or what decisions are going to be made, we leverage OTEPs for that. I did highlight this earlier, but not only the collector, the client libraries themselves also have this notion of core versus contrib. The idea is to be as lightweight and efficient as possible and to really keep core as minimal as possible. There's also the contrib, community-based side, which doesn't necessarily mean third-party vendor closed source; there can be open source aspects here too, where it doesn't make sense for something to exist in core because maybe a lot of people don't use it, or it's more legacy, or the overhead of maintaining it is not possible today. This is really cool because it allows us to move quickly. If everything was in core it would really slow down our progress: core would become very large, build times would go up, and there would be a lot of problems. And since we want the ability to stay vendor agnostic, having third-party companies in core with all of this is not the best outcome, so really having that distinction provides a lot of flexibility for us. And then finally, there is a website that provides more information and actually links to the READMEs. Much of the documentation is either in the GitHub READMEs today or on other sites (Javadoc and GoDoc have their own destinations), but the OpenTelemetry site is actively maintained. It lists all the different types of
components that are possible and supported, and it links out to blogs and video recordings and other media, so definitely take a look at that. From a roadmap perspective, we're looking to get all the client libraries to beta as soon as possible; I mentioned Ruby and a few others are getting pretty close, and there are several other client libraries that aren't there yet and need to get there. Our intention is to get this project to GA later this year, specifically for traces and metrics; that wouldn't include the logging aspect, given that logging has just kicked off. There's also an intention to get auto instrumentation for all languages, but that will take some amount of time; as you can see, Java is the only one in that state now, and it's pretty far along, it actually made the beta. And then we want to get the initial log support, or at least the decisions for log support, later this year as well, probably in a beta state, with the idea being that it would GA probably next year. And then, as always, like most projects: improve the documentation. PRs are definitely welcome, and if you're confused or you're having a hard time getting started, that is a bug, so let's go fix it. Increase adoption of the project overall, including getting case studies; I mentioned some big companies are using this today, like Postmates, Shopify, and Mailchimp, and understanding their use cases, how they see value, and why they're contributing will be very important. And then of course, make getting started as easy as possible; as you saw, manual to automatic is kind of night and day in terms of the amount of effort necessary to get started, so we really want to provide a friction-free way to get up and going, but with the flexibility of enhancing that with additional information should you need it. So, next steps: please join the conversation. We have a Gitter; there are multiple rooms, but you should probably start in the community one, it's a great
place to start. We have many special interest groups, so again, come join us; the meeting schedule and calendar invites are all up on the community page. And then definitely please submit PRs; we leverage the labels "good first issue" and "help wanted," so if you see those, those are good places to start, and if you have your own issues or other things you want to work on, that's always welcome as well. I did want to note that I put together this Google Slides template; I'll submit a PR for it so that people can leverage it if they're interested, or enhance it and make it better, but I thought it was kind of cool to show off a similar color palette and information for OpenTelemetry as well. There are tons of links; I won't cover them, and these slide decks will be shared out with folks, but much of what I covered here has a link, so definitely check it out. If you think other resources would be useful, please ping me, message me, find me; I'm on Gitter, I'm on Twitter, I'm around. I would love to get feedback as to what you like and, going forward, what you think would be good to drill into further about the project. And with that, I would just like to say thank you for having me, and I would love to open it up for questions in the last five minutes or so; it looks like we have time. Awesome, thanks for the presentation, Steve, it was super informative. So yeah, we're going to move into the question-and-answer piece. We only have a few minutes, so if you have a question that you'd like to ask Steve, please do submit it in the Q&A box at the bottom of your screen and we'll get to as many as we can. We do have one in here right now. Mike is asking: how much additional overhead do traces add to cluster resources, for example OTel API calls using network, storing traces, etc.? Yeah, so this is going to be client-library specific, or language specific, so each of the languages should be doing performance testing for that particular language to ensure the overhead is minimal. There
is going to be overhead; it's not free, but the goal is to make it as lightweight as possible, because the goal is not to impact the application. As I mentioned earlier, that's why it's pretty important that you deploy the OpenTelemetry collector as an agent, or another agent if you have one you're leveraging today, because then you can offload more responsibilities, which means less resource consumption in the client library, which means less resource consumption in the application. So in general they're actually extremely lightweight and efficient; they're built that way by design, so we're not consuming a lot of memory or processing things multiple times, and we use a lot of stream processing of this data through your application, so you shouldn't run into performance problems. I don't have performance numbers readily available; I'm assuming the maintainers will probably post those on the GitHub repos themselves. The collector, for example, has a performance section, that I know for sure, and the build process for the collector actually tests performance and will fail builds if performance has deteriorated. You should be seeing something similar elsewhere, and if you don't, I would definitely encourage you to open a GitHub issue, because we should be tracking that; if there are going to be performance problems, or if there are known performance problems, that definitely needs to be fixed, because the goal is not to impact the app. Great, okay, so we have another question, from fellow Steve, and he's asking: is offloading work from application instrumentation to the collector automatic or configurable? Yes, so with offloading you basically have flexibility here; it's your choice. From an instrumentation perspective, probably the only work that you need to do is batching, and you'll actually see that there's a span processor; by default it does "simple," which sends everything as it comes, versus "batch." That's probably the only thing that you want your client
library to do; everything else should happen within the agent itself. The agent can do its own batching across multiple applications, which is powerful, plus queued retry, compression, and encryption. The configurable aspect is that if you haven't configured something in your client library, you can then configure it in the collector itself. So in that collector YAML file that I showed you, you can add processing information there; maybe you want a larger queue, or maybe you want separate batches for separate pipelines. There's a lot of flexibility, and you can double up: I can enable batching in the client library and batching in the collector, nothing prevents you from doing that, and there actually could be good reasons to do it as well. So everything is configurable, but out of the box, if you take the default getting started for, let's say, Java, you will get the batch processor enabled and nothing else, which is great, because if you take the default configuration of the collector, you'll get the queued retry for free as well. Some things cannot be automatically configured, so I don't want to give the idea that if you just deploy this you'll get everything you need automatically; that's not necessarily the case. Some things are environment specific and you will need to make modifications, but the default behavior should be sane and should do the right thing for you. All right, looks like that's all the questions and all the time that we have today. Thanks again, Steve, for a great presentation, and thank you to all of our attendees for joining us today. A reminder that the webinar recording and slides will be online later today. We look forward to seeing you at a future CNCF webinar, and have a great day. Thanks. Thanks.
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 13,773
Keywords: Kubernetes, OpenTelemetry
Id: DbaO0Xxv34c
Length: 56min 36sec (3396 seconds)
Published: Fri May 08 2020