Grafana Loki: like Prometheus, but for Logs

Captions
[Applause] Thank you. In December 2018, Grafana Labs was on the stage of KubeCon North America to introduce Grafana Loki, an open source solution to collect and query logs at scale. Fast forward a year and a half, and today Loki is generally available and is used by many companies and users as a logging solution. To give you an example, at Grafana Labs we have a Loki cluster for our own internal applications, where we collect about 20 to 25 megabytes of logs per second, for a total of about 2 terabytes of logs per day. Last week my colleague Cyril was tweeting about some performance improvements we are introducing in Loki, and in that internal cluster we are currently seeing about 10 gigabytes per second of logs processed at query time.

My name is Marco, I'm a software engineer at Grafana Labs, I'm a Loki contributor and user, and I'm also a Cortex maintainer. Cortex is a distributed storage for metrics. I mention Cortex, even if it's for metrics while Loki is for logs, because they both share the same architecture and a pretty good amount of codebase is shared between Cortex and Loki, so even if I spend most of my time working on Cortex, many of the changes we do are actually used by Loki as well. In this talk I'm going to give you a brief introduction to Loki and a quick demo, and then the second part of the presentation will be a deep dive into the Loki architecture and storage.

Now, the typical problem we have is this: a system composed of many applications or microservices, each one logging its own logs, and we want a cost-effective way to collect those logs, store them, and eventually query them back. The way Loki works is based on an agent called Promtail, which you install on each node running your applications or services. What Promtail does is tail the logs from the local file system and push those logs to a central server, which is Loki, and then you query your logs back using Grafana, or we also provide a command line tool called LogCLI.

Now, to better understand Loki we have to take a step back and understand the log entry anatomy in Loki. Each log entry is composed of three components: a timestamp, a set of labels, and the log line, the message you logged from your own application. The labels are key-value pairs; if you are familiar with Prometheus, they have the exact same syntax as Prometheus labels, and they are metadata you can attach to each of your logs, like the application which is logging, or information about the node where the application is running, and so on. This metadata is configured through Promtail, the agent which pushes the logs to Loki.

Now, an important trade-off picked by Loki is that Loki does not index the content of the log: the message you log from your own application, or the entire content of the access log, is not indexed by Loki. What we actually index is the timestamp and the set of labels. So what happens at query time is that, given a time range and your query, we first filter down all the logs by time range and by the label matchers, and we use an index for this; then, given the resulting logs, we do a further scan of the entire content in order to filter them down by the content. A group of logs with the same exact labels is called a log stream.
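To make that anatomy concrete, here is a minimal sketch in Go of how a log entry and a log stream could be modelled. The type and field names are mine, not Loki's actual internal types; the point is only that the labels and timestamps take part in the index while the line itself stays unindexed.

```go
package main

import (
	"fmt"
	"time"
)

// Entry is one log line with its timestamp.
// The line content itself is never indexed by Loki.
type Entry struct {
	Timestamp time.Time
	Line      string
}

// Stream groups all entries that share the exact same label set,
// e.g. {app="nginx", level="error"}.
type Stream struct {
	Labels  map[string]string // key-value pairs, Prometheus-style
	Entries []Entry           // kept sorted by Timestamp
}

func main() {
	s := Stream{
		Labels: map[string]string{"app": "nginx", "level": "error"},
		Entries: []Entry{
			{Timestamp: time.Now(), Line: "GET /api/users 500 12ms"},
		},
	}
	// Only s.Labels and the entry timestamps take part in the index;
	// filtering on the line content requires a full scan of the entries.
	fmt.Println(s.Labels, len(s.Entries))
}
```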
A second trade-off picked by Loki is that, given a single log stream, so a series of logs with the same exact labels, Loki requires that the client, like Promtail, the agent, pushes those logs already sorted by timestamp. This means that if Loki receives out-of-order logs for a specific log stream, Loki rejects those logs. This is not just a detail, this is actually a design decision we picked when we built Loki: this way we don't have to do any re-sorting, neither at ingestion time when we receive the logs, nor at query time, because each log stream is already sorted by time. Now, don't get too scared about this constraint, because Promtail, the agent, can be configured to actually fudge out-of-order timestamps: if for any reason your application logs entries with a timestamp which is out of order, Promtail can optionally be configured to fudge the timestamps which are out of order, so that no log gets rejected on the server side.
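As a rough illustration of that ordering constraint (my own sketch, not Loki's actual ingestion code), appending to a stream could look like this: an entry older than the last accepted one for the same stream is rejected.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// entry is a single log line with its timestamp.
type entry struct {
	ts   time.Time
	line string
}

var errOutOfOrder = errors.New("entry out of order for this stream")

// push accepts an entry only if it is not older than the last entry
// already stored for the same stream; otherwise it is rejected,
// mirroring Loki's per-stream ordering requirement.
func push(stream []entry, e entry) ([]entry, error) {
	if n := len(stream); n > 0 && e.ts.Before(stream[n-1].ts) {
		return stream, errOutOfOrder
	}
	return append(stream, e), nil
}

func main() {
	now := time.Now()
	s, _ := push(nil, entry{ts: now, line: "first"})
	if _, err := push(s, entry{ts: now.Add(-time.Second), line: "too old"}); err != nil {
		fmt.Println(err) // the older entry is rejected
	}
}
```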
Then you query your logs back. The query syntax, at least the first part of it, is pretty similar to Prometheus, because a query is composed of a log selector and a filter expression. The log selector is used to filter the logs by the labels: given all the log streams that we receive on the server side, when you query your logs back you can filter them by labels, and we use an index to filter the logs by labels; in this example we're filtering where the application equals nginx, or where the instance IP matches a regex starting with 1. Then, given the resulting logs, you can further filter them down by their content, the content of your log line, with the filter expression. This second filtering step is done with a full scan of the logs, there's no index, which in other terms means we basically built a distributed grep, if you think about it.

Now, Promtail, the agent which again is used to ship the local logs to Loki, does three things. The first thing is discovering the local logs. Then, for each log file, you can configure a pipeline to process those logs: you can attach labels, you can transform the logs, you can drop logs you don't want to push to Loki, and so on. And at the end of this pipeline processing it pushes the logs to Loki. Promtail is not the only way available to push logs to Loki: we have plugins for Fluent Bit and Fluentd, we also provide a Docker logging driver, and we have an integration with syslog. But Promtail is, let's say, the native way, and it's the way I'm going to show you in the next demo.

Now, one of the most interesting things to me about Promtail is that Promtail supports Prometheus-style service discovery, which means we specifically support the static discovery and the Kubernetes service discovery. The static service discovery is common to most logging solutions: you configure Promtail to scrape the logs from the local file system by configuring the file path of the logs. But if you run your applications on Kubernetes, you can actually use the Kubernetes service discovery, and what the Kubernetes service discovery does is this: Promtail keeps a connection open with the Kubernetes API and continuously discovers all the pods running on the node where Promtail is running. This way we can do two things. The first one: we can automatically discover the pods and scrape the logs of each container running in each pod on that node. The second thing is that you can configure Promtail to dynamically inject labels into each of the logs based on the Kubernetes metadata, like the pod labels or the node information; in this case we are attaching the name of the application, which could be, I don't know, a pod label, and we can attach the node IP, the node name, and so on. Basically, any metadata which the Prometheus Kubernetes service discovery supports is also supported by Loki, because in practice, under the hood, we use the same service discovery as Prometheus.

There are three ways to run Loki. The first one, and the easiest one, is the so-called single binary mode: Loki is just one binary, pretty much like Prometheus; you can download it, you can run it on your computer, and you can also run it, with some little limitations, on multiple nodes in a little cluster. It's usually used for small installations and it's the way I'm going to run Loki in the demo. The second way is the microservices way: when you run Loki as a single binary, what we actually do is run multiple services within a single process, but what you can do is take each of those services and deploy them separately, in order to be able to horizontally scale each of the services which compose Loki. We are going to see the microservices architecture in the second half of the presentation; it's usually used for large installations, and it's the way, for example, we run Loki at Grafana Labs. The third way is to use Loki as a service: Grafana Labs has a cloud offering called hosted logs, which means we run Loki for you, you configure Promtail on your nodes, you push the logs to the Loki managed by Grafana, and you get back an HTTP endpoint which you can use in Grafana to query your logs back.

So, as I mentioned previously, I would like to give you a quick demo. I'm going to use the single binary mode, which means I'm going to run Loki on my laptop, and I'm going to run Promtail on my laptop as well. For this presentation I've downloaded the latest version from the GitHub releases page, which is 1.3.0, so I have two binaries on my computer: Loki, the server, and Promtail, the agent. I built a simple script to generate fake logs, just used for this presentation: it logs 10 lines per second, and the format is pretty similar to the Go logfmt format, so there's a timestamp, the log level (info or error), the application name, and the message which that application is logging.
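The generator in the demo is a bash script whose source isn't shown; as a purely hypothetical stand-in, a small Go program emitting roughly that kind of logfmt-style line at 10 lines per second could look like this (the field names ts, level, app and msg, and the sample messages, are my assumptions, not the script's actual output).

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	apps := []string{"api", "backend"}
	levels := []string{"info", "error"}
	msgs := []string{"user 82 login", "user 82 logout", "update user", "delete user"}

	// Emit ~10 fake logfmt-style lines per second, similar in spirit
	// to the bash script used in the demo.
	for range time.Tick(100 * time.Millisecond) {
		fmt.Printf("ts=%s level=%s app=%s msg=%q\n",
			time.Now().Format(time.RFC3339Nano),
			levels[rand.Intn(len(levels))],
			apps[rand.Intn(len(apps))],
			msgs[rand.Intn(len(msgs))])
	}
}
```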
Then I configure Loki, the server, to run with the local file system as storage, so that everything runs on my laptop, and I also configure Promtail to tail my log file. What it does is this: for each of those logs it attaches a static label called type, equal to demo, which I'm going to use later to easily query the data back; it parses the log line using a regex, picking out the timestamp, the level, the application and the message; it sets the log timestamp based on the content of the log line (otherwise, by default, it would be the time when Promtail read the log); it sets a couple of labels, the name of the application and the log level, which are inside my log line; and it replaces the log line, removing the timestamp, the level and the application and keeping just the content of the message.

So, I already have this bash script generating some logs, I'm going to run Loki, and I'm going to run Promtail. At this point we have Promtail, which tails this log file and pushes the logs to Loki. I also have a local installation of Grafana, so I'm going to add a new data source to Grafana to query this data back: I select the Loki data source and I configure the endpoint of my local Loki, which is running in a Docker container; I save it and the check is green. I don't know if you are familiar with Grafana: Grafana has a panel called Explore (is it large enough? maybe this is better). Explore can be used to manually write queries, whether they are metrics or logs, and to me at least it's a convenient way to experiment with your data before building dashboards. What we care about for this demo is the time range, so let's query the last five minutes of data, and let's write the first query. The queries are written in a sort of Prometheus-style query language: basically we select the labels of the logs that we want to query. As I mentioned before, to all the logs I'm attaching a label pair, type equal to demo, so if I run this query, what I get is the list of all the logs which my fake application, the bash script, is generating. On the top we see a histogram with the logs split by log level, in this case the info and error levels the bash script is generating, and below we see the entire list of logs. We can zoom into each single log line and see the label set for that specific log line: we can see that the application which logged this specific line was the api application, the file name, the log level, and so on.

Now let's try to query the data. Let's say we just care about errors: there's some issue in production and we want to check what's happening. We can further filter down our logs by the log level, so now we just get the error logs, and we can now start to filter the logs further down based on their content. If we look at the logs, we can see the fake logs are things like user logout, user login, delete user, update user; they try to simulate some activity. Let's say we just care about the authentications, so we can look for the logins, and now we have further filtered down our logs, showing just the ones that contain "login". Or we can use a regular expression and look for "login" or "logout", and here we are, we have the list. You can pipe this filtering with AND conditions, so for example if we pipe "user 82" we are going to see the login and logout events for the user 82.
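The filter expression step is, as mentioned, basically a distributed grep over the selected streams. As a minimal illustration of what that means (my own sketch in Go, not Loki's query engine), a substring filter corresponds to a `|= "login"` expression and a regular-expression filter to `|~ "login|logout"`:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// filterContains keeps only the lines containing the given substring,
// the equivalent of a `|= "login"` line filter.
func filterContains(lines []string, substr string) []string {
	var out []string
	for _, l := range lines {
		if strings.Contains(l, substr) {
			out = append(out, l)
		}
	}
	return out
}

// filterRegex keeps only the lines matching the regular expression,
// the equivalent of a `|~ "login|logout"` line filter.
func filterRegex(lines []string, expr string) []string {
	re := regexp.MustCompile(expr)
	var out []string
	for _, l := range lines {
		if re.MatchString(l) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	lines := []string{"user 82 login", "update user", "user 82 logout", "delete user"}
	fmt.Println(filterContains(lines, "login"))     // [user 82 login]
	fmt.Println(filterRegex(lines, "login|logout")) // [user 82 login user 82 logout]
	// Filters can be chained, like piping greps, e.g.
	// filterContains(filterRegex(lines, "login|logout"), "user 82").
}
```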
Now, the second interesting thing about Loki is that it allows you to extract metrics from logs. So far we have filtered our logs and we are viewing our logs, but what you can actually do is switch the button on the top between Logs and Metrics, so let's switch to Metrics. If we switch to Metrics (I'm going to simplify the query), we can compute metrics using aggregation functions, actually the same aggregation functions as Prometheus. If you are familiar with Prometheus, what we can do, for example, is calculate the rate of logs per second over a one-minute time window: we apply the rate function with a one-minute time window to our LogQL expression, and what we see now is, for each log stream (and again, a log stream is a unique set of labels), the rate of logs per second calculated over a one-minute window. Each data point is the per-second rate over a one-minute window, and it's a sliding window going back, one for each single pixel we see there.

Now, let's say we don't care about the rate per second for each log stream, which in our case is the api application logging at error level, the api application logging at info level, the backend application logging at error level and the backend application logging at info level; let's say we just care about the split between the log levels. We can further aggregate the rate using a sum, so we can sum by level over the rate, and what we get now is the rate per second of logs split by log level.

In the same way we can build dashboards, so I'm going to build a new dashboard. I add a new panel, and in this panel we copy and paste our query; last 15 minutes, okay, and this is the same query as before but moved into a dashboard. We can add a second panel, but this time a logs panel: the logs panel shows you the content of the logs, so this is not a metrics query, this is a logs query. Let's say, for example, we are just interested in the errors, so we filter by error, and what we see here are metrics extracted from logs, plus the logs themselves. When you change the time window, like when you select a portion of your chart and the time range gets reduced, the logs update accordingly, so they are kept in sync.
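As a conceptual illustration of what a rate query over a one-minute window computes (my own sketch, not Loki's query engine): count the entries that fall inside the window and divide by the window length in seconds.

```go
package main

import (
	"fmt"
	"time"
)

// ratePerSecond counts how many timestamps fall in (at-window, at]
// and divides by the window length, which is conceptually what a
// rate-over-1m style query evaluates at each step of the chart.
func ratePerSecond(timestamps []time.Time, at time.Time, window time.Duration) float64 {
	from := at.Add(-window)
	count := 0
	for _, ts := range timestamps {
		if ts.After(from) && !ts.After(at) {
			count++
		}
	}
	return float64(count) / window.Seconds()
}

func main() {
	now := time.Now()
	var ts []time.Time
	// 10 fake log lines per second for the last minute.
	for i := 0; i < 600; i++ {
		ts = append(ts, now.Add(-time.Duration(i)*100*time.Millisecond))
	}
	fmt.Printf("%.1f lines/sec\n", ratePerSecond(ts, now, time.Minute)) // ~10.0
}
```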
Good, do you want to ask questions now, or do you prefer to wait? As you wish, okay. (Question from the audience about whether Promtail follows log rotation.) Yes, we register events on the file system. (Follow-up question.) I have another half of the presentation, so what do you think if we move all the questions to the end? Some questions, like this one, may be answered in the second half, and then I will be available the entire evening if you want.

Now (let me pick up my microphone), I'm trying to give you a better understanding of how Loki works. So far Loki has been a black box, right? I just told you, okay, there's this Loki server, but I didn't tell you how it works, how the storage works, and so on. So now I'm going to try to give you a sort of deep dive, and I will make it as simple as possible, or at least I will try.

When Promtail pushes the logs to Loki, the entry point, when you run Loki in the microservices mode, so when you run a Loki cluster, is the distributor. The distributor is a service which does three things: it validates the input logs, it shards the logs, and it replicates them across the pool of ingesters. The ingester is another Loki component which keeps all the received log streams in memory and builds chunks of compressed logs: for each log stream it builds a chunk, and a chunk is like a compressed buffer of the logs received. Once the chunk is complete, which means it's full, the ingester flushes the chunk to a shared storage and offloads the chunk from memory, otherwise it would go out of memory pretty soon.

On the read path, when you run a query using LogCLI or Grafana, like we did before, the entry point is the querier. The querier is the third service of Loki, and what it does is, given the input time range of the query and the query expression, fetch the chunks for the log streams matching the label selector in the query, both from the ingesters and from the shared storage. The ingesters contain the latest received logs, which have not been flushed to the storage yet, while the storage contains all the historical logs, which have already been flushed by the ingesters. The querier receives those chunks, whose data is still compressed, decompresses the chunks containing the logs, and does a full scan of those logs in order to filter them down by the log line, which is not indexed.

Each log stream, like this one or this one, over time builds a sort of series of chunks, and each chunk contains the compressed logs for a specific time range: between time t1 and t2 we have one chunk for each log stream, then between t2 and t3 we have another chunk for each log stream, and so on. And if we look inside each chunk, each chunk contains all the compressed logs for one single log stream for a specific time range. Since one Loki requirement is that all the logs must be pushed to Loki already sorted, we don't need to index every single timestamp inside the chunk: we just need to index the minimum and maximum timestamp of the chunk, because if you have the minimum timestamp, the maximum timestamp, and all the data inside is sorted, we can do a binary search inside the chunk in order to find the logs for a specific time range.
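Because the entries inside a chunk are sorted, locating a time range boils down to two binary searches. Here is a minimal sketch of that idea in Go, my own illustration using the standard library's sort.Search rather than Loki's chunk code.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// entriesInRange returns the sorted entries whose timestamp falls in
// [from, through). Because the slice is sorted by timestamp, both
// boundaries are found with a binary search instead of a full scan.
func entriesInRange(ts []time.Time, from, through time.Time) []time.Time {
	lo := sort.Search(len(ts), func(i int) bool { return !ts[i].Before(from) })
	hi := sort.Search(len(ts), func(i int) bool { return !ts[i].Before(through) })
	return ts[lo:hi]
}

func main() {
	base := time.Date(2020, 3, 24, 10, 0, 0, 0, time.UTC)
	var ts []time.Time
	for i := 0; i < 60; i++ { // one entry per second inside the chunk
		ts = append(ts, base.Add(time.Duration(i)*time.Second))
	}
	sel := entriesInRange(ts, base.Add(10*time.Second), base.Add(20*time.Second))
	fmt.Println(len(sel)) // 10 entries between +10s and +20s
}
```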
The chunks, as I mentioned previously, are filled up in the ingester memory and then flushed to the storage once they are complete, and a chunk is considered complete when the maximum chunk size in megabytes is reached, or when an idle time is reached. Think about an application which gets, I don't know, decommissioned, so it stops logging: the log stream generated by that application will not receive logs anymore, but at some point the logs which are in memory in the ingester need to be flushed to the storage, and that's the reason why we need an idle time.

We support several backends for the chunks storage, which again stores the compressed logs. If you run on AWS we support S3 and DynamoDB; actually, S3 is also an option picked by people who are not running on AWS but use a storage which is S3-compatible. If you run on Google we support GCS and Bigtable. If you are on-premise or on other clouds we support Cassandra, and if you run Loki on a single node, like I did in the demo before on my MacBook, we also support the local file system.

Now, each single chunk is indexed using an inverted index: each chunk is indexed by the label set, the minimum timestamp of the logs inside the chunk, and the maximum timestamp. This index is used at query time by the queriers in order to narrow down the chunks they need, given a specific input time range and query. We support a few storages for the index as well: DynamoDB on AWS, Bigtable on Google, Cassandra again if you run on-premise or on other clouds, or, if you are running on one single node like in the demo I gave you before, we support BoltDB, which is embedded into Loki.

Now, if we get back to the original architecture, I hope the shared storage is no longer a black box: we now know that it is actually composed of two storages, the chunk storage, which contains your compressed logs, and the index storage, which contains an inverted index that indexes each chunk by its label set and time range.

Previously I mentioned that when Promtail pushes the logs to Loki, the entry point is the distributor, and I also mentioned that one of the things the distributor does is sharding. Since the ingesters need to keep all the logs in memory for a short time, until the chunk is built up and then flushed to the storage, if you have a high volume of logs you will probably need to horizontally scale the ingesters. So the distributor, given each single log received in input, computes a hash of the label set, and given this hash it picks, from the pool of available ingesters, the ingester to which that specific log should be pushed.

I know your next question may be: what happens if an ingester dies? Since the latest logs are kept in memory, if an ingester dies, those logs are lost. Replication is the solution to, let's say, protect against this failure scenario. Loki allows you to configure a replication factor, which we suggest to be three, which means that for each single log stream we receive in input, the distributor computes the hash, and given that hash we write that log to one, two and three ingesters. So if an ingester dies, the logs that have been pushed to ingester one have also been pushed to two other ingesters, in this example the number two and the number three.

Now, try to stay with me: I want to talk about this one because my boss is a super fan of it, and I became a super fan as well, because it's a very flexible data structure we use for many things, including how replication and sharding are built. The replication and sharding are based on the ring. The ring is just a data structure, okay, and it's distributed, in the sense that this data structure is shared across the pool of ingesters and distributors, which means that every time this data structure changes, all the nodes within the cluster will get the updated data structure. Now, the hash I was talking about before is, in Loki, a 32-bit hash, but to keep this presentation simple we are going to use a hash space with keys between 1 and 10. So let's say we have our hash space, the values which our hash can assume: 32-bit in Loki, between 1 and 10 in this presentation. When an ingester starts up, it generates a random value, and that random value is the key that the ingester registers within the ring; the ring is just our hash space containing all the keys. So ingester number one starts and generates the random value 2, ingester number two starts and generates the random value 6, and ingester number three starts and generates the random value 9. When the distributor computes the hash of an input label set, the hash value is within our key space, 32-bit in Loki, 1 to 10 in this presentation, and what the distributor does is check whether there's an ingester registered with the same value as the hash. If there's no ingester with exactly the hash we computed from the labels, the distributor walks the ring clockwise in order to find the first healthy ingester within the ring; in this case the log is pushed to ingester number two. If you enable replication, we don't stop at the first ingester we find: we continue walking the ring until we find two more ingesters, if you have a replication factor of three. So in this case it will continue walking and find ingester number two, then number three, and then number one. And that's how replication and sharding are built.
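Here is a minimal sketch of that ring lookup in Go, under the simplified assumptions used in the talk (a tiny key space and one token per ingester). It's my own illustration of the clockwise walk with replication, not Loki's actual ring implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring maps tokens (random keys picked by ingesters at startup)
// to ingester names, mirroring the simplified key space of the talk.
type ring struct {
	tokens    []uint32          // sorted tokens registered by the ingesters
	ingesters map[uint32]string // token -> ingester name
}

// lookup hashes the label set and walks the ring clockwise, collecting
// `replicas` ingesters starting from the first token >= the hashed key.
func (r *ring) lookup(labels string, replicas int) []string {
	h := fnv.New32a()
	h.Write([]byte(labels))
	key := h.Sum32() % 10 // squeeze the 32-bit hash into the toy key space

	// first registered token at or after the key; wrap around if needed
	start := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i] >= key })
	var out []string
	for i := 0; i < len(r.tokens) && len(out) < replicas; i++ {
		tok := r.tokens[(start+i)%len(r.tokens)]
		out = append(out, r.ingesters[tok])
	}
	return out
}

func main() {
	r := &ring{
		tokens:    []uint32{2, 6, 9},
		ingesters: map[uint32]string{2: "ingester-1", 6: "ingester-2", 9: "ingester-3"},
	}
	// With a replication factor of 3 the walk visits three consecutive
	// ingesters clockwise from wherever the hash of the labels lands.
	fmt.Println(r.lookup(`{app="nginx"}`, 3))
}
```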
Several other features which require some sort of coordination are also built on the ring within Loki. This data structure needs to be shared across the nodes of your Loki cluster, so we need another backend: we support Consul and etcd, and we recently introduced gossip support, which means we can build a peer-to-peer network between all the Loki nodes, so that we don't need to run Consul or etcd just to keep the ring data structure, which is a few kilobytes, out there. Now, the good news is that we don't need strong persistence. To give you an example, we run Loki in production and we just run one Consul instance in memory, without even storing the data to disk and without any HA or stuff like this. Why? Because even if you lose Consul or etcd for a short time, and even if all the data inside is erased, when a new instance of Consul or etcd starts up, even if it's empty, all the nodes within Loki are able to reconstruct the ring, the data structure stored within Consul or etcd. The only downside is that if any change to the data structure happens during the Consul or etcd downtime, that change will not be propagated to the other nodes; it will be propagated as soon as the backend is back up.

There would be much more I would like to talk about, but obviously I don't want to overload you with information. We support caching, in order to reduce the operations against the shared storage, both on the write path and the read path. Loki supports multi-tenancy, which means you can enable multi-tenancy in order to have logs fully isolated between tenants: this is the way we run our own logs-as-a-service solution, hosted logs, and this is also the way single companies use it when they have teams and they want team A to only be able to query the logs of team A, without exposing the logs of one single team's applications to the entire company.

But before leaving, there's one last thing I would actually like to talk about, which is a new, or fairly new, service we introduced. It's an optional service you can put in front of the queriers, it's used to speed up query performance, and it's the query frontend. The query frontend does two things. The first one is fair scheduling. Let me explain the problem without the query frontend. Let's say we don't have the query frontend: you run Loki at scale, you have multiple querier instances and you have a load balancer in front of the querier instances. Without the query frontend, what happens is that the load balancer receives a query and does round-robin across the queriers in the backend: it receives one query, it goes to querier number one, a second query goes to querier number two, and so on. The problem is that not every query puts the same load on the system: there are queries which can be executed in 100 milliseconds, and there are heavier queries which scan a bunch of logs and may take 30 seconds to execute. So if you have pretty high traffic, at some point you may end up in a situation where your load balancer sends a query to a querier which is already overloaded, while you have another querier which is idle. The query frontend introduces an in-memory queue: when you send a query, the load balancer balances across multiple query frontends (run for high availability purposes), the query frontend puts the query received in input into an internal queue, and then a querier, when it's idle, picks the next query from the queue. This way we can better distribute the workload across the queriers.

The second thing, which is built on top of the queue, is parallelization. Let's say you run a query like the one I used at the very beginning of the presentation, and you run this query over a large time range, like: I want to see all the nginx errors for the last 24 hours. If you have a pool of queriers without the query frontend, that query will be executed by one single querier, so the other queriers are unused. What the query frontend does is take this query with a large time range and split that 24-hour time range query into 24 one-hour queries: we put 24 queries into the internal queue, each one for a single one-hour time range, and then the pool of queriers picks these queries from the queue, executes them, and sends the results back to the query frontend; the query frontend merges the results and returns them back to the client, like Grafana or LogCLI, with the logs sorted by time.
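To make the parallelization concrete, here is a small sketch in Go of splitting a 24-hour query into one-hour sub-queries that idle queriers could pull from the queue; this is my own illustration of the idea, not the query frontend's actual code, and the subQuery type and splitByInterval function are made-up names.

```go
package main

import (
	"fmt"
	"time"
)

// subQuery is one slice of the original query's time range.
type subQuery struct {
	Expr     string
	From, To time.Time
}

// splitByInterval breaks a [from, to) query into fixed-size sub-queries,
// the way the query frontend splits a 24h query into 24 one-hour queries
// that can be executed in parallel by the pool of queriers.
func splitByInterval(expr string, from, to time.Time, interval time.Duration) []subQuery {
	var out []subQuery
	for start := from; start.Before(to); start = start.Add(interval) {
		end := start.Add(interval)
		if end.After(to) {
			end = to
		}
		out = append(out, subQuery{Expr: expr, From: start, To: end})
	}
	return out
}

func main() {
	to := time.Now()
	from := to.Add(-24 * time.Hour)
	queue := splitByInterval(`{app="nginx"} |= "error"`, from, to, time.Hour)
	fmt.Println(len(queue), "sub-queries") // 24 sub-queries
	// Each sub-query would be placed on the in-memory queue, picked up by
	// an idle querier, and the query frontend would merge the partial results.
}
```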
That's pretty much all. I've been very happy to be here. [Applause]
Info
Channel: Marco Pracucci
Views: 6,529
Keywords: grafana, loki, opensource, logs, prometheus
Id: FfXxlqBIcmY
Length: 40min 6sec (2406 seconds)
Published: Tue Mar 24 2020