Introduction to Fluentd: Collect logs and send almost anywhere

Captions
So you want to achieve centralized logging, and you've probably heard of Fluentd. Fluentd is a fully open-source log collector that lets us collect logs from many sources and send them almost anywhere. Welcome to another video about logging. In this video we'll look at the fundamentals of Fluentd: how to run it, how to configure it, and how to collect logs from various sources and push them to a central place where they can be analyzed. The Fluentd documentation is a great place to start.

The first thing to know about Fluentd is the concept of input plugins. Input plugins answer question number one: where do we get logs from? You may have a web server writing logs to a file, applications sending logs to standard out or to an HTTP endpoint, or many containers writing logs to standard out. In this demo we'll look at three popular input sources. The first is a web server scenario where an application writes logs to a file; this is very simple, because we can install Fluentd, configure it to collect the logs, and send them elsewhere. The second is an application that uses code to send logs to an HTTP endpoint, which is useful when you cannot install Fluentd, for example on a cloud service. The third is collecting logs from a Docker host running multiple containers.

Before we start running applications, let's understand input plugins. To read logs from a file, we use the input plugin called tail. All input plugins are defined by a source block. In a Fluentd configuration file, we declare a source block and say which type of input plugin we want to use: tail. We set format json, meaning we expect our logs in JSON format, and read_from_head true, meaning Fluentd should start from the very beginning of the file. We also tag our logs. Tags are very important for post-processing and for telling Fluentd where to send the logs, because we use them to identify the logs we've collected. In this example the tag is file-myapp.log, since I'll be running an application that writes logs to a file. We also give the plugin the path of the log to collect (our application will write its log to this location) and a position file, which is where Fluentd tracks the line it has read up to, so that it doesn't start over from the beginning.

The next input source is the HTTP input. This is useful if you have an application or web server where you cannot install Fluentd but can write some code to send logs to an HTTP endpoint. For that we have another source block where we specify the type as http, the port on which to expect logs, and the address to bind to. We also set a body size limit for incoming logs and a keepalive timeout to keep connections open.
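As a concrete reference, here is a minimal sketch of the tail source just described. The tag, path, and position-file location are illustrative and assume the mounts used later in this demo; the HTTP source appears a bit further down, when we wire up the demo application that uses it.

    # Tail a JSON log file from the beginning and tag every event.
    <source>
      @type tail
      format json
      read_from_head true
      tag file-myapp.log
      path /fluentd/log/files/example-log.log
      pos_file /fluentd/log/files/example-log.log.pos
    </source>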
So now we have various sources: we can collect logs from a file, from an HTTP endpoint, from TCP and UDP, from other Fluentd instances, and from various other inputs. But once we have these logs, what do we do with them? This is where output plugins come in. Input plugins use a source block to define where to get logs; once we have the logs, we need to do something with them. Output plugins allow us to do almost anything with the logs: dump them to a local file, forward them to another Fluentd instance, send them to another HTTP endpoint, or ship them to Kafka, S3, CloudWatch, databases, Elasticsearch, and more.

Where input plugins use a source block, output plugins use a match block. In the match block we have to match a specific tag; that's why it's important to specify a tag in the source block, so that we can then match it in the match block. You may have multiple incoming sources, so it's important to match tags correctly. There are various types of output plugins, such as file, forwarding to another Fluentd instance, and another HTTP endpoint, as well as output plugins that send logs to S3, Kafka, and Elasticsearch. Before we look at those complex ones, I highly recommend starting out with the simple file output plugin. That way you can collect all the logs, test whether your sources and tagging work, and just write everything to a local file. For that I have a match block that picks up my logs from the file source; the output plugin type is set to file, and I specify a path where Fluentd will dump all the logs it collects. So the tail input plugin reads logs from a file and tags them, and the output plugin matches the tag and writes those logs to an output file.
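A minimal sketch of that starter match block, assuming the tag from the tail source above and an /output directory mounted into the Fluentd container:

    # Match events tagged by the tail source and dump them to a local file.
    <match file-myapp.log>
      @type file
      path /output/file-myapp.log
    </match>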
It's important to note that these inputs and outputs are called plugins for a reason: Fluentd has an extensive plugin ecosystem. If you want to send your logs to a destination that's not in this list, check the documentation, because it's highly likely there's an external plugin you can install.

Enough about the theory and the documentation; let's dive into the source code. In my GitHub repo I have a folder called monitoring, with a logging folder containing a readme guide for the whole logging series. In a previous video we covered the logging basics and spoke about standardized and centralized logging; this video is all about Fluentd. There's a fluentd folder with another readme listing all the steps I'll be showing you today, so be sure to check out the link to the source code and follow along.

Let's say we want to read logs from a file written by an existing web server. For that example, my fluentd introduction folder has a docker-compose file with various services. One of them is called file-my-app, which is basically an Alpine container that runs a little shell script. If we look at the file folder, app.sh runs a loop that just writes logs in JSON format to a file, and you can also see the example log it appends to. This could represent a web server like NGINX or Apache configured to write logs to a file in JSON format. To run the application, I change directory to the fluentd introduction folder and run docker compose up -d file-my-app, which starts the container; docker ps shows it running, and the example log file starts filling up with JSON logs.

Now that our application is writing logs to the file, how do we collect them? Back in the docker-compose file there's another service for Fluentd. The container name is fluentd, it runs the fluentd image, and it mounts a few volumes. The important ones in this example are the file folder, where our application writes its logs, and the configurations folder. The configurations folder holds a Fluentd configuration file, which we mount into /fluentd/etc, the default location where Fluentd picks up its configuration. In that file we have the source I showed earlier: type tail, reading logs in JSON format, starting from the beginning, tagging everything with file-myapp.log, and pointing at the path where the log will be, because we mount the application's log file into Fluentd's log directory as an example log. We also set a position file so Fluentd can remember how far it has read. We then take all the logs from that source and match them by tag in the output plugin. To test whether my source works, I use the basic file output plugin and write to an output directory; back in the docker-compose file you can see I also mount that output directory, currently an empty folder with a placeholder, and this is where Fluentd will dump all the logs.

To start Fluentd in the background, I run docker compose up -d fluentd; docker ps now shows Fluentd up and running alongside our file-writing application. Fluentd should be picking up the logs, and indeed, in the mounted output folder there's already a folder called file-myapp.log where Fluentd has started collecting everything. That is the tail plugin example in action.

Tail is useful for collecting logs from disk, but there are some occasions where you cannot install Fluentd on the server. Sometimes your application runs outside of Kubernetes or outside systems you control, so you have no file-based mechanism for getting logs out of the application. For that example, the docker-compose file has another small application called http-my-app: a basic Alpine container that mounts an http folder, installs curl, and runs a little shell script. The script runs a loop, and every five seconds it uses curl to send logs in JSON format to the Fluentd endpoint. This simulates having an application with a logging library installed that sends logs directly to Fluentd. To start it, I run docker compose up -d http-my-app; docker ps now shows the HTTP application up as well as the file application.

Jumping back to the Fluentd configuration file, here's how that is set up. There's another source block where the type is http; it tells Fluentd to listen on a port, bind to an address, apply a body size limit to incoming messages, and use a keepalive timeout so that I'm not creating lots of connections but reusing them. I then specify a match block that matches the incoming tag (note that I'm using a wildcard in this example) and writes to the output folder.
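A sketch of that HTTP source and its wildcard match. The port is Fluentd's documented default for the HTTP input, the limit and timeout values are illustrative, and the wildcard pattern assumes the demo app posts to a path like /http-myapp.log, since the HTTP input derives the tag from the request path:

    <source>
      @type http
      port 9880                # default port for Fluentd's HTTP input
      bind 0.0.0.0
      body_size_limit 32m      # cap the size of incoming log bodies
      keepalive_timeout 10s    # reuse connections instead of reopening them
    </source>

    # The HTTP input tags events with the request path, so a wildcard
    # catches whatever paths the app posts to.
    <match http-*.log>
      @type file
      path /output/http.log
    </match>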
If we take a look at the output folder, there's already an http log containing everything Fluentd has collected via the HTTP endpoint; all the logs are being collected and written to file.

Another useful feature is filters. Filters are basically a post-processing pipeline: you can pick up logs and do things with them before sending them to an output plugin. A filter block looks very similar to a match block, in that it matches a specific tag, and it supports various plugin capabilities. Let's look at a filter example. I have my source collecting my file logs, and before sending them to the output plugin, let's do something with each log. In this example I'm showing the record_transformer filter, which lets you modify the log record. Here I'm injecting a host parameter field into my log: I take the hostname of the server and inject it into the record before it goes to the output plugin. I add that filter for the file logs, and then do the same for the HTTP logs: after the HTTP source, I inject a filter using the same tag as the match block, and it likewise transforms the record and inserts my server's hostname. This is useful for understanding where logs come from if you have multiple servers running behind a load balancer. To apply the change, I restart Fluentd with docker compose restart fluentd. If I then go back to the output area where the logs are dumped, open one of the buffer files, and scroll to the bottom, I can see Fluentd is now injecting the host parameter we specified: the hostname of our container, which is the container ID. That is how filters work.
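A sketch of such a filter, using record_transformer and the embedded Ruby expression shown in the plugin's documentation to resolve the hostname; the field name host_param is my label here, not anything the plugin requires:

    # Runs between the source and the match: every event tagged
    # file-myapp.log gets a host_param field added to its record.
    <filter file-myapp.log>
      @type record_transformer
      <record>
        host_param "#{Socket.gethostname}"
      </record>
    </filter>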
Fluentd also allows us to structure our config to make it easier to read and less complicated, using the include syntax. Our config currently has two parts: a big block related to reading logs from a file and writing them to an output file, and a section related to reading logs from an HTTP endpoint and writing them to a local file. Let's start with the file part: create a new configuration file called file-fluent.conf, take all of the file-related blocks, and paste them into that file. Back in fluent.conf we can then include it by saying @include file-fluent.conf. We do the same for the HTTP part: create http-fluent.conf, move the HTTP blocks into the new file, and refer to it with @include http-fluent.conf. The configuration file is now quite neat, easy to read, and well structured.

Now let's say we have a machine with Docker on it and a bunch of containers running: how do we collect all the logs from these containers? I'm going to show you a simple method, and it's the same methodology we'll look at in a future video when we run Fluentd inside Kubernetes. To read container logs, let's create another configuration file, this time called containers-fluentd.conf, and add a new source that tells Fluentd how to grab container logs. Collecting logs from containers with Fluentd is very simple, and the same concept applies to Kubernetes when running Fluentd as a DaemonSet. In the docker-compose file, the Fluentd service has a very important mount point: /var/lib/docker/containers, the folder on the Docker host where Docker stores all container logs, which we mount into /fluentd/log/containers. In containers-fluentd.conf I use the same tail plugin concept I showed earlier. I expect format json, because Docker writes its logs in JSON format by default; I set read_from_head true; I tag everything, this time as docker.log; and I point the path at the mounted Docker host logs under /fluentd/log/containers, with wildcards, because that folder contains one directory per container ID, each holding a container-id-json.log file. We also specify a position file so Fluentd can remember where it read up to. Finally, a match block matches docker.log, picks up all the logs coming from that source, and writes them to the output. To include this config file, I go back to fluent.conf and add @include containers-fluentd.conf, which keeps our config file well structured.
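A sketch of containers-fluentd.conf under those assumptions; the wildcard path mirrors Docker's default layout of one directory per container ID, and the position-file location is illustrative:

    # Tail every container's JSON log file under the mounted directory.
    <source>
      @type tail
      format json
      read_from_head true
      tag docker.log
      path /fluentd/log/containers/*/*-json.log
      pos_file /fluentd/log/containers.log.pos
    </source>

    # Write everything collected from containers to the output directory.
    <match docker.log>
      @type file
      path /output/docker.log
    </match>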
We're now collecting file logs, HTTP logs, and Docker container logs. To kick off that change, I run docker compose restart fluentd, and we can immediately see a docker.log in our output folder; this is the tag for all the container logs, and looking at the buffer, we're now picking up the logs of every container on this machine. To showcase this, I run docker run -it redis to start a Redis container. Its logs are written to standard out, and if we give it a minute and go back to Fluentd's docker.log buffer, we can see that Fluentd has collected all the Redis logs.

Now that we know how to collect logs from different sources, you're probably thinking that you don't want to write all these logs to a local file; you want to push them out to S3, Kafka, Elasticsearch, or CloudWatch. To achieve that, I'm going to show you how to send logs externally. Fluentd has a very extensive plugin system, and it has an Elasticsearch plugin, which you can see over here. For this demo I'm going to install that plugin. I've created a simple Dockerfile that starts from the fluentd image at version 1.11, switches to the root user so the plugin can be installed, installs the plugin using gem install, and then switches back to the fluent user. In my docker-compose file I've also specified a build context and an image name, so I can just run docker compose build fluentd and it builds automatically.

To send logs to Elasticsearch, let's create another configuration file called elastic-fluent.conf. In this config file I basically want three match blocks, one for each source, since I have a file source, an HTTP source, and a container source; I want to take those three sources and send the logs to Elasticsearch. So I create three match blocks, each matching its respective tag. The first is for the HTTP logs: I match http-*.log, set the type to elasticsearch, and specify the hostname and port of Elasticsearch as well as the index to write to. Elasticsearch has a concept of indexes, and you can write to different ones; I'm creating an index per source, so the index name here is fluentd-http and the type name is fluentd. The second match block is for my file application, the one that writes to file, with the same Elasticsearch details and an index name of fluentd-file, again with type fluentd. The final match block is for docker.log, so all the container logs arrive here, with the same Elasticsearch server and an index name of fluentd-docker. Remember, you can choose these index names and set up the indexes any way you want.
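A sketch of elastic-fluent.conf along those lines, assuming the Elasticsearch container is reachable by its docker-compose service name on the default port 9200:

    # One match block per source, each writing to its own index.
    <match http-*.log>
      @type elasticsearch
      host elasticsearch
      port 9200
      index_name fluentd-http
      type_name fluentd
    </match>

    <match file-myapp.log>
      @type elasticsearch
      host elasticsearch
      port 9200
      index_name fluentd-file
      type_name fluentd
    </match>

    <match docker.log>
      @type elasticsearch
      host elasticsearch
      port 9200
      index_name fluentd-docker
      type_name fluentd
    </match>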
To use this config file, I go back to fluent.conf and include elastic-fluent.conf; this configuration basically sends all logs to Elasticsearch. And because we're sending our logs to Elasticsearch now, we want to stop sending them to the output files. So in file-fluent.conf I comment out the match block, and I do the same in http-fluent.conf and in the containers file. An event is consumed by the first match block whose pattern it matches, so leaving those file outputs in place would mean the logs never reach the Elasticsearch match blocks. With all of that commented out, the Elasticsearch config file takes effect and sends the logs from our three different sources to Elasticsearch.

Now that Fluentd is ready to send the logs to Elastic, let's start up Elasticsearch and Kibana. I run docker compose up elasticsearch kibana, which pulls those images and starts the two services; docker ps shows Kibana and Elasticsearch up and running. It may take a couple of minutes for Kibana to start, so be patient, then head over to localhost:5601 to access the Kibana dashboard. Because we've made those config changes, be sure to restart Fluentd with docker compose restart fluentd so the changes take effect and logs start shipping to Elasticsearch.

After a few minutes, head over to the menu, choose Discover, and you'll land in the index patterns section. Type fluentd to see whether logs have come through; nothing appears at first, so be patient and keep refreshing until they do. Eventually you can type an index pattern like fluentd* and see our three indexes: fluentd-docker for the container logs, fluentd-file for the file application, and fluentd-http for the HTTP endpoint logs. Click next, create the index pattern, and head back to the Discover section to start looking at logs. Now we have an area where we can query our logs, and we can see them all being shipped to Elasticsearch; everything is working nicely.

I hope this video gave you a good fundamental understanding of Fluentd: how to configure it, how to pick up logs from various sources, and, more importantly, how to send logs out of the system. In a future video we'll take everything we've learned and, instead of running a local container, run it all on top of Kubernetes, looking at how to achieve centralized and standardized logging in the Kubernetes ecosystem. Be sure to let me know down in the comments what sort of logging videos you'd like me to cover in the future, and also be sure to like and subscribe and stay tuned for the next one. Remember to check out the community link down below, and if you want to support the channel even further, hit the join button to become a member. As always, thanks for watching, and until next time: peace.
Info
Channel: That DevOps Guy
Views: 25,804
Rating: 4.97 out of 5
Keywords: devops, infrastructure, as, code, azure, aks, kubernetes, k8s, cloud, training, course, cloudnative, az, github, development, deployment, containers, docker, messagebroker, message, broker, aws, amazon, web, services, google, gcp, logging, fluentd, elk, elastic, elasticsearch, efk, kibana, logs, datadog, kafka, s3
Id: Gp0-7oVOtPw
Length: 21min 6sec (1266 seconds)
Published: Fri Oct 09 2020