Everything you Always Wanted to Know about Filebeat * But Were Afraid to Ask

Captions
Hello and welcome. In this episode we are going to discuss everything you always wanted to know about Filebeat but were too afraid to ask. My name is Ricardo Ferreira and I'm a developer advocate with Elastic, part of the community team that spends a considerable amount of time creating content like this to make sure developers will love the technologies from Elastic.

In this module we are going to discuss Filebeat, which, as you may know, is part of the strategy Elastic created to ship logs and other telemetry data into Elasticsearch so they can be further analyzed with Kibana. The agenda for today is a quick overview of Filebeat, so those of you who have never heard of this technology can get acquainted with it; then we'll discuss some more advanced concepts such as modules and outputs; and lastly I'll give you an overview of the knobs you can use to improve the technology's resiliency, which is useful for scenarios where you need to handle fault tolerance or deal with situations where Filebeat crashes in the middle of processing a given log. Without further ado, let's get to it.

For a very long time, users in the community have used the very famous stack commonly referred to as ELK, the acronym for three different technologies from Elastic: Elasticsearch, Logstash, and Kibana. Elasticsearch is the heart of the stack: it is a data store that reliably and scalably stores data and makes it available in a near real-time fashion for external consumption. Logstash is the technology in the stack that knows how to gather data from sources and deliver it to destinations, usually Elasticsearch, but possibly others as well. Logstash also gives users pipeline processing capabilities that sit between gathering the data and delivering it to destinations, which provides the chance to process the data: data augmentation, data curation, or data filtering. Kibana is the eyes of the stack; it is the place where users go to explore and analyze the data.

If we concentrate specifically on Logstash, we see that it has three main components: the ability to gather data, to perform pipeline processing, and to deliver data to destinations. Most use cases around data collection or log centralization boil down to, number one, knowing how to gather the data and, number two, knowing how to deliver that data to destinations, steps that, as I mentioned, Logstash knows how to do and is very successful at. However, because Logstash is ultimately a processing pipeline technology, it carries an operational burden that most of the time is unnecessary if the use case is all about data gathering and shipment. For this reason Elastic created the Beats family, a set of lightweight agents that know how to collect specific pieces of data such as logs, metrics, auditing events, or sometimes even more specific events such as heartbeats.

So what is Filebeat? Filebeat is a lightweight agent that knows how to tail logs from sources and ship them to destinations such as Elasticsearch; that's the main use case for Filebeat. Filebeat is also interesting because of the way it was created: because it was written in Go, it can run natively on a given operating system, no matter whether it's a container, a VM, or bare metal.
That gives users some choices. Also, because it was written in Go, it runs natively on the operating system without requiring the installation of runtime libraries or platforms such as a Java Virtual Machine (JVM). The fact that Filebeat is very lightweight also makes it possible to scale out the log gathering and shipment process without necessarily having to scale out the target applications.

There are two main concepts you need to understand in order to start playing with Filebeat: harvesters and prospectors. The very first thing you set in the Filebeat configuration is what we call a prospector. An example of a prospector is a directory that is specified without necessarily naming the file to be read or tailed; in that case, multiple files can be discovered during the prospecting phase. Let's say two files are discovered by the prospector: that means that at runtime two components called harvesters will be created, one for each file that has been prospected. Prospectors are also known as inputs, as you'll see in the configuration file we'll look at next.

Speaking of the configuration file, it is important to know that in order to use Filebeat you externalize all of this configuration into a YAML file; Filebeat is a lightweight agent that does whatever has been specified in that YAML file. Where the YAML file lives is usually up to you, but it is a best practice to co-locate it with the agent itself.

Too complicated? Let's take a look at a first example of how to get started with Filebeat. In this example I'm going to use Filebeat to ship logs sitting here on my laptop to an instance of Elasticsearch running locally. I'm currently executing Elasticsearch and Kibana as Docker containers on my machine, so they are up and running. Let's check in Kibana for any indices we might have: as you can see, there are none, so we're starting from scratch.

The first thing you do with your Filebeat is look for the file called filebeat.yml. If you open it you'll see a section called filebeat.inputs where you configure your prospectors. There will probably be a default prospector already configured, pointing to a location specific to your target platform; in my case I'm using Linux, which is why this folder was suggested. Let me check whether that folder actually has files we can test with; there are some, so we're good to go. The first step is to enable this prospector, which will inspect that folder and create a harvester for each file it discovers. Secondly, look for the section further down in the file that says output.elasticsearch; that's where you set the endpoint of your Elasticsearch instance. If you want to check that your Filebeat configuration is correct, you can run the command filebeat test config, which parses the whole configuration and reports any errors; and if you replace the option config with output, it performs a simulation of how Filebeat will interact with your output, which in this case is Elasticsearch.
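To make the moving pieces concrete, here is a minimal sketch of what the filebeat.yml described above could look like; the path and the Elasticsearch address are placeholder assumptions, and the default file that ships with Filebeat will differ slightly:

  filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log            # the prospector (input): every file matched here gets its own harvester

  output.elasticsearch:
    hosts: ["localhost:9200"]     # endpoint of the local Elasticsearch instance

With a file like this in place, filebeat test config and filebeat test output are the two checks mentioned above.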
So everything seems to be running properly, and if that's the case I'm going to run the setup phase. What is the setup phase? It's the step where Filebeat contacts the output and creates all the resources and artifacts necessary for Filebeat to ship data to it, in this case Elasticsearch, so that we can then see the data in Kibana. Pragmatically speaking, this process creates indexes and index patterns and loads some dashboards. We can check this: if we go back to Kibana and click reload indices, we start seeing some indexes already created, probably some index patterns too, and if we go to the Discover section of Kibana, the index pattern is already in use, but obviously there is no data, because we're not actually executing Filebeat yet; this is just the setup phase. Also keep in mind that you only have to execute the setup phase once, or at least every time you change the configuration; you don't have to run setup on every Filebeat run. The reason I'm saying this is that it usually takes a while to complete; not sure if you noticed, but it can take up to a minute, so keep that in mind.

Now that everything is up and running, we can execute Filebeat. I'm going to pass the -e option so we can see the log of everything, and as you can see, a lot of different harvesters were created, one for each file that was discovered. Filebeat is up, and we should already have some logs shipped to Elasticsearch and Kibana: if we go back to Kibana and click refresh, we see that some logs have been indexed. Those are the logs that have been harvested by Filebeat and indexed in Elasticsearch, so everything is working as expected.

What I'm going to do now is something both interesting and strategic: I'm going to change the configuration so that instead of shipping logs to my local installation of Elasticsearch, it ships to an Elasticsearch running on Elastic Cloud. By doing this we're going to prove a very important concept that we'll discuss a little later regarding resiliency, but for now let's just change the configuration to ship to Elastic Cloud. I'm going to stop Filebeat for a second, and in the configuration file, near the section that says output.elasticsearch, if you scroll up a bit you'll see two options, cloud.id and cloud.auth; those are the options you need to set for Filebeat to be able to communicate with your Elastic Cloud deployment, so I'm going to uncomment them. Keep in mind that setting these properties completely overrides any output configuration you set previously. You need to provide a cloud ID and a cloud auth, so where do you get this information? First the cloud.id: to demonstrate, I'm going to open a deployment I have on my Elastic Cloud account called filebeat tutorial. If you click on your deployment, you'll see a section showing the cloud.id, so just click copy and paste it into the YAML configuration file. That takes care of the first part; the second one has to do with the username and password you gave to your deployment.
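For reference, that part of the configuration ends up looking roughly like the lines below once the two options are uncommented; the values are placeholders, not real credentials:

  cloud.id: "filebeat-tutorial:ABCDEF..."        # copied from the deployment page in Elastic Cloud
  cloud.auth: "elastic:<deployment-password>"    # username and password separated by a colon

When these two settings are present they take precedence over any Elasticsearch hosts configured elsewhere in the file.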
For those of you who know how Elastic Cloud works, you know that every time you create a deployment, a username and password are generated for you automatically. When I created this deployment I made sure to save that information in a CSV file, so I'm simply going to copy the credentials from the CSV and paste them here; just make sure that instead of a comma you use a colon to separate the username and the password.

With that in place, let's also start from scratch here on Elastic Cloud: let me open the Kibana of this deployment and make sure everything is fresh and ready to use. Good. What we're going to do now is run filebeat setup one more time, because again we need it to create all the resources and artifacts on Elasticsearch and Kibana. Like I mentioned before, every time you change the configuration in a way that points to a different index or a different endpoint, you need to run the setup command again. If you don't, it's not that Filebeat won't work, but it won't create the server-side resources that the shipping part needs, so keep that in mind.

While it's running, if we check our Elastic Cloud instance and click reload indices, we should have a new index created just like before, an index pattern as well, and if we go to the Discover section of Kibana we should see the Filebeat index pattern already in use, but of course with no data yet. Let's wait until the setup finishes, and this is where I'd like to set up something important that we'll discuss later regarding resiliency. What I'm going to do now won't make any sense yet: I'm going to delete a folder from the Filebeat installation called the data folder. Don't worry, I will explain what this folder does later on.

Now that I've deleted it, I'm going to execute Filebeat one more time, and we should have harvesters created for each file, now harvesting and shipping data to our Elasticsearch running on Elastic Cloud. If we click refresh, we should see the very same set of documents that was indexed before, except that before it went to my machine and now it goes to Elastic Cloud. That concludes our first example. Now let's look at three more examples of situations where you would need to configure Filebeat to handle specific types of logs.
Let's start by discussing multi-line events. Multi-line events are events that span multiple lines within a log. Let's understand this with an example: here you see a Java stack trace, a classic case because it fills multiple lines, each representing a piece of code, but all of them together represent a single event. So how do we instruct Filebeat to interpret all those lines as one event? We need to use something called a multiline pattern.

To demonstrate this, I'm going to stop Filebeat one more time and use a configuration snippet I prepared for the Java stack trace. It's another entry that I simply copy and paste into the list under filebeat.inputs, so we're inserting another prospector into our Filebeat configuration. As you can see, this prospector points to a log file, which happens to be this example of a Java stack trace with roughly 59 or 60 lines in it; conceptually, though, all of those lines represent one single event. So we use the multiline pattern setting to provide a regular expression that makes Filebeat treat those lines as essentially one event.

Now that we've added the new input, the new prospector, to the filebeat.yml file, we can start the Filebeat agent again. But before doing that, let's note how many documents we have so far: currently 18,655. What needs to happen is that when I start Filebeat it detects that log file and sends only one new document containing the single event I just showed you; instead of sending 59 or 60 new documents to Elasticsearch, it should send exactly one. Let's test it: I start Filebeat again with -e so we can watch the log, and I can locate the line saying a new harvester has been created for the stack trace file. Since Filebeat is now running, let's check what happened on the Elasticsearch side: click refresh and we have one new document, 18,656, and this last one should be the document containing the Java stack trace. As you can see it adheres to the same ECS schema columns, and if we look at the message in the JSON view, you can see that the single message comprises all those lines, because the multiline pattern detected that they all belong to the same structure.

One important note: when you come up with the regular expression, test it first to make sure it works with the different types of stack traces that may show up in the log; that may require testing a few of them. I'm using stack traces as the example, but it could be something else entirely; this technique is not only for stack traces.
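The input added in this step would look roughly like the sketch below. The path is a hypothetical stand-in, and the regular expression shown is one commonly used for Java stack traces (continuation lines starting with whitespace, "at", or "Caused by:"); the exact regex used in the video may differ:

  - type: log
    enabled: true
    paths:
      - /path/to/java-stack-trace.log            # hypothetical path for the sample file
    multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
    multiline.negate: false
    multiline.match: after                       # matching lines are appended to the line that precedes them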
In this next example we're going to handle what we call structured events. Structured events are called that because they have a very particular structure that delineates when the event starts and when it finishes. Here you can see an event that always starts with a "start new event" prefix, then the content of the event, which can be one or multiple lines, and it eventually finishes with an "end event" marker. Just like we did with the Java stack trace, we need a way to force Filebeat to interpret all of this as a single event.

So I'm going to stop Filebeat one more time, and we'll look at another log file, which in this example contains three entries of this event: one, two, and three. I'm going to use a prepared YAML snippet that creates a new prospector pointing to that file, and copy it into the list of inputs in our filebeat.yml. In this new input you can see that it not only points to a new log file but also makes use of the multiline pattern; this time, though, we are not using a free-form regular expression. We use a pattern that matches the prefix that starts an event, plus an additional setting called flush pattern that indicates when the event has to be flushed.

Let's test whether this works as expected. Just like before, I start Filebeat with the new configuration in place, and this configuration should cause three new log events to be produced to Elasticsearch. Filebeat has started, so let's go back to Kibana and refresh. We got four new hits, the discrepancy probably being because we're filtering only for the last 15 minutes; if we increase the window to, say, the last 30 minutes, we're back to the full list we had before, now with three more events, and the last three should be the ones we just shipped. And indeed: this is the body of the third event, this one expanded is the first event, and this one here is the second. So we now have three new log events in Elasticsearch, and Filebeat was able to parse them using the configuration we put in place.

This matters because there are a lot of logging systems out there that use these kinds of pragmatic conventions to delineate where an event starts and ends, sometimes with an in-house notation or logic that only the people in the organization that built the application understand. The important part is that Filebeat is able to deal with those situations with some extra configuration.
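Assuming the events really are delimited by the literal markers mentioned above (the actual markers and path in the video may differ), the input could be sketched like this:

  - type: log
    enabled: true
    paths:
      - /path/to/structured-events.log           # hypothetical path for the sample file
    multiline.pattern: 'Start new event'         # a new document begins at this marker
    multiline.negate: true
    multiline.match: after
    multiline.flush_pattern: 'End event'         # everything up to this marker is flushed as one event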
And since we are talking about custom and weird formats, let's look at another format you might need to set Filebeat up to handle. In this example we have a log file of three lines, where each line has a timestamp, the name of a superhero, and the number of movies for that superhero. The interesting thing about this format is that the fields are separated by a double colon, which is not a standard format, so we're going to need some custom configuration so Filebeat can handle it. The good news is that you don't need to create everything from scratch: the Elastic Stack can generate a lot of what's needed here using a wizard, so let me show you how to generate it all and explain what gets generated.

The first thing to do is provide this file as a sample. Go to the Kibana home page, where you have an option called upload a file, and upload that weird heroes log I showed before. Once you do, the wizard tries to parse it and applies what we call a grok pattern. For those of you who come from the Logstash world, you know what I mean: grok patterns are basically a way to parse content using expressions. In this case the wizard tried to infer the expression and almost succeeded: it saw that the first column was indeed a timestamp and that the last field was an integer, but we need to give it a hand. Click override settings and adjust the grok expression: I replace the middle part with an expression that does some string parsing and call that field hero_name; the last field is almost correctly parsed as an integer, we only need to give it a name, number_of_movies; and the timestamp is already correct. Apply the grok pattern, and now we have an expression that, applied to our sample, parses all of the lines.

We can then continue the wizard and click import, where you're given the chance to create a new index: when documents are indexed using the grok pattern that was created, they will end up in the index you specify here, which I'm going to call heroes. When I click import, a lot of things happen, and the most important one is that you get an option called create Filebeat configuration: you can simply copy this generated template and use it as your Filebeat YAML configuration file. That's what I'm going to do: copy it to the clipboard and create a new configuration file locally on the machine for this example, 04-weird-heroes-log, and paste the configuration into it. We still have to tweak it a little, because we need to provide the path of our log as well as the information for contacting Elasticsearch on Elastic Cloud, but before we do that, let me show you something really interesting.

The content that Filebeat reads will actually be post-processed by an ingest pipeline. If we look at Stack Management, under ingest node pipelines, you'll see that a new pipeline was created, called heroes-pipeline, and inside it is the grok expression we built earlier. So Filebeat transmits the content read from the log, that content is submitted to this pipeline, which processes it, and the result of that processing is what actually gets written into the Elasticsearch index. Pretty cool, huh? Everything, as you've seen, was created automatically for you, so you don't need to do anything about it.
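I can't see the exact expression the wizard produced, but for a double-colon-separated line such as 2021-02-10T12:00:00::Captain America::9 (an invented sample), a grok pattern along these lines would do the job, with the field names assumed from the walkthrough:

  %{TIMESTAMP_ISO8601:timestamp}::%{DATA:hero_name}::%{NUMBER:number_of_movies}

The heroes-pipeline then applies a pattern like this on ingest, plus whatever type conversion the wizard adds, so documents land in the index already split into those fields.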
The only thing left now is to finish the configuration of the YAML file, so let's do that. I'm going to set the path of the file, which is essentially the same path we used for the other ones except for the file name, which in this case is 04-weird-heroes-log.log. We also need to set the configuration to talk to Elasticsearch on Elastic Cloud, so we can remove the part that points to an Elasticsearch running locally, because we're not going to contact a local instance, and instead use the two parameters we set before, cloud.id and cloud.auth; those we can simply copy from the other file and paste here. That way this Filebeat configuration will be able to talk to Elastic Cloud.

Now that we've done this, let me stop the current Filebeat execution for a second, and one more time I will delete the data folder; don't worry, I will explain why later. What I'm going to do now is use this new file as my Filebeat configuration, so I run filebeat with -c to specify the new configuration file. Let me first run a test config to check that everything is correct, which it seems to be, and also a test output to check that this configuration can contact Elastic Cloud, which it can. Everything is good, so next I execute the setup part one more time, just to make sure all the resources are created: I replace test output with simply setup. Keep in mind that this time we are using a different configuration file, specified on the command line, which means the setup executes based on what is described in that file. That's one of the things to remember: every time you change your configuration or provide a new one, you might need to re-execute the setup part, but as I mentioned before, setup only has to be executed once, not on every run.

Let's wait until the setup finishes and then test our configuration. At this point we should have a heroes index created as well, so let's check index management: as you can see, we have a new index and an index pattern created for us, which looks good, and the setup part seems to have completed, which is also good. Because of this I'm now going to execute Filebeat for real; it runs, and we should get a harvester for that log file... and here we go, a harvester was created for the weird heroes log file. Since Filebeat is running, let's check what actually matters, which is data coming in: go to the Discover section in Kibana; we're currently looking at the Filebeat index pattern, so let's change it to the heroes pattern. It seems not to be showing anything right now, but trust me, there is data here; we can double-check by going back to Stack Management and clicking on the heroes index, where you'll see there are three documents indexed. So why is nothing being shown on the Discover tab? The answer is in the actual data, and it's something important to be aware of.
The timestamp field of those documents was set to February 10th, and today is February 11th, so instead of filtering only for the last 30 minutes we have to look at, for example, the last seven days; seven days will be more than enough, and voilà, we can see the three records. Let's double-check: yes, Captain America, Ghost Rider, and Black Panther. And if you look at the schema that was created, it contains the hero_name field, the message, and number_of_movies, and those are the hero names that were loaded with this data, which is proof that it's working as expected.

So Filebeat is an amazing technology: as you can see, it supports a variety of use cases you may need to handle, whether for common formats or for formats that are not common at all and were created for one specific organization or company. Either way, Filebeat gives you the support to handle and parse all of them. In our next stop we're going to discuss the concepts of modules and outputs, so let's continue.

One of the key characteristics of the Beats family is simplicity. It was designed this way because the idea of using Beats is for users to spend less time doing implementation or custom coding and more time effectively shipping data to Elasticsearch or any other technology. The way Elastic achieved this was by creating the concept of modules. Modules are reusable extensions that users can simply enable, and right off the bat, if you have to deal with parsing logs from a known technology, such as a MySQL database, there is a module that knows how to parse them, and from there they can be shipped to the destination, or output. One of the key things about the modules concept is that there is support for pretty much every well-known technology out there, for example messaging technologies, databases, virtualization, and networking, and, more importantly, there are custom extensions created by the community and later incorporated into the project. The result of this effort is that there is a huge number of modules readily available for you to use.

Let's see how this works with an example. What I have here is an example log from a Redis database. We're going to parse these logs and ship them to Elasticsearch; however, unlike what we did before, where we had to come up with regular expressions or custom logic to parse the log, we're not going to do any of that. Why? Because we are dealing with a known technology. What's going to happen is that I'm going to enable a module that knows how to deal with Redis logs, update my Filebeat configuration with this information, and start the agent one more time. All the information from the logs should end up in Elasticsearch and, more importantly, some additional dashboards will become available in our Kibana, everything completely automated, with Filebeat taking care of it.

So the first thing I'm going to do is stop Filebeat one more time, because I need to enable the Redis module. Before I do that, let me show you how modules are structured in your Filebeat distribution.
If you look at the folder structure, you'll see a folder called modules.d, and if you expand it you'll see a considerable number of pre-built modules already there. Scroll down a little and you'll find the Redis module; if you open that file you'll see that there isn't much information about how to parse the log, it's more about you indicating where to look for the Redis log files, and that is intentional. As I mentioned before, you are not going to come up with any parsing expressions; that is the responsibility of the module. The module just has to be enabled so you can start shipping the logs to Elasticsearch.

Since the log section is already enabled in the module file, the only thing I'm going to do is change the location of the log, which is pretty much the same location we set before for the other inputs; I'll copy the location from one of them, the Java stack trace one, and paste it here, but the file name is different: redis-server-001.log.

Now that we've set the location of the log, we need to enable the module so Filebeat picks it up during the bootstrap phase when it starts doing its job. A couple of things here. First, you may have noticed that each of the module files shown has a suffix, .disabled, which indicates that the module is not being used, and should not be used, by Filebeat. How do you change that? There are a couple of options: you can rename the file manually, or you can simply go to your Filebeat installation and use the command filebeat modules with the word enable (or disable, if that's the case) and the name of the module you want. If you do this, it renames the file appropriately for you, so even if you don't know the exact file-naming convention, the command takes care of it; I recommend using the command, because that way you can be sure you're doing it the right way.

So what has happened so far: the module has been enabled, step number one, and the module itself has been configured to point to the exact location where to find the logs, step number two. It's worth mentioning that, regardless of whether this is a module or the standard configuration we did before, the concepts of harvesters and prospectors still apply: here, in the Redis module, we configured a prospector that happens to point to one specific file, but it could be an expression matching different files; either way, that prospector is going to instantiate harvesters.
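For context, the enabled module file (modules.d/redis.yml) ends up looking something like the sketch below; the path is an assumption standing in for wherever your Redis log actually lives:

  - module: redis
    log:
      enabled: true
      var.paths: ["/path/to/redis-server-001.log"]   # point the module at the Redis log file

And running filebeat modules enable redis simply renames redis.yml.disabled to redis.yml for you.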
Now that we have the module enabled, remember what I said earlier: every time you change the Filebeat configuration, you have to run setup one more time, so that's what we're going to do now. This time Filebeat is not necessarily going to create new indexes or new index patterns, because ultimately all these logs end up in the same index the other logs were being sent to. But remember that modules are not only about parsing: they also come with some nice custom dashboards that you can reuse in Kibana. Those dashboards can be used as is, you don't need to change anything, or you can pick pieces of them and create your own. That's one of the key characteristics of modules: the intention is to minimize, or sometimes even eliminate, any effort on your part to get things done.

Let's wait until the setup finishes so we can start Filebeat one more time and see whether the logs from Redis show up in our Elasticsearch and Kibana. To double-check this, I'm going to leave the Redis log open so we can see which lines are being sent. The setup has finished, so let's execute Filebeat again. Unsurprisingly, all of the pre-existing harvesters are instantiated again, but among these lines there should be one saying that a new harvester was instantiated for the Redis log file as well. Let's check whether that's true: going to Dashboard is the best way, and if you filter the dashboards by typing "redis" you'll see one called [Filebeat Redis] Overview ECS. This dashboard shows information that was inferred from the data in the logs, but what you really want to check is the table at the bottom, which basically contains the same lines from the log. For example, there's a line here that says warning, no config file specified; that's part of the log, and if we look at the Redis log itself we see the very same line that is shown in the dashboard.

Keep in mind that Redis has its own way of structuring its log, not weird exactly, it's actually simple to read, but very particular to Redis, for example the use of the hash sign or the asterisk. There's no obvious logic to it if you think about it, yet somehow the module knows how to read it and parse all the fields out of it. There are some extra details too: if you expand a document you'll see the same structure, and pay attention to the additional fields inherited from ECS, the Elastic Common Schema; that means the same views used before for the other logs can also be used here. The modules concept does all of this transparently for you. So, as you can see, by using modules with Filebeat you can speed up the work of gathering and shipping logs from existing, well-known technologies considerably.

What we're going to check now is another type of extension you can plug into Filebeat, this time to change not the way you parse data but where you send it, and we do that using the concept of outputs. Outputs are, just like modules, reusable extensions that you can apply to your Filebeat configuration and, all of a sudden, change where the data will be sent.
In this example we're no longer going to send the data to Elasticsearch; instead, we're going to send it to a Kafka cluster that I have running locally on my machine. To do this we obviously need to change the Filebeat configuration, and the change itself is replacing the output configuration. So again I'm going to stop Filebeat one more time, and the first thing to do is go back to the Filebeat configuration and comment out the two parameters we set to configure Filebeat to send data to Elastic Cloud; those are, if you scroll down, cloud.id and cloud.auth. Why do we need to comment them out? Because of the way Filebeat thinks: if those parameters are present in the configuration, they take precedence, which means that any other output you configure will be ignored. Since our intention is to send data to Kafka instead of Elasticsearch, we need to remove those parameters.

We also need to change the output section: I'm going to comment out the Elasticsearch output, because we're no longer sending to Elasticsearch, and use a Kafka output configuration instead, which I copy and paste where the output section is; it can actually be anywhere, the comments in the file are just there for illustration. The configuration is similar: all outputs start with the word output followed by the name of the output, and those names are unique, just like the module name we used for Redis, so in our case we use the word kafka. All the parameters under it are specific to that output; for example hosts, topic, or required acks are things that only make sense in the Kafka world. In our case we're going to send to a cluster running here on port 9092, the standard port for Kafka, and the topic we're going to use is called filebeat-topic.

This output is able to create the topic programmatically, so you don't need to create the topic yourself. Those of you who know Kafka know that this can sometimes be inconvenient, because you might want a topic with a specific configuration, for example a specific number of partitions, because you're going to implement custom behavior around parallel consumption. The point is, if you want to create your topic beforehand, you can: the output is smart enough to reuse the topic if it exists, and otherwise it will create it.

Now that we have the configuration in place, there are a couple of things I'd like to show. The first is that you can always use the test config command to check that your configuration is okay, and that seems to be the case. But you can also test the output itself: remember we ran the same kind of test when Filebeat was communicating with Elasticsearch, whether on-prem or on Elastic Cloud; the output is the component of the Filebeat architecture that implements this behavior of checking connectivity with the endpoint.
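A sketch of that Kafka output as described; the host, topic name, and acknowledgment setting reflect what is mentioned in the walkthrough, but treat the exact values as placeholders:

  output.kafka:
    hosts: ["localhost:9092"]     # one or more Kafka brokers to bootstrap from
    topic: "filebeat-topic"       # created automatically if it does not already exist
    required_acks: 1              # broker acknowledgments to wait for before a publish counts as done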
In this case a completely different set of tests was applied, now against the Kafka cluster. That's another thing to keep in mind: the tests are not all the same, they depend on the output that has been configured in your Filebeat configuration. Anyway, everything seems to be okay.

Before actually running Filebeat, I'm going to do something strategic. Since we're going to send all these logs to the topic called filebeat-topic, I'll open a new terminal to run a command-line tool from Kafka that lets us inspect all the messages and records being sent to that topic; that way, by the time Filebeat starts, we can verify whether or not messages are arriving in Kafka. The tool is called the console consumer and it comes with Kafka, so if you have a Kafka distribution on your machine, just look for it. You have to specify what is called the bootstrap server, one of the servers from the cluster, which is used to discover the other brokers (you can specify one or multiple servers separated by commas), and obviously you have to specify which topic to read from, in our case filebeat-topic. Just to be on the safe side we also read from the beginning, which means that even if you had run this command before, you force the CLI to read all the data in the topic from the start. If we execute this now, we connect to the cluster, but there is no data; the topic is empty because we haven't started Filebeat yet.

That's what we're going to do now, but one more time, before starting, I'm going to delete the data folder. I know you're probably extremely annoyed by all this deleting of the data folder, but if you think about it, I've built up the desire in you to understand what it means, and that's what we're going to see next, so just bear with me. Now that we've deleted the data folder, we can start Filebeat one more time; we're watching the topic, we execute Filebeat, and we should start seeing messages coming in... and here we go, in the Kafka topic. Note that the lines from the Java stack trace log are there too, because one of the harvesters was dedicated to that prospector. So, as you can see, we were able to change the output of Filebeat very effortlessly and replace it with an output that points to a Kafka cluster, but the idea is that it could be any other output: you name it, you want to send data to Cassandra, to other messaging technologies, to other databases, any output that is compatible with Filebeat can be used here. Why? Because one of the key characteristics of the Filebeat architecture is that there are conventions for everything, so if you're a Go developer who wants to create your own output, you can simply follow the conventions, start writing the code, and use your own implementation with your Filebeat.

Now let's discuss some aspects related to how Filebeat transmits data to the configured outputs.
Filebeat has a concept of at-least-once delivery. What this means is that Filebeat uses some techniques to ensure that whatever data has been gathered from the sources is written into the destination at least once. The way Filebeat does this is with a file called the registry file, which is written into a folder called data; that's why I have been deleting the data folder throughout the experiments in this episode. To fully understand this behavior, let's see how the data folder influences whether data is written to destinations or not.

What I'm going to do now is stop Filebeat one more time and re-execute it, but this time I won't delete the data folder. These are the messages that have been written into our Kafka topic so far, so presumably, by the time I re-execute Filebeat, no new messages should be written to the topic. Why? Because the registry file has kept track of all the messages, data, and logs that have already been processed, so the behavior we expect is that no data is rewritten. Let's re-execute Filebeat, and as you can see, although Filebeat is running again, no new messages are sent to the Kafka topic. Why does this matter? Because it is important to ensure consistency and avoid duplication: data that is sent twice or more can affect the behavior and consistency of the backend system that receives it, and that's why Filebeat takes care of keeping this bookkeeping. If you do want to force Filebeat to resend all the data it has sent before, just delete the data folder; that forces Filebeat to re-create the harvesters, rescan all the files, and send every line from those files again.

There are also some knobs you can use to control how often Filebeat checks for new files and, for an existing file, how long it waits before checking for new lines. First, from the perspective of the prospectors, which look for new files, there is a knob that defines how frequently to scan for new files: put the scan_frequency property in your filebeat.yml if you want to customize this behavior. By default a prospector scans every 10 seconds, but you can increase or decrease that; just keep in mind that it is not appropriate to set anything below one second, because otherwise you might drive up the CPU usage of your machine. Then there is an option not for the prospector but for the harvesters, controlling when to check whether there is a new line in the file. The behavior of a harvester is that, once instantiated, it runs through and reads every line of the file, and as each line is read it first checks in the registry whether that line has already been sent; if not, it continues to process and transmit lines to the output that has been configured. Eventually, though, the harvester reaches the end of the file, and there is a configuration you can use to change how frequently it then checks for new lines. By default it checks every second, but you can set the property called backoff, which is the time the harvester waits until it checks the file again, and you can control how quickly that wait grows with backoff_factor, which applies an extra multiplier to the backoff. Finally, there is max_backoff, which tells the harvesters the maximum amount of time they will ever wait before checking for new lines in the file.
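Putting those knobs together, a tuning sketch could look like the following; the values shown are close to the documented defaults and are illustrative rather than recommendations:

  filebeat.inputs:
  - type: log
    paths:
      - /var/log/*.log
    scan_frequency: 10s           # how often the prospector looks for new files (avoid going below 1s)
    backoff: 1s                   # initial wait before a harvester re-checks a file it has fully read
    backoff_factor: 2             # multiplier applied each time the file still has nothing new
    max_backoff: 10s              # upper bound on the wait between checks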
Speaking of harvesters, there's one more thing to understand if you've grasped the concepts of prospectors and harvesters. Depending on the number of files discovered by the prospectors, you will end up with multiple harvesters: say you have a folder with 1,000 files; pragmatically speaking, you will end up with 1,000 harvesters, one taking care of each file. That wouldn't be a problem if it weren't for the fact that certain operating systems have a pre-set or default limit on the number of open file handles available, so this is a configuration you have to keep in mind: if you see that Filebeat is not handling exactly the number of files you have, you might need to raise the number of file handles your operating system supports.

Also, the registry file, the file that takes care of Filebeat's consistency bookkeeping, can end up growing too large depending on the amount of data being transmitted to the backends. Remember that the registry consolidates entries, not necessarily whole lines but pointers corresponding to what has been read from each file, so if you have 1,000 files and each file has lines, at least 1,000 entries will be created in the registry. It's a matter of measuring how many files, and how many lines per file, you have; you can end up with a registry file that is too large, and if that happens and your operating system or file system is not keeping up with the size of the file, you can use extra settings such as clean_inactive or clean_removed to make the Filebeat agent remove entries from the registry that are no longer of interest. That will help you manage the behavior and efficiency of Filebeat while it reads your logs.

I hope this episode has helped you understand a bit more about how Filebeat works. If you ever get in trouble using Filebeat, I highly encourage you to visit discuss.elastic.co and go to the category focused on the Beats family; you might be surprised how many answers to common problems already exist there. Also, if you have ideas about other content you would like to see here on the community channel, please send us an email so we can take your suggestion on board and produce content that helps you understand the technologies from Elastic better. That is it for today, and see you next time.
Info
Channel: Official Elastic Community
Views: 8,710
Keywords: Elastic, Elastic Stack, ELK, Elasticsearch, Logstash, Kibana, Beats, Filebeat, Logs, Elastic Cloud, Redis, Kafka
Id: ykuw1piMGa4
Length: 67min 9sec (4029 seconds)
Published: Sat Feb 13 2021