Open Source Logging: Getting Started with Graylog Tutorial

Captions
Tom here from Lawrence Systems. Log management is not the most fun part of sysadmin work: figuring out where to put all the logs, and how to get them from many locations into one place so you can consolidate them, parse them, go through them, and create some actionable intelligence from them. Those tasks can be daunting with all the logs that our many disparate systems generate. The solution I came across that I really like is an open source tool called Graylog. They do have an enterprise version, and we'll talk about the differences, but I'll be using the open source version in this demo and showing you how to get started with it, which is relatively easy. They've even made OVA files so you can just download a VM, they've got a Docker image, and the instructions I followed to set this up on a standalone Debian server were pretty easy too. We're going to talk about how to ingest some logs, how to configure a couple of different servers, the process flow, and how Graylog works. This is in no way endorsed or sponsored by Graylog; I'm doing this completely of my own accord because I want to share a really interesting tool that I use to solve my log management problem.

Before we dive into those details: if you'd like to learn more about me or my company, head over to lawrences.com. If you'd like to hire us for a project, there's a Hire Us button right at the top. If you'd like to help keep this channel sponsor free (and thank you to everyone who already has), there is a Join button here for YouTube and a Patreon page; your support is greatly appreciated. If you're looking for deals or discounts on products and services we cover on this channel, check out the affiliate links in the description of all of our videos, including a link to our shirt store; new designs come out randomly, so check back frequently. And finally, our forums at forums.lawrencesystems.com are where you can have a more in-depth discussion about this video and other tech topics you've seen on this channel. Now back to our content.

The first thing I want to cover is open source versus enterprise. Yes, they do have an enterprise version that comes with support and licensing, and we're going to be doing all of this in the open source version, which still includes quite a few features. One I was asked about just the other day: do alerts and triggers work with the free version of Graylog? Yes, they do. I think that's an important distinction; it means you can not only ingest the logs, you can build data on them and create a trigger point at which you want to send notifications. That goes beyond the scope of what we're going to cover in this video, which is really just getting started, but I wanted to mention it's available. If you're doing five gigs or less a day in logs, they also give you the enterprise version for free, which has a few more features, including offline archiving, which is pretty cool. And if you're working with a team of people, it has user audit logging, so you can figure out who looked at which log; sometimes for compliance it's not just about logging, it's about logging where people went in your logging server. So they do have some advanced features that come with the enterprise version. I have no idea how much it costs, because like I said, we're going to be doing this with the open source version.
They also have an entire library of data sheets, case studies, webinars, tech talks, and so on, which will give you ideas, for example, of what the dashboards can look like if you want to build out a use case; they have some really cool dashboards and a bunch of tutorials. I wanted to do this video because it gives an overview of how to get started. Something I didn't really understand until I started actually using Graylog, even after going through a lot of the documentation, was how to build the data sets and how to get the data into Graylog. So this is, like I said, more of a getting started video, but they do have quite a bit of documentation.

Now let's talk about actually installing Graylog. It's rather simple to install; walking through the installation is out of scope for this video, and I don't feel it's necessary because you can just go here and grab a virtual machine file. They've made it that easy. If you just want to play around with this, with no commitment to spending a bunch of time building a Linux VM, you can download something like VMware or VirtualBox, import the OVA file, and you're up and running. They also cover Ubuntu, Debian, CentOS, SUSE, Chef, Puppet, Docker, Amazon Web Services, and a fully manual setup, so if you're a fan of building it yourself or pulling a Docker image, it's pretty straightforward. For a quick start I followed the Debian installation.

Now, my production environment versus the test environment I'm showing today: for the test environment I did not bother putting a proxy in front of it. That's an extra step you can take to put something in front of it to handle certificates, because by default Graylog just runs on port 9000. I chose to leave it at that because this is a demo that's going to get destroyed; yes, that's why it says "not secure," for those wondering. There is another step in the documentation about putting something like NGINX, or whatever reverse proxy you prefer, in front of it if you want security there.

Now let's talk about the process flow, because this is the part that confused me a little, and we're going to go over to this little flow chart that I made. This was the missing piece, at least in my head, and maybe it will help you understand how Graylog ingests data. They have a couple of different visuals about how things are parsed, but I wanted to break it down this way because this was a little troublesome for me when I first got started. We're going to do all of this with syslog, but it doesn't matter what your external log data is; Graylog supports quite a few formats, and we'll cover some of the other ones later. We take the syslog data and define an input in the system. On that defined input we have the option to add an extractor. The name "extractor" seemed unusual to me, but it's essentially a parser: it can use Grok, it can use regex, and it takes the data, parses it into fields, and organizes it into the Elasticsearch database. But you don't need an extractor; you only need one if you want things put into nicely organized fields. And you can have mixed content: an extractor can parse some of the data while the rest remains unparsed.
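To make that flow a little more concrete, here is a toy sketch of the same pipeline in Python. This is purely illustrative, not how Graylog is implemented; the input IDs, field names, and index names are made up. The decision points mirror the flow chart, though: a message arrives on an input, an extractor may turn it into fields, and stream rules decide which index it lands in, with the default index as the fallback.

```python
# Toy model of the ingest flow described above -- illustrative only, not Graylog internals.
def run_extractor(raw: str) -> dict:
    """Stand-in for an extractor: optionally turn a raw syslog line into named fields."""
    fields = {"message": raw}                      # the raw message is always kept
    if "filterlog" in raw:                         # only lines that match get parsed
        fields["application"] = "filterlog"        # a field an extractor might add
    return fields

def route_to_index(fields: dict, gl2_source_input: str) -> str:
    """Stand-in for stream rules: route by the input the message arrived on."""
    stream_rules = {
        "pfsense-input-id": "pfsense_index",       # made-up input IDs and index names
        "unifi-input-id": "unifi_index",
    }
    return stream_rules.get(gl2_source_input, "default_index")

msg = run_extractor("filterlog: 5,,,1000000103,igb1,match,block,in,4,...")
print(route_to_index(msg, "pfsense-input-id"))     # -> pfsense_index
print(route_to_index(msg, "some-other-input"))     # -> default_index
```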
The extractors get attached, essentially, to the input, and we're going to walk through all of this; I just wanted to define the workflow first. For example, we have an extractor on my pfSense input that parses the filter logs into structured data, and we have no extractor on the UniFi input, so it just holds a bunch of unformatted data. Unformatted versus formatted: yes, you can search any of the data whether it's formatted or not, and you can write your own extractors. I will leave you the extractors I have posted on my GitHub for pfSense; there is some trickiness to writing them and there are a few floating around out there, but the ones I'll leave you with can be imported straight from my GitHub and work perfectly fine. We'll talk about how to put those in.

Once you have that parsed and unparsed data, it needs to go to an index, or it can go through a stream; that's why the flow chart shows this as an "or." A stream sounds like it should just be the way you stream data in, but it's not: you build rules around a stream, and those rules land the data in a different index. If there's no stream, everything goes to the default index; it all just lands there. That can be a little tricky with Graylog, because you've dumped everything into one single index, and each index has its own retention settings and parameters. You can still search it even if everything is in one index, but say that index has a 30-day rotation: everything has to follow that 30-day rotation. If you want different retention for different servers, you really want each in its own index, not just for the filtering but so you can have retention that is set per index, based on how the stream routes the data and lands it in an index. I just wanted to cover this quickly as an overview; it may seem confusing, but hopefully it alleviates some of the confusion later on as we create these. We're going to walk through creating another input in addition to the two I already have, so I can show you how all of this works.

Now let's look at the existing inputs. We have one for pfSense: in pfSense you go to the system logs, then Settings, and my Graylog server is at 192.168.3.200.
So 3.200 is the IP address, and then we have the syslog port right here; 1514 is the one I chose. Let's go back over to the Graylog server: System, then Inputs. Here's the pfSense input; More Actions, Edit Input. We called it pfSense, bound it to 0.0.0.0, which means bound to all IP addresses, and defined 1514 as the port. Now, 514 is the default syslog port; I put a one in front of it to keep it simple. You can use any port number you want, but remember to deal with firewall issues, and consider whether the Graylog service is allowed to open ports under 1024. Depending on the permissions it's running with, that could be something that hangs you up, so I'm choosing a port number above 1024, and I know port 1514 is open. This will depend on the configuration of your server; on this particular server I've got the firewall turned off because it makes my life easier for these demonstrations. So: port 1514, allow overriding the date (use the current date if it can't be parsed), and store full message, yes, give me the whole message and store it. Pretty simple as far as creating these goes.

When we go over to Manage Extractors, this is where the parsing comes in for pfSense. If we want to look at the parser itself, we can export the extractors and get a JSON file. I'll leave a link to this same one in my forum so you can just copy and paste it in, because you can literally copy extractors, or import them, by pasting the JSON in here. Any extractors that get built can be dropped in this way. I only have a couple in here right now; they're a little tricky to write because I'm not the best at regex, but if you are, you can write them manually, and there's a marketplace where you can find a lot of them. It can also be done with more than just regex: extractors can use Grok and a few other methods for parsing the data. This one just takes the data that comes in, and if it matches the pfSense filter log (that's all it filters on, not everything that comes from pfSense), it parses it out into fields. Here's what that looks like in action: it's a string match on "filterlog," and then it parses the line out. This sample happens to be UDP, so I'll change it; it's just a sample, you can load any log you want for demonstration. You can see it matches, and then it assigns all of these fields. It's basically parsing the line as CSV, assigning the field names based on the delimiting character, and putting the data in there.
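If you want a feel for what that extractor is doing under the hood, here is a rough Python sketch of the same idea: split the comma-delimited filterlog line and zip it against field names. The sample line and field names here are approximate and only label the leading columns; the JSON extractor I link in the forum is the authoritative version, since the later columns vary by IP version and protocol.

```python
import csv
from io import StringIO

# Approximate field names for the leading columns of a pfSense filterlog entry.
# Later columns differ for IPv4/IPv6 and TCP/UDP/ICMP, so they are left unlabeled here.
COMMON_FIELDS = ["rule_number", "sub_rule", "anchor", "tracker",
                 "interface", "reason", "action", "direction", "ip_version"]

# A made-up sample line in roughly the right shape (not captured from a real firewall).
sample = "5,,,1000000103,igb1,match,block,in,4,0x0,,64,54321,0,none,17,udp,60,192.168.40.138,8.8.8.8,5353,5353,40"

columns = next(csv.reader(StringIO(sample)))
parsed = dict(zip(COMMON_FIELDS, columns))                  # named fields, as an extractor would store them
parsed["protocol_specific"] = columns[len(COMMON_FIELDS):]  # everything else left raw here

print(parsed["action"], parsed["direction"], parsed["interface"])  # -> block in igb1
```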
All right, let's go back over to the inputs; that's how the pfSense data comes in. Now the UniFi one: no extractors, we're just grabbing the raw data from the UniFi, and like I said, we can still search it. Let's edit the UniFi input: port 1515, and back over on the Cloud Key, port 1515 is set as the target as well. This is what allows the data to flow. I've actually got debug logging turned on because I needed more data; this is a demo lab I set up in UniFi just for this video, because we're going to show how the correlation works and how we can find things within it.

Now, I mentioned that streams are the next step: the data comes in and goes through streams. Before you can set a stream up, you need to have the indices, so here are the indices and index sets. Here is the default one, and because I was doing some testing there's a lot of data in it: if you just create an input and turn it on, everything lands in the default index until you route it elsewhere with a stream. I have one index set called UniFi and one called pfSense, and there are two indices in each because they're rotating. We're going to create a new set in a second and I'll show you how the rotation is handled, but you can edit one of these and look at the settings; I have this set up with (I don't think this is the default, we'll look at the default when we create one) one-day rotations, keeping 30 indices, and deleting anything older than that. You can change this: instead of time, you can rotate by size, for example, and say don't let this index get over one gig; the index rotates when it hits a gig, and after so many one-gig rotations it purges the oldest one. So you can rotate by size, by date, or by message count (only hold this many messages), and afterwards delete the index. The other option right here, to archive the index instead of deleting it, is one of the enterprise features, so archive creation is disabled; like I said, we're doing all of this with the open source version, which is why that shows up. We're not going to make any changes. The pfSense index set has the same setup: I set both of these to a rotation of once per day with a maximum of 30 indices, deleting anything older than 30 days, because that's all I really need to keep.

Now we go over to the streams, and this is where the filtering happens. There's one called pfSense demo and one called UniFi demo for the messages that come in from those inputs. Let's look at the actions: click More Actions, Edit Stream. I called this one UniFi demo, and this is the index set it lands on. Here's where the confusion may come in: how does calling this stream UniFi demo make the data actually land on that index? That's where the rules come in. Hit Manage Rules, and there are a lot of options for these rules; we're going to do the most basic one. Open another window for Inputs, go to the UniFi input, and hit Show Received Messages; it's just one of the fastest ways to do this. Zoom in a little, and right here it says gl2_source_input with a long ID, which we're going to copy. Now let's look at what the rule does: gl2_source_input (which autocompletes as you start typing) must match exactly this value. So what this rule says is: when messages come in, if they match this gl2_source_input, take them and send them to that particular index. This is what I was talking about in the flow chart: if you don't have a stream, data lands in the default index; if the stream matches a rule we created, such as gl2_source_input equal to that input's ID, that rule is what routes it over to the right index.
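One side note on finding that input ID: clicking Show Received Messages works, but the same IDs are also available from Graylog's REST API if you prefer to script it. Here is a minimal sketch, assuming the /api/system/inputs endpoint and a user with read access; endpoint paths and auth options can vary by version, so check the API browser bundled with your own install.

```python
import requests

# Assumptions: Graylog web/API on port 9000 and basic auth with a user that can read inputs.
GRAYLOG = "http://192.168.3.200:9000"
AUTH = ("admin", "your-password")   # or an API token, per the Graylog documentation

resp = requests.get(f"{GRAYLOG}/api/system/inputs",
                    auth=AUTH, headers={"Accept": "application/json"})
resp.raise_for_status()

# Each input's "id" is the value that shows up as gl2_source_input on its messages.
for item in resp.json().get("inputs", []):
    print(item.get("id"), item.get("title"))
```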
You could also create other rules. We could add a stream rule on any of the parsed fields, because the extractor pulls out so many different things: ICMP data, IP-specific data, the filterlog data, TCP flags, for example. You could create a new index set and a rule that says parse these and send them over if they match, so you can build different indices holding very specific data. This is a really nice feature and some of the flexibility I really like with Graylog: they give you very granular control. It seems like a lot, but once you get the flow of it, you can start routing certain things; maybe you want to keep certain data longer than other data, or just put all the data in one particular index for convenience. These are really nice features, but we're only going to create the one rule.

Now back to the streams: we did the same thing for the pfSense demo. Manage Rules, there's that gl2_source_input, must match exactly. If we go to System, Inputs, and look at the pfSense input, you can again click Show Received Messages; it's the fastest way to see what the gl2_source_input is. There are other ways to find it, but clicking Show Received Messages puts it right up there. By the way, let's go back over to Search, where we actually see all the log data. We can search all messages within the last 30 minutes; here is the raw data coming in, and by default it searches everything. That same search query can be typed in here and it will instantly filter for it, so if you want to filter for one particular thing, the query language is common everywhere you use it.

Now let's create a new one. Here's what it looks like for the existing ones; I want to take my XCP-ng server and ingest all the logs from it. First I want to create a place for the data to land, so let's create an index set and call it "XCP-ng demo." Here's our XCP-ng demo, standard index analyzer, and for rotation we're going to do this one by size: I don't want these to get over one gig, and I don't need many of them. Virtualization servers create a lot of logs, and maybe I don't need an archive of them; that's up to you, but once again, this is why we want to put logs in specific buckets, because storage is not unlimited. All right, we've created the index set, the place for the data to land; now we need to create an input.

If we go over to Inputs, we select the input type. I said we'd be doing everything with syslog, but there are a lot of different options: AWS Flow Logs, CloudTrail, Kinesis, an older AWS logs input, Beats (plus a deprecated Beats), and a handful of others. You can also click "find more inputs," which takes you to a marketplace of inputs and other add-ons people have written that you can grab off GitHub; it's actually kind of cool. There's support for things like Palo Alto Networks URL filtering: download it from GitHub, and it's just a link to the parser you can paste in. So the Graylog Marketplace gives you quite a few things to work with.
That can sometimes be a problem, though, because anyone can post there. If you search for pfSense there are a few different content packs, and they're not all the same; some are a little older and may have issues working. I'll leave you the one I'm using, and there's a discussion thread in my forums; if you link to something better than what I share, please let me know, because I'm still learning all of this even though I've been using it for a while to parse data.

So we've selected the input type, and we're going to choose Syslog UDP, then Launch New Input. Name it "XCP-ng demo." What port? We'll go with 1566; I just like that number. Store full message, and that's it: really straightforward, we gave it a name and set a port. Unless you have something specific to tweak, the defaults should work in most circumstances. Now we're ready to start receiving data.

Before I start receiving data, though, it would just land in the default bucket, and I want it to land in that new index instead, so let's create a stream that sends it there. We do that by clicking Show Received Messages and opening it in a new window; there's that gl2_source_input again, and I'll copy it. Streams, Create Stream, call this one "XCP-ng demo" as well. Default index set? No, we want the XCP-ng demo index set, and Remove from All Messages stream: yes, I don't want it going to both places. There may be times you do, though; this is part of the flexibility of streams I mentioned: you can create a stream that pulls some data into another index but also leaves it where it was, so you still have everything in one place plus that extra data somewhere else. That's just one of the options it gives you. Hit Save, then Manage Rules, and Add Stream Rule: gl2_source_input, paste the value we copied (I left it open in another tab in case I forget it), add a description if you want, and hit Save. Now when data comes in, it will match that rule because it comes from that gl2_source_input, the input we defined, and when we go to Inputs, we'll see the data flowing through. There are no extractors again; we didn't do any extracting, we're just letting it flow in as unformatted raw data.

Now we go over to XCP-ng and put the Graylog server in, which is right here on the host page, specifically the remote syslog server field: 192.168.3.200:1566.
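If the host's logs don't show up right away, one quick way to rule out the Graylog side (wrong port, firewall, input not running) is to fire a hand-rolled syslog line at the new input yourself. A minimal sketch, assuming the demo values above, a syslog UDP input on 1566, and a roughly RFC 3164-shaped message:

```python
import socket
from datetime import datetime

GRAYLOG_HOST = "192.168.3.200"   # demo Graylog server from above
GRAYLOG_PORT = 1566              # the syslog UDP input we just created

# Minimal RFC 3164-style line: <priority>timestamp hostname tag: message
# <134> = facility local0 (16) * 8 + severity informational (6)
timestamp = datetime.now().strftime("%b %d %H:%M:%S")
line = f"<134>{timestamp} xcpng-test graylog-test: hello from the input test"

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(line.encode("utf-8"), (GRAYLOG_HOST, GRAYLOG_PORT))
sock.close()
```

If that test message shows up under Show Received Messages but the real host's logs do not, the problem is on the sending side rather than in Graylog.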
Now we can start ingesting logs from here, so let's generate some by turning on this Windows 10 VM I have in there, and we can probably see data moving right away — yep, 1.3 kilobits. Let's go over to Search: there's all the data. Let's filter it for just this stream... and clearly I did something wrong, because there's no data in here. Let's close this and search for the particular UUID we copied from the VM. Okay, the data is going to the default index set, so I clearly messed up the stream. Let's go back to the stream — oh, I didn't start the stream, whoops. There we go: Start Stream. Back to Search, back to the XCP-ng demo stream, there's a little play button right here to update automatically, and okay, now all the data is flowing. The data from before we started the stream is left over in the default index, but here we are. If we copy that UUID again, we can filter for only the data related to this particular VM, so we'll stop it, and we can see it pull up only the entries for this VM. It's pretty easy, even though the data itself is unformatted: it's all just dropped into the message field rather than parsed into individual fields, but you get the idea that even so, I can still search it, sort it, and start understanding the data flowing through.

Now let's talk about the UniFi side. Here's the data for that, and we also have the pfSense demo in here. Let me delete that and hit the play button so it automatically updates. Some of the data, such as the filterlog data I mentioned, is parsed, and if we look at it, it's all nicely organized rather than just dropped into a message. So we can say protocol: TCP and filter on it, then maybe build statistics on it or add it to a table. We can use Show Top Values: there are the UDP and TCP counts, about 1,200 over this time period. Actually, let's push the time period back a little further; it starts updating again, and you can see how we can start slicing the data up. There we go, that's the count over the last day, and because we told it to keep updating, the counts keep changing as things go on.

Now, my phone is connected to the firewall behind this, so let's open some things on my phone and run an update to see if there are app updates; that will probably make the traffic pick up with all the connections. Still not much, but like I said, this is a lab environment. Speaking of my phone, let's figure out what IP address it has by going over to the firewall — the DHCP server under Services, actually — and right there is Tom's phone, 192.168.40.138.
Let's go back to the Graylog interface; I've got two windows open, so let's start this over. We're going to find my phone's IP address and everything it's doing. Right here we've filtered for the IP address, and once again here are the filter logs and the places it's contacting; I bet that's Google, if I had to guess where it looked something up, because I'm downloading updates. There's the source IP, so let's use Add to Query: instead of filtering free-form, we can say source IP for just this particular piece of information, and you can see it going through the filter log and pulling all of that up.

Now let's go specifically to the UniFi data and query this again. Actually, clear this, turn on updating, and I'm going to take my phone, drop it from the Wi-Fi, and put it back on. We turned off Wi-Fi on the phone, we see in the log that it dropped off, and now we put it back on. Now we can ask what's connecting here, and right there I can capture the MAC address and the IP assigned to it; that's the MAC address of my phone. Hit Copy, paste it up here, and show me that MAC address. Now let's add pfSense back in and update the query. Now we've got logs from both pfSense and UniFi in one spot, and this gets kind of cool, because this is correlation data from two sources. I bring these two up because it's a popular question: when you're having a DHCP problem, looking at the logs from both devices — what's happening inside UniFi and what's happening in pfSense — simultaneously, and walking through each device, is where you can consolidate the information to troubleshoot some types of networking problems. We have a DHCP discover request right here, and we actually start with the handshake, a custom wireless event. Going a little further there should be a wireless event that is the actual authorization, so let's scroll down through the status codes... all right, found it. Zoom in a little: here is the first piece, where it started the authentication for the WPA key. You can then pull the surrounding messages for up to 10 seconds around this, open a new window, and start organizing the data around it (zoomed out so more fits on the screen) to sort out what happened. This is one of the ways I've really grown to love Graylog: it lets me do this relatively quickly and find the specific pieces of data I'm looking for, whether it's Wi-Fi data or syslog data from something else, such as Apache on a web server. Putting all the logs in one place and being able to follow, say, someone logging in on a VPN and figuring out what other servers they may have touched internally is great from a security standpoint and also from a troubleshooting standpoint.

So hopefully this gave you an idea of how to get started with Graylog. I will mention there's a whole world of dashboards that can be built; I've done a little testing with them, such as building per-source dashboards and other analytics. I'm not great at it yet, and maybe I'll cover it in a future video, but their documentation covers how to set these up and how to build things like messages per source, based on the source of the filter logs, and so on.
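Everything above was done in the web UI, which is really all you need, but if you ever want to pull the same kind of correlated search from a script, Graylog also exposes search over its REST API. Here is a minimal sketch, assuming the relative-time universal search endpoint and the same demo host; endpoint names and parameters can differ between versions, so verify against the API browser on your own instance.

```python
import requests

GRAYLOG = "http://192.168.3.200:9000"       # demo server from above
AUTH = ("admin", "your-password")           # or an API token, per the Graylog documentation

# Ask for every message mentioning the phone's IP over the last 30 minutes.
resp = requests.get(
    f"{GRAYLOG}/api/search/universal/relative",
    params={"query": "192.168.40.138", "range": 1800, "limit": 50},
    auth=AUTH,
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

for hit in resp.json().get("messages", []):
    m = hit.get("message", {})
    print(m.get("timestamp"), m.get("source"), m.get("message"))
```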
Also beyond the scope of this video, and something a bit beyond me at the moment: Graylog will do threat intelligence lookups on your firewall logs. That's something I might get deeper into at some point, using this for a more serious level of threat intelligence.

The last thing I want to touch on is hardware: what we run it on and how it runs. This is an XCP-ng server with an Intel Xeon E5-2670, nothing outstanding in terms of processor; it's an older Dell R720, and this particular machine runs Graylog because it has a lot of storage for all the logs. Right now we have about 120 gigs worth of logs on the server. Looking at the console for stats (I had been playing around digging through things), generally speaking, over the last week it just does not use a lot of processor; it's not all that CPU intensive.

Now let's look at what it takes to pull some data. Right here we have htop pulled up, and you can see the load average is relatively low; here's the interface, and we're ingesting roughly four to five million logs a day. Over the last 30 days, let's find my IP address in here, 192.168.3.9: we'll search back through 30 days' worth of logs for everywhere my IP address did something. You can see it updating results and the processor and load going up a little while it runs the query. Thirty days of results took a few seconds, and you can see which days I was here and which I wasn't; I was clearly doing something on January 20th to generate quite a few logs that day, but it's not a problem. It's even faster for narrower questions: "what did Tom do in the last two hours" is pretty much instantaneous, one day is pretty much instantaneous, and 30 days, now that it's indexed, pulls up relatively fast.

We haven't had any problems running it in production; it works quite well, and I haven't had any crashes at all. It's not on the highest-end hardware we have in the office, and I did that on purpose: one, because I had a lot of storage on that R720, and two, I wanted to see how it handled things, or whether it needs something really high-end to run. It turns out it really doesn't seem to, even with the amount of logs we ingest and the data going through rotation, since most of our logs are purged after only 30 days; that's all we need to keep for certain logs, longer for others, and so on. The system has been pretty stable, it has survived a couple of updates, and it hasn't given me any trouble. That's why I wanted to review it: the question comes up all the time about what we use to ingest all of our logs, and I know there are a lot of people with home labs looking for an easy-to-set-up log server to get started with logging. This is something I really recommend: if you want to start out in your home lab, it's relatively easy to set up, you can just grab the OVA file, and you can start finding all the data from all the different things you've set up and just start piping it all there.
I'll leave a link in the forums to the JSON file I have for my pfSense extractors, but if you find other really good use cases, extractors, or anything else I didn't cover in this video that you'd like me to do a follow-up on, let me know. I will of course leave links to their documentation; it's pretty easy to go through, and they have a lot more tutorials for once you have it set up. That's why I wanted to make this getting-started video: to try it out, learn with it, test it, and wrap my head around how to get data into it. Once you have the data in, playing around is, as you can see, pretty easy: sorting out how you want to parse the data, figuring out which dashboards to build and which statistics to generate, and of course writing the extractors. I'm not great at regex and I'm still working through all of that, so if someone has a good suggestion or wants to help write some, I'd probably hire someone to do part of it; depending on when you're watching this, I may or may not have that done already, but leave a message — I'll link the forum post, and that's the place to share that information.

Thanks, and thank you for making it to the end of the video. If you liked this video, please give it a thumbs up; if you'd like to see more content from the channel, hit the Subscribe button, and hit the bell icon if you'd like YouTube to notify you when new videos come out. If you'd like to hire us, head over to lawrences.com, fill out our contact page, and let us know what we can help you with and what projects you'd like us to work on together. If you want to carry on the discussion, head over to forums.lawrencesystems.com, where we can keep discussing this video, other videos, or other tech topics in general; suggestions for new videos are accepted right there on our forums, which are free. Also, if you'd like to help the channel in other ways, head over to our affiliate page; we have a lot of great tech offers for you. And once again, thanks for watching, and see you next time.
Info
Channel: Lawrence Systems
Views: 65,330
Rating: 4.9579225 out of 5
Keywords: lawrencesystems, Graylog, graylog tutorial, graylog dashboard, graylog windows event logs, graylog tutorial for beginners, graylog 4, graylog install, graylog streams, graylog https, graylog search tutorial, logs, elasticsearch, linux, mongodb, ubuntu, monitoring, log, server logs, pfsense, howto, debian, suse, java, networking, systems, centralised logging, open source log analysis, centralised logging for microservices
Id: rtfj6W5X0YA
Length: 34min 58sec (2098 seconds)
Published: Fri Feb 12 2021