Splunk Configuration Files: Search-time Field Extraction

Captions
Okay, today we'll be discussing field extraction in Splunk using the props.conf and transforms.conf configuration files. You can think of the field extraction process as telling Splunk how to extract key-value pairs from our raw event data, so that those fields appear in the interesting fields panel and we can use them in our search queries to build any kind of result set.

The overall field extraction process can be subdivided into two parts: field extraction can happen at index time, or it can happen at search time. For this I made a diagram using a mind-mapping tool, so let me zoom in. As you can see here, the overall process is divided into index-time field extraction and search-time field extraction. Index-time field extraction is generally not recommended by Splunk because it introduces a performance bottleneck, so Splunk recommends search-time field extraction instead. In this video we will discuss the different kinds of search-time field extraction; in the next video we'll talk about index-time field extraction.

All of these configurations are stored in either props.conf or transforms.conf. This is the exhaustive list of settings they support, and some settings, such as MATCH_LIMIT, can appear in props.conf as well as in transforms.conf depending on the scenario; I'll talk about that too.

The first thing that comes into the picture with field extraction is the regular expression, which is how you tell Splunk what to pull out of a raw event. Based on where that regular expression is written, the main settings for field extraction are TRANSFORMS, REPORT, and EXTRACT. All three do the same basic job: they let us apply a regular expression. The TRANSFORMS setting applies only to index-time field extraction, so we will not discuss it today. EXTRACT and REPORT are the search-time settings, and both support regular expressions. The main difference between them is that REPORT requires transforms.conf: in props.conf you write the word REPORT, then a class name (you can give any name, with a few restrictions), and the actual extraction stanza lives in transforms.conf. A REPORT setting cannot exist without a corresponding stanza in transforms.conf; we'll see that in a demo. EXTRACT is the opposite: you give the regular expression inline, and no transforms.conf is needed. So you may think these two are the same and ask why we need REPORT when EXTRACT alone can do all of this. The main idea behind REPORT is that your regular expressions live in separate stanzas in a separate file, transforms.conf, which means those regular expressions can be reused; that is one of the main purposes behind it.
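As a minimal sketch of the two styles, assuming a hypothetical sourcetype my_sourcetype whose events contain a status=<number> pair (none of these names come from the video):

props.conf:

    [my_sourcetype]
    # EXTRACT: the regex lives inline, right here in props.conf.
    EXTRACT-status = status=(?<status_code>\d+)
    # REPORT: props.conf only points at a transforms.conf stanza,
    # so the same regex can be reused by other sourcetypes.
    REPORT-status = status_extraction

transforms.conf:

    [status_extraction]
    REGEX = status=(?<status_code>\d+)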
As you have seen, there are a lot of other settings as well; we will talk about them when we reach that point. For the demos I have prepared, I think, eleven in total. In the search app I have placed this props.conf along with the corresponding transforms.conf, and we will discuss them one by one. I have already restarted Splunk, so I don't need to restart it again and again; that would take time.

This brings us to the first demo, which is about the REPORT setting. For it I have a file called data3. If you remember the event line breaking video I posted before, we used this same data3 file: it contains normal XML events in which all the fields are stored inside message tags, and we broke the raw data into events at each message tag; inside each message there are a lot of field values in different XML tags. I have kept the data3 file unchanged so we can refer back to it, and if you also remember, we used these props.conf settings to break the raw data. There is a reason I saved this XML content in a .txt file: if I saved it as .xml, Splunk would auto-detect the format, and I wouldn't be able to demonstrate what I want to show.

So, in props.conf the stanza name is demo. The first few settings are the ones we discussed in the event line breaking video, so I won't go over them again. The new piece here is the REPORT setting; after REPORT you have to give a class name. It's not mandatory to write it within angle brackets; I have just presented it that way, and you can give any name there. On the right-hand side goes the transforms.conf stanza name: I have given xml_extraction here, and over in transforms.conf there is the corresponding xml_extraction stanza, whose REGEX setting is how you supply the regular expression.

Let's look at this regular expression. I'll go back to our regex engine, copy one of the events from data3, and then copy in the regular expression. If you want to understand how regular expressions work, you can check out my regular expression video, where I covered them exhaustively. This one is doing nothing fancy: it just creates different capture groups. My intention is to get this value for the thread field, this value for the location field, this value for the message key class field, and so on. So it is extracting thread, location, message key class, message key value, and message; the one thing it is not extracting, I think, is the parameters field. Note that in this Splunk regular expression we need named capture groups, so that Splunk can create the field names from whatever group names you give.
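Reconstructed from the walkthrough, the stanza pair looks roughly like this. The exact XML tag names and line-breaking values aren't legible in the captions, so treat every value below as an assumption:

props.conf:

    [demo]
    # Line breaking carried over from the earlier video (assumed values)
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)(?=<message)
    REPORT-xml = xml_extraction

transforms.conf:

    [xml_extraction]
    # Each named capture group becomes an extracted field name.
    REGEX = <thread>(?<thread>[^<]*)</thread>.*?<location>(?<location>[^<]*)</location>.*?<messageKeyClass>(?<message_key_class>[^<]*)</messageKeyClass>.*?<messageKeyValue>(?<message_key_value>[^<]*)</messageKeyValue>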
So in transforms.conf you give this kind of regular expression, and because you have given named groups, Splunk will automatically create those fields and apply the regex to extract the field values. Let's see how it works. (There are also other ways to write the regex if you don't give named group names; we'll demonstrate that later in this video.)

To check it out, I go to Splunk, then Settings > Add Data, click Upload, and select the data3 file, then click Next. To break the raw data I choose the source type demo, because that's the props.conf stanza name. You can see it is breaking the events at the message tags. I click Next, then Review, then Submit, and Start Searching. Notice that I am running a search here: with search-time field extraction, the extraction happens when the search runs, whereas with index-time field extraction it happens while the data is being indexed. That means index-time extracted fields are permanent; search-time extracted fields are not permanent and only come into the picture when you run a search. As expected from our regular expression, it is creating thread with these values, and correspondingly message, message key class, and message key value. That is how REPORT extraction works.

Now, REPORT has another nice feature: you can give more than one transforms.conf stanza name, which means you can apply more than one regular expression to your raw data. For that, you give the stanza names comma-separated. The parameters_extraction transform I've added here holds a regular expression that extracts only the parameters field, which, as I discussed, our first regex was not extracting. So the first regular expression extracts all the other fields, and the second one is applied after it and extracts parameters. To demonstrate, I delete the data and re-index it, because I applied this under a different sourcetype, demo1: Settings > Add Data > Upload, select data3, Next, and this time I choose demo1. It breaks the events properly; Review, Submit, Start Searching. And now you can see it is extracting the parameters values as well. That is how you apply more than one regular expression using transforms.conf.
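A sketch of the demo1 stanza, again with assumed names; the transforms are applied in order, so parameters_extraction runs after xml_extraction:

props.conf:

    [demo1]
    # Comma-separated stanza names apply multiple regexes to the same events.
    REPORT-xml = xml_extraction, parameters_extraction

transforms.conf:

    [parameters_extraction]
    # Picks up only the parameters field the first regex misses (assumed tag name).
    REGEX = <parameters>(?<parameters>.*?)</parameters>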
Let's move on to demo 3, where we demonstrate the EXTRACT option I described earlier. EXTRACT is the inline transformation: you don't need any kind of transforms.conf stanza. That's why for demo3 I have not referenced any transform; I have just given the full XML extraction inline, the same regular expression as before. That means it should create all those fields except parameters, because I have not given the parameters extraction anywhere. To demonstrate, I delete the data again and choose the separate source type: Settings > Add Data > Upload, select data3, Next, and choose demo3. It breaks the events; Next, Review, Submit, Start Searching. You can see it is doing the same thing as our first demo: it creates thread, message key class, message key value, and message, and it does not extract parameters. So it is working as expected; that is how the EXTRACT setting works. Here, too, you can give any class name after EXTRACT.

Now let's move on to the next one, demo 4. Think about it: so far we have given a regular expression to extract the fields. Splunk also provides another option, KV_MODE, which works well when you have structured data; I have a text file about it that I think I took from the Splunk documentation. KV_MODE can have the values none, auto, auto_escaped, multi, xml, and json. The default behavior (auto) is that Splunk extracts key-value pairs based on equals signs: if your event contains n=2, whatever comes before the equals sign is taken as the field name and whatever comes after it as the field value. To override that you can give xml, so that Splunk automatically extracts the fields from XML, or json, so that it automatically extracts the fields from JSON, without your writing any regex for it. Similarly, multi relates to the multikv command I discussed previously: with it Splunk can break those tabular events automatically, without your needing to run the multikv command. none disables the automatic extraction. And auto_escaped gives you the extra ability to handle escaped characters such as quotes and backslashes, so you get a better representation of your data. Since we are dealing with data3, which is XML events, I will choose KV_MODE = xml: in props.conf, in the demo4 stanza, that is all I have given. No transforms.conf entry is needed here either, the way it is for REPORT.
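The demo4 stanza is minimal; as a sketch, with the other accepted values noted in a comment:

    [demo4]
    # Automatic structure-aware extraction; other accepted values are
    # none, auto (default, splits on = signs), auto_escaped, multi, json.
    KV_MODE = xml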
Let's see how it behaves; the result will look slightly different, but ultimately it does the same job. I go to Settings > Add Data, click Upload, select data3 again, click Next, and this time choose demo4, since that is the KV_MODE stanza. It breaks properly; Next, Review, Submit, Start Searching. Now you can see the extracted field names are a little different. To show why: all the values are wrapped inside the messages tag, so every key field has been created as messages.<fieldname>. That is one difference between writing your own regular expression and the automatic extraction from XML or JSON. It creates the level field, the timestamp field, and so on; wherever it is able to break the event using the tags, it extracts those fields, like key class and key value. And look at parameters: inside it there is another set of key-value pairs expressed through attribute names, and it has extracted those as well. That is how the automatic extraction works, and the same applies to JSON.

Let's head toward the next demo, where we will discuss field aliases. But before we get to FIELDALIAS, two other settings come up alongside KV_MODE: AUTO_KV_JSON and KV_TRIM_SPACES. AUTO_KV_JSON can take true or false; when it is true and you have JSON-like data, Splunk will try to break it as JSON, even if it is not fully valid JSON, and create field-value pairs from it. As for KV_TRIM_SPACES, I have the documentation here: this setting modifies the behavior of KV_MODE when it is set to auto or auto_escaped. Remember that with KV_MODE = auto, Splunk creates key-value pairs wherever it finds equals signs. Normally, when Splunk finds a field value, it trims the spaces that come before and after it. If you set KV_TRIM_SPACES = false, those spaces are retained by Splunk; that is the key difference between true and false. One thing to remember: Splunk Web currently does not have a feature to show those spaces in the displayed field values, so you may not see them on screen, but when you run a search the spaces are retained in the actual values.

Before we move on to field aliases, let's talk a little about MATCH_LIMIT and DEPTH_LIMIT. These come up when you supply a regular expression, and they let you control the regex engine's behavior; generally there is no need to change them. MATCH_LIMIT specifies how many times the regular expression engine may call its internal match program to find all the different matches. The default value is, I think, 10,000; if you have very large events you might think about raising it, but otherwise the default is usually sufficient. DEPTH_LIMIT applies when the engine is doing backtracking (if you want to know about backtracking, check out my regular expression video): it limits how deep the engine may backtrack. Again, you will generally not change this much, but based on your scenario you can.
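A sketch of how these settings sit together in a props.conf stanza; the numeric values here are illustrative, not the documented defaults:

    [my_sourcetype]
    KV_MODE = auto
    KV_TRIM_SPACES = false   # retain spaces around auto-extracted values
    AUTO_KV_JSON = true      # attempt JSON-style extraction on JSON-like data
    MATCH_LIMIT = 10000      # cap on internal regex match calls
    DEPTH_LIMIT = 1000       # cap on regex backtracking depth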
Okay, let's move on to FIELDALIAS. What it does is allow you to create another field from an existing field. It is a kind of renaming, but the existing field is also retained. You write FIELDALIAS followed by a class name; you can give any unique class name, just like for the REPORT and EXTRACT classes, and class names must be unique throughout the configuration file; you need to make sure of that.

For example, I was using KV_MODE = xml here, which created the fields messages.level, messages.location, and so on. Suppose I want to use that as just a level field. I could use the rename command in my search, something like rename messages.level AS level, but the problem with rename is that the original messages.level field is gone: going forward I would need to use level everywhere in my search. With FIELDALIAS, by contrast, you create another field from the existing field and the existing field is retained too. Field aliases are generally used when you want to normalize field names: suppose you have data coming from different sources, and one source has the IP address in a field named ip_address while another names it just ip, but in your application you want one common name. In that case you can set up a field alias so the fields are effectively renamed at search time, whatever the source. This, too, is a search-time field extraction configuration.

So let's demo it; this is demo 5. I delete the data again, go to Settings > Add Data > Upload, select data3, Next, and this time choose demo5. For demo5, KV_MODE = xml still applies, so it will still create those messages.* field names. Next, Review, Submit, Start Searching. Now you can see that messages.level has been created by KV_MODE = xml, and using FIELDALIAS I have created the level field as well: two fields holding the same values, but under separate field names, so in your search you can now use either one. That is how FIELDALIAS works.
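A sketch of the demo5 stanza, assuming the alias names from the walkthrough:

    [demo5]
    KV_MODE = xml
    # The alias keeps the original field and adds a second name for it.
    FIELDALIAS-level = "messages.level" AS level

Compare this with running | rename messages.level AS level in SPL, which keeps only the new name.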
This leads us to demo number 6, where we use EVAL. You have heard about eval; on my channel I have discussed the eval command in detail. This EVAL configuration does a similar kind of thing: without explicitly running the eval command, you achieve the same result here, because it evaluates an eval expression. In this stanza I have kept KV_MODE = xml and I have kept the FIELDALIAS settings as well; previously the field alias was creating new fields from level and location, and here I have also created a new field from the message key class field.

Now one question arises: I can write any kind of eval expression here, and eval expressions generally involve different field names, so which fields am I allowed to use? For that we need to understand something called the search-time operation order in Splunk. This is the full ordered list of what happens, in terms of field extraction, when a Splunk search runs. The first thing Splunk runs is the inline extraction (EXTRACT): whatever inline extractions you provide for that particular source type run first. Then it runs the REPORT extractions, if you have mentioned any. (I am talking about a single source type here; the ordering is not across source types.) After running the REPORT extractions, it checks whether you have configured automatic key-value extraction with KV_MODE and applies that. The next one is FIELDALIAS, and the one after that is EVAL. So if I want to use a field in my eval expression, say the message key class field, and my FIELDALIAS did not produce it, I would not be able to use it and would have to refer to the original messages.* field instead. But since I am using FIELDALIAS, and field aliases are applied before the eval expression is evaluated, the alias message_key_class exists by the time the EVAL runs. What the expression does is append message_key_class and message_key_value, combining those two fields in a different form.

Let's see it in action, demo 6. I delete the data, go to Settings > Add Data > Upload, select data3 again, Next, choose demo6, then Next, Review, Submit, Start Searching. One thing I forgot to discuss: the right-hand side is the eval expression, but what will the field name be? The field name is the class part of the setting; that is the special thing about EVAL. So it should be creating a field called class_value somewhere here; there it is, created by appending those two fields. To recap what happens: first, KV_MODE does the automatic extraction and produces the messages.* field values; then FIELDALIAS creates the message_key_class field from the messages.* field; then I append message_key_class and message_key_value to create the class_value field. In EVAL the class name is important, because Splunk creates the field with that name.

We have talked about field aliases and about EVAL; now we will talk about lookups. If you remember my previous video where I discussed lookups, KV Store lookups, external lookups, we created automatic lookups there. That automatic lookup configuration is actually this LOOKUP setting: when you create an automatic lookup, Splunk writes this lookup configuration into props.conf, where we are doing the same thing the lookup command does, but via configuration. I will not be demonstrating this one now. In terms of the search-time operation order, lookups come after EVAL and FIELDALIAS; that means the fields created by field aliases and eval expressions can be used in a lookup. The lookup can reference those fields as well as the automatically extracted ones.
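A sketch of the demo6 stanza as reconstructed; the alias targets, the "::" separator, and the lookup name my_lookup are assumptions (the captions only say the two fields are appended):

    [demo6]
    KV_MODE = xml
    FIELDALIAS-keys = "messages.messageKeyClass" AS message_key_class "messages.messageKeyValue" AS message_key_value
    # EVAL- runs after FIELDALIAS, so it may reference the aliases;
    # the class name (class_value) becomes the new field's name.
    EVAL-class_value = message_key_class . "::" . message_key_value
    # LOOKUP- runs last of all, so it could take class_value as input
    # (my_lookup is a hypothetical lookup definition).
    LOOKUP-enrich = my_lookup class_value OUTPUT description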
Okay, let's move on. Demo number 7 is an interesting one: delimiter-based field extraction. We have seen how automatic extraction with KV_MODE = xml does field extraction; another way to do it is delimiter-based. Suppose you have an event like this; let me open the file. I think this one is from a WebSphere log, and it is a single event. Each line is a field name, then a colon, then a tab (if I turn on show-all-characters, you can see it is tab separated), and then the field value. Our intention is to extract these field-value pairs from this particular event; I have taken just a single event here.

We could do a lot of things here: we could write a regular expression for each and every field-value pair, which I have discussed before, or we can do a delimiter-based extraction, where you give the delimiters for the field-value pairs. The first delimiter is the one that separates whole field-value pairs from each other. If I turn on show-all-characters, all the field-value pairs are separated by newlines, which is why I have given \n. The next delimiter is how values are separated from field names within a pair: they are separated by a colon and then a tab, so I have given ":\t". This DELIMS setting has to live in transforms.conf as part of a stanza, so just like a regex-based transform, you reference it with the REPORT setting: I have given this activity_report stanza name in my REPORT setting.

Let's see how demo 7 works. I delete the data first, go to Settings > Add Data, and this time upload the delim extraction file, then Next, and choose demo7. As it is a single event, no event line breaking is happening here. I click Review, then Submit, and Start Searching. You can see Splunk created the field values based on those delimiters: process ID, product, component, version ID, thread ID, everything has been extracted. So this is one more way to extract fields, using delimiters.
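The demo7 pair, sketched with the delimiters described in the walkthrough:

props.conf:

    [demo7]
    REPORT-activity = activity_report

transforms.conf:

    [activity_report]
    # First delimiter splits the event into field-value pairs (newline);
    # second splits each pair into name and value (colon followed by tab).
    DELIMS = "\n", ":\t"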
Now let's move on to demo number 8, another interesting one: here we will be extracting a multi-value field. Before we move on, let me open this file. This is another log file I created, I think based on email session messages. It has a timestamp and different key-value pairs separated by equals signs. Now, if you look at it closely, there are multiple occurrences of the same key field name within the one event. In this scenario I could write some kind of regular expression to extract these fields too, but here is what happens: with the regular expression we have written, Splunk extracts only the first occurrence the regex matches as the field value. That is Splunk's default behavior: even if the regular expression can match other patterns in the same event, Splunk will not extract those additional occurrences as field values. I will demonstrate that as well.

To overcome this we use MV_ADD. In the transforms.conf stanza we say MV_ADD = true or false. When MV_ADD = true, Splunk treats it as a multi-value extraction: if the regular expression has multiple matches, all the matches are kept as values of the same field. So in the mv_extraction transform stanza, I have given MV_ADD = true, and I have also given a regular expression. Let's talk about that particular regex: I copy the event and the regular expression into the regex engine, and you can see it matches only key and text. It looks for key or text as the field name and then extracts whatever corresponding value follows. The group name here is _KEY_1, and that name has a special meaning in Splunk, which we will see when we discuss the FORMAT option in a moment; for now, think of it as just another named group.

Let's see how it works. For demo 8 I will use the mv extraction data file. First I delete the existing event. We will achieve two things with demo 8; one is the multi-value extraction: as you can see, key appears in a lot of places in the event, so it should create a key field with all of the values 1, 2, 3, and a text field with value1, value2, and value3. I add the file, choose demo8, and click Next; it's a single event, so no event line breaking is needed. Review, Submit, Start Searching. And now you can see it created a key field with all these values and a text field with all these values. All the occurrences of the regular expression have been extracted; that is MV_ADD = true.
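A sketch of the mv_extraction stanza; the exact value pattern in the regex is an assumption, the special group names are the point:

    [mv_extraction]
    # _KEY_<n>/_VAL_<n> are special group names: whatever _KEY_1 captures
    # becomes the field name, and whatever _VAL_1 captures becomes its value.
    REGEX = (?<_KEY_1>key|text)=(?<_VAL_1>[^,\s]+)
    # true: keep every match as a value of the (multi-value) field;
    # false (the default): keep only the first match.
    MV_ADD = true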
Now let's see what happens when MV_ADD = false; that is my demo 11, so I will jump directly to it before I discuss demos 9 and 10. Demo 11 is the same MV extraction but with MV_ADD = false, which is why I have named the transforms.conf stanza no_mv_extraction; the only difference is MV_ADD = false. Currently we are seeing key with three values because MV_ADD = true captured all the occurrences; with MV_ADD = false we should expect key to hold only one value, only the first occurrence. Let's see: I delete the data, go to Settings > Add Data, Upload, select the same mv extraction file, click Next, and this time choose demo11. Next, Review, Submit, Start Searching. Now you can see key has only the one value 1, and text has only value1; the other occurrences are gone. That's the main difference between MV_ADD = true and MV_ADD = false.

This leads us to another thing I wanted to discuss: FORMAT. I told you there is a special meaning to _KEY_<any string> and _VAL_<any string> in Splunk. Also, as I stated before, if the regular expression does not have any named groups, we have to use FORMAT. FORMAT works in a similar way, but you use $1 to refer to the regular expression's output: to be very specific, if you have groups in your regular expression, $1 represents the first group. If you remember from my regular expression video, a group can be referred to either by name or by number; earlier we referred by name, and here we refer by number. $2, accordingly, represents the second group in your regular expression. So if I take that same regular expression and remove the named group syntax, it becomes just key-pipe-text in one group and the value in another. This regex has two groups: one group extracts my field name and the other extracts my field value. That is why in FORMAT I have given $1, which refers to the field name group, then the double-colon syntax ::, then $2, which represents the second group, the field value. Ultimately, with this FORMAT, it creates a field named by the first group with the value from the second group. The behavior is otherwise the same: it will capture only the first occurrence of the regular expression in the event unless you also add MV_ADD = true.

To demonstrate the FORMAT approach, we have this mv_extraction_format transform, which I think I used in demo 10. The output of demo 10 should be the same as demo 8, because here MV_ADD = true as well; both stanzas do the same thing, the only difference being that one regex uses named groups and the other uses FORMAT. So it should extract all the values. To demo it, I delete the data, go to Settings > Add Data, Upload, select the mv extraction file, Next, choose demo10, then Next, Review, Submit, Start Searching. And you can see it again extracts all these values, because of MV_ADD = true, and FORMAT properly creates the key and text fields with their values. It does the same job. So we have covered MV extraction in detail as well.
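The demo10 stanza, sketched; the same extraction as before, but with unnamed groups wired together through FORMAT:

    [mv_extraction_format]
    # Two unnamed groups: $1 captures the field name, $2 the value.
    REGEX = (key|text)=([^,\s]+)
    # <name group>::<value group>
    FORMAT = $1::$2
    MV_ADD = true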
There is another setting, CLEAN_KEYS, but I was not able to make it work; I don't know whether it is an internal Splunk issue or not. CLEAN_KEYS is for when your data has a field name containing some special character, in this case a dash. First let me show you how Splunk extracts it by default. I delete the data, go to Settings > Add Data > Upload, and select the clean key file, where, as you can see, I have put a dash in the user-id field name. I choose demo9; the transform name there is clean_key_extraction, and in it I have given the same kind of regex with MV_ADD = true as before. What we mainly care about is CLEAN_KEYS: by default, when I index this data, Splunk replaces the dash with an underscore, but when you set CLEAN_KEYS = false it should not replace the dash with an underscore; it should keep the dash as it is. But for some reason this does not work in Splunk for me. I select demo9, click Next, Review, Submit, Start Searching; and you can see it is still creating user_id with the underscore. For some unknown reason I was not able to fix it; maybe you can try it and let me know if it works for you or if you have a solution for this one.

So we have discussed these things as well. Let me go back to my original diagram; I think we have covered most of the items here: DELIMS, CLEAN_KEYS, and so on. KEEP_EMPTY_VALS is another very easy setting: when Splunk is extracting key-value pairs and, for some key, all the values are null, Splunk generally does not extract that field. With KEEP_EMPTY_VALS you can tell Splunk to extract those fields as well. There is another option, CAN_OPTIMIZE, which you should not touch: it basically tells Splunk to optimize the extraction whenever necessary, and there is no reason to set it to false.

So that is how the different kinds of search-time field extraction work, and this is also the Splunk-recommended way to do field extraction. In the next video we'll talk about index-time field extraction in detail. See you in the next video, thank you.
Info
Channel: Splunk & Machine Learning
Views: 17,838
Keywords: splunk, how to, field extraction, props.conf, transforms.conf, REPORT-class, EXTRACT-class, MATCH_LIMIT, DEPTH_LIMIT, KV_MODE, KV_TRIM_SPACES, FIELDALIAS-class, EVAL-fieldname, LOOKUP-class, REGEX, FORMAT, DELIMS, FIELDS, MV_ADD, CLEAN_KEYS, KEEP_EMPTY_VALS
Id: zIjeCYafLCE
Length: 48min 32sec (2912 seconds)
Published: Thu Jan 10 2019