Splunk Configuration Files : Index time field extraction

Captions
Okay, today we will be discussing index-time field extraction in Splunk. If you remember from my previous video, we talked about the search-time field extraction process, where we covered a lot of different props.conf and transforms.conf settings that determine how search-time field extraction works. In this video we will mainly be concentrating on the right-hand part of this particular diagram: the index-time field extraction process.

If you also remember from my previous video, we said that Splunk generally recommends search-time field extraction whenever possible. That means we need to know in what kind of scenario we should be using index-time field extraction instead of search-time field extraction. The first scenario is this: suppose you have a very large number of events in your index, and you are searching with an expression like some field name equals a value, let's say company_id=1, but most of your events do not have company_id=1; maybe among 1 billion events only 10 events have company_id=1. In that case, when you search company_id=1, Splunk has to go through all the events to match company_id against the value 1, and that really slows down the search. This is a perfect scenario for index-time field extraction. The second scenario is the opposite: you are searching for company_id!=1, and most of your events do have company_id=1. Again, with search-time field extraction Splunk has to look through all of that large number of events to determine which events do not have company_id=1. These two are the best scenarios in which we can
apply index-time field extraction. So in this video we will mainly be discussing the configuration we can do in transforms.conf and props.conf for index-time field extraction, and we will see how index-time field extraction happens. We will mainly be discussing the settings REGEX, FORMAT, MATCH_LIMIT, and LOOKAHEAD, one by one. Now, to save time and complexity, what I have done for all the different demos (we will be seeing 12 to 13 demos) is create a props.conf entry for each, and their corresponding transforms.conf entries as well. So what we will do is analyze these configurations, and then we will see how Splunk behaves when some configuration changes.

For that, I am going to my Splunk Enterprise. The first setting we will be seeing today is REGEX. If I go to our mind map, REGEX is a transforms.conf setting. Before we discuss REGEX: if you remember from my previous video, when we were doing search-time field extraction we were using either the EXTRACT or the REPORT class, and for REPORT a corresponding transforms.conf stanza, and with EXTRACT and REPORT we used a regex to extract fields from each particular event. Similarly, for index-time field extraction the props.conf class name is TRANSFORMS. It has a similar structure: the keyword TRANSFORMS, then a dash, then the class name (you can give any name here, but we will try to give a meaningful one), and then a stanza name, which is the transforms.conf stanza name. If we are using this particular setting, we have to rely on transforms.conf, and in transforms.conf we can give REGEX, FORMAT, and the other settings that go
with this particular TRANSFORMS class. So let's see how we can define it, but before that let me show you the data. We used this same file in my previous video; I kept the file name data3 as it is, so that we can relate to it. If you see, it is XML data, where we have a lot of <messages> tags; each of my single events resides between these <messages> tags. When we were doing search-time field extraction we were extracting these fields: the thread value, the message key value, this string name, and the message value. Those fields we were extracting at search time; now the same fields will be extracted at index time. It is the same concept, and I kept the same layout so that we can compare. Apart from that, I slightly modified the data3 file: I introduced a dummy card number, because there are some use cases which are very specific to index-time field extraction only, and we will need this data to see those.

Now let me show you the regex. I have regex101.com open, so I will just copy a single event here, and when we discuss the regex we will see how it extracts the fields. The first demo we will be talking about is demo one. If you remember from my previous video, I have kept these line-breaking settings unchanged; that is how the event line breaking is happening. If you are not sure how this works, please check out my event line breaking video, where I discussed the same file with all these settings. So first the event line breaking happens, and then the transforms: if you see the TRANSFORMS entry I have added here, I have given a
class name called demoextraction, and it points to a transforms.conf stanza named xml_extraction. If I go to the xml_extraction stanza here, I have given three things, because at its most basic, when you use the TRANSFORMS configuration in props.conf for index-time field extraction, you need three things: REGEX, FORMAT, and WRITE_META. WRITE_META = true you have to give for index-time field extraction, so that Splunk writes the extracted fields to the metadata; that is how you will see those fields extracted at index time. Now, for REGEX I have given this expression, so let me copy it. I discussed the same regex in my previous video as well, and I will give that video link here too. If you see, I am extracting these fields from a single event: the thread field, the location field, the message_key_class field, the message_key_value field, and the message field, from each event. If you are also not sure about how this regex works, I have created a lot of regex videos; please look those up as well.

Now, FORMAT. We discussed FORMAT in my previous video too. What FORMAT basically does: you give two things, one on the left-hand side of the double colon and one on the right-hand side. The left-hand side determines the field name, and the right-hand side determines the regular-expression capture occurrence. $1 means the first capture: if I go to our regular-expression window, my first capturing group is this one, and it is capturing the thread field value. So that is what I am telling Splunk here: whatever you capture first, put it as the value of the thread field. Similarly, location is my second capturing group, so I put it as the location field, and likewise for the message
_key_class, message_key_value, and message fields. And then WRITE_META = true. So let us see how Splunk behaves with these settings; we will see some interesting stuff here. I go to Settings > Add Data, click on Upload, and choose my data3 file, then click on Next. Right now, if you see, it is the whole raw chunk, and no line breaking has happened, so I will be selecting my source type: the one named demo in props.conf. As you remember from my props.conf basics video, the stanza name of a props.conf entry, if you are not specifically using the host:: or source:: keywords, is by default the source type name. So that is the source type we will be selecting from here, called demo.

If I select demo, you can see some interesting things happening. First of all, the event line breaking happens: it breaks this whole raw chunk of data into four events. And if you also look, it has already extracted the fields. Our indexing is still not done (we are still in the wizard itself), but it has already extracted the location field, the message field, the message_key_class field, and the message_key_value field. This is the basic difference between index-time field extraction and search-time field extraction: with search-time extraction, unless we are running a search, the field extraction does not happen, but here you can see it is extracting fields while it is indexing. That is why it is called index-time field extraction. Just to show you that it really is extracting the fields, I will click on Next, keep main as my default index, click on Review, click on Submit, and then Start Searching. If you see, those fields have been created, and they were created during index time.
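Putting the demo together, the props.conf and transforms.conf entries described above would look roughly like this (stanza and class names follow the video; the regex itself is shown as a placeholder, since the full expression is not readable in the captions):

```ini
# props.conf -- the stanza name "demo" doubles as the source type name
[demo]
TRANSFORMS-demoextraction = xml_extraction

# transforms.conf -- the three settings every index-time extraction needs
[xml_extraction]
# five capturing groups: thread, location, message_key_class,
# message_key_value, message (placeholder; substitute the real XML regex)
REGEX = <regex-with-five-capture-groups>
# left of :: is the field name, $N on the right is the N-th capture
FORMAT = thread::$1 location::$2 message_key_class::$3 message_key_value::$4 message::$5
# required so the fields are written to the index-time metadata
WRITE_META = true
```

With this in place, the upload wizard already shows the broken events with the extracted fields before anything is indexed.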
In the next demos we may not be indexing the full data; I just wanted to show you that those index-time extracted fields are persistent here as well. So let us delete this data and go back to the editor.

You saw how the TRANSFORMS stanza works together with the transforms.conf settings REGEX, FORMAT, and WRITE_META; three settings we have discussed so far. Let's move on. First we will cover all the settings related to transforms.conf, and then we will move back to props.conf. The next setting we will be discussing is called INGEST_EVAL. Now, if you remember, in any Splunk search you can create new fields using eval statements; you can do a similar kind of thing while you are doing index-time field extraction, using this particular setting INGEST_EVAL. But there are a couple of things you have to remember here. You cannot use all fields: if you see, I am using the same regex here, the same FORMAT, and WRITE_META, but these particular fields (thread, location, message_key_class, message_key_value) are still not indexed at that point, and in INGEST_EVAL you can only use those fields which are already indexed. You have to remember this. Also, if I go to the transforms.conf documentation, there are certain fields called _raw, _meta, and _time which Splunk creates automatically when it receives the event, and similarly host, source, and index; those kinds of fields you can use there as well.

So how does this work? INGEST_EVAL = , then, just as in an eval statement, you give a field name and an eval expression.
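As a sketch of that syntax (the field name and values follow the demo that comes next; the exact expression used in the video is an assumption):

```ini
# transforms.conf
[xml_extraction1]
# (REGEX / FORMAT / WRITE_META as in xml_extraction above, omitted here)
# runs at ingest time; only _raw, _meta, _time, host, source, index and
# already-indexed fields are visible to the expression, not the fields
# this same transform is still in the middle of extracting
INGEST_EVAL = event_category=if(match(_raw, "backup"), "backup_start", null())

# props.conf
[demo1]
TRANSFORMS-demoextraction = xml_extraction1
```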
Here I am using an if clause: I am saying, if my _raw contains something like "backup" (if I show you the data, there are a lot of "backup" keywords here), then give it some dummy value, "backup_start"; otherwise do not give any value. So let's see how xml_extraction1 works. That is a transforms.conf stanza, which means in my props.conf I am now using xml_extraction1. In terms of props.conf settings both demos are the same; there is no change, we are still using the TRANSFORMS class setting. The only thing is that the props.conf source type I will be using now is demo1.

So again I am on the Settings > Add Data page; I click on Upload, I select my data3 file, I click on Next, and I choose demo1. Now see here: it created a new field called event_category, and since I think almost all my events have "backup" as a keyword, the event_category field has been created with the value backup_start. So this is how you can use eval-like statements while you are doing index-time field extraction.

Let's move on to the next demo, xml_extraction8. Don't worry about this name: when I was doing the POC, this idea came long after I had done the other stuff, and I did not want to lose the sequence, so that is why it has this name. Let's see how this particular regex works first. If I copy this regex here: as I told you, I included a dummy card number in each and every event. It is not related to any of the other fields; I kept it because I want to show you this. The regex is just extracting the card number, the whole card number, captured as card_number. Now let's go back. My REGEX is now only
extracting the card number, that's all. In FORMAT I have just given a field name called card_number and the first occurrence of the regex, $1, and WRITE_META = true, as we discussed already. Now we will be discussing LOOKAHEAD. LOOKAHEAD takes a number, and if you give a number, that means Splunk will take that particular number of characters as the total event size it considers; it will not look beyond that point for this particular regular expression. Now, if I take this event, copy it, and paste it in Notepad with word wrap removed, my card number actually starts at column number 427. So if I give LOOKAHEAD = 400, well below that number, what will happen when I give my regular expression to extract the card number? It will basically fail, because this particular regex will not be able to find the card number within the first 400 characters. Let us see this first. So I'm going to Add Data (we are in the Add Data window again), go back and click Next. This time I will be choosing: xml_extraction8 I am using as demo11, so I choose demo11 here. And if you see, the regex did not extract that particular field; it failed there. Now the next one, xml_extraction10: there I am giving the proper LOOKAHEAD of 447. Let's see how that does. xml_extraction10 I am using here as demo13, so if I choose demo13, now if you see the card number, it is able to extract it. So this is the working functionality of LOOKAHEAD: whatever regular expression you are using, if you give LOOKAHEAD, Splunk will only consider the event up to that point when searching for that particular regular expression.
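The two LOOKAHEAD demos, side by side (the card-number regex is an assumption; the column numbers are from the video):

```ini
# transforms.conf
[xml_extraction8]
REGEX = (\d{4}-\d{4}-\d{4}-\d{4})
FORMAT = card_number::$1
WRITE_META = true
# the card number starts around column 427, so scanning only the first
# 400 characters means the regex never sees it and the field is missing
LOOKAHEAD = 400

[xml_extraction10]
REGEX = (\d{4}-\d{4}-\d{4}-\d{4})
FORMAT = card_number::$1
WRITE_META = true
# large enough to cover the card number, so extraction succeeds
LOOKAHEAD = 447
```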
We talked about LOOKAHEAD; now there are some other behaviors of INGEST_EVAL, so let us discuss those. The first use case of INGEST_EVAL we have seen is creating a new field, just like an eval field. Now, if you want to create multiple fields, you just give them comma-separated. If you see here, I have given event_category = the same thing (if _raw matches "backup" then "backup_start"), and then I have given event_category = event_category with some string appended. My basic intention here is to create a single field called event_category with this backup_start value appended with that string. If you notice, this is a kind of eval chain; if you remember from my eval video, we discussed eval chains, and this is the same thing here. So my intention is to create a single field; now let's see how Splunk is behaving here. xml_extraction2 I have used here as demo2, same data, so I choose demo2. Now if you see, it is basically creating a multi-value field. How do I know? Because there are multiple entries here: one event_category with just backup_start, another with backup_start plus the string. So let us see what Splunk ultimately creates: I will be clicking on Next, choosing my main index, clicking on Review, Submit, and Start Searching. Now if I just look at the event_category field (let me table it; I'll just copy the name, event_category), you see that for all events it has created a multi-value field.

Now, if your intention is not to create a multi-value field, which is the case for us here because we want to create a single field, then instead of = you have to use := . This next extraction is the same as extraction 2; the only difference is that instead of = I have used the colon
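The difference between the two demos is only the assignment operator in the second expression of the chain (the appended string is a stand-in, since the exact literal is not readable in the captions):

```ini
# transforms.conf
[xml_extraction2]
# assigning to an existing field with '=' adds a second value, so
# event_category ends up as a multi-value field
INGEST_EVAL = event_category=if(match(_raw, "backup"), "backup_start", null()), event_category=event_category." - suffix"

[xml_extraction3]
# ':=' overwrites the earlier value instead, leaving event_category
# as a single-value field
INGEST_EVAL = event_category=if(match(_raw, "backup"), "backup_start", null()), event_category:=event_category." - suffix"
```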
equals, := . In this case Splunk will create a single field: first it creates the event_category field with backup_start, then it overwrites that event_category field with backup_start plus the string. So let us see. xml_extraction3 is my demo3, so I remove the previous data, go again to Settings > Add Data > Upload, select the data3 file, click Next, and choose demo3 here. Now if you see, it is creating a single entry: backup_start plus the string. So this is how := works; this is another use case of INGEST_EVAL.

Now let us see the next one. Let me remove the word wrap. We talked about xml_extraction3; now we are talking about xml_extraction4. Similarly I have my event_category with backup_start, and this time I am intentionally trying to create a multi-value field, which is why I have not given := here. Now let's say I want to create a new field called mv_field using mvjoin. If you remember what mvjoin does: for a particular multi-value field, whatever values it has, it basically creates a new field from them, separated by whatever delimiter you give (comma-separated here). If you are not sure how mvjoin works, I have created a separate video on the multivalue field functions; please check that out. So now let's see how Splunk behaves in this case, because there is another behavior Splunk has here. xml_extraction4 is my demo4. With demo4, if you see it here, the mv_field value has only a single value. But my event_category is a multi-value field, as I have shown you before. That means somewhere this mvjoin is picking up only the first value of this particular mv field. So to
overcome this, we basically have to tell Splunk (which is treating this field as a single-value field while we are trying to use it as a multi-value field) to treat this particular field, event_category, as a multi-value field. To do that, we move to xml_extraction5. The way to tell Splunk that my event_category is a multi-value field is to reference it with :mv inside the dollar signs, as $event_category:mv$. If we do this, then mvjoin works as expected. So let us see it working: xml_extraction5 means we are moving to demo5. Now if you see, the mv_field has both of the corresponding values, comma-separated. So this is how it works; this is how we can treat multi-value fields while we are indexing and creating eval fields at index time.

Let's move on to xml_extraction6. These two, 6 and 7, are about the fact that when you are creating an eval field at index time, you can mention that particular field's type as well. Suppose I am creating a new field called event_length, which is just the length of the raw data. After the event_length field name you can give, within square brackets, the type of the field, whether it is an integer or a float. Then Splunk automatically creates that type of field as an interesting field. If you remember (if I just index this particular data, I can show you), among the interesting fields the a and # markers are what Splunk shows for a string field or a numeric field, and that is what you can govern using this particular setting. As it is a length, Splunk will detect it as numeric automatically anyway, so I will not go ahead and demo it, but this is the main purpose of these two extractions, six and seven.
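The three INGEST_EVAL variations from these demos, sketched (the expressions are reconstructed from the description in the video, so treat the details as assumptions):

```ini
# transforms.conf
[xml_extraction4]
# event_category is read back as a single value here, so mvjoin only
# ever sees its first value
INGEST_EVAL = mv_field=mvjoin(event_category, ",")

[xml_extraction5]
# the :mv suffix inside the dollar signs tells Splunk to hand the field
# to mvjoin as a multi-value field
INGEST_EVAL = mv_field=mvjoin($event_category:mv$, ",")

[xml_extraction6]
# [int] after the field name declares the indexed field as numeric, so
# it shows up with the '#' marker instead of 'a' in interesting fields
INGEST_EVAL = event_length[int]=len(_raw)
```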
So let me go back to our previous state, delete the data, and go to Settings > Add Data again. Now let us look at multi-value extraction from the event. If you remember from my previous video, when we talked about search-time field extraction (let me show you the file first), we worked with this kind of file, data mv extraction, where we have key/value pairs multiple times: this particular key field is coming multiple times in my event, and similarly the text field is also coming multiple times in the event. There we used MV_ADD = true, with some other settings, to extract key as one multi-value field rather than separate fields, and similarly text. In the index-time field extraction scenario, the equivalent is REPEAT_MATCH. So we are moving to REPEAT_MATCH, but before we see REPEAT_MATCH, let's see how Splunk works by default. If I just copy a single event here, and copy the regular expression here: if you see, I am only extracting the key and text names and their values. And in FORMAT I have given two regex references: as I said, the left-hand side of the double colon always gives the field name, and here I am extracting the field name itself with the regular expression, that is the first occurrence of the regex, $1; similarly the second occurrence, $2, extracts the value. And notice that I have not given REPEAT_MATCH here; let's see how it works. We have mv_extraction, and what is in my props.conf? This is my demo9, and in terms of props.conf we are still discussing TRANSFORMS;
we have not discussed any other props.conf settings till now. So, demo9: in this case we have a separate file, so I click Upload, select the data mv extraction file we just saw, click Next, and select the source type demo9. If you see, it is only extracting the first occurrence of the regular expression, because according to this particular transform I am saying the first occurrence of the regex should be my field name and the second occurrence should be my field value, and that is exactly what Splunk is doing here: it extracts only key=1 and does not know about anything else, since it matches only the first capturing group, match one. If I want to extract all of them, I have to use REPEAT_MATCH = true. In that case Splunk will run this particular regular expression repeatedly until it has found all the matches: it will start each run from the position where it found the previous match. So let's see it working. In this case I have mv_extraction1, and mv_extraction1 is my demo10, so I go here and select demo10. Now if you see, all my key and text values are getting extracted at index time. So this is how REPEAT_MATCH works, and it is very similar to the MV_ADD setting for search-time field extraction.
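Both stanzas from this demo might look like the following (the key="value" regex is an assumption about the layout of the data mv extraction file):

```ini
# transforms.conf
[mv_extraction]
# $1 names the field, $2 supplies its value; without REPEAT_MATCH only
# the first match in the event is taken
REGEX = (\w+)="([^"]*)"
FORMAT = $1::$2
WRITE_META = true

[mv_extraction1]
REGEX = (\w+)="([^"]*)"
FORMAT = $1::$2
WRITE_META = true
# rerun the regex from the end of each match until no more matches,
# turning repeated keys into multi-value indexed fields
REPEAT_MATCH = true
```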
Now let's move on and talk about the last one in transforms.conf, called CLONE_SOURCETYPE. This is a very interesting setting. If I give CLONE_SOURCETYPE = a source type name (any name, any source type name you can give here), what does Splunk do? xml_extraction9 is mine for this; it is my demo12, and in this case we are going back to the data3 file again, since demo12 is extracting on that one. We will see the behavior first and then discuss. If you see it here, I have a transform called xml_extraction9, where I have our normal regular expression which is extracting the fields, the FORMAT, WRITE_META = true, and one extra thing I have given: CLONE_SOURCETYPE = a source type name. And if you see what Splunk does: previously, if you remember, I had only four events; now it has created eight events. That means for each and every event, when you have CLONE_SOURCETYPE = some source type name, Splunk copies that particular event to create a new event, and then it applies the transform to that new event.

Now, why do we need this? You may come across a situation, let's say with credit card transactions, where in our system we have to keep the full, unmasked credit card data before masking it for, let's say, seven days, and then after seven days we can mask it. Or it may happen that I have a single event which I need to send to two different systems in two different ways: one without masking the data, another with the data masked. In these cases CLONE_SOURCETYPE comes in very handy, because you can apply the transform on the cloned source type. If I look at it here: this transform, if I just index the data, we will come to know. Before I index the data, notice that I have given a new source type name here, and I have created another stanza in props.conf for it, which is basically a SEDCMD setting (we will be discussing SEDCMD very shortly), which you can use to transform the data before you index it, such as the credit card data transformation here. If I just show you the data over
here: the card data, if you see, is coming as XXXX, with only the last four digits visible. So the transform has been applied to the new, copied event. If I click on Next, Review, Submit, and Start Searching, and search index=main, you see I have two source types now: new_sourcetype and demo12. If I look only at the new source type, my card numbers are already modified, masking already done; and if you see, the transforms have been applied here as well, the regular-expression transform, so location, message, and message_key_class are already extracted as per the regular expression. If I select the other source type, demo12, you can see all the card numbers are there in their original format, and also no field extraction has happened here, because if you use this particular setting, the transform is applied only to the copied event. You have to remember this: not to the original. So this is the basic fundamental nature of CLONE_SOURCETYPE and its use cases.
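The clone-and-mask setup from this demo, sketched end to end (the new source type name and the exact regexes are assumptions):

```ini
# transforms.conf
[xml_extraction9]
REGEX = <same-five-group-xml-regex-as-before>
FORMAT = thread::$1 location::$2 message_key_class::$3 message_key_value::$4 message::$5
WRITE_META = true
# duplicate every event into new_sourcetype; this transform, and anything
# configured on new_sourcetype, applies only to the copy
CLONE_SOURCETYPE = new_sourcetype

# props.conf
[demo12]
TRANSFORMS-demoextraction = xml_extraction9

[new_sourcetype]
# mask the first three card-number groups, leaving the last four digits
SEDCMD-maskcard = s/(\d{4}-){3}/XXXX-XXXX-XXXX-/g
```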
Let's move on: I think we have now discussed most of the settings, all the settings, in transforms.conf, so let's move on to props.conf. The next big setting we are talking about, from the props.conf perspective, is SEDCMD, the data masking. You have probably heard many problem statements like "how can I mask the data before I index it into Splunk?", and this is how you do it. The setting name is SEDCMD, and similarly, after a dash you can give any class name, and then you have to give a sed expression. If you saw my video on regular expression functions, I discussed these sed expressions very extensively. What it is doing, basically, is searching for this particular pattern (I have four digits, then a dash, repeating three times) and replacing that with the XXXX string. So this is the regular expression it finds in the data, this is basically the replacement string, and g means we are applying it globally: whatever occurrences we have, all the occurrences will be replaced. And as you already saw when we discussed the new source type, that setting was applied there, so the card numbers were already masked. This is the use case of this particular thing, and it can be applied to any other data-masking need as well.

Now let me go back to my discussion notes to check whether I missed anything. We have discussed almost everything. MATCH_LIMIT: let's talk about MATCH_LIMIT. I think we already saw it in my previous video for search-time field extraction, and it has the same meaning here. It is for the regular expression engine, basically: the total match limit, and DEPTH_LIMIT the total depth limit, meaning how many times the regular expression engine will call the match function. Generally we do not change these; the internal default is already large, unless it's actually needed. Apart from that, there are certain settings like DEFAULT_VALUE: you give a default value for the case when your regular expression fails, and Splunk will create the field with this particular default value. And SOURCE_KEY: here, by default, we are applying the regex on the _raw field, but with SOURCE_KEY you can define another field. Let's say you have one field which is already getting indexed; you can give that field name to extract some more data, another field, from that particular field. In that case SOURCE_KEY comes in very handy.
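A hypothetical stanza combining those remaining knobs (the field names are invented for illustration; check transforms.conf.spec for the exact semantics of each setting):

```ini
# transforms.conf
[card_last4]
# SOURCE_KEY defaults to _raw; point it at another key to run the regex
# against that data instead
SOURCE_KEY = _raw
REGEX = (\d{4})\s*$
FORMAT = card_last4::$1
WRITE_META = true
# value written for the field when the regex fails to match
DEFAULT_VALUE = unknown
# PCRE engine limits; the defaults are already high, change only if needed
MATCH_LIMIT = 100000
DEPTH_LIMIT = 1000
```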
So I think we have discussed all of the stuff here. Hopefully this will be helpful for you guys; see you in the next video.
Info
Channel: Splunk & Machine Learning
Views: 7,518
Keywords: splunk, how to, field extraction, index time field extraction, props.conf, transforms.conf, TRANSFORMS-class, SEDCMD-class, CLONE_SOURCETYPE, INGEST_EVAL, REPEAT_MATCH, WRITE_META, LOOKAHEAD, REGEX, FORMAT, MATCH_LIMIT, DEPTH_LIMIT
Id: u5gcVKymwwI
Length: 43min 23sec (2603 seconds)
Published: Fri Mar 01 2019