Splunk Commands: Discussion on the tstats Command

Captions
Okay, in this video we'll talk about the tstats command in Splunk. What the tstats command does is perform statistical queries on indexed fields — that is, on tsidx files. So before we go deeper into the command, we need to understand what tsidx files are, where they are stored, and in which cases they get generated. Tsidx files are time-series index files: Splunk creates them from each unique term in our raw data, with a corresponding reference back to the events. So whenever Splunk runs a search — suppose you run a query like index=something followed by some keyword — internally it first looks into the tsidx files to see whether there is a reference for that keyword or not. If there is, it takes the reference to the raw event from there, and then picks up the raw event from disk. That is how the whole operation works. The tsidx files and the raw event data together form what we see as an index (or index bucket) in Splunk. So whenever we talk about a Splunk index, it's basically a combination of tsidx files and raw event data files, where the tsidx file works as an index file — it maps each and every unique term in our data to the raw event data. Tsidx files are also generated when we do data model acceleration. A data model is a separate Splunk knowledge object: whenever you accelerate a data model, Splunk internally creates a separate kind of index data for it. We will see how a data model is different, and if you want to know more about data models, I already created a video on data models previously — I'll be giving that video link here as well, just wait for that one.
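As a rough picture of the two pieces just described, this is the typical on-disk layout of an index bucket (paths and names are the usual Splunk defaults, shown here as an illustration rather than taken from the video):

```text
$SPLUNK_HOME/var/lib/splunk/<index_name>/db/
    db_<latest_epoch>_<earliest_epoch>_<bucket_id>/
        rawdata/            # compressed raw event data (the journal)
        <id>.tsidx          # time-series index: unique term -> event references
```

A search first consults the .tsidx lexicon to find which events contain a term, then fetches only those events from rawdata.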
Okay, so we got a fair idea of what tsidx files are. Now, what data can we query using tstats? Essentially we are asking: what are the different sources of tsidx files? As I already discussed, normal index data always has tsidx files. There was another option — data manually collected with the tscollect command — but from Splunk 7.3, I think, the tscollect command has been deprecated. And the third source is accelerated data models. So we'll mainly be concentrating on normal index data and accelerated data models in this particular video.

Now let's look at one tsidx file — I just wanted to show you this. If I go to $SPLUNK_HOME/var/lib/splunk, this is the Splunk DB directory where all the indexes are stored. Let's take defaultdb, which is our main index; inside it there is a db folder. Inside the db folder Splunk creates separate folders for different time ranges — those numbers are epoch time values — and inside each of those folders there is a .tsidx file. This is the file Splunk creates when you ingest data into the index; based on the data available in the index, these tsidx files get generated. If you want to see the content of a tsidx file, there is a way to do it — this is a somewhat advanced Splunk concept, so I'll show you. From the command prompt, first go to the $SPLUNK_HOME/bin folder, and from there run a command called splunk cmd walklex — this is the tool to see the content of a tsidx file. You have to give the whole path of the tsidx file, within double quotes, followed by the term to look for (an empty pair of quotes shows everything).

If you look at the output, this is the content of the tsidx file. In my index there are twelve events — some JSON events and some non-JSON events as well — and for each unique term it shows a key/value entry. So this is how the tsidx file gets generated, and it's good to know how to see the content of this file, which is why I just wanted to show you this one. Now let's get back to our main discussion on the tstats command. Basically, the tstats command works on these tsidx files only — it refers only to the tsidx files and does its statistical computation on them. Now if you see the syntax, tstats has a lot of options; what I did is color-code them, so options with the same color are related. The red-coded stuff is what we generally don't touch — that's why I color-coded it red — and the green-colored parts are the only required parts of the whole command; the rest is all optional. So if I just go to our search and write a command like | tstats count: tstats supports a lot of aggregate functions, and the stats function is the mandatory part. These are the stats functions supported by tstats, and if you've seen my stats-functions video, I already discussed most of them there; they work in a very similar way, so I

won't cover all of those functions today — I'll be covering some of them. Now if I just write | tstats count, it returns those twelve events, and by default, if you don't mention anything, it works on the default index. Now there is one thing you have to remember here: since we're talking about tsidx files — data that is already indexed — only the fields available at index time are usable. Tstats will never work on fields created at search time; search-time extracted fields will not work with tstats. This is the thing you need to remember. Fields like host, source, and source type will definitely work with tstats, but if you create a field using rex or something similar, that will not work — I'll show you an example of that as well. But before that, let's see the syntax and how it works. One more thing I wanted to mention here: as tstats works only on the tsidx files, it is considerably faster than the stats command, because the stats command generally works on the raw data — on the output of the search command. If I just show you something like index=_internal | stats count by source, this stats command is actually working on the output of the search in front of it, and that search is not working on the tsidx files alone — it is working on the raw data. In the Splunk search prompt, whatever search you run, there is always an implicit search command at the front, and that

search command is actually working on the raw data to fetch the events — that's why the stats command is much slower compared to tstats. We'll also see the limitations of tstats, so that we can definitely understand which command to use in which scenario. So we saw one use of the tstats command: accessing a normal index. Now, if you want to mention the index, what you mention is the where clause: | tstats count where index=_internal — now it is working on the _internal index. So to mention an index you need to use the where clause; if I just go to the diagram, the where clause goes here, and after the where clause you can mention a field-value list as well. For example, with index=_internal, say I want only these two source types — splunkd and, I think, the health metrics one — you can write sourcetype IN ( ... ) with the two values inside the brackets, so you have the option to pass them like that as well. And if you want to use the by clause, just like we use it in stats, you can do that — the clause name is by. If I just give by sourcetype, you can see only those two source types have been filtered and the corresponding counts are shown. We discussed the by clause as

well as the where clause. And in the tstats by clause you can even give time — that is, _time — and with time you have the option to give a span as well. So if I just give a span of, say, two days, and we search here for the last three days, even that works. So why do we need the where clause? Just to filter out which indexes we want — otherwise, if you don't mention anything there, tstats always works on the default index. Now there is an important clause called from. Here you can mention the main sources of tsidx files: a namespace (a tsidx file location), a sid (a tscollect job ID — which, as of Splunk 7.3, is deprecated), or you can give a data model name. But before we discuss the from clause, let me show you another thing. In index=_internal there is a field called group, and this particular field is not indexed. Now, this question came to my mind when I was researching this topic: in Splunk, how do you know whether a particular field is indexed or not — whether whatever field is showing up came from a search-time field extraction or an index-time field extraction? There is no direct way to see it in Splunk, and I think I found my answer: by using the tstats command. Suppose I write index=_internal | stats count by group — it always gives me results, because there are a lot of group values in there. Now if I do the same thing with tstats — | tstats count where index=_internal by group — and run both over the last 24 hours, then, since that particular field is not an indexed field (it is not part of the metadata), tstats will

report that the field was not found, while the normal stats command always gives us the output. So this is the basic difference between tstats and stats: the stats command works on both index-time extracted fields and search-time extracted fields, but tstats will only ever work on index-time extracted fields. Now let us concentrate on the data model part. If you see the color coding for the data model options, there are a couple of related options — allow_old_summaries and summariesonly, each set to true or false — and these come into the picture when we have a data model in our from clause. One thing to remember is that when a data model comes into the from clause of a tstats command, tstats can access tsidx files only when the data model is accelerated. When the data model is not accelerated, no tsidx file has been created by Splunk, so in that case, if you apply the tstats command to that particular data model, it works as a normal search — you will not get the performance benefit of the tstats command. To show that, if I just go to Settings > Data models: when you install Splunk, two data models are created by default — one for internal server logs and another for internal audit logs. If I go inside the internal server logs data model, there are a lot of nodes in it — scheduler, acceleration, license usage, and so on. If you want to know more about data models, I already created a video for that one; please refer to it — I'll be giving that video link here as well. And this here is the data model's logical name.
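As a quick sketch of the from clause just described — internal_server is the logical name of the built-in internal server logs data model, but treat the exact name as approximate to your environment:

```spl
| tstats count from datamodel=internal_server
```

At this point in the demo the data model is not yet accelerated, so a query like this falls back to searching the raw events rather than reading tsidx summaries, and gives no speed advantage.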
Now if I run a command like | tstats count from datamodel=<data model name> — those are the two main pieces we need — it gives you a count. But in this case the data model is not accelerated: if I go back here, you can see acceleration is not turned on. So in that case the performance of this tstats query will be the same as running a normal search on the data — something like using the datamodel command, | datamodel <name> search, piped into stats count to get the whole count; the performance will be similar to that. It is only when we accelerate a data model that we actually get the performance benefit of the tstats command. Now I just wanted to discuss this option, summariesonly=<bool>. When you accelerate a data model, a data model summary gets created — that is, tsidx files have been generated for the data model — and with this option you say you want to access only that summary data. So if I set summariesonly=true here — and mind you, this particular data model is not accelerated yet — this should not give me any data. And if you see, it is giving me count=0, because we are only trying to access the summary data, and as this data model is not accelerated, this query returns nothing. We will see, after accelerating this data model, how this same query gives a result.

Now let me discuss the other one, allow_old_summaries=true|false. Suppose you have created your data model, accelerated it, and then at a certain point of time you change its definition — say you add some more child elements. After you change the data model and enable acceleration again, the rebuild means there is effectively an old summary and a new summary. Now, if you still want to access your old summary, you have to use this particular option, allow_old_summaries=true: in that case data will be fetched both from the new, changed summary data and from the old summary data. But be very careful here — do this only if you are sure that your old summary data is still good enough for the report you are generating. So we talked about some of the options; actually, most of the options relate to data models. Two other options: local=true|false — if you make it true, the query will mostly run on the search head, otherwise it runs on the indexers — and chunk_size, which is the size of the tsidx data Splunk reads at a time; ideally you should not be changing this one. Now there are two important options left: prestats and append. What is prestats? As the name suggests, if you set prestats=true, Splunk generates the output in such a way that it can be consumed by the statistical commands like timechart, chart, and stats — internally Splunk creates a data structure, or output, something like

that. And append=true — we will see that use case as well today. But before that, let me turn on acceleration for this data model: just go to Edit Acceleration, set a summary range of, say, seven days, and save. It is currently building the summary — if I look here, the status shows "Building", along with how much space it is taking; all this information shows up here. And notice that with tstats we are basically taking a trade-off between CPU usage and disk usage: our searches get faster, but the data needed for them takes more space, because the summary tsidx files take additional space on top of the raw event data. So we'll come back to this accelerated data model, because we will run that search again to see how summariesonly=true works. Meanwhile, let's work in another search window — we were talking about prestats and append. As I said, prestats creates output which is consumable by the statistical functions. For that, there is a use case — let me show you the data model here. This data model has a child node, splunkd_access, which is basically capturing all the access logs splunkd is writing, and under it there is a child called job endpoint, which filters only the /services/search/jobs URI paths.
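A sketch of how a specific node in that hierarchy is addressed with tstats — the node names below follow the demo's description (server as the root node, splunkd_access and job_endpoint as its children), but treat them as approximate. Each query is run separately and counts events at its own level:

```spl
| tstats count from datamodel=internal_server where nodename=server.splunkd_access

| tstats count from datamodel=internal_server where nodename=server.splunkd_access.job_endpoint
```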
So suppose I want to see a report of the total number of REST API calls and, among them, the total number of job API calls — how do I build that with the tstats command? That's the use case here. The first thing is we need the count for the total number of REST API calls, which means we need to access the data model: | tstats count from datamodel=internal_server — that's our data model name — but we need to access its splunkd_access node, which sits under the root node server. The way we access a node of a data model in a tstats command is the where clause with nodename: where nodename=server.splunkd_access — server is the logical root name (sorry about the noise, there's a dog barking outside) and splunkd_access is the node. If I run this for the last eight days — I know there were job accesses in that window — I get the whole count. Now I'll take this count and create a new field, let's call it count_for; I'll tell you why I'm creating this one — it's required for the whole logic. And here is the catch: generally tstats is a report-generating command, which means it has to be the first command in the search string. But if you mention prestats=true, it becomes an event-generating function, which means it can also appear in a later part of the search string —

which is what we'll do now. So we run another tstats command to get the count at the job endpoint level as well: we copy the same nodename and add .job_endpoint. What the first tstats command is doing is calculating the overall count at the splunkd_access level — the REST API level, how many calls there were overall — and the second tstats command is calculating the count at the job_endpoint level, that is, how many search job endpoints have been called. Now, I need both of these counts, which is why on the second tstats I have to give append=true. This is important: I need to build a report with both counts, and that's why append=true is there. After the first tstats runs, I set count_for="total" — so whatever data Splunk generated internally carries count_for="total". Then the second tstats appends new rows, which don't have count_for yet. For that I write this very simplistic piece of logic: I check whether count_for is null, and if it is, I replace it with a label for the job-endpoint rows — for those appended rows it is still null, because that is the new data added to the whole data set — while for the rest, count_for stays "total". Now, I just wanted to show you how the data looks internally at this point: it is in a kind of format that is not really readable by us. But if I now add chart count by count_for, it becomes much easier: the total

and the search API counts come up properly. Even though the internal representation of the prestats data is not readable by us at all, it is consumable by the chart command — that's where prestats=true comes into the picture. One thing we have to remember here — and there is an analogy with the summary index: if you remember my summary index video, the way we push data into the summary index is the same way we read data from the summary index; the summary index works like a black box. It's a similar concept here: the statistical function we use to generate the data should be the same statistical function used by the command consuming the data. That's important, and that's how the prestats and append stuff works. Now, let's see whether our data model summary has been computed — yes, acceleration is 100% complete. If I run this query again — previously it was showing me a count of zero — it should now show me a count, because of summariesonly=true; I just wanted to demo that. Now let's go back to our diagram. We talked in great detail about all the different syntax options of tstats. As I already told you, it's faster than stats because it works on the tsidx files, while stats works on the raw data. One of the disadvantages, as I already said, is that tstats only works on indexed metadata — it cannot work on fields created by search-time field extraction — and it uses more disk space; that's the trade-off we made, getting faster access by using more disk space. As for the

limitations: tstats is really good at aggregating values and reducing rows. That means if, in a tstats command, you group by _time and give a span of just one second, and your events are generated every second, you are essentially expecting each and every event back — you are not reducing any rows at all. Even though you can write a tstats command to do that, it will actually hamper tstats performance. For example — and there's an example here as well — if you have 10 million rows in a data model and your tstats groups everything by time with span=1s and returns 8 million rows, that is not a good way to use tstats; but if you have 10 million rows and it returns only a thousand rows, then it's definitely a good use case for tstats. Also, the tstats command does not support wildcard characters in the by clause — you cannot give a star in a field name there — and we cannot use complex eval statements inside the aggregate functions, like count(eval(...)); that is not supported in tstats. Hopefully this video was helpful — we had a brief, or I think quite detailed, overview of the tstats command; it is one of the important commands in Splunk. In future videos we will see more Splunk commands as well. See you in the next video.
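Putting the prestats/append walkthrough together, the full report query from the demo would look roughly like this — the data model and node names follow the demo's description, and the count_for labels are illustrative stand-ins for whatever names you choose:

```spl
| tstats prestats=t count from datamodel=internal_server where nodename=server.splunkd_access
| eval count_for="total"
| tstats prestats=t append=t count from datamodel=internal_server where nodename=server.splunkd_access.job_endpoint
| eval count_for=if(isnull(count_for), "search_api", count_for)
| chart count by count_for
```

The first tstats tags its prestats rows "total"; the appended rows from the second tstats arrive with count_for null and get labeled "search_api"; chart then consumes the prestats data and renders one count per label.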
Info
Channel: Splunk & Machine Learning
Views: 8,723
Keywords: command, spl, splunk, tstats
Id: NLPlIiHS1OU
Length: 36min 46sec (2206 seconds)
Published: Tue Jul 23 2019