Index Lifecycle Management work-with-me and AMA

Captions
Jay: Hello everybody, welcome back to the lunch session live stream for our Elastic live user group. I'm your host, Jay Miller. Apparently one of our guests has a motorcycle behind them, but that's okay. I'm happy to have with me two wonderful guests from the Elastic team to talk about a topic I wasn't all too familiar with until we started planning this. I feel like I still don't know all that much about it, but at least I can do a little better at explaining it. My guests are Josh and George. Folks, how's it going?

Josh: Doing well, thanks. How are you doing?

Jay: Pretty good. What about you, George? I think you're still muted, so maybe that motorcycle's still happening. We'll let George figure that out and then just go on. We are talking about index lifecycle management this week. Josh, let's kick it off with a good introduction: what is index lifecycle management, or ILM?

Josh: Sure, thanks for that introduction. Real quick, let me tell you who I am. I'm one of the support engineers here at Elastic. I've been here for a couple of years and was a customer before that, so I've really developed a love for the Elastic products, and I'm very happy to come here and share that with you. As for ILM: Elasticsearch is built on indices — very small pieces of data saved into larger pieces, which finally join into an index. ILM is a policy for moving these indices around to different nodes with different retention periods. You keep the data you need on what we call hot nodes — your more performant nodes — so you gain that performance when you're writing to and reading from the data. As the data is no longer needed, it can move to less performant nodes: warm nodes, cold nodes. You don't search it as much, it's no longer being written to, so you don't really need the performance there. It just moves along, and finally you reach the delete phase if the data is no longer needed. It's really just a policy to keep your data for as long as you find it relevant.

George: Just to add to what Josh said: when you first start with Elasticsearch — playing around with it, evaluating it, or just curious about it — you'll get data ingested, start looking at Kibana and viewing that data, and then, maybe after a week, maybe longer, you quickly run out of disk. Your dashboards aren't working, you're not able to ingest data — what's going on? That's typical, because when we talk about the lifecycle of data in Elasticsearch, you need an idea of how that data is going to be managed, how it's going to flow, and, as Josh was saying, how you're going to match the value of your data with your assets. Take observability data: for the first data that arrives on the cluster — an alert, an issue — you want to be alerted right away, and that's why you want your most performant nodes there: SSD drives, a good amount of CPU and memory. As data ages it becomes less searched, and thus less valuable, so you want to match it to cheaper resources. Traditionally you'd go to a warm architecture — something that might be an SSD-fronted cache, or maybe even spinning disks.
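A minimal sketch of the kind of policy Josh describes, built here as a Python dict. The policy name, timings, and sizes are illustrative assumptions, not recommendations:

```python
import json

# Hedged sketch: an ILM policy that keeps an index hot while it is being
# written to, moves it to the warm tier seven days after rollover, and
# deletes it after thirty days. All names and timings are made up.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over to a fresh backing index once it is a week
                    # old or a primary shard reaches 50 GB.
                    "rollover": {"max_age": "7d", "max_primary_shard_size": "50gb"}
                }
            },
            "warm": {
                "min_age": "7d",  # measured from rollover
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

# A body like this would be sent to the cluster as:
#   PUT _ilm/policy/my-logs-policy
print(json.dumps(policy, indent=2))
```

In Kibana you can build the same thing visually under Index Lifecycle Policies, which is what the hosts demo later.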
And as the data continues to age even further — maybe it's read-only data, or data you need to keep for regulatory reasons, or, say, in case of a security breach (from an enterprise standpoint you might want to keep that data for 18 months) — how do you then align resources with the value of that data? Data management, or index lifecycle management, is all about that "how," and for how long. That's the journey data takes through its lifecycle.

Jay: One of the things that made it hard for me to wrap my head around this is that I tend to work with finite data sets. I take a CSV file, or a series of CSV files, and I upload them one time, and then I'm working with eight or ten million records from six or seven CSV files. It finally clicked once I implemented a Logstash solution, or Beats, and it was like: oh, ten million is nothing — that's just gigabytes. What we're talking about is effectively scaling this system to terabytes of data, billions and billions of records. How soon should you be thinking, "this is going to be an ILM kind of thing I need to worry about"? Or is there never such a time — you just start out always doing ILM, always having that policy?

Josh: Ideally we'd like everybody to start with that thought in mind. But when I first started playing with Elastic and spinning up clusters, one of the first things I read was: your first cluster is just going to be terrible. It's going to suck. People don't know the intricacies and how to structure this data, so go in with the mindset that you're going to learn through iteration — you'll put up another cluster, learn from your mistakes, implement ILM, and be able to save your data for longer periods of time and then remove it over time. Ideally, though, we'd like people to think about these questions before they put up a cluster. Look at what could potentially come in and build your ILM for the best-case scenario, because if you need the data and you don't have it, you've wasted all that time.

George: I think Winston Churchill said that hope is not a strategy. What I mean is that everyone has the best intentions: you get a chance to play, learn, and explore. Then, as I said earlier, usually the first pitfall is "oh, my disk is full" — on a container or whatever you're running it on — and you quickly realize you need to manage this data. So if you haven't tried the stack or Elasticsearch yet, start thinking about what you want to start off with. What I personally see when I speak to people is that seven days is typically how long data stays on your hot nodes — though it really depends, because the audience listening to this could be very varied, from someone who just wants to play with the software to people who are very experienced, maybe part of an enterprise looking to implement this. Regardless of where you are in that range, you can start thinking: I've got about x amount of data; how much am I ingesting, what's the value of that data, and how long do I want to retain it before I ultimately delete it — or can I save it? One thing we'll show as we talk, as Josh mentioned, is that you can see ILM policies within Kibana and also in Elastic Cloud, and you have real flexibility in defining what they are. So, Jay, to answer your question in a long, roundabout way: yes, I would start by thinking about what you're trying to accomplish. If you just want to evaluate, keep data for maybe seven days — think about how much you're ingesting, figure out how much disk space you have, and just set the next phase as delete; you don't need the data after that. As you continue growing in experience, you'll understand, okay, I need to move this data — and how do we do that? We'll talk about some of that in the TCO piece as well.

Jay: It definitely sounds like one of those questions you should be asking when you're about to undertake a new project. And speaking of questions, this is an AMA, so I do want to throw it out to the folks watching the live stream: if you have questions, put them in the comments, put them in the chat. I can't promise we'll answer every single use case here, because we all know every use case gets answered with "it depends" — but if you do have some of those "it depends" questions, we have a great Discuss forum you can head out to, and we have some other videos that show some examples similar to
your situation. You can find those on this YouTube channel — make sure you subscribe — and also make sure you're connected with our community over at community.elastic.com. Riffing on that same idea of questions: a lot of my friends are consultants. They're the kind of people whose job starts when a company — say, acme.com — says, "I want to build this new architecture, I need to build this new infrastructure." How hard is it to implement ILM from scratch versus going back and trying to retrofit an existing infrastructure with ILM? How hard is it to go back?

Josh: So you already have the infrastructure in place; you've already got your Elasticsearch data there. If it's built properly and you already have it sized, you can add policies to your cluster and then apply those policies to your indices. It's really a simple move from infrastructure that's already in place to start adopting ILM policies.

George: To add to that: in those consulting conversations — when I'm in a consulting-type role, I like to answer questions with questions, just to make sure I have the whole scope — retention is an open-ended question. It really varies across industries and use cases, and depends on whether you're just playing with it or actually implementing it. The one thing I like to bring up is: ask the business itself. Let's say it's security data — we're gathering metrics, logs, security event data, maybe some other telemetry that builds out the data supporting your SIEM or security use case. You can ask the business: if for whatever reason we need to go back to this data for security investigations, how much data do we need to keep? Having those conversations with your stakeholders gives you a real answer: "legally, we need to be able to search this data for a maximum amount of time." I think a lot of the industry bases retention on other vendors' limitations — I see a lot of people saying "90 days," because that's all those tools can hold. For all intents and purposes, Elasticsearch will scale way beyond terabytes, into petabytes worth of data; it's just a matter of matching those resources to how valuable the data is and how readily available it needs to be. So look at your retention period — and of course that's my next-door neighbor deciding to do some construction right now, but hey, that's the thing about consulting: unexpected things come up. Maybe the business says, "we want to keep a year's worth of data." Obviously, if you keep all of that on SSD storage and hot nodes, it's economically infeasible. So how do you get that data to match those assets? We'll go into that a little, but have those real conversations: how much data do we need to keep, and what is our budget based on that — what am I actually allocated? The business gives you the requirements, and then you can match them: that means I can keep this data here for a certain period of time and then transition it out. Hopefully that gives a little more clarity on having those conversations — first with your stakeholders, asking what do I actually need, and how long do I need to keep it for.

Jay: One thing I thought about as you were saying that was what it means to retain data. You see these security breaches that happen because someone says, "we have the last 50 years' worth of data stored inside this database that no one is querying" — except for the few times you need to find something, and those few times are when you should have archived it. I feel like what ILM does is give you a good reason to say: this stuff doesn't always need to be accessible right then and there. That can be part of your strategy too: what's the natural lifecycle of this data, and how long will I need it at hand for 90 percent of my use cases? For the other 10 percent, let's look at reviewing that data in another format — maybe saving it as a snapshot and loading that snapshot when we need it, versus always having it available. That way, any data intrusion that may happen — because it does happen, we have to acknowledge that — at least can't collect all of the data at once.

George: Actually, Josh, I had a question for you that we touched on but you haven't had a chance to answer. From the customers you speak with, what retention periods do you see — kind of min and max?

Josh: As with everything — and you've already said it — it really depends. It depends on the data type, it depends on policy, it really depends on what their
use case is. The majority of them, like you said, keep seven days hot, and that's pretty much industry-wide: everybody wants seven days of data quickly. From there, it depends on the type of data. Metrics you're not really going to need beyond the first few days, so those are gone — they're unusable. Web logs, though: when I was a customer, we wanted to keep those year over year, because we wanted to see the previous year's busy season and compare it to this year's. Some of those questions are what you need to take back to your customers. So: seven days hot, 30 days warm, 90 days cold and frozen — but in my case it was year over year, so it was 18 months frozen, and then it's gone. The problem was that the frozen tier didn't exist at the time, so we were trying to save space as much as we could.

Jay: We've kicked out some terminology, so let's give quick definitions for those just getting into the space. What is hot versus warm versus cold versus — Elsa, as I call it?

Josh: Hot is your newest data, the data being written to. It has replicas; you want to be able to write and query as fast as possible, so it goes onto your most performant nodes — a lot of RAM for your heap, SSDs for storage. It just performs as fast as possible.

George: And typically, to match that, it's the most valuable data — in security or observability use cases, that's typically the most valuable data you have. Elasticsearch is near real time, and what I mean by that is: yes, there is going to be a delay, and in a lot of ways that delay is configurable, but we're talking seconds before the data is in the cache and ready to be searchable. So if you're seeing high CPU on the metrics side, or you're receiving an error message you feel is critical and you have alerts configured inside Kibana, you want to know as soon as possible. Whether it's observability or security — a weird event, some correlation, say access to a resource from an unknown IP address that's never appeared before — those are actions you want to be alerted to right away. So having that data hot, as in Josh's example, is critically important. That's the value of that data.

Josh: And you really want to take the time to plan out your hot data, your hot infrastructure, as much as possible, because it's also the most visible if it's not planned out well.

George: It's also the most expensive, in most cases.

Josh: It is, definitely, by far. So that's hot. Warm is very similar — it's still very performant, but it's no longer being written to. You've been hot for seven days; once you're no longer writing to that data, you move it to warm. You can still have replication for the read throughput, but since you're no longer writing to it, it doesn't have to be as performant. You can gain some read performance by adding replicas and whatnot, but it's not as expensive, and it's not as big an issue if it's a little slower, because it's not as visible.

Jay: Really quick — George, we have a good question in the chat, and I'm going to throw it up here: "Can we define the ILM so that the data landing starts in warm rather than hot?" I'm interested in this because when it comes to hot, I want an app to be so responsive that people searching on a Yelp-like app — looking for a location, looking for stuff — get things moved to where they are based on recency. Whereas stuff that gets added may not be required right then and there: maybe I'm just adding it to add it, and we can move it to hot when we need to.

George: Yes — and Vikram, my apologies if I mispronounce your name — if you're self-managing Elasticsearch, installed on a laptop, a server, whatever, you really have the ability to choose the hardware profile for hot. Even though Jay, Josh and I talk about "hot," it's kind of an abstract definition: say you don't have the budget, or your use case doesn't really call for SSD drives — maybe your hot is spinning disks, or disks that are less performant, with less CPU and memory. When we say hot, that's what we've seen in experience with larger enterprises, but for budgetary or use-case reasons, the tier your data lands in as it comes in can be whatever you want for the underlying hardware. Even if you're using GCP instances, EC2 instances in Amazon, or Azure, you can decide what type of instances will be your hot tier. Now, we've talked about warm, but we haven't really talked about cold and frozen, which definitely add more reduction in total cost — the TCO. That comes into the territory of some licensing, and we want to be up front when we talk about that, because some of it is paid features, but it's worth understanding the full breadth of the options.
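Vikram's question — making newly ingested data land on the warm tier rather than hot — maps to the index-level tier preference allocation setting. A hedged sketch; the index name is hypothetical:

```python
import json

# Hedged sketch: new indices normally prefer the hot tier. Setting the
# tier preference to list data_warm first makes the index allocate to
# warm nodes when any exist, falling back to hot. The index name is
# made up; the same setting can live in an index template.
settings = {
    "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}

# A body like this would be sent as:
#   PUT my-index/_settings
print(json.dumps(settings, indent=2))
```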
George: There are also some other concepts that are still free and open, and we can discover some of that too.

Jay: You mentioned TCO — I'm guessing that's total cost of ownership?

George: That is correct. Not tacos — don't confuse them.

Jay: That is a good point, though. As Ricardo mentioned in the chat, hot just means most accessed — it's the thing that's up and running, in your face, always there. I do wonder, though: Josh, you mentioned warm is a lot like hot except it's no longer being written to — it's kind of a read-only state. So could warm be the most frequently accessed, and hot not be that space?

Josh: I think what you're talking about is a different thing — we're getting more into data tiering there. With ILM, warm is never written to, but ILM is just a policy; it just tells the data where to go. Below that, it uses settings on the hosts to define where you can move your data, which I believe is more what this question is about: we want to move the data over to the warm nodes first instead of the hot nodes.

George: I think hot and warm are industry terms in a lot of ways. Hot is synonymous with expensive and most accessed; warm is traditionally a step down — not accessed as frequently, as Ricardo said. And then cold, or even frozen, is less again. You could think about performance along that same temperature gradient. But we're in conversations with customers who just want to do hot, and after seven days they still want to keep the data, maybe for regulatory reasons or whatever, so that data can go directly to cold. You don't have to go through all the temperature gradients — hot, warm, cold, frozen — you can go from hot directly to frozen. That also greatly reduces the TCO, because the warm nodes, for example, keep replica data on the nodes themselves, and you might not need that. There are a lot of different options, and it's about really thinking through your use case. If I've got a fixed budget, as all of us will: what is my budget? Oh, I can keep more data if I just do seven days hot and then, say, 18 months frozen. I know there might be a slight performance hit, but it's still going to be a lot quicker for me than taking a snapshot and restoring from backup. There are a lot of these questions about accessing the data — when you need it, and when you need to search it.

Josh: I probably need to backtrack a little on something I said: you can still write to warm nodes, it's just not as common. You try to keep the writes and updates off of them.

George: And, by contrast, cold and frozen are specifically read-only. Here's another conversation we haven't really gotten to yet: we also have customers and users who focus on how fast they can index or ingest data into Elasticsearch. They may have nodes dedicated just to how fast they can ingest data, and other nodes focused on how fast they can search it. When you look at Elasticsearch, there are a lot of tuning knobs — it's almost like a Boeing 737 cockpit; you might turn a knob here and there.
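George's point about skipping tiers — hot for a week, then straight to a long, cheap retention — can be expressed directly in a policy, since ILM phases are optional. A hedged sketch with illustrative timings:

```python
import json

# Hedged sketch: a policy that skips the warm tier entirely. Data stays
# hot for a week after rollover, moves straight to the cold tier as
# read-only, and is deleted after roughly 18 months. All names and
# timings here are illustrative assumptions.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {"rollover": {"max_age": "7d"}}
            },
            "cold": {
                "min_age": "7d",    # measured from rollover
                "actions": {"readonly": {}},
            },
            "delete": {
                "min_age": "540d",  # ~18 months
                "actions": {"delete": {}},
            },
        }
    }
}

# Would be sent as: PUT _ilm/policy/hot-to-cold-policy
print(json.dumps(policy, indent=2))
```

Note there is no warm phase at all — ILM simply moves the index from whichever phase it is in to the next phase the policy defines.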
We've got a lot of great guides on best practices for fast ingest speeds versus fast search speeds, but generally, when we say hot — at least I specifically do — we're lumping indexing and search together within that hot category.

Jay: Okay, and we've still got those two areas to look at, so let's jump into cold. I love that George mentioned before that once upon a time frozen wasn't a thing — and I don't mean the Disney movie. We kind of know what hot and warm are doing, so let's figure out cold's role in this realm. Where's cold at?

George: For the people that can't let it go — they can't let data go... Frozen? No?

Jay: Yeah, I got it, I got it.

George: So, there's cold. I'm going to add something that hopefully won't conflate what we're talking about, but I think it's important for the people we're speaking to — whether you're using Elasticsearch free and open, or you're licensed — because we can talk about cold in two different contexts. One way is the free and open route: with the basic license you can use data tiers to move your data, using ILM, from hot to warm to cold, and as a user of the software you define what cold is — maybe going from SSDs to spinning disks, something where you're going to take a hit on query performance because the hardware is not as performant. You can do that today; you can define your hardware profile for cold. That's the abstract way. Now, there is a feature that is licensed, so
this kind of comes into the realms of this is something that you would you would purchase from us that where if you're using what's called searchable snapshots will allow you to search snapshots that you have taken so traditionally snapshots were used for you know you know if you wanted to back up your data or data that you wanted to archive and then maybe you wanted to restore later to search you know so there's a lot of solutions out there like if i if i know that i'm searching my data for the first seven days and then after that you know i want to archive let's say on uh on a object storage like s3 or maybe um a minio like a s3 compliant storage device that's locally you know you could do that and then you can say hey i know this data is approximately a month ago i'm going to restore my data from snapshot and kind of rehydrate that into my cluster and then i'll be able to search that data um but searchable snapshots um and it works with the you know ilm or you know moving that data into that allows you to search your snapshots without you rehydrating the data so basically what it what it does is it kind of decouples storage from compute so traditionally elasticsearch you know when you install a node it's the storage is attached and and then you use the storage of the local no the local storage there to run but by decoupling the compute and storage then you can start looking at different storage tiering options like for example s3 so you know the cold allows you uh to keep the primary shards which is uh you know the primary uh data shards um uh you know is is used for for search but the replica shards used it also could be used for search but in this case when we use cold the replica shards are now sitting on um s3 backed up uh on snapshots so if there's a uh uh like a hidden uh if there's a issue with the data you know you could restore that particular replica data into your primary shards and you won't lose any any data and by the cold data by using this 
license feature you're able to store a lot more data at the same cost as your normal data because instead of saying storing your replica and primary shards together you're only storing your primary darts on shards on local data on local storage and then you have your replica data on on s3 much lower cost uh you know options so with cold you could think about and this is only read-only date so data that you're not going to change and modify and move around so the the the cold data tier using you know searchable snapshots allow you know really reduces your tco because you're able to search right and store um twice the amount of data as what you normally would do and then there's a lower tier before i go that um i'll make sure you know if you have any questions jade um or maybe kind of get from audience but uh but from but even there's a lower there's even a further tier called frozen which we can kind of talk about as well but that cold tier itself you know gives a lot of opportunities to say hey i still want to have some you know very performant my data's on local storage it's it's going to be very fast but i still want to have my data being replicated or being protected and i'll have that replica data stored on on on lower cost storage so hopefully that kind of makes sense sure josh you have anything uh yeah and also on top of that we have engineered this though to still be very performant so it might sound like it's it's going to be slower but it is still a very performant search so it's going to be um it's still going to return your results almost as quick as as warm so it's it's really a benefit for the customers that use this because they like george was saying you can get a much higher um index to ram or store storage to ram ratio going from warm to cold so it is a huge benefit for the customers okay and then and then you mentioned george there is that last part that frozen um here and my my guess is as as warm is to hot is frozen to cold is that it or i see i 
see the face now where it depends is going to come out of someone's mouth actually it's it's it's actually i'm really surprised with okay so let me talk about frozen and some of the performance around frozen um because um we're not robbing peter to pay paul we're not just giving you s3 storage to be able to search on um just so that you're you know throwing you know we're storing this data on really low cost storage but then it costs you in terms of search performance because i think that's kind of the the thing there i mean you can expect latency and maybe some slower search queries but when you're kind of understanding the underlying technology i think that it's um it's really beneficial so frozen tier you know is is we're again decoupling storage from compute um and then also again it is a licensed feature so we're talking about things that that are that you know you need to pay for but the frozen tier aspect is um we're we're storing all that data in a snapshot that resides on object storage like you know google cloud storage or amazon's s3 storage and we're able to um we've done some work on the underlying lucene which you know elasticsearch is built on to provide we're only going to um search for what is going to complete the query based on on the s3 and then also we've built in a search cache into elasticsearch so now a percentage of the memory used on the on the host is dedicated for a search cache it's a search cache mechanism so you know let's say you want to search a petabyte worth of data well whereas the first result might take a lot longer those results are going to be cached into this uh this query cache this system cache well it's not a system cache but it's a a a cache for the searchable snapshots and next time when you search for that the performance is going to be greatly improved um so for example and there's a a blog article that's out there and maybe if we you know i can find that in just a second
and put it into the comments but it's it's searching a petabyte of data in minutes and you know i mean minutes is when even when we query the first time looking through a petabyte of data stored on s3 and we're using the underlying lucene modifications and enhancements we've done we're only searching for the things that we need we're able to search a petabyte worth of data in nine minutes and of course um once you search it again we drop from minutes to seconds so um i think i have a slide on this like i could probably show let me see if i can try to find it real quick because i think we've been kind of talking a lot which was great um but there's this um great um let's see if we can try to find it real quick so while he's doing that i do want i see some there's some we have some comments thrown in here that are very specific instances i think that obviously you know we'd love to help you know as much as we can however on this stream it probably wouldn't be the best type of support but again a good way to to handle that is heading over to our discuss forum there and asking your question you have plenty of people including elasticians that are observing that and will hopefully get you to a good response but uh go ahead george let's see let's see what you've yeah and i think you know we could we could definitely take a look at some of those those questions um of course let me just share my screen um so uh let me know if you can see some of the um performance information here on frozen tier sizing yup i'm just going to do this here um what i really wanted to focus on was kind of some of the performance there um now the the blog really goes into great detail of our methodology and how we we test it in size so then there's no um you know don't look behind the the curtain the man behind the curtain or whatever um so you know we're we're we're definitely transparent on how we're doing a lot of the sizing um and so forth but yeah we we look at a as a simple term query on a
four terabyte data set you know when we do a and we're looking kind of this hot warm cold frozen type of architecture you know the hot and warm you can expect your first run on a on a term query to be on four terabytes to be around 92 milliseconds and then on cold you see that there's not much of a difference that you know we run that that cold again that primary data that we're searching across um that sits still on local local storage but we also use the primary uh or excuse me replica shards also in our in our searches or queries so that's why you see still a better performance with hot and warm because we're not only searching the primary shards but also the replicas but now because in the cold tier the replicas are stored on s3 storage we're not searching now we're just searching the primary shards that's why we do see a slight decrease in um in performance we see like 95 milliseconds here but again i mean if you think about that that's fine i'm willing to take a couple milliseconds more knowing that i can store two times the amount of data you know given if you have one replica and one primary shard i can still store a lot more data on my primary with just a trade-off in performance and it's not by much but when we look at frozen you know if we do um you know the first time we're doing a four uh terabyte data set we see six seconds but on a repeat run we can kind of see now that some of this data is loaded into cache hot warm runs quicker at 29 milliseconds cold runs at 38 milliseconds but frozen we see a great huge improvement in performance down to 76 milliseconds so you kind of take a second to stop if you're looking through security data let's say you're looking through you you've identified the ip address of a of an adversary and then now you want to look at what's the first time i see this ip address amongst my security data or my observability data that i can look for that ip address across all that data and if i need to reference that
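The frozen tier George benchmarks here is driven by mounting an index from a snapshot with shared-cache storage, which is what backs searches with the on-disk cache he describes. Below is a hedged sketch of the request shape, expressed as a Python dict rather than a live call; the repository, snapshot, and index names are all illustrative assumptions.

```python
# Sketch: the shape of a "mount snapshot" request that puts an index on the
# frozen tier (a "partially mounted" index backed by the shared search cache).
# Repository, snapshot, and index names are assumptions for illustration only.
mount_request = {
    "method": "POST",
    "path": "/_snapshot/my-s3-repo/nightly-snap/_mount",
    "params": {"storage": "shared_cache"},  # frozen tier; "full_copy" is the cold-tier style
    "body": {"index": "logs-2021.07"},      # the index inside the snapshot to mount
}
```

The cache behavior in the benchmark (six seconds on the first run, tens of milliseconds on repeats) falls out of this mount mode: only the parts of the snapshot a query actually touches are fetched from object storage and kept in the local cache.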
ip address again those subsequent queries are going to be much faster so that that's kind of one of those those benefits where the first time you query it's gonna be slow but it's okay because you know what we're doing here on this next orange line is when you restore from a snapshot you know the same four terabytes that's in a snapshot you're restoring to the the cluster it's going to take you about an hour and 12 minutes for you to be able to restore and then put that within a cluster and then be able to search it now let me clarify there that is that is you have a snapshot you're restoring that snapshot and you're loading it this isn't if you're using something like searchable snapshots where that's correct yeah yeah and i think it's important to know because if you are using our open source or just our basic you know licensing so you're not paying this is a completely acceptable method for you to be able to archive your data without any licensing costs right so if you're restoring you know if you have a good system a catalog system where you are after your you know seven days of let's say hot and then you automatically want to archive your data and keep that data on very low cost storage that's an absolute thing that you could do you just know that it's going to take you know 72 minutes per snapshot you know to to be able to restore and get that going that might be completely acceptable to you and and to your uh yeah your business leadership right um but the ability having let's say um to be able to search it you know is is is great so i mean if we look at scaling up to a petabyte and this just we just looked at frozen right on the first run it took us nine minutes searching a petabyte of data stored on s3 and the repeat runs is two minutes so it's orders of magnitude greater performance when you're looking at using that searchable snapshot on frozen
tier so and i think another part of that as well is you know we're looking here four terabytes versus one petabyte sorry about that but if we're thinking about looking at hot and warm and even cold like one of the reasons we we talk about ilm there are performance reasons to remove the cruft if you're looking for something and it has it's gonna have to search everything no matter what so if it has to search everything make everything less and you can move all of that stuff somewhere else where it's easier on the system to find it so if you have a petabyte of data it's better for you to keep four terabytes of that petabyte of data in hot and warm and then move the rest of it over to frozen and then you don't have to say well how come every time i'm running something and it's in hot it's it's going to be it's going to be slower than 92 milliseconds um so just the idea of managing your data in this method has its own performance benefits as well and josh i think you probably have some insight into um when you have data um you know when you and i think jay what you're talking about is um you know elasticsearch is very fast and and it's very you know it's a search engine it's very performant but there are ways to make it a lot quicker right and we're insinuating that you know based upon what data that you commonly search you know you could reduce that footprint to be able to get the best the best search results i mean i um i when i was in support um and josh i'm sure you have a story as well but um you know we would have uh customers say hey i i'm ingesting all my email and i want to be able to do a full text search on all my emails and i want to and some customers wanted to search every email from the beginning of time and that data set is just massive and sometimes as an administrator or as someone running the cluster you kind of have to put um you know guard rails so you don't bring your cluster to its knees yeah josh do you have any any thoughts on on that oh yeah 
you always have to set the expectation you need to structure your data when it's coming in because it is very easy to cause more harm than uh than what you're trying to to build so yeah that's why we have the the uh templating that's why we have testing it's just you need to do the upfront work to make sure that you're not going to harm yourself with bringing in this data and then i i'm not sure if we should talk about new features we also have like data streams which helps oh yeah um query fast it helps you write fast so i we we are always innovating these new technologies to uh to help you uh be as performant as possible because we are getting so much data now the customers are putting in so much data that we have to stay up with with new technology and provide this new technology to be able to write and read fast so it sounds like we're going to have to bring one or both of y'all back to talk about data streams too um yes but i i definitely do agree that you know in in some cases we're talking about support um i used to be in i.t and i can tell you i would much rather have the one support ticket you know every year or so that says hey i'm trying to find this email from 20 years ago um how can you help me versus having the constant email you know support tickets of ah search is slow or ah this isn't finding or there's there's a bunch of junk in here i don't need um and i i will say i'm pretty sure we get more tickets of of the latter because the the first time someone doesn't have you know the most amazing of experiences you know they're willing to shout uh we've talked a little bit about you know these different tiers and i know we we kind of scheduled this out for an hour so i want to make sure we have some space for questions if there are more questions but then also um how how do we set this up like how do we get started with these data tiers i know data tiers are kind of like one piece of the whole ilm picture but i do
think it's a great place where people who may be looking to implement this for the first time or maybe looking to go to their boss and say hey look this is a thing that could give us some you know some cost benefits and performance benefits they're going to want to be able to flip a switch and and show how how easy it is so i guess the first question is is it that easy to just flip a switch on data tiers yeah i think so so shameless plug i think josh and i and faith we had uh was a meet-up um some months ago and josh did a great job of showing what the uh if you have an existing elasticsearch cluster how you were to introduce ilm so i think that's that's a great um that's a great video to kind of go back and watch you might want to skip the first couple of minutes because i think we're going to do introduction about what is elastic and great if you know that but um but even since we had this talk um you know when josh and i did this um you know there's a lot of advancements like in searchable snapshots for example it's not something that him and i were able to talk about at the time but in that time once we have that i mean we have all this so i mean we know the pace of innovation at elastic uh is is crazy uh even internally as customers we're our internal employees we're like okay what do we lose that's crazy like how does that change the the conversation of data management um but i do want to show um you know this is inside our elastic cloud but you have the same experience if you're deploying elasticsearch and kibana let me go ahead and go back to my screen here and let me know if you can see this you're all good perfect um and so what i'm looking at is just kind of like the home screen for for kibana this happens to be um i have a a deployed kibana um you know it's kibana on on our elastic cloud but again you'll have a very good you know uh similar experience um within a managed your self-managed kibana but if i come over here um to kind of our uh
our menu items on the upper left-hand corner you'll see something towards the bottom that's called um stack management so kibana is not only used for um is the window into your data to be able to visualize and explore and and um yeah and analyze that data but it's also for for stack management now all of this can easily be completed as well through apis and through the the dev console which you know i think um josh definitely goes into but the simplicity side of getting started i think would be great just to kind of go through kibana and look and look through it but down here you'll see um where we have index lifecycle policies so if i go into that under data then we can start creating our policy um and there's also some policies that we put in here that's kind of a you know a basic you know our system indices and how we we manage and create that but if i create a new policy i'm going to call it you know let's say data management and then maybe i'll say it's test because i'll know to delete it later but if we kind of look through this this is a great ui that the elasticsearch and kibana ui team worked on um that kind of walks through each of these phases and gives you some of these options for example in the hot phase um you know we have some different options like do you want to roll over your data from the current index and then you know there's a rollover where it takes it from one of your hot nodes and then rolls it over into into a lower tier or lower phase index and and then do we want to do what's called a force merge um and you know jay do you want to to briefly talk about like what a force merge and a shrink are um let's let's let's hit it really quick sure oh yes yes on time oh you're asking me to do it i just oh no no i i said jay i meant josh i was like i i am your host i'm here to smile and um make sure chat doesn't explode smile and wave smile and wave no i'm sorry yeah so force merge really quick guys the quick overview of
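The hot-phase options George is clicking through in the Kibana UI map onto the `rollover` action in the resulting policy body. A minimal sketch of that fragment, again as a Python dict; the thresholds are illustrative placeholders, not recommendations from the talk.

```python
# Sketch of a hot phase with rollover, the equivalent of the options shown in
# the Kibana index lifecycle policy UI. The thresholds here are assumptions
# chosen for illustration.
hot_phase = {
    "actions": {
        "rollover": {
            "max_age": "7d",                  # roll over once the write index is 7 days old...
            "max_primary_shard_size": "50gb"  # ...or once a primary shard reaches this size
        }
    }
}

policy_body = {"policy": {"phases": {"hot": hot_phase}}}
```

Rollover is what creates the succession of backing indices that the later phases (warm, cold, frozen, delete) then act on as each index ages past its `min_age`.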
it as as you're writing uh events to your disk it's almost like a fragmentation you just write it to disk as quickly as possible so you're going to end up with all these events all over all of your disks in segments what a force merge does is it takes all of these segments and shrinks them down into one larger segment and it just keeps getting smaller smaller number of segments larger segments so you uh so you end up with one large segment at the end it's the most performant yeah i almost think about it this might age me a little bit when you had um spinning disks it was almost like a defrag right when you had to run a disk management tool in windows yeah um i use linux now so don't don't judge but um but yeah i mean yeah so you're kind of reducing the amount of um uh you know uh segments that you have yeah yeah and and then and then shrink obviously is also if you have um we talked about primaries and and replica shards you know they're used to be searched and you could also shrink how many of those shards that you need yeah you know if you have the primary you can you can uh tell how many shards your elasticsearch cluster has you could say you know the default's one primary one replica but let's say you have a use case for having a larger amount of of primary shards or you want to have you know um some extra redundancy so you'll add uh extra replica shards well having more shards will give you the ability to have more you know threads working on a search of course you can have too many shards that would negatively impact you but a shrink is just the ability to take some of that sharding that you had and just reduce it down and shrink that down so i think that i think priority also is kind of overlooked also that's a really good one if you uh have important data if you have an issue and you have an important data it you can set a higher priority on your important data so it's recovered earlier in the recovery
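Josh's force merge (fewer, larger segments) and shrink (fewer primary shards) both exist as ILM actions, typically in the warm phase once an index is no longer being written to. A hedged sketch of those actions as a policy fragment; the values are illustrative.

```python
# Sketch: force merge and shrink as ILM warm-phase actions, matching the
# descriptions above. Values are illustrative, not recommendations.
warm_phase = {
    "min_age": "7d",  # enter this phase 7 days after rollover (illustrative)
    "actions": {
        "forcemerge": {"max_num_segments": 1},  # merge each shard down to one large segment,
                                                # like a defrag for the index
        "shrink": {"number_of_shards": 1},      # collapse the primaries into a single shard
    }
}
```

Both actions only make sense for read-only data, which is exactly why they live in the lifecycle after rollover rather than on the actively written index.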
process so you have some really important data you want it to recover first you can set it first and you set your non or less important data lower down so it's not restored or recovered as as quickly so i can see this being something along the lines of you know again if you're if you're just constantly pulling in data and pulling in data and pulling in data recovering that recently pulled in data may or may not be the most important thing there may be like some importance on historical data and making sure that that's preserved whereas the you know the newest data could probably just be reimported or you know you could say error has occurred please try again whereas you know data that you've kept for three decades it may be you may need that data faster and to make sure that you know that data is yeah i guess sturdier yeah again it it's all based on what's important to you so just just another setting that we provide yeah absolutely and so by looking at the advanced settings here you know right now the only thing that's configured is the hot phase and then and then you know we're going to keep data in this phase forever until you define you know when you want to delete it right but what we're going to do is we're going to say yeah we'll want to have searchable snapshots so that's going to be where we're using that that functionality that license to also use the frozen tier um but when we come down to you know we can configure for example the warm phase so you know move this data into this phase when the data has either reached for example um days hours minutes and seconds old so you know for example we want to say when the data reaches you know um actually if we can come to the top uh sorry on the warm we say we want to have this into let's say seven days so as soon as the the data is there and seven days then it will move from hot and then into warm um and then the warm phase you can also have um you know how many replicas
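The recovery priority Josh calls out is an index setting (ILM also exposes it as a `set_priority` action); higher values are recovered first after a restart. A small sketch with assumed values, illustrating the pattern from the conversation: critical long-lived data ahead of re-importable recent data.

```python
# Sketch: index recovery priority. Higher priority indices recover first after
# a cluster restart; the values and the split shown are illustrative.
critical_settings = {"index": {"priority": 100}}  # e.g. decades of historical records
recent_settings = {"index": {"priority": 10}}     # recent data that could be re-imported

# the same knob as an ILM action inside a phase:
phase_actions = {"set_priority": {"priority": 50}}
```

Either form would normally be applied via the index settings API or inside a lifecycle policy; the dicts above just show the shape of the setting.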
allocation there's some other um priorities here you can set and of course if you want to set cold same thing you do searchable snapshots um and then going into the advanced settings on data allocation and of course finally to the frozen phase on you know when this can move in but as we're configuring this you can actually see for example um at the very top you know where where you're at on how much data you have in each phase and if we increase or decrease that that kind of sets that up um one other item i'll say let's say after you know uh we'll say we don't want anything that's uh cold or warm we want to go straight from hot uh you know um the hot phase directly straight into the the frozen phase you you certainly can do that um but yeah so we want to move this data let's say um once it reaches seven days again and then we can kind of configure when do we want to delete this phase so we actually click the delete this thing and say hey i want to delete this data after it's it's reached um you know let's say um five years for example um uh do i have years no i have days i think they're gonna add that on later so let's do yeah 365
so anytime after a year then it'll automatically delete and that'll be that final phase let's say if you don't want to keep everything but and we can save that policy and there's i know we're kind of quickly running out of time so that's why i'm i'm kind of you know moving quickly but it really gives you the flexibility in how you can define these policies and then move those you know there and the searchable snapshot is just the license feature that gives you the ability to do the um you know in cold half the cost the primary shards on local storage and replica shards on s3 or just be able to search on s3 directly but again for self-managed you might still have that cold tier being the lowest of your hardware capabilities and that kind of offsets that tco piece so something else to really mention also is all of our agents have policies with them so when you do a proper setup with it these policies will go into your kibana ilm policies you can go in there and take a look and see how we set them up and it gives you a good uh beginning to see see how they're set so if you're running metricbeat filebeat you do the setup it imports it for you so yeah and so if we go into metrics itself you know and this is kind of based off of our experience of what we've seen our users and customers doing so we have metrics for example in the hot phase you can go in and and modify that so so we we kind of come up with a template you know we're more i don't know prescriptive is the right word but we're definitely more opinionated on how those phases are but it's just a a guidance you could easily go in and change those policies as you're as you're configuring that yeah yeah very cool well it sounds like you know as i said you click a few buttons and you're off to the races i'm not going to say click blindly uh give it some thought make sure you know what's happening especially when you hit that delete button but uh josh
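Putting the walkthrough together — hot with rollover, straight into frozen at seven days via a searchable snapshot, delete after 365 days — the assembled policy body might look like the sketch below. Phase timings and the repository name are illustrative assumptions taken from the examples in the conversation.

```python
# Sketch of the complete policy assembled in the walkthrough: hot -> frozen at
# 7 days (skipping warm/cold, as George notes you can) -> delete at 365 days.
# "my-s3-repo" and all timings are illustrative assumptions.
full_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {"rollover": {"max_age": "7d"}}
            },
            "frozen": {
                "min_age": "7d",  # move 7 days after rollover
                "actions": {
                    "searchable_snapshot": {"snapshot_repository": "my-s3-repo"}
                }
            },
            "delete": {
                "min_age": "365d",  # delete roughly a year after rollover
                "actions": {"delete": {}}
            }
        }
    }
}
```

Note that each `min_age` is measured from rollover, so "delete after a year" here means a year after the index rolled over, which matches how the UI phrases the phase timings.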
george thank you so much for giving us a good you know quick lunch time lesson and uh are y'all hanging out in the uh the you know community forums as well is that how people with questions can get in touch with you absolutely yep yes go josh i'm sorry no no yeah i just say community forums is one of the best places to get a hold of us so join up there and i'm on the community slack as well so um well while the slack's good for general information and some things we can help out um i'm usually on discuss for like the larger issues and things that we'd like to solve but yeah the the elastic slack channel is also a great um a great way to to get in touch with other people like-minded people that are they're using the technology so but i think we almost ran out of time i mean we did run out of time i i'm i'm always happy to talk about the data streams and some other aspects of of of ilm and the data tiers and searchable snapshots piece but well based on uh some of the comments including this one i do think that uh data streams will be a thing that we have to add to our list of things to talk about but uh as you can hear the bells tolling for us that's the sign to get back to work but i want to thank again george and josh for hanging out with us as always we do this every tuesday different topic every time most of them about the elastic stack in some way not all of them but you know we hope that you'll join us next time on our uh our little lunchtime user group here but until next time take care thank you take care thanks thanks josh thank you
Info
Channel: Official Elastic Community
Views: 296
Rating: 5 out of 5
Id: a6rFk7cgibE
Length: 60min 18sec (3618 seconds)
Published: Wed Jul 28 2021