ElasticSearch and Ruby on Rails - Part 1

Video Statistics and Information

Captions
Hey everybody, Phil here, Zonmaster and Ruby on Rails and all the rest of that, and today it's another Ruby on Rails tutorial, as I seem to be wont to do on this channel. Today we are talking about Elasticsearch and how I love it. [Music]

All right, so the basic idea of what I'm going to do today is introduce you to Elasticsearch and how to incorporate it into a Rails app. You presumably know what Elasticsearch is, but if you don't, it's a document-based database engine, so kind of like a NoSQL-type database, I guess you would call it. It's highly indexed and easily queryable, and it doesn't deteriorate so much over size, so when you have a massive number of records it keeps on going.

So what I've actually started to do in large projects is have two databases. One is kind of like the source database, and that's a traditional relational database system, let's say like MySQL or PostgreSQL or something like that. So that'll be underneath, and data that comes in gets consumed and is stored in that database as a raw data store. But data that's shown on the website, most of that data, ideally all, comes out of Elasticsearch. What I do is I have triggers and jobs, and you'll see in a little bit how I do that: when the data is updated in the relational database, we are updating Elasticsearch, so those data are pretty much in sync. And the beauty of Elasticsearch is that you can denormalize data, collate data, so that you are generating from Elasticsearch pretty much exactly what you need for the front end. So it's pretty exciting. I don't know why more people don't talk about it. Maybe people do talk about it and I just don't see it, but it's a pretty cool way to work.

So what I'm going to do right now is create a new Rails app, install Elasticsearch into that, create some dummy data, and show you how that is indexed and how you denormalize and all of that stuff. That's what I'm going to do in this video. I don't
think I'm going to have time to get to other uses of it and things like that. Maybe I'll do another video if people are interested, but this should be kind of a beginner-to-intermediate workup on Elasticsearch. So let me switch over and let's get into it. Sorry for my hair. Do I say that every time? I don't know.

All right, so here we are on the old Macintosh. I have Homebrew, as I think every Mac developer does, to install packages, and it's using Homebrew that I have installed Elasticsearch. How to install Elasticsearch is beyond the scope of this tutorial. If we do brew services list, and let's just grep that to elastic, we'll see that we have Elasticsearch (I use the OSS versions) and we have Kibana. I don't know why I'm not just showing all of my services, that'd probably be easier. So yeah: Elasticsearch, Kibana; Memcached is running, I'm not using it in this; and I'm going to be using MySQL 5.7, I guess, that I've got running on here.

So let's create a new app. Let's go to rails new, elastic app, and for a database I'm going to use MySQL, and let's let that generate. You know, sometimes, I don't know why, I've got this 2020, maybe, or 2019 MacBook Pro, and you'll see it sits there, and this is where I'll drink tea.

Let's talk about Elasticsearch for a second while this thing is doing what it's doing. So yeah, a document-based database system, and for me there are two big advantages as to why I'm using Elasticsearch so much. One is not really an advantage of Elasticsearch, it's a disadvantage of MySQL. With MySQL you can put a ton of data in there. I mean, I've got hundreds of millions of rows and things like that, and that's fine. But if you have 100 million rows in a MySQL database and you delete 99 million, your database doesn't actually shrink in size at all. So we have this ever-expanding database, even if we're deleting old customers and
all of that, it's ever-expanding in size, and that really infuriates me. So ideally I would like to get rid of MySQL, but it's hard to get rid of it. It also slows down over time, and you really have to make sure your indexes are tweaked correctly for the searches. So that's a problem in MySQL. Maybe I'm just not so very smart, I don't know.

But with Elasticsearch, pretty much everything is indexed, first of all, so you don't have to worry about that. It seems to free up space when you delete documents, and it doesn't bog down or change so much based on how much data is in there. It changes more based on how much data you return. So if you have a honking big Elasticsearch database and you search on something and you only return two records, okay, that takes 10 milliseconds. If you search on something, even on a huge 100-million-document Elasticsearch database, that brings back, you know, 100,000, of course that's going to take a long time, but that's the piping of data between the two, even on a local machine. So ideally you won't do that. You can use something called scroll, cursors, and all the rest of that stuff. We can get into more sophisticated Elasticsearch stuff sometime if you want.

This is what I'm up to right now. I don't know if I'm going to cut all this out. I'm going to watch the view numbers and see if you guys just skip over all this crap, because I'll put an index in there. It's a very beautiful day here in Japan. I live in northeastern Japan, but the interesting thing about today is that about two hours ago a massive tire fire started somewhere over there, so there are huge plumes of big, thick black smoke, and the windows are closed, and we get a little bit of a whiff of it, but not too bad.

Okay, here we are, we're done. Thank god for that. So we have your Rails 6 app. It's just a joe joe app. I'll open it up in VS Code just so we can get an idea, and I'll zoom
it up, and I've got to remember that you can't see the whole screen, so I'm going to do that. I just want to get the sizing right here. There we go. You should be able to read that, right? I would think. Maybe I can pull it over, and I'll just be careful.

All right, let's go back to the console. We need to add some goodies here, so we're going to add (I love this bundle add command; I mentioned it in the other episode, I'd never seen it before, and now I love it) elasticsearch-model. We want to add three gems: model, rails, and persistence. Let's quickly pop over and take a look at the elasticsearch-rails web page on GitHub. So there's elasticsearch-ruby and elasticsearch-rails. I'm not going to delve into what the differences are, but this has some hooks for Rails and all that other stuff, and I always find that you've got to put everything in there, so we're just going to put in model, rails, and persistence, and then we'll get into that. If you want to read more about elasticsearch-rails, I'll put the link down in the description to their GitHub, and a link over to Elasticsearch if you need that. Elasticsearch is free, by the way, in case you're asking.

All right, have we added it? Did I press the button? No, I didn't. There we go. So it's going to go out and dump that into our Gemfile. It should be in our Gemfile now, and if we look down at the bottom, there we are: elasticsearch rails, model, and persistence.

Now we're going to create an initializer file in config/initializers, and we'll just add one in, elasticsearch.rb. What we're going to do here is Elasticsearch::Model, and I just want to turn on logging, so I'm going to copy in something that I've got from before, and you can see what I've got here. We're just replacing the client, and the client is the thing that actually goes out and connects to your Elasticsearch instance. You don't have to do this. What I'm doing is just so that it'll write everything out, so adding
logging, and we'll do a timeout just so we don't get stuck anywhere. Good, I think everything is fine, and actually we're done. The config is all done.

Now we can go on with the tedious stuff of creating some models. This time I'm not using a library example like I've been using in the other ones. I'm going to use something that's a bit more familiar to me, and that's a kind of purchase order thing. So I'm going to scaffold an object, or model, called purchase order, with a whole whack of fields: order number, purchase date, status, sales channel, all that kind of stuff, ship-to name, ship-to address, pretty much what you would have in an order. We're going to create a scaffold for that because we might want to take a look at it someday.

I'm going to create a scaffold for an item. An item, or a product, is the thing that you actually sell on the website. So we're going to quickly scaffold together an item which has a SKU, a title, a nickname, a price, how much is in inventory, a fulfillment fee, and a unit cost.

And then finally, the way I like to work when I'm doing orders is you have an order, which is the order, but then you have line items, because that seems to be the way that all the online e-commerce platforms work. I'm giving you a good leg up here if you ever want to do something that integrates with an e-commerce platform. So I'm going to create an order item, which is the line item. It will belong to a purchase order, and it will belong to an item, because a line item can only have one type of thing on it, though it might have multiple quantity. So it has a quantity ordered, a quantity shipped, a price, a discount, tax, and the platform fee, which will be calculated.

So, bada bing, we've got our models defined. Now, I don't know whether you need to do this, maybe it's just an old-schooly type of thing. Migrate. But what I like
to do is, for all of my floats, I like to create a default like this. Float... you would think I would do them in order so they're easy for me to find. There's a float there, a float there, a float there. So it's just a float that defines a precision and scale. It comes back to bite you otherwise sometimes, and of course some currencies don't have any kind of scale like that, but whatever. And on my integers I like to add a default, so I'll just go through. This one's done. Let's find the integers and add a default. I don't bother to do any defaults on the other things. Okay, so we're all good.

Now, one of the problems of doing something like this is that you need dummy data. Dummy data is such a pain, and there's no real way. I always just like to roll my own data, and the way I do that is by using this amazing gem (oh, I love this thing) called Faker. I'll put a thing in there. Ruby, yeah, Faker. This gem is so awesome and it lets you do all kinds of stuff. It just generates random data, all kinds of cool data.

Okay, so I guess I haven't created the database, so let's do a rails db:create. Now we'll see if I typed everything right, or copied everything right: rails db:migrate. Yay.

And now I'm going to go into the seeds file, as good a place as any to seed your database, and what we're going to do is create 25 items for sale. Let's take a look at this, because it's, again, not totally off topic but a little off topic, using Faker. So I'm going to create some items, and as we saw before, we have SKU, title, nickname, price, inventory, fulfillment fee, and unit cost. Sorry if I'm talking too fast; sometimes I get a bit hyper when I'm excited. So for SKU we're going to create just an alphanumeric gobbledygook. For the title, see, Faker has a Commerce module, and it creates a product name. I'm going to use a promotion code for a nickname, which is horrible, but I couldn't
think of something else. We'll create a price, and it can be any price. And how much inventory is there? I'm just going to take a range of 0 to 10 and pick a number out of it. I guess there are other ways to do that, but that's what I do. Then we're going to choose a price between 0 and 299 for the fulfillment fee, and the cost of this unit will be between 5 and 12.99. So it's going to fill in all that data.

Then I'm going to create 5,000 orders. You can create as many orders as you want, of course. I'm going to use a combination of Faker stuff and my stuff. So we're going to choose an order type from confirmed (in other words, they bought it), shipped, and delivered. The purchase date is any time in the last 30 days. Shipped at: if we happen to have chosen shipped or delivered, then we're going to set shipped at to three days after the purchase date. Estimated arrival date: if we've shipped, then four days after the shipping date we'll put our estimated arrival date. Confirmed at is purchase date minus an hour. Lazy.

And then some fulfillment centers. You know, there's a certain company that has fulfillment centers all around, and they have three-letter things, so I just created some googly three-letter things there and we'll pick one of those. This is so that later, inside of Elasticsearch, we can group by fulfillment center just to get some nice-looking graphs or whatever.

So then I'm going to create the purchase order with an order number. I'm using a 388 order number thing, which is kind of like the way that a company named after a river works. Purchase date, sales channel: I'm going to just pick a random one from those. I'm just creating dummy data, all right? This is a good way to create dummy data, fast and easy. You may have a better way, and maybe there's a more accepted way, but this way works.

And finally we need to create some line items in there. So for all
my purchase orders I'm going to go through, and I'm just going to choose a product at random, and I'm using the num items shipped that I created up here, which is zero or one, and num items unshipped, zero or one. Not between zero and one. It's just dummy data, right? And then another price there, between 12 cents and 99 cents, for a platform fee; the tax; and the price, which is how many units times the price of the item. Yeah, there we go. So all of this will create (I'm going to format this a little better) 25 items, 5,000 orders, and then 5,000 order items. I didn't bother to do multiple order items. It's just a demo, guys, give me a break.

Okay, so we've created the seeds, so now we can say rails db:seed. This will take a few minutes. If we want to see what's going on, we could pop over into another window, into a console, and we could say product order dot count... purchase order... PurchaseOrder.count. This is how I kill time, by mistyping things. So you can see this number is going up, so that's fine. If we look at an item, you can see we've created the items: Durable Aluminum Pants. It's got a nickname of Incredible Promos, it's got a SKU, it's got some cool price here of 3137, and so on. I haven't done the right thing with currency, but it doesn't matter. There, we've got 5,000 purchase orders, and then if we look at order item, which is the line item... [Music]

See what I missed there? In purchase order: has_many :order_items, dependent: :destroy, so when the purchase order gets destroyed, the order items get destroyed. You know what, I guess I can show you another one: truncate all. Here's another cool rails db command, rails db:truncate_all. I think it's only on Rails 6. It wipes out our database and resets all the counters. So let's start over, even though that means we'll lose Durable Aluminum Pants. We're instead going to get an Aerodynamic Marble Coat. It's going to whip through the
purchase orders. It doesn't take long. I could say something clever here, but I don't know what to say. You know, I even had that written down, that I needed to add that line in there. So we've almost got all the purchase orders. We haven't even started in on the Elasticsearch stuff yet, sorry to say. Just checking my camera is still working. Looks good. Okay, it's got all the purchase orders, and it looks like it's whipping through the order items, so that's all good.

In a second, what I'm going to do here, why not, let's fire up Rails and take a look at localhost. Ah, let's not do that yet. Why don't we add a root, a root to the routes. We can do that. Now let's take a look at localhost (let's type it right), and we should end up on a page with five thousand... there we go, five thousand things. Looks like I did payment method not quite right, but there we go, we've got all of our things. Did I do a scaffold? No, I didn't do a scaffold on order items, so we can't go and look at an order item, but anyway, we've got all that data in there. I think we're all done.

So now (I can put an index down there) let's talk about Elasticsearch. That's what we're here for. How do we get Elasticsearch running? We now have a MySQL database that has all of that data in it, and nothing in our Elasticsearch at all. Elasticsearch hasn't touched anything. So what can we do? First of all, we go to our model and we include Elasticsearch::Model, and if you want, you can include the callbacks, Elasticsearch::Model::Callbacks, and the callbacks just save you from writing an extra line. What this will do is, every time your object is updated, deleted, or created, it will call out to Elasticsearch and put this in the database. Funnily enough, that's all we need to do to get an exact replica of our data in the Elasticsearch database. Now note, I say an exact replica. If that's what you want, fine, that's what you can do. Let's reload this. Let's choose a purchase order. I really wish it
wasn't purple. Okay, so we have a purchase order. Now we can just say purchase order touch, and we can see that it has actually PUT it (9200 is the Elasticsearch port). It's created an index inside of Elasticsearch. To compare to MySQL, it's created a database, or a table I should say, inside of Elasticsearch called purchase orders, and it's chucked this exact thing in there. It has no knowledge of the associated records. It's just a one-to-one of our data.

Now, if we go over to something called Kibana (Kibana is like an Elasticsearch front end), we can create an index pattern for purchase orders, we can choose a default time of purchase date, and if we look, okay, there we have everything in there. So it's that easy. If all you wanted to do is replicate your info into Elasticsearch, it's done, exactly like that, no more work needed.

And if you wanted to pull this back out, we've got one in here with an ID of 1410. If we wanted to pull it out of Elasticsearch, we can just say PurchaseOrder.search. When you add Elasticsearch::Model, it adds these methods in. You can use a query, match (you can tell I've done this before), ID, was it 1410? And a better thing, just to save time here: results, first. This object now is out of Elasticsearch, and you can see it took one millisecond, and this is it. It's the ES purchase order, and we can look at anything. Confirmed at, that is that date, but it's a string, of course.

So what's the drawback here? However fast this was, which was probably pretty fast because I'm talking to my online 30 minutes there: created the Rails app, created the dummy data, you know, scaffolded, created my objects, created the dummy data, put it in, have it in Elasticsearch. Great. But it's not perfect by any means, and I wouldn't use it like this, because first of all everything's a string, and second of all, all you're doing is replicating the data in MySQL, which is fine, but it doesn't really give
us much, though maybe there are instances where you've got everything denormalized already and you pull it out. So let's ditch this. When you have Elasticsearch::Model installed, on purchase order you get an accessor directly to the connection, which is __elasticsearch__ (underscore underscore elasticsearch underscore underscore), and we can delete the index, and it is going to blast that index and all the data. We don't need it, and I'll show you why.

What we want to do is actually format our data, and we do that by creating some settings for what the index is going to be. There's no way I'm going to type all of this, but I'll give you the start of it. Settings, number of shards (all of this doesn't change, you always do it the same), do, and mappings, dynamic. We don't want these things to change. We want to set exactly how we want our fields to be stored in Elasticsearch.

So first, again, yes, I prepared this earlier. These are all the indexes. This is essentially what it does by default, except we are adding a type. We're saying that the ID is of type integer, because if we go over here and look at what came out, we can see (where's bloody ID?) ID is a string. That's why it's got quotes around it. I might have deleted that by now, but you can see it's a string because it changes color if I say to_i. So yes, you can see that it's a string. What we can do in these dynamic mappings is say, okay, this is of type integer.

For strings, if you don't do anything, like tracking number, okay, it's just a string. But for status, sales channel, payment method (though I see I've got a problem there), and carrier, like who's shipping the thing, we say type keyword, and that does some optimization inside of Elasticsearch, so we can then cluster those in Kibana if we want to create some beautiful-looking graphs. In Kibana, or whatever you use, keyword. But for me the most important, well, I guess the integer
is pretty important, that it's actually now a number. Dates: we don't need this, this is just a note to myself. If we say format date_optional_time, that will give us (I should put that in there) the proper formatting for a Rails string of a datetime, which comes out looking like this, which translates to that, which in Elasticsearch looks like that. So, optional time. So yes: integers; dates with a format; keywords; integer, integer, half_float. For all of the float types, the prices and the money and everything like that, you can use half_float, or float if you have something that's really massive. And for strings that don't need anything special, just put them in there as indexed. So that will set that up.

Now let's reload, and let's grab a purchase order and update it, just purchase order touch. And now if you look at this thing that's gone into the database, you can see the ID is a number. It doesn't have quotes around it. The order total is a number, a float.

So let's add some more goodies onto this thing. For instance, we would like to know some stuff about the underlying order items. So let's say: what SKUs are in this order, a list of all the SKUs; what item IDs are in this order, because maybe we want to do links on the page; how many unique products; how many were ordered; how many were shipped; how many are unshipped. So we can delete the index, create the index, pick a purchase order, and touch that. Now I need to go back and recreate the index pattern. Refresh this, and we should have something called SKUs. No, because I need to rebuild it, not refresh it. And in Discover now, if we look in here, they're not there.

Now why aren't they there? Man, I'm really screwing this up. Once we add custom fields, we need to tell it to use those fields when we save. So I always write a separate object called, like, a denormalizer, and I'm just going to create a new
one here, purchase_order_denormalizer.rb, class... okay. Now what will happen is that when we serialize, it will call this object. The reason I create a separate object is because it can get quite huge. So let's reload. For the sake of ease, we're going to delete our index and create our index. Let's pick one and save it. There we go, so we can see it's trying to use the denormalizer.

When we create a denormalizer, we pass in the object that we want to update, so we have an initialize with the purchase order, and why don't we create a reader, purchase order, and we set it here. All right, let's reload this. So now what we need to do is create a hash, a method on the denormalizer that will put it out.

But first let's talk about some other fields that we would maybe want to have on here. Of course, the other thing we're going to want to know is the currency. The currency is set right now on each line item, but there can only be one currency per order. We happen to know that. And we have all these order items, but we don't want to create another index. We want to put everything together; that's the whole point here. So we can create an index, order items, which is those order items, and it's nested underneath. When we pull it back, we will get a nested hash which contains the item ID and all those other fields that are on the order info. So we now have one object.

I'm going to quickly show you how I do this in my denormalizers. I'm going to have the code available, I hope, for download so that you can take a look at it. So I create a to_hash, because that's what's going to be called (you can see that it's called here, to_hash), and then we have all of these keys that we want to pass, and these are all the indexes that we've created in here in Elasticsearch. Then I essentially use method_missing, which will call that method on the object, and then you can simply, he says, create all the
methods that simply return that thing from the purchase order, because normally they're on the purchase order. But not always, because some of these are calculated. SKUs is a list, so we go through all the order items and pull out the SKU. We go through all the items and pull out a list of all the item IDs. How many unique item IDs do we have? How many were ordered? We go through the order items and sum them. So now they're all going to be there on the record inside of Elasticsearch.

Now what you're saying is, okay, we had this other thing that was the nested info. Here's what I do for this. It will only call the top level; it won't call methods for underneath. So it's going to call order items info, because that's what I called it here, and I'm going to go through all of the order items on the purchase order and calculate a fulfillment fee, which is, again, kind of nested: the order item knows about the item, and the item is where we set the fulfillment fee and the unit cost. So we take the fulfillment fee times how many were ordered, and that's going to be our final fulfillment fee, and so on like that. I think we even have total cost. Maybe I didn't do it, but total cost would be the number of items ordered times the cost of the item. So we fill in all of these things.

Now let's delete the index again. There are ways you can do this without having to delete the index, but for the sake of ease let's just do that. Let's pick one, let's touch it, and... yeah, mama, purchase order eight. Did I do something not quite right here? Oh, bloody hell. Reload that, just start over. There we go.

Now let's go and take a look again. Let's update the thing. I'm going to just delete it. Deleting this index pattern from Kibana doesn't delete any data. It's simply the thing that Kibana sees. It's a bit silly that you need to have it twice over, but anyway, there you go. Now let's go over to Discover. Now we have our
object, and here you can see the order items info is nested, and we've pulled out a list of SKUs, and there it is, which is the SKU from there. There should be an item ID. Item ID is three; that's the item ID from there. So there we've got all of our data denormalized onto one row. That's a lot to take in. Man, I went a mile a minute on this, and I feel like I should almost do it again, but I hope you get the idea.

So now we're satisfied we've got the data in and the structure that we want. We can say PurchaseOrder.import, and this is going to import all 5,000 records into Elasticsearch. You can see it's going off and not caching anything. So there we go. Now if we go and take a look, we've got 5,000 orders in Elasticsearch, and right now it's showing it because we set a primary kind of sort, if you want, as index date. So you can see that it's gone back a month and done these. We say by day... oh, this is auto... and let's just say last 30 days. So you can see things are pretty evenly distributed.

And then you can get into Kibana. I mean, Kibana is just a tool for you. So you could do something like a pie chart. You can search purchase orders, and the count, and the bucket is terms, let's say a fulfillment center. So now we can see how many things were from each fulfillment center. It should be pretty even because of the way we did it. So now you're getting an idea of the power of Elasticsearch. Here's our item IDs. Oh, it's only choosing the top five, so we change this to the top 50.
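
The pie chart built in Kibana here is a terms aggregation under the hood, and roughly the same question can be asked from Ruby. The following is a minimal sketch, not code from the video: the helper name terms_aggregation and the field name fulfillment_center are assumptions, and the field would need a keyword-style mapping for the aggregation to work. It only builds the request body as a plain Hash, so it runs without a cluster.

```ruby
# Builds an Elasticsearch terms-aggregation request body as a plain Hash.
# `terms_aggregation` is a hypothetical helper; the field name passed in
# below is an assumption about how the index was mapped.
def terms_aggregation(field, size: 50)
  {
    size: 0, # return no documents, only the aggregation buckets
    aggs: {
      "by_#{field}" => {
        terms: { field: field.to_s, size: size } # top `size` buckets
      }
    }
  }
end

body = terms_aggregation(:fulfillment_center, size: 50)

# In a Rails console, with Elasticsearch::Model included in PurchaseOrder,
# the Hash could presumably be handed to the search method:
#   response = PurchaseOrder.search(body)
#   response.aggregations
```

The size of 50 mirrors the change from the top five to the top 50 buckets made in Kibana.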
You can see these are all the... how many of this one was ordered: 23 of that, 191, 190. Yeah, it's all pretty evenly distributed.

So there you go, that's my intro to Elasticsearch. Very fast, maybe very confusing. Should I do it again? Probably. Will I? Probably not. I'm going to do another one which goes on and talks about non-Active Record models. You don't have to have your thing tied to Active Record at all; it could be completely independent. And how you could use this from the front end to search, and all the rest of that stuff. I'm going to try to do another one about Elasticsearch if there's any kind of interest.

Thanks for watching. Hope it wasn't too fast or confusing. Leave me a note or a comment, and of course subscribe and like, and hit the bell notification, because I'm trying to bring you guys some good stuff. I'm just trying to show you the things that I use every day as a Rails developer. Talk to you soon. Bye for now.
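
The denormalizer described in the video (a separate object that takes the purchase order, exposes to_hash, computes list and sum fields, nests the order item details, and falls back to the purchase order via method_missing) could be sketched roughly as below. This is an illustration under stated assumptions, not the author's downloadable code: the Struct stand-ins replace the ActiveRecord models so the sketch runs without Rails, and the key list and field names are guesses based on what is said in the captions.

```ruby
# Stand-ins for the ActiveRecord models (assumption: the real models
# expose attributes with these names).
Item = Struct.new(:id, :sku, :fulfillment_fee, :unit_cost, keyword_init: true)
OrderItem = Struct.new(:item, :quantity_ordered, :quantity_shipped, :price,
                       keyword_init: true)
PurchaseOrder = Struct.new(:id, :order_number, :status, :order_items,
                           keyword_init: true)

class PurchaseOrderDenormalizer
  # Keys written to the index; each is resolved by calling a method of the
  # same name on the denormalizer. The exact list is hypothetical.
  KEYS = %i[id order_number status skus item_ids unique_items
            quantity_ordered quantity_shipped order_items_info].freeze

  attr_reader :purchase_order

  def initialize(purchase_order)
    @purchase_order = purchase_order
  end

  # The document to index: one flat Hash plus one nested list.
  def to_hash
    KEYS.to_h { |key| [key, public_send(key)] }
  end

  def skus
    purchase_order.order_items.map { |oi| oi.item.sku }
  end

  def item_ids
    purchase_order.order_items.map { |oi| oi.item.id }
  end

  def unique_items
    item_ids.uniq.size
  end

  def quantity_ordered
    purchase_order.order_items.sum(&:quantity_ordered)
  end

  def quantity_shipped
    purchase_order.order_items.sum(&:quantity_shipped)
  end

  # Nested per-line-item details, including the calculated fees.
  def order_items_info
    purchase_order.order_items.map do |oi|
      {
        item_id: oi.item.id,
        sku: oi.item.sku,
        quantity_ordered: oi.quantity_ordered,
        fulfillment_fee: oi.item.fulfillment_fee * oi.quantity_ordered,
        total_cost: oi.item.unit_cost * oi.quantity_ordered
      }
    end
  end

  private

  # Anything not defined above is read straight from the purchase order.
  def method_missing(name, *args, &block)
    return super unless purchase_order.respond_to?(name)

    purchase_order.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    purchase_order.respond_to?(name, include_private) || super
  end
end

item = Item.new(id: 3, sku: "ABC123", fulfillment_fee: 2.5, unit_cost: 8.0)
line = OrderItem.new(item: item, quantity_ordered: 2, quantity_shipped: 1,
                     price: 20.0)
order = PurchaseOrder.new(id: 8, order_number: "PO-0001", status: "shipped",
                          order_items: [line])
doc = PurchaseOrderDenormalizer.new(order).to_hash
```

In the Rails model, one common way to hook such an object in is to override as_indexed_json and return PurchaseOrderDenormalizer.new(self).to_hash; whether the video wires it up exactly that way is not visible in the captions.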
Info
Channel: Phil Smy
Views: 2,887
Keywords: ruby on rails, learn ruby on rails, software as a service, saas business, how to create the perfect saas business, tech entrepreneurs, entrepreneur, become an entrepreneur, become an entrepreneur motivational video, be an entrepreneur, elasticsearch, ruby on rails tutorial, ruby (programming language), ruby on rails crash course
Id: 6i62s2v_2Og
Length: 46min 41sec (2801 seconds)
Published: Wed Mar 24 2021