Elasticsearch Tutorial for Beginners

Captions
So what is Elasticsearch? Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data. Elasticsearch is built on Apache Lucene and was first released in 2010. Elasticsearch is the central component of the Elastic Stack, a set of open-source tools for data ingestion, enrichment, storage, analysis, and visualization. It is known for its simple REST APIs, distributed nature, speed, and scalability. The stack is commonly referred to as the ELK stack, which includes Elasticsearch, Logstash, and Kibana; we'll talk about Logstash and Kibana later.

Let's see what Elasticsearch means in very simple terms. Elasticsearch is a database that stores, retrieves, and manages document-oriented and semi-structured data. When you use Elasticsearch, you store data as JSON documents and then query them for retrieval. So basically, Elasticsearch is just a database.

So let's see why we need Elasticsearch. Products that involve e-commerce and search engines built on traditional databases face issues such as product information retrieval taking too long. This leads to a poor user experience and in turn turns off potential customers. The lag in search is attributed to the relational database used for the product design, where the data is scattered among multiple tables, and successful retrieval of meaningful user information requires fetching the data from all of those tables. A relational database works comparatively slowly when it comes to fetching search results for database queries. Understandably, businesses nowadays are looking for data storage alternatives that promote quick retrieval. This can be achieved by adopting NoSQL rather than an RDBMS (relational database management system) for storing data. Elasticsearch is one such NoSQL distributed database. The speed and scalability of Elasticsearch and its ability to index many
types of content mean that it can be used for many use cases. Some of them are application search, website search, enterprise search, logging and log analytics, business analytics, and much more.

So how does Elasticsearch work? Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once indexed in Elasticsearch, users can run complex queries against the data and use aggregations to retrieve complex summaries of it. From Kibana, users can create powerful visualizations of their data, share dashboards, and manage the Elastic Stack. So Kibana is used for data visualization, and it also provides a very handy search bar which can be used to search your Elasticsearch database.

Now, this diagram shows how a query is executed behind the scenes in Elasticsearch to retrieve information. After indexing the data into Elasticsearch, the user writes a query to fetch some data. In the diagram, the user wants to check if there is anything in the indexed data that matches the person attribute "Jack". The search API interprets the query and searches for the information in the index, and through the same API the result is returned to the user in JSON format. So the query goes to the API, the API looks for a match for the person "Jack" in the index, which contains some shards (we'll talk about shards in the upcoming slides), and then the result is sent back from the same node as a response, which is also in JSON format.

Now let's look at some basic concepts of Elasticsearch. First is the cluster. A cluster is a collection of one or more servers that together hold the entire data and provide federated indexing and search capabilities across all the servers; the rough analogue for relational databases is a database instance. There can be N nodes with the same cluster
name. Next is near real time. This is one of the popular features of Elasticsearch: Elasticsearch is a near-real-time search platform, meaning there is only a slight latency from the time you index a document until it actually becomes searchable. Next is the index. An index is a collection of documents that have similar characteristics. Next comes the node. A node is a single server that holds some data and participates in the cluster's indexing and querying. A node can be configured to join a specific cluster by the cluster name, and a single cluster can have as many nodes as we want; a node is simply one Elasticsearch instance. The next concept is shards. A shard is a subset of the documents of an index, so an index can be divided into many shards. We saw this in the previous slide, in the diagram where an index is divided into shard 1 to shard N.

Now, in this video we'll actually set up Elasticsearch and Kibana on a MacBook Air laptop. The process might be a little bit different between Windows and Mac, but after one point the process is the same for both. So let's start. The best way to learn about Elasticsearch, or to set it up, is to go through the Elasticsearch documentation; you can learn everything through the documentation and you don't need external tutorials for this. Even in this video we'll go through the documentation, learn from it, and set everything up from scratch.

First, let's download Elasticsearch; I'll put the link to this website in the description. We need to run Elasticsearch locally on Linux, macOS, or Windows. I'm currently using a MacBook Air, so I'll download the macOS zip file. Once it is downloaded, you have to extract it; you don't have to use these commands, you can just double-click the archive and extract the folder. Now let's check the folder in our Downloads directory. Here we have Elasticsearch after
extracting it: elasticsearch-7.5.1, the newest version at the time of recording. After this we need to download something called Kibana. Kibana is a data visualization tool which you can use to visualize the data you have in Elasticsearch. Once we set up Elasticsearch, that alone is enough to put in the data, but with Kibana it is easy to see what kind of data we are putting into Elasticsearch and to visualize it properly. So let's download Kibana now. Again, I'll put the link in the description so you can check it out for yourself, and I'll be downloading the Mac version. This is also a zip file which you extract to a folder, and then you can run Kibana from that folder. I won't show the download process here, but you can do it yourself. Again, we go through Downloads to find Kibana, and we have the folder here.

One thing before we jump into actually setting up Elasticsearch: to have Elasticsearch and Kibana running on your desktop, you need the most up-to-date Java JDK version on your laptop or PC. So if you do not have the Java JDK on your computer, please install it first (I won't show that process here), or update your Java JDK to the newest version. If you don't have an updated JDK, you will run into errors later which you will not be able to debug easily, because it won't be obvious that the JDK is the problem. I went through this problem when I was setting up Elasticsearch on my own laptop, and I had to update my JDK. So just make sure that your Java JDK is installed or updated to the newest version.

Okay, now let's start setting up Elasticsearch. We need to open a terminal; you can just open the default Terminal app. I use a different terminal app
called iTerm on Mac, so you can do that too; if you are on Windows, you can open Command Prompt. What you need to do first is go to the place where you have the Elasticsearch and Kibana folders ready. As you can see, I had them in Downloads, so I'll go through my Downloads folder. Let's go into the Elasticsearch folder first, and then inside bin, the binary folder. Inside bin you can see a lot of Elasticsearch modules, but we only need to run the one that is named just elasticsearch. To run it, you can just type ./elasticsearch, and that is enough to start Elasticsearch on your laptop. This might take some time; depending on the RAM in your laptop it might vary, but it should not take more than a minute or two. Let's see how long it takes for me. You'll see a lot of output here; just ignore it for now, unless you run into some error and the process closes. It is currently loading plugins (no plugins were found), and then it sets up a local server, where we can actually see our Elasticsearch server running. Then you have the cluster (we talked about clusters in the previous video; the link is in the description), and the health has now gone from red to yellow; it does not show yellow to green yet, and we'll see that now.

Now, to see that your Elasticsearch server is running, go to localhost on port 9200, i.e. 127.0.0.1:9200; on your localhost, port 9200 is for Elasticsearch. Press Enter, and as you can see we get some JSON data here. You can see the raw JSON data, but since I'm using Firefox I get a pretty-printed version of it, and the tagline is "You Know, for Search". But this is not really enough to see whether your Elasticsearch is running properly, or whether the health is actually green or not.
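If you prefer the terminal, the same check can be done with curl. This is a sketch that assumes Elasticsearch is running locally on its default port 9200:

```shell
# query the root endpoint of the local Elasticsearch server
curl -X GET "localhost:9200/"
```

A healthy server answers with a small JSON document containing the node name, cluster name, a version block, and the tagline "You Know, for Search".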
We'll see a different way of checking that later; for now, let's just say it is running properly.

Let's set up Kibana. Open a new terminal, or just open a new tab in your terminal, and go to Downloads again, where we have the Kibana folder. Let's find it; yes, we have it here. Go into the Kibana folder, and it's the same process as for Elasticsearch: go inside the binary folder bin, and type ./kibana, and that should be more than enough. I think Kibana takes more time to start than Elasticsearch, at least on my computer, so let's wait for it. I don't want to pause and then resume the video, because I want to see if any errors come up. Until then, we can check whether Elasticsearch is still running properly: port 9200 is fine, the server is still running, it has the tagline, the build details, the name of my computer, the cluster name, and the cluster ID.

Okay, so there were no errors. There's a lot of information here; don't worry about that, just make sure that the status for Kibana changes from yellow to green. Kibana has to be green when it runs; only then is it running properly. If the status stays yellow, you might run into errors. The server here is running at localhost port 5601, so Elasticsearch was on 9200 and Kibana is on 5601. Let's open it and see what we get. Kibana gives you a proper portal where you can do a lot of things with your data: add sample data, update data, and use Elasticsearch data. When you run the Elasticsearch server and the Kibana server simultaneously, Kibana automatically tracks the data which you upload or manipulate in Elasticsearch, so you don't need a separate way to connect your Elasticsearch server with Kibana; it takes care of that automatically. Now let's see what we have to do next in the
Elasticsearch documentation. We have downloaded it, extracted the folders, and started Elasticsearch from bin, the same steps as in the documentation, and it's the same on Windows as well, so the process is identical from now on. We don't have to start two more instances, because this is just a video showing how to set up Elasticsearch, so we won't be starting two more servers. What we need to do next is verify that the cluster is running, and for that we copy this curl command; it's a GET call to the Elasticsearch API, where you can get some information from the Elasticsearch server. A tiresome, cumbersome way to do this is to copy it as curl and run it in your terminal, but there is a better way of running these commands, because you will be doing this all the time with Elasticsearch: you will be using a lot of API calls to upload your data, or to remove, delete, and change it. The better way is to run these commands not in the terminal but in Kibana. So copy this as curl, go to your Kibana server, click on Dev Tools, and you get a very nice console here which acts as your terminal; you can just paste the curl command and run it. As you can see, we get all the information here. My status is yellow; I think that's because I have some minor problems in my server, but it should be green for you if you follow all the steps. I had made some changes to my server, which is why it's yellow, but if yours is not green you have to make sure it becomes green, and I'll fix mine later. This shows whether the health of your Elasticsearch server is good or not; it has to be green. Make sure that it's green: it's yellow for me, but for you it has to be green. That is how you check whether your Elasticsearch server is running. Let's see what's next.
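For reference, the health check from the getting-started guide looks like this when copied as curl (assuming the default local server; in the Kibana console you would paste just the `GET /_cat/health?v` part):

```shell
# check cluster health; the "status" column should read green
curl -X GET "localhost:9200/_cat/health?v"
```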
Now we have actually finished this page of the documentation, so let's move on to indexing some documents, i.e. adding some data to our Elasticsearch server. Since Elasticsearch is a NoSQL database, let's put some JSON documents inside an index and see how that looks. Again, we use the PUT API to put some data into the Elasticsearch server. We'll use the same examples as in the documentation and see how this works. Again, we copy this as curl, because that's the easier way of doing it, go to the console, paste it, and press Enter. As you can see, we get back some JSON which has some metadata about the command we just ran. We have made an index called customer, as you can see here, and inside that we put a document with ID 1; we gave it an ID of our own, and if we hadn't, Elasticsearch would generate an ID by itself. We also ask for the response to be pretty-printed so that we can read it properly. As you can see, the result is "created", and you have two shards, one successful, nothing failed.

Now let's get this back from the server. Now that we have put something in, let's see if we can retrieve it from the server, and for that we have the GET API. The PUT API is used to put some data into Elasticsearch, and GET is to get some data back. The process is the same again: we copy it as curl, go to Kibana, and paste it. As you can see, the two calls are pretty similar; we just change PUT to GET, and we still have customer and doc 1, so it's pretty intuitive as well. We just make sure it's pretty-printed so that it's easy to read. Let's run it, and as you can see we get the metadata first and then the source. The metadata says the index is customer, the type is _doc, and the ID is 1; the document was found, so "found" is true, and the source shows that the name is "John Doe".
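Spelled out, these are the two calls from the documentation in curl form (assuming the default local server; in the Kibana console you would paste just the method, path, and body):

```shell
# index a document with id 1 into the "customer" index
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe"
}
'

# fetch the same document back
curl -X GET "localhost:9200/customer/_doc/1?pretty"
```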
So this is how you put some data into Elasticsearch and see it using Kibana. After that, what you would do next is create an index pattern, but we'll see that, and how to upload bulk data, in the next video; I think this is enough for one video and I don't want to stretch it into a very long one. So this was a very simple introduction on how to install Elasticsearch and Kibana on your desktop. Make sure you comment any errors you get, and I'll try to solve those errors in the comment section as well. Before you comment, make sure that you have the most updated version of the Java JDK (Java Development Kit); after that, use the websites linked in the description to install Elasticsearch and Kibana, then follow the process I just showed, and you'll be able to run everything perfectly. In the next video we'll upload bulk data; in real-life situations we don't upload single files, we upload bulk data to the Elasticsearch server.

In the previous video we installed Elasticsearch and Kibana on a Mac, and we also talked about the procedure to run Elasticsearch and Kibana on Windows or a Mac. In this video we'll talk about the REST APIs which are provided by Elasticsearch (there are four major ones) and also get into how searching works in Elasticsearch, so we'll see the basics of searching as well. Just to recap what we did in the previous video, I'll run Elasticsearch again from my terminal and then also show how to add or index some data into our Elasticsearch index. Let's get started. First we run Elasticsearch: go inside the folder, then inside the binary folder again, and use ./elasticsearch to run it. I could skip the part where I wait for Elasticsearch to get the server running, but if any errors come up I'd like to solve those errors in real time.
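For reference, the startup steps are just these commands; the folder names depend on the versions you downloaded, so the ones here are examples from my setup:

```shell
# terminal 1: start Elasticsearch (serves on localhost:9200)
cd ~/Downloads/elasticsearch-7.5.1/bin
./elasticsearch

# terminal 2: start Kibana (serves on localhost:5601)
cd ~/Downloads/kibana-7.5.1-darwin-x86_64/bin
./kibana
```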
That's just to make sure that you don't get the same errors, and if you do, you know how to solve them. Let's wait and hope that the Elasticsearch server starts properly. I think it looks good; the health status has changed from red to yellow, so that's good. Let's wait, and yeah, let's test it: localhost port 9200 is where your Elasticsearch server is going to be, so let's refresh this, and yes, as you can see, Elasticsearch is up and running.

Now we need to run Kibana. We talked about Kibana in the previous video: Kibana is a data visualization tool for Elasticsearch, so the data which you index into Elasticsearch can be visualized in Kibana, which can do a lot of other things as well. To run Kibana we have a similar procedure: get inside the folder, go inside bin, and then run Kibana. I think Kibana is a bit faster than Elasticsearch when it comes to setting up the server this time, because Elasticsearch has to load back everything which we had indexed; it has some data in it and it also has to manage that data, whereas Kibana is just an abstraction, a visualization tool for Elasticsearch, so it takes a bit less time. As you can see it has started, and the status has changed from yellow to green, so we are ready. Before we go on, as you can see, the server is running at localhost port 5601, so we check that port now. Just refresh this, and yes, Kibana is also up and running.

Now, we talked about this little hack in the previous video: instead of copying every command as curl and running it in a command prompt or terminal, we can use something called the Dev Tools in Kibana, and use that to run our requests or add some data to Elasticsearch. It is a very good way to see the output in real time as it happens, and also to debug your requests when you add something to Elasticsearch. So let's get started. In the previous video we were here: we were trying to index some documents into Elasticsearch, so let's do
that again. We had used PUT; PUT is an index API (we'll talk about the APIs in a bit), and we used it to add some data to Elasticsearch. Now let's use POST. PUT and POST can both be used in this situation; we won't get into the depths of the difference between them, because we can use either to index something into Elasticsearch. So let's do that: copy it as curl and just paste it here, it's that easy. Let's make it the sixth document and let the name be, I don't know, "elasticsearch"; that's the best I could think of. Wow, and it's awesome. So we used PUT there; now let's use POST, make it seven, and let the name be "elasticsearch" again. As you can see, it was created, primary term 7, sequence number 6; the sequence numbers run from 0 to 6.

Awesome, now we know how to index some documents. Let's actually talk about the different kinds of REST APIs which Elasticsearch provides. There are four major REST APIs provided by Elasticsearch. The most important is the index API, which is how you actually add something to Elasticsearch: it helps to add or update a JSON document in an index when a request is made to that respective index, with a specific mapping. Mapping is another Elasticsearch concept, which we'll talk about in the next video; mapping is very important for structuring your data and adding constraints or boundaries to it. For example, if I only want an integer where there is an age parameter, I can use a mapping to enforce that. So we'll talk about mapping in the next video, but for now we'll cover the REST APIs and get into the basics of searching. So we have the index API, which can be used to add a JSON document to Elasticsearch. Next is the GET API: once we have some data in Elasticsearch, we have to get it back, right? We have to see what we have inside our index, and for that we use the GET API. This API helps to extract a typed JSON object by performing a GET request for a particular document.
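The indexing calls described above, sketched in curl form (documents 6 and 7 with the name "elasticsearch" are the ones used in the video; this assumes the local server from earlier):

```shell
# PUT with an explicit document id
curl -X PUT "localhost:9200/customer/_doc/6?pretty" -H 'Content-Type: application/json' -d'
{ "name": "elasticsearch" }
'

# POST works the same way with an explicit id...
curl -X POST "localhost:9200/customer/_doc/7?pretty" -H 'Content-Type: application/json' -d'
{ "name": "elasticsearch" }
'

# ...and if you omit the id, Elasticsearch generates one for you
curl -X POST "localhost:9200/customer/_doc?pretty" -H 'Content-Type: application/json' -d'
{ "name": "elasticsearch" }
'
```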
As you can see, Elasticsearch provides you very simple REST APIs which you can call from your browser to get some data back or to put some data in. You can also use a tool like Postman, where you can simulate the GET and POST requests, or you can use the Dev Tools in Kibana to do the same thing. All of them do the same job, but I think the console, the Dev Tools provided by Kibana, is the best way, because it helps you see the output right in front of you, with nothing other than your query and the output. When you use Postman there are a lot of other things which can intimidate or confuse you, as they did me, and your browser is really not the best interface for this. So use the Kibana Dev Tools when you want to write a query and see the result instantly.

Okay, next is the DELETE API: if you want to delete a particular index, delete some mapping, or delete a document, you can do that by sending a DELETE request to Elasticsearch. And lastly there is update: if you want to update some documents which you previously indexed into Elasticsearch, you can do that using the update API. Now, today we'll talk about the first two APIs, index and GET. We just saw how the index API works to add or update a JSON document; now let's use the GET API to get some results back, and also to understand how search works. Basically, in Elasticsearch you want to search for something, which means you want to see the results of that search, right? If I search for "elasticsearch" or for "programming knowledge", I'll get some results back. That is why we use the GET API to perform searching in Elasticsearch. So let's see that in action. Let me just close this. Let's replace POST with GET, remove the body, and see if we can get document seven again. Yes, we get document seven. Let's see if there's
a six; yes. Is there a two? Yes, there is. Awesome, so that is how we get the documents back. Now let's see how search works. Again, to learn anything about Elasticsearch, the documentation is the place to go; they have amazing documentation and they teach everything from scratch. As you can see, we can use the GET API to search for something. I will not use this exact example, because we don't have a bank index right now; we have a customer index with a name field, so let's find something that works well with names. This looks good: there is a GET bank/_search request that matches on the address field of the bank index, so similarly, let's search for something that matches a name. So let's copy this as curl, go back here, remove the old request, and paste this in. Let me just tell you what is happening here. We use the GET API again, and with it the _search endpoint, so the search API is used through the GET REST API. Instead of bank we have customer, because we are using the customer index here, and this is the syntax of how you write a query: you have the outer curly braces, with a "query" key inside. Everything works as a JSON document, a set of key-value pairs, so it's very easy to read and very intuitive to understand what's happening. You can just read it in simple English: I have a query, and I want to match something, namely this field against this particular value. If this were the bank index, this query would match every address which had "mill lane" in it, every address which had "mill" in it; but since we have customer here, and the field is name, let's see if we can find something. Let's search for "elasticsearch" and see if that exists in our database. Okay, so it did not time out ("timed_out" is false), which means we found something; "successful" is 1, awesome. "Hits" is the set of documents which match the name "elasticsearch", and we have two: document six, which is "elasticsearch", and document seven, which is "elasticsearch" again.
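Written out in full, the search request looks like this (using the customer index and name field from this tutorial; the documentation's bank example matches on address instead):

```shell
# full-text match query against the "name" field
curl -X GET "localhost:9200/customer/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "name": "elasticsearch" }
  }
}
'
```

The response includes a hits object whose total tells you how many documents matched, with the matching documents themselves under hits.hits.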
So it also searches for the term "elasticsearch" within the entire string; that is how good the query and the search API are. So yeah, this is how you search in Elasticsearch.

Logstash is a tool based on the pipes-and-filters pattern for gathering, processing, and generating logs or events. It helps in centralizing and making real-time analysis of logs and events from different sources. Logstash is written in the JRuby programming language, which runs on a Java virtual machine, hence you can run Logstash on different platforms. It collects different types of data, like logs, packets, events, transactions, timestamps, etc., from almost every type of source; the data source can be social data, e-commerce data, news, financial data, IoT devices, mobile devices, etc.

Logstash is a plugin-based data collection and processing engine. It comes with a wide range of plugins that make it possible to easily configure it to collect, process, and forward data in many different architectures. Processing is organized into one or more pipelines. In each pipeline, one or more input plugins receive or collect data that is then placed on an internal queue; this is by default small and held in memory, but it can be configured to be larger and persisted on disk in order to improve reliability and resiliency.

So let's talk about some general features of Logstash. Logstash can collect data from different sources and send it to multiple destinations; it can also handle multiple HTTP requests and response data. Logstash can handle all types of logging data, as we discussed, including Apache logs, Windows event logs, data over network protocols, data from standard input, and more. It can work on all your operating systems and can deal with any type of data you want it to handle. Logstash also provides a variety of filters, which help the user find more meaning in the data by parsing and transforming it. Logstash can also be used for handling sensor data in IoT.

Let's talk about some key concepts before we jump into
Logstash. The first is the event object. It is the main object in Logstash, and it encapsulates the data flow in the Logstash pipeline. Logstash uses this object to store the input data and to add extra fields created during the filter stage; Logstash also offers an event API for developers to manipulate such events. Next is the pipeline. It comprises the data flow stages in Logstash, from input to output: the input data enters the pipeline and is stored in the form of an event, which is then sent to an output destination in the user's or end system's desired format. We saw the diagram of a pipeline in the previous slides, and that is the definition right here.

So let's talk about what is inside the pipeline. The first stage of the pipeline is the input, which is used to get the data into Logstash for further processing. Logstash offers various plugins to get data from different platforms; some of the most commonly used ones are file, syslog, Redis, and Beats. The middle stage of the Logstash pipeline is the filter, where the actual processing of events takes place. A developer can use predefined regex patterns to create sequences for differentiating between the fields in the events and the criteria for accepted events. The last stage in the Logstash pipeline is the output, where the output events can be formatted into the structure required by the destination systems; it sends the fully processed output event to the destination by using plugins. Some of the most commonly used output plugins are Elasticsearch, file, Graphite, statsd, etc. What we're concerned with in this tutorial is Elasticsearch: how we can use Logstash to put input files into Elasticsearch.

Let's talk about the advantages of Logstash. Logstash offers pattern sequences to identify and parse the various fields in an input event. Logstash supports a variety of web servers and data sources for extracting logging data. It also provides multiple plugins to parse and then transform the logging data into any desired format.
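The three stages described above fit together in a pipeline configuration file. This is a minimal sketch; the log path, grok pattern, and index name here are hypothetical examples, not something from the video:

```
# hypothetical pipeline.conf: file input -> grok filter -> Elasticsearch output
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    # parse each line using the predefined Apache access-log pattern
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "apache-logs"
  }
}
```

A file like this is run with bin/logstash -f pipeline.conf.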
In our case the format is going to be JSON. Logstash is a centralized software, which makes it easy to process and collect data from different servers. As we talked about, Logstash also uses the HTTP protocol, which enables a user to upgrade Elasticsearch versions without having to upgrade Logstash in lockstep.

There are some disadvantages with Logstash, but I am mainly concerned with these two. First, working with Logstash can sometimes be a little complex, as it needs a very good understanding and analysis of the input logging data; as a beginner it can be very hard or intimidating to get into Logstash, and we'll talk about some workarounds for this. We'll talk about a tool which I have developed that you can use to index a lot of files into Elasticsearch without using Logstash, but at some point, when you are scaling up, you have to stop making your own tools and move to Logstash, and that is when Logstash can be a bit too complex; once you get the hang of it, though, it becomes very easy to use. Second, the filter plugins are not generic, so you have to find a method by which you can correctly sequence the patterns and avoid errors when you're parsing files in Logstash and outputting them to something else.

So yeah, that was a very simple introduction to what Logstash is. Now we can go ahead and install Logstash and also stash our first event; basically, we are going to perform a "hello world" of Logstash now. The best way to learn about Logstash is the documentation: any part of the ELK stack, Elasticsearch, Logstash, or Kibana, can be perfectly learned and understood with the documentation itself. During these tutorials I'll also refer to the documentation every single time and use some pieces of code from it, because they explain the concepts very easily. The links to all these websites will be in the description below, so you can check them out and start from right there. There is also a very good
introduction to Logstash, a very good blog, and some other blog references which I have used to prepare for this tutorial. Okay, so just like Elasticsearch, the installation process is the same as what we discussed for Elasticsearch and Kibana. The first step is to check whether your Java version is right, so make sure that you have Java installed; once you run the check in your command prompt or terminal, something like this should show up. There are some extra notes for Linux systems as well, so make sure you read them before you jump into any download step. You can also install from a binary and go ahead with that; I think that is what I did for this tutorial. I have a MacBook Air, so I downloaded the tar.gz file and placed it directly into my folder. There are other ways of doing it too, so find the way that is comfortable for you. People who use macOS and the Homebrew package manager can also use these two steps, which I think are the easiest and simplest way to download Logstash on macOS and start working with it. Once you do it, let me show you how it looks. You will have your Logstash folder in your Downloads folder, but I have dragged it to my home folder, because I have Elasticsearch, Kibana, and Logstash in the same place so I can use them together in the same directory. Once you go inside, there is a bin folder containing all the Logstash executables, and you can use these to start Logstash; we'll see how that happens in a bit. It is recommended that you keep your Elasticsearch, Kibana, and Logstash folders in the same directory, at the same level, so that it's easier for you to manage them and use them side by side; that is why I make sure all my ELK stack folders are at the same level. Once that is done, let's see how we can stash our first event. Again, I'll be using the documentation, because it is very handy and you can use it any time to understand how Logstash works. As we discussed previously, a Logstash pipeline has two required elements, the input and the output, and an optional element called the filter. The input plugins consume data from a source, the filters modify the data, and the output writes to the destination, which will eventually be Elasticsearch here. We're going to perform an event: we'll write something on the standard input and check whether Logstash actually logged that event or not. To test the Logstash installation, we run the most basic Logstash pipeline. So let's do this: copy the line of code and cd inside the folder, so we are inside the Logstash directory now. Let me increase the font size so that it's clear; yeah, I think this is good. So let's do that again: we cd into Logstash, and the next step is to run bin/logstash -e with this piece of configuration. While this sets up (I think it is currently setting up, so we need to wait), note that the -e flag enables you to specify your configuration directly from the command line; otherwise you would always have to create a separate configuration file and have Logstash read the configuration from that file. So now we can test things very easily because of this. The pipeline in the example takes input from stdin, the standard input, and moves it to stdout, the standard output, in a structured format. As you can see, it's currently configuring, so we need to wait and make sure there are no errors while Logstash is starting up. Once you see
"Successfully started Logstash API endpoint", we can start running our hello world example. So let's type "hello world", and as you can see, we get a timestamp, a host (my machine's name), and the message which we just logged. Now you can type as many examples as you want: "hello monarch", "hello again", and as you can see, Logstash is logging every single entry. In this video we'll talk more about Elasticsearch and how search works in Elasticsearch; we'll cover the basics, and we'll also talk about how we can use multiple queries in Elasticsearch to search for data efficiently. In the previous videos we talked about how to download Elasticsearch and get it up and running; we talked about how to set up Kibana as well, and how we can use Kibana to look at the data in Elasticsearch. We also talked about the technique where we can run queries for Elasticsearch from Kibana, and we indexed some data into Elasticsearch so that we could see it working properly. In the previous video I said that the best way to learn Elasticsearch is to go through the documentation, so I'll continue with the documentation here as well: we'll look at the examples given in the documentation and start learning from there. This is where we had stopped in the previous video, where we talked about how to index documents. There's also a video where we talked about how the APIs work; Elasticsearch works on REST APIs, and we talked about the different kinds of APIs which Elasticsearch provides. That information was enough to get started with actual searching, which is the main purpose of Elasticsearch and why it is preferred over other NoSQL databases. To start learning about how search works, we need to index some data, not a huge amount, but some data in bulk, so we'll start with that, and the Elasticsearch documentation helps here as well. So we have this data here, called accounts.json. Let's look at it for now: this is a lot of data which we can use to search through, so we'll use it as our example data and talk about how we can search efficiently inside these JSON documents. When you have a lot of documents to index, you can submit them in batches using the bulk API, so that is what we'll be using right now. This makes the operation significantly faster compared to sending individual requests to Elasticsearch: if you have 100 files, you can use the bulk API to divide the 100 files into batches and send those, rather than iteratively uploading every single file in 100 separate requests. The way you divide your batches depends on the document size and complexity, and, as you can see here, on the indexing and search load and the resources available to your cluster, so it also depends on the kind of memory you have on your computer. According to the documentation, a good place to start is with batches of 1,000 to 5,000 documents and a total payload between 5 MB and 15 MB; you can see the payload information in the Elasticsearch details, and we'll talk about that later. They also encourage you to experiment so that you can find the sweet spot for yourself. So let's get started and upload some documents using the bulk API, following these guidelines from Elasticsearch. I have downloaded the accounts.json file from here; you can just copy all of it and paste it into a file named accounts.json, and this is how it looks. It has some bank details: the account number, the balance, first name, last name, age, gender, the address, the employer, the email, and so on. These are the two curl commands that we can use to upload the data into Elasticsearch. As you can see, the first is a POST request where you post the accounts.json file to an index called bank: it creates the index called bank, uses the bulk API (the _bulk endpoint), and then refreshes the index. After that we use the second command, _cat/indices, which displays all the indices we have. Since we created a new index called bank with the first command, we run this second command to see whether the upload actually succeeded, and hopefully we should see this line here. Don't worry about the health being yellow rather than green; anything other than red is completely fine. I think on the server I am currently using, Elasticsearch 7.6, the health is yellow; we saw that in the previous video. So let's get started again: make sure Elasticsearch is downloaded, go inside the bin folder, and get it running. Just keep an eye on the process output as it runs; a glance is enough, so that you catch any errors right away. We have no plugins loaded as of now; we'll get to that later. Let's see if my server runs smoothly: it has started, it has been published to the address localhost port 9300, and the cluster health has changed from red to yellow, so I think we're good to go. Now we'll use these curl commands to index our data into Elasticsearch, so let's open up a new terminal. The JSON file which I have is on the desktop, so let's go to the desktop, copy the first command for now, and press enter. As you can see on the left, it ran, and all of these JSON documents have been uploaded. We can see the result is "created", but we don't know how many failed and how many passed the upload process, so for that we use the second command. When we run it, as you can see here, for bank (the second row) the health is yellow, the status is open, the index has been created as bank, it has its own unique ID, and the number of documents uploaded is 1,000, which I think is what the documentation also has. So now we have our data in Elasticsearch successfully. The second thing is to actually run Kibana; I forgot that initially, so let's do it right now. Go to your Kibana folder, make sure it's installed, go to bin, and run Kibana the same way you run Elasticsearch; let's wait for it to load. I was able to run the command on the command line because I have curl installed on my computer, but if you don't, then we had talked about a different way to use these commands: it was through Kibana. You don't need to have curl installed on your computer to use them from Kibana, and it's much easier to do it with Kibana because you can see the outputs clearly, so let's see how that works now. I think the server is ready; let me just look for the port where we have it. The server is running at localhost:5601, so let's copy that and paste it into the browser (oops, I didn't mean to search that), and I think we're good to go. So now we have opened Kibana; let's see how we can run our Elasticsearch commands via Kibana. Go to Dev Tools, and you can see this really nice console here where you can give commands and get the outputs directly. Let's try what we wanted first, the _cat/indices command: copy it and paste it here, and as you can see, you get the output directly. The command also changed: we had a curl command, and it got converted directly to the API form, so we have the GET verb, and _cat displays the indices. That's how we can use the Kibana console to see our output directly. So now we have successfully indexed some data, and
now let's start searching. When you click on the next heading, "Start searching", we can see how they want us to search. What we need to remember before we get into this is that Elasticsearch is a NoSQL database, so everything is in JSON format: you have key-value pairs, and even when searching you use JSON bodies with the REST APIs to perform the search. This makes it really intuitive to use, because you can form these JSON requests and read them like English. Let's try it right now: without any context, without knowing anything about what this is doing, we can read it and understand what's happening. Without having learned what the search API is: you want to GET something from the bank index; you have a query (okay, we don't know exactly what "query" means yet); the query says match_all with nothing inside, so it basically says "match everything"; and then you sort by the account number in ascending order. As you can see, it's very simple to read, and the example states exactly that: it retrieves all documents in the bank index, sorted by account number. Simple as that, and you get the output directly. It's really easy to write these queries compared to SQL, where you have to actually put in effort and sit for some time to generate the queries; here you can write it the way you think about it in your mind. So once you have ingested some data into the index, you can search it by sending a REST API request to the _search endpoint. What we're using here is called the Elasticsearch Query DSL, and these are the basics of it. Let's copy this and see if it works for us. Yeah, so it took 32 milliseconds, and there is a lot going on in the response, so let's see what all of it means. "took" says how long it took Elasticsearch to run the query, in milliseconds; it took us 32 milliseconds here. Next, "timed_out" says whether the search request timed out or not; this can happen when Elasticsearch is looking for something it can't find in time, and the request times out, just like with any REST API. Then you have "_shards". We talked about shards in the first introduction video, where we saw how Elasticsearch works behind the scenes: it divides your index into shards, each shard holds its own set of documents, and the search runs through the shards to see which ones contain matching documents. The _shards key here shows how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped. "max_score" is really important, and we're going to talk about it intensively in the next few videos: it is the score of the most relevant document found. The score is null here because we didn't match on anything specific; when we actually use the query capability, each result gets a score, and the higher the score, the more relevant the document is for the query; the lower the score, the less relevant it is. Next, "hits.total.value" tells how many matching documents were found, "hits.sort" is the document's sort position, and "hits._score" is again the relevance score (not applicable when using match_all), so we need to know that. The next thing we need to know about the search API is that each search request is self-contained: no state is maintained across requests. Each request is an individual request, and there is no information connecting a request to the ones before or after it; that's how REST APIs work. Each REST API request carries its own information and is completely separate from the others. That's something we need to know, because with SQL that might not be the case, but for NoSQL, and particularly for Elasticsearch, each request is self-contained. Now let's look at some other ways we can search. To page through the search hits: say the hits are in account-number order, but you just need everything from 10 through 19. You start "from" 10 and set "size" to 10, so it returns only the next 10 documents, 10 through 19, and we can run the same thing. You can copy it as curl, or "View in console" opens the Kibana console page directly; Elasticsearch has this really handy link between the documentation and the console, which opens the exact same page and runs it for you. But we know how to do that ourselves, so let's do it right now, and as you can see, it only showed the 10 hits starting from 10, with all the fields, account number, balance, and so on, for accounts 10 through 19.
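The two request bodies walked through above, the plain match_all with a sort and the paged variant with from and size, can be sketched as plain Python dicts; this is a minimal sketch assuming the bank index and its account_number field from the accounts.json sample, with the HTTP call itself left out so the sketch stays self-contained:

```python
import json

# Body of GET /bank/_search: match every document, sorted by account number.
match_all_query = {
    "query": {"match_all": {}},
    "sort": [{"account_number": "asc"}],
}

# Same search, paged: skip the first 10 hits, return the next 10 (hits 10-19).
# dict(...) makes a shallow copy, so match_all_query itself is untouched.
paged_query = dict(match_all_query, **{"from": 10, "size": 10})

# The bodies are plain JSON, so they can be serialized and sent with curl,
# the Kibana Dev Tools console, or any HTTP client.
print(json.dumps(paged_query, indent=2))
```

Because these bodies are ordinary JSON, sending one with curl is just a matter of piping the serialized dict into a POST to localhost:9200/bank/_search.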
It's pretty simple to use directly. Now, this is good, but we were using sort, from, and size, which are easy to learn; what we really need is to search for something inside the documents, to search for a phrase, a name, or a number, and that is where the query clause comes into the picture. So let's use a match query on the address for "mill lane". This matches on the individual terms, so it returns documents whose address contains either "mill" or "lane" (or both). Let's copy this and see how it works: as you can see, it returns all the addresses containing either term, "198 Mill Lane", "990 Mill Road", and so on. But there might be a time when you need the exact phrase "mill lane" to be present in the address, and that is where you use match_phrase. This matches the entire phrase and only returns the addresses containing the phrase "mill lane". Let's run it now: yeah, as you can see, there's only one address that has "mill lane" together in it, and that is how it works. Now, these are very simple, single queries, and they're easy to use, but usually you have to construct complex queries, just like in SQL, and for that you can use something called bool. You can use bool to combine multiple query criteria: you can designate criteria as "must" match, "should" match, or "must_not" match, so a clause is either required, desirable, or must be absent from your results. Let's see an example: we need to search the bank index for accounts that belong to customers who are 40 years old, but exclude anyone who lives in Idaho (the state abbreviation "ID"). It's a compound condition, so we need bool: the customers must be of age 40, so it must match the age as 40, and it must not match anyone whose state is "ID". Pretty simple, pretty clear. We'll actually write our own queries in the next video; for now I'm just going through the queries given by Elasticsearch, and we'll see how easy it is to write them from scratch. Let's copy this and run it; we know how it works: as you can see, it matched everybody of age 40, and none of them have "ID" as the state, it's "MO", "VA", "MT", and so on. So use bool to create complex queries. Now let's make it a bit more complex and use something called a filter. A filter affects whether or not a document is included in the results, but it does not contribute to how documents are scored. Suppose you want a request that uses a range filter, so you want results restricted to a range: for example, you want people who are not underage but also not senior citizens, so people from age 18 up to 50 or 60. That is where the range filter comes into the picture. Here the documentation uses it to check the balances: if you want the accounts with a balance between 20,000 and 30,000, inclusive, you use the range filter on the balance field, and we have something called "gte" and "lte" here: gte is "greater than or equal to", and lte is "less than or equal to". So what this does is match all the documents, but filter them by a range on the balance, keeping only those where the balance is greater than or equal to 20,000 dollars and less than or equal to 30,000 dollars. Let's copy this, making sure the server is still running, and yeah, as you can see, all the hits here have a balance (let me find the balance field, yeah) between 20,000 and 30,000, and that is how you write simple queries using Elasticsearch. In today's video we'll go through the documentation. Now, as I've been saying in the
previous videos, the Elasticsearch documentation is the best way to learn about Elasticsearch: the docs are really clear, not complicated at all, and straight to the point. In this video we'll talk about query and filter context: we'll understand what the "query" keyword and the "filter" keyword mean in Elasticsearch queries, and we'll talk about the difference between them. I'll talk about relevance scores and why they matter, and in the next video we'll actually write some of our own queries: we'll set up a situation where we want to use query and filter context, write down some queries, and see how they could be applied to a real database. So let's start. Before we jump into this, let's make sure that we have Elasticsearch and Kibana running, so let me go and do that right now. I don't know if we need them running for this video or not, but let's find out. Start Elasticsearch, and it's running; open a new tab so that we can run Kibana as well. Let me clear this, go to the Kibana bin directory, and run Kibana the same way we run Elasticsearch. Looks like Elasticsearch is ready, and now we are waiting for Kibana, so let's wait for that. Elasticsearch is up (the log shows the publish address on port 9300), and the cluster indices have been recovered, with the health going from red to yellow, so we're good to go. Your cluster health might not always be green; sometimes it goes to yellow, and there is a reason for that, but since this is just a video about query and filter context, we won't worry about the health of the clusters. So Elasticsearch and Kibana are both running; let's make sure they are reachable in the browser as well, Elasticsearch on port 9200 (not 9300, sorry, that's the publish address): yes, it is, and let's also make sure Kibana is running: it is, at port 5601 on localhost. Once we have these running, we can start understanding what query and filter mean. So let's come back here. In the previous video we talked about the basics of searching in Elasticsearch; we went through some simple queries from the documentation, and there was something called a relevance score in the output of every Elasticsearch query. We have to talk about that, because it's one of the most important things Elasticsearch provides. By default, Elasticsearch sorts the search results by relevance score. What the relevance score does is measure how well each document matches a query: it tells you how well the retrieved document fits the query, and assigns a score accordingly. The relevance score is a positive floating-point number, returned in the _score meta field of the search API, and the relation between score and relevance is proportional: the higher the score, the more relevant the document. While each query type can calculate the relevance score differently, the score calculation also depends on whether a clause runs in query context or filter context, so let's talk about query context first. In Elasticsearch, a query clause in query context answers this question: "How well does this document match the query clause?" How well does this document match what you're asking for? If I have a query which says "find all the documents matching the word python", and I have a database full of Python notebooks or Python teaching books, it will match all the books containing "python", and a book called "Introduction to Python" might get a much higher relevance score than something like "Programming Languages: Python and Many More". How you phrase your query determines the relevance score each document receives. That is what query context means; now let's go to the filter context. A filter is, as the name suggests, basically a yes-or-no check: it only determines whether the document matches the query clause or not. Unlike query context, it does not care how relevant the document is, just whether it matches. You can think of the query context like a regression problem, where the output is a continuous value (the score), and the filter context like a classification problem, where the output is discrete, a binary yes or no, whereas with scoring you get continuous results based on relevance. To understand filter context with our earlier example: we had a query matching the documents that contain the word "python"; now suppose I only want the Python books published between 2015 and 2016, so I add a filter with a range clause that only accepts books whose year is greater than or equal to 2015 and less than or equal to 2016.
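As a rough sketch of that books example, here is what the combined request body could look like as a Python dict; note that the "title" and "year" field names are hypothetical, invented purely for illustration, and are not fields from the accounts data used elsewhere in this tutorial:

```python
import json

# Hypothetical index fields: "title" (text) and "year" (integer).
books_query = {
    "query": {
        "bool": {
            # Query context: scored; "how well" does the title match "python"?
            "must": [{"match": {"title": "python"}}],
            # Filter context: pure yes/no on the publication year;
            # contributes nothing to the _score and can be cached.
            "filter": [{"range": {"year": {"gte": 2015, "lte": 2016}}}],
        }
    }
}
print(json.dumps(books_query, indent=2))
```

The design point is the split itself: anything that should influence ranking goes under "must" (or "should"), while hard yes/no constraints go under "filter", where Elasticsearch can skip scoring and reuse cached results.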
So how it works is that Elasticsearch, behind the scenes, will go through your documents and check each one against the range: if the document falls inside the range, the answer is yes and the document is included in your results; if not, it is left out. The filter answers the simple question: does this document match the query clause or not? Frequently used filters are cached automatically, which is how it works behind the scenes, so your performance with filters will be very fast. Now let's see an example of how filter and query work together; we have a very nice example here given by Elasticsearch directly. We are going to use the search API; the "query" parameter indicates query context; then we have a bool. bool basically lets you combine multiple clauses in one query: if you want multiple conditions applied to the documents, you use bool, as we discussed in the previous video. Next, the "must" clauses: the query must match the title "search" and it must match the content "elasticsearch". So far this means: find all the documents whose title matches "search" and whose content matches "elasticsearch". Next is the filter: filter on the term status being "published", so it only returns the documents which have the title "search" and the content "elasticsearch" and which are published; and there's also a range, requiring the publish date to be after 2015. So what this query is doing is using both query and filter context, and giving us back the documents whose title matches "search", whose content matches "elasticsearch", which are published, and whose publish date is greater than 2015.
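The documentation example just described can be written out as a request body like this; the field names (title, content, status, publish_date) are the ones from that docs example, and the concrete date string for gte is one plausible choice:

```python
import json

# The docs' query-and-filter-context example, rebuilt as a Python dict:
# the two match clauses run in query context (they affect _score), while
# the term and range clauses run in filter context (yes/no, no scoring).
docs_example = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "Search"}},
                {"match": {"content": "Elasticsearch"}},
            ],
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"publish_date": {"gte": "2015-01-01"}}},
            ],
        }
    }
}
print(json.dumps(docs_example, indent=2))
```

Reading it top to bottom mirrors the narration: must match title and content, then filter by status and publish date.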
So this is how we can take complicated requirements, simplify them, and express them as Elasticsearch queries. In the previous video we talked about query and filter context; in this video we will start writing some example queries of our own. Just to recap: we covered the query and filter contexts, what a relevance score is, what "query" means, how the score relates to the query, and what filters are, and we ran some examples showing how these work in detail. In today's video we will actually write our own queries to find something in a database. Before we write the queries: we covered indexing documents in one of the previous videos (I'll link it in the description), and for this particular video you need to have the accounts.json data indexed in your Elasticsearch database, so make sure you follow that page, download the data set, and index it properly. Once that is done, we can start writing our own query. Before we jump into that, let me make sure that Elasticsearch is running, and let's also start Kibana. While we wait for Elasticsearch and Kibana to come up, let's look at the task. We have an example here: we have to write a query to search for addresses that have either "lane" or "street" in them, and a balance between 20,000 and 30,000. To understand the query properly, let's wait for Kibana and Elasticsearch to load so that we can look at the parameters, the key-value pairs of our JSON data, understand the question better, and then start writing the query. Let's make sure everything is up and running; yeah, I think it is. Let's refresh Kibana and see: yes. Now, before we jump into writing the query, let's create an index pattern in Kibana. We are using the bank index, so let's create a pattern for it, so that we can visualize our data set in Kibana as well, and we are good to go. Let's see how our data looks in Kibana, and yeah, awesome: all of our data can now be visualized in Kibana properly, in a table view or a JSON view of each document. That's perfect. As you can see, each document has an _id, the index "bank", a _score (zero by default here), and then the account number, address, age, balance, city, and so on. Now we can write some queries based on this data. We need to write a query to search over addresses, so we'll work with the address field. Looking at the data, addresses can contain "lane", "street", "court", or various other suffixes; we want the addresses that have either "lane" or "street" in them, and the balance has to be between 20,000 and 30,000. Let's start writing. The first thing we need to decide is the kind of API call to make; we have talked about the different kinds of APIs Elasticsearch offers. We need to search for something and get results back, right, so it's a GET request, we search in the bank index, and we use the _search API inside that GET. Let's write the basic skeleton of the query first, so that we're sure what we have to do. Okay, so we need two things; let's divide the query into two parts: the first is to search for addresses that have "lane" or "street" in them, and the second is the balance between 20,000 and 30,000. Now let's use both the query and the
filter context here so we can use the query context to actually understand how we can have street and lane both in the name so it can either have both or either of them so since we're using both query and filter we should have a boolean so and let's see so it should must have either a lane or a street so a must and it should match now uh it should match the address and something which i think is uh interesting here is that we don't have to have multiple match statements for lane and speed we can just have match lane and street here right here so that how the match query works is that it searches for the string and the substring so it searches for lane street street and lane so all of them will be included in the a single query we don't have to worry about the rest uh if you have to match queries even that works but since we can do the same kind of work in less lines of less query lines that works better and next we need to have a filter so let's just write a filter now now we need a filter so filter oh i can't spell filter sorry about that now we need a filter and i think we made some mistakes okay let's just write it and then we can figure out the mistakes so now we have a range and the range is between the ranges for what is the balance so the balance has to be between this and this which has to be greater than 20 000 and less than 30 so let's write the range for balance and has to be greater than 20 000 and it has to be less than or equal to 30. 
Awesome. Let me just make sure this is right: we don't need this extra wrapper since we just have one clause, and let's check the indentation, because I'm very particular about it. Query, bool, must, match the address on lane and street, and then the filter. Let's run this and see how it works. It took about 40 milliseconds, we have 85 total hits, and the max-score (most relevant) document has "lane" in the address and a balance just greater than 20,000. We have a lot of hits this way: lane, lane again, street, all with balances over 20,000. So that is how we write an example query in Elasticsearch. To reiterate what we did here: we had to write a query to search for addresses that had either "lane" or "street" in the name and a balance between 20,000 and 30,000. We started with a GET request to the search API, and the query used both the query and filter contexts: it must match lane and street, and it filters the balance between 20,000 and 30,000. That is how we write a query. In today's video we'll jump into compound queries, understand the different types of compound queries that are available, and see how they work; in the next video we'll implement compound queries and also work through a use-case example. In the previous video we wrote our own very simple, basic queries to understand how the query and filter contexts work and how the relevance score works, but today we're going to get into the advanced concepts and understand why compound queries are important. Just like in a SQL database, in your NoSQL databases you need to write compound queries, and that is what you'll be working with most in a real job. The first compound query, which you probably already know, is the boolean query.
It is the default query for combining multiple query clauses, for example must, should, must_not, and filter clauses. Whenever you use a boolean query you can work with multiple queries at the same time. Let's see an example of how this works. Here, as you can see, we use the bool keyword, and the intuitive way to read it is: it must match the term query where the user is "kimchy"; it filters on documents where the tag is "tech"; it must not match people with an age between 10 and 20; it should also match the tags "wow" and "elasticsearch"; the minimum number of should clauses that must match is one; and the boost, the multiplier applied to the relevance score of all of this, is one. We'll talk about the boost keyword in a moment, because that is something new, but we get the idea: a boolean query helps us combine multiple queries at the same time. The next one is the boosting query. Boosting returns documents which match a positive query but reduces the score of documents which also match a negative query. This comes in handy when we deal with real-life use cases, so let's see the example first and then relate it to a real one. In the example given in the documentation, if you want to raise the score of text which contains "apple", you put it in the positive clause, and if you want lower scores assigned to "pie", "tart", "fruit", or "crumble", you put those in the negative clause. You can also specify how strongly they are demoted: here they give the negative clause a negative_boost of 0.5, so whenever you get a text which has pie, tart, fruit, or crumble, its score is multiplied by 0.5. Now this can be helpful in a lot of ways. One example we're going to talk about is COVID-19 news, where there has been a lot of misinformation.
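That documentation-style boosting example can be sketched like this (the docs show a term query on the multi-word negative string; a match query, used here, is the more typical way to cover several words):

```python
import json

# Boosting query sketch: "apple" documents keep their score, while
# documents also matching pie/tart/fruit/crumble have their score
# multiplied by the negative_boost of 0.5.
query = {
    "query": {
        "boosting": {
            "positive": {"term": {"text": "apple"}},
            "negative": {"match": {"text": "pie tart fruit crumble"}},
            "negative_boost": 0.5,
        }
    }
}
print(json.dumps(query, indent=2))
```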
People have been protesting that they should not stay at home and should instead go out and save the economy, and there have been a lot of efforts by organizations such as Facebook and Twitter to avoid surfacing news that supports not staying at home. Now, that's not literally done with Elasticsearch, but we get the idea: you boost the positive terms, the ones about staying at home and social distancing, and you give a negative score to news like "don't stay at home" or "go out and save the economy". That is how the boosting query works. The next and last query which we're going to talk about for now is the constant score query. There might be cases where you would like some terms in your database to have a constant score while the others have different scores. For example, suppose you always want names that start with "r": you can assign that condition a constant score, and when you search for those specific names you get a consistent list of names back, while the other queries in your compound query have their own scores. Let's see how it works; this is where the boost keyword comes into the picture. A constant score query works in a filter context. Basically, what we're saying is: just make sure the term is there in the search results, and if it's not there, leave the document out of the picture. You can also assign it a custom score: if you want the score to be 1.2, you give it a boost of 1.2, and every matching document gets exactly that score. That is how the constant score query works. In this video we'll actually write the three compound queries which we learned about in the previous video and see how they work. Before starting, our Elasticsearch server is running, and we'll keep it running.
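A minimal constant_score sketch looks like this (the user.id field follows the documentation's example, not the bank dataset; swap in your own field):

```python
import json

# constant_score: every document matching the filter gets the same
# score, here the boost of 1.2, instead of a computed relevance score.
query = {
    "query": {
        "constant_score": {
            "filter": {"term": {"user.id": "kimchy"}},
            "boost": 1.2,
        }
    }
}
print(json.dumps(query, indent=2))
```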
That way we can directly start writing the queries and get the results. These are the queries we talked about: the boolean query, the default query for combining leaf or compound query clauses with must, should, must_not, and filter; the boosting query, where you assign positive and negative boosts to your queries, so that the relevance score of documents changes based on whether they match the positive or the negative query; and the constant score query, which wraps a query inside another query and executes it in a filter context, so that all matching documents are given the same constant score. We talked about how each of them works and why they are important; in this video we'll actually write some queries, making up some scenarios where we have to write queries against a database. In the previous videos we used the banking database found in the documentation (I'll have the link in the description), which you can download and index into Elasticsearch, and now we can start writing our queries. The first query is the boolean query. We wrote this one before, when we wrote our own first queries, but since it is also the first compound query, let's write it again. What we need to do is write a query to search for addresses that have either a lane or a street in the name and have an account balance between 20,000 and 30,000. The database has the addresses and the balance of each person, so let's just start writing it. This is going to be a GET call to the search API, and we have a query, and inside the query we have
a bool, because we're going to use that, and inside bool it must match either lane or street. Let's write the must first and say it properly: must, then match, then the address, and the address can be "lane" or "street" (wait, I think I missed the match; yep, must, match, address). As we talked about in the previous videos, we have to match either lane or street, but match searches for the individual terms as well, so a single "lane street" covers lane, street, and both together; we don't have to write extra queries to search for both. Now we need the filter, because we want to restrict the balance to between 20,000 and 30,000: so filter, then a range, the field is balance, greater than or equal to 20,000 and less than or equal to 30,000. And this is how we write a simple boolean query, with the must in the query context and the filter context together in a single query. Let's run this, and as you can see we get the output: 85 hits, we get Church Lane with around 23,000, we get Pulse Lane with 24,000, and you can see the relevance score as well. That is how a boolean query works; this had already been discussed in detail in the previous videos, and I'll link those in the description as well. Next comes the boosting query, where we give a positive score to a certain query and a negative score to another, based on how we want it. Here we have a simple use case: we have to write a query to search for addresses that include both "church" and "lane", but we want to prioritize church over lane. Sometimes we have a database where we have to find all the people living in an area, but we want to prioritize people who live near a church or have "church" in the address, and that is where the relevance score of documents comes into the picture and where boosting queries can really help a lot. Let's see how we would write this, and also give it a
negative boost as well, which I think we've seen here, so let's give it a negative_boost of 0.5, just to see how it works and what difference it makes. Again we have a GET request to the search API. Let's open some braces right there: we have a query, and what we're doing is boosting. For the syntax I'm referring to this example here in the documentation; this is what I'm currently typing, and I'm using the same format and structure of these queries so that I can understand them better. Don't worry about memorizing the syntax for how to write these queries; just see the examples in the documentation and how they write them. It's pretty intuitive and easy to understand: we have a query, we're performing a boosting query, we have positive terms and negative terms, and we assign a negative boost to them. The syntax can be learned, so do not worry about it; worry about how to frame the queries based on the use case you have. So let's continue. We have positive, and under it a term, t-e-r-m, on the address, and the address is "church". Okay, awesome. Now we have negative, with a term the same as before, on the address, and we want "lane" to be negative. Then we give it a negative_boost of 0.5 and see how that works. What we did here was search for both church and lane, but prioritize the church results first and the lane results second, and we want the scores to differ by that factor of 0.5. Let's see what I mean by that. We run this, and let's just focus on the score first. We get two outputs, Church Avenue and Church Lane; as you can see, Church Lane comes last and Church Avenue comes first. Now look at the scores: the first score is 5.99, let's call it 6, and the score for the second one is 2.99, let's call it 3.
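The boosting query just written can be sketched as follows (same bank dataset; the exact hits and scores will depend on your data):

```python
import json

# Boosting query: match addresses with "church" or "lane", but
# multiply the score of "lane" matches by 0.5 so "church" ranks first.
query = {
    "query": {
        "boosting": {
            "positive": {"term": {"address": "church"}},
            "negative": {"term": {"address": "lane"}},
            "negative_boost": 0.5,
        }
    }
}

# Body for:  GET bank/_search
print(json.dumps(query, indent=2))
```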
So the negative boost reduced the score by exactly half: the first result we get, 366 Church Avenue, has a score of 5.99, and Church Lane has 2.99. In this way you can manipulate not only the order but also the score of your results, and build some applications on top of that later; this is how you manipulate your queries with the boosting query. The last compound query for now (we have two more left, which we'll discuss in the next video) is the constant score query, where we make sure matching documents all have the same relevance. We'll use the same example as before: we want "church" and "lane" to be found, but with the same relevance, so let's see how we would write the query for that. As you can see, I'm using the Dev Tools console in Kibana, right here. You can write and search in real time; you don't have to copy anything as curl and run it in your terminal. You can use this console, and it's pretty intuitive and easy to use. As you can see, I have different parts of my queries laid out, and running this will not run the entire script, only the selected part, so I can run each of the parts individually, which is really great. Let's write the third one, the constant score query, sorry. So: GET bank, and we need _search. Before you see the implementation, just pause the video and try to write the query yourself; there won't be many changes to make. You can just look at the documentation and its examples. The documentation, as I keep saying, is the easiest place to learn from, and it's really good: you see the structure in which they write their queries, follow it, and write your own queries in the same way. So let's go: query, and the query has constant_score, and the filter has
(I'm sorry about the sound in the background, stay with me for some time.) Okay, so the filter is a term on the address, and it's "church". Let me just run it for church now; yep, all of the hits are only for church. Now I copy the same thing and do the same for "lane". If I don't run it for lane as well, I just get the same outputs, Church Avenue and Church Lane, because the database is built that way; but if you had more addresses with those terms in common, you would just type "lane" here and get the constant score for lane as well. So for church we run this, and the output is Church Lane and Church Avenue, and you can see the score is 1.2 and 1.2, which is exactly what we asked for. So that is how we write compound queries in Elasticsearch. In this video we'll talk about full-text queries in Elasticsearch. In the previous videos of this playlist we have talked about how to set up Elasticsearch, how to index and write our first query, how to upload multiple documents to Elasticsearch, and about the query DSL, where we covered the query and filter contexts and compound queries. Now we move on to full-text queries. Full-text queries enable you to search analyzed text fields, such as the body of an email; the query string is processed using the same analyzer that was applied to the field during indexing. Full-text queries can be grouped into the following queries. The first is the intervals query. An intervals query is a full-text query that allows fine-grained control of the ordering and proximity of matching terms. Let's see what this means. The intervals query returns documents based on the order and proximity of matching terms, using matching rules constructed from a small set of definitions; these rules are then applied to terms from a specified field. The definitions produce sequences of minimal intervals that span terms in a body
of text, and these intervals can be further combined and filtered by parent sources. Let's see the example request and understand how intervals work. Here we send a search request (a POST in the docs) where the field is my_text and ordered is true, so the order matters: it matches "my favorite food" first, with no gaps and in order, and after "my favorite food" it should match any of the following, "hot water" or "cold porridge". So "my favorite food is cold porridge" is a possible document that can be retrieved. As you can see in the docs, this intervals search returns documents containing "my favorite food" immediately followed by "hot water" or "cold porridge" in the my_text field. In this way we return documents based on an order and the proximity of matching terms: the search would match a my_text value of "my favorite food is cold porridge", but it will not match "when it's cold my favorite food is porridge". The order here is very important; that is why we use an intervals query. Now the next query is the match query, which you have already seen a lot of times before, but we'll reiterate and see how it works again. The match query is the standard query for performing full-text queries, including fuzzy matching and phrase or proximity queries. Let's see the match query in detail. We have used the match query before, so we'll just make sure we understand the example given in the documentation: it returns documents that match a provided text, number, date, or boolean value, and the provided text is analyzed before matching. We haven't used analyzers in Elasticsearch yet (we'll use them in the next video), but until then let's understand how the match query works in general. Here we have an example query where we want to get some results back.
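Before that, the intervals request walked through above can be sketched as (my_text is the docs' field name):

```python
import json

# Intervals sketch: "my favorite food" in order with no gaps,
# immediately followed by "hot water" or "cold porridge".
query = {
    "query": {
        "intervals": {
            "my_text": {
                "all_of": {
                    "ordered": True,
                    "intervals": [
                        {"match": {"query": "my favorite food",
                                   "max_gaps": 0, "ordered": True}},
                        {"any_of": {"intervals": [
                            {"match": {"query": "hot water"}},
                            {"match": {"query": "cold porridge"}},
                        ]}},
                    ],
                }
            }
        }
    }
}
print(json.dumps(query, indent=2))
```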
We match against the value "this is a test". This will return all the documents which have either the entire phrase "this is a test" or just "this", or "is", or "a", or "test"; it matches the entire string or even a single term from it. That is what the match query is used for. Let's see the next query: match_bool_prefix. What does it do? A match_bool_prefix query creates a bool query that matches each term as a term query, except for the last term, which is matched as a prefix query. We have talked about what bool is for: combining multiple queries in one single form. So we can use the match_bool_prefix query to match each term as a term query except the last one. Let's see how this works in depth. A match_bool_prefix query analyzes its input and constructs a bool query from the terms; each term except the last is used in a term query, and the last term is used in a prefix query. Let's see an example. If we want to find all the messages which have "quick brown" plus anything that starts with the letter "f", that is, with the prefix "f", this is how we write the query. It will match all the message values which have "quick brown fox", "quick brown face", "quick brown fountain", and so on. This query can also be written in an expanded form, which is easier to understand but a bit more verbose: query, bool, should match the term "quick", should match the term "brown", and the prefix of the message can be "f". So the value of the message field should have "quick" and "brown" and any word that starts with the letter "f". We'll look at the difference between match_bool_prefix and match_phrase_prefix later; until then, let's move on to the next full-text query, the match_phrase query. I think we have seen this one before as well: it is like the match query, but used for matching exact phrases or word-proximity matches.
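The two forms just described can be sketched side by side (message is the docs' field name):

```python
import json

# match_bool_prefix: every term except the last is a term query,
# the last ("f") is matched as a prefix query.
query = {
    "query": {
        "match_bool_prefix": {"message": "quick brown f"}
    }
}

# Roughly equivalent expanded bool form:
expanded = {
    "query": {
        "bool": {
            "should": [
                {"term": {"message": "quick"}},
                {"term": {"message": "brown"}},
                {"prefix": {"message": "f"}},
            ]
        }
    }
}
print(json.dumps(query, indent=2))
```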
Let's see what that means. The match_phrase query analyzes the text and creates a phrase query out of the analyzed text. Here we have the same message, "this is a test", but with match_phrase instead of match. If this were a match query, it would match the entire string or also a substring, but match_phrase matches exactly the entire phrase and not a substring; that is the difference between match_phrase and match. It is not case sensitive, so it will match text with a capital T or a capital I or any capital letters in between, but the entire phrase must appear inside the value of the message field; if the entire phrase is not there, the document will not be returned as a relevant document. That is what the match_phrase query is about. The next is the match_phrase_prefix query: it is just like the match_phrase query, but it does a wildcard-style search on the final word. Let's see what that means. The match_phrase_prefix query returns documents that contain the words of a provided text in the same order as provided, so for "quick brown f" the same order must be followed, and the last term of the provided text is treated as a prefix, matching any word that begins with that term. In the docs, the following search returns documents that contain phrases beginning with "quick brown f" in the message field: it would match a message value of "quick brown fox" or "two quick brown ferrets", but not "the fox is quick and brown". The order matters for match_phrase_prefix, because "quick brown f" is treated as a phrase; the entire string is treated as a complete single phrase, so "quick brown" followed by a word starting with "f" should always be there. As you can see, "quick brown fox" matches; "two quick brown ferrets" also matches, because "two" can come before it and "quick brown ferrets" appears in the same order, so it is a relevant document; but "the fox is quick and brown" does not.
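As a quick sketch of those two request bodies:

```python
import json

# match_phrase: the whole phrase must appear, in order.
phrase = {"query": {"match_phrase": {"message": "this is a test"}}}

# match_phrase_prefix: same, but the last word is a prefix, so this
# matches "quick brown fox", "two quick brown ferrets", and so on.
phrase_prefix = {"query": {"match_phrase_prefix": {"message": "quick brown f"}}}

print(json.dumps(phrase))
print(json.dumps(phrase_prefix))
```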
So this is the difference between match_phrase_prefix and match_bool_prefix; let's just see the difference again. The important difference is that the match_phrase_prefix query matches its terms as a phrase, but the match_bool_prefix query can match its terms in any position: it could be "quick brown f", "brown quick f", or "quick f brown". The example match_bool_prefix query above could match a field containing "quick brown fox", but it could also match "brown fox quick". It could even match a field containing just the term "quick", the term "brown", or a term starting with "f", appearing in any position, because it's a bool with should clauses, so any one of them matching is fine. For the phrase version, the entire phrase must be considered: it has to start with "quick brown" and end with a word starting with the letter "f". So these were some full-text queries, which you can use to build much better and more complex searches over your Elasticsearch documents and indices. In the previous videos we talked about how Elasticsearch works, how we can use it to write queries and search using those queries, how to index data into Elasticsearch using the REST APIs, and some nuances of how to search effectively. In this video we'll start a completely new topic, where we use Elasticsearch from a programming language, and my favorite choice of programming language to use with Elasticsearch is Python. So let's start. The official client for using Elasticsearch with Python is called elasticsearch-py. Let's see what it is: elasticsearch-py is the official low-level client for Elasticsearch. The goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be
opinion-free and very extendable. So elasticsearch-py is the low-level client for Elasticsearch; elasticsearch-dsl is a separate, higher-level client which you can use to search with Elasticsearch and Python, but we'll talk about elasticsearch-py, the low-level client, for now. Since it is a low-level client, it tries to be opinion-free, can be used very effectively, and gives you a lot of control over the Elasticsearch part of the code as well. So how do we start? We start by installing the elasticsearch package, so let's go ahead and do that. I think I already have it, so it should say the requirement is already satisfied; yeah. You can go ahead and install it using pip install elasticsearch, and make sure you have an up-to-date Python 3. After that we need to start our Elasticsearch server so that it is exposed over the REST API, so let's go to the elasticsearch folder, go to bin, and run the server. This might take a few seconds, so let's just wait. Until then, this is the official open-source code for elasticsearch-py. It is open source, so you can go and look at it; all the links that I show you will be in the description below. You can read the documentation, understand how it works, and see which versions it supports: for example, if you have Elasticsearch 5.0 you can use the 5.x client line specifically, by pinning the major version between 5 and 6, as you can see. We will be using the example from the documentation and run it to see if it works properly for this video, and in the upcoming videos we'll go deeper with the Python client. Let's wait, and yeah, I think we are done with the server, and I've also set up a Jupyter notebook on which I'll be running the code. Let's get started. First, since we are going to follow the documentation, we need datetime; we're using datetime because we want to see how Python can be integrated with Elasticsearch by using Pythonic values inside
Elasticsearch queries and see whether they run or not. So we'll use datetime, but you can go with any other library in Python as well. Next we need elasticsearch, specifically the Elasticsearch class with a capital E; let's run that, and it imports properly. Now let's create an instance, an object of Elasticsearch, which points at our running server. Let's see where our server is running before we do that; it's usually printed in the logs, and it's published on port 9200. So our server is up and running; let's set up an instance with the name es, and yes, the instance is ready. Now let's index something, just some random dummy data, into Elasticsearch. We'll be indexing a document, so let's build it now. I'm currently following the official documentation, so you can always refer back to that code to make sure I'm not making any mistakes along the way, and you can also play around with the example so that you get a better understanding of the Python client. Let's say we have some random data, and the timestamp for this data will be gathered from datetime.now(); that looks about right. Now let's index this using the index function, es.index. Let's create an index called my-index, as in the documentation; the doc type will be test-type (it can be anything you like, so don't worry about that). Give it an arbitrary id; let's see what the documentation uses. It uses 42, so let's give it 42 as the id; we'll be retrieving the data that we have just indexed by this id in the next piece of code. And next we have our document, which goes in the body. Let's run it, and let's see what the errors are: authorization, index is blocked, read-only. Okay, my bad, I think this index was already created, so let's use my-another-index. No problem, and it's good to go: we have successfully created another index, and
it has a test-type document with the id 42. Now let's retrieve it. We use the get API to retrieve data from Elasticsearch by id, so we use the get function: the index is going to be my-another-index (I had to give it a really bad name because I already made my-index before, so I'm sorry about that), the doc type is test-type, and id equal to 42. This will give us back a dict, which is similar to the JSON response, and we really just want the _source part, which contains our data and the timestamp; we don't want the entire response, but we can look at it as well. Let's first see what the entire output looks like, so let's call it output. Now let's see what output looks like: we have the index, the type, the id, the version, found (it was able to find a document with id 42), and _source contains the final information which we need. So let's output that. We have to print the output; it's a Jupyter notebook, so: output, retrieve the _source, and wrap that in a string. And as you can see, we were able to retrieve the data which we indexed into Elasticsearch using the Python client. This is a very simple, basic example of how to use the Python client for Elasticsearch, and as you can see it has a lot of other features as well. It translates basic Python data types to and from JSON (datetimes are not decoded for you, because Elasticsearch has its own way of dealing with date-and-time data, so they are handled in that form). It automatically configures discovery of cluster nodes, which we'll talk about later in the upcoming videos. It has persistent connections, which is why we were able to set up the server and run our code very easily. It has load balancing across all nodes, which helps when you are scaling up your data set and have a lot of data. And it has failed-connection penalization.
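The index-then-get round trip above can be sketched like this. It is a sketch, not the notebook's exact code: the helper names are mine, and the FakeES class stands in for a real client so the snippet runs without a server (with a cluster you would use `from elasticsearch import Elasticsearch; es = Elasticsearch()`, which defaults to localhost:9200).

```python
from datetime import datetime

def index_doc(es, index, doc_type, doc_id, body):
    """Index a document; `es` is an elasticsearch-py style client."""
    return es.index(index=index, doc_type=doc_type, id=doc_id, body=body)

def get_source(es, index, doc_type, doc_id):
    """Fetch a document by id and return just its _source."""
    return es.get(index=index, doc_type=doc_type, id=doc_id)["_source"]

# Minimal in-memory stand-in for the real client, for demo purposes.
class FakeES:
    def __init__(self):
        self.docs = {}
    def index(self, index, doc_type, id, body):
        self.docs[(index, id)] = body
        return {"result": "created", "_id": id}
    def get(self, index, doc_type, id):
        return {"_index": index, "_id": id, "found": True,
                "_source": self.docs[(index, id)]}

es = FakeES()
doc = {"any": "data", "timestamp": datetime.now()}
index_doc(es, "my-another-index", "test-type", 42, doc)
print(get_source(es, "my-another-index", "test-type", 42)["any"])  # data
```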
We'll talk about that one in the next video. There's also thread safety, so you don't have to worry about deadlocks or other underlying OS issues, and a pluggable architecture. So we were able to set up Elasticsearch for a third-party programming language very easily, and all these features make Elasticsearch very easy to use with the Python programming language. In the previous video we talked about how we can use Elasticsearch with the Python client, using the official elasticsearch package; we also talked about a high-level client called elasticsearch-dsl, which we'll be looking into today, with a brief introduction to what it actually is. So what is elasticsearch-dsl? It is a high-level library whose aim is to help with writing and running queries against Elasticsearch. Pretty straightforward: it is used to write and run queries against Elasticsearch, and it is built on top of the low-level client, so it provides a more convenient and idiomatic way of writing and manipulating queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure, and exposes the whole range of the DSL from Python, either using classes or queryset-like expressions. It also provides additional optional wrappers for working with documents as Python objects: you can define mappings, retrieve and save documents, and wrap the document data in user-defined classes, so you can write your own class and have it mirror an actual document in your Elasticsearch data. Today we'll just have a simple example of how to use elasticsearch-dsl, following the documentation, and in upcoming videos we'll be using elasticsearch and elasticsearch-dsl together to see how we can write more complex queries and use Python as the means to do so. Before we start with the code, this is the official GitHub source for elasticsearch-dsl-py; you can see how to install it, and we'll go and do that right now: pip install elasticsearch.
Give me a minute: pip install elasticsearch-dsl, I'm sorry. And it's already done. Now let's go ahead and focus on the documentation example which we have here. We'll code this up, and we have already indexed a piece of data, which was "any": "data" with a timestamp from Python's datetime, so we'll be using the same index and the same data to search for it and see if we can get the results back. Let's get started. As usual, first we need to spin up our Elasticsearch server, so let's do that. Make sure you have the most recent version of Elasticsearch; if not, you'll have to make sure that the elasticsearch-dsl package which you install is the one which supports your Elasticsearch version. While this is getting ready, let's see the compatibility notes: as you can see, have the latest version if possible, and if not, pin to the matching major version; this is how the requirements in your setup.py should look. Okay, I think we're good to go. Let's check if our server is up and running, and it looks like it is. So let's go ahead and see how we can use the Python client for elasticsearch-dsl, and let's actually see if we can use both clients together or not. From elasticsearch we import Elasticsearch, like we did in the previous video. Nice. And from elasticsearch_dsl we import Search, because we want to search the data which we have indexed in the previous video. Then es = Elasticsearch() to spin up an instance; okay, this is working, just making sure it's working again. Let's have a Search using es, and the index we want is my-index, if you remember. Let's add the query: as you can see, it is a very high-level client, so we can use simple methods like query and so on, and you'll see how easy it is to code up a search query using Elasticsearch in Python. Now we want to match on "any", because "any" was our key and the value was "data", so let's see if we can find something which matches, and
Let's store the response in a variable: response = s.execute(). So we execute the search query and inspect the response. Awesome. Now let's see what the response looks like. Although we know there's only one piece of data in the index, let's write it as if there could be multiple hits. So: for hit in response (I'm just following the documentation, so I'll use the same structure), let's print the score of the search query, i.e. how relevant the hit was. Let's make it a bit fancier and also print the "any" part, so the source is hit.any, and hit.timestamp as well. This should give us the relevance score of the document, the source, which was hit.any, and the timestamp. And yes, as you can see, the score was 0.28, the source was "data" (because we indexed the key "any" with the value "data"), and this is the timestamp we generated with datetime.now(). This is how we use elasticsearch-dsl with Elasticsearch to run search queries very easily. So, to recap the main piece of code used to search the index: we connect to the server through the Elasticsearch low-level client; the index we use is my-index, which we created in the previous video; then we call query with a match query on the key "any", looking for the value "data". We execute this query and store the result in a response. Then for each hit in the response we print the score and the source. Since our piece of data had only two keys, "any" and "timestamp", we print both of them. We can also look at what a hit itself looks like: if we print the entire thing, it is a Hit object with a score, our source, and the document id, which is 42.
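Under the hood, the response that the loop above iterates over is parsed from the raw JSON the server returns. The sketch below hand-writes a sample of that raw shape for the video's one-document index (the score, id, and field values are illustrative, matching what the video shows) and extracts the same score/source/timestamp fields, assuming no server is available. elasticsearch-dsl wraps each entry so that hit.meta.score, hit.any, and hit.timestamp work as attributes; with the raw dict the same data lives under `_score` and `_source`.

```python
# Hand-written sample of a raw Elasticsearch search response body
# (illustrative values based on the video: score ~0.28, id 42,
# a document with the keys "any" and "timestamp").
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "my-index",
                "_id": "42",
                "_score": 0.2876821,
                "_source": {
                    "any": "data",
                    "timestamp": "2020-11-21T12:00:00",
                },
            }
        ],
    }
}

# The raw-dict equivalent of `for hit in response: print(hit.meta.score, ...)`
for hit in sample_response["hits"]["hits"]:
    print(f"score: {hit['_score']:.2f}  "
          f"any: {hit['_source']['any']}  "
          f"timestamp: {hit['_source']['timestamp']}")
```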
And that is how we use elasticsearch-dsl together with the Elasticsearch client to search for data inside an Elasticsearch index.
Info
Channel: ProgrammingKnowledge
Views: 60,858
Keywords: ElasticSearch, Logstash, Kibana, ELKStack, ElasticStack, Windows 10, Filebeat, scalability, reliability, pipeline, Elastic Stack Tutorial, ELK Stack Tutorial - Getting Started With ELK Stack
Id: kjN7mV5POXc
Length: 123min 36sec (7416 seconds)
Published: Sat Nov 21 2020