Elasticsearch and PHP: AMA with Enrico Zimuel

Hello everybody, welcome back to the Elastic live user group. This is the North and South American virtual user group, where we take our lunchtime to talk with someone, whether inside Elastic or outside the company, who's using Elastic or just talking about tech in general, and hopefully we have some questions that we can ask. This week is no different: we're continuing our client series. I think this is actually the last of the clients that we're running through before we have to bring y'all back again, because, you know, 7.14 just came out, 7.15 is coming out, and 8.0 will probably be coming out sometime next year, so we'll just keep rinsing and repeating until it happens. But I have Enrico Zimuel with me. Enrico, how's it going?

Good, good. Hello everyone.

Let's kick it off as we normally do. Enrico, tell everybody a little bit about yourself, how you got to Elastic, and ultimately how you got to be in charge of the PHP client.

Sure. I started working at Elastic two and a half years ago; it will be three years in March next year. I was working for Zend Technologies, the company behind PHP, basically founded by Andi Gutmans and Zeev Suraski, the creators of PHP version 3.
So I was deeply involved in the PHP community and in open source. I am the co-author of three big projects in the PHP arena: Zend Framework, Apigility, and Expressive. After 11 years, also because Zend was acquired by another company, I started to look around for a new challenge, and I found the job offer from Elastic about the PHP client. I thought it was interesting. I was intrigued by Elasticsearch; I was definitely not an expert in Elasticsearch, so this also gave me the opportunity to learn more about search algorithms and search use cases. So here I am.

I love that. Most of the people that I've talked to at Elastic, when they got to Elastic, were not avid users; they weren't the heaviest users of Elasticsearch. I think that says something about the company, having that thing that just brings people in and puts them in a position of leadership even though they aren't necessarily the longest-experienced people in the community as a whole. That's great, because it means we're always looking outside to bring people in. But how has your knowledge of using Elastic grown over time? Do you still feel like you're not the biggest user of all of the tools in Elastic, or do you feel like, since you have to develop for it, you've grown to learn more and more about it?

Yeah, I'm learning day by day. Of course the first month was crazy, from almost zero to 100 in one month, because, as you said, we have to build the software in order to use Elasticsearch, so we need to deeply understand all the APIs, and we are talking about around 400 APIs. So it's a lot, with all the parameters and use cases. It's a challenge, of course, but this is quite common, especially in the clients team, in the sense that we are experts, each in our own
field: JavaScript, PHP, Ruby, .NET, Go, Java. That's why I think they built this team around experts who had a previous background in open source and in how to create a library and a community, because, as you know, the most important part of an open source community is not only technical; it's very important to have people behind the community. So that's why we were not really heavy users of Elasticsearch in the beginning. Now of course we are, after almost three years. But again, Elasticsearch is a very nice project with a lot of features, so you literally learn every day how to use it and how to improve it. A new feature comes every two months, so it's quite hard even for us to stay aligned with the new proposals of the company.

Awesome. Well, I love that you mentioned there are over 400 different APIs to cover, and I will be honest with everyone that's watching: we're not going to be covering all 400 of them. We've got about an hour, so we're going to be hitting some of the wave tops on this. But if you have any questions for Enrico while we're here, feel free to throw them in the chat and let us know. Also, if it's something that we don't cover that you're interested in, go check out our Discuss channel. We have a forum there where people are asking questions all the time. The way I often tell people: we have Elasticians, technicians, and all other "-icians" of every kind in there who are just hoping to help folks get the solutions that they're looking for. I know that Enrico has done a few other presentations, and we do presentations all the time; there are plenty of events happening each and every week, and you can find that information at community.elastic.com. And last but not least, I would be remiss if I didn't let everybody know that ElasticON Global is coming up, so be sure to look out for
that; be sure to RSVP and sign up for it. There's going to be a bunch of talks. I've been helping some folks with talks and I'm super excited to hear them. I'm not sure how many are PHP-specific, but I know that there are plenty that are going to be happening around the Elastic Stack as a whole, and you can find more about that at elastic.com. So, Enrico, let's talk a little bit about the PHP client. How is it similar to and different from some of the other clients that we have?

Good question. I would say it is similar to the clients for other dynamic languages like JavaScript, Ruby, and Python, and for sure not so similar to others like .NET, where you have more object orientation and more abstraction. At the moment we say that the PHP client is a low-level client, in the sense that it gives you the possibility to connect to Elasticsearch while managing the connection pool, which is a feature that all the clients support. Basically, you don't have to manually pick one of the nodes and send requests to that specific node. You can just say in the configuration, "look, this is my set of nodes, my cluster," and the client will spread the communication across all the nodes using a round-robin algorithm. This feature is common to all the clients, and I think it's a very good one, because you don't have to care about the architecture of your cluster; everything is managed by the client itself. So the difference is basically, as I said, the developer experience: how you perform an HTTP request. In the case of elasticsearch-php it is just a function that you call on an object, the client object. So the flow is very simple: you create an instance of the client as an object, and you can perform all the APIs just by calling the function that has the same name as the API. That's it, very simple. I will show some examples, of course, but this
is basically the background.

And of course we know that PHP is still kind of how the internet runs in many cases. A lot of folks like to look to JavaScript for that, but I remember many a time where PHP was the thing that was just pushing the internet and getting that data to you. How do Elasticsearch and the PHP client work, in terms of compatibility, with some of those PHP tools that are out there that people are using, such as Laravel? And of course we know WordPress is kind of a big deal.

Yeah, it's very simple, in the sense that the client returns you an associative array that is the body. First of all, the communication between each client and Elasticsearch uses the REST API with JSON. So if you want, you can just use any HTTP client in PHP, for instance Guzzle, to perform a specific API, but then you have to know all the parameters, the methods, and the headers, and you have to manage the connection pool by yourself. Or you can use elasticsearch-php, where the input is basically an associative array holding the parameters and the body of the request. For instance, if you want to index a document, you have the document inside an array, you just call the index function, and you get back the deserialized JSON, which is an associative array as well. So it's very simple to integrate into every PHP project, and there are a lot of community libraries that consume the elasticsearch-php client in order to integrate it into Laravel, as you mentioned, or WordPress. It's very simple to integrate with the PHP projects out there.

Well, I will say you've exhausted most of my knowledge around the PHP client and PHP as a whole, so I am very eager to learn how we interact with Elasticsearch using PHP. People tend to forget this live stream is for me, because in many cases I'm the beginner that's like, "hey, how do I do this?" So I'm going to turn it over to you and let you
go ahead and share your screen and then walk me through the process of using this PHP client. And folks, again, if anybody has any questions as to what Enrico's doing, or if he mentions something and you're like, "hey, what is that?", let me know in the chat and we will be happy to get that answered.

Okay, cool. So let's start from scratch. The first thing is to have an Elasticsearch instance running. You can install it manually, or we can just use Docker, because we offer docker.elastic.co; this is our website where we store all the Docker images of Elasticsearch. So let's do this example. Maybe I can increase the font a little bit; I don't know if it's big enough. You can pull an image of Elasticsearch; in this case I'm using version 7.14, so it's the latest. The other command is a standard Docker command: it basically runs this instance using the default port of Elasticsearch, which is 9200, because, as I mentioned, Elasticsearch is a server application that communicates using HTTP on a specific port. Let me run it in another window. So here I'm just running Elasticsearch: I'm pulling down the Docker image, and now Elasticsearch is running. Here you can see all the logging messages of Elasticsearch. Okay, let's leave this in the background. So now we are running Elasticsearch, just one node of course, on my local machine. Let's start with a test example here. First of all we need to have the elasticsearch-php client installed. We can use Composer, which is the default way to install packages and libraries in PHP, so we need to write composer require, and the name is elasticsearch/elasticsearch. Okay, I typed it wrong. Of course, now I have everything in my cache; otherwise you will have
to wait some seconds in order to download all the packages. But as you can see, there are not a lot of dependencies. We have the main library, which is elasticsearch, and we have a couple of dependencies to manage the HTTP connection: we are using Guzzle, specifically RingPHP. I forked it, because this library, which is based on Guzzle, was deprecated, so I forked it into my repository and I'm maintaining it myself. We support PSR-3 for logging, which is the PSR standard in the PHP world, and we use promises from React because we also have asynchronous operations. The elasticsearch-php client gives you the ability to execute an asynchronous call, so you can, for instance, run a search that does not block the PHP execution, and the PHP code will continue to execute. This is nice if you have performance issues or if you want to speed up your application.

Okay, so now we are ready: the elasticsearch-php library is installed under the vendor folder, so we can create our first PHP script. Of course we need to require the vendor autoload of Composer; this is quite standard in all PHP projects. Now we can start creating the client and, for instance, perform an info API call, just to get some information about the cluster that is running Elasticsearch. The first thing we need to do is create a new client, and we have this ClientBuilder class, which is a factory, so it implements the factory design pattern. To create a client, we need to specify the hosts, and this, as you can see, is an array, because here you can specify a list of all the nodes that you have in your cluster. In our case it's very simple, just one: localhost:9200. And then you can build the object itself, the client. So basically in one line, I wrote it in three
lines just for formatting, but basically in one line you get the client already. Now you can perform any operation. For instance, we can call a very simple API, info. As you can see, when I start typing, the autocompletion gives me the list of all the APIs, all the namespaces actually, that I can execute. Info is a very basic API that just gives you information about the status of Elasticsearch. Let me print $result. Okay, so now we can execute this simple script. Again, I'm here in the console; we can just run it from the command line. As you can see, the result is an array: this $result is an array that contains some information about the status of the instance that I'm running on my machine. There is the name of the instance; the cluster name, which is docker-cluster because that was the name I used to run the Docker image; the version of Elasticsearch, 7.14; and also some information about the build hash, because Elasticsearch is an open project on GitHub, so you can even check the hash of the last commit with the 7.14 tag. You have the build date, and you have the Lucene version. I don't know if you are familiar with the engine of Elasticsearch, but Elasticsearch internally uses Lucene as the engine to perform search operations, so this also gives you the version of Lucene that Elasticsearch is using. And last but not least we have our famous tagline, "You Know, for Search." So this is a very simple API, but again, it's just calling a function with the same name as the API.

Just jumping in really quick: I love that this is always the "hey, let's do this just to make sure we're connected." For those watching, another thing that you can do with this is you can use it to
verify the version number, to make sure that you're connected to the right cluster. Of course you can check the cluster name. Most of our updates don't go back and break things, but we do have a few times where that happens with some major releases, so I think there are times where you may want to execute different commands based on the Elasticsearch cluster's age and the version of Elasticsearch that you're working on. Again, it doesn't happen often, but if you're debugging, or optimizing for production, or scaling something where you might have some older Elasticsearch instances, that might be a thing to check.

Yeah, and again, because everything is a very simple associative array, you can just print, say, the version number from the array: the first key is version, then you pick number. Run it again, and I get back the information. So, as you mentioned, you can check whether your application is compatible with a specific version of Elasticsearch, because maybe you're using some feature that was released starting from a particular minor version, for instance schema on read, which is something we released recently. So you can check the version number very easily.

Okay, let's jump to a more interesting example. To save time I've already created some examples here. We want to index a document, so the first operation, of course, is to store something into Elasticsearch. How can we do that? A document for Elasticsearch is basically a JSON document. For PHP it can also be a JSON string, and I will show you how to do that later, but generally speaking it is an associative array. So imagine that you have this information, you have, I don't know,
account information with the account number, the balance, the first name, last name, the age, gender, and so on, and you want to store this document into Elasticsearch. You can use the index operation. Index basically means storing, saving data into Elasticsearch; indexing, that's the verb. And you need to specify some information here. The first is the name of the index. An index, if you want to compare it to a SQL relational database, is more or less like a table: an index has a name, like an accounts table, and inside it you have all the documents. Of course there is a big difference from a relational database. In a relational database the fields are the same for every row, so you have to define the structure. Elasticsearch is more like a kind of NoSQL database, so you can put whatever you want into the document, whatever JSON. That means, for instance, that some documents may not have an address while others have it, or carry other information as well. Anyway, the first parameter is the name of the index, accounts in this case. You can specify the ID; we can also leave it out and just let Elasticsearch create the ID for the specific document. And the second parameter is the body. The body represents the request that you want to send to Elasticsearch; in this case it is, as I said, an associative array, $account. So let's see what this example does. The script is named index; it executes the index call, and at the end I have a var_dump of the results, $results. Let's see what happens. So we have an array. Actually, let me print it in a better way. Okay, now I think you can read it more easily. So we have the index, accounts, that we used; the type, _doc, because this is a document, so this is some internal
type of Elasticsearch; and this is the ID. So we created this document here, and Elasticsearch stored it with a specific ID, which is created automatically by Elasticsearch. You also have the version, which is another cool feature of Elasticsearch: each time you insert or update a document, the version is incremented, and this is used to keep data consistent inside the database. There is also the possibility to keep the version outside Elasticsearch. That means, for instance, if you have a MySQL database with a table and you want to insert the table into Elasticsearch, you can even use an external version coming from MySQL, Postgres, or whatever. And we have the result. As you can see, everything that starts with an underscore is something managed internally by Elasticsearch; if the name doesn't start with an underscore, it is more, I would say, output, a result from the API. For instance, here we have an array that contains the shards, which is internal information. But the most important one for our example, I would say, is result, because in result you have created, which means the index operation was successful and Elasticsearch created this document inside the index.

Of course, now that we've created something, we want to read it. The read is very simple: it is basically a get. There is this get API that performs the read. You have to specify the index name and the ID; in our case, let's copy and paste this one, the ID that we just created. And, sorry, we have to run the get. As you can see, we have the response here, $results, and the get returns the information about the document: the index is accounts, the ID is the one that we provided here, and the actual body
is under the _source field. As you can see, here we have all the data that we put in before: Amber, Duke, the first name, the last name, and so on. So, very simple. If we make a mistake, for instance, let's use, I don't know, foo, which does not exist, and perform another get, we can use try/catch, because if you specify an ID that does not exist, the PHP client throws an exception, in this case Missing404Exception. As you can see here, there is the full namespace of the exception, under Common\Exceptions. In this way you can catch the exception, and I just print a message here, "the document doesn't exist". This is the message from getMessage(), the usual way in PHP to get the message of an exception, and it is actually the response from Elasticsearch: the exact JSON that Elasticsearch returned for the error. As you can see, there is found: false, which means that the document does not exist.

Enrico, if you didn't have that 404 exception catch, would it result in an error, or would it just say nothing happened here?

Yeah, so if we comment out the try/catch and run again, you get a PHP fatal error, because each time a PHP application throws an exception of any kind that is not caught, PHP stops the execution with a fatal error. So you have to use try/catch in order to protect the execution of the get in this case.

Okay. I know that there are some other error messages, so we have a 404, we have a 400, and there are a few others. I know that in some of the clients there is a way to ignore some of those, especially when you have to re-index data: sometimes it's easier to do an index dump at the very
beginning and just say, "hey, do this dump, and don't freak out if there is no index to dump; that's fine." Is there something similar in the PHP client?

You can just use try/catch with the base Elasticsearch exception and do nothing in the catch, for instance like this. Or, if specifically you don't want an exception for a 404 and just want to ignore the 404, you can do something like this. You can specify the client key: this is a special key in elasticsearch-php where you can customize the execution of the HTTP client inside elasticsearch-php, and you can write something like this, ignore 404. If you specify this, elasticsearch-php will not throw an exception.

Okay, that's what I was thinking of. I saw the try/catch and I was like, okay, is that the thing? But I see here you have both.

Yeah, the two options. By the way, since in most modern PHP projects everything is object-oriented, try/catch is quite popular with PHP people. But if you know that you want to ignore a specific error like 404, you can just add this to the parameters of the API call and it will not throw an exception. In this case the result is the response from Elasticsearch, which, by the way, is the same one that we got from getMessage() when catching the exception, but PHP is not interrupted, so you don't get a fatal error. Of course, in this case you need to check manually whether found was true or false, if you want to. But again, if you already know that you don't care about the 404, you can just pass this and you don't have to try/catch or check anything.

Yeah, and again, for me, I have only had to ignore something like a 404 or 400 error usually
when I'm indexing a bunch of information and something happened the first time. I have a delete-index step in there in case I run this a couple of times, if it ran too early or something like that: delete the index. In most cases that index won't exist, but I'm just making sure that if I do have to index it again, it deletes the first index and prevents a 400 error.

Yeah, exactly. Okay, so I showed how to index and how to get; let me show how to update a document. For this basic operation there is an update API. Very simple: you again need to specify the index, accounts, and the ID. Let's check the ID. Imagine that we want to change this document ID here, sorry, use this document ID, and change, I don't know, the age: it was 52, let's change it to 33, for instance. Let's see what happens: php update.php. Now we have the response back. As you can see, there is just the updated result; you don't have the actual document in the response. So if you want to be sure that this document has been updated, of course you can use the get. Okay, and as you can see, the age is now 33, so we changed the age. Notice that you can update just one field in the document, but you can also add another field because, as I said previously, Elasticsearch is like a NoSQL database: you can add fields on the fly.

Okay, the last operation to complete the CRUD example is the delete. How can we delete a specific document? Very simple: we have a delete API, and you have to specify the index name and of course the ID. Imagine that we want to remove the document that we created: we can delete it. As you can see, the result now is deleted, and if we run the get again we don't get anything, because found is false. So if we go back here to the try/catch and remove the client ignore 404, it should
generate an exception, and using the try/catch... actually, let me move this inside the try/catch, otherwise we have an error. "The document does not exist," because we deleted it. So I showed you how to insert, how to update, how to read, and how to delete in Elasticsearch.

And it's very much, like you said, that CRUD methodology: client, arrow, create, update, and delete, and then I guess you have the get as well, which is kind of the read in that process.

Yeah. Actually, this is something that I started for an article that I published in php[architect], which is a popular PHP magazine. I wrote an article on how to use Elasticsearch with PHP; it was published in July this year. I collected a couple of examples for it, but I'm collecting a much longer list; actually, I can show you. Here we have a lot of examples: different kinds of search, bulk, logging, even schema on read, and a lot of others. I'm thinking of publishing a repository to open up all these examples, so of course we will share the details when I finish this project. As you know, we love to share with other people, and sharing code is fun every time, because you also receive a lot of feedback and you can grow. And this will be a public repository, so people can even contribute a crazy example or scenario, whatever.

Another example that I would like to show you, because those were very basic operations and I think this will be more interesting, is the bulk. Imagine that you want to index not just one document but a set of documents, let's say 100 documents. The beauty of this API is that you can specify 100 documents in just one body and perform just one HTTP request, so you don't have to perform 100 HTTP
requests; you can just perform one single HTTP request. This is very convenient, I would say. So, how do you use the bulk operation in elasticsearch-php? I think this is also interesting to mention for the audience: this is the official Elasticsearch website for the documentation, so this is the Elasticsearch Guide, 7.14, REST APIs, Document APIs, Bulk API. If you go to the documentation, all the REST APIs are documented, of course, and you can have a look at the bulk API, the one that I want to show you now. Here you have an example that you can even copy and paste using curl, which is a popular HTTP client that you can run in your shell, or you can use Kibana and its Console, whatever you want. But let's talk about PHP. If you read this example, it says that if we want to post multiple documents, we have to create one single body where the first line is the operation that you want to perform and the second line is the document, and you can add more operations. For instance, this is a bulk operation that indexes one document with the index name test and ID 1, so it stores this document, field1: value1, into Elasticsearch. The second operation is a delete, the third operation is a create, then an update, and so on. So in one single HTTP request you can perform multiple different operations: not just indexing 100 documents or deleting 100, you can even send an index request, a delete, an update, and so on. So this is very powerful. It is mapped in PHP in this way: you have a bulk API, and in the body of the bulk you have to create the operations as in the example that I showed in the online documentation. In this case we are just indexing, so the operation is index, and we are working on the same index, accounts. So the first line is, as I said, the index operation.
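
The on-screen script is not part of the captions, so the following is only a minimal sketch of the bulk body described here, assuming the elasticsearch-php 7.x client. The helper name buildBulkIndexBody and the sample documents are hypothetical, chosen for illustration. The array alternates an action entry (the operation plus its index and ID) with the document it applies to, mirroring the two-line pairs in the REST documentation.

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): builds the alternating
// action/document pairs that the bulk API expects for "index" operations.
function buildBulkIndexBody(string $index, array $documents): array
{
    $body = [];
    foreach ($documents as $id => $document) {
        // First entry of each pair: the operation and its metadata.
        $body[] = ['index' => ['_index' => $index, '_id' => $id]];
        // Second entry of each pair: the document itself.
        $body[] = $document;
    }
    return $body;
}

// Two illustrative documents keyed by the ID each should receive.
$params = [
    'body' => buildBulkIndexBody('accounts', [
        1 => ['name' => 'Amber', 'surname' => 'Duke'],
        2 => ['name' => 'Hattie', 'surname' => 'Bond'],
    ]),
];

// Assumption: a node is listening on localhost:9200 and the client was
// installed with Composer. One HTTP request would then index both documents:
// require 'vendor/autoload.php';
// $client = Elasticsearch\ClientBuilder::create()->setHosts(['localhost:9200'])->build();
// $result = $client->bulk($params);
```

The client calls are left commented out so the sketch runs without a cluster; uncommenting them sends the whole body in a single request.
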
So we want to index something; this is the ID that we want to create, just the number that I put here, so we want to create a document with this ID. The second line is the document itself. In the previous example, accounts had the first name, last name, and all the other information; here I just put name and surname. And the second pair is again another index operation, of course using a different ID and a different value. So this is a bulk that indexes two documents in one HTTP call. If I try to run this, it creates another index. Okay. So this is took, which gives you the number of milliseconds that Elasticsearch spent inserting these two documents, and these are the items. There are no errors, so errors is empty. The items are the documents that I created. The first item is this one, and as you can see the information is the same as in the index example that I showed, if you remember: accounts2 is the name of the index, then the ID, and the result, created. And the second element is the second document, also created. So that's it, very powerful. Of course, this is just a toy example, because we have just two documents, but imagine that we have many documents. For instance, this is a JSON example that we also use in the Elasticsearch documentation, with a number of accounts. Specifically I think it's 2,002 lines; actually, the documents are half of that, so it's 1,001 documents, because, remember, the first line is the action and the second is the document, so it's basically half of the lines in this case. Imagine that we want to perform a bulk operation to insert all these documents into Elasticsearch. We can actually do everything in one shot, because Elasticsearch gives you the possibility to have, if I remember right, 10
megabytes, with 5,000 documents or something like this. But just for this example I chose to split the documents into chunks of 500, so I'm bulking 500 documents at a time and we are performing, I guess, four HTTP calls here. What I'm doing is very simple, this is pure PHP: I'm reading the accounts JSON file, I'm specifying that I want to create an accounts index, and the body starts empty because it is something that I have to build by reading the information from this accounts JSON file.

And here is something interesting: I'm reading this file line by line and, as you can see, I'm not converting the JSON string. I'm reading a text file, so each line is a string, and I'm putting the string directly into the body. This is a cool feature of elasticsearch-php: if you find some JSON online or in another project, like the JSON in the example that I showed, and you want to use it as input for a function of elasticsearch-php, you can just pass a string. Take the $params that I showed earlier, for instance in the get or in the index example, this one: as you can see, $account here is an associative array, but you can also specify it as a string. I could just take the first line of this file and pass it as is, and this will work as well. So you have two options: you can specify the body as an associative array or as a JSON document. This is convenient if you copy and paste something from another project, from online or from the documentation and you want to experiment. I actually use this approach in this example because, again, I'm bulking but I'm using the JSON directly, so I'm not decoding from JSON to an associative array. Internally, of course, what the client does is take the associative array, if we specify an array, and call the serialization
to JSON, because at the end of the day the HTTP call to Elasticsearch needs to be JSON. So this is also a cool feature: if you already have JSON, you don't have to unserialize and serialize it again internally; you can just pass the line itself. So I'm building the body here. This is the concatenation operator, the dot-equal (.=) in PHP, which appends one string to the end of another. I'm just counting, and when I reach 500 I perform the bulk operation: I call bulk with $params, and of course I need to clear the body, because next time I want to start with an empty body. Then I just print that I added N documents. So it is a very, very simple script. If we run it on all the accounts, as you can see, we performed four HTTP calls, each one storing 500 documents into Elasticsearch. OK, so this is just another way to use the Elasticsearch PHP client, again with the bulk operation.

The last example, or one of the last examples, that I would like to show you is search, because Elasticsearch is famous for this feature.

For search.

Yeah, exactly. So how can we perform a search operation? This is a very basic example. Imagine that you just want to know how many documents you have in your accounts index. You can perform a search: you specify the name of the index, and I'm just reading out the total number of documents, the max score and took, and printing some information here. So let's execute this example, search.php. Again, I'm not searching for any specific information; it just gives me all the documents. If we run this, we get back this response: the number of documents is 1,001 and the maximum score is one. Inside Elasticsearch there is this idea of scoring: when you search for something, all the
results have a score, a floating-point number that tells you the quality of the result. If you are searching for a specific term, Elasticsearch can also return similar terms, for instance using fuzzy search, which is a nice feature of Elasticsearch, and from the score you can understand the quality of a result: how similar it is, the distance between the result and the actual term that you wrote in your query. Anyway, in this case it's just one, because we didn't search for anything; we are just looking at all the documents. This is the response time: Elasticsearch took 382 milliseconds to perform this operation. As you can see, it also gives the results, so these are the documents that I stored in Elasticsearch, but only the first 10, because by default Elasticsearch does not give you all 1,001 documents. It uses pagination, so by default Elasticsearch gives you 10 results at a time.

Of course the next question is: OK Enrico, but how can I iterate over all the pages? For this we have the idea of scrolling, which is basically the pagination, so you can scroll through the results. How can we do that? We use the search API as usual, and you have to specify the scroll and the size; these are the two options that you need in order to use the scroll. The first, scroll, is the time in seconds between each request, here 30 seconds. Why do you have to specify this? Because when you perform a search on Elasticsearch, the results are stored in a snapshot for 30 seconds, and you can paginate over all the results for 30 seconds. Why? Because if you have an Elasticsearch cluster with a lot of data, maybe the data are
changing even while you are searching; they can change from one page to the next. That's why it's important to give you consistent results, a consistent snapshot of the results at a specific time. In this way you can say: look, I have a window of 30 seconds, so if I ask for the next page within 30 seconds I will get back the results of the original snapshot. This way you don't get inconsistent results. Maybe you have a very write-intensive application that performs a lot of index updates: you run a search that gives you 1,000 results, and the next time, when you go to the second page, the number of results has changed because, again, the data behind the scenes changed. You don't want this kind of behavior. So this is a cool feature of Elasticsearch: you say, look, I want a snapshot of the results, and this is the time frame in which the snapshot lives from one request to the next.

I think a good example of this is on the security side. If you're looking for the latest events and you want to scroll through them, and you're doing any type of management for a large company, you're going to have thousands and thousands of entries coming in every minute. So you want to make sure that you give yourself enough time to actually look at the things that are happening, and there are plenty of ways to go about doing that. In this case we're just saying give me all of the data; you could also say give me all the data that happened within a certain time frame. But as things continue to update, if you didn't do this, every time you went to a new page you would just see the same old data over and over. You'd be like, wait a minute, I'm trying to see the next 10 results. Well, if 10 new results have come in, then of course you're going to keep seeing that same old data every
time. So I think this is also a way to do that. Isn't there another way to look at the actual snapshot as it was collected? I might be wrong about this, but I thought there was a way to say that the search itself has a kind of snapshot, that the snapshot has its own ID, and that you can search based on the results in that particular snapshot.

Yeah, but that is something different. This is the search, where the snapshot is internal to Elasticsearch. Of course, if you have your own snapshot, you can just say: look, I want to search in this specific snapshot. This one is more internal, but you're right, there is a way to specify, when you create your snapshot, that you want to search in it. Maybe it's a good idea to have snapshots, for example if you have a logging system with daily snapshots and you want to search just in a given day; you can perform a search in that snapshot as well.

Yeah, I think I was conflating two things in how Elastic...

No, the idea is the same. I call it a snapshot because this is how Elasticsearch actually performs this operation, but it's not public, in the sense that you cannot specify the name of the snapshot. The other parameter is the size, which is the number of results per page; we want to increase it to, I don't know, 50.
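The two scroll options described above can be written down as request parameters. This is only a sketch: it assumes the associative-array style of the elasticsearch-php 7.x client shown in the talk, it builds the parameters without contacting a cluster, and the client call appears only as a comment. The match-all query uses an empty PHP object so that it serializes to an empty JSON object.

```php
<?php
// Sketch of the parameters for the first request of a scroll.
$params = [
    'index'  => 'accounts',
    'scroll' => '30s', // how long the result snapshot is kept between requests
    'size'   => 50,    // number of results per page
    'body'   => [
        'query' => [
            // An empty JSON object has to be an object in PHP;
            // an empty array would be encoded as [] instead of {}.
            'match_all' => new stdClass(),
        ],
    ],
];

// With a configured elasticsearch-php 7.x client this would be sent as:
// $results = $client->search($params);

$json = json_encode($params['body']);
echo $json, PHP_EOL; // {"query":{"match_all":{}}}

// Number of pages needed for the 1,001 documents of the accounts example.
$pages = (int) ceil(1001 / $params['size']);
echo $pages, PHP_EOL; // 21
```

The page count is plain arithmetic on the numbers from the talk: twenty full pages of 50 results plus one page holding the remaining document.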
And of course you have to specify the query; again we want a match all. This is interesting, because the Elasticsearch API sometimes uses empty objects. Just to give you an example, in JSON an empty object is basically this: {}. How can we map a JSON empty object into an associative array in PHP? We need to create an empty object. We do have that in PHP, but not a lot of people are familiar with the standard class. stdClass is a standard class in PHP, so when you want an empty object you can just create an instance of stdClass. This is the trick that we use, and it is what json_encode and json_decode, the two PHP functions that encode and decode JSON, use internally to convert this empty object to and from this one in JSON. Anyway, long story short: each time you see something like this, an empty object, in the JSON API of Elasticsearch, remember that if you are using an associative array in PHP you need to specify new stdClass.

OK, so this performs a search and returns the first page with 50 results. How can we actually scroll and go to the next page? There is this scroll ID, an ID that gives you the information about the pagination. You use it sequentially. Here, what I'm doing is looping while the number of results is greater than zero. The results are under hits, hits in $results; if this exists, I store its count in $tot. So while $tot is greater than zero, I execute this while loop. Inside the loop I just print, but here you can do whatever you want. You have to store the scroll ID, because the scroll ID is what tells you where you are in the snapshot, and in order to go to the next page you have to
perform a scroll operation. So you are not performing a search; the search happens only the first time. If you want to scroll through the results you call the scroll API, and it's very simple: you need to specify the scroll ID and, again, the time frame, for which I use the same value, 30 seconds. The scroll is basically a search for the next page, that's it. And because I'm updating $results each time, I will have the new information here, including the new scroll ID, because this can change from one page to the next. With this simple while loop you can perform the pagination and read all the information for all the results. So let's run it. I didn't print all the documents, just the page with 50 documents and so on. As you can see, there is one spare document, so there is a page 21 that contains just one document. I don't know if there are any questions, or if you have any questions.

You have me thinking, because I've built plenty of projects before where it's just "give me the results", and I've never really played around that much with scroll. Seeing it, it makes a ton of sense, and you can also store everything, look at the scroll ID and say, as you showed here, page one, page two, page three, page four, and use that in your own system to...

Yes, exactly, to move around.

Yeah, so that's making me think that now I need to go back and look at all of my projects and ask how I implement scrolling properly.

Yeah, that's also an interesting idea. You also have to play with this time frame, because it is very important: otherwise you store some scroll ID, and if you don't perform any scrolling for 30 seconds the snapshot will be destroyed. There is a limit on the scroll; I don't remember exactly, it's some five minutes or something like that, but generally speaking it
is more than enough to cover the majority of use cases.

I'm just trying to think what would be the best way to do that. I'm thinking from a traditional search standpoint, like on a website where you say, hey, there were 5,000 results, here are the first 30. What if someone just sits there and hangs out on that search for too long? I wonder if there's a systematic approach to saying, hey, if the next request comes after this time, you have to re-execute the search, otherwise you're going to run into some issues.

Actually, yeah, that's a good question. There are a lot of possibilities here. This is more for a scenario where you have a lot of data coming in and updating, and that's why I use 30 seconds. Imagine that you are searching logs: if you perform a search and after five minutes you perform another one, you don't want to see the logs of five minutes ago, because maybe you are interested in more recent data. So this is more for this kind of scenario.

But if that timeout has expired and you try to continue the scroll, will you receive an error? Because I may just be ignorant again here.

No, you don't get an error, actually. You can have inconsistent results, because maybe the total number of documents has changed. So this parameter is more about making sure that the results you get back are consistent with the snapshot that you created within the time frame, you know what I mean?

Yeah, that makes sense. And this is why I enjoy doing these: today I learned a little bit more about how the scroll process works and how to use it effectively.

So yeah, there are tons of internal details that are very specific to the implementation of Elasticsearch and that open the door to a
lot of scenarios, like schema on read, if you're familiar with the latest feature that we introduced. It opens the door to a completely new approach because, to my knowledge, there is no other solution on the market that gives you the possibility to add a field to the results when you are reading the information, without changing the stored information. This is quite crazy if you think about it. So I'm sure we will see a lot of interesting applications and use cases in the future.

Well, cool. I think we're actually running a little bit past time, but thank you, because the extra time went towards this and I learned something today. Did we have any other things that you wanted to point out?

Let me think, because I have a lot of examples here. Maybe something that I didn't talk about is the mapping; this can be interesting. In the examples that I showed you, we just had some documents, as associative arrays or JSON, and we put them into Elasticsearch. We didn't care about the mapping, that is, the types. In the SQL world, where you have a relational database, you have to specify the structure of a table: this field is a varchar, a string, an integer, whatever. I didn't write anything like that; if you remember the examples that I showed you, I just picked some random JSON and put it into Elasticsearch. So we say that Elasticsearch is schemaless, because we didn't provide any schema. But this is not true: inside Elasticsearch there is a schema, the mapping, that it builds in order to understand: OK, this looks like a string, so I will use a string type; this looks like a number, so I will use a number type; and so on. This is quite powerful, because you don't have to think about
anything. But of course, when you start going deeper into your application, you may realize that you need to change something, because you want something to be not a text but a term, et cetera. So there is a nice API for mappings. Actually, this is the first example of an elasticsearch-php API that has a namespace. As I mentioned, we have almost 400 APIs in Elasticsearch, so we also group them using namespaces; for instance, all the APIs that operate on indices are under the indices namespace. This is an example, get mapping: you just have to specify the name of the index, and it gives you the types used internally by Elasticsearch to manage this index. If we run this example, we get the type for each field. Remember the example that I showed you: account number, address, age. As you can see, because age was a number, Elasticsearch is automatically able to say, OK, this is a number, so I will use the type long. The account number is the same, long. Address is more complicated, because it's a text that also contains a keyword with a maximum of 256 characters. City is the same, a text; the email is a text; et cetera. So with this API you can see how Elasticsearch manages the accounts index internally in this example.

But imagine that you want to change it. If we look at the example, gender, as you can see, is just M or F. I don't want to use text for this; maybe it's better to use a term, because internally in Elasticsearch that is a type that lets you perform very efficient operations on the exact value, not full text but a specific term. Long story short, we want to change this mapping. How can we do this? Under the same namespace, indices, we can use create. We basically want to create, because if an index already
exists you cannot change the mapping; you have to reindex it. So imagine that we want to create from scratch a new index, accounts. You can specify the mappings in the body and, as you can see here, for each property, first name, last name, gender, et cetera, I'm specifying that it is a keyword. This is the type that I want to use: we want to perform term operations on this field, and keyword is the type that we need for that. Of course, now I need to delete the index. This is another API that I didn't show you; it simply deletes the entire index. Then we can execute this create index with the mapping, perform a bulk operation to put in some data, and now, if we go back to the mapping, as you can see the first name is a keyword, the last name is a keyword and gender is a keyword.

Why did I do this? Because now I can do very advanced stuff like, for instance, aggregations. An aggregation is a way to aggregate data using keywords. You can also do it on full text, but of course it's not as performant. So now that I can aggregate, I can answer questions like: how many males and females do we have in our accounts? Because we now have a keyword for the gender, I can aggregate on it, and this is the way to aggregate data, using the aggs field in the body, and we just perform a search. Search accepts a lot of different types of request body. Anyway, if we run this, search aggregate, you can see the result: it provides the aggregation, and in the accounts index we have 505 males and 494 females. This is an example and, in order to do this, again, I needed to specify that the gender field is a keyword; otherwise the aggregation would not be possible, or not as performant. And as you can
see, we did this in 46 milliseconds, so very, very fast.

Yeah, I've learned a lot about the art of making sure that your mappings match what you're trying to accomplish. And it's funny: when you mentioned the delete and bulk indexing again, that takes us all the way back to that first part about ignoring certain errors and why I had to do that. I actually had to do it for this exact reason, changing my mappings. It was like, all right, these mappings are wrong, let me reindex everything, but that means I have to go in and delete the index first. And I thought, you know what, I'll just build this into the code so I never have to think about it again.

Yeah, this is the general path when you learn how to use Elasticsearch. At the beginning you don't care about mapping, but when you go deeper into search you start to realize: OK, I need to aggregate this, I want to use fuzzy search here, so it's better to use keywords, so I need to remap. It's usually a process, a very agile process I would say, because you change your code according to your needs, and you typically start using mappings in a second phase, when you have to play with search.

Absolutely. Well, let's start wrapping up here. I wanted to leave a little bit of time to talk about what's coming up next. 7.14 is out, 7.15 will be out at some point, and I know everybody's looking forward to 8.0. How does this release cycle affect you on the PHP side? What can people look forward to in terms of PHP?

So we have been working on version 8 since the beginning, actually, because now we have types internally. We know how to create a request and what a response from Elasticsearch looks like, because we mapped all the possible scenarios, all the possible inputs and outputs. So we can offer an object-oriented way to create the
requests and get back the responses, so no more associative arrays. The associative array is very powerful and very simple: as I showed you, in a couple of lines you pass the JSON or the associative array, you index, and you're done. But it has one limit: if you don't know how to write the JSON or the associative array, you have to go to the documentation or to some examples to understand how to build it. Imagine instead that you have something that you can autocomplete: you start writing, say, a new search request, and you have all the possible parameters as properties of a class. This will simplify the developer experience a lot, and that's the goal: we want a client that gives you the best developer experience possible. So elasticsearch-php 8, which will be released next year, of course when Elasticsearch 8 is released, will have this kind of power.

Of course this can be a breaking change for a lot of projects, so we are also thinking about a way to provide backward compatibility, maybe with a legacy client that works like the one that I showed today, so the 7 style will be available in 8 as well. But we also want to offer a higher-level structure for creating requests and responses. So this is the biggest news. And in PHP especially, I'm leveraging the fact that we now have PHP 8, with a lot of nice features like named arguments in functions. You don't have to follow the order of the parameters: you can say the id parameter is this one, the name is that one, so you can change the position of the parameters. You no longer have that annoying thing in PHP where an optional parameter is the third one and you have to pass a null value as the second parameter in order to reach the third, and so on. So Elasticsearch 8 will leverage PHP 8. This is also a nice combination, because we have the same major version, 8 for PHP and 8
for Elasticsearch, so it will be very nice. If you want a kind of preview of what we are working on, we released another PHP client, enterprise-search-php, and this is the URL where you can check it out, which actually uses this approach. I released this client recently, so I used the new technology that I was building. It uses a common library, elastic-transport-php, another open source project that we deliver in order to have one common core for HTTP communication between client and server applications, not just for Elasticsearch, because the majority, I would say the totality, of our products at Elastic expose REST APIs. So the idea is: OK, we build this kernel for HTTP transport, and this library will be used by all the PHP clients that want to communicate with Enterprise Search, App Search, Workplace Search, Elasticsearch, et cetera. That's why we built this library. It also uses the PSR-7 standard, which is a standard in the PHP world, so we no longer have the internal details that we have with Guzzle or RingPHP in the current Elasticsearch PHP client. So we are trying to leverage the new technology in the PHP community, also because I'm part of the PHP-FIG organization, which provides standards for PHP, so I have to use them.

I get that. Well, thank you. And again, these series are important to me because I am a clients person. As much as I love the ELK Stack, and we're even going to be talking about parts of the ELK Stack, with Logstash, in a few weeks, I tend to think: I'm writing a project in this language, so how do I interact with Elasticsearch using that language, instead of having to use two different pieces of the overall project? But thank you so much, Enrico, for taking some time out. I know that you're in Europe, so it's actually a little bit later there, so
thank you for meeting with us.

My pleasure.

I see you have your Twitter handle there, so I guess if people want to get in touch with you they can do so there. Of course, if you have any questions about PHP that we didn't cover, or about the PHP client for Elasticsearch, you can ask over at discuss.elastic.co, and stay tuned over at community.elastic.co, where we'll be talking about what we have coming up. I'm not sure what's going on with the live stream for next week, but I think there might be something; keep a lookout for that. As always, if you're not yet subscribed to the channel, subscribe, and if you liked this presentation give it a thumbs up, which helps us know, and let us know in the comments what you want to see next in this series. That's going to do it for this week. Until next time, for Enrico and myself, we'll talk to you later.
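The bulk-indexing script walked through in the talk can be approximated in plain PHP. This is a minimal sketch rather than the script shown on screen: the function name, the sample data and the choice to count the batch size in documents (two lines each) are assumptions made for illustration, and the client call appears only as a comment because it needs a running cluster.

```php
<?php
// Sketch only: groups NDJSON bulk lines (one action line followed by one
// document line) into request bodies. Names and sample data are hypothetical.
function bulkBodies(iterable $lines, int $docsPerBatch): Generator
{
    $body = '';
    $lineCount = 0;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        $body .= $line . "\n"; // raw JSON string, no json_decode needed
        $lineCount++;
        if ($lineCount === 2 * $docsPerBatch) { // two lines per document
            yield $body;
            $body = ''; // start the next batch with an empty body
            $lineCount = 0;
        }
    }
    if ($body !== '') {
        yield $body; // remaining documents of the last, shorter batch
    }
}

// Hypothetical sample data: three documents, sent in batches of two.
$lines = [
    '{"index":{"_id":"1"}}', '{"name":"Alice"}',
    '{"index":{"_id":"2"}}', '{"name":"Bob"}',
    '{"index":{"_id":"3"}}', '{"name":"Carol"}',
];

$bodies = iterator_to_array(bulkBodies($lines, 2), false);
foreach ($bodies as $i => $body) {
    // With a configured elasticsearch-php 7.x client, each body could be sent as:
    // $client->bulk(['index' => 'accounts', 'body' => $body]);
    printf("request %d: %d documents\n", $i + 1, substr_count($body, "\n") / 2);
}
```

Keeping each line as a string avoids a decode and re-encode round trip, which is the point made in the talk about passing JSON directly as the body.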
Info
Channel: Official Elastic Community
Views: 164
Rating: 5 out of 5
Id: _fr8saofmNc
Length: 82min 19sec (4939 seconds)
Published: Wed Sep 08 2021