Learning to rank search results - Byron Voorbach & Jettro Coenradie [DevCon 2018]

Captions
First, to get you energized, a show of hands: who has kids? Ah, almost everyone. Who knows kids, in the neighborhood maybe, or a brother or sister? OK. Who used to be a kid? OK, so now I have everybody.

If you have kids you probably know this toy. It's a learning toy: a toy with which you can learn how to rank. You can rank the sizes of the rings in the right order, and this is something you can teach your kids, and they love it. Only, at the first try they end up with something like this. It's a nice try, because all the rings are on there and the ball is on top, so it's already quite difficult, but it's not exactly what we want. So we show them the example again, they try to rank it again, and it turns out like this. The question is: is this right or is it wrong? I would say it's perfect; it's nicely ordered in a certain way, it's not the exact same thing as the example, but it's good. Today we're also going to talk about learning and ranking, only we use a different toy: we use Elasticsearch, and we're going to combine it with machine learning.

I'm Jettro Coenradie. I'm a fellow at Luminis Amsterdam, I'm specialized in Elasticsearch and I love experimenting with machine learning; if you want to follow me on Twitter or elsewhere, all the links are here. And my name is Byron Voorbach, I'm a search and data engineer at Luminis Amsterdam, and in my daily work I'm mostly busy with building and optimizing search engines, mostly Elasticsearch. So I shouldn't do the Beauty and the Beast joke? I shouldn't do it. OK, don't do it.

Let's talk about e-commerce, ranking and search. When I was preparing for this talk — I do talks on a regular basis, not always on the same subject — I wanted to treat myself, so I wanted to buy something. I had a device to move the slides with, but it was a bit old and rusty, and I saw this TED talk where someone had a presenter you could point with, and I thought: this is cool, I want that as well. So I went to the internet, of course, and I started looking at Wehkamp, because I knew you can buy more there than clothes, and I found it: it was the fifth result. But I'm Dutch, so you want to see if it's cheaper somewhere else. I tried Coolblue: the same search, and there it is again, slightly more expensive. Let's try another one. I went to the website of bol.com, and it's not there. Now, I happen to know someone who works at bol.com and knows a lot about search, so I'm asking him if he can maybe explain why this cool device is not on the screen.

Yes, so for the last few years I've been helping bol.com with their transition to Elasticsearch, so I know a little bit about what's going on here. One of the major differences between bol.com and Wehkamp and Coolblue — I don't know if it's really visible for the people in the back — is that there are 237 results for the word "presenter", which is a lot more than we saw on the other two websites. Bol.com simply has a lot more data. Another thing you notice is that we just searched for "presenter", which is a pretty common, pretty broad term, meaning we can get any presenter that is out there.
So let's dive a little closer into this example and change our search to "logitech presenter". We can actually see that the product does exist at bol.com, so they're also selling it. But if you look at the top you can see a dropdown in which you can sort the results of your search, and by default it says we sort on relevance. Now, what is relevant? For Jettro a more expensive, fancy presenter is more relevant, but I would just go with a cheap one, because although the feature is cool, it's a lot of money.

The relevance part is pretty important, because the first thing people see in a search result is the first or second hit, and it has actually been shown in studies that the click-through rate is much higher for roughly the top three positions. So you want to make sure that if you sort on relevance, you have the most relevant products on top, either for that user or for your business.

Since bol.com is not software we have running on our own machine, but we do want to show you today how we can improve the relevance of search by using machine learning, we made a small demo. We have a search engine here which contains the top 500 albums — not from the Rolling Stones, but Rolling Stone magazine's top 500 — and Jettro can show that we can search on it.

So let's do a few queries. U2, for instance. You can see the results are quite OK: the first five albums are from U2, and then there's Radiohead, Radiohead, the Beatles. Of course we can look for something else as well; let's get a little more guitars and search for Metallica. There are two albums from Metallica in the top 500, but there's also the Beatles. OK, why is the Beatles there? We click on it, and for some reason Metallica is in the information box of the Beatles as well, so this album ends up in the result list for Metallica. Let's do one more: let's check "rolling", because I don't really like to type out "rolling stones" completely. If I type "rolling", from the Rolling Stones, I see the first album is from the Rolling Stones, but the second album is Howlin' Wolf, the third Professor Longhair, then Dr. John.
OK, this is not what I was looking for, so let's see if we can improve this.

Just to get back to the demo: how do we get from our search term, U2 in the first example, to a list of albums? How did we build this? Well, meet Elasticsearch. Can I see a show of hands on how many people here know Elasticsearch? That's a lot; maybe we can go a bit faster on the next slides, because I prepared a little intro into Elasticsearch. Elasticsearch is a distributed search engine. We have been using it since the beginning, we love it, we use it for a lot of projects and we contribute to it, so we do a lot with it.

How Elasticsearch works: you have Elasticsearch standing there, you have an index, and what you're going to do is feed data into it as documents — in our case the data from the Rolling 500. We store the documents into Elasticsearch, and at that point the documents go through an analysis phase. During this analysis phase tokens are extracted from the documents, so that if we search for U2 we actually get documents that have U2 somewhere in the text. The analyzed data gets stored into an inverted index — I'll show what that looks like and how it works in a second, though most of you already know — and the next step is that we have a query which we send to Elasticsearch, which matches it against the documents in our inverted index, and the results are returned.

What does that look like? If we start adding our data from the Rolling 500, say we have some documents that just contain an artist field. We insert a document for U2 with doc ID number one; it gets put into the inverted index, so we get the term "u2", we have a doc ID, and we have a total term frequency: how often this term occurs in the complete corpus of your index. If we move further and start adding more data, you can see that the input text gets split on whitespace and put into the inverted index. That's why, when we search for "rolling", we also find "Rolling Stones": you don't have to type both terms. Let's add one more, Rage Against the Machine. And finally, when you add another document that contains a term which already exists, the extra doc ID gets added and the TTF (total term frequency) is increased by one. So this is, in short, how the Elasticsearch — actually Lucene — internals work.

To quickly go over what documents and queries look like for Elasticsearch: this is an example of a document. You can see that we have an album name, we have some information, we have a label, we have an artist, and these are fields that we can search on, which we will also be using later to improve our relevance. And this is an example of a query: we have an index called rolling-500, which we query through the JSON query API, and we have a query where we search for "u2" on the three fields which we just indexed: artist, information and album.

What happens under the hood is that as soon as you send a query to Elasticsearch, it runs against the inverted index and tries to match the tokens from your query against the tokens inside the inverted index. From that, matching documents are returned — let's say these numbers are document IDs. The next step is to score those documents, which is done by a ranking algorithm internal to Lucene. This is the default ranking algorithm in Elasticsearch.
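To make the inverted-index explanation above a bit more concrete, here is a minimal sketch in Python of what Lucene conceptually builds from the artist field: split the text on whitespace and record, per term, the doc IDs it appears in and a total term frequency. The documents are just the ones mentioned in the talk; the real data structure is of course far more sophisticated.

```python
from collections import defaultdict

# Toy documents: doc ID -> artist field, as in the talk's example.
docs = {
    1: "U2",
    2: "The Rolling Stones",
    3: "Rage Against The Machine",
    4: "The Beatles",
}

postings = defaultdict(set)          # term -> set of doc IDs containing it
total_term_freq = defaultdict(int)   # term -> occurrences in the whole corpus

for doc_id, artist in docs.items():
    for token in artist.lower().split():   # naive whitespace tokenizer
        postings[token].add(doc_id)
        total_term_freq[token] += 1

# Searching for "rolling" finds the Rolling Stones document,
# because the text was split into separate terms at index time.
print(postings["rolling"])        # {2}
print(total_term_freq["the"])     # 3 -> "the" occurs once in each of three docs
```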
It might look a bit scary at first, but it's actually TF and IDF with some normalization factors. The takeaway is that the IDF part is based on how often a term occurs in the whole corpus: the more documents your search term occurs in, the lower the score contributed by that term. The term frequency part means that if the term occurs more often inside a document, the score for that document is higher. And if you incorporate the field length into it: the longer a field is, the less important a match in it is. So if you search for U2 and you have a hit in a title, which is a really short field, that document is probably more interesting than, for instance, the Metallica example we saw, which had a match in the information field.

BM25 by default can give a pretty good score if you just use the basics of Elasticsearch, but there are ways of modifying the relevance yourself: you can add certain boost parameters, or you can add functions to influence the score. On the left you can see a query where we search for U2, but if the document matches in the artist field we say it's twice as important as a match in the information field. So we use some upfront knowledge about which fields are more important than others; that's field-centric boosting. The second example is a function score, where we say: we have some information about how important this document is, stored in a field of the document itself. For instance, we can store the number of likes with a document, so that we can boost on the actual value inside the document — it could also be price, likes, favorites, whatever. And then there's a third one, the decay function. For instance, we search for U2 and we say that every album released within the last year is more relevant than the older ones. For the Rolling 500 case that's not really a good decay, because all the albums are super old. You didn't know most of them? No, it's all from before my time. So these are three ways of changing the boosts and changing the relevance, and that brings us to the next way of doing things.

So, as Byron mentioned, you can influence the boosting yourself, but then you also have to decide on the boosting factors yourself, which is not always easy. I have visited customers where there was a factor of two thousand three hundred and seventy-six in the code, and when I asked "what is this boosting factor?", the answer was literally: we don't know, it just seemed to work. The problem then is that the next person who comes by says: I have this query but I would like to have this result, so I need to change this boost factor a bit — and then that query works better, but now another one doesn't work anymore. You could do this by hand with a lot of testing, but what you could also do is use learning to rank, which is a way to use machine learning — using data — to find the most optimal, well, what you could call boosting factors. It works a little bit differently, which I hope will be clear at the end of the presentation. On the slide there's also the definition of learning to rank; I think every serious presentation these days includes a link to Wikipedia, so I have that checked: this is a serious presentation.
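As a rough sketch of the two boosting styles described above — field-centric boosts and a function score on a value stored in the document — queries against the demo index might look something like this. The index name, field names and the "clicks" field are assumptions based on the talk, not the exact queries from the slides.

```python
import requests

# Field-centric boosting: a match in "artist" counts twice as much as
# a match in "information" or "album".
boosted_query = {
    "query": {
        "multi_match": {
            "query": "u2",
            "fields": ["artist^2", "information", "album"],
        }
    }
}

# Function score: scale the text score by a numeric field stored with the
# document (here a hypothetical "clicks" counter).
function_score_query = {
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "u2",
                    "fields": ["artist", "information", "album"],
                }
            },
            "field_value_factor": {"field": "clicks", "modifier": "log1p", "missing": 0},
        }
    }
}

for body in (boosted_query, function_score_query):
    resp = requests.post("http://localhost:9200/rolling-500/_search", json=body)
    print([hit["_source"]["album"] for hit in resp.json()["hits"]["hits"]])
```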
So let's continue. We're going to talk a little bit about machine learning now — actually about supervised machine learning. If you want to know what unsupervised machine learning is, come see me after the talk, because we're not going to cover it. Supervised machine learning means that we have a data set with an input and a known output, and we can use that input and output to train a model.

Let's take the housing market as an example. Imagine you want to estimate the selling price of a house based on the square footage and the number of rooms, and I have a set of training data. With this data I know that my input is the number of rooms and the square footage, and I know what each house was sold for. Now we choose a model, use this training data to configure the model, and then with the model we predict the price for these same houses. One could argue whether this is a good prediction or not: there is an error between the actual, labelled output and the predicted output. That is what machine learning is about. In the next step we tune the parameters of the model a little, do a prediction again, and see if the error becomes smaller. With the smallest error we know we might have a good model to do the prediction.

That's nice for houses, but as you saw in the example we're not selling houses; we have a search engine. It could be a search engine for an e-commerce website, but it's a different problem: we're not after the right price, we're after the right order of the documents that we return. If we translate this to search, we have in this case four inputs, being four queries, and for each query we know the order of the documents that we want to return. So again we have input and we know the output, and we can do the same thing: come up with a model, train the parameters and do a prediction. Now that we have the prediction, we need some way to compare the predicted output with the actual output to train our model, so there will be an error between the two. We'll talk about what that error looks like, but again, we feed this error back into the model, try to tune the parameters better, do a new prediction and see if it improves.
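Going back to the housing example for a moment, a minimal sketch of that supervised loop could look like the following — the numbers are invented purely for illustration, and linear regression is just one arbitrary choice of model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Input: [number of rooms, square footage]; output: known selling price.
X = np.array([[2, 60], [3, 85], [4, 120], [5, 150]])
y = np.array([180_000, 240_000, 320_000, 400_000])

model = LinearRegression()
model.fit(X, y)                    # configure the model with the training data

predictions = model.predict(X)     # predict prices for the same houses
error = mean_squared_error(y, predictions)

print(predictions.round())
print(f"error between labelled and predicted output: {error:.0f}")
```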
So there are a few ways to evaluate that error, to do the model evaluation. One is binary relevance: we look at a document and say this document that you returned is relevant, or it is not. There are a couple of metrics for that, among them mean average precision, which we will look at, plus precision and lots of others. Another way is graded relevance: now a document can be slightly more relevant, so we could say on a scale from 0 to 4 that a 4 is really relevant, a 0 is not relevant at all, and in between there is a scale of relevance grades. This is used by the discounted cumulative gain — we practised this two times and I keep forgetting the name, thank you — and by the normalized discounted cumulative gain, and these only take the relevance grade into account. Then there is a second kind of graded relevance which is cascade-based, and this also takes into consideration that if you click the third item, most likely the two items before it were not relevant to you — at least, that's what we assume, because you clicked the third item. That's what the ERR does.

To give you a slightly better understanding of what we mean, because this concept is really important, let's look at an example. For the query U2 we return five documents. Anyone colourblind here? Ah yes, I knew it. So I'll explain: there are two colours, green and orange; the first, third and fifth results are green, the second and fourth are orange — keep that in mind for the other slides. When we calculate the average precision we take the ones that are relevant. The first one is relevant, and one out of one is relevant, so we call it 1. The second one is not relevant, so we skip it for now. The third one is relevant, but now we have two relevant items out of the first three, so that's 0.67. We do the same for the fourth and fifth: at the fifth, three out of five are relevant, 0.6. So we have precisions at positions one, three and five, and we take the mean of those three: this is what's called MAP@5. That is one way, binary relevance: relevant or not.

If we go to the discounted cumulative gain, you see that we have an estimated output, but the real, known relevance is in there as well — the colours are the same, by the way. A grade of 3 means that in the output we know, that document has a relevance of 3; the order can be different, but the relevance grade is taken from the known output itself. Then we calculate the discounted cumulative gain, which takes the position of the document into consideration: just like with the click-through rates that Byron showed, we assume that the lower a document is in the list, the less important it is. So if a document is on top and it really is the most relevant one, it contributes extra, and if it's further down the list, it contributes less. If you do this for all five positions, in the end we have a DCG value of 8.07 — but if we take ten items it will always increase, and with fifteen items it increases again, because it's cumulative. That's where the normalized DCG comes in. There we first calculate the maximum DCG using the actual, known values — the second column with the Y, which would have been the ideal output — and we also calculate the DCG for the predicted output, and then per position we can compute the normalized discounted cumulative gain from those two. I hope this makes it a little clearer than just the description.

A little recap: we have input and output, X and Y. We put them into the model, we have chosen an algorithm, and now we're going to tune the parameters and do a prediction. When we've done the prediction we use a cost function, which compares the predicted output with the output that we know; based on that we tune the parameters and try again, until we find the most optimal value for the cost function.
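As a small sketch of the two evaluation metrics just described, assuming the relevance pattern from the example (relevant hits at positions one, three and five) and some invented grades for the DCG part — the exact grades and DCG variant used on the slides may differ:

```python
import math

# --- Average precision at 5, binary relevance -------------------------
relevant = [True, False, True, False, True]   # relevant at positions 1, 3, 5

precisions = []
hits = 0
for i, rel in enumerate(relevant, start=1):
    if rel:
        hits += 1
        precisions.append(hits / i)           # 1/1, 2/3, 3/5
ap_at_5 = sum(precisions) / len(precisions)
print(f"AP@5 = {ap_at_5:.3f}")                # ~0.756

# --- DCG / NDCG at 5, graded relevance ---------------------------------
def dcg(grades):
    # One common formulation: grade discounted by log2 of the position.
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))

predicted_order = [3, 0, 2, 1, 3]             # invented grades, in predicted order
ideal_order = sorted(predicted_order, reverse=True)

ndcg = dcg(predicted_order) / dcg(ideal_order)
print(f"DCG@5 = {dcg(predicted_order):.2f}, NDCG@5 = {ndcg:.3f}")
```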
For the model there are a few different approaches. First there is the pointwise approach, where we look at each document separately, calculate a score indicating how relevant that document is, and then simply sort on that score. This is what Elasticsearch does by default, and what a lot of other search engines do as well; the outcome is an ordering, because the list is sorted on score from top to bottom. The pairwise approach is a bit different: there we go through the documents and make pairs. We don't really calculate a score; the only thing we say is that this one is more relevant than that one, and that's how we sort. So in this case document 23 is more important than document 88; then for the next pair we compare 88 with 45, and we say 45 is more important than 88; but we still have to do another comparison, because we have to compare 23 with 45, and we say 23 is more important. In the end we get the same result back. And then there is a final one, the listwise approach, which does it all at the same time: it just takes the list and somehow determines the right order. Somehow — I didn't know how to put it into a picture, because it's quite complicated. It's also the newest one, and it's what a lot of smart people at universities are looking at at the moment. Usually I stick to pairwise, but believe me, listwise approaches do come up with the right sort order. And you can try it yourself.

So how can you try it yourself? There are a few steps with learning to rank. First you have to create a judgment list: the actual values that you would expect, the ground truth. Then you have to define the features for the model that we are going to train on, and you have to log the features during usage — if that's not clear, stay with us, it will become clear. Then we train and test the model, we deploy the model to production, and there is a feedback loop we can use to do even more.

The judgment list, as I already mentioned, is about labelled data. There are two ways to get this judgment list. One is an expert panel. The problem is that it's very time consuming — well, luckily for the Rolling Stone 500 we have the expert panel on the floor; we are by far the experts on this list, so we really knew how to create this judgment list. The other problem is that it's error-prone, because there is a matter of opinion in there as well, and you need a bit of context. To show you what I mean, we're going to do an experiment. Sorry, I have to ask you to raise your hands once more, but this is the final one: the pet-or-not experiment. You tell me if you see a pet on the picture; if you see a pet, raise your hand. This one, I think, everybody knows is a pet. OK — and I knew it would be you who raised his hand there. So I think we agree there's no pet on this picture; I mean, he's kind, he's nice, but it's not a pet. And now: who thinks this is a pet? I see a few. So if you think this is a pet and you were on the expert panel, "pet" would be the label, but for the person next to you who didn't raise his hand, it's not a pet. That's what makes an expert panel hard: it's opinionated, so you have to give some guidelines, and there is a risk.

So using an expert panel has its advantages but also its disadvantages, and we can also use implicit feedback. Implicit feedback has to do with logged user behaviour: for instance, we compare the actual clicks to the clicks we would expect. If we think that 20 percent of the people who come to our search engine would click the first item, and we see that it's actually just 10 percent, then most likely that item was not relevant, because we expected 20 percent and got 10.
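A tiny sketch of that idea — comparing the observed click-through rate per position against the rate you would expect for that position — might look like this; the expected rates and counts are invented for illustration.

```python
# Expected click-through rate per result position (position bias),
# versus what we actually observed in the click logs for one query.
expected_ctr = {1: 0.20, 2: 0.12, 3: 0.08, 4: 0.06, 5: 0.05}
observed_ctr = {1: 0.10, 2: 0.13, 3: 0.08, 4: 0.05, 5: 0.12}

for position in expected_ctr:
    ratio = observed_ctr[position] / expected_ctr[position]
    if ratio < 0.75:
        verdict = "probably less relevant than its position suggests"
    elif ratio > 1.25:
        verdict = "probably more relevant than its position suggests"
    else:
        verdict = "about as relevant as expected"
    print(f"position {position}: observed {observed_ctr[position]:.0%} "
          f"vs expected {expected_ctr[position]:.0%} -> {verdict}")
```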
Another thing you should take into consideration when using clicks for a judgment list is that a click might not be a judgment. Look at this example, where someone searches for Adobe: an expert panel would most likely put the Adobe home page on top and say that is the most relevant item, but if you look at the click statistics it could well be that this one, the PDF reader, is clicked a lot more. Does that immediately make it more relevant than the other one? I don't know — it could be, because a lot of people click it. Two, three, maybe five years ago, Flash Player would most likely have been in here as well, but these days that's not relevant anymore. So when you have this implicit feedback, you can do two things with it: you can use it as a feature, which Byron is going to show in a while, and you can also use it to create the ground truth — really create this judgment list so that you don't need an expert panel anymore. We'll show that at the end as well.

All right, so now we have some background information on Elasticsearch and on learning to rank; let's do the fun stuff and try to improve our search by incorporating learning to rank. The guys over at OpenSource Connections built a plugin for Elasticsearch, called Elasticsearch Learning to Rank, which allows us to define a model, train it, and use it to rescore our documents.

Let's take a look at our judgment list. This is the format of a Python library that we use, which it needs to parse in order to train our model. It might look a little bit like a mess at first, but it's actually sort of a table structure. On top you can see grade, query ID, doc ID and title, which are the columns the data is aligned in. On the left we have the grade: 4 meaning super relevant, 0 meaning not relevant at all. Next to that we have a qid, a query ID; at the top you can see that qid 1 stands for U2, qid 2 for Metallica and qid 3 for the Beatles. If we take the example of Metallica, which we saw before, we had two hits plus a third hit for the Beatles, and that is reflected here: you can see that on the left side, where we have a grade of 0, we have a qid of 2 for Metallica, and that row is the Beatles hit. The next column is the document ID, which you need because you need to know which documents were returned for a certain query and what their relevance was, and we have the title next to it. So this is our ground truth: this is the order we say things need to be in.

Now that we have created our ground truth, we need to define features, and for features we need to say what is actually important about these documents. For features you can use raw term statistics, which the guys exposed in the plugin so you can get that information out: the raw term statistics, the document frequency, the total term frequency, and you can combine those with min and max, so you can use the max frequency of a certain term as a feature. All these features are defined as Elasticsearch queries, and that looks like this: we define a feature on our artist field, because a match in artist is probably important, so we want to train on that.
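As a rough idea of what such a feature definition looks like with the learning-to-rank plugin — the feature and featureset names below are ours, and this mirrors the plugin's featureset API as we understand it rather than the exact slides:

```python
import requests

# Each feature is just an Elasticsearch query template; at logging time the
# plugin fills in {{keywords}} with the user's query and records the score.
featureset = {
    "featureset": {
        "features": [
            {
                "name": "artist_match",
                "params": ["keywords"],
                "template_language": "mustache",
                "template": {"match": {"artist": "{{keywords}}"}},
            },
            {
                "name": "album_match",
                "params": ["keywords"],
                "template_language": "mustache",
                "template": {"match": {"album": "{{keywords}}"}},
            },
        ]
    }
}

resp = requests.put(
    "http://localhost:9200/_ltr/_featureset/rolling_500_features",
    json=featureset,
)
print(resp.json())
```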
What we will be doing for our example is creating four features to train our model on. The first one is on artist, and we have two more on information and on album, so we have three text fields we want to train on. The fourth one is where we try to incorporate clicks into our example: as you search for U2 and click on one of the albums, the click is stored with the document, so we know how often it has been clicked. We can then add a function score, as shown earlier, and use that as a feature for training the model.

The next step is logging the features. We have our four features, and now we use our data, our judgment list, to run through and actually log the features — the actual values. What you now see is, per feature, a number, a colon and then a score. The first feature we added was on artist, and as you can see, some of the values are zero, meaning that for that query that Elasticsearch query had no hit, so the score is zero. The other ones have somewhat higher scores, and that's actually the raw BM25 score from the ranking algorithm I explained before. So we can see the scores for each of the features, we put this into the model, and later we use it for actually training and testing the model.

Under the hood the plugin uses RankLib, a Java library that contains ten different models you can use. You can specify separate training, validation and test sets, which means that from the data you have, you take roughly 60% for actually training, 20% for testing the model, and as soon as you've done your testing and think you have the right parameters, you use the last 20% to validate that your model was not purely optimized for your test set. And if some of your features, some of your data, are not normalized but you would like to use them normalized, the plugin can normalize the data for you before using it for your model.

If you want to know which models are supported in RankLib, now would be a good time to take a screenshot, because I'm not going to go over all of them — first because it's too much, and second because we haven't used all of them yet, so we don't know exactly which model applies best to which use case. The takeaway is that there are a lot of different models, and for your use case a specific model might work best. Then in the plugin there are a few evaluation metrics: three of them we already talked about, and there are three others you can also use. So what we did, with our judgment list and our logged features, is make an overview, a matrix, of which model with which evaluation metric gives which score, and we pick out two of these, because one of them really stands out, and that's RankBoost. You can see that these scores, except for the DCG, are all normalized, meaning that a score of 1 means you have a perfect list: your test set was perfect according to what comes out of the model. For RankBoost the scores are 1, so this could mean either that it's really good, or that it's 1 because of overfitting. Another one with actually pretty good scores is LambdaMART, and that's the one we chose for our demo, which Jettro will show in a bit: what that will look like and what it will do to our search results.
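Training itself happens outside Elasticsearch, against the logged features. A sketch of what that stage might look like — the grades, feature values, file names and RankLib flags are illustrative assumptions; check the RankLib documentation and the plugin's demo scripts for the exact invocation:

```python
import subprocess

# Logged features in RankLib's training format:
#   <grade> qid:<query id> <feature nr>:<value> ... # <doc id> <title>
# The grades and feature values below are invented for illustration.
training_lines = [
    "4 qid:1 1:9.47 2:0.00 3:3.21 4:12.0 # 48 The Joshua Tree",
    "3 qid:1 1:8.90 2:1.10 3:2.75 4:5.0  # 62 Achtung Baby",
    "0 qid:2 1:0.00 2:4.80 3:0.00 4:1.0  # 5  Sgt. Pepper's",
]
with open("train.txt", "w") as f:
    f.write("\n".join(training_lines) + "\n")

# Train a LambdaMART model (RankLib ranker type 6), optimizing NDCG@10,
# and save it so it can be uploaded to the plugin afterwards.
subprocess.run(
    [
        "java", "-jar", "RankLib.jar",
        "-train", "train.txt",
        "-ranker", "6",
        "-metric2t", "NDCG@10",
        "-save", "lambdamart_model.txt",
    ],
    check=True,
)
```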
So, next step: we trained the model, we validated it, we had a cost function and an error with which we tuned our parameters, and now we can deploy the model. We upload the model to Elasticsearch, and from that point we can use it to rescore documents. This is what that looks like in an Elasticsearch query: we still have our query, where we search on album, artist and information, but what we now have extra is a rescore in our query, and what it says is that we want to rescore the top 20 results based on our model, test number 6.

What that looks like, we're going to see now. We extended our search page so that it doesn't just do the basic search; you can actually compare our model against the original score. Let's search for something. First, the RankBoost model, which had that score of 1, and let's look for something we tried before, like U2. This result was already OK, and luckily it stays OK — the model doesn't make it worse, which is good to know as well. Now let's try something we didn't really train on: "rolling" was not in our judgment list. On the left side is the result without learning to rank, and on the right we did apply learning to rank, in this case the RankBoost model, and we see that it doesn't really improve. So most likely there was an overfitting problem in the RankBoost case, which is not that strange if you saw the judgment list we created — I think it has about five queries. But now let's try it with LambdaMART. With LambdaMART the results become a lot better, as you can see on the right side: the Rolling Stones are now in all the top items, so I guess in this scenario it worked pretty well. We could try one other: the Beatles, for instance, and the Beatles is also quite OK. Any other artist you want to look for? It's the Rolling Stone top 500, so that one is not in there. OK, let's move on with the presentation.
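A sketch of the rescore query used in that demo, assuming the plugin's sltr rescore query, a stored model named test_6 and the same keywords parameter as in the featureset sketch earlier — the real query on the slides may differ in details:

```python
import requests

search = {
    "query": {
        "multi_match": {
            "query": "rolling",
            "fields": ["artist", "album", "information"],
        }
    },
    "rescore": {
        "window_size": 20,                 # rescore only the top 20 results
        "query": {
            "rescore_query": {
                "sltr": {
                    "params": {"keywords": "rolling"},
                    "model": "test_6",     # the model uploaded to the plugin
                }
            }
        },
    },
}

resp = requests.post("http://localhost:9200/rolling-500/_search", json=search)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["album"])
```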
OK, so I promised you a point six, which is called the feedback loop. It's called the feedback loop because we use the feedback that is given by the users: we are going to use clicks, and we use these clicks to come up with a judgment list. That's the goal — then we don't need an expert panel anymore; we can do it with you just searching, clicking, interacting and giving us feedback.

For the click models you can do a few things. You could use the random click model: in this case we don't care where in the list the item is; the item that is clicked most is most likely the most relevant item. That's what the random click model does. Then we have the click-through rate model, which is more like what I discussed a few minutes ago: if the first result gets 10 percent of the clicks while we would expect 20 percent, it is most likely less relevant, and if the fifth one, which should get around five percent, turns out to get 10 to 15 percent, it is most likely more relevant. That's what we use in this model as well. Then there is the cascade model. In the cascade model we assume people make only one click during their search session: they do a query and click on one item, and if that's the fourth item, then that item is the relevant one, indicating that the three items before it were not relevant. This is a bit odd if you think about how people interact with a search engine: they often click on an item, see whether it's relevant or not, and if it's not, they go back and click on the next item — and that's not possible with the cascade model. With the dynamic Bayesian network model it is possible. With this model we can record multiple clicks, and if a user clicks on the second item, comes back, clicks on the third item, comes back, clicks on the fourth item and then moves away, the fourth item most likely was the one he was looking for — or he just gave up. Yeah, that's hard to tell, so let's assume that everybody finds what he is looking for, and that if he leaves, he was satisfied. That's what the dynamic Bayesian network is about.

The diagram shows a bit of what I just discussed. At the top there is the E: the E says a user examined a URL. If he examines the URL — which has a good snippet and maybe a picture — and he finds it attractive, he clicks on the item. Then he goes to a sort of landing page for that product and checks it out. If he is satisfied he will leave; if he is not satisfied he will come back, take the arrow on the right side, and examine the next item, and so we continue. At the bottom there are two parameters, a_u and s_u; these are the parameters that tune the model, the ones we are going to optimize when we start training the model and doing predictions with it. If you want to learn more about the dynamic Bayesian network model, there is a link on the right side to an article that explains it a bit more extensively.

Let's discuss our setup. On the left we have the screen, our application: here we record the clicks of users. These are written to logs, and these logs are parsed using Logstash and sent into Elasticsearch. Then we have an application, the Rolling 500, which takes these click logs and creates the structure required by clickmodels, another Python library that we use. This clickmodels library takes the learning data in that specific structure and creates our judgment list — well, almost: it creates something we can use to create our judgment list. When we have these judgments we send them to our plugin, the learning-to-rank plugin, which updates the models, sends them back to Elasticsearch, and then we can start all over again. That's what we're going to do.

This is part of the logs, in the format that clickmodels requires. Let's zoom in on one line: this is one query, where someone searched for "madona". To give a better view, here it is laid out: the first item is the query ID, the second item is the text that was actually queried on; then there are some fields that we don't use, but which you could use if you want to make it more advanced; then there are the document IDs that were actually returned when executing that query, in the order they were returned; then a row of false, false, false — again something we don't use, which is why it's false; and finally the clicks, which register the clicks the user did. So in this case, which item was the most relevant item according to the dynamic Bayesian network? The third item — yes, thank you. So this is what it looks like: clickmodels generates this list, with the judgments in the second column. (Let me use the presenter once more, just because it's cool.)
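A small sketch of reading that log format — assuming the tab-separated layout described above (query ID, query text, two unused fields, then JSON lists of shown doc IDs, unused flags, and click booleans) — and deriving naive per-document click counts per query. The real pipeline feeds these sessions to a dynamic Bayesian network instead of simply counting clicks.

```python
import json
from collections import Counter, defaultdict

# One line per search session, tab separated (layout assumed from the talk):
# query-id  query-text  unused  unused  shown-doc-ids  unused-flags  clicks
log_lines = [
    "101\tmadona\t0\t1\t"
    '["12", "7", "33", "4"]\t[false, false, false, false]\t[false, false, true, false]',
]

clicks_per_query = defaultdict(Counter)

for line in log_lines:
    _, query, _, _, doc_ids, _, clicks = line.split("\t")
    for doc_id, clicked in zip(json.loads(doc_ids), json.loads(clicks)):
        if clicked:
            clicks_per_query[query][doc_id] += 1

# The third shown document was the one clicked for "madona".
print(dict(clicks_per_query["madona"]))    # {'33': 1}
```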
Then we have another Python script, and this script parses that into our judgment list structure, which you should recognize by now: it has all the queries that we've done — this is what Byron showed us — and these are the judgments. So it's the same thing we did before.

OK, so let's do the demo again, and this time we're going to use this file that we created and put it into our system, into Elasticsearch. We run this Python thing — you see, I'm a Java programmer, that's why I call everything "this Python thing", because I don't really understand a lot of it yet, but I'm learning: learning about machine learning means I also have to learn about Python, because a lot of this is done in Python. If you look closely you see something called "finished" and then an NDCG value: these are all the different models that have been trained, each resulting in an NDCG. It's not really important to show; just believe me that it's there. Then we go back — so underneath, the models in Elasticsearch have been changed, and we do the query again.

Oh. That didn't really help, did it? The result became worse. I think in this case we can say that for the Beatles the model didn't work very well, and for "rolling" the model also doesn't really work well. So what happened? Well, I told you in the beginning that you have the best expert panel on the floor, so of course we are a lot better than the machine learning algorithm, and we can make a much better judgment list than the machine learning algorithm can. Or maybe we chose the wrong model, or we don't have enough data. And if you look at the sample we created: if you search for U2, you could click all the U2 albums, but maybe I'm curious to see why, when I searched for U2, Radiohead was there as well — so I click Radiohead. Or, just as I did in the sample with Metallica: I clicked on the Beatles and then searched within the Beatles for why it was returned for Metallica. If you think back to what happens, I got back this array for Metallica with three items, and the third item was clicked, because I was curious why the Beatles were in the list for Metallica. So actually I messed up. Don't blame machine learning, don't blame the model — I just took the wrong model for the problem at hand.

I think that's very important when you start experimenting in this world: what works for someone else doesn't always work for you as well. You have to think carefully about what your intention is, and you have to experiment. Try different models, see what you get out of them, check the scores, check the evaluation metrics that we discussed — and also play around with the evaluation metrics, because it could well be that a different evaluation metric works better for you with another model. And in the end, I think, the goal is to have fun while doing this. If you're in the business world, prepare to spend time on it: if it's more than a playground, don't think you can fix this in a week. This is something you have to keep doing, because things change, and you have to look very closely at what happens — the data you collect: do I really collect the right data, is my data clean? There are a lot of steps that you can take. That's it. We are on time, so we have time for questions if you want. Let's go to the middle first.
Do you think a user would do that? If you think about the context at the beginning — that we're doing this for bol.com or another e-commerce website — I think what they did is provide a sorting option, so that at least you can change the sorting yourself. But I don't know; do you know statistics on how often that's used? No, but I do know that what you're saying goes more in the direction of personalized search. What you could do is record searches and clicks for a specific user and train a specific model for that user. That's something you could do, which — yeah, it depends on how large your business is whether you actually want to, because it can be very costly. I don't know if you would let users themselves choose a model for sorting, but it's a good discussion. If you have a website like bol.com, which has a lot of different categories, they could train the results for one category differently than for another category, for instance; that is something that is being done. OK, other questions?

Yes — I think you could do that with machine learning, but there is already an easier way of doing it, by just using boost values. If you know that a certain user is searching for something, you can fetch their information, see what they think is important, and add boost factors for that; that's the easiest way of doing it. Yes, I think you could use a model or machine learning for that; the only downside is that we've seen you need a lot of data before things actually get more relevant. We just started with this application and noticed that we really need a lot of data, and I don't know how much it would actually take for things to become more relevant for a single user, so I don't know if machine learning in that sense is the best fit — you can already do that by default in Elasticsearch. And it would not be smart to start out with this; I would always start with a simpler approach and only jump into this track if you're not satisfied with the results, because it's a lot more work.

So, when does a session end? That is a good question, and I could give the consultancy answer: it depends. In my opinion, for us, the session ends the moment you start a new query. The moment you change the query, you start a new session — at least, that's how it is in the model we have been using so far. I know there is a lot of experimentation being done to collect everything and to see whether people refine their query. I think the presenter case is a good example: if I was searching for "presenter", Byron showed us that you get much better results when you add "logitech" to it, and in the current examples that would be a new query, so it would get a new query ID with a different flow. That's the way we have been using it so far, but there are also other things you can do with that: if someone only minimally changes their search term to something similar, you could use that information in your feedback loop as well, for instance to automatically train synonyms or do spelling correction, because you know that users have been changing their query to refine where they actually want to go. And as soon as they buy a product after doing three different searches, you might be able to link all of those searches together and improve the results the next time someone searches for that term. Yeah — but it's hard to narrow that down.
Because if someone uses a synonym for a search term, would you say that's still the same search term? Yes. Other questions?

How much data do you need? I really don't know. I think it depends a lot on the number of features you want to introduce, and on the number of records you have: if you have 500 records, like we do, that's of course different from a product database like bol.com with millions of products. It's really hard to tell; experimenting is really the only way to find out. Sorry, I would love to give you an answer like "per record, thousands" or something, but it's not there. One of the good things is that we did see that with a small judgment list we could improve the search for "rolling" pretty well, actually — although, to be honest, that was a bit of cheating. Come to our booth if you want to know the cheat; we didn't cheat with the model, by the way, that part is real, but it has something to do with how we created the judgment list. So if you want the ins and outs of that, come look us up — there's this nice feature in the app where you can track speakers, so you know where we are and you should be able to find us. Other questions? Then I think it's time for lunch. Thank you.
Info
Channel: Luminis
Views: 4,324
Keywords: Devcon, Luminis, Devcon2018, Luminis Academy, java, nljug, Byron Voorbach, Jettro Coenradie, Elastic, Elasticsearch, search, search engine, Learning to rank, Machine Learning
Id: TG7aNLgzIcM
Length: 54min 10sec (3250 seconds)
Published: Tue Apr 17 2018