Live Coding Spring, Kafka, & Elasticsearch: Personalized Search Results on Ranking and User Profile

Video Statistics and Information

Captions
Hey everyone, welcome back. We have another great session coming right at you here, one of my favorite types of sessions: a live coding session. I love when people write code live. It's a hard thing to do, but when done well it's very impactful. We have Erdem talking about Spring, Kafka, and Elasticsearch. It's a great use case story highlighting how these three technologies have helped him improve his users' use cases and application use. So I'll turn it over to Erdem to get started.

Thank you, thank you very much. I hope I will be able to fit everything into 25 minutes, and I hope it will be useful for the audience. As explained, I am going to share what I have learned and gained in one of my recent projects, using Elasticsearch, Kafka, and Spring Framework to come up with personalized search results. It is going to be a demo-driven presentation with live coding as well. I'm going to demonstrate what we have done, and then continue with how we did it.

This is a search problem: we have to find the content first, so here comes Elasticsearch. Then I'm going to show you how to boost the search results based on the content's popularity, or ranking. Finally, I'm going to add a user behavior ingredient to boost the results per user.

Here is a quick example of what we gained at the end of the project. On the left there is a neutral user who does not have any behavior in the system, so they get results based on community rankings. This user is listening to Turkish music, so typing only one letter, "e", this user gets results for Turkish music content. And this one, listening to English or foreign music, gets totally different results. I'm going to show how we did it.

First, as I said, I'm going to demonstrate the application. This is our mobile application, fizy, which is used for listening to music. On this page you
can see that today I have recently played these two great singers. They both start with the letter C. So I'm going to go to search and type only one letter, "c", and as you can see, I get these two great artists listed on top for me, based on my behavior today. This is basically what we have gained.

I'm going to look for the second artist. If I type it correctly, Cem Karaca, I will always find this artist here as the first one. But what if I make some mistakes, like putting some irrelevant characters in between? Should I still be able to get this artist? Yes. For example, if I put in additional stuff or spaces, I can still find him. So first we will go into finding the content, and then we will go to the boosting: how we can boost the content based on popularity and user behavior.

I always forget to mention a few words about myself. I started working with Layermark this year, after working 16 years at Turkcell. At Layermark we specialize in GIS-based solutions, mainly field mobility solutions, land management and cadastral solutions, and stormwater management solutions. We are partnering with CMMI and S3 companies. I am the founder of the Istanbul Spring Meetup, where we are almost 5,000 people in the meetup group, regularly organizing meetups based on Spring Framework.

Everything I'm going to show today is listed in this repository; you can reach everything from here. What I'm going to do is start with the Elasticsearch part from scratch and explain how we built this project by coding live here. If I look at my current indices in Elasticsearch, I can see that I have a content index, a user profile index, and listen events. I'm going to delete all of them and start from scratch. So if I perform this query, I get an exception: index not found. What I will do is index four contents in the content index with the basic Elasticsearch default behavior, and then I can see that the contents are listed. I can find
them, but what if I try to find the content with one letter? I cannot find it. OK, let me continue. I'm going to try to find this artist. I type two letters: no luck, I still cannot find it. I have to type everything until I'm able to find this artist. This is not good. I should be able to find it with one letter, two letters, and so on.

Let me look at another scenario: Hélène Ségara. I'm typing the whole word but still cannot find her, because I have to pay attention to the accented characters. I have to provide exactly the same thing. But this is not the real world, so I have to solve this problem, and I'm going to solve it by utilizing Elasticsearch analyzers.

OK, let me take this example here and see the default behavior. This is Elasticsearch's default behavior, but it is not good, because it leaves all the accented characters and special characters here, which I should get rid of. So I will extend this with some character filters and token filters. First I need to remove this character, so I will add this character filter, which basically replaces everything that is not an alphanumeric character. Let me try. I'm expecting this apostrophe to be removed. It is gone. So I'm not expecting the user to fill in all the special characters in the search.

But this is still not enough, because I still have non-ASCII characters and capital letters, and I need to get rid of these as well. So I'm going to add token filters. This is an array of filters. I will do ASCII folding first: let's see that the accented characters are gone. Then I'm going to add lowercase, another token filter, which will remove the capital letters. Looks good, but this is still not enough for me to search with one letter or two letters. I have to add one more thing, which is an n-gram filter. It's an edge n-gram: it will create tokens based on the n-grams, like this: "h", "he", and so on. Now with this
configuration I can find, as I'm going to show you, the content I'm looking for with one letter. In order for this to run, I have to delete the content index and create the index with a predefined configuration, because I need all these analyzer settings in my index; otherwise I still will not be able to succeed. I'm going to delete the content index and create the index with a special field that has a special index analyzer and a query analyzer.

If I re-index all my content and this time look for "s", I still cannot find anything, because the generated tokens are not stored in the artist name field; they are stored in the artist name prefix field. Now, if I run the query against that field, I can find all four of my contents, because they all start somewhere with "s".

I can play with it. I still want to search the artist name field as well, which makes sense, because if there is a complete match I should expect a bigger score. For example, for this one I should expect a bigger score than for the other matching content. I can even do this: if it matches from this field, five times boosting; if it matches from the prefix field, one time boosting. We do not see the difference, because there is no other content matching "sezen". So in the multi-match query I'm going to add fuzziness one. This will match other artists: it is still Sezen Aksu on top, but Selena is also here, and you can see the difference: it is almost five times higher boosting.

So, going to the Spring world: how I can make this available to my client applications is through a Spring REST controller in a Spring Boot project. This is a typical Spring Boot project. In my project I have Spring Boot starter 2.5.4, the Elasticsearch REST high-level client, and Spring Kafka. These dependencies will be enough to make the whole system work properly. When I try to reach the results through my API, which is running on port 8080, what I provide is one letter, "s". I'm going to see
the results, but the results are not boosted with any factor yet. Let's see how we do it. With the search controller I get the input and then send the request to my Spring service bean. Let me make this fill the whole page. If you want to make queries to Elasticsearch, you need to provide a search request and a search source builder, and then specify, like I did in the Kibana screen, which fields you want to use for search and how you want to manage your search results. For example, with fuzziness 0 I'm going to give higher boosting factors. I also search with fuzziness one, because I don't want to miss typos, but this time I'm going to provide a much lower boosting factor. So I'm going to get the results based on these fields. Once I do that, I need to send the request to Elasticsearch and then process the response hits. Since I don't have too much time, I cannot go into too much detail, but you can always refer to the project on GitHub, analyze it, and ask me any questions anytime. I'm happy to answer.

So let's move on to the second part, which is boosting the results based on popularity. Let's go back to Kibana. We can see clearly that these artists do not have the same popularity. Sezen Aksu is the most popular; Hélène Ségara is the least popular artist in the ecosystem. But when I do a search here with "s", as you saw before, Shakira is the first one, ahead of Sezen Aksu: the popularity is not taken into consideration.

If I want popularity to affect the search results, I need to provide scoring functions. This is an example here, taking the ranking into account and changing the order of the search results. It is very simple. It is still basically the same query with one letter, "s", plus a list of functions. This function is a pure mathematical function. If the document has the ranking field, it takes the log 10 of the ranking value. You can apply
any function you need here. Let's run this query. This time we should see all the artists sorted based on their ranking. Here this makes a bit more than three times the least popular artist, and they are sorted accordingly: you see ranking 1110 and 1.

It is very simple to apply in your project. If the user wants to include ranking in the results, all we have to do is add this function, which is the same thing you have seen in Kibana, to the filter function builders list, and then it will be applied to the search results. Let's see here. This is the same API, the same REST service, still searching with the query string "s", but this time with include ranking set to true. That will include the function in the Elasticsearch request. If I run it, I get the same result: the artists are sorted based on their ranking, their popularity.

So what if I dynamically increase the ranking of one of the artists? Let's give it a try. For example, Hélène Ségara is the least popular. What I'm going to do is make it so that she is listened to 200 times. OK, let me see. From my command line I can see that the messages are processed. First the messages are sent to the Kafka topic: "sending message to topic". Here I can clearly see that the artist ID is a4, which is Hélène Ségara. I have the consumer and the producer in the same process, to make things easy. All 200 messages are received, and they are put in a special index.

Now I should show you these indices, because I have to process the messages. These are the listen event messages that I received from Kafka and placed in the listen event indices, one per minute. Then I should see that user profiles are also getting generated: the user 1 profile contains artist a4 200 times. And if I aggregate the listen events per artist ID, I can see that a4 is listened to 200 times. So what the application will do is also update the ranking in the artist document. So it
will both update the artist's document and generate the user profile, which is going to be used in the third step for user-behavior-based boosting. I can see that it is 201 now. Hélène Ségara was the last one, you remember, when we did the ranking-based search. Now at runtime, and this is quite near real time, after she was listened to, I make the same search and I can see that she is returned to me in second place, because the ranking value was increased by 200.

So imagine your system: your users are listening to music or doing some activities, and you are updating the content documents. Whatever the content is, it can be a song, an artist, anything. You keep updating in real time, and the search results in the field get updated automatically, near real time, for your users. So this was the second item on my boosting agenda.

The third and last one is boosting by user behavior. I need to take another function into account in order to change the results per user, if the user has a profile in the system. If I show you the user profile again, I have a profile for only one user, user 1, who is listening to a4, Hélène Ségara. I do not have another user. So this user should be seeing different results. For the comparison to make sense, I'm going to send more listen events for another user, user 2, and a3, again 200; the number doesn't matter. Once this is processed, at the end of this minute, I should see a user profile for user 2 as well. Currently I still don't see it, because the raw listen events are not processed yet. Once they are processed, I will have profiles for two different users.

Until that is processed, I'm going to show you the function, which is again in my repository. This time I take another function into account: rather than ranking, I'm going to run another function in order to change the document scores. This time it is again the same
query and again the same function list, but this time the function is different. I can determine which artists have which boosting factor based on the user profile, because I know which artists are listened to by which users. Depending on the user, I can say that a4, Hélène Ségara, has boosting factor five, and Shakira, I think this one is Shakira, has boosting factor two for this user. If I run this, I will get Hélène Ségara in quite a high position, with a higher score than the others, and the second one is Shakira. Normally they would be way below in the results, because they are not very popular in the ecosystem, but users who are listening to these ladies get this content listed on top.

Similarly, if I change it to, for example, another user listening to a1 and a2, listening to a1 a lot, so I apply 3, and a2 with 1, I should see Sezen Aksu again: a1, a2... a2, a3? Oh, it is one. One doesn't make sense, sorry: one is just multiplying by one. So Selena Gomez, a2, is the second one, and then the rest, because the user is not interested in them, just get plain regular search scores based on the length of their content.

So, two or three minutes ago my user profile should have been updated. You see here, I can see that user 1 is listening to a4 and user 2 is listening to a3. This is all done by the process here: every minute it processes the raw listen events and then updates these indices. I should be able to see the effect now. If I make the whole search with the letter "s", but this time including user ranking, excuse me, it is false already, including user profile as user 1, I should see a4 first, even though she is not the most popular singer in the ecosystem. I get the results listed this way for user 1. And if I perform the same query for user 2, I should see a3, Shakira, as the first one. For user 2, Hélène Ségara is the last one in the list, but for user 1 Hélène Ségara is the
first one.

I still have two and a half minutes to go, so I have time to show you how the index manipulation takes place. I have an event controller which simulates the source of events. When I trigger the listen events for a user and an artist ID, it generates that many events and sends them to Kafka through the Kafka template. How this is configured is very simple. I have a Kafka producer config which takes the bootstrap address and very basic configuration for Kafka, nothing special here, but I need a Kafka template bean. With this Kafka template bean I can send messages: KafkaTemplate.send. What I send is basically raw JSON data representing my listen event object.

On the consumer side, which is the event receiver service, I need a Kafka listener. In my service bean I have a method with the Kafka listener annotation, which consumes the same topic from Kafka by using the Kafka consumer config here. By providing the same bootstrap address and a group ID, I can access Kafka and start consuming events. When I receive a message, all I do is convert it back to a listen event. Because I know that it is a JSON string, I can process it through the object mapper's readValue and then save it in the listen event index.

The next thing that happens is the event processing service, which aggregates the most recent events from the Elasticsearch listen event indices and updates the content index, the user profile index, and so on.

As I said, I have only 20 seconds left. If you are interested, if you find some bits and pieces here useful to take with you and use in your professional work, I'm more than happy to discuss and help you. I will be very happy if what I have done and shown you so far helps you as well. And I'm done on time.

Awesome, what a great session. I love all the live coding. It's super impactful, and we can see people actually demonstrate everything
in action and really show us the power of all the Spring tools that we have in the ecosystem. We have another great case study coming up next, so I hope you stick around, and we'll be back in a few minutes.
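The analysis chain described in the talk (a character filter that drops non-alphanumeric characters, then ASCII folding, lowercase, and edge n-grams) can be approximated in plain Java to see which tokens end up in the prefix field. This is only a rough sketch and not Elasticsearch's implementation: the class name, the gram sizes (1 to 10), and the sample input are assumptions, and the folding step only handles characters that decompose under Unicode NFD.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that mimics the analyzer steps from the talk in plain Java.
class PrefixAnalyzerSketch {

    // Character filter: drop everything that is not a letter, digit, or whitespace.
    static String stripSpecialCharacters(String input) {
        return input.replaceAll("[^\\p{L}\\p{N}\\s]", "");
    }

    // Rough stand-in for ASCII folding: decompose accented letters, then drop the combining marks.
    static String foldToAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Edge n-grams: every prefix of the token between minGram and maxGram characters long.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Full chain: clean, fold, lowercase, split on whitespace, then emit prefixes per word.
    static List<String> analyze(String text) {
        String cleaned = foldToAscii(stripSpecialCharacters(text)).toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String word : cleaned.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.addAll(edgeNgrams(word, 1, 10)); // gram sizes are assumed values
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative input (not from the talk) with an accent and an apostrophe.
        System.out.println(analyze("Sin\u00e9ad O'Connor"));
    }
}
```

Because every prefix is indexed as its own token, a one-letter query such as "s" can match the document without any wildcard search at query time.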
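The popularity boost from the second part of the talk comes down to multiplying each hit's text score by a function of its ranking field. The sketch below reproduces that arithmetic outside Elasticsearch. The talk only says the function takes the log 10 of the ranking; the `1 + ranking` offset here is an assumption (it mirrors Elasticsearch's `log1p` modifier) so that a ranking of 1 does not zero out the score. The class name, scores, and rankings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "text score multiplied by log10 of the ranking".
class RankingBoostSketch {

    static final class Hit {
        final String artistId;
        final double textScore; // relevance from the text query alone
        final long ranking;     // popularity counter stored on the document

        Hit(String artistId, double textScore, long ranking) {
            this.artistId = artistId;
            this.textScore = textScore;
            this.ranking = ranking;
        }

        // Assumed formula: the +1 offset keeps the factor positive for ranking >= 1.
        double boostedScore() {
            return textScore * Math.log10(1 + ranking);
        }
    }

    // Returns artist ids ordered by boosted score, highest first.
    static List<String> orderByBoostedScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.boostedScore(), a.boostedScore()));
        List<String> ids = new ArrayList<>();
        for (Hit hit : sorted) {
            ids.add(hit.artistId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a3", 1.2, 10),    // best text match, modest popularity
                new Hit("a1", 1.0, 1000),  // weaker text match, very popular
                new Hit("a4", 1.1, 1));    // least popular
        System.out.println(orderByBoostedScore(hits));
    }
}
```

With these sample numbers the popular artist overtakes the better text match, which is the reordering the demo shows when ranking is included.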
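The per-user boost from the third part can be sketched the same way: each artist in the user's profile gets a weight, and artists the user never listened to keep a weight of 1. The weights 5 and 2 for a4 and a3 are the ones mentioned in the talk; the base scores, the second user's weight, the class name, and the method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-user weights applied on top of the base search score.
class UserProfileBoostSketch {

    // Artists missing from the profile are multiplied by 1, i.e. left unchanged.
    static double boostedScore(String artistId, Map<String, Double> baseScores,
                               Map<String, Double> profileWeights) {
        return baseScores.get(artistId) * profileWeights.getOrDefault(artistId, 1.0);
    }

    // Returns artist ids ordered by boosted score for one user, highest first.
    static List<String> rankForUser(Map<String, Double> baseScores,
                                    Map<String, Double> profileWeights) {
        List<String> ids = new ArrayList<>(baseScores.keySet());
        ids.sort((a, b) -> Double.compare(
                boostedScore(b, baseScores, profileWeights),
                boostedScore(a, baseScores, profileWeights)));
        return ids;
    }

    public static void main(String[] args) {
        // Assumed plain text scores for the query "s", without any popularity boost.
        Map<String, Double> baseScores = Map.of("a1", 1.0, "a2", 0.95, "a3", 1.1, "a4", 0.9);

        Map<String, Double> user1 = Map.of("a4", 5.0, "a3", 2.0); // weights quoted in the talk
        Map<String, Double> user2 = Map.of("a3", 5.0);            // assumed weight

        System.out.println("user1: " + rankForUser(baseScores, user1));
        System.out.println("user2: " + rankForUser(baseScores, user2));
    }
}
```

The same query yields a different order per user, with a4 first for user 1 and last for user 2, matching the behavior shown in the demo.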
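The per-minute event processing described at the end of the talk aggregates raw listen events into two results: listens per artist, which are added to the artist's ranking, and listens per user and artist, which form the user profile. The sketch below shows that aggregation on an in-memory list. It is not the project's code: in the talk the events are read from Elasticsearch listen event indices, and all class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the per-minute listen event aggregation.
class ListenEventAggregationSketch {

    static final class ListenEvent {
        final String userId;
        final String artistId;

        ListenEvent(String userId, String artistId) {
            this.userId = userId;
            this.artistId = artistId;
        }
    }

    // Total listens per artist in this batch.
    static Map<String, Long> listensPerArtist(List<ListenEvent> events) {
        Map<String, Long> counts = new HashMap<>();
        for (ListenEvent event : events) {
            counts.merge(event.artistId, 1L, Long::sum);
        }
        return counts;
    }

    // User profile: user id -> artist id -> listen count in this batch.
    static Map<String, Map<String, Long>> userProfiles(List<ListenEvent> events) {
        Map<String, Map<String, Long>> profiles = new HashMap<>();
        for (ListenEvent event : events) {
            profiles.computeIfAbsent(event.userId, k -> new HashMap<>())
                    .merge(event.artistId, 1L, Long::sum);
        }
        return profiles;
    }

    // Adds the batch counts to the existing rankings without mutating the input map.
    static Map<String, Long> updateRankings(Map<String, Long> current, Map<String, Long> batch) {
        Map<String, Long> updated = new HashMap<>(current);
        batch.forEach((artistId, count) -> updated.merge(artistId, count, Long::sum));
        return updated;
    }

    public static void main(String[] args) {
        List<ListenEvent> events = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            events.add(new ListenEvent("user1", "a4")); // 200 listens, as in the demo
        }
        Map<String, Long> rankings = updateRankings(Map.of("a4", 1L), listensPerArtist(events));
        System.out.println("ranking a4 = " + rankings.get("a4"));
        System.out.println("profiles = " + userProfiles(events));
    }
}
```

Starting from a ranking of 1, the 200 events bring a4 to 201, the value seen in the demo after the batch is processed.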
Info
Channel: SpringDeveloper
Views: 1,233
Rating: 4.9285712 out of 5
Keywords: Modernization/Refactoring, Reactive
Id: kQva1Mahx8k
Length: 26min 43sec (1603 seconds)
Published: Wed Sep 22 2021