What is Semantic Searching? (NLP Concepts)

Captions
I'm walking down from Nanstein Castle in Landstuhl, Germany, where I'm shooting video footage for my other channel, The Traveling Historian, where I'm documenting my travels across the world and talking about history and things like that. I have a 20-minute walk back down to the train station in Landstuhl and there's no one around, so I thought I'd spend this time recording a video about, naturally, semantic searching.

So what is semantic searching? Semantic searching is when we take the semantics, or the meaning, of a user's query in a search engine, but instead of using it to find keywords in a collection of documents that might sit in a database, we try to capture the essence of that query and then return results that might not necessarily have the same keywords but rather have the same essence as that query.

So what's a good example of this? Well, let's imagine I was searching across documents related to the Holocaust, and this is a real use case that I have done in the past. Imagine you wanted to find any documents that dealt with the concept of hunger. This was a real research case with a real researcher at the United States Holocaust Memorial Museum a few years ago. How would you search for that if you're using keywords? Well, one thing you could do is come up with a list of keywords and synonyms for the word hunger, so maybe things like hungry, starve, starvation, and you'd get good results. You could absolutely find documents that have those keywords in a database that's been indexed. The problem is that hunger can be expressed in a lot of different ways. It's an abstract concept, and it can be expressed without using the explicit word hunger or its identifiable synonyms. So what does this mean? Well, it means that in Holocaust oral testimonies, for example, a person might talk about hunger but not use that specific word;
instead, they might say, "We were in need of food; we didn't have enough bread." Now, in that sentence I've expressed a very clear concept, the abstract idea of hunger, meaning hunger is implied to be the next step in my situation. The thing that I didn't do, though, was use the word hunger or any of its synonyms. If I needed to find that document and I were to search for just the keyword hunger, I probably wouldn't find it, because it doesn't have the explicit word hunger.

This gets to the point and the backbone of semantic searching. Semantic searching would allow us to vectorize that query of hunger, or convert it into essentially the mathematical meaning of that concept, and then we can use that vector and see how it compares to the vectors of all the other documents in our database, rather than comparing keywords. That means that we could find things that dealt with the abstract idea of hunger, such as the sentence I just said, as well as other documents that used that explicit word or synonyms like starve or starvation. So what does this mean? It really means that with one single search I'm able to retrieve documents that either have that keyword or have the meaning of that keyword.

So what's a good way to think about this as an analogy? The one I like to use is to think about being a historian walking through a library. Imagine you're looking for a very specific book. How are you going to find that book in a library? You're probably going to go up to the computer, type in the title of that book, go to the section of the library where that book is, and grab it. That would be more of your traditional keyword searching. In the example of a library, though, vector or semantic-based searching is a little different. Imagine you're in that same situation and you're not looking for one book;
you're looking for a bunch of books that are similar. If you're a historian by training, you probably already know what to do: you're going to find that book that you're looking for, and the next thing you're going to do is peruse the books that appear around it on the bookshelf, because very likely they're going to be similar to the one that you've just grabbed, and they're going to be ones that are also interesting. This is the concept of semantic searching: finding the things that you didn't know existed by searching for something that is similar to a query. So that's semantic searching in a nutshell.

So the big question is, does semantic search replace keyword searching? The answer is no. They should be used as two different tools to solve two different problems. Keyword searching is really good when you know the keywords that you want to search for; semantic search is good when you don't. Now, there are approaches that can make the best of both of these worlds, and this is called a hybrid approach.

If you want to get familiar with semantic searching, it's probably going to be necessary for you to learn Python. But if you want to just learn about it by using it, I've got a couple of good links in the description down below where you can find semantic search engines that I've built for a couple of different digital humanities projects, including one that lets you semantically query a collection of about a thousand Holocaust testimonies and another that lets you semantically query Shakespeare's corpus.

If you do have Python, though, and you're looking to up your game with a couple of different Python packages, here are the ones that I would recommend, in order from easiest to more advanced. The first one is txtai. Out of the box, it lets you build a semantic search engine, and it handles all the heavy lifting for you automatically. The next one is going to require you to build out a couple of different components of
semantic searching, one being the index, or the vector database, and the other being the way in which you vectorize everything and the way in which you query everything. Each of these stages is going to be handled by one of two different Python packages. One is probably going to be Transformers, which allows you to load up a machine learning model to vectorize your texts. I'll have a link in the description down below for the models that I would recommend for English and multiple languages. The other is the algorithm in the vector database for querying that vector database; this is going to be Annoy from Spotify, which is in my opinion one of the better and easier implementations of semantic querying in Python. When I'm working on a project personally, the first thing that I do is this approach number two, because it's typically very easy to get up and running, and you can really use the same template for a lot of different projects by simply changing out the model.

The third option, the most sophisticated in my opinion, is to start working with APIs. These are going to be better for more finalized projects, and they're going to be the things that you would put into production. One of the best libraries for doing this is hands down Weaviate. There are a bunch of different libraries out there, like Pinecone. I happen to like Weaviate because of the way in which you can query, the fact that it's open source, and the fact that it has very good documentation and a team that is very pro open source. All these things make me really love Weaviate. Weaviate is a vector database that lets you automatically vectorize, build out, and even query or semantically search across all of those documents, so it'll handle essentially everything for you. The reason why it's more advanced is that it requires you to build out an independent server that can host all those documents, and this is known as your Weaviate server, and then it requires you
to set up an API key to then query that server. These require a couple more steps and are a little bit more advanced, but what you get is a lot of flexibility and something that can be put into production, because the server handles all the heavy load for you rather than you having to load everything up locally. When you start building out more formal projects, especially larger ones, these are very important features that Weaviate gives you.

I'm going to go get some chocolate croissants from Aldi Süd before my train ride, but if you like this video, like and subscribe, and consider buying me a coffee down below.
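
The comparison described in the captions, turning a query into a vector and seeing how it compares to the vectors of the documents, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three-dimensional vectors are hand-made stand-ins for what an embedding model would produce, the second and third example sentences are invented, and the function names are hypothetical. It uses only the standard library, not any of the packages named in the video.

```python
import math

# ASSUMPTION: hand-made 3-dimensional "embeddings". A real system would get
# these from a machine learning model; the numbers are invented purely to
# illustrate the comparison step. Only the first sentence comes from the video.
DOCUMENTS = {
    "We were in need of food; we didn't have enough bread.": [0.9, 0.1, 0.0],
    "We suffered from hunger every day.": [0.95, 0.05, 0.1],
    "The train left the station at noon.": [0.0, 0.2, 0.9],
}
QUERY_TEXT = "hunger"
QUERY_VECTOR = [1.0, 0.0, 0.0]  # ASSUMPTION: stand-in embedding of the query


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_search(query, documents):
    """Return documents whose text literally contains the query string."""
    return [text for text in documents if query.lower() in text.lower()]


def semantic_search(query_vector, documents, top_k=2):
    """Return the top_k documents whose vectors are closest to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]


keyword_hits = keyword_search(QUERY_TEXT, DOCUMENTS)
semantic_hits = semantic_search(QUERY_VECTOR, DOCUMENTS)

print("Keyword search: ", keyword_hits)
print("Semantic search:", semantic_hits)
```

With these toy vectors, the keyword search only finds the sentence that contains the word hunger, while the vector comparison also returns the "not enough bread" sentence. In the second approach from the video, the vectors would come from a model loaded with Transformers, and the nearest-neighbor lookup would be handled by an index such as Annoy rather than by scoring every document in a loop.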
Info
Channel: Python Tutorials for Digital Humanities
Views: 587
Keywords: what is semantic search, how to do semantic searching, semantic search, vector databases, what are vector database, what's a vector database, what is semantic searching, how do i semantic saerch, walking nlp
Id: buFay8nCdnc
Length: 7min 30sec (450 seconds)
Published: Tue May 28 2024