Generative AI and Long-Term Memory for LLMs (OpenAI, Cohere, OS, Pinecone)

Captions
Generative AI is what many expect to be the next big technology boom, and being what it is, AI, it could have far-reaching implications beyond what we would imagine today. That's not to say we have entered the endgame of AI with AGI or anything like that, but I think generative AI is a pretty big step forward, and it seems that investors are aware of this as well. We all know that the majority of industries had a very bad 2022, yet generative AI startups actually received $1.37 billion in funding, according to the New York Times; that's almost as much as they received in the past five years combined. It's hardly surprising, though: there were several wow moments that came from generative AI in 2022, from generative art tools like OpenAI's DALL-E 2, Midjourney, and Stable Diffusion, to the next generation of large language models from the likes of OpenAI with the GPT-3.5 models, the open-source BLOOM project, and chatbots like Google's LaMDA and, of course, ChatGPT. All of this together marks just the first year of widespread adoption of generative AI. We're still in the very early days of a technology that is poised to completely change the way we interact with machines.

One of the most thought-provoking use cases in how we interact with machines belongs, I think, to generative question answering, or GQA. The simplest GQA pipeline consists of nothing more than a user's question, or query, and a large language model. The query is passed to the large language model, and based on what it learned during training, the knowledge stored within the model's parameters, it outputs an answer to your question. We can see that this works pretty well for general knowledge questions across the board: whether we look at OpenAI's text-davinci-003 model, Cohere's extra-large model behind the generation endpoint, or even open-source models we can access through Hugging Face Transformers, we get a good answer. If we ask who was the first person on the moon, we get, across the board, the answer Neil Armstrong.

However, if we start asking more specific or advanced questions, these large language models begin to fail. Say we ask a very specific question about machine learning methods, specifically NLP and semantic search training methods, like: which training method should I use for training sentence transformers when I have just pairs of positive sentences? (If you don't understand what that means, no problem.) One of the correct answers here should be multiple negatives ranking loss; even just "ranking loss" would be fine as well. If we ask this question of what I've found to be the best performing of the large language models so far, text-davinci-003, it tells us we need to use a supervised training method. That is correct, but it doesn't really answer the question: it doesn't give us a specific method to use. The reason is that the model doesn't know; this knowledge was not encoded into the model's weights, or parameters, during training, so it can't answer the question.
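To make that simple pipeline concrete, here is a minimal sketch of no-retrieval GQA using the openai Python client of the era (the 0.x Completion API); the API key is a placeholder, and settings like temperature and max_tokens are illustrative choices, not the exact ones used in the video:

```python
import openai  # openai 0.x client, current at the time of this video

openai.api_key = "YOUR_API_KEY"  # placeholder

query = (
    "Which training method should I use for training sentence "
    "transformers when I have just pairs of positive sentences?"
)

# The simplest GQA pipeline: pass the raw query straight to the LLM and
# rely entirely on the knowledge stored in the model's parameters.
res = openai.Completion.create(
    model="text-davinci-003",
    prompt=query,
    temperature=0.0,   # illustrative settings
    max_tokens=256,
)
print(res["choices"][0]["text"].strip())
```

Everything the model can say here has to come from its training data, which is exactly why niche questions like this one fail.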
Now, there are two options we can take to help the model answer this question. The first is to fine-tune the large language model on text data that contains this information. This can be hard to do: it can take a lot of computational resources or money, and it also requires a lot of text data, which is not always available. If we mention the answer just once, in a single sentence out of a million sentences, the large language model might not pick up on that information, and when we ask the question again it may not have learned the answer. We need a lot of text data that mentions this in multiple contexts for the model to learn the information well.

Considering that, our second option, which I think is probably the easier one, is to use something called retrieval-augmented generation, or in this case retrieval-augmented generative QA. This simply means adding what is called a retrieval component to our GQA pipeline. The retrieval component allows us to retrieve relevant information: if we have that one sentence within a million sentences, we can retrieve it and feed it into our large language model alongside our query. We're essentially creating a secondary source of information.

Going ahead with this second option of retrieval-augmented ML, when we apply it to large language models we can think of it as a form of long-term memory. To implement this long-term memory, we need to integrate a knowledge base into our GQA pipeline. This knowledge base is the retrieval component we're talking about: it allows us to take our query, search through our sentences or paragraphs for relevant information, and return that relevant information, which we then pass to our large language model. And as you can see, using this approach we get much better results. Again using text-davinci-003 as the generation model, we get: you should use natural language inference (NLI) with multiple negatives ranking loss. Now, NLI is essentially just one option for the format of the data, but the answer of multiple negatives ranking loss is definitely what we're looking for. This much better answer is a direct result of adding more contextual information to our query, which we refer to as source knowledge. Source knowledge is any knowledge that gets passed to the large language model within the input at inference time, i.e. when we're predicting or generating text.

In this example we used OpenAI for both generation and embedding (which I'll explain in a moment) and the Pinecone vector database as our knowledge base. Together these are what we'd refer to as the OP stack (OpenAI, Pinecone), a more recently popularized option for building very performant AI apps that rely on a retrieval component, like retrieval-augmented GQA. At query time, the pipeline consists of three main steps. Step one, we use an OpenAI embedding endpoint to encode our query into what we call a dense vector. Step two, we send that encoded query to our knowledge base, which returns relevant context, i.e. text passages, back to us. Step three, we take the query and that relevant context and feed them into our large language model to generate a natural language answer. As you can see, adding that extra context from Pinecone, our knowledge base, allowed the large language model to answer the question much more accurately.
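A minimal sketch of those three steps, assuming a Pinecone index that has already been populated with embedded text passages; this follows the openai 0.x and pinecone-client 2.x APIs that were current when the video was made, and the index name, environment string, and `text` metadata field are hypothetical:

```python
import openai
import pinecone  # pinecone-client 2.x

openai.api_key = "YOUR_OPENAI_KEY"                                      # placeholder
pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")  # placeholders
index = pinecone.Index("gqa-demo")  # hypothetical, pre-populated index

query = (
    "Which training method should I use for training sentence "
    "transformers when I have just pairs of positive sentences?"
)

# Step 1: encode the query into a dense vector via the embedding endpoint.
xq = openai.Embedding.create(
    input=[query], model="text-embedding-ada-002"
)["data"][0]["embedding"]

# Step 2: retrieve the most relevant text passages from the knowledge base.
matches = index.query(xq, top_k=3, include_metadata=True)["matches"]
context = "\n".join(m["metadata"]["text"] for m in matches)

# Step 3: combine query + retrieved context (source knowledge) and generate.
prompt = (
    "Answer the question using the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}\nAnswer:"
)
res = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, temperature=0.0, max_tokens=256
)
print(res["choices"][0]["text"].strip())
```

Note that the retrieved matches are available alongside the generated answer, which is what makes it possible to show users the sources behind a response.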
And beyond providing more factually accurate answers, the fact that we can retrieve the sources of information and actually present them to users also instills user trust in the system, allowing users to confirm the reliability of the information being presented to them.

Let's go ahead and try a few more examples, using the same pipeline I've already described. The knowledge base, i.e. the data source, is the jamescalam YouTube transcriptions dataset hosted on Hugging Face Datasets, which is a dataset of transcribed audio from various tech and ML YouTube channels. So if we ask questions around ML and tech, then, generally speaking, if the answer is within the knowledge base, the pipeline should be able to answer pretty accurately.

We start with "what is NLI?". Our first answer, without retrieval, is that NLI stands for natural language interface, which is wrong. The second, with retrieval, is correct: natural language inference (NLI) is a task that requires pairs of sentences to be labeled as either contradicting, neutral, or entailing/inferring each other. Perfect.

Let's try something else: "how can I use OpenAI's CLIP easily?". With no augmentation, it looks like we just get a description of what CLIP is: that it's used to classify images and generate natural language descriptions of them, which is not how I would define it; in fact, I know that's not what I would go with. Then: to use CLIP you need a GPU and the OpenAI CLIP repository, yes, you can do that, and you can use the provided script to train and evaluate the model, and so on. Okay, it's mostly correct: the opening is not really how I would describe CLIP, but the rest, about using the CLIP repository, is correct. I hit a rate limit here, so let me comment that part out and try again. What I wanted to get is this: you can use OpenAI's CLIP easily by using the Hugging Face Transformers library, which in my opinion is 100% the easiest way to use the model. Then we get a line calling it the standard library for doing anything with NLP and computer vision; not necessarily the standard for computer vision, but I think I know the source of that information, which is one of my videos, and I probably do say something along those lines, because that is what we're using CLIP for in this instance. Then: to get started you should install PyTorch and the Transformers and Datasets libraries, which is actually usually the case; when using a dataset from Datasets, you do need to install PyTorch alongside Transformers. So that is really cool.

There's one more question I want to ask: "what is a good de facto model, or sentence transformer model, to use in semantic search?" Let's see what we get. With no augmentation, we get that a popular de facto sentence transformer model for semantic search is BERT, a deep learning model which is pre-trained, and so on. Here it seems to be talking about the standard BERT model, not even the sentence transformer or bi-encoder version of BERT, so I would say it's definitely wrong. I'm hitting a rate limit again, so let me comment this out and run it once more. And here we go: the pre-trained Universal Sentence Encoder model is a good de facto sentence transformer model to use in semantic search. Now, I would disagree with that, I think there are better models to use, but it is, I think, one of the most popular choices, the sort of first sentence transformer, or sentence encoding model, that people end up using.
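For reference, before any of these queries can work, the knowledge base itself has to be built by embedding each passage and upserting it into Pinecone. A rough sketch of that indexing step, again against the openai 0.x and pinecone-client 2.x APIs, and assuming the dataset exposes a `text` field (the index name and batch size are illustrative):

```python
import openai
import pinecone
from datasets import load_dataset

openai.api_key = "YOUR_OPENAI_KEY"                                      # placeholder
pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")  # placeholders

# Load the transcribed-audio dataset from Hugging Face Datasets.
data = load_dataset("jamescalam/youtube-transcriptions", split="train")

if "gqa-demo" not in pinecone.list_indexes():
    # text-embedding-ada-002 produces 1536-dimensional vectors.
    pinecone.create_index("gqa-demo", dimension=1536, metric="cosine")
index = pinecone.Index("gqa-demo")

# Embed and upsert one small batch; a real run would loop over the full set.
batch = data[:64]["text"]  # assumes a `text` field holding each passage
embeds = openai.Embedding.create(input=batch, model="text-embedding-ada-002")["data"]
index.upsert(vectors=[
    (str(i), record["embedding"], {"text": passage})
    for i, (record, passage) in enumerate(zip(embeds, batch))
])
```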
This is a much more accurate answer than what we got before, without the context, without the augmentation, which was BERT, not even a sentence transformer. So I think this is still a pretty good answer. Personally I would like to see an MPNet model or something in there, but that's more my personal preference; this is probably the more broadly accepted answer.

So, as demonstrated, large language models do work incredibly well, particularly for general knowledge questions, but they definitely struggle with more niche or specific, pointed questions. This typically leads to what we call hallucinations, where the model is basically spilling out things that are not true, and it is rarely obvious to the user that the model is being inaccurate, because these models can say very untrue things very convincingly. We can think of them as, essentially, masters of linguistic patterns, so they can say things that are completely false in a way that makes them seem true. To protect against this issue, we can add what we call a long-term memory component to our GQA systems, and through this we benefit from having an external knowledge base that improves system factuality and also improves user trust in the system.

Naturally, there is vast potential for this type of technology, and despite being very new, there are already many people using it. I've seen that you.com have their new YouChat feature, which gives you natural language responses to your search queries; I've seen many podcast search apps recently using this technology; and there are even rumors of Microsoft using ChatGPT in Bing, another form of this technology, as a challenger to Google itself. So, as I think we can all see, there is very big potential and opportunity here for disruption within the space of information retrieval. Essentially, any industry or company that relies on information in some way, and on retrieving that information efficiently, can benefit from retrieval-augmented generative question answering and other retrieval-augmented generative AI technologies. This really represents an opportunity to replace some of the outdated information retrieval technologies we use today.

That's it for this video. I hope all of this has been somewhat thought-provoking, interesting, and useful, but that's it for now. Thank you very much for watching, and I will see you again in the next one. Bye!
Info
Channel: James Briggs
Views: 13,817
Keywords: python, machine learning, artificial intelligence, natural language processing, Huggingface, semantic search, similarity search, vector similarity search, vector search, generative ai, generative ai tutorial, gen ai, generative question answering, openai, cohere ai, google palm, large language model, large language models explained, ai memory, hugging face, hugging face nlp, gpt chat, gpt 3 tutorial, gpt 4, gpt 3.5, gpt 3.5 chat, gpt 3 explained, gpt 3 model explained, gqa
Id: rrAChpbwygE
Length: 15min 51sec (951 seconds)
Published: Thu Jan 19 2023