How Good is LLAMA-3 for RAG, Routing, and Function Calling

Captions
In the last video we looked at Phi-3 beyond the benchmarks, so I thought we would repeat the same exercise for Llama 3. In this case we're going to test Llama 3 on RAG, its ability to do query routing, and tool usage, or function calling. Function calling is not officially supported in Llama 3, so we're going to use the Groq implementation for that.

To test RAG we need a source of data, so we're going to use this article from The Verge, "The synthetic social network is coming." This is the same article we tested with Phi-3. As always, a link to the notebook is in the video description.

First we need to install all the required packages to run this notebook; we're going to use LlamaIndex to implement the RAG pipeline. Then we need to load the contents of the web page, and for that we use the BeautifulSoup web reader, an amazing library for reading HTML content. We provide the URL of the website we're working with and it converts the page into a document object.

Next we need to load our LLM. We're going to begin testing with the Llama 3 8-billion version, but later on I'll show you the results for the 70-billion version as well. To load the model we're using the amazing Groq API, which is one of the fastest, or actually the fastest, inference API currently available on the market. I have all my API keys stored as secrets, so I just enable access to the Groq API key, it loads the key, and we create the Groq LLM object. The model name we're providing right now is Llama 3 8 billion; later in the video I'll show you how to load the 70-billion version as well. You just need to change the name of the model.

If we're doing RAG we also need an embedding model. For this we're using an open-source embedding model, BGE small English. If you want to use Mistral embeddings, OpenAI embeddings, or even Jina embeddings (Cohere has its own embeddings too), you can just provide those here. We set both the LLM and the embedding model in the Settings of LlamaIndex.

Next we need to create our vector store. For this video we'll create two different vector stores, and the reason will become apparent when we look at query routing. The first is a vector store index, which converts the document into chunks, computes embeddings for each chunk, and stores those in the vector store. The other is actually a summary index, which is basically a summarized version of the document. These are two different options, and we're going to switch between them when we do query routing. We also add some helpful logging code that will keep track of everything that is happening.
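Putting that setup together, here's a minimal sketch, assuming recent llama-index packages with the Groq, HuggingFace embedding, and web reader integrations installed; the article URL below is a placeholder rather than the exact link from the notebook:

```python
# Minimal sketch of the RAG setup described above (not the notebook verbatim).
# Assumes: pip install llama-index llama-index-llms-groq \
#          llama-index-embeddings-huggingface llama-index-readers-web
import os

from llama_index.core import Settings, SummaryIndex, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq
from llama_index.readers.web import BeautifulSoupWebReader

# Placeholder for The Verge article used in the video.
URL = "https://www.theverge.com/..."  # hypothetical, fill in the real link

# Load the web page into LlamaIndex Document objects.
documents = BeautifulSoupWebReader().load_data(urls=[URL])

# Llama 3 8B served by Groq; swap in "llama3-70b-8192" for the 70B version.
Settings.llm = Groq(model="llama3-8b-8192", api_key=os.environ["GROQ_API_KEY"])

# Open-source BGE small English embeddings (Mistral, OpenAI, Jina, or
# Cohere embeddings could be plugged in here instead).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Two indices: one for chunk-level retrieval, one for whole-document summaries.
vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)

# Ask a question through the vector index.
query_engine = vector_index.as_query_engine()
print(query_engine.query("How do OpenAI and Meta differ on AI tools?"))
```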
Now let's look at how querying works. We create a query engine on top of the vector store index and pass our query to it to get responses from the model. The first question is, "How do OpenAI and Meta differ on AI tools?" The response we get, based on the query passed through the vector index we just created, is that OpenAI tends to present its products as productivity tools, simple utilities for getting things done, while Meta, on the other hand, is in the entertainment business. The answer is pretty accurate given the context we provided. One thing I have noticed is that the Llama 3 models tend to generate pretty short and concise responses compared to what we saw from the Phi-3 models in the previous video.

Here's another one: what are the new features added by OpenAI to ChatGPT? The article talks about adding voice interaction and image upload, and the model correctly identifies that two new features were added to ChatGPT: the ability to interact with the large language model via voice, and the ability to upload images and ask questions about them. In the previous video, Phi-3 was actually confusing features added to Meta AI with features added to ChatGPT, but the 8-billion version of Llama 3 does a pretty amazing job of retrieving the correct information. I asked the same question regarding Meta AI, and it correctly identifies that Meta unveiled 28 personality-driven chatbots to be used in Meta's messaging apps, which is correct. So this is pretty good.

Next we're going to look at its ability to do query routing. So what exactly is query routing? Let's say you have multiple databases. For example, you're a teacher and you create one vector store for mathematics, another for physics, and another for chemistry. When a student asks a question, the query router has to decide which vector store to use to generate an answer: if the question is related to physics it retrieves information from the physics vector store, if it's related to mathematics it retrieves from there, and so on. You can expand this to other use cases as well. For example, in a company with different departments, you can create a vector store for each department, and then depending on the query, as well as the user's authorization level, you can select which vector store to use. So it's a very important ability to have. In our case the query router is basically an LLM that decides which vector index gets selected and used to generate responses.

To keep it simple, we create two different tools and pass them to our query engine. The first one is the vector store index we created, which is useful for searching for specific facts. The second one is the summary tool, the summary index, which is useful for summarizing an entire document. The query engine, using the LLM, needs to determine which one to pick depending on the question the user is asking. We use a multi-selector, so we provide the list of tools to the router query engine, and depending on the context it has to select one or more of the available tools, either the vector index or the summary index, and it can select multiple if needed, as the sketch below shows.
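A sketch of that routing setup, reusing the vector_index and summary_index from the previous snippet; the tool descriptions paraphrase the ones described in the video:

```python
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMMultiSelector
from llama_index.core.tools import QueryEngineTool

# Tool 1: chunk-level retrieval, for questions about specific facts.
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for searching for specific facts in the document.",
)

# Tool 2: the summary index, for summarizing the entire document.
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Useful for summarizing the entire document.",
)

# The router asks the LLM to pick one or more tools for each query.
router = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
    verbose=True,  # print which tool(s) the LLM selects and why
)

print(router.query(
    "What was mentioned about Meta? Summarize what is mentioned "
    "about other companies in the document."
))
```

An LLMSingleSelector would force exactly one tool per query; the multi-selector lets the model combine both indices when a question has multiple parts.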
The first question was, "What was mentioned about Meta? Summarize what is mentioned about other companies in the document." The router selected only the first tool, the vector index, and the stated reason is that the question asks about Meta, which implies searching for specific facts, making choice one the most relevant. So it figured that since the question is about Meta, it has to find specific information. The response it generates is not the best; it seems to have more or less copied a sentence from the provided context, but at least it correctly identified that we need specific information about Meta. For some reason it completely ignored the second part of the question, which is a bit concerning.

Then I expanded on this and asked, "Summarize what was mentioned about OpenAI," and in this case it selected the second option, the summary index. The stated reason is that the question asks to summarize what was mentioned about OpenAI, which implies summarizing the entire document, making option two the most relevant choice. It then provides a summary of what was mentioned about OpenAI, and the summary is actually pretty good given the provided context.

Then I wanted to see how the 70-billion model would respond, so I changed the model to the 70-billion version, still using the Groq API. In this case the responses to the same queries are much better. For the first query about Meta, rather than simply copying a sentence from the document, it summarizes everything and talks about the 28 personality-driven chatbots created by Meta for use in its messaging apps. It then adds that, as for other companies, OpenAI is mentioned as announcing updates to ChatGPT, including a voice feature that allows users to interact with the language model via voice, giving ChatGPT a more human-like personality. So this model is definitely doing a much better job of retrieving information when there are multiple parts in the query, and it again does a pretty good job on "summarize what was mentioned about OpenAI," using the summary index rather than the vector store. As you would expect, the 70-billion model does a really good job at query routing for more complex queries.

The last part we want to look at is the ability to do function calling. Llama 3 doesn't officially support it, but Groq has an implementation for tool usage, which is basically function calling. Let's first try to understand what function calling is before we look at an example. There are scenarios in which you ask the LLM something it doesn't have the ability to answer on its own; in those cases using an external tool is helpful. For example, for a complex mathematical question it's better for the LLM to use a calculator rather than doing the math itself. Here's how it works: first you ask a query, and the LLM has to decide whether it needs an external tool or not. If it doesn't, the LLM simply generates a response for you. If it does need an external tool, it first has to determine the appropriate tool for the task, that is, pick the tool, then make a call to it (usually tool usage is implemented as a function), get a response from that function, and feed the result back into the LLM, which then generates the final response. So in the tool-use case there are two calls to the LLM. Groq has implemented this as an external loop for Llama 3 and the Mixtral MoE models, and in this case we're going to use a notebook provided by Groq.

You first need to install the Groq Python client, and we set the API keys exactly the same way as before. I'm using the smaller model here, and it does a pretty good job, although the bigger model is definitely much better. The demo function we're going to call returns scores of NBA games: it contains a few scenarios with different teams and their associated scores, and depending on the input it picks a team and gives us the final score, roughly as sketched below.
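The demo function presumably looks something like this; the teams and scores below are illustrative placeholders rather than the notebook's actual data:

```python
import json

# Sketch of the demo function from Groq's example notebook: it returns
# hard-coded scores for a few teams (placeholder numbers, not real results).
def get_game_score(team_name: str) -> str:
    """Get the score for a given NBA game."""
    name = team_name.lower()
    if "warriors" in name or "lakers" in name:
        return json.dumps({
            "home_team": "Los Angeles Lakers",
            "home_team_score": 121,
            "away_team": "Golden State Warriors",
            "away_team_score": 128,
        })
    return json.dumps({"error": f"No game found for {team_name}"})
```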
Here's how the main loop looks. The system message goes into the first call of the LLM, and it says: you are a function calling LLM that uses data extracted from the get_game_score function to answer questions around NBA game scores; include the team and their opponent in your response. The user's question then becomes part of this initial prompt. Next we define the tools. You can have multiple tools, but we're sticking to the single one provided in their example. The main thing here is to include a very detailed description of what each tool or function does, because the model selects which tool to use based on this description, very similar to selecting which index to use in the query routing example. We also describe the expected output, the team name the model is supposed to return: the function takes a team_name parameter, which is a string holding the name of the NBA team.

This becomes the initial query we feed to the LLM, and we make the first LLM call. The way Groq has implemented it, you can check in the response whether the model decided to use a tool or not. If the tool-usage flag is set, the model wants to use a tool, so you have to make a second call to the LLM. We first check whether tool usage was requested; if it was, we look up which tool was called (right now we only have one option, so it just calls that one), get the response from that function, and feed it back as input to the LLM. That is the second query, the step where the function result is fed into the LLM to get a final response, and that second response is the one shown to the user. Putting it together, the loop looks roughly like the sketch below.
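Here's a rough sketch of that loop, reusing the get_game_score sketch above; the Groq Python client mirrors the OpenAI chat-completions API, and the system prompt is paraphrased from the video:

```python
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment
MODEL = "llama3-8b-8192"  # or "llama3-70b-8192" for the bigger sibling

def run_conversation(user_prompt: str) -> str:
    messages = [
        {"role": "system", "content": (
            "You are a function calling LLM that uses the data extracted "
            "from the get_game_score function to answer questions around "
            "NBA game scores. Include the team and their opponent in your "
            "response."
        )},
        {"role": "user", "content": user_prompt},
    ]
    # The detailed description is what the model uses to decide whether
    # (and how) to call the tool, much like the index descriptions in
    # the routing example.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_game_score",
            "description": "Get the score for a given NBA game",
            "parameters": {
                "type": "object",
                "properties": {
                    "team_name": {
                        "type": "string",
                        "description": "The name of the NBA team",
                    },
                },
                "required": ["team_name"],
            },
        },
    }]
    # First LLM call: the model decides whether to use a tool.
    response = client.chat.completions.create(
        model=MODEL, messages=messages, tools=tools, tool_choice="auto"
    )
    message = response.choices[0].message
    if not message.tool_calls:
        return message.content  # no tool needed: answer directly
    # Tool path: run each requested function and feed the result back.
    messages.append(message)
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_game_score(team_name=args["team_name"])
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": "get_game_score",
            "content": result,
        })
    # Second LLM call: generate the final answer from the tool output.
    second = client.chat.completions.create(model=MODEL, messages=messages)
    return second.choices[0].message.content

print(run_conversation("What was the score of the Warriors game?"))
```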
For example, here's the first query: "What was the score of the Warriors game?" The model goes through the step-by-step process of deciding whether it needs to make a function call. It decides that it does, and the team name it selects based on the prompt is Golden State Warriors. The final response says that, according to the call, the Golden State Warriors played the LA Lakers and the final score was in favor of the Warriors, so it got the response right. Now, if I ask something irrelevant to the available functions or tools, it goes through the whole flow again, but the answer it gives is along the lines of: I'm glad you're interested in knowing the purpose of life, but I would like to take a different approach; how about we talk about the game between the LA Lakers and the Golden State Warriors? Since it's a smaller model, it wasn't able to correctly deduce that the question has absolutely nothing to do with NBA games, and it decided to use the LA Lakers as input to the function call. But if you replace it with the bigger 70-billion model, then for the same query there is no problem: it works out whether it needs a function call, decides that it doesn't because the question isn't relevant to the available functions or tools, and gives as its final response, "That's a deep and complex question; I'm happy to provide some insight," followed by a pretty detailed answer, without making any tool call, which is pretty amazing. The bigger model is definitely a lot smarter, which is expected, but I think even the smaller 8-billion model is pretty capable; there may just be situations in which you definitely want to try the bigger sibling.

So this was a more practical look at the use cases and abilities of this model. As I said, both the 8-billion and the 70-billion models are pretty amazing; Meta has done a really amazing job with them. I would also definitely recommend everybody check out Groq; they provide a free API which is extremely fast. I hope you found this video useful. Let me know if you want me to do comparisons of other models on similar tasks. Thanks for watching, and as always, see you in the next one.
Info
Channel: Prompt Engineering
Views: 7,681
Keywords: prompt engineering, Prompt Engineer, LLMs, AI, artificial Intelligence, Llama, GPT-4, fine-tuning LLMs, llama-3, llama-3 8b, Groq api
Id: V83SeIr10FI
Length: 17min 56sec (1076 seconds)
Published: Mon Apr 29 2024