Open Source RAG running LLMs locally with Ollama

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Hi, I'm Victoria from Weaviate. We've been doing a lot of work to take our open-source RAG application, Verba, to the next level, and I'm super excited to share it with you today. Retrieval-augmented generation is one of the most popular techniques right now to leverage large language models in production or industry use cases. In a RAG workflow, the user asks a question like "What is Weaviate?", and that query is sent to a vector database to search for documents or chunks related to that sentence. Then that context, along with the original query, is put together in a prompt for an LLM. The LLM will then output a conversational response to the original question using the provided context. There are a lot of benefits to RAG systems. Unlike other techniques like fine-tuning, RAG can be super cheap and much easier to update with new data, or to delete old or outdated information, without having to retrain a whole new model. RAG systems can help reduce the hallucinations and errors normally seen in LLM systems, and users can directly see the source information used by the model, allowing increased transparency and trust. Even though there are a lot of tutorials and resources for RAG now, it's still a big technical undertaking to make a good, advanced RAG application. This is why we built Verba, the Golden RAGtriever. We wanted to provide advanced RAG techniques to anyone, even if they don't have a technical background, and make it super easy to use and customize.

Thank you, Victoria. On top of our mission to make RAG accessible, we are an open-source company, and at Weaviate we want to double down on open-source machine learning. With this new Verba release we make it super easy for you to run any open-source model on top of Ollama, so you can build open and accessible bridges over any moat. Just spin up Verba locally with Ollama and run all the awesome open-source models you love, whether that's Meta's Llama 3, the Mistral 7B open-source model, Command R from Cohere, Gemma from Google DeepMind, or all the awesome MixedBread AI models.

But that's not all: the new update also has a reworked interface, improving the user experience and adding more control over your RAG application. For example, conversations are now cached in the browser, so you don't lose them when you leave the app. When ingesting data, you now receive better feedback on what's happening in the backend through logging information, and when querying Verba you can now view the exact context that is being sent to the LLM, to better understand where the answers are coming from. These and many more features will help with using and understanding Verba. Verba was built from the ground up with a focus on modularity and customization: you can choose between and add different data types, chunking techniques, and large language models. With the new update you'll also be able to fully customize the look of Verba, changing the logo, title, colors, and more. This allows you to build your individual RAG app in just a couple of clicks. And, my personal favorite: the dark mode. To change the frontend, we have a new settings page. On that page you're able to apply all the needed changes; you will also find settings like enabling caching or autocomplete suggestions. But don't forget to apply these changes!

Let's have a closer look at what you can do with Verba, starting with a use case in the medical domain. When we are working in a sensitive area like healthcare, we need to be careful that the LLM is producing the right output. Navigating patient notes, treatment plans, or adverse drug effects requires precise and correct information. So let's dive together into our RAG showcase, Zana Maxima, which is built on top of Verba utilizing Google Gemini and Weaviate. For Zana Maxima we captured three patient cases; this includes their medical records, treatment plans, medications, and any adverse drug reactions they may have experienced. We also added the latest research on cancer treatments to provide doctors with the latest information in the domain, just in case they need it. Let's jump into our live demo. What are the names of our patients? As said, just three, as this is a demo showcase, but with Verba you can easily scale into millions or billions of records. Let's ask Zana Maxima some practical questions: Who is Amanda? What is Amanda's treatment plan? What side effects does Amanda experience? What medication can be related to Amanda's side effects? What is the latest research on [unclear]?

Along with industry use cases like the medical one Philip just went through, Verba is also a great tool for chatting with company data, either as an internal tool, an external one, or both. In our online live demo of Verba we've ingested all of Weaviate's documentation and resources: our blog posts, podcasts, Academy content, you name it. This means that Verba can now answer any question about Weaviate, or even do Weaviate-specific generation tasks like helping us create content on new features or giving us ideas for our next video. When we ask the question "What is hybrid search?", we get back context chunks around the hybrid search blog post, which is definitely expected. We also get a great answer going into detail on what hybrid search actually is. So let's ask a follow-up question: "Can I implement hybrid search in JavaScript?" Verba responds with the correct answer, yes, and even gives an example code snippet. We can ask all sorts of different questions, everything from "What are some enterprise use cases of Weaviate?" to "Can I use Weaviate with LlamaIndex?" or even "What is Verba?". The best thing is that setting up this environment doesn't take two weeks of engineering. It's as simple as getting it set up locally, adding documents in the UI, and customizing your theme. Edward's going to show you how to set it up in just a couple of minutes, and realistically you could have your own personal company chatbot in just a few hours of work.

To get started with Verba you have three options: install it via pip install goldenverba, install it by cloning the open-source repo and installing from source, or use Docker to install both Weaviate and Verba in one stack. Before installing Verba, make sure to have at least Python 3.10 installed and create a new virtual environment. If you're new to Python and setting up virtual environments, we have a little guide on our GitHub page, as well as a README that also explains the installation steps. Once we have the correct Python version and environment, we can install Verba by writing pip install goldenverba into the console. After the installation, we're going to create an environment file to specify some environment variables; you'll find the full list of possible variables in the Verba README. For this installation example I want to use Ollama. Ollama is already installed and running for me; you can visit its webpage for a guide on installing it on your own device. To use Ollama in Verba we need to specify the URL it's currently running on and the name of the model we want to use. Let's use Llama 3 for this. Make sure that the environment file is placed where you want to use Verba. Now we can simply start with the verba start command. This will run the server and make Verba accessible via localhost:8000. We can use the overview page to see whether both Ollama variables were set. Now let's quickly import the Verba README to test the application. For that, we go to the add-document page and select the reader, chunker, and embedding model; let's use the Ollama embedder for this. Once ingested, we can go to the chat page and ask how to install Verba with Docker. This response was generated locally, and no data left my device. Before using Docker, make sure to have Docker installed and running. As Verba is telling us, we can simply use the docker compose up command to start the process and install both Weaviate and Verba locally. Let's use the --env-file flag to point to our prepared environment file. Once both are installed, we can access Verba again through localhost:8000 and Weaviate through localhost:8080. That's how you can get started using a local RAG app on your device.

If you run into any issues, feel free to create an issue or PR on our GitHub page. Thank you for watching! Don't forget to give Verba a star on GitHub, or even become one of our open-source contributors and make Verba even more fantastic. By the way, we are hiring, so if you want to work with a young, super ambitious, and fast-moving team, this is your chance to work with Victoria, Edward, and me. You will find our open roles in the description. And don't forget: all your vector embeddings are belong to you.

In 2017: Attention Is All You Need. What happened? Somebody set up us the bomb. We get signal. What? Main screen turn on. It's you! How are you, gentlemen? All your vector embeddings are belong to you. You are on the way to vector search. What you say?! Giving away your data... you will not survive. Make your time. Ha ha ha.
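The RAG flow the transcript describes (retrieve related chunks, then assemble them with the original question into one prompt) can be sketched as a toy shell example. This is purely illustrative: the `docs.txt` file and keyword `grep` stand in for Verba's real retrieval, which uses Weaviate vector search, not string matching.

```shell
# Toy corpus standing in for an ingested document store (hypothetical content).
cat > docs.txt <<'EOF'
Weaviate is an open-source vector database.
Verba is an open-source RAG application built on top of Weaviate.
EOF

QUERY="What is Weaviate?"

# "Retrieve" related chunks. In Verba this is a vector similarity search
# against Weaviate; keyword grep is only a placeholder for the idea.
CONTEXT=$(grep -i "weaviate" docs.txt)

# Assemble the final prompt: retrieved context plus the original question,
# which is what gets sent to the LLM for a grounded, conversational answer.
PROMPT="Please answer using only the context below.

Context:
${CONTEXT}

Question: ${QUERY}"

echo "$PROMPT"
```

The key property shown here is that the model never answers from the query alone; the prompt always carries the retrieved source text, which is also what lets Verba display the exact context behind each answer.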
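The pip-based setup walked through above (environment file with the Ollama URL and model, then `verba start`) might look like the following. The variable names and values here are assumptions based on the transcript; confirm the exact names against the Verba README for your version (11434 is Ollama's default port).

```shell
# Hypothetical .env for the Ollama setup described in the transcript.
# Check the Verba README for the authoritative list of variables.
cat > .env <<'EOF'
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=llama3
EOF

# Then install and launch Verba (run these yourself; they need network access):
#   pip install goldenverba
#   verba start        # serves the app at http://localhost:8000
```

Because the model runs on Ollama, both retrieval and generation stay on your machine, which is why the demo response never leaves the device.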
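The Docker route described above can be summarized as a single compose invocation against the prepared environment file. The clone URL and flags below are a sketch of what the transcript describes, not a verified command for every Verba release; the compose file ships in the repo itself.

```shell
# Sketch of the Docker setup from the transcript. Run from the root of a
# clone of https://github.com/weaviate/Verba with your .env in place:
#   git clone https://github.com/weaviate/Verba && cd Verba
CMD="docker compose --env-file .env up"
echo "$CMD"
# This brings up both services locally:
#   Verba    -> http://localhost:8000
#   Weaviate -> http://localhost:8080
```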
Info
Channel: Weaviate • Vector Database
Views: 12,533
Id: swKKRdLBhas
Length: 10min 0sec (600 seconds)
Published: Thu May 16 2024
Related Videos