How to Build an LLM Query Engine in 10 Minutes | LlamaIndex x Ray Crossover

Video Statistics and Information

Captions
Hi, my name is Jerry, co-founder and CEO of LlamaIndex. Hey, my name is Amog, one of the Ray developers here at Anyscale. We're super excited today to present how LlamaIndex and Ray can help you build an advanced query engine over your data.

First, Jerry is going to give some context on how to build LLM applications. LLMs are a phenomenal piece of technology for knowledge generation and reasoning, but the main problem is that they're pre-trained on large amounts of publicly available data, so they don't inherently know anything about you as a user or about your organization. Yet they're very good at a variety of tasks, from question answering and text summarization to agent-like interactions such as planning and using different types of plugins and tools. The main challenge is figuring out how to incorporate all your private sources of information, whether you're an individual or an enterprise, into your LLM application. This could include many different types of data: file formats like PDFs, PowerPoints, and documents; APIs you commonly use, like Slack or Notion; and databases within your enterprise data lake.

LlamaIndex gives you the tools to solve this and to manage and query your data within your LLM app. It provides a comprehensive set of tools to ingest and index your data, as well as an advanced query interface for your LLM application.

The first challenge you must solve when incorporating private sources of data into your LLM app is figuring out how to load this data and structure it in the right format so that the LLM can understand it. To this end we offer data connectors as well as data indexing modules. Our data connectors are offered through our site called LlamaHub, which offers over a hundred different data loaders covering file formats, databases, and workplace apps, and lets you easily load data from these sources into your LLM application as a centralized document object.

The next component is being able to define the right data structures over this data. Data parsing, structuring, and indexing is essential because it lets you define the units of computation and data that the LLM will eventually operate over. This includes, for instance, parsing a document into chunks of nodes, defining indexes over these nodes (for example via keywords or embeddings), and putting them into a downstream storage solution like a vector database or a document store.

The second challenge you need to solve when building an LLM application over your data is figuring out how the LLM can retrieve this data through the indexes you created and synthesize a response, and how to make this interface general enough to satisfy all the different types of queries you might want to run over the data. This could include simple questions about specific facts, summarization queries, and complex multi-step questions.

As Jerry mentioned, LlamaIndex provides excellent abstractions for solving two challenges in developing LLM applications. The first is data ingestion: with LlamaIndex you can easily load files from a variety of data sources, parse them, generate embeddings, and store those embeddings in a vector store.
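To make that load, index, and query flow concrete, here is a minimal sketch assuming the 2023-era `llama_index` Python package (a LlamaHub loader could stand in for `SimpleDirectoryReader`); the data directory and question are placeholders, and class names may differ in newer releases.

```python
# Minimal sketch of the LlamaIndex load -> index -> query flow (not the talk's code).
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load local files (PDFs, HTML, text, ...) into Document objects.
# "./data" is a placeholder directory.
documents = SimpleDirectoryReader("./data").load_data()

# Chunk the documents into nodes, embed them, and build a vector index.
index = VectorStoreIndex.from_documents(documents)

# Query the index through the high-level query engine interface.
query_engine = index.as_query_engine()
print(query_engine.query("What does this documentation say about deployment?"))
```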
However, when you want to scale this process up to thousands or millions of documents, it becomes a challenge. This is where Ray comes in: by using Ray to parallelize the data ingestion process, you can quickly ingest and generate embeddings for many documents from a variety of sources in a much more performant fashion.

The second part of the LLM application is the actual querying. Once we've created these vector stores, you have an LLM that can access information from them and, for example, answer user questions. Again, LlamaIndex provides the abstractions for creating these kinds of query engines: given user input, you can use the LlamaIndex query engine to get an answer to your question. But when you actually want to deploy these applications in production, you need a good serving solution, and this is where Ray Serve comes in. Using Ray Serve we can deploy our application, send requests to it, and handle heavy user load through Ray Serve's autoscaling. With Ray Serve you get automatic load balancing, autoscaling, and a variety of other production features that are necessary for building LLM applications.

Let's look at an example use case where we compare and contrast how the Ray documentation and the Ray blogs each present Ray Serve as a technology. This is a complex question containing multiple steps, and the LLM query engine we define in LlamaIndex is able to first break it down into two sub-questions: how do the Ray docs present Ray Serve, and how do the Ray blogs present Ray Serve? It then executes these sub-questions against their corresponding tools, or sub query engines. These query engines correspond to indexes defined over each collection of documents: we have one collection representing the Ray documentation and another representing the Ray blogs, with indexes defined over both. After asking the sub-questions over these collections, we get back an initial answer from the Ray docs describing Ray Serve and another from the Ray blogs describing Ray Serve as a framework. Once we have the sub-answers from the different data sources, we can synthesize them into a final response that actually answers the original question. The high-level idea is that the query engine abstraction within LlamaIndex lets you define advanced interactions such as this query decomposition over different document sources, with a multi-step interaction between a retrieval model and the LLM to synthesize a response to the task at hand.
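As an illustration of the abstractions just described, here is a rough sketch that combines a sub-question query engine over two data sources with a Ray Serve deployment. It is not the talk's actual code: it assumes the 2023-era `llama_index` and Ray 2.x APIs, assumes the two indexes were already built and persisted, and the persist directories, tool names, and descriptions are placeholders.

```python
# Sketch: two per-source query engines + a sub-question engine, served via Ray Serve.
from ray import serve
from llama_index import StorageContext, load_index_from_storage
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata


@serve.deployment
class QADeployment:
    def __init__(self):
        # Reload previously persisted indexes (placeholder directories).
        docs_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir="/tmp/ray_docs_index")
        )
        blogs_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir="/tmp/ray_blogs_index")
        )

        # One query engine (tool) per data source.
        tools = [
            QueryEngineTool(
                query_engine=docs_index.as_query_engine(),
                metadata=ToolMetadata(
                    name="ray_docs",
                    description="The official Ray documentation",
                ),
            ),
            QueryEngineTool(
                query_engine=blogs_index.as_query_engine(),
                metadata=ToolMetadata(
                    name="ray_blogs",
                    description="Anyscale blog posts about Ray",
                ),
            ),
        ]

        # The sub-question engine decomposes a complex query into one
        # sub-question per tool and synthesizes the answers.
        self.sub_question_engine = SubQuestionQueryEngine.from_defaults(
            query_engine_tools=tools
        )

    async def __call__(self, request):
        query = request.query_params["query"]
        return str(self.sub_question_engine.query(query))


# Deploy behind Ray Serve's HTTP proxy, which provides load balancing
# and autoscaling across replicas.
serve.run(QADeployment.bind())
```

Once deployed, a client could send a request such as an HTTP GET to the Serve endpoint (by default on port 8000) with a `query` parameter carrying the question.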
Okay, so now let's take a look at what this example looks like in action. I'll be running this on the Anyscale platform, but you can also run the full example on open source only; take a look at the blog post as well as the public repo for the full details.

As we discussed, the first step is taking our two data sources, the Ray documentation and the Anyscale blog posts, and creating a pipeline to load those documents, parse them, generate embeddings for them, and store the embeddings in a vector store. We have a simple script for that, using the LlamaIndex parsing and embedding abstractions and Ray Datasets to scale out the process. We define a few functions to load and parse the files, convert those documents into LlamaIndex node abstractions, and use a local Hugging Face model to generate embeddings for them. Finally, we stitch all this logic together using Ray Datasets, which lets us scale the processing across the resources of our cluster, the CPUs and the GPUs. This allows us to parse all the documents in parallel for much better performance. Once that's done, we store the Ray documentation embeddings in a vector store, and then we repeat the same steps for the Anyscale blogs: we again stitch together all the logic and persist the Anyscale blog vector store on our cluster. I can run this, and as you can see it starts churning away at our docs, making progress on parsing and creating embeddings for the documentation.
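Here is a rough sketch of what such a parallel ingestion step could look like with Ray Datasets and a local Hugging Face embedding model. It is not the exact pipeline from the demo: the file paths and model name are placeholders, `ray.data` signatures vary between Ray versions, and in practice you would reuse the embedding model across batches (for example via an actor) rather than reloading it per batch.

```python
# Sketch: parallel load -> parse -> embed with Ray Datasets (placeholder paths/model).
from pathlib import Path

import ray
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
from sentence_transformers import SentenceTransformer


def load_and_parse(item):
    # Load a single file and split it into LlamaIndex nodes.
    docs = SimpleDirectoryReader(input_files=[item["path"]]).load_data()
    nodes = SimpleNodeParser().get_nodes_from_documents(docs)
    return [{"text": node.get_text()} for node in nodes]


def embed_batch(batch):
    # Embed a batch of node texts with a local Hugging Face model.
    # (Reloading the model per batch is wasteful; an actor would reuse it.)
    model = SentenceTransformer("BAAI/bge-small-en")
    batch["embedding"] = list(model.encode(list(batch["text"])))
    return batch


paths = [str(p) for p in Path("./ray_docs").rglob("*.html")]
ds = ray.data.from_items([{"path": p} for p in paths])
ds = ds.flat_map(load_and_parse)   # parse all files into nodes in parallel
ds = ds.map_batches(embed_batch)   # generate embeddings in parallel
records = ds.take_all()            # (text, embedding) records for the vector store
```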
Okay, now that we have embedded both our documentation and blog posts and stored them in vector stores, let's see how we can create our LLM application. Again we use both LlamaIndex and Ray: LlamaIndex provides the abstractions for defining the query engines, and here we use Ray Serve to create a deployment that we can query later. If we take a look at this, we have two different query engines, one for the Ray documentation and one for the blog posts, and we also create a third query engine, which, as we showed earlier, is a sub-question query engine; this one can intelligently come up with the sub-questions to ask of each data source. All of this is done via LlamaIndex, and finally we wrap everything in a Serve deployment. To actually run this, we deploy the application, which creates replicas of our application, and once it has started we can query it to get responses.

Now that we've deployed our Ray Serve application successfully, we can query it. We have a query script; I'll send it a request, tell it to use the sub-question query engine, and ask it to compare and contrast how the Ray docs and the Ray blog posts present Ray Serve. This generates two sub-questions: it first asks the Ray docs engine how the Ray documentation presents Ray Serve and gets a response, then it asks the blogs engine how the Ray blogs present Ray Serve and gets a response, and finally it synthesizes the two into an overall answer to the question that was asked. It has finished with the first sub-question, now it's doing the second one, and finally it synthesizes the response. As you can see, we got the overall response as well as the two sub-questions that were asked.

In conclusion, there are four main challenges in building an LLM application over your data: first, being able to index different data sources; second, being able to create complex queries that act over those data sources; third, being able to scale indexing to a large number of documents; and fourth, being able to deploy these production-quality query engines into production. As we saw, LlamaIndex provides the abstractions to solve the first two, letting you index different data sources and build query engines that handle complex queries, while Ray, and in particular Ray Datasets, can help scale out the ingestion pipeline, and Ray Serve can help deploy your LLM application in production. Thanks so much for coming. If you want to learn more, please check out the blog, and if you want to learn more about LlamaIndex, please check out our GitHub and docs.
Info
Channel: Anyscale
Views: 8,831
Id: Vd_8lS1iDBg
Length: 10min 19sec (619 seconds)
Published: Mon Jun 26 2023