AnythingLLM | The easiest way to chat with your documents using AI | Open Source!

Captions
Hey there, my name is Timothy Carambat, and the tool I'm going to show you today, AnythingLLM, is going to change the way you set up and chat with any kind of document using an LLM. This could be PDFs, Word docs, anything. AnythingLLM is the easiest way to unlock that ability without technical overhead or a huge amount of RAM on your machine; the tool can run passively in the background of your computer and you won't even notice it's there.

With that said, let's talk about how it works. A quick overview: this tool is open source and MIT-licensed, so you can do whatever you want with it, mess around with it, it doesn't matter. It primarily runs on two commercially available, very easy-to-get services that cost almost nothing: the OpenAI API, which will cost you very little to run this tool, and a free pinecone.io index for vectorizing and the vector database. Both are hosted in the cloud, so you don't have to worry about them, they don't take up any space, and the tool manages them for you. You just put in the API keys; it's super simple.

Now, I want to take a second to mention projects you've probably heard of, like Pinecone, vector DBs, PrivateGPT, and LocalGPT. Those are all very good reference points stepping into this project, so let me highlight the differences between AnythingLLM and some of these better-known projects. PrivateGPT is great and works in a very similar fashion, but it is only a command-line tool, so you don't get a nice user interface like AnythingLLM's. It also requires you to run a local LLM on your machine, and if you're like me, you probably don't have a super powerful machine. LocalGPT, which was inspired by PrivateGPT, also requires you to run a local language model, which of course requires lots of RAM. It does run on your CPU, but with a lot of overhead; you're not really running it in the background of your computer. And then of course there's GPT4All. All the projects I just talked about share one similarity: they all run a private LLM on your machine. That's great, but I wanted to build a tool that is more accessible, and frankly more powerful, by just using the services we already know how to use, and that can run in the background. That's my main benefit here, and that is why I wanted to showcase AnythingLLM today. It's on GitHub, it's MIT-licensed, and the link will be in the description of this video.

If you are a GitHub user, you probably already have the three things you need installed: access to a terminal (if you're on a Mac, that's already there), Python 3.8 or newer if you're going to do any kind of data collection (which I'll showcase in a little bit), and Yarn or npm with Node.js. Those are the only three things you need.

First, let's start with what I'm looking at here: the collector folder. The collector folder is just utilities and scripts; it makes your life easy, so let me show what it does. Here we are in the collector folder of the project. I've pulled it in from GitHub and installed the proper dependencies (this is all in the README of this folder). There are two main things the collector utility can do, and keep in mind this is a utility; it's just supposed to help you collect data. The first script is called, as you would expect, main, and it can do a couple of things right out of the box. It is by far the easiest way to collect information. You can collect an entire YouTube channel, and it will grab the transcriptions for you automatically. For Substack, give it the author and it will find and collect all of their publicly readable articles (this won't work for subscription-based articles). Same case with Medium: find a publisher, put in their link, and it will collect all of their recent public posts. For article or blog links, if you come across a random article or blog you want to pull in, give it the link and it will automatically pull that data in. And then of course there's GitBook: for those of you looking to automate or simplify documentation, this will scrape an entire GitBook, regardless of what it is, vectorize it, and get that information ready to be processed so you can just chat with it.

That said, I did promise you can do this with local files, so let me show you how that's done. The second script is called watch, and the way it works is that there is a directory in this repository called hotdir, the "hot directory". When you run this script, any file you drop in there will automatically get detected and processed, converted into something ready to be consumed by a large language model like ChatGPT. The UI, which I'm about to show you next, will actually handle all of that for you.

So the next step is: how do I boot up the actual app, the thing that can run in the background of your computer without a second thought? The first step is starting the server, which is just `yarn dev:server`. A couple of things to know about this: the database is SQLite, stored locally on your machine, so this isn't talking to a third-party database and you have complete control over it. You can see that it opens a port on 5000; this is a Node app using Express.js, if you cared to wonder.

Before we look at the frontend, let me show you some screenshots just to warm things up. This is what the home screen looks like: a simple chat interface. From here you can find and locate documents you have collected using the main.py or watch.py collector scripts, and then of course you can chat, because what's the point of all of this if you can't chat? There's also a way to set up and verify that you have all the proper environment variables installed. There's a file in the repo, and again the README tells you exactly how to set this up. You only need two keys, an OpenAI key and a Pinecone DB key; that's it. Everything else is just your specifications. In this image I'm using GPT-3.5 Turbo; if you have access to GPT-4 or GPT-4 8K, whatever you want, throw it in there. And I have a free Pinecone index running already. All I did was sign up (I don't even think I had to put down a credit card), make an index, and call it "socials to chatbot", or whatever name I was working with at the time. If you look, I have no vectors right now. Keep in mind AnythingLLM will manage this for you; you just need to get the key, and it will do everything else. That's how simple it is.

So now let's boot up the frontend. You just run `yarn dev:frontend`, and here we are, we have our UI. (Of course, if you'd like to open an issue on GitHub because something isn't supported or you encountered a bug, feel free.) The first thing I want to drive home is that AnythingLLM works by essentially containerizing documents. The way this works is that a workspace has access to documents, and multiple workspaces can share the same documents but not talk to each other. I'll showcase that, because it will make more sense when we're actually doing it. Let's create a workspace; I'll just call this one "workspace one", but you can call it anything you like. So here we are in workspace one. We have no chat history, so we'll open the settings icon, where we are greeted with all of the documents I have collected: some YouTube channels, some Substacks, some Mediums, and some custom documents. The custom documents came from the watch.py script I mentioned earlier in the video.

There are a couple of documents in here, but there's one called "certificate of incorporation". This is a legal document that you submit to the state you're registering in when you create a business, and there's a lot of important information on it, like how many shares are in your company. You'll notice there's a lightning-bolt symbol next to it, and that's because AnythingLLM efficiently caches all of the information you vectorize, which, in fancy words, means we save you a ton of money. The reason we do that is that if you have one massive document, say an entire textbook because you're trying to pass Econ 480 or whatever, you can vectorize the entire textbook once and then have multiple workspaces share that textbook without vectorizing it each time. Imagine a book is five thousand pages; there are a lot of words in those pages, and you essentially have to pay to embed each one, so this saves you a lot of money. Once you embed it, it's embedded and you're good to go. Of course, you can manage this cache and delete entries if you have to update a document or anything like that.

So let's just do something simple and vectorize my articles of incorporation. To show you that it works, let's first ask for a piece of information that's on this document, something ChatGPT by default probably has no idea about. Here I asked: what are the total amount of shares of Ramp Labs Inc? That's what the articles of incorporation cover, and it responds, essentially, "What are you talking about? I have no idea what Ramp Labs Inc is." Okay, let's change that. I just click on the file itself; we know it's already cached, so this won't cost me anything, and even if it did, it would be less than a penny for this 16-page document. Now the document has been embedded and we're back to chatting. Let's copy-paste the same exact question; it should now be able to reference that document and find the information. As you can see, there are 15 million authorized shares, 14 million common and 1 million preferred. That's interesting to know, but what's more interesting is the citation. Because this is a local document, we can't link it due to browser security restrictions, but if this were a URL (say a research paper you grabbed from the web, a Medium article, or a YouTube video), clicking the citation would actually open the source. So it doesn't work for local documents, but it does work for public ones.

Of course, this is a work in progress, and the video you're watching right now is v0.0.1, so there's a lot more work to do. But this is by far the simplest way to just talk to your documents. You can run it privately, and it's just using off-the-shelf services, so you don't have to worry about running an LLM; it's so easy, let's just do the easy thing. I hope you enjoyed this demo of AnythingLLM. I'm obviously looking for contributors and would love for you to contribute; just make an issue and we can talk it out. There's also a Discord channel you can join just to chat (it's just me in there, just one channel), so let's chat. Hope you liked it, and have fun.
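The two keys the video mentions (OpenAI and Pinecone) are supplied through an env file in the repo. As a sketch only; the actual variable names are whatever the repo's README specifies, so treat these as placeholders:

```ini
# Hypothetical .env sketch; consult the AnythingLLM README for the real names.
OPEN_AI_KEY=sk-your-key-here
OPEN_MODEL_PREF=gpt-3.5-turbo
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX=socials-to-chatbot
```

Everything else, as the video says, is just your specifications: swap the model preference to GPT-4 if you have access.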
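The watch.py script described in the video is essentially a directory watcher over the hot directory. As an illustration of the idea only (this is not the project's actual code, and the function and parameter names here are mine), a minimal stdlib-only polling watcher might look like:

```python
import time
from pathlib import Path


def watch_hot_dir(hot_dir, process, poll_interval=1.0, max_polls=None):
    """Poll `hot_dir` and call `process(path)` exactly once per new file.

    `process` stands in for AnythingLLM's convert-and-prepare step;
    `max_polls` exists only so the sketch can terminate in a demo.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in Path(hot_dir).iterdir():
            if path.is_file() and path not in seen:
                seen.add(path)
                process(path)  # e.g. extract text, chunk it, queue for embedding
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_interval)
    return seen
```

A real implementation would likely use filesystem events rather than polling, but the contract is the same: drop a file in, it gets picked up once and processed.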
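The money-saving claim behind the embedding cache is just arithmetic: you pay per token embedded, so embedding a big document once instead of once per workspace is a straight multiplier. A rough sketch, where the tokens-per-page figure and the per-token price are illustrative assumptions rather than OpenAI's actual pricing:

```python
def embedding_cost(pages, tokens_per_page=500, usd_per_1k_tokens=0.0001):
    """Rough one-time cost to embed a document (illustrative numbers only)."""
    total_tokens = pages * tokens_per_page
    return total_tokens / 1000 * usd_per_1k_tokens


# A 5,000-page textbook embedded once: 2,500,000 tokens at the assumed rate.
once = embedding_cost(5000)
# Naively re-embedding it in four separate workspaces quadruples the bill;
# with the cache, each additional workspace costs nothing extra.
naive_four_workspaces = 4 * once
```

The point of the lightning-bolt cache marker in the UI is exactly this: the paid embedding call happens once, and every later use is free.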
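The workspace model described above (documents embedded once, shared by many workspaces, with workspaces otherwise isolated from each other) can be sketched like this. All names here are mine, not the project's:

```python
class VectorCache:
    """Embed each document at most once, no matter how many workspaces ask."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}

    def get(self, doc_id, text):
        if doc_id not in self.store:
            # The paid embedding call happens here, and only here.
            self.store[doc_id] = self.embed_fn(text)
        return self.store[doc_id]


class Workspace:
    """Holds references to cached vectors; isolated from other workspaces."""

    def __init__(self, name, cache):
        self.name = name
        self.cache = cache
        self.docs = {}

    def add_document(self, doc_id, text):
        self.docs[doc_id] = self.cache.get(doc_id, text)


# Demo with a fake embedder that counts how often it is actually called.
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text))]  # stand-in for a real embedding vector

cache = VectorCache(fake_embed)
ws1 = Workspace("workspace one", cache)
ws2 = Workspace("econ 480", cache)
ws1.add_document("textbook", "five thousand pages of econ")
ws2.add_document("textbook", "five thousand pages of econ")
# The textbook was embedded exactly once, even though both workspaces use it.
```

Two workspaces share the vectors but keep separate document lists and chat histories, which is the "containerizing documents" behavior the video demonstrates.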
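The chat step in the demo is retrieval: embed the question, find the nearest stored chunks, and hand them to the model along with their sources, which is where the citation comes from. A toy cosine-similarity version, with a fake vector store standing in for Pinecone and hand-made vectors standing in for OpenAI embeddings:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question_vec, chunks, top_k=1):
    """chunks: list of (vector, text, source). Returns the best matches."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c[0]), reverse=True)
    return ranked[:top_k]


# Toy store; in AnythingLLM the vectors live in Pinecone.
chunks = [
    ([1.0, 0.0], "15,000,000 authorized shares", "certificate-of-incorporation.pdf"),
    ([0.0, 1.0], "a chunk about something else", "other-doc.txt"),
]
best = retrieve([0.9, 0.1], chunks)
# best[0][2] is the source shown as the citation in the UI
```

This is why the shares question fails before embedding and succeeds after: until the document is in the store, there is nothing relevant to retrieve and hand to the model.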
Info
Channel: Tim Carambat
Views: 18,799
Id: 0vZ69AIP_hM
Length: 11min 8sec (668 seconds)
Published: Tue Jun 06 2023