Langchain JS | How to Use GPT-3, GPT-4 to Reference your own Data | OpenAI Embeddings Intro

Captions
Hey everyone, welcome to StarMorph, where we talk about artificial intelligence and web development. Today we are going to go over how to train GPT on your own data. We'll start with why you want to train, show a demo of what training can do, and then talk about different frameworks and strategies for training GPT on your data. This is something StarMorph has been working on a lot. A lot of people are really excited about GPT-4 and want to start integrating it into their business, but ChatGPT doesn't know everything about your business, so we need to teach it. So let's get started.

All right, first I'm going to demo why we want to train. I'm going to show you a new bot that I just trained on the Next.js 13 documentation website. I took this whole website, compiled it, and gave it to my GPT bot, so it's now trained on this documentation. Because of that, we can get some really great information about how Next.js 13 works and basically have a great programming tutor.

Before we go into this tool, let's see what untrained GPT can tell us. If we ask even GPT-4, "What are the five most important concepts introduced in Next.js 13?", it is going to tell us that its knowledge only goes up to 2021 and that it cannot tell us about Next.js 13 because it came out after that, so it can't really help us on this topic. Now if we go over to the trained bot and ask, "What are the five most important concepts introduced in Next.js 13?"
Now you can see that this is actually giving us real information about Next.js 13, and it's accurate to the documentation: these are the same main concepts the docs list. We can also dive deeper and say, "Give me a code example of the HTTP cache update in Next.js 13," and the bot goes ahead and gives us Next.js 13 code.

So this is why you want to train. If you want to use GPT in your business, you want it to give answers like this one, not like the untrained answer. General GPT has great general intelligence, but when it comes to specific tasks, we get a much more refined result from the trained bot. If you need help doing this, reach out to StarMorph. As you can see, we're training these bots, and it's something I'm really having fun experimenting with. The tools we're about to go into are developing so fast, and there is so much software about to come out with some of these integrations. It's crazy.

Okay, so let's talk about the tools you can use to train GPT on your data. The first tool I want to talk about, and probably my favorite tool in web development right now, is called LangChain. They are really pushing the limits of what's possible with LLMs in the browser and in web apps, and I'm very excited about all the new integrations they're coming out with. I can't wait until I have built what I have in my mind right now using this framework; the things that are possible are crazy.

So let's jump into this a little more: what can you do with LangChain, what's in the documentation, and how do we use it? First off, LangChain has JavaScript documentation, which is right here, and they also have Python documentation, depending on what tools you're using to build with LLMs. What LangChain allows us to do is train GPT over specific documents. They have a few options, but
you can use OpenAI's embeddings API to take a document and convert it into embeddings, which are vectors of numbers that capture what each piece of text means. Those vectors are kept in vector storage, and when you ask a question, the most relevant pieces of the document are sent to the LLM along with it, so it has information about the specific document you're training it on. So this provides us the toolkit both to load documents in and to convert them into the kind of storage we need, and we can go a little deeper into both of those. Some of you may never have heard of an embedding before, and some of you may have already been coding your own models, so I'll aim for the middle when explaining what that means.

So let's go into how we can load a document. We can see that there are a lot of different file types we're able to load into our web app and train GPT on, and this is an example of how to load a document by providing its path; we'll jump into the code a little more in a minute. There are CSV files, even PDF files, and imagine once GPT-4 is integrated with this and you can start to load images. This is really pushing the limit. They even have an integration to load stuff from Hacker News, so you can actually point it at a URL and load things in.

Now let's talk a little more about the storage. As I said, an embedding is a compiled, numeric version of the data you're training on, and OpenAI has an API for creating embeddings that we can use together with GPT-3 and GPT-4. This framework is kind of a wrapper around OpenAI's embeddings toolkit, and it makes it really easy to create embeddings from your own document. Then there are different options for embedding databases, that is, the vector storage for the embeddings. Pinecone is a really popular one right now, so we have options for how we create our embeddings and how we store them. I'll definitely be making future videos about this, because it's very important for training that we
have frameworks for embeddings. Chroma, I think, is looking really cool, and I'm looking forward to using it as well. I'll make a future video on it, but if you're trying to go deep into embeddings, I would recommend checking it out. I think that's a general, basic overview of some of the stuff in the LangChain framework.

Now I'm going to jump into what this looks like in the code for this bot. Right here is where we have the loader. You can see this is a text loader, so if we had a PDF instead, we could use the PDF loader, or I believe a JSON loader; depending on what file you want to load, you can update that, and you would also need to change the path you're loading it from. Then we split the document into chunks, and then we create the embedding vector store file. You can see we're using the OpenAI embeddings API and creating this HNSWLib vector store file from the embeddings. Basically, we compile our document into embeddings ahead of time, and when we deploy our app, it can search those embeddings for the chunks relevant to a question and send them to the model, so we get our trained response back.

So that's a quick intro to LangChain. As I continue to build with this tool, we could do a more comprehensive coding exercise building out some LangChain code.

Now let's go into OpenAI's embeddings as well, because this is another tool available for training GPT on your data. Again, this is what LangChain is integrating, but it's also a great resource in itself about embeddings: how OpenAI lets you create embeddings and then use them with its API to get responses back from GPT-3. So another option, rather than using LangChain, is to go directly through the OpenAI API and create your embeddings with it. But I think
LangChain is really bringing some amazing tools and making a lot of different functionality really user-friendly.

By the way, one of the features I'm super excited about, which I actually don't see in here yet because they just announced it, is an integration with Zapier. Maybe it's only in the Python version right now, since it literally came out yesterday. They're building with Zapier because Zapier is coming out with a new natural-language API, and this is going to be crazy: imagine telling a chatbot "send a LinkedIn message" or "send a Slack message" followed by your prompt, connecting it to Gmail, Google Docs, anything, with the chatbot able to perform all of those actions. It's starting to feel like a super AI that can do all of your tasks from one place. So this connection with LangChain is going to be absolutely huge, and I can't wait to build something with it. I just wanted to mention it because it's a really exciting integration in LangChain.

I hope that was a helpful introduction to the tools available to train GPT-3 on your data. If you'd like any help working with LangChain or want to collaborate on a project, reach out to StarMorph; we would love to chat with you about your LangChain project or about training GPT models. Thank you for watching, and let me know if there's anything else you'd like me to go over on this topic. We can make a more comprehensive coding video in LangChain or train a bot from start to finish, but I wanted to give an intro to some of these training concepts and how we can get started building GPT tools that give us super concise answers. Again, thank you for watching, and I'll see you in the next video.
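The walkthrough describes a pipeline: load a document, split it into chunks, turn the chunks into embeddings, keep them in a vector store, and at question time put the closest chunks into the prompt sent to GPT. The sketch below reproduces that flow in plain TypeScript with no dependencies, under stated assumptions. It is not LangChain's API: `splitIntoChunks`, `makeToyEmbedder`, `InMemoryVectorStore`, and `buildPrompt` are hypothetical names, and the word-count "embedding" is a stand-in for OpenAI's embeddings API so the example runs offline. The brute-force search stands in for a store such as HNSWLib, Pinecone, or Chroma. In this approach the model's weights do not change; the relevant text is simply placed in the prompt.

```typescript
// A dependency-free sketch of the flow described in the video:
// split -> embed -> store -> search -> build prompt.
// ASSUMPTION: every name below is hypothetical; this is not LangChain's API.

// Split text into fixed-size character chunks. Consecutive chunks share
// `chunkOverlap` characters, so text cut at a boundary also starts the next chunk.
function splitIntoChunks(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - chunkOverlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// ASSUMPTION: toy stand-in for OpenAI's embeddings API. It counts how often each
// vocabulary word occurs, so texts sharing words get similar vectors. Real
// embeddings capture meaning, not just shared words.
function makeToyEmbedder(vocabulary: string[]): (text: string) => number[] {
  const index = new Map<string, number>(vocabulary.map((word, i): [string, number] => [word, i]));
  return (text: string): number[] => {
    const vector = new Array<number>(vocabulary.length).fill(0);
    for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
      const i = index.get(word);
      if (i !== undefined) vector[i] += 1;
    }
    return vector;
  };
}

// Cosine similarity of two vectors: 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// In-memory vector store that scans every entry. HNSWLib, Pinecone, or Chroma
// would play this role in a real app, using an index instead of a full scan.
class InMemoryVectorStore {
  private readonly entries: { text: string; vector: number[] }[] = [];
  private readonly embed: (text: string) => number[];

  constructor(embed: (text: string) => number[]) {
    this.embed = embed;
  }

  add(texts: string[]): void {
    for (const text of texts) this.entries.push({ text, vector: this.embed(text) });
  }

  // Return the k stored texts whose vectors are most similar to the query's.
  search(query: string, k: number): string[] {
    const queryVector = this.embed(query);
    return this.entries
      .map((e) => ({ text: e.text, score: cosineSimilarity(queryVector, e.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((r) => r.text);
  }
}

// Put the retrieved chunks into the prompt. The model only sees this text;
// the vectors are used for searching and are not sent to GPT.
function buildPrompt(question: string, contextChunks: string[]): string {
  const context = contextChunks.map((chunk, i) => `[${i + 1}] ${chunk.trim()}`).join("\n");
  return [
    "Answer the question using only the context below.",
    "If the answer is not in the context, say you don't know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Usage: index a tiny two-sentence "document", then build a prompt for a question.
const doc = "Server components render on the server. Bananas are rich in potassium.";
const chunks = splitIntoChunks(doc, 40, 5);
const store = new InMemoryVectorStore(
  makeToyEmbedder(["server", "components", "render", "bananas", "potassium"]),
);
store.add(chunks);
const question = "How do server components render?";
const prompt = buildPrompt(question, store.search(question, 1));
console.log(prompt); // this string is what would be sent to the chat model
```

Replacing the toy embedder with real embeddings and sending `prompt` to a chat model is roughly the work the LangChain setup in the video takes care of. The chunk size and overlap here are arbitrary illustration values, not recommendations.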
Info
Channel: StarMorph AI
Views: 118,789
Keywords: langchain, langchain javascript, langchain demo, gpt-3, gpt-4, openai embeddings, openai, vectorstorage, chroma, train gpt on documents, train gpt on your data, train openai, train openai model, train openai with own data, langchain tutorial, how to train gpt-3, embeddings, hnswlib, zapier natural language, langchain ai, gpt 3, langchain js
Id: veV2I-NEjaM
Length: 11min 19sec (679 seconds)
Published: Fri Mar 17 2023