LoRA Q&A with Oobabooga! Embeddings or Finetuning?

Captions
Do I want to center you a little better? There we go. I think I should be live now — let me make sure I'm actually live. Hey, no worries, how are you doing, Megan? Alright everybody, today we're doing a live stream for questions and answers around anything to do with LoRAs. I already have Oobabooga set up, and I've pulled down the standard LLaMA 7-billion-parameter model along with a little training set we could go through, but mostly I'd like to give y'all an opportunity to ask any questions you have and see if we can address them. I know some people have already asked some pretty solid questions, so let me pull those up and address each as people come in.

One of the more common questions I've been seeing is: when we have big raw text files, how do we deal with them? I have a couple of data sets like that, in particular a PubMed-style cancer data set — let me see if I can find it quickly. Sorry, I just got off of work; I work two full-time jobs, it's super fun. When I have big raw text dumps, I like to use embeddings for them instead and do semantic search, and one of the big tools we have for that is MTEB, the Massive Text Embedding Benchmark, which we can use to pick whichever embedding model would work best for us.

"I have an i7, 32 GB of DDR4, and an 8 GB GPU — can I train a LoRA?" That's probably going to be a little tight. Right now 4-bit training is still fairly experimental, so even with a 7-billion-parameter model you're probably going to be light on resources. I have a 4080 with, I think, 16 GB, and I find that's a little tight. What I tend to do personally is go to Lambda Labs — they're not expensive at all. Let me pull it up... one second, I don't want to give away my IP addresses. I like the A100 instances there; they're usually available, you can also request instances if you'd like, and they're only about a dollar an hour. For most of you, you're going to be doing three, four, maybe five hours of training initially. They also have a program coming out soon to let people train LLMs for free — thirty days of training — and they have several H100s as well, though there are some tricks I've had to use to get H100s set up correctly. Let me put the Lambda Labs link here for you, and the MTEB link as well.

MTEB is pretty great. It lists a number of embedding models we can use: Instructor XL and E5 are really powerful general embedding models. They are not multilingual to my knowledge, but there are multilingual options — multilingual E5 base is great if you have Spanish, Japanese, or some other language set you need embeddings for on your raw text. So personally, when I see raw text — just massive amounts of it — and I don't have a way to convert it into an instruction, prompt-completion, or Q&A style data set, I really lean towards using embeddings.
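To make that concrete, here is a minimal sketch (my own illustration, not something shown on stream) of embedding a few raw-text chunks with an off-the-shelf model from the MTEB leaderboard. The model name and the "query:"/"passage:" prefixes follow the E5 conventions as I understand them — treat both as assumptions and check the model card before relying on them.

```python
# Minimal embedding sketch using sentence-transformers and an E5 model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-base-v2")

# E5-style models expect "passage: " on documents and "query: " on queries.
docs = [
    "passage: Kaposi sarcoma is a cancer that causes lesions ...",
    "passage: Acute lymphoblastic leukemia is a cancer of the blood ...",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode("query: which cancers cause skin lesions?",
                         normalize_embeddings=True)

# With normalized vectors, a dot product is the cosine similarity.
scores = doc_vecs @ query_vec
print(scores)
```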
Especially with a massive corpus, embeddings let me do semantic search. If y'all aren't familiar with embeddings: this is TensorFlow's Embedding Projector, and it gives a nice picture of what an embedding really is. An embedding is a space of "who am I most similar to" — oh, I'll get right to that, I'm not going to say her name on stream though, sorry, I'll get to it in a second. An embedding space is an n-dimensional Euclidean space where closeness means relatedness: if a group of words or concepts are close to each other, they're related. That's exactly what sentence and phrase embeddings are for — placing very similar things close to each other in some n-dimensional space. If we look here, we have "arrays," "processing," "processor," "discrete," "surface," "surfaces" — similar things packed together, so we know they're related. So if I take a query, embed it, and search my embedding space, I get back whatever is most closely related.

Some really good databases for doing that are Pinecone and Weaviate — Weaviate is, I think, the open-source alternative to Pinecone. What you can do with these is store your embedding vectors in Weaviate or Pinecone and use them for semantic search, and then take the results from the embedding database and hand them to the LLM for summarization.

Let's see if I have an example here — Vault AI is a good example of this, let me drop it in chat. You can take some generic documents and drop them in, so let me upload these medical Q&A documents; it'll take a moment to process. I could break these out into Q&A, but we can also just upload them into a search platform that uses embeddings: it embeds your documents and then you can ask questions of them. Oh, he made it monetized — good choice actually. It's a good little example. Okay, let's try "lymphoblastic leukemia." You do tend to need a fairly large corpus for this to work really well, and since I only have a small one it's not going to work as powerfully. Let me try "leukemia" instead — I'm not sure which embedding he's using, I think it's just OpenAI's. Ah, here we go: it knows that leukemias and lymphoblasts are typically related to cancer. With a larger corpus this would work a little better, but you can dump your documents in and use it for summarization. For example, "Kaposi sarcoma" — it should pick that up since it's already in the embedding space. There we go: Kaposi sarcoma is a cancer that causes lesions. So we take our embeddings, use them for retrieval, and then have the LLM give us a summary of the result. I tend to like that better for big raw text dumps. And in Oobabooga? I've tried, not really. But let me address a question up here from Zora: what does a LoRA actually do?
Let's talk about how a LoRA affects your model. Let me see if there's a good image example — of course an image search for "LoRA attention" isn't going to give me what I want; we get a lot of anime, because LoRA originally started as a modification for Stable Diffusion and then got adapted to LLMs. Let's try "LoRA LLM"... here we go, this is a good example. (Ah, "good boy name" — I like that one.)

What a LoRA is really doing is attaching itself to the feed-forward portion of your attention layer — that's where the "low-rank adaptation" comes in. If we look at this image: we have our pre-trained network, and we don't want to update the weights of that pre-trained network — that would require a lot of GPU compute, a lot of memory, a lot of resources. What we'd like to do instead is approximate that training. So we attach two lower-rank matrices. If you remember the idea of rank decomposition — let me pull up a notebook and get the font size readable — say we have a matrix A with dimensions 2048 × 2048. I can decompose that matrix exactly using singular value decomposition. More to the point here: if a matrix is n × n, you can approximate it with the product of two matrices, B with dimensions n × m and C with dimensions m × n. Remember that for matrix multiplication only the inner dimensions have to match, so B · C comes back out as an n × n matrix that approximates A. The idea behind LoRA is exactly that: if this is our weight matrix and it's n × n, we can approximate an update to it with a pair of much lower-rank matrices, where m can be much, much smaller than n (there's a rough numbers sketch of this just after this answer).

Tone, that's actually a really good question — let me address this first and I'll come back to it. Good boy, the idea is that you have two significantly lower-rank matrices carrying the updates. When you train them, you compute your input through the pre-trained weights, you compute it through your LoRA matrices, and you sum the outputs at the end — you're just nudging the network at that point. Does that make sense, good boy? Is that a good explanation, or should I go into more depth? I don't know what the lag on this stream is, so I'll check back.

Tone — I haven't done a ton with Stable Diffusion, so let me think about that. I'm not sure what "stacked embeddings" means there; can you give me an idea of what "stackable" means in this context? I understand Stable Diffusion, but I haven't worked with it much.
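To put rough numbers on the rank-decomposition idea above, here is a small sketch of my own (not code from the stream): a full 2048 × 2048 update has about 4.2 million trainable values, while a rank-8 pair has only about 33 thousand, yet their product still produces a full-size delta. The sizes and rank are assumed values for illustration.

```python
import numpy as np

n, r = 2048, 8                       # hidden size and LoRA rank (assumed values)
W = np.random.randn(n, n)            # frozen pre-trained weight
B = np.zeros((n, r))                 # low-rank pair; B starts at zero so the
A = np.random.randn(r, n) * 0.01     # initial delta is zero

delta = B @ A                        # same shape as W, built from far fewer parameters
W_effective = W + delta

print("full update params:", W.size)           # 4,194,304
print("LoRA params:       ", B.size + A.size)  # 32,768
```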
My background, and what I work in, is mostly semantic search and LLMs, not so much GANs and generative image networks, so if you can give me a deeper idea of what you mean by "stackable," tone, I can address it more directly.

Good boy — what you'll find is that having more than just raw text opens things up. For example, my medical data set is already in a form where I can break it into a Q&A data set, and I like to filter out the junk first.

On stacking LoRAs — that's an interesting question. I haven't seen that in Oobabooga, and I haven't used it in practice; I actually wasn't aware it was a thing. I don't see why you couldn't in theory: if you had two LoRAs, you apply both and sum the results, and I guess that could work. I haven't personally seen anyone do it — typically what we've been doing in practice is training individual LoRAs. But I guess it makes sense for images: if you want to fuse two artists' styles together, you'd train one LoRA for one artist's style and a second LoRA for the other, and fuse those LoRAs together to get a blend of the styles. I'm not sure how to extend that to text. If you had one author's style and wanted to blend it with another's — say you do a raw text dump of Edgar Allan Poe and another of Stephen King — perhaps you could blend those and get a style between the two, but I'm not sure it would work quite as well (a rough sketch of what "blending" might even mean is below). I'd be curious whether you could use that to build different roles.

"Is it much cheaper to run training only on the approximation?" Yes — this is all approximate. The whole idea of a low-rank matrix is that you're using smaller dimensions to train an update to what the output of the original feed-forward would have been. "Maybe a separate LoRA per output format?" Oh, that's interesting — in theory, if you had several attention heads you could attach LoRAs to different portions. That would be interesting.

Hey Aaron. Dustin — "if I want to make a chatbot for my digital art school, trained on a bunch of software documentation, would it be..." — what's your end result, what's your end goal? Are you thinking you want a searchable set of documentation, or do you want to generatively create that documentation or summarize it? Do you have a narrower use case in mind?

Good boy, the one thing I'd encourage you to experiment with a little more is what you could do with a better-formatted data set. You already have some pretty powerful results from raw text; imagine formatting that into Q&A. There are a few ways to approach it — you can leverage another LLM to do the summarization or the reformatting into Q&A for you, because I assume you don't have an army of people to do that formatting by hand.

Dustin, I would use embeddings.
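Here is a purely illustrative sketch of the "blend two LoRAs" idea discussed above — summing weighted low-rank deltas from two separately trained LoRAs onto the same frozen weight. Whether this actually produces a useful blended style in text is exactly the open question raised here; the sizes, weights, and random matrices are all placeholders.

```python
import numpy as np

n, r = 1024, 8
W_frozen = np.random.randn(n, n)

# Pretend these came from two separately trained LoRAs (e.g. two authors' styles).
B_poe,  A_poe  = np.random.randn(n, r) * 0.01, np.random.randn(r, n) * 0.01
B_king, A_king = np.random.randn(n, r) * 0.01, np.random.randn(r, n) * 0.01

w_poe, w_king = 0.5, 0.5            # mixing weights -- an assumption
W_blended = W_frozen + w_poe * (B_poe @ A_poe) + w_king * (B_king @ A_king)
```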
When I want to do semantic search, the problem isn't trying to influence the network's behavior — I want to be able to do search and summarization — so I tend to lean towards embeddings for that, and Instructor XL is really solid. That's especially true if your documentation is prose; if it's actual code, there should be code embeddings for that as well. We could look through the MTEB leaderboard and figure out which of these embedders would work best for you, but if it's literally just documentation, Instructor XL and E5 tend to be very good. In industry right now — at least where I work — we use Instructor XL and it does very well: it's quite performant, it's pretty fast, it's really easy to scale, and I tend to like the results. I've been wanting to try out E5, and I was going to see later this week how well it performs on our corpus.

I want to go back to pedor's point. When you're training LoRAs you still track your loss, and there are various ways to validate and help the network along as it learns. Are you familiar with RLHF — reinforcement learning from human feedback? That's where humans give inputs that help drive the signal used to train and update the network's behavior. There are many, many ways to approach this and there's no exact science right now: there are techniques like PPO, there are techniques that build a reward signal straight from that human input, several different ways.

Oh, I'm not going to delete the video, I promise — even if it seems like a giant disorganized mess. I'm sorry, everyone, if it is a little disorganized; this is my first live stream where I'm trying to do this, so I want to feel out what's suboptimal and how we could improve it in the future, and if y'all have feedback I would love to hear it. I also kind of like the shotgun of questions and just trying to answer them. I hope that answers your questions, pedor and Dustin.

I feel like embeddings get neglected a lot when we start talking about fine-tuning. The thing I've found personally when I talk to other engineers about this — and I want to give you my background, not as an appeal to authority, but because I've been doing this for a long time. Yes, absolutely, Aaron. My background is natural language processing; I've been working in it for almost 15 years now, so I was in it long before transformers were a thing. I was working with rules engines, and let me tell you, if you want to talk about a pain in the ass — well, I guess there goes monetization on this video — rules engines are the worst. You have to write manual rules for every single concept, so if you have a 30- or 35,000-concept taxonomy or ontology, you have to write hundreds of lines of rules for each of them. It's inefficient, it takes forever, it's just not scalable. But we used to do it, and then we started getting more and more powerful machine-learning tools, which came in the form of things like word2vec and other embedding techniques.
I think one thing that gets lost when we start talking about fine-tuning is that people want to fine-tune everything, and sometimes it's not the right solution, at least in my opinion. I like to think about what I have and what I'm wanting. If my question is "I want to be able to search documents," I tend to think of that as an embedding problem, because I want to know who is similar to whom — I'm not asking the model to generate something for me, I want to dig into and through my corpus. If that's your problem, fine-tuning, in my mind, is not your answer. Where fine-tuning is your answer is when you want the model to be creative, expressive, generative, helpful, and instructive — so if you want a model that teaches you how to cook or generates recipes for you, that's where I'd fine-tune.

Okay, so what happens inside a LoRA? That's the cool part, and we went over it a little earlier, so I'll speed-run it. If we pull up this image: you have your pre-trained weights, and you want to hold them fixed to save on resources — loading and updating this whole d × d matrix is going to be huge, thousands of dimensions, so it takes a lot of RAM and a lot of resources to train. What would be better is a rank decomposition that approximates training that matrix: you take two smaller-rank matrices, append them, train those, and nudge the behavior of the attention layer.

Remember, in attention you have three things: your query, your key, and your value. I'm sure you've all seen these three inputs when looking at an attention layer. The query you can think of as the question you've asked — what you're asking the model to complete. The key is then some set of metadata about the possible next completions, and the value is the actual outputs themselves. How attention conceptually works — and I'm skipping some of the magic here because I don't think it's as conceptually important — is that it's trying to emulate how you'd reason. If I take my AirPods, for example, I use context clues to understand the object I'm looking at: my query to myself is "what am I looking at?", my keys are context clues about the object — it's got an oblate shape, it has some weight, it's white, it opens and closes — and the values are my possible answers. I like silly examples: is it a waffle? No, it's AirPods, because that's the most logically sound completion my brain creates for the query "what is this?" With a LoRA, the output from that attention is nudged: you add the result of your original input through the frozen weights plus the input through the LoRA, and now you get slightly different probabilities for the next values, based on how you've trained it. I hope that makes sense.

So, Aaron — what are the differences between embeddings and LoRAs? LoRAs are what we just went over: training those two low-rank matrices to adjust the network. An embedding is a space — an n-dimensional space, anywhere from one, two, three dimensions all the way up to 512 or 1024, as big as you want — where you're trying to express similarity, who is similar to whom.
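To make the "nudging" idea from the LoRA answer above concrete, here is a minimal toy module of my own — not the implementation from any particular library: the frozen linear layer's output plus a scaled low-rank correction, where only the two small matrices receive gradients. The rank and scaling values are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # original path + low-rank path; only A and B are trained
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))                 # same shape as the base layer's output
```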
A good example that all of us could reason about: cars. If we wanted to group cars by their similarity, there are several ways to look at the problem — you have sedans, you have trucks, you have makes, models, engines — and you can consider all of that at once, because you can put things into really high-dimensional space and look at multiple dimensions at the same time. That's what an embedding is for: it stores all of those representative dimensions of the thing you're trying to compare and makes the space searchable. Let me go ahead and run t-SNE here, and you'll see how well things actually cluster in the space. It's really cool — I've always liked t-SNE, and we use it for all kinds of stuff; I've used it for neurology. What you'll see over time is that things that are similar smash and cluster together. Let me throw this onto a sphere instead, since that tends to look a little better — oh, I guess it was already spherized — okay, let it run. (This is the Projector from TensorFlow, by the way — a really great way of visualizing what embeddings are. Super cool program.)

So how do you apply an embedding? Great question. You have different tools you can use, but the basic workflow is as follows. Step one: embed your data. There are a few considerations to think about here: embeddings do have token limits, so you need to be aware of your limit, and you're probably going to chunk your data — if you have a thousand-token limit, you'll want to chunk and keep some sub-IDs. Then you embed the chunks and upload them to a vector database. You have a ton of choices here: Postgres has a vector extension, you have Weaviate, you have Pinecone, and all of them are good. Some scale better than others, some are free, some are paid, some are a pain to set up like Postgres — but Postgres works great once you get it working. Then finally you have your query: you take whatever your user is asking — for example, with a corpus of cars you could ask "what cars have good four-cylinder engines with 200 horsepower?" — the embedding space figures out which entries are closest to that, you get those results back, you feed them to the LLM to summarize, and the LLM gives you the summary: "the Toyota whatever has a 200-horsepower four-cylinder," and so on. (There's a little end-to-end sketch of this workflow below.)

I guess you could use Redis for this. I haven't used Redis as much in real practice for it — I've used Redis for other things, mostly event-driven systems, where I kind of treat it like a less powerful MQTT, and maybe that's not the best use of Redis — but I suppose you could, especially if you have some kind of event-driven behavior going on around your LLM.
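Here is a hedged, end-to-end sketch of the workflow just described (chunk → embed → store → query), using a plain in-memory store instead of Pinecone or Weaviate so the shape of the pipeline stays visible. The model name, chunk size, and the "cars.txt" corpus are all placeholders I've made up for illustration.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Stand-in for whichever MTEB model you pick (Instructor XL, E5, ...).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def chunk(text: str, max_words: int = 200):
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

corpus = open("cars.txt").read()           # hypothetical raw-text corpus
chunks = chunk(corpus)
vectors = model.encode(chunks, normalize_embeddings=True)   # our stand-in "vector DB"

def search(query: str, k: int = 3):
    q = model.encode(query, normalize_embeddings=True)
    top = np.argsort(vectors @ q)[::-1][:k]                  # cosine similarity, top-k
    return [chunks[i] for i in top]

context = search("four-cylinder engines around 200 horsepower")
# `context` would then be pasted into the LLM prompt for summarization.
```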
One of the interesting use cases we've been working on: let's say you have a medical claim and that claim gets denied. We know how to fight some of those denials — you can file an appeal — and what would be really nice is having an appeal ready at the time of rejection. So you embed a whole bunch of appeals, you fine-tune your model on those appeals so it learns what an appeal looks like, and then you use the embeddings as a corpus it can draw on to make the appeal specific to that claim. You get some pretty good behavior out of it. For instance, take a neurostimulator — I always forget the name; it's where they put a little chip in your head that can zap you when, say, your cortisol is getting too high. It's a very commonly denied procedure, there are a hundred different reasons why you'd have it done, and the appeals process differs based on the rationale behind the denial. So we train our model to know how to write the appeal, we give it the specifics of the claim that was denied, it pulls down the relevant context, and it writes an appeal for us. It saves so much time, it really does.

Oh, interesting — Derek, you're not familiar with Weaviate and Pinecone? Here, I'll drop some links. Pinecone and Weaviate are vector DBs — vector databases purely designed to store vector-like data.

Absolutely, Aaron — that's what I think of as the power of embeddings. In the most naive setting you think of them as semantic search, but you can really take it beyond that. How I think of fine-tuning personally — and I'm sure there are people who would disagree with me, and they're probably just as right — is as a behavioral thing: I want to make the network behave in a certain way, make it a domain expert at doing X, and then I provide it a corpus of knowledge. (Yeah, Chroma DB is really good too — thank you, Eddie. Chroma is excellent.) So I can give it the instruction "you are a medical claims expert, you will write appeals," then give it a mountain of example appeals, and on top of that give it context: I look through my corpus, pull out of the embeddings examples of appeals I have successfully written, and it then writes really powerful appeals without very much tweaking after that. There are a couple of embedding models that worked well for this — Instructor kind of didn't work super great; I think we used Bloom, or it might have been E5, I'd have to verify... no, I'm sorry, we actually had to use Ada for that one. I tend to want to avoid Ada — it's not that I'm anti-OpenAI, it's that I like having a lot of control over my data, so I tend to avoid OpenAI when I can, because I want very explicit control over my data.
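Here is a hypothetical sketch of the appeals pattern described above: an instruction, a few retrieved example appeals from the embedding store, and the specifics of the denied claim, assembled into one prompt for the fine-tuned model. It reuses the made-up `search` helper from the earlier pipeline sketch; the prompt wording and claim text are placeholders, not anything from the actual system.

```python
def build_appeal_prompt(denied_claim: str, retrieved_examples: list[str]) -> str:
    # Stitch retrieved appeals into few-shot context ahead of the new claim.
    examples = "\n\n".join(f"Example appeal:\n{e}" for e in retrieved_examples)
    return (
        "You are a medical claims expert. Write an appeal for the denied claim "
        "below, following the style of the example appeals.\n\n"
        f"{examples}\n\n"
        f"Denied claim:\n{denied_claim}\n\nAppeal:"
    )

prompt = build_appeal_prompt(
    denied_claim="Neurostimulator implant denied as 'not medically necessary'.",
    retrieved_examples=search("neurostimulator denial appeal", k=3),
)
# `prompt` would then be sent to the fine-tuned LLM for generation.
```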
I want explicit control over my data: how it's processed, what machines are processing it, and who sees it and who doesn't. But if you want to use OpenAI, there's also nothing wrong with that — I think it's fine. Yeah, Derek, I feel you on that. I'm okay with Pinecone; sometimes it's nice to just have it scale and not have to think about it. Oh, that's interesting, pedor — it depends; with appeals, the data is very sensitive. GPIO pins? I'm not sure what you mean by that, text churches — are you thinking about having this act as an agent that can light up an LCD screen or something? Hey Otto — Otto is one of my good colleagues, actually; he's very good at this too. I called you out, Otto, I'm sorry.

pedor — you could use reinforcement learning with human feedback here, and you probably should, because you really want to teach the network what good behavior and bad behavior look like in this case, especially with this medical appeals process. You're talking about people having to pay 50, 100, 200 thousand dollars versus just not getting covered, and we have people with crippling disorders — obsessive-compulsive disorder, Tourette's — where some of these devices can be genuinely life-changing.

I still don't quite understand where you're wanting to go with the device question — do you want this to act as an agent, to control some external device? You could do that; there are agents that do that, but I'm not as prepared for it today because I'm really focusing on LoRAs. Derek, I tend to be very private myself. Otto and I were actually discussing something similar yesterday: what's the notion of agency with the devices in your home, and what could you do if you could use an LLM to control stuff? There are, I think, a lot of possibilities. Oh no, no, there's nothing wrong with that, text churches, you did nothing wrong. But imagine if you could give your toaster a personality — I don't know how that would feel.

Correct, Peter, yeah, absolutely — you would choose your validation scheme based on a handful of things, mostly the criticality of your performance. I can give a really good example: imagine you're an attorney and you're using your LLM to research relevant case law. You probably want that model to work very well, and you want to know that every single prompt and response was validated by a domain expert — by another attorney — so that's where you'd definitely have someone come in and do RLHF, with human feedback; you probably don't want to trust some automatically generated validation set. But let's say you're making a cookbook: probably nobody gets put in jail for 30 years over a cookbook, so that's probably okay to validate in a more automated way.

I did hear about QLoRA and it is very exciting — I will be doing a video on QLoRA this week.

"How do I get data into one of these, like Pinecone? How would I embed it?" Okay: you pick one of the models from MTEB — and I'm going to update the description after this and drop all of these links in, so they'll all be there ready to go.
You pick one of those models, run your data through it, and its output is your embedding vector. What those embeddings look like depends on what type of embedding you're using: word versus sentence embeddings. A word embedding operates on your input token by token. If I write "I have a doggy," that's probably going to break out into about five tokens. Those tokens get converted into their token IDs — I'm just making numbers up here, so don't take them very seriously — and then each of those is embedded individually. So if I run this through a word embedder, I get a vector for every token: some n-dimensional vector, and if the vector size is four, I get five four-dimensional vectors back. A sentence embedding goes through the same process, but instead of one vector per token you get back a single embedding for the whole sentence. And these are going to be floats, by the way, not ints — here, I'll put a decimal on a couple of these so it doesn't look like you get integers back.

Then you store those vectors in a vector DB, and you can search the DB for vectors that are close. There are a few different methodologies for "close": Euclidean distance, which is basically the straight-line distance built from the component-wise differences — x₁ minus x₂, y₁ minus y₂, z₁ minus z₂, out to however many dimensions you have — and cosine similarity, which considers things a little more subtly than plain Euclidean distance by comparing the angle between the vectors (there's a quick sketch of both just below).

Aaron: "Would it make sense to train a LoRA on something like MATLAB code? All the LLMs write Python code but seem to be limited with MATLAB. Could I take the MATLAB help files, format them, and make a LoRA?" I'm not sure I fully understand — could you give me an example use case, Aaron? Are you thinking of training something specifically for writing MATLAB code, or do you want to do something with MATLAB code? I just want to make sure I understand the question.

Oh, thank you, Dustin, I appreciate that — I hope it helps people. I enjoy doing this kind of stuff. I kind of wanted to be a professor, but I just like actually applying things, and reading people's research and learning about it. I'm a really good application engineer and a terrible research engineer, is what I discovered — my PhD was in physics and my professor would always tell me, "You are such a good engineer, but you are such a bad researcher." I did eventually recover emotionally from that.
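Here is the promised small sketch of the two similarity measures mentioned above, on the kind of float vectors an embedder returns; the numbers are made up.

```python
import numpy as np

a = np.array([0.12, -0.40, 0.88, 0.05])
b = np.array([0.10, -0.35, 0.90, 0.00])

euclidean = np.linalg.norm(a - b)                            # straight-line distance
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))   # angle-based similarity

print(euclidean, cosine)
```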
The main thing is, when we're trying to figure out how to train our model, there are a few very important questions to ask ourselves — we went over this in the other video too. One: which LLM works best for you, which model is best aligned with your task? That's the first step. And two: are we trying to embed, or are we wanting to fine-tune? The way to make that determination from a trivial perspective — if you just want to lay a few cards on the table and check them yes or no — is: if you want semantic search and summarization, those kinds of problems, you have an embedding problem, not a fine-tuning problem. But if you want instruction, creativity, expertise, and so forth, now you have a fine-tuning problem, because there you want to influence the model's behavior.

Yeah — MATLAB gets a bad rap. I've used it, and I think it's okay for rapid prototyping, but a lot of the LLMs just don't write good MATLAB code. You might actually try asking for Octave instead, since Octave is the free version of MATLAB and I think the syntax is nearly identical; I have a feeling LLMs do better if you ask for Octave simply because there are probably more examples of Octave code out there than MATLAB code.

We have four minutes left — are there any closing questions anybody would like to ask? I'm sorry, I have to cut it off right at seven; I've got to take my daughter to go do some stuff tonight. Oh, okay, Dustin, hold on — or Julia, yeah, Julia is a good idea too. "Does the underlying model need to be 8-bit to train in 8-bit mode?" No. This model right here, for example, is not loaded in 8-bit — it's just the plain LLaMA 7B — and you could try loading it in 8-bit and then run your training as normal: take your data set, pick your data format (in this case I have it set up as Alpaca), and run. Yes, I will leave the stream up for you, text churches. So all you have to do is check "load in 8-bit" — the one thing I'd recommend is that after you check it, unload and reload the model, just to play it safe and make sure it actually loaded in 8-bit. The tooling is really good, but it's still got a couple of little bugs in it.
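For reference, here is a hedged sketch of the 8-bit loading mentioned above, done outside the Oobabooga UI. It relies on the transformers + bitsandbytes integration; flag names have shifted across versions (newer releases prefer a BitsAndBytesConfig), and the model path is just a stand-in for whatever LLaMA-7B checkpoint you have, so check your installed version's docs before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/llama-7b-hf"      # stand-in path for a LLaMA-7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,     # requires bitsandbytes; deprecated in favor of BitsAndBytesConfig later
    device_map="auto",
)
```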
No, no — Derek, your LLM stays your LLM. (Good boy — I have, but I'm probably going to do a deeper dive on that later.) The MTEB embedding model is a totally different part of the workflow. Imagine it like this — I'm sorry, I'm about to whip out Paint, I know, I'll get a better tool for this. Say you have a pipeline. Over here is your LLM — this is going to be LLaMA; let me bump the font up. LLaMA stays LLaMA, you're not going to replace it. Instead, over here you have your vector DB — let's say it's Pinecone — and the vector DB is what responds to queries. To build those responses, you have to embed, so over here you have Instructor XL: you take a whole bunch of documents, run them through Instructor XL, and dump the results into Pinecone. Then, when the user asks a query, you embed that query, compare the query embedding to what's in Pinecone, and hand the results off to the LLM for summarization and other tasks. I hope that makes sense and is helpful.

"How do you train a model in a specific domain and fine-tune it to generate specific answers?" For the first part, that's where you do Q&A-style data — you're going to want, hopefully, a Q&A data set. Raw — I'm just not a fan of raw; I tend to prefer structured data, but you can also train on raw: take your raw text file, build a LoRA on it, and see how well that influences the model's responses. That's under the Training tab here, and I'll see if I can put together a couple of examples on bigger raw files.

But that's it — unfortunately we're out of time today, everyone. Thank you very much for coming, I hope this was helpful, and I'll see y'all next time. We'll be doing these every Thursday, by the way, same time — so have a good one, everybody!
Info
Channel: AemonAlgiz
Views: 6,431
Keywords: AI chatbot, scambaiting
Id: LFJtpVZw6ss
Length: 59min 23sec (3563 seconds)
Published: Fri May 26 2023