The BEST Open Source LLM? (Falcon 40B)

Captions
Welcome, everybody, to a video on the Falcon large language model. Falcon 40B Instruct is the top model on the Hugging Face large language model leaderboard, but how good is it in practice, and what might we actually use it for? Today I hope to answer those questions for you.

First off, there are two size variants: 40B, for 40 billion parameters, and 7B, for 7 billion, as well as other fine-tuned variants, the main ones being instruction ("instruct") fine-tuned. For pure text generation, or if you wish to fine-tune the model to some specific task, subject, or context, you'd probably want to use the pre-trained base variants, but for chatbots, Q&A, and other sorts of back-and-forth correspondence, you're probably going to want the instruct variant. And if you wanted to fine-tune a conversational model, you might actually further fine-tune the instruct variant rather than the base. These models are all available under the Apache 2.0 license, which is very open and permissive of distribution, commercial use, and so on, so this makes Falcon a very business-friendly model to use. Finally, the AI team behind the Falcon models, the Technology Innovation Institute, currently has an open call for proposals to provide you with compute grant money for projects that utilize the Falcon models, but more on that in a bit.

To start, these models are available on Hugging Face via the Transformers library. While the 7-billion-parameter model is a little more comfortable to run locally, needing a mere 10-ish gigabytes of memory at 8-bit, the 40-billion-parameter variant can be a little more challenging, wanting more like 45 to 55 gigabytes at 8-bit and 100-plus at 16-bit, depending on context length. You can feel free to run these locally, even on CPU and RAM if you want, but I would personally suggest Lambda for their two-dollar-an-hour H100 80-gigabyte instances. They don't pay me to say that; it's just a fact that they are the best right now in terms of price to performance for cloud hosting. Getting set up to actually run the Falcon models is most likely just going to be upgrading to Torch 2.0, since I imagine most people are still on 1.x in some form. In the description of this video I will have a link to my GitHub with all relevant code and snippets, including the install shell script that I personally used with Lambda to get things rolling, but this would work on AWS or other platforms as well, or even locally.
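To give a rough idea of what this looks like in code, here is a minimal sketch using the standard Transformers API rather than my exact script; it assumes you have accelerate and bitsandbytes installed for the 8-bit loading, and the generation settings are just illustrative:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Minimal sketch: load Falcon 40B Instruct in 8-bit with Hugging Face Transformers.
# Assumes transformers, accelerate, and bitsandbytes are installed and that you
# have roughly 45-55 GB of GPU memory available for the 40B model.
model_name = "tiiuae/falcon-40b-instruct"  # or "tiiuae/falcon-7b-instruct" for the smaller variant

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,   # Falcon ships custom modeling code
    device_map="auto",        # spread layers across available GPUs
    load_in_8bit=True,        # 8-bit quantization to reduce memory use
)

prompt = "Artificial intelligence is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=100, do_sample=True, top_k=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```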
From here we can check out a basic example of Falcon 40B's performance and quality; in this case it's just continuing a thought. Of course, this is an instruct model, which is more geared towards things like instructions or a conversation, so let's try that. As you can see, the Falcon model already has a pretty good grasp of natural language.

It's really quite interesting to me how fast the bar to be impressed by large language models has moved since ChatGPT hit the scene late last year. Just a year ago, Falcon 40B's performance alone would have been massive, breaking news, and when ChatGPT came out, it was. We now live in a world where we have models that can generate text as if a human were writing to us, good enough to convince other people that another human is indeed writing to us. What makes Falcon, and Falcon 40B especially, huge right now is that rather than needing to run all of your queries through OpenAI, you can just download and run Falcon on your own, or fine-tune it further; it's yours to do with as you please. It's crazy to me how much progress we've made in just one year in AI, and what's actually available to people to use right now in software development: an AI that you can literally download and do whatever you want with.

As you'll see in some of these examples, I actually think Falcon 40B is very comparable to ChatGPT's base model, GPT-3.5. It's a little inferior to GPT-4 (we'll talk more about some reasons why I think that is, but it's also just a vastly smaller model than GPT-4), and I think we could probably eke out a lot more performance than what the model by itself is outputting right now.

So, how intelligent is Falcon? For all these examples I'm going to use the Falcon 40B Instruct model. From my very brief testing, I would classify Falcon 7B as best suited for few-shot learning examples, or as a model that you'd further fine-tune to something more specific, whereas Falcon 40B, especially the instruct variant, is much more suitable for general use and working right out of the box.

To start, some fairly random general knowledge questions. Here I'm showing what the initial prompt input was and then the results, and to the best of my knowledge these are all accurate, good answers from Falcon 40B. As you can see, I use a format in my prompt that suggests a conversation between a user and an assistant. These names do not need to be this way, or even used at all; this model is extremely open and general purpose. You don't even need a one-to-one user-then-assistant alternation; you can have something like user, assistant, assistant, and I'll show an example of that later too. The possibilities here are very open.
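Just to make that concrete, the prompt is nothing more than plain text with whatever role labels you like; something along these lines (the question here is a placeholder, not my exact prompt):

```python
# The model simply continues this text; no special chat template is required.
# The roles don't have to alternate one-to-one either: you could follow an
# "Assistant:" turn with another "Assistant:" turn if that suits your use case.
prompt = (
    "User: Can you practice law in any US state without going to law school?\n"
    "Assistant:"
)
```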
Here, though, I do particularly like the question about practicing law without a law school degree in the United States, because most models get this wrong, and the question and answer can also speak to potential moderation, after-the-fact censorship, or just a bias towards safe answers. If there are concerns about models saying incorrect or unsafe things, then highly professional fields like law and medicine carry a lot of risk if uneducated people attempt to practice them, so a model that has been encouraged or biased towards safe answers is likely to be biased away from answering this question correctly, even though the truth is that you can practice law in certain states without a law degree. For example, ChatGPT with GPT-3.5 gets this wrong and says no, you cannot practice law in any state without a law degree. GPT-4 does actually get it right. I can't remember if it has always gotten this question right; if memory serves, it didn't used to, but here it does, though with multiple warnings and what I would call CYAs. So it does seem that, at the very least, the potential risks in answering this question were identified by GPT-4, and it is very careful in its response to you.

Beyond this question, all the answers indicate a wide range of accurate knowledge that you can tap into, from the safety of drinking dehumidifier water to the iPhone's release date to the atomic mass of thallium, and so on. Obviously this is a terribly small sample size, and I'm confident that we could find wrong answers generated by this model, but as a general-purpose model this is really surprisingly good for a mere 40 billion parameters, at least in my opinion.

Next up is the topic of math, an area that GPT models tend to struggle with significantly due to the autoregressive nature of how they generate responses, always proceeding linearly. Algebraic expressions are often calculated in chunks, and not necessarily in the linear order of the characters as written, so large language models often struggle here. For simple math problems Falcon 40B gets the correct answer, but as you complicate things with algebraic problems, you can often find that GPT models, including GPT-4 and GPT-3.5, begin to struggle. ChatGPT, especially with GPT-4, uses something in the background that essentially converts your math prompts into "show your work" prompts: even where the model was asked for just the answer, it is clearly detecting that it's a math problem and, I think, being fed an additional prompt suggesting that it show its work. This is a common trick for getting GPT models to correctly solve problems that aren't necessarily solved by thinking linearly: if the model needs to be able to bounce around, the way to do that is to ask it to show its work. The theory is that the more tokens you give the model to think through a problem (tokens are brain power, in a sense), the more it can work non-linearly, whereas if you tell it to give you just the answer and nothing more, it's probably going to get it wrong. I also exemplified this in my video analyzing GPT-4 (link in the description), but here are a couple of frames from that, where both GPT-3.5 and GPT-4 get this right nowadays due to some post-processing or other tricks being applied. GPT-3.5 actually used to get these questions wrong, and I think that, using RBRMs and other heuristics and techniques OpenAI learned from GPT-4, they went back and applied them to GPT-3.5. I'm just taking guesses here, but GPT-3.5 now responds in a very similar way to GPT-4, such that I'm pretty confident they're probably running the same kinds of forms (pre-prompts, or maybe post-prompts; I don't know the right word, so I'll just call them heuristics), essentially a trick to get the model to think through the answers. You can still show that both of these models will fail if they try to generate just the answer and nothing more. Coming back to Falcon, we can see that if we just ask Falcon without telling it to show its work, it gets the question wrong, but if we tell Falcon to please show its work, then it shows its work and actually gets the question correct.
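In practice, the difference is just in how you phrase the request; something like the following (illustrative prompts, not the exact ones from my testing):

```python
# Asking for the bare answer often fails on algebra-style problems...
direct_prompt = (
    "User: Solve 7x + 5 = 47 for x. Give me only the value of x, nothing else.\n"
    "Assistant:"
)

# ...while asking the model to show its work gives it more tokens to reason
# with before committing to a final answer, and it is far more likely to be right.
show_work_prompt = (
    "User: Solve 7x + 5 = 47 for x. Please show your work step by step "
    "before giving the final answer.\n"
    "Assistant:"
)
```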
Another area where some GPT models are surprisingly impressive is the concept of theory of mind: essentially understanding underlying thoughts, and especially human emotions and behavior, in situations and scenarios. Here's an example from the Sparks of AGI paper from Microsoft that I've run through Falcon 40B. The white is the prompt and the green is the generated answers; I went one answer at a time, each time passing the entire history up to that point along with the new question, so think of this as a continued conversation between me and Falcon 40B about this scenario. Falcon 40B correctly identifies that Mark isn't necessarily unhappy with Judy's disciplining of Jack, but rather with how she went about it, and also correctly identifies how Judy is perceiving things herself and feeling about Jack's stepping in. Falcon understands that they're both essentially talking past each other, and even has suggestions about how they could improve the situation. These theory-of-mind examples always impress me, as GPT models are strangely good at this sort of thing, often performing much better than you might have predicted if you weren't aware of it. Here's another example of theory of mind, and an answer that I think is quite good. Again, there's nothing in this text that suggests what Luke's reasoning might be; this is purely an understanding of human psychology and emotion in an attempt to explain some incongruence between requests, statements, and behavior. Generally you would expect AI to think in a deterministic way and not really take into account human emotion and odd behavior. Emotions are strange in humans, as opposed to something like programming, where everything is deterministic and it's just logic; there's this whole other side to humans that is sometimes very difficult to understand, whereas the GPT models, and Falcon 40B here in particular, show a pretty good understanding of human emotion and behavior.

Next we have some programming examples, which are usually my personal interest and focus with GPT models. Five percent of the training data for Falcon 40B was specifically code, but another five percent comes from conversational sources like Reddit and Stack Overflow, where code is often discussed and shows up, and then there's a web crawl which is also likely to contain a lot of code. So there's a good amount of code in here, but it's certainly not close to being the majority. The first programming example I'll show is a regular expression question, in the format of next-line predictions in an attempt to simply continue the sequence, much like Copilot might do for you in VS Code. If it's not clear, the yellow part here is the prompt and the cyan is the model's output. In this case it gets the hint from my comment, again much like I might do with Copilot, and it proceeds to generate the regular expression, extract the prices from the text, and even print them out for me, completely in line with what you might expect the model to generate, and indeed what I would have wanted.
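The kind of completion I'm describing looks roughly like this (my own reconstruction for illustration, not the model's verbatim output):

```python
import re

# Some example text containing dollar amounts
text = "The shirt costs $19.99, the pants are $34.50, and the hat is $12."

# Extract the prices from the text
prices = re.findall(r"\$\d+(?:\.\d{2})?", text)

for price in prices:
    print(price)
```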
Next, we might instead enjoy a more conversational approach to the same sorts of problems. Here we've got the exact same problem, just in a more Q&A format. You might prefer this format purely because you want an explanation from the model, or maybe you just want it to feel more like a teacher than a code generator and give a friendlier feeling to the user; it really just depends on your use case. Both of these are from Falcon 40B Instruct, and I'm just showing that you can do so many things here; it doesn't necessarily have to be a back-and-forth.

Alright, one more, slightly more complicated programming example. One of my latest projects is called term GPT, a project aimed at getting a GPT model to take some sort of general objective as a prompt and then output actual terminal commands that could be run, even with something like os.system, to achieve whatever that prompt was. This includes things like writing code, but also executing commands, executing that code, installing packages, reading and writing files, and so on. I have a whole video covering this as well, so feel free to check that out if you're interested. I built it using GPT-4 up to this point, but I would very much like to use an open source model instead, and Falcon 40B is looking like it's at least very close to achieving this. With Falcon 40B, I first pass a pre-prompt with a one-shot example showing what I would like the agent to respond with, essentially the same thing I did in the term GPT video, but in this case I don't have to have the user specify "next", because a back-and-forth isn't required. The user just suggests an objective, and then term GPT (Falcon 40B in this case) suggests command after command. So the pre-prompt shows an example of basically what I would like, the actual user objective is in yellow, and the cyan is what the model actually output. The idea is that a user would input the yellow part and wouldn't even see or know about the white part; it would just be in the back end. They could know about it, but they're not going to mess with that part; it's passed there to guide the model to understand how to structure its response, and then later we can pull apart that response and quite literally execute those commands, just like I did in term GPT.
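The overall flow is roughly the following (a simplified sketch of the idea, not the actual term GPT code; the generate_text callable and the "Command:" line convention are stand-ins for whatever model wrapper and output format you settle on):

```python
import os

# Sketch of the term GPT idea with a local Falcon model: a one-shot pre-prompt
# shows the desired format, the user supplies an objective, and each
# "Command:" line in the model's output is executed in the terminal.
PRE_PROMPT = """User: Create a new directory called demo and put an empty main.py in it.
Assistant:
Command: mkdir demo
Command: touch demo/main.py
"""

def build_prompt(objective: str) -> str:
    return f"{PRE_PROMPT}User: {objective}\nAssistant:\n"

def extract_commands(model_output: str) -> list[str]:
    # Pull out every line that follows the "Command:" convention.
    return [line.removeprefix("Command:").strip()
            for line in model_output.splitlines()
            if line.startswith("Command:")]

def run_objective(objective: str, generate_text) -> None:
    # generate_text is assumed to be a callable wrapping the Falcon model.
    output = generate_text(build_prompt(objective))
    for command in extract_commands(output):
        print(f"Running: {command}")
        os.system(command)
```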
As you can see, the output is so close to what we would want. It really just makes this one small mistake of trying to create that home.html template file in the templates directory (which is correct, that is where we want it, and we do need that file), but we never made that directory, so the attempt to create this file inside it is going to fail. If we had just done a mkdir for that directory first, this would have worked. I think this is an example of where GPT-4 is better than Falcon 40B: it simply doesn't make small mistakes like this as often. It totally can, just not nearly as often, and something as simple as this GPT-4 does solve out of the box. But at least from my testing, both here and in general, I dare say Falcon 40B is actually better than GPT-3.5. I also suspect, with respect to term GPT in general, that with more time working on a pre-prompt I could probably iron out these problems.

For example, GPT-4's behavior has actually already changed on me multiple times while I've been developing term GPT. Some of that is after-the-fact post-processing and double-checking of the answer, and some of it is the actual underlying model (I call it the foundational model) also changing over time. That is going to keep happening, and while you can specify an older model, I want to say they only support the single previous model, and after that it's gone and you can't get it again. So you spend all this time really honing in and fine-tuning and getting things just the way you want them to work, and then the model changes, or the model doesn't even change but clearly something else has (they've changed some of that post-processing heuristic stuff), and it just isn't working the way you want anymore. That's super frustrating. Whereas here, with this model, it's not totally deterministic, but if you need it, you can depend on those weights being frozen. They are not going to change on you, and any post-processing heuristics are up to you, so you can depend on that stuff not changing. And if I want to fine-tune it specifically to this new use case, guess what, I can actually do that too; it's my model. Like I was saying earlier, TII, the Technology Innovation Institute, currently has an open call for proposals for ideas you might build on top of the Falcon 40B model, and they're looking to issue grants of ten thousand to a hundred thousand dollars in GPU compute to people who have ideas about how they want to leverage this exact model. For something like term GPT, I might end up submitting my own proposal, literally to fine-tune the model to be exactly what I want, and my suspicion is that with a little bit of fine-tuning, Falcon 40B can be better than GPT-4 is right now. Again, at the end of the day it's my model, I can do whatever the heck I want, and I don't have to submit my queries to OpenAI anymore, which is pretty cool.

So, with my very anecdotal experience so far, I am very cautious to say it, but I do think it's actually true that Falcon 40B seems to be better than GPT-3.5, the base ChatGPT model. I struggle with saying this purely because Falcon 40B is such a small model relative to GPT-3.5. If Falcon 40B were only available via an API or some web user interface, I simply would not believe that the outputs were from a 40-billion-parameter model alone; I would think something else was going on, like more heuristics on top, the way GPT-4 does it. But since we can actually download the weights ourselves, we can clearly see that, no, this is just the raw model output, and it's already this good, which is very impressive. I very much look forward to the forthcoming paper on the Falcon models and their training; I'm super curious to see what techniques were used in the actual training, and what was in the dataset.

I'm also confident in saying that the Falcon 40B model out of the box is simply not as good as the current GPT-4 via the API or the web user interface. But we also know that GPT-4 isn't just a model like this: it's a model with a bunch of heuristics on top of it that make it quite powerful, and rumor has it that it's actually more like eight 220-billion-parameter models essentially in an ensemble, or something like that. We know and have some ideas about some of those heuristics, like the RBRMs (rule-based reward models) that were shared in the OpenAI paper on GPT-4, but not all of them, and everything is rumored; we really just don't know. The only thing we do know with a large degree of certainty is that GPT-4 is not simply raw model output from a single model and nothing else.

So based on my experience so far with Falcon 40B, I would suggest that if we allowed ourselves to run Falcon 40B with things like rule-based reward models, forms, and sanity checks on its output, double-checking and detecting things like "is this a math problem? If so, show your work" (the sort of thing it's pretty clear GPT-4 is using heavily), we could get far more performance out of the model than we're already getting. I suspect this partly just from the response time: it could be based on load, but there are certain times where you ask GPT-4 a question, it truly feels like you should have already gotten an answer back, you didn't, and then you get this very careful answer. My suspicion, and again I don't know this, is that it generated an output, another model sanity-checked that output, and if something was detected as possibly problematic, it was sent through something like a form with a bunch of questions, which the model literally answers before maybe changing the response a little. So my best guess is that the GPT-4 model sometimes gets queried multiple times for a single user query. If we did the same thing with Falcon 40B, I think we would likely get even more out of it than we're already getting, and it could be pretty comparable to GPT-4, which is insane to think about.
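If you wanted to experiment with that kind of wrapper around Falcon yourself, the skeleton might look something like this (purely a sketch of the idea; generate_text stands in for a call to the local model, and the math-detection heuristic is deliberately crude):

```python
# Sketch of a simple "generate, then sanity-check" wrapper around a local model.
# generate_text is assumed to be a callable that wraps Falcon 40B Instruct.

MATH_HINTS = ("solve", "equation", "+", "-", "*", "/", "=")

def answer(query: str, generate_text) -> str:
    prompt = f"User: {query}\nAssistant:"

    # Heuristic: if the query looks like a math problem, ask for worked steps.
    if any(hint in query.lower() for hint in MATH_HINTS):
        prompt = (f"User: {query} Please show your work step by step "
                  f"before giving the final answer.\nAssistant:")

    draft = generate_text(prompt)

    # Second pass: have the model critique its own draft and revise if needed.
    review_prompt = (
        f"User: Here is a question and a draft answer.\n"
        f"Question: {query}\nDraft answer: {draft}\n"
        f"Is the draft accurate and safe? If not, provide a corrected answer; "
        f"otherwise repeat the draft.\nAssistant:"
    )
    return generate_text(review_prompt)
```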
But even if you couldn't get it all the way there, you can still fine-tune Falcon 40B to your own specific use case, and in that way you are highly likely to wind up with a better model for whatever it is you're trying to do than GPT-4 would be. And at the end of the day, it's yours; it's an open source model. So check out the TII call for proposals if you have any big ideas you'd like to try, and also check out the Neural Networks from Scratch book at nnfs.io if you're interested in learning more about neural networks and how they work. Otherwise, I will see you all in the next video.
Info
Channel: sentdex
Views: 90,964
Keywords: python, programming
Id: -IV1NTGy6Mg
Length: 23min 56sec (1436 seconds)
Published: Wed Jul 05 2023