I used the BEST Open Source LLM to build a GPT WebApp (Falcon-40B Instruct)

Video Statistics and Information

Captions
This is Falcon-40B Instruct. It's the best open source LLM out there; in fact, it's attempting one of the hardest styles of prompting right now, and I'm going to show you how to use it in 12 minutes. I took a deep dive into the world of free and open source large language models and found Falcon toppling the leaderboard. But is the hype real? I'm going to put it to the test against a three-billion-parameter model and Falcon's baby brother to find out.

But before we get to that, what makes Falcon-40B so special? First and foremost, it's the best open source model around: it's topping the Hugging Face LLM leaderboard, outperforming LLaMA, StableLM and even MPT, the model we dug into in the open source LLM video. The real kicker: it's licensed under Apache 2.0, which means it's free for commercial use. Y Combinator is about to get an influx of new AI companies.

To get started, first we need to install some dependencies, the most important being PyTorch. You can grab the install command from the PyTorch website; here we're going to install it with CUDA 11.7 so that we can run Falcon on a GPU. I'll come back to the fun I had getting that up and running a little later. While we're at it, we're also going to install LangChain, einops, accelerate, Transformers and bitsandbytes.

Now that we've got a bunch of stuff installed, it's time to import them. From LangChain we're going to import HuggingFacePipeline, PromptTemplate and LLMChain (I put this on two lines for aesthetics). These are going to allow us to use the Falcon LLM as part of a LangChain pipeline, the most important class being HuggingFacePipeline, which is what we'll eventually pass our LLM to. Then from Transformers we're going to import AutoTokenizer, to convert our prompts to tokens, and AutoModelForCausalLM, the class required for generating blocks of text from a language model. While we're at it, we're also going to import the transformers base module so we can make a pipeline a little later. I also imported os and torch.
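The dependency setup just described can be sketched as a small script. The package list matches what the video names; the exact PyTorch install command should be taken from pytorch.org for your CUDA version, so the cu117 URL in the comment is my assumption:

```python
# Install commands (run in a shell first; grab the PyTorch one from pytorch.org):
#   pip install torch --index-url https://download.pytorch.org/whl/cu117
#   pip install langchain einops accelerate transformers bitsandbytes

REQUIRED_PACKAGES = [
    "torch", "langchain", "einops", "accelerate", "transformers", "bitsandbytes",
]

def check_dependencies(packages=REQUIRED_PACKAGES):
    """Return the subset of required packages that are not importable yet."""
    import importlib.util
    return [p for p in packages if importlib.util.find_spec(p) is None]
```

Running `check_dependencies()` before loading the model gives a friendlier failure than a mid-download ImportError on a paid cloud instance.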
Full disclosure: I forgot to delete the os import when I was recording this, so treat it like practice. To test whether we've got PyTorch successfully compiled for GPU, we can run torch.cuda.is_available(); if it returns True, we're golden.

This brings me to the GPUs, though. In order to get this bad boy running, I had to run the code on some beefy-as-hell hardware, namely two A100 80GB GPUs, which cost a casual 27 thousand dollars each, hence why it's been shin ramen and tap water for the last eight days. I ended up doing this on RunPod, which worked out to roughly 3.38 USD per hour. This is probably overkill, but while I was testing I got so many out-of-memory errors that I figured let's just over-spec and get it over and done with. Plus, it had the advantage that inference was extremely fast; I'll show that in real time a little later.

Alrighty, now it's time to load up this bad boy. First we define the model we want to load, namely tiiuae/falcon-40b-instruct. We then need to load a tokenizer; here we'll use AutoTokenizer from Transformers and pass it the model ID. Then the big dog: we'll load the model itself. To do this, we'll use the AutoModelForCausalLM class from Transformers and its from_pretrained method. Specifying the cache directory allowed me to direct where I wanted the weights saved, important if you're using a cloud instance where space is limited. We also set a number of other keyword arguments, namely the data type, whether or not to trust remote code, the device mapping, as well as the offload folder. Almost there. We can then set the model to inference mode by calling model.eval() and pass the model and tokenizer to the transformers pipeline. Through here we can also specify sampling parameters like top-p, top-k, the number of return sequences and the max length.

Now, I don't know if you know this, but my favorite TV series is about a set of entrepreneurial sisters who end up becoming billionaires: the Kardashians. We can test out the pipeline with a question from that series.
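The loading steps above might look roughly like this. The model ID, eval() call and keyword-argument names follow the video; the specific dtype, folder paths and sampling values are my own guesses, since the video doesn't spell them out on screen here:

```python
def load_falcon(model_id="tiiuae/falcon-40b-instruct",
                cache_dir="./workspace/cache"):
    """Load Falcon and wrap it in a transformers text-generation pipeline.

    Heavy imports happen inside the function so this sketch can be
    imported and inspected on a machine without a GPU.
    """
    import torch
    import transformers
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Falcon-40B needs serious VRAM (the video uses two A100 80GB cards).
    assert torch.cuda.is_available(), "CUDA-enabled PyTorch and a GPU are required"

    tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        cache_dir=cache_dir,              # keep weights where disk space allows
        torch_dtype=torch.bfloat16,       # assumption: the video only says "data type"
        trust_remote_code=True,           # Falcon shipped custom modelling code at release
        device_map="auto",                # spread layers across available GPUs
        offload_folder="./workspace/offload",
    )
    model.eval()  # inference mode

    return transformers.pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        # sampling parameters mentioned in the video; the values are placeholders
        do_sample=True,
        top_p=0.95,
        top_k=40,
        num_return_sequences=1,
        max_length=400,
    )
```

Calling `load_falcon()` then gives you a pipeline you can pass a prompt string to directly.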
We get a pretty plausible result. Let's say we wanted to throw this into LangChain, though. You guys went a little wild when I asked if you wanted me to do it, so strap in: it's about to happen. First, let's set up a super basic prompt template, setting the input variables to input and the template to use that input. Then create a new instance of the HuggingFacePipeline class we imported from LangChain earlier; to that we'll pass our transformers pipeline. And last but not least, we can stick it all together by passing the HuggingFacePipeline LLM and the prompt template to LLMChain, storing that in a variable called chain, with verbose on. Then we can ask who Kim is again by passing a prompt to the chain.run method, and if we print out the response: boom, not bad.

Now, one of the biggest challenges I faced when prototyping this video was building a front end, because you guys know I like building user interfaces so that anybody can go about using this. In the documents video and the free LLM video we used Streamlit to build a user interface, but because I was running this on RunPod I couldn't easily build a GUI unless I wanted to go down the whole API route, which I didn't really want to do for this video. Mind you, as I was writing this script I realized that there's Streamlit for notebooks; ignore that for now. We're still going to spice things up: rather than using Streamlit, I'm going back to my roots with Gradio. To install Gradio, it's a simple pip install gradio, and then we can get building. This is the sick thing about Gradio: you can build ML user interfaces that run inside of Jupyter notebooks, and they look pretty good. I actually show how to build scikit-learn UIs with Gradio and deploy them in my full stack course, linked below.

We need a function that's going to trigger when a user submits a prompt. We'll define this as generate and pass the prompt through to it. We can then pass the prompt to our LLM chain using the run method and return that from the function.
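A minimal sketch of that LangChain wiring, assuming the 2023-era top-level langchain imports the video names (HuggingFacePipeline, PromptTemplate, LLMChain); the pass-through template mirrors the "super basic" one described:

```python
# Super basic template: just forward the user's input to the model.
PROMPT_TEMPLATE = "{input}"

def build_chain(hf_pipeline, verbose=True):
    """Wrap a transformers pipeline in a LangChain LLMChain.

    Imports are deferred so this sketch loads without langchain installed.
    """
    from langchain import HuggingFacePipeline, PromptTemplate, LLMChain

    llm = HuggingFacePipeline(pipeline=hf_pipeline)
    prompt = PromptTemplate(input_variables=["input"], template=PROMPT_TEMPLATE)
    return LLMChain(llm=llm, prompt=prompt, verbose=verbose)

# Usage (hypothetical pipeline object):
#   chain = build_chain(my_transformers_pipeline)
#   print(chain.run("Who is Kim Kardashian?"))
```

Note that newer LangChain versions move these classes into sub-packages, so treat the import path as tied to the library version used in the video.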
Home stretch now. Let's channel our inner Tom Ford and make this look a little slick by adding a title and description. Then the final code block: this is where Gradio comes in. We can build a new Gradio interface using the gr.Interface class. To it we need to specify six keyword arguments: fn is the function we call when a user submits a prompt, so we'll set this to our generate function; then we can specify our input and output types, both of which will be text, as we'll pass the prompt through as text and expect text back; then set the title and description; and last but not least, set the theme. Here I chose finlaymacklon's boxy_violet, but there's a whole bunch more available in the Gradio gallery. To kick off the app, we can run the launch method. You can set the port by specifying server_port, but most importantly, set share equal to True to generate a shareable URL. And boom, that's the app done-ski. Time to put it to the test.

Data is ridiculously important; it's the lifeblood that powers AI. But have you thought about what's protecting your data, the thin white line between you and that AI super juice? Really, a lot of the time it's not a hell of a lot more than a single password. That's gone, and the data's gone. But you don't need to worry about that, thanks to this video's sponsor, NordPass Business. NordPass Business is built to help you manage and secure all your business passwords in a single, unified, collaborative place, unlike me using 'ilovepizza42' for every service I have. It helps you generate strong and secure passwords, but even better, let's say you have shared team passwords for SaaS solutions, servers, git repositories: NordPass Business allows you to securely share them between those that need them. Ready to manage and secure your passwords with NordPass Business? You can get yourself a three-month free trial by heading on over to nordpass.com/nicholasnord. Back to the video.

So we have three tasks that we're going to run the models through.
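Putting the Gradio pieces together looks something like the following. The six keyword arguments and the launch options mirror the video's description; the theme string is my best guess at the "boxy violet" theme mentioned, and the title/description text is placeholder:

```python
def launch_app(chain, port=7860):
    """Serve the LLM chain behind a shareable Gradio text-in/text-out UI.

    gradio is imported lazily so the sketch can be inspected without it installed.
    """
    import gradio as gr

    def generate(prompt):
        # Triggered whenever a user hits submit; returns the chain's text output.
        return chain.run(prompt)

    interface = gr.Interface(
        fn=generate,                 # 1. function called on submit
        inputs="text",               # 2. prompt comes in as text
        outputs="text",              # 3. response goes out as text
        title="Falcon-40B Instruct", # 4. placeholder title
        description="A GPT-style web app powered by tiiuae/falcon-40b-instruct.",  # 5.
        theme="finlaymacklon/boxy_violet",  # 6. assumption: the gallery theme named in the video
    )
    # share=True generates a temporary public URL, handy on a RunPod instance.
    interface.launch(server_port=port, share=True)
```

With the chain from earlier in hand, `launch_app(chain)` is all that's left to get the app running.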
They are Q&A, few-shot sentiment analysis, and the hardest of them all: chain-of-thought prompting on a math word problem.

The first model up is Dolly v2 3B, the three-billion-parameter instruction-tuned model based on Pythia-2.8B, which was fine-tuned by Databricks. Our Q&A test prompt is: explain how Mr Beast became famous. Running this against Dolly was a little lackluster, with it calling out that the Beast is a notorious criminal; two stars here. It did call out that he was featured on the news and became very wealthy, so we'll give Dolly that.

On to sentiment. The prompt starts with an instruction to classify text as neutral, negative or positive, and then has some examples, the last of which is: apart from the rocky start, my holiday in the Bahamas was amazing. I chose this as it's a little tricky, given the misdirection at the start of the phrase, but we did get back positive. Not bad, five stars. You might notice that there's a bunch of extra text being generated; we could cut this down easily by limiting the max number of tokens.

And last but not least for Dolly, a math word problem using chain-of-thought prompting. I start out by asking it to think carefully and logically, explaining the answer, then have a few shots of examples to help it out, the main question at the end being: if I have seven potatoes and I turn one into mash, how many whole potatoes do I have left? The correct answer is six. The fascinating thing about Dolly is that it manages to initiate the right chain of thought, identifying seven potatoes as the initial count and then subtracting one for mash, but then its math devolves into that of mine and it returns 6.86. Close, but only three stars for you.

That's enough with Dolly for now; it's time to step it up with Falcon-40B's baby brother, Falcon-7B. It's a seven-billion-parameter model trained by the Technology Innovation Institute in the UAE. I've been seeing a bunch of people achieving great results with this one after fine-tuning for specific use cases. Fine-tuning video next, anyone?
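For reference, the two trickier prompts (reused for the Falcon models below) can be reconstructed roughly like this. Only the final shot of each is quoted in the video, so the earlier few-shot examples here are placeholders of my own:

```python
# Few-shot sentiment prompt: instruction, examples, then the test case.
# The first two shots are invented placeholders; the Bahamas line is from the video.
SENTIMENT_PROMPT = """Classify the text into neutral, negative or positive.

Text: The movie was a complete waste of time.
Sentiment: negative

Text: The package arrived on schedule.
Sentiment: neutral

Text: Apart from the rocky start, my holiday in the Bahamas was amazing.
Sentiment:"""

# Chain-of-thought math prompt: instruction, a worked shot, then the real question.
COT_PROMPT = """Think carefully and logically, explaining your answer.

Q: If I have 3 apples and I eat 1, how many apples do I have left?
A: I start with 3 apples. Eating 1 removes it, so 3 - 1 = 2. The answer is 2.

Q: If I have seven potatoes and I turn one into mash, how many whole potatoes do I have left?
A:"""

def expected_answer(total=7, mashed=1):
    # The arithmetic the models are being graded on: 7 - 1 = 6 whole potatoes.
    return total - mashed
```

Ending each prompt right after the label ("Sentiment:" / "A:") nudges the model to complete just the answer rather than invent new examples.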
Alrighty, Q&A, baby Falcon, let's go. Now keep in mind, I kept all of the sampling parameters the same across all of these models for a fair trial. Weirdly enough, it goes off the deep end, identifying Mr Beast as creating songs including Piano Man and The Game, as well as quoting his appearances in The Hangover. I have no doubt this one would crush it if used in a retrieval-augmented generation setting, but for now, 7B is out of the wolf pack; two stars. Sentiment is a cakewalk, and Falcon-7B correctly identifies the sentiment as positive. But when it comes to our potato problem, baby Falcon comes so damn close: it correctly formulates the expression and solves the equation, calculating seven minus one is six, but then it adds on a little too much sauce and does an extra step, returning two whole potatoes remaining. Not bad; again, three stars.

Will Falcon-40B do better, though? Will it be the Infinity Gauntlet to our Iron Man, the sunlight to our Superman, the family to our Dom Toretto? Falcon-40B Instruct, incoming. Like Falcon-7B, it was also trained by the TII in the UAE. This model, however, has been topping the Hugging Face leaderboards; even though it's taken me three weeks to edit this video, it's still been up there. Anyway, on to Q&A. Here's the answer: Mr Beast is a YouTube personality and influencer from North Carolina who became famous by creating videos that involve giving away large sums of cash to random people, such as homeless people or individuals who need money for college or vehicle builds. His first viral video was "last to leave giant squid wins five thousand dollars" in 2014, where he hid in a giant squid costume and gave away cash to the last person to leave the room. His videos have since amassed millions of views (I think it's in the billions, to be honest) and subscribers, and he has become a successful entrepreneur and philanthropist. Mr Beast's YouTube channel is currently the seventh most subscribed channel in the world and has over 50 million subscribers. Is that
right? I'm pretty sure it's a bit more than that, but anyway, that seems like a much better response compared to what we got from the last two. One of the unique features of Falcon-40B is that it was also trained on German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech and Swedish text; mind you, it's still predominantly trained on English. That being said, I asked you guys in the community tab if you could translate the Mr Beast question, and asking it of Falcon-40B, we get... I don't think that's bad, guys: Mr Beast, Jimmy Donaldson, yeah. So my French still needs a little bit of work; give me a score out of 10 for that mouthful of a sentence. Anyway, Falcon-40B seems to return a coherent response; this is me yet again trying to read it out to you. I'm going to say that's a five out of five for Q&A for now.

What about sentiment? Interestingly enough, the model returns mixed as the sentiment. Whilst I wouldn't disagree that we maybe have mixed sentiment, we really wanted neutral, negative or positive; three and a half stars.

But the real one we've been waiting for: our mashed potato problem. None of the other models have nailed this yet; will our flaming Falcon be the one? Guys, it got it. As if that isn't absolutely amazing: seven minus one equals six, so I have six whole potatoes left. It smashed it. Considering this is an open source model, I think it is absolutely amazing, and I know the field of LLMs is moving ridiculously fast, but what a banger, guys. That is five-star worthy.

Once again, thanks to NordPass Business for sponsoring this video; get your three-month free trial by heading over to nordpass.com/nicholasnord. Thanks so much for checking out the video. What did you think of Falcon-40B Instruct? If you'd like to see me take on some other models and build a GPT investment banker, click here. Catch you later.
Info
Channel: Nicholas Renotte
Views: 91,776
Keywords: falcon-40b, falcon-40b instruct, falcon 40b, falcon 40b instruct
Id: hMJgdVJWQRU
Length: 11min 27sec (687 seconds)
Published: Fri Jul 07 2023