Fine Tune a Multimodal LLM "IDEFICS 9B" for Visual Question Answering

Captions
Hello everyone, welcome to the AI Anytime channel. In this video we are going to fine-tune a multimodal LLM. This is going to be an interesting tutorial where we fine-tune IDEFICS, a multimodal LLM. If you don't know what IDEFICS is: it's an open-source reproduction of a very famous model named Flamingo. Flamingo was created by DeepMind; it's a closed-source model that DeepMind uses for various purposes. A team at Hugging Face reproduced Flamingo and created IDEFICS. IDEFICS is available in two weight variants, and we're going to look at the 9B-parameter one, IDEFICS 9B. I've already made a video on IDEFICS where I showed how to use it for visual question answering, so if you want to upload some images and analyze and query them, you can use models like IDEFICS, or other models like LLaVA, for example, a very famous one.

Now, why is this video important, and why should you focus on multimodal LLMs, or to make it a bit narrower, on large vision models? We have large language models today; in 2024 we'll see more multimodal LLMs and more improvements. That's not just me saying it, the entire community is. Recently Bill Gates was talking with Sam Altman and asked what changes we'll see in 2024 from OpenAI, and Sam Altman said they are working on improving the multimodal capabilities of their GPT models: GPT-4, and GPT-5 coming soon. GPT-5 will have more multimodal capabilities, so you can talk to videos, to images, perhaps to audio as well. Audio is the easy case, because you can get the transcription and use an LLM on it, but videos and images are what they're really focusing on. So you should also know how to fine-tune these multimodal LLMs, mainly the ones with vision capabilities, which we'll call large vision models (LVMs), with techniques like LoRA, QLoRA, etc.

Here's the agenda of the video. We will fine-tune IDEFICS 9B on a publicly available dataset of Pokémon cards. I'm not a Pokémon player and don't know much about them, but this is a dataset I found on Hugging Face, and it's also what the official documentation recommends right now. It's a visual question answering setup: you upload a Pokémon card and ask questions on top of it, and we'll see whether the fine-tuned model is able to generate the answer. We're also going to push the model to the Hugging Face Hub so we can use it later for inference. People have asked how you run inference once you push it; there are two ways: use the Transformers pipeline, as simple as that, or merge the LoRA adapters and load it again the conventional way with a from_pretrained call. We'll see how to do that. So the plan is: download the model weights in Google Colab. I'm going to use an A100 GPU, and you'll need an A100-class GPU to handle this fine-tuning task.
You can either use RunPod, Lambda Labs, or Google Colab; I'm going to do this in Google Colab Pro on an A100. We'll take the dataset from Hugging Face, process and transform it so it fits the IDEFICS fine-tuning pipeline, write the configurations for bitsandbytes, LoRA, and so on, and then fine-tune. So let's start our fine-tuning experiment with IDEFICS 9B.

All right, to experiment with IDEFICS 9B we're in Google Colab. You can see I've connected to an A100 GPU; it says connected to a Python 3 Google Compute Engine backend, and the availability I currently have is about 160 GB of disk and around 83 GB of RAM.

Now, these are the things you need to install to fine-tune IDEFICS 9B. We're going to use the Pokemon Cards dataset by TheFusion21, where you have an image as input plus a caption; we're interested in the image_url and caption columns, and it has 13.1k rows on Hugging Face. On the dataset card you can see how the data looks: an image URL, a caption, a name, HP, a set name, and so on, with explanations like the health of the Pokémon, the name of the set of cards, etc. It's a basic dataset that's good for learning multimodal fine-tuning, and you can also bring your own dataset with a similar structure.

Let's install in this A100 environment: datasets; then Transformers built directly from Git source; then libraries and utilities like bitsandbytes, which will help us load the model in 4-bit; sentencepiece, a Transformers dependency that helps with vocabularies and such; accelerate; PEFT for LoRA; and the Hugging Face Hub bits as well. One hiccup: pip says it did not find the branch or tag I tried to pin for Transformers, and even retrying didn't work, so we'll just remove that part and install Transformers from source; I only wanted that particular branch, but this is fine. The install will take a bit of time, so let's wait for it and then see how we can load the model.

Next come the imports. We import torch. As I've said, I already have a video on IDEFICS 9B, so if you want to use IDEFICS 9B in your projects, please watch that video for how to do inference; here our focus is on fine-tuning a multimodal LLM like IDEFICS. From peft we need LoraConfig and get_peft_model. From PIL we import Image. And from transformers we import a few things: the fun one is IDEFICS itself, which we can get directly from Transformers via its class IdeficsForVisionText2Text; then AutoProcessor; then Trainer and TrainingArguments; and we also need BitsAndBytesConfig.
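For reference, the setup described above would look roughly like this; a sketch reconstructed from the narration (the exact Transformers branch the video tried to pin isn't recoverable, so a plain source install is assumed):

```python
# Install cells (Google Colab):
# !pip install -q datasets
# !pip install -q git+https://github.com/huggingface/transformers  # build from source
# !pip install -q bitsandbytes sentencepiece accelerate peft

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from PIL import Image
from transformers import (
    IdeficsForVisionText2Text,  # the IDEFICS class in Transformers
    AutoProcessor,
    Trainer,
    TrainingArguments,
    BitsAndBytesConfig,
)
```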
After fixing a couple of small typos in that import cell (the capitalization in BitsAndBytesConfig, and the spelling of datasets), our imports are done. The next thing is to load the quantized model.

First I bind to CUDA with a conditional: device is "cuda" if torch.cuda.is_available(), else "cpu". Now let's define the checkpoint. IDEFICS was created by Hugging Face itself, under the HuggingFaceM4 organization, so if you search for idefics-9b on Hugging Face, this is the repository we're going to use; you'll see it tagged with text generation, and also image-to-text and multimodal, which is exactly why we're using it. So that's the path of our model checkpoint.

After the checkpoint we write the BitsAndBytesConfig. bitsandbytes is an amazing library that helps you load a model in 4-bit, and it's really helpful if you want to use QLoRA to fine-tune a large language model. If you're not familiar with these terminologies, watch my video called Crash Course of LLM Fine Tuning, where I've explained everything in detail; I'll put the link in the description.

In BitsAndBytesConfig the first thing is load_in_4bit, which the documentation suggests right there; make it True. Next, we'll also enable double quantization: bnb_4bit_use_double_quant=True. Then the quantization type: nf4, the NormalFloat 4 data type; if you want to understand NF4 better, again watch the video I mentioned. Then the compute dtype: for bnb_4bit_compute_dtype we'll use torch.float16. We're also going to skip some modules that cannot be quantized properly: llm_int8_skip_modules takes a list of modules, and we'll skip the LM head (lm_head) and embed_tokens; this comes from the documentation, of course. Now our BnB config is ready.
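Put together, the device, checkpoint, and quantization config described here look roughly like this sketch:

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = "HuggingFaceM4/idefics-9b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # load the weights in 4-bit
    bnb_4bit_use_double_quant=True,        # double quantization
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
    llm_int8_skip_modules=["lm_head", "embed_tokens"],  # modules that don't quantize well
)
```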
Once we're ready with the config, we create the processor: processor = AutoProcessor.from_pretrained(checkpoint), which will help us feed the model. Let's just run it with the checkpoint and see what it does; you can see it fetches the tokenizers, and then the message that special tokens have been added to the vocabulary, so make sure the associated word embeddings are fine-tuned or trained.

Now let's get the model in: model = IdeficsForVisionText2Text.from_pretrained (I hope I imported that correctly), and you pass the checkpoint along with a quantization config, since we're loading the model in 4-bit: quantization_config=bnb_config. What else do we have? The device map: we have a single CUDA device running here, but let's use device_map="auto" so it decides placement automatically, and it can also offload if needed. Let me add a few more cells. Now this will take a bit of time, guys, because it has to download everything; you can see it downloading all the model weight shards, the safetensors you see here, so let's wait for it and come back once it's completed.

Now the model download has completed. Once you print the model, you'll be able to see the entire pipeline of the model, all the layers, printed out completely.
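In code, that loading step is roughly:

```python
processor = AutoProcessor.from_pretrained(checkpoint)

model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint,
    quantization_config=bnb_config,  # load in 4-bit via bitsandbytes
    device_map="auto",               # place (and offload) layers automatically
)
print(model)  # inspect the full stack of layers
```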
Now let's write a function for inference, which will also help us once we've fine-tuned. I'll define it as do_inference or something like that, and pass in the model, the processor, the prompts, and max_new_tokens, kept at 50 by default. Inside, we grab the tokenizer: tokenizer = processor.tokenizer, from the same IDEFICS processor. Then let's deal with the bad-words thing: define bad_words with some tokens, the "<image>" token and the "<fake_token_around_image>" token, so all the fake tokens around images. We can do a conditional check on the length of bad_words; I first tried tokenizer.convert_tokens_to_ids here, but that's probably not right, so let me just use the tokenizer call itself: bad_words_ids comes from tokenizer(bad_words, add_special_tokens=False).input_ids. Perfect, that should do.

Then we should have an EOS token; it needs to be a string, by the way, so "</s>", and then eos_token_id = tokenizer.convert_tokens_to_ids(eos_token). Then you define your inputs: take your processor, pass the prompts, use return_tensors="pt" for torch tensors, and then .to(device), which is on CUDA. Then it generates the IDs: generated_ids = model.generate with the inputs first, then eos_token_id (it should be a list, by the way), then bad_words_ids, then max_new_tokens, and then let's use early_stopping=True for repetition and such. Now I'm going to get the generated text: generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True), taking the first element. We could return it, but it's fine; let's just print the generated text and apply the function directly.

Let's run it. It says invalid syntax, so something is wrong somewhere; let's have a look, and here it is. Fix it, run it again, and now it looks good: the function is ready for inference. We could try some inference right away, but I don't want to dwell on inference here; I want to focus on the fine-tuning, and this function will help with inference later anyway. You can also do a local load and run it on local images; please watch my previous video on IDEFICS if you want to understand how it works on local images and so on.
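A sketch of the whole inference helper as narrated (the bad-words handling is partly garbled in the audio, so this reconstruction follows the pattern described; it relies on the global device defined earlier):

```python
def do_inference(model, processor, prompts, max_new_tokens=50):
    tokenizer = processor.tokenizer
    # Keep IDEFICS' special image tokens out of the generated text
    bad_words = ["<image>", "<fake_token_around_image>"]
    bad_words_ids = tokenizer(bad_words, add_special_tokens=False).input_ids
    eos_token = "</s>"
    eos_token_id = tokenizer.convert_tokens_to_ids(eos_token)

    inputs = processor(prompts, return_tensors="pt").to(device)
    generated_ids = model.generate(
        **inputs,
        eos_token_id=[eos_token_id],   # a list, as noted above
        bad_words_ids=bad_words_ids,
        max_new_tokens=max_new_tokens,
        early_stopping=True,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(generated_text)
```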
Now let's go back to our fine-tuning task and the dataset we were talking about, the Pokemon Cards dataset. We have to write a function that converts images to RGB, basically the preprocessing step. So let me write "preprocessing" here, and under it a function called convert_to_rgb that takes an image (you'll have an image in your data). First we check: if image.mode is already "RGB", you don't need this step, just return the image. Otherwise, image_rgba = image.convert("RGBA"). Then you create a background: background = Image.new(...), and for this we use Pillow, so this is a different Image; don't confuse it with the image input parameter of the function, this Image comes from the Pillow library. We create it as "RGBA" with image_rgba.size and a white fill of (255, 255, 255). Then we composite: let's call it alpha_composite, using Image.alpha_composite(background, image_rgba); that's right. And the next line is to convert back to RGB: we can just overwrite alpha_composite with alpha_composite.convert("RGB") and then return alpha_composite. Yes, that's correct.
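As a sketch, that preprocessing function:

```python
def convert_to_rgb(image):
    if image.mode == "RGB":
        return image  # nothing to do
    # Flatten transparent images onto a white background; a plain
    # convert("RGB") would give a wrong background for such images.
    image_rgba = image.convert("RGBA")
    background = Image.new("RGBA", image_rgba.size, (255, 255, 255))
    alpha_composite = Image.alpha_composite(background, image_rgba)
    return alpha_composite.convert("RGB")
```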
Now let's write one more function, a transformation function to transform the examples; let's call it ds_transforms, and you pass it examples in batches, example_batch, since you'll have thousands of images in your dataset. First: image_size = processor.image_processor.image_size; then the mean, image_mean = processor.image_processor.image_mean; then the standard deviation, image_std = processor.image_processor.image_std. Now the transform: image_transform = transforms.Compose(...) (Compose, not decode), a function that takes a list. The first entry is convert_to_rgb, the preprocessing function we just wrote. Then the augmentation stuff for the transformation: transforms.RandomResizedCrop with image_size by image_size, then a scale that's kept fairly high, (0.9, 1.0), and then the interpolation; these are all preprocessing steps, guys, that you would have used in computer vision at some point. The editor wasn't suggesting the module name for interpolation; it's InterpolationMode, and I'll try BICUBIC and see if that works; I hope it's right. Then transforms.ToTensor(), and then transforms.Normalize, written with mean=image_mean and std=image_std; that's how you define it. Oh, and haven't we imported transforms? That's the question. I think it's part of torchvision, so let me just get it: import torchvision.transforms as transforms.

Now image_transform is done; let's build our prompts. Start with an empty list, prompts, and then: for i in range(len(example_batch["caption"])). Caption is the column in our case; if you look at the data we've taken, there's a caption, and it will act as the prompt. We split the captions to avoid having very long examples, because the longer they are, the more GPU RAM training takes, and I just want to show you the capability; so caption = example_batch["caption"][i], and then split on the period to keep it short, taking the first piece. Then we append: prompts.append(...), and inside the append we have example_batch["image_url"][i] first, and then an f-string like "Question: What's on the picture? Answer: This is {name}. {caption}", using example_batch["name"][i] and the caption. It's basically a bit like giving an instruction, telling the model what to do; you could phrase it "What's on the picture?" or "Analyze the picture", and you can improve this further or add more if you want. OK, this looks good.

Then the inputs: inputs = processor(prompts, transform=image_transform, return_tensors="pt"), and then .to(device). Then you just set the labels: inputs["labels"] = inputs["input_ids"] (input_ids, not input, by the way), and then you just return the inputs. I think this should do.
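The full transform function, sketched from the narration (the column names image_url, caption, and name come from the dataset card; the prompt template is as described above):

```python
import torchvision.transforms as transforms

def ds_transforms(example_batch):
    image_size = processor.image_processor.image_size
    image_mean = processor.image_processor.image_mean
    image_std = processor.image_processor.image_std

    image_transform = transforms.Compose([
        convert_to_rgb,  # the preprocessing function from above
        transforms.RandomResizedCrop(
            (image_size, image_size), scale=(0.9, 1.0),
            interpolation=transforms.InterpolationMode.BICUBIC,
        ),
        transforms.ToTensor(),
        transforms.Normalize(mean=image_mean, std=image_std),
    ])

    prompts = []
    for i in range(len(example_batch["caption"])):
        # Keep only the first sentence so examples stay short (less GPU RAM)
        caption = example_batch["caption"][i].split(".")[0]
        prompts.append([
            example_batch["image_url"][i],
            f"Question: What's on the picture? Answer: This is {example_batch['name'][i]}. {caption}",
        ])

    inputs = processor(prompts, transform=image_transform, return_tensors="pt").to(device)
    inputs["labels"] = inputs["input_ids"]  # causal LM: labels mirror the input ids
    return inputs
```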
Let's run it and see what we get. It flagged a name mistake, image_transform versus image_transforms; now we've fixed that. So now let's load and prepare the data for fine-tuning, where we'll apply this function. Let's have a dataset called ds and use load_dataset from Hugging Face, giving it the repository; you can take any other multimodal dataset of this kind, with text and images for visual question answering and those kinds of things. Then split it: ds["train"] with train_test_split, keeping a very small test size of 0.02. Then train_ds = ds["train"], and the eval, the evaluation dataset, eval_ds = ds["test"]. And then you set the transform on both: train_ds.set_transform(ds_transforms) and eval_ds.set_transform(ds_transforms). Right now it's downloading the data; it generated the train split that you can see here, downloaded and extracted the data files, and it's done.

Now let's do the LoRA thing. If you don't know what LoRA, low-rank adaptation, is, watch my Crash Course of LLM Fine Tuning video where I explain everything in detail. Let's get a model name: model_name = checkpoint.split("/") and take index 1 (not minus one). Then write the config for LoRA: config = LoraConfig(...), and within it you give r=16 and lora_alpha=32; if your model is small, then your rank should be higher, and these are very much trial-and-error experiments to choose the right hyperparameters. For target_modules, with IDEFICS your target modules would be the Q, K, and V projections: "q_proj", "k_proj", "v_proj", at least q_proj; if you don't know what these mean, you should go and read the theory before you come to the practical. After that, lora_dropout=0.05, and bias="none". Then you load the model using get_peft_model(model, config). This is fine. Now if you print the trainable parameters with model.print_trainable_parameters() (I don't know why it wasn't auto-suggesting), you can see it's only 0.22%, not even 1% of the entire model; it shows you the total params and the trainable params we have.
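The data preparation and LoRA setup from this section, as a sketch (the 0.02 test size follows the narration):

```python
ds = load_dataset("TheFusion21/PokemonCards")
ds = ds["train"].train_test_split(test_size=0.02)  # small held-out eval split
train_ds, eval_ds = ds["train"], ds["test"]
train_ds.set_transform(ds_transforms)  # applied lazily, on access
eval_ds.set_transform(ds_transforms)

model_name = checkpoint.split("/")[1]  # "idefics-9b"
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # ~0.22% of all parameters
```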
Now we're going to use the Hugging Face Trainer. We'll keep a very small number of steps; I just want to show you the capability and how to fine-tune a multimodal LLM like IDEFICS (you could do the same with LLaVA 1.5). Let's write the training arguments quickly: training_args = TrainingArguments(...), and here we define a whole list of things. The first is an output directory; as always, you should use your base model name plus whatever dataset you're working with, so I'll call it PokemonCards, capitalized. Our output directory, where we'll save the weights, is done. Learning rate: I'm going to keep it 2e-4; you could also keep it 2e-5, and you can see Colab even suggests 2e-5 as the recommended one, but I'm going with 2e-4. Then fp16=True; you could also do bf16 on an A100 if you want, but we'll see. Then per_device_train_batch_size=2, the batch size per device, and per_device_eval_batch_size=2 as well. Instead of a number of training epochs, let's use gradient accumulation, which helps with GPU memory and such: gradient_accumulation_steps=8. I don't know why the editor is so obsessed with suggesting epochs; I'm going to set dataloader_pin_memory=False instead. Right now I'm using a single GPU, and that too in Colab; if you get an error, I think you'll need a bigger machine, maybe RunPod or AWS SageMaker. Then save_total_limit=3. The evaluation strategy is where you define the steps: evaluation_strategy="steps", and after that the save strategy, how you save your weights, which we'll also set to "steps". For save_steps, something like 1,000 is far too big because I'm only going to train for 50 steps, so keep it small, and let's make eval_steps 25 or so. Then define your max steps: max_steps=50. Then logging_steps, after how many steps it should log your loss in the terminal or wherever you're running: I'll set 10. Max steps is done; what else? remove_unused_columns=False. push_to_hub: we'll push separately, so False for now. label_names is important: it takes a list, so ["labels"]. load_best_model_at_end=True. And report_to: I'm not going to use TensorBoard or Weights & Biases here, so report_to="none". And the optimizer, which I forgot: optim, and I'm going to use 8-bit AdamW, "paged_adamw_8bit". I think this should do; these are our training arguments. Let's run it and see if we get any error. OK, we didn't get any error.
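As a sketch, those arguments together (save_steps isn't stated clearly in the audio; 25 is an assumption that keeps it a multiple of eval_steps, which load_best_model_at_end requires):

```python
training_args = TrainingArguments(
    output_dir=f"{model_name}-PokemonCards",
    learning_rate=2e-4,
    fp16=True,                     # bf16=True is an option on A100
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    dataloader_pin_memory=False,
    save_total_limit=3,
    evaluation_strategy="steps",
    save_strategy="steps",
    save_steps=25,                 # assumed; the narration only says "not 1,000"
    eval_steps=25,
    max_steps=50,                  # later reduced to 25 after a CUDA OOM
    logging_steps=10,
    remove_unused_columns=False,
    push_to_hub=False,             # we push manually afterwards
    label_names=["labels"],
    load_best_model_at_end=True,
    report_to="none",
    optim="paged_adamw_8bit",
)
```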
Now let's use the Trainer from Transformers: trainer = Trainer(...), and inside I pass model=model, args=training_args (so many things here), and then just two more things: train_dataset and eval_dataset. Let's run this and just do trainer.train(). Once I do that, it will take a bit of time; a lot of time, to be honest. And on running it, it says processor.__call__ got an unexpected keyword argument 'transforms'. OK, this is surprising; why didn't I get that error earlier when transforming? But I made a mistake: it should be transform, not transforms. My bad, a silly thing to do. Anyway, let's fix that, rerun the transformation, reload, and do trainer.train() again, and you can see the training has started, guys: the fine-tuning run for 50 steps. Of course you can increase the number of steps, but I just want to save some time and show you the capability of fine-tuning a multimodal LLM like IDEFICS, and that too in Google Colab on a single A100 GPU, on the Pokemon Cards dataset from Hugging Face. Once we've fine-tuned it, we'll also push it to the Hugging Face Hub. So let me do one thing: let me pause the video here and come back once it's done.

Now you can see our fine-tuning steps have completed, but there are a few things I'd like to tell you. I was fine-tuning for 50 steps, if you remember, but I started getting CUDA errors; I think it was exceeding my VRAM at that point, so I reduced the training steps to 25 and fine-tuned for just those; you can see I had to bring the numbers down to 25 here. We got a good training loss, and what I see is that if you run more steps on your own dataset, this model will perform really well; you can see the training loss and validation loss here on the screen.
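The trainer wiring described above is then simply (with the transform keyword fix already applied inside ds_transforms):

```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```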
Now, how do we do inference? Let me first get a URL. I had one kept handy, but you can take any other URL as well; let's also look for some Pokémon card images, which are on pokemontcg.io. I have my notes with me, so let's try it out: there's an image there, 2_hires.png. Let's take this one, because it's an actual Pokémon card, and we fine-tuned on exactly that kind of data, Pokémon card data. So this becomes our URL. Once you have your URL, you need the prompts: prompts = [...], where you give your URL and then the question. If you remember, this is how we set up the prompt before our training arguments, so let's ask the same way: "Question: What's on the picture? Answer:", a system-prompt kind of format. Run this, and prompts now holds my URL and the question.

Now I can use the do_inference function, the one we wrote earlier (we didn't run inference right after loading the model, but now we'll do it on the fine-tuned one): pass the model, the processor, then your prompts, then max_new_tokens; the default was 50, but here let's make it a bit bigger, like 100. Let's run it and see what we get: is it able to answer about that Pokémon or not? And you can see it says: "What's on the picture? Answer: This is Lucario..." Wow, it got it. It says this is a basic Pokémon card of type Darkness with the title Lucario and 90 HP, of some rarity, and so on; we got the output. It generates the text (we capped max tokens at 100), and you can see Lucario, HP 90; it's able to answer from the visuals of the Pokémon card. This is fantastic, isn't it?
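A sketch of that inference call (the card image URL is reconstructed from the garbled audio, so treat it as an example):

```python
url = "https://images.pokemontcg.io/pop6/2_hires.png"  # example card image (reconstructed URL)
prompts = [url, "Question: What's on the picture? Answer:"]
do_inference(model, processor, prompts, max_new_tokens=100)
# -> "Question: What's on the picture? Answer: This is Lucario ... 90 HP ..."
```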
The next thing is how we can push it to the Hub as well, so we can use it later for inference, or for research or study purposes. For push to Hub, let me first log in to the Hugging Face CLI, and for that I need an access token, so let's go to Hugging Face, then Settings, then Access Tokens; you need a write token, and you can see I have one here. Let's see if this works, otherwise we'll take a different route. I run huggingface-cli login, and it asks you to enter your access token. First I hit "UTF-8 locale is required, got ANSI"; I've solved this problem before, so let's grab the suggested fix, and that worked (by the way, it's not really a Hugging Face error). Then it asks for the token; I pass it, say no to the git credential question, and it says "invalid token passed". I made some mistake: it's invalid because we copied the code along with it. Let me run it again, paste just the token, hit enter, again answer no to the git credential, and now it says the token is valid and has been saved to the root cache of Hugging Face tokens and whatnot.

Now, how do you push it to the Hub? If you bring up the output directory, you can see we now have idefics-9b-PokemonCards, and if you expand it you'll see checkpoint-25, because we fine-tuned for 25 steps; inside are the adapter model, the optimizer, the safetensors, and whatnot. We can also merge the adapters with PEFT; we'll see that in a bit. Now let's do the push: model.push_to_hub(...) with your fine-tuned model name ending in PokemonCards, and let's not keep it private, because I want to give it to the community so you can also try it out and share your findings on how it works. Run this, and you can see it's pushing to the Hugging Face Hub right now; once it's pushed, you'll be able to see your model right there, with the commit message "Upload model" (of course you can make the message and the README nicer). If I go to my profile, you'll see I just pushed the model: skuma307/idefics-9b-PokemonCards. Once you click Files and versions, you'll see your adapter_model.safetensors and adapter_config.json. You can use the Transformers pipeline to load the model and work with it, though you won't be able to load just the adapter the plain way; alternatively, you can merge the LoRA adapters we trained and load it conventionally. So this is how you push it to Hugging Face, and with that our fine-tuning is successfully completed.

That's all for this video, guys. I hope you now have enough understanding of how to at least get started fine-tuning multimodal LLMs. I'll have more videos on fine-tuning multimodal LLMs; I'm looking at a couple of other models, one of which we'll do through plain PyTorch, and LLaVA, which we'll do the way we've done IDEFICS here, so there will be a couple more videos on fine-tuning multimodal LLMs. I'm also working on an end-to-end product development video where we fine-tune, push to Hugging Face, deploy on SageMaker, and build a microservice or REST API kind of thing so we can leverage the model in a product; that's a lengthy video, I'm working on it and it will take a bit of time, but it's coming very soon. These are on the cards, on the roadmap. Now, if you have any questions, thoughts, or feedback, please let me know in the comment box; you can also reach out to me through my social media channels, and you'll find those details on the YouTube channel banner and the About page. If you like the content, please hit the like icon; if you haven't subscribed to the channel yet, please do subscribe, and share the video and channel with your friends and peers. Thank you so much for watching; see you in the next one.
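For reference, the push step described above as a minimal sketch (the repo name mirrors the one created in the video):

```python
# Assumes a prior `huggingface-cli login` with a write-access token.
model.push_to_hub(f"{model_name}-PokemonCards", private=False)
# The Hub repo then shows adapter_model.safetensors and adapter_config.json.
```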
Info
Channel: AI Anytime
Views: 5,594
Keywords: ai anytime, AI Anytime, generative ai, gen ai, LLM, RAG, AI chatbot, chatbots, python, openai, tech, coding, machine learning, ML, NLP, deep learning, computer vision, chatgpt, gemini, google, meta ai, langchain, llama index, vector database, idefics, IDEFICS 9B, idefics 9b llm, llm, IDEFICS 9B LLM, IDEFICS LLM, multimodal llm, Multimodal LLM, llava, llava 1.5b, LLAVA LLM, LLaVA 1.5B, LLaVA LLM, fine tune LLM, lora qlora, LoRA, QLoRA, PEFT, PEFT fine tuning, fine tuning LLM, mistral
Id: usoTCfyQxjU
Length: 49min 4sec (2944 seconds)
Published: Mon Jan 15 2024