Apple Joins the AI Race with MLX

Captions
Apple's machine learning research team has released a new framework called MLX for building foundation models. This is very unlike Apple, because usually Apple doesn't release open-source software. Their goal with MLX is to design a machine learning framework that developers can use to build models that run efficiently on Apple silicon.

Here's a tweet from Awni Hannun, who is part of the Apple machine learning team: just in time for the holidays, they are releasing some new software from Apple machine learning research. MLX is an efficient machine learning framework specifically designed for Apple silicon, so you will be able to run this on your laptop. The tweet shows an example of running a Llama 70B model on an M2 Ultra; later in the video I'll show you how to do this on your own machine. You can train a Transformer LM or fine-tune with LoRA, do text generation with Mistral, image generation with Stable Diffusion, and speech recognition with Whisper. This is pretty amazing, because now you have specialized software for Apple silicon. They are also releasing MLX Data, which is supposed to be a framework-agnostic, efficient, and flexible package for data loading, so it's going to support PyTorch and JAX as well as MLX.

Apple has been releasing its M-series chips, the latest one being the M3, which has a Neural Engine, so the hardware is capable of running AI applications. Now, with the release of MLX, there is also specialized software to support the AI capabilities of Apple silicon, and with this we might finally see a foundation model from Apple, but we'll have to wait a bit longer for that.

Okay, so let's quickly look at the MLX framework. Later in the video I'll show you how to install it on your own machine and how to run a Llama 2 model using MLX, and at the end we're going to look at an example of training a model with the same framework.

MLX is a NumPy-like array framework designed for efficient and flexible machine learning on Apple silicon, brought to you by Apple machine learning research. If you're not familiar with NumPy, it's a Python package specifically designed for scientific computation. One thing you will notice is that their focus is on machine learning, not on generative AI. The Python API closely follows NumPy, with a few exceptions, and we're going to talk about those. MLX also has a fully featured C++ API which closely follows the Python API. This is pretty great, because not only can you run it from Python, you can also write C++ code, which is going to be a lot faster.

So how is this different from NumPy? The first thing is composable function transformations: MLX has composable function transformations for automatic differentiation, automatic vectorization, and computation graph optimization, so it seems MLX takes a lot from PyTorch here. Then it supports lazy computation: computations in MLX are lazy, which means arrays are only materialized when needed. It also has multi-device support, so operations can run on any supported device. Since Apple silicon has unified memory, both the CPU and the GPU use the same memory, which is why I think this multi-device support is so important.

According to the team, MLX is inspired by frameworks like PyTorch, JAX, and ArrayFire. A notable difference between those frameworks and MLX is the use of a unified memory model: arrays in MLX live in shared memory, and operations on MLX arrays can be performed on any of the supported device types without performing data copies. Usually, if you have a GPU and a CPU, some operations are performed on the GPU and some on the CPU, so there is constant back and forth between them; but since Apple silicon uses unified memory, MLX can be a lot more efficient on Apple silicon.
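To make those ideas concrete, here is a minimal sketch of the MLX array API, assuming MLX has been installed via pip on an Apple silicon machine; the values are purely illustrative:

```python
import mlx.core as mx

# Arrays live in unified memory, so CPU and GPU ops can share them
# without copying data back and forth.
a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])

# Computation is lazy: `c` is only materialized when it is needed
# (or when mx.eval is called explicitly).
c = a + b
mx.eval(c)

# Composable function transformations: mx.grad returns a new function
# that computes the gradient of the original one.
def loss(x):
    return mx.sum(x ** 2)

grad_fn = mx.grad(loss)
print(grad_fn(a))  # gradient of sum(x^2) is 2x -> [2, 4, 6]

# Operations can be dispatched to a particular device without moving the data.
d = mx.add(a, b, stream=mx.cpu)
e = mx.add(a, b, stream=mx.gpu)
```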
With this release, the Apple team also published the mlx-examples repo on GitHub. It has pretty good, well-documented examples, so I'm going to show you how to run Llama 2 on your local machine. You can also do parameter-efficient fine-tuning with LoRA, generate images with Stable Diffusion, and do speech recognition with OpenAI's Whisper. Let's look at a few examples of how to do some of these things.

Okay, so first we need to install MLX on our local machine, and for that we're going to use pip. Before that, I'm going to create a conda environment. We create a new virtual environment with conda create -n; the environment I'm creating is called mlx, and I'm using Python 3.10. I already have a virtual environment by this name, so I'm not going to create it again; I'll simply activate it with conda activate mlx, and you can see that it has switched to that environment. Next, we install the MLX package with pip install mlx; as you can see, it's already installed on my system.

Now, in order to run some of these MLX examples, we need to clone the examples repo: click the green button on GitHub and copy the URL, then back in the terminal use git clone with that URL. Once the repo has been cloned, we move into it with the change directory command, and now I am inside the mlx-examples directory.

First, let me show you how to run a Llama model using the MLX package. We move into the llama folder with cd llama, and to run the model we use the llama.py file. However, first we need to download a Llama model. In order to run Llama 2 models with MLX, we need to convert them into a new NPZ format. There is already a Hugging Face repo that has a Llama 2 7B model in this format, so we're going to use that; there is also a conversion script included with the examples that lets you convert model weights to the NPZ format yourself, but for this video we're just going to use the one available on Hugging Face. Here are the steps to follow: we already installed the MLX package, so we don't need to do that again. We need to install the huggingface_hub and hf_transfer packages, so I copied those commands and we just run them. We have already cloned the mlx-examples GitHub repo, so I'm not going to do that again. Next, we set the HF_HUB_ENABLE_HF_TRANSFER environment variable, and then the next command downloads the model and converts it to the NPZ format needed by the MLX package, so let's run that.
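For reference, here are the setup steps from this section collected in one place. The captions don't name the Hugging Face repo with the pre-converted weights or the exact download/convert command, so that part is left as a placeholder; check the README in the llama example folder for the current instructions.

```bash
# Create and activate a Python 3.10 environment with conda
conda create -n mlx python=3.10
conda activate mlx

# Install MLX and clone the examples repo
pip install mlx
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples/llama

# Faster downloads from the Hugging Face Hub
pip install huggingface_hub hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1

# Download a Llama 2 7B model already converted to NPZ format
# (repo name omitted here; see the example's README), or convert
# your own weights with the conversion script described in the video.
```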
Now we are all set to run the Llama model. To do that, we provide the path of the model in the NPZ format, then the tokenizer, and we run it with python llama.py and a prompt; I'm just using the example prompt provided in the repo. Let's see how quickly it runs the model. This is basically real time: it loaded the model from disk, and now if I press Enter it will start generation. This is happening in real time, and it's actually pretty amazing how fast it is; the full generation took 5 seconds.

Okay, I want to run this command again, but this time I also want to watch the GPU usage on my M2 Max, so let's run it. It's loading the model from disk, and now it should ask me to press Enter to start generation. Let's see whether it uses the GPU, and you can see that the GPU usage went up, so this is pretty awesome. Across these two runs, both the prompt-processing time and the full generation time remained pretty consistent, which is really good news.

Next, let's look at an example of how to train a Transformer language model using MLX. They have provided an example in the transformer_lm folder, and all we need to do is run main.py with the GPU flag. By default it uses the PTB corpus, the English Penn Treebank corpus, in particular the portion of the corpus corresponding to Wall Street Journal articles. If you want to provide your own dataset, you can do that with the --dataset option. Looking at the code, there are quite a few options available: if you want to run on the GPU, you need the GPU flag; you can provide the dataset with the dataset flag; you can define the context window or context size (1024 by default), the number of Transformer blocks, and the number of heads. The number of iterations is set to a very high value by default, but I'll just run about 100 iterations to show the GPU usage. You can also set the learning rate. So everything you need in order to train a language model can simply be passed as parameters here (see the minimal training-loop sketch after these captions for what this looks like in MLX code).

I changed the directory to the main examples folder, then to the transformer_lm folder, and here is the main.py file we need to run in order to train our LM. Keep in mind I'm not running a full training job here; I just want to show you how it can be done. We run python main.py on the GPU, with the number of iterations set to 100, and watch the GPU usage. This is around a 50-million-parameter model, and it's doing pretty well in terms of training speed: the loss is decreasing nicely, the iterations per second are good for training, so I think it's pretty nice, and it's using the full capacity of the GPU on the Apple M2 Max. I just restarted the training process because I want to run it for longer; in a subsequent video I'll show you how to test the trained model. This is just a very small model, so it's probably not really useful, but we will see if we can fine-tune something like a Llama 7B with the MLX package; if that is possible, I think it's going to add a lot of value to this package, specifically for Apple silicon users.

I hope you found this video useful. Thanks for watching, and as always, see you in the next one.
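As referenced above, here is a minimal sketch of what a training loop looks like with MLX's mlx.nn and mlx.optimizers modules. This is not the transformer_lm code from the examples repo; the tiny model, random token data, and hyperparameters are illustrative placeholders, just to show the nn.value_and_grad / optimizer.update pattern.

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Toy next-token model: embedding + linear head (placeholder, not the repo's Transformer).
class TinyLM(nn.Module):
    def __init__(self, vocab_size=100, dims=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dims)
        self.out = nn.Linear(dims, vocab_size)

    def __call__(self, tokens):
        return self.out(self.embed(tokens))

def loss_fn(model, inputs, targets):
    logits = model(inputs)
    return nn.losses.cross_entropy(logits, targets).mean()

model = TinyLM()
optimizer = optim.Adam(learning_rate=1e-3)
loss_and_grad = nn.value_and_grad(model, loss_fn)

# Dummy batch of random token ids (8 sequences of 16 tokens each).
inputs = mx.random.randint(0, 100, (8, 16))
targets = mx.random.randint(0, 100, (8, 16))

for step in range(100):
    loss, grads = loss_and_grad(model, inputs, targets)
    optimizer.update(model, grads)                # apply one gradient step
    mx.eval(model.parameters(), optimizer.state)  # force the lazy graph to run
    if step % 10 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```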
Info
Channel: Prompt Engineering
Views: 19,029
Keywords: prompt engineering, Prompt Engineer, Apple AI, MLX, Apple MLX, Machine Learning
Id: FplJsVd2dTk
Length: 12min 22sec (742 seconds)
Published: Fri Dec 08 2023