How to train a model to generate image embeddings from scratch

Captions
I think one of the most beautiful concepts in the modern AI world is embeddings. They're extremely helpful, and once you know how embeddings work and how to use them, you can find all sorts of interesting ways to take advantage of them. So in today's video I want to show you how you can create your own embeddings from scratch: how to implement a very simple neural network so you understand how these embeddings are created in the first place. I'm not going to use any complex structures, I'm not going to use Transformers; there are multiple ways to generate embeddings, but I'm going to use a simple neural network, and I'm going to show you line by line how I create those embeddings for images.

Now, if you have no idea what I'm talking about, let me show you the Cohere playground. In this playground, Cohere gives you a way to write some text and generate embeddings for it, and what's really cool about the playground is that Cohere gives you a visualization of those embeddings in space. As a reminder, an embedding is just a vector, an array of floating-point numbers; that's it, a bunch of numbers. If I have an embedding of 4,000 numbers, those represent coordinates in a multi-dimensional space. I like to think about embeddings like latitude and longitude: with those two coordinates you can place a point anywhere in the world. It's the same idea with embeddings, but instead of working in two dimensions they work in many dimensions, which makes them a little hard to reason about.

What you see on the screen is the Cohere playground, and I typed six words off camera. I have "bananas", "apples", and "rice"; those three words obviously represent food concepts. And I have "camera", "photography", and "tripod", which represent photography-related concepts. I'm going to generate embeddings for these by clicking this Run button, and Cohere is going to use an embedding model to compute the coordinates of these concepts, to place them in the world, and then give me a 2D representation here, so obviously it's reducing the dimensionality of those coordinates. Now notice something super cool: I have three concepts down here and three more concepts up here, like two small clusters. If we look at it, "photography" and "tripod" are close together, versus "rice" and "apple", which are close to each other. That is one of the main characteristics of an embedding: similar concepts get coordinates in this multi-dimensional space that are close to each other, and concepts that don't look alike get coordinates pointing in directions that are very, very far apart. That's the main advantage of these embeddings.

Before I show you the code and we build this embedding thing, let me tell you something else: why are these embeddings important? Why do you care? Well, you've heard about RAG applications, for example. In a RAG application, the whole idea is to provide context for a model to answer questions. Imagine a person asks a question about a specific document that you have stored. The idea is: you get that question from the user, you find related documents in your database of documents, you grab the document that best resembles, that is most similar to, the question, you give those two pieces of information to a large language model, and you ask the model: can you please answer this question using this document? But the key here is: how do you find documents related to that question? The answer is through embeddings, because you can generate an embedding for the question, and you store embeddings for all of your documents. With that database of embeddings, you can use a concept called similarity to find which of those embeddings are most similar to the query provided by the user. And similarity is just distance: we compute the distance between two vectors, which are just coordinates, and the closer they are, the more related they are. That's the idea; that's the power of these embeddings.

Today I'm going to build this from scratch, and hopefully we're going to learn something together. Here is a little diagram I put together to show you what the structure of the network I'm going to build will look like, and it's going to be very simple, I promise; it looks more complex than it actually is. I'm going to be using a paper called "Dimensionality Reduction by Learning an Invariant Mapping". It's an old paper, from 2006, and one of its authors is Yann LeCun; I hope you know who he is, he's one of my idols. In this paper, LeCun and his co-authors propose a loss function to learn how to reduce the dimensionality of a vector. I'm going to implement that loss function, which is called the contrastive loss function; let me just find it in the paper, there we go, it's right here. You'll see in a second what it means and why it works, but this is the loss function I'm going to implement to learn how to generate embeddings for images. Not just any images, because I need to keep this very simple: I'm going to use the MNIST dataset, which is a dataset of handwritten digits, so you get images like the handwritten three and the handwritten seven that I have here on my screen.
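The similarity-as-distance idea can be sketched in a few lines of NumPy; the three-dimensional vectors below are invented purely for illustration, since real embedding models produce hundreds or thousands of coordinates:

```python
import numpy as np

# Toy 3-dimensional "embeddings" (made-up values, for illustration only).
rice = np.array([0.9, 0.1, 0.0])
apples = np.array([0.8, 0.2, 0.1])
tripod = np.array([0.0, 0.9, 0.8])

def euclidean_distance(a, b):
    # Euclidean (L2) distance: the norm of the difference vector.
    return np.linalg.norm(a - b)

# Related concepts sit close together, unrelated ones far apart.
print(euclidean_distance(rice, apples))  # small distance
print(euclidean_distance(rice, tripod))  # much larger distance
```

The same comparison works in 128 or 1,536 dimensions; only the length of the arrays changes.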
What I want is to be able to generate embeddings for those images. If I do it correctly, I'll end up with a model that, given a handwritten digit, generates coordinates for that digit that fall somewhere in this multi-dimensional world, such that a coordinate generated for a number three is close to the coordinates of any other digit representing a three, and if I pass a three and a seven, those coordinates, those embeddings, should be very, very far apart. That is my goal.

Now, how am I going to get there? I'm going to use a concept called a Siamese network, and you can find more information about Siamese networks online, but the idea is that I'm going to create a neural network with two heads. There are going to be two inputs instead of the single input you're probably used to in a neural network. Each of those inputs leads to a trunk, or a body if you will, which is what I'm calling "the network" here, and that network is the neural network that's going to generate the embedding; its output is the embedding we want to learn. That's where the main work happens. Notice that I'm using dotted lines in the diagram, and I have a note that says the two twins (I'm calling them twins because it's a Siamese network, and obviously with two inputs there are two twins) share the same weights. What that means is that I drew them separately for representation purposes, so the diagram makes sense, but in real life these two inputs connect to a single network, a single twin, and the outputs are the embeddings of the two inputs I pass. Then I'm going to have a distance function, and this distance function computes how far apart those two embeddings are. The closer the embeddings are, the more that indicates the images are the same; if the distance is large, that indicates the images are different.

So with this Siamese network, let's reason through it really quickly: I should be able to get the network to generate coordinates that are correct depending on whether two images are the same or not. This is the way it's going to work. I'm going to take two images, a three and a seven, and give them to the network at the same time, because I have two inputs. I'm going to tell the network: the ground truth for these two images is a value far from zero, because these two images are different, so I want the distance between them to be large. The two images run through the network, and at the end I compute the distance between the two generated embeddings. If that distance comes out small, I want a loss function that penalizes my network; basically, I want a loss function that generates large updates to push those two embeddings away from each other. Now, if I pass a three and a three, I want the resulting distance to be close to zero, very small; and if I pass a three and a three and the resulting distance is large, I want the loss function to again penalize my network with large updates so those two embeddings get close to each other. That's the whole idea: I'm going to provide pairs of images, and the network is going to push embeddings together when the images are the same and push them apart when the images are different. At the end of it, if I do this enough times, hopefully this network will learn
how to generate embeddings that are close to each other when the images are the same and far apart when the images are different.

So how do we build this? Let's start with the code. I'm going to be using Keras, version 3 of Keras; it's amazing, a great front end. For now I'm going to use the TensorFlow backend, but this is the beauty of Keras: when you're using Keras you can change your backend (actually, let me make this a little bigger so you can see better) to JAX or PyTorch, and nothing else changes in your code. That is amazing. Keras is just a front end; the actual processing is done by the backend, but you don't have to worry about the details of that backend, you just switch it here. Now, because I'm running on a Mac with Apple silicon, JAX and PyTorch have only partial support for it. I'm sure TensorFlow also has partial support, but this network, the one we're going to build, is a little bit custom, not your regular network, and there are certain operations that are still not supported by PyTorch or JAX on Apple silicon. This code works perfectly fine if you have an NVIDIA GPU or a CPU, so you don't need to worry about it; but I want to use my GPU, and TensorFlow is well supported there, so I'm going to use TensorFlow as the backend. I'm going to execute this, and that should initialize Keras. Awesome.

Again, I'm going to use the MNIST dataset because it's such a popular dataset. It doesn't come preloaded, but you can load it easily from Keras: you just reference keras.datasets.mnist and call load_data(), and it's great, it already gives me a train set and a test set, so I don't have to do anything, which is awesome. I'm going to load this, and you can see I'm getting 60,000 images in my train set, each 28 by 28 pixels, so every image is just a square, a matrix with 28 rows and 28 columns. The y values, the labels, are just numbers, so that's only one dimension: 60,000 values. Here I have a little bit of Matplotlib code that displays the first 10 images in the train data, just so you see them; for each one I'm showing the image and the label associated with it, so number five and its image, then zero, four, and so on. This sort of gives me some confidence that the code is working so far.

All right, here's the thing. The network I'm going to build will have two inputs, and it's going to be a fully connected network, very, very simple; it's not going to be a convolutional network or anything fancy. Therefore, I'm going to feed every single pixel of an image into each input. Let me pull up my calculator: 28 times 28, because that's the size of the image, gives 784 pixels. So I'm going to configure my inputs, two of them, to expect 784 values each; 784 values come in representing a single image. But the data I have is structured in a different way: it comes in three dimensions, the number of images (60,000), then how many rows, and how many columns. So I'm going to reshape that to make it 60,000 images of
784 pixels each. To do that, NumPy has a function called reshape, which is very easy to use. As you can see here, I'm reshaping my tensor and saying: keep the first dimension the same, don't touch it (that's what the negative one means), and change the other two dimensions into 784, just an array with 784 positions. And lo and behold, when I execute this, that's what it does: now the shape is 60,000 by 784. All of my images are just streams of pixels; that's it, that's what I want, because it's easier to provide this information to my network when I build it.

All right, cool. Something else we've got to do: I'm going to create a new cell really quickly and display the first image of the train set. Look at this: there are a bunch of zeros and a bunch of numbers that are not zeros. These are pixel values. This is a black-and-white image, and every pixel is represented with a value between 0 and 255: 0 means black, no light, and 255 means pure white, full light. So when you get something like 26, that's a gray closer to zero; it's not super dark, it's a lighter gray. And 253 is almost white. Now, neural networks don't like to work with numbers that have a very large range, like zero to 255; they work better if we give them numbers that stay within a very small range. So what I'm going to do right now is normalize this image. Normalization basically takes these values and squeezes them into a smaller range; in this case that range is going to be zero to one, so I want all of the pixel values to be between zero and one. This is the formula for normalization that I'm going to use; it's a very simple one called min-max scaling. The way it works is: I take a pixel value and subtract the minimum of all pixel values. We know the minimum is zero, so whatever the number is minus 0 is just x. Then I divide by the maximum of all pixel values minus the minimum, which is 255 minus 0. So this simplifies to: every pixel is equal to that pixel divided by 255. That's it; that's how you squeeze all of the values, and there's an explanation in the notebook if you want it. In the code I just apply that transformation; as you can see, I'm dividing the whole matrix by 255, which changes all of the numbers. Let's display the first image again so we can see the transformation applied: you can see all of the zeros, and what before were pixels between 0 and 255 are now values like 0.99, 0.71, and 0.44. We basically just squeezed all of those numbers. All right, this is looking pretty awesome; we are almost ready.
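The reshape and min-max scaling steps look roughly like this in NumPy, with a random array standing in for the real MNIST tensor:

```python
import numpy as np

# Stand-in for the MNIST training tensor: 60,000 images of 28x28
# grayscale pixels (random values here, just for illustration).
x_train = np.random.randint(0, 256, size=(60000, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a stream of 784 pixels.
# The -1 tells NumPy to keep the first dimension (60,000) as it is.
x_train = x_train.reshape(-1, 784)
print(x_train.shape)  # (60000, 784)

# Min-max scaling: (x - min) / (max - min) with min=0 and max=255
# simplifies to dividing every pixel by 255.
x_train = x_train.astype("float32") / 255.0
# All pixel values now fall within [0.0, 1.0].
```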
So here I have a function that is going to create the pairs to train my model. Remember how this works: this is a Siamese network, very important, with two heads, and I need to generate a bunch of pairs, positive pairs and negative pairs. I'm going to call a pair positive whenever we have two images of the same digit; two number threes is a positive pair, and the label for that pair is going to be a very short distance: zero. So I'm telling the network: here is a three and a three, and the distance must be zero; do the computation, and the distance you come out with all the way down here, that output neuron, should be zero. And if I pass two images that are different, say a four and a nine, I'm going to give it a large distance: a one, assuming one is as large as it gets. Here is a four and a seven; that's a negative pair, because a four and a seven are not the same, and the label you should get at the end, the output, must be one or close to one. That is what I need to generate now: a dataset of pairs.

So how am I going to do that? Very simple. I have a dataset of 60,000 samples, so I can take every sample, grab that image, then get a random image that represents the same digit, and that will be a positive pair; and get a random image of a different digit, and that will be a negative pair; then go to the next image and do the same. If you do the math: with 60,000 images, and two pairs generated for every image, a positive one and a negative one, I end up with 120,000 pairs to train my model.

Let's see how the code works; it's very, very simple. I have a couple of arrays here just to hold the information, and I go through all of the images in my dataset. I grab the digit representing a particular image, let's say a number five, just for gigs, and then with this random choice I'm basically finding all of the other number fives in my array and grabbing one randomly; I'm filtering my whole list of images saying: find me another number five. I grab that index so I can reference that image, and I add a pair: this is the original five, this is the random one we picked, and the y value, the distance, is going to be zero. Remember, for a positive pair the distance must be zero; there must be no difference between those embeddings. Then I do the same thing with a negative pair; the only thing that changes is that instead of finding a digit that represents the same value, I find a digit that represents a different value. So if I started with a number five, I say: give me a random digit from my dataset that is not a number five; it's going to grab a number nine, whatever it is. I add the original number five with that number nine as a negative pair, and I know it's a negative pair because the y value I add for this one is one, indicating the distance should be the maximum value we allow. By going through this loop I get the 120,000 pairs with all of the y values.

After the loop, the only thing this function does is shuffle everything; basically I'm rearranging all of the pairs so they don't come out positive, negative, positive, negative; it's all going to be mixed up. So I'm going to execute this function, and then call it to generate pairs for the training set and for the test set. Now, I had 60,000 images in my train set, therefore I'm getting 120,000 pairs. Notice the dimensions here: 120,000 pairs, the second dimension is two, indicating there are two images, and each of those has 784 pixels. That's how you read this: 120,000 elements, each with two images, each with 784 pixels. And for the test set I had 10,000 images to test whatever model we're building, therefore I get twice as many pairs: 20,000 pairs. Cool.
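The pair-generation loop described above can be sketched roughly like this; the function and variable names are mine, and a tiny synthetic dataset stands in for MNIST:

```python
import numpy as np

def make_pairs(images, labels):
    """Build one positive (y=0) and one negative (y=1) pair per image."""
    pairs, targets = [], []
    rng = np.random.default_rng()
    for i, digit in enumerate(labels):
        # Positive pair: another random image of the same digit.
        same_idx = rng.choice(np.where(labels == digit)[0])
        pairs.append([images[i], images[same_idx]])
        targets.append(0.0)  # same digit -> distance should be 0
        # Negative pair: a random image of a different digit.
        diff_idx = rng.choice(np.where(labels != digit)[0])
        pairs.append([images[i], images[diff_idx]])
        targets.append(1.0)  # different digit -> distance should be 1
    pairs, targets = np.array(pairs), np.array(targets)
    # Shuffle so the pairs don't alternate positive/negative.
    order = rng.permutation(len(pairs))
    return pairs[order], targets[order]

# Tiny synthetic stand-in for MNIST: 10 flattened "images", labels 0-4.
x = np.random.rand(10, 784).astype("float32")
y = np.array([0, 1, 2, 3, 4] * 2)
pairs, targets = make_pairs(x, y)
print(pairs.shape)    # (20, 2, 784): two pairs per image
print(targets.shape)  # (20,)
```

With the real 60,000-image train set this produces the 120,000 pairs mentioned in the video.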
I also built a quick function here; I'm not going to go through its code, and I promise there's nothing magic in it. It receives an array of pairs and displays them in a pretty way so we can take a look at them. I'm going to execute it to display 10 of the pairs. You can see the first one here is a one and a one, a positive pair because they are the same; then I get another positive pair, an eight and an eight; then a negative pair, a four and a seven, they're different, therefore it indicates negative; then a zero and an eight, that's a negative; a six and a four, that's a negative; a zero and a zero is a positive; and a two and a two is positive. You get the idea: those are random pairs from my pair generation. Cool.

Time to start building the network; this is where the magic happens, the beauty of the whole thing. I have the pairs; I just need to build the network, train it, and see if it works. Let me go back up, and here you can see what that network will look like. I'm going to start by creating those two inputs, the ones that will be expecting my pixels. Very, very simple: if I go down here, I'm creating these two inputs; that's how you create an Input in Keras, there is no magic. Notice that you need to specify the shape you're expecting, and I'm expecting 784 pixels in each of the two inputs: input one expects the first image of the pair, input two expects the second image of the pair. Beautiful.

Next in line is the trunk, the network, the one that is going to do the work. Let's go back up really quickly: this network is going to have an output of 128 values. What does that mean? That is the size of the embedding. By the way, if you know about the OpenAI embeddings, the ones used with ChatGPT for example, those embeddings have 1,536 values; that's how many coordinates are in one of those vectors, a lot of numbers. In my particular case, just because this is a toy problem, I'm choosing to create a very small embedding of only 128. That is a value you could explore; whether or not 128 is the right size depends, and different problems need different sizes. You can go with 64, you can try 256, you can try any number that occurs to you. For this example I'm generating embeddings of 128, and because these are the embeddings, they will be the output layer of this network; the output layer has to have those 128 values.

So let's take a look at the code. I'm using the Sequential API; this is the Keras API for building a model that is just one layer after the other, so I can specify the layers in order. One of these networks is going to expect 784 pixels coming in from the outside. Then I have a couple of Dense layers; a Dense layer is a fully connected layer, nothing fancy: every neuron in one of these layers connects to every neuron in the next layer. That's how a fully connected layer works. Why these sizes? Because those are the numbers I used; you could experiment with different numbers if you want to. Each of these layers uses the ReLU nonlinearity as its activation function. The second layer is 256, so it gets a little narrower, and then the final layer, my embedding, has 128 values and no activation function, because I don't want to mess with those values; I want the raw numbers, and those raw numbers are the coordinates, my embeddings. So this is my network. Now notice that in my chart the network appears twice, with the two inputs pointing at it; it's drawn twice for representation purposes, but remember, I'm going to be sharing the weights, like the note says.
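The trunk described above can be sketched like this, assuming Keras 3; the 512-unit first hidden layer is my assumption, since the video only states the 256-unit second layer and the 128-value output:

```python
import keras
from keras import layers

# The shared trunk: a simple fully connected network that maps
# 784 pixels to a 128-dimensional embedding. The first hidden size
# (512) is a placeholder; the video does not state it explicitly.
embedding_network = keras.Sequential(
    [
        keras.Input(shape=(784,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        # No activation: we want the raw coordinates (the embedding).
        layers.Dense(128),
    ]
)
print(embedding_network.output_shape)  # (None, 128)
```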
So let's see how I do that. Pay attention to this: I'm creating twin one (by the way, I have to run all of this; I keep talking and not running code). I have twin_1, which is going to be the top branch of my network; it uses the network I just created, and I pass input one to it. And twin number two uses the same instance, and this is important, the same instance, for input number two, so these weights are shared. There are going to be two twins, twin one and twin two: I'm connecting input one to twin one and input two to twin two, but the network is actually the same instance. Therefore, when we do backpropagation and change the weights, we're doing that only once, because the weights are the same. I want to be clear that that's what's happening here.

All right, so I have my two twins, and their outputs are my two embeddings. What am I missing? The final layer: a one-neuron layer that is supposed to compute the distance between my two embeddings. How do I do that? Fortunately, we can use what's called a Lambda layer in Keras, which is basically a layer that executes a function for you. We define the function we want to execute, attach that Lambda layer as the output of this beautiful thing we created, and that's going to be our neural network. So let's go back down (I'm sorry about all the going up and down; it gets a little dizzying sometimes). Here is my output layer; forget about the function for a second. This is how I define my Lambda layer: I'm calling it "distance", that is the name of the layer; it's a Lambda layer, and notice I'm importing Lambda from keras.layers; and I'm passing the function that the Lambda layer will execute, the Euclidean distance function we're going to talk about in a second. Then I connect this layer, specifying where its information comes from; I need to connect it with the stuff that comes before it, and those are my two twins. I pass an array, and the two twins come into this layer; as you can see, that's exactly what's happening: this top portion here is twin one coming into the layer, and this portion here is twin two coming into it.

Let's go to the Euclidean function that does the computation in this layer. What this function receives, we know, are the two embeddings, because that's what feeds into the Lambda layer: the twin one output, the embedding of the first image, and the twin two output, the embedding of the second image. In the first line I'm unpacking the twins into two variables so the computation is clear, and then I compute the distance between them. The distance between two vectors can be computed using the norm of their difference; it's also called the Euclidean distance. It's a simple function call: subtract the twin two output from the twin one output and take the norm of that, and that gives me how close these two vectors are. That is the distance I want to compute in the output layer.

So the only thing left to finish the architecture is putting all of the pieces together, and the way you do that is with the Model class: I specify that the inputs to this model are the two inputs I created before, and the output is the distance layer. That's it.
Now this model is what you see in the diagram, and we need to train it. To train it we need to specify a couple of things: first, the loss function we're going to use; second, the optimization algorithm. We're not going to mess with the optimization algorithm; we're going to use stochastic gradient descent, or rather a variant of it, and not worry about it. The loss function is where the beauty of this is. Remember what we want: if the two images are different, like a three and a seven, and the distance is small, we want a loss function that returns a big value, so the updates are big; and if the two images are the same and the distance is large, we also want the loss function to give us a big error, a big value. And the opposite: if the two images are the same and the distance is small, we don't want to mess with the weights; we want a loss function that recognizes the network is working fine. That is what the loss function should do, and that is where we go to the paper, which gives us that loss function: the contrastive loss function. "Contrastive" because you're contrasting something to something else; that's where the term comes from.

All right, so I implemented the formula you see in the paper bit by bit, without making any changes, so let me show you my implementation. Let's go down, and here is the contrastive loss. It's a function, and like any good regular loss function it receives a couple of things: the true value, what the output should be, and the value I got. Now, if you look at my diagram (I'm not going to pull it up again), the output of my network is a single value: the distance we're computing. And the true value, the ground truth, the label of one pair, is either zero or one: zero if the images should be close together, because they are the same digit, and one if the images are different. Those are the two values I'm receiving here: y and d.

Let's go to this line and track it bit by bit against the paper. The paper says the loss should be: one minus y, over two, times the square of the distance; that's the first term right there, bit by bit. Plus y over two times the square (all of this is squared) of the maximum of zero and a margin minus the distance. Now, that margin is a constant; it's like a radius, and you'd have to read the paper to really get the idea, but it's not important here. I'm using one as the margin; you can see it here, I'm initializing it with one. Not a big deal. In short, in English rather than math terms, here is what this loss does: if the images are the same but the distance is large, the loss is big; if the images are different and the distance is short, the loss is big; if the images are the same and the distance is small, the loss is small; and if the images are different and the distance is large, the loss is small. That is what the loss function does.

Now we need to set it up, and then we're ready for training. Let me execute that cell; by the way, I didn't even execute this line here, what's going on? I need to execute this, and this, and this, and now we get to the compilation step, where I compile my model so I can train it.
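The contrastive loss just walked through can be written out and sanity-checked in plain NumPy; in the notebook it operates on backend tensors, but the arithmetic is the same:

```python
import numpy as np

def contrastive_loss(y_true, distance, margin=1.0):
    # (1 - y)/2 * d^2            -> penalizes distance for similar pairs (y=0)
    # y/2 * max(0, margin - d)^2 -> penalizes closeness for different pairs (y=1)
    similar_term = (1.0 - y_true) / 2.0 * np.square(distance)
    different_term = y_true / 2.0 * np.square(np.maximum(0.0, margin - distance))
    return similar_term + different_term

# Same pair (y=0): small distance -> tiny loss, large distance -> big loss.
print(contrastive_loss(0.0, 0.1))  # 0.005
print(contrastive_loss(0.0, 2.0))  # 2.0
# Different pair (y=1): small distance -> big loss; distance beyond the
# margin -> zero loss (the pair is already far enough apart).
print(contrastive_loss(1.0, 0.1))  # 0.405
print(contrastive_loss(1.0, 2.0))  # 0.0
```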
contrastive loss function. The optimizer is Adam; Adam is just a variant of stochastic gradient descent, a little bit more advanced, so use Adam. And then for the metrics, I'm going to use binary accuracy, because, hey, listen, this is very simple to do with binary accuracy: I either get a pair correct or not. So if my distance is small enough, less than 0.5, and the images are the same, that's good; if the distance is large, greater than 0.5, and the images are the same, that's bad. That's basically what binary accuracy is going to do for us. So after doing that, I'm actually going to execute this. I'm going to come back, but first let me start training the model, because it's going to take a few minutes, so you can see it working. So after compiling my model, I'm going to plot it. This utils here has a plot_model function where you can see what the structure of the model looks like, and notice this looks almost the same as my diagram, except here the trunk, the body of this neural network, is represented only once. So we get the input layers, each expecting 784 pixels; there are two input layers because, remember, it's a pair of images that I'm going to be showing this network, each connected to one of these input layers. They both go to a sequential model, which is the simple neural network that we built; the input of that sequential model is 784, so it's going to be grabbing one of these images, and the output is 128. That's our embedding; that's what we care about. And then this sequential model is feeding into the Lambda function, which is receiving two embeddings, one for each image, right? And the output is one neuron, and that is the distance; it's not even a value between zero and one, it's just going to be a small value. That is the whole structure of my model here, beautiful. So now it's time to train the model, to fit the model. How do I do that? Well, in Keras it's very simple. So first I
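A sketch of how that binary-accuracy metric behaves on distance outputs, assuming the video's labeling (0 = same pair, 1 = different pair) and the 0.5 threshold mentioned above; the helper name here is hypothetical, not the Keras metric itself:

```python
import numpy as np

def binary_accuracy(y_true, distances, threshold=0.5):
    # A distance above the threshold predicts "different" (label 1);
    # at or below it, the prediction is "same" (label 0).
    predictions = (distances > threshold).astype(int)
    return float(np.mean(predictions == y_true))

y_true = np.array([0, 0, 1, 1])             # two same pairs, two different
distances = np.array([0.1, 0.7, 0.9, 0.2])  # pairs 2 and 4 land on the wrong side
print(binary_accuracy(y_true, distances))   # 0.5
```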
need to pass the x values, the training data, so I need to pass the pairs of images. How do I select the pairs? Well, I'm going to grab the training pairs, all of the values, that's what the colon means: on the first axis, which is the batch size, the number of rows, just give me all of them, but only take zero, meaning the first image. Remember that... well, now it's executing, so I cannot show you... well, it's almost done, it's going to end now, so I'm going to show you here. I'm going to just create a new code cell so you can see the shape of this, so it actually makes sense; let it finish running. By the way, all of this runs on my GPU; I'm going to show you my activity monitor here, and if we're lucky you're going to see the GPU here in one second... well, it's already coming down, I should have done it before, but anyway, this is my GPU, what it looks like. You can see here the blue lines that are showing up; I only caught it at the end of the training, but I'm pretty sure they got all the way up to 100% while it was training, because it was using the GPU. You can see now how it's coming down. Unfortunately, I only showed this at the end of training; anyway, not important, let's go back here. Let me show you the shape of this: I have 120,000 pairs, so I'm saying get all of the pairs, and from the other axis only give me the first image; on that axis I have two images, so get all of the first images. That is the first thing that I'm going to pass, okay, right here. And then for the second input, just get all of them again, but now take the second image; that's what this one indicates, so that's going to get the second image of each pair. Okay, so that is my training data. Now, what are my labels, or what do they look like? Well, just get the y pairs, which are the values; if I go here, just grab all of them and everything else. So remember, if I get just one of these, you get... this is a zero, meaning that is a positive pair. If I get another one, maybe I'm
lucky... well, I just got the same one, but maybe I'm lucky and I get... yeah, there we go, this is a negative pair, it's a one. So that is what I'm passing as the x and the y; let me just remove this. The validation data I'm passing is the same thing, using the same structure: I'm passing the pairs from the test set, that's it, so it can actually validate this thing. Then I'm using a batch size of 32, which means that during every iteration the model is going to take 32 pairs, run them through my network to compute the distance at the end, and compute the loss function, okay, 32 at the same time. You could experiment with this number, with 64, with, I don't know, maybe eight, whatever the number is, and see if you can get better results; I just used 32, it's fine. And I'm going to do this five times, five epochs. Now here is the result of my training; let's try to understand what's happening here. First you get this weird number, 3,750. Well, remember that I told you I'm using batches of 32, so if I have 120,000 pairs divided by 32, that means every epoch has to go through 3,750 iterations; that's the number that you see here, right? It's the number of samples divided by your batch size; that's how many iterations we're going to run, okay? So during the first epoch, that's what we did, it took 33 seconds; the binary accuracy during the first pass was 89%, that was the training accuracy, and then the loss was this. What's really important here... oh, the validation accuracy was 96%, and the loss was 0.1. What's important here is just to see this accuracy that's going up, as you can see, 98, 99, that's cool, and the validation accuracy is also going up; it sort of stabilizes here, so it doesn't go up after that, and that's fine. I have a plot here that's going to show us: for my five epochs, you see the training loss going down, and the validation loss going down a little bit as well. You might want
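The pair slicing and the step count described above can be sketched with a small stand-in array; the shapes follow the video (pairs of 784-pixel images), though the array here is tiny and random rather than the real 120,000 MNIST pairs:

```python
import numpy as np

# Stand-in for the training pairs: shape (num_pairs, 2, 784), where
# axis 1 holds the two images of each pair.
pairs = np.random.rand(10, 2, 784)

first_images = pairs[:, 0]   # all rows, first image of each pair
second_images = pairs[:, 1]  # all rows, second image of each pair
print(first_images.shape)    # (10, 784)
print(second_images.shape)   # (10, 784)

# The 3,750 in the training log is just pairs divided by batch size:
steps_per_epoch = 120_000 // 32
print(steps_per_epoch)       # 3750
```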
to train this maybe another epoch or so and see if that makes it better. So now let's do the evaluation, and for the evaluation the interesting part comes, I promise. For the evaluation I'm just going to use the test pairs, same format that you saw, and I'm just calling predict on my model, on my Siamese network; I'm using the predict function, and then I'm going to display the pairs, passing another parameter so it displays this beautiful green, or red if it makes a mistake. So let's see what we got here. All right, so I ran all of the test data: when I pass a seven and a seven, the model is telling me, yeah, they are the same; I pass a five and a five, it's the same. Let's find one that's different: I passed an eight and a nine, and the model says it's different. The green color means it did not make a mistake. You can get it to make mistakes, and there are mistakes; you're going to see now that I compute the accuracy right here, and the accuracy is 97%, so there are certain mistakes where the model thinks they're the same but they're actually different. But it's pretty cool, it's pretty darn good. Now, doing it like that is using the Siamese network with a pair of images, but what I really wanted was just to get the embeddings, because now I can do interesting stuff with the embeddings. So here is the trick... not the trick, but here you can see I'm displaying the list of layers of my Siamese network, and no surprise: there is one input layer here, there is another input layer here (remember we have two input layers), then there is the sequential model, and there is the Lambda function. Those are the four components that make up this Siamese network. The embeddings come from the sequential model: the output of the sequential model is the embeddings that we care about. So I'm going to grab that sequential model; I'm going to say my embedding model is the third layer, so zero, one, two,
the third layer of my big model; that is going to be my embedding model. And now I can use that embedding model, just a portion, a section of the big Siamese network, just to generate embeddings, okay? I don't care about passing pairs; I just care about giving it an image and having it give me the embedding for it. All right, so let's see if this works. This code here is basically selecting three images, and notice that the first image, the image represented by index one, is a random image that represents the digit three; the second image is also a random image that represents the digit three, and this is me just testing this out, okay? So index one and index two both represent the digit three, and then index three represents the digit seven. So these are the indices. Awesome, beautiful. Well, actually, I re-executed that and got new values, so these are the indices. Now I can use my embedding model to generate embeddings for those indices. So look at this: embedding one will be my embedding model.
predict, and I'm going to pass the image represented by that specific index; I'm going to generate embeddings for all three indices. Now, this reshape here is just because I need to create the format that the embedding model is expecting; there's nothing fancy here, I'm just changing the structure of the arrays. So I'm going to execute this, and now I have three embeddings that I generated with my model. After creating these three embeddings, I can compute how far apart they are. In this particular case, I'm going to compute the distance between embedding number one and embedding number two, and you can see that the distance is 0.3, close to zero, because embedding one represents the digit three and embedding two also represents the digit three. And if I do the same... by the way, this was the old code, so if I execute it again, it's 0.04, even smaller. If I now do the same thing with embedding one and embedding three, that's a three and a seven, so this distance here should be large, and embedding two versus embedding three should also be large. Let's execute this, and now you can see 0.99, meaning those two numbers are very far apart, and 0.98, so those two numbers are very far apart as well. So hopefully this gives you an idea of how to use these embeddings. There are a bunch of applications: anomaly detection, RAG systems if you wanted, and obviously your large language models are not using such simple embeddings; they use way more complex embeddings that try to sort of condense the knowledge of the world into these multi-dimensional vectors. But hopefully this small neural network gives you a better understanding of how you can create and use these embeddings, and I'll see you in the next one. Bye!
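As a closing sketch, the distance check at the end of the transcript can be reproduced with plain NumPy. The embeddings below are hypothetical 128-dimensional vectors standing in for the two threes and the seven, not outputs of the actual model:

```python
import numpy as np

def euclidean_distance(a, b):
    # Euclidean distance between two embedding vectors.
    return float(np.sqrt(np.sum((a - b) ** 2)))

rng = np.random.default_rng(0)
emb_three_a = rng.normal(size=128)  # stand-in embedding for a "3"
emb_three_b = emb_three_a + 0.01    # a nearly identical "3"
emb_seven = emb_three_a + 2.0       # a far-away "7"

print(euclidean_distance(emb_three_a, emb_three_b))  # small (about 0.11)
print(euclidean_distance(emb_three_a, emb_seven))    # large (about 22.6)
```

Same pattern as in the video: embeddings of the same digit sit close together, embeddings of different digits sit far apart.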
Info
Channel: Underfitted
Views: 6,923
Keywords: Machine learning, artificial intelligence, data science, software engineering, mlops, software, development, ML, AI
Id: GikIJpUv6oo
Length: 51min 43sec (3103 seconds)
Published: Mon May 27 2024