Image Classification with Neural Networks in Python

Captions
What is going on, guys, welcome to this video. In today's episode we're going to build an image classification script in Python using TensorFlow and convolutional neural networks. Convolutional neural networks are a type of neural network that you use when you're dealing with image data or audio data, or whenever you want to find patterns in data. In this particular tutorial we're going to use a Keras data set of many, many images, and these images have ten possible classifications: for example, we can have images of planes, of trucks, of horses and so on. We're going to train our neural network to recognize these, and then we're going to get some images from the internet, feed them into the neural network and see if the classification is right or not. So let us get into the code.

First things first, we're going to need a couple of libraries here, as always. So run CMD or open up your terminal. I'm going to activate my Anaconda environment; if you have one, do this as well. Then we're going to install a couple of libraries. The first one is numpy, so pip install numpy. If you already have it, it will say "Requirement already satisfied", which is what it says in my case; otherwise it's going to install it. We're also going to pip install matplotlib, because this is the library we're going to use for visualizing the images. Then pip install tensorflow, which is the library we're going to use for the neural network stuff; also installed in my case. And last but not least we're going to install OpenCV, so we say pip install opencv-python, which of course is also installed already in my case.

So we install these modules and then we import them. We start with import cv2 as cv, which is OpenCV. You don't need to give it an alias, I just like to do it. Then import numpy as np, then matplotlib.pyplot as plt, and last but not least we're going to import some things from
TensorFlow. From tensorflow.keras we're going to import datasets, layers and models, so that we don't always have to type tensorflow.keras.models, tensorflow.keras.datasets and so on, but we also don't have to import all the individual things that we're going to use.

Now we're going to get the data. The data set is already inside of Keras, so the only thing we need to do is call a load function and store the data in a training tuple and a testing tuple in order to work with it. We put training_images and training_labels in one tuple and testing_images and testing_labels in another tuple, and the data comes from datasets.cifar10.load_data(). (CIFAR, I hope it's pronounced like that, I don't know.) This load_data function returns training and testing data in this exact format. The images are just arrays of pixels, and the labels are just the labels, for example plane or bird or cat.

What we're going to do next is normalize the data. We're going to scale it down, because the pixels are activated from 0 to 255: depending on how bright the individual pixel is, you have a value from 0 to 255. Because it's more convenient to work with, we want to scale the data down so that all values are between 0 and 1. Since 0 is the minimum value and 255 is the maximum value, the only thing we need to do is say training_images, testing_images = training_images / 255, testing_images / 255. I hope this notation does not confuse you. The only thing we're doing is assigning training_images / 255 to training_images, and the same for
the testing images. It's just a double assignment, you could say. That's how you get and prepare the data.

What we're now going to do is define a class_names list, and then we're going to visualize 16 of the images from the data set. So first we say class_names equals, and here you put in the class names, because in the training and testing data sets we just have numbers as labels. We don't have a label "plane", we don't have a label "car", we just have a label 1, for example. So we need to assign names to these numbers, and we do that with this list. It's important that you keep the order here. You cannot pick a different order than the one I'm picking, because the names have to match the label numbers: the first one is a plane, the second a car, the third a bird, the fourth a cat, then we have a deer, a dog, a frog, a horse, a ship, and last but not least a truck. These are the objects that we will be able to classify.

So the neural network should actually be able to differentiate between a car and a truck, or a horse and a deer, and that is not as simple as it is for us humans, because a horse and a deer might look pretty similar depending on the image. You also need to take into account that the images we're feeding into the neural network are not very high resolution. I don't know the exact resolution off the top of my head, something like 28 times 28 or 30 times 30, but it's pretty low, and it's not that easy to recognize the individual parts.

However, this is the list and we're going to use it later on. But now we're going to visualize 16 of the images, so we're going to use a 4 times 4 grid just to
get an overview of what this data set looks like. So we say for i in range(16), and we define a subplot with matplotlib: plt.subplot(4, 4, i + 1). This says that we have a four times four grid, and with each iteration we choose the next place in the grid for the next image, so we're iterating over the whole grid. Then we say plt.xticks([]) and plt.yticks([]), so that we don't have a coordinate system that annoys us, since we're not doing anything mathematical here. Then we say plt.imshow and show the training image with the index i, so we're showing the first 16 images, and we use a binary color map: cmap equals plt.cm.binary, where cm stands for color map.

Last but not least we say plt.xlabel, so below each image we have a label, and this label shall be the class name. Which class name? It's class_names at the index training_labels[i][0], so we pick element 0, the first element, of that label. Basically we're getting the label of the particular image, the number, and then passing this number as the index into our class list. So if the image label is 3 we get cat, since this is the fourth position but index 3. And finally we say plt.show(), and that should be it.

This should look fine. Takes a while. There you go. As you can see, the pictures are not very high resolution, but you can see, okay, this is a frog. This is a truck, obviously, although I'm not sure I would have recognized this one as a truck, to be honest. This is a deer, it could be anything. This is a horse, this is a bird and
so on. But as you can see, it's not that easy to recognize these animals or vehicles, at least not all of them. This one here could just be, you know, nothing, but it's a deer. So the neural network will have quite a difficult job classifying all these images, but we're going to make it happen, and we're going to see pretty interesting results.

Now let us get started with building and training the model. The next step is an optional one, but I'm going to do it: reducing the number of images that we feed into the neural network. I'm going to say training_images = training_images[:20000] and training_labels = training_labels[:20000], and then testing_images = testing_images[:4000] and testing_labels = testing_labels[:4000]. So instead of training the neural network on the whole data set, we just pick the first 20,000 training examples and the first 4,000 testing examples.

We're doing this because it saves a lot of time. If your computer is not super fast, training a neural network takes a while, so if you want to save time or resources you can just take the first 20,000. That should be enough to make the model quite accurate. Of course, if you want the maximum accuracy possible, you can train on the whole data set, but that takes way longer. So if you have a lot of time or a fast computer, ignore this step and delete this code. If you have a super slow computer, you can also train on just the first 5,000. Of course the accuracy goes down the fewer examples you feed into the neural network, but you can save a lot of time. So now, when you have the data prepared,
what we do is build the neural network. We start by defining it as model = models.Sequential(), so a basic sequential model. The next step is to define the input layer, and the input layer here is immediately a convolutional layer. So we say model.add, and we add layers.Conv2D with 32 filters, a 3 by 3 convolution kernel, and the activation function relu. If you don't know what an activation function is or what a rectified linear unit is, you can check out my "Neural Networks Simply Explained" video on YouTube, where I explain how neural networks basically work and what an activation function does. And we have an input_shape of (32, 32, 3), so here you can see that the resolution has to be 32 times 32 pixels with three color channels. (What did I just do? Did I just type a semicolon in Python?)

Then we have the next layer, which is a max pooling layer. After a convolutional layer you typically have a MaxPooling2D layer, which simplifies the result and reduces it to the essential information. So we add layers.MaxPooling2D (sorry, layers, not layer) with (2, 2) as the pool size. After that we add another convolutional layer, layers.Conv2D, this time with 64 filters, again a 3 by 3 kernel and relu as the activation function. Then one more time a max pooling layer, MaxPooling2D, again with a two by two pool size. And then one more convolutional layer, the final one: layers.Conv2D, again with 64 filters, a three by three kernel, and again relu as the activation function. That's the last convolutional layer. Then
we're going to flatten whatever we get out of this. So we say model.add(layers.Flatten()), which just flattens the input, making it one-dimensional. If you have a ten times ten matrix, you make it into a straight layer of 100 units. After that we add two dense layers. One is layers.Dense with 64 units and, again, relu as the activation function. Then we have the final output layer, which of course has just ten units: layers.Dense with ten units for the ten possible classifications, and the activation function here is called softmax. Softmax scales all the results so that they add up to one. So if you have, I don't know, 50, 60 and 70, it will turn them into percentages that add up to one, so that you get a distribution of probabilities and know how likely it is that one particular classification is the case.

So this is the full model. Let me repeat one more time: we have a convolutional layer followed by max pooling, convolutional, max pooling, convolutional. A convolutional layer filters for features in an image, so it will look at things like: a horse has long legs, whereas a cat has pointy ears, an airplane has wings, a truck is bigger than a car. It filters out all these features. A max pooling layer then reduces the image to the essential information. Then again we use a convolutional layer to process the result, and we do this over and over. Then we flatten the result, we put one more dense layer of complexity in between the flatten layer and the dense output layer, and in the output layer we scale the results so that we get percentages, or probabilities, for the individual classifications.

Then we need to compile the model, as always, so we're going to say model
compile, and as the optimizer we pick adam, as always. Then we define a loss function, and we pick the one that we almost always pick, which is sparse_categorical_crossentropy. Then we define the metrics that we're interested in, and we're interested in accuracy only: we want to know how accurate the model is.

Last but not least we can go ahead and fit the model. We say model.fit on the training images and the training labels, and we run ten epochs. Epochs say how often the model is going to see the same data: if we specify ten epochs, the model is going to see the same data ten times. Then we say validation_data, and the validation data is the training, no, sorry, the testing images and testing labels, so we validate the model with those.

Let us take a look at this and see if it runs. It always takes a while because TensorFlow needs to be imported. We have the visualization here, and as you can see it starts training. Epoch one is running, it's training on 20,000 examples, and this is the classification accuracy right now. It's around 30%, which is not that bad considering that you have ten possible classes: if you were just guessing, you would have a 10% chance of being right, so it already has some knowledge in there. Now it's almost 50%. We're going to stop right here, because it doesn't make sense to fit the model every time. But as you can see, this is how you build the neural network, and we will train it in a second.

What we want to do before we run the script and train the model is to add code that evaluates the model and also saves it, so that we don't need to train it every time we run the script. You want to train it
once, save it, and then just load it, which is way more efficient than wasting all the resources and time on training the model over and over again. So after the fit we need some return values: we say loss, accuracy = model.evaluate on the testing images and testing labels. Then of course we print these values. I think loss, accuracy was the right order; I'm not sure, but I think it should be loss and then accuracy. So let's use f-strings here: Loss is the loss and Accuracy is the accuracy. The loss is basically a numerical value indicating how wrong our model is, how far off it is from the ideal result, and the accuracy is the percentage of the testing examples that were classified correctly.

After that we say model.save, and we save it as image_classifier.model. Later on we can call models.load_model with the same name to load the model and assign it to model, instead of training it over and over again.

So that should be it. We can now run the script and train the model. It always takes some time because we need to import TensorFlow, and that's not that fast. Now, as you can see, it trains the model, and I'm going to skip this step and get back to you once the model is trained.

So now the training is done, and as you can see we have some metrics down here. The loss is 1.01. It doesn't say a lot, because the loss is more of a metric for the computer, as I talked about in the "Neural Networks Simply Explained" video. But we can see that the accuracy is 66%. You could consider 66% to be quite low, but considering that there are ten possible classifications and the neural network doesn't even know what a horse
is or a plane is, 66% is pretty good for a neural network. It's a pretty decent accuracy. You may get a higher one if you use all the training and testing data. You can see that image_classifier.model is up here, stored in the same directory. So what we're now going to do is get rid of, actually, I think all of this here, yes, and say model = models.load_model with image_classifier.model. So we're just loading the model that we just trained.

Now we're getting to the interesting part. We're going to take random images from the internet, of horses or trucks or planes, and classify them with this model. For the purposes of this video I'm going to use Pixabay, since there we can find license-free images, and I don't want to break any laws on my YouTube channel. We can start by looking for an image of a horse. We should find just a single horse, maybe an image with fewer distractions. Oh, this is a good one, this is obviously a horse, so we download it and save it to the desktop as horse.jpg. Then we look for a plane. Ah, there's a pretty clear image of a plane, so we download this one as well: plane.jpg. Then let's pick one of a car. That is actually the one that I used in my book on computer vision, so let's pick this one: car.jpg. And last but not least let's pick, I don't know, a deer, just to see the difference between a deer and a horse. Yeah, that's a good one. There you go, deer.jpg.

We now have all these images on the desktop, but of course they are not in the right resolution. So we need to open them with some kind of image editing tool, and we have to scale each one to 32
times 32 pixels. You can cut it, you can scale it, whatever you want. I'm going to cut off some parts of the image so that it's a little bit more square, so I say crop to selection, then, not canvas size, scale image, I think that's it, and we scale it to 32 times 32. This is the image. It's still clearly recognizable as a horse, so we shouldn't have too many problems here.

Then let's load all of the other images as well and do the same thing. Shouldn't be that hard. I think we could actually scale this one immediately, but let's cut off some parts left and right: image, crop to selection, then again scale image, 32 times 32, scale. There you go. I'm going to go very fast for the deer now. This is a good image of a deer: image, crop to selection, image, scale image, 32 times 32, scale, overwrite. And the same for the car: pick this car here, crop to selection, scale image, 32 by 32. I'm showing you the full process here. That's how we prepare the images. We need to click export every time. For the plane something went wrong, so: overwrite deer.jpg, export, plane, export. Now it should work. There you go.

We have now prepared the images, and we need to move them into the PyCharm project directory, so we're moving them here. That's it: we scaled the images down, and now we're going to feed them into the trained model. We're going to use OpenCV and numpy to load the images into the script, make a prediction and then print that prediction out onto the screen. The first thing we need to do is load the image by saying cv.imread, and let us start with the horse.jpg file. The next step is very important, because when you load images with OpenCV,
when you use the imread function, you load them in a BGR color scheme, so blue, green, red. However, up until now in this project we've only worked with the RGB color scheme: all the images that we trained the model on used RGB, and matplotlib is also going to use RGB when visualizing the images. So we need to convert the color scheme and basically swap the blue and red values. We say cv.cvtColor on the image, and we use cv.COLOR_BGR2RGB. That's it. Now I can go ahead and show it: plt.imshow with the image, and the cmap is plt.cm.binary. This just visualizes the image.

Now we're going to add a prediction. We say prediction = model.predict, and here we need to pass a numpy array, because our model expects its input in a certain structure. So we say np.array, and in a list we pass the image. And, very important: since in the beginning we divided all the values by 255 to scale them between 0 and 1, we need to do this with every input as well, so we divide all the pixels by 255 here.

What we get out of this prediction is the 10 activations of the 10 softmax neurons. We want the maximum value, because this is the most likely prediction, and we want the index of that prediction. So we say index = np.argmax of the prediction. What argmax does is give us the index of the maximum value: one of these neurons will have the highest activation, and with argmax we get the index of this neuron. This index, of course, is the same index that we need for the class name. So we print an f-string, "Prediction is", and here we just
say class_names[index]. That's it, and it should work pretty well. I hope the classification is right. Let's see, it should say horse, hopefully. Of course we can then remove this line. "Prediction is horse": it worked for the first image. Let's just comment out, or actually let's just delete this part of the code here. So the first prediction was correct, it was a horse. Then let's try the plane. Shouldn't be that hard, and of course I used pretty clear images. Yeah: "Prediction is plane". Then let's go for the car. Actually, it's showing the prediction but not the image, so we need to add a plt.show(), but let's do it after the prediction. Now we should get a car. "Prediction is car". So up until now we have 100%. Last but not least let's try the deer, and I guess it will also be correct. We would probably get misclassifications with images that are not that obvious or not that clear, but in this case it says it's a deer, so it's perfectly fine.

As you can see, the model works. You can play around with it further if you want: you can take a thousand images and see what you get as a result, you can do whatever you want with it. It works pretty well with obvious examples. With trickier examples, say an animal that looks a little bit like a deer and a horse at the same time, maybe we don't get 100% accuracy, but the results are quite impressive. So that's it for today's video. I hope you learned something and I hope you enjoyed it. If so, let me know by hitting the like button and supporting this channel. Also feel free to ask questions and give feedback in the comment section down below, and if you haven't done it yet, subscribe to this channel to see more future videos for free. Other than that, thank you very much for watching and see you in the next video. Bye.
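To make two steps from the walkthrough concrete, here is a minimal standard-library sketch; it is not the script from the video. It assumes the Keras defaults for Conv2D (no padding, stride 1) and MaxPooling2D (stride equal to the pool size) to trace how the 32 x 32 input shrinks before the Flatten layer, and then mimics the softmax output and the np.argmax lookup on made-up scores. The helper names (conv_out, pool_out, flattened_units, softmax, predicted_class) and the score values are hypothetical and exist only for illustration.

```python
import math

# Class names in CIFAR-10 label order, as listed in the video.
CLASS_NAMES = ["plane", "car", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def conv_out(size, kernel=3):
    """Side length after an unpadded, stride-1 convolution (assumed Conv2D defaults)."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Side length after non-overlapping max pooling (floor division)."""
    return size // pool


def flattened_units(side=32):
    """Trace a side x side image through conv-pool-conv-pool-conv, then flatten."""
    side = pool_out(conv_out(side))  # Conv2D(32, 3x3) + MaxPooling2D(2x2): 32 -> 30 -> 15
    side = pool_out(conv_out(side))  # Conv2D(64, 3x3) + MaxPooling2D(2x2): 15 -> 13 -> 6
    side = conv_out(side)            # Conv2D(64, 3x3): 6 -> 4
    return side * side * 64          # Flatten: 4 * 4 * 64 values


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(probabilities):
    """Class name at the index of the largest probability (the role np.argmax plays)."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASS_NAMES[index]


if __name__ == "__main__":
    print(flattened_units())  # 1024
    # Hypothetical raw scores for one image; index 7 ("horse") is the largest.
    made_up_scores = [0.1, 0.3, 0.2, 1.0, 0.4, 0.9, 0.2, 4.0, 0.1, 0.5]
    print(f"Prediction is {predicted_class(softmax(made_up_scores))}")  # horse
```

One detail worth knowing: softmax works on exponentials, so scores such as 50, 60 and 70 are not turned into proportional shares. The largest score ends up with almost all of the probability.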
Info
Channel: NeuralNine
Views: 168,616
Keywords: programming, python, neural networks, neural network, artificial intelligence, machine learning, data science, computer vision, tensorflow, tutorial, coding
Id: t0EzVCvQjGE
Length: 31min 53sec (1913 seconds)
Published: Wed May 06 2020