Hey everyone! In this video we'll learn how to
train a TensorFlow Lite object detection model on our own dataset. We'll walk through
the process of preparing training data, training the model, and exporting it all
using Google's free servers inside Google Colab. By the end of this video you'll
have a fully trained lightweight object detection model that you can run on computers, a
Raspberry Pi, cell phones or other edge devices. I'll train a coin detection model as an example
for you to follow along with. This model can be used in a change counter application that
tells you the total value of change in an image. I'll provide the dataset and sample code
for this application but you can also use your own dataset to train an entirely different
model. This video walks through a Google Colab notebook I wrote for training models. All
you need to do is open the notebook in your web browser to follow along. Click the link in the
video description below and let's get started! Colab is a free Google service that allows you
to write and run Python code through your web browser. It connects to a virtual machine on
Google servers that's complete with a Linux OS, a file system, Python environment, and best of
all a free GPU. We'll upload our training data to this Colab session and use it to train our
model. Click the connect button to initialize the environment. Make sure you're using a
GPU-equipped machine by going to Runtime, selecting Change runtime type, and confirming that GPU is
selected in the Hardware accelerator dropdown. The first step in training a machine
learning model is to create a dataset. We need to gather and label at least 200
images to use for training the model. If you don't want to gather images yet and just
want to practice training a model, you can skip this step for now and download my coin training
images in Step 3. Building a good image data set is the most important part of training a model.
I made a YouTube video that gives step-by-step instructions on how to gather images and label
them using an annotation program called LabelImg. The video also shows data set tips and best
practices that will help improve your model's accuracy. Go check it out! To gather images
use a phone or webcam to take pictures of your objects with a variety of backgrounds and
lighting conditions. For my coin detector I took pictures with my phone and also set up a fixture
to take pictures with my Raspberry Pi camera. You can also use images you find online but
I recommend taking your own pictures because it usually results in better accuracy for
your application. Once you've gathered about 200 images, use an annotation program called
LabelImg to draw bounding boxes around each object in each image. Again, my other YouTube
video will walk you through how to do this. When you're done gathering and labeling images,
you should have a folder full of images and an annotation file for each image. Create a
zip file called "images.zip" and add all of the images and annotation files into it. We'll
upload this to the Colab session after the next step.
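If you'd rather script this step than use your file browser, here's a minimal Python sketch.
It assumes your pictures and LabelImg .xml files all sit together in a local folder named "images":

import zipfile
from pathlib import Path

# Bundle every image and annotation file in the "images" folder into images.zip.
image_dir = Path("images")
with zipfile.ZipFile("images.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for f in image_dir.iterdir():
        if f.suffix.lower() in (".jpg", ".jpeg", ".png", ".xml"):
            zf.write(f, arcname=f.name)

Either way, the end result is a single images.zip file holding all your images and their annotations.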
Okay, the hard part's over! Now we can let the Colab notebook do the rest of the work. First, we'll install the TensorFlow Object
Detection API inside this Colab. The API contains scripts and libraries that we'll use for
training the model. Click the Play button - oops! So when you click the play button you'll get this
warning about it not being authored by Google. Go ahead and just click "Run anyway". Then, click the
play button on these first four blocks of code. Allow each block to execute. It'll take
several minutes to get everything installed. If you see any errors or warnings related
to package dependencies or requests to restart the runtime, you can just ignore them. We'll verify the API installed correctly
by clicking play on the next code block, letting it execute, and verifying it says
model built successfully when finished. Okay! It says it ran okay, so we're good to
go. If you get errors, go to the Common Errors section at the bottom of the notebook to see
how to resolve them. You can also comment on this video or send me a tweet on Twitter at
@EdjeElectronics and I'll see if I can help. We need to transfer our training images onto the
Colab virtual machine. There are a few options for doing this. The easiest way is to just upload
your images through Google Colab. Click the folder icon and then drag your
"images.zip" file into the sidebar. It'll upload the images directly into the Colab
file system. It may take a while depending on your internet speed; this little orange
circle shows the progress of the upload. Keep in mind
that the images will be deleted if the Colab session disconnects, so you'll need
to re-upload them each time you restart. Another option is to upload the "images.zip"
file to Google Drive and then link it to the Colab file system. Read the instructions here to
see how to do so. The nice thing about using Drive is you don't need to re-upload your zip file
every time you use this Colab. If you have a slow internet connection or if your data set is more
than 100 megabytes, this will save a lot of time. Finally, as a third option, you can
also just use my coin dataset and practice training a model with that.
I've uploaded 750 labeled coin images to Dropbox. Download them into the Colab
file system by clicking Play on this block. These coins are United States currency, but I
know I have a lot of international viewers on this channel. Keep an eye out for new download
links to coin data sets from other countries. At this point, whether you used option one,
two, or three, you should be able to click the folder icon and see your "images.zip"
file. Now that the data set is uploaded, let's unzip it and create some folders to hold the
images. Click on the next code block to unzip the images and split them into train, validation,
and test folders. Each of these image sets has a different purpose. The "train" images are
used for the actual training of the model, the "validation" images are used to periodically
check progress during training, and the "test" images are used by us at the end of training to
visually check how accurate the model is. I wrote a Python script to randomly move 80% of the images
to the train folder, 10% to the validation folder, and 10% to the test folder. Click Play on this
code block to download and run the script. Now, when you check the file list, you
should see an images folder with train, validation, and test folders
that each have images in them.
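For reference, the split script does something along these lines. This is just a simplified
sketch, not the exact script the notebook downloads, and the folder paths are assumptions:

import random
import shutil
from pathlib import Path

# Randomly send 80% of images to train, 10% to validation, and 10% to test,
# moving each image's matching .xml annotation along with it.
src = Path("/content/images/all")          # hypothetical folder holding the unzipped images
images = sorted(p for p in src.iterdir() if p.suffix.lower() in (".jpg", ".jpeg", ".png"))
random.shuffle(images)

n = len(images)
splits = {"train": images[:int(0.8 * n)],
          "validation": images[int(0.8 * n):int(0.9 * n)],
          "test": images[int(0.9 * n):]}

for name, files in splits.items():
    dest = Path("/content/images") / name
    dest.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.move(str(img), str(dest / img.name))
        xml = img.with_suffix(".xml")
        if xml.exists():
            shutil.move(str(xml), str(dest / xml.name))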
Next we need to convert the data set into TFRecords, which is a data format used by TensorFlow. We'll use Python scripts to do the
conversion, but first we have to define a label map for our model. A label map is a simple text
file with a list of classes that you want your model to detect. We can create this text file in
Colab using the command in the next code block. Replace "class1", "class2", and "class3" with the
classes you used when you labeled your images. For example, for my change counter model, I'll
put "penny", "nickel", "dime", and "quarter". Make sure you spell the classes correctly. Once
you've listed your classes, click Play on the code block. A labelmap.txt file will appear in your
list of files in the content folder. It's just a basic text file with your list of classes.
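The code block that creates it does something roughly like this; it's a sketch with my coin
classes filled in, so swap in your own class names:

# Write one class name per line to labelmap.txt in the Colab content folder.
classes = ["penny", "nickel", "dime", "quarter"]
with open("/content/labelmap.txt", "w") as f:
    f.write("\n".join(classes) + "\n")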
Here's what it looks like if you download it. With the label map defined, we
can create the TFRecords. Download and run the conversion scripts
using the next two code blocks. The scripts will also create
a labelmap.pbtxt file, which contains the label map in the protobuf text format that TensorFlow expects.
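If you're curious, the generated labelmap.pbtxt follows the standard Object Detection API
format, with one item entry per class and IDs starting at 1. For my coin classes it looks
roughly like this:

item {
  id: 1
  name: 'penny'
}
item {
  id: 2
  name: 'nickel'
}
item {
  id: 3
  name: 'dime'
}
item {
  id: 4
  name: 'quarter'
}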
Finally, click Play on the next code block to store the paths to the TFRecords and label map.
We'll use them later in this Colab. Okay! The next step is to set up training
configuration for our model. We'll select the model from the TensorFlow Model Zoo,
which has a list of models that we can fine tune on our own dataset. The models
have varying levels of speed and accuracy. I wrote a blog post comparing performance
of several models from the Model Zoo. It shows the accuracy each model achieves when it's
trained on my coin data set and the FPS they run at on the Raspberry Pi 4. Go check it out to see
which model will work best for your application. I set up this notebook to make it easy to switch
between models for training. You can select which model you want to train by changing the text in
the "chosen_model" variable to match one of the options below for this video. I'll select the
"ssd-mobilenet-v2-fpnlite-320" model. Feel free to try one of the other models. Click Play on
the code block once you've made your selection. Next, we'll download the pre-trained
weights and pipeline configuration files for the selected model. Click Play
on this code block to download them. The pipeline configuration file sets all of
the parameters for training the model. Two key parameters are "num_steps" and "batch_size".
"num_steps" defines the total number of steps to use for training the model, and "batch_size" sets
the number of images to use in each training step. For good numbers to start
with, let's set "num_steps" 40,000 and "batch_size" to 16. During training,
if you see that the model hasn't converged within 40,000 steps, you can increase "num_steps"
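Inside the pipeline config itself, those two values live in the train_config section, which
looks roughly like this fragment (standard TF2 Object Detection API field names; all the
other fields are omitted here):

train_config {
  batch_size: 16
  ...
  num_steps: 40000
}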
The next code block defines other
information for the config file, like the file path to the pre-trained
model files. Run it and confirm that it prints the correct number
of classes for your detector. Next, we'll rewrite the pipeline configuration
file to use parameters that we just specified. This code block will go into the config file
and override it with the necessary information, such as the number of classes, batch size, paths to the
training files, and so on. Click Play to run it.
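As a rough idea of what that block does, here's a simplified sketch that edits a few values
in the config file with regular expressions. The file path is an assumption, and the notebook's
actual block rewrites more fields than this:

import re

# Read the pipeline config, substitute a few values, and write it back out.
config_path = "/content/models/mymodel/pipeline.config"   # hypothetical path
with open(config_path) as f:
    config_text = f.read()

config_text = re.sub(r"batch_size: \d+", "batch_size: 16", config_text)
config_text = re.sub(r"num_steps: \d+", "num_steps: 40000", config_text)
config_text = re.sub(r'label_map_path: ".*?"',
                     'label_map_path: "/content/labelmap.pbtxt"', config_text)

with open(config_path, "w") as f:
    f.write(config_text)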
If you're curious, you can check the config file's contents by clicking this next block. It displays
the full file in the browser. The file contains all the configuration
parameters that are used for training. Finally, click Play on the next code block to set the path to the pipeline file
and model training directory. All right! We're ready to start
training our object detection model. Before we start training, we can start up a
TensorBoard session to monitor training progress. Click Play on this code block, give it a few
seconds, and a TensorBoard interface will appear.
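The TensorBoard block boils down to the standard Colab magic commands pointed at the
training folder (the folder name here is an assumption):

# Load the TensorBoard extension and point it at the directory where training
# checkpoints and event logs will be written.
%load_ext tensorboard
%tensorboard --logdir /content/training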
It won't show anything yet, because we haven't started training. We'll use the "model_main_tf2.py" script
from the TensorFlow Object Detection API for training. The script parses the
configuration file, loads the model and dataset, and then starts training
the model. Click Play to begin training.
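Behind the scenes, the cell runs a command along these lines. The paths are assumptions
for this sketch, but the flags are the standard ones accepted by model_main_tf2.py:

# Launch training from a Colab cell using the Object Detection API's training script.
!python /content/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=/content/models/mymodel/pipeline.config \
    --model_dir=/content/training \
    --alsologtostderr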
The program will initialize and display some log messages. Once it's done initializing,
every 100 steps. It takes a while, so if it seems like nothing is happening, just wait a
couple minutes. If you encounter any errors, please visit the Common Errors section
at the bottom of this notebook. Training takes anywhere from two to
six hours, depending on the model, number of training steps, and batch size
you're using. Now, let's let it train for a while. You can minimize the window and
work on something else while it's training. Okay, our model has been training for about two
and a half hours, and we've trained for about 29,000 steps. Let's go back up and check
TensorBoard to see how training is going. Click refresh to update the interface. TensorBoard has several graphs that
show the model's overall loss over time. You want to look at the total loss graph.
As the model trains, the overall loss will decrease. We should keep training the
model until the loss stops decreasing. It looks like the loss is still going down just a
little bit, so let's keep on training. If you do want to stop training early, click the Stop button
or right-click and select "Interrupt execution". Otherwise, training will stop automatically
once it reaches the number of steps we specified earlier in the "num_steps"
variable. In this case, it's 40,000. Make sure to be present when training stops,
because if the session is idle for about 15 minutes, Colab will disconnect and delete
your runtime, and you'll have to start over. Okay our model has been trained!
Now that training is done, we need to convert our model
to a TensorFlow Lite format. Run this first code block to freeze your
model graph in a TFLite compatible format. When that's done, run the next code block
to convert the graph to a TFLite file. The resulting .tflite file contains
the neural network and weights of your object detection model in
an optimized FlatBuffer format.
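Roughly speaking, the two blocks are doing the following; this is a sketch, and the
directory names are assumptions:

# 1) Export the trained checkpoint into a TFLite-compatible saved model using the
#    export script that ships with the Object Detection API.
!python /content/models/research/object_detection/export_tflite_graph_tf2.py \
    --pipeline_config_path=/content/models/mymodel/pipeline.config \
    --trained_checkpoint_dir=/content/training \
    --output_directory=/content/custom_model_lite

# 2) Convert the exported saved model into a .tflite FlatBuffer file.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/content/custom_model_lite/saved_model")
tflite_model = converter.convert()
with open("/content/custom_model_lite/detect.tflite", "wb") as f:
    f.write(tflite_model)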
Our custom model has been trained and converted to TFLite format, but how well does it actually perform at detecting objects in images? Let's use
it on the images in the test folder to visualize how accurate it is. Click Play on the next block
to define a function to load the model, run it on each image, and display the result. The code is
based off the "TFLite_detection_image.py" script from my GitHub repository, so feel free to use
it as a starting point for your own application. The next block lets you set the confidence
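If you want to see the core of what that function does, here's a condensed sketch of running
the .tflite model on a single image with the TFLite interpreter. The image path is an assumption,
the 320x320 size matches the model I selected, and the order of the output tensors can vary
between exports, so check output_details on your own model:

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the TFLite model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="/content/custom_model_lite/detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess one test image to the model's expected input size.
image = Image.open("/content/images/test/example.jpg").convert("RGB").resize((320, 320))
input_data = (np.float32(image) - 127.5) / 127.5      # typical normalization for float SSD models
input_data = np.expand_dims(input_data, axis=0)

# Run inference and pull out the detection results (tensor order assumed; verify on your model).
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
boxes = interpreter.get_tensor(output_details[1]["index"])[0]    # bounding boxes
classes = interpreter.get_tensor(output_details[3]["index"])[0]  # class indices
scores = interpreter.get_tensor(output_details[0]["index"])[0]   # confidence scores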
The next block lets you set the confidence threshold and the number of images to test. Click Play on this block to start running inference. The inference results from each image will
display in the browser. The results should give you a sense of how well your model
actually performs at detecting objects in new images. In my case, the change counter
model is mostly accurate... but let's see if we can find any images where it incorrectly detects
coins. It's still looking pretty good. Here we go. In this one, it incorrectly identified
a nickel as a dime. But, it's really doing pretty well. So when you run this with your own model,
if you don't see any results drawn in the test images, go ahead and go back up and change
"min_conf_threshold" to 0.01, which is basically the minimum it can be, and run the block again.
You should see some boxes drawn on your images. Next, we can quantitatively measure the
model's performance by calculating the model's mAP on the test images. The higher
the mAP score, the more accurate the model is. To learn more about how accuracy is
measured for object detection models, check out this insightful article
from RoboFlow that explains mAP. We'll use an mAP calculator script from
another GitHub repository to determine our model's mAP score. Run the first block to clone
the repository, remove the existing sample data from the repository, and then download a script
that I wrote for interfacing with the calculator. Then, run the next script to copy the images
and annotation data from our test folder to the appropriate folders in the repository.
These will be used as the ground truth data that our model's detection results are compared
to. Click Play on the next block to convert our annotation data to the format that's expected by
the calculator tool. Next, we'll reuse the same inferencing function from the previous step to
run our model on every image in the test folder. Unlike last time, it'll just save a .txt file
with the predicted bounding box data for each image, rather than displaying the results. Go
ahead and click Play to run the code block. Now that we have detection results and ground
truth data to compare them to, we can calculate mAP. Click Play on this last code block to run
my script for calculating the COCO mAP metric. The final score reported is your model's
overall mAP score. Ideally it should be above 50%. If it isn't, you can increase your model's
accuracy by adding more images to your dataset. See my dataset video for tips and tricks on how to
capture good training images and improve accuracy. Now that your custom model has been
trained and converted to TFLite format, it's ready to be downloaded and deployed in an
application. Run the next two cells to copy the model and label map files into a folder, zip
the folder, and download it to your computer. The "custom_model_lite.zip" file containing the model will be downloaded
into your Downloads folder. Okay, so now that we've downloaded our
trained TFLite model what can we do with it? Well, TensorFlow Lite models are great for running
on a wide variety of hardware including PCs, embedded systems, Raspberry Pis, phones, and
whatever other edge device you can think of. This section of the notebook provides links to
instructions for running your model on various devices including the Raspberry Pi, Android
phones, or Windows, Linux, or Mac OS computers. I'll update this section with instructions
for other devices as I write them. As you guys know, I love Raspberry Pis,
so in this video, I'll be showing how to deploy your model on a Raspberry Pi.
TFLite models are great for running on the Raspberry Pi because they require less
compute power than regular TensorFlow models. The quantized SSD-MobileNet-FPNLite model
runs at about 2.6 FPS on my Raspberry Pi 4. To run your model on the Raspberry Pi, first
you need to install TensorFlow Lite and prepare a Python environment for your application. Check
out my TensorFlow Lite on the Raspberry Pi video for step-by-step instructions on how to set
it up. It only takes about 20 minutes to step through the whole process. If you'd rather just
run it on your PC, follow the instructions on my Windows TensorFlow Lite setup guide. I'll
be writing a guide for Mac OS and Linux too. Once you've got TFLite set up on your Raspberry
Pi, move the "custom_model_lite.zip" file that you downloaded from Colab over to your
Pi. You can upload it to Google Drive, or use a USB thumb drive, or do whatever
your favorite file transfer method is. I'll copy my model onto a USB drive and
then fly it over to my Raspberry Pi. This Pi has already been set up with
TensorFlow Lite, and has a folder called "tflite1" that holds all the scripts
and model files for running detection. Move the "custom_model_lite.zip"
file into the tflite1 folder. Then open a terminal, issue "cd tflite1"
and then "unzip custom_model_lite.zip". Before we run the detection script, let's activate the virtual environment by
issuing "source tflite1-env/bin/activate". Now, you can run the TFLite scripts with your
model by using the "--modeldir" argument. For example, to run the webcam detection script, issue "python TFLite_detection_webcam.py
--modeldir=custom_model_lite". A window will appear showing
a live feed from your webcam with boxes drawn around objects of interest. You can press 'q' to quit. Finally, if you trained a coin
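Putting those terminal steps together, a typical session on the Pi looks like this; these are
the same commands from above, collected in one place:

cd tflite1
unzip custom_model_lite.zip
source tflite1-env/bin/activate
python TFLite_detection_webcam.py --modeldir=custom_model_lite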
Finally, if you trained a coin detection model and want to try it out with my example change counter application, issue "python examples/ChangeCounter.py
--modeldir=custom_model_lite". Point the camera at coins on a surface.
I built a cool camera stand out of K'nex, but you can also just hold it above the
table with your hand. The program should identify each coin and calculate the
total value of all the coins it sees. This is just one example of the many cool
applications you can create using computer vision and machine learning. Check out my
website and YouTube channel for other examples. You can squeeze some more performance out of
your model using a compression technique called quantization. Step 9 of this notebook shows how
you can quantize a model with TensorFlow and recalculate its accuracy. If you have a Coral
USB Accelerator, the notebook also shows how to compile your model for the Edge TPU. I'll release a
follow-up video walking through these steps soon.
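As a preview, post-training quantization with the TFLite converter looks roughly like this.
It's a sketch under assumed paths and input size, and a real representative dataset should feed
actual preprocessed training images instead of the random stand-in arrays used here:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield sample inputs so the converter can calibrate quantization ranges.
    for _ in range(100):
        yield [np.random.rand(1, 320, 320, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("/content/custom_model_lite/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_quant_model = converter.convert()

with open("/content/custom_model_lite/detect_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)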
Okay, you did it! At this point you should have a fully trained TensorFlow Lite model running
There's a ton of different ways you can use AI-powered computer vision to solve everyday
problems. I'll keep making videos and examples showing how to build cool programs that use
object detection. Stay tuned for more videos. In the meantime, if you wind up making
a cool application with TensorFlow Lite, you can comment here or tweet it to me on Twitter
and I'll share it with the rest of my followers. Thanks so much for watching this
video, and I hope it was helpful. As always good luck with your
projects, and I'll see you next time.