Machine Learning with Synthetic Data | @Unity + @TensorFlow

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

As great as the rest of the video was, what phone and software was he using to get 3D smartphone scans of that quality??

👍 1 · u/Dziar · Apr 27 2020 · replies
Captions
Okay, so first we 3D scan the object, throw it into Unity, and click Play. Unity labels everything automatically, so we don't have to do anything by hand, and then we train for eight or ten hours. Haha, it works!

So I wanted to play around with running machine learning models in Unity again, but I knew if I did that, I'd want to train my own models. Now, training your own model sucks, because collecting data is a nightmare, and having to label stuff by hand will make you want to kill yourself. Sick, only 3,000 more of these to go.

So the idea with synthetic data is that you have a computer generate and label all the training data for you. Video game engines are perfect for this, and Unity's ML-Agents already does this. Now, the main thing holding back my ML projects is the fact that, well, I don't know much about machine learning. But the main thing holding back most other people's projects, people who do know what they're doing, is that they don't have good data, and good data is really hard to come by. You know all the big companies like Apple, Google, and Facebook have all the data, and if they don't have the data, they have the resources to create it. You've probably heard the saying that data is the new oil. Without getting too political, you can see how synthetic data creation could help democratize technological advancement.

Anyway, in my case I want to train my own object detector, so I'm going to use Unity to generate and label all my images for me, and then pass those to TensorFlow for training and inference in Python. I want to be able to detect these Pokemon here, but I'm gonna need a 3D model. Alright, so here's a couple. They aren't exactly the same, but let's roll with them for now. Now we need bounding boxes, so we can just use Unity's built-in function to get the bounds of the mesh renderer in the image. Piece of cake. ...What? So if we plot the 3D points of the renderer's bounds, you can see that they are with respect to the object, and not the camera that's viewing the object.
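As described next, the fix is to project every vertex into screen space and take the min and max of both x and y. Here is a minimal Python sketch of that idea using a toy pinhole camera; the focal length and image center are made-up numbers, and inside Unity you would call Camera.WorldToScreenPoint on each vertex instead.

```python
# Screen-space bounding box from mesh vertices: project every vertex,
# then take the min/max of x and y. The pinhole projection below is a
# hypothetical stand-in for Unity's Camera.WorldToScreenPoint.

def project(v, focal=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a world-space point (x, y, z) with z > 0."""
    x, y, z = v
    return (focal * x / z + cx, focal * y / z + cy)

def screen_bbox(vertices):
    """Min/max of the projected points -> (xmin, ymin, xmax, ymax)."""
    pts = [project(v) for v in vertices]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))

# A unit cube centered 5 units in front of the camera.
cube = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (4.5, 5.5)]
print(screen_bbox(cube))
```

Note that iterating every vertex is exactly why this is slow; the mesh bounds can't be used directly because, as shown above, they live in object space rather than camera space.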
This took a minute to figure out, and it's not performant in any way, shape, or form, but what I ended up doing was getting all the vertices of the object, converting each point from world space to screen space, and then finding the max and min of both the x and the y.

Now we need to convert these boxes to a format that TensorFlow can deal with. TensorFlow needs everything in this TFRecord format that uses protocol buffers. I have no idea what that is, and I don't even want to know, but they do give you some nice examples to work with. So from Unity we can take our pictures and write all that info to a file. These files will have the picture number, window size, the object's label, and box coordinates. Then in Python we can use the TFRecord examples to write a script that parses our text file from Unity and creates a TFRecord file from our synthetic training data. Beautiful. Now we can split our images into two folders, with 20% of the total images in a test folder and the remaining images in a train folder. The last file we need is a label map file; it just maps our object labels to IDs. For this I just loop through each object parent, find all the children with unique names, add them to a list, and write that to a file in the format that TensorFlow wants.

Now, I trained an object detector once before to recognize some medical objects, like three years ago, and the only thing I remember about that project was that I could not get it to work at all until I took pictures with a ton of different background images. My first idea to get images was to take a video of the objects, and then I made a Python script that just pulled frames out of the video. This did not work even a little bit. Eventually I wised up and took pictures with all different backgrounds, and then it worked. So in this case I used the Fatkun Batch Download extension to download a bunch of indoor images from Google and put them into Unity. I can loop through this folder and change the background every time Unity takes a picture.

So at this point I technically have everything I need. The general consensus on Google seems to be that you need like a thousand images of each object, and you need to train for like 20,000 steps, which is gonna take, I don't know, 10 hours. Alright, here we go. It started. Alright, this is fine, we can handle this, it won't be too long. [Music] Alright, let's see, we're at 2,000 steps. What the... [Music] Alright, moment of truth: 27,000 steps. Wow. Literally nothing. Awesome. Honestly, I'm not even upset right now, because this is just how all my projects go. I really have no idea what the hell is going on here, but my intuition is telling me that our 3D model is not similar enough to these toys that we're trying to track. So I found this photogrammetry app that lets you scan an object and generate a 3D model. I scanned all the objects and threw them into Unity. Okay, let's try this again. [Music] Okay, well, this still sucks, but at least it's working a little bit, so we're on the right track.

Alright, so now we need to do some heavy experimentation and tweaking to see what gives us the best results. I made this interface called IChangeable that has one function called ChangeRandom. Every time we take a picture in Unity, we loop through all the objects in the scene, and if they implement this interface, we call ChangeRandom. So I made a bunch of scripts that just randomize everything: one randomly changes the position of the objects in the image, one changes lighting, one changes the color of the object, you get the idea. I'm not exaggerating when I say I generated new data and trained this thing every single day for more than a month. It just became part of my daily routine: generate new images, train, go to bed, wake up, test, nothing works, make some changes, repeat. I tried randomizing everything, using different shaders to modify color, changing the number of images, the number of training steps. I tried everything, honestly.
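The randomization interface described above (an interface with a single randomize function, called on every object in the scene before each capture) is straightforward to mirror in code. Here's a hedged Python analogue of that Unity-side design; all class and method names are my own, not the project's:

```python
import random
from abc import ABC, abstractmethod

# Python analogue of the video's randomization pattern: everything in
# the scene that implements the interface gets re-randomized once
# before every captured image. All names here are hypothetical.

class Changeable(ABC):
    @abstractmethod
    def change_random(self):
        ...

class RandomPosition(Changeable):
    def __init__(self):
        self.position = (0.0, 0.0)
    def change_random(self):
        self.position = (random.uniform(-1, 1), random.uniform(-1, 1))

class RandomLightIntensity(Changeable):
    def __init__(self):
        self.intensity = 1.0
    def change_random(self):
        self.intensity = random.uniform(0.2, 2.0)

def take_picture(scene):
    # Randomize every Changeable object; in Unity you would then
    # capture and save the frame plus its bounding-box labels.
    for obj in scene:
        if isinstance(obj, Changeable):
            obj.change_random()

scene = [RandomPosition(), RandomLightIntensity()]
take_picture(scene)
```

The nice property of this design is that adding a new randomization (color, rotation, background) is just one more class; the capture loop never changes.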
The real reason I even decided to try this project was that I wanted to see if it was possible to make a robust detector from a single 3D object. Currently we're 3D scanning this one object, so the model is only ever going to detect this exact object, which is pretty much a waste of time, because there are other, better ways of doing that. I originally thought that if I could modify the mesh data randomly for every picture, it might trick the model into detecting not just this Pikachu, for example, but maybe all the different kinds of Pikachus that exist. I played around with this a lot and tried to deform the mesh in all different ways, but I never had any success. So if anyone has any ideas on how to achieve something like this, definitely let me know down in the comments or shoot me an email.

Anyway, I can at least tell you from my experiments what did work. First of all, having a ton of different background images definitely helped. I ended up using 2,500 background images and generating 5,000 training images, with 20% used for testing. It was also very important to change the position of the object in the image each time, as well as having multiple objects in each image at the same time; that was really important. I also scanned each object twice, so each Pokemon had two different versions of its scan. I ended up changing each object's rotation randomly, but only by about 20 degrees in each direction; any more than that gave me poor results. I also did not use a directional light; instead I just used ambient scene light and modified its intensity randomly for every picture. The last thing that seemed to help a lot was changing the Unity window size before taking a picture, so that your images aren't all the same size. That way you can run inference on different devices with different camera sizes and it shouldn't have an effect. Now, there's a ton of other things I wanted to try with this, but to actually see what works, you can realistically only make one change at a time. Each change you make takes about a day due to training time, so you can see how absolutely frustrating this project was for me, and why I needed to just make this video and get this project out of my life as quickly as possible. I just really don't know enough about machine learning, and this project literally drove me crazy. Ultimately, I could only really get the Bulbasaur and the Squirtle to track well; the Pikachu works sometimes, but the Charmander just never works at all.

This project is a bit of a headache to even get running, so I'm gonna run through all the steps now in case anyone wants to try it out. First of all, you need a good GPU for the training. I have trained an object detector on the Google Cloud Platform before, but it was kind of a nightmare to get set up and working. Don't even try this on your Mac with TensorFlow CPU; the training will take days. Here I was using TensorFlow GPU on this Windows machine with a GTX 1070, and even with that, 20,000 training steps took like 10 hours. So if you aren't scared yet, let's try to get this running on your machine. I'll put links to everything I talked about down in the description below. It's really important that you stick with the same versions I used, because I ran into tons of compatibility issues with this project and it was an absolute nightmare. So first of all, make sure you're using Python 3.5 64-bit, and install pip if you don't already have it. Now you need TensorFlow installed on your machine, and also the TensorFlow models directory; you can follow the instructions for all of that from here. It might be a good idea to use Anaconda and a virtual environment. In addition to TensorFlow there are a bunch of other dependencies you need, but I think you'll want to stick with a TensorFlow version before 2.0. Personally I used TensorFlow 1.8, and I installed it by running pip install tensorflow-gpu==1.8.0. I then got the models directory for TensorFlow 1.8 from here, too.
Make sure you have OpenCV installed, because we'll use that to run inference on the model once it's done training; I installed it with pip install opencv-python. Once you have everything installed, cd into models/research and run setup.py build and setup.py install, cd into slim, and run pip install -e . Now grab my synthetic data project on GitHub, and we have to create some data.

First you need some background images, so install the Fatkun Batch Download Chrome extension and download like a thousand images. Make sure to go into your browser settings and turn off "ask where to save each file", put the images in a folder called Textures (with a capital T), and drag that into your Resources folder. Now either download a 3D model of what you want to track or create one; I used this app called Trnio. Put the models you want to use as children of the objects-to-train parent and position them in front of the camera, add a change transform script and an object bounds script, and drag in the custom GUI skin from the GUI folder. Open up the take picture script and change the total images to two thousand or something along those lines. Then go back to the root folder and open up the TF utils folder in a text editor; here we just need to change the number of classes in the test detection script and then in the coco config file. Now go back to Unity and hit Play to generate all the training data.

Once that's done, take your unity stuff folder from StreamingAssets and put it into models/research. Open up the research folder in a text editor, go into the unity stuff TF utils folder, and open the object detection notes. cd back into models/research and run the protoc line and the set PYTHONPATH line. Now cd into the unity stuff TF utils folder and run the create TF record script. Then copy the train.py command and run that, and now we wait for like 20,000 steps or so. Once you've trained your model long enough, go to the trained output folder, find the checkpoint number you want, put that number into the export_inference_graph line, and run that. Finally, we can run the test detection script.

Alright, so that's it, that's all I got for today. If you want to learn more about this type of stuff, definitely check out Jameson Toole's article on Heartbeat by Fritz AI, where he used Unity to generate synthetic data to track Coke cans, and also check out Adam Kelly's project on Immersive Limit, where he used synthetic data to detect weeds in his yard. So yeah, let me know what you guys want to see in the next one down in the comments below, and with that, we'll see you in the next video. Goodbye! [Music]
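For anyone wiring up their own test detection step: the TF Object Detection API returns normalized boxes in [ymin, xmin, ymax, xmax] order plus a score per box, and before drawing with OpenCV you typically filter by score and scale to pixel coordinates. A minimal, framework-free sketch of that post-processing (the 0.5 threshold is an arbitrary choice):

```python
# Post-process detector output: keep boxes above a score threshold and
# convert normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates,
# following the TF Object Detection API's output convention.

def filter_detections(boxes, scores, width, height, threshold=0.5):
    keep = []
    for box, score in zip(boxes, scores):
        if score < threshold:
            continue
        ymin, xmin, ymax, xmax = box
        # round() rather than int() to avoid floating-point truncation.
        keep.append((round(xmin * width), round(ymin * height),
                     round(xmax * width), round(ymax * height), score))
    return keep

boxes = [(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)]
scores = [0.9, 0.3]
print(filter_detections(boxes, scores, 640, 480))
# -> [(128, 48, 384, 240, 0.9)]
```

The pixel tuples returned here are what you would hand to something like cv2.rectangle to draw the boxes on the frame.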
Info
Channel: MatthewHallberg
Views: 30,456
Keywords: machine learning, augmented reality, tensorflow, object detection, machine learning unity, unity ml agents, AI, AI unity
Id: lHXcYhyhHGA
Length: 13min 31sec (811 seconds)
Published: Mon Apr 27 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.