Machine Learning Models in Unity with Barracuda: Image Classification

Video Statistics and Information

Captions
So I spent the better part of my weekend struggling to run machine learning models in Unity with Barracuda, so I thought I'd make this video to help you out; maybe there are a couple of things I can tell you that will make things a little easier. This video is not a tutorial. I'm just going to run through how I got an image classification model running in Unity with the Barracuda inference engine.

First, a high-level overview of what's going on here. In my previous video on running machine learning models in Unity, we used TensorFlowSharp, which is what Unity's ML-Agents ran on at the time. Since then, Unity has created its own inference engine called Barracuda, and ML-Agents now runs on it. But as an extra feature, you can also use Barracuda to run models created entirely outside of Unity, with nothing to do with ML-Agents. Those models have to be in ONNX format. ONNX is the Open Neural Network Exchange, an open format for machine learning models, and you can convert TensorFlow, PyTorch, and Keras models to it.

Now, you can't run just any ONNX model in Unity; you can only run the supported architectures: Tiny YOLO for object detection, MobileNet image classifiers, fully convolutional models, fully dense models, and so on. A lot of the models in the ONNX Model Zoo will run in Unity, but I picked a simple one today because I didn't want to deal with complicated outputs. I've dealt with object detection bounding boxes before and that's kind of a headache, so I stuck with something simple: EfficientNet-Lite4. Its Model Zoo page gives more details about the model and all the necessary files; you can download the model there, and under the dependencies you'll also find the label map file you need.

A couple of things to note are the pre-processing and post-processing steps. This particular model does a good job of laying out exactly what's required; with some of the other models in the Model Zoo, you may have to do a little investigating to find out what they expect and how to handle the output. For example, you'll notice this model wants an image of 224x224. Those square dimensions won't match any camera input we can give it, so we'll have to crop our camera image to a square. You'll also see a line that converts pixel values from 0–255 to -1 to 1. Those aren't naturally occurring pixel values in Unity, so we'll have to convert our pixel values ourselves.

Over in Unity, you can just drag in the ONNX model file and the label map text file you downloaded. If you click on the model file, the Inspector acts a bit like Netron: it shows the input and output shapes and what the model expects. You can see the input tensor name is "images" and it wants 224x224 images with three channels.

The first thing in the scene is a canvas with a UI RawImage and an AspectRatioFitter. The CameraView script just gets the webcam texture, sets it on the RawImage, and sets the fitter's aspect ratio to that of the webcam texture.
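Here is a minimal sketch of what a CameraView script like that could look like. The class and field names are my guesses at the video's code, not a verbatim copy, and the AspectRatioFitter is assumed to have its aspect mode set in the Inspector:

```csharp
using UnityEngine;
using UnityEngine.UI;

// Hypothetical reconstruction: stream the webcam into a UI RawImage
// and keep the image from stretching by updating an AspectRatioFitter.
public class CameraView : MonoBehaviour
{
    public RawImage rawImage;        // UI target for the camera feed
    public AspectRatioFitter fitter; // set its Aspect Mode in the Inspector

    public WebCamTexture WebcamTexture { get; private set; }

    void Start()
    {
        WebcamTexture = new WebCamTexture();
        rawImage.texture = WebcamTexture;
        WebcamTexture.Play();
    }

    void Update()
    {
        // Webcam dimensions are often bogus (16x16) for the first few
        // frames, so only update the ratio once real values arrive.
        if (WebcamTexture.width > 16)
            fitter.aspectRatio = (float)WebcamTexture.width / WebcamTexture.height;
    }
}
```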
The next part of the project is the inference object. It has a pre-processing script, which we'll talk about in a minute, and the classification script that actually runs the model. In the classification script's Start function, we first load the model file and then create a worker from it. Next we load the labels into memory so we can query them later. The label file is not in a very Unity-friendly format; there was no easy way to load it with JsonUtility or anything like that, so I just split it on quotation marks and grabbed every other item in the resulting array. The result is a string array of all 1,000 labels.

In the Update function we grab the webcam texture from our CameraView script, and every time the webcam texture updates we run the pre-processing script to get our image ready to go through the model. Remember, the pre-processing steps for this particular model are: crop to square 224x224 dimensions, and convert pixel values to -1 to 1. There are many ways in Unity to scale and crop an image, none of which are super performant, and that's why I made this pre-process script. In my older TensorFlowSharp videos we did all of the pre-processing on the CPU and it halted everything; say you wanted to crop the image and then scale it, you'd have to do a Texture2D.Apply in between, and that causes big performance hits. What's come out since then is the async GPU readback request. Here, we do a Graphics.Blit from the webcam texture to a render texture, but we give it a scale and offset that handles the cropping of the image. Since we create the render texture at the desired size, 224x224 in our case, the downsizing happens in the same Blit as well. Then we issue an AsyncGPUReadback request, and when it completes it hands us the pixels of the image, which we can deal with according to the specifications of our model.

Back in the classification script, you can see the pre-process function takes a callback, RunModel. We get the pixels from the readback request and do our final pre-processing step, the pixel transformation: converting values from 0–255 to -1 to 1. If you look back at the model's documentation, the conversion subtracts 127 from each pixel and divides by 128, so all we have to do is apply that to every pixel and we have our input tensor. Then we just execute the model and get our output tensor. The way it works is simple: there are 1,000 labels in that text file, and the output tensor gives you 1,000 floats in the same order. The higher the float, the higher the probability of that label being in the image. That's really all you need to do to get this classification model running. Sketches of each of these steps follow below.
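To make the Start-time setup concrete, here is a sketch of loading the model, creating a worker, and parsing the label map with the Barracuda package. The class layout, field names, and the split-on-quotes parsing are assumptions based on the description above:

```csharp
using System.Collections.Generic;
using UnityEngine;
using Unity.Barracuda;

// Hypothetical reconstruction of the classification script's setup.
public class Classification : MonoBehaviour
{
    public NNModel modelAsset;   // the imported .onnx file
    public TextAsset labelAsset; // the label map text file

    IWorker worker;
    List<string> labels = new List<string>();

    void Start()
    {
        // Load the serialized model and create a worker to execute it.
        Model model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(WorkerFactory.Type.Auto, model);

        // The label map isn't JSON-friendly, so split on quotation marks
        // and keep every other entry: the 1,000 class names.
        string[] parts = labelAsset.text.Split('"');
        for (int i = 1; i < parts.Length; i += 2)
            labels.Add(parts[i]);
    }

    void OnDestroy()
    {
        worker?.Dispose();
    }
}
```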
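The GPU-side crop, downsize, and readback could look roughly like this. The center-crop math and the RGB24 readback format are my assumptions; depending on the platform, the readback can also come back vertically flipped, which doesn't matter much for whole-image classification:

```csharp
using System;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical reconstruction of the pre-process script: crop the webcam
// frame to a square, downsize it to 224x224, and read it back without
// stalling the CPU.
public class PreProcess : MonoBehaviour
{
    const int Size = 224; // EfficientNet-Lite4 input resolution

    public void ProcessImage(WebCamTexture source, Action<byte[]> onComplete)
    {
        // Scale/offset that samples a centered square region of the source.
        float aspect = (float)source.width / source.height;
        Vector2 scale = Vector2.one;
        Vector2 offset = Vector2.zero;
        if (aspect > 1f) { scale.x = 1f / aspect; offset.x = (1f - scale.x) / 2f; }
        else             { scale.y = aspect;      offset.y = (1f - scale.y) / 2f; }

        // One Blit crops and downsizes entirely on the GPU.
        RenderTexture rt = RenderTexture.GetTemporary(Size, Size, 0);
        Graphics.Blit(source, rt, scale, offset);

        // Read the pixels back asynchronously; the callback fires when done.
        AsyncGPUReadback.Request(rt, 0, TextureFormat.RGB24, request =>
        {
            RenderTexture.ReleaseTemporary(rt);
            if (!request.hasError)
                onComplete(request.GetData<byte>().ToArray());
        });
    }
}
```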
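Finally, a sketch of the RunModel callback, continuing the hypothetical Classification class from the earlier sketch: normalize the bytes with (p - 127) / 128, build the input tensor, execute, and take the argmax over the 1,000 outputs:

```csharp
// Continues the hypothetical Classification class sketched above.
void RunModel(byte[] pixels)
{
    // Map 0..255 bytes onto the -1..1 range the model expects.
    float[] data = new float[pixels.Length];
    for (int i = 0; i < pixels.Length; i++)
        data[i] = (pixels[i] - 127f) / 128f;

    // RGB24 bytes arrive in height-width-channel order, which matches
    // Barracuda's NHWC tensor layout, so the array can be used directly.
    using (Tensor input = new Tensor(1, 224, 224, 3, data))
    {
        worker.Execute(input);
        Tensor output = worker.PeekOutput(); // owned by the worker, no Dispose

        // One float per label, in label-map order; the largest wins.
        int best = 0;
        for (int i = 1; i < output.length; i++)
            if (output[i] > output[best]) best = i;

        Debug.Log($"{labels[best]} ({output[best]:F3})");
    }
}
```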
As you can see here, it works pretty well: we can get water bottle, we can get wine bottle, oh, beer glass, there we go. The inference time seems to be quick and the frame rate is good. I looked at the frame rate and thought, oh my god, 20 fps, but that's just because I'm screen recording.

The one other thing I tried to run, unsuccessfully, was an emotions ONNX model that's supposed to detect one of eight emotions on a person's face: FER+ Emotion Recognition. I honestly spent hours playing around with it today trying to figure out why it wasn't working, and then when I ran the demo in my browser I noticed the model itself just doesn't work very well. It gets happiness, but, and I'm going to look really stupid here, I can't get it to do anger. I can't get it to work very well at all; maybe somebody else can figure it out. I did leave it in the project, though: there's an emotion script where the input transform is a little different, because this model wants a single-channel grayscale image. It's possible I did that conversion wrong, not really sure to be honest (a sketch of what that conversion might look like follows below).

Oh, one thing I forgot to mention: you can use compute shaders to do a lot of what we just did. I didn't in this video because on Android you can only run compute shaders with Vulkan, and AR Foundation can't use Vulkan yet, so I didn't think it made sense in the context of this YouTube channel, since we mainly do AR stuff. But if you want the next video to cover compute shaders, we can do that. Otherwise, I was thinking maybe we get object detection running, or some other interesting machine learning model, or maybe I do some research and figure out how to export from TensorFlow or PyTorch to an ONNX model that will actually run in Unity, and help you out with that. Let me know in the comments what you think, and with that, we'll see you in the next one. Goodbye!
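As a footnote to the FER+ attempt above, a single-channel conversion could look something like the sketch below. The Rec. 601 luma weights are my assumption; the model card should be checked for the exact transform FER+ was trained with:

```csharp
// Hypothetical helper: collapse interleaved RGB24 bytes into a single
// luminance channel for a grayscale model such as FER+. The
// 0.299/0.587/0.114 weights are the standard Rec. 601 luma coefficients.
float[] ToGrayscale(byte[] rgb)
{
    float[] gray = new float[rgb.Length / 3];
    for (int i = 0; i < gray.Length; i++)
    {
        int j = i * 3;
        gray[i] = 0.299f * rgb[j] + 0.587f * rgb[j + 1] + 0.114f * rgb[j + 2];
    }
    return gray;
}
```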
Info
Channel: Third Aurora
Views: 21,176
Keywords: augmented reality, artificial intelligence, mixed reality, technology
Id: LhzKfx2kuDs
Length: 8min 55sec (535 seconds)
Published: Tue Feb 09 2021