Salient Object Detection with SnapML | PyTorch to ONNX Conversion

Captions
Okay, today we're going to talk about running a salient object detection model on a mobile device. We're going to use the U²-Net model because it has a light version that's only 4.7 megabytes. This model finds the most salient, meaning the most noticeable or important, object in an image. Here's a little copy-and-paste demo I made with it, and here's me attempting to run it in real time with a distortion shader.

In the last video we ran an ONNX model in Unity with Barracuda, and it was a little complicated to get the right inputs and outputs even for a simple classification model. So today we're going to go through the process of converting the saliency model from PyTorch to ONNX and then running it in Snapchat's Lens Studio with SnapML. SnapML makes it super easy to deal with inputs and outputs; even I was able to figure it out, and I don't know anything about Lens Studio, or machine learning for that matter. So yeah, let's look into that process.

I'm sure you've seen the copy-and-paste-to-Photoshop app by Cyril Diagne. I always thought it was incredible, and when you look at his GitHub, it looks like he was using an ML model called BASNet to remove the background from objects in the camera image. Well, the same people came out with a new model called U²-Net, which has a light version that's really small, and I thought maybe we could run it on a mobile device. What immediately comes to mind with this model is Star Wars-style holographic video, so if it runs fast enough, that might be possible.

Anyway, let's download the U²-Net GitHub repo and the pre-trained weights and test it out. The readme lists all the required libraries, so I made a new Anaconda environment and installed the proper dependencies. Now I can activate the environment, open u2net_test.py, change the model name to u2netp, and run the test. If we look in the test images folder, we can see that the inputs are all three-channel RGB images. If we look in the results folder, we can see that the output of the model is also an image, but these are single-channel black-and-white masks of the most salient object. So the model takes an image as input and outputs an image mask. Sick.

Now, this model is built in PyTorch, so the pre-trained model is a .pth file, and we need it in ONNX format. Luckily, SnapML already has some sample conversion code, so let's use that as a starting point and fill in what we need. That code won't run with the PyTorch version the repo recommends, so we have to upgrade. We load the u2netp model because that's the small one, the test input is a three-channel image with dimensions of 320 by 320, and we have to make sure to use opset 11.
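For reference, the filled-in conversion ends up looking roughly like this. This is a minimal sketch, not SnapML's exact sample code, and it assumes the official U-2-Net repo layout (the `model` package and the `saved_models/u2netp/u2netp.pth` weights path); adjust paths to your checkout.

```python
import torch
from model import U2NETP  # the small ~4.7 MB variant from the U-2-Net repo

# load the pre-trained weights
net = U2NETP(3, 1)  # 3-channel RGB input, 1-channel mask output
net.load_state_dict(torch.load("saved_models/u2netp/u2netp.pth", map_location="cpu"))
net.eval()

# dummy input matching the test script: batch of 1, 3 x 320 x 320
dummy_input = torch.randn(1, 3, 320, 320)

# export with opset 11, which is what Lens Studio expects
torch.onnx.export(
    net,
    dummy_input,
    "u2netp.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"],  # the stock model returns several masks; extras get auto-named
)
```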
Now run the conversion script and we get an ONNX model file. If we open it in Netron, we can see the input and output tensor sizes. Let's drag it into Lens Studio and hook everything up. Uh... what?

So here's where things get a little hairy. The error says the upsample layer must specify a resizing scale and height. When I look at the model, it uses a function for upsampling that takes the previous tensor's size and upsamples to that desired size. This took me a while to figure out, but Lens Studio seems to want a constant upsample factor. So I printed out the tensor sizes going into and out of that function call, and they all used a factor of two. This might be common ML knowledge, but I have no idea what I'm doing here, so it was news to me. The other thing I noticed in the conversion example was the advice about setting align_corners to true, or using nearest-neighbor interpolation, for the best compatibility with upsampling. So I changed the upsample function to interpolate, set the mode to nearest, and replaced every instance of that function call with the hard-coded scale factor, and that seemed to do the trick.

The last thing I did: the model was outputting seven different masks that get normalized into one, but when I checked them, they all looked the same, so I removed the extras and had the model output a single image. This is probably a terrible idea, but it was much easier to work with in Lens Studio and, I don't know, it seems fine. Finally, I re-converted the model to ONNX, and now it imports into Lens Studio with perfect compatibility across the board.

Now, for the input scale, we need to see what format the model wants the pixels in. Back in the Python test script, the image data gets loaded with a rescale transform to 320, so note that the image does not get cropped, only scaled, and then it goes through the ToTensorLab transform, which looks like it normalizes the pixels to the zero-to-one range. When the output gets saved, you can see they multiply the pixels back to the zero-to-255 range. Back in Lens Studio, this means we need to multiply the input by 1/255, or about 0.0039, the stretch option should be checked, and the output should be multiplied by 255.
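Before we hook anything up, here's a rough sketch of that upsampling change. It assumes the repo's helper looks like the `_upsample_like` function below (names and call sites may differ slightly in your checkout):

```python
import torch.nn.functional as F

# Original-style helper: the output size is taken from another tensor,
# which is what Lens Studio complained about.
def _upsample_like(src, tar):
    return F.interpolate(src, size=tar.shape[2:], mode="bilinear", align_corners=False)

# Lens-Studio-friendly replacement: constant 2x factor, nearest-neighbor mode.
# Every call site in the light model turned out to upsample by exactly 2x,
# so the target tensor is no longer needed.
def _upsample_2x(src):
    return F.interpolate(src, scale_factor=2, mode="nearest")
```

Combined with trimming the forward pass so it returns only the first fused mask instead of all seven, re-running the export above is what produced the ONNX file that imports cleanly.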
So let's hook this up to an image and run it in real time for the sake of this tutorial. Just know that it doesn't run very well in real time; in practice you probably want to use it one shot at a time. Anyway, first delete the lighting, since we'll just use an unlit shader for this. Add a new Screen Image and set the stretch mode to Stretch, then add a new unlit material and assign it to the image. On the camera, add an ML Component, drag in the model, check Auto Build, add the Device Camera Texture as the input, and create an output texture. Now, back in the material we just created, change the blend mode to Normal, change the base color to something bright, then add the Device Camera Texture as the base texture and the prediction texture as the opacity texture. And now we can see it running in real time. It's insane how easy this actually is. Now you can just click Send Lens to Device and run it on a phone. That's nuts.

Like I said, this does not run well in real time, at least on my iPhone 8, so I made a little GitHub project for you guys that uses this model to do copy and paste in Lens Studio. You click the screen once to copy the object, it gets parented to the camera so you can move the phone around, and then you click again to place it. It doesn't work great, because I don't get the 3D position of the object or its real scale or anything; I'm literally copying the entire image and placing it at a fixed distance from the camera. Maybe we could take this further in another video if you want, but anyway, the link to that project is down in the description below.

Ironically, while I was working on this project I saw that Jameson Toole, the CTO of Fritz Labs, was using Lens Studio to do almost exactly the same thing. He posted a tutorial a few days ago where he makes a much better copy-and-paste lens. He uses his own ML model, which has a similar architecture to what we used here, and his lens also lets you place copies in 2D space as well as 3D space. I'll link to that in the description too; definitely check out his tutorial as well as the other articles on that site, they're super interesting. I ended up sending him a DM after I saw his copy-and-paste demo, because I felt like a total piece of... I was doing the exact same thing he was. But he ended up being super cool and really encouraging, so I can't say enough nice things about the guy.

That's all I've got for today, but I really like this particular model and I'd like to do more with it in the future. One really interesting idea would be getting this model to run in Unity with Barracuda and then pitting the two against each other to see which runs faster, Unity or SnapML. I have a feeling SnapML is doing a ton of performance work behind the scenes, but I think that would be a really interesting test. Anyway, let me know in the comments below what you want to see in the next video, and with that, we'll see you next time. Goodbye.
Info
Channel: Third Aurora
Views: 3,793
Keywords: augmented reality, artificial intelligence, mixed reality, technology
Id: UGEfOiiTQ6Y
Length: 7min 1sec (421 seconds)
Published: Mon Mar 08 2021