Free Houdini Tutorial: Machine Learning with ONNX in Houdini 20

Captions
With Houdini 20 we got the first official machine learning node from SideFX. While we previously had example scene files that did machine-learning-related stuff, or plugins like MLOPs, this is the first machine learning tool that you get just by installing Houdini. That makes it quite interesting, so let's take a look at what it does and what we can use it for.

The node we got is the ONNX Inference SOP, and I should start by explaining those two words. Let's begin with inference. If you know the neural network life cycle, you know that we start by building a neural net, then we train and test it with training data, then we validate it, and if we're finally happy with the results, we use the neural net for the actual task we set out to do. This final part, the usage part, is what we call inference. So the ONNX Inference SOP is only there to use trained neural nets; it's not there to train neural nets or to build new ones.

Now for the ONNX part. Why is ONNX important? Doing inference the traditional way is quite cumbersome: we need a coding environment (usually Python), some machine learning libraries (maybe PyTorch), our code (usually stored in a .py file), and the data we got from training the net, usually a checkpoint file (for PyTorch, a .pt file). Previously, if we wanted to do something machine-learning-related in Houdini, we had to bring all of those things into Houdini as well, which meant setting up a custom Python environment with those libraries; again, quite cumbersome. To make this easier, so-called machine learning runtimes were developed alongside those machine learning tools, and ONNX is one of them. What those runtimes allow you to do is this: a runtime takes what's usually done with a coding environment and a machine learning library and turns it into a single program, in this case the ONNX Runtime. We as users can then take a neural network project, the .py file and the training checkpoint, and export it as an ONNX file, a single file we can hand over to the ONNX Runtime, which makes sure we can use the model in a much simpler way. For Houdini this means that SideFX took the ONNX Runtime and implemented it in the ONNX Inference SOP. The ONNX file itself is something we provide, either because we trained our own neural net and exported it as ONNX, or because we got it from the technical director at our studio, or because we found it on the web. ONNX is an open standard that has been around for quite a while, so there are a lot of really good ONNX files online. That last part is exactly what we're going to do today: download a bunch of ONNX files and test them inside Houdini.

To download some models I went to github.com/onnx/models, a collection of pre-trained ONNX models that we can simply download and play with. The first one I'm going to grab hides under image classification: an ONNX version of the MNIST model, which is a sort of hello world of machine learning. We can download each of these models by clicking its link, and there are usually a lot of options to select from, basically an ONNX version and an opset version. There are technical details surrounding those, but in the end it mostly boils down to trying different models out and seeing which work in Houdini and which don't. In this case I'm going to use the mnist-12 model, with the simple download without the sample test data; let's click that link and then the download button.

Handwritten digits are a bit boring, so let's select something a bit more interesting as well. Under image manipulation there's a fast neural style transfer, so let's try that out too. Again we have options to choose from: a bunch of different styles, and a choice between opset versions 9 and 8. I think in this case it doesn't matter for Houdini; I'm going to download the mosaic model with opset version 9. Finally, I wanted something that's maybe a bit more useful for us: an ONNX version of the MiDaS depth estimation model, where we can put in an image and get a depth map out. I found this project from Julian K, which used an ONNX export of MiDaS for Unity; since ONNX is an open standard, we can simply use the same files in Houdini. The files live under Files and versions in the onnx folder, and in here I want the DPT SwinV2 base model with a resolution of 384 pixels; one more download button and we have our model.

Now, finally, in Houdini: drop down a Geometry node, dive inside, and drop down the ONNX Inference node. The first thing I always do when I bring in such an ONNX node is load the ONNX file I downloaded, so let's start with MNIST and select the mnist-12 model. Next, let's hit the Setup Shapes From Model button. What this button does is fill in those four parameters with the values that the loaded model expects; those values describe how the data we feed into the net should be shaped. Let's take a closer look. Each field consists of a bunch of values, read left to right, and these are the dimensions of the input tensor, which is a fancy way of saying that we feed our data to the net in a multi-dimensional array that has to have a specific structure. The first value in that array is always the batch size, in this case how many pictures of handwritten digits we feed to the net at a time. With these ONNX models this will usually be either 1, meaning we feed one image at a time, or -1, meaning we can feed any number of images at a time; here it's set to 1. Next, because we're feeding in an image, there's a standard data layout that's quite common in machine learning: first the number of color channels, which for this grayscale image is just 1, and then the resolution of the image, 28 by 28 pixels. That's what we put in. What we get out is again our batch size and then just 10 float values, the 10 probabilities of how likely each digit is given the image we fed in.

So let's finally try this out. First we create a canvas to paint on, in this case a grid. I'm going to set the size to 1 by 1, which doesn't matter here; what does matter is the resolution, where we have to put in our 28 by 28 pixels. Then we need to paint a number onto this, so let's drop down a Paint node, choosing the Paint A Mask variation, wire it in, switch to the handles tool, make the brush a bit smaller, get a visualizer going so we can see what we're doing, and draw a nice four in here
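As an aside, the same shape logic can be sketched outside Houdini in a few lines of NumPy. This is only an illustration of the (batch, channels, height, width) layout and of picking the most probable digit; the stroke and the score values are made up and stand in for a real painted canvas and the model's real output:

```python
import numpy as np

# A 28x28 "paint" canvas, standing in for the mask we painted on the grid.
canvas = np.zeros((28, 28), dtype=np.float32)
canvas[4:24, 14] = 1.0                          # a hypothetical stroke

# Input tensor: (batch, channels, height, width) = (1, 1, 28, 28)
input_tensor = canvas[np.newaxis, np.newaxis, :, :]
assert input_tensor.shape == (1, 1, 28, 28)

# The model returns shape (1, 10): one score per digit 0-9. We fake the
# scores here; in Houdini they arrive as 10 points carrying an attribute.
scores = np.array([[0.1, 0.2, 0.1, 0.3, 9.5, 0.2, 0.4, 0.1, 0.3, 0.2]],
                  dtype=np.float32)
best_digit = int(np.argmax(scores[0]))          # "sort by the highest value"
print(best_digit)                               # here: 4
```

Taking the argmax of the 10 outputs is exactly what we do by eye later, when we sort the point attribute values in the info panel by the highest value.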
like this. Now jump back to the camera tool and let's feed this into our ONNX net. We have to set up how we feed this data to our network; in this case I'm going to use a point attribute, and that point attribute is called mask, since I used the Paint A Mask node earlier. Then we select how the data gets written back out again; here too I'm not going to choose a volume but a point attribute, and let's call it prob, for probability. Now, taking a look at the info panel, we have 10 points for the 10 values, and if we sort those values by the highest one, we can see that by a large margin four seems to be the most probable digit, which is of course correct: we did draw a four. So our net is obviously working.

However, this isn't hugely exciting, so let's try our style transfer next. Drop down another ONNX node, load in the mosaic-9 model, and again hit Setup Shapes From Model. This now looks a bit different: we still have the batch size first, but now three channels, an R, G and B value, since we're putting in color, and a different image resolution, in this case 224 by 224 pixels. The output matches the input exactly, because we're just transferring a style onto our image, so the output size should match the input image.

Let's set this up. Drop down a Grid node again, set the size to 1 by 1, but this time put in 224 by 224 for the resolution. I also want UVs on this grid, so let's wire in a UV Texture node, and for the image I'm going to load in I want to change the scale and offset a tiny bit: I set the y scale to -1 with an offset of 1, to flip the image the right way around, and the x scale to a value of 0.6 with an offset of 0.2. That's simply because I'm going to load in a 16:9 image, and putting it onto a square patch of geometry would squash it; this limits that squash. Now let's finally load the image with an Attribute from Map node; I'm using a tea image from a previous episode, but you can choose any input image you like. Let's also turn off the UV grid and wire this into our ONNX node. Again we set up how we write our data in and out; this time both will be point attributes, with Cd for both input and output. And since the ONNX node always collapses its output to the world origin, we have to copy the attribute back over: drop down an Attribute Copy node and copy the Cd attribute that the ONNX node spits out onto our grid, so we can actually see the output.

And as we can see, there seems to be something wrong with the output; we've made a mistake somewhere. This is quite common when feeding image data into this ONNX node: the data we feed in, the Cd attribute, doesn't match the data structure the net expects. What do I mean by that? What the net expects is first all the red values of the image, then all the green values, then all the blue values; that's what the input shape tells us. What the net gets, by writing in just the Cd attribute, is all the RGB values of the first point, then all the RGB values of the second point, and so on. So the data we feed in doesn't match what the net expects, and we get garbage out.

Luckily this is quite easy to fix; we just have to restructure our data. First I set a different input attribute: instead of a vector attribute I write in a float attribute that I'll call in, and I write out a float attribute that I'll call out. What I want to create now is a set of points, 3 × 224 × 224 in total, that matches this data structure, and I can do that with three Point Wrangles. Wire in the first one: it creates a new float attribute called in, equal to just the red value of the Cd attribute, f@in = @Cd.r; — that's all this wrangle does. Copy it two more times; the second one takes only the green values and the third only the blue values. With our data separated like this, I can merge it back together, and looking at the info panel I should now have 3 × 224 × 224 points, which multiplies up to the value in the input shape. Each of those points carries the value I want to feed to the net as a single float called in, which now matches the expected structure, so let's feed it in; the net isn't complaining anymore.

Now we have to do the reverse to write the data back out to our grid. An Attribute Copy won't do here; we need another Point Wrangle. On this wrangle I first grab all the red values with a point() function on geometry stream 1, looking up the attribute called out; a red value simply sits at the @ptnum of the point we're writing to. Then the green values: a green value sits at @ptnum plus the total number of points on this stream, because the grid has three times fewer points than the merged stream. To get that number, create an int called NP, equal to the npoints() function on geometry stream 0, and look up out at @ptnum + NP. Do this one last time for the blue values, which sit at @ptnum + NP + NP. Finally, write this out to Cd: @Cd is a vector made of our r, g and b values. Now this looks a bit more correct. The one thing we still need to do to make this work is to normalize the values the ONNX net puts out, because the output values of neural nets usually aren't between 0 and 1 like we need for a color, so we have to normalize them ourselves. In this case I'm going to use a Labs Attribute Normalize Float node: wire it in and select our Cd attribute. And now we finally have the style transfer on the image we put in; since I put in an animation I can scrub the timeline, and as we can see, the style transfer works really quite fast and nicely.

So that's our style transfer working; now let's quickly take a look at the depth estimation as well. In the end it should be pretty similar to this setup, so let's just copy it and set the display flag on the new ONNX node. Load in the DPT SwinV2 model, hit Setup Shapes From Model again, and wait for it to cook. The data we feed in looks quite similar: still a batch size of 1, still three channels, but now a different resolution, 384 by 384. The output looks a bit different, though, because there are no color channels on the output, just single float values for the depth; so the output shape is again our batch size and then just the resolution. Let's set this up: on the grid, enter 384 by 384. Everything else we did here — loading in the image, separating and restructuring the data — should all still work, and we can leave the Labs Attribute Normalize Float in as well. The only thing I want to change is how we write the data back; we can already see a sort of depth map appearing, and in this case it doesn't need to be as complicated.
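The channel restructuring and read-back described above can be sketched outside Houdini in plain NumPy. This is just an illustration of the idea, not the actual VEX: interleaved per-point RGB versus the planar R-then-G-then-B layout the model expects, plus the min-max scaling that (as I understand it) the Labs normalize node performs, with made-up values standing in for real net outputs:

```python
import numpy as np

# n points with interleaved RGB values, like a Cd point attribute.
n = 4                                   # stand-in for 224 * 224
cd = np.arange(n * 3, dtype=np.float32).reshape(n, 3)  # rows = points, cols = RGB

# What the net expects: all reds, then all greens, then all blues.
# The three "wrangles + merge" from the video amount to a transpose + flatten.
planar = cd.T.flatten()                 # shape (3 * n,)

# Read-back, mirroring the wrangle: red at ptnum, green at ptnum + NP,
# blue at ptnum + NP + NP, where NP is the point count of the grid.
NP = n
r = planar[0:NP]
g = planar[NP:2 * NP]
b = planar[2 * NP:3 * NP]
restored = np.stack([r, g, b], axis=1)  # back to interleaved per-point RGB
assert np.array_equal(restored, cd)     # round trip is lossless

# The normalization step, sketched as simple min-max scaling into [0, 1].
raw = np.array([-3.0, 0.0, 9.0], dtype=np.float32)   # made-up net outputs
normalized = (raw - raw.min()) / (raw.max() - raw.min())
```

In other words, the three wrangles plus the merge are just a transpose of the point-major color data into channel-major order, and the read-back wrangle is the inverse transpose done with index arithmetic.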
What we want to do is simply grab our depth value: a float depth, read with a point() function on geometry stream 1 from the attribute called out, and since that stream has the same number of points as this one, we can simply look it up at @ptnum. Then we move our points by the depth value: @P += a vector with an x value of 0, a y value of depth and a z value of 0. Taking a look at this, it seems to be working already. Let's also bring over the Attribute from Map so we can see our image overlaid onto the result, and yes, this seems to be a sort of working depth map, at least the kind of quality I'd expect from a machine learning model. Let's just make it a bit more shallow by scaling the depth down, maybe to a value of 0.3; I think this looks a bit more accurate now.

This is where I'm going to leave it for today. If you're curious about the title image of this episode, you can take a look at the scene file, where you'll find a finished version of how I built it: in the end, a style transfer on geometry inside Houdini. What I want to talk about quickly now is what this means for machine learning in Houdini going forward. Is this now our standard way of doing machine learning inside Houdini? Does it make something like MLOPs obsolete? Can we now run, for example, Stable Diffusion just with those ONNX nodes? This is where it gets a bit complicated, and Stable Diffusion is quite a good example because it's a fairly complicated neural network pipeline. I made a very simple diagram of it: our prompt goes into the text tokenizer, which feeds the CLIP encoder; then we have latent noise, which gets denoised into an output image through the UNet-and-scheduler loop; and finally it gets resized to the final output resolution by a variational autoencoder, which gives us the output picture. If you've played a bit with MLOPs you're somewhat familiar with this structure; if you haven't, that isn't really important. What is important is that we can take all the stations a prompt goes through and sort them into two categories: the orange stations, which all consist of neural nets, and the blue stations, which are all just standard algorithms. At least as far as I got with my testing, I think we can only use the ONNX node for the neural net parts. So we could rebuild or use the CLIP encoder inside Houdini, and the UNet and the variational autoencoder, but not the scheduler, the noise generation, or the text tokenizer. For those parts of the Stable Diffusion pipeline we would still need to either code things ourselves or use some sort of custom Python environment with custom libraries, and that rather defeats the purpose of using ONNX in the first place. So with some newer or more complicated neural network workflows, at least as far as I can see, we run into the limitations of the tools we currently have in Houdini. I'd love to be proven wrong on this; I'd love to see some more talented TDs tackle this challenge and maybe create a working version of Stable Diffusion inside vanilla Houdini, just with those ONNX nodes. I'm really looking forward to that, but at least from my experience it seems to be a pretty difficult task; this isn't something you build in an afternoon. Still, with the options we have right now, there are quite a lot of neural nets we can use, and quite a lot of really interesting applications where we can train our own neural nets; that's definitely something we're going to explore in the future. Until then, I hope you have fun playing with this new ONNX node and trying out the different models you can find online. Until next time: cheers and goodbye.

If you like what we're doing, please consider becoming a patron of ours, not only to support Entagma but also for access to our courses, like our Advanced Setups course, which already includes a couple of videos about neural nets, or our complete beginner series with 30 videos telling you everything about attributes, simulations and so on. Also, let me say thank you so much to all our existing patrons; without you this channel would not be possible. Thank you.
Info
Channel: Entagma
Views: 9,330
Keywords: Houdini, ONNX, SideFX, Houdini 20, VEX, Machine Learning, Neural Nets, AI, Style Transfer, Depth Estimation, MNIST
Id: aCAatiY53s8
Length: 20min 56sec (1256 seconds)
Published: Mon Apr 15 2024