The Future of Facial Animation with AI | Unreal Engine Metahuman

Video Statistics and Information

Captions
Hey everyone, welcome back to my channel. Today's topic: what is the best facial animation solution for indie filmmakers? That means cheap, fast, and accessible. Let's dive right in. [Applause]

Facial animation is definitely the most challenging aspect of making realistic digital characters, and over the years multiple solutions have been developed to tackle this problem. Most recently, The Way of Water delivered the best facial animation to date in a CG-rendered film, in my opinion. It was so realistic and well done that it made us believe the blue Na'vi people actually exist. There's only one problem: we can't use that tech. So in today's video I don't want to talk about the most advanced, expensive, industry-leading facial animation solutions that nobody has access to (and that I probably don't even fully understand). Instead, I want to share my experience with facial animation and tell you about the different solutions I've either used in my own productions or tested in R&D with my team: the solutions a content creator can actually access in their production. Maybe it will also save you some trouble with the ones I personally think didn't work or aren't worth trying.

The face is very hard to animate and to bring emotion into; it often takes days to nail a few-second clip. iPhone ARKit is probably the most-used tool among creators for facial animation capture. Epic even made an app that lets you stream your iPhone capture directly onto your MetaHuman. Unfortunately, the fidelity of ARKit is low. ARKit tries to interpret every human facial movement as a combination of 52 blendshapes, but our faces are so unique from person to person, and so are the ways we move, smile, and talk, so the raw data from the iPhone capture is already heavily simplified, or you could say smoothed. When we plug that low-resolution data into Unreal or Maya, no matter how high-fidelity the rig is (like MetaHuman in this case), it won't fully unleash the potential of the rig; it only activates the basic layer of controllers, or a part of them. Imagine putting a blurry 360p video onto a 4K screen: it won't look better, it will still look blurry. Even though ARKit is a low-fidelity capture, it does capture a couple of things pretty well in my opinion. For example, the eye tracking is really good, and big facial motions like jaw open and eyebrows it captures pretty smoothly. In fact, for pretty much all of my short films I used ARKit plus a lot of post work to get the facial animation. I made a short video before on some of the tricks I use in the facial workflow.

In general, I would record a basic performance, focus on getting the timing and eye direction right, and then bake all the captures onto the MetaHuman controllers. Before you bake, you can actually choose to shift a clip, trim it, and blend clips together to tweak the timing a little bit more, or even add a clip on top of another as an additive layer before the bake. That's useful for situations where you want to add a global emotion, like adding a bit of a smile to the whole animation. After you're happy with the timing and the remix of the clips, you can directly bake the animation back onto the MetaHuman facial controls and add another additive layer to tweak even further. There's not too much secret to this step; it's just pure labor.

If you want to use ARKit outside Unreal, I've been using this little app called Face Cap, or you can actually use the Live Link Face app's CSV data. It spits out an FBX with the 52 blendshape channels, and you can import that into Maya and hook it up with the rig there, either using set driven keys or another method of your choice. During this step you can activate as many or as few controllers as you want.
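As a rough illustration of that hook-up step, here is a minimal Maya Python sketch using set driven keys. The blendshape node, attribute names, control names, and multipliers below are hypothetical placeholders (not the actual MetaHuman naming), so treat it as a template to adapt to your own scene rather than a drop-in script:

```python
# Minimal sketch: wire imported ARKit blendshape channels to MetaHuman-style
# face controls with set driven keys in Maya. All names here are placeholders.
import maya.cmds as cmds

# ARKit blendshape attribute -> (face control attribute, multiplier).
# The mapping and the multipliers are an artistic choice; this is where you
# can squeeze more out of the data than a flat one-to-one profile.
RETARGET_MAP = {
    "arkit_blendShapes.jawOpen":        ("CTRL_C_jaw.ty",          1.0),
    "arkit_blendShapes.eyeBlinkLeft":   ("CTRL_L_eye_blink.ty",    1.0),
    "arkit_blendShapes.mouthSmileLeft": ("CTRL_L_mouth_corner.ty", 0.8),
}

def hook_up_driven_keys(retarget_map):
    """Create a rest key and a full-activation key for each channel."""
    for driver, (driven, mult) in retarget_map.items():
        # Blendshape at 0 drives the control to its rest value.
        cmds.setDrivenKeyframe(driven, currentDriver=driver,
                               driverValue=0.0, value=0.0)
        # Blendshape at 1 drives the control to its chosen maximum.
        cmds.setDrivenKeyframe(driven, currentDriver=driver,
                               driverValue=1.0, value=mult)

hook_up_driven_keys(RETARGET_MAP)
```

Because each driven-key curve can be reshaped afterwards (eased, clamped, or given in-between keys), this is also a natural place to push past a purely linear mapping.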
When you're doing the retargeting profile setup: in my opinion, Epic did a pretty basic mapping in Unreal for their ARKit-to-MetaHuman profile, so there is a bit more juice you can squeeze at the retargeting stage if you want to go really granular. In general you might be able to hit a better shape, but the motion is still pretty linear. Check out the motion and you can tell it's pretty linear, due to the limited number of blendshapes: 52 is just not enough, not to even mention the lack of in-betweens.

Now let's move on to some AI-driven facial animation solutions. The first one I want to talk about is FaceGood. This is one of the HMCs (head-mounted cameras) FaceGood developed; they sent it to us for testing, and we worked with them a lot last year to give feedback on their system. We even broke down their system, and our opinions on facial solutions, in our talk at SIGGRAPH 2022. I'm sure it probably came across in the videos that FaceGood put out, and they work with some other creators as well. In general, FaceGood uses video (RGB) input to train their neural network and match it up with controller combinations in the training data set. Later on, when you feed in a new video or put on the helmet to stream live, the AI generates the controller keys based on the training data. That's a very simplified version of what's going on under the hood.

Let's break it down a little more so you can understand what AI-driven facial solutions are actually about. First, the data set: we manually created 20-plus minutes of training data, which is really just footage of me doing a range of facial motions and talking: blinks, eye motion, mouth motion, pretty much everything my face can do, packed into those 20 minutes. Then we tracked my facial landmarks manually for those 20 minutes of data. With all the data tracked, we then had to manually retarget it onto the MetaHuman controllers, and this is a highly subjective, artistic step that can greatly affect the final result, because the AI really doesn't know what good facial animation is; it matches input to output based on your training set. So the better the retargeting, and the more polished the training data's controller animation, the better the final output the AI will generate, in theory. This, in my opinion, is one of the drawbacks of the FaceGood system: there's no standard or ground truth to compare against when doing the retargeting, so all we can do is try our best to match the facial pose to the plate, which again is highly subjective and artistic. It took us over a month to polish that 20 minutes of data for the training set. It's very front-heavy, and it's quite a bit of an investment, to be honest.
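To make that train-then-infer loop a little more concrete, here is a deliberately toy sketch of the idea. It stands in for the neural network with plain linear least squares and uses random placeholder arrays instead of real tracked landmarks and artist-retargeted controller keys, so it only shows the shape of the problem, not FaceGood's actual method:

```python
# Toy stand-in for "train on (tracked landmarks -> controller keys), then
# generate controller keys for new frames". Real systems use neural networks
# and richer inputs; the data below is random placeholder material.
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_landmarks, n_controls = 2000, 60, 250
# Training set: per-frame 2D landmark positions (flattened x, y pairs) paired
# with the artist-retargeted MetaHuman controller values for the same frames.
X_train = rng.normal(size=(n_frames, n_landmarks * 2))
Y_train = rng.normal(size=(n_frames, n_controls))

# "Training": solve for a matrix W that best maps landmarks to controls.
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

# "Inference": a new tracked frame arrives (from a new plate or a live HMC
# feed) and the model spits out controller values, which become rig keys.
new_frame = rng.normal(size=(1, n_landmarks * 2))
predicted_controls = new_frame @ W
print(predicted_controls.shape)  # (1, 250): one value per controller
```

Everything upstream of that solve, meaning how densely the face is tracked and how carefully the training controllers were retargeted, puts a hard ceiling on what the model can reproduce.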
Regardless, how about the results? Well, it's complicated. From the user-experience side it's not bad: once you have the AI trained, you just feed in the plate and the AI automatically generates the keys onto the MetaHuman rig very fast, and you can live-stream the face onto the Unreal rig as well. But in terms of the results, when we fed in the same plate that we used for training, for which we now have a ground truth, the AI-generated keys somehow only hit about 60 to 70 percent of the fidelity of the ground truth, which is less than ideal. You can see here the jaw opens less and the mouth movement is smaller, and sometimes the eye doesn't blink properly.

Why does that happen? Here's my assumption. When you compare the tracking points from FaceGood with something like Cubic Motion's (this is from The Matrix Awakens behind-the-scenes), you can tell that FaceGood samples far fewer facial landmarks. At this stage the tracking data already contains less information about what my face is actually doing, especially around the mouth area; that's the first stage of compression. The second stage of compression probably happens during the AI training. Imagine that in the 20 minutes of data my eyes blink 100 times, and each blink is a little bit different. When the AI trains on that data, it picks up something like the average of the motion, and sometimes it doesn't quite understand what a full jaw open is, so it over-smooths it. That's why, when the jaw is supposed to open, it only opens about halfway, and the same thing happens with pretty much all the big motions like smiles and eyebrows; the jaw is the most obvious one, as you can see in the clip.
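Here's a tiny synthetic illustration of that averaging effect, under the assumption that the model effectively blends many slightly different, slightly offset takes of the same motion. It's not a measurement of any particular system, just a demonstration of why averaging flattens the extremes:

```python
# Synthetic demo: average 100 blinks that differ slightly in timing, width,
# and strength, and the resulting "learned" curve peaks well below any
# individual take -- the over-smoothing effect described above.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)  # normalized time across one blink

blinks = []
for _ in range(100):
    center = 0.5 + rng.normal(scale=0.08)      # each blink peaks at a slightly different time
    width = 0.08 + rng.normal(scale=0.01)      # ...with a slightly different duration
    peak = 1.0 - abs(rng.normal(scale=0.05))   # ...and a slightly different strength
    blinks.append(peak * np.exp(-((t - center) ** 2) / (2.0 * width ** 2)))

blinks = np.array(blinks)
average_curve = blinks.mean(axis=0)

print("typical individual peak:", round(float(blinks.max(axis=1).mean()), 2))  # close to 1.0
print("peak of averaged curve: ", round(float(average_curve.max()), 2))        # noticeably lower
```

The fewer landmarks the tracker provides to tell these takes apart, the more of them collapse into that same averaged, attenuated output.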
Okay, so why did I go into such depth about the FaceGood system? Well, in my understanding, all of the AI-driven facial solutions on the market right now apply a similar principle. They may train on different data sets, and the AI algorithm might work slightly differently, but the big stages of the operation stay the same: you track the face, you retarget the face, you go through the training, and the AI spits out the output. And I want to note that the retargeting step is a rabbit hole that can go really, really deep. For example, the iPhone uses a preset profile for key mapping to the MetaHuman rig, FaceGood is manually retargeted by the artist, DI4D uses 4D scan data as the ground truth to back-solve to the controllers, and Avatar looks like it used a stereo head-mounted camera to produce a 4D mesh approximation as the ground truth for that stage. Regardless, with those key stages in mind (tracking, retargeting, training, and output), next time you see a new face solution with an AI component, even without really knowing how it works under the hood, you can look at what the input is (a single camera or a stereo camera?) and what it needs from you in terms of retargeting, and you can somewhat predict what the output is going to be.

With everything we've discussed, it's pretty apparent that AI will play a huge part in facial animation solutions going forward, and in the overall transformation of the creative industry. For us indie filmmakers, I don't think there's a way to create the amount of data, or data of the quality of a 4D scan, needed to train a facial animation AI that competes with those existing solutions. For now I will just stick with iPhone ARKit and wait for someone to fully unleash the power of those devices. I mean, you've got a LiDAR here and you've got that sensor on the front as well, so potentially it can provide all the data you need to train a much better AI model, which would result in much higher-fidelity facial animation. I also want to do a deep dive into facial rigging systems, especially the MetaHuman rig, to share the R&D work my team has done over the years. I'm actually working on something with my friends Tobias and David, starting with a very scrappy scan rig in his basement, so stay tuned for that.

Hopefully you found this video helpful. I'm trying to share everything I know with you guys; I firmly believe the more people who have the knowledge, the better and faster progress can be made. So make sure you subscribe, hit like if you enjoyed it, and leave a comment with any questions you have. Also, check out the lighting tutorial I did in Unreal on creating a realistic rainy night scene. I'll see you in the next one. [Music]
Info
Channel: Unreal Han
Views: 34,467
Keywords: unreal engine, metahuman creator, facegood ue5, facegood metahuman, facial animation, metahuman face animation, unreal engine cinematic, facial animation reference, facial animation tutorial, facial animations at 200, facial animation unreal engine 5, unreal engine 5.1 metahuman face good, facegood live metahuman facial animation, metahuman facial animation livelink, metahuman creator tutorial, livelink face unreal engine 5.1
Id: Wv9tWxm5GcA
Length: 11min 36sec (696 seconds)
Published: Tue Feb 28 2023