Creating a MetaHuman Identity from smartphone or HMC footage

Video Statistics and Information

Captions
SPEAKER: Hello and welcome. In this video, I will be sharing best practices for capturing footage that will be used to calibrate MetaHuman DNA for a MetaHuman Identity. A MetaHuman Identity needs to represent the performer delivering the take in order to transform performance footage into animation. For this reason, footage captured by a different performer should be processed using that performer's own MetaHuman Identity asset. If the appearance of the performer changes significantly, especially their tracked features, it is best to record new footage in case recalibration is needed. It is strongly recommended that the MetaHuman Identity is calibrated with footage from the same device class that will be used to capture the performance.

Good calibration footage requires an unobstructed view of the performer's facial features, no accessories or heavy makeup, low reflectivity, and limited facial hair. This is a markerless solution, so there is no need for facial markers.

Regardless of the device class being used, a frontal frame holding a relaxed neutral pose is needed. The face should be framed in the center, captured from a slightly lower angle with an upward-facing tilt. The mouth should be closed, with the seal of the lips in view, and the eyes should be looking directly ahead into the distance. The direction of the gaze matters only for the frontal frame.

With an iPhone, two additional side frames holding the neutral pose provide more depth data and improve the accuracy of the calibration. The same tracking markers should be visible in all three frames, without the corners of the eyes obscured. With a head-mounted stereo camera pair, only a frontal frame can be captured, so in this case a single frontal frame followed by a teeth-fitting pose should be used. If the iPhone will be head mounted, you could do the same, but it is better to take it off and capture three frames instead.

A frontal-frame teeth-fitting pose, where the corners of the incisors are visible, is used to register the bite. This is optional, but highly recommended. Bite down normally for this pose. Depending on the bite, it is fine if only the upper or lower incisors are visible.

For footage captured with an iPhone, indirect lighting is ideal, as it provides even lighting on the face and softens any shadows. A desk light pointed at a white wall or low ceiling, or a ring light attached to a tripod at a dedicated capture station, can provide even lighting on the face. Good lighting will not affect the depth sensor quality, but it can help reduce motion blur, giving clearer frames when there is movement.

The iPhone can be mounted on a tripod or held with a steady hand. If holding the iPhone, consider holding it with both hands, or using one hand to support the wrist or elbow for more stability. If the iPhone will be head mounted, using either of these methods is better for the additional side frames; otherwise, capturing a neutral pose followed by a teeth-fitting pose will be fine.

In the app settings, enabling the depth preview lets us gauge the optimal distance between the face and the iPhone. Depth is shown as gray shading, indicating we are getting as many pixels as possible. Black artifacts begin to appear when getting too close, indicating near-detail clipping: the depth camera is failing to register part of the face. Moving too far away reduces frame coverage, lowering the effective resolution. The ideal distance is the closest one at which the face fills as much of the frame as possible; back off slightly once clipping appears on the tip of the nose. It is OK if there is a little bit of black on the sides of the nose.
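The coverage and near-clipping behavior described above can also be checked offline on exported depth frames. Below is a minimal sketch, assuming the depth data has been exported as a single-channel 16-bit PNG in which 0 marks pixels the depth sensor failed to register; the file name and both thresholds are illustrative choices, not values from the MetaHuman workflow.

    # Sketch: estimate depth coverage and near-clipping on an exported depth frame.
    # Assumes a single-channel 16-bit PNG where 0 = no depth return (the "black
    # artifacts" described above); file name and thresholds are illustrative.
    import numpy as np
    from PIL import Image

    depth = np.asarray(Image.open("frontal_frame_depth.png"))

    valid = depth > 0               # pixels the depth camera registered
    coverage = valid.mean()         # fraction of the frame with depth data

    # Near clipping shows up first around the tip of the nose, i.e. the centre
    # of a well-framed face, so inspect a central window for missing depth.
    h, w = depth.shape
    cy, cx = h // 2, w // 2
    window = valid[cy - h // 8 : cy + h // 8, cx - w // 8 : cx + w // 8]
    clipping = 1.0 - window.mean()  # fraction of central pixels with no depth

    print(f"frame coverage: {coverage:.1%}, central clipping: {clipping:.1%}")
    if clipping > 0.05:
        print("likely too close: depth is failing around the nose")
    elif coverage < 0.5:
        print("likely too far: face occupies too little of the depth frame")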
Position the iPhone at a slightly lower angle with an upward-facing tilt to reveal more of the lip seam and the insides of the upper eyelids, while keeping the face framed in the center. You can now begin recording the frontal frame with a relaxed neutral pose. The mouth should be closed, and the eyes should be focused on a point directly ahead in the distance.

For the side frames, turn the head slightly to one side. Be mindful of motion blur, and try to keep the head still while holding the relaxed neutral pose. Angle only a little, to show more jaw and lip curvature, while making sure all of the facial features can still be tracked, particularly the corners of both eyes. It is fine if you are not looking directly ahead, as the eye line is only important for the frontal frame.

You can now finish with a teeth-fitting pose. Bite down normally, without forcing your teeth to be perfectly aligned or making an extreme bared-teeth expression. Depending on the bite, it is fine if only the corners of the upper or lower incisors are visible.

For footage captured using a stereo camera pair, with the helmet mounted securely on the performer's head, a combination of diffuse and onboard lights should be considered. Hot spots on the face might challenge the trackers, so in this case it is better to be underexposed than overexposed. Powder or matte makeup can help reduce reflectivity. Ensure that there is plenty of room at the bottom of the frame for the jaw to open fully, so that all facial movements are captured correctly. The face should be framed in the center of the image, and the center of the image should be aligned with the upper part of the base of the nostrils. Using a grid overlay can be helpful for this framing process; a scripted version of this check is sketched below. The nasolabial and ocular areas of the face should be where the lenses are focused, especially for cameras with a shallow depth of field. A neutral pose can now be captured, followed by a teeth-fitting pose.
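A rough scripted version of the framing check mentioned above (keeping the base of the nose near the image center) can be put together with an off-the-shelf landmark detector. This is a sketch, not part of the MetaHuman tooling: it assumes MediaPipe Face Mesh is available, uses landmark index 2 as an approximation of the point just under the nose, and the 5% tolerance is an arbitrary illustrative value.

    # Sketch: check that the base of the nose sits near the image centre,
    # as suggested for HMC framing. MediaPipe landmark 2 approximates the
    # point just under the nose; the 5% tolerance is illustrative only.
    import cv2
    import mediapipe as mp

    image = cv2.imread("hmc_frontal_frame.png")

    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as face_mesh:
        results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark[2]  # under-nose landmark
        dx, dy = lm.x - 0.5, lm.y - 0.5  # normalised offset from image centre
        print(f"offset from centre: {dx:+.2f}, {dy:+.2f}")
        if abs(dx) > 0.05 or abs(dy) > 0.05:
            print("re-frame: the base of the nose should sit near the centre")
    else:
        print("no face detected; check exposure and framing")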
To evaluate the quality of the MetaHuman Identity for footage captured with an iPhone, for the frontal frame I have promoted, I am making sure that the facial features have been tracked correctly and that the eyes are looking directly ahead. For the side frames, I am making sure that the same tracking markers that were visible in the frontal frame are present as well. For the teeth-fitting pose, I have left all four teeth tracking markers enabled. Even though the corners of my bottom incisors are hidden, based on my bite, the system has positioned the bottom tracking markers in a fairly accurate location. If you are not sure how to annotate the corners of the bottom incisors due to an overbite, you can turn the bottom teeth trackers off when fitting the teeth, but in general the position guessed by the trackers should be fine. If the performer's bite shows the lower incisors instead, or has all four corners in view, make sure there are no gaps in the bite; otherwise, consider recalibrating the teeth on a different frame.

To inspect the MetaHuman Identity asset I have calibrated, I can use the various viewport modes. By selecting the single-pane wipe mode, I can compare my footage as an overlay, and with the dual-pane mode, I can compare the frames side by side. I can also enable the depth mesh to check whether any clipping was present when I captured the footage.

For the stereo camera pair footage, I am going to repeat this process, validating that the eye line is looking directly ahead and that the lips are sealed for the frontal frame. If everything matches up correctly, without any strange artifacts, the MetaHuman Identity asset has been successfully calibrated and is ready to be trained for performance.

To test the quality of the calibration, I have processed performance footage in the MetaHuman Performance asset, and am comparing the gaze and teeth alignment of my footage with the MetaHuman Identity asset. If the eye line is not correct, I can go back, select a different frontal frame, and recalibrate the MetaHuman DNA in the MetaHuman Identity asset before processing any more performance footage. If the teeth alignment is off, I can also go back, select a different teeth-fitting frame, then refit the teeth and train for performance again. A property in the teeth pose called Manual Teeth Depth Offset can also be used to move the teeth slightly forward or backward. By default it is set to 0, and its allowed range is -1 to +1. Increasing the value above 0 pushes the teeth back into the head, and decreasing it below 0 moves the teeth forward. The change only becomes visible after refitting the teeth; a scripted sketch of this property follows the transcript.

Now that I have covered the process of capturing and assessing the quality of calibration footage for a MetaHuman Identity, in the next video I will share best practices for capturing performance footage to be used with MetaHuman Animator.
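As a footnote to the teeth alignment discussion above, the sketch below shows how the Manual Teeth Depth Offset convention could be applied from Unreal's editor Python. The property, its default of 0, its -1 to +1 range, and its sign convention come from the video; the asset path and the exact object and property names used here are assumptions about the MetaHuman plugin's Python bindings, not confirmed API.

    # Sketch: nudge the teeth slightly forward via Manual Teeth Depth Offset.
    # The asset path, "teeth_pose" object, and "manual_teeth_depth_offset"
    # property name are hypothetical; only load_asset/get_editor_property/
    # set_editor_property are standard Unreal editor Python calls.
    import unreal

    identity = unreal.load_asset("/Game/MetaHumans/MyPerformer_Identity")  # illustrative path
    teeth_pose = identity.get_editor_property("teeth_pose")                # hypothetical property

    offset = max(-1.0, min(1.0, -0.2))  # clamp to the allowed [-1, 1] range
    teeth_pose.set_editor_property("manual_teeth_depth_offset", offset)    # hypothetical name
    # Negative values move the teeth forward, positive values push them back;
    # the change only becomes visible after refitting the teeth.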
Info
Channel: Unreal Engine
Views: 31,458
Keywords: #MetaHumanIdentity, #MetaHumanbestpractices, #digitalhumananimation, #realtimeanimation, #NPC, #videogamecharacters, #videogamecinematictools
Id: qPhn28Jk3Mo
Length: 9min 2sec (542 seconds)
Published: Sat Sep 30 2023