Image to Mesh using ComfyUI + Texture Projector

Captions
Hi everyone, this is Jake. Recently, TripoSR, jointly produced by Tripo AI and Stability AI, has shown amazing results, and 3D reconstruction from a single image has become a hot topic in the AI world. The models you see were first generated through single-image 3D reconstruction and then adjusted. Let me state the conclusion first: single-image 3D reconstruction can already be integrated into a 3D or game production pipeline, and today I will show how. This is the final baked texture; the tool used is the Texture Projector I created.

My original plan was to try the image-layering custom node (LayerDiffuse) in ComfyUI to directly generate a masked image, similar to the Layer Diffusion feature in Forge. However, the test results were unsatisfactory. With the same parameters, this LayerDiffuse node changes the original image considerably and the image quality is poor. If you reduce the weight, the image stays close to the original, but after the layers are split the color layer is overexposed and the mask becomes pure white. It also does not currently support image-to-image, so I gave up on it this time and will wait for a better image-layering node. If you have Forge installed, you can use its Layer Diffusion instead.

Here we are looking at Meshy, currently the most popular commercial solution for single-image 3D reconstruction. This is an image I generated in ComfyUI with the image-to-image and IP-Adapter processes enabled; you can see the prompts in the subtitles. I uploaded this image to Meshy and it generated a mesh for me. The accuracy of the model and the uniformity of its vertex distribution are quite good, but for some reason its nose is crooked. When the material display mode is enabled, the result is not ideal. The image is only 512 pixels, but in theory that should not make the result this poor; maybe this type of picture just happens to be something Meshy is not good at. This result is difficult to use in projects, not
to mention that Meshy charges a fee. I then tried TripoSR, currently the most popular solution for single-image 3D reconstruction. This result is acceptable, but the shape of the back of the head is not ideal. Click this button to download the model. In practice, though, the model's UVs are unwrapped automatically and the texture resolution is low, so it is hard to put into use.

Let's take a look at Wonder3D. It first generates views from six directions, which is a great idea: the front, left side, right side, left 45°, right 45°, and the back. However, the head proportions are not good, and it does not understand the back of the head very well. The second set of images contains the corresponding world-space normals computed from the six directional images. We keep this approach as a backup; after all, it has produced usable multi-angle images. I will run Wonder3D locally later to generate the model.

Let's take a look at TriplaneGaussian. This solution incorporates Gaussian splatting, but it does not handle human faces well; unexpectedly, a mask was generated. Moreover, the viewer's controls are a bit strange, and it cannot be rotated to the front of the model.

Let's watch the video. LGM also incorporates Gaussian splatting. The result is not bad, but what it generates is a point cloud; you can see from its edges that the model is transparent because the point cloud is not dense enough. The .ply format is a point cloud file format, and you need to use a tool it provides to convert the point cloud into a mesh like those shown on this page. I originally planned to try this as well but gave up because I did not have enough VRAM. In any case, because the algorithm is based on point clouds, there is basically no room for adjustment in a 3D production pipeline.

Finally, let's focus on CRM. It was proposed by Chinese teams, including teams from Tsinghua University and
Renmin University. Not only does this solution produce great results; most importantly, its intermediate products, that is, the six-direction extrapolation results, are also great, and these images can be used in the later adjustment process. The difference from Wonder3D is that CRM generates six orthogonal views (front, back, left, right, top, and bottom), plus a set of world-coordinate position maps instead of normal maps. CRM uses these two sets of images to generate the final model. This is the CRM homepage, which introduces the algorithm: six directional images generated from a single picture, then world-space coordinate images based on those six directions, and finally the textured model.

Let me share my guess about the CRM algorithm. CRM can generate Lego-style models, which suggests that the algorithm behind it is based on point clouds: not a random point cloud, but one uniformly distributed on a world-space grid. This is a lot like creating a Lego model in Houdini. First, a point cloud is generated within the volume of the pig head model, and then Lego cubes are generated from the point cloud. Each cube reads the color of the model surface at the corresponding world position, which comes from the texture in UV space. CRM first computes the color information of six views from one image, which is very impressive. The model volume and point cloud are then deduced. Because the resolution currently supported by its model is 256, there are 256 × 256 × 256 points, excluding points in the masked-out area, that is, points outside the model volume. Color sampling is then performed using the correspondence between each point's world coordinates and the six views to obtain that point's color. Finally, the volumetric point cloud is used to reconstruct a 3D model, and the point colors are used to generate a UV-space texture map. If you are interested, you can check out
the video on generating a Lego model in Houdini to learn more about the algorithm behind CRM.

The CRM solution has been integrated into ComfyUI through the 3D Pack custom node. 3D Pack integrates all of the open-source single-image 3D reconstruction solutions I just introduced, and I believe it will become another must-install custom node.

Now let me introduce CRM in detail. What you see now is the ComfyUI interface. Open the workflow for single-image 3D reconstruction. Before running this workflow, prepare an existing concept design or an image generated through a text-to-image or image-to-image workflow. Note that the image must be close to a front view of the subject; I will explain why later. The image background should also be removed, with the subject mask stored in the image's alpha channel, to avoid misinterpretation by CRM and to reduce computation.

Several workflows are preset here. The first generates the MV, short for multiview. Take a look at the multiviews generated earlier: the consistency of the results is amazing. The only drawback is that the current model only supports a resolution of 256, which is on the low side; each image is only 256 × 256.

Let's look at another case. I will use this headshot to create the CRM model. You can remove the background with the Select Subject tool in Photoshop, or directly in ComfyUI with the Image Remove Background node from the WAS Node Suite custom nodes, so you don't have to switch back and forth between applications. First, enable the multiview generation workflow and use the node's default parameters. For the model, use the file prefixed with "pixel", then click Run. The node above loads an image sequence and can directly read previously generated multiviews without generating them again. This random seed works perfectly for the bottom and back of the model. Multiview generation is faster once the CRM model has been loaded into memory.
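To make the guess about CRM's algorithm more concrete, here is a minimal NumPy sketch of that idea. It is not CRM's published method, and the function names are hypothetical. It assumes six masked orthographic views (front, back, left, right, top, bottom) already resampled to the grid resolution and aligned with the grid axes. It keeps only grid points inside every silhouette (a visual-hull carve, swapped in here to stand for "excluding points outside the model volume"), then colors the front-most voxel seen by each view with that view's pixel.

```python
import numpy as np


def carve_and_color(views, n):
    """Carve an n^3 grid from six orthographic silhouettes, then color the
    voxels that each view sees first.

    views maps (axis, sign) -> (mask, rgb): axis is 0, 1, 2 for x, y, z;
    sign is +1 for a camera on the positive side of that axis, -1 otherwise;
    mask is a bool (n, n) array and rgb a float (n, n, 3) array.
    ASSUMPTION: pixel [j, k] already lines up with the two remaining grid
    axes in increasing order, so no flips or rotations are applied here.
    """
    occ = np.ones((n, n, n), dtype=bool)
    for (axis, _sign), (mask, _rgb) in views.items():
        # A grid point survives only if every silhouette covers it.
        occ &= np.expand_dims(mask, axis=axis)

    color_sum = np.zeros((n, n, n, 3))
    count = np.zeros((n, n, n))
    for (axis, sign), (_mask, rgb) in views.items():
        shape = [1, 1, 1]
        shape[axis] = n
        idx = np.arange(n).reshape(shape)
        if sign > 0:
            # Camera on the + side: the first occupied voxel has the largest index.
            depth = np.where(occ, idx, -1).max(axis=axis)
            hit = depth >= 0
        else:
            depth = np.where(occ, idx, n).min(axis=axis)
            hit = depth < n
        j, k = np.nonzero(hit)
        voxel = [j, k]
        voxel.insert(axis, depth[j, k])  # rebuild the full (x, y, z) index
        np.add.at(color_sum, tuple(voxel), rgb[j, k])
        np.add.at(count, tuple(voxel), 1)

    # Average where several views see the same voxel (edges, corners).
    seen = count > 0
    colors = np.zeros_like(color_sum)
    colors[seen] = color_sum[seen] / count[seen][:, None]
    return occ, colors, seen


def voxel_centers(n):
    """World-space centers of an n^3 grid filling the 2-unit cube [-1, 1]^3."""
    c = -1.0 + (np.arange(n) + 0.5) * (2.0 / n)
    return np.stack(np.meshgrid(c, c, c, indexing="ij"), axis=-1)


# Toy example: every view sees a centered square; each view has its own flat color.
n = 8
mask = np.zeros((n, n), dtype=bool)
mask[2:6, 2:6] = True
palette = {(0, 1): (1.0, 0.0, 0.0), (0, -1): (0.0, 1.0, 0.0),
           (1, 1): (0.0, 0.0, 1.0), (1, -1): (1.0, 1.0, 0.0),
           (2, 1): (0.0, 1.0, 1.0), (2, -1): (1.0, 0.0, 1.0)}
views = {key: (mask, np.broadcast_to(np.array(rgb), (n, n, 3)))
         for key, rgb in palette.items()}
occ, colors, seen = carve_and_color(views, n)
points = voxel_centers(n)[occ]  # world positions of the kept grid points
print(occ.sum(), seen.sum())    # 64 kept points, 56 of them on the surface
```

At the 256 resolution mentioned above, the grid holds 256³ = 16,777,216 points and the float64 color accumulator alone needs about 400 MB, so a small `n` is more practical for experimenting. Averaging overlapping views is only the simplest choice; weighting views by surface orientation would be a natural refinement.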
I'll adjust the parameters and try again. If the multiview results are not ideal, you can change the random seed and generate again. The head shape generated this time is better. I set the project name, and the generated images are automatically stored in the corresponding directory. I will continue with this set of images: enable image-sequence loading and turn off the multiview generation workflow.

Notice the group of nodes here. The multiview images CRM generates directly are in an unusual format, which causes an error when they are read by the subsequent upscale workflow. This group of nodes converts the image format to make it usable: it overlays each image on a pure black background image and outputs the result, so the picture is unchanged but the format is converted.

Then generate the CCM, the multiview canonical coordinate map, or world-coordinate map. Note that the model here needs to be switched to the CCM one. Generation is complete. Something goes wrong with the bottom of the model: it looks hollow, although the area behind the chin is computed very accurately. After adjusting the parameters, the result looks softer and the bottom of the model is recognized. We can also repair the multiviews, mainly by adjusting the proportions of objects across the six views, to ensure correct CCM multiviews; this will be covered later in the Viking treasure chest case.

Next, use the two sets of multiview images to generate the CRM model. Enable the generation workflow. The model has been generated, and the result is great, even better than the one on the official website. We still need one more step: upscaling the multiview images. I enabled the upscale-by-model method to upscale the multiview images 4x, from 256 to 1024. For the upscale model I chose Kim2091's 4x-AnimeSharp. The generated images are saved in the upscale directory, and the mesh file just generated is stored in the mesh directory. In total, five images, all except the front view, have been upscaled. How do I
use these images? Flaws in the images also need to be repaired.

Besides the CRM model, I will use the Wonder3D-based solution combined with NeuS to generate another model; the NeuS model has a higher face count than the CRM model. First, generate the multiviews using the default parameters. Once the model is loaded, generation is fast too. Multiview colors and normals are generated, and masks are produced by background removal. Note that the results are the same as what we tested on the web page. Rather than using these multiview images, we will look at the model it generates. Generating a NeuS model takes a long time. The model's normals are reversed; we could enable flip normal in the Switch Mesh Axis node to flip them, but I will do it in 3ds Max later.

Now you see a scene I created in 3ds Max. There is a cube with a side length of two units, the standard size of the CRM model. Six cameras are placed along the coordinate axes facing the CRM model. Each camera has orthographic projection enabled with a 90° field of view. These six cameras represent the viewpoints of the CRM multiview images. I imported the CRM model into the scene, and it looks good. The first problem is that the texture lacks precision, because the sampled multiview images are only 256 pixels and the baked texture is only 1024. When you zoom in, it is blurry and lacks detail. Even with a high-resolution fix at 2048, the details of the original image are hard to reproduce. The second problem is that the model has a relatively high face count and its topology does not follow the structure of a human head. Although this model is usable, the result will be better after I adjust the model and texture.

Let's look at the differences between three models: CRM, TripoSR, and Wonder3D plus NeuS. I generated them with the corresponding 3D Pack workflows, but the latter two
did not generate textures; maybe I made a mistake. First, compare the face counts: the CRM model has 40,000 faces, which is more appropriate; the TripoSR model has 160,000; and the NeuS model has 1.07 million, far more than we need. Furthermore, the CRM model is closest to the reference image. The TripoSR model is quite different, with wrong proportions. The NeuS model is rotated and cannot be used as is; its front somewhat resembles the reference picture, but the back is terrible. From this comparison, it is clear that CRM works best for this head.

I applied a white material to the model and found many bumps on the surface. This is a precision problem caused by the low resolution; however, all the detailed features of the character in the reference are present.

Next, I optimize the model. I first take ProOptimizer, which comes with 3ds Max, as an example, and I will compare several optimization methods in a moment. Reduce the face count to 10% to 20% of the original. Add a Relax modifier to reduce bumps, with Save Outer Corners enabled. Add an auto-smoothing modifier. Note that ProOptimizer causes some faces to intersect, and these need to be repaired manually. Finally, convert the mesh to quads.

What you see now is a comparison of several model optimization processes. This is the original CRM model. The models listed horizontally are the results of four optimizations of the same model. ProOptimizer is a built-in 3ds Max modifier; Polygon Cruncher and Retopology are third-party modifiers with versions for all major 3D software. ZBrush is also on the list; Retopology and ZBrush retopologize the model rather than just optimizing it. The models listed vertically are rebuilt from the CRM model using different methods. Subdivide is a built-in 3ds Max modifier that evenly subdivides the CRM model. Voxel Remesher is a volume-based method similar to voxel meshing in Houdini, but without
optimization; it rebuilds the model with a new, more uniform topology based on the volume of the CRM model. The ZBrush remesh is a model generated with the ZRemesher tool in ZBrush, using guides to specify the edge-loop flow and retopologizing at a high face count. The CRM model resembles volumetric modeling; recall its Lego model example. Comparing it with the Voxel Remesher result, the similarities are easy to see, although the Voxel Remesher model has a higher resolution.

The ProOptimizer modifier is added to the CRM model, and the face count is reduced to 20% of the original, from 40,000 to 8,000. A Quadify Mesh modifier is added to convert it to quads, and a Smooth modifier is added to create auto smoothing groups. ProOptimizer redistributes faces according to changes in the model's curvature, reducing the face count in gently curved areas and increasing it where curvature changes sharply. This is its advantage. However, ProOptimizer produces intersecting faces, as I discovered in earlier testing. Polygon Cruncher is very similar to ProOptimizer; with the same modifier stack, there are no intersecting faces and the vertices are more evenly distributed.

The Subdivide modifier here uses the Adaptive method, which distributes vertices evenly while subdividing. Compared to the model without subdivision, subdividing and then optimizing yields a more even vertex distribution and fewer intersecting faces when using ProOptimizer. After ProOptimizer, the face count is adjusted according to the change in curvature, while the model optimized by Polygon Cruncher is smoother. Note that the Quadify Mesh modifier on the model in the video has a problem: many non-quad polygons appear, and the Quadify Mesh modifier needs to be added again. Comparing the upper and lower rows, the models with subdivision give better results.

Check out the Voxel Remesher. Let's adjust the
resolution value and see the results. The same modifier stack is still used here. Of course, it can also be optimized by lowering the Voxel Remesher resolution, but details will be lost. In theory, this method should be better than the previous two rows, because using Voxel Remesher is equivalent to subdividing the volume of the CRM model, so we can add more transitional detail. Optimizing the ZBrush remesh model is not recommended; it is listed only for comparison, because optimization destroys the topology and the retopology then loses its meaning. The Retopology modifier combines three functions: retopology, quadrangulation, and automatic smoothing. I selected ReForm mode and set the face count to 8,000. The topology is good and has loops, but it does not follow the structure of a human head. This ZBrush model was generated with the ZRemesher tool, using guides to specify the edge-loop flow and retopologizing at a low polygon count; I will cover it in detail in a moment.

Let's compare them from the front view. My first choice is Retopology, because it is simple and effective. If you don't use third-party tools, choose Subdivide with ProOptimizer. In today's video I will use the relatively low-poly ZBrush remesh and the Subdivide plus ProOptimizer model as examples to continue the adjustment work. Because I chose a human head model for this demonstration, the best solution is to sculpt details in ZBrush and use ZRemesher to retopologize and optimize. Import the model into ZBrush. You can add subdivisions and sculpt the model for
more detail before drawing guides and retopologizing. Optimization of both models is complete and their UVs have been recreated. Because this is only to demonstrate the subsequent baking process, the UV unwrapping did not strictly follow the conventions for a human head model. I prefer the Subdivide plus ProOptimizer model; its face looks more rugged. Because I didn't sculpt it, the ZBrush model looks too smooth. Export the models one by one; I'll make the final adjustments in UE.

I now use the Texture Projector widget to make the final adjustments. I won't explain it in detail here; if you are interested, please watch the detailed video, and I will put the link in the description.

How can it match CRM's multiview camera angles? When you create a projection rig, it is moved to the center of the bounding box of the selected model; its position needs to be reset to zero to match CRM's setup. Make sure each projection camera is in orthographic mode with the ortho width set to 200. This is because the CRM model fits a 2-unit cube; in 3ds Max and UE the unit is the centimeter, and after import the model is scaled up 100 times, becoming a 200 cm cube. Rotate the top and bottom cameras to match the multiview reference images; you can see that it matches perfectly. Check each channel; I mainly use the outline images for alignment. Click Bake All Images. After the projection material is created, you can close the Texture Projector.

Alternatively, you can first add details using the image-to-image method in ComfyUI
and then make adjustments. The ears also need to be repaired based on the contour images. You can complete this adjustment step with whatever texture projection and baking tools you are familiar with. I fixed the flaws behind the ears. Import the textures back into UE; you can use 3D painting tools to repair the UV seams of the texture.

Import the ZBrush model. Dark spots appear on the surface because the model has changed and the depth texture needs to be updated. The projection rig position must be reset to zero before updating. In unlit mode both models look identical; subtle differences are visible only under lighting. Put the CRM ProOptimizer model and the ZBrush model together and compare.

Next, I will use the Viking treasure chest as an example to explain why the reference image must be close to the frontal view and how to fix the distortion problem. I planned to use ControlNet to control the perspective so that only the front of the treasure chest is visible, but showing more of the side would have been better, which I only thought of later.
You can see that choosing a reference image close to the front view is very important for the single-image 3D reconstruction workflow.

Just as I was editing this video, Stability AI launched SV3D, which is more stable than Stable Zero123 and reaches a resolution of 576. If this holds up, we have another practical, locally deployable workflow for 3D reconstruction from multiview images. SV3D orbits the model and generates 21 images; however, the 1st and 21st images are the same, both being the reference image, so there is one image every 18°. Multiview consistency is good, but the distortion problem still exists and can only be reduced by making the reference image closer to the front view. The images in this directory were generated using the warrior's head as a reference. From my tests at the time of recording, ComfyUI only supports SV3D with an elevation angle of zero, and the number of frames is best kept at 21. With fewer frames the results are more distorted, and it seems the algorithm involves computing differences between consecutive images. Multiview quality is higher near the frontal view; the side views are not good.

I also included the Stable Zero123 workflow for comparison. Its viewing-angle control is more flexible, but its resolution is only 256, so it is generally used only as a reference for IP-Adapter, image-to-image, or image-to-video processes; if its output is to be used to generate models, high-resolution restoration is also required. Stable Zero123 also has a poor understanding of head structure. Once ComfyUI's SV3D support is upgraded, Stable Zero123 should become obsolete.

There are several details to note in the workflow. First, SV3D and Stable Zero123 require the reference image to have a solid-color background, so here is a process for adding a white background to an image with an alpha channel. Another advantage of adding a white background is that it will be
more accurate when removing the background from the generated multiviews. Masking is required for the Gaussian splatting computation; it limits the computation to the subject without including the surrounding environment. Note that adding and removing the background are not mandatory pre-processing steps: the mask can be pure white, indicating that all content should be reconstructed in 3D.

This Generate Orbit Camera Poses node can quickly generate a pose list with azimuth animation, one pose per image, based on the number of frames in the input image sequence. You can specify the frame range, orbit radius, elevation angle, and the start and end azimuth angles for the reference image sequence. However, it cannot generate pose lists with animated orbit radius or elevation; I expect this node will be updated. SV3D currently only supports an elevation angle of zero, so there is no need to animate it. Unfortunately, Stable Zero123 has no orbit radius parameter. I used a Stack Orbit Camera Poses node, which can animate many parameters and directly export the camera pose list.

The Load Image From Batch node I used has a bug when reading image masks: when reading all image masks, the first one is missed. An additional node is needed to read the first image's mask separately and then merge it with the masks of the remaining images. If the image sequence contains elevation animation, you cannot use the Generate Orbit Camera Poses node to generate the camera pose list; you should build the list manually and convert it to the orbit camera pose data format.

When using Stable Zero123, the generated multiviews are stored as an image list and must be converted into an image batch before use. When using SV3D, the generated multiviews are stored in an image batch, but the count is limited to 20 and the 21st image is ignored, so the frame range in the Generate Orbit Camera Poses node should be set to 0 to 19.

This 3D Pack custom node also provides another
3D reconstruction process. First, it reads the multiviews, uses a radiance field to generate a point cloud, and performs Gaussian splatting. It then uses DMTet (Deep Marching Tetrahedra) to convert the Gaussian splatting result into a mesh. By contrast, NeRF (neural radiance fields) takes longer, because results must be rendered through a neural network. Finally, the multiviews are resampled and the texture is baked onto the newly generated mesh. This is also the 3D reconstruction process officially recommended by Stability AI, but I did not have enough VRAM to run it, so I had to give up.

One limitation of this multi-image 3D reconstruction workflow is that you need to supply the camera poses, including orbit radius, elevation angle, and azimuth angle. Therefore, this process is better suited to AI-generated multiview images, because these values are set when the images are generated. If you want to use captured footage, you need a way to obtain the camera pose at each shooting position: either use a camera array in which each camera's pose is known, or estimate the camera motion from video with structure from motion (SfM). However, that becomes the camera tracking process of traditional 3D production and requires third-party tools. I have not yet found any ComfyUI custom node with this function, though I think someone will build one soon, because COLMAP's SfM implementation is open source. But there is little need for it: this is a very mature tool and process, the results will be better than anything achieved in ComfyUI, and it is not why we use ComfyUI for local deployment in the first place.

What's more, we now already have a single-image 3D reconstruction workflow based on CRM plus ComfyUI plus UE plus the Texture Projector. In addition, if you are interested, you can visit the SV3D homepage: it was compared with three other models and produced the best results. An example of 3D reconstruction is also
included here; you can open the video to see the process. Many commercial 3D reconstruction applications have adopted Gaussian splatting, such as Polycam, Luma AI, and RealityCapture; 3Dpresso is also one of them. I want to know whether running multi-image 3D reconstruction locally can achieve this effect. If you are interested in 3D reconstruction, check out the links in the description comparing photogrammetry, neural radiance fields, and Gaussian splatting. Thank you all for watching.
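The SV3D orbit settings described in the transcript (21 frames, 18° apart, zero elevation, first and last frame identical, frame range 0 to 19) can be checked with a short sketch. This is plain Python, not the 3D Pack node's actual implementation; the `OrbitPose` type, the function names, the orbit radius of 2.0, and the y-up camera convention are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OrbitPose:
    radius: float
    elevation_deg: float
    azimuth_deg: float


def orbit_poses(num_frames, radius, elevation_deg, az_start, az_end,
                frame_range=None):
    """One pose per frame: azimuth animates linearly from az_start to az_end,
    while radius and elevation stay fixed (as with SV3D at elevation 0)."""
    step = (az_end - az_start) / (num_frames - 1)
    poses = [OrbitPose(radius, elevation_deg, az_start + i * step)
             for i in range(num_frames)]
    if frame_range is not None:
        first, last = frame_range  # inclusive, e.g. (0, 19)
        poses = poses[first:last + 1]
    return poses


def camera_position(pose):
    """Camera location on the orbit. ASSUMED convention: y-up, azimuth
    measured from +z toward +x, elevation measured up from the xz-plane."""
    el = math.radians(pose.elevation_deg)
    az = math.radians(pose.azimuth_deg)
    return (pose.radius * math.cos(el) * math.sin(az),
            pose.radius * math.sin(el),
            pose.radius * math.cos(el) * math.cos(az))


# 21 frames over a full turn: 360 / 20 = 18 degrees apart, and frame 20
# (azimuth 360) lands back on the reference view at frame 0.
all_poses = orbit_poses(21, 2.0, 0.0, 0.0, 360.0)
used = orbit_poses(21, 2.0, 0.0, 0.0, 360.0, frame_range=(0, 19))
print(len(all_poses), len(used), used[1].azimuth_deg)  # 21 20 18.0
```

Dropping the duplicate last frame is what the 0 to 19 frame range achieves in the workflow above; a sequence with elevation animation would need its pose list built by hand, as the transcript notes.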
Info
Channel: Kefu Chai
Views: 13,179
Id: Y6-JGi_ksos
Length: 45min 9sec (2709 seconds)
Published: Mon Mar 25 2024