3D Gaussian Splatting - Explained!

Video Statistics and Information

Captions
Gaussian splatting, y'all! 3D scanning and rendering is moving so fast. I got my splats up and running and I am still mind-blown getting 100 frames per second for this complex 3D scene. You've probably heard about photogrammetry and NeRFs, so how is this stuff different and why should you care? Let's explore five reasons why you should care, and then let's get into what's actually happening under the hood, because this is a really, really cool tech.

Number one: this is way faster than NeRFs. For comparison, a Neural Radiance Field for this exact scene would take around 10 seconds per frame. Instead, I am zipping around with FPV controls at 100 frames per second without breaking a sweat, though I do crash a few times towards the end there.

Number two: this is old meets new. Gaussian splatting is super cool in that it fuses classical computer graphics and modern deep learning techniques. Like NeRFs, this is still a radiance field, just without that slower neural rendering part.

Which brings me to number three: unlike NeRFs, you have an explicit representation here. You represent the 3D scene as a collection of these ellipsoidal splats called gaussians. The beauty of this technique is that you can do rendering in real time with a couple of basic computer graphics operations. And unlike photogrammetry, which is super fast to render, you still get all those photorealistic effects, because Gaussian splatting uses spherical harmonics to represent all the view-dependent effects and lighting. Surfaces change color when you view them from different angles, giving you all those photorealistic effects, the glint, the veneer, the specularities, without needing a neural network.

Number four: while it doesn't use a neural network, the training loop is very similar to deep learning.

Finally, number five, and perhaps the most exciting: Gaussian splatting enables direct editing. It's not just speed; with Gaussian splatting you get 3D editing support, so you can select, move, and delete stuff, even relight stuff. This type of editing has been so much more tedious to do with NeRFs because they've got this implicit black-box representation. And oh my god, NeRFs are just like two years old and here's the new talk of the town. So you've got speed, you've got editability, and this weird middle ground between NeRFs and classical photogrammetry is super, super exciting for creators.

Now if you want a visual comparison to NeRFs, here's the same dataset, about 334 photos captured with a Sony a7S camera, processed with Luma AI and Instant NGP. Honestly, Gaussian splatting looks the best to me, and the fact that it is real time just makes it a no-brainer. And to think that NeRFs are only two years old, it is wild how quickly things move.

All right, so let's get into what's actually happening under the hood. Check this out: this is a gorgeous 3D scene in New Delhi running at 180 frames per second, absolutely mind-blowing. But how does this magic work exactly? It actually starts from a point cloud. You've probably seen these if you use any photogrammetry tool. The best part is you don't need a particularly dense point cloud; the sparse one that you get right after aligning or posing your images works great. So all you need to do is use structure from motion. Whether you use COLMAP, RealityCapture, Agisoft Metashape, or Bentley ContextCapture really doesn't matter. As long as you have posed imagery and a sparse point cloud, you're off to the races with Gaussian splatting. It's this point cloud that serves as the foundation: you initialize, essentially, a 3D gaussian splat centered at each of these points.
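To make that concrete, here's a minimal sketch, in Python, of what that initialization could look like, assuming you've already exported an (N, 3) array of sparse point positions and matching RGB colors from your structure-from-motion tool. The function name, the nearest-neighbor scale heuristic, and the starting opacity value are illustrative choices on my part, not the paper's exact code.

```python
import numpy as np
from scipy.spatial import cKDTree

def init_gaussians(points, colors):
    """Initialize one 3D gaussian per SfM point (illustrative sketch).

    points: (N, 3) float array of sparse point positions from structure from motion
    colors: (N, 3) float array of per-point RGB in [0, 1]
    """
    n = points.shape[0]

    # Scale each gaussian by the mean distance to its 3 nearest neighbours,
    # so splats start out bigger where the point cloud is sparser.
    dists, _ = cKDTree(points).query(points, k=4)      # column 0 is the point itself
    scales = np.log(dists[:, 1:].mean(axis=1) + 1e-7)  # store log-scale, activate with exp()
    scales = np.repeat(scales[:, None], 3, axis=1)     # isotropic to start: same scale per axis

    return {
        "positions": points.copy(),                    # gaussian centres
        "scales": scales,                              # per-axis log scales of the ellipsoid
        "rotations": np.tile([1.0, 0, 0, 0], (n, 1)),  # identity quaternions (no orientation yet)
        "opacities": np.full((n, 1), 0.1),             # low starting opacity, refined during training
        # Degree-0 spherical-harmonic (DC) color; higher bands are added later in training.
        "sh_dc": (colors - 0.5) / 0.28209479177387814,
    }
```

From there, every number in that dictionary becomes a trainable parameter in the optimization described next.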
Then, through iterative optimization, these gaussians are adjusted to match the original images that you captured. This is where they're still using gradient descent, so while you may not get a neural network, the training loop is very similar to machine learning. Now, you obviously can't have an even distribution of gaussians to model the complexity of reality, so they solve this by adjusting the density of gaussians: you have more gaussians where you need to model detail, for example the trees in the scene, and fewer where you don't, for example the sky. And then the view-dependent lighting is modeled using spherical harmonics; you only optimize the base color at first and slowly add higher-frequency bands over time.

So here's the process in a bit more detail. You capture the input imagery: photograph the real scene from multiple viewpoints, basically like you would for photogrammetry. Then you reconstruct the cameras with structure from motion, essentially finding common structure across all the images to figure out the relative 3D positions of where the photos were taken. By virtue of doing this, you also end up with a sparse 3D point cloud. Then you take that sparse point cloud and create a 3D gaussian splat centered at each of the points in this SfM point cloud. Finally, the optimization process begins: you start rendering gaussians from the input camera views, comparing the virtual view to the original view, and then use gradients to optimize the parameters of the gaussians, the position, the size, the orientation, to match the photos over time. And since you can't have an even distribution of gaussians to model the complexity of reality, you have adaptive density control, essentially adding, removing, and splitting gaussians during this optimization process to increase the detail where it's needed, for example the trees, and remove detail where it's not, for example the sky. Then you start representing color with spherical harmonics, modeling all these view-dependent lighting effects, optimizing the base color first and slowly adding higher-frequency bands over time. And look, if you want to spare even more overhead, you can actually turn off these spherical harmonics. You lose some of the view-dependent effects, but oh my god, it still models reality way, way better than photogrammetry. You don't end up with these wonky-looking broccoli trees; you can actually see the lush detail of this tree come to life, and that still blows my mind.

Rendering is the coolest part: you just project the optimized 3D gaussians onto an image plane and alpha blend. So instead of rasterizing a bunch of triangles as you might classically, you're just rasterizing a bunch of these gaussians. You project these optimized 3D gaussians onto an image plane and alpha blend them together, and modern GPUs can just rip through these computer graphics operations, which is why you get this amazing 100 frames per second versus 10 seconds per frame. Oh my god.
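Here's a rough sketch of what that compositing boils down to for a single pixel, assuming the gaussians have already been projected to screen space (a 2D center and a 2x2 covariance each) and depth-sorted front to back; the variable names and structure are mine, not any particular implementation's.

```python
import numpy as np

def shade_pixel(pixel_xy, means2d, covs2d, opacities, colors):
    """Front-to-back alpha blending of projected gaussians at one pixel (sketch).

    means2d:   (N, 2) screen-space centres, already sorted nearest-first
    covs2d:    (N, 2, 2) screen-space covariances from projecting the 3D ellipsoids
    opacities: (N,) per-gaussian opacity in [0, 1]
    colors:    (N, 3) per-gaussian RGB for this view direction (from spherical harmonics)
    """
    out = np.zeros(3)
    transmittance = 1.0  # how much light still reaches the splats behind

    for mu, cov, opacity, color in zip(means2d, covs2d, opacities, colors):
        d = pixel_xy - mu
        # 2D gaussian falloff: full opacity at the centre, fading towards the edges.
        alpha = opacity * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)
        out += transmittance * alpha * color
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return out
```

The real renderer runs this for every pixel in parallel on the GPU, binning gaussians into screen tiles first, which is where those triple-digit frame rates come from.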
All right, so with all these steps completed, you get photorealistic novel view synthesis that can be rendered interactively at insane frame rates. Compared to NeRFs, this sort of unstructured gaussian representation gives us simplicity, speed, and quality. The ubiquity, the ability to distribute these assets and render them on modern computing systems, is absolutely wild. I don't need to burn GPU time just to render out a flythrough, or deal with some of these awkward hybrid representations where you kind of have a mesh but there's a neural network working under the hood for the view-dependent effects; you don't need to worry about any of that. So it's kind of cool: you've got this OG physics concept of spherical harmonics providing the view-dependent effects for realism (there's a small sketch of how that evaluation works at the end of these captions), while you're using gradient-based optimization of the gaussian parameters to fit them to the real-world appearance. A nice combination of computer graphics and ML techniques; call it the best of both worlds.

So suddenly you don't have these heavyweight meshes. You can render them on device, on mobile, you can render them in a browser, and you can obviously render them inside a game engine. And so you've got all these possibilities for interactive editing and previewing that you just couldn't do with NeRFs. I cannot overemphasize the benefit of having this editable, discrete scene representation rather than the implicit black-box representation you get with NeRFs.

This paper is crazy because if you train it for, let's say, 7,000 iterations, it's still better than Instant NGP, and if you go to 30,000 it rivals Mip-NeRF and all the other higher-quality approaches for Neural Radiance Fields. Now there are limitations, which we'll get to in the future, where certain types of complex lighting and transparency scenarios are still a little bit difficult. But this is why it's such a promising technique that could transform 3D capture and creation forever, and why folks in computer graphics and vision cannot shut up about it. Gosh, it's only been a couple of weeks and already we've got a viewer for Unity, we've got Unreal, we've got folks working on a native iOS Metal viewer, and there's a WebGL viewer as well, and that's just to start. I've got a lot more tests cooking and another video in the works about bringing these datasets into tools like Unity and Unreal and all the cool stuff you can do there. But for now, hopefully this gives you some intuition for what's actually happening under the hood and why you should care about Gaussian splatting.

Two interesting resources I'd point to if you want to get into more of the technical side of all of this: definitely check out this video, absolutely amazing, and if you want easy step-by-step instructions, here's another blog post you should absolutely check out. Anyway, that's it for this video; I wanted to keep it brief. If you enjoyed it, please be sure to drop a like and a comment on what you'd like to see covered next. Any questions you have as well, I'd be happy to answer them, and I will see y'all in the next one.
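As mentioned above, here's a tiny sketch of how those spherical harmonics turn into a color: each gaussian stores a handful of coefficients per color channel, and the RGB you see is just the SH basis functions of the viewing direction weighted by those coefficients. This only shows the degree-0 and degree-1 bands, the constants and sign conventions follow the common real-SH ordering, and the naming is my own, not any particular implementation's.

```python
import numpy as np

# Real spherical-harmonic constants for the first two bands.
C0 = 0.28209479177387814   # degree 0 (the view-independent "base color")
C1 = 0.4886025119029199    # degree 1 (varies linearly with view direction)

def sh_to_rgb(sh_coeffs, view_dir):
    """Turn per-gaussian SH coefficients into an RGB color for one view (sketch).

    sh_coeffs: (4, 3) array: one degree-0 and three degree-1 coefficients per channel
    view_dir:  (3,) unit vector from the camera toward the gaussian
    """
    x, y, z = view_dir
    rgb = (C0 * sh_coeffs[0]
           - C1 * y * sh_coeffs[1]
           + C1 * z * sh_coeffs[2]
           - C1 * x * sh_coeffs[3])
    # Shift so an all-zero degree-1 band gives back the plain base color.
    return np.clip(rgb + 0.5, 0.0, 1.0)
```

Higher bands add quadratic and cubic terms in the view direction, which is what captures the sharper glints and specular highlights; turn them off and you keep the base color but lose those effects.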
Info
Channel: Creative Tech Digest
Views: 40,668
Keywords: gaussian splatting, 3d graphics, 3d rendering, photorealism, real-time rendering, neural graphics, novel view synthesis, computer vision, point cloud, photogrammetry, radiance fields, nerf, gpu, graphics card, ray tracing, vr, augmented reality, ar, virtual reality, game development, unity, unreal engine, 3d modeling, 3d capture, 3d scanning, neural networks, deep learning, machine learning, ai, computer graphics, future of 3d, ai for 3d, vfx, geospatial
Id: sQcrZHvrEnU
Length: 8min 28sec (508 seconds)
Published: Sun Nov 05 2023