3D Gaussian Splatting - Explained!

Video Statistics and Information

Captions
Gaussian splatting, y'all! 3D scanning and rendering is moving so fast. I got my splats up and running and I am still mind-blown getting 100 frames per second for this complex 3D scene. You've probably heard about photogrammetry and NeRFs, so how is this stuff different and why should you care? Let's explore five reasons why you should care, and then let's get into what's actually happening under the hood, because this is a really, really cool tech.

Number one: this is way faster than NeRFs. For comparison, a Neural Radiance Field for this exact scene would take around 10 seconds per frame. Instead, I am zipping around with FPV controls at 100 frames per second without breaking a sweat, though I do crash a few times towards the end there.

Number two: this is old meets new. Gaussian splatting is super cool in that it fuses classical computer graphics and modern deep learning techniques. Like NeRFs, this is still a radiance field, just without that slower neural rendering part.

Which brings me to number three: unlike NeRFs, you have an explicit representation here. You represent the 3D scene as a collection of these ellipsoidal splats called gaussians. The beauty of this technique is that you can do rendering in real time with a couple of basic computer graphics operations. And unlike photogrammetry, which is super fast to render, you still get all those photorealistic effects, because Gaussian splatting uses spherical harmonics to represent all the view-dependent effects and lighting. Surfaces change color when you view them from different angles, giving you all those photorealistic effects, the glint, the veneer, the specularities, without needing a neural network.

Number four: while it doesn't use a neural network, the training loop is very similar to deep learning.

Finally, number five, and perhaps the most exciting: Gaussian splatting enables direct editing. It's not just speed; with Gaussian splatting you get 3D editing support, so you can select, move, and delete stuff, even relight stuff. This type of editing has been so much more tedious to do with NeRFs because they've got this implicit black-box representation. And oh my god, NeRFs are just like two years old and here's the new talk of the town. So you've got speed, you've got editability, and this weird middle ground between NeRFs and classical photogrammetry is super, super exciting for creators.

Now if you want a visual comparison to NeRFs, here's the same dataset, about 334 photos captured with a Sony a7S camera, processed with Luma AI and Instant NGP. Honestly, Gaussian splatting looks the best to me, and the fact that it is real time just makes it a no-brainer. And to think that NeRFs are only two years old, it is wild how quickly things move.

All right, so let's get into what's actually happening under the hood. Check this out: this is a gorgeous 3D scene in New Delhi running at 180 frames per second, absolutely mind-blowing. But how does this magic work exactly? It actually starts from a point cloud. You've probably seen these if you use any photogrammetry tool. The best part is you don't need a particularly dense point cloud; the sparse one that you get right after aligning or posing your images works great. So all you need to do is use structure from motion. Whether you use COLMAP, RealityCapture, Agisoft Metashape, or Bentley ContextCapture really doesn't matter. As long as you have posed imagery and a sparse point cloud, you're off to the races with Gaussian splatting. It's this point cloud that serves as the foundation: you initialize, essentially, a 3D gaussian splat centered at each of these points.
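To make that concrete, here's a minimal sketch, in Python, of what that initialization could look like, assuming you've already exported an (N, 3) array of sparse point positions and matching RGB colors from your structure-from-motion tool. The function name, the nearest-neighbor scale heuristic, and the starting opacity value are illustrative choices on my part, not the paper's exact code.

```python
import numpy as np
from scipy.spatial import cKDTree

def init_gaussians(points, colors):
    """Initialize one 3D gaussian per SfM point (illustrative sketch).

    points: (N, 3) float array of sparse point positions from structure from motion
    colors: (N, 3) float array of per-point RGB in [0, 1]
    """
    n = points.shape[0]

    # Scale each gaussian by the mean distance to its 3 nearest neighbours,
    # so splats start out bigger where the point cloud is sparser.
    dists, _ = cKDTree(points).query(points, k=4)      # column 0 is the point itself
    scales = np.log(dists[:, 1:].mean(axis=1) + 1e-7)  # store log-scale, activate with exp()
    scales = np.repeat(scales[:, None], 3, axis=1)     # isotropic to start: same scale per axis

    return {
        "positions": points.copy(),                    # gaussian centres
        "scales": scales,                              # per-axis log scales of the ellipsoid
        "rotations": np.tile([1.0, 0, 0, 0], (n, 1)),  # identity quaternions (no orientation yet)
        "opacities": np.full((n, 1), 0.1),             # low starting opacity, refined during training
        # Degree-0 spherical-harmonic (DC) color; higher bands are added later in training.
        "sh_dc": (colors - 0.5) / 0.28209479177387814,
    }
```

From there, every number in that dictionary becomes a trainable parameter in the optimization described next.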
Then, through iterative optimization, these gaussians are adjusted to match the original images that you captured. This is where they're still using gradient descent, so while you may not get a neural network, the training loop is very similar to machine learning. Now, you obviously can't have an even distribution of gaussians to model the complexity of reality, so they solve this by adjusting the density of gaussians: you have more gaussians where you need to model detail, for example the trees in the scene, and fewer where you don't, for example the sky. And then the view-dependent lighting is modeled using spherical harmonics; you only optimize the base color at first and slowly add higher-frequency bands over time.

So here's the process in a bit more detail. You capture the input imagery: photograph the real scene from multiple viewpoints, basically like you would for photogrammetry. Then you reconstruct the cameras with structure from motion, essentially finding common structure across all the images to figure out the relative 3D positions of where the photos were taken. By virtue of doing this, you also end up with a sparse 3D point cloud. Then you take that sparse point cloud and create a 3D gaussian splat centered at each of the points in this SfM point cloud. Finally, the optimization process begins: you start rendering gaussians from the input camera views, comparing the virtual view to the original view, and then use gradients to optimize the parameters of the gaussians, the position, the size, the orientation, to match the photos over time. And since you can't have an even distribution of gaussians to model the complexity of reality, you have adaptive density control, essentially adding, removing, and splitting gaussians during this optimization process to increase the detail where it's needed, for example the trees, and remove detail where it's not, for example the sky. Then you start representing color with spherical harmonics, modeling all these view-dependent lighting effects, optimizing the base color first and slowly adding higher-frequency bands over time. And look, if you want to spare even more overhead, you can actually turn off these spherical harmonics. You lose some of the view-dependent effects, but oh my god, it still models reality way, way better than photogrammetry. You don't end up with these wonky-looking broccoli trees; you can actually see the lush detail of this tree come to life, and that still blows my mind.

Rendering is the coolest part: you just project the optimized 3D gaussians onto an image plane and alpha blend. So instead of rasterizing a bunch of triangles as you might classically, you're just rasterizing a bunch of these gaussians. You project these optimized 3D gaussians onto an image plane and alpha blend them together, and modern GPUs can just rip through these computer graphics operations, which is why you get this amazing 100 frames per second versus 10 seconds per frame. Oh my god.
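Here's a rough sketch of what that compositing boils down to for a single pixel, assuming the gaussians have already been projected to screen space (a 2D center and a 2x2 covariance each) and depth-sorted front to back; the variable names and structure are mine, not any particular implementation's.

```python
import numpy as np

def shade_pixel(pixel_xy, means2d, covs2d, opacities, colors):
    """Front-to-back alpha blending of projected gaussians at one pixel (sketch).

    means2d:   (N, 2) screen-space centres, already sorted nearest-first
    covs2d:    (N, 2, 2) screen-space covariances from projecting the 3D ellipsoids
    opacities: (N,) per-gaussian opacity in [0, 1]
    colors:    (N, 3) per-gaussian RGB for this view direction (from spherical harmonics)
    """
    out = np.zeros(3)
    transmittance = 1.0  # how much light still reaches the splats behind

    for mu, cov, opacity, color in zip(means2d, covs2d, opacities, colors):
        d = pixel_xy - mu
        # 2D gaussian falloff: full opacity at the centre, fading towards the edges.
        alpha = opacity * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)
        out += transmittance * alpha * color
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return out
```

The real renderer runs this for every pixel in parallel on the GPU, binning gaussians into screen tiles first, which is where those triple-digit frame rates come from.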
All right, so with all these steps completed, you get photorealistic novel view synthesis that can be rendered interactively at insane frame rates. Compared to NeRFs, this sort of unstructured gaussian representation gives us simplicity, speed, and quality. The ubiquity, the ability to distribute these assets and render them on modern computing systems, is absolutely wild. I don't need to burn GPU time just to render out a flythrough, or deal with some of these awkward hybrid representations where you kind of have a mesh but there's a neural network working under the hood for the view-dependent effects; you don't need to worry about any of that. So it's kind of cool: you've got this OG physics concept of spherical harmonics providing the view-dependent effects for realism (there's a small sketch of how that evaluation works at the end of these captions), while you're using gradient-based optimization of the gaussian parameters to fit them to the real-world appearance. A nice combination of computer graphics and ML techniques; call it the best of both worlds.

So suddenly you don't have these heavyweight meshes. You can render them on device, on mobile, you can render them in a browser, and you can obviously render them inside a game engine. And so you've got all these possibilities for interactive editing and previewing that you just couldn't do with NeRFs. I cannot overemphasize the benefit of having this editable, discrete scene representation rather than the implicit black-box representation you get with NeRFs.

This paper is crazy because if you train it for, let's say, 7,000 iterations, it's still better than Instant NGP, and if you go to 30,000 it rivals Mip-NeRF and all the other higher-quality approaches for Neural Radiance Fields. Now there are limitations, which we'll get to in the future, where certain types of complex lighting and transparency scenarios are still a little bit difficult. But this is why it's such a promising technique that could transform 3D capture and creation forever, and why folks in computer graphics and vision cannot shut up about it. Gosh, it's only been a couple of weeks and already we've got a viewer for Unity, we've got Unreal, we've got folks working on a native iOS Metal viewer, and there's a WebGL viewer as well, and that's just to start. I've got a lot more tests cooking and another video in the works about bringing these datasets into tools like Unity and Unreal and all the cool stuff you can do there. But for now, hopefully this gives you some intuition for what's actually happening under the hood and why you should care about Gaussian splatting.

Two interesting resources I'd point to if you want to get into more of the technical side of all of this: definitely check out this video, absolutely amazing, and if you want easy step-by-step instructions, here's another blog post you should absolutely check out. Anyway, that's it for this video; I wanted to keep it brief. If you enjoyed it, please be sure to drop a like and a comment on what you'd like to see covered next. Any questions you have as well, I'd be happy to answer them, and I will see y'all in the next one.
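As mentioned above, here's a tiny sketch of how those spherical harmonics turn into a color: each gaussian stores a handful of coefficients per color channel, and the RGB you see is just the SH basis functions of the viewing direction weighted by those coefficients. This only shows the degree-0 and degree-1 bands, the constants and sign conventions follow the common real-SH ordering, and the naming is my own, not any particular implementation's.

```python
import numpy as np

# Real spherical-harmonic constants for the first two bands.
C0 = 0.28209479177387814   # degree 0 (the view-independent "base color")
C1 = 0.4886025119029199    # degree 1 (varies linearly with view direction)

def sh_to_rgb(sh_coeffs, view_dir):
    """Turn per-gaussian SH coefficients into an RGB color for one view (sketch).

    sh_coeffs: (4, 3) array: one degree-0 and three degree-1 coefficients per channel
    view_dir:  (3,) unit vector from the camera toward the gaussian
    """
    x, y, z = view_dir
    rgb = (C0 * sh_coeffs[0]
           - C1 * y * sh_coeffs[1]
           + C1 * z * sh_coeffs[2]
           - C1 * x * sh_coeffs[3])
    # Shift so an all-zero degree-1 band gives back the plain base color.
    return np.clip(rgb + 0.5, 0.0, 1.0)
```

Higher bands add quadratic and cubic terms in the view direction, which is what captures the sharper glints and specular highlights; turn them off and you keep the base color but lose those effects.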
Info
Channel: Creative Tech Digest
Views: 40,668
Keywords: gaussian splatting, 3d graphics, 3d rendering, photorealism, real-time rendering, neural graphics, novel view synthesis, computer vision, point cloud, photogrammetry, radiance fields, nerf, gpu, graphics card, ray tracing, vr, augmented reality, ar, virtual reality, game development, unity, unreal engine, 3d modeling, 3d capture, 3d scanning, neural networks, deep learning, machine learning, ai, computer graphics, future of 3d, ai for 3d, vfx, geospatial
Id: sQcrZHvrEnU
Length: 8min 28sec (508 seconds)
Published: Sun Nov 05 2023