Open3D: A Modern Open-Source Library for 3D Data Processing

Video Statistics and Information

Captions
So, Open3D. For doing 3D processing, besides hardware we also need software, and Open3D is exactly for this. So let's get into that.

OK, so for the agenda today: first we're going to give an introduction and overview of Open3D. Then we'll see how Open3D can be set up, and go through the data structures Open3D has. Then we'll showcase an application, actually combining Open3D with RealSense for a scene reconstruction application. Finally, for future work, we'll discuss the next exciting things that will be available in Open3D.

OK, so Open3D is a library meant for general-purpose 3D processing. It implements the most fundamental data types and algorithms for 3D processing, and it can be plugged into any application you like: visualization, machine learning, robotics, and others. We want to make Open3D extremely easy and efficient to use; that's our main mission.

OK, with that, let's roll a quick introduction video; some of you might have already seen this demo. So this shows how we clone the source of Open3D. We have both Python and C++ APIs available, so you can use it very conveniently, and it's also cross-platform. The next part shows the basic manipulation of the data structures, such as I/O and visualization. This is point cloud processing; what we see is normal estimation. The next thing we see is geometry editing. Here are the registration pipelines, where we align two point clouds; we're going to go through more details of that. We also support interactive registration, which is manually picking the corresponding points. And here is some custom visualization: we can have a script that loops over the scene, visualizing it from different angles. Color map optimization is used for optimizing a color map after the geometry has been reconstructed. And finally, here is a bedroom scene reconstruction that we built for a demo.

OK, great. Getting Open3D is very easy. We are available on GitHub, under the MIT license. We support Python and C++, as we mentioned, and it's cross-platform; we also have conda and pip packages you can download. Open3D started as a research project at Intel Labs; this was the white paper that was published last year, and since the release Open3D has gained tremendous momentum in both industry and academia. We now have 1.5k GitHub stars, and it's still growing at a high rate.

To get started, it's pretty simple: you can just do the conventional one-line pip or conda command to get Open3D, and it works on most platforms. You can also build Open3D from source by cloning the repository and running the CMake configuration and make/install steps; this builds Open3D and also installs the Python package inside your virtual environment.

Next, we're going to go through the general architecture of Open3D. We can think of it as several layers. The very bottom layer is the foundation of everything: it includes the data structures that we implement and the utilities that operate on those data structures. As Sergey already showed, the point cloud is an important type of 3D data structure; we have point cloud, triangle mesh, voxels, and others. With those, we need utilities: we need to write these structures to a file and read them back from a file.
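As a rough illustration of that foundation layer, here is a minimal Python sketch of the read, process, write, and visualize flow. It assumes a pip-installed, reasonably recent Open3D, and the file path fragment.ply is a hypothetical placeholder:

```python
import open3d as o3d

# Read a point cloud from disk ("fragment.ply" is a placeholder path).
pcd = o3d.io.read_point_cloud("fragment.ply")
print(pcd)  # e.g. "PointCloud with N points."

# Downsample, then estimate per-point normals.
pcd_down = pcd.voxel_down_sample(voxel_size=0.05)
pcd_down.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

# Write the result back out and open the interactive visualizer.
o3d.io.write_point_cloud("fragment_down.ply", pcd_down)
o3d.visualization.draw_geometries([pcd_down])
```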
Beyond file I/O, we need visualization, and we need a custom visualizer that can be programmed. Building on top of that are the algorithms: we've seen in Sergey's demo things like odometry, registration, and volume integration. These are the foundation of a full scene reconstruction pipeline, and we have all these algorithms implemented so that you can move up to the top layer, which is the applications. As we've seen, you can do scene reconstruction, semantic segmentation, and a lot of other things built on top of the provided algorithms and data structures. So what the Open3D community is trying to do is build a tool that people can use to accelerate and facilitate their 3D software pipelines.

OK, so let's briefly go over the basic part, which is the 3D data structures. As most of us already know, these are the most fundamental data structures. The first one is the point cloud. A point cloud is just a collection of points, and to each point we can attach properties: each point has XYZ coordinates to start with, and it can carry color, normals, and other properties. When we go from a point cloud to a triangle mesh, we add triangle indices over the point cloud; for example, in this point cloud we can say point zero, point one, and point two form a triangle. So a triangle mesh is a point cloud plus the triangle indices. On the right-hand side, colored in red, is the line set. A line set is a collection of edges or lines; what we see in this slide is the convex hull that contains the bunny mesh.

Besides these, we also have voxel grid and octree support in Open3D; these are quite different from point clouds. A voxel grid is laid out in a regular pattern: it is axis-aligned, and every voxel cell is also spaced regularly. The octree we can think of as an indexed, search-tree version of the voxel grid, so we can do efficient manipulations, for example querying points or finding the nearest neighbors of a voxel efficiently in an octree.

So these are the 3D data structures implemented in Open3D. To support 3D applications we also have the image classes, which include an RGB image class and a depth image class. We're going to see in the next section that we use all of these data structures and build algorithms and applications on top of them.
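To make those data structures concrete, here is a small hedged sketch of building a few of them from raw NumPy arrays; the arrays are made-up placeholder data, and the calls assume a recent Open3D release:

```python
import numpy as np
import open3d as o3d

# A point cloud: XYZ coordinates plus optional per-point colors.
points = np.random.rand(100, 3)          # placeholder data
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(np.random.rand(100, 3))

# A triangle mesh: vertices plus triangle indices
# (here, points 0, 1, and 2 form a single triangle).
mesh = o3d.geometry.TriangleMesh()
mesh.vertices = o3d.utility.Vector3dVector(points[:3])
mesh.triangles = o3d.utility.Vector3iVector(np.array([[0, 1, 2]]))

# A line set: points plus edge indices.
lines = o3d.geometry.LineSet()
lines.points = o3d.utility.Vector3dVector(points)
lines.lines = o3d.utility.Vector2iVector(np.array([[0, 1], [1, 2]]))

# An axis-aligned voxel grid quantized from the point cloud.
voxels = o3d.geometry.VoxelGrid.create_from_point_cloud(pcd, voxel_size=0.05)
```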
OK, next let's go to the applications and algorithms part. In this section, we're going to show a full reconstruction pipeline using Open3D plus a RealSense camera. The camera that we use is the RealSense D415, which is the narrower, higher-precision camera. The job of the RealSense is to capture RGB images and depth images, and the job of Open3D is to reconstruct the scene. We use a toy example here just for illustration, but we have used Open3D on larger-scale scenes, as we will see in the coming videos.

To start: all the code and documentation for the reconstruction pipeline that we're going to describe today are available online. You can go to the documentation page, or you can just get the code from the GitHub page. For the reconstruction, everything is open-sourced, and it's very flexible: you can tune parameters and change things however you like.

OK, so there are four main steps. The first is capturing. The second is making fragments, which is the reconstruction of a local scene patch. The third is registering fragments, which means aligning two patches and merging them together. And finally there is the integration of the scene; at that stage we produce a mesh. I think there was a question earlier about mesh processing: at integration, we've defined our output to be a mesh.

OK, let's go step by step. The first step is capturing the RGBD images. We just hand-hold the camera and scan the scene. The goal is to capture the scene from different angles so that we get a full reconstruction.

In make-fragments, we subdivide the image sequence into subsequences. In this example, we subdivide it into two sequences: sequence 0 produces a fragment we call fragment 0, and sequence 1 produces fragment 1. As we can see here, fragment 0 is missing the lower part of the scene, and fragment 1 is missing the monitor part. Since each image subsequence only sees a subset of the scene, each fragment only represents a local patch of the scene.

To do the reconstruction of a local patch, we need to estimate the camera poses: we need to do RGBD odometry and integrate the multiple RGBD frames. What we see here is the camera pose estimation. If we look at this scene, the red line represents the camera center, and the blue box shooting out of it is the camera's look direction. In practice, by convention we usually fix the first camera pose; let's say we set it to the origin. Then odometry is performed among the frames. We provide very flexible ways to define which frames you want to perform odometry between, and by default we have two. One is each frame with the next frame, sequentially: the first frame performs odometry with the second frame, which means estimating the relative pose change, the relative transformation, between the first and second frames. We also have keyframe alignment: for example, every five frames we do an odometry, so frame one does odometry with frame six, and then frame eleven. We then formalize the odometry into an optimization problem, which is equivalent to solving the loop closure problem; really, this is a more general form of loop closure. We don't actually require the camera to go back to its original location, but if we formalize the camera poses as a pose graph, we can optimize the pose graph so that the odometry matches the global poses. So we can think of it as a more general form of loop closure.
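A hedged sketch of that make-fragments core: RGBD odometry between sequential and keyframe pairs, accumulated into a pose graph that is then globally optimized. The frame list, camera intrinsics, and five-frame keyframe interval are assumptions for illustration, the o3d.pipelines namespaces assume a recent Open3D, and this is not the exact pipeline code:

```python
import numpy as np
import open3d as o3d

def make_fragment_pose_graph(rgbd_frames, intrinsic, keyframe_every=5):
    """Estimate odometry over an RGBD subsequence and optimize the pose graph."""
    option = o3d.pipelines.odometry.OdometryOption()
    pose_graph = o3d.pipelines.registration.PoseGraph()
    odometry = np.identity(4)  # convention: first camera pose fixed at the origin
    pose_graph.nodes.append(o3d.pipelines.registration.PoseGraphNode(odometry))

    n = len(rgbd_frames)
    for s in range(n - 1):
        targets = [s + 1]                       # sequential: frame s with frame s+1
        if s % keyframe_every == 0 and s + keyframe_every < n:
            targets.append(s + keyframe_every)  # keyframe: e.g. frame 0 with frame 5
        for t in targets:
            success, trans, info = o3d.pipelines.odometry.compute_rgbd_odometry(
                rgbd_frames[s], rgbd_frames[t], intrinsic, np.identity(4),
                o3d.pipelines.odometry.RGBDOdometryJacobianFromHybridTerm(), option)
            if not success:
                continue
            if t == s + 1:  # odometry edge: extend the trajectory
                odometry = trans @ odometry
                pose_graph.nodes.append(o3d.pipelines.registration.PoseGraphNode(
                    np.linalg.inv(odometry)))
                pose_graph.edges.append(o3d.pipelines.registration.PoseGraphEdge(
                    s, t, trans, info, uncertain=False))
            else:           # keyframe edge: a loop-closure-style constraint
                pose_graph.edges.append(o3d.pipelines.registration.PoseGraphEdge(
                    s, t, trans, info, uncertain=True))

    # Solve the generalized loop closure as a pose graph optimization.
    o3d.pipelines.registration.global_optimization(
        pose_graph,
        o3d.pipelines.registration.GlobalOptimizationLevenbergMarquardt(),
        o3d.pipelines.registration.GlobalOptimizationConvergenceCriteria(),
        o3d.pipelines.registration.GlobalOptimizationOption())
    return pose_graph
```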
So, the next step after we make the fragments is to perform registration. There are many registration methods; in the pipeline we use colored point cloud registration and fast global registration, in fact a combination of them. They each have strengths and weaknesses for different reasons and suit different scenes, so in our pipeline we combine them. For example, in the registration pipeline we do multi-scale registration: we downsample the point clouds into voxels at several sizes, align the downsampled point clouds at each corresponding scale, and then combine the results. We aim to make this very generic, so it can be applied to many different scenes.

The final step is that we integrate the scene. At this point we use a TSDF volume and then run the marching cubes algorithm to extract the mesh out of the integrated TSDF volume. The TSDF volume is responsible for integrating multiple RGBD frames; we can think of the TSDF as an average over the RGBD images. For example, you have one RGBD image, and some values are cast into the space; then you have another RGBD image with some other values; and the TSDF stores those values in a grid pattern. Then we run marching cubes to extract the mesh from that.

OK, so here are the two fragments combined. We can see that the camera poses from the scanning are connected, from the first fragment to the second fragment. And then, in the final scene, we see that the lower part is now visible, and the front of the monitor is complete, so we have combined the information of the two fragments.

So this is just a summary of what we discussed. First we do the RealSense capturing; here we use the camera to scan the scene. Then we run the reconstruction pipeline; the reconstruction pipeline here includes the pose graph optimization and the integration, so it includes all the steps. Then we can do some visualization: the first one is the registration of the point clouds, next is the camera pose visualization, and here's the final result.
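A hedged sketch of those last two steps: fragment alignment with multi-scale colored ICP, followed by TSDF integration and marching-cubes mesh extraction. The voxel sizes, truncation distance, intrinsics, and camera_poses here are illustrative assumptions rather than the pipeline's tuned values, and the colored-ICP signature assumes a recent Open3D:

```python
import numpy as np
import open3d as o3d

def align_fragments(source, target):
    """Multi-scale colored ICP between two fragment point clouds (sketch)."""
    current = np.identity(4)
    for voxel_size, max_iter in [(0.05, 50), (0.025, 30), (0.0125, 14)]:
        src = source.voxel_down_sample(voxel_size)
        tgt = target.voxel_down_sample(voxel_size)
        for p in (src, tgt):  # colored ICP needs normals
            p.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(
                radius=voxel_size * 2, max_nn=30))
        result = o3d.pipelines.registration.registration_colored_icp(
            src, tgt, voxel_size * 1.4, current,
            o3d.pipelines.registration.TransformationEstimationForColoredICP(),
            o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=max_iter))
        current = result.transformation
    return current

def integrate_scene(rgbd_frames, camera_poses, intrinsic):
    """Fuse posed RGBD frames into a TSDF volume and extract a mesh (sketch)."""
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.004, sdf_trunc=0.04,
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
    for rgbd, pose in zip(rgbd_frames, camera_poses):
        volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))  # world-to-camera
    return volume.extract_triangle_mesh()  # marching cubes under the hood
```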
OK, so now it's time to discuss the future work. We actually have quite a few exciting new things that we're going to add to Open3D, and some of them are already in a beta form.

The very first thing is that we want to make Open3D the 3D processing library, so we're going to continue improving it: we'll continue adding more data structures to Open3D, we're going to add more functionality, and we're going to clean up the API so that it's consistent and easy for users.

The second big one is integrating with deep learning engines. Here we've done experiments using Open3D to implement a custom TensorFlow op, so a TensorFlow operation calls into Open3D to do certain operations; you can also, for example, compute gradients with Open3D while it is embedded in TensorFlow. We've done a semantic segmentation example of that. We also have integration with PyTorch, which we did in a Jupyter notebook demo, as we will see. The reason we like this direction is that a lot of deep learning applications now are 3D-based, so being able to develop those pipelines easily and efficiently is important; that's why we provide the integration with deep learning engines.

So here come the demos. The first one is the Jupyter integration with PyTorch. What we're seeing here is PointNet classification for point clouds: here's the airplane, and on top we can see, OK, this is the airplane; the prediction for the airplane is essentially one hundred percent. The middle cell there is Open3D's WebGL integration, so Open3D can be used inside a browser. Next, we show lidar semantic segmentation with the Semantic3D dataset. Here is the RGB point cloud, and here is the semantic segmentation: blue represents the ground, red represents the buildings, yellow is the car. Next, we take that pre-trained model and run inference on the KITTI dataset; KITTI is a self-driving-car dataset. We're able to achieve near real time, more than 10 FPS, with Open3D acceleration. So these are the deep learning engine integration demos.

The next thing I want to highlight is the integration with sparse convolutions. In a lot of 3D scenarios, the data is very sparse: as we saw in the voxel grid, the valid data may only lie in a small, local subset of locations. So it's important that sparse computation can be used, so that we don't waste computation. In this sparse convolution work, one of our lab members came up with the Minkowski Engine, which operates on top of sparse 3D data. The demo we see here is sparse convolution on 2D data, but this can be extended into 3D space, and this method is one of the leading methods on one of the semantic segmentation benchmarks right now.

With that, the next thing we want to highlight is the GPU integration. As we discussed in the previous presentation, it's sometimes nice to offload heavy computation to an accelerator, and the GPU is one of those accelerators. We have done preliminary work with the community and with our collaborators on GPU-accelerated Open3D. Here is one of the very recent results: we see significant speed improvements with GPU acceleration. For example, RGBD odometry is about two orders of magnitude faster. The reason is that in odometry we compute pixel-wise correspondences between two images; on the CPU, even if you use OpenMP to parallelize, it doesn't help a lot, but on the GPU this is very natural, so you get a large speedup. For TSDF integration, we've worked on a GPU-based voxel hashing algorithm, which is also significantly faster. And in real-world tests on a large scene, the overall pipeline speeds up by about ten times.

So let's see some videos of the GPU acceleration. Here is the standard registration demo, point cloud ICP registration, and then this is the implementation of the fast global registration algorithm. We also have real-time integration: this one is real-time integration accelerated on the GPU, so we don't need to wait for offline processing; we can just use the odometry results, and it will reconstruct the scene as we go. In the offline mode we can do pose graph optimization, but in real-time reconstruction we have to rely on the odometry results, so it might be inaccurate if we move a long distance; in the offline mode, pose graph optimization gets us the loop closure.

OK, so these are the results from the GPU work. For this bedroom scene, using the traditional CPU-based Open3D, it takes 10 hours to reconstruct the whole scene; the GPU-accelerated version brings the time down to 1.5 hours. These works are mostly work in progress; we have a beta branch available publicly for people to try, and we're working hard to bring them into the main branch of Open3D.
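Circling back to the deep learning integration above: the bridge between Open3D geometry and an engine like PyTorch is typically just NumPy. A minimal hedged sketch, where the tiny network is only a stand-in for a real model such as PointNet, and airplane.ply is a hypothetical path:

```python
import numpy as np
import open3d as o3d
import torch
import torch.nn as nn

# Stand-in for a trained classifier such as PointNet (assumption: the real
# demo uses an actual trained network; this one just illustrates the plumbing).
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 16))

# Open3D geometry -> NumPy -> PyTorch tensor.
pcd = o3d.io.read_point_cloud("airplane.ply")               # hypothetical path
points = torch.from_numpy(np.asarray(pcd.points)).float()  # shape (N, 3)

per_point = model(points)               # (N, 16) per-point features
logits = per_point.max(dim=0).values    # global max-pool, PointNet-style
pred = int(logits.argmax())             # predicted class index

# And back: PyTorch tensors into Open3D geometry for visualization.
pcd_out = o3d.geometry.PointCloud()
pcd_out.points = o3d.utility.Vector3dVector(points.numpy().astype(np.float64))
```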
OK, here's a brief summary. Open3D is a modern, efficient, and easy-to-use 3D processing library. It covers the most fundamental 3D data structures and algorithms, and you can build amazing applications on top of Open3D, especially with RealSense. Open3D is growing fast: we have regular updates, we have an active community, and a lot of new features are being added.

OK, here's the team. Most of us are from the lab, and we're also very happy to have more than 30 community contributors very actively working on the project. All right, with that, thank you. I can take any questions.

[Question about whether the demo point clouds were synthetic or real] Oh, no, in that demo I think the Jupyter point cloud is synthetic, from some CAD model. The scene segmentation is from real scanning; one of them is from cars. [Question about whether semantic segmentation can be done on point clouds produced by a RealSense] Ah, I actually haven't tried that; it should work in principle. [Question about recommended algorithms and datasets] Yeah, there are many algorithms to do semantic segmentation; one of them I think is promising is the one that works with what the RealSense produces. There are datasets similar to ImageNet: I think the group that did PointNet has a point cloud dataset. I don't think they're using RealSense for the data capturing, though; I'm not aware of such a dataset captured by RealSense yet. At the moment, we're very much aware that this is a gap, and we're working to find ways to close it. [Audience: I imagine that's one million images.]

[Question: If I build my own 3D scanner using a RealSense, what would you suggest: should I move the camera, or move the object?] The one that I tried is moving the camera, because the assumption we make here is that the object is stationary and then we estimate the camera pose. I guess you could also do the reverse: tracking the camera position while it moves is the reverse of moving the object with a stationary camera, so that should be possible. But in most 3D reconstruction, the scene is stationary: for example, if you need to scan a room, it's not practical to rotate the room, so it's more natural to do it this way, moving the camera.

[Question: Great talk. I was pleased to see the CUDA integration; are you working at all with the NVIDIA Isaac team on integrating with their robotics platform?] Not that I'm aware of. The CUDA work we've done is through a collaboration with CMU: one of our lab members collaborates with researchers at CMU, and we're bringing that into the main branch of Open3D. But that could be a good suggestion, to collaborate with the robotics parts of NVIDIA. OK, thanks. [Applause]
Info
Channel: Intel RealSense
Views: 33,336
Rating: 4.98 out of 5
Keywords: Open3D, LiDAR semantic segmentation, lidar, PointNet++, 3D scene capturing, Scene reconstruction, Intel
Id: Rsh4poEpahI
Length: 29min 38sec (1778 seconds)
Published: Mon Jul 01 2019