droidcon SF 2018 - The JPEG of 3D: Bringing 3D scenes and objects into your 2D Android app with glTF

Captions
Thanks for coming out. I think this is the last session today, right? Okay, great. Thanks for joining us at droidcon, and I hope you've had a good time so far. We're going to talk about 3D scene rendering, and it's good that you're here, because this is the forefront of where computing is going: if you've been paying attention, we're moving toward an era where we'll be doing more in VR and eventually in AR, and I think everyone agrees AR has a pretty good chance of being the next big computing platform. So we're going to learn about 3D rendering, and at a somewhat deeper level than if you were using a high-level API like Sceneform (which Romain Guy worked on), which just lets you load a 3D file and render it. I'm going to talk at a more nitty-gritty level, and I'll also tell you about the 3D post feature on Facebook that uses the glTF format, which I have a t-shirt of, so you know I'm serious. I also happen to be the face of VR, no big deal; that's me, just wanted to throw that out there.

We'll start by talking about 3D posts on Facebook, then go into the details of the glTF format, which we like to call the JPEG of 3D, and then I'll talk about some improvements we're making to 3D posts.

3D posts started like this: we launched the ability to interact with 3D objects, rendered directly in real time, in Facebook for Android, Facebook on the web, and iOS as well. Lego was the partner that launched with this, with a branded 3D model of their parrot. Pretty cool, but we actually started by trying to support another use case: we have VR creation tools that let you sculpt 3D models in VR (this is Oculus Medium), and we wanted to support sharing from VR directly to News Feed to extend the reach of creators. So someone sculpts a 3D model directly in VR and shares it to Facebook with the click of a button; that creates a glTF file which they can then share to News Feed.

That was a first-party integration, but this year, I think in February, we added support for dragging and dropping a binary glTF file (the .glb extension) directly into the composer, essentially making 3D a first-class citizen on Facebook, akin to photos and videos. Just as easily as you can drag and drop a photo or a video into the composer, you can drag and drop a 3D file in glTF format, change the background, play around with the initial view of the object, and there you go: you can interact with it in News Feed. What this lets us do is empower artists to share their 3D art assets from games; for example, this is an asset from the game Brass Tactics, and you can see it rendered in Facebook for Android.

A more recent feature we launched is 3D photos, which sits on top of the 3D post architecture; behind the scenes it uses glTF 3D meshes. Some of you may have seen this already; I think we launched it a few weeks ago. It looks like a static image, but you can tilt the phone around and you get a little parallax effect.
The way this works is that we have the RGB color values plus the depth values that come out of a camera like the iPhone's, which has two separate cameras and can compute depth from the disparity between them. From that we can generate a depth map, and from the depth map we can generate a mesh: basically a background mesh and a foreground mesh. Then we apply the parallax by just moving the scene around. Pretty cool.

So let's get into the format itself, glTF, and why we call it the JPEG of 3D. Back in the day, John Carmack, the CTO of Oculus, called for the creation of a "JPEG of 3D" to build the metaverse. What he was asking for was a 3D transmission format, which we didn't really have. We had formats used for interchange within authoring pipelines, locally on an artist's machine, but nothing for delivering 3D over the web in an efficient way. If you look at the history of transmission formats on the web, there was always a big player that pushed a format forward: for audio, the early music services pushed MP3 forward; for video, YouTube popularized H.264 encoding; and for images, Facebook photo posts pushed JPEG forward heavily. We're hoping that for 3D, our adoption of glTF will help push the ecosystem forward and get more players to adopt the format.

Why do we need a new 3D format? Look at the formats that already exist. FBX is a very popular format created by Autodesk, and a lot of artists use it, but it's non-standard, so it's not easy to validate, which leads to inconsistent implementations and ambiguity. It also doesn't support extensibility, so we can't just add the things we need when they aren't there, and it doesn't support newer features we want in a 3D scene format, like physically based rendering, which I'll talk about later. OBJ is also very popular, but it's a plain-text format and it only describes objects, whereas we need to describe full scenes with animation, lighting, transforms, and so on. DAE, or COLLADA, was a format the Khronos Group created to try to solve these problems, but they created it as an interchange format: it uses XML, it isn't efficient to load, and it produces bigger files, so it didn't really solve the problem of an efficient transmission format.

The properties of glTF: it's compact to transmit, with a JSON description of the scene graph plus a big binary blob that holds all the mesh data, the animation data, the textures, and anything else you need to render. It's fast to load, because we can take that binary data, do a straight copy, and upload it to the GPU, instead of parsing everything, creating buffers in memory, and then uploading those buffers. It describes full scenes, so we get complex lighting, animations, joint-based animation, blend shapes, and modern lighting techniques like physically based rendering. It's also runtime neutral; the first version, glTF 1.0, actually wasn't (it was designed around OpenGL, WebGL, and OpenGL ES), but 2.0 was designed to be platform agnostic, so you can render it with DirectX, OpenGL, Vulkan, or Metal. It describes things at a high level, and it's up to the renderer to interpret them.
And it's very extensible: vendors are encouraged to create extensions. A popular example is Microsoft, which has made extensions for levels of detail, and Google has made a very popular extension for mesh compression that I'll talk about later.

So now we have a pretty big ecosystem. There are creation tools like Blender, SketchUp, Substance Painter, and of course Oculus Medium that export glTF files. There are converters; we open-sourced an FBX-to-glTF converter on GitHub, which lets people keep working in their existing pipeline and still end up with glTF. There are content sites like Sketchfab and Google Poly that provide glTF assets, and there's app and engine integration, with engines like Unreal and Unity supporting glTF through plugins, and of course React VR supports glTF. So things are looking pretty good in terms of industry adoption.

What's actually in a glTF? To start, there's the scene hierarchy, or scene graph; the geometry data; data for the textures, which are just images mapped onto the mesh that represents the physical geometry; and then animation data and skin data, which are used for joint-based animation. All of that gets combined and interpreted by the client to render the final scene. The binary glTF (GLB) packs all of this into one file: the JSON part, then the binary part with the vertex data and index data (I'll explain what that means later), the animation and skin data, and the texture data.

The JSON structure is a set of top-level arrays; these are the entities that describe the scene graph. At the top you have the scene; a scene can have multiple nodes; each node can have multiple child nodes; each node can have a mesh; and the mesh gets down into the details of the actual triangle mesh that represents the object itself. You can also have a camera, which describes how we're looking at the scene, and the material information describes the look of the geometry, its texture.

A simple example: say we have a scene with a car, and the car animates; it drives forward and the wheels rotate. That looks like what you see above: the root of the scene has a car and a camera as children, and the car itself is composed of, say, the front wheels and the rear wheels. Each node can have its own transform, and a transform applied at one level is applied to all the children of that node. So if the car moves forward, the wheels also move forward, because they're children of the car. At the same time, you might want independent transformations (rotation, translation, scale) on the wheels; for example, you want the wheels to rotate while the car is moving forward, so those wheels have their own rotation applied to them.
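To make the parent/child transform idea concrete, here is a minimal sketch (not code from the talk; the Node class and its fields are hypothetical) of how a renderer might compose a node's world matrix from its parent's using android.opengl.Matrix:

```java
import android.opengl.Matrix;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of per-node transforms, as in the car/wheels example.
class Node {
    final float[] localMatrix = new float[16];   // this node's own rotation/translation/scale
    final float[] worldMatrix = new float[16];   // composed transform actually used for drawing
    final List<Node> children = new ArrayList<>();

    Node() {
        Matrix.setIdentityM(localMatrix, 0);
        Matrix.setIdentityM(worldMatrix, 0);
    }

    // The parent's transform is applied to all children, so moving the car moves the wheels,
    // while each wheel still applies its own local rotation on top.
    void updateWorldMatrix(float[] parentWorld) {
        Matrix.multiplyMM(worldMatrix, 0, parentWorld, 0, localMatrix, 0);
        for (Node child : children) {
            child.updateWorldMatrix(worldMatrix);
        }
    }

    static void demo() {
        Node car = new Node();
        Node frontWheel = new Node();
        car.children.add(frontWheel);

        Matrix.translateM(car.localMatrix, 0, 0f, 0f, -0.1f);      // car drives forward
        Matrix.rotateM(frontWheel.localMatrix, 0, 5f, 1f, 0f, 0f); // wheel spins about its axle

        float[] root = new float[16];
        Matrix.setIdentityM(root, 0);
        car.updateWorldMatrix(root);  // the wheel's worldMatrix now includes the car's motion
    }
}
```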
That's all specified in the JSON, and you'll notice there's a bit of indirection: the way we describe the children of a node is with indices into the nodes array telling you where to find each child. The nodes array is one big flat array, so there aren't multiple levels to jump through to find the children; this makes it very easy to parse, especially on the web. All you do is take the index reference and look it up in that one JSON array. This is all by design: it adds a little indirection, but it makes the file more efficient and faster to parse.

Now let's look at the "hello world" of glTF: a file that renders a single triangle. The triangle is the rendering primitive in most graphics APIs (sometimes it's a quad, but in OpenGL ES it's the triangle). This is the entire glTF file, spread over the next three slides. We have a scene with a single node, and that node has a single mesh, mesh 0. Mesh 0 has the information about where to get the vertex data and the index data; that's in the primitives section. Where you see POSITION, that's a keyword referring to the vertex data, and the value next to it is just an index into the accessors array, which I'll show you in a second.

What does this look like in the buffer? We have the data describing the triangle, which is composed of three vertices: vertex 0, vertex 1, and vertex 2. A very important thing in mesh rendering is making sure we don't store redundant vertices. So we list every vertex exactly once, pack that into the buffer, and then have a separate index buffer that references those vertices. Say we were drawing a square instead of a triangle: you'd flip the triangle and put it on top of this one, and the two triangles would share vertex 0 and vertex 2. We don't want to duplicate the data for those shared vertices, because with a really complex mesh that would be very inefficient, so we use the index buffer to describe the full geometry. In this case we allocate six bytes for the indices: two-byte shorts, three entries, saying look at vertex 0, look at vertex 1, look at vertex 2. That's the first six bytes. The next thirty-six bytes describe the actual vertex positions in 3D space: each position is a vector of three floats, so three vertices come to 36 bytes. Those two pieces, the index buffer and the vertex buffer, tell us exactly how to describe our triangle, and a more complicated mesh in the end comes down to the same thing.
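As a rough illustration (not the sample code from the talk, and the positions are made up), here is how that triangle's data could be laid out in Java, matching the 6-byte index buffer and 36-byte vertex buffer described above:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

class TriangleBuffers {
    // Index data: 3 two-byte shorts = 6 bytes, referencing vertices 0, 1 and 2.
    static final short[] INDICES = {0, 1, 2};

    // Vertex data: 3 positions x 3 floats x 4 bytes = 36 bytes.
    static final float[] POSITIONS = {
            0f, 0f, 0f,   // vertex 0
            1f, 0f, 0f,   // vertex 1
            0f, 1f, 0f    // vertex 2
    };

    static ShortBuffer indexBuffer() {
        ShortBuffer buf = ByteBuffer.allocateDirect(INDICES.length * 2)
                .order(ByteOrder.nativeOrder())
                .asShortBuffer()
                .put(INDICES);
        buf.position(0);
        return buf;
    }

    static FloatBuffer vertexBuffer() {
        FloatBuffer buf = ByteBuffer.allocateDirect(POSITIONS.length * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer()
                .put(POSITIONS);
        buf.position(0);
        return buf;
    }
}
```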
Our buffer data in this case is trivial (it's just a triangle), so we can specify it with a data URI and just give the byte length; normally you'd have your mesh data appended to the end of your GLB file and provide an offset into that same file telling you where to get it.

Buffer views are an abstraction that let you look at the same buffer in different ways; remember, we want to be efficient about how we load our data, so we take one buffer and just look into it for the things we need, possibly with a stride applied, or starting from a specific byte offset. In our case there's one byte range telling us where to find the index data and another telling us where to find the vertex data; that's what the buffer views provide.

The accessor is what hooks back up to the mesh. Remember the POSITION value of 1: that refers to accessor 1, and where it says indices, that refers to the index data we can find in accessor 0. Each accessor references the buffer view we just saw and gives you the component type (float or short, depending on whether it's the vertex buffer or the index buffer), the byte offset where to find the data, how many elements there are, whether each element is a vector or a scalar, and the min and max, which are like the bounding region. Finally, we provide the asset version, because there are different versions of glTF and the client needs to know how to parse it properly.

I've put some sample code on GitHub, under fbsamples (the glTF render sample), that shows how to do this with a very minimal-abstraction renderer using OpenGL ES in Java. There's no abstraction layer, so you can see exactly how it works; it's a good introduction to how 3D rendering works on Android. To put it together, I dropped it into an ARCore sample: ARCore gives you the SLAM surface, and I render my (very trivial, single-triangle) glTF scene on that surface with my glTF renderer.

So let's talk about how we actually implement the rendering. First we parse the glTF: we walk through the glTF JSON, parse the scene graph, and figure out how to find the buffer data. Remember the structure: scenes, nodes, meshes, buffers, buffer views, and accessors are the top-level array entities. When we parse, all we do is parse out those top-level JSON arrays, inflate them, and keep them in an ArrayList (or a vector, or whatever); the point is that when we need to reference something within those arrays, we just do an index lookup. We do this with the JSONObject API. When we parse the nodes, we look for the information we need to render, so we fetch the mesh for each node; similarly, when we parse the mesh data, we fetch the primitives, the attributes, the indices, and so on. It's pretty straightforward JSON parsing to get at the data we need.
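As a hedged sketch of that parsing step (the real code is in the sample repo mentioned above; the GltfNode class here is made up), the flat top-level "nodes" array can be inflated once, so that resolving a child is just an index lookup:

```java
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the parts of a glTF node we care about here.
class GltfNode {
    int meshIndex = -1;                                 // index into the top-level "meshes" array
    final List<Integer> children = new ArrayList<>();   // indices into the flat "nodes" array
}

class GltfParser {
    // Children are stored as indices, so finding a child later is a single lookup into
    // the returned list rather than a walk through nested JSON.
    static List<GltfNode> parseNodes(JSONObject gltf) throws JSONException {
        List<GltfNode> nodes = new ArrayList<>();
        JSONArray nodeArray = gltf.getJSONArray("nodes");
        for (int i = 0; i < nodeArray.length(); i++) {
            JSONObject json = nodeArray.getJSONObject(i);
            GltfNode node = new GltfNode();
            node.meshIndex = json.optInt("mesh", -1);
            JSONArray children = json.optJSONArray("children");
            if (children != null) {
                for (int c = 0; c < children.length(); c++) {
                    node.children.add(children.getInt(c));
                }
            }
            nodes.add(node);
        }
        return nodes;
    }
}
```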
Now the interesting part is parsing the buffers. It's super important that we aren't redundant or wasteful with memory here, so we parse the single buffer we have only once, and even though we have buffer views that look at it in two different ways, they use the same underlying buffer. In this case we have a data URI, so we decode it into a byte array and wrap it in a ByteBuffer; when we use our buffer views, we just look at a specific offset into that buffer to find the data we need. That's simple for our current data, since it's just a triangle, but with a very complex 3D mesh it's a big saving in efficiency and memory.

The next step is to create a rendering representation from the data we got out of the JSON: we need to convert it into data that can be rendered with the OpenGL ES API. One thing to note is that we issue one graphics API draw call per mesh primitive. If you look back at the mesh data, there's a primitives section, and a mesh can have multiple primitives; the reason you'd want that is to break a very complex mesh into multiple sections so it's more efficient to draw. For each mesh primitive, we do one OpenGL ES draw call.

Then we extract the vertex and index data from the accessors. We have the mesh data we extracted and the primitive data we extracted, and we just follow the indirection, the same way we jumped through the JSON, until we get to the raw data. It's pretty straightforward, and the source code is online if you missed something. The important part is finally getting to the vertex and index data, which is what actually tells us how to render the scene. We take that original ByteBuffer (or a view of it) and look at it in two different ways, depending on whether it's the vertex buffer or the index buffer; the Java Buffer API provides this. We look at it as a FloatBuffer when we know we're interpreting float data, as with the vertex positions, apply the byte length and byte offset, and position the pointer inside that buffer object; we do the same for the index buffer.

Then we upload the data to the GPU: now that the vertex data is in system memory, in our FloatBuffer abstraction, we move it into GPU memory with the OpenGL ES API. First we generate the buffers with glGenBuffers; then, because OpenGL ES is state based, we bind a single buffer with glBindBuffer; and then we upload the data with glBufferData, pointing it at that FloatBuffer. Once the vertex data is uploaded, we can get ready to draw.
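Here is a minimal sketch of that upload step (assuming an OpenGL ES 2.0 context and direct, zero-positioned buffers like the ones above; the class and field names are hypothetical, see the sample repo for the real code):

```java
import android.opengl.GLES20;
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

class GpuUpload {
    int vertexBufferId;
    int indexBufferId;

    // Copies the vertex and index data from system memory into GPU buffer objects.
    void upload(FloatBuffer vertices, int vertexByteCount,
                ShortBuffer indices, int indexByteCount) {
        int[] ids = new int[2];
        GLES20.glGenBuffers(2, ids, 0);          // 1. generate buffer object names
        vertexBufferId = ids[0];
        indexBufferId = ids[1];

        // 2. OpenGL ES is state based: bind a buffer, and subsequent calls apply to it.
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, vertexBufferId);
        // 3. upload the vertex positions (GL_STATIC_DRAW: the data won't change every frame)
        GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, vertexByteCount, vertices,
                GLES20.GL_STATIC_DRAW);

        GLES20.glBindBuffer(GLES20.GL_ELEMENT_ARRAY_BUFFER, indexBufferId);
        GLES20.glBufferData(GLES20.GL_ELEMENT_ARRAY_BUFFER, indexByteCount, indices,
                GLES20.GL_STATIC_DRAW);
    }
}
```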
Drawing with OpenGL ES happens every frame, because generally you have animation and things that change from frame to frame. (You can optimize: if nothing changes in a frame, don't re-render the scene, because the screen won't change.) For each render object we attach the shaders. If you're not familiar with shaders, they're programs that run on the GPU, and the two most common are the vertex shader, which computes the final screen position of a vertex (a vertex is just a point in 3D space, so we need a perspective or orthographic projection that maps from 3D into 2D screen space), and the fragment shader, which computes the final color of each pixel covered by the geometry. Once the shaders are attached, you bind the actual buffers, the index buffer and the vertex buffer we talked about; uniforms are just any state you need to provide to the shaders on the GPU; and finally the call to glDrawElements is what tells the GPU to go rasterize triangles and create the mesh.

So each frame we take our glTF render objects (we keep a list of them, and again they map to the mesh primitives in the JSON), bind the index buffer and vertex buffer pointers, and issue the glDrawElements call with the index count, which is just how many elements we're drawing; in this case it's three, the three indices that make up the triangle in our scene. Once we do that, we have our glTF hello-world scene rendered. Pretty straightforward when you're just looking at a triangle, and it gets a lot more complex with a complex mesh, but at the core this is exactly how everything works; if you understand this, you understand the fundamentals of 3D graphics rendering. We're just using red as the default color, but you could also have materials: we could apply a texture to give the triangle an image as its look, and we could add lighting to change how the triangle looks based on where the light source is. Again, my sample glTF renderer code is on GitHub under fbsamples if you want to look at it in more detail.
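And a sketch of that per-frame draw, under the same assumptions as the upload sketch (GLES 2.0, a compiled and linked shader program with a vertex attribute named "a_Position"; all names here are hypothetical):

```java
import android.opengl.GLES20;

class TriangleDraw {
    // One draw call for one mesh primitive: bind the buffers, point the position attribute
    // at the vertex data, then ask the GPU to rasterize the indexed triangles.
    static void draw(int program, int vertexBufferId, int indexBufferId, int indexCount) {
        GLES20.glUseProgram(program);  // the vertex + fragment shaders attached earlier

        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, vertexBufferId);
        int positionHandle = GLES20.glGetAttribLocation(program, "a_Position");
        GLES20.glEnableVertexAttribArray(positionHandle);
        // 3 floats per vertex, tightly packed, starting at offset 0 of the bound buffer
        GLES20.glVertexAttribPointer(positionHandle, 3, GLES20.GL_FLOAT, false, 0, 0);

        GLES20.glBindBuffer(GLES20.GL_ELEMENT_ARRAY_BUFFER, indexBufferId);
        // indexCount is 3 for the hello-world triangle
        GLES20.glDrawElements(GLES20.GL_TRIANGLES, indexCount, GLES20.GL_UNSIGNED_SHORT, 0);

        GLES20.glDisableVertexAttribArray(positionHandle);
    }
}
```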
Now let's talk about improvements to 3D posts. We call glTF the JPEG of 3D, but an important part of being like JPEG is having an encode and a decode step, where you encode before transmission over the wire and decode quickly on the client, so we needed a way to do that for 3D objects.

The first part is compressing the meshes. Google provided a glTF extension for mesh compression, one of the extensions ratified by Khronos, using their Draco mesh compression system. These are samples of glTF models from the Khronos repository where Google has achieved very good compression ratios with Draco; in some cases they get 10:1. That matters because it turns out network transfer is generally the bottleneck in loading 3D scenes. In Facebook News Feed we need to load each post very quickly, because people scroll through the feed fast: we can't afford a 40-megabyte glTF file, but with mesh compression maybe we can load a 4-megabyte file over the wire and just pay a quick decode at load time. Mesh compression is super important, the decode is generally fast, and again it applies to the mesh data itself, the vertex and index data.

Another important piece is texture compression. Often, when you want a mesh to be less complex, you simplify the mesh but still apply high-resolution textures so it still looks decent, so compressing the textures matters too. The catch is that different GPUs support different texture compression formats, so you can't use the same compressed texture across PC and mobile; that's a problem for glTF, which tries to be platform agnostic, so we need a solution that doesn't have to be specific to each platform. Khronos's approach is a proposed universal texture compression format: it looks the same at the higher level, and on the client it gets transcoded into one of the platform-specific GPU texture compression formats. The reason we want this is to skip the whole JPEG-decode step that expands the image into bitmap data, because the bitmap is much bigger in memory than the compressed texture or the over-the-wire data; if we can take the compressed texture and load it directly on the GPU, we skip that decode and save memory, which matters for complex scenes. That's the platform-agnostic direction Khronos is going. At Facebook we can be more practical: if you're on Android, we know which compressed texture formats we can serve you, so we'll serve ETC2 or ASTC or another format that works on your device, and if you're on the web we can serve DXT1 or one of the other DirectX formats.
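To illustrate why the served format has to match the device (a hedged sketch, not Facebook's actual client logic), an Android client could pick a compressed texture format by checking the GL extension string; ETC2 is mandatory from OpenGL ES 3.0 onward, so it makes a safe fallback:

```java
import android.opengl.GLES20;

class TextureFormatSupport {
    // Must be called on a thread with a current GL context.
    static String pickCompressedFormat(int glEsMajorVersion) {
        String extensions = GLES20.glGetString(GLES20.GL_EXTENSIONS);
        if (extensions != null && extensions.contains("GL_KHR_texture_compression_astc_ldr")) {
            return "ASTC";   // widely supported on newer mobile GPUs
        }
        if (glEsMajorVersion >= 3) {
            return "ETC2";   // guaranteed by the OpenGL ES 3.0 spec
        }
        return "ETC1";       // older ES 2.0 devices (note: ETC1 has no alpha channel)
    }
}
```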
The next improvement is making better use of all the glTF 2.0 features; so far we haven't implemented everything, like blend shapes, animation skinning, and physically based materials. This is a sample Khronos put out of what you can do with glTF: it has environment-based lighting, meaning the animated object, that dog, actually takes in the lighting from the scene. That's very important in AR. If you want something to look realistic in an AR scene, it needs to take on the lighting from the environment, otherwise it stands out; it's like the early version of Pokemon Go, where the Pokemon was just rendered on top of the video feed and didn't look like it was actually part of the scene. To make it feel like part of the scene, it needs to pick up all the lighting cues, the color reflections, the little things you don't consciously notice but that play a big part in making something feel like it belongs.

That's where physically based rendering comes in. Physically based rendering is a way to represent materials and how they interact with light more accurately than traditional real-time models. This is an example of a glTF file loaded on Sketchfab: it's a very realistic rendering that takes lighting from the environment, with reflections, roughness, properties that make it feel real. It's what most high-end game engines do these days to get photorealistic renders. The way it works is that you have your base mesh, and different textures provide different material properties, physical properties like roughness, how metallic the surface is, how emissive it is (whether it emits light), and we combine them all using lighting calculations to produce the final output. You can see the different components of that physically based glTF material, the multiple textures that feed into it, and in the end you get a final render that looks very realistic and takes in the environment lighting.
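For reference, the metallic-roughness material in glTF 2.0 is typically shaded with a BRDF along these lines (a standard formulation sketched from the general PBR literature, not from the talk): the response is a diffuse term plus a microfacet specular term,

$$ f(l, v) \;=\; (1 - F)\,\frac{c_{\mathrm{diff}}}{\pi} \;+\; \frac{D(\alpha)\,G(\alpha)\,F}{4\,(n \cdot l)\,(n \cdot v)} $$

where the diffuse color \(c_{\mathrm{diff}}\) and the reflectance at normal incidence \(F_0\) are derived from the base color and metallic values, \(\alpha = \mathrm{roughness}^2\), \(D\) is the microfacet distribution (e.g. GGX), \(G\) the geometric shadowing term, and \(F\) the Fresnel term. The roughness, metallic, and emissive textures mentioned above feed these parameters per pixel.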
If you're interested in physically based rendering, which is a huge topic on its own, Romain Guy worked on a physically based rendering library called Filament, which is used within the Sceneform library that works with ARCore, and there's extensive documentation on the Filament GitHub page. They pretty much wrote a novel about physically based rendering, because it's fairly complex, and the optimizations needed to make it perform well on Android are important. So if you're interested in this, it's a really great resource, and the Filament library itself is a really good implementation of physically based rendering for Android.

That's pretty much my talk: what glTF is, what we're doing next, and how we're using it at Facebook. What I hope you take away is that the skill set Android developers are going to need in the future includes a better understanding of 3D rendering, the things you'll need when you start building Android apps that move into AR and VR. There are other applications too: even in our 2D Facebook app we find use cases where 3D makes sense, things like 3D photos and letting artists show their work directly on Facebook. It might not match your use case right now, but in the future there are going to be a ton of use cases where you'll need this skill set, so I hope this introduction to 3D scene rendering, and to what we're doing at Facebook, helps you. Do we have any questions?

Q: Is there a standard animation format?
A: The glTF spec itself describes how animation data is kept within the file format. You can have node animation, like I showed with the car, where you apply a transform to each node (rotate, scale, translate); that lets you rotate a node, and its children rotate with it. There are also more complex animations. Joint-based animation, or skinning, is where you describe the joints (if I'm rendering a person, I describe where their joints are) and provide information about how to move and animate each joint separately; that joint movement is applied to every node attached to the joint, and you apply textures, say for the arms, so it looks like a proper arm. There are also blend shapes, which you use for facial animation or anything squishy with a lot of deformation; glTF supports blend shapes and a way to describe them. All of that is still kept in the GLB, in the binary part: generally you have keyframes for the animation appended there, and the JSON tells you where to get that data, similar to the accessor that tells you where to get data from the buffer.

Q: Is it possible for GPU vendors to support a common format for texture compression?
A: Yes, absolutely, but the problem is what you do for all the existing phones and hardware out there, because a compression format isn't just a software-level thing; it needs hardware support. So we're solving it in a practical way right now for existing devices, but in the future, yes, GPU vendors are trying to get together and agree on an ultimate texture format that works for all use cases. That's a collaboration that has to happen between all the vendors. The difficulty is that GPUs are different, and each has proprietary implementations that work better in certain cases, so it's hard to make something optimal for everyone; but if there's an industry standard that everyone has to support, that's how it will happen. Even then, it only helps us maybe five years down the line, once everyone is on those platforms, and on Android probably not everyone will be.

Cool, I think we're over time, so thank you everyone. [Applause]
Info
Channel: droidcon SF
Views: 394
Id: EGSR8qrpEq4
Length: 36min 49sec (2209 seconds)
Published: Sat Dec 01 2018