Problem solving across 100,633 lines of code | Gemini 1.5 Pro Demo

Video Statistics and Information

Captions
This is a demo of long context understanding, an experimental feature in our newest model, Gemini 1.5 Pro. We'll walk through some example prompts using the three.js example code, which comes out to over 800,000 tokens. We extracted the code for all of the three.js examples and put it together into this text file, which we brought into Google AI Studio over here.

We asked the model to find three examples for learning about character animation. The model looked across hundreds of examples and picked out these three: one about blending skeletal animations, one about poses, and one about morph targets for facial animations. All good choices based on our prompt. In this test, the model took around 60 seconds to respond to each of these prompts, but keep in mind that latency times might be higher or lower, as this is an experimental feature we're optimizing.

Next, we asked, "What controls the animations on the Littlest Tokyo demo?" As you can see here, the model was able to find that demo, and it explained that the animations are embedded within the glTF model.
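For reference, this is the general three.js pattern the answer is describing: animation clips stored inside a glTF file are played back through a THREE.AnimationMixer. The following is only a minimal sketch with an illustrative asset path; the actual Littlest Tokyo demo also sets up lighting, controls, and a Draco decoder that are omitted here.

    import * as THREE from 'three';
    import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

    const scene = new THREE.Scene();
    const camera = new THREE.PerspectiveCamera(40, window.innerWidth / window.innerHeight, 1, 100);
    camera.position.set(5, 2, 8);

    const renderer = new THREE.WebGLRenderer({ antialias: true });
    renderer.setSize(window.innerWidth, window.innerHeight);
    document.body.appendChild(renderer.domElement);

    const clock = new THREE.Clock();
    let mixer;

    // The animation clips travel inside the .glb file itself; no separate animation data is loaded.
    new GLTFLoader().load('models/gltf/LittlestTokyo.glb', (gltf) => {
      scene.add(gltf.scene);
      mixer = new THREE.AnimationMixer(gltf.scene);
      mixer.clipAction(gltf.animations[0]).play();
    });

    renderer.setAnimationLoop(() => {
      if (mixer) mixer.update(clock.getDelta()); // advance the embedded animation each frame
      renderer.render(scene, camera);
    });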
Next, we wanted to see if it could customize this code for us, so we asked, "Show me some code to add a slider to control the speed of the animation. Use that same kind of GUI the other demos have." This is what it looked like before on the original three.js site, and here's the modified version. It's the same scene, but it added this little slider to speed up, slow down, or even stop the animation on the fly. It used the GUI library the other demos have, set a new parameter called animation speed, and wired it up to the mixer in the scene. Like all generative models, responses aren't always perfect. There's actually not an init function in this demo like there is in most of the others. However, the code it gave us did exactly what we wanted.
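A rough sketch of the kind of change being described, assuming the lil-gui library that the three.js examples use and the mixer from the sketch above; the parameter name is illustrative rather than the demo's exact wording:

    import { GUI } from 'three/addons/libs/lil-gui.module.min.js';

    // Hypothetical parameter object; the generated answer's wiring may differ in detail.
    const params = { animationSpeed: 1.0 };

    const gui = new GUI();
    gui.add(params, 'animationSpeed', 0, 2, 0.01)
      .name('animation speed')
      .onChange((value) => {
        // timeScale scales how fast the mixer advances its clips:
        // 0 stops the animation, 1 is normal speed, 2 is double speed.
        mixer.timeScale = value;
      });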
Next, we tried a multimodal input by giving it a screenshot of one of the demos. We didn't tell it anything about the screenshot and just asked where we could find the code for the demo seen over here. As you can see, the model was able to look through the hundreds of demos and find the one that matched the image.

Next, we asked the model to make a change to the scene, asking, "How can I modify the code to make the terrain flatter?" The model was able to zero in on one particular function called generateHeight and showed us the exact line to tweak. Below the code, it clearly explained how the change works. Over here, in the updated version, you can see that the terrain is indeed flatter, just like we asked.
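The idea behind that kind of tweak is to scale down the values the height function produces. The following is a simplified, hypothetical stand-in for the demo's generateHeight function, not its actual noise-based implementation; only the final scaling step illustrates the change:

    // Simplified stand-in: the real demo builds heights from layered noise,
    // but the flattening tweak works the same way in either case.
    function generateHeight(width, height) {
      const size = width * height;
      const data = new Uint8Array(size);
      const flattenFactor = 0.5; // hypothetical knob: smaller values mean flatter terrain

      for (let i = 0; i < size; i++) {
        const x = i % width;
        const y = Math.floor(i / width);
        const rawHeight = (Math.sin(x / 20) + Math.cos(y / 20) + 2) * 25; // roughly 0..100
        data[i] = rawHeight * flattenFactor; // scale every height down before storing it
      }
      return data;
    }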
We tried one more code modification task using this 3D text demo over here. We asked, "I'm looking at the text geometry demo and I want to make a few tweaks. How can I change the text to say 'goldfish' and make the mesh materials look really shiny and metallic?" You can see the model identified the correct demo and showed the precise lines in it that need to be tweaked. Further down, it explained these material properties, metalness and roughness, and how to change them to get a shiny effect. We did feel like it could maybe have reused the same material instance, but you can see that it definitely pulled off the task and the text looks a lot shinier now.
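As a minimal sketch of the properties being discussed, assuming three.js TextGeometry with a MeshStandardMaterial (the material type that exposes metalness and roughness); the font path and size values are illustrative, not the demo's actual settings:

    import * as THREE from 'three';
    import { FontLoader } from 'three/addons/loaders/FontLoader.js';
    import { TextGeometry } from 'three/addons/geometries/TextGeometry.js';

    new FontLoader().load('fonts/helvetiker_bold.typeface.json', (font) => {
      const geometry = new TextGeometry('goldfish', {
        font: font,
        size: 70,
        height: 20, // extrusion depth (renamed 'depth' in newer three.js releases)
        curveSegments: 4,
        bevelEnabled: true,
        bevelThickness: 2,
        bevelSize: 1.5,
      });

      // High metalness and low roughness are what give the shiny, metallic look;
      // the mesh still needs scene lighting or an environment map to read as metal.
      const material = new THREE.MeshStandardMaterial({
        color: 0xffd700,
        metalness: 1.0,
        roughness: 0.1,
      });

      scene.add(new THREE.Mesh(geometry, material)); // 'scene' as set up elsewhere in the demo
    });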
These are just a couple examples of what's possible with a context window of up to 1 million multimodal tokens in Gemini 1.5 Pro.
Info
Channel: Google
Views: 417,005
Keywords: AI, Tech, AI News, Artificial Intelligence, Machine Learning, AI Tools, Prompt Engineering, Technology, Generative AI, Deep Learning, AI Explained
Id: SSnsmqIj1MI
Length: 3min 15sec (195 seconds)
Published: Thu Feb 15 2024