Google Reveals CRAZY New AI to CRUSH OpenAI GPT4-o (Supercut)

Video Statistics and Information

Captions
A year ago on the stage, we first shared our plans for Gemini, a frontier model built to be natively multimodal from the very beginning, one that could reason across text, images, video, code, and more. It's a big step in turning any input into any output: an I/O for a new generation. Two months later, we introduced Gemini 1.5 Pro, delivering a big breakthrough in long context. It can run 1 million tokens in production, consistently, more than any other large-scale foundation model yet.

One of the most exciting transformations with Gemini has been in Google Search. In the past year, we answered billions of queries as part of our Search Generative Experience. People are using it to search in entirely new ways and asking new types of questions: longer and more complex queries, even searching with photos, and getting back the best the web has to offer. We've been testing this experience outside of Labs, and we are encouraged to see not only an increase in Search usage but also an increase in user satisfaction. We'll begin launching this fully revamped experience, AI Overviews, to everyone in the US this week, and we'll bring it to more countries soon.

Let me show you an example in Google Photos. Say you're at a parking station, ready to pay, but you can't recall your license plate number. Before, you could search Photos for keywords and then scroll through years' worth of photos looking for the right one. Now you can simply ask Photos. It knows the cars that appear often, it triangulates which one is yours, and it just tells you the license plate number. And Ask Photos can also help you search your memories in a deeper way. For example, you might be reminiscing about your daughter Lucia's early milestones. You can ask Photos, "When did Lucia learn to swim?" You can even follow up with something more complex: "Show me how Lucia's swimming has progressed." Here, Gemini goes beyond a simple search, recognizing different contexts, from doing laps in the pool, to snorkeling in the ocean, to the text and dates on her swimming
certificates. And Photos packages it all up together in a summary, so you can really take it all in and relive amazing memories all over again. We are rolling out Ask Photos this summer, with more capabilities to come.

Multimodality radically expands the questions we can ask and the answers we will get back. Long context takes this a step further, enabling us to bring in even more information: hundreds of pages of text, hours of audio, a full hour of video, or entire code repos. Or, if you want, roughly 966 Cheesecake Factory menus. But I think we can push ourselves even further, so today we are expanding the context window to 2 million tokens. This represents the next step on our journey towards the ultimate goal of infinite context.

Okay, so far we've talked about two technical advances: multimodality and long context. Each is powerful on its own, but with a multimodal model we can do so much more. To show you an early demo of an audio output in NotebookLM, here's Josh.

Hey everyone. Last year at I/O we introduced NotebookLM, a research and writing tool grounded in the information you give it. And today Gemini 1.5 Pro is coming to NotebookLM, and it's great. Let me show you. So here we are in NotebookLM. NotebookLM is going to take all the materials on the left as input and output them into a lively science discussion. Let's take a listen.

"So let's, uh, let's dive into physics. What's on deck for today?" "Well, uh, we're starting with the basics: force and motion." "Okay." "And that, of course, means we have to talk about Sir Isaac Newton and his three laws of motion." "Ah, yes, the foundation for understanding how objects move and interact."

Ah yes, this is where multimodal really shines. Now, it generated this audio discussion based on that text material. And what's amazing is that my son and I can join into the conversation and steer it whichever direction we want when I tap Join.

"Hold on, we have a question. What's up, Josh?" "Yeah, can you give my son Jimmy a basketball example?" "Hey Jimmy, that's a fantastic idea!"
"Basketball is actually a great way to visualize force and motion. Let's break it down. Okay, so first, imagine a basketball just sitting there on the court. It's not moving, right? That's because all the forces acting on it are balanced. The downward pull of grav..."

Pretty cool, right? Now, what's interesting is, under the hood, you saw that Gemini had used some of the concepts of gravity and Sir Isaac Newton, but nothing in there was about basketball. It connected the dots and created that age-appropriate example for him. And this is what's becoming possible with the power of Gemini. Back to you, Sundar.

Thanks, Josh. The demo shows the real opportunity with multimodality. Soon you'll be able to mix and match inputs and outputs. This is what we mean when we say it's an I/O for a new generation. And I can see you all out there thinking about the possibilities. But what if it could go even further? Google DeepMind is hard at work. To share more, please welcome, for the first time on the I/O stage, Sir Demis.

I co-founded DeepMind in 2010 with the goal of one day building AGI, artificial general intelligence: a system that has human-level cognitive capabilities. Today we have some exciting new progress to share about the future of AI assistants, which we're calling Project Astra. For a long time, we've wanted to build a universal AI agent that can be truly helpful in everyday life. Our work making this vision a reality goes back many years. It's why we made Gemini multimodal from the very beginning. An agent like this has to understand and respond to our complex and dynamic world, just like we do. It would need to take in and remember what it sees so it can understand context and take action. And it would have to be proactive, teachable, and personal, so you can talk to it naturally, without lag or delay. While we've made some great strides in developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge. Building on our Gemini
model, we've developed agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this for efficient recall. We've also enhanced how they sound, with a wider range of intonations. These agents better understand the context you're in and can respond quickly in conversation, making the pace and quality of interaction feel much more natural. Here's a video of our prototype, which you'll see has two parts. Each part was captured in a single take, in real time.

"Okay, let's do some tests. Tell me when you see something that makes sound." "I see a speaker, which makes sound." "What is that part of the speaker called?" "That is the tweeter. It produces high-frequency sounds." "Give me a creative alliteration about these." "Creative crayons color cheerfully. They certainly craft colorful creations." "What does that part of the code do?" "This code defines encryption and decryption functions. It seems to use AES-CBC encryption to encode and decode data based on a key and an initialization vector (IV)." "That's right. What neighborhood do you think I'm in?" "This appears to be the King's Cross area of London. It is known for its railway station and transportation connections." "Do you remember where you saw my glasses?" "Yes, I do. Your glasses were on the desk near a red apple." [Applause]

"What can I add here to make this system faster?" "Adding a cache between the server and database could improve speed." "What does this remind you of?" [Applause] "Schrödinger's cat." "All right, uh, give me a band name for this duo." "Golden Stripes." "Nice. Thanks, Gemini."

I think you'll agree it's amazing to see how far AI has come, especially when it comes to spatial understanding, video processing, and memory. It's easy to envisage a future where you can have an expert assistant by your side, through your phone or new exciting form factors like glasses. But we also know from user feedback that some applications need lower latency and a lower cost to serve. So today we're
introducing Gemini 1.5 Flash, a lighter-weight model compared to Pro. It is designed to be fast and cost-efficient to serve at scale, while still featuring multimodal reasoning capabilities and breakthrough long context. Flash is optimized for tasks where low latency and efficiency matter most. Starting today, you can use 1.5 Flash and 1.5 Pro with up to 1 million tokens in Google AI Studio and Vertex AI, and developers can sign up to try 2 million tokens. When we first began this journey to build AI more than 15 years ago, we knew that one day it would change everything. Now that time is here, and we continue to be amazed by the progress we see and inspired by the advances still to come on the path to AGI. Thanks, and back to you, Sundar.

A huge amount of innovation is happening at Google DeepMind. It's amazing how much progress we've made in a year. Training state-of-the-art models requires a lot of computing power. Industry demand for ML compute has grown by a factor of 1 million in the last 6 years, and every year it increases tenfold. Today we are excited to announce the sixth generation of TPUs, called Trillium. Trillium delivers a 4.7x improvement in compute performance per chip over the previous generation, so it's our most efficient and performant TPU to date. We'll make Trillium available to our Cloud customers in late 2024. Alongside our TPUs, we are proud to offer CPUs and GPUs to support any workload. That includes the new Axion processors we announced last month, our first custom Arm-based CPU, with industry-leading performance and energy efficiency. We are also proud to be one of the first cloud providers to offer Nvidia's cutting-edge Blackwell GPUs, available in early 2025. We are fortunate to have a long-standing partnership with Nvidia and are excited to bring Blackwell's capabilities to our customers. Chips are a foundational part of our integrated end-to-end system.

25 years ago, we created Search to help people make sense of the waves of information moving online. With each platform shift, we have delivered
breakthroughs to help answer your questions better. On mobile, we unlocked new types of questions and answers using better context, location awareness, and real-time information. With advances in natural language understanding and computer vision, we enabled new ways to search: with your voice, or a hum to find your new favorite song, or an image of that flower you saw on your walk. Now you can even Circle to Search those cool new shoes you might want to buy. Go for it; you can always return them later. Of course, Search in the Gemini era will take this to a whole new level. To tell you more, here's Liz.

Thanks, Sundar. With each of these platform shifts, we haven't just adapted; we've expanded what's possible with Google Search. And now, with generative AI, Search will do more for you than you ever imagined. So let's begin with AI Overviews. Google does the work for you: instead of piecing together all the information yourself, you can ask your question and, as you see here, get an answer instantly, complete with a range of perspectives and links to dive deeper. But this is just the first step. We're introducing multi-step reasoning in Google Search, so Google can do the researching for you. For example, let's say you've been trying to get into yoga and Pilates. Soon you'll be able to ask Search to find the best yoga or Pilates studios in Boston and show you details on their intro offers and the walking time from Beacon Hill. As you can see here, Google gets to work for you. You get some studios with great ratings and their introductory offers, and you can see the distance for each. Like this one: it's just a 10-minute walk away. Right below, you see where they're located, laid out visually. And you got all this from just a single search. Under the hood, our custom Gemini model acts as your AI agent, using what we call multi-step reasoning. It breaks your bigger question down into all its parts, and it figures out which problems it needs to solve and in what order.

But what about all those times when you don't know
exactly what to ask and you need some help brainstorming? When you come to Search for ideas, you'll get more than an AI-generated answer. You'll get an entire AI-organized page, custom-built for you and your question. Say you're heading to Dallas to celebrate your anniversary and you're looking for the perfect restaurant. What you get here breaks AI out of the box and brings it to the whole page. Our Gemini model uncovers the most interesting angles for you to explore and organizes these results into helpful clusters. Like, you might never have considered restaurants with live music, or ones with historic charm. Our model even uses contextual factors like the time of the year, so since it's warm in Dallas, you can get rooftop patios as an idea. And it pulls everything together into a dynamic whole-page experience. You'll start to see this new AI-organized search results page when you look for inspiration, starting with dining and recipes and coming to movies, music, books, hotels, shopping, and more.

But your questions aren't limited to words in a text box, and sometimes even a picture can't tell the whole story. Soon you'll be able to ask questions with video, right in Google Search. Let me introduce Rose to show you this in a live demo.

I have always wanted a record player, and I got this one and some vinyls at a yard sale recently. But, um, when I go to play it, this thing keeps sliding off. I have no idea how to fix it or where to even start. Like, what make is this record player? What's the model? And what is this thing actually called? But now I can just ask with a video. So let's try it. Let's do a live demo. I'm going to take a video and ask Google, "Why will this not stay in place?" And in a near instant, Google gives me an AI Overview. I get some reasons this might be happening and steps I can take to troubleshoot. Let me walk you through what just happened. Thanks to a combination of our state-of-the-art speech models, our deep visual understanding, and our custom Gemini model, Search was able to
understand the question I asked out loud and break down the video frame by frame. Each frame was fed into Gemini's long context window you heard about earlier today, so Search could then pinpoint the exact make and model of my record player and make sense of the motion across frames to identify that the tonearm was drifting. Search fanned out and combed the web to find relevant insights from articles, forums, videos, and more, and it stitched all of this together into my AI Overview. The result was music to my ears.

Since last May, we've been hard at work making Gemini for Workspace even more helpful for businesses and consumers across the world. One of the really neat things about Workspace apps like Gmail, Drive, Docs, and Calendar is how well they work together, and in our daily lives we often have information that flows from one app to another, like, say, adding a calendar entry from Gmail or creating reminders from a spreadsheet tracker. But what if Gemini could make these journeys totally seamless, perhaps even automate them for you entirely? My sister is a self-employed photographer, and her inbox is full of appointment bookings, receipts, client feedback on photos, and so much more. So let's go to her inbox and take a look. Lots of unread emails. Let's click on the first one. It's got a PDF that's an attachment from a hotel, as a receipt, and I see a suggestion in the side panel: "Help me organize and track my receipts." Let's click on this prompt. The side panel will now show me more details about what that really means, and as you can see, there are two steps here. Step one: create a Drive folder and put this receipt and 37 others it's found into that folder. Makes sense. Step two: extract the relevant information from those receipts in that folder into a new spreadsheet. Now, this sounds useful. Why not? I also have the option to edit these actions or just hit OK. So let's hit OK. Gemini will now complete the two steps described above. And this is where it gets even better: Gemini offers you the option to
automate this, so that this particular workflow is run on all future emails, keeping your Drive folder and expense sheet up to date with no effort from you. Now, we know that creating a complex spreadsheet like this can be daunting for most people, but with this automation, Gemini does the hard work of extracting all the right information from all the files in that folder and generates this sheet for you. So let's take a look. Okay, it's super well organized, and it even has a category for expense type. Now that we have this sheet, things can get even more fun. We can ask Gemini questions, like "Show me where the money is spent." Gemini not only analyzes the data from the sheet but also creates a nice visual to help me see the complete breakdown by category. And you can imagine how this extends to all sorts of use cases in your inbox, like travel expenses, shopping, remodeling projects, you name it. All of that information in Gmail can be put to good use and help you work, plan, and play better. Now, this particular ability to organize your attachments in Drive, generate a sheet, and do data analysis via Q&A will be rolling out to Labs users this September, and it's just one of the many automations that we're working on in Workspace. And now, over to [inaudible] to tell you more about the Gemini app.

Our vision for the Gemini app is to be the most helpful personal AI assistant by giving you direct access to Google's latest AI models. It's natively multimodal, so you can use text, voice, or your phone's camera to express yourself naturally. You can even interrupt while Gemini is responding, and it will adapt to your speech patterns. And this is just the beginning. We're excited to bring the speed gains and video understanding capabilities from Project Astra to the Gemini app. Now, the way I use Gemini isn't the way you use Gemini, so we're rolling out a new feature that lets you customize it for your own needs and create personal experts on any topic you want. We're calling these Gems. They're
really simple to set up. Just tap to create a Gem, write your instructions once, and come back whenever you need it. Now, Gems are a great time-saver when you have specific ways that you want to interact with Gemini again and again. Gems will roll out in the coming months, and our trusted testers are already finding so many creative ways to put them to use. They can act as your yoga bestie, your personal sous-chef, a brainy calculus tutor, a peer reviewer for your code, and so much more.

Next, I'll show you how Gemini is taking a step closer to being a true AI assistant by planning and taking actions for you. Now, it all starts with a prompt. Okay, so here we go: we're going to Miami. My son loves art, my husband loves seafood, and our flight and hotel details are already in my Gmail inbox. Now, there's a lot going on in that prompt. Everyone has their own things that they want to do. To make sense of these variables, Gemini starts by gathering all kinds of information from Search and helpful extensions like Maps and Gmail. It uses that data to create a dynamic graph of possible travel options, taking into account all of my priorities and constraints. The end result is a personalized vacation plan, presented in Gemini's new dynamic UI. Now, based on my flight information, Gemini knows that I need a two-and-a-half-day itinerary, and you can see how Gemini uses spatial data to make decisions. Our flight lands in the late afternoon, so Gemini skips a big activity that day and finds a highly rated seafood restaurant close to our hotel. Now, on Sunday we have a jam-packed day. I like these recommendations, but my family likes to sleep in, so I tap to change the start time, and just like that, Gemini adjusted my itinerary for the rest of the trip. This looks great. It would have taken me hours of work checking multiple sources and figuring out schedules, and Gemini did this in a fraction of the time. This new trip planning experience will be rolling out to Gemini Advanced this summer, just in time to help you plan your own
Labor Day weekend. And check out what Gemini Advanced can do with your spreadsheets with the new data analysis feature launching in the coming weeks. Maybe you have a side hustle selling handcrafted products, but you're a better artist than accountant, and it's really hard to understand which products are worth your time. Simply upload all of your spreadsheets and ask Gemini to visualize your earnings and help you understand your profit. Gemini goes to work calculating your returns and pulling its analysis together into a single chart, so you can easily understand which products are really paying off. Now, behind the scenes, Gemini writes custom Python code to crunch these numbers, and of course your files are not used to train our models. Oh, and just one more thing: later this year, we'll be doubling the long context window to 2 million tokens.

Today you've seen how AI is transforming our products across Gemini, Search, Workspace, and more. We're bringing all these innovations right onto your Android phone. This new era of AI is a profound opportunity to make smartphones truly smart. We're harnessing on-device AI to unlock new experiences that work as fast as you do, while keeping your sensitive data private. Let's start with AI-powered search. Earlier this year, we took an important first step at Samsung Unpacked by introducing Circle to Search. It brings the best of Search directly into the user experience, so you can go deeper on anything you see on your phone without switching apps. Fashionistas are finding the perfect shoes, home chefs are discovering new ingredients, and with the latest update, it's never been easier to translate whatever is on your screen, like a social post in another language. And there are even more ways Circle to Search can help. Here's Dave to share more.

Building Google AI directly into the OS elevates the entire smartphone experience, and Android is the first mobile operating system to include a built-in, on-device foundation model. This lets us bring Gemini goodness from the
data center right into your pocket, so the experience is faster while also protecting your privacy. Starting with Pixel later this year, we'll be expanding what's possible with our latest model, Gemini Nano with Multimodality. This means your phone can understand the world the way you understand it: not just through text input, but also through sights, sounds, and spoken language. Let me give you an example. People lost more than $1 trillion to fraud last year, and as scams continue to evolve across texts, phone calls, and even videos, Android can help protect you from the bad guys, no matter how they try to reach you. So let's say I get rudely interrupted by an unknown caller right in the middle of my presentation.

"Hello?" "Hi, I'm calling from Safe Moring security department. Am I speaking to Dave?" "Uh, yeah, this is Dave. Kind of in the middle of something." "We've detected some suspicious activity on your account. I can't give you specifics over the phone, but to protect your account, I'm going to help you transfer your money to a secure account we've set up for you."

And look at this: my phone gives me a warning that this call might be a scam. Gemini Nano alerts me the second it detects suspicious activity, like a bank asking me to move my money to keep it safe. And everything happens right on my phone, so the audio processing stays completely private to me and on my device. We're currently testing this, and we'll have more updates to share later this summer. And we're really just scratching the surface of the kinds of fast, private experiences that on-device AI unlocks.

All of this shows the important progress we have made as we take a bold and responsible approach to making AI helpful for everyone. Before we wrap, I have a feeling that someone out there might be counting how many times we have mentioned AI today. And since the big theme today has been letting Google do the work for you, we went ahead and counted so that you don't have to. That might be a record in how many times someone has said AI.
Anyhow, this tally is more than just a punchline. It reflects something much deeper. We've been AI-first in our approach for a long time. Our decades of research leadership have pioneered many of the modern breakthroughs that power AI progress, for us and for the industry. On top of that, we have world-leading infrastructure built for the AI era, cutting-edge innovation in Search, now powered by Gemini, products that help at extraordinary scale, including 15 products with over half a billion users, and platforms that enable everyone (partners, customers, creators, and all of you) to invent the future. Thank you.
Info
Channel: Ticker Symbol: YOU
Views: 71,616
Keywords: nvidia, nvda, nvidia stock, nvda stock, nvidia gtc 2024, jensen huang, gtc keynote, nvidia keynote, openai, chatgpt, gpt4, gpt4o, gpt-4o, msft, microsoft stock, msft stock, goog, googl, goog stock, artificial intelligence stocks, nvidia stock news, semiconductor stocks, tsmc, tsm stock, asml, asml stock, nvidia news, nvidia 2024, ai copilot, google io 2024, google io, google keynote, google gemini, google vs openai, gemini vs chatgpt
Id: RHOYo-mbjEk
Length: 29min 22sec (1762 seconds)
Published: Tue May 14 2024