Rabbit New AI AGENT Phone UNLEASHED | Rabbit R1 Device OS

Video Statistics and Information

Captions
There's a small, previously unhyped company called rabbit. It quietly created a large action model (LAM), an AI agent capable of executing tasks on your behalf. The company just announced its R1, a reimagination of the computer and smartphone powered almost entirely by its large action model. You need to take a look at their keynote.

Hi everyone, my name is Jesse and I'm the founder and CEO of rabbit. I'm so excited to be here today to present you two things we've been working on: a revolutionary new foundation model, and a groundbreaking consumer mobile device powered by it. Our mission is to create the simplest computer, something so intuitive that you don't need to learn how to use it. The best way to achieve this is to break away from the app-based operating system currently used by smartphones. Instead, we envision a natural-language-centered approach. The computer we're building, which we call a companion, should be able to talk to you, understand you, and more importantly, get things done for you. The future of human-machine interfaces should be more intuitive.

Now, before we get started, let's take a look at the existing mobile devices that we use daily: the one device that's in your pocket, smartphones like the iPhone and Android phones. These have been here for years, and we've grown tired of them. The problem with these devices, however, is not the hardware form factor; it's what's inside: the app-based operating system. Want to get a ride to the office? There's an app for that. Want to buy groceries? There's another app for that. Each time you want to do something, you fumble through multiple pages and folders to find the app you want to use, and there are always endless buttons that you need to click: add to the cart, go to the next page, check the boxes, jump back and forth, and so on. The smartphone was supposed to be intuitive, but with hundreds of apps on your phone today that don't work together, it no longer is. If you look at the top-ranking apps on app stores today, you'll find that most of them focus on entertainment. Our smartphones have become the best devices to kill time instead of saving it; it's just harder for them to do things.

Many people before us have tried to build simpler and more intuitive computers with AI. A decade ago, companies like Apple, Microsoft, and Amazon made Siri, Cortana, and Alexa. With these smart assistants, often they either don't know what you're talking about or fail to accomplish the tasks we ask for. Recent achievements in large language models, or LLMs, a type of AI technology, have made it much easier for machines to understand you. The popularity of LLM chatbots over the past years has shown that the natural-language-based experience is the path forward. However, where these assistants still struggle is getting things done. For example, if you go to ChatGPT and use the Expedia plugin to book a ticket, it can suggest options but ultimately cannot assist you in completing the booking process from start to finish. Things like ChatGPT are extremely good at understanding your intentions, but could be better at triggering actions.

Another hot topic is a field of research around what they call agents. It has caught the eye of many open-source projects and productivity software companies. What remains to be solved is for these agents to perform tasks end to end, accurately and speedily. The problem is forcing a model to perform a task it is not designed for, whether that's having a language model reason about a web page using super prompts or screenshots. We have yet to produce an agent as good as users simply clicking the buttons.
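As an aside, the "agent" approach the keynote contrasts itself with usually looks something like the loop below: serialize the current page, ask a language model which UI element to act on, execute, repeat. This is only a minimal illustrative sketch under stated assumptions, not rabbit's method; the `call_llm` stub and the JSON action format are hypothetical stand-ins for whatever model API and schema a real implementation would use.

```python
# Minimal sketch of the "LLM reasons over a page representation" agent loop
# the keynote describes as fragile. call_llm() is a hypothetical stand-in for
# a real model API; the JSON action schema is likewise an assumption.
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. a chat-completions request)."""
    return json.dumps({"action": "click", "element_id": "submit-button"})

def serialize_page(dom_snapshot: dict) -> str:
    """Flatten a DOM snapshot into text the model can read (often huge)."""
    return json.dumps(dom_snapshot)

def agent_step(goal: str, dom_snapshot: dict) -> dict:
    """Ask the model for the single next UI action toward the goal."""
    prompt = (
        "You are controlling a web page.\n"
        f"Goal: {goal}\n"
        f"Page: {serialize_page(dom_snapshot)}\n"
        'Reply with JSON: {"action": ..., "element_id": ...}'
    )
    return json.loads(call_llm(prompt))

if __name__ == "__main__":
    page = {"elements": [{"id": "submit-button", "text": "Book now"}]}
    print(agent_step("Book the cheapest non-stop flight", page))
```

In practice this loop pays a heavy cost serializing each page state into the prompt and re-asking the model at every step, which is part of why, as the keynote argues, such agents are slower and less reliable than a person clicking buttons.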
To fulfill our vision of a delightful, intuitive companion, we must go beyond a piece of complex software; we want it to be in the hands of everyone. So we first set out to fundamentally understand how computer apps are structured and, more importantly, how humans interact with them. We wanted to find a way for our AI to trigger actions on behalf of users across all environments. We want it to be universal, not just a Chrome plugin or a limited set of apps, but everything: iOS, Android, and desktop. These applications share something in common: they all have a user interface. So, at a philosophical level, if we can make an AI trigger actions on any kind of interface just like a human would, we will solve the problem.

This insight led us to create the large action model, or LAM as we call it. It is a new foundation model that understands and executes human intentions on computers, driven by our research in neuro-symbolic systems. With the large action model, we fundamentally found a solution to the challenges that apps, APIs, or agents face; we solve it with interfaces. LAM can learn any interface from any software, regardless of which platform it is running on. In short, the large language model understands what you say, but the large action model gets things done. We use LAM to bring AI from words to action. Finally, we can build a computer that, in addition to understanding what you're trying to say, can actually help you do things on your behalf. We packed the large action model into an advanced operating system, rabbit OS. It is built for real-time interactions between you and the rabbit, powered by LAM. The large action model's concept and test results are so powerful that we decided to make a one-of-a-kind mobile device. Introducing R1, your pocket companion.

This is the rabbit R1, designed in collaboration with Teenage Engineering. The R1 is a fully standalone device, primarily driven by natural language, the accumulation of countless hours of work from our research, design, and software teams. It is simple to use. R1 is equipped with everything it needs to interact with you and perceive your surroundings. It comes with a touchscreen, a push-to-talk button, an analog scroll wheel, a microphone and speakers, and a computer-vision-enabled 360-degree rotational camera we call the rabbit eye. It is Bluetooth and Wi-Fi capable, and it has a SIM card slot that supports global 4G LTE networks.

Now let's see how it works. I can ask rabbit for anything, just like if I'm using ChatGPT. The difference, though, is the speed: with rabbit OS I get a response 10 times faster than most voice AI projects; rabbit answers my questions within 500 milliseconds. With the push-to-talk button you don't need to say anything to wake it up; just press and hold the button and talk, like a walkie-talkie. Now let's give it a shot. What's the nature of reality? "The nature of reality is a topic that has captivated the minds of philosophers and thinkers throughout history. According to Bertrand Russell, reality is composed of hard data derived from sensory experiences and logical analysis. It is the amalgamation of our perceptions, thoughts, and the external world." I can speak or type with R1; if you shake R1, a keyboard will pop up. You will also notice that the response speed is much faster than using some of the LLM assistant apps with web browsing, such as ChatGPT. What's the stock price of Coca-Cola? "Searching for the stock price of Coca-Cola. The stock price of Coca-Cola (KO) is $59.76."
Who played the role of Oppenheimer in Christopher Nolan's latest movie? "In Christopher Nolan's most recent film, Oppenheimer, the role of J. Robert Oppenheimer is played by Cillian Murphy. The film is a biographical thriller that explores the story of American scientist J. Robert Oppenheimer and his involvement in the development of the atomic bomb."

Now let's see how the large action model works. With LAM, rabbit R1 can interact with all kinds of applications. Introducing the rabbit hole web portal: I can log into different services and unlock functionalities for R1; it's just like iCloud. Let's say I want to listen to some music. With the rabbit hole web portal, I have complete control over which services I want to activate on my R1 and which provider I prefer. For music I use Spotify, so let's go ahead and connect my Spotify to the rabbit hole. Now this is the Spotify login page. Before I start the login process, I want to emphasize our commitment to privacy. The way we designed the infrastructure for the apps we interact with: we don't create fake users or spam users, we don't hack their infrastructure, and of course we do not store any of your third-party credentials. Instead, we redirect you to their login page. We never save your username and password, and we don't track your data; the authentication happens on their respective apps. Now I'm logging in with Spotify, and rabbit OS recognizes that Spotify has been authenticated on my device. Before this presentation I pre-authenticated some other apps, like Uber and DoorDash. Just choose your preferred services and log in through them to get authenticated on your device. It's that easy.

Now let's play some music. Play Kraftwerk's Pocket Calculator. "Playing Pocket Calculator by Kraftwerk for you." Play another song from the same album. "Sure, playing Computer Love for you, and now all the tracks from the same album." Who wrote the lyrics for this song? "Ralf Hütter and Emil Schult wrote the lyrics for the song Computer Love by Kraftwerk." Which band sampled this song? "It seems that Coldplay's Talk is the only known example of a song that sampled Computer Love." What do you think about this song? "Computer Love by Kraftwerk is widely regarded as a groundbreaking and influential song in the realm of electronic music. Its fusion of catchy melodies, robotic vocals, and innovative use of synthesizers has left a lasting impact on the genre."

I can also use R1 to call a ride. Get me a ride from my office home now. "Of course, I will book an Uber ride for you from your office to your home. Please confirm the ride." I have six people with three pieces of luggage; find me an Uber that can fit all of us. "For six people and three pieces of luggage, I recommend booking an Uber XL, as it provides ample space for all passengers and luggage. Please confirm the ride." The ride shows up, I just hit confirm, and the Uber is on its way. I can also use R1 to order food. Get me a 12-inch pizza from Pizza Hut, delivered here; the most ordered option on the app is fine. "Ordering a 12-inch pizza from Pizza Hut. Since you mentioned that the most ordered option is fine, I will select that for you. I just created an order for a 12-inch pizza. It's going to be hand-tossed with a classic marinara sauce and topped with regular cheese. Please confirm your order." That sounds really good. I just confirmed the order, and now I'm waiting for my pizza.

I can also ask R1 to do a set of more complex actions that may take multiple apps to finish on a smartphone. For example, if I want to plan a trip to London, I can just describe what I wish in full to R1.
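The privacy description above (redirect the user to the provider's own login page, never see their password, keep only a token) matches the shape of a standard OAuth-style authorization-code flow. The sketch below is a generic illustration of that pattern only, not rabbit's or Spotify's actual implementation; the endpoint URLs, client ID, redirect URI, and scope are all hypothetical placeholders.

```python
# Generic OAuth-style authorization-code flow, sketching the
# "redirect to the provider's login page, store only a token" pattern
# described in the keynote. All URLs, IDs, and scopes below are
# hypothetical placeholders, not rabbit's or Spotify's real values.
import json
import urllib.parse
import urllib.request

AUTH_URL = "https://accounts.example-provider.com/authorize"   # assumed
TOKEN_URL = "https://accounts.example-provider.com/api/token"  # assumed
CLIENT_ID = "rabbit-hole-demo"                                  # assumed
REDIRECT_URI = "https://portal.example.com/callback"            # assumed

def build_login_redirect(state: str) -> str:
    """URL the user is sent to; they enter their password on the provider's
    site, never on ours, so we never see or store it."""
    params = {
        "client_id": CLIENT_ID,
        "response_type": "code",
        "redirect_uri": REDIRECT_URI,
        "scope": "playback",
        "state": state,
    }
    return f"{AUTH_URL}?{urllib.parse.urlencode(params)}"

def exchange_code_for_token(code: str) -> dict:
    """After the provider redirects back with a short-lived code,
    exchange it for an access token; only the token is kept."""
    data = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
    }).encode()
    req = urllib.request.Request(TOKEN_URL, data=data, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(build_login_redirect(state="xyz123"))
```

The design point being made in the keynote is simply that the service holds credentials and issues a revocable token, so the assistant can act on the account without ever storing the username and password.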
I want to take my family to London. It's going to be two of us and a child of age 12. We're thinking of January 30th to February 5th. Can you plan the entire trip for me? We like cheap non-stop flights, grouped seats, a cool SUV, and a nice hotel that has Wi-Fi. "Exploring ticketing options to make your trip a reality. For your trip, I found various flight options, a range of hotels to choose from, and car rentals available. Please confirm each option individually for further details and booking." So it's all been planned out; I just confirm, confirm, confirm, and that's it. Could you come up with a schedule of fun things to do while I'm over there? "On it. I have prepared a detailed travel plan for your trip. In summary, you will be exploring London's iconic landmarks, visiting museums, enjoying delicious meals at local restaurants, and experiencing the vibrant atmosphere of different neighborhoods. Navigation details are also prepared, and I can help you book tickets for any events you'd like to attend." R1 just planned the entire trip for me. That's awesome, but it seems like this is a little bit too intense. Can you plan an easier schedule for us? "Sure, I'm working on it. Please take a look at it and let me know what you think." It gave me a more relaxed schedule, every day with all the details. Sounds really good, and I just confirmed it. I can foresee a wonderful trip. Isn't that cool?

R1 can help me reach others just like a phone, but smarter. It comes with a built-in real-time translator and a note-taker, removing the final barriers to communication. R1 automatically detects the language spoken in the environment and provides bidirectional translation. R1 has an eye, an onboard camera designed for advanced computer vision. It can analyze surroundings and take actions in real time. To activate the eye, just double-tap the button. Oh, funny seeing you here, Rick. Let me take a look. This is what I've got in the fridge; can you make me a nice dish that's low in calories? "Let me see. How about a green garden omelette? It's a delicious and low-calorie dish that combines the freshness of broccoli and cabbage with the creaminess of eggs. Here's a simple recipe for you." Nice, it recognized all the stuff and gave me actual recipes.

The rabbit eye can also help you with your documents. Here's the table I've been working on for a job questionnaire. Can you create an additional column that matches candidates who mentioned rabbit in their questions about how they found us? "Sure, let me take a look at the table and add the matching column for you. I've processed the table and sent you an email with the results." Okay, now let's check our email. I can continue to interact with rabbit OS even beyond R1. Let's reply to this email directly: can you add another column that matches candidates who have included rabbit in their question and are LA-based? I just replied to the email, and I got a revised version from rabbit OS through my email.

Let's say I have a unique routine or task I cannot do on my phone; R1 can do that too. We are experimenting with what we call teach mode. Just like I can teach my friend how to skateboard, I can show R1 how to do something, and it will learn from me. This means that any user, regardless of technical background, can teach R1 to learn new skills. So you go to teach mode and start a new session. Today I will show you how to generate an image of a puppy with Midjourney from a prompt, using Discord. First I will go to the servers page and click on one of my own servers. Since this is only general image generation, I'll go to a Midjourney text channel. Then I will use the /imagine command along with the prompt.
Here I'm putting "a cute baby wild dog with big eyes, animated cartoon, 8K." Let's wait a minute for the engine to start generating the images. Once it's done, let's click on the image to get a link. I will then explain to rabbit how to use this, and annotate it, so that I can generate anything, not just puppies. So let's go back to our web portal and submit the request. It takes seconds for the web portal to finish processing, and that's it; it's that simple. Now, once we finish the training, I can go back to my R1. Now let's use Midjourney, as I taught you, to generate a picture of a bunny in pixel art style. "Certainly, Jesse. I will use Midjourney to generate a picture of a bunny in pixel art style for you. Please give me a moment to create the image." Here you go: you've got an image generated on Midjourney through teach mode. Watch, learn, and repeat; that's teach mode. It's that simple. That's all the demos for today. With LAM fast evolving, my R1 will eventually help me do things that could never be achieved on an app-based phone.

Speaking of the current app-based phones, the first question we asked ourselves is: why would I need a new device if I already have an iPhone? My iPhone can't do any of this at all. We did not build the rabbit R1 to replace your phone; it's just a different generation of devices. The app-based system was introduced more than 15 years ago, and the new generation of natively AI-powered devices is just getting started. Here's a quick recap. R1 is our companion that hosts the large action model. With natural language, I can use it for a wide range of tasks: ask anything, direct actions, complex actions, AI-enhanced video calls, note-taker, translator, computer vision with the rabbit eye, and experimental teach mode. On the hardware side, we've got a 360-degree rotational camera, a global 4G LTE SIM card, a push-to-talk button, and an analog scroll wheel.

One last thing: what about the price? Before we reveal our price, I want to do a quick comparison. Here are some of the best phones on the market right now. You've got the iPhone, you've got the latest Android phones; we're looking at somewhere around $700 to $1,000 for a top phone with an app-based system. I bought my new iPhone 15 Pro Max last year, and it's the same experience as my previous ones. Here are the not-so-smart smart speakers; they're asking roughly around $200, but they're all outdated. And finally, here are a couple of the new things with only large language models: you've got the AI Pin asking $699 plus a monthly subscription for its base model, you've got Tab asking $600, and you've got the Meta Ray-Ban glasses asking roughly $300. Remember, these are things with only a large language model, and we still think they are too expensive. We priced the rabbit R1 at $199, no subscription, no hidden fees. You can order the R1 now at rabbit. We are shipping Easter 2024. I can't wait for you to experience the R1 for yourself. Thank you.
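The teach mode demo above amounts to recording a workflow once and then replaying it with a different parameter ("anything, not just puppies"). Below is a toy sketch of what storing and replaying such a demonstration could look like; the step structure, the `execute_ui_action` stub, and the `{prompt}` placeholder syntax are assumptions for illustration, not rabbit's actual recipe format.

```python
# Toy sketch of a "teach mode" recording: a demonstrated workflow stored as
# an ordered list of UI steps, with the variable part ({prompt}) left as a
# placeholder so the same recipe can be replayed with new inputs.
# The step schema and execute_ui_action() are hypothetical, for illustration.
from dataclasses import dataclass, field

@dataclass
class Step:
    action: str          # e.g. "click", "type"
    target: str          # description of the UI element acted on
    text: str = ""       # text to type; may contain a {prompt} placeholder

@dataclass
class Recipe:
    name: str
    steps: list[Step] = field(default_factory=list)

    def replay(self, prompt: str) -> None:
        """Re-run the recorded steps, substituting the new prompt."""
        for step in self.steps:
            execute_ui_action(step.action, step.target,
                              step.text.format(prompt=prompt))

def execute_ui_action(action: str, target: str, text: str) -> None:
    """Stand-in for whatever actually drives the interface."""
    print(f"{action:5s} -> {target} {text}".rstrip())

# The Midjourney-on-Discord demonstration, roughly as narrated in the demo:
midjourney = Recipe("generate image with Midjourney", [
    Step("click", "Discord server list: my server"),
    Step("click", "Midjourney text channel"),
    Step("type",  "message box", "/imagine {prompt}"),
    Step("click", "generated image (copy link)"),
])

if __name__ == "__main__":
    # Replay the same recipe with a different prompt than the one taught.
    midjourney.replay("a bunny in pixel art style")
```

The key property being claimed is exactly this separation: the fixed clicks are learned once from the demonstration, while the prompt is the only part that changes per request.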
Rabbit did post some research showcasing what they managed to achieve in the field of learning human actions on computer applications. They claim to have developed a system that can infer and model human actions on computer applications, perform those actions reliably and quickly, and that is well suited for deployment in various AI assistants and operating systems. Their system is called the Large Action Model (LAM). The large action model, they say, emphasizes their commitment to better understanding human actions, specifically human intentions expressed through actions on computers and, by extension, in the physical world.

They talk about something called a neuro-symbolic model. Their key observation is that the inherent structures of human-computer interactions differ from natural language or vision: applications are expressed in a form that is more structured than a rasterized image and more verbose and noisy than a sentence or a paragraph. The characteristics they desire from a LAM are also different from a foundation model that understands language or vision alone. While we may want an intelligent chatbot to be creative, LAM-learned actions on applications should be highly regular, minimalistic (per Occam's razor), stable, and explainable. Language models are ill-equipped to comprehend applications as raw text. Here they show the average tokens required to represent various tasks on Airbnb, Google Flights, Shane, and YouTube Music by the top leading LLMs right now, Claude, GPT-4, and other similar models. They say they measure the tokens required to represent common web applications across different snapshots in raw HTML, and that state-of-the-art large language models, with their existing tokenizers, have trouble fitting the raw-text application representation within their context window. So you can see here, this shows their context windows, Claude at 200,000 tokens and GPT-4 (1106) at 128,000; these maximum token limits, the context windows, are not enough to handle most of these applications.

The LAM learns actions by demonstration. LAM's modeling approach is rooted in imitation, or learning by demonstration: it observes a human using the interface and aims to reliably replicate the process, even if the interface is presented differently or slightly changed. As LAM accumulates knowledge from demonstrations over time, it gains a deep understanding of every aspect of an interface exposed by an application and creates a conceptual blueprint of the underlying service provided by the application. Here's a quick snapshot of what looks like the various actions demonstrated by humans and how they turn into recipes used by the LAM. For example, you record adding Ed Sheeran's Photograph to the playlist named "my music" and playing that playlist; you go through and click on the various things to demonstrate how it's done, and here are two replays that do something similar, although not identical. They believe that, in the long run, LAM exhibits its own version of scaling laws, where the actions it learns can generalize to applications of all kinds, even generative ones. Over time, LAM could become increasingly helpful in solving complex problems spanning multiple apps that require professional skills to operate.

They also post some results they call early signs of LAM competitiveness in web navigation tasks: although recent web navigation algorithms have shown human-level performance in simulated environments, they struggle on real websites.
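To make the context-window point above concrete, here is a small sketch of how one might measure the raw-HTML token count of a page and compare it with the published context limits quoted in the video. It uses the `tiktoken` `cl100k_base` encoding as a rough stand-in for GPT-4's tokenizer (Claude's tokenizer is different and not public, so its number is only indicative), and the URL is just a placeholder.

```python
# Rough illustration of the "raw HTML doesn't fit in the context window"
# claim: count tokens in a page's raw HTML and compare with the limits
# quoted in the video. cl100k_base approximates GPT-4's tokenizer; Claude's
# tokenizer differs, so its comparison here is only indicative.
import requests   # pip install requests
import tiktoken   # pip install tiktoken

CONTEXT_LIMITS = {"GPT-4 (1106)": 128_000, "Claude 2.1": 200_000}

def html_token_count(url: str) -> int:
    """Fetch a page and count tokens in its raw HTML."""
    html = requests.get(url, timeout=30).text
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(html))

if __name__ == "__main__":
    # Placeholder URL; substitute any real search-results or listing page.
    n = html_token_count("https://www.example.com/")
    for model, limit in CONTEXT_LIMITS.items():
        verdict = "fits" if n <= limit else "does not fit"
        print(f"{model}: {n} tokens vs {limit} limit -> {verdict}")
```

Heavily scripted pages such as flight-search results routinely run to hundreds of thousands of tokens of raw HTML, which is the gap the chart in the research post is illustrating.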
When tested on the Mind2Web benchmark dataset, the most effective method only achieves an accuracy of 70.8%. Here they provide a preliminary evaluation of LAMs using their own benchmark, showing accuracy rates as high as 89.6%, 81.9%, and so on. The large neuro-symbolic LAM is the highest by far of all of these, at 89.6%. They compare it to Flan, GPT-3.5, and GPT-4 with various training methods.

Now, some people have questioned why we don't get to see the moment he has to confirm flights, hotels, etc. Another write-up on the subject said the following: it seemed that the rabbit keynote was playing an awful lot of tricks in terms of what it actually showed on camera. Lyu would make audacious requests, like asking rabbit to book flights and hotels, and pronounce the task completed without actually showing that it was completed.

If you want to get the R1 for yourself, it's going to cost $199. However, this is a pre-order, so it's not going to ship immediately; it looks like they begin shipping to US addresses for pre-sale purchases in March to April of 2024, and you have 14 days to cancel the order if you change your mind. However, like Ben and others have mentioned on Twitter, we're not quite sure if the keynote showed everything, or perhaps only highlighted the most impressive parts. We likely won't know until the first people start receiving their R1s, which looks like about three to four months from now.

Anyway, what do you think? Do you think it's real, and assuming it is real, would you get something like this to help you complete various tasks on your computer and phone? Whatever the case, I think this is the future. The future is AI-powered operating systems that complete tasks on your behalf. Very soon we won't be typing and clicking, but rather, like a top-level general, simply commanding our troops of AI agents to go forth and do our bidding. Will this be the first viable version of that? Maybe. This is Wes Roth; let me know what you think in the comments. I'm all ears.
Info
Channel: Wes Roth
Views: 35,423
Id: _V8n-zRHGm4
Length: 30min 54sec (1854 seconds)
Published: Thu Jan 11 2024