The biggest week in AI (GPT-4, Office Copilot, Google PaLM, Anthropic Claude & more)

Captions
GPT-4 is coming out this week! This week! And also, Samsung is in trouble because they faked the Moon. My name is Yannic, and this is ML News.

GPT-4 is apparently coming out this week. Hi, I'm Yannic. You may recognize me from the video you're watching. This week was certainly one of the biggest weeks in AI. Google announced an API to their huge PaLM models, and also an integration into their Workspace features, which means Docs, presentations, spreadsheets and so on, AI-augmented. Microsoft did the same, announcing Copilot for Office, which means that soon you'll be able to write a Word document or make a PowerPoint presentation and be supported by generative AI. Anthropic announced their Claude model, a chatbot they have trained that is said to be very good. At the same time, LLaMA has been made to run on old smartphones, maybe someone's toaster: people are taking these giant language models and doing incredible things with them. And of course, on top of it all, GPT-4 was announced, the new model by OpenAI. It's apparently a lot better than the old GPT-3.5 or ChatGPT models, and it's a giant announcement. And this guy right here, he's recording this on Monday morning and has no clue of any of that. In fact, he's going to claim that he believes GPT-4 will not be announced this week, and he'll be very, very smug about it. So I thought I won't spare you this, and I'll let you have it. I cut it together a little bit, but I believe in making falsifiable predictions, and I was falsified in this case. We can dive into all the big news next week; as I said, all of the following was recorded before that. So my main news here is that Samsung fakes the Moon. Not the moon landing, the Moon, which is also pretty cool. Enjoy the current buzz of AI. It's a fantastic world, and I'm sure it's going to stay exciting, remain exciting, and continue even more gloriously. That's it from me, enjoy the video, I'll see you.

This article in heise online here is the original article I could find. It's in German, but I'll do my best to translate. They say GPT-4 will appear next week, and that was last week, so this week. The CTO of Microsoft Germany, so a high-ranking Microsoft Germany employee, said that GPT-4 was immediately before release, saying: "We will present GPT-4 next week. There we have multimodal models; they offer very different possibilities, for example video." This is a strong statement, and obviously the whole media landscape is going absolutely crazy over it. But if you happen to think, wait a minute, it's kind of weird that some German employees of Microsoft are making the announcement for GPT-4, one of the most highly anticipated releases in the AI world of the last two years or so... it is kind of weird. It's not OpenAI. For every other one of their products they release a big blog post, shiny examples and whatnot, and that's their announcement of the thing, and they go all out. No, here it's Microsoft officials, not OpenAI; Microsoft officials in Germany, not even at an English-speaking event. If you're a bit skeptical, so am I. My guess is that it's very probable this person misspoke and meant something else, not GPT-4. In fact, we're going to see later in this episode Visual ChatGPT, which can interact with text and with images and so on, and maybe video is going to be added to that a little bit too. No offense to this person, I'm sure they're doing a great job, but it'd be super weird if they were the one to announce this.
Now, there is a range of possibilities: the highest chance is that they misspoke, a smaller chance that this person blabbed out something they shouldn't have, and a teeny tiny chance that this was the actual announcement. So my prediction is: there will be no GPT-4 this week.

GANs are making a return. Generative adversarial networks were the absolute hype when I started my PhD around 2015, 2016, and they're making a comeback. GANs were, for a long time, the state of the art in image generation: they were fast, and they were super crisp compared to variational autoencoders, which were the alternative back then. GANs were the thing. More recently they've been replaced by diffusion models, which tended to have better image quality and also to be steerable via something like text. Now GANs are making a comeback: this here is GigaGAN, the paper is called "Scaling up GANs for Text-to-Image Synthesis", and the pictures just look beautiful. They're all generated from text, so this augments GANs in a way that lets you input a piece of text and have an image produced. The cool thing is that, given these are GANs, you retain all the usual abilities like latent space interpolation.

What this paper also does is take a StyleGAN-like approach, which means that at different resolutions of the image (coarse generation first, then finer-grained generation on top, and so on; if you know a Laplacian pyramid, it's a very similar concept) they can apply different conditioning information at the different levels, as I said, like StyleGAN. They also pair this with an upsampler: this is what the GAN would produce, and after the upsampler it looks absolutely beautiful. The generator architecture is right here, and you can see there are a lot of tricks in it. It starts with a pre-trained text encoder, which they take from CLIP, because CLIP is already trained to pair text and images. On top of that they learn a small encoder, and they use that both as conditioning information and as a kind of input; it gets very complicated in the exact details, which I don't want to go into here. But as I said, they can work at different scales of resolution and they have this interpolation. For example, they can say "we generate a teddy bear on a tabletop", and then at the finer-grained resolution they can say something like "we want it to be in crochet", "we want it to be made of fur", "we want it to be made of denim", and the teddy bear at the finer-grained scale gets that conditioning information, so you get a teddy bear made out of fur, for example. Very nice, very controllable, and very cool.

There are a lot of possibilities that open up with models like this, and it's cool to see GANs making a comeback, because a diffusion model really needs to do this step-by-step diffusion (there are some tricks to speed it up), whereas a GAN can just, boom, produce that image in a single forward pass. The bitter lesson, again, is that apparently scale is just the thing you need: a lot of parameters, and then pretty much any approach can be made to work. But I like it when new paradigms come around, even though GANs have been around since 2014 (some people say since the 90s). I welcome this development, it's going to open up a cool research area, and I hope that with super-fast image generation from GANs we get new possibilities to create experiences, to create applications, and to push the state of the art. So, very cool.
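To make the "single forward pass plus free latent interpolation" point concrete, here is a minimal, hypothetical sketch. The toy generator and the random text embedding below are stand-ins I made up for illustration; they are not part of the GigaGAN release.

```python
# Hypothetical sketch: a text-conditional GAN generator produces an image in one
# forward pass (no iterative denoising loop), and interpolating between two latent
# vectors gives a smooth morph between samples essentially for free.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Toy stand-in for a text-conditional GAN generator (not GigaGAN)."""
    def __init__(self, z_dim: int = 512, t_dim: int = 512, img: int = 64):
        super().__init__()
        self.img = img
        self.net = nn.Sequential(nn.Linear(z_dim + t_dim, 3 * img * img), nn.Tanh())

    def forward(self, z, text_emb):
        x = self.net(torch.cat([z, text_emb], dim=-1))
        return x.view(-1, 3, self.img, self.img)

generator = ToyGenerator()
text_emb = torch.randn(1, 512)   # stand-in for a CLIP-style text embedding of the prompt

z0, z1 = torch.randn(1, 512), torch.randn(1, 512)
frames = []
for alpha in torch.linspace(0, 1, steps=8):
    z = (1 - alpha) * z0 + alpha * z1          # straight-line latent interpolation
    frames.append(generator(z, text_emb))      # one forward pass per image
```

Compare that with a diffusion sampler, which would run its denoising network many times for every single frame.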
I just thought I'd throw this in here: choppedai.com. You can make a recipe, so you say, okay, I have some chicken, and I have some butter, and I have some parsley (I don't know how to spell that), and I have some rusty nails, and I also have some benzodiazepines, and I have nothing else, of course. Okay, so I click here and let's see what it gives. Apparently this is supposed to give me a recipe, let's see whether it works. It's an age-old idea. "Sorry, I cannot provide a recipe that includes rusty nails, as they are not an edible ingredient." That is not true! Okay, alright then. In any case, it's something neat to play around with. Very cool, thank you.

Here is a Reddit thread, and it carries a pretty serious accusation, I want to say. It's called "Samsung space zoom moon shots are fake, and here is the proof". This person has picked up on a debate that has actually been going on for a while: people have raised this issue before, but they were largely dismissed, and now this person seems to be gathering some steam, gathering some support. So what is this about? This company called Samsung, they make phones, and specifically they make phones and claim the phones have very good cameras, and they also put some AI models into the phones, or into the cameras (I'm not sure where the models sit, probably in the phones), that try to make the pictures you take with the camera as nice as possible. A lot of companies, I guess pretty much every single smartphone nowadays, do this, but Samsung seems to have a specific affection for pictures of the sky, or of the night sky, and they try to enhance those a lot.

What this person has now done is take this particular picture of the Moon, not taken with the smartphone (this is, I guess, a NASA picture of the Moon), and blur it, so they've applied a layer of Gaussian blur to it. This is now the picture, very blurred, and this is the upscaled variant of it. Then they've taken their phone and taken a picture of their screen showing the blurred image. So instead of pointing the phone at the sky, they actually pointed it at the screen, as far as I can understand, and this is the picture that comes out of that. As you may see, it's quite different. In effect, there is information here that is not in the original, so there is no way the camera could actually have gathered this image information. People previously said the moon shots look fake, or that the moon is replaced by some texture, and one could always say, well, no, by moving around a little bit the camera can gather multiple frames and then do super-resolution from the different images one after another. In this case it's very clear that the information needed to produce this picture is just not in the original; it's been destroyed. This person now claims this is proof that Samsung essentially applies a texture: they detect the moon and then just go "ah, that's the moon", bang, slap. They made a different experiment as well, where they took only half of that picture; here are the different results. When it's just half of the picture, the camera doesn't manage to add all that detail, but when it's the full picture, it does. So that lends a lot of credibility to the claim that they are in fact detecting the moon and then replacing it with a texture.
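If you want to replicate the degradation step of that experiment yourself, a minimal sketch with Pillow could look like the following. The file names are placeholders, and the original post may have used different blur and resize settings.

```python
# Sketch of the degradation step in the Reddit experiment: destroy the fine detail of a
# sharp Moon photo, display the result full-screen, then photograph the screen with the
# phone. Any crisp crater detail in the phone's output cannot have come from the optics.
from PIL import Image, ImageFilter

moon = Image.open("moon_reference.png").convert("L")   # sharp source image (placeholder path)

degraded = (
    moon.resize((170, 170), Image.BILINEAR)            # throw away detail by downscaling
        .filter(ImageFilter.GaussianBlur(radius=4))    # blur whatever detail is left
        .resize(moon.size, Image.BILINEAR)             # upscale again for full-screen display
)
degraded.save("moon_degraded.png")
```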
However, I don't think that's quite what's going on right here. I think what Samsung has done is actually train a super-resolution model, like we've seen before. A super-resolution model is a model that you give a blurry image and it gives you a high-resolution image. Obviously, this model has to invent information that isn't there, and this usually works quite well because deep learning models generalize: you train on a whole bunch of high-resolution images which you blur, and you train the model to reverse that. That's a super-resolution model, an upsampling model. It has to invent all of these details, and it does that by learning from data.

Now, in the case of the Moon, there is a thing called tidal locking, which means that the Moon and the Earth interact through tidal forces, and the result is that we always see the same side of the Moon. There's not literally a dark side of the Moon per se; the dark side switches, but the side we see is always the same. Therefore, if Samsung trains a super-resolution model on pictures of the Moon, which are obviously all taken from Earth, the input will always look the same; the only difference is that it might be slightly rotated, depending on whether you're in Australia or not. And therefore, rather than the super-resolution model learning to generalize and upsample all kinds of things, it just learns to apply the same texture of the Moon over and over again. So it's not exactly an algorithm that deliberately applies a texture; my guess is that the super-resolution model simply ignores all the input: as long as the input is kind of round and kind of blobby, it replaces it with its learned texture of the Moon, which is pretty funny. It would be really interesting to get your hands on this model and see that most of the input weights are zeros, that it completely ignores the input and is essentially just a circle detector. In any case, I think it's maybe a lesson of what happens if you just throw AI at every application: literally, if they had just built an actual moon detector and replaced the moon with a texture, they could have gotten the same result. Maybe there will be more developments on this. User u/ibreakphotos: very nice investigation, very clever experiments to really figure out what's going on here. If you're interested, I'll leave a link in the description.
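For reference, the standard recipe for training such an upsampler looks roughly like the sketch below. This is a generic illustration, not Samsung's actual pipeline, and the dataloader and network are placeholders. If every sharp target is a photo of the same tidally locked Moon, the cheapest thing for the network to learn is to reproduce that one texture regardless of the input.

```python
# Generic super-resolution training sketch (illustration only, not Samsung's pipeline):
# degrade sharp images, then train a network to undo the degradation.
import torch
import torch.nn as nn
import torch.nn.functional as F

upsampler = nn.Sequential(                      # toy stand-in for a real SR network
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(upsampler.parameters(), lr=1e-4)

def degrade(sharp: torch.Tensor) -> torch.Tensor:
    """Simulate a blurry capture by downsampling and then upsampling again."""
    low = F.interpolate(sharp, scale_factor=0.25, mode="bilinear", align_corners=False)
    return F.interpolate(low, size=sharp.shape[-2:], mode="bilinear", align_corners=False)

# Stand-in for a real dataloader of sharp training images.
sharp_image_loader = [torch.rand(4, 1, 64, 64) for _ in range(10)]

for sharp in sharp_image_loader:
    blurry = degrade(sharp)
    loss = F.l1_loss(upsampler(blurry), sharp)  # the network must re-invent the lost detail
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```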
Hey, let me quickly jump in and talk to you about the fact that Weights & Biases has not only been really kind to my channel in the past, but they've also sponsored an entire team account for the Open Assistant effort. Open Assistant is not me: it's actually a big community, lots of volunteers doing the work; I'm just the person on camera bringing in the traffic. Weights & Biases has been super supportive of all of these people, and obviously they are a great MLOps framework and we're super happy to use them, so I want to thank them a lot.

I want to tell you about this course they have. It's an entirely free course: if you go to wandb.courses (".courses" is the top-level domain), their courses are on effective MLOps, and this first one is on model development. As I said, it's completely free, and it's not cohort-based, so you can go through it at your own pace. Here you can see a little bit of the curriculum: it starts off with building a prototype, building a baseline, evaluating your model, and going further than that. You're not going to build the latest and greatest large language model; this really takes you from building a model through the steps that follow: how do I assess the quality of the model? How do I see whether I can make it better? How do I treat data? How do I make things reproducible? For that, you're initially going to train a U-Net with a ResNet baseline; all the code is available right here, and it's all really nice. It's in fastai, and it's really about this process: how do I know where I stand with my model, how do I know whether or not I'm improving, and if I improve, how do I know what it was due to, which of the things I changed made it better, and how can I make it even better? (Is "even more better" a thing? Even better.) Along the way you'll also obviously learn how to use Weights & Biases as an MLOps system, which is amazing, because Weights & Biases is the greatest MLOps system in existence, obviously, and it's free forever for personal use, for academics, and for open-source teams like ours. Very thankful again. You should absolutely check it out; it's a great way to get started with a more principled approach to training and improving models than just hammering on things left and right. If you've never worked with Weights & Biases, this is a great opportunity to get into it; if you're at the beginning of your machine learning career and want to get into coding, this is also a great way to do it; and if you just want to see what the normal steps in a data science and machine learning engineering workflow are, this is also an absolutely great place. The course has several modules and builds upon itself; it's guided, as I said, with live code examples and with videos that explain everything to you; all the code is available; and, as I already said, it's free. So that's wandb.courses: go there, check it out, and I'll see you around.

There's a new paper called Data Portraits, and it proposes both a framework and a concrete suggestion of how to do what they call a data portrait. A data portrait is a little algorithm, together with data, that allows you to do data membership checks. The idea is that you train a model on a big piece of data, or you receive a model that's trained on a big piece of data, and you wonder: this piece of text that I have right here, was it used to train the model? Now, obviously it's very inconvenient to ship around all of the data, whatever terabytes of data these models are trained on; that's not really useful, and a membership check through terabytes of data would take a long time, just going through them and grepping for your string. Likewise, if you were to ship something like a Lucene index, that would not be super helpful either, because it would also be quite big compared to the data it was built on. So the authors propose an implementation based on Bloom filters, which they say amounts to about three percent of the size of the original data. So if you train a model on The Pile, then with three percent of the original data's size you could ship a piece of code and data that allows anyone who receives the model to check whether a particular string was in the data set used for training, and nothing else. It's essentially just a hash check: you provide a string, and it tells you whether that string is in fact in the data set or not. I think it's a pretty cool idea; maybe it can even be improved a little, but it's certainly quite useful.
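The core mechanism is easy to sketch. Here is a minimal, from-scratch Bloom filter membership check in plain Python, an illustration of the idea rather than the paper's actual code or parameters: it can report false positives at a tunable rate, but never false negatives.

```python
# Minimal Bloom-filter sketch of the "data portrait" idea (not the paper's implementation):
# a fixed-size bit array plus a few hash functions gives cheap membership checks.
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 1 << 24, num_hashes: int = 5):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, text: str):
        # Derive several independent bit positions from the string.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{text}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text: str) -> None:
        for p in self._positions(text):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, text: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(text))

portrait = BloomFilter()
for snippet in ["the quick brown fox", "call me ishmael"]:   # stand-in for training data
    portrait.add(snippet)

print("call me ishmael" in portrait)   # True: the string was recorded
print("hello world" in portrait)       # almost certainly False
```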
After all, very often you wonder whether whatever you're working with was actually in the training data. The method they propose is an approximation, but exactly because it approximates, it can get down to such small sizes. The paper is called "Data Portraits: Recording Foundation Model Training Data", and the initial portrait (so far there is just one, of The Pile) is available at dataportraits.org.

Meta AI Research releases data2vec 2.0, a highly efficient self-supervised learning algorithm for vision, speech and text. This builds on data2vec, which they released, I believe, last year. It's essentially a generalized algorithm that extends to speech, to text and to images, and does self-supervised learning in order to obtain good representations. It's not the state of the art in any one of these tasks, but it is a general algorithm that you can apply, as I said, to a wide range of modalities and a wide range of niches inside these modalities, and you get reasonable representations for that data. The algorithm is available in the fairseq package, and it's basically based on the idea that you mask out a piece of the input and try to predict that piece from the rest. It's the age-old masked language modeling, masked autoencoding, or denoising autoencoding idea, however you want to call it.

Hugging Face introduces gated models on their Hub. This is a feature that allows you, as the uploader of a model, to specify that users have to do something before they can download that model. You may ask users a question, they may need to share some information, or they may need to click a checkbox agreeing to some terms of use. All of this you can define in your model card, and Hugging Face will make sure to present it to users before they're allowed to download your model, even via the API; they need to agree to it first. You can also specify manual approval, which means that users can only make a request to you to download the model, and then you can go through the list and decide who gets to download and who doesn't, maybe based on the answers they've given to your questionnaire. The model uploaders will have access to all the data you provide, so be aware of that if you ever fill out one of these forms. Now, I know I've ranted a bit much in recent times, so I want to keep this short, but it just reminds me of the prequels and Amidala's line: "So this is how liberty dies, with thunderous applause." Various people welcome this addition very much, and I don't. I think this is another step into a world of non-open-source. Specifically, if you put usage restrictions on your code or your models, that is by definition not compatible with the interpretation of open source held by, for example, GNU or the open source foundations, and I quite dislike Hugging Face supporting this and making it easy. Now, you can always say, well, these people would do it anyway, they would just have to implement it themselves. Okay, but then let them implement it. Hugging Face's charter says "we open source AI by providing a one-stop shop of resources ranging from models, datasets, ML demos and libraries", and this is clearly a step away from that. So I don't like it, but you now have this ability, if you want it, from Hugging Face.

huggingface.js is a JavaScript library that allows you to interact with the Hub and call the Hugging Face inference API, so if you build some Next.js application or similar, you can interact with the Hugging Face API using these libraries.
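As a rough illustration of what the library wraps: the hosted inference endpoint can also be called over plain HTTP. The sketch below shows that underlying call in Python rather than through the JavaScript wrapper; the model id and token are placeholders, and the exact JavaScript surface is documented in the library itself.

```python
# Sketch of calling the hosted Hugging Face Inference API directly over HTTP.
# huggingface.js essentially provides a typed JavaScript wrapper around calls like this.
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"   # placeholder model id
headers = {"Authorization": "Bearer hf_your_token_here"}        # placeholder token

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "ML news of the week:"},
)
response.raise_for_status()
print(response.json())   # for a text-generation model: [{"generated_text": "..."}]
```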
Microsoft releases Visual ChatGPT. This is a paper called "Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models", and it uses ChatGPT to interact with a bunch of other systems. They open-source the code right here, and what it does is essentially import a whole bunch of things: it imports BLIP, it imports upsampling, it imports Stable Diffusion, it imports ControlNet, and a whole bunch of these models, and then it defines prompts and prefixes so that you can interact with these tools in a chat manner. What does that mean? They have a bit of a demo: here it says "could you generate a cat for me", and then, I guess, it calls Stable Diffusion; "could you replace the cat with a dog and remove the book" (not exactly sure what it did there); "could you generate the Canny edge of this image". You can see that using chat, using dialogue, you can now interact with images. There's also visual question answering: "what color is the motorcycle", "could you remove the motorcycle from the image", and so on. Very cool. The component here is called the prompt manager, so we're moving more and more into a direction where, next to the software engineer and the ML engineer, there is now the prompt engineer. The field has been predicting this for a while, and it is strange to really see it become a reality, to have serious work go into "okay, which sentences can we put into these models to make them do the things we want?" It's weird, but it's also quite cool. This is open source; have a look, have a try.

On the same note, Microsoft says Bing has crossed 100 million daily active users, is what Engadget writes. They say "we're still a small, low single-digit share player"; apparently that's a quote, and that's what monopolies say when they don't want to be called monopolies. But Bing now sees an influx because they've activated ChatGPT on their search engine: they retrieve websites and then let ChatGPT answer questions for you, or summarize them, or whatnot. That's quite a new take and a cool way to use a search engine. It doesn't always work, but I welcome the change, I welcome the paradigm shift, let's say. They also say that over a third of their users are new users every day. You can get that ratio up in two ways: for one, you can acquire lots of new users, which I'm sure is the case right here, but you can also have users come, try it once, and then never try it again; that's also how you get a high ratio of new daily users. Maybe it's a little bit of a mix of both. But if you haven't tried the new Bing yet, give it a try; it's certainly a different experience to searching the internet classically. And no, Bing is not paying me to say that.

Meta AI is introducing a new data set called Casual Conversations v2. This is a data set of people holding monologues: the monologues are either a script that they're given, or they answer one of five predefined questions, in whichever way they want. They also get to define some attributes about themselves, and Meta has professional raters who determine other attributes in as objective a manner as possible. That results in a data set of a good couple of thousand entries: it features 26,467 video monologues from 5,567 paid participants who voluntarily took part in the collection of this data set. So if you're looking to evaluate some algorithm and you want to see how it does across different languages, different regions of the world, and different types and kinds of people, this might be a good data set to consider.
Anthropic released a blog post called "Core Views on AI Safety: When, Why, What, and How". They have all the question words in the title, so this must be a good post. They define how they see AI safety and what they want to do going forward. The conclusion is what they call a portfolio approach: here they say "taking a portfolio approach to AI safety", and what they essentially say is that we don't exactly know yet how AI, or future, more powerful AI, is going to turn out. There are optimistic scenarios where everything is super helpful and super good, there are intermediate scenarios, and then there are pessimistic scenarios where AI systems are maybe not as safe as we now think they are, or not as nice, or people use them to do not-nice things. We don't exactly know which of these scenarios will happen, or predominantly happen, so Anthropic says their best bet is to do research across a wide array of areas, try to balance that research, and be prepared for all of these outcomes until we learn more. I'm not really sure what to take out of this; I'm not really sure what information is transmitted to me through this blog post. It essentially says that they're not going to commit to any sort of strong direction right now. But maybe I haven't read it correctly, or maybe I haven't understood it; that's obviously possible, or maybe this is actually an AI test, I don't know. In any case, it's a fairly long, fairly detailed blog post, and if you're interested in Anthropic's views on AI and AI safety, give it a read.

There's considerable progress in the fields that use AI to do mathematical things. Magnushammer is a transformer-based approach to premise selection: you have some sort of mathematical proof that you want to do, and the question is, at each proof step, which premises do you select for that step? Magnushammer replaces previous state-of-the-art systems with a learned transformer. The previous systems were very cleverly engineered; as far as I understand, the previous system is called Sledgehammer, and now Magnushammer is a lot better than Sledgehammer, because Magnushammer, who could have guessed, uses very big transformers. On the right-hand side here you see the basic architecture of Magnushammer, and the yellow thing is a transformer; in fact, it's the same transformer backbone for the different parts of the pipeline. Besides the fact that it's very cool that even something like math is making considerable progress using big deep learning models, I also think this area is just very good at naming things: Magnushammer replaces Sledgehammer, which works in conjunction with Isabelle, and it replaces it inside a system called Thor, Thor being the bigger proof system that these things are part of. By doing that, it improves the proof rate from 57% to 71%, which, I don't know if that's good, but it seems like a lot, so good job. And we're not done with the naming: Baldur does whole-proof generation and repair with large language models. Rather than having a step-by-step process and doing premise selection, this system tries to generate entire proofs at once, and/or repair them, which means that if you have a broken proof (like, I "proved" it, but it doesn't quite work), you want to revise and repair it. This also uses the old, familiar Isabelle, but as I said, it tries to create a proof in a more holistic way: it creates the whole proof.
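Stepping back to the premise-selection idea for a moment: here is a small, hypothetical retrieval sketch. It is not Magnushammer's actual architecture or training setup, just the generic idea of embedding the proof state and candidate premises with one encoder and ranking by similarity. The toy encoder, the goal string, and the premise strings are all made up for illustration; the real system uses a trained transformer in place of the toy encoder.

```python
# Generic premise-selection sketch (not Magnushammer's actual model): embed the current
# proof state and all candidate premises with the same encoder, rank by cosine similarity,
# and hand the top-ranked premises to the prover.
import torch
import torch.nn.functional as F

def toy_encoder(texts, d: int = 64) -> torch.Tensor:
    """Toy stand-in for a learned encoder: crude character-count features."""
    out = torch.zeros(len(texts), d)
    for i, text in enumerate(texts):
        for ch in text:
            out[i, ord(ch) % d] += 1.0
    return out

proof_state = "n : nat |- n + 0 = n"                       # hypothetical goal
premises = [
    "add_zero : forall n, n + 0 = n",                      # hypothetical premise library
    "zero_add : forall n, 0 + n = n",
    "mul_comm : forall a b, a * b = b * a",
]

state_emb = F.normalize(toy_encoder([proof_state]), dim=-1)    # (1, d)
prem_emb = F.normalize(toy_encoder(premises), dim=-1)          # (3, d)

scores = (prem_emb @ state_emb.T).squeeze(-1)                  # cosine similarity per premise
top = torch.topk(scores, k=2).indices
print([premises[i] for i in top])                              # premises passed to the prover
```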
I have very little idea of what's going on in these fields, but if this is of any interest to you, give these papers a read. Also in this area, the next paper: deep symbolic regression for physics, guided by unit constraints. This is similar, but it tries to discover physical laws just from data. The recognition here is that previously, people have done this essentially as a search through formulas until you hit the formula that fits the data. In this case, they also use units. They say, wait a minute: if we want to determine the formula for the speed of something, the units must add up and cancel out such that at the end there is a unit of speed. As you can see in this example right here, that reduces the search space of possible equations drastically, and by doing that, they can also drastically increase the number of recovered physical laws that these types of systems can tackle. Very cool; give the paper a read in case you're interested.
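The unit trick is easy to illustrate. Here is a tiny, self-contained sketch (my own illustration, not the paper's code): a unit is represented as a vector of exponents over base dimensions, multiplication adds those vectors and division subtracts them, and any candidate expression whose resulting unit does not match the target can be pruned from the search before its fit to the data is ever evaluated.

```python
# Tiny dimensional-analysis sketch (illustration only, not the paper's implementation):
# represent a unit as exponents over base dimensions; a candidate formula for a speed
# is only admissible if its combined unit equals metre * second^-1.
def combine(u: dict, v: dict, sign: int = 1) -> dict:
    """Unit of u*v (sign=+1) or u/v (sign=-1): add or subtract exponent vectors."""
    out = dict(u)
    for dim, exp in v.items():
        out[dim] = out.get(dim, 0) + sign * exp
        if out[dim] == 0:
            del out[dim]
    return out

distance = {"m": 1}
time = {"s": 1}
mass = {"kg": 1}
speed = {"m": 1, "s": -1}          # target unit: metres per second

print(combine(distance, time, sign=-1) == speed)   # True:  distance / time is admissible
print(combine(distance, mass, sign=+1) == speed)   # False: distance * mass is pruned
```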
The leakage of the LLaMA weights, which we discussed last week, continues to be hilarious. News articles are being written about it: "The llama is out of the bag. Should we expect a tidal wave of disinformation?" Oh no, oh no, disinformation! Ah, journalists. But apart from the leaking, we can actually go the correct route right here: Andreas Köpf has opened a pull request on the LLaMA repository, and this pull request would change the license (the model weight license, not the code license) from the current non-commercial license to Apache 2.0. This would make the model actually, fully open source if the pull request were to be merged. The argumentation is quite straightforward. First, you claim to be open but you're not, so you bask in all the glory of open source while in fact not being open source. And second, you essentially ask someone else to re-spend all the CO2 and all the compute that you have already spent to generate this model, pointlessly, because all the computations have already been done, so this is literally just generating heat and CO2. If you agree with this, give the pull request a thumbs up, give it a little rocket, to let Meta know that yes, we agree with this. Obviously this is optimistic, but if you work at Meta, maybe you can bump someone internally; this would help the open source community quite a lot and would raise the image of Meta in the eyes of the community, I think.

Speaking of licenses and terms, there is a data set called self-instruct. This is an instruction-tuning data set (these have become popular recently with ChatGPT) that has been generated using the OpenAI API. While this is very cool, there is a problem: namely, the OpenAI terms of service state that it is forbidden to use the services to develop foundation models or other large-scale models that compete with OpenAI. Let's say you take this data set, train on it, and upload the model to the Hugging Face Hub; technically, that could be seen as producing a model that competes with OpenAI. Even if you do it for your own company and then use your model instead of going to the OpenAI API, or, even worse, offer your model to other people, all of that could be construed as not being in agreement with the OpenAI terms of service. Now, just this data set existing doesn't violate the terms yet, but training on it may. So it's going to be interesting to see what happens once people start training models on this data set, and interesting to see whether Hugging Face actually keeps the data set up, because the data set pretty explicitly is on the way to violating the OpenAI terms of service. My opinion, and again, I'm not a lawyer, this is no legal advice: I think it's going to be interesting to see; it's going to set a precedent, and precedents will decide over the foreseeable future.

Robotics at Google, TU Berlin, and Google Research release PaLM-E, an embodied multimodal language model. This is a giant language model; it's one of these Pathways models by Google, where not all of the network is always active, and therefore they can go to many more parameters. It's a multimodal model: it can take text as input, and it can take images as input. The way it does that, as you can see right here, is that it takes the images and puts their embeddings in, essentially, as tokens after the embedding layer. So you can mix text tokens, which go through the text embedding layer, with something like image tokens: you use a ViT, a Vision Transformer, take its representation tokens, and put them in among the text tokens. It's all just tokens, it's all just token embeddings. You can even do that with more stuff, like instructions, trajectories, positions, all kinds of things: you just map them into the token embedding space and you can use them with a large language model. That's what this paper does, and it uses it to empower their robots to do various different tasks, for example "bring me the rice chips from the drawer". The robot obviously knows how to grasp stuff and so on, but has not been exposed to these particular objects, and you can see a human disturbing it. Don't do this! If these robots end up becoming super powerful, this human, you're, you're on the list. Oh, please be nice to the robot. These robots look familiar: I think in last week's story we noted that that division was decommissioned, so I guess the robots found a new home, which is good. In any case, we don't know too much more about PaLM-E, because obviously Google doesn't really let anyone use their models, but the demos seem quite convincing. You can give the robot instructions right here, "push the red blocks to the coffee cup"; it struggles, but after a while it sort of gets there. Yeah, come on, come on, come on... good job, good job. You can use this model in various different ways, but again: there is a paper, there is this bit of a demo, and there's not yet a model.
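The "everything becomes a token" idea is simple to sketch in isolation. The following is a conceptual illustration, not Google's code, and all dimensions are made up: features from a vision encoder are linearly projected into the language model's embedding space and concatenated with the ordinary text token embeddings into a single sequence.

```python
# Conceptual sketch of PaLM-E-style input construction (illustration only, not Google's code):
# project vision-encoder features into the LM embedding space and concatenate them with
# text token embeddings into one sequence for the language model backbone.
import torch
import torch.nn as nn

d_model, vocab_size = 512, 32000                      # made-up sizes
text_embedding = nn.Embedding(vocab_size, d_model)    # the LM's ordinary token embeddings
image_projector = nn.Linear(768, d_model)             # maps ViT patch features into LM space

text_ids = torch.randint(0, vocab_size, (1, 12))      # toy ids for "bring me the rice chips ..."
vit_patches = torch.randn(1, 16, 768)                 # stand-in for ViT patch representations

text_tokens = text_embedding(text_ids)                # (1, 12, 512)
image_tokens = image_projector(vit_patches)           # (1, 16, 512)

# One mixed sequence of "tokens"; the language model does not care where they came from.
sequence = torch.cat([image_tokens, text_tokens], dim=1)   # (1, 28, 512)
# `sequence` would then be fed to the language model backbone in place of text-only embeddings.
```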
Also, Google Research releases USM, a speech model for 100-plus languages. They recently announced their 1,000 Languages Initiative, and in this model they have, as far as I can tell, over 300 languages for speech recognition. They can achieve this because they pre-train in an unsupervised fashion on a very large data set, who would have thought, and then they fine-tune where they actually have data. So they pre-train on just recordings, lots and lots of voice samples, and then fine-tune on actually doing the speech recognition, the transcription, using task-specific paired data where you have the speech and the actual transcription. They find that this has big generalization power, and it especially helps the languages that aren't as frequent: performance on those languages gets quite a lot better. By training mostly on the languages where you have a lot of labeled data, and thanks to the pre-training, this generalizes to the languages where you don't have a lot of data. So this is a cool method to expand the abilities of these models even to kinds of data that you don't see that often. This model, for example, beats the OpenAI Whisper model at speech transcription, and it also beats YouTube's internal caption generation. Only by a tiny bit, but it does, and given that this is Google, we may soon have better subtitles on YouTube. Again, not much in terms of source code or anything, but researchers can request access to the USM API.

And the last model for today: GLIGEN, open-set grounded text-to-image generation. This is text-to-image generation, and it's based on this model here, GLIP, which is grounded language-image pre-training. By grounding, these people mean, for example, that you have bounding boxes around the objects that you want to place in an image: you give a caption, in this one "Elon Musk and Emma Watson on a movie poster", but you also specify the positions of Elon Musk and Emma Watson, and the generation is supposed to adhere to those. On top of that, you can also give a style image, which they also consider to be grounding. So grounding is when you have extra information that grounds the generation of an image, rather than just providing a text description and letting the model do whatever. You can input poses, as you can see right here, and you can have these grounding images where you essentially take an object, for example this backpack right here, and place it on a piece of grass. This opens up a lot of cool new possibilities, and there is a demo. In the demo, I produced a dog and a turtle at a rave, partying it up. I placed the dog and the turtle, and I think that turned out pretty well; yeah, look at that. The project page here is also quite thorough in explaining what's happening in the paper: for example, they freeze most of the pre-trained models and then add this conditioning, this grounding information, using in-between layers that they train while keeping the others frozen. Look at that: Walter White in GTA 5. You can place Walter, you can place a car on the boat; you can place stuff that is not in the text caption, just random stuff, like a bulldog here, or two pirate ships in the ocean of Minecraft. Very, very cool, and again it opens up a lot of possibilities. You can see this especially in what they call spatially counterfactual generations, for example "a hen is hatching a huge egg" or "an apple and a same-sized dog": more classical models like Stable Diffusion, where you don't have this grounding, struggle with these. There, the dog isn't necessarily the same size as the apple, so it ends up weirdly small (at least whatever isn't the head), and the hen doesn't really hatch a huge egg but rather a huge amount of eggs. With the grounding, it just becomes a lot easier for the model to figure out what it's supposed to be doing.

And lastly, the first-ever complete map of an insect brain has been released. This is called a connectome: a map that fully shows which neurons exist in this animal and how they are connected, so every single connection between two neurons is represented in this atlas. This has been done before for roundworms, but this connectome here is, I think, an order of magnitude larger in the number of neurons than previous connectomes, and obviously with more neurons, the number of connections increases by even more. So this is pretty cool.
If you are interested in this type of research, this is a very cool contribution to the world of science, and it again helps us understand ourselves, from a bit of a different direction than AI, but still very good. Alright, that was it for ML News. Thank you so much for being here, and I'll see you around next week.
Info
Channel: Yannic Kilcher
Views: 88,367
Keywords: deep learning, machine learning, arxiv, explained, neural networks, ai, artificial intelligence, paper
Id: YqPYDWPYXFs
Length: 41min 2sec (2462 seconds)
Published: Sat Mar 18 2023