Google's MusicLM: Text Generated Music & It's Absurdly Good

Video Statistics and Information

Captions
With the rapid growth of text-to-image, text-to-video, or even text-to-3D AIs, we have witnessed the future: models generating visual content in a way we have never seen before. Text-to-image generation has gotten so absurdly good that generating images of spectrograms can create comprehensible music once you convert them into audio. Riffusion was the one that achieved this. It is an extension of Stable Diffusion that was fine-tuned on spectrograms, and it has made text-to-music possible in a way that feels like cheating, since it is pretty much just generating images. But just imagine that an AI-generated image can sound like this.

On the other hand, you might have also heard of another text-to-music service called Mubert. Mubert is a bit of a mystery by itself since it is closed-source, but just to give you an idea of how it works, this is their demo. [Music] You can generate music with a few text prompts, and the whole generation process is not actually synthesized by a neural network but instead composed by an algorithm. It's a pretty interesting idea, since it might guarantee some decent sound quality. However, this makes generating complex or unique music much harder, and if you want to use a long and detailed text prompt, I doubt that will work.

So here comes Google, casually dropping a SOTA in the field of text-to-music. On the 26th of January they released MusicLM, from the paper "MusicLM: Generating Music From Text", and as its name suggests, it generates music from text captions without using any diffusion. Surprising, right? MusicLM is actually based on research they released last September called AudioLM, which focuses on synthesizing high-fidelity audio and is the piece MusicLM uses instead of diffusion. So what does MusicLM sound like? Well, hold on to your Two Minute Papers, put on a pair of headphones, and get ready to be completely mind-blown. [Music]

That was pretty good, right? Music generated with such high quality and faithfulness to the text prompt is something we have never seen before. Even though you can still hear some incoherence and static noise, MusicLM generates music at 24 kHz that can remain consistent over several minutes. They have demos online for long generations too, but I am not going to play them in full since they run up to 5 minutes. Here is a quick glimpse at different parts of the audio to show you how consistent it is. [Music] Amazing, right?

What is also surprising is that, similar to diffusion models with their image-to-image or inpainting functions, MusicLM can be conditioned on a piece of audio, such as humming, [Music] and then use the text prompt to change its style to, let's say, guitar. [Music] This is crazy. They have a whole page of these on their official website, and you should go play around with it, but let me show you one more, where it transfers a piano tune of "Twinkle Twinkle Little Star" to jazz. [Music] I was speechless when I was going through all of these, but there's more.

There's also story mode, where a piece of music plays continuously and changes according to a sequence of text prompts. So you can make this weird, long mashup of songs that somehow makes sense and doesn't make sense at the same time. [Music] [Applause] [Music] Or you can have a storyline of text, and the music changes as the story progresses.

What is also fitting is that a story-like description can be used to generate music too. Taking the Wikipedia descriptions of these paintings, MusicLM can use them to generate soundtracks that really fit the paintings. The description of The Scream generates some pretty creepy music, [Music] the one for [...] Night generates a peaceful tune, and the music generated from the painting of Napoleon just perfectly captures the mood. [Music] [Applause] [Music]

MusicLM can also easily generate different genres of music, from 8-bit to 90s house to dream pop, [Music] play instruments like the xylophone [Music] or electric guitar, [Music] or even vary the experience level of a musician. [Music] [...] or 2000 [Music] It can even play the accordion in so many different ways. [Laughter]

MusicLM is so much more powerful than any previous text-to-music AI because of the flexibility it offers and how it can understand a very long string of text. On top of that, it has greater generation diversity: the same text prompt can generate a wide range of different compositions, [Music] or the same sample with variations. [Music] I am just in awe that it may soon be possible to have a fully AI-generated movie from text alone, with the music entirely synthesized from the visual descriptions.

On another note, in the MusicLM paper they specifically state that they thoroughly examined the possibility of model memorization, similar to what can happen with text-based large language models. This means they checked whether the pieces MusicLM generates differ significantly from the data used in its training. They are obviously trying to protect themselves from the copyright issues AI art has faced, but I am glad that they respect the ethical aspects and responsibilities that come with developing a large generative model.

But yeah, that's it for today. Check out their project page for a lot more demos. Unfortunately, they did not release their code, for safety reasons, but they did release a new dataset called MusicCaps that contains 5.5k music-text pairs with rich text descriptions. Thank you so much for watching, and a big shout-out to Andrew Lascellias, Chris LeDoux, and many others who support me through Patreon or YouTube. Subscribe to see more, and I'll see you all in the next one.
Info
Channel: bycloud
Views: 661,672
Keywords: bycloud, bycloudai, text to music, image to music, mubert, mubert ai, mubert text to music, text to audio, text to speech, audioLM, text to song, text to soundtrack, musicLM, generate music from text, text to music ai, text to music converter, riffusion, riff diffusion, spectrogram to music, ai music synthesis, music synthesis, music generation, google music, google musicLM, ai generated music, AI generative music, musicCaps, riffusion app, google text to music ai, jukebox ai
Id: 2CUKU2iAzAs
Length: 11min 44sec (704 seconds)
Published: Sat Jan 28 2023