But can ChatGPT-**4** write a good melody?

Video Statistics and Information

Reddit Comments

I only have surface knowledge when it comes to AI, but weren't there AIs that were trained on a lot of music, and which were able to generate more music in a pretty convincing way? Wouldn't we get a better result by combining one of those AIs with something like chatGPT than we would by only asking chatGPT? After all, like anyone, an AI can only be good at what it has been trained on. I know nothing about music and wouldn't be able to create something as good as what chatgpt4 did in this video, is it surprising that not being trained on music, it performs worse than a trained musician?

👍︎︎ 10 👤︎︎ u/AskingToFeminists 📅︎︎ Apr 25 2023 🗫︎ replies

I asked it to write a song about a baby growing into an eco-warlord in the climate wars of 2034, and make the guitar tab. it was able to pump out a 4 chord punk song with the verse and chorus the same progression and a different bridge.

it was good enough for the time spent. on par with what a human could do in the timeframe it took.

👍︎︎ 9 👤︎︎ u/MakeTotalDestr0i 📅︎︎ Apr 25 2023 🗫︎ replies

Really interesting, and I love your punny titles and composers. I’m guessing GPT-4 isn’t quite at your level of wordplay yet, either! I was as fascinated by your processes (with GPT, reasoning through prompts, your code, etc.) as by GPT itself.

It’s astonishing that the model can produce anything coherent, and yet also kind of puzzling that the outputs are so bad. I’d put them in the bottom 10% of a mediocre freshman theory class in the fall. Those freshmen would certainly fail.

I’m on a committee meeting this week to talk about AI impacts and will definitely mention this. Thank you for sharing!

👍︎︎ 5 👤︎︎ u/Sohanstag 📅︎︎ Apr 25 2023 🗫︎ replies

I've created 60 AI-assisted (human-in-the-loop) melodies last year with a new model. There is no chance of GPT-4 creating anything catchy.

https://www.youtube.com/playlist?list=PLoCzMRqh5SkFPG0-RIAR8jYRaICWubUdx

👍︎︎ 3 👤︎︎ u/zero0_one1 📅︎︎ Apr 25 2023 🗫︎ replies

Somewhat unrelated question, but what are some sources that cover recent events in the field of AI in a somewhat accessible way for non-experts?

👍︎︎ 2 👤︎︎ u/TumbleweedOk8510 📅︎︎ Apr 25 2023 🗫︎ replies

I hate long-winded videos but I hope this guy mentions that a lot of melodies are just variations on scales (Pachelbel's Canon being a brutally popular example). For sure I would expect an LLM to be able to pick up on that pattern if trained with textual descriptions humans have written or even possibly sheet music.

👍︎︎ 2 👤︎︎ u/NovemberSprain 📅︎︎ Apr 25 2023 🗫︎ replies

I kept asking it to write the melody for “Oh, Susanna” and all it would give me were random melodies where the note G was repeated 27 times in a row.

👍︎︎ 2 👤︎︎ u/Read-Moishe-Postone 📅︎︎ Apr 27 2023 🗫︎ replies
Captions
A little while ago I released a video in which I explored whether ChatGPT-3 could write a decent melody. I then followed it up with one exploring whether it could write four-part harmony. The results were mixed: on the one hand, it was pretty amazing that a language model that was in no way explicitly trained to do this could even respond to the question coherently, but on the other hand, the music left quite a lot to be desired. In the comments for both videos I noticed a common refrain: "It may not be good now, but just wait for GPT-4." Well, guess what: GPT-4 is here, and now we can find out just how good or bad it is at music.

But before we do, though, I'm gonna wax philosophical for a moment. I have to be honest: I didn't start out with high expectations for GPT-4's musical abilities. The reason is that large language models are just that: language models. They're statistical machines that take in language and spit out language. Human cognition, on the other hand, involves all sorts of specialized subsystems for things like math and logic, sensory perception, modeling physical reality, storing and retrieving memories, theory of mind, and obsessing over the outcome of sports games and TV series. And all of this is orchestrated by some sort of executive functioning. Now, I'm not a neuroscientist, and apologies if I'm oversimplifying, but that's kind of the point: human cognition is very complicated and is nothing like a simple input-output machine.

Let me give you an example I cooked up to illustrate this. I described to ChatGPT a scenario in which we have several colored blocks on a table. First we place a red block on the table, then we place an orange block directly on top of the red block, then we place a yellow block directly on top of the orange block, and finally we place a blue block directly to the right of the orange block. The question is: what do you expect to happen next? A human being hearing this scenario described will quickly come to the conclusion that the blue block will fall. Crucially, this is
not a linguistic process. It's a completely different part of our brain which is doing the work, even if our original understanding and our response are mediated by language. Now, when I asked ChatGPT-3 this question, it confidently predicted that the structure was stable and that nothing would happen. Its response sounds good, couching these statements in some impressive physics-y language, but the response reveals that there's no physical imagination behind the words.

Here's the thing, though: GPT-4 gets it right. It says the blue block is unstable and is going to fall. And it's not just this problem. I tried GPT-4 on a lot of different blind spots that people found with GPT-3, and it's much, much harder to get it to respond with nonsense. So the question is: is it actually modeling physical reality somehow? Does this enormous black box of a language model somehow contain within it subsystems for modeling physical reality, and math and logic, and theory of mind? And if so, will it be able to do any better with music than its predecessor? Well, there's only one way to find out.

To start out with, I took a similar prompt from the melody video, asking ChatGPT-4 to write a melody in the form of pitch-duration pairs in Python syntax. I asked it to have a varied contour with several high and low points, use a wide variety of note lengths, include at least four skips, stay within a pretty wide pitch range, and be at least 20 notes in length. It responded, as before, with this list of pitch-duration pairs, and I went ahead and copied that list into this script, which uses my SCAMP libraries to play back the music and generate the notation. Here's what it sounded like:

[Music]

And here's the notation that it generated. As you can see, it has a contour that goes up and down a couple times, it's got a nicely varied rhythm, and so far things are looking quite a bit better than GPT-3. It kind of reminds me of something that Wagner might have written as a leitmotif in one of his operas. That said, it's limited completely to
the C pentatonic scale. So my next question was to ask it what kind of character it thought this melody had, and see if it could generate a new melody with a contrasting character. It said that this melody was energetic and dynamic, and decided to give me a calm, gentle, legato melody in response. That melody sounded like:

[Music]

It's certainly slower and calmer, but I'm starting to wonder about GPT-4's expressive range. Again, this is just a rising and falling arpeggio of a major triad with an added sixth. So I decided to challenge it and ask it if it could come up with a new melody that includes stepwise motion as well as leaps and suggests some sort of changing chord progression, instead of just sticking to this simple chord the entire time. The results started out pretty boring, just as a C major scale, but in the middle it did something a lot more interesting:

[Music]

And then it kind of ended abruptly. I communicated these criticisms and then asked if it could address these issues, making the whole thing a bit more like the middle part. Here's the result:

[Music]

They're just from a C major scale, but actually the more interesting issue that's emerging is one of metric phase. Metric phase is kind of a fancy term for where you are in the beat or in the measure, and if we look at that middle part of the melody, the part that I liked, the revision puts it in a different metric phase than in the original version. Here's the original version:

[Music]

Here's the revised version, and here's the version that I think it should have done all along, placing the high E on the beat. But anyway, all of this points to the fact that GPT-4 really doesn't understand rhythmic context. It's concatenating little segments of melody without consideration for how these segments of melody line up with rhythmic cycles.

Anyway, at this point I thought it might be good to give GPT-4 a fresh start, so I asked it to keep in mind everything we'd already talked about and create an entirely new melody, this time of
dark and brooding character. I wanted to see if those words would be evocative in any way, and based on the result, I think they were:

[Music]

I then suggested adding a rest to help with phrasing, and maybe try adding something new at the end that gets higher in register and has a surprising harmonic twist. Finally, I asked it to write a bass line to go with it, to be played by a plucked double bass, and I asked it to make it slow and regular so that it defines the harmony. I then plugged it into a new Python script that allows us to play both melodies at the same time, and here was the result:

[Music]

Having criticized GPT-4 for its lack of understanding of musical meter, I think it's only fair to point out what it's doing well here. For one thing, it's using the same "dum dum dum dum" rhythmic motif several times throughout the melody. In the original version of the melody, it even used some similar melodic motifs. Another thing that I find impressive is that the placement of the rests in the bass line matches the phrasing of the melody. There's clearly a sense of the melody being made up of three short phrases. In my view, this was the strongest music I'd seen out of either GPT-3 or GPT-4 so far.

So since it's doing okay with two-part writing, let's try it on four-part writing. As with the harmony video I did with ChatGPT-3, I asked it to create a four-part chorale in the style of Bach and format the response as a Python dictionary. If you remember, this is what GPT-3 produced. Here's what GPT-4 came up with:

[Music]

As with GPT-3, I asked it to add some rhythmic variety to the parts, and it definitely did a better job than GPT-3 did:

[Music]

Next, I asked it to change the chorale to D major, make it twice as long, add some rests, and just generally give it a bit more melodic shape while sticking to the rules of four-part writing. It wasn't all that responsive to what I said, except that it did a pretty good job of changing it to be in D major. In fact, the bottom two parts are a
direct transposition of the bottom two parts of the previous version, which I thought was kind of interesting. Of course, the whole chorale is still riddled with four-part writing errors, so I decided to point that out to it, specifically asking it to avoid parallel octaves and fifths and generally outline a clear chord progression. I also asked it to tell me what chord progression it was using after it generated the music. Here's the result.

It did fix the glaring parallel fifths in the beginning of the tenor and bass, but it replaced them with something equally bad. Also, the music it produced bears no relation whatsoever to the chord progression that it says it used. I tried everything I could think of to point out its mistakes and coax it into making good four-part writing, but nothing really worked. I also tried the same trick as before and asked it to create a dark, brooding chorale, and after a little bit of tweaking, that did create something pretty interesting:

[Music]

Overall, I'd say four-part writing is a little beyond GPT-4's abilities, and this isn't unconnected with the metric phase problems I discussed earlier, since four-part writing is all about the alignment of the different parts. That said, I did kind of enjoy the pandiatonic style that emerged. It reminded me a bit of Stravinsky's Symphony of Psalms.

So what are we to make of all of this? Well, it doesn't seem like GPT-4's enhanced reasoning abilities extend all that far into the musical domain. There's definitely improvement over GPT-3, but nowhere near as much as there's been on physics reasoning, like the block problem I described earlier. My guess is that this is because music represents a bit of a blind spot to the OpenAI developers. It could also have something to do with the fact that GPT-4 was trained on images as well as text, so maybe that somehow adds to its spatial reasoning in a way that it doesn't add to its musical reasoning. Perhaps if the GPT developers made a concerted effort with music, or made a multimodal
model that incorporates both MIDI files and text, we would start to see something interesting emerge.

That said, I couldn't help but keep probing GPT-4's enhanced reasoning abilities, trying to think of ways that I could outfox it. In particular, I wanted to find a prompt that a child can answer but that it could not, and I found a couple. The first one is the simple question of which letters directly follow vowels in the English alphabet. Pause the video and try it for yourself. What are you doing in your brain? For me, there's definitely a sense that I'm accessing a model of the alphabet and moving through it spatially. Whatever the case, GPT-4 messes up this question in a way that makes it look a lot more like a statistical machine than a thinking machine.

I also decided to look a little bit more closely at the block problem and created a variation. I asked it to create a three-by-three grid, place blocks at certain coordinates, and create a diagram showing the grid, and here's what it produced. Pretty good. However, when I asked it to imagine that gravity pulls the blocks downward so that there's no empty space below any of the blocks, it failed pretty spectacularly. I also tried probing its ability to create these diagrams a little bit further, asking it instead to make a larger 20-by-20 grid with 18 different blocks of three different colors. Now, for a human being this is the same problem, just a more complicated, pain-in-the-butt version of it. Anyway, here's what GPT-4 came up with, and here's that same graph annotated with the correct block positions. Among the many issues here, it only included two blue blocks when there were supposed to be six of them. If you study this closely, though, I think that some of the mistakes are kind of interesting. I'm curious if you have any thoughts about them in the comments.

My favorite test, though, was when I asked it to play a kind of anti-tic-tac-toe: the same rules as regular tic-tac-toe, except that the goal is to avoid getting three in a row. GPT-4 turned out
to be not only bad at anti-tic-tac-toe but also bad at analyzing the state of the game. We got to a point in the game where the only possible outcome was for it to lose by making three in a row, so I asked it, "Who do you think is winning?" Its response was, "I have no idea, call it a tie." Then, when it finally came to the point of playing the losing move, it played an illegal move. When I pointed this out, it apologized, played the losing move, and said, "Your turn." Clearly this whole process was very confusing for it.

So the question that I'm left with is: is writing music more like tic-tac-toe or like anti-tic-tac-toe? In my experience, music, and creative work more generally, is much more like anti-tic-tac-toe. When I start working on a piece, I don't really know what it's going to be, the rules by which it's going to operate, and as I work on the piece, the goalposts are constantly shifting as I re-evaluate my creative goals.

But what do you think of all of this? Has there been a real shift with GPT-4? And what would it take for one of these so-called AIs to be truly creative? Also, what do you even want the role of artificial intelligence to be in the arts?

Oh, and before we go, I wanted to offer a few resources I've found that offer interesting perspectives on GPT-4 and AI more generally. The first is a blog post by Bret Devereaux called "On ChatGPT," which helped me to see that language consists of two separate sets of relationships. The first is the set of relationships between words and the things that they represent in the real world, and the second is the set of relationships between words themselves. When we communicate with language we use both of these relationships, but large language models only have access to the relationships between words, which is why they hallucinate and make up citations. In terms of a more pro-GPT-4 perspective, I listened to a talk by Sébastien Bubeck called "Sparks of AGI," which detailed some of the remarkable things about GPT-4. In particular, I found the ideas about
GPT-4 linking with other tools to overcome its shortcomings quite provocative. Next, this write-up by Gary Marcus and Ernest Davis called "How Not to Test GPT-3" really helped me in my experiments with trying to outfox GPT-4. And in terms of a more zoomed-out perspective on AI and its potential effects on society, I've been enjoying Ezra Klein's podcasts. I'll link to a couple of episodes in the description.

Finally, these experiments with ChatGPT are only possible because of all the time and effort I've put into my open-source SCAMP libraries for computer-assisted music in Python. So if you've enjoyed this, I hope you'll check out some of my other videos on music coding, and perhaps subscribe to my Patreon. For this video I'm providing patrons with the prompts and code I used to create the music, and I'd love if some of you signed up and created some weird things.

[Music]
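
Editor's sketch (not from the video): the captions describe melodies written as pitch-duration pairs in Python syntax, a prompt asking for skips and a varied contour, and the idea of "metric phase". The snippet below is one minimal way to represent and check those ideas in plain Python. The melody values, the function names, the 3-semitone threshold for a "skip", and the 4-beat measure are all assumptions made for illustration; they are not GPT-4's output, the video's prompts, or the SCAMP API.

```python
# Hypothetical melody as (MIDI pitch, duration in beats) pairs.
# These values are invented for illustration, not taken from the video.
melody = [(60, 1.0), (62, 0.5), (64, 0.5), (67, 1.5), (69, 0.5),
          (72, 1.0), (67, 0.5), (64, 0.5), (62, 1.0), (60, 1.0)]


def count_skips(notes, min_semitones=3):
    """Count melodic intervals of at least `min_semitones` (assumed skip threshold)."""
    pitches = [pitch for pitch, _ in notes]
    return sum(abs(b - a) >= min_semitones for a, b in zip(pitches, pitches[1:]))


def count_turning_points(notes):
    """Count local highs and lows, a rough proxy for how varied the contour is."""
    pitches = [pitch for pitch, _ in notes]
    return sum((b - a) * (c - b) < 0
               for a, b, c in zip(pitches, pitches[1:], pitches[2:]))


def metric_phases(notes, beats_per_measure=4):
    """Return each note's onset position within the measure (its metric phase)."""
    phases, onset = [], 0.0
    for _, duration in notes:
        phases.append(onset % beats_per_measure)
        onset += duration
    return phases


# Prepending a half-beat note shifts every later note to a different metric phase.
shifted = [(60, 0.5)] + melody

print(count_skips(melody), count_turning_points(melody))
print(metric_phases(melody)[5], metric_phases(shifted)[6])
```

In this made-up melody the highest note (MIDI 72) lands on a downbeat; in the shifted copy the same pitches and durations put it half a beat late. That is the kind of misalignment the captions attribute to concatenating melodic segments without regard to where they fall in the measure.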
Info
Channel: Marc Evanstein / music․py
Views: 665,147
Keywords: GPT-4, ChatGPT, Music, Composition, Algorithmic
Id: d_7EsKcn8nw
Length: 15min 41sec (941 seconds)
Published: Sun Apr 23 2023