Anna Wszeborowska - Processing music on the fly with Python

Captions
[Music] This podcast is brought to you by LMU Munich. Okay, so Anna is going to give her talk on processing music on the fly with Python. Give her a big hand. [Applause]

Hi, my name is Anna Wszeborowska — a tricky one — and, as Alessandra said, I'm going to tell you how we can process music on the fly with Python. But before we dig into the details, let me introduce myself, what I do, and why I even talk about things like this. I work at a Berlin-based company called Ableton. It's a company in the music industry, especially popular among electronic music makers. We develop mainly three products. The first is a digital audio workstation, a piece of software where you can record, edit, and produce your music; but apart from that it was designed to be an instrument that you can take to the stage and perform live — hence the name. Last year we introduced Link, a technology that lets people jam together on different electronic instruments by synchronizing them in time over a wireless network. And last but not least, we developed a piece of hardware, which is what I work on at the company: this controller lets you control Live without looking at the computer screen, so it helps you capture your musical ideas quicker.

Okay, but let's get back to our main topic, which is processing music. What does it mean to process audio? Basically, any modification of the incoming sound you can think of. In this presentation, though, we're going to focus on a very particular example: transcribing music. What does transcribing music mean? It's presenting notes in a musical notation. What we are going to do is first play a sound, then read the audio stream, detect what notes were played, convert them to some notation, and ask another instrument to play it back for us. So we have to think of three things. First, how to read our data and how to store it so that we can process it later. Second, how to detect our notes — how to figure out whether a note was played at all, and then what pitch it had: was it a C or an E, and in what octave. And third, how to represent it so that other instruments are able to play it back, no matter what instrument originally played it.

Let's try to briefly answer these questions without getting into implementation details yet. First of all, what is an audio stream, and how do we read this data? An audio stream is a continuous signal, and when we want to perform operations on it we first have to digitize it, so that we have a finite set of numbers instead of the infinite set of continuous signal values. To come up with a finite set of numbers, we sample it: we decide on a sampling rate — how many samples of sound we take per second — and obtain those values. Now we have a discrete function with a finite set of samples, but their values still come from an infinite set, because they can be arbitrary real numbers of unlimited resolution. To fix that, we quantize the signal: we decide on a finite set of values — say, integers — that we map the real signal values into, and in the simplest scenario we just pick the closest integer to each amplitude.

Now that we know, in theory, how to read the signal, we need to decide how to store it: what data type to choose so that we can manipulate the data and perform operations efficiently, and how to store it without using up all our memory. The next question was how to detect notes. First we need to find out that a note was actually played. Here you see the waveform plot of two notes played over two seconds, so it's easy to tell where the onsets happened.
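The sampling and quantization steps described above can be sketched with NumPy. This is just an illustration, not the talk's code: the 440 Hz sine standing in for the "continuous" signal and the 16-bit integer range are my own choices.

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second, a common audio rate

# Sample: evaluate the "continuous" signal at discrete points in time.
t = np.arange(0, 0.01, 1.0 / SAMPLE_RATE)     # 10 ms worth of timestamps
amplitude = np.sin(2 * np.pi * 440.0 * t)     # a 440 Hz sine in [-1.0, 1.0]

# Quantize: map each real-valued amplitude to the nearest 16-bit integer.
quantized = np.round(amplitude * 32767).astype(np.int16)
```

In practice the sound card performs both steps; the program only ever sees the stream of quantized integers.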
Once we find an onset, we want to know what pitch the note had, so now we're concerned with the frequency of the sound. We can't really tell that in the time domain, which is why we convert to the frequency domain — and here you can see two significant peaks, meaning the two notes had different frequencies. Then, once we've found these values, we have to think about how to represent the note. In acoustic music, what we're familiar with is the musical notation everyone has probably seen, and we have to ask what its equivalent is in the digital world — what the standard is, so that other electronic instruments, like synthesizers, or other software can actually play it back.

This is a diagram of the application I prepared for you, so that we have a scenario to follow and can look at the implementation more closely later. Before we see it working live — I hope the demo is going to work — let's quickly go over the concept. We're going to play a note, and since we want to process the sound in real time, we read chunks of data at a time and process each chunk. The processor consists of two parts, as mentioned: detecting onsets and detecting pitch. Then we write the note down once we've found it and send it to some other instrument.

Okay, let's see how it works. First we'll see a pure Python version of the application. I'm going to play this wonderful thing — some people's elementary-school trauma — and I'm going to ask my application to play a piano note back whenever I play a note on this recorder. Let's see if it works. [Demo: after some trouble with the audio setup, the piano playback can be heard.] Okay, that worked. Now, with a very slight modification — where's my console — we're going to ask our application to use a piece of software to interpret the notes, and what we're going to use is the before-mentioned Ableton Live. Well, as with all live demos, everything can go wrong, and it usually does, but I hope we're back on track.

Let's now focus on some implementation details. What I used for reading the data was PyAudio, which is a set of Python bindings for PortAudio, a cross-platform C library for playing and recording audio. It also supports real-time input and output, which is what we need. This is how you instantiate the stream, and what's interesting here is that it can work in either blocking or non-blocking mode; we're using the non-blocking mode with a callback. Looking at the callback signature, what's important is that the callback always has to return a frame_count-sized array of data and a flag letting the application know whether it wants to receive more data.

So now we know how to read the data; let's think about how to store it. On the previous slide you saw that we read our data, which arrives as byte strings, into a NumPy array and convert it to integers — it's easier to manipulate integers than strings, right? Then you might ask why we use NumPy arrays and not plain lists. Part of it seems obvious: NumPy gives us a lot of useful optimized routines for operations on big matrices, which is what our data is, and it provides the more complicated operations we use heavily here, like the fast Fourier transform. But it's also worth knowing why NumPy arrays are actually more performant than Python lists, given that both are implemented in C. The difference is that a Python list can store elements of various types, whereas a NumPy array holds elements of a single type.
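A minimal sketch of the reading step just described: the callback receives a raw byte string, which we reinterpret as 16-bit integers with NumPy. The helper name and FRAME_COUNT are my own; the PyAudio stream-opening call is shown only as a comment, since running it requires PyAudio and a live input device.

```python
import numpy as np

FRAME_COUNT = 1024  # assumed number of samples delivered per callback

def bytes_to_samples(in_data):
    # The stream hands us a raw byte string; reinterpret it as 16-bit ints.
    return np.frombuffer(in_data, dtype=np.int16)

def callback(in_data, frame_count, time_info, status):
    samples = bytes_to_samples(in_data)
    # ... hand `samples` to onset/pitch detection here ...
    # Return a frame_count-sized buffer plus a flag telling PyAudio
    # whether to keep streaming (0 stands in for pyaudio.paContinue).
    return (in_data, 0)

# With PyAudio installed, the non-blocking stream is opened roughly like:
#   pa = pyaudio.PyAudio()
#   stream = pa.open(format=pyaudio.paInt16, channels=1, rate=44100,
#                    input=True, frames_per_buffer=FRAME_COUNT,
#                    stream_callback=callback)
```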
Because of that, when a Python list allocates memory it actually creates an array of pointers to Python objects, since it needs to store type information per element, whereas a NumPy array can simply store a pointer to contiguous memory — a NumPy array is essentially a Python object wrapped around a C array. As a result, NumPy arrays can benefit from vectorized implementations, whereas Python lists sadly need to perform a type check and dispatch for each element.

Let's continue. We've read our data and stored it in a NumPy array; now we want to process it. First we want to detect the onset — to know that a note happened. What we do is calculate the power spectrum of the signal, which basically represents the strength of variations in the signal. Here we see two big disturbances in the signal, meaning two notes were played. On top of it we apply the spectral flux, the green line. Spectral flux enhances these changes even more, because it shows how quickly the power spectrum changes over time: it's obtained by comparing the power spectrum of one segment of data against the previous one. Now we can see there are some peaks, but not all of them are relevant for us; we want to end up with just the two real peaks, not this many. So we apply a thresholding function, the red line here, which is obtained by defining what chunk of segments we're going to average over and then multiplying by a constant. That gives us two parameters to tweak: the threshold window size, which is the number of segments we average over, and the multiplier. We can tune them to obtain results that are satisfactory — here you can see we could tweak a bit more, because we're still not left with only two significant peaks, and in the final implementation it actually looks a bit different.

Once we've found our peaks, which are our notes, we need to determine what frequency they have.
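The onset-detection recipe just described — a power spectrum per chunk, spectral flux between consecutive chunks, then a moving-average threshold with a window size and a multiplier — might be sketched like this. Function names and default parameter values are my own, not the talk's code:

```python
import numpy as np

def spectral_flux(chunks):
    # chunks: 2-D array with one chunk of samples per row.
    power = np.abs(np.fft.rfft(chunks, axis=1)) ** 2  # power spectrum per chunk
    # Flux: how much each frequency bin *grew* versus the previous chunk.
    rise = np.diff(power, axis=0)
    return np.maximum(rise, 0.0).sum(axis=1)

def onset_mask(flux, window_size=10, multiplier=1.5):
    # Moving average of the flux, scaled by a constant; flux values that
    # poke above this threshold are onset candidates. Both parameters are
    # the knobs mentioned in the talk and can be tweaked until only the
    # real peaks survive.
    kernel = np.ones(2 * window_size + 1) / (2 * window_size + 1)
    padded = np.pad(flux, window_size, mode="edge")
    average = np.convolve(padded, kernel, mode="valid")
    return flux > multiplier * average
```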
To do this we compute something called the cepstrum. It can be thought of as the "spectrum of the spectrum" of the signal, and it's formally defined as the inverse Fourier transform of the logarithm of the spectrum. But let's not be scared of that — what we want to focus on is the inversion itself. Even the name "cepstrum" is derived from the word "spectrum" with some letters reversed, precisely to put emphasis on the inversion. So why would we invert a spectrum? When we calculate a Fourier transform, we find periodic patterns in our signal, meaning we find the sinusoids that appear in it. Here we want to do something similar, but with the harmonics: it's good to know that every note consists of a fundamental frequency and its multiples, and those harmonics appear in our spectrum with a certain regularity of their own. The cepstrum's axis is called quefrency, which is like frequency but inverted, so high frequencies are represented at the beginning of it.

Okay, enough signal analysis for now. In the cepstrum we want to find the tallest peak, because it corresponds to our fundamental frequency. Before we do this, we can narrow the search down to the frequencies that are interesting to us — I narrowed it to the frequencies I can play on this recorder, which is roughly 500 to 1200 Hz. So we apply the narrowing, find the maximum value in the cepstrum, and then convert from the quefrency domain back to the frequency domain, which is done by simply dividing the sample rate by the quefrency value of the peak. In the example we've been analyzing the whole time, the peak in the narrowed cepstrum is around index 28, but we have to remember that we narrowed it, so we have to find the index where the window starts — the quefrency for 1200 Hz, which is number 36 — and add it to our index. That's how we obtain our fundamental frequency, which in this example is 689 Hz, corresponding to the note F, I think. As we could also see, this applies a correction to our onset-detection algorithm, because we eliminate notes from the frequency ranges we're not interested in.

So we have our note and we have its pitch; now we want to write it down somehow. We know we can just write notes on a staff for acoustic music — what do we use for electronic music? We use a standard called MIDI. MIDI stands for Musical Instrument Digital Interface, and it's a protocol that defines how electronic devices can communicate with one another. It sends MIDI messages that consist of three bytes. The first byte contains the message type — we usually use note-on and note-off, to indicate that a note should start or stop being played — and the number of a channel; the channel is encoded on four bits, which means we can have up to 16 channels. The other two bytes carry pitch and velocity, where velocity is perceived as the loudness of the sound. As we can see, we have seven bits for the pitch, which means it can take values from 0 to 127. But then how does that correspond to the frequencies we have — how do we map these values to frequencies? This chart shows what MIDI values different notes have; in our example, the note F has the MIDI value 77.

So we found our notes, wrote them down, and encoded them; now we can feed them to some other application or a synthesizer. That's what we did: the first, pure Python implementation was using pyFluidSynth, which is a Python binding for FluidSynth, a real-time software synthesizer for generating music. It can convert a MIDI note to an audio signal using something called a SoundFont. SoundFonts define instruments.
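The pitch and MIDI steps above can be sketched together: the cepstrum as the inverse FFT of the log magnitude spectrum, a peak search restricted to the 500–1200 Hz band mentioned in the talk, and the standard frequency-to-MIDI formula (A4 = 440 Hz is MIDI note 69, 12 semitones per octave). The function names, the Hann window, and the defaults are my own illustrative choices, not the talk's code:

```python
import numpy as np

def detect_pitch(samples, sample_rate=44100, fmin=500.0, fmax=1200.0):
    # Cepstrum: inverse Fourier transform of the log of the spectrum.
    spectrum = np.abs(np.fft.fft(samples * np.hanning(len(samples))))
    cepstrum = np.abs(np.fft.ifft(np.log(spectrum + 1e-12)))
    # Quefrency q (in samples) maps back to frequency sample_rate / q,
    # so the *high* end of the band gives the *start* of the window.
    q_min = int(sample_rate / fmax)   # e.g. 44100 / 1200 -> 36
    q_max = int(sample_rate / fmin)   # e.g. 44100 / 500  -> 88
    peak = q_min + np.argmax(cepstrum[q_min:q_max])
    return sample_rate / peak         # back to the frequency domain

def freq_to_midi(freq):
    # Standard mapping: A4 = 440 Hz is MIDI note 69, 12 semitones/octave.
    return int(round(69 + 12 * np.log2(freq / 440.0)))
```

For the 689 Hz example from the talk, `freq_to_midi(689.0)` gives 77, matching the MIDI value shown on the chart for that F.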
I chose this lousy piano, but there are many other sounds you can find — there are libraries available online. In the second scenario we could reuse a bigger application, thanks to setting up a virtual MIDI port that we write note information to and that our digital audio workstation can then read from. We set it up using a library called simplecoremidi, written by a colleague of mine, and it really is that easy: two lines, one to set it up and the other to send notes. As it uses CoreMIDI, a macOS framework that provides an API for communicating with MIDI devices, it unfortunately only works on macOS, but there are plenty of other solutions available for other platforms — probably not as easy, but it's possible.

So, to wrap it up, what conclusions can we draw from this kind of hack? First of all, even though Python would not be an obvious choice for audio-processing applications, we can see that it's good enough to code something like this: to prototype our ideas quickly, to try out different solutions, to identify the bottlenecks of our application. And it's all possible thanks to our amazing numerical libraries, especially thanks to super-simple input/output operations — super simple compared to C or C++, which are the usual choices for audio applications — and a set of really good wrappers around very useful libraries with great APIs. The code that got executed today is available on GitHub, and you can see that it's not a lot of code at all. So yeah, that's it — I can only encourage you to play around with Python and come up with musical hacks as well. Thank you.

[Q&A] Question: is this able to detect two different notes at once? — The thing is that this hack is focused on monophonic sound; that's why it says real-time monophonic audio to MIDI. In a polyphonic scenario, pitch detection is much trickier because of the overlapping frequencies, as you noticed. That's why it's so tricky: pitch-detection algorithms for polyphonic sounds are not super effective — even the best ones reach only about 70% accuracy or so. So it is possible, it's just not always very accurate.

Question: does it matter what instrument is played — could it detect, for example, a violin, or a human voice? — In our scenario it doesn't matter what instrument is being played. If it's played note by note, it's going to work; I can even yell at it and that will be detected. Detecting what words were said, or whether it was a human or a violin, is a separate problem — different spectral features would have to be analyzed. Here it's just the pitch, so everything that has a pitch is going to be detected; that's why I was sometimes clapping, sometimes yelling, sometimes playing a flute. There are instruments that don't have pitch, like percussion instruments. Any other questions? Okay, thank you very much.
Info
Channel: Epic Python Videos
Views: 55,974
Id: at2NppqIZok
Length: 24min 26sec (1466 seconds)
Published: Fri Nov 18 2016