Librosa Audio and Music Signal Analysis in Python | SciPy 2015 | Brian McFee

Captions
All right, thanks. Can you hear me? Not too loud, not too quiet, just right. Cool. Hi everybody, my name's Brian, I do music and machine learning and stuff like that. I'm going to talk mostly about the nuts and bolts of the library that I developed to make the rest of my research possible, and not so much about the actual research, but if you're curious about it I'm happy to talk later.

Just a little bit of context: I come from a field called music information retrieval, which I'm pretty sure most of you have not heard of. We have a society, it's called ISMIR, and basically it's a loose conglomeration of people who are interested in analyzing, summarizing, and indexing musical content, and that can be anything from audio to scores to listening histories to historical documents, anything that's related to music. I mostly work on audio content analysis, so from my perspective the interesting problems are things like summarizing features of audio that are musically relevant: note onset events, beat tracking, structural analysis, higher-level things like predicting tags (is this a rock song or a hip hop song?), automatic transcription, recommendation, search, cover song ID, all that kind of stuff. That's roughly what I do, but I'm just one person in this kind of strange field. The community involves a lot of people coming from different backgrounds, much like this one but different: it involves a lot of DSP, so if you're happy with signal processing, good, plus machine learning and stats, but also the humanities end of things, so musicology, library science, and cognitive science are all pretty well represented. So that's roughly where this is all coming from.

My typical pipeline looks something like: get a bunch of audio signals, extract a bunch of features from them, build a statistical model, and then start to think about some questions you might want to answer with that model. This part of the talk, well, this talk entirely, is only going to cover that first bullet point and maybe a little bit of the second one, but not too much. That's what motivates librosa: it's a Python module to analyze audio signals in general, but geared towards music. What it includes are basically the nuts and bolts that you need to build an MIR system, reference implementations of some of the more commonly used methods, and helpers and shortcuts for commonly used things.

So why did I sink all this effort into it? About three years ago now, just shy of three years, we had our annual conference in the fall, and there was a late-breaking lightning talk session on adopting Python in MIR. As you might imagine, Python might not be the most natural choice for a lot of people, especially the DSP people who make most of the tools that we use; they really like MATLAB, and I'm sure you're all familiar with that kind of person. I used to be one; it got better. There was a lot of interest in moving to Python for a lot of reasons that I'm sure you're all familiar with: API integration, dealing with text better, reproducible methods, the list goes on and on. A lot of interest, not much movement, and basically what we identified in this half-hour session is that we lacked the infrastructure to make it work. We didn't have all the legacy infrastructure, tools, piles of MATLAB scripts, and so on that people had grown accustomed to using. I had just finished my PhD and I was looking for new stuff to do, and I thought: that's a problem I can help with, I can build infrastructure, I like doing that kind of thing. So that's what I did.
A quick word about the design goals of this library. Number one, it has to be easy to use for MIR people; in particular, it has to be easy to port over from MATLAB code. That means we're not going to have a lot of objects involved, there isn't going to be a lot of tightly coupled API; it's going to be pretty flexible, pretty open, and very easy to shoot yourself in the foot, but that's kind of intentional. We want things to be consistent: a lot of the code we're integrating comes from different sources, right, you have your STFT implementation, he has his, and they both generate roughly the same thing but they might be a little bit different. We want to be able to support backwards compatibility with everyone, and we want to do it consistently, with the same interfaces, same conventions, same variable names, and so on. We do backwards compatibility testing mostly through very exhaustive regression testing, and it's all automated. Things are modular, in the sense that if you don't like my particular onset strength detection function, you can plug in your own, stick it into the beat tracker, and off you go; things still work together nicely. And finally, we focused pretty heavily on quality control, so there's a lot of testing, a lot of documentation, and a real focus on making the code readable. It's all pure Python: you should be able to crack open any function, see how it works, and get a pretty quick idea of what it does. That mostly comes out of my own frustrations in porting MATLAB code, and I'm sure you're all familiar with that problem too.

The rest of this talk is basically just going to be demo time, because I don't think it would be that interesting or relevant to go into the details of how all this stuff works. If you want a thorough overview, we have a paper in the proceedings, which should be around somewhere, or you can check the API documentation if you really have some time to kill, or you can just pip install it and hope for the best. pip install will work; some of the dependencies are a little trickier to make everything fit together, but we're working on that. It works on 2.7 and 3.4, so we're all pretty happy.

Okay, so what does this actually do? The first thing you might want to do is load in some audio, which seems sensible. There's some boilerplate code to make this work, and it helps to actually import librosa; this is all just stuff to make things interact well. The first function we're going to look at is load, and it gives you back an audio time series as a numpy array, plus the sampling rate of the audio. By default everything is downmixed to mono and resampled to 22 kHz; you can override that easily. So we get y and sr back: a numpy array and an integer. That's what it looks like; the song is about a minute long, you can do the math if you want. Okay, so loading works, and this will handle pretty much any compression codec you want; it uses audioread on the back end, which multiplexes out to pretty much every other codec. If it works. Sometimes it doesn't. You can play things back; we have a demo track that ships with it, a nice Creative Commons Zero thing that I found on the Free Music Archive, so hopefully this will work. A bit too loud, okay. I'm going to be using this example a lot for the rest of the talk. Okay, that's it, it basically just goes on and on like that.
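The talk's notebook isn't reproduced in the transcript, but a minimal sketch of that first load step against the public librosa API looks roughly like this. The file path is a placeholder (how you fetch the bundled demo track depends on your librosa version), and the defaults shown reflect librosa's documented behavior:

```python
import librosa

# Placeholder path -- substitute any audio file you have on disk; audioread
# (and soundfile, in newer releases) handles most common codecs on the backend.
audio_path = "my_song.ogg"

# By default librosa.load downmixes to mono and resamples to 22050 Hz.
# y is a 1-D numpy array of samples, sr is the sampling rate as an integer.
y, sr = librosa.load(audio_path)

# To keep the native sampling rate and channel layout instead:
# y, sr = librosa.load(audio_path, sr=None, mono=False)

# "Do the math" on the duration: samples divided by samples per second.
print(f"{len(y)} samples at {sr} Hz -> {len(y) / sr:.1f} seconds")

# In a Jupyter notebook you can play it back with:
# from IPython.display import Audio
# Audio(data=y, rate=sr)
```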
If you're into plotting waveforms, we can do that pretty easily. When you plot waveforms you don't usually want to plot the time series natively; you want to downsample it and do some shaping, so we have a display module that does this kind of thing for you. You can see at the very bottom it does the time-axis formatting and all that kind of stuff, so that's kind of fun. I don't particularly like looking at waveforms, but it's a thing you can do.

More interesting are spectrograms: how the audio spectrum varies as a function of time. We store these in general as 2D numpy arrays, where the first axis is frequency and the second axis is time. A little bit of terminology if you're not familiar with this kind of thing: I'm going to use sr to denote the sampling rate, that's how many times we sample the signal per second; a frame is a short snippet of the signal; n_fft is the number of samples in a frame; and the hop length is how far you move the frame between each column of your spectrogram. We have some defaults that are consistent across all functions, but they're easily overridden. If you just want a spectrogram, you can do that: you call librosa.stft, and that computes the short-time Fourier transform. We also have an inverse STFT, an instantaneous-frequency STFT, and so on. It's pretty modular: you can plug in your own window functions, you can do all sorts of things, but I'm not going to dwell on this too much.

More fun stuff: you can plot spectrograms. This is showing time and frequency, just like I described, using the specshow function, which is a smart wrapper on top of matplotlib's imshow. It does a whole lot of stuff that I'm not going to get into, but there are a few things you should know. The first is that this is a linear frequency axis, which is usually not what you want; the reason is that all the action you care about is typically in the bottom of the spectrum, for music at least. So instead we can plot this on a logarithmic frequency axis; now if you look at the labels, this is log-spaced instead of linear, and you can see more structure.

Like Alan said earlier, the Fourier transform is the coolest thing ever; I would amend that to say it's really cool, but the constant-Q transform is also really cool, possibly even cooler. That directly gives you a logarithmically spaced frequency representation, and we do that with the cqt function. That looks like this, and it's nice because one vertical step is one semitone, so things are kind of shift-invariant vertically as well as horizontally in time. Logarithmic amplitude scaling makes it a little easier to see what's going on. This is also very flexible and has tons of parameters you can play with.

Okay, I'm going to zip through some of this stuff; there's more to it than just Fourier-style features, and you can compute all sorts of different spectral features. An interesting one is chroma: how much energy is in each pitch class, not each pitch. So where there's C1, C2, C3, C4, it collapses all of that down to C, and that gives us what we call chroma. It's easy to compute, you just call the chroma feature function, and it looks kind of like this: you can see the white keys of the piano up here, the song has a lot of E in it, and then there's this little chord progression that repeats a whole bunch.
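The slides and plots don't survive in the transcript, but a rough sketch of those spectrogram, constant-Q, and chroma views, written against a recent librosa release (names such as amplitude_to_db and chroma_cqt, and the ax= keyword on specshow, postdate the 0.4 API shown in the talk; the file path is again a placeholder), might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

y, sr = librosa.load("my_song.ogg")  # placeholder path, as above
hop_length = 512

# Short-time Fourier transform: rows are frequency bins, columns are frames.
D = librosa.stft(y, n_fft=2048, hop_length=hop_length)

# Magnitude in decibels for plotting (older releases called this logamplitude).
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

fig, ax = plt.subplots(nrows=3, figsize=(8, 9))

# Log-frequency axis: most of the musically relevant action sits low in the spectrum.
librosa.display.specshow(S_db, sr=sr, hop_length=hop_length,
                         x_axis='time', y_axis='log', ax=ax[0])
ax[0].set(title='STFT (log-frequency axis)')

# Constant-Q transform: logarithmically spaced bins, one semitone per bin by default.
C_db = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)), ref=np.max)
librosa.display.specshow(C_db, sr=sr, x_axis='time', y_axis='cqt_note', ax=ax[1])
ax[1].set(title='Constant-Q transform')

# Chroma: energy per pitch class, with octaves folded together.
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
librosa.display.specshow(chroma, sr=sr, x_axis='time', y_axis='chroma', ax=ax[2])
ax[2].set(title='Chroma')

fig.tight_layout()
plt.show()
```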
You can do all sorts of other spectral features: mel spectrograms, MFCCs, tonnetz features, spectral contrast, bandwidth, all that kind of stuff is in there. Those are fun to play with, but I'm not going to talk about them.

We have an audio effects module, and maybe I'll skip the examples, but this can be useful if you want to separate components. Typically when you want a harmonic representation, you want to throw out transients, because a drum hit isn't going to be all that harmonically relevant. There's a nice implementation of harmonic-percussive source separation; I'll just play the percussive component so you can get a feel for what this does. So that's the percussive part, and this is the harmonic part: we get the harmony but not so much the percussion and the transients, which gives us a little bit cleaner representation. Here we have the original CQT, the harmonic component, which has all the horizontals, the stable tones you want, and the percussives, which show up as vertical stripes. Depending on what you want to analyze, one or the other might be more useful.

You can do fun things like onset detection and beat tracking; those are pretty easy: get the onset strength and detect events. Here I'm plotting the onset strength function in blue and all the detected events in red, so each one of these red lines is, we think, a new note; don't worry about the plotting code, that's just to make it look nice-ish. You can track beats and estimate tempo too. Here you can see that the detected beat events are much more evenly spaced than the note onsets, because beats are kind of a subset of notes: they're evenly spaced, and they're pretty much correct. If you want to listen to what that sounds like, you can convert beat frame indices back into times, sonify them, and put a little click track on top. It sounds like this, so it gives you kind of a metronome; it's dynamic, it'll adapt to slight tempo variations and do the right thing.

Okay, the very last bit is how we deal with time, repetition, and structure, and this will be very, very quick. We can synchronize features between beats, which gives us some dimensionality reduction; you can do some history embedding to get context for features, so it's not just a single frame but the frames around it; and you can use that to build a recurrence plot and encode structure. If you have a real sharp eye, this would tell you that maybe the song is in 6/8, or possibly 3/4, if you look at the spacing between the repetition events. This is the kind of thing I like to work on: my research is analyzing this type of picture and figuring out what information is in there. There's a lot more in there to parse these things, filter them, and do fun stuff with them.
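A compressed sketch of that whole chain, separation, onsets, beats, a click track, beat-synchronous features, and a recurrence matrix, again written against a recent librosa release rather than the 0.4 API from the talk (for example, librosa.util.sync replaced the older feature-level sync, and the n_steps value here is an arbitrary choice), could look like this:

```python
import numpy as np
import librosa

y, sr = librosa.load("my_song.ogg")  # placeholder path again

# Harmonic-percussive separation: y_harm keeps sustained tones, y_perc the transients.
y_harm, y_perc = librosa.effects.hpss(y)

# Onset strength envelope and discrete onset events (as frame indices).
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
onset_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr)

# Beat tracking: estimated tempo plus beat positions (also frame indices).
tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print(f"Estimated tempo: {float(tempo):.1f} BPM over {len(beat_frames)} beats")

# Sonify the beats: a click track the same length as the signal, mixed over it.
clicks = librosa.clicks(times=beat_times, sr=sr, length=len(y))
y_with_clicks = y + clicks

# Beat-synchronous chroma: aggregate the frames between consecutive beats.
chroma = librosa.feature.chroma_cqt(y=y_harm, sr=sr)
chroma_sync = librosa.util.sync(chroma, beat_frames, aggregate=np.median)

# History embedding for temporal context, then a recurrence matrix that
# lights up wherever the stacked beat-synchronous features repeat.
chroma_stack = librosa.feature.stack_memory(chroma_sync, n_steps=4)
R = librosa.segment.recurrence_matrix(chroma_stack, sym=True)
```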
So I think that's where I'll stop. I've just shown you a few of the building blocks that are in there, but there's a lot more. We just released 0.4 two or three weeks ago, and we're still developing pretty actively. There are some related projects for doing other things that I can't spell correctly. We have pretty thorough documentation, code online, and lots of really awesome contributors who I would not have been able to do this without, so thanks to them and thanks to you, and I'll take any questions.

[Audience questions] Yes, it is all in Python. The STFT implementation is fairly well optimized; it does partial blocking, so it's not doing a single frame at a time and it's not doing the whole signal, it's doing chunks, and that chunk size has been tuned. But yeah, it actually is Python, I'm not hiding anything. I did try to Cythonize it yesterday and it didn't get any faster, so that's kind of interesting; I probably don't know what I'm doing.

I have a project that does transcription, not into score, but into NES chiptunes, so it'll take an audio file and convert it into two square waves, a triangle wave, and a noise channel. Depending on what you put into it, it works kind of okay, but for the most part, that's a very active research topic.

Okay, so the question was: can we do things like vocal activity detection, beat tracking, alignment, stuff like that? We can give you the features to do it. I don't want to start putting supervised models, things that are really statistically dependent on particular datasets, into the library; I want to keep those separate. But it can give you the features to do it. Thank you.

[Moderator] Thank you, we have to cut it here. Let's thank the speaker again.
Info
Channel: Enthought
Views: 56,573
Id: MhOdbtPhbLU
Length: 18min 10sec (1090 seconds)
Published: Wed Jul 08 2015