An Introduction to Building tools with FFmpeg libraries and APIs - Matt Szatmary | August 2019

Video Statistics and Information

Captions
So, I'm going to talk a little bit about using the libav* libraries that are part of the FFmpeg suite. I really tried to condense this down from a larger talk, so it might not flow perfectly, but let's see how this goes.

Basically, what I've noticed a lot in my career is that a lot of people use ffmpeg as part of a bigger system. This was about 30 seconds of GitHub searching for people who just call system, or launch a process running ffmpeg, with a whole string of commands to try to get their video job done. I'm not saying there's anything wrong with this. If you do this, that's great; there's absolutely nothing wrong with that at all. However, ffmpeg is, well, "wrapper" is not the right word, but it is the interface on top of the libav* libraries. This isn't the same thing as the Libav project; I'm not going to get into that. But the libav* libraries, which that project was probably named after, are really the workhorse underneath ffmpeg.

It involves quite a few libraries. There's libavformat, which does the file parsing: it will read and write MP4, MKV, and so on, and it happens to handle a lot of the protocols as well. There's libavcodec, which does the encoding and decoding. It might do that internally, since FFmpeg ships with a lot of its own codecs, or it might outsource that to something like x264. There's libavfilter for the filters. There's libavdevice for interfacing with hardware like your DeckLink. libavutil is just code that's shared across the others. Then there's the software resampler and the scaler. There are all these libraries that ffmpeg uses to get the job done.

This is a quick video of just the bottom of the ffmpeg_opt.c file, showing all of the options that you can pass in on the command line. And this is just the initial options: it doesn't include filters, and it doesn't include some of the more advanced stuff. This is just the absolute basic list of things that you can pass in on the ffmpeg command
line, which just goes to show you how powerful this is behind the scenes: all these things have to be converted into commands to drive those libraries. So the ffmpeg executable is really responsible for taking those options, parsing them, and then just running the main loop. That main loop goes through the input, basically frame by frame or packet by packet, and calls all the necessary functions in those libraries to get the end result that you asked for. Again, it's an amazing tool, but personally what I find is that those libraries are actually way more powerful than what ffmpeg can expose, because they can do so much and because a command line can only express so much. So I really believe there is an opportunity for a lot of people to build their own custom tooling on top of these libraries. I'm not saying that needs to be done for everything; it's just an option for when you need it.

A common way of doing that, of course, is to take ffmpeg and patch in what you want. The big problem with that is this: if you're doing something that's very useful for the whole community, and you have the time and effort, you should really be committing those patches back. That's not always practical. Sometimes you can't contribute the patch back because you changed a default, and you can't send that back to the community because they expect the defaults to remain the same. So now you need to add new flags or switches to the command, which makes your work larger just to get done what you need to get done. Then there's the time and maintenance: once it's in the package, you're kind of responsible for it, and you have to keep it up to date. Again, if it's something that's great for the community, I highly recommend you take advantage of that. But your patch might not apply to others; you might have something that is only necessary for your company or your individual use case, so applying it would actually not be good.
It might be secret sauce: you can't submit it because it's proprietary, patented maybe. Or your company might just have a shitty open-source policy, in which case Mux is hiring as well.

So because of this, what I see a lot of people go to is a custom fork. They take ffmpeg, git clone it, and start tweaking. The biggest problem with that comes at the upgrade. Now you have your custom patch, you do a git pull or git merge, and everything just blows up: tons of merge conflicts. In this case you oftentimes start using unstable APIs, and now the code doesn't even compile once you resolve the merge conflicts, which means you're inevitably running an old version of ffmpeg. This leads to many security problems. There are security patches all the time in this system, so if you're running an old version of ffmpeg, you're susceptible to those security flaws.

So basically what I'm saying here is that you can use these libraries to build your own tooling when applicable. Up here are a few links to some tutorials on how to do that. I'm going to walk through some code next; I just wanted to point these out because they are way more complete than what I'm going to go through. The ffmpeg examples are great, but they're not well documented, so you'll see a lot of function calls, and you know you need to call them, but you don't know why. The next tutorial I just found today; it seems to be pretty good. And then of course there's the, I don't know how you say it, "D. Ranger" or "Dranger" tutorial. I don't know what it is, but that's been around forever. It's a great tutorial, but it's continually going out of date just because ffmpeg keeps changing. If you go and download tutorial 1 right now, it won't compile, because ffmpeg has changed since that tutorial was last updated.

So I'm going to go through some code, and this is basically the workflow I'm going to follow. I originally planned on doing an encoding example, but part of the subtext I'm trying to give here is that this isn't incredibly difficult, and going through an encoding example kind of worked against me in that case. So instead of the encoding example, it was actually easier to write a quick GUI app and just display the frames on the screen. What I'm going to go through here is the basic workflow of decoding a file with FFmpeg. The process is: we open the file and analyze it; we configure the decoders; we read frames from the file and send those to the decoders; we read the results back from the decoder, which I'm going to display in this app; and then at the end you have to drain the decoders and clean up the whole library.

I just wrote this utility using the Qt library for the GUI. That's not necessary; it's just what I was able to put together quickly. Everything that I'm using from the libav tools I wrapped in this libav include header, which I will open-source after this. The only reason I haven't yet is that it's not in a great state, but I'll clean that up and open it up.

Let me start by showing you the app so you can see how it works. It's just a quick GUI app. Open file, I made this little clip file, and it plays. I don't have it reading timestamps, so the frame rate's wrong. I created this decoder thread just to keep the GUI responsive. Normally you don't have to create your own thread here, but since it's a GUI app I had to keep it responsive. And this play function
is basically the entire set of instructions I described before. We open the file, which is on line 47. We configure the decoders on line 48. We read the packets from the file in this for loop, we send each packet to the decoder in the send packet call, and the decoder returns the frame here. This part is just to keep the thread responsive. And then of course we drain at the end. Obviously, for those of you who know libav, this is not standard libav code; I've abstracted this away to make it look way easier. But this is basically the core loop.

So I'm going to set a breakpoint there and watch the debugger. I promise I'm not going to spend an hour going through code; that's why I got rid of that encoding example. I'll open my clip, and I hit my first breakpoint. So, to dig into the ffmpeg, I'm sorry, I keep saying ffmpeg when I really mean libav code: that line is just converting the string, and here we are in the open input.

What I did in this library is I took a lot of the ffmpeg concepts and abstracted them away slightly. If you noticed, that function before didn't have any cleanup code. There's actually a lot of cleanup code, but I made it automatic with this abstraction, and I can show that here. To keep things consistent, I tried to name things in a familiar, similar way. So there's the AVFormatContext: normally you have one from libavformat, which is designated here by the two colons, meaning you're using the one from the root, or global, namespace. Stepping through here, basically anything preceded by the two colons is the ffmpeg, I'm sorry, libav function; anything without them is my code that calls it.

Here we have the open input call (avformat_open_input in libavformat). This basically just opens the input file. You pass in the URL and a pointer to a pointer to a context, and that is so libav can clean up after itself and null out the pointer later, to make it a little bit safer. In our case
we're not going to use that feature, but it's good they did that. So first we open the file by passing in the URL, and the result is that it populates this format context. We get an error code back, and in this case it's zero. You can't really see it in the upper right there, because I can't make that text larger, but it's zero, so it succeeded.

Next we call find stream info (avformat_find_stream_info). This actually opened the file and read some of it. It looked for what the tracks are, what the codecs are, and what the format is, and it tried to read just as much of the file as it needed to continue on. Sometimes that's a little, sometimes it's a lot. This is sometimes called the analyze or probe duration on the ffmpeg command line (the -analyzeduration and -probesize options). In this case, again, it succeeded and returned zero as the error.

And here is where I talked about the cleanup. I take that pointer I get back and, using a C++11 unique pointer, I immediately attach the deleter. This means that later on, when this goes out of scope, it automatically calls the correct close input function, and I don't have to worry about leaking resources. It's all automatic from this point on. So now I've opened the file, I've analyzed it, and I've found all the tracks.

Next I need to configure the decoders, so I'm going to step into that. I'm going to try to open a video and an audio track. The clip I used only has video, just to make the example easier, but I'm going to try to open both. I'm going to try to open the video track using this little open function I created here, which is a lambda. It uses the libav function to find the best stream (av_find_best_stream), so it basically tries to find the primary video stream. In the case of MP4 or transport streams, you might have multiple streams in the same file; this will try to find the best one, whatever it deems is best. Once it has found the stream, it returns a pointer to the codec object for the codec
that stream is using. We're now going to configure a decoder. On line 160 we create a decoder using the allocate context call. This just sets up the decoder in memory; it doesn't do any initialization. The next line, 161, takes the codec parameters from the stream we just opened and copies those parameters into the context we just created. So basically we created a decoder, and now we're setting it up to decode this specific track by copying its parameters over. Once those parameters are copied, we call open2 (avcodec_open2). This says: okay, now actually initialize the codec and run all the code you need; we have the structure set up, so let's go ahead and open the codec to do our decoding. And then finally, the same thing I did before: we take the pointer that we just created and wrap it in a unique pointer so it cleans itself up automatically at the end. I'm also going to return an index, which I skipped up here at line 156; it's the index of the video stream we just read. In this case it's going to be 0, since there's only one track. And then it returns all that.

We're going to try to do the same thing with audio. As I said, this file has no audio, so it's going to fail immediately. The one difference here is that it passes in the video index, so if you do have multiple audio and video streams, it'll pick the correct audio for the video that we've already selected. But this will just return null.

I'm sorry, is this text large enough? Can everyone see well enough? Okay, good.

So now I need to keep this codec context around, so I'm just going to stick it into this map I made; we'll come back to that later. The audio failed, so it's going to skip over that. So now we're ready to decode. This part is a modern C++ feature, which personally I find very convenient. If I step into this code, what you'll see is that it actually ends up calling this begin function. In C++11 and beyond, anything that has a begin and an end function that returns something that can be
incremented, you can use in a for loop. In this case the begin function is creating an AV packet and passing in the context; get is the way of getting the raw pointer out of the unique pointer, so it's passing in the context that we created earlier. So that creates this AV packet. Again, we create the packet at the beginning by calling av_packet_alloc and then setting up the deleter, so the packet cleans itself up at the end. Next we were supposed to read the frame, but my breakpoint wasn't there, so it skipped over it. We'll be reading many frames, so I'll come back to that in a second. Our function pulls the first frame out. Now, this frame is still encoded; this is the raw encoded frame.

So now we can send it to the decoder using this send packet call. Again, this send packet is a wrapper I built: it takes the context, it takes the packet that we're getting out of the for loop, and it takes a function that we will call when we get a frame. We look up the codec context that I created earlier in the map; we found it, so we continue. We send the packet to the decoder, we create a new frame to store the result in, and then we say "give me the frame", and we get an error. The reason we get an error is that ffmpeg, or I'm sorry, libavcodec, spawned a new thread behind the scenes to do this decode. When we gave it the packet, it stuck it into a new thread and of course returned immediately in order to keep the application going. This is why later on we have to drain.

I'll step through there. It skipped the frame callback, and we come back to our loop, which skips the lambda again; that thread thing is just to keep the GUI responsive. So we go through the for loop again. This time I'm going to step through the read frame, which I accidentally skipped the first time. We use this read frame with the format context that was passed in at the beginning, which returns the pointer to this packet. We got a frame. So here is something interesting.
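A note on the send-and-receive step just described: a decoder may hold on to several packets before it hands back its first frame, which is why the first "give me the frame" call reports an error and why a final flush is needed. The block below is a minimal sketch that models only that hand-off and does not call libav. ToyDecoder, Status, and decode_all are hypothetical names invented for illustration; the real calls (avcodec_send_packet and avcodec_receive_frame, which report AVERROR(EAGAIN) and AVERROR_EOF) need the FFmpeg headers and a real stream.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical status codes standing in for 0, AVERROR(EAGAIN), AVERROR_EOF.
enum class Status { Ok, Again, EndOfStream };

// Hypothetical toy decoder that keeps `delay` packets buffered before it
// releases any output. It models the hand-off only, not real decoding.
class ToyDecoder {
public:
    explicit ToyDecoder(std::size_t delay) : delay_(delay) {}

    // Queue one packet. A null pointer plays the role of the flush packet.
    void send(const int* packet) {
        assert(!draining_ && "no packets may follow the flush packet");
        if (packet != nullptr) {
            pending_.push_back(*packet);
        } else {
            draining_ = true;
        }
    }

    // Try to pull one frame. Again means "feed me more input first".
    Status receive(int& frame) {
        const bool ready = pending_.size() > delay_ ||
                           (draining_ && !pending_.empty());
        if (ready) {
            frame = pending_.front() * 10;  // marker transform, not a codec
            pending_.pop_front();
            return Status::Ok;
        }
        return draining_ ? Status::EndOfStream : Status::Again;
    }

private:
    std::size_t delay_;
    bool draining_ = false;
    std::deque<int> pending_;
};

// Feed every packet, collect frames as they become available, then drain.
std::vector<int> decode_all(const std::vector<int>& packets, std::size_t delay) {
    ToyDecoder decoder(delay);
    std::vector<int> frames;
    int frame = 0;
    for (int packet : packets) {
        decoder.send(&packet);
        while (decoder.receive(frame) == Status::Ok) {
            frames.push_back(frame);
        }
    }
    decoder.send(nullptr);  // flush: ask for whatever is still buffered
    while (decoder.receive(frame) == Status::Ok) {
        frames.push_back(frame);
    }
    return frames;
}
```

With five packets and a delay of two, decode_all returns five frames: three come out inside the loop and the last two only after the null packet is sent, which is the drain step.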
I was going to spend time on this, but I'm actually going to skip it. I'm converting to a standard time base so that all of the frames operate in the same time base. Normally every single track might have its own time base, so comparing timestamps gets very complicated. Here I'm using the flicks timescale that Oculus developed, which is very cool. I was going to talk about it, but I would probably spend thirty minutes on it, and I know you're starting to get bored already, so I'm going to skip that.

So we now have the packet, we come through and return, and we still don't have enough video for the decoder. I'm going to set a breakpoint there and just hit run. I'm going to get rid of that breakpoint, and eventually we're going to hit this on frame function. This on frame function uses the swscale library to convert the AVFrame object to an image format that my GUI library wants to use. You'll notice on lines 22 and 23 that I'm not actually changing the size; I'm converting from a width and height to the same width and height, but I am changing the pixel format from whatever it is to RGB24. The conversion happens on line 28, where it actually populates the pointers of the image, and then this emit on frame sends that frame to the display. So now the video is starting to render back here, except that because I'm in debug mode things are very slow.

That loop just continues until we reach the end of the video. Let me skip back out of here, one more, and get to the drain of the function. Set that breakpoint. Oops, move that breakpoint. If I hit that and move back, we should see the video playing now, because I removed all the other breakpoints, so it's just looping as fast as it can. And then it hit that new breakpoint. So I reached the last frame in the file, but there are still frames cached in the video decoder, so we have to drain those. That's easy enough: we just use the very same send packet function, except we create a
null packet, with a data pointer set to null and a size of zero. I create that dummy packet and send it to the same send packet function that I was calling before, and that should go through this loop and drain the rest of the frames. So that should flush it out, although the frame looks very similar because that was still the same scene. We get to the end and we exit. As soon as I hit that button, all of those destructors, all those deleters that I attached to the unique pointers, ran when we hit that last line. So when we hit that line, all the ffmpeg, or I'm sorry, libav libraries cleaned themselves up right at that instant, and we reached the end of the file.

So that's the basic functionality. The great part about this, of course, is not that workflow; it's that here you have your opportunity to inject your own code. If you need to manipulate the raw packet before it goes to the decoder, you have it right here: there's the packet data, a pointer to the raw packet. If you want to manipulate the raw video after the decoder, we have those pixels right here. I just did a quick memset here as an example. This just takes planes 1 and 2. I'm going to stop this code. I just added those two lines; they just write the value 128 across the second and third planes. Run it, open my file, and of course it's black and white, because I cleared out those two color planes. And of course it popped into color at the end, because I did not put the code there.

This becomes very powerful, because now you have access to all the data in the stream. You have the raw data of the video, you have the raw data of the decoded video, you have all of the timestamps; you have every bit of detail in the entire video, and you have an opportunity to change or manipulate it as the process is going through. And in the grand scheme of things, using an abstraction like this let me write this program in, you know, 10 to 12 lines of code, using that
abstraction library, of course. Obviously this gets much bigger. There is the filter graph, which I was going to cover, but I've already gone way too long. This will let you set up a rotate. All the filter graph code is there, and this will rotate the video by 12 degrees and play it. Oops, that's because I'm calling emit frame and not the filter frame function. There. Now the video is playing tilted 12 degrees. So you can build the whole filter graph with just little strings like that. I've actually written the code to turn this syntax into a full filter graph; that's in the component I'm going to open-source.

Again, I'm not sure if I made my point, which is this: if you use these libraries, there's so much potential that opens up. You can build new tools that ffmpeg might not let you build on the command line, while simultaneously making it easy to pull in a new version of the library. If you were to update ffmpeg behind the scenes, it'll probably just compile, because that API is very stable. And if it did change over the course of years, you might have to change one or two functions here or there, but you won't have the merge conflicts, you won't have the problems of committing back, you won't have all that. I don't really have a finish, but that's pretty much what I wanted to show you guys tonight.

Is there a C API for this? Yes. libav actually is a C API, and I originally started writing this using the C API. This is actually an example of the C API, and it does something very similar to what I did, except it just writes frames from the file to PPM files on disk. I was originally going to show something like this, but I felt that demonstrating the ease of use was actually a little bit better. I thought the wrapper proved my point a little bit more about being able to simplify things, especially around cleaning up. You'll notice this little tiny function has one, two, three, four, five, six, seven
cleanup lines in order to make sure you get it right. If you miss any of these, you're going to be leaking memory. Those are the things that I really feel make using libav hard. So by abstracting it, by using modern C++, by using ranged for loops, by using unique pointers, I really feel it simplifies it. So yes, there is a C API; that is the default API. What I showed here was a small abstraction on top of that.

Everything I showed here is all done on the CPU. "GPU decoding" is a little bit of a misnomer in a lot of cases, because the GPU "decoder" is typically a hardware decoder that's part of the GPU silicon, but it's not actually using the normal CUDA cores, for example. There's usually a little piece of silicon off to the side specifically for accelerating video encoding and decoding. Everything I showed here is software driven, but you can use hardware with ffmpeg. Earlier, when I opened the codec, I just used the codec returned by find best stream, but you can look up a specific implementation of a codec at that point and pass it in to open a specific decoder. So if you had a hardware decoder or encoder, that's the point where you would do it. Here I'm calling find best stream and I get a pointer to this AVCodec object, which is a specific implementation of a codec. There's actually a find codec function, and you can say "give me the Quick Sync decoder" or "give me the NVENC decoder" at that point, and then configure that with open.

In my experience, modifying the ffmpeg library becomes problematic, not at the moment but down the line. I've seen that done in a number of cases, but usually what I've seen happen at that point is that organizations tend to get stuck on that version of ffmpeg forever, because they might go pull the latest ffmpeg or libav, and then they'll have conflicts, and they'll
try to fix the conflicts, and then run the code, and then it crashes, and they have to go debug that. That, in my experience, is what happens in those scenarios. Then they tend to just stick on that version for a very long time, and maybe once every two to three years they pull in another version of ffmpeg.

Why is that not going to happen here? The reason this is a little bit different from that perspective is that the high-level API changes a lot less often, and when it does, the documentation tends to keep up with it. I mentioned that this tutorial file won't actually compile as downloaded. I actually already fixed this one to make it compile, but you'll notice a lot of these warnings saying "it's deprecated, it's deprecated". These deprecation warnings are there because the function was specifically marked deprecated in the header file, and if you go and look at that function, it'll say "use this function instead" or something along those lines. So if you do pull, and the high-level API does change, there's a guide to help you get to the updated API. Whereas if you changed the ffmpeg.c file and you do a merge and you have this giant conflict, there's no guide to help you. You have to go in and figure out all the things they changed and how to reapply your changes on top of that. I'm not saying that happens to everyone; I'm just saying that in my experience it's happened a lot.

With a fork, if you type make, you're not going to get these same warnings; you get errors, merge conflicts or compile errors. Whereas in this case they may have deprecated an API, but it'll still work for a while, and they'll lead you to the new one. Like this line here about the codec field being deprecated: that was replaced with the codec parameters field (codecpar), for example. But it still works and
it's still here, and I have a path to get to the new version. This isn't a "codec doesn't exist" error where you say "what happened to codec?" I have a path forward using the high-level API. [Applause]
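The cleanup pattern from the talk, wrapping each pointer the C API hands back in a unique pointer with a deleter attached, can be sketched without linking FFmpeg. The block below is a minimal sketch under that assumption, not the speaker's wrapper: Resource, resource_alloc, resource_free, and use_two are hypothetical stand-ins. They are shaped like av_packet_alloc and av_packet_free, where the free function takes a pointer to the pointer, and the live counter exists only so the cleanup can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style resource standing in for AVPacket, AVFrame, etc.
struct Resource {
    int id;
};

static int g_live = 0;  // number of live resources, for observation only

Resource* resource_alloc(int id) {
    ++g_live;
    return new Resource{id};
}

// Same shape as av_packet_free(AVPacket**): frees and nulls the pointer.
void resource_free(Resource** r) {
    if (r != nullptr && *r != nullptr) {
        delete *r;
        *r = nullptr;
        --g_live;
    }
}

// Deleter that adapts the pointer-to-pointer free function to unique_ptr.
struct ResourceDeleter {
    void operator()(Resource* r) const { resource_free(&r); }
};

using ResourcePtr = std::unique_ptr<Resource, ResourceDeleter>;

// Attach the deleter at the moment of allocation, as described in the talk.
ResourcePtr make_resource(int id) {
    return ResourcePtr(resource_alloc(id));
}

int live_resources() { return g_live; }

// Two resources and an early return, with no explicit cleanup lines.
int use_two(bool fail_early) {
    ResourcePtr a = make_resource(1);
    assert(a != nullptr);
    if (fail_early) {
        return -1;  // a is released here automatically
    }
    ResourcePtr b = make_resource(2);
    return a->id + b->id;  // b, then a, are released after the sum is computed
}
```

Calling use_two(true) or use_two(false) leaves live_resources() at zero either way, which is the property the seven manual cleanup lines in the C example have to guarantee by hand.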
Info
Channel: SF Video Technology
Views: 9,394
Rating: 4.8780489 out of 5
Id: fk1bxHi6iSI
Length: 30min 58sec (1858 seconds)
Published: Tue Sep 24 2019