LIVESTREAM: Setting up FFmpeg and OpenGL in C++ for real-time video processing

Captions
Okay, I believe we are live, which is good. Finally! This was supposed to be streamed on the 9th of November from home, but my wifi could not handle a whole livestreaming setup, so I'm trying again, this time from the office, and hopefully we'll have all the bandwidth we need.

Let's start quickly with context, if you're joining the stream without it. I made a four-minute video earlier this week describing what I want to do: build a project in C++ that does all sorts of video processing, to create an app for generating visuals for live performances. I thought it would be interesting to livestream the entire development process, to make this whole field a bit more accessible, because in my experience of exploring these libraries and trying to learn how to code this project, it was quite tough to find good resources. I was stuck for about two days just trying to get the library to link properly and compile. So I want to show you these weird problems as I solve them, and hopefully people joining the stream can help me out too: if I'm struggling with something, maybe you can spot it.

Overview of this stream: first, I'm going to explain in detail the app I want to develop. If you're not interested in that and just want to learn how to set up FFmpeg and start decoding video files in C++, skip ahead to the point where I start doing that; I'll put a timestamp in the description (if you're watching live, you'll obviously have to sit through the first explanation). Second, I'm going to set up a basic app that lets us display things on the screen, using OpenGL. That will help with what we do later with FFmpeg: once we have a basic app that can display things, we can start displaying the frames we decode. I want to do both of those in this one session.

So what is this app I want to make? If you've seen the announcement video you've already heard about it, but I'll give more context this time. Essentially it is a visualization tool that should embellish a live-looped music performance, and I'll use the example from the announcement video. This is me in my bedroom at about 1:00 in the morning, recording a live-looped performance. Let me show you the Ableton project, if I can find it real quick... stream announcement, no... stream demo, streaming demo, live demo song... live project, OK. What I wanted to do is create a few different layers of harmonies that I would build up and could then play in the chorus. The way I did that in Ableton: I've got everything in the arrangement view, and when I start playing from here (let me set it up; that should be correct) we've got this sample I'm using, from Yussef Dayes and Alfa Mist's "Love Is The Message", and it starts with all of this. Not sure if you can hear that. So we start with this loop, and here the automation lets me start recording the loop region of a bass; then we play this guitar loop and, again through automation, I start recording a vocal, followed by another vocal, followed by another.

The times at which I record all of my loops are set ahead of time, so it looks like this: that's my bass loop, which is now looping, and here I'm layering another harmony on top. If I skip through the video, we now have about three layers... and when we get to the chorus, we have those backing vocals too. I don't want to go into too much detail here, but during this performance I was recording loops and then playing those loops back later in the performance. I can explain that now, because I'm sitting here separate from the performance; but to anyone in the audience who isn't completely paying attention, the chorus might sound like I'm singing one thing, playing one thing, and everything else is a backing track prepared in advance. They might not be very engaged, because it appears to be a performer doing just one thing, which isn't an impressive feat. Looping all of this and recording everything live, in the moment, actually is quite impressive, and that's the interesting part. I want to draw attention to the fact that this is all looped live, and I want to show that on a screen: project something so that people can see what they're listening to.

The way I suggested doing that in my stream announcement looked like this... let me find it... yeah, here. During the chorus you would see a video for every sample being played (the only thing missing here is the guitar sample, because I don't have a video for it). You would have videos for the pre-recorded samples, and you would also have captured moments from the live performance projected on the screen. So I want a live-looping app that follows exactly what's happening in an Ableton Live set, or whatever program you're using to make the music: whatever you do with the audio, it should be replicated visually behind you on a projection.

How does that work for the performer? When creating their live set, the performer provides video material for every pre-recorded sample they want projected on the screen. They also set up a webcam, or some other live capture device, pointed at them, so that whenever they record a loop live, the webcam records the video for that loop, and whenever they play that loop back, the webcam video is projected behind them. You can imagine that in this demo, all the clips of me would actually have been recorded live by a webcam, so that we're creating this grid of clips that are all genuinely based on the live event, proving to the audience that this is not all prepared beforehand: there is a real performance happening in front of them, creating the sound they're hearing.

I'd like to detail further how this works from the performer's perspective, but for now the main point is that performers shouldn't have to do anything extra to make the visuals work. Whether they're working through Ableton or using a looper, like a Boss looper or some loop-pedal system, I want to find a way to get the data out of their hardware or software and automatically create the video counterpart to their audio performance. They shouldn't do any extra work during the performance; they should focus on performing.

Now let me describe how this can work from a developer's perspective: what's difficult about this project, and why I want to do it. Internally, the system works by reading state from Ableton, or from whatever device you're looping on (I'm going to start with Ableton support and work my way outwards). It can read that state in a few different ways. One way is by intercepting MIDI messages. Let me show you: you've got devices like the Ableton Push and the Akai APC40, plus cheaper versions like the APC Mini. Let me find an image... actually I prefer the Akai APC. What you have on this Akai is basically a physical equivalent of the clip view, the session view, that you have in Ableton: one button for every clip you want to launch or record into, plus record-enable buttons and scene-launch buttons. You can control essentially all of Ableton through this MIDI controller, which means that whenever you do anything in Ableton, or anything on this device, MIDI messages are sent from the device to Ableton. And the way MIDI works, on Mac at the very least, is that messages are global to the entire computer: any program listening to MIDI receives them. So you can run a program in the background that receives the same messages Ableton does, and from those messages figure out what you're doing in Ableton.

Another thing that happens is that Ableton sends messages back to this APC to color its buttons, showing which clip slots hold data and which don't. We can use that returning information too; it's arguably an even better way to determine Ableton's state, because Ableton is directly informing us which clips are being played, when they're being played, along with timing information. So there's a lot of information flying around your computer that actually gives away what Ableton is doing; all we have to do is collect the right data and, in an intelligent way, reproduce Ableton's internal state from it. Once we have that state, and we know which clip should have which video, we can figure out which video to put on the screen.

So: we read state from Ableton, and when we decode that state we figure out what's being played. Is it a pre-recorded sample, or a clip that was recorded during the session? Based on that we can determine whether we need to play back a video saved on disk, record from the webcam, or play back a video that was previously recorded from the webcam. You have the Ableton state, you decode it, and from that state you figure out what the program, the visualization tool, has to do.
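As an aside, none of this MIDI code exists yet; what follows is only a minimal sketch of the interception idea described above, using Apple's CoreMIDI (the macOS API I'd assume for this), printing every message from every MIDI source on the system. Whether Ableton's own feedback messages can be observed this way is the stream's claim, not something this sketch verifies.

```cpp
// Hypothetical sketch, not the project's code: listen to all MIDI
// sources on macOS. Build with: -framework CoreMIDI -framework CoreFoundation
#include <CoreMIDI/CoreMIDI.h>
#include <cstdio>

// Invoked on CoreMIDI's thread for each incoming packet list
static void on_midi(const MIDIPacketList* list, void*, void*) {
    const MIDIPacket* p = &list->packet[0];
    for (UInt32 i = 0; i < list->numPackets; ++i) {
        if (p->length >= 3)
            printf("status=%02x data=%02x %02x\n",
                   p->data[0], p->data[1], p->data[2]);
        p = MIDIPacketNext(p);
    }
}

int main() {
    MIDIClientRef client;
    MIDIPortRef in_port;
    MIDIClientCreate(CFSTR("midi-spy"), nullptr, nullptr, &client);
    MIDIInputPortCreate(client, CFSTR("input"), on_midi, nullptr, &in_port);

    // Connect to every MIDI source currently on the system
    for (ItemCount i = 0; i < MIDIGetNumberOfSources(); ++i)
        MIDIPortConnectSource(in_port, MIDIGetSource(i), nullptr);

    CFRunLoopRun(); // keep the process alive to receive callbacks
    return 0;
}
```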
From that information, you can break everything down into simple play/stop/record instructions (I'll sketch what these might look like as code at the end of this section). You'd have a play instruction, telling the video player "play this video at this time, for this length of time"; a stop instruction, "stop playing this, even if you've been looping it for a few minutes"; a record instruction, "from this point in time, start recording the webcam, because I want to use that material later"; and an additional play instruction specifically for playing back webcam-recorded videos. So we start with the Ableton data, break it down into which clips are being played and which are being recorded, then break that down further into what the video engine has to do: which videos it has to play and at what times, and which recordings it has to make and when they have to happen.

That is basically the external interface; everything from there onwards is really just a generic video player with a few custom recording features. And that's where we get to the main system. The main system receives these play/stop/record instructions and, based on them, handles all the video processing necessary to get the material on the screen. For a play instruction, it finds the associated video file on disk, opens it, starts decoding it, and as frames come out, it places them on the screen at the right time, synchronized to whatever the instruction told it to synchronize to. For a record instruction, it has to open a live video feed (again, from a webcam or a video capture card), start pulling that data in, encode it into a compressed format, and store it on disk, or temporarily in RAM. And as it's building up that file, at some point a new instruction will arrive to play it back, so while the file is still growing it has to go into it and start decoding and playing it to the screen. That will be the most complicated part, because you've got files with no fixed length: you might be recording something spontaneously and then have to play part of it back, also spontaneously. Essentially we're building a streaming video application where there's no real start or end time; it's all just video material, timestamps and looped regions. It's going to be a bit of chaos to structure properly, but that's essentially what I want to do: replicate in video whatever is happening in audio. And I think the kind of video processing needed here will be informative for anyone who wants to build any kind of real-time video-processing application.
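As promised, a rough sketch of what such an instruction could look like as a data type. This is hypothetical: the names and fields are my guesses at the interface described above, not an API defined anywhere in the project yet.

```cpp
#include <string>

// One message from the Ableton-watching layer to the video engine
enum class Action { Play, Stop, Record, PlayRecording };

struct Instruction {
    Action      action;
    std::string clip_id;     // which clip/video slot this refers to
    double      start_time;  // when to act, in seconds from session start
    double      duration;    // how long to play/record (0 = until Stop)
    bool        loop;        // keep looping the material until stopped
};
```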
Right, the next thing I want to explain is how I want to implement this. There are a few ways you could, and a few things that already exist which I want to show you. VDMX is a very popular professional video-DJ software. It has a pretty big price tag, though not unusual for audio/video performance software, and it's a massive system that allows all sorts of crazy things. I imagine I could build my project in VDMX, which would have the benefit of plugging right into any other visuals you're doing. But I don't necessarily want that, because VDMX is a big system, and what I actually want to do is explore the more low-level concepts of video processing: how can I do this from scratch, or at least from the bare essentials, as opposed to using a big framework?

So instead of something like VDMX, I want to do it in C++, and for a video-processing project in C++ there are multiple options for how to build it. One big one is JUCE. The JUCE framework is developed by the people at ROLI, who make software and hardware instruments; it's a cross-platform framework for graphical user interfaces and audio processing, and it allows all sorts of crazy things to be done. If you want to make a GUI application that works on Mac, iOS, Windows and Linux, JUCE could be a very good place to start, because they do basically everything for you, and I believe the framework and library are very nice (I haven't used it myself). I was speaking with someone about my video project who has created a video engine for JUCE, so if you're working in JUCE you can use that too; let me find it, I have it written down somewhere... OK: the Foleys Finest video engine. This audio software company, Foleys Finest, has created a video engine inside JUCE that adds classes letting you process video files directly in JUCE with the same API, and render and display them on the screen; what the author is essentially trying to do is build an entire open-source video editor in the JUCE framework. So I believe everything I want to do would technically be possible in JUCE, but again, I want to explore the more low-level, fundamental ways of doing video processing, and the layers of abstraction JUCE adds don't interest me so much.

Another big one, if you don't want JUCE but do want to get off the ground quickly, is OpenCV. OpenCV is an open-source framework for computer-vision processing (originally an Intel project, I believe): face detection, mapping... probably a lot of the Instagram and Snapchat filters you see use something like OpenCV or similar. It has everything you need to do video processing, and it would be a very interesting way to explore this, because OpenCV seems really cool, but again, I wanted to go a bit more low-level for this project.

So finally, to get to the core of what I'm doing: I'm going to use FFmpeg, an open-source encoder and decoder for video files, to access the video data I want, and then OpenGL to render that data on the screen. Super simple: C++, OpenGL and FFmpeg, plus this one thing called GLFW, which lets you create an OpenGL surface cross-platform, so you can render through OpenGL regardless of operating system. That's what I'll be doing. Not the best summary I could have given, but that's essentially it.

OK, I hope you're still with me; let's look at the timeline. First I want to get pixel data on the screen, then start decoding some video frames, then render video frames on the screen. That's what we're doing today. These streams will start out a bit like tutorials, and as I progress through the project they'll turn into me learning rather than teaching, just trying to figure out how to solve problems, which will be interesting as well. But first, to get up and running, I'll lead you through how to set up FFmpeg, how to set up OpenGL and GLFW, and how to start rendering things on the screen. Let's do that.

I'm starting with an empty project. I've got a video-app git repository, and I'll go to my desktop and clone it. There's nothing in it, just an empty repository, so I'll start by making a few directories: a source directory and a library directory. Then I bring this into Sublime and create a few basic files: a README saying "video app", and a .gitignore, in which I ignore the Mac .DS_Store files and a build directory. That's it for now: a library folder, a source folder, and those basics. Let's get to work.

How do you set up OpenGL and GLFW on a Mac to start developing cross-platform applications? The first thing I do is install GLFW inside the library folder as a git submodule: from inside lib, something like `git submodule add https://github.com/glfw/glfw glfw`. A submodule lets you put a different repo as a child inside your repo; here I'm installing GLFW inside lib under the name glfw, and it starts cloning all the data. The nice thing is that when you want to update GLFW, you just check out a new version of that repo inside your repository. If we look in there, we've got GLFW, all its sources, its include files, everything you need, plus its own CMakeLists.

So we've got GLFW; how do we actually make an application that can use it? I'm going to use something called CMake. The way CMake works is that you create CMakeLists.txt files throughout your project, and they explain to CMake how to build it. If a README is an explanation to a human, then a CMakeLists is an explanation to a build tool: you explain how all the different files in your project should be built together. First, dependencies: to use CMake you have to `brew install cmake`, so you'd need Homebrew, which in my opinion is the most practical tool for installing these dependencies. What is CMake exactly? In the way build tools work, you have a compiler, like GCC or Clang, that actually compiles your application; but the command that compiles your application can be really complicated, and the way everything gets compiled together can be really complicated too. So we create a step above it, with build instructions: we write the instructions, and they call the compiler and do all the difficult work.

OK, CMake is already installed, that's fine. How does a CMakeLists file look? You start with a minimum required CMake version: I'll say 3.14 (I don't know if that's the exact version I'm using). Then I name the project video_app, using C and C++, and I want C++14. To do a really basic version, we can add an executable, video_app, built from a src/main.cpp file. So I go into source, create main.cpp, and write a really basic hello-world main. (It's giving me some complaints about a redefinition of argv and argc... let's see if it works.) In our CMakeLists, we're building our main file into the video_app executable, so video_app is created from just that one file.

Then I go up a level: we've got our CMakeLists, README, a library and a source folder. I make a new directory, build, and inside it I run `cmake ..`: configure whatever is in the parent directory. That does some work and creates a Makefile, which you can invoke to build the project. So I say `make`, and it reports building the target, linking, executable built. Now we can run our video app: it says "Hello world". Great. Not a video app at all, but that's the most basic CMake setup: you cmake your project, then you make it.

Alternatively, CMake can create an Xcode project for you. I'll remove my build folder, make a new one and start over, this time saying `cmake -G Xcode ..`: configure the parent folder, but instead of a Makefile, make me an Xcode project. It reads all the CMake files and outputs an Xcode project, so the build folder now contains video_app.xcodeproj, and opening it gives me the video_app source files, main.cpp included. I can then build it in Xcode, which is great, because now I can use the Xcode debugger. There we go: "Hello world", program exited with code zero.

Let's link GLFW. First I add it as a subdirectory: add_subdirectory(lib/glfw). That means CMake will go into lib/glfw, find GLFW's own CMakeLists file, read all of it, and basically create a new target for you to link with. Then, after adding my main file to my executable, I use target_link_libraries: my video_app links glfw. And that's pretty much it for now; I just want to link with GLFW. Let's try that: it generated, all done. In Xcode it builds GLFW, links with it, all good.
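Collected in one place, this is roughly the CMakeLists.txt as dictated so far (reconstructed from the stream; exact details may differ from the final repo). Configure and build with `mkdir build && cd build && cmake .. && make`, or generate an Xcode project instead with `cmake -G Xcode ..`.

```cmake
cmake_minimum_required(VERSION 3.14)
project(video_app C CXX)
set(CMAKE_CXX_STANDARD 14)

# GLFW lives in the repo as a git submodule; this pulls in its own CMakeLists
add_subdirectory(lib/glfw)

add_executable(video_app src/main.cpp)
target_link_libraries(video_app glfw)
```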
Now let's write some code that uses GLFW. We can keep the main, but I want to include GLFW/glfw3.h, and instead of printing things I want to create a window. I initialize GLFW, and if it didn't initialize, I printf "Couldn't init GLFW" and return 1. (I still need my standard library include... let's see if that works. Can we initialize GLFW like that? Yes.) Then I create my window, 640 by 480. If we couldn't open the window, again I printf "Couldn't open window" and exit.

The problem is that if we open a window and then just exit out of main, the window disappears immediately. We have to stay open, which means we need a run loop. First I declare that the window we just opened is the one I want to operate on right now, in this thread: I make its context current. Then, as long as the window should not close, we just stay in a while loop, maybe calling glfwWaitEvents. So we initialize GLFW, create a window, and then sit in this loop, waiting for events from the user, for as long as the window shouldn't close. Let's try that... amazing: we've got a window, with nothing on it. Can we close it? Yes, we can exit the loop by pressing the close button (and probably Cmd-Q as well). Perfect: that's a super basic GLFW setup. In our CMakeLists, all we've done is link the glfw library and create our executable from a main file. That's it.

OK, let's render something on the screen. Inside the loop, with GLFW set up, we can do our rendering using GL commands: glClear, clearing the color and depth buffer bits, and then a really old command, glDrawPixels, to draw 100 by 100 pixels. glDrawPixels lets you take a buffer of pixel data and put it on the screen in about the crudest way possible. We have to create that buffer first, so outside the while loop I create an unsigned char buffer: it's going to be 100 by 100, and each pixel has three bytes, a red, a green and a blue byte. I want to fill all of it with red. This is really crude, but it shows how you can just get data onto the screen, which is what we want.

Note that this is one buffer for all of the data, not one buffer per row: you get the first row's 100 pixels in bytes, then the second row's 100 pixels, then the third, and so on. So to address a pixel, if y is 2, you have to skip two whole rows down and then move over by the x coordinate. Let me draw this out. If this is my image, every pixel has a red, a green and a blue byte, so the data is stored RGB, RGB, RGB, ...: a 5-by-5 image is stored as one long run of data, starting with the red, green and blue components of (x=0, y=0), then of (x=1, y=0), then (x=2, y=0), until you reach the end of the line; then comes y=1 with x=0, x=1, x=2, x=3, then row 2, then row 3. That's how the data is packed together. So if I want a 100-by-100-pixel image filled with red, then for every x and y I have to find that pixel's red, green and blue bytes, and fill the red with 255 and the green and blue with 0. For row 1 we skip over 100 pixels, which is 300 bytes; for row 2, 200 pixels, or 600 bytes: whatever y is, times the width, times the bytes per pixel. Then, if the x coordinate is 2, we skip over 2 more pixels, 6 bytes: x times the size of each pixel. That lands us on the red component; one byte further is the green, and one byte further again is the blue. So we get red, green, blue: data[y * 100 * 3 + x * 3] and the two bytes after it.
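Here's a consolidated sketch of the main.cpp built up to this point. One caveat: in the stream, the glfwSwapBuffers call is only discovered a bit later, when the window first comes up blank; the sketch includes it so it actually shows something.

```cpp
// Minimal GLFW window that blits a 100x100 red RGB buffer with the
// (deprecated) glDrawPixels, roughly as written in the stream
#include <GLFW/glfw3.h>
#include <cstdio>

int main() {
    if (!glfwInit()) {
        printf("Couldn't init GLFW\n");
        return 1;
    }
    GLFWwindow* window = glfwCreateWindow(640, 480, "Hello World", nullptr, nullptr);
    if (!window) {
        printf("Couldn't open window\n");
        return 1;
    }
    glfwMakeContextCurrent(window);

    // 100x100 pixels, 3 bytes (R,G,B) per pixel, packed row after row
    const int width = 100, height = 100;
    static unsigned char data[width * height * 3];
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            unsigned char* p = data + (y * width + x) * 3;
            p[0] = 255; // red
            p[1] = 0;   // green
            p[2] = 0;   // blue
        }

    while (!glfwWindowShouldClose(window)) {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glDrawPixels(width, height, GL_RGB, GL_UNSIGNED_BYTE, data);
        glfwSwapBuffers(window); // only added later in the stream
        glfwWaitEvents();
    }
    return 0;
}
```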
So with this code I've filled the buffer with red pixels, and with glDrawPixels I can explain to OpenGL that I'm giving it a buffer with pixels packed in RGB order, and that I want it drawn. Let me look up glDrawPixels... Xcode doesn't show me the documentation I was hoping for, so here it is: you give the width and the height of the pixels you want to draw, the format the pixels are stored in, the type of each component of a pixel, and a pointer to your data. So that's what we do here: width 100, height 100, the format is GL_RGB, each component is GL_UNSIGNED_BYTE, and then our data. Supposedly that might work; we'll see.

I'm getting some warnings: yes, it's deprecated. People don't really write this style of OpenGL anymore; the tendency is to use shaders. I don't want to go into shaders right now, because all I want to do is put pixels on the screen, so I'll ignore the warnings and hope it compiles.

OK, this is a common error: undefined symbols, and note that it happened in the linker. While compiling, the compiler knows these functions exist, but when the time comes to actually link the program, the code for those functions can't be found, so the link fails. In CMake, besides glfw, I have to link with a few other things, which I'll collect in a variable called EXTRA_LIBS. If I'm running on Apple, I append "-framework OpenGL" to EXTRA_LIBS. Let me also put in the Windows equivalent; you'll have to fact-check me on this one, I'm not exactly sure it's correct, but I believe that to get those OpenGL functions you link with -lglu32 and -lopengl32 (maybe just the latter, I'm not sure). Otherwise, assuming Linux, we link with -lGL, -lGLU and -lX11; I think that should work. For Windows, apparently you also have to set CMAKE_EXE_LINKER_FLAGS to "-std=gnu99 -static -static-libgcc -static-libstdc++ -mwindows". I will have to check this on Windows to make it fully cross-platform, but I believe those are the extra libs you need to link with to get OpenGL to work.
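In CMake terms, that's roughly the following (the Windows and Linux branches are the stream's untested guesses, repeated here as-is):

```cmake
# Platform-specific OpenGL link settings, appended to a list and passed
# to target_link_libraries alongside glfw
if (APPLE)
    list(APPEND EXTRA_LIBS "-framework OpenGL")
elseif (WIN32)
    # untested, per the stream
    list(APPEND EXTRA_LIBS "-lglu32 -lopengl32")
    set(CMAKE_EXE_LINKER_FLAGS "-std=gnu99 -static -static-libgcc -static-libstdc++ -mwindows")
else ()
    # assume Linux; also untested
    list(APPEND EXTRA_LIBS "-lGL -lGLU -lX11")
endif ()

target_link_libraries(video_app glfw ${EXTRA_LIBS})
```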
Let's see if that works. Run CMake again... it successfully links, but I don't see any red pixels. What's going on, why am I not seeing anything? We drew the pixels... oh, because we didn't swap the buffers. When you're drawing with OpenGL in GLFW, you actually have two buffers: a front buffer and a back buffer. One is shown to the user at any given time, and the back buffer is the one you draw on. You draw on the back buffer, and once it's complete, you swap them around: the user sees what you drew, and you start filling in the new back buffer. So you have to swap the buffers after you've done your rendering, otherwise everything you draw stays on the back buffer and is never visible to the user. So I add the buffer swap, and let's try again.

Great: we've got a tiny rectangle, 100 by 100 pixels of red. There's pixel data being shown on the screen. Obviously we want that pixel data to be video data, not just red pixels, but getting something on the screen is an important step towards putting anything on the screen: even though this code does something very simple, if you swap the data out for any other kind of data, that other data shows up instead. This is about the simplest program you could write to render data on the screen.

At this point I'm going to commit: add the .gitignore, the CMakeLists, the README and my source file, commit, and push. There we go: video-app in C++. In the library folder of the repo you can see this reference to GLFW: none of the GLFW source code is actually in my repository, just a little reference to the GLFW repository, saying which version of it we've checked out. So we've got our library, and our CMakeLists with all the important stuff. That's what we need. Cool.

The next thing I want to do is swap out glDrawPixels, because, as you can see, when we draw these pixels they show up at the bottom of the screen: glDrawPixels just copies our buffer into the back buffer starting from the bottom corner, and we can't really place it anywhere else. That's not very flexible. What I'd rather do is create a texture, which is the OpenGL term for an image that you can place onto things and render in different ways. I'll create a texture by loading my data into OpenGL; then you can instruct OpenGL to put that texture wherever you want, and even render it in 3D space: this texture should float at an angle, be viewed a certain way, and so on. It becomes much more flexible. And you don't have to resend the data to OpenGL every time you render: you load it once, it's stored on the GPU, and you can render the texture many times, with all sorts of optimizations that come along with that.

Let's keep the raw data we have, this 100-by-100 block of red, but maybe I'll put a blue square in the middle, with another loop running from y 25 to 75 and x 25 to 75 that writes blue pixels instead. Let's see if that works... there we go: our red square with a blue square inside. Not very interesting, but it's data.

Now I want to load it into a texture, so that every time we render, we just render the texture. A texture has an ID that we refer to it by, its handle: I've got a tex_handle variable, and I tell OpenGL to generate one texture and give me back its name in the handle. Then, because OpenGL works as a state machine, you place OpenGL into a certain state and then carry out operations in that state. When I say glBindTexture(GL_TEXTURE_2D, tex_handle), I'm saying: this texture ID of mine, I want to make changes to it or use it right now; from now on, refer to it as the 2D texture, GL_TEXTURE_2D, because that's the one I want to operate on. And once we've finished operating on it, we can bind nothing instead, glBindTexture(GL_TEXTURE_2D, 0), so that we're no longer focused on any texture. So glBindTexture means: the next operations carried out on GL_TEXTURE_2D apply to the texture I just generated.

We bind the texture, and then the main thing we have to do is glTexImage2D, to load our data into the bound GL_TEXTURE_2D. Let me look up the glTexImage2D documentation. First, the target texture we want to work on: GL_TEXTURE_2D, which we've bound. Then the level: if you have one texture, level zero is the highest-quality version of it, and other levels hold different scales of the texture, so that OpenGL can pick the most optimal version depending on where it's being rendered. The docs say: "level specifies the level-of-detail number; level 0 is the base image level; level n is the nth mipmap reduction image", and mipmap reduction images are just smaller and smaller versions of the texture, I believe. We don't care about that, so level zero. Then the internal format, which is how OpenGL should store the texture internally: I'll say GL_RGB, just red, green and blue components, that should be fine. Then it wants the width and height of the texture: 100 by 100. Next the border: we have no border, nothing outside that 100-by-100 range, so 0. And then the format and type of the data we're going to put in, the same as for glDrawPixels: GL_RGB, GL_UNSIGNED_BYTE, and then our data. That's what we need to load our data into our texture.
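As code, the texture creation looks roughly like this. I've also included two glTexParameteri calls and a note about the current context: in the stream, both only come up during the white-texture debugging below.

```cpp
// Create a texture and upload the 100x100 RGB buffer to the GPU.
// NB: the GL context must already be current on this thread
// (glfwMakeContextCurrent), or the texture ends up in the wrong place.
GLuint tex_handle;
glGenTextures(1, &tex_handle);
glBindTexture(GL_TEXTURE_2D, tex_handle);

// Without these, the default minification filter expects mipmap levels,
// and an incomplete texture can render as plain white
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

// level 0, internal format GL_RGB, 100x100, no border, RGB unsigned bytes
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 100, 100, 0,
             GL_RGB, GL_UNSIGNED_BYTE, data);
glBindTexture(GL_TEXTURE_2D, 0); // done operating on this texture
```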
Now, I might be missing a few important things here, but we'll add them later; hopefully that's enough to create our texture. How do we render it? I'll get rid of glDrawPixels, keeping the clear call and the buffer swap, but to render the texture we now have to deal with how OpenGL handles coordinates and places things on the screen. I want to do what's called an orthographic projection: no 3D perspective, just projecting onto a flat surface viewed straight-on. I want it so that when I refer to coordinates in OpenGL, like 0 or 100 or 200, I'm actually referring to the 0th or 100th or 200th pixel visible on the screen. When you're making a 3D game you don't want to think in terms of x and y pixels on a 2D surface, you think in 3D coordinates; but for my purposes that's unnecessary, so: orthographic projection.

First I need the width and height of our window, and GLFW can tell us exactly that: glfwGetFramebufferSize on our window, into two variables. Next, glMatrixMode(GL_PROJECTION). There are two matrices in OpenGL (I think, or maybe more, but two important ones): the projection matrix and the modelview matrix. The projection matrix describes the coordinate system, how things get projected; the transformations regarding that live in GL_PROJECTION. The modelview matrix describes how points within the world get transformed. In other words: if you want to change the camera, the way things look, you change the projection matrix; if, after setting up the camera, you want to place things in the world, you use the modelview. So I switch to the projection matrix, set up the orthographic projection there, then switch back to the modelview, after which we render whatever we want.

Inside GL_PROJECTION I call glLoadIdentity, resetting the matrix to an identity matrix, and then glOrtho to fill it with an orthographic projection. If you look at the glOrtho documentation, it explains that, in world coordinates, a value of 0 (the "left" argument) maps to the far left of the window and a value of "right" maps to the far right; a y value of 0 goes to the bottom and "top" goes to the top of the window. So if I pass exactly 0, window width, 0, window height, our world coordinates become the same as the final window coordinates: give 20, 20 as your coordinates, and that's the 20th pixel from the left and the 20th pixel from the bottom. This is our way of getting a one-to-one mapping with the screen: no 3D projection, nothing strange from OpenGL, because we don't need it right now. The last two arguments are near and far, which handle the z axis for 3D content; we don't care about that, and the usual basic values are just -1 and 1. That's an orthographic projection.
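In code, the per-frame 2D setup described above comes out roughly as:

```cpp
// Map OpenGL world coordinates 1:1 onto window pixels,
// origin in the bottom-left corner
int window_width, window_height;
glfwGetFramebufferSize(window, &window_width, &window_height);

glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, window_width, 0, window_height, -1, 1);

glMatrixMode(GL_MODELVIEW);
// ...render whatever you want from here
```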
Once we have that, we can render our texture. First things first, we enable GL_TEXTURE_2D, which says that whatever we're going to draw — a triangle, a quad, some shape — we want to draw it with a texture; the texture is our paintbrush. Then we bind our specific texture handle, and then we begin drawing quads. How do the quads work? I say that texture coordinate (0,0), the bottom-left corner of the texture, should go to the world vertex (0,0); texture coordinate (1,0) goes to world (100,0) (texture coordinates always run from 0 to 1, like a unit square); (1,1) goes to vertex (100,100); and (0,1) goes to vertex (0,100). That closes the square, then glEnd, and after that we disable GL_TEXTURE_2D again. That should render it.

All these warnings are because I'm using old-fashioned OpenGL, so I'll disable them: back in CMake I add add_definitions(-DGL_SILENCE_DEPRECATION), because I don't care. If you want to learn proper shader-based OpenGL you'll have to find that somewhere else; I'm not interested in building things with shaders right now, there's no benefit to me. I just want basic graphics on the screen so that I can do video processing.
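The quad-drawing block described above, roughly as code:

```cpp
// Draw the texture as a quad: texture coordinates run 0..1 across the
// image, vertex coordinates are window pixels thanks to the glOrtho
// mapping above. (Old fixed-function GL; needs -DGL_SILENCE_DEPRECATION
// to build quietly on macOS.)
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, tex_handle);
glBegin(GL_QUADS);
    glTexCoord2d(0, 0); glVertex2i(0,   0);
    glTexCoord2d(1, 0); glVertex2i(100, 0);
    glTexCoord2d(1, 1); glVertex2i(100, 100);
    glTexCoord2d(0, 1); glVertex2i(0,   100);
glEnd();
glDisable(GL_TEXTURE_2D);
```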
Right, let's try that. Build succeeded... great, but it's a white image. Why is it white, what have I done wrong? OK, I must need to give more parameters when loading the texture, to describe the way the pixels are stored: perhaps GL_UNPACK_ALIGNMENT should be 1, maybe that's the issue... rerun CMake... no, that's not it. I've written down some other things I might need, so let me type them all in: glTexParameteri on GL_TEXTURE_2D for the texture wrap settings... I don't think these are what's causing the problem, but it doesn't hurt to put in everything you're supposed to... glTexEnv as well, I don't know. This is the part of programming I don't enjoy, where you have to copy in very specific incantations just to get to the programming you actually want to do: you set up the environment in a very specific way, and you don't really know what some of it means. And it's still white. Why is it white? We're giving it RGB, unsigned-byte data. Why would it be white? Is it drawing to a depth buffer or something, why does it have no content? OK, that's not going to help me. Let's at least try moving the coordinates around: offset everything by 200 and make it twice as large... the quad moves, but it's still white. Why am I not getting my reds and my blues? Unpack alignment 1... that's so strange. The way it's reading my data is incorrect, but I don't know what isn't correct about it. Unpack alignment... hmm, interesting. Let me see: I've got another bit of code somewhere where I did this successfully; let me check if I can find it... clear, color buffer bit, texture 2D: I'm doing exactly the same thing. What's the difference? Unpack alignment, same... the data is different, no, the data's not different... oh, that's a much nicer way of doing it... generate my texture... I don't know what's different, I don't know what's wrong... make the context current... Ah. When I'm generating my texture, I have to have the right context current, otherwise the texture isn't created in the right place. Wow, that's stupid. There you go: we've got our texture, with our data being rendered at a place of our choosing on the screen in OpenGL.

OK, the next step, which is the interesting step, is to get FFmpeg running, so that instead of fake data we have video data rendered to the screen. I'm going to take a short break, come back in about five minutes, and then start all the work to get FFmpeg to run.

All right, let's get back to this. I'm cutting this stream up into multiple videos, so that each can serve as an individual tutorial. If you're watching this as a separate video, what you've missed is that in the previous video I set up OpenGL and GLFW in a C++ project to render some pixel data on the screen. It looks like this: we open a window titled "Hello World", with a 100-by-100-pixel texture showing our pixel data. It might look a bit chaotic right now, but up here is the essential part: I allocate a buffer of 100 by 100 by 3 bytes, 100 pixels wide, 100 pixels high, three bytes per pixel; with this code I fill every pixel's red, green and blue components to be all red, no green, no blue; and in a different range, y from 25 to 75 and x from 25 to 75, I fill in only blue pixels. So this data buffer holds an image that's all red with a blue square in the center. The code from earlier loads that into OpenGL as a texture, giving us a texture handle we can draw whenever we want: I bind that texture and draw it at these coordinates. That's the state of the project. What I want to do in this part is start decoding a video file, get out its first frame, and try to get that frame on the screen. That involves first setting up FFmpeg, to start working with that library and getting information about our video file. Let me quickly commit my changes, everything, with the message "pixel data rendered through texture".

So how do we set up FFmpeg? As with OpenGL, whatever operating system you're on, you first have to install it on your system, and the way I do that is simply `brew install ffmpeg`. That downloads FFmpeg, and also all of FFmpeg's dependencies. Now, FFmpeg is actually broken down: externally, you can use it as a command-line tool for converting audio and video files, but internally it has a whole collection of libraries that together form the FFmpeg functionality; internally it's called libav. There's libavcodec, a library filled with different encoders and decoders for video files. There's libavformat, which contains all the code for getting audio/video data out of containers, out of formats. Let's look at a video file: I have my demo song here, and if I get info on it, it's an .mp4 file. MP4 is the container, or the format, and inside that mp4 file there are different streams. There's audio data and there's video data; specifically, the info panel tells me there's a stereo audio track with the codec AAC (the audio format Apple favors), and H.264, the codec for the video data, which is 1920 by 1080. So inside a format, an mp4 container, you'll have multiple streams of packets of data, and you can also have subtitle information, metadata (where it was made, who made it, when it was last edited, what software exported it), even multiple video streams at multiple resolutions. The way video data is placed inside a file can get quite complicated, and libavformat is responsible for all of that. Then there are also libavfilter, libavdevice, libavutil: a whole collection of libraries used inside FFmpeg to make it do all those things. We're mostly going to focus on libavcodec, libavformat and libavutil, the last of which provides a few utility functions to make things easier.

First things first, we've installed FFmpeg through brew; how do we start using it in our project? Another thing I want to install, which will make this all easier, is pkg-config: `brew install pkg-config` (it's already on my computer, but that's what you'd run). When you install FFmpeg globally, the library files — the files your project has to link and connect with to run — get stored in various locations, and pkg-config is a tool that helps locate where all those libraries are and gives CMake the information it needs: where FFmpeg is, and how to fit it into your application. So we're going to use pkg-config to create our library target for linking with FFmpeg.

How I like to do this: every library used in the project gets its own folder. Inside lib we've got glfw, which we're using for rendering; I'll make a new folder called ffmpeg, and instead of putting any code in there, I'll put in a single CMakeLists file holding all the CMake information needed to link and connect with FFmpeg. Then, from our main CMakeLists, all I have to do is add_subdirectory(lib/ffmpeg), and when it comes to linking, alongside glfw and the extra libs, I link with ffmpeg, just like that. I want it super simple, with all the complications of linking against FFmpeg hidden in that one file.

So how do we link with FFmpeg? At the start of this file I again state the minimum required CMake version, so CMake knows what version of itself is needed to understand the file, and I start a new project called ffmpeg. In here, I find the package PkgConfig, marked REQUIRED: after this line, we can use pkg-config functions inside CMake to help us find FFmpeg. (It's a bit funny that we have to find pkg-config in order to then find FFmpeg, but it works.) Then we use a function called pkg_check_modules: I want to look for libavcodec, REQUIRED, as an IMPORTED_TARGET, and internally in CMake I'll refer to it as AVCODEC, capitalized. I do the same for libavformat (AVFORMAT), and again for libavfilter, libavdevice and libavutil: I just want to link with everything, even if I don't use it all right now. It may be that you don't need all of these libraries, in which case you don't need to link against them; but my assumption is that if you have libavcodec on your computer, it's because you installed all of FFmpeg, which means you have all of these libraries, so I wouldn't expect a situation where someone has only some of them. If you're preparing your executable in a special way and want it as small as possible, only use the libraries you need; I'll use all of it. I also want two further libraries, libswresample and libswscale. I don't actually need libswresample, but again, I'm just linking all of them.

So we've got pkg-config searching for information about all of those libraries, required to find every one of them. Now I create a library in CMake: it's called ffmpeg, and it's an INTERFACE library. It's not an actual library, but I want other CMakeLists files to be able to treat it as if it were just a library you can link, so it's an INTERFACE that's IMPORTED. Within that library there are certain header files that need to be included, so I go through AVCODEC_INCLUDE_DIRS and the include dirs for AVFORMAT, AVFILTER, AVDEVICE, SWRESAMPLE and SWSCALE. When pkg-config finds these libraries, it creates variables inside CMake, like AVCODEC_INCLUDE_DIRS, holding all the directories that anyone using that library needs to include; we attach those to our new ffmpeg library, so that everything needed to use FFmpeg is prepared for us. Then we set the link options for ffmpeg, linking with all of the libraries: AVCODEC_LDFLAGS, AVFORMAT_LDFLAGS, and so on (there must be a faster way to type this out). OK, that's our ffmpeg library.
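Put together, the lib/ffmpeg/CMakeLists.txt described above comes out roughly like this (reconstructed from the narration, so details such as the GLOBAL keyword are my assumption, added to make the imported target visible to the parent scope):

```cmake
cmake_minimum_required(VERSION 3.14)
project(ffmpeg)

# pkg-config locates the brew-installed FFmpeg libraries
find_package(PkgConfig REQUIRED)
pkg_check_modules(AVCODEC    REQUIRED IMPORTED_TARGET libavcodec)
pkg_check_modules(AVFORMAT   REQUIRED IMPORTED_TARGET libavformat)
pkg_check_modules(AVFILTER   REQUIRED IMPORTED_TARGET libavfilter)
pkg_check_modules(AVDEVICE   REQUIRED IMPORTED_TARGET libavdevice)
pkg_check_modules(AVUTIL     REQUIRED IMPORTED_TARGET libavutil)
pkg_check_modules(SWRESAMPLE REQUIRED IMPORTED_TARGET libswresample)
pkg_check_modules(SWSCALE    REQUIRED IMPORTED_TARGET libswscale)

# One interface target wrapping all of the above
add_library(ffmpeg INTERFACE IMPORTED GLOBAL)
target_include_directories(ffmpeg INTERFACE
    ${AVCODEC_INCLUDE_DIRS} ${AVFORMAT_INCLUDE_DIRS} ${AVFILTER_INCLUDE_DIRS}
    ${AVDEVICE_INCLUDE_DIRS} ${AVUTIL_INCLUDE_DIRS}
    ${SWRESAMPLE_INCLUDE_DIRS} ${SWSCALE_INCLUDE_DIRS})
target_link_libraries(ffmpeg INTERFACE
    ${AVCODEC_LDFLAGS} ${AVFORMAT_LDFLAGS} ${AVFILTER_LDFLAGS}
    ${AVDEVICE_LDFLAGS} ${AVUTIL_LDFLAGS}
    ${SWRESAMPLE_LDFLAGS} ${SWSCALE_LDFLAGS})
```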
I think that should work; let's see. Okay, so there's an error here, but before we look at it: it found PkgConfig, good, and then pkg-config tried to find all these modules. It found libavcodec with a certain version, it found libavformat with a certain version; the only one it didn't find was "libswrescale". That's wrong: it's libswscale, not swrescale, so that was just a typo. There we go: configuration done, build files generated. So now, can we compile this? Yeah. Cool.

So let's start writing something with FFmpeg. We've now got an executable that links with FFmpeg, so it should be able to call FFmpeg functions and open a video file. I'm going to commit this again: "linked with ffmpeg". This repository is available in the description; you can clone it and use it yourself, you can copy out files, there's no license, you can do whatever you want with it. So that's the way I got FFmpeg to link. It took me ages to figure this one out, but it's the simplest way I could find: pkg-config to find the libraries, then making sure all of the libraries can be included properly and linked properly. I haven't tested this cross-platform, so I need to check, but I believe it should work as a generic solution.

Okay, I want to start making a function, and I'll call it load_frame. I say bool load_frame, and we give it a file name, and it should give us back an integer for the width, an integer for the height, and unsigned char data for the image data. It returns false when it fails. I create a new file for it called load_frame.cpp, and for now the implementation just returns false. In main, after we've created our window, instead of creating dummy data I declare frame_width, frame_height, and an unsigned char pointer frame_data, and I run load_frame on "/Users/bmj/Desktop/demo_song.mp4" (I think that's the right path) to fill in the frame width, the frame height, and the frame data. If that fails I just exit: printf("Couldn't load video frame"). Then I change the rest of this code a bit: the texture width and height become frame_width and frame_height, and the texture data becomes frame_data, so the frame we're loading gets uploaded into a texture we can draw on the screen.

In our CMake file we now have to say that our video app doesn't just have main.cpp; it also has load_frame.cpp, so it's two files. Once this gets bigger it's actually nicer to say list(APPEND SOURCES src/main.cpp), then list(APPEND SOURCES src/load_frame.cpp), and then give the video app all of those sources, so you can just list out all your source files. Okay, let's re-run CMake and build. It succeeded, and running it prints "Couldn't load video frame". Good, that's what we wanted, because we're returning false.
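For reference, here is the shape of the function I've just described, as a sketch; the _out parameter names are the ones I settle on a little later in the stream:

```cpp
// load_frame.h (sketch of the interface described above)
#pragma once

// Decode the first video frame of the given file. On success, returns
// true and fills in the frame's dimensions plus a pointer to heap-
// allocated pixel data owned by the caller. Returns false on failure.
bool load_frame(const char* filename,
                int* width_out, int* height_out,
                unsigned char** data_out);
```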
Now, just to confirm that it's working, inside load_frame we can set the width to 100, set the height to 100, allocate new data, and keep a pointer that starts at the beginning of that data. Then I loop x from 0 to 100 and y from 0 to 100, and for each pixel I write through the pointer with *ptr++ for each component. I'm filling it with just red colors again, this time written in a different, really short way. Then I return true. Let's see if that works: now our load_frame just gives us a red block. Perfect, we've got our red block.

So instead of loading a red block, I want to load an actual video file. Let's try that. We turn this back into returning false, and we've got to start including some of FFmpeg. FFmpeg is all written in C, not C++, so when including it I want to tell the compiler that the code in these headers isn't C++: interpret it as plain C. So inside an extern "C" block I include libavcodec/avcodec.h, and also libavformat/avformat.h; that's enough for now. I also want inttypes, just to be able to say uint8_t instead of unsigned char. uint8_t is clearer that it's an unsigned integer of 8 bits, and I prefer that; inttypes gives you all those type names.

Okay, how do we open a file? First things first, the function we want to use is avformat_open_input; let me look it up. It takes a format context, a URL, an input format, and an AVDictionary. We don't want to give any options, and we don't know the input format, we want FFmpeg to figure that out itself, so those two will be NULL; the URL will just be our filename. For the format context, we need to give it a context it can put the data into, so we start at the top by creating one with avformat_alloc_context. We can also check that it actually allocated: if we didn't get a format context, something went wrong, so printf("Couldn't create AVFormatContext") and return false. But if we could create it, we hand it to our open function, so that it opens our input and puts the information about the opened file into our AVFormatContext. avformat_open_input returns an integer indicating whether it succeeded, so we just check that the result is zero; anything other than zero is an error, and in that case I print "Couldn't open video file". Yes, in this situation you might want to free the context, because we allocated it, but if we can't open the file right now, that's an error that ends the program anyway, so I'm not going to think too much yet about how to properly deallocate the resources.

Okay, I want to stop there, run it, and see what happens. "Couldn't load video frame": why do we get that message? Because we do all of this and then return false, and then in main we print "Couldn't load video frame". So that means we actually could open the video file. Let me go into Xcode, go to our load_frame function, put a breakpoint at the end, and run it. Here's the cool thing: we can now look at what the AVFormatContext contains. It's a shame I can't zoom in, it's a bit small for you guys to read, but we've opened the file and there are a few things we can look at.
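Reconstructed from the steps so far, the opening part of load_frame would look roughly like this; the variable names are my spelling of what's said on stream:

```cpp
// load_frame.cpp -- opening the file (sketch)
// FFmpeg is plain C, so its headers must be wrapped for a C++ compiler:
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}
#include <cstdio>
#include <inttypes.h>

bool load_frame(const char* filename,
                int* width_out, int* height_out,
                unsigned char** data_out) {
    // The format context holds everything libavformat learns about the file
    AVFormatContext* av_format_ctx = avformat_alloc_context();
    if (!av_format_ctx) {
        printf("Couldn't create AVFormatContext\n");
        return false;
    }

    // NULL input format lets FFmpeg detect the container itself,
    // and NULL options means no special options
    if (avformat_open_input(&av_format_ctx, filename, NULL, NULL) != 0) {
        printf("Couldn't open video file\n");
        return false;
    }

    // ...stream selection and decoding follow in the next steps...
    return false;
}
```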
If we go to iformat, the input format, the name is "mov,mp4,m4a,3gp" and so on, and the long name says it's the QuickTime / MOV format. So the file can have an .mov, .mp4, .m4a, or .3gp extension, but we know it's being read as a QuickTime MOV container, which is cool. We can also look at duration. The duration reads as about 2 million; that field is in AV_TIME_BASE units, i.e. microseconds, so that would be about two seconds, which doesn't look right. Bitrate is 0 and start time looks unset too, so honestly it seems like these haven't been filled in yet. We're not getting much information just from opening the format; we want to actually find some video data inside it.

So we've opened our file, and now we want to go looking inside it for streams: video streams and audio streams. The way to do that is essentially to iterate over all the streams. Inside the AVFormatContext we have nb_streams, and we can access each one through av_format_ctx->streams[i]. So I say auto stream = av_format_ctx->streams[i] to get the stream out, and put a dummy bit of code in the loop body so I can set a breakpoint there while it iterates over the streams. We stop: we have a stream with stream index zero, and we can look at its codec parameters. It's AVMEDIA_TYPE_VIDEO, so it's a video stream, and the codec ID is H.264. I see a width of 1920, a height of 1080, a pixel format; there's lots of data here that I'm not going to use, but essentially that first stream is the one we want to get the video data out of.

So how do we turn that into code? Basically, one of these streams will be the video stream. I set video_stream_index to -1, and then I try to find my video stream; if it's found (this is pseudocode for now) I set video_stream_index to i, to remember it, and break out. If after that loop video_stream_index is still -1, we never found it, so: printf("Couldn't find valid video stream inside file") and return false.

And how do we check each stream? Let me go to my notes. We get the codec parameters: AVCodecParameters* av_codec_params = av_format_ctx->streams[i]->codecpar. Then, for that codec, we want to find a decoder with avcodec_find_decoder. You can see that after just unpacking the container with libavformat, we're now using libavcodec to unpack the codec within the file. Given av_codec_params->codec_id, we ask whether we have a decoder, meaning a program that's able to decode that stream. If we don't get an AVCodec back, that's an issue; we probably want to continue looking through the remaining streams in case there's one we can decode, although it's really not a good sign, and you might even want to just break there, but I'll leave it as a continue. Then, if the codec type is AVMEDIA_TYPE_VIDEO, we've found a stream that's a video stream. (What's the error? Codec "type"? Oh sorry, it's av_codec_params->codec_type we check, not the codec itself.) You could find an audio stream the same way: one of these streams might be AVMEDIA_TYPE_AUDIO, and then you could extract things like the channels and the sample rate and figure out what kind of audio stream you're dealing with. But we just want the video: once we've found it, we set video_stream_index to i and break out, keeping both the codec parameters and the decoder in variables declared outside the loop so we still have them afterwards.
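As one piece, the stream-finding loop I've just described looks roughly like this (a sketch continuing inside load_frame, with my variable spellings):

```cpp
// Find the first video stream inside the file (sketch)
int video_stream_index = -1;
AVCodecParameters* av_codec_params = NULL;
const AVCodec* av_codec = NULL;

for (unsigned int i = 0; i < av_format_ctx->nb_streams; ++i) {
    av_codec_params = av_format_ctx->streams[i]->codecpar;
    av_codec = avcodec_find_decoder(av_codec_params->codec_id);
    if (!av_codec) {
        continue;  // no decoder available for this stream, keep looking
    }
    if (av_codec_params->codec_type == AVMEDIA_TYPE_VIDEO) {
        video_stream_index = i;  // remember which stream holds the video
        break;
    }
}

if (video_stream_index == -1) {
    printf("Couldn't find valid video stream inside file\n");
    return false;
}
```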
That's how we find the video stream. Okay, let's run that again and stop at the end. We hit the breakpoint, and our video_stream_index is zero, so we know the first stream inside the mp4 file is the video stream we want; that code works properly. You could also look for the audio stream index. I don't need audio here, but you could set audio_stream_index to -1 as well, and when av_codec_params->codec_type is AVMEDIA_TYPE_AUDIO, you've found your audio stream; in that case, though, you wouldn't break immediately, because you'd want to find both streams before leaving the loop. I'm going to keep it as just video for now. So: no decoder, we continue; if we find a stream that's video, we've found our stream. Perfect. I'll start commenting some of this up: open the file using libavformat; find the first valid video stream inside the file.

Okay, that's done. Now we have to start doing the real brunt work of getting the actual pixel data out of the video, which means using a decoder, a codec. Right, this is where it's going to start getting kind of tough. You have to do every single one of these things right to get the video frames out, and you can't just copy and paste this from another project and use it blindly: if you're really going to do some video streaming, you have to understand how all of these parts of FFmpeg work together, and you have to call them the right way.

Let me first set up my codec context. When you have an AVCodec, that basically means FFmpeg is telling you it has a program that can decode your video. If you want to start decoding, you have to allocate space where the decoder can store its internal data structures, and that's the codec context. So I say: AVCodecContext* av_codec_ctx = avcodec_alloc_context3(av_codec), given our codec pointer. If that failed, if we fail to allocate a codec context, that's pretty bad, so we print "Couldn't create AVCodecContext" and return false. Next, the information that libavformat got out of the file has to be placed into our codec context as its initial state, so we call avcodec_parameters_to_context with our AVCodecContext and our AVCodecParameters to load them in. If that doesn't work for some reason, we again fail with "Couldn't initialize AVCodecContext". Now that that's succeeded, we want to open the codec so we can start reading from it: avcodec_open2 with the AVCodecContext and the AVCodec. And what's the last argument? I've written down that it's supposed to be NULL, but let me just confirm why. Oh, it's the options; we don't have any special options right now. If avcodec_open2 returns a negative value, it didn't work, so again: printf("Couldn't open codec") and return false.
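Here is that codec setup in one place, as a sketch continuing from the previous snippets:

```cpp
// Set up a codec context for the decoder (sketch)
AVCodecContext* av_codec_ctx = avcodec_alloc_context3(av_codec);
if (!av_codec_ctx) {
    printf("Couldn't create AVCodecContext\n");
    return false;
}

// Copy the stream parameters libavformat read from the file
// into the decoder's context as its initial state
if (avcodec_parameters_to_context(av_codec_ctx, av_codec_params) < 0) {
    printf("Couldn't initialize AVCodecContext\n");
    return false;
}

// No special options, hence the NULL for the last argument
if (avcodec_open2(av_codec_ctx, av_codec, NULL) < 0) {
    printf("Couldn't open codec\n");
    return false;
}
```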
Okay, wait. At the bottom of the function, now that we've done all of that, I'll properly write the teardown: close the input, do all the destructuring, free the format context, and avcodec_free_context for the codec context. That happens at the end. So: we open the file, we find the video stream that we want, we open the codec for that video stream, and lastly we properly clean up at the end, freeing the contexts we allocated and closing the input file so that the file doesn't remain open.

Right, that's a lot of stuff, but that's the main setup for using libav. Now we want to actually start extracting frames from the codec, and that will require knowing a bit about how video files are structured. How do I explain this? Okay: a video file is internally broken up into different units; some are called packets and others are called frames. A video file basically has all sorts of packets in it, one after the other, and the packets come from different streams. When you read a packet from a video file, it might be a packet of audio, a packet of video, or some other kind of data packet. When you're looking for the next frame of video, you keep reading packets until one of them is a video packet, and then you decode that packet to figure out whether there's any frame data in it. For one packet there may be multiple frames stored inside, I believe, but it may also be the case that multiple packets are required to describe one frame; I'm not exactly sure about the exact relation. What I do know: the AVPacket stores compressed data, and the AVFrame stores the decoded raw data. So we start by getting AVPackets out of the file via the format context, we put those packets into the codec, the decoder, and the codec then gives us an AVFrame holding the raw audio or video data.

So let's try that. First we create an AVFrame and an AVPacket as variables we can use. I allocate my frame with av_frame_alloc, and if we didn't manage to allocate it, we error out with "Couldn't allocate AVFrame"; same idea for the packet. With these checks, I don't really understand why you'd bother; I feel like the program should just crash when you can't allocate a frame. These seem like errors that are never really going to happen: some of them would only fail if you're out of memory, and I don't know how often that actually happens, since your computer will free memory up or swap it out to the hard drive. You'd have to be in a really bad situation to ever see these errors, and if you're in that situation, having an AVCodec uninitialized is the least of your problems; the whole rest of the program won't function anyway. If I wanted to simplify this, I would not check for these errors, but for the sake of following the rules, doing what everyone else seems to be doing, I'm going to check for them.
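As a sketch, the allocations before the decode loop and the teardown at the end of load_frame would look like this:

```cpp
// Allocate the packet/frame pair used by the decode loop (sketch)
AVFrame* av_frame = av_frame_alloc();
AVPacket* av_packet = av_packet_alloc();
if (!av_frame || !av_packet) {
    printf("Couldn't allocate AVFrame / AVPacket\n");
    return false;
}

// ...the packet-reading / frame-decoding loop goes here (next step)...

// Teardown at the very end of load_frame:
avformat_close_input(&av_format_ctx);  // closes the file and frees the format context
avcodec_free_context(&av_codec_ctx);
av_frame_free(&av_frame);
av_packet_free(&av_packet);
```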
All right, we've got a frame and we've got a packet. What we want to do now is use av_read_frame, giving it our format context; the format context, again, is the first thing we allocated, the information about the file we opened using libavformat. From that format context we want to read in a packet. The naming here I don't understand: why call it read_frame when it actually reads a packet? It should have been av_read_packet, right? Why is it read_frame? Stupid. It returns a number: zero means a packet was successfully read, and a negative value means either an error or end-of-file, no more packets left. Do we need a while loop here? Yes, we do, because there are situations where the packet we receive isn't actually a video packet, so we have to go back, read another packet, and run the same code again. So we put this in a while loop, iterate around until we find our first video frame, and then break out.

First check: av_packet->stream_index. If the stream of this packet is not our video stream, we continue; we don't care about this packet, we don't care about the audio or any other kind of data, we're waiting for a packet from the video stream. If it is a packet for the video stream, then we want to send it to our decoder, and hopefully the decoder will start processing it and eventually give us a frame. So: avcodec_send_packet, giving our codec context and our packet. This returns a response; if the response is less than zero, there was an error, so we print "Failed to decode packet", and we can get an error string for that situation from the response with av_err2str, and return false.

So we're sending packets into the decoder; now we want it to give us a frame. We call avcodec_receive_frame, giving the codec context again and the frame we want it to fill in. This also returns a response, and there are a few error responses we want to deal with specially. If the response is AVERROR(EAGAIN) or AVERROR_EOF, then whatever packet we gave to the decoder, there's no frame ready from it: either the decoder needs more data, or that packet's frames have already been delivered. So in that case we continue: go around, do another av_read_frame, and try again. But if the response is anything else less than zero, it's an actual error: "Failed to decode packet", with the av_err2str of the response. In all other situations the response is zero or greater, and that means we've received a frame.
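Put together, the decode loop I've been describing looks roughly like this (a sketch; I use a plain printf instead of av_err2str here, since that macro can be awkward to call from C++):

```cpp
// Read packets until the decoder hands us one video frame (sketch)
while (av_read_frame(av_format_ctx, av_packet) >= 0) {
    if (av_packet->stream_index != video_stream_index) {
        av_packet_unref(av_packet);  // not a video packet, keep reading
        continue;
    }

    int response = avcodec_send_packet(av_codec_ctx, av_packet);
    if (response < 0) {
        printf("Failed to decode packet\n");
        return false;
    }

    response = avcodec_receive_frame(av_codec_ctx, av_frame);
    if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
        av_packet_unref(av_packet);  // decoder needs more packets first
        continue;
    } else if (response < 0) {
        printf("Failed to decode packet\n");
        return false;
    }

    av_packet_unref(av_packet);  // done with this packet's data
    break;                       // av_frame now holds a decoded frame
}
```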
So in this situation we've got a frame; we've received a frame. Let me just check all of this. Yes: I pull the response variable out so I can reuse it for both calls (that's why I declared response outside): send the packet, receive the frame; on AVERROR(EAGAIN) or AVERROR_EOF we continue; if the response is less than zero, we failed to decode and return false. At this point we've got raw decoded data in our AVFrame. I'll put in another dummy statement so I can set my breakpoint there, and run this to see how far we get. "Couldn't initialize AVCodecContext." Right after I was explaining that this stuff would never happen, it happens! The fix: avcodec_parameters_to_context can return a positive number on success, so we have to check whether the result is less than zero, not whether it's non-zero. Okay, now we get all the way to the breakpoint: we've read in a packet, we've sent it, and we've received a frame.

So if we look at what our AVFrame has inside it: it's got a data array with eight entries, and there's data in the first one, the second one, and the third one. It's telling us there's a picture inside: the width is 1920, the height is 1080, we've got a color space, we've got information about the pixel format. Inside that data we now have uncompressed raw data, which is great. Let me just explain what that looks like. Inside a frame we have data, and the way the decoder lays out the pixel data is that it puts the different channels in separate buffers: with RGB you'd have one buffer for all the reds, one for all the green data, and one for all the blue data. In this case, though, it's not stored as RGB; it's stored as YUV, which has one luminance component, describing how dark or bright each pixel is, plus two chroma components, U and V, which together describe the actual color of the pixel: whether it's a red or a blue or a pink, and how strongly saturated the color is. So the image is broken up differently than into components of red, green, and blue. But that luminance plane on its own gives us a grayscale image, so before doing any complicated processing I'm just going to use that and break out of the loop.

Before I break, a couple of things. When FFmpeg gives you a packet, it's reference counted, so I call av_packet_unref on the AVPacket, saying: you gave me this packet, and I'm not using it anymore. And at the end of the function I also call av_frame_free on my frame and av_packet_free on my packet, freeing up all my resources. Now, before the break, I've got my AVFrame, and I want to get the data out of it and into a buffer I can put on the screen. First I allocate some data: unsigned char* data = new unsigned char[av_frame->width * av_frame->height * 3], three bytes per pixel, because we're going to make RGB data. Then I basically just copy it over (I don't care about the efficiency of this) with a y that runs over the height of the frame and an x that runs over the width.
Oh, there's a redefinition of data up here: it clashes with the function parameter. I'll rename the parameters to width_out, height_out, and data_out, so the parameters aren't mixed up with our code down here. In the loop we write our data at index y * av_frame->width * 3 + x * 3 for the red component, and the same plus one and plus two for the green and the blue. So right now our data doesn't contain any actual frame data; it's still a red block, but it will have the dimensions of our frame. Let's just return that real quick: width_out will be av_frame->width, height_out will be av_frame->height, and data_out will be our data. Let's see if that works; I hope it does. "Couldn't load video frame": because I returned false, I have to return true there. Okay, that's big... there we go. Yeah, that's sixteen by nine, so that's the video frame we've loaded. Well, it's not the video frame, it's just red, so let's get the actual data out.

For the time being I'm just going to extract the luminance of the frame, so we won't get the color; we'd need to convert the color space properly to get a color image into OpenGL, but I can get the black-and-white image. Here's how the data is stored inside the AVFrame (let me pause at the breakpoint again so we can see). We have these data components: data[0], data[1], data[2], which correspond, in order, to the luminance (the grayscale component), the U, and the V, the two color components. In each of those arrays, all the data is stored consecutively, one byte after the other: you start in the top-left corner of the video and run through all the pixels of the first line, and at the end of that line of bytes you start describing the pixels on the second horizontal line of the video, then the third, and the fourth. So there's one more variable we need to turn this flat buffer into an image: how many bytes there are per line, and that's described in linesize. I hope you can see this, it might be much too small, but inside linesize, for each of the data buffers, we get the line size. For the luminance data it's telling us the line size is 1920: if you want the first pixel of the second line of the video, you look at byte number 1920; if you want the second pixel, it's byte number 1921. We can use that to extract out the data.

So in my code (let me switch back to it for a second) I say: take av_frame->data[0] and read out the byte at y * av_frame->linesize[0] + x. That's the grayscale pixel value for (x, y), and we just put it into the R, the G, and the B to get a grayscale image. So let's have a look at that, let's see if it runs. It does! And we're getting the data from the sample video... but it's upside-down. Why is it upside-down? Huh. I guess I could just flip it in OpenGL; I'll flip it in OpenGL. In the ortho projection I can say that 0 refers to the window height and the window height refers to 0, flipping our projection vertically. Yeah, there we go: black-and-white data.
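Here's the luminance copy as one sketch, assuming the renamed width_out/height_out/data_out parameters:

```cpp
// Copy the Y (luminance) plane into an RGB buffer as grayscale (sketch)
unsigned char* data = new unsigned char[av_frame->width * av_frame->height * 3];
for (int y = 0; y < av_frame->height; ++y) {
    for (int x = 0; x < av_frame->width; ++x) {
        // linesize[0] is the number of bytes per row of the Y plane,
        // so pixel (x, y) lives at y * linesize[0] + x
        unsigned char luma = av_frame->data[0][y * av_frame->linesize[0] + x];
        data[(y * av_frame->width + x) * 3 + 0] = luma;  // R
        data[(y * av_frame->width + x) * 3 + 1] = luma;  // G
        data[(y * av_frame->width + x) * 3 + 2] = luma;  // B
    }
}

*width_out = av_frame->width;
*height_out = av_frame->height;
*data_out = data;
```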
So that is the first frame of the video, loaded in and rendered onto an OpenGL surface, but it's only the black-and-white data. That's a good intro for now. I'm going to take another break, and then I'll go into using the FFmpeg library swscale. Apparently, among the FFmpeg developers themselves, swscale isn't particularly well liked, but we're going to use it anyway: it comes along with FFmpeg, and we'll use it to do color conversion. You can do scaling with it too, but we're not going to scale the pixels; we're just going to do color conversion, turning a YUV-pixel video into RGB frames. So yeah, that's what I'm going to do in the next video. I'm going to take a break, but I hope you're interested.

Okay. Whew. Two and a half hours of livestreamed programming. Programming for two and a half hours seems completely fine, but programming and then explaining every step for two and a half hours is intense; it's taking a toll on me, for sure. Anyway: this is actually a live stream that I'm breaking up into individual tutorial videos for YouTube. If you're watching one of those tutorial videos, for context: in the previous video I went through setting up FFmpeg to decode an individual frame. How did I do that? If I look at my load_frame code: we start off by opening the file using libavformat, then we select our stream. We unpack the file, and inside an mp4 file, or an avi, or whatever the format is, there are multiple streams: there will be an audio stream, a video stream, other metadata, there could be subtitle information. I look only for a stream that is a video stream, and once I have that, and once I know that I have a decoder for it, I set up my decoder to consume whatever data is coming in from libavformat, and libavcodec starts decoding it. So we initialize that, and then we have the main loop, where libavformat reads a packet of data out of the video file, and provided the packet relates to the video stream, we pass it into our decoder; we give a packet to our decoder, and the decoder gives us back a decompressed, raw video frame. From that frame, what I'm doing so far is just unpacking the luminance data, how bright or dark each pixel is, and putting it into an RGB format by writing the brightness into the R, the G, and the B. As a result, I'm getting one image on the screen, a black-and-white image, because we don't have color data.

What I want to do now, instead of just copying out the luminance of our decompressed frame, is convert the frame correctly to red, green, and blue, so we can see everything. The way we can do that is with a library called swscale. It comes along with FFmpeg, and all I have to do is include it, which will be libswscale/swscale.h, I think. Let me just check by running it: yes, that's the right include file.
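Like the other FFmpeg headers, it's a C header, so in our C++ file it goes inside the same kind of extern "C" block:

```cpp
extern "C" {
#include <libswscale/swscale.h>
}
```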
So we include this, and at this point we have our AVFrame variable, which stores the raw data that's been decompressed by our codec, and it stores it in a YUV format. So what is YUV? Let me just rely on Wikipedia here: YUV is a color encoding system, typically used as part of a color image pipeline, that encodes a color image or video taking human perception into account. There's a great example here: you take a color image and break it up into three components. You have the grayscale image of just the brightness, and then two components which together can describe all of the colors in the image. It's not quite hue and saturation: U and V are more like two axes, and the value of U on one scale and the value of V on the other describe every color you can get. The reason it's split up like this, and not described as red, green, and blue, is that humans notice differences in brightness much more than they notice differences in color. So you can get away with storing less information about color and more about grayscale, and humans will never notice the difference; you can kind of compress an image just by changing the format in which you store it.

In the case of this raw video data, if I pause at the point where we've decompressed it and inspect my AVFrame: the frame we received contains the data property, which has eight pointers, and only three of them are filled. In order, they contain the Y data (the grayscale luminance pixels), the U color data, and the V color data; those are the three pixel buffers we get out. Interestingly, look at how many bytes there are for each horizontal line of the video. For the luminance there are 1920 bytes per line, which means one byte for every pixel on that line, because this is HD video and it's 1920 pixels wide. But for the U and V color components there are only 960 bytes per line. Run the numbers: 1920 bytes is 15,360 bits, and divided by the 1920 pixels in a line, that's 8 bits per pixel for the luminance; the 960 bytes of each chroma plane, spread over those same 1920 pixels, average out to only 4 bits per pixel. So we're storing half as much information per line for each color component as we are for the luminance; that's chroma subsampling, and it saves space by storing the data in this format. But for presentation, when we go into OpenGL, we want the data in the format that suits the display, and the display is RGB: we have a red, a green, and a blue light-emitting element on the screen, and by varying the intensity of those three lights you get whatever color you want. So we need to switch formats, and we can do that with swscale.

My old routine of just copying over the grayscale data goes away; I'm not going to use it anymore. Instead, I create an sws scaler context, which holds all the data the scaler needs for converting color spaces or resizing images. So I say SwsContext* sws_scaler_ctx, and we get one from sws_getContext. We have to give it the width and the height of our video, so av_frame->width and av_frame->height, and it asks for the source pixel format, which our codec context knows, so we can give it av_codec_ctx->pix_fmt.
And what do we want to turn it into? Let me just show you sws_getContext: it takes a source width, source height, and source format, then a destination width, destination height, and destination format. I don't want to change the size of the image, so the destination width and height will be exactly the same, but if you wanted to scale it down by half you could do it here: divide the width by 2, divide the height by 2, and you'd get an image at half the size; the software scaler can do that. For the destination format, I want AV_PIX_FMT_RGB0, which means that for every pixel we get four bytes: a red byte, a green byte, a blue byte, and then a fourth byte that's simply not used. That's a very common internal representation, four bytes per pixel where the fourth would be an alpha channel describing the transparency of the pixel; here we're not going to use it, but that's the format we'll convert into.

Okay, how do we want the scaling done? There are the flags: we can say SWS_BILINEAR, or actually SWS_FAST_BILINEAR, let's do that, that's even better. It doesn't really matter, because we're not doing any actual scaling: it doesn't have to interpolate pixels, it just has to change color values, so I'll stick with that. It should detect that the source and destination sizes are the same and not scale at all, but the flag comes in handy when you actually do scale. Then we have a source filter, a destination filter, and extra parameters; we set all of those to NULL, since we don't care about applying any filters. That's our scaler context defined. I'm also reformatting the call a bit: input width, height, and format on one line, output width, height, and format on the next, and these options on a third. If we don't get a scaler context back, something went wrong: "Couldn't initialize sw scaler", and we return false. In that situation, again, we should really close all of those inputs too, but, you know, error handling; I'll do that later on in this project.

Now we just need to give the scaler the information to scale our data. It's really annoying that the text is so small when I pause the debugger here, I wish I could do something about that, but inside the AVFrame, as you saw, we have the data array with pointers to all the plane buffers, and a linesize array giving the number of bytes in each horizontal line for each of those buffers. We hand exactly that to the scaler: sws_scale takes the scaler context, the av_frame->data pointers and the av_frame->linesize array, and we tell it to process from horizontal line zero all the way up to the height of the image. Then, as the destination, we have to give it another one of those pairs, a data array and a linesize array, so we have to create one.
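Before building the destination buffers, here's the scaler setup so far as one sketch:

```cpp
// Create the color-conversion context (sketch): same size in and out,
// only the pixel format changes, from the codec's YUV to packed RGB0
SwsContext* sws_scaler_ctx = sws_getContext(
    av_frame->width, av_frame->height, av_codec_ctx->pix_fmt,  // source
    av_frame->width, av_frame->height, AV_PIX_FMT_RGB0,        // destination
    SWS_FAST_BILINEAR,  // interpolation flag (moot here: no actual resize)
    NULL, NULL, NULL);  // source filter, destination filter, extra params
if (!sws_scaler_ctx) {
    printf("Couldn't initialize sw scaler\n");
    return false;
}
```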
Unlike the input, where we have three different plane buffers, we want everything to go into one buffer, with each pixel packed together: a red, a green, a blue, and a zero byte for one pixel, followed by red, green, blue, zero for the next pixel, and so on until the end of each horizontal line, then the same again for the next line. So I need to allocate my buffer. I'll use uint8_t here: uint8_t* data = new uint8_t[av_frame->width * av_frame->height * 4]; four bytes for every pixel, because we're going to have RGB plus the unused alpha. In our destination array, that's the first buffer, and all the other entries stay empty. Then we need one array describing all the line sizes, four entries again, and the line size of our buffer is whatever our frame's width is, times four, because we have four bytes for every pixel: the number of bytes in one horizontal line of our output buffer is 1920 times 4, or whatever the video width is, times four. The line sizes of all the other entries are just zero; there's nothing in those buffers. That's what we want the result written into, so we pass our destination and destination linesize arrays to sws_scale, and we'll see if that works. Let me stop, put a breakpoint right after the sws_scale call, and see what happens. Okay, over here in our data I see some stuff; there's data in there, which is pretty good. But I want to see it on the screen, not just write it to disk or whatever.

First, let me free the context with sws_freeContext; we don't need it anymore after scaling this one frame. The cool thing is, if you're processing a whole video, you set up your scaler once, then for every frame you just call sws_scale, and only at the end do you free the context. So when I start pipelining this, with multiple frames coming in, I'll need to manage these contexts properly, so that everything gets set up once and we just run the per-frame functions without changing any of the options. So we've scaled that, and now I set width_out to av_frame->width, and so on. For context again, if you're only watching this one video: we've got a load_frame function that we're building here, and it takes a filename and gives us back a width, a height, and a data buffer, so I just need to hand all of this back to whoever is calling, so they can use it. I do that, I free up all our resources, and I return true: we succeeded in loading the frame.

So let's run that and see if it works. Okay, it runs, but that's not a good frame. Let me see: if I go to main, we get our frame data back, and I'm telling OpenGL, when loading in this frame data, that it's in the format GL_RGB, but it's actually in the format GL_RGBA: one byte each for the R, the G, the B, and then an A byte. OpenGL was misaligning the rows because it didn't know the real format. Change it to GL_RGBA and... there we go. Beautiful! We have a properly decoded single frame from a video file, going out of FFmpeg onto an OpenGL surface, where we can project it and do whatever we want with it. If you think that seems like a lot of work just to get one frame on the screen, you might be right.
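Putting the conversion step together, the destination setup and the sws_scale call look roughly like this (again a sketch with my variable spellings):

```cpp
// Convert the decoded YUV frame to packed RGB0 (sketch)
uint8_t* data = new uint8_t[av_frame->width * av_frame->height * 4];

// One packed destination buffer; the remaining plane slots stay empty
uint8_t* dest[4] = { data, NULL, NULL, NULL };
int dest_linesize[4] = { av_frame->width * 4, 0, 0, 0 };

sws_scale(sws_scaler_ctx,
          av_frame->data, av_frame->linesize,  // source planes and strides
          0, av_frame->height,                 // from row 0, the full image
          dest, dest_linesize);                // packed RGB0 output

// Fine for a single frame; when processing many frames, create the
// context once, reuse it per frame, and free it only at the end
sws_freeContext(sws_scaler_ctx);

*width_out = av_frame->width;
*height_out = av_frame->height;
*data_out = data;
```

On the OpenGL side, the matching change is to upload this buffer as GL_RGBA rather than GL_RGB, since every pixel is now four bytes wide.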
But if you're going to build any program that animates things and actually streams video, you'll need to control all of these individual steps, and you'll need to do different things at each of them. You saw that we initialize the format context for opening the file separately from the codec; that's because you might actually want to treat the file in a different way: only extract the audio, or extract both audio and video. Then you open your file, you get those two streams, and when decoding packets you send them into your audio decoder and your video decoder separately. So when it comes to creating an actual real-time video player, this won't be just one block of code: it will break apart and end up touching lots of parts of your application. For a single frame this seems like a lot of work, but there we go.

It would be cool to do some more work, like animating frames, but I'm pretty tired now. This has been a long stream, and I'm pretty happy that I can get one video frame onto an OpenGL surface. Let me just change the file name and see if I can open a different file; where are all my files stored? I have the stream announcement... and there it is, decoded from my desktop as well. Really cool. Okay, I'm going to leave it there. As always, all of this code is available on GitHub; you'll find it in the description. You can clone it, you can copy out any of it, you can use any of the code that you want. I'll check all of this in; how do I phrase this commit... "decoding first video frame and rendering to window". Push that.

Okay, that's it for now. I'm going to leave it there. I hope I explained things in a way that's understandable, so that you can get FFmpeg working and start manipulating video files yourself. Sometime next week, I'll announce it, I will start animating videos on an OpenGL surface, so that we can actually start seeing moving images, not just one image. But this is an important starting point, where you see how an individual frame can be decoded: once you know how to decode an individual frame, it's quite easy to extend that to decoding multiple frames. The big challenge will then be synchronization: when you're decoding all these frames, how do you synchronize them to the right frame rate, and how do you know when to render a frame? I'll get into that in the next live stream. Thank you very much for watching, subscribe if you want to see more of these live streams, and I will see you in a week. Goodbye!
Info
Channel: Bartholomew
Views: 11,952
Keywords: development, software, programming, C++, FFmpeg, OpenGL, GLFW, tutorial, explained, real-time, video, processing, graphics, libav, CMake, Xcode
Id: MEMzo59CPr8
Length: 165min 14sec (9914 seconds)
Published: Tue Nov 12 2019