Fast Window Capture - OpenCV Object Detection in Games #4

Video Statistics and Information

Captions
Up to this point we've been using OpenCV to detect objects in static images. Now we're ready to apply those same techniques to video games in real time. Remember that video is just a series of still images shown in rapid succession, so today I'm going to show you the best way to capture screenshots fast and display them in an OpenCV window, so that we get a real-time video stream of the game we're interested in. Along the way I'll be talking a lot about the importance of good Googling skills as a programmer. I'll also be introducing basic object-oriented programming concepts like classes and objects. If you're just interested in the code and not how I got there, there's a GitHub link in the description, and some of you might just want to grab that, because there aren't a lot of OpenCV concepts we'll be covering in this one. I'm mostly just going to be talking about my process for coming up with this solution.

Hey, I'm Ben, and this is Learn Code By Gaming.

So when I first started thinking about how we're going to apply these OpenCV techniques to a video game in real time, the first thing I did is I went to Google and I searched for "opencv video", thinking we might get a video stream of the video game we're playing. The first link here I see is from the official documentation, so I checked that out first. My eyes immediately go to the code whenever I see a tutorial page like this. In this code I see that they're using OpenCV's VideoCapture, which I happen to know grabs video data from a source like a webcam. In our case we won't be getting data from a webcam, we'll be getting it from the screen instead, but everything else in this code will work pretty much exactly the same. You can see they create an infinite loop, and each time through that loop they read one of the frames from the camera, then they do a little bit of processing with OpenCV on that frame, then they go ahead and show it with imshow, and then they use waitKey down here to give you a chance to exit the
program. This is just slightly different from the waitKey that we've used earlier. Before, we didn't give it any argument, and if you give it no argument or an argument of 0, it will wait until you press a key before continuing. But you can also tell it to wait just a certain amount of time before continuing on, so that's what they're doing here: they're telling it to wait one millisecond before continuing with the loop. They're also checking here to see if a key was pressed on the keyboard. Only if you press Q will it exit the loop, and when the loop is exited it'll close the camera feed and it'll also close all OpenCV windows.

So this is the code we'll be starting with today. You can see that I've just taken the code from that OpenCV documentation and cleaned it up a bit. We're importing OpenCV and NumPy as always. This os.chdir portion, you don't need to worry about that if you're following along; I just have it here because the way I structured this project on GitHub is I put each video in a separate folder. I've taken out the webcam stuff, and in its place I just have a placeholder for the screenshots we'll be grabbing. Once we get those images we'll pass them into imshow. As long as we keep the window name in imshow the same, it'll simply update the image that's in that window instead of opening a bunch of different OpenCV windows. Then we'll wait one millisecond to see if there are any Q key presses in the key buffer, and if there are, we'll exit the loop and close all the windows. Just for organizational purposes, I'm going to go ahead and move this destroyAllWindows to happen immediately after you press Q. It'll work the same either way; I think this is just slightly cleaner code.

The next thing we need to do is get these screenshot images, and the easiest way we already know how to do that is using PyAutoGUI. If you don't have PyAutoGUI installed already, you can just pip install it: pip install pyautogui. This is just optional, because the code we'll end up with won't be using PyAutoGUI, but obviously if you're following along you will need to have it installed. Now that that's installed, we can go ahead and import pyautogui, and then to get a screenshot it's simply pyautogui.screenshot(). So again, the concept here is we've got this infinite loop: it takes the screenshot and shows it in the OpenCV window, then it goes back through the loop, takes a new screenshot, and puts it in that window again. If you do this fast enough, and it refreshes quickly enough, it'll be just like a video.

If you run our code at this point you'll get an error. I'll go ahead and show you what that looks like. You can see we get this TypeError: imshow expected something different. What this is telling us is we need to convert the screenshot that we get from PyAutoGUI and put it in a format that OpenCV understands. All right, so how do we do that conversion? Well, we've got to use our Google skills again. So I went and searched for "convert pyautogui image to opencv", and then I just checked out the first link. It's a Stack Overflow link; Stack Overflow links are usually pretty good. We can see that this person is trying to do exactly what we're doing, and the answer here is that we need to convert it using np.array. They also warn us that we need to convert it from RGB to BGR, but let's try it first without doing that. So I'm just going to copy this code and paste it into our code, and we'll give it the screenshot and save it back to the same screenshot variable. So now, by the time screenshot gets down to imshow, it should be in a form that OpenCV understands.

Let's try running it again. It's over on my second screen here. You can see how it works: PyAutoGUI is taking a screenshot and then OpenCV is displaying it in this window, and so we get this sort of infinite mirror effect as it takes a screenshot and updates it in the OpenCV window. We
also notice that it keeps flipping the colors on us. This is because we do indeed need to convert from RGB format to BGR, and if we go back to our Google search result, they tell us right here how to do that conversion. So I'll try that. You see when we run it, it does indeed work: all of our colors are what we expect now.

Even though this works fine, I happen to know there's another way to do this color conversion. OpenCV has this cvtColor function; you just tell it what kind of conversion you want to do, and this works just as well as the last bit of code. The reason I prefer this version is it's just a little more clear what's going on when you're reading it. As much as possible, when you aren't sacrificing performance, you want your code to be as self-documenting as possible. We can run that again to confirm that it does work, and again I'm pressing Q to exit out of that window.

Okay, so far so good. If you wanted, you could stop here and start processing your images with OpenCV, but I'm going to take it several levels deeper, because I'm searching for as much speed as possible here. As we start using OpenCV to process these images, you're going to find out that it slows down a lot, so before we ever get to that point we want to start from a place where we're getting these images as fast as possible. To know how fast we're going, we need a way to measure it. To do that, I'm just going to measure how much time elapses in each iteration through our loop, and then I'll go ahead and print that out to the console. So we're setting a bookmark on the time when our code starts, which is here. Then we enter the loop, we do this processing here, and then we print out the current time versus the time that we started out. Then we put another bookmark at the current time, we go through the loop again, we do this processing again, and again we compare the current time to our last bookmark, and so on. Now if we look at our console here, you
can see we're getting about 0.05 seconds or so between each iteration through the loop. Right now this is telling us how much time has passed between each call of this print statement, but what we really want out of this is how many frames per second we're getting. We need to figure out how many of these time segments we can fit in one second, so that's just a simple division problem: we'll take one second and divide it by how long it took to get through this iteration of the loop. Now when we run our code it should give us that same data, but in FPS format. You can see right now we're getting about 19 or 20 frames per second. Of course, the FPS that you get is going to be different depending on your monitor size, how fast your computer is, and all that stuff.

What I want to do now is try different methods for getting this screenshot to see if we can increase our FPS. The first thing I thought to try was this: PyAutoGUI is a pretty high-level library. If you follow the source code, PyAutoGUI for Windows uses something called PyScreeze to do this screenshot function, and PyScreeze itself is using the Python Imaging Library, specifically ImageGrab, to do the screenshot. Sometimes you can get better performance by cutting out the middleman and going directly to the lower-level library, so let's give that a shot. If you're not familiar with the Python Imaging Library, it's very popular; it's used in all kinds of Python code that deals with images. For the modern versions of this library you actually pip install pillow, but if you've installed PyAutoGUI already, it should already be installed. If we do a pip list, you'll see that Pillow actually is installed already. Even though it's called Pillow now, you still import it using PIL, so it'll be from PIL import, and we're just going to import ImageGrab directly. ImageGrab is also really simple to use: just call ImageGrab, and it has a function on it
called grab. Now let's give this a shot and see if we improved our performance. All right, so I'm still getting 19 to 20 FPS, which is about the same as we saw before, but at least we cut out some of our dependencies.

At this point I still wasn't really happy with the speed, so I went back to Google and searched for "python windows screenshot fast". Looking at the search results here, we get another set of Stack Overflow suggestions, which is perfect. You'll want to take note of the dates on these Stack Overflow results. Usually you want something that's newer, but in this case I jumped to this 2010 result, and this is the post that led me to the solution that I found. If you scroll down to the first answer you'll see this bit of code here, and it's suggesting that the fastest way to get screenshots on Windows is to call the Windows API directly. That makes sense: if you're calling the Windows API directly, you've got nothing in between you and the screenshot you're trying to take, so theoretically it should be the fastest possible way to do this. In Python, the best way to work with the Windows API directly is through pywin32, so that's what this poster is suggesting you do here, importing win32gui and win32ui. Those are both from the pywin32 library. Installing this is easy; it's just another pip install: pip install pywin32.

So all I'm going to do is copy this code over and put it in a function, so it keeps it separate from everything else. We'll call this function window_capture. Then I'll clean up the imports. We don't need the Python Imaging Library anymore. The poster of this Stack Overflow answer also forgot one more import: he's using win32con, so import that as well. Now we don't need the imports in the function itself. Then there are a few variables here that aren't defined yet, like this window name and this width and height, and you can see it's actually saving the screenshot that it takes to a bitmap
file, so we need to give that file a name. We'll just set the name here; we'll call it debug.bmp. Up here we'll set the window capture width and height, so I'll just use 1920 and 1080 for the full size of my monitor. For the window name, we don't really have a window that we want to capture yet; we're just going to capture the whole screen. So for now I'm just going to comment this out, and this hwnd, we'll just set that to None. This code down here has all kinds of issues with spacing, so I'm going to go ahead and clean that up.

All right, so this function is all looking pretty good now. But remember, it's just going to take a screenshot and save it to a file, and I don't want a bunch of these files being saved, so for now I'm just going to disable the loop that we have and call that function once. I'll run this just to make sure that it is capturing a screenshot like we expect. When I run this it should create a new image file in my folder. Let's check that out: it is a screenshot of our first screen, like we expected.

All right, so we don't want our window capture to be saving image files. Instead, what we really want is to get the screenshot and have it returned to our loop, so let's go ahead and bring that back into the fold. The way we want to use it is we just want to say screenshot equals window_capture(), and when we do that we don't want to do any of this processing to the image after the fact. We're going to get everything in window_capture to return an image that OpenCV can use right away. So now we know we need to modify this window_capture function to not call this SaveBitmapFile. Instead it needs to return some sort of image, and we don't want to return an image path here; we literally want the image data to be returned. So again, the full idea here is we're going to use the window_capture function to call the Windows API and get this screenshot data, we might convert it to a format that OpenCV wants, and we're
going to return that image. Then down here in the loop we get that image and immediately pass it over to imshow. But how the heck do we do that? Well, this is Google to the rescue once again. I really can't stress enough the importance of using Google to being a programmer; you use it literally every day, all the time.

The search term I used to find this solution was "createcompatiblebitmap opencv python". My thinking here was that any code that has the solution I'm looking for is probably going to use this bit of code here, and one of the function calls that stands out to me as being unique to this is the CreateCompatibleBitmap function. So any web page that has CreateCompatibleBitmap on it, and also has OpenCV and Python on it, is probably going to have exactly the solution I'm looking for. That was my thought process for using this search term.

So we get Stack Overflow again, and this first item that comes up is what has the solution I was looking for. My first thought when I saw this solution was that there's not a lot of code here, but then I saw this: "what a speed-up, 10 times improvement". So that got me excited. What I did is I just grabbed this code here. You can see up in the original question how it fits into the picture with the rest of our code: it's this part right here, where we were previously just saving that bitmap image. But again, I want the better code, the fast code, so I'm going to go ahead and take that over to our code, and then it's just a game of lining up the variable names. So signedIntsArray, that's already defined right here, but then this GetBitmapBits function is being called on bmp, which is undefined for us. If you look at the example code in the question, this bmp is just what he's getting from CreateBitmap here, and of course we're calling that too, but ours is called dataBitMap, so that's the same variable.

If you run this code here, you'll see that you get a really unexpected result: you've got this really long
vertical OpenCV window, so something's not right. Let's go back to that example code and see what we missed. Sure enough, after he creates this NumPy array, he's then changing the shape of the image, so let's go ahead and do that too. Of course, in our code the height is just h and the width is just w. What happens if we run it now? All right, it looks like we're getting these screenshots like we expected, and our FPS has actually improved: it looks like we're up to about 29 or 30 FPS.

All right, so we're looking good so far, but there are several more improvements we can make. I've worked ahead a little bit on this project, and I know there are two more things we should do in this window_capture function. The first thing we need to do before we return this image is drop the alpha channel on the image. If you don't know, the alpha channel of an image holds all of the transparency data. Even though everything works fine right now when we just pass the image over to imshow, once we start using matchTemplate it's going to throw errors if we don't get rid of the alpha channel. So this is how we do that: we just use some NumPy slicing to get rid of the alpha channel data. Again, I just looked up how to do this on Google.

Then, when I started drawing rectangles on our output image, I started getting another error, a TypeError this time. I researched this one and found this issue on GitHub that explains everything that's going on, and they recommend that you just use np.ascontiguousarray to solve it, so that's exactly what I've done in our code too.

If we run this again, you'll notice that the FPS has dropped down quite a bit. We're back down to 20 or so FPS, and this is all due to the code that drops the alpha channel; it's a little bit slow. So if you're doing something that doesn't use matchTemplate, you might be able to get away with not having this bit of code here, and you'll see a performance increase from removing it, but in our case we're
definitely going to need it, so for now I'm just going to leave it in there.

So at the end of the day, it doesn't look like we've gotten too much benefit from calling the Windows API directly, but we're going to stick with this method because there actually is a benefit that I'll show you here, and it all comes down to this FindWindow function that we commented out earlier. Using FindWindow, we're going to be able to specify the name of the window that we're interested in capturing, and by doing that we're always going to be able to capture the image data from that window, regardless of whether it's on another monitor or behind some other windows on your screen. To use FindWindow, all we need is the window name, and the window name refers to the text that you see in the title bar. For example, I've got Albion open here, and you can see it's in windowed mode, and in the title bar it says "Albion Online Client", so that's the window name of this window. So here I just write the string "Albion Online Client".

Now, in some programs it'll be hard to figure out what exactly the name of the window is. If you have that issue, again, you can look that up on Google. It'll bring you to this Stack Overflow article, and here the solution is to use this EnumWindows call in pywin32. In case you need that, I'll go ahead and put it in our code. Of course, you call it just like any other function, and it will just print out the names of all the windows you have open. When you run it, the output will look something like this: it's got the hex value on the left, which is just the window handle, and the name of the window will be here on the right.

At this point our code is getting a little bit messy, so I'm going to go ahead and move all of this window capture stuff into another file. I'm even going to make a class wrapper around it. I'll call the class WindowCapture, and the window_capture function, I'm going to change the name of that to get_screenshot. Of course we need to copy over our
imports here, and we can go ahead and remove some of those from the main file. When you create a class, every function inside of that class needs to have self as the first parameter. Now, I don't think these parentheses are really necessary here, so let's simplify it.

If you've never used classes before, here's my best video game analogy. Most video games have classes, right? World of Warcraft has paladins and warriors and hunters. But inside the game there isn't just one hunter; there are lots of people who play a hunter, and everyone who plays a hunter belongs to the same hunter class. So all of those characters share certain traits and abilities in common, but each one of those different people playing a hunter might have different values associated with those attributes. The class defines the properties and the abilities of all of the instances of hunters.

Within our WindowCapture class we have different abilities that we are calling functions, and when functions exist inside of a class they're called methods. Classes don't just have methods, they also have properties, and properties are just variables that exist inside of the class. In our example, a good candidate for being properties are the width and the height. Let's go ahead and move those out of the get_screenshot function (you can still call it a function even though it's technically a method), and we'll put them at the top of our class. So now we have two properties, the width and the height, and we've also assigned an initial value to those properties. Then, whenever we want to reference one of these properties inside one of our methods, we have to remember that w and h no longer exist inside of this function; they're actually properties on self. So we'll do self.w and self.h, and the same down here.

Another feature of classes is the constructor. The constructor is just a special method that gets called when your class is first initialized. So when you create a new object using a class,
this is the function that gets called. I've changed this to initially set my width and height properties to zero on the WindowCapture class, but when a new WindowCapture object is created, it'll call this constructor and change the width and the height to my full monitor size.

Another thing I want to do is pass in the window name when I create a new instance of the WindowCapture class. Actually getting that handle to the window is something that doesn't need to happen every time we take a screenshot; we can just do that once, when the class is initialized, so let's go ahead and do that inside of the constructor. But now, in order for our get_screenshot function to access this hwnd variable, we'll need to create it as a property on our class, and every time we refer to that property we need to use self again. Of course, when we go to call FindWindow, now we want to use the window name that gets passed in. We can also add a check here to make sure that the window that is passed in is actually found, so here I've written some code to just throw an exception if I can't find that window.

Let's move back over to our main file now and go ahead and set it up to use that new WindowCapture class. You can see I've already removed that pywin32 import, and now we can import the WindowCapture class. So from windowcapture, which is just the name of the file we created, we can import WindowCapture. Now we need to create an instance of that WindowCapture class. Right now we have this WindowCapture, but it's just this concept, this idea of what a window capture is; we need an actual object to attach that idea to. The idea of a hunter already exists in World of Warcraft, but here we want to create an actual example of one. To do that, I'm going to create a variable to hold our object, and then I'll call WindowCapture and pass in the name of the window that we want to capture. When I make this call, it's going to go ahead and go
and call the constructor inside of our class, run this code, and then return to us an instance of this WindowCapture class. In object-oriented programming this instance is called an object. What's great is we can do that once, outside of the loop, so now in the loop all we need to do is use that window capture object and call get_screenshot on it. Every time we call this get_screenshot method on the wincap object, it's going to run the code in the class for get_screenshot, and the self that gets passed in has all of the values for the properties that we set up earlier in our constructor. Collectively, the values of all the properties on an object are called the object's state. So if you've been struggling with classes and objects, hopefully something that I've done or said here has made it a little bit more clear for you.

I think our code is looking pretty good here, so let's go ahead and run main again and see what we get. All right, so the good news is our WindowCapture class is working, and you can see that it's grabbing the screenshot of Albion for us even though the actual Albion game is down here, behind my VS Code window. I can even move it to another screen, and you can see that as I do stuff in Albion, it's captured and displayed in our OpenCV window. But there are some shortcomings you'll notice here. You see all this black around the edges that we don't really need. You've also got this border around the game client; we don't need that window border either. Let's go ahead and write some code to cut out all that stuff.

All right, I'm going to try to speed this up a little bit here because this video is getting a little bit long. Back over in our window capture class, the constructor is the perfect place to set a different width and height that's cropped down to just the size of the window that we want. To do that we can use this GetWindowRect function from pywin32. This is just going to
return the coordinates of the window that we're capturing. The list will have four numbers in it: the first one is the x-coordinate of the upper left-hand corner, the y-coordinate of the upper left-hand corner is the second element, and then it has the bottom right-hand corner as the third and fourth elements in the list. Using that and a little bit of math, we can calculate the width and the height of the window. When we run that, boom, we get rid of the black bars on the right and the bottom.

But we've still got this window border and this title bar up here that we don't really need either, so let's cut those out too. I measured the size of the pieces we want to cut off: on the left, the right, and the bottom there are eight pixels, and the title bar is 30 pixels tall. So to get our new width, we're just going to take our existing width and subtract two times the border, for the left and the right border. For the new height, we're going to subtract the border on the bottom, and then we're also going to subtract the height of the title bar. But of course, if we just apply this width and height change as it is, we're still going to see the title bar and the border on the left; it's just going to crop out some pixels that we want to keep on the right and the bottom of the image. To account for that, we need to move the crop that we're doing over to the right 8 and then down 30. So I'm just going to create some new properties for that; I'll call them cropped_x and cropped_y. By the way, your code will work fine here if you just use self and then create a new property. To stay organized, I like to also declare them up at the top of my class, so just at a quick glance I can see what properties exist.

All right, so now we've got those properties declared at the top, and actually doing this shift is going to happen down in our get_screenshot function. In this BitBlt call, the fourth parameter is actually the coordinates of the upper left-hand corner of where you want
that crop to start, so we can use those cropped values that we just created to set this position. So now when we run our code, we get this nice smaller window, and all of those borders and edges are cropped off, but we're not losing any of the pixels from the game that we're interested in. You can also see that with this change we're getting much better FPS: we're hitting 30, sometimes 40 FPS here. That's because the image we're dealing with is much smaller than what we started with. It's not the entire screen, so it's a lot less data that we're passing through. So if you are hitting any sort of performance bottlenecks with OpenCV, one of the best solutions is just to scale your images down. Or, if you don't want any of the distortion of scaling your images directly, you could also just make the client window of the game that you're playing smaller, and of course you would do that in the game settings.

All right, there's one more thing I need to do before I let you go. Our game video capture is perfect, but you're going to struggle to translate coordinates in the images that you process into coordinates on the screen where the video game actually is. Hopefully that makes sense. Maybe I should just read my comment here: we're going to set the cropped coordinates offset so that we can translate screenshot images into actual screen positions. To get those offsets we can again use the result from GetWindowRect, because the first two elements in the list that it returns are the x position and the y position of the upper left-hand corner of the window. To that we just need to add whatever we cropped out, and this will give us an offset. Then I'll add those new variables to our properties list. And then I've made this function here called get_screen_position. We're just going to pass in a position to it, a position that we found in our screenshot, and it's going to take that position in the image that we processed and it's going to return
the actual x, y position of that pixel on the screen. There's a bit of a warning here: this is only going to work if you don't move your game client window around after you start your script. That's because we're only calculating these offsets once, when we first start our script, right when the constructor is run. If you want to see a demonstration of how this works, you're going to have to wait for the next video.

All right, so thanks for sticking with me. This video has been a lot longer than I was initially planning on, but the key takeaways to get from this video are these. First, don't feel bad about using Google at all when you're programming; if anything, you should be focusing on improving your searching skills. Secondly, I hope you have a better understanding of what objects and classes are. For a lot of new programmers this is really the most difficult conceptual hump to get over, but once it clicks, it's really all downhill from there in learning how to code.

So now that we're capturing this screen data, we're all set to start processing these images using OpenCV. Our next step is simply to combine the window capture code that we wrote here with the matchTemplate code that we wrote previously in this series. If you're following along, or if you're waiting for me to release the next video, now would be a good time to experiment with doing that next step yourself, and then in the next video you can compare the solution you came up with to mine. My goal here is to help you become a better programmer, and exercises like this will help you get there. So good luck, and I'll see you in the next video.
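The frames-per-second measurement described in the captions (set a time bookmark, do one pass of work, divide one second by the elapsed time, then reset the bookmark) can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the helper names fps_from_elapsed and measure_loop are hypothetical, the sleep call only stands in for the capture and imshow work, and perf_counter is swapped in as the clock.

```python
from time import perf_counter, sleep


def fps_from_elapsed(elapsed_seconds):
    """Return how many iterations of this duration fit into one second."""
    if elapsed_seconds <= 0:
        # Guard against a zero reading from the clock.
        return float("inf")
    return 1.0 / elapsed_seconds


def measure_loop(work, iterations=5):
    """Run `work` repeatedly and return one FPS reading per iteration.

    `work` is a stand-in (an assumption for this sketch) for taking a
    screenshot and showing it in a window.
    """
    readings = []
    loop_time = perf_counter()  # the "bookmark" set before entering the loop
    for _ in range(iterations):
        work()
        now = perf_counter()
        readings.append(fps_from_elapsed(now - loop_time))
        loop_time = now  # move the bookmark for the next iteration
    return readings


if __name__ == "__main__":
    # Simulate a frame that takes about 0.05 seconds to capture and display.
    for fps in measure_loop(lambda: sleep(0.05)):
        print(f"FPS {fps:.1f}")
```

Resetting the bookmark at the end of each pass means every reading covers exactly one iteration, which is why the number reacts immediately when the capture method changes.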
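The step from raw bitmap bytes to an image that OpenCV accepts (build a NumPy array, reshape it to height by width by 4, drop the alpha channel, make the result contiguous) can be sketched without any Windows calls. The function name bitmap_bits_to_image is hypothetical, the input is assumed to be 4 bytes per pixel in BGRA order, and np.frombuffer is used here in place of whatever array constructor the Stack Overflow answer used.

```python
import numpy as np


def bitmap_bits_to_image(raw_bits, width, height):
    """Convert raw 4-bytes-per-pixel bitmap data into an (h, w, 3) image.

    Assumes `raw_bits` is a bytes-like object of length width * height * 4,
    with pixels stored row by row as B, G, R, A.
    """
    # Flat array of bytes; without the reshape below this would display as
    # a very tall, one-pixel-wide image.
    img = np.frombuffer(raw_bits, dtype=np.uint8)
    img = img.reshape((height, width, 4))

    # Drop the alpha channel with slicing, keeping only B, G, R.
    img = img[..., :3]

    # The slice is a non-contiguous view; copy it into contiguous memory,
    # which is what the captions say fixed the TypeError when drawing.
    return np.ascontiguousarray(img)


if __name__ == "__main__":
    fake_bits = bytes(range(24))  # a tiny 3x2 "screenshot" for demonstration
    image = bitmap_bits_to_image(fake_bits, width=3, height=2)
    print(image.shape, image.flags["C_CONTIGUOUS"])
```

The copy made by np.ascontiguousarray is extra work on every frame, which is consistent with the FPS drop mentioned in the captions once the alpha channel is removed.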
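The cropping and offset arithmetic from the last part of the captions can be tried out on its own. The sketch below is an assumption-laden illustration rather than the video's class: CropGeometry is a hypothetical name, the window rectangle is passed in as a plain (left, top, right, bottom) tuple instead of being read from GetWindowRect, and the 8 pixel border and 30 pixel title bar are the values measured in the video, which may differ on other systems.

```python
class CropGeometry:
    """Holds the crop size and screen offsets for one captured window."""

    # Properties declared at the top so they are easy to see at a glance.
    w = 0
    h = 0
    cropped_x = 0
    cropped_y = 0
    offset_x = 0
    offset_y = 0

    def __init__(self, window_rect, border_pixels=8, titlebar_pixels=30):
        # window_rect is (left, top, right, bottom) in screen coordinates.
        left, top, right, bottom = window_rect

        # Full window size, minus the left and right borders, and minus the
        # title bar and bottom border.
        self.w = (right - left) - border_pixels * 2
        self.h = (bottom - top) - titlebar_pixels - border_pixels

        # Where the crop starts inside the window image.
        self.cropped_x = border_pixels
        self.cropped_y = titlebar_pixels

        # Offsets for translating image positions into screen positions.
        # Computed once, so they go stale if the window is moved later.
        self.offset_x = left + self.cropped_x
        self.offset_y = top + self.cropped_y

    def get_screen_position(self, pos):
        """Translate an (x, y) position in the cropped image to the screen."""
        return (pos[0] + self.offset_x, pos[1] + self.offset_y)


if __name__ == "__main__":
    geometry = CropGeometry((100, 50, 1124, 818))
    print(geometry.w, geometry.h)
    print(geometry.get_screen_position((10, 20)))
```

Each CropGeometry object carries its own state, so two windows at different screen positions would simply be two instances of the same class with different offsets.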
Info
Channel: Learn Code By Gaming
Views: 73,249
Rating: 4.9619198 out of 5
Keywords: opencv, python, opencv video capture python, windows game capture, real time image, real time image processing, opencv video capture, capture screen, screenshot python, python record window, swapping through frame, imagegrab, pywin32, opencv tutorial, computer vision tutorial, real time window capture, capturing screen, real time screen capture python, screen reading python, opencv window capture, computer vision beginner tutorial, programming tutorial, grab screen
Id: WymCpVUPWQ4
Length: 30min 47sec (1847 seconds)
Published: Thu May 28 2020