8-Bits Of Image Processing You Should Know!

Video Statistics and Information

Captions
Hello! It's been a while since I've done one of these videos, but this video is eight bits of image processing you should know. You might be thinking, well, why do I need to know anything about image processing? Well, images are just 2D arrays of data, and the algorithms that we apply to this data can shape it in useful ways. Obviously some of the applications involve images and cameras and video footage, but there are also other ways of manipulating 2D data to your advantage, for example in things like procedural generation. On the whole, I think most programmers should have an awareness of image processing; it's a very useful tool to have in your toolbox. So let's get started.

Before I start, I'm going to show you the video that I've created to demonstrate the eight bits, and it's quite nice because it allows us to quickly compare the algorithms. Here it's going to show the first bit, which will be thresholding, and we can choose different numbers to look at the different algorithms and see what their effects are. Because it's working with video, you can see here is the live feed of my arm waving around. I think it makes quite a nice interactive tool, which is great for learning. But this video is going to be a bit different to some of my others: I'm not going to go through it line by line from scratch. I've already created the code, and what I really want to emphasize is what is the image processing that is going on, and how does it work.

Bit one: thresholding. This is the process of binarizing a grayscale image, so the pixels go from black to white. Thresholding involves taking an individual pixel and classifying it as being above or below a threshold: if it's above the threshold you output 1, and if it's below the threshold you output 0. This green line represents a single row in our image. If I take this row and plot its X position against the brightness of each pixel, I might get something that looks like this. Thresholding involves specifying an appropriate value to act as a cut-off, so any pixels above that value get classified as 1, and any below it get classified as 0. The red dashed line represents my threshold value, so now with my blue pen I can indicate what the binary equivalent of this might be: it starts down here at zero, then goes above the threshold to one, below the threshold, above the threshold, and we've binarized our image.

To demonstrate these programs I'm using a Pixel Game Engine application that I've already created, and I feel it's necessary to give you a brief overview of what this application is before we get stuck into the algorithm code, just so it makes some sort of sense. Fundamentally it's based on the idea of a frame, which is a fixed 2D array of pixels, in this case 320 by 240. The pixels are of floating-point type, so instead of working with RGB values I'm taking the RGB from the camera and converting it to a floating-point value between 0 and 1. Converting from the integer domain to the floating-point domain allows me to avoid complexities such as integer division. This simple frame class has some accessors, get and set, which do boundary checks for me, so I can quite happily set a pixel's value beyond the edges of the image, and if I get something from beyond the image it just returns a black pixel. So zero is black and one is white. My frame class also overrides the assignment operator, so I can have multiple frames in my application and transfer the contents of one frame to another with ease.
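As a rough sketch, a frame class along the lines described might look like this. The names and layout are my own reconstruction from the description above, not the video's actual code; note that for a plain struct like this, the compiler-generated assignment operator already copies the pixel data from one frame to another.

```cpp
// A minimal sketch of the frame described above: a fixed 320x240
// array of floats in [0,1], with boundary-checked accessors.
struct Frame
{
    static constexpr int nWidth = 320;
    static constexpr int nHeight = 240;
    float pixels[nWidth * nHeight] = { 0.0f };

    float get(int x, int y) const
    {
        // Reads beyond the image return a black pixel (0.0f)
        if (x >= 0 && x < nWidth && y >= 0 && y < nHeight)
            return pixels[y * nWidth + x];
        return 0.0f;
    }

    void set(int x, int y, float p)
    {
        // Writes beyond the image are silently ignored
        if (x >= 0 && x < nWidth && y >= 0 && y < nHeight)
            pixels[y * nWidth + x] = p;
    }
};
```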
For this video I'm not going to dwell on the image capture side of things; I've already done that in other videos, and it's enough to say that we simply use the escapi library to capture a frame from a webcam. So in OnUserCreate the webcam is initialized, and in OnUserUpdate I capture the image from the webcam each frame, convert the pixels to floating point, and store the result in a frame called input. This program shows eight different algorithms, so the bulk of the code shown here handles the selection of which algorithm is currently being demonstrated. The algorithms also have a degree of user input, which allows the user to change values, play with the algorithm, and see how it responds under different circumstances. For example, when the user presses the 1 key on the keyboard, it changes the current algorithm being demonstrated to threshold, so let's continue looking at that algorithm. Here it is, and you'll see this on most of the algorithms: we do a little bit of user input if there are values to change, and then we actually perform the algorithm under demonstration. Thresholding is very simple: for all of the pixels in the frame, we read the input value of the pixel at that location, compare it with a threshold value, which gives us a 1 or a 0 in response, and then we write that to an output frame. At the end of the program I then draw the input and output frames. Hopefully you can see thresholding is very simple indeed, so let's take a look at it.

This is thresholding. Now, my webcam has some automatic gain correction, which is what you saw just then as the image changed and faded; I can't override those settings through the camera's API, but for this video it doesn't really matter. I'm in threshold mode now, and we can see the input image here on the left is in grayscale, but the output image here on the right is in black and white; it's been binarized. It says here I can use the Z and X keys to change the value of the threshold. Currently it's at 0.5, halfway between the minimum and maximum intensities for the grayscale. As I increase the threshold value we see fewer pixels being attributed to a binary one, and as I decrease it we see the opposite. Thresholding is essentially the coarsest of filters, and it's usually the first step in removing as much rubbish from an image as you can. For example, here you can see on the notebook the text "One Lone Coder" comes through quite clearly, but the lines and the slight grayness of the page don't, so if we were then to go on and extract this text, it's much easier now that we're not contaminated with this spatial background noise; we've thresholded it out.
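A minimal sketch of that per-pixel threshold pass, assuming the Frame type sketched earlier (function and parameter names are illustrative):

```cpp
// Classify every pixel as above (1.0f) or below (0.0f) the threshold
void Threshold(const Frame& input, Frame& output, float fThresholdValue)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
            output.set(x, y, input.get(x, y) >= fThresholdValue ? 1.0f : 0.0f);
}
```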
Bit two: motion. For this video I'm assuming the simplest kind of motion; we won't be able to get any direction information from this. The word motion implies that something has moved, and for something to move takes time, so to detect motion in an image we need to allow time to have elapsed. Fortunately with video this is quite simple, because a video camera returns successive frames in time, which means we have a built-in delta time between each frame. Alongside movement in time, motion also implies movement in space: the object was in one location and now it's in another. But for this bit, let's not think of objects as being the things we're looking at; instead we're looking at pixel grayscale values. Over time, if something is moving in the image, a particular pixel value is also changing, so we can identify that motion has occurred by looking at the difference of pixel values between successive frames of video input. On this graph we can see that the difference between A and B is related to the change in that grayscale value. The end result of this could be signed, and in some applications that's a useful thing, since it gives you additional information, but for our application I'm just going to take the absolute value to tell us that motion for that pixel is likely to have occurred.

The code for motion detection is equally as simple as thresholding. I go through every single pixel in my frames with these nested for loops, and I look at the difference between the current frame and the previous frame by subtracting them, taking the absolute value of that result, and setting it in the corresponding location of the output frame, and then I draw the input and output frames. I update the previous input frame before I acquire a new image into the input frame. Here's the algorithm running, and it's looking at a reasonably static scene, but as soon as things start to move, as I bring my hand into the scene, we're looking at the difference between one frame and the previous frame, and we only see illumination in the output where there has been change, which signifies that motion has occurred in those locations. Because the frame rate of my camera is reasonably quick, about 20 frames per second, I get what looks like an edge around the object that's moving, but don't be fooled by this: it's not strictly an edge, although you can use it as one; it is just the difference between the two frames. Motion detection like this is usually a foundation algorithm; it's used to guide your decisions in subsequent algorithms that you apply to the image. For example, I might want a system to shut down if nothing in the scene is moving. I mean, why bother taking more images if nothing has changed? I could detect that by accumulating the sum of all of the pixels in the output image and then checking that against a threshold value to tell me whether there has been enough motion in the image for the system to switch on.
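A sketch of that frame-differencing loop, again assuming the Frame type from earlier; prevInput is whatever frame was captured on the previous update:

```cpp
#include <cmath>

// Motion as the absolute per-pixel difference between successive frames
void DetectMotion(const Frame& input, const Frame& prevInput, Frame& output)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
            output.set(x, y, std::fabs(input.get(x, y) - prevInput.get(x, y)));
}
```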
Bit three: low-pass temporal filtering. As we've just seen in bit two, the value of a pixel changes over time, and if we look over a longer period we might see that pixels change value quite rapidly between frames. This is called noise, because sensors aren't perfect; lighting conditions, electronics, and all sorts of things can influence the value of a pixel. This noise can cause problems, because what we actually want to see is the real value of the pixel changing over time, as indicated by this green line; we can approximate that it's somewhere in between all of these noisy values. Noise can become a problem if you do things such as thresholding and you threshold inappropriately. We effectively want to run the grayscale value of the pixel through a low-pass temporal filter, so the low-frequency component of the pixel is allowed through and the high-frequency components are removed. We can approximate this with a very simple equation: for a given pixel value P, we update that pixel value by looking at the difference between the input pixel value and the current pixel value, and multiplying that by a constant. Fundamentally, if this distance is small then the change in our output pixel is small, and if it's large then the change in our output pixel is large, but we can regulate that change with this constant. In engineering this is also known as an RC filter, and its implementation is very simple. In the low-pass section of the program I do some user input so I can change the value of this temporal constant, and then I iterate through all of the pixels in a frame: I look at the difference between the input and the output, I scale the difference by our temporal coefficient, and then I accumulate that difference back into the output frame with this plus-equals symbol. For this algorithm the output frame is persistent between updates of the video camera feed, meaning that output pixels are only changed by a small amount, depending on how large the change in the input was.

So here in the program I'm now running bit three, the low-pass temporal filter, and the two images look very similar. It might not even be possible to see on the YouTube video, but the input image on the left actually has quite a lot of per-pixel noise, whereas the output image on the right has no temporal noise visible to the naked eye. If I move my hand into the scene (this is a particularly slow filter, so I can make rapid changes by wiggling my fingers around here), we can see that the output image doesn't change very much; it's ignoring those fast changes and only allowing the really slow changes through. If I leave my hand in a fixed position, eventually it feeds into the image, so this is exaggerated in a way. I can use the Z and X keys to change the value of this constant, so I can make it very slow indeed, which might not immediately seem a useful thing to do, but if you wanted to do some background subtraction over moving images this is quite a nice way to do it: you can accumulate the background of an image over time and then use that as a way to isolate things in the foreground. If I increase the value of the constant it becomes far more live; let's keep going a bit until the two images look very similar indeed. But if you get this constant too high, you'll start seeing the per-pixel camera noise coming back into the output image. So low-pass temporal filtering is a great way to filter noise, and it also looks all ghostly and cool.
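In equation form the update is P = P + f * (I - P), where I is the incoming pixel value, P is the persistent output pixel, and f is the temporal constant. A sketch, under the same assumptions as before:

```cpp
// Low-pass temporal (RC) filter: 'output' persists between frames.
// Small fTemporal = slow and ghostly; large = live but noisy again.
void LowPassFilter(const Frame& input, Frame& output, float fTemporal)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
        {
            float fDelta = input.get(x, y) - output.get(x, y);
            output.set(x, y, output.get(x, y) + fDelta * fTemporal);
        }
}
```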
Bit four: convolution. Whereas the previous two bits have looked at filtering things in the time domain, convolution looks at filtering things in the spatial domain. Fundamentally, we decide what to do with a pixel by looking at its neighborhood. For this example I'm going to look at the immediate 3x3 neighborhood of our target pixel, and this neighborhood is called a kernel. You can think of a kernel as a template of coefficients that are used in a dot product between the neighboring pixels in that region and the values of the kernel, to give us a result for the central pixel. So my kernel might be defined here as a 3x3 matrix of values. These values are overlaid over the corresponding pixels at that location, and we also include the central value, which is the target pixel. I can give these kernel values location information to identify their relationship to the target pixel. I work out my final pixel value by performing the dot product between a kernel coefficient and the grayscale value of the pixel at that location; so this component, for example, is this component of the kernel multiplied by this pixel value, and we go on to work through all of the kernel locations. So what effect do you think this kernel might have? Well, we can see it as regions of influence: we're strongly influenced by the target pixel, the one in the middle, but we're also a little bit influenced by our immediate north, south, east, and west neighbors. Conveniently, in this kernel all of the values add up to one, and this is quite deliberate: we take the bulk of our pixel's value from what it already is, but then we take a little bit from its neighbors, and we still land within the approximate range for that pixel. This gives us the effect of blurring the image, because we go on to apply this kernel to every single pixel in the image.

I've implemented convolution in a really naive way here. I'm going through every pixel in the frame, and for each location I'm accumulating into an fSum variable. My kernels are 3x3, but I want to get the corresponding offset location for a kernel coefficient in my image, so I've got an additional two nested for loops which iterate through the values in my kernel. The kernel we've just created is a blurring kernel, and I'm representing it as just an array of nine floating-point values. I index into the appropriate location of that kernel using some simple 2D-to-1D arithmetic; once I've got the right coefficient I multiply it by the input grayscale value and accumulate it into my fSum variable. I then set my output pixel for that location to the fSum variable. In this demonstration I've included two kernels, blurring and sharpening, but the coefficients of the kernels are quite different; there's a little bit of user input at the top to choose which kernel we're going to apply to the image.

In the demonstration program I've chosen bit four for convolution, and it's currently running the blurring kernel. We can see that the input image on the left is sharper than the output image on the right. This blurring only occurs once over a 3x3 neighborhood, so it's a very delicate blur. However, if I press the X key I can change to the sharpening kernel, and we can see that this kernel has the effect of enhancing any areas of high contrast in the image. You may have seen these filters in popular art programs. The downside to sharpening, of course, is that it also sharpens all of the noise, so we may want to combine convolution with some of the previous filtering algorithms we've already seen. In this convolution example I've used a very small kernel, 3x3, so it can only look at its immediate neighbors. For blurring, for example, if we wanted a more blurry image there are two ways to go about it. The first is that we could repeatedly blur the image: once we've blurred it once, we use that as the input and blur it again and again and again until we've got the desired level of blur. A second approach is to use a much larger kernel, so we can go 5x5, 7x7, 11x11, whatever you want. But the kernels and the convolution I've shown here are four levels of nested for loops; they will explode computationally and become very slow, making it difficult to get any kind of real-time performance. If you're serious about doing convolutions, and most image-processing programmers are, then you'll want to do your convolutions in the Fourier domain, the frequency domain, instead: you take the fast Fourier transform of your input image and the fast Fourier transform of your kernel, combine them, and then take the inverse Fourier transform of the result. This allows you to use very large kernels with a fixed computational overhead, a far more suitable approach for real-time image processing.
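A sketch of that naive convolution. The exact blur coefficients below are my own assumption, chosen to match the kernel described: a heavy center weight plus a little from the north, south, east, and west neighbors, summing to one.

```cpp
// Naive 3x3 convolution: dot product of the kernel with the
// pixel's neighborhood, applied at every pixel in the frame
void Convolve(const Frame& input, Frame& output, const float* kernel)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
        {
            float fSum = 0.0f;
            for (int i = -1; i <= 1; i++)       // kernel column offset
                for (int j = -1; j <= 1; j++)   // kernel row offset
                    fSum += input.get(x + i, y + j) * kernel[(j + 1) * 3 + (i + 1)];
            output.set(x, y, fSum);
        }
}

// Illustrative blur kernel: coefficients sum to 1
float kernelBlur[9] =
{
    0.0f,   0.125f, 0.0f,
    0.125f, 0.5f,   0.125f,
    0.0f,   0.125f, 0.0f,
};
```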
Bit five: Sobel edge detection. Edges in images indicate where the information of an image is, simplistically because edges indicate where spatial change has occurred. In this example image we have a background and we have a box. The background doesn't change locally and the box doesn't change locally, so there's no relevant information in either of those zones. One could argue that the box is really only defined by how it is different to the background, and this difference lies along the edge of the box. So detecting edges is quite an important step, and perhaps the most classical way to detect edges is Sobel edge detection, which is a pair of kernels used in a convolution. The two kernels detect edges in the two main directions: for horizontal, the kernel looks something like this, and we've also got vertical. If we convolve the image with the horizontal kernel we will see the horizontal edges, and then we'll convolve the image with the vertical kernel and we'll see the vertical edges. We'll then combine the horizontal edges and the vertical edges into a single output image to show us all edges. As before with the motion detection, the sign bit of the result of these convolutions can contain useful and interesting information, but I'm going to throw it away by taking the absolute value of the results of these convolutions. The Sobel part of the code is exactly the same as the convolution part of the code before, except I'm doing the two things at once: I'm maintaining two summation variables instead of one, and when I'm writing the output summation I'm taking the average of the sum for the vertical and the sum for the horizontal components. I've defined the kernels for Sobel in exactly the same way as I did before; they're just arrays of floating-point values. Here is the demonstration program running the Sobel edge detection algorithm, and as we can see, edges are illuminated, and boring surfaces, i.e. those that have low-frequency spatial information, remain black. The nice thing about Sobel is it works on this grayscale input, and you can see it starts to highlight all of the areas of high-frequency information. Look at my really hairy arm there; it's quite a visible thing, so it's really indicating texture.
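A sketch using the standard Sobel kernel pair, averaging the absolute responses of the two directions as described:

```cpp
#include <cmath>

// The standard pair of Sobel derivative kernels, one per direction
float kernelSobelX[9] = { -1.0f, 0.0f, +1.0f,
                          -2.0f, 0.0f, +2.0f,
                          -1.0f, 0.0f, +1.0f };

float kernelSobelY[9] = { -1.0f, -2.0f, -1.0f,
                           0.0f,  0.0f,  0.0f,
                          +1.0f, +2.0f, +1.0f };

void Sobel(const Frame& input, Frame& output)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
        {
            // Two summations at once, one per kernel
            float fSumX = 0.0f, fSumY = 0.0f;
            for (int i = -1; i <= 1; i++)
                for (int j = -1; j <= 1; j++)
                {
                    fSumX += input.get(x + i, y + j) * kernelSobelX[(j + 1) * 3 + (i + 1)];
                    fSumY += input.get(x + i, y + j) * kernelSobelY[(j + 1) * 3 + (i + 1)];
                }
            // Discard the sign; average the two edge responses
            output.set(x, y, (std::fabs(fSumX) + std::fabs(fSumY)) / 2.0f);
        }
}
```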
Bit six: morphological operations. Even though that sounds like a mouthful, morphological operations are really a study of how things look spatially in the image. We want to do things regarding the shape of objects in the image, and we do this by working in the binary domain, so we must threshold the image first to binarize our pixels to 0 and 1 values. This bit is really split into three parts. The first I'm going to look at is called erosion. In this simple demonstration image I'm assuming that the background is 0 and my object is 1. Erosion is the effect of eroding a single pixel off all edges of our object, effectively shrinking it, and it's useful for removing erroneous, spurious pixels left over from other stages of image processing, because if I had a single pixel like this, or a cluster of single pixels in a line, and I eroded it, then since we're removing one pixel from all edges, it's going to disappear entirely, whereas for larger objects the morphology remains intact. The opposite of erosion is dilation, and it is quite literally the opposite: this time we grow a one-pixel boundary around our shape. Going back to the previous example, if we had some spurious small information and we first erode to remove it and then dilate the image again, there's nothing left to actually dilate where the noise was, but our original shape goes back to something very similar to how it was originally. So this is a nice way of removing spatial noise from an image.

In many ways, implementing morphological operations is very similar to convolutions, but this time we use logic instead of dot products. Looking at erosion, we use a very similar 3x3 kernel to the one we used in the convolutions, but this time my kernel is just a 3x3 matrix of logic ones. For every pixel in my source image I overlay my morphological operation kernel for that pixel. So let's say, for example, I put it here, centered around this pixel. I then do a logical AND between all of the elements in the kernel and all of the elements in the image. In this case we can see that one AND zero here is, well, zero, and this one is going to be zero, and this one's going to be zero, and once we've ANDed all of those together the end result is going to be zero, because we've got some zeros in there, so I will write zero to my image. However, when I get round to operating on this section of the image, the result is a logic one, because all of the kernel coefficients ANDed with all of the surrounding neighborhood pixels give me a 1. The pixel that was on its own has been eroded, but the pixel that is robustly supported by its neighborhood stays intact. Instead of just doing a logical AND, we could also do some simple arithmetic: we could count how many of our neighbors are equal to one and come to the same conclusion. So in this case I've got 8 neighbors all equal to 1, so my current pixel stays a 1, but in the original scenario only three of my neighbors were one, so I could look for fewer than 8 neighbors and in that condition set myself to nought. The interesting thing with erosion is that one man's erosion is another man's dilation: if I inverted all of these bits and applied the same kernel, I would have the effect of dilating the image. But I can also dilate in a second way, without requiring this inversion, and this one is quite simple: if a given pixel exists, then set its neighbors to exist as well, and this has the effect of growing our object by one pixel along all edges.

Before I get stuck into the code on this, I just want to show visually another useful thing about dilation: if I can identify a particular point on an image, let's say here, and I repeatedly dilate, I can effectively flood-fill the image. Now, there's an important condition here: after every dilation, if we logically AND the result with the original image, we'll never go beyond the boundaries of the original image. This allows me to fill a secondary image with just the space occupied by a single shape in the binary space of the input image, and this is a great way of doing image segmentation, the extraction of objects, or labelling image parts. In fact I find this to be really interesting, and I think it's worthy of a video in its own right; I've got some interesting examples of how we can demonstrate that, so expect a follow-up video to this in the near future, just showing morphological operations.
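A sketch of erosion and dilation over a binarized frame, using the neighbor-counting formulation described above (names are illustrative):

```cpp
// Erosion: a set pixel survives only if all 8 neighbors are set.
// get() returns 0 beyond the image edge, so borders erode naturally.
void Erode(const Frame& input, Frame& output)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
        {
            float fSumNeighbours =
                input.get(x - 1, y - 1) + input.get(x, y - 1) + input.get(x + 1, y - 1) +
                input.get(x - 1, y)                           + input.get(x + 1, y) +
                input.get(x - 1, y + 1) + input.get(x, y + 1) + input.get(x + 1, y + 1);
            output.set(x, y, (input.get(x, y) == 1.0f && fSumNeighbours == 8.0f) ? 1.0f : 0.0f);
        }
}

// Dilation: every set pixel sets its whole 3x3 neighborhood.
// A separate output frame keeps the pass homogeneous, as described.
void Dilate(const Frame& input, Frame& output)
{
    output = Frame(); // start from an empty (all-black) frame
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
            if (input.get(x, y) == 1.0f)
                for (int i = -1; i <= 1; i++)
                    for (int j = -1; j <= 1; j++)
                        output.set(x + i, y + j, 1.0f);
}
```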
In the code, the morphological section is the largest of them all. It's got the greatest degree of user input: one input selects which particular dilation or erosion operation we're going to do (there's also a third one called edge, which I'll talk about when we get to it), and I can also control the number of times we perform the dilation or erosion, so we'll be able to see those effects visually. The first and most important thing is that we need to take our image from the grayscale domain into the binary domain, so I threshold it using the value that we specified in the threshold algorithm; this allows you to tweak the threshold first. Then we choose dilation, erosion, or this third one which I'm calling edge, which is an edge detect using morphological operations. For dilation it's simply a case of: if a given pixel value is equal to one, then set all of its neighbors to one too. Easy enough, but I use this third frame, activity, because I don't want to alter the frame as I'm going through the pixels; it's important that the frames are treated homogeneously. For erosion, rather than doing a set of logical operations I decided to go with the summation route, so I'm looking at all of the values of my neighbors and summing together all of the ones that are one. Now, I know that in my activity frame they're going to be ones or zeros anyway, since we've binarized the image, so I just add them, and then I check to see if my activity is 1, and if it is but not all of my neighbors are set to 1, then I set myself to 0: put myself out. The third one, which I've not drawn up in OneNote, I've called edge, and it's exactly the same as erosion, but instead of looking for fewer than 8 neighbors I'm looking for precisely 8 neighbors: in the situation that I am an illuminated pixel and I have all of my neighbors, I extinguish myself.

So let's take a look. The program has started up with the thresholded image, and this is just so I can tune the threshold value appropriately; I can still tune it later on, but it's a bit easier to do whilst looking at the grayscale image. Then I choose bit six, binary morphological operations. The current operation is dilation, and it looks like this, with just a single iteration being applied. On the left, where these individual pixels occur, you can see them expanding into a 3x3 region. This has had the effect of removing the "One Lone Coder" text entirely from the book object, so this might be a precursory stage in being able to extract where the book is, if that's useful. I can increase the number of dilations with the A and S keys, and as you might expect, the regions just grow, so dilation is a way of grouping things together. Erosion is a little different: here on the left, those spurious single pixels have all but disappeared, as has a lot of the noisy background, so erosion has really just allowed us to focus on the core parts of the image. If, for example, I erode this image and then dilate, well, I can't bring any of that background noise back in: we're looking at the eroded image now, and if I dilate it, there are no source pixels left to seed anything for dilation where the noise was, so combining these two operations is a nice way to remove noise. Let's add multiple erosions, and eventually we can erode the image away entirely. Erosions and dilations, as I demonstrated very briefly, can be used for detecting and labelling objects individually, and in a primitive way they can also be used for sizing objects: if you continuously eroded the scene and looked for the final illuminated pixel, you could then dilate that pixel and logically AND it back into the scene to highlight the largest object, so you could use erosion and dilation for sizing too. Now finally there's one more mysterious one, if I press the C key, and that is using morphological operations to detect edges. This gives you a really nice edge detect where all of the edges are a single pixel wide. If I bring my hand into the scene it's a bit messy; let's find a nice high-contrast scene. There we go, so that's quite nice, it's isolating all of the buttons. Now, you see the camera's adjusted its gain there, and there's not much I can do about that, but this is different to Sobel, and it gives you a really crisp outline edge for an object.

Bit seven: median filtering. When working with real-world sensors, things don't always go to plan. Sometimes your image will have little tiny artifacts in it; these are bad pixels, or snow; you might be in a radioactive environment for all I know. But you end up with something which doesn't look quite right, and these are usually quite tricky to filter out, unless you use a median filter. Median filters are conceptually very simple. In a similar way to convolution, for a given pixel we look at its immediate neighborhood; in this example I'm using a 5x5 neighborhood. We're going to make an assumption about our image: we're not really expecting spatial change to happen over a single column or a single row of pixels, as that's quite unusual in natural images, so it tends to be that, in general, information in images is rather blurred out across the image. So up here, where I've got this single erroneous pixel, I can make the assumption that most of my image in that region is behaving one particular way, and that this blank pixel is some form of outlier, and we can statistically remove the outlier by looking at what the median pixel value is across all 25 pixels in this 5x5 kernel. Now, if you don't remember what the median is: you take all of your values, sort them in order, and take the one that lies directly in the middle of that sorted set. It's one of those quirky mathematical phenomena where it sounds really complicated, but there's actually no maths involved at all; it's just a sort and an extraction.

For the median filter code, as usual I iterate through all of the pixels in the image, and I'm not trying to optimize this at all; I want the code to be readable, so I'm going to do something really horrendous and create a vector of the floating-point values representing all of the pixels surrounding the pixel I'm currently investigating, over this 5x5 area. All I'm doing is extracting those pixels in my neighborhood and pushing them into the vector. Once I've got 25 pixels in my vector I sort them using std::sort, and since I've got 25 pixels, the one in the middle will be at index 12 of my vector, and that's what I choose as my output value. It's pretty simple, huh? And so here in the demonstration program I'm now running the median filter, and you might at first think it's just blurring the image, but it's not; it kind of looks like it's been painted, a little bit, so I'm sure median filtering is used in some of these arty effects programs. But I do have a test image that I've created here, where I've got the words "median filter" written on my page, and you can see the median filter has filtered out the lines, because it sees those as anomalous, and it's also filtered out the dots, but there's sufficient information left in the text to then go and extract it using thresholding or morphological operations or something else. So it's just a precursor phase to help you with later stages of image processing. Once the dots get large enough that they occupy a region it's a different story, but even so, zoomed in like that, you really would struggle to identify the dots, and the text is just fine. So when you get this sporadic salt-and-pepper noise in your image, a median filter is the thing to choose.
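A sketch of that 5x5 median filter, deliberately unoptimized, as described:

```cpp
#include <algorithm>
#include <vector>

// Replace each pixel with the median of its 5x5 neighborhood
void MedianFilter(const Frame& input, Frame& output)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
        {
            // Gather the 25 neighborhood values into a vector
            std::vector<float> v;
            for (int i = -2; i <= 2; i++)
                for (int j = -2; j <= 2; j++)
                    v.push_back(input.get(x + i, y + j));
            // Sort and take the middle element (index 12 of 25)
            std::sort(v.begin(), v.end());
            output.set(x, y, v[12]);
        }
}
```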
And finally, bit eight: locally adaptive thresholding. I started this video, in bit one, looking at thresholds that were applied globally across the image. More often than not this is sufficient, but there are situations where you want to threshold an image based on local information. Working under the same principle as the median filter, we make an assumption that, overall, for a small region of the image there's not going to be a great deal of spatial variance, so it's better to choose a threshold value based on the information in your locality, or at least bias towards a threshold value found from the information of your neighbors. I know, for example, that in this region of the image, if I take the average value of my neighborhood, then things that are statistically interesting may be a certain level above that average, and that average might be different for different parts of the image, which means that if I used a global threshold, that threshold value may not be appropriate for every region of the image. It might not be immediately obvious what locally adaptive thresholding buys you compared to a global threshold, but I find it really useful when you've got a change in luminance across your scene; a change in luminance is just a fancy phrase for shadows. We're now getting quite familiar with the code for these algorithms: we iterate through all of the pixels, and in this case I take the average value of my immediate neighbors, 5x5, but I then use that average as part of my thresholding calculation. The region sum contains the average, but I bias that value with some user-defined constant, and that constant is user-configurable via the user interface, before we go on to threshold the image.
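A sketch of that locally adaptive threshold. Whether the bias constant multiplies or offsets the local average isn't spelled out here, so the multiplicative form below is an assumption, and fAdaptiveBias is my own name for it:

```cpp
// Threshold each pixel against the (biased) mean of its 5x5 neighborhood
void AdaptiveThreshold(const Frame& input, Frame& output, float fAdaptiveBias)
{
    for (int x = 0; x < Frame::nWidth; x++)
        for (int y = 0; y < Frame::nHeight; y++)
        {
            // Sum the 5x5 region, then divide to get the local mean
            float fRegionSum = 0.0f;
            for (int i = -2; i <= 2; i++)
                for (int j = -2; j <= 2; j++)
                    fRegionSum += input.get(x + i, y + j);
            fRegionSum /= 25.0f;
            // Compare against the locally derived, user-biased threshold
            output.set(x, y, input.get(x, y) > (fRegionSum * fAdaptiveBias) ? 1.0f : 0.0f);
        }
}
```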
So let's take a look. Here I've got the input and output showing just the regular threshold, the one we started with in bit one, and I can try to find a value which sensibly thresholds the image, but you can see it's a global effect, and one of the things I wanted to show here is that the shadow cast by my hand is influencing that threshold decision. I'm going to press the 8 key now to choose the adaptive threshold, and we can see straight away that the difference is that my shadow has basically become irrelevant to the scene. The local area around each pixel is used to guide what value we're thresholding it against, and so the areas in shadow have, overall, lowered the threshold value, and the areas of brightness have, overall, raised it, so we're choosing a value which varies across the screen in order to make our thresholding decision. So if you wanted to make things shadow- and luminance-invariant, you probably do need to use some sort of locally adaptive thresholding algorithm. It's also quite a cool visual effect too.

And so that's that. Image processing is a huge field, and for me I find it to be a very interesting one too. In this video we've had a very quick look, a very cursory introduction, at some of the most fundamental techniques you need if you want to start doing image processing. As I mentioned in the introduction, you don't always have to work with images; procedural generation may also employ a lot of these techniques, particularly the morphological operations. Anyway, that's it for now. If you've enjoyed this video, give me a big thumbs up, please have a think about subscribing, come and have a chat on the Discord server, and I'll see you next time. Take care.
Info
Channel: javidx9
Views: 69,099
Rating: 4.9831982 out of 5
Keywords: one lone coder, onelonecoder, learning, programming, tutorial, c++, beginner, olcconsolegameengine, command prompt, ascii, game, game engine, pixelgameengine, olc::pixelgameengine, image processing, threshold, convolution, motion detection, adaptive thresholds, webcams, video capture, morphological operations
Id: mRM5Js3VLCk
Length: 36min 47sec (2207 seconds)
Published: Sun May 19 2019