Convolutions in image processing | Week 1 | MIT 18.S191 Fall 2020 | Grant Sanderson

Captions
[Music] Hello and welcome to the first live lecture of the course. Before we jump into things, there are just a couple of logistics to handle. The first is that we have a new website: if you check the link in the description, there's a new official course website, and again, this is something that anybody in the world is free to attend the lectures for if you want to. Second, we're going to be doing the discussion during this lecture through Discord. So if you want to answer questions, or ask questions, or interact with the other professors while I give this lecture, the Discord is the place to do it. Questions will be fed to me and we can pull them up on screen here. For example, it looks like last time there were some people asking about some of the functions I was using in the notebook, which I failed to mention were custom-defined within that notebook. So if you just open raw Julia and try to call decimate, it doesn't mean anything; it was just a small custom thing. In general we'll share the notebooks that we're using, so if you want to follow along you can see all those pre-definitions. And with that, you know, we only have 30 minutes for this lecture, and convolutions are super interesting and let you do a bunch of powerful things, so I say let's just jump right in. As a first example, I'm going to take this 8-bit Mario that we see, this very low-resolution image, and I want to blur it. If you're thinking about blurring an image, maybe the first thing you'd think to do would be to look at each pixel and kind of average it with all of its neighbors, doing a sort of moving average. The way I'm doing it here, we've got this little three-by-three grid of values that's marching across the image and slowly creating a new image on the right.
And to be concrete, what's going on here is that each of those values is 1/9, and we're lining them up with some pixels, multiplying each value by the corresponding pixel, and then adding them together. In this case, if they're all 1/9, then when we multiply and add them together, we're just taking an average of the nine pixels. So in this specific window, it looks like there are four black pixels, four gray pixels, and then one that's kind of the Mario skin-tone color. You would expect the average to be a darker gray with the tiniest hint of Mario skin color. And if we look at the new image being generated, and what's painted into that pixel, it's exactly that: a dark gray with the tiniest hint of some kind of skin color. As we keep marching along we get this blurrier image of Mario. We might zoom in on another example: here the grid is mostly covering the skin color with just a little bit of black here and there, so we would expect a slightly dialed-down version of that skin color, which is what we see. And this gives kind of a nice effect, right? We zoom out and we see what was once this nice, clean 8-bit Mario: same resolution, nothing's changed about the resolution, but what we've got is this blurrier image. Now, a couple of nuances about what's going on here, because, as with so many things in computer science, there are edge cases to handle. In this case it's a literal edge case: what's going on at the edge? If we just let this grid of values fall off the side, and we either don't count those entries or multiply them by zero, what ends up happening is we would only be taking four-ninths of the given corner value, and it's going to end up disproportionately dark. In some mathematical contexts maybe that's what you want: you would treat the image as being entirely zero outside of a certain domain.
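The course notebooks are in Julia, but the moving-average step itself is easy to sketch in any language. Below is a minimal Python/NumPy illustration of it (the function name `convolve_at` and the sample pixel values are mine, not from the notebook): line the grid of 1/9s up over a pixel, multiply corresponding values, and add.

```python
import numpy as np

def convolve_at(img, kernel, i, j):
    """Line the kernel up over pixel (i, j), multiply corresponding
    values, and add them up, exactly as in the Mario animation."""
    r = kernel.shape[0] // 2
    window = img[i - r:i + r + 1, j - r:j + r + 1]
    return float((window * kernel).sum())

box = np.full((3, 3), 1.0 / 9.0)  # the all-1/9 averaging kernel

# a made-up window: some black pixels, some gray, one brighter "skin-tone" pixel
patch = np.array([[0.0, 0.0, 0.5],
                  [0.0, 0.9, 0.5],
                  [0.5, 0.0, 0.5]])

blurred_value = convolve_at(patch, box, 1, 1)
# with an all-1/9 kernel this is just the plain average of the nine pixels
```

With all weights equal, the weighted sum collapses to the ordinary mean, which is why the output pixel lands at a darkish gray with a hint of the bright value.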
But when our eyes look at the blurred image it doesn't quite seem right, because you see all of this black, or this sort of darker gray, along the side, and what we expect is that this was an image taken in a broader context, where what's surrounding Mario's world is the same gray that we would have thought. So in the problem set for this week's lectures, you've got the task of basically defining a function that will handle this, and a more general variety of what's going on here. One thing to think about is what happens at that edge, and a typical convention in image-processing contexts is that, when you're falling off the edge like that, instead of multiplying 1/9 by zero, you multiply by the nearest value in the image, and that would make the boundary of our Mario look a lot more gray. So that's all well and good, but there's another nuance that we might want to handle. If we're not in a super-low-resolution context like this, but in something higher resolution where we want a box that's a little bit bigger, like maybe this 5x5 box that's wandering along, it's a little bit strange to treat all of the pixels in that 5x5 box with equal weight, because the ones toward the center are somehow more representative of what's going on in the image at that point than the ones farther out. So for this example, what I've actually done, if we zoom in here, is give a grid of values that aren't all the same; it's not all 1/25 in this case, but the middle one is 0.162, and out at the farther regions we've got 0.003.
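To see why the border convention matters, here is a hedged Python/NumPy sketch (the names are mine; NumPy's `np.pad` modes stand in for the two behaviors described): zero padding darkens a uniform image's corners to 4/9 of their value, while replicating the nearest pixel leaves a uniform image unchanged.

```python
import numpy as np

def box_blur_padded(img, k=3, mode="constant"):
    # mode="constant" pads with zeros (corners darken);
    # mode="edge" replicates the nearest pixel, as in the lecture's convention
    r = k // 2
    padded = np.pad(img, r, mode=mode)
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

gray = np.full((4, 4), 0.5)  # a uniform gray image
zero_pad = box_blur_padded(gray, mode="constant")
edge_pad = box_blur_padded(gray, mode="edge")
# with zero padding the corner keeps only 4/9 of its value;
# with edge replication the uniform image is unchanged
```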
So when we multiply those by the corresponding pixel values and then add them all together, we're basically taking a weighted sum, weighted according to how close each pixel is to the center. That's a way of saying that it's okay for this pixel here to be influenced by what's at the outer corner, but only by a little bit; we shouldn't get into it too much. These specific values are coming from a Gaussian curve, also known as a bell curve, so you might imagine this bell-curve surface sitting on top of the image, bulging right above the point that we want, and then dissipating in value as you go away from that. That gets us a much nicer notion of averaging the pixels, because, like I said, it takes into account how close they are to the point we want. The Gaussian, specifically, has some really nice mathematical properties that we might talk about a little later in this lecture, which make it really quite nice for doing this kind of operation. Now, this operation that we're doing, as you can infer from the title of the video and everything we talked about last time, is a convolution. What you would say is that the image on the right, the resulting values that we get, is a convolution of the original image with the little grid of values we have here. So it's an operation that takes in, at least in an image-processing context, two different grids of values (maybe one is RGB values and one is numbers, or they could both be numbers) and combines them to get a new grid of values. In other mathematical contexts we often think of it as combining two functions to get a new function. And when you're doing it with just this little smaller window, it's quite common to refer to that window as a kernel. So in this case we would say that we convolved Mario with a Gaussian kernel.
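The 0.162 center weight quoted above falls out of sampling a normalized Gaussian with standard deviation 1 over a 5x5 grid. Here is a small Python/NumPy sketch of that construction (my own helper, analogous to but not literally the Julia library call used later in the lecture):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sampled 2-D Gaussian, normalized so the weights sum to 1."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()

kern = gaussian_kernel(sigma=1.0, radius=2)  # a 5x5 kernel, as in the animation
# the center weight is the largest (about 0.162); the corner weights are tiny
```

Because the weights sum to 1, the operation is still a genuine weighted average: a constant region of the image stays the same color.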
And at this point you might sit back and think, "Okay, that's all well and good. I guess with different kinds of averages I could get different sorts of blurring effects in the images, and I guess there are some use cases for blurring images, like blurring out faces." But it goes way beyond that, actually. First of all, smoothing out an image with a Gaussian kernel is useful in a lot of contexts beyond what you would expect, but the important thing I want you to keep in mind here is not the specific example but the general framework being developed, where if we tweak the values inside that kernel, we can start to do more nuanced and interesting things with images. And to get to that, I say let's just dive right into actually coding up some examples. So I'm going to create a little Pluto notebook, like I did last time, and right now I've prepared it with an image: large_image loads a certain image of Tom the cat, and then I've shrunk it down, mostly so that it's easier to see on the screen, but also because that makes some things faster to play with. If we check the size of the image, it's now a very small 500 by 399. And again, by the way, if any of you want to ask questions and you do so in the Discord, some of them will be forwarded in my direction, and I'll try to keep my eye on whether any have come in, so feel free to do that whenever. Oh, and I'm told my mic is peaking a little bit, so let me just turn that down and give you a better sound experience; hopefully it's not too unpleasant in any way. It's always so finicky to get these live streams to work properly. But we want to actually do this convolving operation, right? We want to see what happens when we mess around with different kernels.
So first of all, to get a given kernel, there's a library I'm going to use (we've got all of our definitions down here for the various functions I'll need) called ImageFiltering, and it has a module called Kernel. We can pull up the documentation for it using the live docs here, where it tells us the various functions it has, and among them is this Gaussian, which is exactly what we want. But it already tells you that there are more interesting kernels to be played with. If I create a Gaussian and specify two standard deviations, one in the x direction and one in the y direction, what it gives me is a grid of values. So we take a look, and this type is a 5 by 5 offset array. Essentially it's an array of floats, but the offset means that the way you index into it (let's give it a name like kernel) is such that the middle value is at 0, 0. So if I call kernel at (0, 0), I get that middle value we actually saw in the animation, 0.162, and then I can take something like -2, which usually you can't do (you normally can't index at 0, much less below 0), but with Julia arrays in this context the indices are specifically offset. That doesn't matter too much; I just wanted to say it in case you were wondering what that term means. What we actually want is a nice way to take a look at what this kernel is. So I wrote a very simple function called show_colored_kernel that basically pulls out the positive part and makes it green, and if a kernel has a negative part, pulls it out as red. You can take a look at the details later if you want. All I want is for you to think: okay, we've got this little grid of values that's very peaked at one point and then dissipates out. We're going to have it march along the image of the cat here, and what's that going to do?
And you already know what it's going to do, because we've played around with Mario, but the idea is that as we tweak this kernel, we'll get more interesting things. So what you will write in the problem set this week is a function called convolve that does exactly this: it takes in an image, it takes in this kernel of values, and it produces a new image, which in this context is a little bit blurrier. Now, because this is a small kernel, maybe it's not as visible, but if I change it to a Gaussian with standard deviation 3, the library is smart and knows to make the grid a little bit bigger. So now it's a 13 by 13 array, and we see that it indeed makes the image notably blurrier. We could go to an extreme: say I want the standard deviations to be quite a bit larger, maybe 10 and 10. This gets me a much larger array, so that little Gaussian curve we're imagining marching along the image would now be 41 by 41, and the result is that every pixel is influenced by a lot of pixels around it, very much like you took a lens and unfocused the image. Now, while we're here, and because this is a class about computational thinking, let's think a little bit about computational complexity, because in order to process that image (and Julia's generally pretty fast), this took 9.5 seconds. A big part of that is maybe that I wrote this convolve function less efficiently than it could be (we'll talk later this lecture about very clever things that are done to make it more efficient), but just in principle, if you want to actively think about how many operations are taking place here, it's a fun little exercise, because of the size of our image: it looks like it's not too big, 500 by 399.
Which means if we take the product of those two values, that gives us the number of pixels, and if we multiply that by the number of values in the kernel (the product of the dimensions of the kernel array), it's actually quite a few multiplications that have to happen through this whole process. Because remember, every time that little window is sitting on top of a pixel, you're multiplying all of those neighboring values and then adding them up. So in this context it looks like we've got a few hundred million: just for one simple, small image, over 300 million multiplications have to happen. That's just a thing to keep in mind: as the kernel gets larger, unless you do something clever, it will by default take a lot longer to process images, especially if they're higher resolution. But before we talk about how to diminish that kind of complexity, let's just play around with what happens if I change this kernel. Actually, before I even show you, I think I want to do this as a live quiz, which might be a little bit fun; we were doing a couple of these in the intro animation, just some warm-up questions, but as the first real question I'm going to give you a sample kernel. I'm not going to show you or tell you what it does, but I want you to actually try to think about what this would do. It's going to be a little 3x3 grid, and it's got a lot of negative values around the boundary, and the center value is a 7. So it's going to be seven at the center and then some negative values around it: specifically negative one for the immediately adjacent neighbors and then negative 0.5 for the corners around those. And at this point, if you go to itempool.com/3b1b/live you can place a vote. We're seeing the statistics on how other people are answering, and I'm genuinely curious what people will say to this one.
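The arithmetic is worth doing explicitly. A quick back-of-the-envelope in Python, using the sizes quoted in the lecture:

```python
# rough multiplication count for a naive convolution:
h, w = 500, 399      # the shrunk-down Tom image from the notebook
kernel_side = 41     # the large Gaussian kernel mentioned above is 41 x 41

mults = h * w * kernel_side * kernel_side
# roughly 335 million multiply-adds, matching the lecture's
# "over 300 million" ballpark for one small image
```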
So this is going to be our first example of a non-blurring kernel. Well, maybe I shouldn't say that; maybe it's obvious from the context that blurring isn't going to be the right answer. While you answer that, I'm going to go ahead and type it up into our notebook, just to have it prepared. And also while you answer, think about what aspects of this kernel you could try to compute to get a loose sense of how it would behave in different regions of the image. If it's in a monochromatic region, what does it look like? If it's in a region where there's a lot going on, what does it look like? Try to think through some examples in your head. And that's probably around the amount of time I want to spend, although let's just take a look at whether there are any questions in the doc to answer while we're waiting. "Is it possible to reconstruct the original image from the blurred one?" I don't think so. That's an interesting question; we'll have to ask the other profs. I definitely don't think so, because you can take this to an extreme where it feels like you're definitely losing information. There are other things you can do to an image that are reversible, though; the fast Fourier transform would be an example where you can recover the image exactly, but those feel like rare, special cases. I don't want to say that offhandedly, because I don't know quite off the top of my head; one of the other profs, who is smarter than I am, might come in to answer. And the second question: "How does convolving an image with a kernel directly relate to convolving two functions?" Glad you asked that; I'll talk about that in just a little bit, actually: what the relationship is between the image-processing definition of convolution and how you see it in some other mathematical contexts. It's essentially the same idea, though: you have one thing marching across another, you multiply corresponding values, and then you add them up.
It's really the same operation, modulo one or two small conventions. All right, so if we go back to our quiz, I'll go ahead and see what the answers turn out to be. Fascinating: it looks like most people thought it would detect edges. But if we actually play with this, if we go to our notebook, where I've gone ahead and written in the kernel (that same grid of values I gave you, seven in the center and some negatives around it), the effect it has on the picture of the cat is that it actually kind of sharpens it up. Previously his chest was very fluffy, very soft, but over here what it's done is really accentuate the differences between one piece of fur and the next. Now, for those of you who did answer edge detecting, you're definitely on the right track, but one small thing you could do to check the behavior of this kernel is to notice what the sum of all of those values is (I'm misspelling kernel here). If we add them all up, they equal one, which is much the same as with any kind of weighted average: if you have a bunch of weights, they've got to sum to one. And what that means is that if you're in a region of the image without any edges, a totally monochromatic spot, then since the sum of the kernel is one, when you multiply everything and add it up you just get the original color back. The circumstances under which this does something non-obvious are when there's a lot of variation near a point. So if you have one point that's, let's say, a very bright pixel surrounded by very dark pixels, that will end up looking quite bright, because the dark neighbors aren't subtracting off as much as brighter surroundings would.
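To check those two claims numerically, here is a small Python/NumPy sketch of the sharpening kernel described above (the helper `apply_at` and the sample patches are mine): on a flat patch the output equals the input, while a bright pixel on a dark background gets pushed well past its original value.

```python
import numpy as np

# the quiz kernel: 7 at the center, -1 on the adjacent neighbors,
# -0.5 on the corners; the entries sum to 7 - 4 - 2 = 1
sharpen = np.array([[-0.5, -1.0, -0.5],
                    [-1.0,  7.0, -1.0],
                    [-0.5, -1.0, -0.5]])

def apply_at(img, kernel, i, j):
    """Value of the kernel applied at one pixel (no flipping)."""
    return float((img[i - 1:i + 2, j - 1:j + 2] * kernel).sum())

flat = np.full((3, 3), 0.4)   # monochromatic patch: output is unchanged

spike = np.full((3, 3), 0.1)  # bright pixel on a dark background
spike[1, 1] = 0.9             # the sharpened value overshoots 0.9 considerably
```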
Now, if you want to think about edge detection, there are a couple of standard kernels we might play with. One of the simplest is what's called a Sobel kernel, and in this context it's yelling at us because what it returns isn't actually a single kernel but a pair of them, so I'll pull out just one, maybe the second index, and let's go ahead and visualize this kernel. First of all, we see that the sum is zero, so that's telling us something interesting. Let me do my show_colored_kernel. Great, so we see it's got positive values on the right and negative values on the left, but crucially, if we add up all of those values it comes out to zero. So if you imagine taking this kernel and putting it over a monochromatic part of the image, the result is going to come out completely black, which matches what we see here, where at first it looks like it's completely turning the image black. But let's think a little more about what we would expect this specific kernel to do. If we've got our picture in mind of taking a specific little grid of values and marching it across the image, what would you expect to be required for the result to end up as a positive number? Well, if we're adding up some positive values on the right and then subtracting off whatever the values on the left multiply, then in a region where the image goes from being very dark to being very bright, that's where you get positive values.
But notice you can also get negative values: if you had a lot of brightness on the left and a lot of darkness on the right, this convolution would give you a negative value. That's important to keep in mind: just because we're displaying these convolutions as RGB images doesn't mean that's always the right way to see them, because in this context, when we've convolved our fluffy cat with something that occasionally puts out negative values, the way the RGB type treats that is not necessarily going to be displayed in a way that shows everything that's going on. So what I could do instead is take an absolute value of everything: we've got our convolution, and I take the absolute value of what's sitting there (for an RGB value, that basically means treating it like a vector and taking the norm). This gives us a grid of numbers, but I still want to see it as colors, so I'm going to tell Pluto to think of these as grayscale values, and I might even scale them up a little, just because I'm not sure how the YouTube compression is handling all of this. This will really accentuate all of the points where this Sobel filter, convolved across the image, ends up non-zero. And what you see is all of the points where there's a lot of micro-variation as you move from left to right, like the different-colored fur on the chest; that's where you start to get a lot of activity. And if we take the other kernel that Kernel.sobel returned, it basically gives us the horizontal version of that: as you move up and down, for example right around the whiskers, that's where you get a lot of variation, and that's what makes the convolution of the image with this specific kernel non-zero.
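Here is a hedged Python/NumPy sketch of the vertical-edge Sobel behavior just described (the 1/8 scaling is my assumption about how a library might normalize it; conventions vary between libraries): the entries sum to zero, a flat region maps to zero, a dark-to-bright step gives a positive value, and the reversed step gives a negative one.

```python
import numpy as np

# classic Sobel kernel for left-to-right variation; the 1/8 scaling
# is one normalization convention, assumed here for illustration
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]]) / 8.0

def apply_at(img, kernel, i, j):
    """Value of the kernel applied at one pixel (no flipping)."""
    return float((img[i - 1:i + 2, j - 1:j + 2] * kernel).sum())

flat = np.full((3, 3), 0.6)                        # monochromatic: output is 0
dark_to_bright = np.tile([0.0, 0.5, 1.0], (3, 1))  # brightens left to right
bright_to_dark = dark_to_bright[:, ::-1]           # the reverse: negative output
```

The negative outputs are exactly why the lecture switches to displaying the absolute value as grayscale: an RGB display would silently clip them.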
So the point here is that you can do blurring, you can do sharpening, you can do edge-detecting type things, and you can get more and more sophisticated as you tweak the specific values of this kernel. It's a really powerful tool, way beyond just a moving average. Many of you may have heard of the idea of a convolutional neural network; that's a context where what you want is for a machine to learn specific kernels, so rather than hand-baking one, like handing it a Gaussian or handing it a Sobel, you have it learn which kernels will pull out meaningful information from an image. So this is a very good building block to stack things on top of. Now, one thing to point out while we're looking at this Sobel filter is that, unlike the other ones, it's asymmetric: it looks different on the left than it does on the right, whereas for a simple Gaussian blur or a box blur it doesn't matter how you rotate it. You might think this doesn't matter, but it ends up highlighting a different convention that you'll sometimes find. In mathematical contexts, instead of doing the thing you would expect, which is having this kernel march across the image as-is, we often think of the convolved term as being rotated around. I'll show you what I mean: over here we've got our kernel marching along the image; the picture to actually hold in your head for convolutions in a mathematical setting is to flip it entirely around first. So if the original kernel had the green on the right and the red on the left, you're kind of flipping it around before you have it march across the image, and then you do the same thing, where you multiply corresponding terms and add them all up. And at first that might seem like a very strange thing to do: why would the convention in any context work out like that?
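The flip convention is easiest to see in one dimension, where NumPy's `np.convolve` already uses the mathematical definition. With an asymmetric kernel, sliding it as-is (correlation) and flipping it first (true convolution) give genuinely different answers; this small sketch (values are made up) makes that concrete:

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0])
kernel = np.array([1.0, 0.0, -1.0])  # asymmetric, so the flip matters

# true convolution: np.convolve flips the kernel before sliding it
conv = np.convolve(signal, kernel, mode="same")

# sliding the kernel as-is (correlation) is the same as
# convolving with the pre-flipped kernel
corr = np.convolve(signal, kernel[::-1], mode="same")
# for this antisymmetric kernel, flipping negates it, so corr == -conv
```

For symmetric kernels like a Gaussian or a box blur, the flip changes nothing, which is why the distinction never came up until the Sobel example.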
And indeed, in some image-processing contexts they just define convolutions to do the thing you would expect. But I do want to take a little moment to explain why you sometimes see it this way, and why it actually gives you really nice mathematical properties. I don't know if I'll have full time to delve into the details here, but we're going to start by talking about something that seems totally unrelated to image processing, which is multiplying two polynomials together. For this one, let me pop back to our live quiz and have you do a little bit of active thinking. The question asks you to consider the following expression: you've got a certain polynomial whose coefficients are k0, k1, k2, and you're multiplying it by an infinite power series with different powers of x in there, and it wants you to look at one specific term: what is the coefficient of the x-cubed term? So think about that and choose your answer, and as you're thinking, maybe I'll start talking through how you'd multiply this out. Again, this might seem extremely unrelated to image processing, and it's actually closer than you might think, for reasons that I hope to be able to get into, but a half hour is an awfully short time for a lecture, I do have to say. So k0 times a0 is just going to give you the constant term; any other pair of terms in here is going to involve some power of x. If we try to organize things by their coefficients, we've got just that simple term for the constant, and then if you start thinking about the term in front of x, well, we can take k0 times a1 and then k1 times a0; those are the two products that result in x to the power of one. And I think a lot of you know how to do this; we're basically just doing FOIL between two different polynomials, one of them just
happens to be infinite. And the next term is going to be the answer to the live quiz, so let me pop over there and see what most people are saying. It looks like we've got 92, 93 people answering correctly that it's (a), which is basically just marching through all of the pairs so that the index on the k added to the index on the a equals three. Those of you who answered (b) were completely on the right track; it's just that we're multiplying by a quadratic polynomial instead of something that also goes on forever, so there is no k3 term. That's the only distractor in that question. And so, you know how this goes: we've got various different terms we could get; it's a nice big product. A way you could think about multiplying two polynomials like this, which is now heavily related to all the ideas we were just talking about, is that I'm going to take a copy of that power series, and then I'm going to take this quadratic, but a copy of the quadratic that's turned around, so I'm putting the k2 term on the left and the k0 term on the right. We can imagine doing exactly what the little kernel for our image was doing, just having it march across, where first we align the k0 with the a0 term and multiply them; then we have it march one further, align the k1 with the a0 and the k0 with the a1, multiply the corresponding terms, and add them together, and that's exactly what we get for our next coefficient. Then we have it march along once more, play the same game, multiply corresponding terms, add them all together, and what you see is that the terms in the new polynomial can be described as a convolution between the terms of the original polynomials. But notice that in order for that to work, we needed this convention where one of the polynomials was flipped around before you do this marching process. So because that's how it comes up in a lot of
mathematical contexts, this ends up being the convention more broadly. But it does actually have a little bit of influence on the image-processing context, because there's a heavy analogy here between the values k in some little polynomial and the values of our kernel, and then thinking of all of these values a as being kind of like pixel values, except instead of storing them in different spots, it's as if we're storing each one next to an associated power of x. Now, that might seem like a strange thing to do with pixels, and indeed we never really do that with pixels, but one thing that we actually do do is a slight variation of this, where instead of multiplying each pixel value by a power of x, we multiply it by e to the two pi i times some sort of frequency term. As we take powers of that, you get this thing that rotates around, and it lets you pull out frequency information from the image. Now, I'm not expecting people watching this to necessarily know about Fourier transforms or Fourier series; if you want to know, I hear there's a YouTube channel that covers some of these topics whose videos you could watch. But at a very high level, I do want to try to finish the lecture by showing you something really cool about convolutions, and about making them more computationally efficient than you might think is possible. The only things you need to know about Fourier transforms are that we're doing something very analogous to power series, and that mathematically it behaves very similarly: all we're doing is taking these values and multiplying them by something (it's not x, but it's something we take successive powers of). And what that means is that if we do that to two different arrays, two different collections of values, multiplying the two results corresponds to convolving the original terms. So for example, when we take the polynomial that was generated by k and we multiply it by the polynomial that was generated by these a
terms, we get a new polynomial where each term is a convolution between a and k. The same thing happens if we replace x with this e-to-the-two-pi-i business: if you take the collection of terms from one (what's going to be called the Fourier transform of that thing) and multiply it by the Fourier transform of the other, it corresponds to convolving the original terms. Now, this will be a little easier to see if we again pull up a notebook and play around with it. So let's remind ourselves what the image looked like: this was the cat, and I've written a very simple plot_1d_fourier_spectrum. Oh, it's asking me to update my software in the middle of a lecture; how rude. All right, so again, I'm not necessarily expecting everyone to know what a Fourier transform is, but at a high level I want to be able to describe what information about the image it's pulling out. Tom the cat is a rather soft image without a lot of variation happening from left to right, so if I go through and ask, you know, is there very fine detail, a lot of high-frequency change as we move from left to right, there's not going to be a lot of information there. Now, I don't know why this is taking as long as it is to process; maybe it had to pre-compile the function for the first time. But Tom doesn't actually produce a very interesting Fourier transform, and he doesn't give a good instinct for what information is being pulled out. So what we can do instead (I'll go ahead and get rid of his convolution term there) is take a more interesting image, and I've prepared one that's just a herd of zebras. With this herd of zebras, we've got lots of stripes as you move from left to right. For the kernel, by the way, I'm just going to prepare us a Gaussian kernel; let's get a Gaussian that's maybe this nine by nine thing. If we look at the Fourier information associated
with the image the only thing I want you to notice is that as you go away from zero you actually have higher values so for example it kind of peaks up a little above 50 here which almost certainly corresponds with the frequency of the stripes themselves as we move from left to right now the reason I bring all of this up is to try to highlight what happens when we take the convolution of this image okay so I'm going to create a new image called conv-m that's going to convolve the image with that kernel which we define to be a Gaussian and as you expect it's a blurrier version of the of the zebras but now if we do the same thing we plot this 1d Fourier spectrum but we do it on the convolved version of the image what we get looks very similar to the original but notice how what's happening around the edges have been pushed down and that's a way of saying all the high frequency stuff happening with the image all that finer detail is no longer there so now it's behaving a lot more like the image of the cat that was soft all around and there's actually something much more specific that's happening here where it turns out that the, the Fourier transform of this Gaussian kernel is another Gaussian curve specifically in such a way that if you have a really narrow Gaussian the Fourier transform is something really broad if you have a broad one it gives you something really narrow and if we just look at the Gauss, the ah Fourier information, just the frequency information and we multiply it by a bell curve; kind of squishing down what's at the outer regions that gets us the same Fourier information, and remember how a little bit earlier I was talking about the computational complexity here, how as we have a really big kernel it requires a lot of multiplications. 
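The claim that multiplying Fourier transforms corresponds to convolving the original signals can be checked numerically. This is a minimal 1D sketch, not code from the course notebook: it assumes the FFTW.jl package is installed, and `circular_conv` is a hypothetical helper name of my own.

```julia
# A small numerical check of the convolution theorem in 1D.
# Assumes the FFTW package is available (not part of the lecture notebook).
using FFTW

# Circular convolution of two equal-length vectors, done naively.
function circular_conv(a, k)
    n = length(a)
    [sum(a[mod1(i - j + 1, n)] * k[j] for j in 1:n) for i in 1:n]
end

a = rand(64)
k = rand(64)

direct  = circular_conv(a, k)
via_fft = real.(ifft(fft(a) .* fft(k)))   # multiply spectra, transform back

maximum(abs.(direct .- via_fft))          # differs only by floating-point noise
```

The two results agree to machine precision, which is exactly the "Fourier transform turns convolution into multiplication" statement above.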
It turns out that if we instead first translate things into a description in terms of the frequencies of the image, where convolution is just like multiplication, we can multiply the two functions pixel by pixel and then transform back, and that gives us a much faster way to do this operation. Given that this is a course about computational thinking, I think there's a general principle to be had in that: finding different ways to represent your information can sometimes let you do things much faster.

Now, here I'm plotting just a very one-dimensional thing; it's kind of marching across the image from left to right. But we could show... what did I write? I wrote something like heat map 2d Fourier spectrum, and again I'll share this notebook if you want to play with it. We can do a two-dimensional analog, where the information that comes out of this (again, I think it has to pre-compile) is essentially saying what level of detail is happening in both the vertical and horizontal directions. So in this context we get this strange galactic image, and if we apply it to the smoothed-out version, the convolved image... not con image, conv image... what we see is that it pushes down the values that are far away and concentrates us towards the center. And just to see the contrast, if we had started off with a much softer image, something like Tom, who's soft to begin with: this Fourier spectrum, and again it's okay if it's not crystal clear what it's supposed to represent, all I want you to know is that the farther away you are from the center, the brightness of those values corresponds to the level of detail happening. This is what a softer image looks like, and as you multiply by a Gaussian in this Fourier space, it pushes you towards this smoother-looking spectrum, and that's a much faster operation.
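That "multiply pixel by pixel in frequency space, then transform back" pipeline can be sketched in 2D as follows. This is my own simplified illustration, not the notebook's code: FFTW.jl is assumed, the result is a circular convolution, and the kernel is zero-padded without re-centering, so the output is shifted by the kernel's half-width relative to a careful implementation.

```julia
# Blur a matrix of gray values by multiplying in the frequency domain.
# Assumes the FFTW package is available. `kern` is a small kernel.
using FFTW

function fft_blur(img, kern)
    padded = zeros(size(img))
    padded[1:size(kern, 1), 1:size(kern, 2)] .= kern   # zero-pad kernel to image size
    real.(ifft(fft(img) .* fft(padded)))               # pointwise multiply, invert
end

img  = rand(256, 256)
kern = fill(1/81, 9, 9)       # a 9×9 box blur standing in for the Gaussian
smoothed = fft_blur(img, kern)
```

Because the kernel's entries sum to one, the total brightness of the image is preserved; only the fine detail is smoothed away, which is the frequency-space picture of what the Gaussian multiplication is doing.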
So often the standard libraries for convolutions don't do the naive thing when you have a large kernel; instead they do this more sophisticated thing, and again you can see the effect it has had. The version of the convolve function that I wrote, by the way, is not doing the sophisticated thing; it's just doing the naive one, where you have the little grid of values march across and you multiply and add as you need, and you'll be doing that in the problem set this week.

So I think... oh man, a half hour is such a short time; I've already run out. But that is the basics of image convolutions, and the main thing I want you to take away is that a simple tool that seemed specific at first, where we were just doing moving averages, becomes much more if you abstract in the right way. You write it so that rather than hard-coding all of the moving-average functionality, you pull it out, so that this general convolution principle can be applied on its own, and tweaking some data, tweaking the kernel, lets you do qualitatively different things with your data. In this case it let us do things like blurring, sharpening, and edge detection. The other thing is that representing your data in distinct ways, so in this context taking the Fourier information, this frequency information, sometimes lets you do things much faster, even though, in the abstract sense, you're doing the same operations on the data; one domain can often behave quite differently from another.

So with that, I'm going to call an end to this first lecture. For the MIT students in the course, we're going to do a zoom call that's basically the profs discussing more of this material in person, so you can interact there.
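The naive march-multiply-add version described above, along the lines of what the problem set asks for, can be sketched like this. The function name and the edge handling (clamping indices at the border) are my own choices, not necessarily what the course notebook does; strictly this computes a cross-correlation, which is identical to convolution for symmetric kernels like a blur.

```julia
# Naive 2D convolution: slide the kernel over every pixel, multiply and add.
# Out-of-bounds neighbors are clamped to the nearest edge pixel.
function naive_convolve(M::AbstractMatrix, K::AbstractMatrix)
    rows, cols = size(M)
    lr, lc = size(K) .÷ 2                 # kernel half-widths
    out = similar(M, Float64)
    for i in 1:rows, j in 1:cols
        acc = 0.0
        for a in -lr:lr, b in -lc:lc
            ii = clamp(i + a, 1, rows)    # clamp indices at the border
            jj = clamp(j + b, 1, cols)
            acc += M[ii, jj] * K[a + lr + 1, b + lc + 1]
        end
        out[i, j] = acc
    end
    out
end

M = rand(8, 8)
K = fill(1/9, 3, 3)                       # 3×3 moving average
blurred = naive_convolve(M, K)
```

For an n-pixel image and a k-entry kernel this costs about n·k multiplications, which is exactly why libraries switch to the FFT route once the kernel gets large.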
As always, there's the discord, so please do jump into the discord, ask questions, have discussions, and I will see you next week. Enjoy the first problem set. [Music]
Info
Channel: The Julia Programming Language
Views: 319,858
Rating: 4.9829898 out of 5
Keywords: julialang, julia, programming language, image processing, machine learning, ai, artificial intelligence, data science, convolutions, images.jl, 3blue1brown, anyone can code
Id: 8rrHTtUzyZA
Length: 36min 10sec (2170 seconds)
Published: Thu Sep 03 2020