Compressed Sensing: When It Works

Video Statistics and Information

Captions
Welcome back. In the last lecture we talked about the compressed sensing problem and how, under some circumstances, you can get away with far fewer measurements than the full system. So let's say x is an image that is sparse in Fourier space. You could get away with measuring a massive subsampling of the pixels of x — maybe only 10 or 20% of them — and still infer the sparse vector s that satisfies the measurement equation. Once you solve for that sparse vector of Fourier coefficients, you can inverse Fourier transform and reconstruct your full image. This sounds kind of like magic; it's not, and I'm going to walk you through conditions on when this will and will not be possible.

Just to remind you, this is the picture we're dealing with. We have some measurement matrix C, and we have a universal encoding basis Psi — I'm going to think of this as a Fourier or wavelet basis — a big n-by-n matrix, hopefully orthogonal or unitary, with those kinds of nice properties. And s is a sparse vector that is basically our full image, Fourier or wavelet transformed. We're trying to solve for s. Everyone knows Psi — it's just the Fourier basis. We know C — which pixels we're measuring — and we know y, the values of the pixels that we measured. Out of the infinitely many vectors s that satisfy this underdetermined system, we're trying to find the sparsest one, using an L1 optimization. I'll write that up again: find the minimum-one-norm s such that y minus C Psi s is either equal to zero or less than some epsilon — I'll just say less than some really, really small number, because that's more general. So we're basically trying to find an s that almost satisfies this equation, even if there's a little bit of measurement noise; that's what this says. And out of all of those infinitely many s's, we're trying to find the sparsest, which is what the one-norm regularization achieves. Now I'm going to walk you through the conditions under which you can expect this to work and not work.

The first really, really important condition is something called incoherence: C needs to be incoherent with respect to Psi. I'll tell you exactly what this means, but basically, the rows of C need to be generically oriented with respect to the columns of Psi — not parallel to them. If a row of C is really parallel to a particular column of Psi, then it's essentially only measuring that column, and it gives me no information about the other columns. Remember, the big point here is that Psi is a universal basis that can encode anything: this s could be the Fourier coefficients of a picture of a mountain, or a dog, or a human face, or a coffee cup. It's a very expressive library, and if my rows of C are too parallel to particular columns of Psi, then those rows are essentially only measuring those columns, and we lose the full generality of the library. So, in words: the rows of C cannot be too parallel to the columns of Psi. I'm going to give you some examples of good and bad measurement matrices to walk you through this.

These are all different types of good measurement matrices that are incoherent with respect to the basis Psi. Our example of random pixels — random point measurements — is a good one. You could also generate C from a Gaussian normal distribution, so that every entry is randomly distributed. You could have a Bernoulli-distributed C, where you basically flip a coin for every single entry so it's a 0 or a 1. You can also have sparser masks, like a sparse Bernoulli mask. The point is that all of these have some randomness built into how you're measuring, and that's really, really important: if I don't have randomness built into the measurement matrix C, then I'm probably going to be measuring certain features of Psi more than others. These random measurement matrices are, with high probability, sampling the full column space of Psi, so that the vector y has information about s no matter which columns s is exciting. The measurement matrix has to excite all of the columns of Psi if you want to be able to reconstruct any signal. So you need C to be incoherent, and generally this is achieved through randomness.

Now, I will point out that although mathematicians love these Gaussian measurement matrices — because it's easy to prove theorems about them — they are not very practical in a lot of engineering applications. I don't know that it's easy to take a megapixel image and take Gaussian random linear combinations of all of the pixels to create the entries of y; that might actually be harder than acquiring the image itself. Single-pixel measurements, on the other hand, are extremely relevant in engineering, because in many situations a single pixel means a single spatial measurement: if I'm measuring the ocean, a single pixel is measuring one point in space. So I personally like these sparse, single-pixel measurements, even though they're maybe not as mathematically nice or as robust as Gaussian measurement matrices. That's just an aside. I'll also point out that single-pixel measurements — delta functions in space — are almost optimally incoherent with respect to a Fourier basis. Remember that a delta function excites all Fourier frequencies, so these little delta-function measurements are actually really, really good when you pair them with a discrete Fourier transform or discrete cosine basis.

An example of a terrible measurement matrix would be if I literally took columns of Psi and transposed them into rows of C. Let me walk you through why this is so bad. There should be an equals sign here: y = C Psi s. If I measure this way, C Psi only excites those columns of Psi that are parallel to the rows of C, and this product is literally the Theta matrix. (All of these matrix examples, by the way, are actual data from actual optimizations that I've run.) You'll notice that this Theta is not measuring any of the low frequencies in s: it is an identity on the super-high frequencies, and it measures none of the low frequencies. The way I think about it is that this C matrix is measuring only one portion of the Fourier transform, so it's able to pick up one frequency here, but it's not able to get any information about the other frequencies, because it's not exciting those columns of Psi. This is an exceptionally bad measurement matrix because it is very coherent with respect to Psi. I hope that was somewhat coherent. The basic idea is that we want the rows of C to not be parallel to any particular columns of Psi; we want each row to hit a linear combination of all of the columns of Psi, so that each of the measurements in y tells us information about all of the potential entries of s.

The other piece of the puzzle is how many measurements we need to take. The vector s is called K-sparse if it has K nonzero entries — so if it has five nonzero entries, it's five-sparse, or K-sparse with K = 5. I have to measure more than K values to find those K nonzero Fourier coefficients, and my number of measurements — sometimes I call this p — has to be proportional to, or on the order of, K log(n/K), where K is how sparse my signal is and n is the original dimension of my full measurement; n might be a million for a megapixel image. What this tells us is that the sparser my signal is, the fewer measurements I can get away with: if K is, say, 2, then p is on the order of 2 log(10^6 / 2), and log of a million over two is small — not bad. So p is small if K is small. But as K gets bigger and bigger — maybe the signal is only 10,000-sparse — then I actually need a lot of pixels: more than 10,000 log(n/10,000) of them. So the number of measurements scales almost linearly in the sparsity of s, with this extra log(n/K) factor. And any time you see an "order of" here, you should be asking yourself about the constant: this actually equals some constant k1 times K log(n/K), where k1 depends on how incoherent C is with respect to Psi — essentially on their inner products. Better measurements C allow me to take fewer of them. As a back-of-the-envelope, for about the best measurements you can think of — delta-function single pixels paired with a Fourier basis — this constant is going to be at least around 3 to 5.

So, for example, say I have a megapixel image, so n = 10^6, and say I need to keep about 10^4 coefficients. (This is a little dangerous to do live — that's really low; for most images it should probably be closer to 10^5.) Then I'm going to need to take at least some constant times 10^4 times log(10^6 / 10^4). That's log of a hundred — you get to decide whether that's log base e or base 10, I don't remember — times this constant related to the incoherence, which is usually about 5. If it's log base 10, log(100) = 2, so 2 times 5 times 10^4 is about 10^5 measurements, or 10 percent of our original image. So if I want to keep 1 percent of the original image's coefficients, I need to randomly measure about 10 percent of its pixels. This is super back-of-the-envelope — maybe you need to measure a little more or a little less; maybe your signal is not actually this sparse and is more like 5%, in which case this would go up a lot — but it gives you an idea of how to work this equation to get the number of measurements you need to take.

Okay, good. So that tells you what types of measurements you need to take, and how many. I will point out — and this is really important — that there was a big advance in applied math about 15 years ago that allowed us to solve this system of equations for the sparsest s. It is now possible, without a brute-force combinatorial search — it's now a convex optimization — to solve for the sparsest s, if these two conditions are satisfied. And if they are satisfied, then you get something called the restricted isometry property (RIP). What that means is that C Psi, this big measurement matrix, acts like a unitary matrix — an isometry — on sparse vectors s. Remember that unitary matrices rotate pairs of vectors but preserve the angles and distances between them, and that's exactly what we want when we're doing this compressed sensing problem: we want C Psi to preserve the geometry of sparse vectors. If these two conditions are satisfied, there are rigorous mathematical results saying that you have this restricted isometry property, which means that at least for K-sparse vectors s, this measurement matrix acts like an isometry, so the geometric information in pairs of vectors on one side is preserved in pairs of vectors on the other. That at least gives you a flavor of the mathematical underpinning of why you can do this optimization.

But I will point out that just because you can solve this system of equations doesn't mean it's easy. It is extremely expensive to solve for s given these measurements y, especially for a large megapixel image. At least when I used to do this, these were supercomputer-type computations, and although there are better and better algorithms coming online, this is by no means cheap. It's possible, but it's not inexpensive. So when would you want to use this? Well, there are some applications where measuring the full-resolution x is really, really expensive, and speeding that up — measuring less — might save lives. One of the classic examples of where compressed sensing has been used is infant MRI in hospitals. If you want to get an MRI of a really, really small baby, they're moving around all the time; you can't tell them to lie still while you take the measurement, so you have to sedate them. And if you sedate that child for two minutes to get the full MRI, that might really hurt a sick infant — sedating sick infants for a long time can have bad health outcomes. So instead of collecting the full image x, if you could sedate them for much less time — so that they only have to be still for 30 seconds — and collect a quarter of the information in x, but do it randomly, then you're willing to pay that supercomputer price to solve for s and invert to get the full image, if it means maybe saving infants' lives. It's very expensive, but sometimes it's worth it to measure a dramatic subsample of the measurements and infer what the nonzero coefficients were. In other examples, you literally can't measure the full-resolution x at all, so you try to design randomized measurements y from which you can infer the nonzero Fourier coefficients.

Okay, that was a lot of math. We're going to talk more about this, and I'm going to help you build more intuition for why this L1 norm is so useful for promoting sparsity in this compressed sensing problem. All right, thank you.
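The L1 recovery described in the lecture can be sketched numerically. This is a minimal illustration, not the lecture's own code: it assumes NumPy/SciPy are available, uses a discrete cosine basis instead of a complex Fourier basis so everything stays real-valued, and solves the basis-pursuit problem (minimize ‖s‖₁ subject to Θs = y) as a linear program by splitting s into nonnegative positive and negative parts.

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k, p = 128, 3, 50          # signal length, sparsity K, number of measurements

# Build a K-sparse vector s of DCT coefficients and the signal x = Psi @ s,
# where Psi is the orthonormal inverse-DCT basis (stand-in for a Fourier basis).
Psi = idct(np.eye(n), axis=0, norm='ortho')
s_true = np.zeros(n)
s_true[rng.choice(n, k, replace=False)] = rng.normal(0, 2, k)
x = Psi @ s_true

# C: random single-pixel measurements (random rows of the identity), y = C @ x.
rows = rng.choice(n, p, replace=False)
Theta = Psi[rows, :]           # Theta = C @ Psi
y = x[rows]

# Basis pursuit as a linear program: write s = u - v with u, v >= 0,
# minimize sum(u) + sum(v) subject to Theta @ (u - v) = y.
c = np.ones(2 * n)
A_eq = np.hstack([Theta, -Theta])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
s_hat = res.x[:n] - res.x[n:]

# With K = 3 and 50 random pixel measurements, the sparsest solution is
# recovered (with high probability) and matches s_true.
print(np.max(np.abs(s_hat - s_true)))
```

With the recovered coefficients, `Psi @ s_hat` reconstructs the full signal x from only p of its n samples, which is exactly the inverse-transform step the lecture describes.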
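The incoherence condition can also be made quantitative. A common measure is the mutual coherence, √n times the largest inner product between a normalized row of C and a normalized column of Psi: it ranges from 1 (fully incoherent) up to √n (a row of C equals a column of Psi). The helper name `coherence` below is mine, and the DCT again stands in for the Fourier basis; this is a sketch of the good-vs-bad comparison from the lecture.

```python
import numpy as np
from scipy.fft import idct

def coherence(C, Psi):
    """Mutual coherence: sqrt(n) * max |<c_i, psi_j>| over normalized
    rows c_i of C and columns psi_j of Psi. Ranges from 1 to sqrt(n)."""
    n = Psi.shape[0]
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    Pn = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    return np.sqrt(n) * np.max(np.abs(Cn @ Pn))

n = 256
rng = np.random.default_rng(1)
Psi = idct(np.eye(n), axis=0, norm='ortho')             # orthonormal DCT basis

C_pixels = np.eye(n)[rng.choice(n, 32, replace=False)]  # single-pixel rows
C_gauss  = rng.normal(size=(32, n))                     # Gaussian random rows
C_bad    = Psi.T[:32]                                   # rows = columns of Psi

print(coherence(C_pixels, Psi))  # ~ sqrt(2): deltas vs DCT, nearly optimal
print(coherence(C_gauss,  Psi))  # moderate: random rows hit all columns
print(coherence(C_bad,    Psi))  # = sqrt(n) = 16: maximally coherent (terrible)
```

The "terrible" matrix built from columns of Psi saturates the coherence bound, which is exactly why its measurements carry information about only a few entries of s.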
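The back-of-the-envelope measurement count works out as a two-line calculation. The function name is mine, and following the lecture's worked numbers I use log base 10 and a constant of 5 for the incoherence-dependent prefactor; rigorous statements of the bound use the natural log and a constant that depends on the actual coherence.

```python
import numpy as np

def num_measurements(K, n, c=5.0):
    """Back-of-the-envelope p ~ c * K * log(n / K) from the lecture,
    with c an assumed incoherence-dependent constant (~3-5)."""
    return int(np.ceil(c * K * np.log10(n / K)))  # base 10, as in the example

n = 10**6            # megapixel image
K = 10**4            # keep ~1% of the Fourier coefficients
p = num_measurements(K, n)
print(p, p / n)      # 100000 measurements, i.e. 10% of the pixels
```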
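One informal way to see the restricted isometry property is to check that Θ nearly preserves the norm of random K-sparse vectors, the way a unitary matrix would preserve it exactly. This sketch uses normalized Gaussian measurements (the mathematically "nice" case from the lecture); it is an empirical illustration, not a proof of RIP.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, K = 512, 120, 5

# Normalized Gaussian measurement matrix: columns have unit expected norm.
Theta = rng.normal(size=(p, n)) / np.sqrt(p)

def random_k_sparse(n, K):
    s = np.zeros(n)
    s[rng.choice(n, K, replace=False)] = rng.normal(size=K)
    return s

# For an isometry, ||Theta @ s|| / ||s|| would be exactly 1.
# RIP says it stays within [1 - delta, 1 + delta] for all K-sparse s.
ratios = [
    np.linalg.norm(Theta @ (s := random_k_sparse(n, K))) / np.linalg.norm(s)
    for _ in range(1000)
]
print(min(ratios), max(ratios))  # both close to 1
```

The same experiment run with the "terrible" coherent measurement matrix would produce ratios near zero for sparse vectors supported on the unmeasured columns, which is precisely the failure mode the lecture describes.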
Info
Channel: Steve Brunton
Views: 14,290
Rating: 4.9828691 out of 5
Keywords: Compressed sensing, machine learning, robust statistics, Fast Fourier Transform, FFT, Derivatives, PDEs, Fourier transform, Fourier Series, Fourier analysis, Hilbert Space, Complex analysis, Wavelets, Data science, Linear algebra, Applied mathematics, Compression
Id: hmBTACBGWJs
Length: 17min 46sec (1066 seconds)
Published: Fri Oct 16 2020