and the Howard Hughes Medical Institute. And I'm going to tell you today about deconvolution microscopy, the science of how to remove blurring to get 3D images out of a light microscope.

By way of background, you all know that microscopes are inherently designed for two-dimensional imaging. To do three-dimensional imaging, what we do is acquire a stack of images, essentially a focal series: we take an image of the sample, change the focus and take another picture, change the focus and take another picture. The fundamental problem with this is that each image contains useful in-focus information from the focal plane, but the rest of the image is contaminated by out-of-focus information from the rest of the three-dimensional sample.

There are two general strategies to solve this problem of out-of-focus blur. The first one is the confocal microscope, which we've already covered; the idea is that one uses a physical pinhole, right here, to exclude the light that would be coming from other focal planes and only accept the light from the actual focal plane of interest. Alternatively, there's another strategy, and that's what we're going to be focusing on today, which uses computational approaches: you take the stack of images that you record into a computer and then remove the out-of-focus information to generate a 3D reconstruction where everything is in focus. The advantage of this computational approach is that essentially you put the photons back where they belong, so that none of the photons are lost. The difficulty is that it requires computation, and it requires knowledge of the detailed imaging properties of the microscope, whereas in a confocal that's done for you automatically. But because no photons are rejected in this method, it is potentially far more sensitive, and hence a far more ideal imaging tool for in vivo imaging, for live-cell light microscopy.

So, as a first step in thinking about this whole method, I want to convince you that this approach actually works, before we put in the effort to figure out how it works and how to make it useful for us. By way of example, there are two images here. The first is raw data from a pombe cell stained with several different fluorophores to show the nucleus, the cytoplasm, and the contractile ring. And then the same data are shown here deconvolved, where now you see very sharp details for everything throughout the three-dimensional volume. So clearly this is a useful tool, and as I said, it is especially good for live-cell imaging.

Okay, so how does it work? How do we think about this? All of this is ultimately related to the mathematics of image blurring. At the heart of that is what's known as the point spread function. The point spread function is what you actually record with the microscope when you're viewing an ideal point object, something infinitesimally small, in three dimensions. And as we'll talk about, what you see coming out of the microscope is a blurred view, and we need to understand what determines that shape and where it comes from. That in turn requires understanding what limits the resolution of the light microscope in the first place. As you have already covered, it's a combination of wavelength, numerical aperture, and aberrations of the microscope, and all of that gets rolled into the shape of the point spread function.
So if we know the point spread function and we know the object, then we can compute what the image is. And the challenge will be computational: given the image and knowledge of the point spread function, to go back and recover the object. Okay, that's what we're going to be talking about. So we have to understand where the point spread function comes from, what it looks like, and then how to do this mathematical inversion process.

The most important optical element in the light microscope, of course, is the objective lens. What we see here is that the light going through the objective lens is limited by this angle, alpha, which determines what angle of rays can make it into the microscope. The highest resolution then comes from the equation shown there, which is dictated by that angle alpha and is essentially set by the numerical aperture: the resolution limit is roughly the wavelength divided by twice the numerical aperture. For an oil-immersion lens we actually get good resolution, with a cutoff corresponding to a highest resolution of something like 0.18 microns; for green light of about 500 nm and a 1.4 NA lens, 500/(2 x 1.4) is about 180 nm.

Okay, so if we look at two different lenses, a low-NA lens like the 25x air lens shown here, or a high-NA, 1.4 NA oil-immersion lens, we see that the point spread function is very sharp and narrow in the case of the oil-immersion lens but very broad in the case of the air lens. And what that translates into is a much lower resolution for the air lens. Now, this is all in-focus imaging, and we'll talk in a moment about what happens with out-of-focus light. But first we need to understand the consequences of this blurring in a more mathematical way.

One of the key things is the Fourier transform of these point spread functions, which is the optical transfer function of the lens. For an ideal point, the transform would be a straight line going right across here at 1.0, but because of the finite numerical aperture of the objective lens it gets reduced, cutting off frequencies and decreasing their amplitude, and as you go to lower and lower NA this gets worse and worse. So only the lowest frequencies make it through the objective lens. We're going to use this Fourier transform view because it gives us an intuitive understanding of what's going on and of how to correct for this problem through the deconvolution process.

Okay, so first, just to briefly introduce the mathematics of the Fourier transform. We're not going to dwell on this in detail, but at the heart of it, any waveform, any image, can be decomposed into a set of sine waves and cosine waves. The equation up here shows the mathematics underlying that: each wave has a frequency, an amplitude, and a phase. You can see in this picture that these different sine waves start at different places, so this one starts high, this one starts low; that's the phase information. The amplitude is how strong each wave component is, and the combination of all of these allows you to mathematically decompose an arbitrary image signal. The important thing is that the actual physics of the optics is akin to a Fourier transform, so with the mathematics of the Fourier transform we are replicating the way the lens collects the scattered light and recombines it. So, by way of demonstration of the Fourier transform, let's first think about trying to recreate a square wave.
And we're going to build that up from a series of sine waves of different frequencies. I'm using a little Java applet that you can easily download on the web, and what I'm going to do is increase the number of terms. The first thing you have here is a basic red sine wave with the same period as the square wave that you're trying to create. Then, as I increase the number of waves being added in, you can see at the bottom that each one is a wave at a multiple of the basic frequency: times 2, times 3, times 4. As I add each one in, you can see that the red line gets closer and closer to the square wave and builds it up in a very nice way. So we can represent an arbitrary function, it could be noise, it could be anything, using this method.

We're going to turn on the sound now and hear what it sounds like to go from the primary frequency alone to adding the second harmonic, the third harmonic, the fourth, fifth, sixth, all the way up to all the harmonics we need to recreate the square wave. In fact, the primary frequency is on, but because it's a relatively low tone it's hard to hear. As I add on harmonics it will become immediately apparent that there is sound, and it will get more and more electronic-sounding as we keep adding terms. As we go up further and further we do a better job of recreating the square wave, but it takes higher and higher frequencies to do that, and that adds this level of annoyance to the sound. The basic bottom line is that we can recreate an arbitrary function. This was done with square waves, but I could just as easily shift to a random noise pattern, which looks like this, and build it up the same way by adding more and more terms, and essentially perfectly recreate it from just a series of sine waves.

In this next Java applet I want to illustrate filtering, essentially blurring an input signal. I'll start with a random signal, random noise, and we'll do it with the sound on this time. You'll hear everything, including the high-frequency noise, and I'll filter it more and more aggressively down to lower and lower resolution. You'll be able to see, at the bottom here, the impulse response, essentially the point spread function, changing as we go along in the filtering process. It starts out as very noisy white noise; these two traces represent the actual waveform varying in time, and this is just a magnified view of that. Up here we see the frequency response, which right now is completely flat across the entire field. As we filter aggressively, we'll successively lose the high-frequency information and go down to lower and lower frequencies. This is akin to going more and more out of focus in the light microscope. So we'll do that now, and you can hear it gets quieter, because we're missing the very high-pitched tones, and you can see from the waveform that it becomes lower and lower frequency. And if you look at the impulse response, the point spread function is getting broader and broader as we cut the frequencies down more and more. As it broadens out, we've lost most of the information; all the high-frequency information, which might be an interesting structure in a cell, is now lost.
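To make the two applet demos concrete, here is a minimal numerical sketch in Python with numpy, assuming a unit-period square wave and a simple brick-wall low-pass filter; all names and parameter values are illustrative rather than taken from any particular applet.

```python
import numpy as np

# Part 1: rebuild a square wave from its sine-wave components.
# A unit-period square wave contains only odd harmonics, with amplitude 1/n:
#   f(t) = (4/pi) * sum over odd n of sin(2*pi*n*t) / n
t = np.linspace(0.0, 1.0, 2000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * t))

def partial_sum(n_harmonics):
    """Sum the first n_harmonics odd harmonics of the square-wave series."""
    approx = np.zeros_like(t)
    for k in range(n_harmonics):
        n = 2 * k + 1                                # odd harmonics: 1, 3, 5, ...
        approx += (4.0 / np.pi) * np.sin(2 * np.pi * n * t) / n
    return approx

for n_harmonics in (1, 3, 10, 100):
    rms = np.sqrt(np.mean((partial_sum(n_harmonics) - square) ** 2))
    print(f"{n_harmonics:4d} harmonics: rms error = {rms:.3f}")

# Part 2: low-pass filtering, the analogue of going out of focus.
# Zero out the high frequencies of a white-noise signal; only the low
# frequencies survive, just as only low frequencies pass a defocused lens.
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
spectrum = np.fft.rfft(noise)
cutoff = 50                                          # keep only the 50 lowest frequencies
spectrum[cutoff:] = 0.0
filtered = np.fft.irfft(spectrum, n=noise.size)
print("std before filtering:", noise.std(), " after:", filtered.std())
```

The shrinking rms error with more harmonics is the numerical version of the red curve hugging the square wave, and the drop in signal amplitude after filtering is the "quieter" sound of the low-pass demo.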
So our whole challenge is: given this loss of information and knowledge of the point spread function, how do we recreate what the image should have been? We've seen both visual and audio examples of blurring, and now I want to briefly go through the mathematics of it, so that we can understand in detail what we're trying to get the computer not only to understand but also to remove, namely the effects of the convolution.

There's a simple equation right here, an integral equation, which defines what blurring actually means and what a convolution is. But to understand it more intuitively, I have a few little figures that will help clarify it. Imagine an example like this, where you have a point spread function, which is our blurring function from the microscope, and this is the real object, just a one-dimensional representation. You can see it's got some sharp points right here, a couple here, and a broader structure like this, and we want to know how that would get blurred by this point spread function, essentially by passing through the microscope. What you do is place the point spread function at every location where there's intensity in the input image: you center it at that location, multiply it by the amplitude of the input signal there, and then sum all of these up. So there would be a blurred function here, another blurred function here, and that gives rise to these two humps, here and here. But then, when we come to the square, we put one right at the very beginning of the square, then infinitesimally further in we put another one, and another one, and another one. All of these sum up where those Gaussian-like functions overlap, to give the big broad distribution that you see here. That's the mathematical convolution, the blurring of the input signal with this point spread function, in our one-dimensional example.

Now, in two dimensions it's the same story. If this triangle is now our blurring function, and it can be an arbitrary shape, it doesn't have to be a nice smooth Gaussian, it can be whatever you want, and we convolve it with this two-dimensional image containing a point-like structure and a square structure, what we see is that we get a bigger triangle where the point was, because the point is blurred by its own diameter; it's not an infinitesimal point, it has a finite size, so the result keeps the rounded edges of the point. And when we convolve it with the square, we get a bigger, roughly square shape, but it still has a triangular appearance because it has components from both the original image and the blurring function.

Okay, so mathematically, one of the reasons I was emphasizing Fourier transforms before is that this convolution operation, this blurring operation, can be much more simply understood via what's called the convolution theorem. What we have is that what we observe is the convolution of these two functions, and the convolution theorem tells us that the Fourier transform of what we observe, this "k" with the little tilde over it, is the same as the product of the Fourier transforms of the two functions. So instead of having to do a complex convolution integral, we can take the Fourier transforms, multiply them together, and then transform back, and we get exactly the same result as here.
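As a small check of the convolution theorem in code, here is a sketch that assumes periodic boundaries so the discrete Fourier transform applies exactly; the toy object and Gaussian kernel are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
obj = rng.random(n)                               # a toy 1D "object"
psf = np.exp(-0.5 * ((np.arange(n) - n // 2) / 3.0) ** 2)
psf /= psf.sum()                                  # normalized Gaussian blur kernel

# Direct (circular) convolution, written out as the sum it really is.
blurred_direct = np.zeros(n)
for i in range(n):
    for m in range(n):
        blurred_direct[i] += obj[m] * psf[(i - m) % n]

# Convolution theorem: Fourier transform both, multiply, transform back.
blurred_fourier = np.real(np.fft.ifft(np.fft.fft(obj) * np.fft.fft(psf)))

print(np.allclose(blurred_direct, blurred_fourier))   # True
```

The two results agree to machine precision, which is exactly why Fourier space is the natural place to think about blurring and deblurring.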
But because we have this multiplication, you can immediately see that the loss of high-frequency information that I showed you before, either in the transform of the point spread function or in the little Java applet, is what gives rise to this blurring. So a 25x, 0.5 NA lens will blur things out much more than the high-NA lens.

Now, everything I told you before was with the lens in focus, so the system was doing as well as it possibly could on a 2D sample, and even then you saw that we lost a lot of high-frequency information. But when we defocus the lens, the problem gets a lot worse. Here are some point spread functions showing zero defocus for a 25x lens, 2.5 microns of defocus, and 5 microns of defocus, and you can see the point spread function gets really broad and really ripply. What that translates to in Fourier space, in the Fourier transform of the point spread function, is shown here, where the outermost curve, labeled zero, represents the in-focus point spread function. As you defocus, you lose resolution, you get less and less transfer, and you also get these ripples coming in, and all of that acts like aberrations that degrade the image as it goes out of focus.

Now, for a 63x lens the situation is a little better, but basically the same principles apply, and this is what one would use for high-resolution imaging in a light microscope. You see the in-focus case and then, moving down to 0.5 microns of defocus, the amount of information that you get changes drastically, from extending all the way out to high resolution to being confined to very low resolution, just from the defocus of the lens. Even a few microns of defocus has a profound effect. This is what it looks like in two dimensions, so if we had the Fourier transform of our image, then for every defocus we could use this to figure out how to multiply it to give rise to the defocused image.

Putting all of that together in a three-dimensional imaging experiment, we have this X-shaped structure, which is essentially a cross-section through all these different defocus steps, where you have an in-focus image and then parts that are more and more out of focus as you go through a thick sample. Here is the in-focus part, and this is the focus direction shown up here, and you can see it's nice and narrow. But as we defocus, it gets broader and broader and has all these other little peaks that correspond to the ripples I showed you in the other pattern. Now, if we put all this together and Fourier transform it in three dimensions, something very intriguing appears: this is what's called the region of support, the region where observable information exists in Fourier space. The vertical direction here is the focus direction, and about the center you see a toroidal, donut-like shape; the missing part is a missing cone of information along the focus direction. In this direction we have the radial axis, the xy plane, and what you see is that there's no information at all straight along the z-axis. The reason is that at every plane in the sample, all the photons pass through that plane; they may be blurred, but you have the same number of photons at every plane, because they're all just uniformly passing through the sample. As a consequence, there's no information, or very little information, about structure along the z-axis.
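The missing cone can be seen in a toy calculation. This sketch does not use a real diffraction PSF; it simply assumes each defocused slice is a Gaussian that broadens with distance from focus while carrying the same total intensity, which is the "same number of photons at every plane" argument above. All sizes and widths are made up.

```python
import numpy as np

nz, nxy = 32, 64
z = np.arange(nz) - nz // 2
y, x = np.meshgrid(np.arange(nxy) - nxy // 2,
                   np.arange(nxy) - nxy // 2, indexing="ij")

# Toy 3D "PSF": each slice is a Gaussian that widens with defocus but is
# normalized to unit total intensity (photons are spread out, never lost).
psf = np.empty((nz, nxy, nxy))
for i, dz in enumerate(z):
    sigma = 1.5 + 0.8 * abs(dz)
    s = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    psf[i] = s / s.sum()

otf = np.abs(np.fft.fftshift(np.fft.fftn(psf)))       # 3D transfer function
center = (nz // 2, nxy // 2, nxy // 2)

# Along the kz axis (kx = ky = 0) everything away from the origin is ~zero:
# that is the missing information along the focus direction. Along kx at
# kz = 0 there is still plenty of transfer.
kz_line = otf[:, center[1], center[2]]
kx_line = otf[center[0], center[1], :]
print("max along kz away from origin:", kz_line[np.arange(nz) != center[0]].max())
print("max along kx away from origin:", kx_line[np.arange(nxy) != center[2]].max())
```

Because every slice sums to the same value, the transform along the z-axis is nonzero only at the origin, which is the numerical version of "no information about structure along z."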
Now, this just gives you the region where information is observable, but the information within it is also significantly weighted and non-uniform. This is an experimental point spread function, actually the 3D transform of the image that I showed you here, and what you see is that the intensity is very high close to the origin, right here, and then falls off very rapidly. So even though this is the full observable region, the weights of that information vary dramatically. You can see that in this picture back here: far out at the edges of the observable region there is very little signal, and there is a huge amount in towards the center. The problem in deconvolution microscopy is, given knowledge of this point spread function and of its transform, to figure out how to recover the lost information. There are two parts to that. The first is this large variation in intensity that we see as we go up and down this peak, as we go in and out across the image, and that we can handle by re-weighting Fourier space. But the fundamental problem is: what do we do with the missing information, information that's completely lost in the imaging process? That's this region here along the z-axis where we don't see much of anything, and recovering it will require special types of algorithms.

So, again, back to a brief jaunt into the mathematics of all this. This is the convolution idea: what we have is an observed image, which is equal to the true object we really want to discover, convolved with the point spread function. That's written down here, and as we talked about, the wonderful convolution theorem tells us we can understand this in terms of Fourier transforms: the transform of the image is equal to the transform of the object times the transform of the point spread function. This simple multiplication suggests that we could do a division to recover the object, given knowledge of the point spread function and of the image. That's what's written down here: the object is equal to the Fourier transform of the image divided by the Fourier transform of the point spread function, which is the optical transfer function. And this is the deconvolution process. The challenge, as you can imagine, is: what do we do when the OTF is either zero or very small? We can't divide by zero. So how do we recreate the information that's lost? And even where the OTF is merely small, out at the edges of the observable region that I showed you before, how do we deal with very low signal when there's a lot of noise, either from our detector or just because there are so few photons? Much of the sophistication in the computation is really aimed at dealing with these problems. There are a couple of different strategies, and I'll go through them because you may run across them in actual applications. When the first ideas of deconvolution were developed, there wasn't much computing power to do these calculations in three dimensions, so one of the first strategies was to not think about the entire three-dimensional volume, but just the sections above and below the focal plane.
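Before those strategies, here is a minimal sketch of the division idea just described and of why it breaks down, using a synthetic 1D example: a made-up spiky object, a Gaussian PSF, and a little added noise. Nothing here corresponds to a real microscope; the numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 256

# A made-up spiky "object", a Gaussian PSF, and a blurred, noisy "image".
obj = np.zeros(n)
obj[[60, 100, 101, 180]] = [1.0, 2.0, 2.0, 1.5]
psf = np.exp(-0.5 * ((np.arange(n) - n // 2) / 4.0) ** 2)
psf /= psf.sum()
psf = np.roll(psf, -n // 2)                  # center the PSF on index 0

otf = np.fft.fft(psf)                        # optical transfer function
image = np.real(np.fft.ifft(np.fft.fft(obj) * otf))
image += 1e-3 * rng.standard_normal(n)       # a little detector noise

# Naive deconvolution: divide the image transform by the OTF and go back.
naive = np.real(np.fft.ifft(np.fft.fft(image) / otf))

# Where the OTF is tiny, the noise gets divided by an almost-zero number,
# so the "recovered" object is dominated by enormous oscillations.
print("true object maximum:     ", obj.max())
print("naive estimate min / max:", naive.min(), naive.max())
```

The naive estimate blows up by many orders of magnitude at the high frequencies where the OTF has essentially no signal, which is exactly the problem the following strategies are designed to tame.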
That is the approach known as the "nearest neighbor" approach, and it is essentially a subtractive one. The nearest-neighbor approach is very simple: it uses only the adjacent sections to correct the image. It's a first-order approximation, but it ends up being blindingly fast and very easy to compute. The idea is this: "O" is what we observe, and for section j we look at the sections above, j+1, and below, j-1. We convolve them with the point spread function corresponding to one step of defocus, subtract those blurred contributions from the observed section, and then apply a scale factor to make it behave nicely, and that ends up giving a good approximation. The image that I show here represents one of the first deconvolutions ever done in light microscopy, looking at a polytene chromosome in a Drosophila polytene nucleus, and you can see the difference is very dramatic. Even this very, very simple approach, which can be done in milliseconds on current computers, does quite a good job of getting rid of the out-of-focus blur. But it doesn't do anything to try to recreate the missing data, the region along the z-axis where we didn't really observe anything; it really just tries to remove the blurred information elsewhere.

So what are better ways of dealing with this? The first is to use all the available three-dimensional information instead of just the adjacent sections, because all the other planes that you collect have useful information about the deblurring process and about what the image should be. It should be apparent that every image, even if this is the focal plane, the image you collect up here has some finite amount of information about what the photon distribution should be on this plane, and the same for the one below. By putting that all together, we can do an even better job. But then we have to deal with the problem of dividing by zero, and we get around that by a general method from signal processing known as a Wiener filter: in the denominator, where we're trying to divide by the OTF and it may be zero, we just add a constant, some small value. That turns out not only to take care of the division by zero, it also gives proper behavior, in terms of signal to noise, when the OTF is small. The actual value of the constant would be based on the noise level in the Fourier transform of the image. This is a general strategy; you can show by linear theory that it is an optimal strategy for inverting low signal-to-noise data, and it works quite well. There's a variation on this if your transfer function is complex-valued instead of real, but we don't have to worry about that here.

Even better than this is to use other knowledge, what's called a priori knowledge: information that we may have about the general behavior of the density distribution of the object. It turns out that a very, very powerful bit of mathematical information we can incorporate is really simple: just assume that the light being emitted from the object is always positive. Of course we know there are no negative photons, the intensity is always going to be positive, but the mathematics doesn't know that.
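Here is a minimal sketch of the Wiener idea, written in the standard form with |OTF| squared plus a constant in the denominator, which plays the same role as the "small constant" just described; the function name and the default value of the constant are illustrative.

```python
import numpy as np

def wiener_deconvolve(image, psf, k=1e-4):
    """Deconvolve `image` by `psf` with a Wiener-style filter.

    Instead of dividing by the OTF directly, divide by |OTF|^2 + k, so
    frequencies where the OTF is zero or tiny are damped rather than blown
    up. In practice k is chosen from the noise level of the data."""
    otf = np.fft.fftn(psf)
    filt = np.conj(otf) / (np.abs(otf) ** 2 + k)
    return np.real(np.fft.ifftn(np.fft.fftn(image) * filt))
```

Applied to the noisy one-dimensional example above, wiener_deconvolve(image, psf) should suppress the wild oscillations of the naive division and bring back the spikes, slightly broadened; a larger k trades resolution for noise suppression.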
In the linear-theory deconvolution that I showed here, you'll get negative ripples surrounding the object because of the information that's missing. If we can incorporate the positivity information, we can do much better, and this, in fact, allows us to begin to recreate the missing information in that missing-cone region of Fourier space. There are hundreds of different variations on this theme, but they all use the same basic process. At the heart of it is that while division by zero is badly behaved, multiplying by zero is always fine, so we can always do a proper forward convolution analytically, and what we do is use forward convolutions to figure out how to do the deconvolution.

The strategy is as follows. We start with some guess. We blur that guess with the PSF, and then we compare the blurred guess to what we really observed, our observed image. If our guess were perfect, these two would match and we'd be done. In general they won't be, because we started out with the wrong guess, so we take that error and figure out how to update our guess. We make a new estimate, and we keep repeating this. One of the things we can do when we make the new estimate is to apply positivity constraints, so that only positive signals are allowed and negative values are truncated. It's a simple approach, and not only does it work for microscopy, it works in astronomy, it works in spectroscopy; you can even run it on sequencing gels with DNA and read more base pairs. For any observable system that's degraded in a known way, you can use this kind of strategy to correct for it.

The key thing here is how you calculate the correction, and there are two broad strategies. In one, you take the difference between the observed image and the blurred guess, right here, and you use that in a very simple way, adding it to the guess to make the new estimate. Or you can use the ratio between the observed image and the blurred guess and use that to update. If the blurred guess at any given point has too low a signal compared to the observed image, the difference will be positive, so you add it and enhance the guess for the next round. Or, down here, if the blurred guess is twice the signal of the observed image, the ratio will be one half and we scale down the guess at that point. So these are very simple first-order corrections, but because we iterate over and over again, we correct for all the nonlinearities and complexities in the convolution, and we can apply the constraints. Besides the positivity constraint, if you know that the object occupies only a given part of the sample and the rest is blank field, you can apply that constraint as well. These are very, very powerful methods; essentially all the commercial deconvolution packages that go along with various microscopes use this kind of approach.

There is an interesting variation for when you don't really know what the point spread function is. You know roughly what it is, you know you've got a 63x objective lens and how far out of focus each section is, but maybe you have additional aberrations in your system. This is what's known as blind deconvolution, where you alternate cycles of deconvolving your image guess with cycles of deconvolving the point spread function.
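Here is a minimal sketch of the iterative blur-compare-update loop, using the ratio form of the correction together with a positivity constraint. This is the simplest possible multiplicative variant, not the exact scheme of any particular commercial package; names, the starting guess, and the iteration count are all illustrative.

```python
import numpy as np

def iterative_deconvolve(observed, psf, n_iter=50):
    """Iterative deconvolution by repeated forward blurring.

    Each cycle blurs the current guess with the PSF, compares it with the
    observed image, and multiplies the guess by the ratio
    observed / blurred_guess. Starting from a positive guess and clipping
    keeps the estimate positive, which is the a priori constraint."""
    otf = np.fft.fftn(psf)

    def blur(x):
        # Forward convolution via the convolution theorem (always well behaved).
        return np.real(np.fft.ifftn(np.fft.fftn(x) * otf))

    guess = np.full(observed.shape, max(observed.mean(), 1e-6))
    eps = 1e-12
    for _ in range(n_iter):
        blurred = blur(guess)
        ratio = observed / (blurred + eps)    # > 1 where the guess is too dim
        guess = guess * ratio                 # multiplicative correction
        guess = np.clip(guess, 0.0, None)     # positivity constraint
    return guess
```

The difference form of the correction would instead add observed minus blurred to the guess each cycle and then clip the negatives; either way, it is the repeated forward blurring that lets these first-order corrections converge toward a self-consistent, non-negative estimate.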
For example, if you knew exactly what the object was and you have the observed image, you could deconvolve the image by the object, and instead of getting the object out, you would get the point spread function. So you can alternate cycles, first solving for one and then the other, as a way of solving this iteratively. And if you don't stray too far from where you start, if you have a good initial guess for the point spread function, these methods work pretty well.

Here is an example of a HeLa cell whose chromosomes are stained with DAPI, processed with these different approaches. You can see we do quite well even with simple approaches. Here's the original image in xy and xz; the xz direction is generally more challenging because of all that missing information. You can see that even the nearest-neighbor method does a decent job of cleaning this up, but it's still pretty fuzzy, especially in the z direction. The Wiener filter, which is a full three-dimensional approach, does even better, and then the iterative methods do better still. In this particular case the blind method does fine and is comparable to them, because we knew the point spread function; but if we didn't, the blind method could give better results.

Another problem can occur if you're looking very deep into a sample, imaging a thick sample of, say, 10-20 microns. Then another aspect of the optics comes in to hurt you, which is that the objective lens was really designed for imaging right at the surface of the coverslip, not deep into a sample. The problem with imaging deep into the sample is that the index of refraction of the sample generally doesn't match that of the glass, the mounting medium, or the objective lens. So the point spread function changes throughout the sample, and you can't just do one nice simple calculation. You can minimize this problem quite a lot by adjusting the refractive index of the mounting medium to try to balance everything out, but for live samples this is still particularly challenging and has to be worked out.

Okay, so the other challenge with these computational approaches, as you can imagine, is what happens at very low signal to noise. This becomes critical with live samples, where, first of all, you want to record for a long period of time and you don't want the fluorophores to bleach, but also, much more importantly, you really don't want the light itself to damage the sample. It turns out that recent experiments have suggested that samples are far more sensitive to phototoxicity than we ever imagined. Here's an example of an experiment recording live yeast in the light microscope and looking at the cell cycle time, the doubling time for the cells, as a function of how much light you're giving them during observation. It doesn't really matter what was labeled, though certain fluorophores will be worse than others. If you start with an essentially unperturbed sample, with basically no light, you get about five cell cycles, five doublings, in 24 hours. But as you crank up the light intensity it gets worse and worse, and if you go too high, of course, the cells are just dead. And this level here is what's typically used for normal fluorescence microscopy under in vivo conditions.
And you have to go down by factors of 100 to 1000 in intensity before you get to a really unperturbed sample. Obviously some biology still goes on at higher light levels, people have done an awful lot of live imaging, but nonetheless there can be very severe phototoxicity and stress, and reactive oxygen species are induced that affect the biology. The problem is that as you go to these low light levels, you can't see anything. Here's an image of something that was a point object, a marker on the DNA made with lac operator sequences bound by labeled lac repressor protein. As you go down 1000-fold in intensity you can barely see that it's there, and if you go down further, there's no hope. The problem is that extracting the out-of-focus information in cases like this becomes extremely challenging. The authors of this particular paper said that at these dose levels, even with current state-of-the-art deconvolution methods, it was hopeless to do any deconvolution, and it was very difficult under these conditions to track the dots as they moved around the sample, and tracking a dot is a particularly easy case.

So a number of labs, including our own, have been working on new deconvolution methods that deal explicitly with the noise problem. I won't dwell on the mathematics. This is our normal convolution equation: this is the blurring function blurring a guess, this "f" is what you observe, and this is the current guess, and what we talked about before was optimizing the guess to minimize the difference between the blurred guess and what we observed. In the new mathematics, we add another term, called the regularization term, which is meant explicitly to deal with noise (a minimal sketch of this kind of regularized formulation is shown below). There are a whole lot of different strategies for this; the one we've chosen is based on entropy regularization, but there is a whole ensemble of them that people are working on.

Just by way of example, I want to show this new method, ER-Decon, compared to a couple of existing methods, to give you an indication of what is possible, not to say that ER-Decon is the best; there will be new flavors of this developed by many labs in the future. So this is a yeast vacuole: this is the raw data in the xy plane and this is the xz, and these are the deconvolved results from three different methods. You can see this one looks better, but all of them clearly show that it's a round vacuole. The elongation in z is because of the missing cone of information, but nonetheless you can tell what the shape of the object is. But if we crank down the intensity roughly 60-fold, you can barely tell that there's anything there, and in the xz direction it's really hopeless. When we look at what these methods can do, this new regularization approach still shows you some of the shape of the vacuole, both in xy and in z, whereas the other methods fail miserably. To me, this is an indication of the potential to solve these problems, which will be critical for doing long-time in vivo imaging while still getting really high-resolution detail, and especially, as we've also talked about in this series, for trying to do imaging beyond the diffraction limit, super-resolution microscopy.
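To make the regularization idea concrete, here is a minimal sketch that minimizes a data-fit term plus lam times a penalty by plain gradient descent, with positivity enforced at each step. For simplicity it uses a quadratic smoothness penalty rather than the entropy-based regularizer of ER-Decon, so it only illustrates the structure of such methods; the function name, step size, and lam value are all illustrative.

```python
import numpy as np

def regularized_deconvolve(observed, psf, lam=0.01, n_iter=200, step=0.5):
    """Minimize  ||psf * g - observed||^2  +  lam * (smoothness penalty on g)
    for a 2D image by gradient descent, clipping g to stay non-negative.

    The first term keeps the blurred guess close to the data; the second
    term discourages wildly noisy solutions, which is what rescues the
    reconstruction at very low signal-to-noise."""
    otf = np.fft.fftn(psf)

    def blur(x, kernel_ft):
        return np.real(np.fft.ifftn(np.fft.fftn(x) * kernel_ft))

    g = np.clip(observed, 0.0, None).astype(float)    # start from the data
    for _ in range(n_iter):
        # Gradient of the data-fit term (up to a constant factor):
        # PSF^T applied to (PSF * g - observed).
        residual = blur(g, otf) - observed
        grad_fit = blur(residual, np.conj(otf))
        # A discrete Laplacian of g (periodic boundaries); the gradient of the
        # quadratic smoothness penalty is proportional to its negative.
        lap = (np.roll(g, 1, axis=0) + np.roll(g, -1, axis=0) +
               np.roll(g, 1, axis=1) + np.roll(g, -1, axis=1) - 4.0 * g)
        g = g - step * (grad_fit - lam * lap)
        g = np.clip(g, 0.0, None)                     # positivity constraint
    return g
```

With lam set to zero this reduces to a plain least-squares deconvolution; raising lam smooths the estimate, and tuning that trade-off is the core design choice in every regularized method, including the entropy-based ones.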
Coming up with mathematical strategies like this will be critical for being able to go to those higher-resolution images and still be compatible with live-cell imaging. So I'd just like to close by saying that it's very clear that deconvolution microscopy is a very powerful tool for biological imaging. It's a great counterpart to confocal and spinning-disk confocal imaging, and especially useful, I think, for live samples.