Super-Resolution at gamma.earth

Captions
Robin: Hello and welcome to the Satellite Image Deep Learning podcast. I'm Robin Cole, and it's my pleasure to present another technically focused episode in the series. In this episode I catch up with Yosef Akhtman to discuss super-resolution with satellite imagery. Super-resolution is a technique that enables transforming an image with 10-meter pixels into an image with 1-meter pixels. While this method has some skeptics, its potential to improve analytics on the imagery is undeniable. This was a fascinating conversation, and I hope you enjoy this episode.

Robin: Hi, how are you doing?

Yosef: Excellent, how are you?

Robin: Yeah, fantastic. Looking forward to getting stuck into this topic: super-resolution. Do you mind getting the ball rolling and telling us what it is?

Yosef: Yeah, absolutely. Super-resolution has been around for quite a while, and the idea is taking an image at a certain resolution and trying to reconstruct as much spatial detail as possible from the available pixel data. Traditionally there were a few algorithms and approaches, and then there was quite a significant breakthrough with something called compressive sensing. The original idea was that there is very little you can do in terms of reconstructing additional spatial information from the already-captured pixel data, but compressive sensing achieved quite impressive results that people didn't expect, and this opened the gates for further experimentation and research. Then, with the emergence of deep learning, it really opened the floodgates, and it is now a very well-researched area with some incredible, nearly magical results, especially in face reconstruction and some other areas.

Robin: With faces it makes sense that there is a general structure to a face — the edges of a face and so on. Do the same principles hold in remote sensing imagery when you're talking about super-resolution?

Yosef: Exactly. The big question in super-resolution is how sparse your data is: how well you can build a prior that captures as much prior information as possible about what you expect to be able to see. Then you only need a little bit of extra pixel data, and you are able to reconstruct a lot of information that is not present in the original data. This is where a lot of the controversy comes into the picture, because people have a slightly dogmatic, very simplistic interpretation of information, and a somewhat simplified understanding of things like the Nyquist law and the Shannon limits. I think we need a revision of these concepts. And again, I'm absolutely not saying that these principles and laws are wrong; we just need a review and a better understanding of what they actually mean.

Robin: And just for listeners that might not know: the Nyquist theorem is about sampling, and it says you need to sample at double the frequency of the finest features you want to identify. But presumably, if you have a model and it can somehow detect that there are higher spatial frequencies by reference to previous training data, then it can reconstruct those. So you're saying the theorem is not necessarily applicable in this situation, is that right?
Yosef: The simplest example is this: say you have a bunch of samples, and you have the prior information that it's actually a line. You just do linear regression, and then you can reconstruct that line at any sampling frequency you like. This is the simplest possible example where the Nyquist law seemingly doesn't apply: you basically have just two samples, and then you can resample with a million samples, at any arbitrary resolution you like. But you need the prior that this is a line; that is the prior information. And you can make it a little more complex: say you know the signal is a single-frequency sinusoid. From just a few samples you can do the same thing and reconstruct it at any frequency you like. So here you start to get a feeling that the simplistic understanding of the Nyquist law doesn't always hold. To generalize: the Nyquist law talks about dense signals — there is an assumption that there is no prior, that all the samples are statistically independent. But natural signals are almost always sparse, always correlated. If they weren't, vision would be impossible: imagine that the pixels in an image were statistically independent — it would just be noise, and you wouldn't be able to see or detect anything. All images are correlated: if you have one pixel value, you can say a lot about the next pixel value. The more you know about the statistics, about what you expect to see in the image, the more you can actually reconstruct and understand from it. The obvious example — the culmination of this — is face reconstruction, because you have such a strong prior about the symmetry, about the facial structure. In the end, all facial features are very similar: you can parameterize faces with very few variables, and this is what allows those magical, amazing results in image reconstruction. Absolutely the same thing applies to any imaging, and to satellite imaging in particular. Satellite images are obviously much more complex, but they are still highly sparse and highly correlated. There are some very common textures — all the natural textures are very repetitive. The surface of a forest is extremely sparse data in terms of content.

Robin: So the advantage of the deep learning approach is that you can learn these more complicated priors. A face is an oval, and obviously a road is a line, so there are some simple priors, but there are also more complicated priors that the models will learn. Is that correct?

Yosef: Absolutely. You learn all of these — let's call them primitives: this is a primitive of a forest canopy texture, this is a primitive of a road texture. You learn all these primitives, and then, like Lego, you can reconstruct the image from them. Of course, there are limitations to it.
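To make the line and sinusoid examples above concrete, here is a minimal sketch using numpy and scipy, with made-up sample values. The point is the transcript's: once the prior (line, or single-frequency sinusoid) is known, a handful of samples determines the signal, and you can resample it at any resolution you like.

```python
import numpy as np
from scipy.optimize import curve_fit

# Example 1: prior = "it's a line". Two samples fully determine it,
# and we can then resample at any frequency we like.
x = np.array([0.0, 1.0])
y = np.array([2.0, 5.0])
slope, intercept = np.polyfit(x, y, deg=1)
x_dense = np.linspace(0, 1, 1_000_000)        # "a million samples"
line = slope * x_dense + intercept

# Example 2: prior = "it's a single-frequency sinusoid" (3 unknowns),
# so a few samples suffice, given a reasonable initial guess.
def sinusoid(t, amp, freq, phase):
    return amp * np.sin(2 * np.pi * freq * t + phase)

t = np.array([0.0, 0.07, 0.21, 0.33, 0.5])    # just a few samples
obs = sinusoid(t, 1.5, 3.0, 0.4)              # hidden ground truth
params, _ = curve_fit(sinusoid, t, obs, p0=[1.0, 2.5, 0.0])
dense = sinusoid(np.linspace(0, 1, 10_000), *params)
print(np.round(params, 3))                    # ~ [1.5, 3.0, 0.4]
```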
Yosef: There is another aspect to this: there is more to the pixel values than what we see — there are, you could say, hidden bits. If you have really raw satellite data with a reasonable signal-to-noise ratio, you have 12 or sometimes 14 bits of information, and these small bits you never see; we ignore them most of the time. But the information is there: these low-order bits store information about the cross-talk between the pixels, and if you can tease out all of these cross-correlations between the pixels, you can really restore a lot of information. So it goes both ways: on one hand, pull in as much information as possible from the available data, and on the other hand, have a very strong prior of what you expect to see.

Robin: In practice the models are trained on pairs of images, right? Low resolution and high resolution, and in practice you want to go from the low resolution to the high resolution. Is that correct?

Yosef: That is absolutely correct. The challenge is that these high-resolution images don't always exist — in reality, they never exist. There is an approach that most people try: they take high-resolution images, they downscale them, and then from the downscaled images they try to reconstruct the originals. But this is not good, because the downscaled images simply do not have the statistical properties of the original, optically captured data. This works to some extent for frame sensors, but it becomes completely untrue for other types of sensors, such as push-broom. Push-broom cameras are significantly different: if you downscale a push-broom image, you get something that is a completely different animal from the original. So if you use this approach, it doesn't really work.

Robin: You've been working on Sentinel-2 super-resolution. In that context, what was the training data?

Yosef: Well, I had to synthesize it. It doesn't exist, for the reason we just discussed, but it also doesn't exist because there is simply no other high-resolution sensor with these spectral characteristics. So I had to synthesize training data, obviously using high-resolution satellite data as a source. I think what allowed me to achieve the results I managed to achieve is that I have actual experience designing push-broom sensors and designing hyperspectral cameras. If I think about the skill set necessary to build something like this, I would say it is 10% machine learning skills and 90% sensor design and remote sensing skills.

Robin: So what you're saying is, we can't just do a simple resample operation on our high resolution and call it a pair; there's much more understanding of the actual optical system required to do a good translation.

Yosef: That's definitely my opinion. Again, there might be alternative ways to do it, and I'm definitely not claiming I'm the only one able to do this, but I feel that what allowed me to achieve the desired result was this very particular combination of skills. And also, because of what we discussed, your training data needs to be as close to your real-world data as possible.
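For context, the naive pair-synthesis approach described — and criticized — above looks roughly like the sketch below, assuming plain numpy arrays for the imagery. Blur-plus-decimate is one common choice in the literature, not Yosef's method; his point is that the result does not share the statistics of optically captured data, especially for push-broom sensors.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def synthesize_pair(hr_image: np.ndarray, factor: int = 4):
    """Naive LR/HR training-pair synthesis: anti-alias blur + decimate.

    This mimics what many SR papers do -- and is exactly the approach
    the discussion above says fails to reproduce a real sensor's
    statistics (point-spread function, noise, push-broom geometry).
    """
    blurred = gaussian_filter(hr_image, sigma=factor / 2.0)
    lr_image = zoom(blurred, 1.0 / factor, order=1)  # bilinear decimation
    return lr_image, hr_image

hr = np.random.rand(256, 256).astype(np.float32)     # placeholder image
lr, hr = synthesize_pair(hr, factor=4)
print(lr.shape, hr.shape)                            # (64, 64) (256, 256)
```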
Yosef: That is the reason I decided to focus on Sentinel-2: it is a systematic, publicly available dataset. There are only two satellites that collect all of this data, and they are extremely well calibrated — hats off to the designers of these systems. Incredible satellites, with a very high signal-to-noise ratio. They have some issues that I discovered: there are some chromatic aberrations that nobody talks about, and some other issues I had to explore, but otherwise these are amazing sensors. And I had to fine-tune everything for this particular system. It isn't something that translates — you can't take the same model, apply it to Landsat or something like that, and have it work. It just doesn't work like that. It is specifically fine-tuned for Sentinel-2, and if at any point I decide to develop something similar for another satellite, I would need to start almost from scratch.

Robin: One question about Sentinel-2: it has multiple channels, but some of them are lower resolution — 20 m, I think even 60 m. Do all of the channels contribute equally to the super-resolution, or are the high-resolution channels doing most of the work?

Yosef: It's a very interesting and not at all obvious question. Definitely, the high-resolution channels do most of the work in reconstructing spatial details, but the other channels matter too, because of their spectral characteristics and the unique spectral signatures that different textures produce. You can describe the whole process, in a very simplified way, as one enormous lookup table: you look at a particular neighborhood of pixels and say, okay, this particular combination of spectral bands and pixel values corresponds to this high-resolution tile. There is a lot of generalization going on where the model can generalize, but to put it in a super simple, metaphorical way, it is one enormous lookup table. And in this enormous lookup table, the other bands are very important, because they allow you to relate a particular combination of pixel values across the different spectral bands, one to one, to a particular texture.

Robin: That's a really interesting point, and I suppose it feeds into relative performance. If I have four different satellites, some with just RGB but some with other bands, they might perform equally on some kinds of targets, but there might be other kinds of targets where the extra spectral bands contain the information to reconstruct them much more accurately.

Yosef: Yes, and this brings up another point: unfortunately, multispectral remote sensing is really underutilized. You have all this wonderful talk about multispectral and hyperspectral — dozens of companies, especially recently — but nobody is using multispectral, and definitely nobody is using hyperspectral. I am not aware of a single real application, especially a commercial application, of hyperspectral. The reason is that, first of all, you need very good atmospheric calibration and atmospheric correction to make the spectral characteristics coherent enough to be useful, and the only such satellite is actually Sentinel-2. Amazingly, even Landsat is not properly spectrally calibrated.
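As a rough illustration of how the 10 m, 20 m, and 60 m Sentinel-2 bands can be combined into a single model input: resample everything onto the 10 m grid and stack along a channel axis. This is a sketch with placeholder arrays — the band groupings, the tile size, and the bilinear upsampling are assumptions for illustration, not Yosef's actual preprocessing.

```python
import numpy as np
from scipy.ndimage import zoom

# Placeholder Sentinel-2 bands for one crop (real data would come from
# a SAFE product or a cloud API). Shapes follow the native resolutions.
bands_10m = np.random.rand(4, 1098, 1098)   # e.g. B02, B03, B04, B08
bands_20m = np.random.rand(6, 549, 549)     # e.g. B05-B07, B8A, B11, B12
bands_60m = np.random.rand(2, 183, 183)     # e.g. B01, B09

def to_10m(bands: np.ndarray, factor: int) -> np.ndarray:
    # Upsample each band onto the 10 m grid (bilinear, as one option).
    return zoom(bands, (1, factor, factor), order=1)

model_input = np.concatenate(
    [bands_10m, to_10m(bands_20m, 2), to_10m(bands_60m, 6)], axis=0)
print(model_input.shape)                    # (12, 1098, 1098)
```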
Yosef: And again, hats off to the European Space Agency for doing that, and especially for giving public access to all of this data — but that is the only satellite that is properly spectrally calibrated. And then there is the question of resolution. The problem is that everything you see is mixed: you have these 10 x 10 meter pixels, and a lot is going on in a 10 x 10 meter space, so you don't have any spectral signatures pure enough to do real analysis. It's all mixed. I feel we are quite far away from really getting the most — or even the least — out of multispectral remote sensing. There is a lot to be done, and I think super-resolution is a very good step in the right direction.

Robin: The next question I've got: what is the practical range of use of super-resolution? It's usually measured in scale factors, so you can double the number of pixels or quadruple it — a scale factor of 2 or 4. What's the practical range? Can I get arbitrarily high resolution from a 10-meter pixel, or is there a practical limit beyond which you're not going to get realistic predictions?

Yosef: Great question. My experience is that at 4x you can reconstruct arbitrary shapes pretty well, but there are certain scenarios where you can reconstruct significantly beyond 4x. Maybe this is a good moment to show you a little example, and I'll try to describe what I'm showing. You see my screen?

Robin: Yeah.

Yosef: These are orchards, and in some of these orchards the plant lines are 2 m across. You can see individual plant lines, you can see individual trees, you can see missing trees — and all of this is completely invisible in the original. This is really, properly, a 10x increase in resolution — a useful 10x, because you can see individual missing trees, where you see absolutely nothing in the original image. And the reason it works is that these textures are highly periodic, so you can figure out the texture from a large, informationally sparse, really structured neighborhood. So depending on the type of surface, you can go to 10x super-resolution in a useful, practical way. But it doesn't always apply; it really depends on the structure.
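A quick back-of-the-envelope check on those scale factors, assuming the 10 m native ground sample distance and the 2 m orchard rows mentioned above:

```python
# Pixel sizes implied by the scale factors discussed above.
native_gsd = 10.0                   # Sentinel-2 10 m bands, metres
for factor in (2, 4, 10):
    print(f"{factor}x -> {native_gsd / factor:.1f} m pixels")
# 2x -> 5.0 m, 4x -> 2.5 m, 10x -> 1.0 m.
# A 2 m row spacing needs roughly two samples per period (~1 m pixels)
# to be resolved, which is why the rows only become visible at 10x.
```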
Robin: I suppose that leads us on to the controversy of super-resolution: it generates lots of features, some of which will be accurate, and some of which people describe as hallucinated — and the catch is distinguishing between the two. Could you talk a bit about that?

Yosef: Definitely there is a question of good terminology here, because "hallucination" is taken from the vocabulary of generative AI — of generative models that are trained purely for generative purposes and just produce a very pretty, structured image. I feel that in this particular use case the term hallucination is a little misleading. This is estimation — estimation in the same sense as in any computing task. Again, you can go to linear regression and ask what the value of y would be at a particular point x, based on the several available samples, and it will give you an estimate. It is not perfect; it is an approximation of the desired value, and the question is whether this approximation is good enough for your practical application or not. Nobody talks about hallucination when you do linear regression on a particular dataset — you don't hear that. So I don't feel it is justified to talk about hallucination here. Yes, there is approximation; yes, there will be some artifacts; and yes, this approximation is not perfect and will sometimes give you a wrong result. The only question is whether these results are good enough for your practical use. For some applications you would say yes, for some no. If it is a defense application where you are trying to direct artillery, I would not recommend using super-resolution. But in agriculture, where you are trying to estimate how many trees are missing in your orchard, I feel absolutely comfortable recommending super-resolution.
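The estimation framing can be made concrete with the regression analogy Yosef uses: a fitted model returns an estimate plus an error scale, and the only question is whether that error is acceptable for the application at hand. A minimal sketch — the data and the tolerance value are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)  # noisy samples

slope, intercept = np.polyfit(x, y, deg=1)
residual_std = np.std(y - (slope * x + intercept))      # crude error scale

x_new = 4.2
estimate = slope * x_new + intercept   # an estimate, not a "hallucination"
print(f"y({x_new}) ~= {estimate:.2f} +/- {residual_std:.2f}")

# Whether this is "good enough" depends on the application:
tolerance = 3.0                        # hypothetical requirement
print("acceptable" if residual_std <= tolerance else "too uncertain")
```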
Robin: That's interesting. Let's dive into this particular application you're showing on the screen at the moment. You're taking the 10 meter, you're enhancing it to one meter, and you can identify visually that there are some missing trees. The next logical thing to do is to try and count the missing trees, or estimate their density, which you would probably train a model for. So if you're training a model to directly predict missing trees, what's the benefit of this intermediate step where you super-resolve?

Yosef: Excellent question — it is the obvious next question for discussion. Again, this needs to be tested; I haven't had time yet to work on this particular use case, so somebody needs to test it. However, I do have applications where we did test, and that is the delineation of field boundaries. The question there is very similar, and I have endless discussions about it with potential clients and with competitors. We use super-resolved images to delineate field boundaries, and other people say: wouldn't you get the same result if you trained a model that delineates field boundaries directly from the original 10 m per pixel data? We experimented extensively on this, and our conclusive answer is no: you cannot achieve the same result directly from Sentinel-2 as you can from super-resolved Sentinel-2. The reason is similar to this — let's use ChatGPT as an example for a second. You type a prompt, say "chicken salad", and it gives you a very detailed recipe for a chicken salad, and then you do some analysis on that recipe, the output of ChatGPT. The question is: since seemingly all the information comes from the prompt, can't you do the analysis directly on the "chicken salad" prompt? And the answer is absolutely no. The information comes from the prior — from this enormous model that has learned all the intricacies of Sentinel-2 data and allows you to draw exactly the field boundaries, with all their corners, and the shadows of the trees, which can be 20 m long — all of these details that then allow you to apply your very simplified field delineation model to the result of the super-resolution.

Robin: I think that's very interesting; the parallels with ChatGPT are quite significant as well.

Yosef: It is a very straightforward parallel, because that is exactly what super-resolution does — it is exactly what ChatGPT does, and even the architecture is somewhat similar. It takes a low-resolution image as a prompt, and it generates a high-resolution image as the output. That is exactly what the model does.

Robin: Just as an observation, there are a couple of groups publishing foundation models now, and they typically combine resolutions in the image sources — like a 30 m and a 10 m, or a 10 m and a 1 m. So it seems there is some extra intuition learned by the model by being exposed to multiple resolutions rather than a single resolution.

Yosef: I agree. There is definitely a lot to be said about the trade-off between how wide you go — how many different modalities you include in your model — versus how focused you are. There is definitely a possibility that multiple resolutions help, but there is a trade-off with the size of the model, the time it takes to train, and what you actually want to use it for. There is also an argument that you could indeed produce a model that generates field boundaries directly from the raw Sentinel-2 data, but to achieve the same result this model would need to implicitly include all of the prior knowledge of the super-resolution model. So you could do it, but it would mean you basically train both models concatenated together. You could potentially do that.

Robin: In practice, you see the super-resolution being used as an intermediary to other analytics — being a better foundation, a better starting point, than the original imagery?

Yosef: Absolutely, that is exactly how we use it, already now. Another line of criticism that I get is: okay, this looks nice, but is it useful? My answer is that no, it does not look nice — there are artifacts, and in fact it does not look nice — but it is incredibly useful for analytical models, and that is what it is useful for.
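Schematically, the "super-resolution as a foundation for analytics" workflow described above might look like the following sketch. Both super_resolve and delineate_fields are hypothetical stand-ins invented for illustration — not Gamma Earth's or Digifarm's actual API — and the placeholder bodies stand in for real trained models.

```python
import numpy as np
from scipy.ndimage import zoom

def super_resolve(s2_tile: np.ndarray, factor: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a trained SR model (e.g. 10 m -> 2.5 m).

    Here we just bilinearly upsample as a placeholder; a real model
    would inject the learned prior discussed above.
    """
    return zoom(s2_tile, (1, factor, factor), order=1)

def delineate_fields(image: np.ndarray) -> np.ndarray:
    """Hypothetical downstream model: returns a field-boundary mask."""
    return np.zeros(image.shape[1:], dtype=bool)     # placeholder

s2_tile = np.random.rand(12, 256, 256)   # placeholder 12-band 10 m tile
hires = super_resolve(s2_tile, factor=4) # analytics run on THIS, not raw
mask = delineate_fields(hires)
print(hires.shape, mask.shape)           # (12, 1024, 1024) (1024, 1024)
```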
Robin: Fantastic. Well, this has been a really interesting introduction to the topic, and I think there is lots of food for thought; the conversation around super-resolution will continue for some time, I expect. You're commercializing this super-resolution approach?

Yosef: Absolutely. I have a wonderful partnership with Digifarm, where this is already commercialized, particularly for the delineation of field boundaries, but they are also selling deep-resolved data directly for analysis to other organizations. And I'm open — I'm exploring other options as well.

Robin: Cool. So if people want to follow along with your progress, or find the article you mentioned looking at the field boundaries — you publish online, right?

Yosef: Absolutely, I'm trying to keep people up to date with our progress. There is an article on Medium about the most recent update to the model, where I really focused on spectral fidelity. The new model, as opposed to the previous one, really reproduces well the very subtle variations in the spectral characteristics of vegetation — I'm quite happy with the result. In the next release I will go back and try to improve the spatial characteristics and reduce the artifacts a little; I'm hoping I know how to do it now. But definitely, you can follow on Medium, you can follow on LinkedIn, and you can write to me directly — I'm more than happy to discuss and explore. I feel the hardest part for the entire remote sensing and Earth observation community is finding useful applications. The situation right now is a little better, but there are still very few practical commercial applications of all of this wonderful technology, so I'm always super happy to discuss possible practical applications.

Robin: That sounds good. I'll put your LinkedIn and your Medium articles in the show notes. But once again, thank you, Yosef, and I hope to catch you in the future and learn more about your progress.

Yosef: Absolutely, we'll keep in touch. Thank you, Robin.

Robin: Thank you.
Info
Channel: Robin Cole
Views: 814
Id: wlFt2WABjsY
Length: 33min 37sec (2017 seconds)
Published: Sat Jan 06 2024