Deep learning in Google Earth Engine with Jake Wilkins

Captions
[Music]

Robin: Hello and welcome to the Satellite Image Deep Learning podcast. I'm Robin Cole, and it's my pleasure to present another technically focused episode in the series. In this episode I catch up with Jake Wilkins to learn about deep learning in Google Earth Engine. Jake has been building commercial Earth Engine applications for the past three years, and in this conversation he describes the pros and cons of several approaches to using deep learning models with Earth Engine. I hope you enjoy this episode. Hi Jake, welcome to the podcast. How are you?

Jake: Good, thank you Robin. How are you doing?

Robin: Yeah, pretty good, thank you. Looking forward to getting stuck in and learning about Google Earth Engine and deep learning there. Do you mind giving us a brief intro to what Earth Engine is, and then we can take it from there?

Jake: Yeah, sure. Earth Engine is the cloud geospatial platform offered by Google, a competitor along the lines of Microsoft's Planetary Computer. It's all based in the cloud, which makes it perfect for big-data use cases: Landsat, Sentinel-2, that kind of thing.

Robin: And we're digging into that today, I expect. You're a bit of an expert in machine learning in Earth Engine; what are the options there?

Jake: Yeah, of course. Let me start screen sharing so we can have a look. Everything we talk about will be at this link, which points to a Google Doc covering all of it. First, let's look at the machine learning that's actually implemented in the platform. Machine learning in Earth Engine itself is offered in a few ways. You've got the traditional methods, supervised and unsupervised classification and regression, and these are implemented directly within Earth Engine's own data types and handled on the server side, so there's no need to bring any data across to the client. The benefit is that it's efficient and quick. Earth Engine offers a few different APIs, and the on-the-fly API expects requests to complete in under five minutes; this built-in machine learning is the sort of thing that will run in that window, and it's quite well implemented unless you've got very big datasets. Outside of those existing options you have what we're interested in: the deep learning applications, the more advanced applications. That's mostly done outside Earth Engine, although they cheat a little bit, because it's still in Google Cloud and you can still do everything server side. That's the last option we'll look at, but it's also the most complex, and it's a bit of a faff, I would say. But I'm happy to start having a look if you'd like.

Robin: Yeah, let's go for it.

Jake: Perfect. First off, pricing is going to really matter to people looking to do this. It's still free for research and non-commercial use cases, but Earth Engine went commercial about a year ago, so commercial applications are now charged, and charged in various ways. There are three pricing tiers at the moment, Basic, Professional and Premium, and this does affect how you might implement a solution. If you're interested in the batch system, where you can run tasks for up to ten days, it's going to be quite important that you don't burn too much of your quota during deep learning, so it's worth keeping that in mind as we go through.
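For readers following along, here is a minimal sketch of the built-in, server-side route Jake describes: training one of Earth Engine's native classifiers on sampled pixels. The composite, training asset and label property are illustrative assumptions, not values from the talk:

```python
import ee

ee.Initialize()

# Placeholder inputs: a cloud-free-ish Landsat composite and a labelled
# FeatureCollection with a numeric 'landcover' property (hypothetical asset).
image = (
    ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
    .filterDate('2021-01-01', '2022-01-01')
    .median()
    .select('SR_B.*')
)
labels = ee.FeatureCollection('users/your_user/training_points')

# Sample the image at the labelled points, entirely server side.
training = image.sampleRegions(
    collection=labels, properties=['landcover'], scale=30
)

# Train a random forest and classify the image; nothing is downloaded.
classifier = ee.Classifier.smileRandomForest(numberOfTrees=50).train(
    features=training,
    classProperty='landcover',
    inputProperties=image.bandNames(),
)
classified = image.classify(classifier)
```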
Jake: So, to start with the first option: the batch option. Batch is the simplest, because it ties into what people are already used to working with, namely GeoTIFFs, and that keeps it nice and simple. In this case we're looking at Qiusheng Wu's segment-geospatial package alongside his geemap package, and it's dead simple in this use case. First off, install and import the packages. Then this here is our first bit of Earth Engine code: we're looking at NAIP, the National Agriculture Imagery Program, an aerial imagery collection over the US; it's quite high resolution, around one metre. We're filtering for some dates, selecting some bands, and taking the very first image at a particular point, which is here, so this will be quite early in that collection. I can visualize that with a tile URL, and that's our image. As I said, we'll have a look at the batch system. On the batch system you can run tasks for up to ten days, I believe, at a cheaper price per compute hour, and it's really for long-running jobs: maybe an expensive operation that's going to take a long time, or cases where you're not too worried about speed and want to save some money. You can see here I've exported that image, in this case to my Google Drive.

Robin: Yeah, okay.

Jake: For a commercial application you might instead put it into Cloud Storage, which is also a viable option, or into an asset, which is Earth Engine's own internal storage. That means it's stored in their format and you can then use it inside the platform, so you can do a pre-computation step, save the result as an asset, and build up your own image collection. So there are multiple ways of getting the data out, but they're all Google ways.

Robin: As I can see, you couldn't put it onto Amazon S3 directly from this API?

Jake: No, you'd have to push it to Drive or to Cloud Storage and then on to an S3 bucket. You can see it's quite simple to create a batch export and run it. This is roughly how long it took, and it used up two seconds of batch EECU time, so considering I think we have ten hours on the most basic plan, that's not expensive; admittedly I've not done anything here, no composites, I'm literally just exporting the image. People who use Colab will be familiar with mounting Google Drive, and then I point samgeo at the exported file. We're using the Segment Anything Model, I give it the location of my GeoTIFF, and there we go: we get our nice segmentation out. So that's a really basic look at how you might do it. It's probably quite good for someone who already has a pipeline set up but wants a way to do the big computation jobs, like composites or cloud-free mosaics, in Earth Engine, then export the result and run inference locally.

Robin: All right. And presumably you could then import that mask back into Earth Engine and display it there?

Jake: Yeah, in the next notebook there's a little code sample on how to bring it back in; you can upload GeoTIFFs back into Earth Engine that way.
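A hedged sketch of the batch workflow just described, condensed rather than copied from the notebook. The point coordinates, Drive paths and SAM checkpoint filename are illustrative assumptions:

```python
import ee
from samgeo import SamGeo

ee.Initialize()

# A point of interest (illustrative coordinates) and the first matching NAIP image.
point = ee.Geometry.Point([-122.26, 37.87])
naip = (
    ee.ImageCollection('USDA/NAIP/DOQQ')
    .filterBounds(point)
    .filterDate('2020-01-01', '2021-01-01')
    .select(['R', 'G', 'B'])
    .first()
)

# Batch export to Google Drive as a GeoTIFF; this runs on the batch tier.
task = ee.batch.Export.image.toDrive(
    image=naip,
    description='naip_chip',
    region=point.buffer(500).bounds(),
    scale=1,
    fileFormat='GeoTIFF',
)
task.start()

# Once the task finishes and Drive is mounted in Colab, run SAM locally.
sam = SamGeo(model_type='vit_h', checkpoint='sam_vit_h_4b8939.pth')
sam.generate('/content/drive/MyDrive/naip_chip.tif', output='segments.tif')
```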
Jake: The next method of extracting data out of Earth Engine is the REST API. This notebook is based on a presentation that Kel Markert from Google gave on scaling with the REST API; as I said, there are links in the document, and his YouTube talk runs through all of this in brilliant detail, but we can cover the key points here. It's an interesting look at things because it ties in well with xarray, which will probably interest people doing that kind of deep remote sensing work. We're also using a package Kel Markert wrote himself, I think during his PhD before he joined the Earth Engine team, which is essentially a wrapper around the Earth Engine REST API: rather than you having to make the requests yourself and push the results into xarray, he's built that up for you, which is really nice. The first bit is bread-and-butter Earth Engine code. We're looking at Cambodia, taking some Landsat images, applying a cloud mask and adding some extra bands. The end goal of the notebook is a deforestation monitoring algorithm, so he first grabs before-and-after data for a particular area of interest and exports it, pre-computing it as an asset. This goes straight back into Earth Engine, and you can see that here: a batch export to asset. It stays in Earth Engine, so we pay no egress on it. What it then lets us do is add an extra band. Are you familiar with the Hansen forest dataset?

Robin: Not personally, no.

Jake: This is a big one that Google brought in by working with Hansen, a researcher. It's essentially a historic look at Landsat data over about 23 years, with added deforestation bands that let you monitor forest loss: for each pixel it records the year in which that pixel was lost to deforestation. So we take a before-and-after image, grab the 2020 forest loss data, and start sampling pixels from it.

Robin: Is that a deep learning dataset, or just a segmentation dataset?

Jake: It's produced by a deep learning model, though I don't think the model itself is public. The one note I'd make is that the 2001 to 2016 data was made with an older model; they updated the model but have only re-run it from 2016 onwards, so you'll see differences between pre-2016 and later results where the algorithm got better. That's just a heads-up, because it has tripped a few people up; I think it even tripped up The Guardian at one point.

Robin: Sorry, just to clarify: Earth Engine itself has the model deployed inside, and you can use it?

Jake: No, it's a pre-computed dataset. They're not doing any machine learning on demand; they've run the model and stored the inferences in Earth Engine. That's why the pre-2016 data is not as good: they haven't re-run the new model on the old imagery yet. I think that's planned for version two, but at the moment we're on version 1.10. Another example of a dataset Earth Engine distributes that comes from a machine learning model you can't run in Earth Engine itself is Dynamic World, a land cover dataset based on Sentinel-2. It gives you near-real-time land cover, but they run that model outside of Earth Engine as Sentinel-2 data gets ingested into their system.
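As a pointer for anyone wanting to reproduce this, a minimal sketch of pulling the Hansen forest-loss band discussed above; the asset version string is the one current at the time of writing and may differ from the notebook's:

```python
import ee

ee.Initialize()

# Hansen Global Forest Change; 'lossyear' encodes the year of loss (1 = 2001).
hansen = ee.Image('UMD/hansen/global_forest_change_2022_v1_10')

# Pixels lost to deforestation in 2020 (value 20), masked everywhere else.
loss_2020 = hansen.select('lossyear').eq(20).selfMask()
```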
Jake: Okay, so in this case we've got a before-and-after Landsat image, and alongside it we've got pixels labelled as forest loss. We take a sample of that data and export it as a table. And this is where we get into the meat of the REST API work. We take our bounds, Cambodia in this case, take the bounding box and split it up into tiles of 500 by 500 pixels at a resolution of 90 metres, which gives us a number of tiles in x and y. From each we create something called a domain, which is essentially a bounding box plus the number of pixels you want inside it. What you end up with is a grid of domains, and we can take, say, a hundred of them and request them all at the same time using multithreading, getting each back as an xarray. That's a really powerful tool, because although here we're sending all the tile requests from the same compute node, you could just generate those URLs and hand them to a distributed system. TensorFlow has generators (I'm not sure what the equivalent is in PyTorch) where you can basically say "if you need more data, fetch this URL" and it will pull the data in. So we pull the tiles through, and this is a quick look at randomly visualizing some of them. Essentially we've now got image tiles as xarrays, plus samples telling us the specific locations where deforestation has occurred, and we build a model on that. In this notebook it's a very basic scikit-learn logistic regression, but you can imagine putting any deep learning model here; people will be familiar with this, your Xs and your Ys. We then take that model and run inference over our blocks, and because it's all in xarray that's quite simple from that point. Afterwards we save the outputs as COGs; this is just a basic script for writing cloud-optimized GeoTIFFs from xarray using rioxarray. Once you've done that, this is how you upload back into Earth Engine as a COG-backed asset: we upload to a Google Cloud Storage bucket and then point Earth Engine at that bucket and say "I've got a bunch of GeoTIFFs in here, treat them as an image collection", and we can then use it natively in Earth Engine.

Robin: Right.

Jake: So it's a bit of a roundabout way, because you're taking the data out, running inference, and putting it back in, but it really depends on the application. If you just want to pre-compute all your data in Earth Engine and then do the easy bit, the inference, on your side, you can, and you can kind of plug and play, as I imagine many people have already got systems set up.
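The exact calls in Kel Markert's wrapper differ, but here is a hedged approximation of the tiling-and-fetch pattern using only the standard Earth Engine client library: split a bounding box into fixed-pixel "domains", then download them concurrently into xarray objects. The AOI, tile size and worker count are illustrative:

```python
import concurrent.futures

import ee
import requests
import rioxarray
from rasterio.io import MemoryFile

ee.Initialize()

# Illustrative image; the talk used a cloud-masked Landsat composite.
image = (
    ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
    .filterDate('2020-01-01', '2021-01-01')
    .median()
    .select('SR_B.*')
)

def make_grid(xmin, ymin, xmax, ymax, nx, ny):
    """Split a bounding box into an nx-by-ny grid of sub-boxes ('domains')."""
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    return [
        [xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy]
        for i in range(nx)
        for j in range(ny)
    ]

def fetch_tile(bbox):
    """Request one 500x500-pixel tile as a GeoTIFF and load it into xarray."""
    url = image.getDownloadURL({
        'region': ee.Geometry.Rectangle(bbox),
        'dimensions': '500x500',
        'format': 'GEO_TIFF',
    })
    resp = requests.get(url)
    resp.raise_for_status()
    with MemoryFile(resp.content) as mem:
        return rioxarray.open_rasterio(mem.open()).load()

# Rough bounding box of Cambodia, fetched with 8 concurrent workers.
grid = make_grid(102.3, 10.4, 107.6, 14.7, nx=10, ny=10)
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    tiles = list(pool.map(fetch_tile, grid))
```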
Robin: Yeah, and they may just want to use Earth Engine for the heavy lifting.

Jake: Yeah, exactly. Like I said, generating Landsat composites over 40 years: realistically, spinning up all the compute and downloading all that Landsat imagery would be a massive pain, whereas in Earth Engine it's quite a simple task to put together.

Robin: Yeah. I've had a similar workflow using a STAC API, where you run queries against it, pull results into xarray, do your processing, generate new items, and presumably you can put them back into the catalog as well. So that's the mental model I have: Earth Engine is quite similar to a cataloging system that holds all the data and responds to queries, but it's not necessarily the place where you do your number crunching; that happens somewhere else.

Jake: Yeah, exactly. And I mean, they've got literally petabytes, something like 50 or 60; I think the number keeps growing so fast that they've forgotten to update it on the home page, and it's probably approaching 100 by now. It's an impressive catalog, and it has a simple API, which is nice; for basic tasks it's quite easy to do stuff. So it really is your big number crunching and your batch exports; it's a good place for that. And finally, let's have a look at what I'd call the most complex option, the most tied in to Google Cloud, and probably the best-performing as well. But this is a faff. Personally, I'd say you would have to be building a Google-platform-only solution for this to be squarely in your realm of interest. It's the way you run your model on the Google AI Platform. I should say it's a little confusing at the moment, because Google is obviously on a massive AI drive and has renamed the AI Platform to Vertex AI, so you'll find lots of places that still say "AI Platform", but when you go looking for the AI Platform you get sent to Vertex. I imagine a lot of this is likely to change in small ways, but in the grand scheme of things it should be roughly as you'd expect. You can see that this notebook, which is Google's own, still refers to the AI Platform, and the function calls are still to the AI Platform. Before I go through the nitty-gritty of the notebook, the rough infrastructure is this: you have a TensorFlow model that you've trained outside of Earth Engine, most likely, or maybe with some Earth Engine data. Once you have that model you upload it to Google Cloud, to Vertex AI, which hosts it for you and also handles inference and inference scaling, so you can distribute the model across maybe 10 or 20 compute nodes. You then send data directly from Earth Engine, and Google manages this for you: it pipes the data across, reshapes it into the right format and the right tensors, pushes it through the model as tiles, and brings the results back into Earth Engine, all without you having to handle that piping. It does mean, though, that you have to do quite a bit of work before uploading the model, to make sure you're doing it the way they like and in a way that will work with Earth Engine.
way that they like and in the way that's going to work for Earth engine yeah I mean that sounds like one of the most sort of practical approaches once it's all set up or deploying yeah and of course with tensorflow you can do not just deep learning but also like decision trees and other kinds of models so you could just have this as your approach for doing machine learning on uh Earth engine and you know that would be flexible yeah yeah exactly and then you could you've also got um ways to build models in the Google platform without actually having to do this outside of it uh there's uh automl which is a pretty cool platform where you can upload imagery that's pre-labeled and it will handle all the hyper parameters it will kind of create the best guest segmentation model the best guess detection model uh that it can come up with and so there are benefits to being in that platform but I would say this is definitely a very much a commercial application I can't see individual users or academics really wanting to do this if you're doing this just you know having a play about with Earth engine you know you're gonna have to put your credit cards up on Google Cloud platform and you may be reticent to do that yeah well I'm intrigued let's have a look at this then yeah perfect so simple importing packages importing tensorflow and they link to another notebook here which is where they've kind of pre-computed the data for us very similar to the last one which is landsat a uh generating some land cover labels and this instead this time rather than uh Forest loss and what we're going to do is we're going to read that data in read it in and I mean this is probably your bog standard tensorflow sort of workflow you bring it in as a data set and you run through the TF record you can export straight from Earth engine as a TF record rather than as a geotiff which makes this a bit easier um we're going to run through and train the model we do have to adjust the shape so that's specifically for how Earth engine handles I think inferencing over space rather than time you'll have to modify which way your array is or which way your tensor is and we create a Keras model here and then we save that model and so this model is going to get saved into a Google Cloud Storage bucket and then this is the bit which I say is a bit weird and probably doesn't seem to be that logical but essentially you're mapping the inputs and the outputs that tensorflow gives you to the Earth engine outputs and you're doing the same thing here there's a they have a command line tool which you have to use and to actually upload your model and I did have a look at the source code for this and I've linked that in the notebook sorry not the notebook the document as well so you can have a look at that source code it's to do with the rpcs of How It's called they're manipulating that so it's in a particular format um for for us engine and then finally you have your deployment step you've created that model and you deploy that model and then in Earth engine this is kind of how you then call that model once it's uploaded so that was before we've created a landsat a uh Sentinel sorry not Sentinel surface reflectance collection we're going to filter for a particular year we're going to map through and mask that image get the median so this is doing our compositing and Cloud masking and we can visualize that map as well like this and then this bit is like your non-traditional Earth engine thing so in Earth engine you try and avoid as much as you can 
Jake: Then this bit is the non-traditional Earth Engine part. In Earth Engine you try to avoid thinking about data types as much as you can; the platform keeps them away from you a lot of the time, your images, your floats, your signed and unsigned integers. But in this case we have to convert the image to an array, because, as I was saying before, that's the particular format the AI Platform expects. Then you create this Earth Engine model object, and as I said, this is currently named after the AI Platform, which is now synonymous with Vertex AI. You create your predictor, and you're essentially defining where the model lives, which version of the model to use, your tile size, the projection to use, and your output bands, that is, what you expect to get out. Some models have a single output, but if you have multiple outputs you can define that too. For example, with a land cover dataset you might output a probability per class, and then, say you have eight classes, a ninth band for which class you actually think it is. Then you run your prediction, you do this array flatten, which gets the result back to you as an ordinary image, and you take the maximum probability, rename it as "label", and visualize it. So there's still a little bit of data wrangling in and out, but considering this all happens server side, you're not actually running anything locally at this point, and there are a lot of benefits to that. It is definitely tying you in, though; you're going to be quite tied into that platform.
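A hedged sketch of the Earth Engine side of this, following the pattern of Google's AI Platform example notebooks; the project, model, tile size, band and class names are all placeholders:

```python
import ee

ee.Initialize()

# Cloud-masked median Landsat 8 composite for one year, as in the notebook
# (the real notebook also applies a per-image cloud mask before the median).
composite = (
    ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
    .filterDate('2020-01-01', '2021-01-01')
    .median()
    .select('SR_B.*')
)

# Connect to the hosted model; every argument here is a placeholder.
model = ee.Model.fromAiPlatformPredictor(
    projectName='your-project',
    modelName='landcover_model',
    version='v1',
    inputTileSize=[144, 144],
    proj=ee.Projection('EPSG:4326').atScale(30),
    fixInputProj=True,
    outputBands={'landcover': {'type': ee.PixelType.float(), 'dimensions': 1}},
)

# Server-side inference: image -> array -> prediction -> per-class bands.
predictions = model.predictImage(composite.toArray())
class_names = [f'class_{i}' for i in range(9)]  # hypothetical class labels
probabilities = predictions.arrayFlatten([class_names])
label = predictions.arrayArgmax().arrayGet([0]).rename('label')
```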
Robin: I'm also curious about the developer experience, because obviously if you're working locally, debugging is straightforward, et cetera. How does that work with this client-server arrangement?

Jake: Earth Engine and debugging is definitely a tricky point. As someone who's been doing this for three years, it never necessarily gets easier. You really have to get your head around lazy computation: nothing is actually happening, and the code you're writing essentially does nothing until you request a tile or a map layer. That's the first thing to get used to as a developer. Secondly, you end up chucking in getInfo calls; getInfo is how you force a server-side call, and that's how you break a request apart, debug it, and work things out as best you can. It's definitely not easy, I would say; it's a learning curve. You start to get a knack for it and an understanding of how Earth Engine works behind the scenes, but as a debugging experience it's not easy. There are full guides in the Earth Engine docs that take you through it, and best practices and so on, but at the end of the day it takes practice: get stuck in and get your hands dirty.

Robin: And Earth Engine itself is also a bit of a black box, right? You can't inspect its source code?

Jake: Right. From my understanding, Earth Engine has been around for a long time; it predates COGs and it predates STAC, so they had to come up with their own solutions at the time. My understanding is that behind the scenes it's Cloud Bigtable, or at least it used to be, as the storage layer, and they're essentially using their own proprietary precursor to COG: a similar solution where, a bit like an MP4, you've got a table of contents at the front so you can find things quickly. I know they use GDAL on the back end for quite a lot of it as well. But at the end of the day it's a black box; you can't look at that source code.

Robin: And it doesn't provide a standardized interface, like a STAC API, to complement the Earth Engine API, does it?

Jake: They have STAC, but it's not a STAC API in the sense that you can request images through it, because it's still Earth Engine code you have to write. They have their own standardized API and their documentation, which is here: full documentation on the client libraries, which are JavaScript and Python, and on the Code Editor. One benefit of Earth Engine I will mention is this online Code Editor, quite similar to Colab, where you've got a map, a little code window, and direct access to all your tools. It's all in JavaScript, which is also a negative, but ChatGPT is very good at converting JavaScript to Python, so you can make some quick proofs of concept there and then translate them to Python for your environment.

Robin: That sounds like a fun workflow. Well, thank you so much for this introduction to machine learning in Earth Engine. Just one final question: if you had any feature requests for the Earth Engine team, what would they be?

Jake: Oh, that's a hard question. I'm interested to see how their vector support improves. I think vector support is going to be a really big commercial requirement. A lot of that is handled in BigQuery at the moment, but piping back and forth between BigQuery and Earth Engine is not necessarily something you want to do. They recently added what I suppose you'd call vector tiles, though they call them FeatureViews, which is really nice and helpful to have. Mostly I'm interested to see how they improve the efficiency and speed of geometry operations; that's going to be a big one for getting the commercial players on board, given how central vectors are to GIS.

Robin: Just so I understand: does that mean that if I've got a model that does, say, building detection, and I want to polygonize the output and export it as vectors, I currently couldn't do that with the workflows you described?

Jake: No, you can do that; there is some vector support, but I'd say it's not as good as it could be. What I'm saying is I'm interested to see them make it quicker, because the main thing at the moment is that it's a bit slow.

Robin: Okay, that's a good one; let's hope they're listening. Well, thanks again Jake, it's been a really fascinating conversation and I've learned an awful lot about Earth Engine today. If people want to follow your posts on this topic, where's the best platform for them to do that?

Jake: I'm not writing a huge amount at the moment, but I'd love to. I'd say connect with me on LinkedIn or on GitHub. A good place to keep up with what I'm doing is Earth Blox; that's where I work and where the majority of my Earth Engine work goes, so check that out. It's a good entry-level point to Earth Engine too.

Robin: Fantastic, I'll put those in the show notes. Once again, thank you Jake, and I'll see you next time.

Jake: Cheers.
Info
Channel: Robin Cole
Views: 4,797
Id: HMR_2VkDE9s
Length: 31min 19sec (1879 seconds)
Published: Thu Aug 31 2023