Applying Deep Learning to Satellite Images to Estimate Violence in Syria and Poverty in Mexico

Video Statistics and Information

Captions
It really is a huge honor to be here. I love coming to this conference; I feel like I learn so much, and I'm probably the least capable R programmer in this room. I love coming here and realizing everything I'm doing wrong. Like, I still set working directories, even though Jared tells me every time not to do that. So this is supposed to be a stump speech about my research, the stuff I'm doing using satellite imagery and machine vision (computer vision), but I want to convey some things I've learned through my research that are relevant for you guys. So I retitled this talk. It was "Using Satellite Imagery and Machine Learning to Estimate Conflict in Syria and Poverty in Mexico"; now I'm going to call it "Wild Speculation on the Future of Data Science by Someone Barely Qualified, with Two Applications." I think there are really only two qualifications that are important here: I'm tall and I'm Jewish. (This is, of course, from Jared's wedding.) All right, so I do have some qualifications. I was hired by Chapman University after I finished my degree, and there I teach R and machine learning to MBAs and undergraduates. In fact, I have two undergraduate students here, Nancy and Andre. Where are they? Yeah, they're back there. Chapman flew them all the way out from California, and this is the first time they've been in New York City. Nancy has a job, but Andre, you're still looking for a job, is that correct? So if somebody is hiring an entry-level data scientist, I give my word that Andre knows what he's doing. So I teach MBAs and undergraduates, but in my research I mostly work in development economics. I try to solve problems in development economics, and one of the problems I focus on is data gaps: problems where we just don't know a lot about the developing world.
And one of the startling statistics is that from 2002 to 2011 there were 57 countries that had zero or one poverty estimate. If you're somebody like me who cares about poverty and reducing poverty, the fact that we're just not measuring this at all should be very startling, and you'll see this throughout the developing world: we're just not measuring things that are very important. The reason we don't do a good job of this is that it's really expensive. You have to physically send somebody to someone's household, you have to interview them for an hour or two, and you have to figure out basically how much income they have. Households can be located very far apart, so visiting each one is very expensive. So even in a very large country like Pakistan, you'll survey maybe 20,000 households once every three to five years, and in Pakistan there are places like Jammu and Kashmir or the Northwest Frontier Province where you just never send surveyors. Another startling fact: in Nigeria we actually have no idea how many people there are. Estimates vary between 140 million and 200 million, because every time they try to run a census, the government just deletes a bunch of people; they don't want certain places to have larger populations. It's just very startling. So a lot of my work is about filling in these data gaps. One of the more popular ways to fill them is to look at satellite imagery. Maybe some of you are familiar with this imagery: if you look at a satellite image at night, you can tell which one's North Korea and which one's South Korea. It's pretty obvious; North Korea is the one without any lights, in case you're wondering. That works really well for gross statistics, but we're looking to do something a little bit more with our numbers.
In particular, I'm interested not just in these old-fashioned satellites but in frontier microsatellites. There's a startup called Planet that launches these small satellites the size of shoeboxes. They currently have 190 in orbit, more satellites than any country besides the United States and China, so they have a daily revisit rate over all of the Earth's landmass, and you can start doing some really crazy things with this data. We're launching a crazy number of imaging satellites, so if any of you are searching for a dissertation topic, it's probably something like what I did: I had no idea what to work on, so I started messing with satellites, because there's probably something there. In previous work, I tried to generate intermediate features from satellite imagery: I used computer vision to extract the number of cars and the number of buildings. This is a raw satellite image of Sri Lanka. I ran an off-the-shelf computer vision algorithm against it so I could figure out the number of buildings and the number of cars, and then I used R to pump that through and build a prediction model of the amount of poverty in an area.
To push that a little further, I wanted to see whether I could do this directly from daytime satellite imagery, using the fancy new stuff, convolutional neural networks. And you can kind of tell the difference. On the left is a very well-off area in Mexico City: it has large streets and a lot of green areas. On the right is a poor area: the buildings are more regular and a little closer together, the streets are kind of crappier, and the poverty rates differ accordingly. In the previous work, I was telling the computer what distinguishes the poor areas from the rich areas; now I'm just asking the CNN to figure out what visual features the poor areas have. This is a really hard task, because there's just not a lot of data. We're trying to do this at the municipality level in Mexico, and there are 2,500 municipalities, so this is very hard, and a lot of people would say it's impossible. So we're going to use transfer learning. We take an off-the-shelf CNN model, which, truthfully, I don't even fully understand. It's pretrained using ImageNet, so it's seen millions of images, and it's learned all the intermediate weights (or, excuse me, Jared, those are coefficients, right? Yes? Okay). Whereas this whole network was trained to figure out whether an image was a car, or maybe a cat or a dog, or a hot dog, we're going to replace that last layer and say: try to predict poverty. You've learned all these intermediate things that are useful for figuring out hot dog or not, and now you're going to figure out whether it's a poor neighborhood or a rich neighborhood in Mexico. This part was not done in R; it was done in Python and Theano.
Some details on Mexico: we're using two poverty levels, severe and moderate poverty, and we hold out some test municipalities. We're training this on 900 municipalities; these are the only municipalities surveyed by Mexico, and they're supposed to be representative, but that's a separate story. We trained on 800 of these and then checked how it performs on the test sample. So let's look at how it performs. Okay, it's not a perfect fit: we get R-squareds between 0.4 and 0.5. But still, this is better than what you'd have in a country with no data whatsoever. So let's see if there's something else we can do. We can also do land classification: we train a classifier and ask, what type of area is this? Is this a road? Is this a building? I think this is super cool, because we trained this against the Planet satellites, which have a daily revisit rate, meaning this is every road in Mexico. If you wanted to, you could throw this algorithm against new Planet images and calculate the changes in roads for Mexico, and you can do this anywhere in the world. I think that's loads better than trying to do surveys. And if we include this land classification in our model, it improves a little bit, and we get these predicted poverty rates for urban areas in Mexico.
Okay, so what lessons can we take away? I think one lesson is about using off-the-shelf data, like if you were just sitting at your R console trying to estimate something. There's a concept in economics called the efficient market hypothesis; by analogy, if a data set is out there, somebody has probably already mined it to write their dissertation or do some analysis. So if you're in a field where you're trying to predict something in the real world, and you're using the same data everyone else has, you're probably not going to do better than them. The way you really get an edge is by generating some data that somebody else doesn't have. You probably don't have to go to the lengths I'm going to, but just try to get some new data; in many cases, generating or acquiring new data is going to be better than using a better algorithm. Okay, so the second project I'm going to talk about is really, really recent stuff. The motivation here is that we have a lot of information on where violence is occurring in the world.
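The transfer-learning step described for the Mexico model (keep a pretrained network's learned layers, replace only the last layer, fit that layer to poverty rates, then check R-squared on held-out municipalities) can be illustrated with a deliberately tiny, pure-Python sketch. Everything here is an assumption for illustration: `frozen_backbone` is a hypothetical stand-in for an ImageNet-pretrained CNN with its final layer removed, the "images" and poverty rates are synthetic, and the new head is a plain least-squares linear layer. This is not the speaker's Python/Theano code.

```python
import random


def frozen_backbone(image):
    # HYPOTHETICAL stand-in for a pretrained CNN with its last layer removed.
    # It maps an image (a flat list of pixel values) to a fixed feature vector
    # and is never updated during training.
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small system a x = b.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_head(feats, targets):
    # Fit ONLY the new final layer: an intercept plus one weight per frozen
    # feature, by ordinary least squares via the normal equations (X'X) w = X'y.
    rows = [[1.0] + f for f in feats]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, f):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], f))


def r_squared(y_true, y_pred):
    # 1 - SS_residual / SS_total on whatever sample is passed in.
    mu = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mu) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic toy data (NOT real survey data): 100 "municipalities", each a
# 16-pixel image whose average brightness drives a made-up poverty rate.
rng = random.Random(0)
images, poverty = [], []
for _ in range(100):
    base = rng.random()
    img = [min(1.0, max(0.0, base + rng.gauss(0, 0.1))) for _ in range(16)]
    images.append(img)
    poverty.append(0.9 - 0.6 * sum(img) / len(img) + rng.gauss(0, 0.02))

feats = [frozen_backbone(img) for img in images]
coef = fit_head(feats[:80], poverty[:80])            # train on 80
test_pred = [predict(coef, f) for f in feats[80:]]   # evaluate on 20 held out
print(f"held-out R^2: {r_squared(poverty[80:], test_pred):.3f}")
```

Because only the final layer's handful of coefficients is estimated, a few hundred labelled areas can be enough to fit it; the frozen layers carry what was learned from millions of other images. The toy R-squared here will be far higher than the 0.4 to 0.5 reported in the talk, simply because the synthetic target is built directly from the feature.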
But that information is biased, it's outdated, and it's imprecise. A lot of the information on violence is recorded either by hand, where some eyewitness says there was a violent event here in Syria at this location, or from YouTube videos, because every army in Syria has a YouTube channel where they post and say, "We bombed this building." So we don't actually know the full extent of the damage or violence: there could be a delay in when the videos were posted, or there could be some bias in what was reported, depending on whether somebody was there. And if you think about it, the point of bombs is that they're supposed to do damage, and that damage should be visible from space. This is an area in Homs, Syria, and you can see there's some detectable damage from these bombs. So we can use that same computer vision technology to tell whether an area has a destroyed building or not. The dream is that you're running this in real time on daily updated satellite images; we're far away from that, but we're inching toward that goal. We located 380 training sites in Syria with visually identified building damage, and we eventually want to increase this to a thousand, but right now it's just 280 locations. Then we do a process of data augmentation, which is cropping and stretching and skewing the images, and that increases this to about 5,000, so we have about 5,000 images to train on. We're going to train two classifiers: one for the frequently updated Planet imagery and the other for imagery from the Google Maps API. The labeling process is pretty straightforward: we pull up an image in Google Maps (we're taking 1,250 by 1,250 images), and for the label we just try to figure out what we think is a destroyed area.
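A minimal sketch of the augmentation step, assuming the labels are per-pixel 0/1 damage masks (as the labeling description suggests), so each random transform must be applied identically to the image and its mask. Crops, flips, quarter turns, and a crude integer shear stand in for the "cropping, stretching and skewing" mentioned in the talk; the function names, the 8x8 toy grid, and the probabilities are illustrative assumptions, not the project's pipeline.

```python
import random


def crop(grid, top, left, size):
    # Take a size x size window whose top-left corner is (top, left).
    return [row[left:left + size] for row in grid[top:top + size]]


def hflip(grid):
    # Mirror left-right.
    return [row[::-1] for row in grid]


def rot90(grid):
    # Rotate 90 degrees clockwise: transpose, then reverse each row.
    return [list(row)[::-1] for row in zip(*grid)]


def shear(grid, step):
    # Crude horizontal skew: shift row r right by r * step pixels, padding with 0.
    w = len(grid[0])
    out = []
    for r, row in enumerate(grid):
        s = min(w, r * step)
        out.append([0] * s + row[:w - s])
    return out


def augment(image, mask, n_copies, crop_size, rng):
    # Apply the SAME random transform to an image and its 0/1 damage mask,
    # so every label stays aligned with the pixel it describes.
    h, w = len(image), len(image[0])
    pairs = []
    for _ in range(n_copies):
        top = rng.randrange(h - crop_size + 1)
        left = rng.randrange(w - crop_size + 1)
        img, msk = crop(image, top, left, crop_size), crop(mask, top, left, crop_size)
        if rng.random() < 0.5:
            img, msk = hflip(img), hflip(msk)
        for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
            img, msk = rot90(img), rot90(msk)
        if rng.random() < 0.5:
            img, msk = shear(img, 1), shear(msk, 1)
        pairs.append((img, msk))
    return pairs


rng = random.Random(42)
image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 "image"
mask = [[1 if 2 <= r <= 4 and 3 <= c <= 5 else 0 for c in range(8)] for r in range(8)]
pairs = augment(image, mask, n_copies=13, crop_size=6, rng=rng)
print(len(pairs), len(pairs[0][0]), len(pairs[0][0][0]))  # prints: 13 6 6
```

The talk doesn't say how many augmented copies were made per site; purely as arithmetic, 13 copies of each of 380 sites would give 4,940 images, in the neighbourhood of the roughly 5,000 mentioned.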
Anything that's a destroyed area gets a 1 and everything else gets a 0, and we ask the CNN to figure out some type of visual mapping between the image and those labels. For the architecture, we're starting from one that has been used to detect buildings. It's kind of a fancy sandwich of things happening in the CNN: if you do computer vision, there are seven CNNs stacked on each other, and we take the output from the first, the second, and the seventh and use those to create a stacked feature map for pixel-wise classification, so the end result is that each pixel says whether it's a destroyed building or not. Then we did some more advanced stuff, because that wasn't working out too well. I talked to people who really do computer vision, because I'm an economist, believe it or not, and they said, no, you want to use this U-Net architecture, so we made it a little fancier. All right, training notes: all this is done in Theano, and it runs on a Microsoft Azure server. Where's Mark Heisman? Mark, you were saying, and you're right, the Microsoft Azure platform is really fantastic. It's super expensive, though; I'm burning through money, like $1,000 a month. So if you have credits you can give me... because AWS offered me $30,000, and I really don't want to take my research elsewhere. We'll talk. Okay, can we talk offline? No, I'm giving him a hard time, but he's absolutely right: it's a great platform. You can load up a virtual server, and we have two K80 GPUs. It takes about four days to run, but we're able to estimate the model in that time. All of this is working off a $20,000 research grant from Lakeisha, and we started this about six months ago.
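To make the "stacked feature map" idea concrete: outputs taken from an early, a middle, and a late layer have different resolutions, so one common approach (sometimes called hypercolumns) is to upsample them to the input size, concatenate them per pixel, and classify each pixel. The pure-Python sketch below does exactly that with hand-made activations and hand-picked weights; the layer shapes, nearest-neighbour upsampling, and weights are illustrative assumptions, not the network from the talk. A U-Net, which the speaker moved to, instead learns its upsampling path and joins encoder and decoder features through skip connections.

```python
import math


def upsample_nearest(fmap, factor):
    # Nearest-neighbour upsampling of a 2-D map by an integer factor.
    return [[v for v in row for _ in range(factor)] for row in fmap for _ in range(factor)]


def stack_features(layer_outputs, size):
    # layer_outputs: list of (channels, factor) pairs, where each channel is a
    # 2-D map downsampled by `factor` relative to the input. Every channel is
    # upsampled to size x size, then each pixel gets one concatenated vector.
    maps = []
    for channels, factor in layer_outputs:
        for ch in channels:
            up = upsample_nearest(ch, factor)
            assert len(up) == size and len(up[0]) == size
            maps.append(up)
    return [[[m[r][c] for m in maps] for c in range(size)] for r in range(size)]


def classify_pixels(stacked, weights, bias, threshold=0.5):
    # Per-pixel logistic regression on the stacked vector (what a 1x1
    # convolution + sigmoid computes): 1 = destroyed building, 0 = not.
    mask = []
    for row in stacked:
        out = []
        for feat in row:
            z = bias + sum(w * x for w, x in zip(weights, feat))
            out.append(1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0)
        mask.append(out)
    return mask


# Toy, hand-made activations for a 4x4 input (NOT outputs of a real network).
layer1 = ([[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]], 1)  # full res
layer2 = ([[[0, 0], [0, 1]]], 2)                                          # 1/2 res
layer7 = ([[[1]]], 4)                                                     # 1/4 res
stacked = stack_features([layer1, layer2, layer7], size=4)
mask = classify_pixels(stacked, weights=[2.0, 2.0, -1.0], bias=0.0)
for row in mask:
    print(row)
# [0, 0, 0, 0]
# [0, 0, 0, 0]
# [0, 0, 1, 1]
# [0, 0, 1, 1]
```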
We can see the preliminary learning: anything that's red is a destroyed building. It's learning that things toward the top right are destroyed buildings. It starts out thinking everything's a destroyed building, then it stops, and after 800 passes through the data we get something that fairly closely approximates the input. Every data scientist here is saying that's super easy, because that's in-sample. So we can look at some of the out-of-sample results, and they look pretty damn good. They're not perfect; we're still working through the training data, and the more training data we throw at it, the better. But they look pretty damn good. If you care about confusion matrices, roughly, we're getting about 90% accuracy at the pixel level.
Next steps: we really want to use this to predict refugee flows. This is where we're actually economists: we're going to look at this from a microeconomic standpoint. People are making decisions about where to go on the basis of whether there's safety in a given area, so we're building microeconomic models that predict where somebody travels to. We also want to analyze reporting biases for different types of conflict.
So what's the lesson here? This one is actually backwards. A lot of our statistics is based around static data: data comes in, you get a CD or a thumb drive, and we build our models off that data. Our whole project was based on the idea that data is going to come in in real time, and we want to build models that incorporate that. So maybe the goal here isn't necessarily to build the best model, but to build something that is useful for people on the ground, for humanitarian workers. I got this completely wrong: a really dynamic model is going to be better than a static model, because a static model is going to break if it sees new stuff in real time and isn't able to adjust. So it's a completely opposite lesson here. Okay, that's everything I want to present. Any comments or suggestions are appreciated; if you want to give me research money, that's great, or if you want to talk to me about group lasso, because I've got a project where I'm doing some group lasso stuff. Okay, all right, thanks so much. [Applause]
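The talk reports roughly 90% accuracy at the pixel level. A minimal sketch of how pixel-level accuracy and a confusion matrix can be computed from a true and a predicted 0/1 damage mask; the toy masks are made up. It also shows why the full confusion matrix is worth reading: when destroyed pixels are rare, a model that predicts "not destroyed" everywhere can still score high accuracy while finding none of the damage.

```python
def confusion_counts(true_mask, pred_mask):
    # Pixel-level confusion matrix for 0/1 masks (1 = destroyed building).
    cm = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for t_row, p_row in zip(true_mask, pred_mask):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                cm["tp"] += 1
            elif t == 0 and p == 1:
                cm["fp"] += 1
            elif t == 1 and p == 0:
                cm["fn"] += 1
            else:
                cm["tn"] += 1
    return cm


def pixel_metrics(cm):
    # Accuracy over all pixels, plus precision/recall for the "destroyed" class
    # (reported as 0.0 when the denominator is empty).
    total = sum(cm.values())
    accuracy = (cm["tp"] + cm["tn"]) / total
    precision = cm["tp"] / (cm["tp"] + cm["fp"]) if cm["tp"] + cm["fp"] else 0.0
    recall = cm["tp"] / (cm["tp"] + cm["fn"]) if cm["tp"] + cm["fn"] else 0.0
    return accuracy, precision, recall


truth = [[0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 0, 0, 0]]
pred = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0]]
all_zero = [[0] * 5 for _ in range(4)]

print(confusion_counts(truth, pred))                     # {'tn': 15, 'fp': 1, 'fn': 1, 'tp': 3}
print(pixel_metrics(confusion_counts(truth, pred)))      # (0.9, 0.75, 0.75)
print(pixel_metrics(confusion_counts(truth, all_zero)))  # (0.8, 0.0, 0.0)
```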
Info
Channel: Lander Analytics
Views: 2,040
Rating: 5 out of 5
Keywords:
Id: pLqL7qhi6pw
Length: 17min 45sec (1065 seconds)
Published: Wed Aug 15 2018