Andrew Duberstein - Pydeck: High-scale geospatial visualization for Python | JupyterCon 2020

Video Statistics and Information

Captions
Hi, I'm Andrew Duberstein, and I'd like to talk about pydeck. Pydeck is a library for data scientists that renders interactive, large-scale geospatial data in your browser. It is similar to other mapping libraries like folium or ipyleaflet in that it's interactive in the browser, but different in that it is able to leverage your machine's GPU for larger visualizations. It's also different in that it's the official Python bindings to a specialized mapping library written at Uber called deck.gl. Through deck.gl you can plot hundreds of thousands or even millions of data points, all without a specialized server.

Throughout my career I've done geospatial visualization for logistics, real estate, planning, robotics, and regulatory use cases, among others. I'm currently working on fraud detection at Instacart. I'm also a technical advisor at Unfolded, which has written incredible state-of-the-art tooling for geospatial visualization; you can check it out at unfolded.ai. Previously I was a tech lead manager at Uber, a data scientist at Uber, and before that a data scientist for Kroger and Tesco. At these companies I've used folium, Leaflet.js, Cartopy, ggmap, ggplot, QGIS, ArcGIS, PostGIS, GeoPandas, D3, and probably other libraries for map visualizations and GIS analysis. If it renders data in space, and even if it's just barely not malware, I've probably had to use it for a job or with a client at some point. Pydeck emerged from my joys and frustrations with all of these libraries. I'll talk about one use case in particular, for fraud detection, that made me think pydeck would be useful.

Before we start there, I would recommend that you follow along. You can see the notebooks associated with this talk at the link in the slide here. The main notebook for this talk is titled JupyterCon. I'd also recommend that you take a little bit of time to install pydeck and make sure it's set up correctly on your machine.

I'd like to start with a real use case I encountered from detecting click farms in China. A click
farm is essentially a location with multiple apps and usually multiple users that are, let's say, artificially inflating click counts for advertisers. Or perhaps those users are, in this case, setting up artificial accounts that are not tied to the identities they claim to be tied to. These accounts can be sold on the dark web. They're usually trying to take advantage of things like first-time user promos, that idea of "sign up and get your first order free." You might see this on DoorDash; you might see this on Uber. If you log the location data from those mobile apps, though, you can start to combat this kind of fraud. If you log the GPS signal from a mobile app, you can get a sense that all these phones are in the same room together, and it should look like an unexpected number of phones in the same location. So let's take a look at that.

Here's one pydeck visualization of user sign-up locations. You can see that sign-ups mostly map to where you would expect population density. Here is the urban core; as you'd expect, as we get further from the city there are fewer people and therefore fewer sign-ups. Interestingly, there's actually one location on this map that has an anomalous number of sign-ups. It's a little difficult to see.

Here we have the same dataset as above, this time rendered in 3D and grouped to an H3 geohash, essentially a hexagon shape. If we mouse over the honeycomb, we can see the number of sign-ups that happened in a particular area. If we hold Command, or on some machines Shift, we can move the map through three dimensions, and of course we can instantly identify that location with an anomalous number of sign-ups, here in Nambo. You'll see that there's a lot of power here, in that we're not downsampling the dataset to be able to do this. We're not censoring the data in any way, and you can start to pick up on trends in these uncensored, un-downsampled datasets that you could not pick up if you were
using another library that might have to downsample data in order to render it.

That's one quick demo. I want to talk a little bit about how pydeck actually works, and then we're going to review some visualization examples and go through them line by line. Pydeck is actually just a wrapper around another library called deck.gl. It's a JavaScript library, and it's particularly performant because it relies on the JavaScript API called WebGL, which renders high-performance interactive graphics in 2D and 3D. WebGL makes it possible to take advantage of hardware graphics acceleration, allowing for dramatic speed and scale increases versus other mapping libraries.

Here we have an interactive demo of 860,000 points. This is census data, essentially representing the population of New York in 2015. You can see it's uniformly interactive, and of course you can see that we can update the visualization. Here is a visualization of earthquakes over time. These are 218,000 earthquakes. You can see it's fully responsive in the same way the visualization above was. We can also play that data over time. If we zoom in here, you can see how active these particular fault lines are, and of course you can see where the Ring of Fire is. You can see where the individual fault lines are, and you can even see things like the fracking boom in Oklahoma, which ends up causing a series of earthquakes later in this visualization.

Since deck.gl is a JavaScript library, it might not be the most accessible to data scientists. You can imagine geologists want to render the visualization we saw above, or actuaries might, and they might be more familiar with Python and less familiar with JavaScript. We figured a good way to create a cross-language spec for deck.gl and rendering visualizations declaratively would be through this JSON API. So essentially any language that can write JSON can now write deck.gl visualizations. If you go to this link here, deck.gl/playground, you can actually play around with this editor
yourself, or of course you can load up the iframe here in the notebook. Here we are in the editor. Let's change the pitch; the map re-pitches. Let's change the height. Let's also set that pitch back. Change the pitch, change the height, and you can see that it's fully responsive. As we modify these parameters on this hexagon layer, the hexagon layer actually changes shape. We can even change the base map, and there you go. So that's an example of using the deck.gl JSON API directly in this editor.

You could imagine that as a Python user you might not want to write a large block of JSON. So essentially pydeck is three libraries. It is a Python interface to this JSON API; it is @deck.gl/json, which converts a JSON config to a deck.gl visualization; and it's @deck.gl/jupyter-widget, which connects a visualization to Jupyter, Streamlit, or Colab, often in a bidirectional way.

Let's take a look at the JSON from our visualization before, the artificial sign-ups visualization. Here's the output for that, and we can paste it into the JSON editor and see that pydeck is more or less just writing JSON on your behalf. You can see that the visualization is identical to the one we had before.

Pydeck also aims to be interactive and easy to update. Here's a dataset of blue whale migratory paths rendered over time. Each individual color represents one unique whale. You can imagine if you were a climatologist or an ecologist you might be interested in knowing this data or having access to it, as well as playing it over time. We can see pydeck does not struggle with rendering these large volumes of updates. Things look relatively fluid, and of course you can see the patterns in the data, like the whales migrating towards California in the summer and then, by the time winter rolls around, heading south.

Let's just really quickly go through one pydeck example line by line. We'll use that dataset from the JSON viewer before. This is a dataset of 2014 motor vehicle accidents in the UK. The
dataset has within it the longitude and latitude of each accident, so let's just plot it. Here's one example: these are all the accident locations. We load the data into, in this case, a pandas DataFrame. We specify the kind of layer we want; you can look at these in the pydeck docs to get an idea of the menu available to you. We load in that DataFrame; we specify our data as the second parameter here in this layer. We specify get_position, which is the x and y value for our dataset. Next we also specify some parameters that are unique to a scatterplot layer, and occasionally parameters shared by multiple layers, such as this auto_highlight. This radius_min_pixels here essentially says that each point in the visualization, no matter how far you zoom out from the map, will have to be at least one pixel; in other words, each point gets a minimum of one pixel. In this case we have a fill color specified in RGBA format.

So you have the layer; you also have to have the viewport, essentially a camera angle relative to the map. It's the first angle the user sees. If you want to quickly find an appropriate angle for your data, there are a number of tools online to do this. I wrote one, so I'm going to use that really fast. If we go to, say, the United Kingdom, we can copy and paste that viewport in and we'll render everything.

The last thing we need to do is of course bring our view state and our layer together. So there's our layer; it could also be a list of layers, but right now we're just rendering one. We're going to take our view state, that's the angle of the camera relative to the map. We're also going to specify a Google base map using satellite imagery, and here's our visualization. We can see one point for every accident within the UK. Let's zoom a little bit closer here.

Suppose we wanted a richer, three-dimensional visualization. We can also aggregate the data to hexagons. You can see that we have a tooltip now, we have this auto-highlight going, and you can see the number of accidents at
any particular location under a hexagon. Let's go through this visualization line by line as well. Here, instead of a scatterplot layer, we're specifying we'd like a hexagon layer. We specify the same dataset as before; this time we can see that pydeck can also read directly from a URL, so you don't necessarily have to load pandas first. We specify a couple of parameters that are unique to the hexagon layer; I'd really recommend checking out the deck.gl docs to get an idea of what these parameters do. Next we specify the view state, much in the same manner we did before. Finally we combine that view state with that layer to be able to render a map, and we'll also specify that we'd like a Mapbox base map in this case, as well as a tooltip. You can see that we can actually render the tooltip using HTML, so we have a bolded number of observations there. And of course then we render it.

In this case we're going to plot GeoJSON with GeoPandas. We'll load GeoJSON for all the hurricane locations for the last 150 years into pydeck. You'll notice here we don't actually specify a position column; we just pass the DataFrame directly. We specify that we'd like a GeoJSON layer. We specify that the layer can be selected, that the data shouldn't have an outline, that the polygons, if they exist, should be filled, and a couple of other parameters you can read more about in the deck.gl docs. Once again, here's an example of what our dataset looks like: the name of the hurricane, the geometry shape, the year of the hurricane, and the maximum sustained wind in that hurricane.

So here we have the paths for all the hurricanes for the last 150 years. We've also specified that the line color be rendered as an RGBA value, but the R and B values here are provided by the maximum sustained wind value. So pydeck will look at that variable for maximum sustained wind and actually substitute that in. In this case, if the maximum sustained wind were actually 80 miles an hour, as it is here, the line color for that particular
hurricane would be 80, 0, 80, 200, which is one of these darker segments of purple.

The dataset above is a little bit chaotic when visualized all at once, so it might be helpful to visualize the data over time. Here we see the dataset for hurricanes rendered over time. Let's talk a little bit about how this works. We take our layer from before, we specify our tooltip, we specify our map style, and we put this all into the Deck object like we've already done. We specify an ipywidgets HTML object; that way we can display the year as the data updates. And then essentially this is our main method: we go and we play the data over time, from the first year of the dataset to the last one. We substitute out our data, we call deck.update in order to tell pydeck that our JSON config has changed, we substitute in the new value for the year, and then we pause a little before loading in the next year of data.

I'd like to quickly show you how you can use ipywidgets plus pydeck and a couple of external tools to build some really powerful applications. We're going to talk about isochrones. Essentially, an isochrone is a geometry that answers the question "How far can I get from point X in Y minutes?" In this example below, here we are in the city center of Palo Alto. Within the green ring is everything that's reachable by car within five minutes; within the blue ring is everything that's reachable by car within 10 minutes. You can imagine that in the case of tourism, real estate, retail site planning, transportation planning, and much else, you might have questions that benefit from an isochrone. How many shoppers are within 20 minutes of my grocery store? How many points of interest are within walking distance of my hotel? Pydeck is capable, when combined with external tooling (in this case we're going to be using a routing engine called GraphHopper), of interactively rendering these isochrones. It's worth calling out that you'll need a local version of GraphHopper running on your machine for this demo. See the GraphHopper docs for
setup. I'm using the Dockerized version of GraphHopper. You'll also need to install Rtree in your Python environment, which may have some additional dependencies depending on what platform you're using. One last thing worth calling out: this demo will only work for Berlin, but you can configure GraphHopper to use other OpenStreetMap data. You can see Geofabrik, the link there, to get an idea of what your options are.

I've written a GraphHopper client. This will reach out to GraphHopper and fetch an isochrone based on the latitude and longitude. I'll very quickly render one example isochrone. Here we are in Berlin, and this is how far you can get within 20 minutes from the city center of Berlin. I'm going to combine this with one of the datasets from the pydeck docs, the locations of all the beer gardens in Germany according to OpenStreetMap. I want to answer the question: how many pubs are within 20 minutes of any particular click? If we zoom into the city center of Berlin, we can see we have some beer garden locations. Here's Rosengarten, for example, located at that latitude and longitude. If we click, there are 38 points within 20 minutes of the clicked point. Essentially these are how many beer gardens are within 20 minutes of the click. Here's another: nine points. Cool. Eighteen points. You can see that pydeck might be useful for certain kinds of geospatial applications that you might want to write, and you don't have to write a line of JavaScript to do it.

There is also going to be one more example of a two-way application that I think is kind of interesting, as well as some other features of pydeck that I don't really have too much time to go into but want to show you at a high level. Here we're looking at a dataset of concert performances by a set of artists. The data is from setlist.fm, in case you're curious to extract it yourself. We have a unique ID for that performance; we have the artist in question, in this case the band The National; we have the date the
event happened, and then we have the name of the venue where it happened, the latitude, the longitude, the city it was associated with, and the tour name as well. We also have the previous location that The National was coming from.

In this case we're actually going to plot the data on a three-dimensional globe, and I'd like to draw on that globe essentially a line linking all the tour locations that a particular artist went to. So I'm going to specify a globe view; you can look at this in the pydeck docs. Unfortunately, for these 3D globes you can't really get the 2D tile data on them just yet, so we're going to bring our own custom GeoJSON layer as a base map. I'll do a little bit of data cleaning. I'm going to be adding a color to the tour. We'll take a look at how this random value is used in a moment; essentially it's to keep lines from colliding with each other in the visualization.

Then I'm going to specify, this time, three layers. We're going to specify our GeoJSON base map. We're going to specify an arc layer, essentially a line that will join where the artist was to where they're going next. This get_tilt times random essentially takes that random value, multiplies it by 15, and then provides a tilt value for the lines we'll see in a moment. Like I said before, this keeps them from colliding with each other. A scatterplot layer will highlight the city locations; essentially each circle will represent a city or a venue. Our specialized globe view is passed here. We have to set map_provider equal to None to tell pydeck we don't want a base map; it assumes you usually do.

And here is the visualization. Here, for the recording artist John Legend, we can see all of his tour locations for the entirety of his career. You can also see the tour names. This is only as good as the data from setlist.fm is, but I think it's probably pretty good. Here we can see a European tour. We can change the artist, so let's take a look at Daft Punk, Bob Dylan, The National, Beyoncé. Looking
at this tour map, I hope you get a sense for how pydeck can be used for richer applications.

I'd like to talk about some other features that we didn't get as much time to cover as I would like. There's the catalog of available layers, which you can see at pydeck.gl, that's our website. Other spatial projections: we talked a little bit about the globe view, but we haven't talked about the orthographic projections or anything of that sort. Custom layers: our Google Earth integration is actually a pydeck custom layer, and you can check that out. And of course, like I said before, I really recommend taking a look at the docs in case you're curious to learn more.

I also want to talk about what will change. Pydeck is a relatively young library and has a lot of space for improvement. A lot of you have requested legends; we will be adding legends to pydeck. Some better defaults, so that you don't have to configure as much about a visualization. There are certain cases in pydeck where you don't get as much information back from the error message as I would like. More two-way interactive environments: we'll have support for Streamlit and Colab in the same way you saw our Jupyter widget integration work. We'll have improved support for time-based updates, and of course much more. Follow our release tracker and tasks on GitHub, join our Slack channel, and of course feel free to tag me or message me directly with issues.

Just a quick thanks to the team, and a particular thank you to the Urban Computing Foundation for supporting the project, and especially thank you to JupyterCon for having me and to you for showing up. Feel free to tag me in any awesome visualizations you might have, and feel free to message me with questions. Thank you, guys.
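The talk's point that pydeck is "more or less just writing JSON on your behalf" can be sketched with nothing but the Python standard library. The snippet below is a rough illustration, not pydeck's actual implementation: it assembles a config in the style of the deck.gl JSON API, where `@@type` names a layer or view class and `@@=` marks a per-row accessor expression. The data URL, the `lng`/`lat` column names, the coordinates, and the parameter values are placeholders assumed for illustration, not the values used in the talk's notebooks.

```python
import json


def hexagon_spec(data_url, longitude, latitude, zoom=6, pitch=40):
    """Build a deck.gl JSON-style config containing one HexagonLayer.

    Keys use deck.gl's camelCase prop names. "@@type" names a class and
    "@@=" marks an accessor expression evaluated against each data row.
    """
    return {
        "initialViewState": {
            "longitude": longitude,
            "latitude": latitude,
            "zoom": zoom,
            "pitch": pitch,
        },
        "views": [{"@@type": "MapView", "controller": True}],
        "layers": [
            {
                "@@type": "HexagonLayer",
                "data": data_url,
                # Assumes the data has columns named "lng" and "lat".
                "getPosition": "@@=[lng, lat]",
                "autoHighlight": True,
                "pickable": True,
                "extruded": True,
                "elevationScale": 50,
            }
        ],
    }


# Placeholder URL and view state (assumptions for illustration only).
spec = hexagon_spec(
    "https://example.com/accidents.csv", longitude=-1.4, latitude=52.2
)
spec_json = json.dumps(spec, indent=2, sort_keys=True)
print(spec_json)
```

Output of this shape is the kind of text that could be pasted into the JSON editor shown in the talk; with pydeck itself, the same structure is produced from Python layer and view-state objects instead of a hand-built dictionary.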
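The hurricane example's data-driven line color can also be illustrated in plain Python. This is a sketch of the substitution the talk describes, where the R and B channels come from each row's maximum sustained wind; the field name `max_wind`, the sample rows, and the clamping to the 0-255 range are assumptions added here, not details confirmed by the talk.

```python
def line_color(row, alpha=200):
    """Return an [R, G, B, A] color whose R and B channels track wind speed."""
    # Clamp to the valid 8-bit channel range (an assumption of this sketch).
    wind = max(0, min(255, int(row["max_wind"])))
    return [wind, 0, wind, alpha]


# Hypothetical rows standing in for the hurricane dataset.
rows = [
    {"name": "A", "max_wind": 80},
    {"name": "B", "max_wind": 150},
]
colors = [line_color(row) for row in rows]
print(colors)
```

A storm with an 80 mph maximum sustained wind maps to 80, 0, 80, 200, matching the darker purple described in the talk, while stronger storms come out as brighter purple.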
Info
Channel: JupyterCon
Views: 900
Rating: 4.8333335 out of 5
Id: i-dGU80hNOw
Length: 22min 10sec (1330 seconds)
Published: Fri Nov 06 2020