[Uber Open Summit 2018] H3-js: Hexagons in the Browser

Captions
Hello everybody. Looks like we're almost at time, so I'm gonna get started, because there's a fair amount to go through. Hi, I'm Nick Rabinowitz. I'm an engineer on the visualization team here at Uber, and I'm gonna talk about h3-js, which is the JavaScript bindings for our H3 library, and I'll talk a little bit about what that is in case you missed some of the earlier talks today. Just a very, very small amount about me: I've been primarily a front-end developer for over 20 years now, I've been at Uber for about three years, and I'm currently the maintainer of h3-js. I've worked some on the core H3 project, and then I've worked a lot on the JavaScript transpiled version.

So, just what we'll cover in the next 45 minutes: I'll give you a brief intro to H3 in case you missed the sessions earlier on. It'll be pretty brief; there's a lot more about it online and in other documentation. I'll go through the actual h3-js library and a few of the core functions, and this is gonna be fairly code heavy. All of my examples are in Observable, and they're all published. Observable, if you're not familiar with it, is basically a JavaScript version of Python notebooks, so it's a live coding environment, and you can go in yourself, change the values, and see what comes out. If you have a laptop and want to follow along at home, feel free to do so; you don't need to do that for this workshop. So I'll go through what's in the library briefly, I'll go through the basics of one method of rendering hexagons on a map (there are lots of methods, but I'll pick the easiest one), and I'll talk a little bit about it, and then I'll build up in a couple of steps towards an example app doing a suitability analysis, which is the kind of thing that's actually a fairly advanced GIS approach: hard to do with other things, very straightforward to do with H3.

So what is H3? If you saw any of the talks earlier today you might already know the answer, but basically it's Uber's hexagonal grid system, so it's a
geographic, geospatial indexing system where you can uniquely identify any point on the earth as belonging to a grid cell, which in this case is almost entirely hexagons. This allows you to essentially geocode things either at a very coarse level, as you can see with these sort of almost continent-sized hexagons here, or down to a very fine level. The finest resolution that we allow is resolution 15, which is about a square meter. So it gives you this ability to identify the grid cell for anything on earth and identify the position of that grid cell on the earth, and this allows you to do a lot of bucketing and other transformations of your data that would be hard to do if you're just using points. The way I usually think about geographic grid systems is that it takes all this messy, difficult-to-work-with geospatial data, you know, things like admin boundaries and lat/long points, and turns it into this really clean mathematical space that's really nice for running algorithms and other kinds of analyses.

H3 has hierarchical resolutions, which means that every coarse cell has a set of children at each level below it: it has seven children in the next finer resolution, 49 children in the resolution after that, and so forth. That's also the process we use for identifying the index of a cell, so the index of a cell is basically the 0 through 6 number of the cell at every point in that resolution hierarchy. And the library comes with a whole bunch of useful algorithms. I'll go through a few of them. They mostly deal with things like traversal, so finding the neighbors of a given cell, and dealing with things like regions, so transforming between polygons and hexagons.

So let's get started. This is the short link for the workshop itself, t.uber.com/h3-js-workshop, or you can probably find it on the Observable site. From here on in I'll be out of slides and into Observable, and this is the actual site you can go to directly if you missed the link. So, quick
intro: h3-js. h3-js is JavaScript, but it's actually not a port of the C library but a transpiled version. We used Emscripten to take the C code, transpile it to JS, and then write some bindings around that to allow you to invoke the C functions. The nice thing about that is it means that all of the logic is exactly the same as in the C library. There aren't any discrepancies, so you can basically use h3-js the same way you would use H3 in any other of the languages that we have bindings for.

The core of the library is the functions that give you the H3 index for some geographic coordinates, or give you the geographic coordinates for some H3 index. The H3 index is a 64-bit number, which we usually encode as a hexadecimal string because JavaScript can't deal with 64-bit integers, and you can see it looks like this. And if you (I can't do this with one hand), you know, if you give it different coordinates, you'll get a different H3 index. This last parameter here is the resolution, so this is resolution 9. It's about a city block. For a while this was the sort of semi-standard resolution that we used at Uber, because it's about the right resolution for a lot of our data, but you can use any resolution that makes sense for your use case. The reverse of this gives you the center of the hexagon, so it gives you a latitude/longitude point for any given index. You can see that this is a lossy transformation: my data coming in could be much finer granularity than the center of the hexagon it falls into. So that's one thing to consider: usually when you're transforming points to hexagons you get some data loss in terms of your accuracy, but you gain a lot in terms of your ability to work with that data and your ability to bundle different points together. And this last part, h3ToGeoBoundary, gives you the vertices of an H3 index, and that allows you to draw the hexagon on a map or use other kinds of visualizations. There's also a set of functions for dealing with hierarchy.
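
As a rough illustration of why the index is handled as a hexadecimal string, the sketch below parses a sample index with BigInt and shows that a plain Number cannot hold it exactly. Several things here are assumptions: BigInt postdates this talk and needs a modern runtime, the sample string is only an illustrative value, and `getResolutionFromIndex` is a hypothetical helper (not part of h3-js) that assumes the bit layout given in the H3 documentation, with the resolution stored in bits 52 to 55.

```javascript
// Sample H3 index string (illustrative value; 15 hex digits, so 60 significant bits).
const indexHex = "8928308280fffff";

// Parse the hex string into an exact arbitrary-precision integer.
const asBigInt = BigInt("0x" + indexHex);

// A JavaScript Number only keeps 53 bits of integer precision,
// so converting the index to a Number silently changes its value.
const asNumber = Number(asBigInt);
const survivesAsNumber = BigInt(asNumber) === asBigInt;

// Hypothetical helper (not an h3-js function): read the resolution field,
// assuming it occupies bits 52-55 of the index as described in the H3 docs.
function getResolutionFromIndex(hex) {
  return Number((BigInt("0x" + hex) >> 52n) & 0xfn);
}

console.log(survivesAsNumber);                   // false
console.log(asBigInt.toString(16) === indexHex); // true
console.log(getResolutionFromIndex(indexHex));   // 9
```

The string form round-trips exactly, which is why passing indexes around as strings is the safer choice in JavaScript.
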
So h3ToChildren gives you the descendants of a given cell. Every cell, as I said, has either six or seven descendants. This is the resolution you want to get it at, so right here I've got a resolution 9 hexagon going to resolution 10, and I've got seven children. If I change this to resolution 11, now I've got 49 children; that's the grandchildren of my original cell. And we can go in the opposite direction as well: h3ToParent gives you the ancestors. In this case I'm going from 9 to 7. I can go all the way up to resolution 0, and that's the base cell for this hexagon. These are useful for various different things, and one useful property of H3 is that this is just an extremely fast operation. It's a bitwise operation, so even in JavaScript the implementation of this is, you know, millions of operations per second.

And as I said, we have a bunch of functions for traversal. One of the reasons we chose hexagons instead of a square grid system is that you get this really nice property of a hexagon, which is that it has six unambiguous neighbors. So if you want to do anything that involves moving around neighbors, you know, spreading data out across your neighbors, for example, or buffering your cells, or doing a lot of other kinds of transformations, having unambiguous neighbors is very helpful. If you think of a square grid (and I don't have a graphic for this right here), squares have four unambiguous neighbors and then these four corner neighbors that are kind of ambiguous. It's harder to deal with them, and they usually need a different treatment from the direct neighbors. So kRing basically gives you the filled ring around a given center hexagon, and hexRing gives you the hollow ring around a given hexagon. This second parameter is the radius, so if I change that to radius 2, now I have 19 hexagons that are all in the ring around the original hexagon. And the hollow ring, if I change it to two, I've
got 12: that's the hollow ring of all hexagons that are two steps away from my origin. This is all a little abstract when I go through it here, but it'll get concrete fairly quickly when we start rendering things. I also have a function to give me the distance in grid cells between two H3 indexes, and I've got a bunch of functions to deal with regions. In the rest of this tutorial we're actually going to usually use a utility library that uses these inside of it, but H3's polyfill is a way to go from a polygon to a set of hexagons. So if you have a geographic polygon, for example describing the shape of a country or a county, you can fill that with hexagons and get the set of hexagons that are within it. And if you have a set of hexagons and you want to get the outline, you can use this h3SetToMultiPolygon. The reason why these aren't totally symmetrical is that if you have an H3 set, it might be multiple different polygons if they're not connected. There are a lot more functions in the h3-js library. I won't go through them all now, but if you want to look at the GitHub for h3-js, it gives a full list of the APIs for everything.

So now, that's all fine and good if you want to look at code, but if you want to look at hexagons, we need to start rendering things on a map. As I mentioned before, there are a lot of ways that you can render things, but I'm gonna go through a fairly straightforward one using Mapbox GL. I won't go into all the details of using Mapbox GL, because Mapbox GL has great documentation, but I'll just briefly go through one simple way of doing this. You can see I'm actually rendering hexagons in two different ways here. I'm rendering these colored shapes, so these are filled polygons, and then I'm also rendering sets of hexagons as an outline, using that h3SetToMultiPolygon function I mentioned previously. The data I'm using here is just dummy data, and what I've done here is simply take one hexagon, take
the kRing (so this is the filled ring of all hexagons within a distance of three of this center), and then I've made it into this kind of object where each hexagon index is associated with a value. Here I'm just using Math.random, and we'll get into using more realistic values later. One nice thing about Math.random is that it's between 0 and 1, which makes it easy to deal with the data later, and as I go through this tutorial I'll be normalizing everything to between 0 and 1 just to simplify rendering. So now I have this object. There are lots of other data structures that could work here, but an object is a nice way to associate a hexagon with a value.

I'm gonna render a map. As I said, I won't go into great detail on exactly how this works. Some of it is Observable-specific; it's taken from the Observable tutorial on how to use Mapbox GL. But basically I'm making a new map, I'm attaching it to this container that I made up here, and then I'm using these renderHexes and renderAreas functions, and that's where the actual meat of rendering the hexagons is gonna occur. The very TL;DR here is that each of these is making a GeoJSON object out of the hexagon map, and then it's using Mapbox's built-in styles to render the GeoJSON on the map. So for every set of hexagons (in this case, to render the hexagons) I use this utility library. This is another open source library that Uber put out, called geojson2h3. It's fairly straightforward: all it really does is wrap the core H3 functions to make dealing with GeoJSON slightly nicer. So this h3SetToFeatureCollection basically gives you one feature per hexagon, and then we can assign some properties to each feature. In this case I'm assigning the property of value to that value that I precomputed earlier. I add this GeoJSON source to the map, and then I add a rendering layer. Mapbox has these two ideas in its style rendering: there's the source, which
is your actual data, and the layer, which gives you paint properties and allows you to render one source multiple times in different ways. I'm assigning it some basic paint properties, and I assign some more paint properties down here. This stops syntax is a computed color, basically, where I'm saying use these three colors: if it's zero, use the first one, interpolate between 0 and 0.5 to get to the second color, and so on to the last color. And then I can also change the opacity and various other properties. So that gives me these colored hexagons, and if I re-render this set of random values, I'll get different hexagons colored in a different way, but using the same map scale and re-rendering the map.

What I'm doing with the outlines (and I'll go through why this might be useful later) is, in this case, saying I actually care most about values above a certain threshold, and I care about the regions that those hexagons define. So when I go to render the areas, I'm going to do this filtering operation on my hexagons. I'm gonna take only the ones above a certain threshold, and then I'm gonna use those to do an h3SetToFeature, which will basically turn those H3 hexagons into a polygon describing their outlines. And now I have this nice way to paint an area on the map describing a set of hexagons. I won't go through it in detail, but I use the same basic premise of adding the GeoJSON as a source to the Mapbox GL map and then adding some paint properties for it in a layer. I'm also referring, sort of under the hood, to my basic configuration object, which is just some configuration for this page, and to this set of dependencies. I probably should have covered these up front, but they're sort of below the fold because you usually don't have to think about them much once you load them. Mapbox GL obviously is the map library, h3 is the h3-js library, and then geojson2h3, as I mentioned, is this
utility library for dealing with GeoJSON. So that's the very basics of rendering hexagons. As I said, there are multiple ways to do this. This is using built-in native Mapbox rendering for GeoJSON, and there are a lot of tutorials about how to do that. You can also use deck.gl's rendering for GeoJSON, or you can use other approaches, for example using instanced values in deck.gl, and there are trade-offs between these. I tend to think that the Mapbox one is easiest if you're not planning to do any animation or things that need to respond quickly to user input. The main downside of the Mapbox stuff is that, while it's extremely performant for things like pan and zoom, it can't animate very quickly because it requires a whole sort of map refresh. But for most semi-static data sets the Mapbox tiles are a perfectly good way to go, and it's nice because they don't introduce any additional dependencies on something like deck.gl.

So this is fine and good if you have your data provided in hexagons, as this little hexagon-to-value map, but you're probably not going to be using input data that comes in hexagon form, unless perhaps you work at Uber and you're producing a lot of hexagon-based data at the outset. If you're using public data, for example, most of the things you're going to be dealing with are probably going to come in as polygons or come in as points. So the next question is: how do we take some data set that's encoded using a different geo data format and get it to hexagons? The really nice thing about using a geographic grid system at all, and H3 falls into this category as well, is that once you have multiple data sets moved into a grid system, you've got a common unit of analysis you can use for everything. That allows you to join data sets that are otherwise difficult to join. It allows you to do things that require a unit of analysis with equal area, so if you want to, for example, take some population of polygons and spread it across different pieces of equal area,
the H3 cells give you a really nice way to do that. And it allows you to express it at different resolutions, so in this case this is resolution 8, but I can re-express this at finer granularity, down to whatever granularity makes sense for me. The trade-off is usually one of accuracy versus processing time: the more hexagons you have, the harder it is to render them and the harder it is to do the data processing. If I want to show this at meter resolution, I couldn't do it in the browser, but if I want to show it at roughly city block, I can do it fairly quickly. So you have to make your own trade-off decisions about what resolution to use. The other question, of course, is whether you have some data that's coming in at the hexagon level and you need to align with that resolution. Your mileage may vary depending on your own application, but it's nice to be able to choose a different resolution that works for you.

So the data I'm using here is polygon data that comes from Uber's Movement data set. This is a data set of travel times, and in this case I've grabbed travel times specifically for Oakland. I think these are called (now I've forgotten it) travel analysis zones. They're some city-planning-defined polygons, and I've grabbed travel times from Oakland to downtown San Francisco. Each of these is a polygon, and each of them has some properties, including this travel time in seconds from Oakland to downtown SF. So what I want to do is convert these all to sets of hexagons, and I'll do that over here. It's really pretty straightforward: I take each of these features and I run it through the inverse function from geojson2h3, featureToH3Set. That's going to give me a set of hexagons that falls within that polygon. One really nice feature of the polyfill function: the way that the algorithm works is very simply to take the center of the hexagon, and if the center of the hexagon is inside the polygon, then
it is included in the set, and if it's outside the polygon, it's not included in the set. That means that you will have some hexagons that go outside the polygon and some hexagons that are inside the polygon, but it has this nice feature that if you bring in contiguous data, like polygons that all align perfectly with each other, you'll come out with contiguous sets of hexagons. There's no hexagon that's going to be assigned to two different features here. And so in this case all I'm doing is making the same kind of hexagon-to-value map, and I'm just copying the travel time here from the polygon to my hexagons. This makes a lot of sense if you're dealing with rate data, something that's evenly distributed across the polygon, where the value is the same at every part of the polygon. If we were dealing with something additive, like population for example, then I would have counted the population as many times as I have hexagons, which is probably the wrong choice, and I might want to divide by the number of hexagons I have. That's one approach, for example. So if I was using population here, I'd probably use this, you know, divided by hexagons.length (oops, see how bad my one-hand typing is), which would give me values where the actual value was a little bit closer to what you might have in each hexagon. As a side note, this is something quite hard to do if you're not using a grid system: if you don't have equal area, then you have to start weighting each subpart of your polygon with its area, and you have to compute the area for every part of the polygon. It's really easy with hexagons because they all have the same area.

The last thing I'm doing here is, as I said, normalizing it, and that's just a quick process to take all values and move them between zero and one. That's sort of a convenience function for me because it makes it easier to render: I don't have to compute this domain when I'm going to change my color styles.
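
The two choices described above (copying a rate-like value to every hexagon versus dividing an additive value among them), followed by normalization, might look roughly like this. The function names and the placeholder hexagon ids are assumptions for illustration, not the notebook's code; in the notebook the hexagon list would come from featureToH3Set.

```javascript
// Hypothetical helper: build a {hexagonId: value} map for one polygon.
// Rate-like values (e.g. travel time) are copied to every hexagon;
// additive values (e.g. population) are split evenly between hexagons.
function assignToHexagons(hexagons, value, { additive = false } = {}) {
  const perHexagon = additive ? value / hexagons.length : value;
  const layer = {};
  for (const hex of hexagons) {
    layer[hex] = perHexagon;
  }
  return layer;
}

// Hypothetical helper: rescale all values in a layer into the range 0..1.
function normalizeLayer(layer) {
  const values = Object.values(layer);
  const min = Math.min(...values);
  const range = Math.max(...values) - min;
  const normalized = {};
  for (const [hex, value] of Object.entries(layer)) {
    // If every value is identical there is no spread to rescale; use 0.
    normalized[hex] = range === 0 ? 0 : (value - min) / range;
  }
  return normalized;
}

// Placeholder ids stand in for real H3 indexes.
const travelTime = assignToHexagons(["hexA", "hexB"], 600);
const population = assignToHexagons(["hexA", "hexB", "hexC", "hexD"], 1000, { additive: true });
const scaled = normalizeLayer({ hexA: 10, hexB: 20, hexC: 30 });

console.log(travelTime.hexB); // 600
console.log(population.hexA); // 250
console.log(scaled);          // { hexA: 0, hexB: 0.5, hexC: 1 }
```
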
But it also makes it a lot easier if you want to bring multiple data layers together, and I'll get to that a little bit further on. So... oh, yep, thank you. I took my... well, I'll move on in a minute. But now I have my polygons. I think I screwed them up when I started playing with the division. Revert: it's a nice feature of Observable that you can revert to the last published version. This is what I want.

Great for polygons. What happens if you want to use point layers? In this case I might want to think about BART stations, for example. BART stations come in as points. I have the locations for them; they're not polygons necessarily, and I might want to show them on a map. Now, obviously, I start off just looking at what's the hexagon that every BART station falls into. So I'm loading, again, a GeoJSON feature collection. Every item in this feature collection is a point, has a lat/long, and has some additional information about the BART station. They don't have any data associated with them; I'm just going to treat each one as a single value of one, and all I'm gonna use here is this sort of countPoints thing. So I go through every feature, I find the lat/long, encode that to an H3 index, and I add one to that index. Now, you can see that most of these just come up with a value of 1. In some of them you might have places where you have two BART stations, or in this case I think I might actually have entrances as well as exits or something like that, so I might have two things in the same hexagon, or that might be 19th Street and 12th Street, not sure. But you end up with binning, so instead of just having a single hexagon for every BART station, some hexagons will get counted with more than one station, especially if you start making the hexagons larger. Now, especially if I have fairly fine-grained hexagons, this isn't gonna be super useful for combining with other data sets, because I only have a single hexagon for every BART station, and
depending on what kinds of analysis I'm interested in, I might care more about, you know, the distance from that BART station. I might consider a BART station as having a radius. In the little example that I'll give after this, this sort of sample app, I care about walking distance to a BART station, and obviously this data set is only gonna tell me information if I live right in the hexagon or if I'm walking right from the hexagon. So in this case I want to do what in GIS terms is usually called buffering: spreading out the data for that BART station and putting it in a larger radius. This is something H3 makes very, very simple. It can be difficult in other kinds of systems, and it's not great even in a square grid system. But if I want to buffer the points, I get something like this, so now each of these is buffered. All I have to do is take the kRing for each one. The kRing, remember, is that filled ring around a central hexagon. So my bufferPoints function looks really a lot the same, except instead of taking only the H3 index for each station, I'm taking the ring for a given radius, and I'm adding every hexagon in that ring to my outcoming data layer.

You might notice that one issue here is that if I bring in a radius as a number of hexagons, say I want a radius of two, then it's going to be very dependent on my resolution. If I want to have a common distance radius, I could do that pretty easily, and I made a little utility function here called kmToRadius. All it does is take this H3 edgeLength function, which is basically a metadata function provided by the H3 library that tells me what the edge length is in kilometers for any given resolution, and now I can convert some kilometer input into an edge length, and I can use that for my radius. So here I'm using this kmToRadius, and I'm saying I want everything within approximately one kilometer of my BART station. Now, you can imagine with something like walking distance, or, as it happens,
with something like surge, you might want to have some decreasing function as you get further away from the point. And again, this is something that otherwise might be quite difficult. H3 actually makes it really simple, because we have these functions to give you increasing numbers of hollow rings out from that point. So if I change this from my bufferPoints function to this bufferPointsLinear function, I can get these decreasing areas of influence out from every BART station. This might be something where I think that the value increases the closer you get to the BART station, because I think BART stations are better the closer you are to them, and over here the influence isn't very useful. I can use this to overlay them and get these useful points of density where you're close to a lot of BART stations. The way I'm doing that, again, is fairly straightforward. Now, instead of taking a single ring, I use this kRingDistances function, which is going to give me the set of concentric rings coming out from that single origin, and then for every ring in that set I'm going to add the ring to my data layer with a decreasing step function. So I'm basically taking the distance times a step and removing it from my data value. You can easily imagine there are lots of other functions we could use here. We could use other kinds of interpolation; you could use basically any easing function. It's the same thing as you would use for animation easing or any other kind of transformation. But in this case I'm just using a linear one, because I'm lazy and it was simple and perhaps easier to explain.

So now I've got a few different ways that I can represent points and I can represent polygons, and what I wanted to do in the last part of this is go through a little app that I made as an Observable notebook demonstrating suitability analysis. Suitability analysis, if anyone has a GIS background, it's this fairly
straightforward analytical approach. I like it as I'm not a data scientist, and it appeals to me because it's really pretty simple. You just take a bunch of data layers and you smush them together. You add them up vertically: if you think about them as a layer cake, you add them up vertically and you show the result of that vertical addition on the map, and then if you care more about certain layers, you can weight them more or less heavily. One of the things I really like about this is it gives you this nice subjective ability to say, well, I care more about this than that, and I can see what happens when I change the weights of my different inputs and make different decisions based on that.

Suitability analysis in the GIS world is very commonly used. I think the classic example is store locations. You know, I'm Starbucks and I want to know where to put my next Starbucks, and I might want to be far away from other cafes, or I might want to be close to other cafes, which I believe is what Starbucks used to do, right across the street from your competitor. Or I have other things that might matter to me, and in fact I probably have this whole set of parameters that I want to consider: I want to be close to cafes, I want to be close to public transportation, I want to be, you know, whatever you have. Maybe I want to be in high-income neighborhoods or something, if you're Starbucks; I have no idea. Basically, a suitability analysis allows you to take all these different inputs of interest, mash them together, and come up with some composite view of the geographic area you're considering. H3 makes this really easy because it makes it very simple and straightforward to join geographic data sets. You put them all in this common unit of analysis, a hexagon at a particular resolution, and then you can just sort of mash them together. So this map is my suitability analysis, very ad hoc, for places I might want to live in Oakland. I like Oakland.
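
A hedged sketch of the layer-cake addition described here: each layer is a map from hexagon id to a value, and each layer carries a weight. The `combineLayers` name, the layer shapes, and the placeholder ids are assumptions rather than the notebook's actual code, and the final normalization step is left out.

```javascript
// Hypothetical helper: add up weighted layers hexagon by hexagon.
// Each entry is {layer: {hexagonId: value}, weight: number}.
function combineLayers(weightedLayers) {
  const combined = {};
  for (const { layer, weight } of weightedLayers) {
    for (const [hex, value] of Object.entries(layer)) {
      // Hexagons missing from earlier layers start from 0.
      combined[hex] = (combined[hex] || 0) + value * weight;
    }
  }
  return combined;
}

// Placeholder ids stand in for real H3 indexes at one shared resolution.
const cafes = { hexA: 1, hexB: 0.5 };
const crime = { hexA: 1, hexC: 0.5 };

const suitability = combineLayers([
  { layer: cafes, weight: 2 },  // cafes count double
  { layer: crime, weight: -1 }  // a negative weight makes crime count against a hexagon
]);

console.log(suitability); // { hexA: 1, hexB: 1, hexC: -0.5 }
```

Because every layer shares the same hexagon ids, the join between data sets is just a key lookup, which is the point being made about a common unit of analysis.
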
It's a good place to live, and some places in Oakland are better to live than other places in Oakland. I picked some relatively random layers of things I might care about; again, very idiosyncratic. I've got some crime. I've got schools, because I have kids and I might want to live near a school. I've got BART stations, because I want to be close to BART. I'm taking that travel time that I had, because I want to be able to have low travel times into the city, because I work in the city. And I also really like cafes, so I added some restaurants and cafes to this as well. What I've done here is to give you a little weight slider for each one, so if I decrease the weight of schools I get a different picture of the city overall. Maybe I care less about being close to BART because I know I'm always gonna have to drive, or maybe I care inversely less about travel times because I know I'm going to take BART or I know I'm gonna bicycle. So this gives you the ability to quickly investigate a bunch of different layers together.

Just to go through it in a little bit of detail: I'm basically using the same techniques that I just used to encode all of my data into hexagons. Some of these are not coming in in GeoJSON format, so I'm doing different kinds of transformations, but they're all using pretty much the same idea. I'm creating this normalized map with values between 0 and 1. For crime, I'm adding 1, but then I weight it negatively, because crime is bad, so I don't really want it added as a good thing to my map. For schools, I'm giving a hex ring around it, because I care about being not necessarily right in the same hexagon as the school, but close to it. For BART, I'm using the same bufferPointsLinear function that I demonstrated in the last one. For travel time, I'm doing exactly what I did in the polygon example, except, again, I'm taking the inverse of the travel time, because I want lower travel times to be better. And finally, for
cafes I'm just, again, taking only the individual H3 index, because I want to be right next to a cafe. I want to be able to walk to it every day, sit there, drink my cappuccino.

Now that I have all these layers, I can basically just add them together. So here I have this array of all my layers, and I'm associating each layer with a weight. Obviously these could just be numeric weights. The nice thing about Observable is it gives you this ability to really easily make a user input where you can use the resulting output in the rest of your sheet. So each of these is just this little slider input, and this output variable, crime weight, is a number that this slider represents. So I have this set of all my layers, I've associated a weight with each one, and then I just run them through this combination function. The combination function is really straightforward. It's basically a reduce, which I do here as a forEach. I'm gonna run through every map layer, in each map layer I'm gonna run through every hexagon, and I'm gonna put it into this combined map with the value associated with that hexagon weighted by the particular weight I've given that layer. I'm gonna normalize the results so I have values between 0 and 1, and I have this resulting map. I can see this map at different levels of granularity very easily. Because none of my data comes in as hexagons, it's very easy for me to change the granularity on the fly and make different choices, and I can adjust the weights and then I can investigate. So when I adjust it very tight, I get this highly granular picture that might be too granular if I'm looking for an apartment or a home. When I get a little bit less granular, I get this sort of broad heat map of the city: places I might want to live, places I might be less interested in living. And it also gives me some sense, when I combine this with my own intrinsic knowledge of the city, it might give me some sense of what
data sets I might be missing. You know, here we get very high weights around Oakland city center. That's great, Oakland city center is great, but maybe I don't want to live right downtown. Maybe I care about living in a less densely populated area, and I could bring that in as a separate data layer if I want to bring in, for example, census data. And again, right here I'm still including all this BART stuff. If I only care about Oakland, I could also run this through a filtering pass. With some of the analyses we do at Uber, we basically have a mask of hexagons. One common use here is I don't want water area in my resulting data set, so I'm gonna run it through this mask, which is basically a set of hexagons I'm gonna filter out. The really nice thing about using something like H3 from a computational standpoint is that these are all just set operations. I can really easily take data out or add new data in just by running it through a union or other kind of set operation.

And down here I have all the sources for all the data I'm using here. Again, these are somewhat arbitrarily chosen for the example, but you can imagine taking in as many additional layers as you want. I had to actually resist the urge to just keep on bringing in more and more layers, because once I have the basic setup here, it's really simple to add more and more modular layers and start getting more and more levels of detail. So that basically covers the suitability analysis. The nice thing about this is I've done this in the GIS world using, you know, Esri and ArcGIS, and it can be very difficult to do, because when you are mashing together polygons and raster data you have to worry a lot about map projections. When you're doing points, you have to buffer them, but you might not be able to do this increasing step function; that can be very difficult to do unless you convert everything to a raster. H3 makes it quite simple because it
just gives you this single unit of analysis at a level of granularity you choose (whoops, I don't know what just happened, but I lost it) and you can add them all together. Apparently I just lost my Zoom connection, but that's convenient, because I'm also at the end of my presentation.
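
The mask filtering and set operations mentioned near the end of the talk could be sketched as follows. The helper names and the placeholder hexagon ids are assumptions for illustration; the only idea taken from the talk is that a mask is a set of hexagons to filter out, and that combining hexagon sets is an ordinary set operation.

```javascript
// Hypothetical helper: drop every hexagon that appears in the mask.
function applyMask(layer, maskHexagons) {
  const mask = new Set(maskHexagons);
  const filtered = {};
  for (const [hex, value] of Object.entries(layer)) {
    if (!mask.has(hex)) {
      filtered[hex] = value;
    }
  }
  return filtered;
}

// Hypothetical helper: union of two lists of hexagon ids, without duplicates.
function unionHexagons(a, b) {
  return [...new Set([...a, ...b])];
}

// Placeholder ids stand in for real H3 indexes.
const combinedLayer = { hexA: 0.9, hexWater: 0.4, hexB: 0.2 };
const landOnly = applyMask(combinedLayer, ["hexWater"]);
const studyArea = unionHexagons(["hexA", "hexB"], ["hexB", "hexC"]);

console.log(landOnly);  // { hexA: 0.9, hexB: 0.2 }
console.log(studyArea); // [ 'hexA', 'hexB', 'hexC' ]
```
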
Info
Channel: Uber Engineering
Views: 10,448
Keywords: Uber, Uber Engineering, Engineering, Software
Id: BsMIrBHLfLE
Length: 38min 26sec (2306 seconds)
Published: Tue Nov 20 2018