Creating Beautiful and Meaningful Visualizations with Big Data

Captions
Before I introduce our next and final speaker: what do you think, is she going to have a word count in her slides? Yeah? How many of you think yes? Okay. All right, we are in for a treat. Our next speaker, Shan He, is a senior data visualization engineer at Uber. She's a coder, a designer, and a data artist. Shan is the founding member of Uber's data visualization team. At Uber she builds data tools and platforms to make business intelligence easy to access, and creates exploratory data visualizations to facilitate data analysis and modeling. So you're really in for a treat; there are a lot of cool moving parts that are going to be on the screen shortly. Prior to joining Uber, Shan studied at MIT and received a master's in design computation while conducting research at the MIT Senseable City Lab. A data visualization specialist, her work was exhibited at Milan Design Week in 2013, Google I/O in 2013, and Venice in 2014. So let's welcome Shan He. [Applause]

...and people come to me asking, do you plot charts, or do you do infographics, or do you do dashboards? Usually the answer is yes to all, just because data visualization is really an expertise where we help people better see their data, and we have all kinds of designers and software engineers now working in this domain. So today, for those of you who may not know what data visualization is, I'm going to do a little bit of an introduction. I will also go through some of the working process, how a visualization expert works, and then I will show some of the work we do. In the end I will also show one of the tools that will help people better visualize their data.

A little bit about myself: I'm a senior data visualization engineer at Uber. My background was actually in architecture, the kind that actually designs buildings. The funny thing is, when I graduated, from an architecture background with a computer science degree, I put the keywords "software engineering architecture" into a job search website
and none of the results that came back was about architecture. It was like, "software architecture, is that what you mean?" So, no. Sometimes when people ask me what I do, I say I build visualizations, and sometimes I actually build tools to build visualizations, and some days I actually have to build tools to build tools for visualizations. I mean, that's just what software engineers do every day, right?

Also, I actually use JavaScript for big data. Sorry, Holden, I altered the slides while I was listening to your talk. The reason is that if you want to visualize data in an interactive way, many times you want some kind of web application, and that's the part where our expertise might come in handy. We can talk after.

At Uber we have a lot of data we need to make sense of. Uber's business is based on our data, which is one of our biggest assets. So the data visualization team has a lot of challenges, one of them being that every day we're collecting millions of trips, billions of data points. How do we help our engineers, our PMs, and our developers make sense of what is actually behind it, so that we can make our business better?

When it comes to visualization, people ask me why it is important. This is the example I always like to show. When I first joined Uber, about four years ago, everyone was obsessed with numbers. They put as many zeros as they could, as many commas as they could, in their reports, and everyone got excited just by seeing a number. Then they came to me saying, "Hey, what can you do with this?" So when I first joined, I tried to make them realize how important visualization is by taking one day of trips in San Francisco and making this map. I should have said, if you sit closer you'll see it better. Every single
line on this map is made of one single Uber trip, and this is just one day. You probably know San Francisco, and you can tell by the pattern that the Uber trips of just one day cover every single street corner of San Francisco. What's really interesting is that, because the data is completely generated by our drivers and our trips, with no massaging put into it, it can show you a lot of information. For example, the reason this line is really bright compared to the others is that it's a highway. On a highway you have more lanes and more cars, and once you have more lanes, more cars, and more trips, you have a density of data points overlaid on top of each other. That's why you get a lot more contrast over there.

Another interesting thing I like to show (we'll share the deck in the end): here you have a lot of zigzags. The reason is that this is where the Financial District is. You have a lot of tall buildings, and they cause the GPS to lose accuracy, so the lines we draw here are really noisy. But that's very valuable, because our engineers can take a deep dive into that area to see where we have more GPS errors, and that's probably where we have a hard time providing a smoother pickup experience.

Everyone got so excited about this map that the San Francisco office printed it out and put it on their wall. So I began to do more. This is New York. This is Mexico City, and this is my favorite map, because in this part you don't have any straight lines, not because that's just how they built the city, but because this part has contours. Those are mountains. You can't really build a straight line on a mountain, but you can build roads that follow the curves of the hillsides. And this is Paris, one of my favorite cities, and this
is London. If you haven't been to London, it's really big. London is huge, and this map literally covers about 50... I don't know the number. This map literally covers every single district of London. On top of that, I like making things move. I took that one day of London trips and made this animation, where every single moving line is one trip. It's just one day, really playing back every single moment of London in 2016. By looking at this, you immediately get an idea of the scale of our operations. Instead of those numbers with many zeros, this gives you a much clearer idea of what our data looks like. It's not the end of the presentation yet.

Starting from there, the data visualization team makes as many maps as we can, just because a lot of Uber's data is geospatial. Oh yeah, you remember I'm going to talk about the tools? Yes, I'm going to talk about that later.

We make maps to gain insights, and we make a lot of maps. This is my favorite one. Here I tried to show how different regions of the city connect with each other, so I drew one single line between the pickup and the drop-off of each trip instead of drawing the actual path. By doing this you see this kind of center here; that's probably a satellite city, or that's the airport. You see how different parts of the city connect to each other just by looking at this map.

This is one I did for New York, trying to show density. I added a third dimension, 3D, to this map, and I had to replace the Uber data with taxi trips, but this is the kind of map we usually make to show density. I also have this map showing the actual GPS pings in Jakarta, binned into circles of about 100 meters, so the brighter the color, the denser the GPS pings. And this is one of my
favorites. You may not be from New York, but you may know this NASA map, I think it's called the Dark Night, where they made a map of the entire Earth with the lights being where the street lights are, so you're looking at the cities from space and you see them light up. This map is made of all the Uber app activity in New York City in one day. I call it the Uber Dark Night. Just by looking at it you can already see the entire island of Manhattan light up, just because that's how many people are looking at the app and using it.

And this is actually one I made as a study. I just really love maps. I mean, I'm an architect, right? I have a thing for physical space. For this one I got the street trees of San Francisco from the San Francisco government open data portal, and I drew the density of where the most street trees are. This one really lights up. I don't know the name of the road, but it's somewhere around the Inner Sunset; that's where you have the greatest abundance of street trees. And this is one of the contour maps I did, just by drawing the hills, the contour lines of San Francisco, and this is where Twin Peaks is. Okay.

But now, going beyond beautiful visuals. This is where I have to drop the mic. At this point you're thinking, oh, she just draws pretty maps. At the end of the day, what I actually hate the most is someone coming to me saying, "Hi, I love your maps, can I use one as my screensaver?" I get that a lot. The reason is that I think visualization is not just about producing something pleasing and beautiful. It's about helping you see something you cannot see from the zeros, and once we help you with that, you can actually
gain insights: you can find patterns, you can find anomalies, you can discover the stories behind the data. That's what I'm trying to say here. I can show you twelve maps and you get excited, but what I really want to show you is how we process the data, create these maps, and tell you something about the data.

We create visualizations to gain insights from data, but sometimes this process can be really painful. If you walk through the Uber office and see some people with fancy maps, you think, oh, those are the people from the data visualization team, that's all they do. In theory, you think they write code, they run it in a terminal, and something nice just shows up on the screen. In reality it's more like this: nine out of ten times the code doesn't run, you try and try for a whole day looking at console errors, and in the end you finally have something showing on the screen, but you don't know what it means. So this is our day, actually.

Designing meaningful visualizations always requires a lot of trial and error. Usually the process is: you have some idea, you draw some sketches, you build some mock-ups with Photoshop or Illustrator, you prototype it, writing small programs in JavaScript and making changes to see how it looks, and in the end you take it into production. But in reality you get stuck in this infinite loop over and over again. Sometimes you don't even get to production, because there's just nothing there for you to show. You cannot fake it, because it's all based on data.

As an example of that whole over-and-over process, I'll show you this task I got: to create a visualization for UberPool. The hypothesis of UberPool is kind of simple: you have fewer cars on the road when people are pooling, so it's going to be more efficient. I took that hypothesis and started my data
collection. I took one day of UberPool trips in San Francisco, and for all the people who pooled on a single trip I got the four points, the two start points and two end points, and ran them through OSRM to get the routes, because I wanted to simulate what the trips would look like if those people didn't pool and rode separately. Then I wanted to use the comparison, to do two visualizations side by side, so that maybe on the UberPool side I would see some signal. That was my thought process.

So I started doing things. First I collected a day of trips and ran them through OSRM. I use Python as well, by the way, so I did all the data massaging and processing in Python and cleaned the data up the way I wanted. Then I started doing some fun things.

For the first try, since I like to make things move around, I just animated all the trips in those two datasets and had them move around on canvas. The result wasn't that good. I thought I would see that on this side there should be a lot of volume and on the other side there should be less, but it didn't show me that at all. When I thought about it, the reason was that I was only using a single color, and you can overlay a single color on top of itself one time or a hundred times and it's still going to look exactly the same.

So this didn't work. The second time I used a software called QGIS. Instead of drawing the paths, I grabbed all the GPS pings I could get from the two datasets and plotted them as static maps. Again, this didn't look good. I couldn't see any density. I know there are a lot of trips along the highways, but I just couldn't see a thing down there.

So I had to go back to my sketchbook. I started thinking: what is the key difference between those two operations? If I can find this key difference, I
can use contrast to make it more obvious. The key difference, obviously, is traffic, right? On one side you have twice the number of cars. So maybe color is a better way to represent this.

What I did was write a Python script that calculated, for every five minutes, how many trips pass through each street segment. I aggregated the number of cars on each street segment, used a color scale to represent the numbers, and mapped those numbers back to my trips. By doing that I was able to paint maps like this using color: the darker it is, the fewer cars there are; the brighter it is, the more cars there are on that street segment. When I started seeing some difference in contrast, I applied the same to the other map. This became more promising to me, because now you can see the difference: this part is lighter, with more cars, and this part has less. So I think using color is the right way to go.

Then I started to play around with the scales. In the end, instead of using a linear scale, I started using Jenks natural breaks to calculate the colors. And in the end I made a video again. I added motion to the static maps, as I always like to do. Now I can see a lot of difference here.

Once I painted all my trips based on traffic volume and started animating them, I got a really promising result. Basically, I merged both datasets together, so I'm not normalizing within just one of them. I merged the two and calculated the domain across both, because for each street segment I calculated how many cars there are. So there are more individual drivers without pooling, which was the hypothesis of this visualization, right? With all the other methods I wasn't able to see the signal, but now that I use color and animate, I'm actually able to see the difference. So, yeah.
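The counting and classing steps described here can be sketched in Python, the language the speaker says she used. This is only an illustration under stated assumptions, not the script from the talk: the record layout `(trip_id, segment_id, epoch_seconds)`, the function names, and the idea that GPS pings have already been matched to street segments are all hypothetical. Jenks natural breaks is written as a plain dynamic program that minimizes within-class squared deviation, which is fine for small inputs but too slow for millions of values.

```python
from bisect import bisect_left
from collections import Counter

WINDOW_SECONDS = 5 * 60  # five-minute windows, as in the talk


def trips_per_segment(pings):
    """Count distinct trips per (segment_id, five-minute window).

    `pings` yields (trip_id, segment_id, epoch_seconds) tuples; this record
    layout is an assumption made for the sketch.
    """
    seen = {(trip, seg, ts // WINDOW_SECONDS) for trip, seg, ts in pings}
    return Counter((seg, window) for _trip, seg, window in seen)


def jenks_breaks(values, n_classes):
    """Return the upper bound of each class under Jenks natural breaks."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in O(1).
    total, total_sq = [0.0], [0.0]
    for v in data:
        total.append(total[-1] + v)
        total_sq.append(total_sq[-1] + v * v)

    def deviation(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        s = total[j] - total[i]
        return (total_sq[j] - total_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + deviation(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i

    # Walk back from the last class to recover each class's upper bound.
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = start[k][j]
    return breaks[::-1]


def color_class(value, breaks):
    """Index of the color class for `value` (0 is the lowest class)."""
    return bisect_left(breaks, value)


def shared_breaks(pooled_counts, solo_counts, n_classes):
    """Compute one set of breaks over both maps so their colors are comparable."""
    merged = list(pooled_counts.values()) + list(solo_counts.values())
    return jenks_breaks(merged, n_classes)
```

Computing the breaks over the merged counts, rather than per map, mirrors the point made in the talk: both maps share one color domain, so a brighter segment on one map really does mean more cars than a darker segment on the other.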
So this is the common process of geospatial visualization. We get a bunch of datasets, we start processing them in Python or Node, and then we apply all these different visual encodings: we choose different visualization layers, we choose colors, we apply scales, we show different dimensions. In the end we get a sense of the different visualizations by going through these steps over and over again.

The idea behind this is that the UberPool visualization probably took me two weeks. I used all these different tools: D3, canvas, QGIS, Python. But in real life we want everyone to be able to do this without having to know all of these, like JavaScript. We want people to quickly apply all these methods in one single tool, so that they can quickly get some insight and then move on to another dataset, or go deeper once they see a signal. Job security aside, I don't want to spend two whole weeks on just one visualization, and we want everyone to be able to do the whole process without learning all those different languages. That's why we built this tool called kepler.gl, and all the maps I showed you before were made with this tool.

We built kepler.gl so that you can quickly explore millions of geospatial data points right in your browser. If you use ArcGIS or Photoshop, it has the same kind of layered approach: you throw your dataset in and apply some of the preset layers we give you, like a hexbin or a scatter plot, or you can add 3D buildings. The idea is that you can apply different colors to your datasets, play around with the scales, play around with sizes, and see which one shows you more signal. It's WebGL-based, because anything big data in the browser needs WebGL
and the GPU. You also have all the preset map visualizations, so you don't have to write your own, and it does client-side, on-the-fly geo aggregation maps: hexbinning, grids, things like that. The idea is to make data-driven maps effortless, so anyone can have fun, play with geospatial data, and create beautiful maps.

The basic flow is that you upload any kind of dataset you have to the browser. Usually the browser can handle up to about 500 megabytes to one gigabyte of data; obviously, do some cleanup before that. Then you apply the filters the tool provides, based on whatever data types you have, and then you can create all the different map layers and interact with them. With that, from one single dataset, you're able to create all kinds of visualizations using the preset functions.

Now a little bit of a demo. I'm not doing a real-time demo, because sharing and recording my screen is going to hurt the GPU. I'm going through a couple of steps: first I will show you how to upload data, second filtering, third interaction, then I'll show a little bit of aggregation, and then some layer types.

So, upload your data. You probably know that most geospatial data comes in CSV, GeoJSON, KML, and shapefile. A sample CSV looks like this: you have all the rows, and in this case it's trips in San Francisco, so each row is one single trip. You have data types like timestamps, geo points (lat/long), and metadata, things like fare and distance. All you need to do is drag and drop. You drag the CSV in there, and because we built it to try to find geo points in your dataset, we're able to find that you have these begin-trip lat and begin-trip long columns, so we automatically draw the points for you. Those are two hundred thousand trips, and each of the
points is where a trip begins. Because it has these visual channels, you can play around: you can change the colors, change the size, and draw multiple layers at the same time. Right now I'm showing both the begin-trip points and the drop-off points, and I applied an additive blending method so that the denser it is, the lighter it is. Keep in mind this is two hundred thousand trips with two points per trip, making four hundred thousand points. With WebGL we're able to render all four hundred thousand points on the map, and you can smoothly zoom in and out pretty easily.

But that's just Photoshop, right? You don't just want to look at things on the map; you want to find something in it. So we have this function called filtering. Filtering is based on whatever data types you have in your CSV columns. We automatically detect the data type. In a CSV, everything is a string, so it parses all the strings and tries to make sense of each column. If you have something like "2018-10...", we think, oh, that's probably a time string, and if you have something that looks like San Francisco, 37-point-something, 122-point-something, oh, that's probably geo points. We're able to detect that and allow you to apply filters based on whatever type we detect.

In this case I'm able to apply a time playback to my map, because I have this column called begin-trip timestamp. So let's play. I open the map legend so that you can see that blue is where a trip begins and yellow is where a trip gets dropped off. Then I open the time filter and drag it to select a time window in the morning. Sorry, I was trying to pause the video. So this is in the morning, between about 8 a.m.
to 10 a.m., and the pattern immediately becomes very clear. The yellows are drop-offs, right? You have a lot of yellows around here, the Financial District, and this is our headquarters and Caltrain, and you have all the pickups spread out around the map: the Marina, and then this part. When I start playing back the entire day, looking at about a three-hour window with this time playback, I can observe how the patterns are changing, and it's very obvious when it comes to the evening commute time. Right now the blue is where trips begin, and you have a lot of blues around the Financial District, but the yellows are all around where you used to have all the pickups. So this is fun: just by filtering I'm able to look at different patterns based on time, and I can drag it back to see the difference.

Next, interaction. One problem with big data visualization is that often you can only see a static map. If you want to drill down to every single data point it becomes slow: you either write another query or wait a couple of seconds for the map to update. But because we built this with WebGL, we have real-time interaction with every single data point. We also have this really nice tool called brushing, where you can update the map based on where your mouse position is.

This is very interesting when I'm drawing this layer called arc, which I showed you previously, where each arc connects the start and end of a trip. I start by drawing this arc layer of San Francisco, but at this moment everything is kind of intermingled. You see there are highlights around here and here, but if you want to look at a single street corner or just one region of the city, you're not able to see where
that one connects to. So we built this interaction called brushing. I spend a lot of time playing with colors, just because I want to make it more pleasing, but once you have your map set up, you turn on this brush, and now when I move my mouse over the map I'm able to highlight just the trips that start or end around my mouse position. Usually, with just a static map, this kind of interaction is extremely hard, if not impossible, especially when you have two hundred thousand trips on a map. But with GPU calculation, we embed the calculation into the shaders, on the GPU, so that you can have this real-time and very responsive interaction with data-driven maps.

I was trying to find SFO, just because it's really obvious when you see where a transit hub is. Downtown San Francisco is a big hub, SFO is a big hub, and all the trips that start and end at SFO go all over the South Bay. And here is probably where Oakland Airport is. Yeah.

With brushing, once you click, it freezes the brush. We also have export: all the data that's currently filtered can be exported. That's pretty useful when people, after they filter down to a smaller set, want to look at just that set.

And in the end we have aggregation. How am I doing on time? Okay, great. Aggregation is very important, because up to now everything I've shown you is every single trip, right? Every single arc, every single point. But to study geospatial patterns or trends, it's a lot better to aggregate all the single points into some kind of geospatial pattern. That's why we have these layers called grid and hexbin, heat maps that allow you to aggregate individual points. So I'm turning this
trip scatter plot into a hexbin heat map. Just by changing the layer type I'm able to show this heat map. We allow you to change the radius of your hexbins, so that, based on the size of your map or how much detail you're looking for, you can change the resolution of the aggregation.

Then there's scale. Right now I'm looking at color based on a quantile scale, which puts an equal number of bins in each color. But I was trying to look for anomalies, so I changed the scale to quantize, which is a linear scale in this case. Once I change it to a linear scale, I'm able to see where the highlights are, which in this case is obviously downtown San Francisco.

But you don't want to just see the obvious information, right? I know we have a lot of trips starting and ending at the international airport. That's why we have this function: you can add a third dimension, and you can filter the bins by percentile. We have this percentile slider that allows you to... wait, I'm changing the size of the bins so that I can see what's underneath. This part is very obvious, because that's where the high density of trips is. But I can exclude it: I slide the percentile slider so that I only show bins between zero and ninety percent, and the colors are redistributed. So if you're interested in what's in the lower percentiles of the distribution, it allows you to filter out those outliers.

And in the end, this is just going to be quick: because we have a lot of GIS specialists who are interested in geospatial data visualization, we also support all kinds of geospatial datasets, like shapefile and GeoJSON, so they can overlay more geospatial information onto the map.
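The difference between the two scales, and the effect of the percentile slider, can be illustrated with a small Python sketch. It assumes the usual D3 meaning of the scale names, where a quantile scale places cut points so each class holds roughly the same number of values and a quantize scale splits the value range into equal-width intervals. The function names and the sample counts are made up for illustration and are not kepler.gl's API.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_thresholds(values, n_classes):
    """Cut points that put roughly the same number of values in each class."""
    return quantiles(values, n=n_classes, method="inclusive")


def quantize_thresholds(values, n_classes):
    """Cut points that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def classify(value, thresholds):
    """Index of the color class that `value` falls into."""
    return bisect_right(thresholds, value)


def percentile_filter(values, lower=0, upper=90):
    """Keep only values between the `lower` and `upper` percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo = min(values) if lower <= 0 else cuts[lower - 1]
    hi = max(values) if upper >= 100 else cuts[upper - 1]
    return [v for v in values if lo <= v <= hi]


# Hypothetical trip counts per hexagon: nine quiet bins and one very busy one.
counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

quantile_classes = [classify(c, quantile_thresholds(counts, 4)) for c in counts]
quantize_classes = [classify(c, quantize_thresholds(counts, 4)) for c in counts]

# Dropping the top 10% removes the busy bin; recomputing the quantize scale
# on what is left spreads the colors across the quieter bins again.
kept = percentile_filter(counts, 0, 90)
rescaled = [classify(c, quantize_thresholds(kept, 4)) for c in kept]
```

With these sample counts, the quantize scale puts all nine quiet bins in the lowest class, so the single busy bin stands out, while the quantile scale spreads the four classes across all ten bins. After the percentile filter removes the busy bin, the recomputed scale separates the quiet bins from one another.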
This is where I just drag in an SF contour dataset, where each line is one contour line. Because we allow you to apply visual encodings to numbers, I'm able to color the contours based on elevation, since elevation is one of the fields I have. Once you apply color, it becomes much more obvious. Remember the whole UberPool exercise, where I had to write all that different code? With this I can just drag the data in and color based on traffic or things like that. Yeah.

This is the last one. Again, I'm able to apply filters to my contours based on elevation, so I can see where the high-elevation contours are. Just by dragging the slider I can see the map changing.

So the idea is that we want people to be able to draw maps without having to write any JavaScript. They just need to have a dataset and drag it in.

Since I've run out of time: if you're interested, we use all these different libraries to build kepler.gl. Two of them we wrote ourselves, deck.gl and react-map-gl. Those are all WebGL-based libraries that our team develops, and they're open source. We're also going to open source kepler.gl in April, so if you're interested in using it, you can try this link and we'll have you join the beta testing. In the end, after we open source it, kepler.gl is just going to be a link we give you, where you can play around with any kind of dataset you have.

And finally, about us: we're a team of 30 people now, even though when I started it was just me. We all love hot pot, and we do it every three or four months. We take these family-style pictures all the time. Everyone's happy. If you're interested in visualization, come talk to me afterward. That's it, thank you. [Applause]

Yeah, there's no ceiling; everything is just... So that's where the scale comes in
handy, right? If I use a linear scale, you're not going to see much difference, but this scale applies Jenks natural breaks, where you're able to find the natural distribution and use it to calculate the colors. Sorry, I didn't... Yes. Yeah, but this is one day of trips, so I think those anomalies will just be overlaid by the massive trends. Thank you. [Applause]

All right. So we learned today how to make big data fast, thanks, Holden; how to get out of pickles, from Anya, when you get into errors and so forth; and how to make big data not only beautiful but also tell you stories. I think that was quite an achievement, right? Let's give a big hand of applause to all our speakers.

I'd really like to thank you for coming and being here so late. Hopefully all the topics were interesting and you learned a thing or two. I would like to thank our organizers for tonight, Jan and Jules and team. Just one last announcement before you all head home: Databricks is hiring in every single role you can imagine: engineering, product, marketing, sales, field engineering, and, you know, visualization, yes. If you're interested, we have our awesome recruiting team here. Would you please stand up? All right, I'm putting them on the spot, or in a line. We have Lucy, Priya, Turin, Yvette, and Jessica. These are our recruiters. If you're interested in learning about the positions, we also have them posted online. Please talk to them; they're going to be around for a bit. Or, if you're in a hurry, get their contact information and exchange emails, and you can get the information that way as well. Thank you very much for coming, really appreciated. Cheers. [Applause]
Info
Channel: Databricks
Views: 19,562
Rating: 4.9900498 out of 5
Keywords: Big Data, Data Visualization, Kepler.gl, WiBD, Databricks
Id: Z8E4_rOpbyw
Length: 39min 14sec (2354 seconds)
Published: Thu Apr 12 2018