Data hacking - data science for entrepreneurs | Kevin Novak | TEDxWakeForestU

Captions
Dang, last talk of the [day]. How are we doing, keeping awake? Keeping with it? All right. As Chris mentioned, my name is Kevin Novak. I'm one of the senior data scientists at Uber. I want to talk with you a little bit about Uber, a little bit about data science, and essentially what data is like at the very far end of the entrepreneurial spectrum.

First of all, by a show of hands, who here has not heard of Uber before? Awesome, my marketing team is doing a good job. This is great. So for those of you who haven't heard, and for those of you who need a sort of broad-strokes refresher, Uber is essentially an on-demand transportation platform. What that means is we go into a given city, find delivery services, limo services, taxi services, anybody who does transportation well in a given marketplace, and provide a technology platform for them to run their business on. On the flip side, we provide the smartphone app that you can download, and essentially you push a button and get a ride in five minutes or less. Essentially we're the next version, the next iteration, of transportation.

Travis Kalanick, our founder, likes to talk about Uber as a cross between lifestyle and logistics in the transportation world. Lifestyle is "give me what I want, give me exactly what it is": it's date night and I need a limo; I'm going up to the next TEDx event and I need a ride for me and six friends; it's Saturday morning and I just need to get to the laundromat. Whatever you need, we give you exactly what you want. And logistics being "give it to me right now, whenever and wherever I need it." So that's the company in broad strokes.

Now what I want to do is dig in a little bit more on what it means to do data science at Uber. As you can imagine, when it comes to reliable transportation, much like giving a TEDx talk, timing is everything. Essentially what that means is we've developed a predictive model that says: given a pickup location, a drop-off location, and a time of day, predict how long it would take a given driver
to come pick you up, or how long it would take that driver to cover this distance. For example, from right here at Wait Chapel in Wake Forest to the Greensboro airport on a Saturday at about 3 p.m., it would take you on the order of 29 to 31 minutes.

Now what you're looking at here is a heat map of pickup times for San Francisco. The blue area is the less-than-two-minute times we see downtown; the red areas are the longer pickup times you start to see in the less populated, farther-out suburbs of the city. Those of you, especially in the front row, who are familiar with San Francisco get to see a really neat effect where most of the major streets and intersections in our city just magically appear on the map. And indeed, if you visualize it a little bit differently and take a step back, you can actually see the entire transportation network of San Francisco, visualized just off of the Uber dataset. This is one of the really inspiring pictures; this is like my personal Beautiful Mind moment.

ETAs, of course, are a massive part of what Uber does as far as data science goes, but by no means the only thing. Some of the other major initiatives that we work on and focus on are dynamic pricing, our surge pricing algorithm. Essentially, what we say is: given a certain amount of supply that we've partnered with, and given a certain amount of demand, what is the right price for cars at any one given point in time? We've developed a system to dynamically change prices as the economic conditions dictate, as well as a method to let customers know, and sort of baked that into an entire product.

Other things are our map-matching algorithm. If you open up the app you'll see cars following the roads. That's not magic, that's data science. Essentially we're mapping lat/long points from a given GPS to a road segment. Other focuses include much more traditional predictions. Much like we do prediction of time of travel, given the drop-off location and a pickup location, what's the expected fare? How much am I
going to have to pay? This is a key part of any sort of transportation initiative, and another one of our products.

So that's the big picture of where we're at as a company. We're relatively young: we were formed in about the middle of 2010, so we've been around about three and a half years. And really what I was thinking about as I was preparing this talk is that all of those products I was listing, all the problems that we're trying to tackle, are indeed very difficult, but they're difficult for reasons which most people don't think of when they think of data science. There have been several incredible data science talks given, some at TEDx events, some otherwise; I encourage you to look them up if you're more intrigued about the concept of data science as a field. But really, the commonality that comes out of them is that data science is hard because of the massive amount of information. We're dealing with so much information that your computer's going to grind to a halt, and trying to figure out some sort of cohesive story that you could understand in ten seconds or less is functionally impossible. That is indeed a huge part of what the complexity of data science is.

But at Uber, where our complexity comes from, why the problems are difficult, has to do with the fact that we're in a problem space that didn't exist five years ago. We don't have twenty or thirty years of information to rely on, and indeed nobody's really tried to tackle these problems in the rigorous quantitative way which we [need to run our] business. So what comes out of this is that even though the problem space is very similar to the canonical definition of data science, you know, distilling quantitative answers for very ambiguous, complex problems, especially at an early-stage startup the skill set is actually fundamentally different. And what I was thinking about as I was prepping this talk is that calling myself a data scientist, that's
a bit ambiguous, a bit disingenuous. So, borrowing from the standard startup lexicon, I wanted to discuss the concept of the data hacker with you guys. Essentially what this is, is somebody who focuses on solving data-related problems in a space that didn't exist, as I was saying, that is either very recently formed, or at a company that has only been in existence three, six, twelve months at the time.

So, as with any good talk, I wanted to give you a few habits, a few best practices, some of the tools and methods and habits that I think make us successful at this brand-new field of data hacking. I think one of the best ways to teach any given point is to teach through example, and in this case I'm going to show you an example of a way that we did not do data hacking well. This is a study of our dynamic pricing algorithm, some of the early efforts designed by yours truly, so I get to pick on myself for a little bit.

What you're looking at here is one of the very first iterations of our surge pricing screen that you would see in the app, designed by yours truly. My favorite is the left-aligned icon at the bottom. I'm a data scientist, or data hacker, not a visual designer. One of the great things about this app, besides the fact that it exists, was that it was clearly designed with an engineer in mind, in that the most key piece of information when it comes to dynamic pricing, i.e. the price, is buried in size-12 font in a wall of text. All the engineers in the audience are like, "Yes, this is absolutely my bread and butter."

So we shipped this, ran it across our engineering team, everybody said "Woo-hoo, we are moving the company forward," and we ran an experiment. This was three months of data. Now what you're looking at is prices increasing along the x-axis and the conversion rate increasing along the y-axis, and you're seeing a very peculiar economic phenomenon: as I
increase my prices, more people want my product. I call this the Patrón effect.

Now what's really going on here is there's a hidden variable I'm not showing you guys, and that is economic scarcity. The way the model works is it's only raising prices if there are very few Ubers available. If there are very few Ubers available, I guarantee your other alternatives are nonexistent: there are no taxis, there's no public transportation, it's 3 a.m. and your best friend is not coming to get you no matter how much you beg, plead, cajole, or text him. So what's happening is you have two competing factors that go into anybody making an economic decision. First is how much does it cost, and second is what else are my alternatives. What you're really seeing here is somebody who's not paying attention to pricing. This is a strong signal to us as developers, us as data hackers, that we are not communicating key information well.

So what I did in a flash of insight after seeing this, after putting the cigars away instead of celebrating my financial future, was hire a visual designer. I didn't touch a single line of code, and we redesigned the surge screen to look something very similar to this: a stupidly large multiplier in the middle, in a different color. We didn't touch a single line of the model, and we just said, "All right, I want to see what will happen." And what you see is a fundamentally different user behavior. Now this is nice in that, first of all, it's economically predictive, so now I can actually use pricing as a way to throttle demand and sort of increase supply. The world is rational again; Adam Smith was right after all.

But the key moral here is that the data hacker builds a product, not just a model. An incredibly well-written model that's making all the right recommendations, wrapped in a terrible product, is indeed a terrible model.

The second habit, and this is more about an outlook: conform your science to the startup, and not the
opposite. One of the key buzzwords that people like to throw out is "my startup is data-driven; data makes all the decisions; we're in charge." And that's great for the ego, right? Like, yes, I am the man, everybody else has to listen to me. But the reality is that in a startup, whether you use the metaphor of a fighter jet or a ship or whatever your convenient buzzword is, data science is only going to be making the decision if data science is moving at a speed that can make the decision. I like to describe a startup as a school of fish, where the school obviously has a very directed motion to it, but the decisions are all being made by the fish that happens to swim the fastest and is at the front of the pack. If you want to be a data team that steers your company, that wants to drive data-driven progress, you need to be swimming at the front of the pack.

Data habit number three, and this is for people who want to come into data science. You've read the Forbes articles, you've seen the TEDx talks, you're inspired by what I'm telling you: "I want to be a data hacker." Well, I would say to you: be a boat builder and not a sailor. What that means is you need to cultivate a problem-solving mindset instead of a process-oriented mindset. If you come in and say, "I want to use the latest MCMC algorithm, I know sixteen programming languages and I want to use them all," I'll say, "That's awesome. I don't have a job for you." What I have a job for is somebody who can solve ambiguous, hard-to-understand, complex data problems, because that's frankly what the work is. This is what I'm talking about when I get the programming question. People say, "How much programming do I need to know?" and my smart answer is, "As much as gets the job done," because really that's what's going to make you successful as a data professional.

And finally, data hacker habit number four, and this is where we get a little bit into the philosophy: the role of the data scientist, the role of the data hacker,
at any one given point in time in a company. As the person who's distilling complex information [inaudible], you have an ethical obligation, a philosophical obligation, to not only give the best answer you have, but to also make sure that you communicate your clarity, your confidence, your sincerity in your answer to the person who's consuming your information. What I tell people is: always quote the error bars.

Sometimes it's for yourself. I'm a data person, I do financial modeling, I'm at a startup; of course I want to see things go up and to the right, and 20x, right? That's a very natural perception bias that sometimes you don't even realize you're baking in. So one of the good habits to always be cultivating is, first of all, have a good friend, have a good co-worker, have a pair partner, so that you're always bouncing information off each other, gut-checking it. But also try to remain skeptical of the work. This is just a good, healthy attitude to have in any form of science, data hacking or otherwise.

But when you're talking about quoting an answer to a company, to your CEO, to your CFO, in startups this is not about getting a promotion, this isn't about advancing in the company, and this isn't about getting a big bonus. At the end of the day, startups are the business form of Darwinism, survival of the fittest. Businesses are making decisions on the data answers you're giving, which may or may not put them out of business. It's your obligation to them to communicate the situations where using data is very good, where your solutions are very good, or vice versa, where your solutions are very bad, because frankly your job depends on it. And it's not the promotion; it's that the company goes out of business if you get the wrong answer.

So I want to talk a little bit about what's next for data science, data hacking, and Uber in general. This is actually a promotion we did. I wish I had DeLoreans every day. By far and away
the most successful promotion we ever ran with the nerdy crowd.

But data science is an incredibly ambiguous field. As I was mentioning, ask 15 different people and you'll get 15 different opinions about what a data science person is. I only have three and a half minutes left; there's no way I'm going to try and take a stab at nailing this thing down. Speaking historically, you could argue that phone companies in the '70s and '80s were doing what is called data science. Modern banking was revolutionized in the '90s by something very similar to data science. But really, where we're at right now is that, with the advent of cloud computing, with the advances we're making in processor design and computing horsepower, you have a huge amount of relatively affordable computational horsepower made available to a segment of the technology user population in a way that just didn't exist three, four, or five years ago. And I think what's happening is the broader consciousness is all of a sudden starting to figure out, "Oh man, these are all the cool things I can do with it." That's kind of what's given rise to the modern data science mythology.

So what I think, and what I predict, is going to happen as this field matures and coalesces a little bit more is that you're going to have specializations develop. If you think about it, 200 years ago every scientist was a philosopher, right? That's where the PhD comes from. Then we specialized, and there were scientists, then there were physicists, then there were nuclear physicists, cosmic physicists, et cetera. Data science is going to follow the exact same suit.

But what I think is going to be really cool about that is, whether the job is called data hacking (I hope it is, because then I get to be that guy who is like, "Yeah, I'm 10 years ahead of everybody else"), or whatever it's called, this intersection of entrepreneurship and data is going to be huge, is huge, and will continue to be. As with any wave in business or otherwise, people
who are out there, who are daring to take it on, wrestle it to the ground, start the next company, build the next big thing, are going to be right on the bleeding edge of it. And being the person who does data science and brings that skill set to entrepreneurship means you're going to be literally on the bleeding edge of the bleeding edge. If it's interesting to you, if you get this [inaudible], I invite you all: let's get to hacking, everybody. Thank you very much.
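
Editorial sketch: the hidden-variable effect described in the dynamic pricing study above can be reproduced with a few lines of arithmetic. The numbers below are hypothetical and hand-made for illustration; they are not Uber data, and the names `ROWS` and `conversion_by` are assumptions introduced here. The sketch assumes riders ignore the price entirely (as the talk suggests they did with the first screen) and convert more often only because cars are scarce and alternatives are missing.

```python
from collections import defaultdict

# Hypothetical, hand-made numbers for illustration only (not Uber data).
# Each row: (scarcity level, price multiplier, times price shown, rides booked)
ROWS = [
    ("plentiful", 1.0, 1000, 300),
    ("plentiful", 1.5, 200, 60),
    ("scarce", 1.5, 200, 140),
    ("scarce", 2.0, 1000, 700),
]


def conversion_by(rows, key):
    """Return {group: booked / shown}, grouping rows with key(row)."""
    shown = defaultdict(int)
    booked = defaultdict(int)
    for row in rows:
        group = key(row)
        shown[group] += row[2]
        booked[group] += row[3]
    return {group: booked[group] / shown[group] for group in shown}


# Pooled view: group by price only, like the chart described in the talk.
pooled = conversion_by(ROWS, key=lambda r: r[1])

# Stratified view: group by (scarcity, price) to expose the hidden variable.
stratified = conversion_by(ROWS, key=lambda r: (r[0], r[1]))

for price in sorted(pooled):
    print(f"price x{price}: pooled conversion {pooled[price]:.2f}")
for scarcity, price in sorted(stratified):
    rate = stratified[(scarcity, price)]
    print(f"{scarcity:9s} price x{price}: conversion {rate:.2f}")
```

In this toy data the pooled conversion rate climbs with price (0.30, 0.50, 0.70), while within each scarcity level it stays flat (0.30 when cars are plentiful, 0.70 when they are scarce). The upward slope comes entirely from high prices being shown mostly when cars are scarce, which is the pattern the talk attributes to economic scarcity.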
Info
Channel: TEDx Talks
Views: 41,449
Keywords: ted talks, tedx, tedx talk, ted x, ted, tedx talks, ted talk, TEDx
Id: FL9Y0YvNjq8
Length: 17min 11sec (1031 seconds)
Published: Fri Mar 21 2014