Translator: Timothy Duffy
Reviewer: Denise RQ
Hi everyone, as it says, I am Ben Wellington,
and I am a data story teller. A data story teller. If you had asked me a year ago
what a data story teller was I would probably say I have no idea. So today, I am going to tell you
about my journey over the last year, where I accidentally became
a data story teller. I'll tell you about what
I learned along the way, and maybe convince some of you
that you too can be data story tellers if you are curious and want to. A little about my background. First, I work at an investment
and tech company called Two Sigma where I do data science. So that is sort of one part of my world. But I also married an urban planner, so I've got my computer science
and the urban planning world, and for most of the time, these two sections of my life
have been pretty separate, and that was just the way it was. Until something interesting
happened here in NYC, in 2011, then Mayor Bloomberg
signed this legislation called the Open Data Laws in New York. The Open Data Laws are really exciting
for people like me because they take data that is inside City Government,
and suddenly allows anyone to look at it. Whereas before the government
would analyze something and tell us, "Hey, this neighborhood has
this many accidents," now we can see data point by data point
what is happening at a very local level. When these two things
came together, they ended up-- by the way, there was an open data portal
I should point out, that anyone can go to, it is the NYC open data portal, and there are data sets
on all sorts of things. In fact, there is one on the size of the televisions in Times Square
and their locations. I don't know what to do with that,
but it's really cool. There are data sets
of all different types, in fact over 1,200 data sets so far. And it is growing all the time. I kind of took this data science work, and my interest in urban planning
from discussions with my wife and put it together in this blog,
called I QUANT NY. Awesome, thank you. One of the first things
I did was this map. And this is a map of cycling
injuries in New York City. (Laughter) Hilarious injuries-- Red areas are areas where people
were getting in more cycling accidents. Then I found this through
some public data and mapped it all. I noticed a few things: one,
that on the East Side of Manhattan there were more cycling
accidents, more injuries, because that's where there are
more cyclists coming off the bridges. But also there were some other hot spots, like Williamsburg in Brooklyn,
or Roosevelt Avenue in Queens. I wrote about that
and posted it on the blog - and it was more just for me
to learn how to do mapping; it was this open source software
called QGIS, I wanted to learn it - and when I did, something
interesting happened, people started to write about it. Gothamist covered it, and Brokelyn claimed
that it was a "death trap", which is not exactly what I said. (Laughter) Streets Blog, and then,
even in The Atlantic. This is just sort of from a blog
that I put on Tumblr, and it had no followers,
and that was really interesting. Over time from there I started to say,
"Why did people write about that? I'm not the first one to analyze cycling
accidents or to make maps like this." What I did wasn't that complicated,
what was it that made it spread? And I thought about it,
and I worked over the next few posts, I got to see what was moving around, and I realized that there was a third part
to this that was really, really important. That is, you probably are not going
to see it coming: improv comedy. Yes, improv comedy. I have been doing improv since I was
in a summer camp called French Woods - Yeah! French Woods alumni! - in upstate New York. I did improv since I was about 13,
I've been doing it ever since. I have learned a lot of things in improv
I realized I was bringing into my writing, into this data science
to make people more interested in it. I think in order to spread science
you need to be able to tell the stories, so I am going to tell you
about why improv relates to data science, and how they can come together
to tell better stories. That is why I call it data story telling. First, in stories you want to connect
with people's experiences, right? If you are doing an improv scene, you learn that if you are
brushing your teeth next to your wife, something that people can relate to, how they look
when they're brushing their teeth, you can do a scene through that. People relate to it
because they have experiences. I tried to write about things
that New Yorkers experience, and I figured, "What do New Yorkers
experience more than Duane Reade?" (Laughter) Right? I thought this might be interesting,
it turned out it was. I mapped every single building
in New York to the closest pharmacy, and I colored it by the pharmacy. Orange is Duane Reade, red is CVS,
blue is Walgreens, yellow is Rite Aid. First of all, this neighborhood,
it's Duane Reade country. No question. I also learned that CVS and Rite Aid
are attacking from the water. (Laughter) Good strategy CVS,
come in from the Hudson. Duane Reade will not see you coming. (Laughter) I thought to myself, "Is Duane Reade
really a New York thing?" It turns out, no, it's a Manhattan thing. If we zoom out even more,
the Bronx is Rite Aid country. Brooklyn and Queens are patchwork. If you work for one of those companies
it's probably very interesting. For me, it's interesting just wondering
what our experiences are, and how we can quantify them
and tell stories of our lives. That's something we can all relate to, and that's part of story telling,
even in data analysis. I noticed that you want
to focus on a single idea. In an improv scene, you can try to have seven ideas going,
but things can get lost pretty quickly. In my work I try to focus on one idea,
and I went and looked at Citi Bike data. It is interesting, there are people
leaving and coming to stations, there is a lot of data. What if we take one idea,
and that idea is gender? Here, I mapped the percent of male
and female Citi Bike riders in NYC. What we can see is in this neighborhood,
over 80% of the riders are male. This is a very male-dominant
Citi Bike neighborhood. What does that tell us? It could be about
our transportation infrastructure. It could be a study
of gender here in the city. Also, if you are looking
to meet a girl on a Citi Bike, go to Brooklyn, that's important. (Laughter) The important thing here
is that it is one idea, it's just gender. There are many columns, it is
a big data set, but let's study just one. The other thing is keep it simple. Not just one idea, but one simple idea. Ideas can be very complex
and you find yourself in improv thinking, "This guy goes here, and this, and this,"
and you'll lose everybody very quickly. So I also try to keep it simple. So when people hear that I do math
they often think that I do this. But it is more like this. (Laughter) I mean, I just count things. I do sum, maybe I do a percentage. This is all just high school math,
it is not crazy math. People can really do this if they stop
and start to ask the questions. An example: I looked
at the percent of parking tickets from out-of-state plates
in every precinct in New York. We see in this neighborhood,
there is a higher percent of people coming in from out-of-state
and getting tickets. Which is telling; that's people driving
into Midtown more likely, and as we get farther out in our boroughs there are fewer travelers,
which also makes sense. I also wanted to do this per state.
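
The out-of-state percentage described here really is just counting and dividing. A minimal sketch in Python, assuming hypothetical ticket records with `precinct` and `plate_state` fields; the talk does not describe the real data set's columns, and the sample rows below are made up purely for illustration:

```python
from collections import Counter

# Made-up sample records. The field names and values are assumptions,
# not the schema or contents of the actual parking ticket data set.
tickets = [
    {"precinct": 14, "plate_state": "NJ"},
    {"precinct": 14, "plate_state": "NY"},
    {"precinct": 14, "plate_state": "CT"},
    {"precinct": 14, "plate_state": "NY"},
    {"precinct": 90, "plate_state": "NY"},
    {"precinct": 90, "plate_state": "CA"},
    {"precinct": 90, "plate_state": "NY"},
    {"precinct": 90, "plate_state": "NY"},
]

def percent_by_precinct(records, matches):
    """Percent of each precinct's tickets whose plate state satisfies `matches`."""
    totals = Counter()  # all tickets per precinct
    hits = Counter()    # tickets per precinct that satisfy the condition
    for record in records:
        totals[record["precinct"]] += 1
        if matches(record["plate_state"]):
            hits[record["precinct"]] += 1
    return {p: 100.0 * hits[p] / totals[p] for p in totals}

# Share of tickets going to any out-of-state plate, per precinct.
out_of_state = percent_by_precinct(tickets, lambda state: state != "NY")

# The same count restricted to a single state, for a per-state map.
new_jersey = percent_by_precinct(tickets, lambda state: state == "NJ")
```

Swapping the condition passed as `matches` is all it takes to go from the overall out-of-state map to a map for one state at a time.
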
So, first, I did New Jersey. Midtown, yes, it shows in the data that people from New Jersey
are driving into Midtown. Connecticut,
a completely different picture. Coming in from the north.
You can actually see it. They are going to
the Botanical Gardens in the Bronx. And my absolute favorite, Californians,
where do you think they hang out? The hippest parts of Brooklyn:
Williamsburg, Bushwick, Greenpoint, that is where people
from California get parking tickets. It can tell us so much about our city
by looking at our data. Also, explore the things
that you know best. You all come from different fields. You know the things that you know. You know the area you study
very well, the area you work in. I am learning New York,
I have lived here for over a decade, so I focus on New York. In improv, if you are a lawyer,
and you go to a scene as a lawyer, that scene will be good,
because you know all the vocabulary. You can play it to the very top
of your intelligence and just hit it. I tried with other cities, but it is hard
because I do not have any context. For today, I did an analysis
of Times Square, and I thought, "What can we all relate to?"
Maybe catching a cab. I was curious where people catch cabs
around Times Square. That is 8th Avenue over here,
you can see that people, in general, are catching cabs at 8th Ave.
when they are leaving this district. Less so heading south on 7th Ave. You can see that the big yellow blob
in the lower left is the Port Authority, so that makes sense. But really people are heading
towards the avenues. What is interesting here
is that is where people catch cabs. Where do you think people get out of cabs? They get out more on cross streets.
It is much more of a grid. If you are catching a cab somewhere,
especially if you are a tourist, you give the person the address
and they bring you there. They're not dropping you off on the corner
and saying, "Good luck, buddy". That doesn't happen. What I really, really liked about this was depending on the direction of the street,
people got out of cabs differently. There is 7th Ave: if you come in
on 46th, you are going west to east, you seem to get off
on the west side of the avenue. Why? You ever get a cab stuck
in traffic, you just get out. You can actually see it in the data;
if you come in the other side, you can see people get out
on that side of the avenue. Depending on the direction of the street,
people are getting out of taxis because they are probably
waiting at lights. And this is interesting, right? If you do advertising in the district,
you might want to know where to welcome them,
where they get out most often. We can start to study
this with our public data. With this, you want to try
and make an impact. I've tried to make it in City Government,
by doing some of this work. Each of you has your own ways
you can make impacts. In particular, I did one proof,
a mathematical proof, that no matter how many times you ride
the subway and refill using their buttons, you cannot get a $0.00 balance. (Laughter) Literally, it is not just you,
or you, or you. You literally cannot get a $0.00 balance,
if you use their buttons. There is a trick, you can type
$19.05 and get a balance. When I wrote about this,
the MTA responded. I said, alright maybe this is an impact. And they said: "These machines do not hold
an infinite amount of change and the denominations are suggested
to ensure there is ample change to accommodate customers
who pay with cash. That being said, we will certainly look
at this as part of the process involved in rolling out the next scheduled
fare increase [slated for next year]." (Laughter) So, the fare increase is coming. Imagine in March, where you say, "I want $20 on a MetroCard,"
and you get one. "How much would that be?"
And they say, "$18.43" And you pay. As opposed to now, "I'm going to pay $20". "We'll give you a random
amount above that." Imagine if we switched that,
we could run our city better. We'll see if the MTA follows through,
I'd love to make an impact there. I also found something strange,
which is in half of the city cabs, the tipping is based on
just the fare and the surcharge. So if you get into a Verifone cab
and you hit the 20% button, you are actually paying 20%
on top of the taxi fare, and a little bit of a surcharge. If it's run by Creative Mobile Technologies,
the other half of the cabs, and you hit the 20% button,
you are paying on top of taxes and tolls. So for two different computers, you are paying more tips in one
of the computer setups than the other, because it's calculating tip
on top of tolls. Is this all a big deal? Well, those drivers are making
$250 more a year in tips, by this little bit of rounding. We have half our cabs
where we are all paying a little more, and the drivers as well, which isn't
a bad thing but it is kind of inequitable. When I pointed this out
to the TLC, they said: "We appreciate the work
that went into this analysis, and we're giving it a thorough read." (Laughter) Impact. (Laughter) I'm working on it. And my favorite was this:
I mapped fire hydrants in New York City. These aren't just any fire hydrants. These are the fire hydrants mapped by the amount of parking ticket
revenue they're creating. These are the top 250 culprits in NYC. First, on the Upper East Side watch out, the 19th Precinct will ticket you
no matter where you park for a hydrant. More interesting were these two hydrants
that were down on the Lower East Side, and they were generating
$55,000 a year in tickets. Two hydrants, $55,000
for about 5 or 6 years. Finally, the data is public, I had a look, and when I went
to figure out what was happening, it turns out that there is basically
a hydrant, then a bike lane, and then a parking spot. So you go thinking you are not
in front of the curb or the hydrant, there is a bike lane between us. It turns out that while the DOT painted
a parking spot, the NYPD disagreed. So they would ticket the spot
for years and years. This is actually a shot
from the Google StreetView car going by and it caught the ticket,
which I really appreciated. I wrote about that,
and I heard from the city again: "While DOT has not received
any complaints about this location, we will review the roadway markings
and make any appropriate alterations." That is an action statement,
we are getting better. I thought to myself, "Well Government,
I'll keep trying to make an impact". And, suddenly, they repainted the spot. Yeah! Impact! (Applause) Somebody is listening, this is great. You can look at your data
and you can have an impact. Sometimes, your message
doesn't get through, but where I know a statement was made, I know I've at least changed
some thinking in these agencies, that somebody thinks about those things. I consider those to be making
an impact as well. To do that, once again, I think you really need
to think about story telling, like connect with people,
try to convey one idea, keep it simple, and explore the things that you know best. In case you think
this whole data thing is not for me, that it is for computer people,
the open data portal is easy to use. I teach a class in statistics
at Pratt, for urban planners, and on our second day of class
one of my students turned this, which is a list of accidents
in the neighborhood around Pratt, into this analysis of injuries by vehicle type. And this is just from one
or two classes using Excel, this is not some crazy programming thing. If you are scared of that, this is a data set of graffiti
from 311 complaints. And a student in a program
called City Term - which brings students
to learn about the city, it is a high school semester program - a student named Abby created this map. So if you are scared
of computers, that's OK, too. Understanding where graffiti is in NYC. It is not about being a computer person,
anyone can take data, you just have to know what questions
to ask and try to tell your own story. I just hope everyone realizes
they can be a data story teller. Thank you. (Applause)