Six thousand miles of road, 600 miles of subway track, 400 miles of bike lanes, and half a mile of tram track, if you've ever been to Roosevelt Island. So these are the numbers that make up the infrastructure of New York City. These are the statistics of our infrastructure.

They're the kind of numbers you can find released in reports by city agencies. For example, the Department of Transportation will probably tell you how many miles of road they maintain. The MTA will boast how many miles of subway track there are. Most city agencies give us statistics.

This is from a report this year from the Taxi and Limousine Commission, where we learn that there are about 13,500 taxis here in New York City. Pretty interesting, right? But did you ever think about where these numbers came from?

Because for these numbers to exist, someone at the city agency had to stop and say, "Hmm, here's a number that somebody might want to know." So they go back to their raw data, they count, they add, they calculate, and then they put out reports, and those reports will have numbers like this. The problem is, how do they know all of our questions?

We have lots of questions. In fact, in some ways there's literally an infinite number of questions that we can ask about our city. So the agencies can never keep up.

So the paradigm isn't exactly working, and I think our policymakers realize that, because in 2012, Mayor Bloomberg signed into law what he called the most ambitious and comprehensive open data legislation in the country. In a lot of ways, he's right. In the last two years, the city has released a thousand data sets on our open data portal, and it's pretty awesome.

So you go and look at data like this, and instead of just counting the number of cabs, we can start to ask different questions. So I had a question: when's rush hour in New York City? I mean, it can be pretty bothersome. When is rush hour, exactly?

And I thought to myself, well, these cabs aren't just numbers; these are GPS recorders driving around our city streets, recording each and every ride they take. There's data there, and I looked at that data, and I made a plot of the average speed of taxis in New York City throughout the day.

Well, you can see that from about midnight to around 5:18 in the morning, speed increases, and at that point, things turn around, and they get slower and slower and slower until about 8:35 in the morning, when they end up at around 11 and a half miles an hour. The average taxi is going 11 and a half miles per hour on our city streets, and it turns out it stays that way for the entire day. The entire day. (Laughter) So I said to myself, I guess there's no rush hour in New York City. There's just a rush day.

Makes sense. But this is important for a couple of reasons. If you're a transportation planner, this might be pretty interesting to know, but if you want to get somewhere quickly, you now know to set your alarm for 4:45 in the morning and you're all set. New York, right?

But there's a story behind this data. This data wasn't just available, it turns out. It actually came from something called a Freedom of Information Law request, or a FOIL request. This is a form you can find on the Taxi and Limousine Commission website. In order to access this data, you need to go get this form, fill it out, and they will notify you, and a guy named Chris Wong did exactly that.

Chris went down, and they told him, "Just bring a hard drive down, a brand-new hard drive, bring it to our office, leave it here for five hours, we'll copy the data and you take it back." And that's where this data came from. Now, Chris is the kind of guy who wants to make the data public, and so it ended up online for all to use, and that's where this graph came from.

And the fact that it exists is amazing. These GPS recorders, really cool. But the fact that we have citizens walking around with hard drives picking up data from city agencies to make it public, when it was already kind of public: you could get to it, so it was "public," but it wasn't really public. And we can do better than that as a city, right? We don't need our citizens walking around with hard drives.

Now, not every data set is behind a FOIL request, right? So here is a map I made of the most dangerous intersections in New York City, based on cyclist accidents. The red areas are more dangerous, and what it shows is, first, that the East Side of Manhattan, especially lower Manhattan, has more cyclist accidents.
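
The talk doesn't say how the danger ranking behind the map was built. One simple version, sketched under the assumption that each collision row names an intersection and gives the number of cyclists involved (the row layout, the placeholder street names and `most_dangerous_intersections` are all hypothetical), counts cyclist collisions per intersection and sorts by that count:

```python
from collections import Counter


def most_dangerous_intersections(collisions, top_n=3):
    """Rank intersections by how many collisions involved at least one cyclist.

    `collisions` is assumed (hypothetically) to be an iterable of
    (intersection name, number of cyclists involved) pairs.
    """
    counts = Counter(
        intersection
        for intersection, cyclists in collisions
        if cyclists > 0  # only collisions that involved a cyclist
    )
    # most_common returns (intersection, count) pairs, highest count first.
    return counts.most_common(top_n)


if __name__ == "__main__":
    rows = [
        ("Street A & 1st Ave", 1),
        ("Street A & 1st Ave", 2),
        ("Street B & 2nd Ave", 0),
        ("Street C & 3rd Ave", 1),
    ]
    print(most_dangerous_intersections(rows))
```

Raw counts don't account for how many cyclists pass through each intersection, so a busy spot near a bridge ranks high partly because of traffic volume, which is the caveat raised about the bridges in the talk.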

That might make sense, because there are more cyclists coming off the bridges there. But there are other hotspots worth studying, right? There's Williamsburg. There's Roosevelt Avenue in Queens. And this is exactly the kind of data we need for Vision Zero. This is exactly what we're looking for. But there's a story behind this data as well.

This data didn't just appear. How many of you guys know this logo? Yeah, I see some shakes. Have you ever tried to copy and paste data out of a PDF and make sense of it? I see more shakes. More of you tried copying and pasting than knew the logo. I like that.

Well, what happened is, the data that you just saw was actually in a PDF. In fact, hundreds and hundreds and hundreds of pages of PDF put out by our very own NYPD, and in order to access it, you would either have to copy and paste for hundreds and hundreds of hours, or you could be John Krause. John Krause was like, "I'm not going to copy and paste this data. I'm going to write a program."

It's called the NYPD Crash Data Band-Aid, and it would go to the NYPD's website and download PDFs. Every day it would search: if it found a PDF, it would download it and then run some PDF-scraping program, and out would come the text, and it would go on the Internet, and then people could make maps like that.

And the fact that the data's here, once again, the fact that we have access to it (every accident, by the way, is a row in this table, every single accident, so you can imagine how many PDFs that is), the fact that we have access to that is great, but let's not release it in PDF form, because then we're having our citizens write PDF scrapers. It's not the best use of our citizens' time, and we as a city can do better than that.

Now, the good news is that the de Blasio Administration actually released this data a few months ago, and so now we can actually have access to it, but there's a lot of data still entombed in PDF. For example, our crime data is still only available in PDF.

And not just our crime data: our own city budget. Our city budget is only readable right now in PDF form, and it's not just us that can't analyze it: our own legislators who vote for the budget also only get it in PDF. So our legislators cannot analyze the budget that they are voting for. And I think as a city we can do a little better than that as well.

Now, there's a lot of data that's not hidden in PDFs. This is an example of a map I made, and these are the dirtiest waterways in New York City. Now, how do I measure dirty? Well, it's kind of a little weird, but I looked at the level of fecal coliform