-
Six thousand miles of road,
-
600 miles of subway track,
-
400 miles of bike lanes,
-
and half a mile of tram track,
-
if you've ever been to Roosevelt Island.
-
So these are the numbers that make up
the infrastructure of New York City.
-
These are the statistics
of our infrastructure.
-
They're the kind of numbers you can find
released in reports by city agencies.
-
For example, the Department
of Transportation will probably tell you
-
how many miles of road they maintain.
-
The MTA will boast how many miles
of subway track there are.
-
Most city agencies give us statistics.
-
This is from a report this year
-
from the Taxi and Limousine Commission,
-
where we learn that there are about
13,500 taxis here in New York City.
-
Pretty interesting, right?
-
But did you ever think about
where these numbers came from?
-
Because for these numbers to exist,
someone at the city agency
-
had to stop and say, "Hmm, here's a number
that somebody might want to know."
-
"Here's a number
that our citizens want to know."
-
So they go back to their raw data,
-
they count, they add, they calculate,
-
and then they put out reports,
-
and those reports
will have numbers like this.
-
The problem is, how do they know
all of our questions?
-
We have lots of questions.
-
In fact, in some ways there's literally
an infinite number of questions
-
that we can ask about our city.
-
So the agencies can never keep up.
-
So the paradigm isn't exactly working,
and I think our policymakers realize that,
-
because in 2012, Mayor Bloomberg
signed into law what he called
-
the most ambitious and comprehensive
open data legislation in the country.
-
In a lot of ways, he's right.
-
In the last two years,
the city has released
-
a thousand data sets
on our open data portal,
-
and it's pretty awesome.
-
So you go and look at data like this,
-
and instead of just counting
the number of cabs,
-
we can start to ask different questions.
-
So I had a question.
-
When's rush hour in New York City?
-
I mean, it can be pretty bothersome.
When is rush hour exactly?
-
And I thought to myself,
well, these cabs aren't just numbers,
-
these are GPS recorders
driving around in our city streets
-
recording each and every ride they take.
-
There's data there,
-
and I looked at that data,
and I made a plot
-
of the average speed of taxis
in New York City throughout the day.
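As a rough sketch of how a plot like that could be computed: the code below assumes each trip record carries a pickup time, a distance in miles, and a duration in seconds. The field names `pickup`, `miles`, and `seconds` are hypothetical, not the real column names in the taxi data, and averaging per-trip speeds is one possible choice, not necessarily the one used for the plot.

```python
from collections import defaultdict
from datetime import datetime

def average_speed_by_hour(trips):
    """Return {hour_of_day: mean speed in mph} for an iterable of trip dicts.

    Each trip is assumed to have 'pickup' (ISO timestamp string),
    'miles' (float) and 'seconds' (float); these names are hypothetical.
    """
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of speeds, trip count]
    for trip in trips:
        if trip["seconds"] <= 0:
            continue  # skip records with no usable duration
        hour = datetime.fromisoformat(trip["pickup"]).hour
        mph = trip["miles"] / (trip["seconds"] / 3600.0)
        totals[hour][0] += mph
        totals[hour][1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}

# Made-up sample rows, for illustration only.
sample_trips = [
    {"pickup": "2013-05-01T05:10:00", "miles": 3.0, "seconds": 600},
    {"pickup": "2013-05-01T05:40:00", "miles": 2.0, "seconds": 600},
    {"pickup": "2013-05-01T09:00:00", "miles": 1.0, "seconds": 300},
]
speeds = average_speed_by_hour(sample_trips)
```

Plotting `speeds` against the hour of day would give a curve of the kind described here.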
-
Well, you can see that from about midnight
to around 5:18 in the morning,
-
speed increases, and at that point,
things turn around,
-
and they get slower and slower and slower
until about 8:35 in the morning,
-
when they end up at around
11 and a half miles an hour.
-
The average taxi is going 11 and a half
miles per hour on our city streets,
-
and it turns out it stays that way
-
for the entire day.
-
The entire day. (Laughter)
-
So I said to myself, I guess
there's no rush hour in New York City.
-
There's just a rush day.
-
Makes sense. But this is important
for a couple of reasons.
-
If you're a transportation planner,
this might be pretty interesting to know,
-
but if you want to get somewhere quickly,
-
you now know to set your alarm
for 4:45 in the morning, you're all set.
-
New York, right?
-
But there's a story behind this data.
-
This data wasn't
just available, it turns out.
-
It actually came from something called
a Freedom of Information Law request,
-
or a FOIL request.
-
This is a form you can find on the
Taxi and Limousine Commission website.
-
In order to access this data,
you need to go get this form,
-
fill it out, and they will notify you,
-
and a guy named Chris Wong
did exactly that.
-
Chris went down, and they told him,
"Just bring a hard drive down,
-
a brand new hard drive,
bring it to our office,
-
leave it here for five hours,
we'll copy the data and you take it back."
-
And that's where this data came from.
-
Now, Chris is the kind of guy
who wants to make the data public,
-
and so it ended up online for all to use,
and that's where this graph came from.
-
And the fact that it exists is amazing.
These GPS recorders, really cool.
-
But the fact that we have citizens
walking around with hard drives
-
picking up data from city agencies
to make it public,
-
but it was already kind of public,
you could get to it,
-
but it was "public,"
it wasn't public.
-
And we can do better than that as a city.
-
We don't need our citizens
walking around with hard drives.
-
Now, not every data set
is behind a FOIL request. Right?
-
So here is a map I made with the most
dangerous intersections in New York City
-
based on cyclist accidents.
-
So the red areas are more dangerous,
and what it shows is first,
-
the East side of Manhattan,
especially in the lower area of Manhattan,
-
has more cyclist accidents.
-
That might make sense,
because there are more cyclists
-
coming off the bridges there.
-
But there are other hotspots
worth studying.
-
There's Williamsburg.
There's Roosevelt Avenue in Queens.
-
And this is exactly the kind of data
we need for Vision Zero.
-
This is exactly what we're looking for.
-
But there's a story
behind this data as well.
-
This data didn't just appear.
-
How many of you guys know this logo?
-
Yeah, I see some shakes.
-
Have you ever tried to copy
and paste data out of a PDF
-
and make sense of it?
-
I see more shakes.
-
More of you tried copying and pasting
than knew the logo. I like that.
-
Well, so what happened is, the data
that you just saw was actually on a PDF.
-
In fact, hundreds and hundreds
and hundreds of pages of PDF
-
put out by our very own NYPD,
-
and in order to access it,
you would either have to copy and paste
-
for hundreds and hundreds of hours,
-
or you could be John Krause.
-
John Krause was like,
-
I'm not going to copy and paste this data.
I'm going to write a program.
-
It's called the NYPD Crash Data Band-Aid,
-
and it would go to the NYPD's website
and it would download PDFs.
-
Every day it would search:
if it found a PDF, it would download it
-
and then it would run
some PDF-scraping program,
-
and out would come the text,
-
and it would go on the Internet,
and then people could make maps like that.
-
And the fact that the data's here,
the fact that we have access to it
-
-- Every accident, by the way,
is a row in this table,
-
every single accident, you can imagine
how many PDFs that is --
-
the fact that we
have access to that is great,
-
but let's not release it in PDF form,
-
because then we're having our citizens
write PDF scrapers.
-
It's not the best use
of our citizens' time,
-
and we as a city can do better than that.
-
Now, the good news is that
the de Blasio Administration
-
actually recently released this data
a few months ago,
-
and so now we can
actually have access to it,
-
but there's a lot of data
still entombed in PDF.
-
For example, our crime data
is still only available in PDF.
-
And not just our crime data:
-
our own city budget.
-
Our city budget is only readable
right now in PDF form,
-
and it's not just us
that can't analyze it:
-
our own legislators
who vote for the budget
-
also only get it in PDF.
-
So our legislators cannot
analyze the budget
-
that they are voting for.
-
And I think as a city we can do
a little better than that as well.
-
Now, there's a lot of data
that's not hidden in PDFs.
-
This is an example of a map I made,
-
and these are the dirtiest waterways
in New York City.
-
Now, how do I measure dirty?
-
Well, it's kind of a little weird,
-
but I looked at the level
of fecal coliform,
-
which is a measurement of fecal matter
in each of our waterways.
-
The larger the circle,
the dirtier the water,
-
so the large circles are dirty water,
the small circles are cleaner.
-
What you see is, inland waterways,
-
this is all data that was sampled
by the city over the last five years,
-
and inland waterways are,
in general, dirtier.
-
That makes sense, right?
-
And the bigger circles are dirty.
And I learned a few things from this.
-
Number one: never swim in anything
that ends in "creek" or "canal."
-
But number two, I also found
the dirtiest waterway in New York City,
-
by this measure, one measure.
-
In Coney Island Creek, which is not
the Coney Island you swim in, luckily.
-
It's on the other side.
-
But Coney Island Creek, 94 percent
of samples taken over the last five years
-
have had fecal levels so high
-
that it would be against state law
to swim in the water.
-
And this is not the kind of fact
that you're going to see
-
boasted in a city report, right?
-
It's not going to be
the front page on nyc.gov.
-
You're not going to see it there,
-
but the fact that we can get
to that data is awesome.
-
But once again, it wasn't super-easy,
-
because this data was not
on the open data portal.
-
If you were to go to the open data portal,
-
you'd see just a snippet of it,
a year or a few months.
-
It was actually on the Department
of Environmental Protection's website.
-
And each one of these links is an Excel
sheet, and each Excel sheet is different.
-
Every heading is different:
you copy, paste, reorganize.
-
When you do, you can make maps
and that's great but once again,
-
we can do better than that
as a city, we can normalize things.
-
And we're getting there, because
there's this website that Socrata makes
-
called the Open Data Portal
for New York City.
-
This is where 1,100 data sets
that don't suffer
-
from the things I just told you live,
-
and that number is growing,
and that's great.
-
You can download data in any format,
be it CSV or PDF or Excel document.
-
Whatever you want,
you can download the data that way.
-
The problem is, once you do,
-
you will find that each agency
codes their addresses differently.
-
So one is street name,
intersection street,
-
street, borough, address, building,
building, address,
-
so once again, you're spending time,
even when we have this portal,
-
you're spending time
normalizing our address field.
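To make that normalizing step concrete, here is a minimal sketch, assuming two agencies publish the same information under different column headings. The header variants and canonical names below are hypothetical examples, not the actual field names used on the portal.

```python
# Hypothetical header variants; real agency files may use other names.
HEADER_ALIASES = {
    "street name": "street",
    "street": "street",
    "intersection street": "cross_street",
    "borough": "borough",
    "boro": "borough",
    "building": "building_number",
    "address": "address",
}

def normalize_row(row):
    """Rename a row's keys to canonical names, leaving unknown keys untouched."""
    normalized = {}
    for key, value in row.items():
        # Match headers case-insensitively and ignore stray whitespace.
        canonical = HEADER_ALIASES.get(key.strip().lower(), key)
        normalized[canonical] = value.strip() if isinstance(value, str) else value
    return normalized

# Made-up rows from two imaginary agencies describing the same place.
agency_a = {"Street Name": "Broadway ", "Borough": "Manhattan"}
agency_b = {"STREET": "Broadway", "boro": "Manhattan"}
```

After `normalize_row`, both rows share the same keys, so they can be combined or mapped together. With a shared standard, this lookup table would not need to be written at all.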
-
And that's not the best use
of our citizens' time.
-
We can do better than that as a city.
-
We can standardize our addresses,
-
and if we do,
we can get more maps like this.
-
This is a map of fire hydrants
in New York City,
-
but not just any fire hydrants:
-
these are the top 250 grossing fire
hydrants in terms of parking tickets.
-
So I learned a few things from this map,
and I really like this map.
-
Number one, just don't park
on the Upper East Side.
-
Just don't. It doesn't matter where
you park, you will get a hydrant ticket.
-
Number two, I found the two highest
grossing hydrants in all of New York City,
-
and they're on the Lower East Side,
-
and they were bringing in
over $55,000 a year, a year,
-
in parking tickets.
-
And that seemed a little strange
to me when I noticed it,
-
so I did a little digging and it turns out
what you had is a hydrant
-
and then something called
a curb extension,
-
which is like a seven-foot
space to walk on,
-
and then a parking spot.
-
And so these cars came along,
and the hydrant,
-
"It's all the way over there, I'm fine,"
-
and there was actually a parking spot
painted there beautifully for them.
-
They would park there, and the NYPD
disagreed with this designation
-
and would ticket them.
-
And it wasn't just me
who found a parking ticket.
-
This is the Google Street
View Car driving by,
-
finding the same parking ticket.
-
So I wrote about this on my blog,
on I Quant NY, and the DOT responded,
-
and they said,
-
"While the DOT has not received
any complaints about this location,
-
we will review the roadway markings
and make any appropriate alterations."
-
And I thought to myself,
typical government response,
-
all right, moved on with my life.
-
But then, a few weeks later,
something incredible happened.
-
They repainted the spot,
-
and for a second I thought I saw
the future of open data,
-
because think about what happened here.
-
For five years, for five years,
-
this spot was being ticketed,
and it was confusing,
-
and then a citizen found something,
they told the city, and within a few weeks
-
the problem was fixed.
-
It's amazing, and a lot of people
see open data as being a watchdog.
-
It's not, it's about being a partner.
-
We can empower our citizens
to be better partners for government,
-
and it's not that hard.
-
All we need are a few changes.
-
If you're FOILing data,
-
if you're seeing your data
being FOILed over and over again,
-
let's release it to the public, that's
a sign that it should be made public.
-
And if you're a government agency
releasing a PDF,
-
let's pass legislation that requires you
to post it with the underlying data,
-
because that data
is coming from somewhere.
-
I don't know where, but it's
coming from somewhere,
-
and you can release it with the PDF.
-
And let's adopt and share
some open data standards.
-
Let's start with our addresses
here in New York City.
-
Let's just start
normalizing our addresses.
-
Because New York is a leader in open data.
-
Despite all this, we are absolutely
a leader in open data,
-
and if we start normalizing things,
and set an open data standard,
-
others will follow. The state will follow,
and maybe the federal government.
-
Other countries could follow,
-
and we're not that far off from a time
where you could write one program
-
and map information from 100 countries.
-
It's not science fiction.
We're actually quite close.
-
And by the way, who are we
empowering with this?
-
Because it's not just John Krause
and it's not just Chris Wong.
-
There are hundreds of meetups
going on in New York City right now,
-
active meetups.
-
There are thousands of people
attending these meetups.
-
These people are going after work
and on weekends,
-
and they're attending these meetups
to look at open data
-
and make our city a better place.
-
Groups like BetaNYC, who last week,
just last week released something
-
called citygram.nyc.
-
That allows you to subscribe
to 311 complaints
-
around your own home,
or around your office.
-
You put in your address,
you get local complaints.
-
And it's not just the tech community
that are after these things.
-
It's urban planners like
the students I teach at Pratt.
-
It's policy advocates, it's everyone,
-
it's citizens from a diverse
set of backgrounds.
-
And with some small, incremental changes,
-
we can unlock the passion
and the ability of our citizens
-
to harness open data
and make our city even better,
-
whether it's one data set,
or one parking spot at a time.
-
Thank you.
-
(Applause)