-
Not Synced
Six thousand miles of road,
-
Not Synced
600 miles of subway track,
-
Not Synced
400 miles of bike lanes,
-
Not Synced
and a half a mile of tram track,
-
Not Synced
if you've ever been to Roosevelt Island.
-
Not Synced
So these are the numbers that make up
the infrastructure of New York City.
-
Not Synced
These are the statistics
of our infrastructure.
-
Not Synced
They're the kind of numbers you can find
released in reports by city agencies.
-
Not Synced
For example, the Department
of Transportation will probably tell you
-
Not Synced
how many miles of road they maintain.
-
Not Synced
The MTA will boast how many miles
of subway track there are.
-
Not Synced
But most city agencies give us statistics.
-
Not Synced
This is from a report this year
-
Not Synced
from the Taxi and Limousine Commission,
-
Not Synced
where we learn that there are about
13,500 taxis here in New York City.
-
Not Synced
Pretty interesting, right?
-
Not Synced
But did you ever think about
where these numbers came from?
-
Not Synced
Because for these numbers to exist,
someone at the city agency
-
Not Synced
had to stop and say, "Hmm, here's a number
that somebody might want to know."
-
Not Synced
So they go back to their raw data,
-
Not Synced
they count, they add, they calculate,
-
Not Synced
and then they put out reports,
-
Not Synced
and those reports
will have numbers like this.
-
Not Synced
The problem is, how do they know
all of our questions?
-
Not Synced
We have lots of questions.
-
Not Synced
In fact, in some ways there's literally
an infinite number of questions
-
Not Synced
that we can ask about our city.
-
Not Synced
So the agencies can never keep up.
-
Not Synced
So the paradigm isn't exactly working,
and I think our policymakers realize that,
-
Not Synced
because in 2012, Mayor Bloomberg
signed into law what he called
-
Not Synced
the most ambitious and comprehensive
open data legislation in the country.
-
Not Synced
In a lot of ways, he's right.
-
Not Synced
In the last two years,
the city has released
-
Not Synced
a thousand data sets
on our open data portal,
-
Not Synced
and it's pretty awesome.
-
Not Synced
So you go and look at data like this,
and instead of just counting
-
Not Synced
the number of cabs,
-
Not Synced
we can start to ask different questions.
-
Not Synced
So I had a question.
-
Not Synced
When's rush hour in New York City?
-
Not Synced
I mean, it can be pretty bothersome.
When is rush hour exactly?
-
Not Synced
And I thought to myself,
well, these cabs aren't just numbers,
-
Not Synced
these are GPS recorders
driving around in our city streets
-
Not Synced
recording each and every ride they take.
-
Not Synced
There's data there,
-
Not Synced
and I looked at that data,
and I made a plot
-
Not Synced
of the average speed of taxis
in New York City throughout the day.
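A plot like this one can be sketched in a few lines. This is only an illustration, not the speaker's actual analysis: the record layout (pickup time, trip miles, trip seconds) and the sample trips are assumptions made up for the example, and the average is computed as total miles over total hours for each pickup hour.

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_hour(trips):
    """Return {pickup hour: average mph} from (pickup_iso, miles, seconds) tuples."""
    miles = defaultdict(float)
    hours = defaultdict(float)
    for pickup_iso, trip_miles, trip_seconds in trips:
        if trip_seconds <= 0:
            continue  # skip records with no usable duration
        hour = datetime.fromisoformat(pickup_iso).hour
        miles[hour] += trip_miles
        hours[hour] += trip_seconds / 3600
    # Total distance over total time, per pickup hour.
    return {h: miles[h] / hours[h] for h in sorted(miles)}


# Made-up sample trips, for illustration only.
sample_trips = [
    ("2013-01-01T05:10:00", 3.0, 600),
    ("2013-01-01T08:40:00", 2.3, 720),
    ("2013-01-01T08:50:00", 1.15, 360),
]
speeds = average_speed_by_hour(sample_trips)
```

Dividing total miles by total hours weights long trips more heavily than averaging each trip's speed would; either choice is defensible, but they give different curves.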
-
Not Synced
Well, you can see that from about midnight
to around 5:18 in the morning,
-
Not Synced
speed increases, and at that point,
-
Not Synced
things turn around,
-
Not Synced
and they get slower and slower and slower
until about 8:35 in the morning,
-
Not Synced
when they end up at around
11 and a half miles an hour.
-
Not Synced
The average taxi is going 11 and a half
miles per hour on our city streets,
-
Not Synced
and it turns out it stays that way
-
Not Synced
for the entire day.
-
Not Synced
The entire day. (Laughter)
-
Not Synced
So I said to myself, I guess
there's no rush hour in New York City.
-
Not Synced
There's just a rush day.
-
Not Synced
Makes sense. But this is important
for a couple of reasons.
-
Not Synced
If you're a transportation planner,
this might be pretty interesting to know,
-
Not Synced
but if you want to get somewhere quickly,
-
Not Synced
you now know to set your alarm
for 4:45 in the morning and you're all set.
-
Not Synced
New York, right?
-
Not Synced
But there's a story behind this data.
-
Not Synced
This data wasn't
just available, it turns out.
-
Not Synced
It actually came from something called
a Freedom of Information law request,
-
Not Synced
or a FOIL request.
-
Not Synced
This is a form you can find on the
Taxi and Limousine Commission website.
-
Not Synced
In order to access this data,
you need to go get this form,
-
Not Synced
fill it out, and they will notify you,
-
Not Synced
and a guy named Chris Wong
did exactly that.
-
Not Synced
Chris went down, and they told him,
"Just bring a hard drive down,
-
Not Synced
a brand new hard drive,
bring it to our office,
-
Not Synced
leave it here for five hours,
we'll copy the data and you take it back."
-
Not Synced
And that's where this data came from.
-
Not Synced
Now, Chris is the kind of guy
who wants to make the data public,
-
Not Synced
and so it ended up online for all to use,
and that's where this graph came from.
-
Not Synced
And the fact that it exists is amazing.
These GPS recorders, really cool.
-
Not Synced
But the fact that we have citizens
walking around with hard drives
-
Not Synced
picking up data from city agencies
to make it public,
-
Not Synced
where it was already kind of public,
you could get to it, but it was "public,"
-
Not Synced
but it wasn't public.
-
Not Synced
And we can do better
than that as a city, right?
-
Not Synced
We don't need our citizens
walking around with hard drives.
-
Not Synced
Now, not every data set
is behind a FOIL request. Right?
-
Not Synced
So here is a map I made with the most
dangerous intersections in New York City,
-
Not Synced
based on cyclist accidents.
-
Not Synced
So the red areas are more dangerous,
and what it shows is first,
-
Not Synced
the East side of Manhattan,
especially in the lower area of Manhattan,
-
Not Synced
has more cyclist accidents.
-
Not Synced
That might make sense,
because there are more cyclists
-
Not Synced
coming off the bridges there.
-
Not Synced
But there are other hotspots
worth studying, right?
-
Not Synced
There's Williamsburg.
There's Roosevelt Avenue in Queens.
-
Not Synced
And this is exactly the kind of data
we need for Vision Zero.
-
Not Synced
This is exactly what we're looking for.
-
Not Synced
But there's a story
behind this data as well.
-
Not Synced
This data didn't just appear.
-
Not Synced
How many of you guys know this logo?
-
Not Synced
Yeah, I see some shakes.
-
Not Synced
Have you ever tried to copy
and paste data out of a PDF
-
Not Synced
and make sense of it?
-
Not Synced
I see more shakes.
-
Not Synced
More of you tried copying and pasting
than knew the logo. I like that.
-
Not Synced
Well, so what happened is, the data
that you just saw was actually on a PDF.
-
Not Synced
In fact, hundreds and hundreds
and hundreds of pages of PDF
-
Not Synced
put out by our very own NYPD,
-
Not Synced
and in order to access it,
you would either have to copy and paste
-
Not Synced
for hundreds and hundreds of hours,
-
Not Synced
or you could be John Krause.
-
Not Synced
John Krause was like,
I'm not going to copy and paste this data.
-
Not Synced
I'm going to write a program.
-
Not Synced
It's called the NYPD Crash Data Band-Aid,
-
Not Synced
and it would go to the NYPD's website
and download PDFs.
-
Not Synced
Every day it would search:
if it found a PDF, it would download it
-
Not Synced
and then it would run
some PDF-scraping program,
-
Not Synced
and out would come the text,
-
Not Synced
and it would go on the Internet,
and then people could make maps like that.
-
Not Synced
And the fact that the data's here,
once again, the fact that we have access to it
-
Not Synced
-- Every accident, by the way,
is a row in this table,
-
Not Synced
every single accident, you can imagine
how many PDFs that is --
-
Not Synced
the fact that we
have access to that is great,
-
Not Synced
but let's not release it in PDF form,
-
Not Synced
because then we're having our citizens
write PDF scrapers.
-
Not Synced
It's not the best use
of our citizens' time,
-
Not Synced
and we as a city can do better than that.
-
Not Synced
Now, the good news is that
the de Blasio Administration
-
Not Synced
actually recently released this data
a few months ago,
-
Not Synced
and so now we can
actually have access to it,
-
Not Synced
but there's a lot of data
still entombed in PDF.
-
Not Synced
For example, our crime data
is still only available in PDF.
-
Not Synced
And not just our crime data:
-
Not Synced
our own city budget.
-
Not Synced
Our city budget is only readable
right now in PDF form,
-
Not Synced
and it's not just us
that can't analyze it:
-
Not Synced
our own legislators
who vote for the budget
-
Not Synced
also only get it in PDF.
-
Not Synced
So our legislators cannot
analyze the budget
-
Not Synced
that they are voting for.
-
Not Synced
And I think as a city we can do
a little better than that as well.
-
Not Synced
Now, there's a lot of data
that's not hidden in PDFs.
-
Not Synced
This is an example of a map I made,
-
Not Synced
and these are the dirtiest waterways
in New York City.
-
Not Synced
Now, how do I measure dirty?
-
Not Synced
Well, it's kind of a little weird,
-
Not Synced
but I looked at the level
of fecal coliform,
-
Not Synced
which is a measurement of fecal matter
in each of our waterways.
-
Not Synced
The larger the circle,
the dirtier the water,
-
Not Synced
so the large circles are dirty water,
the small circles are cleaner.
-
Not Synced
What you see is, inland waterways,
-
Not Synced
this is all data that was sampled
by the city over the last five years,
-
Not Synced
and inland waterways are,
in general, dirtier.
-
Not Synced
That makes sense, right?
-
Not Synced
And the bigger circles are dirty.
And I learned a few things like this.
-
Not Synced
Number one: never swim in anything
that ends in "creek" or "canal."
-
Not Synced
But number two, I also found
the dirtiest waterway in New York City,
-
Not Synced
by this measure, one measure.
-
Not Synced
It's Coney Island Creek, which is not
the Coney Island you swim in, luckily.
-
Not Synced
It's on the other side.
-
Not Synced
But Coney Island Creek, 94 percent
of samples taken over the last five years
-
Not Synced
have had fecal levels so high
-
Not Synced
that it would be against state law
to swim in the water.
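The "94 percent of samples" figure is a simple share: the count of samples above a limit, divided by all samples. A minimal sketch follows, assuming made-up readings and a placeholder limit; the real legal threshold and the city's actual sample values are not given in the talk, so nothing here should be read as the real numbers.

```python
def share_over_limit(readings, limit):
    """Return the fraction of readings strictly above the limit."""
    if not readings:
        raise ValueError("no readings")
    over = sum(1 for r in readings if r > limit)
    return over / len(readings)


# Made-up readings and a placeholder limit, for illustration only.
sample_readings = [50, 900, 1200, 3000, 150, 2500, 800, 4000]
HYPOTHETICAL_LIMIT = 200
share = share_over_limit(sample_readings, HYPOTHETICAL_LIMIT)
```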
-
Not Synced
And this is not the kind of fact
that you're going to see
-
Not Synced
boasted in a city report, right?
-
Not Synced
It's not going to be
the front page on nyc.gov.
-
Not Synced
You're not going to see it there,
-
Not Synced
but the fact that we
can get to that data is awesome.
-
Not Synced
But once again, it wasn't super-easy,
-
Not Synced
because this data was not
on the open data portal.
-
Not Synced
If you were to go to the open data portal,
-
Not Synced
you'd see just a snippet of it,
-
Not Synced
a year or a few months.
-
Not Synced
It was actually on the Department
of Environmental Protection's website.
-
Not Synced
And each one of these links
is an Excel sheet,
-
Not Synced
and each Excel sheet is different.
-
Not Synced
Every heading is different:
you copy, paste, reorganize, reorder.
-
Not Synced
And when you do, you can make maps,
and that's great, but once again,
-
Not Synced
we can do better than that as a city,
we can normalize things.
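The copy, paste and reorder step can be partly automated by mapping each sheet's headings onto one shared set of names. The sketch below assumes hypothetical heading variants and canonical names; the actual headings in the Department of Environmental Protection's sheets are not listed in the talk.

```python
# Hypothetical heading variants mapped to one canonical name each.
HEADER_ALIASES = {
    "site": "site",
    "station": "site",
    "sampling location": "site",
    "date": "date",
    "sample date": "date",
    "fecal coliform": "fecal_coliform",
    "fc": "fecal_coliform",
}


def normalize_row(row):
    """Rename known headings to canonical names and drop unknown ones."""
    out = {}
    for header, value in row.items():
        key = HEADER_ALIASES.get(header.strip().lower())
        if key is not None:
            out[key] = value
    return out


# Two made-up rows as they might look coming from two differently laid out sheets.
row_a = normalize_row({"Station": "CIC2", "Sample Date": "2013-06-01", "FC": 1200})
row_b = normalize_row({"site": "CIC2", "date": "2014-06-01", "Fecal Coliform": 900, "Notes": "n/a"})
```

Once every row uses the same keys, rows from different sheets can be combined into one table.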
-
Not Synced
And we're getting there, because
there's this website that Socrata makes
-
Not Synced
called the Open Data Portal
for New York City.
-
Not Synced
This is where 1,100 data sets
that don't suffer
-
Not Synced
from all those things
I just told you live,
-
Not Synced
and that number is growing,
and that's great.
-
Not Synced
You can download data
in any format you want,
-
Not Synced
be it CSV or PDF if for some reason
that's what you want, or an Excel document.
-
Not Synced
Whatever you want,
you can download the data that way.
-
Not Synced
The problem is, once you do,
-
Not Synced
you will find that each agency
codes their addresses differently.
-
Not Synced
So one is street name, intersection street,
street, borough, address, building,
-
Not Synced
building, address, and so once again,
you're spending time,
-
Not Synced
even when we have this portal,
you're spending time
-
Not Synced
normalizing our address field.
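Here is roughly what that normalizing looks like in practice. The three record layouts and their field names are assumptions chosen to echo the ones just listed, not the real schemas of any agency, and the output format is only one possible convention.

```python
def normalize_address(record):
    """Map three hypothetical agency layouts to one 'LOCATION, BOROUGH' string."""
    borough = record.get("borough", "").strip().upper()
    if "building" in record and "street" in record:
        location = f'{record["building"]} {record["street"]}'
    elif "street_name" in record and "intersection_street" in record:
        location = f'{record["street_name"]} & {record["intersection_street"]}'
    elif "address" in record:
        location = record["address"]
    else:
        raise ValueError("unrecognized address layout")
    # Uppercase and collapse repeated whitespace.
    location = " ".join(location.upper().split())
    return f"{location}, {borough}" if borough else location


# Made-up records in three different layouts.
a = normalize_address({"building": "12", "street": "main  st", "borough": "Brooklyn"})
b = normalize_address({"street_name": "Roosevelt Ave", "intersection_street": "74 St", "borough": "queens"})
c = normalize_address({"address": "1 Centre St"})
```

Every new layout needs another branch, which is the cost a shared address standard would remove.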
-
Not Synced
And I think that's not
the best use of our citizens' time, right?
-
Not Synced
We can do better than that as a city.
-
Not Synced
We can standardize our addresses,
and if we do, we can get more maps like this.
-
Not Synced
This is a map of fire hydrants
in New York City,
-
Not Synced
but not just any fire hydrants:
-
Not Synced
these are the top 250 grossing fire
hydrants in terms of parking tickets.
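A ranking like this comes down to summing fines per hydrant and keeping the largest totals. A minimal sketch, assuming each ticket has already been matched to a hydrant; the hydrant IDs and fine amounts are invented for the example.

```python
from collections import Counter


def top_grossing(tickets, n):
    """Return the n (hydrant_id, total_fines) pairs with the largest totals."""
    totals = Counter()
    for hydrant_id, fine in tickets:
        totals[hydrant_id] += fine
    return totals.most_common(n)


# Made-up tickets as (hydrant_id, fine in dollars), for illustration only.
sample_tickets = [
    ("H1", 115), ("H2", 115), ("H1", 115),
    ("H3", 115), ("H1", 115), ("H2", 115),
]
top_two = top_grossing(sample_tickets, 2)
```

The hard part in practice is the matching step, which depends on ticket addresses being coded consistently.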
-
Not Synced
So I learned a few things from this map,
and I really like this map.
-
Not Synced
Number one, just don't park
on the Upper East Side.
-
Not Synced
Just don't. It doesn't matter where
you park, you will get a hydrant ticket.
-
Not Synced
Number two, I found the two highest
grossing hydrants in all of New York City,
-
Not Synced
and they're on the Lower East Side,
-
Not Synced
and they were bringing in
over $55,000 a year, a year,
-
Not Synced
in parking tickets.
-
Not Synced
And that seemed a little strange
to me when I noticed it,
-
Not Synced
so I did a little digging and it turns out
what you had is a hydrant
-
Not Synced
and then something called
a curb extension,
-
Not Synced
which is like a seven-foot
space to walk on,
-
Not Synced
and then a parking spot.
-
Not Synced
And so these cars came along,
and the hydrant,
-
Not Synced
"It's all the way over there, I'm fine,"
-
Not Synced
and there was actually a parking spot
painted there beautifully for them.
-
Not Synced
They would park there, and the NYPD
disagreed with this designation
-
Not Synced
and would ticket them.
-
Not Synced
And it wasn't just me who found
a parking ticket, right?
-
Not Synced
This is the Google Street
View Car driving by,
-
Not Synced
finding the same parking ticket.
-
Not Synced
So I wrote about this on my blog,
on I Quant NY, and the DOT responded,
-
Not Synced
and they said, "While the DOT has not
received any complaints about this location,
-
Not Synced
we will review the roadway markings
and make any appropriate alterations."
-
Not Synced
And I thought to myself,
typical government response,
-
Not Synced
all right, moved on with my life.
-
Not Synced
But then, a few weeks later,
something incredible happened.
-
Not Synced
They repainted the spot,
-
Not Synced
and for a second I thought I saw
the future of open data,
-
Not Synced
because think about what happened here.
-
Not Synced
For five years, five years,
-
Not Synced
this spot was being ticketed,
and it was confusing,
-
Not Synced
and then a citizen found something,
they told the city, and within a few weeks
-
Not Synced
the problem was fixed.
-
Not Synced
It's amazing, and a lot of people
see open data as being a watchdog.
-
Not Synced
It's not, it's about being a partner.
-
Not Synced
We can empower our citizens
to be better partners for government,
-
Not Synced
and it's not hard.
-
Not Synced
All we need are a few changes.
-
Not Synced
If you're FOILing data, if you're seeing
your data being FOILed over and over again,
-
Not Synced
let's release it to the public, that's
a sign that it should be made public.
-
Not Synced
And if we're going to release a PDF,
-
Not Synced
if you're a government agency
releasing a PDF,
-
Not Synced
let's pass legislation that requires you
to post it with the underlying data,
-
Not Synced
because that data
is coming from somewhere.
-
Not Synced
I don't know where, but it's
coming from somewhere,
-
Not Synced
and you can release it with the PDF.
-
Not Synced
And let's adopt and share
some open data standards.
-
Not Synced
Let's start with our addresses
here in New York City.
-
Not Synced
Let's just start
normalizing our addresses.
-
Not Synced
Because you know what?
New York is a leader in open data.
-
Not Synced
Despite all this, we are absolutely
a leader in open data,
-
Not Synced
and if we start normalizing things,
and we set an open data standard,
-
Not Synced
others will follow. The state will follow,
and maybe the federal government,
-
Not Synced
and I know it's crazy,
but other countries could follow,
-
Not Synced
and we're not that far off from a time
where you could write one program
-
Not Synced
and map information from 100 countries.
-
Not Synced
It's not science fiction.
We're actually quite close.
-
Not Synced
And by the way, who are we
empowering with this?
-
Not Synced
Because it's not just John Krause
and it's not just Chris Wong.
-
Not Synced
There are hundreds of meetups
going around in New York City,
-
Not Synced
going on in New York City right now,
-
Not Synced
active meetups.
-
Not Synced
There are thousands of people
attending these meetups.
-
Not Synced
These people are going after work
and on weekends,
-
Not Synced
and they're attending these meetups
to look at open data
-
Not Synced
and make our city a better place.
-
Not Synced
Groups like BetaNYC, who last week,
just last week released something
-
Not Synced
called citygram.nyc.
-
Not Synced
That allows you to subscribe
to 311 complaints
-
Not Synced
around your own home,
or around your office.
-
Not Synced
You put in your address,
you get local complaints.
-
Not Synced
And it's not just the tech community
that are after these things, right?
-
Not Synced
It's urban planners like the students
I teach at Pratt.
-
Not Synced
It's policy advocates, it's everyone,
-
Not Synced
it's citizens from a diverse
set of backgrounds.
-
Not Synced
And with some small, incremental changes,
-
Not Synced
we can unlock the passion
and the ability of our citizens
-
Not Synced
to harness open data
-
Not Synced
and make our city even better,
-
Not Synced
whether it's one data set,
or one parking spot at a time.
-
Not Synced
Thank you.
-
Not Synced
(Applause)