
How we found the worst place to park in New York City — using big data

  • 0:01 - 0:04
    Six thousand miles of road,
  • 0:04 - 0:06
    600 miles of subway track,
  • 0:06 - 0:08
    400 miles of bike lanes,
  • 0:08 - 0:09
    and half a mile of tram track,
  • 0:09 - 0:11
    if you've ever been to Roosevelt Island.
  • 0:11 - 0:15
    So these are the numbers that make up
    the infrastructure of New York City.
  • 0:15 - 0:17
    These are the statistics
    of our infrastructure.
  • 0:17 - 0:21
    They're the kind of numbers you can find
    released in reports by city agencies.
  • 0:21 - 0:23
    For example, the Department
    of Transportation will probably tell you
  • 0:23 - 0:24
    how many miles of road they maintain.
  • 0:24 - 0:26
    The MTA will boast how many miles
    of subway track there are.
  • 0:26 - 0:29
    Most city agencies give us statistics.
  • 0:29 - 0:31
    This is from a report this year
  • 0:31 - 0:32
    from the Taxi and Limousine Commission,
  • 0:32 - 0:36
    where we learn that there's about
    13,500 taxis here in New York City.
  • 0:36 - 0:39
    Pretty interesting, right?
  • 0:39 - 0:41
    But did you ever think about
    where these numbers came from?
  • 0:41 - 0:44
    Because for these numbers to exist,
    someone at the city agency
  • 0:44 - 0:47
    had to stop and say, "Hmm, here's a number
    that somebody might want to know."
  • 0:47 - 0:51
    So they go back to their raw data,
  • 0:51 - 0:53
    they count, they add, they calculate,
  • 0:53 - 0:55
    and then they put out reports,
  • 0:55 - 0:57
    and those reports
    will have numbers like this.
  • 0:57 - 1:00
    The problem is, how do they know
    all of our questions?
  • 1:00 - 1:02
    We have lots of questions.
  • 1:02 - 1:04
    In fact, in some ways there's literally
    an infinite number of questions
  • 1:04 - 1:06
    that we can ask about our city.
  • 1:06 - 1:08
    So the agencies can never keep up.
  • 1:08 - 1:12
    So the paradigm isn't exactly working,
    and I think our policymakers realize that,
  • 1:12 - 1:16
    because in 2012, Mayor Bloomberg
    signed into law what he called
  • 1:16 - 1:20
    the most ambitious and comprehensive
    open data legislation in the country.
  • 1:20 - 1:21
    In a lot of ways, he's right.
  • 1:21 - 1:23
    In the last two years,
    the city has released
  • 1:23 - 1:26
    a thousand data sets
    on our open data portal,
  • 1:26 - 1:28
    and it's pretty awesome.
  • 1:28 - 1:30
    So you go and look at data like this,
    and instead of just counting
  • 1:30 - 1:32
    the number of cabs,
  • 1:32 - 1:33
    we can start to ask different questions.
  • 1:33 - 1:35
    So I had a question.
  • 1:35 - 1:36
    When's rush hour in New York City?
  • 1:36 - 1:39
    I mean, it can be pretty bothersome.
    When is rush hour exactly?
  • 1:39 - 1:41
    And I thought to myself,
    well, these cabs aren't just numbers,
  • 1:41 - 1:44
    these are GPS recorders
    driving around in our city streets
  • 1:44 - 1:46
    recording each and every ride they take.
  • 1:46 - 1:48
    There's data there,
  • 1:48 - 1:50
    and I looked at that data,
    and I made a plot
  • 1:50 - 1:52
    of the average speed of taxis
    in New York City throughout the day.
  • 1:52 - 1:56
    Well, you can see that from about midnight
    to around 5:18 in the morning,
  • 1:56 - 1:58
    speed increases, and at that point,
  • 1:58 - 2:00
    things turn around,
  • 2:00 - 2:04
    and they get slower and slower and slower
    until about 8:35 in the morning,
  • 2:04 - 2:06
    when they end up at around
    11 and a half miles an hour.
  • 2:06 - 2:09
    The average taxi is going 11 and a half
    miles per hour on our city streets,
  • 2:09 - 2:11
    and it turns out it stays that way
  • 2:11 - 2:15
    for the entire day.
  • 2:15 - 2:17
    The entire day. (Laughter)
  • 2:17 - 2:19
    So I said to myself, I guess
    there's no rush hour in New York City.
  • 2:19 - 2:22
    There's just a rush day.
  • 2:22 - 2:24
    Makes sense. But this is important
    for a couple of reasons.
  • 2:24 - 2:28
    If you're a transportation planner,
    this might be pretty interesting to know,
  • 2:28 - 2:29
    but if you want to get somewhere quickly,
  • 2:29 - 2:33
    you now know to set your alarm
    for 4:45 in the morning and you're all set.
  • 2:33 - 2:35
    New York, right?
  • 2:35 - 2:36
    But there's a story behind this data.
  • 2:36 - 2:38
    This data wasn't
    just available, it turns out.
  • 2:38 - 2:41
    It actually came from something called
    a Freedom of Information law request,
  • 2:41 - 2:42
    or a FOIL request.
  • 2:42 - 2:45
    This is a form you can find on the
    Taxi and Limousine Commission website.
  • 2:45 - 2:48
    In order to access this data,
    you need to go get this form,
  • 2:48 - 2:50
    fill it out, and they will notify you,
  • 2:50 - 2:53
    and a guy named Chris Wong
    did exactly that.
  • 2:53 - 2:56
    Chris went down, and they told him,
    "Just bring a hard drive down,
  • 2:56 - 2:58
    a brand new hard drive,
    bring it to our office,
  • 2:58 - 3:01
    leave it here for five hours,
    we'll copy the data, and you take it back."
  • 3:01 - 3:03
    And that's where this data came from.
  • 3:03 - 3:06
    Now, Chris is the kind of guy
    who wants to make the data public,
  • 3:06 - 3:10
    and so it ended up online for all to use,
    and that's where this graph came from.
  • 3:10 - 3:13
    And the fact that it exists is amazing.
    These GPS recorders, really cool.
  • 3:13 - 3:16
    But the fact that we have citizens
    walking around with hard drives
  • 3:16 - 3:18
    picking up data from city agencies
    to make it public,
  • 3:18 - 3:21
    where it was already kind of public,
    you could get to it, but it was "public,"
  • 3:21 - 3:23
    but it wasn't public.
  • 3:23 - 3:25
    And we can do better
    than that as a city, right?
  • 3:25 - 3:27
    We don't need our citizens
    walking around with hard drives.
  • 3:27 - 3:31
    Now, not every data set
    is behind a FOIL request. Right?
  • 3:31 - 3:34
    So here is a map I made with the most
    dangerous intersections in New York City
  • 3:34 - 3:36
    based on cyclist accidents.
  • 3:36 - 3:39
    So the red areas are more dangerous,
    and what it shows is first,
  • 3:39 - 3:43
    the East Side of Manhattan,
    especially in the lower area of Manhattan,
  • 3:43 - 3:45
    has more cyclist accidents.
  • 3:45 - 3:47
    That might make sense,
    because there are more cyclists
  • 3:47 - 3:48
    coming off the bridges there.
  • 3:48 - 3:50
    But there are other hotspots
    worth studying, right?
  • 3:50 - 3:53
    There's Williamsburg.
    There's Roosevelt Avenue in Queens.
  • 3:53 - 3:56
    And this is exactly the kind of data
    we need for Vision Zero.
  • 3:56 - 3:57
    This is exactly what we're looking for.
  • 3:57 - 4:00
    But there's a story
    behind this data as well.
  • 4:00 - 4:02
    This data didn't just appear.
  • 4:02 - 4:04
    How many of you guys know this logo?
  • 4:04 - 4:06
    Yeah, I see some shakes.
  • 4:06 - 4:08
    Have you ever tried to copy
    and paste data out of a PDF
  • 4:08 - 4:10
    and make sense of it?
  • 4:10 - 4:11
    I see more shakes.
  • 4:11 - 4:14
    More of you tried copying and pasting
    than knew the logo. I like that.
  • 4:14 - 4:17
    Well, so what happened is, the data
    that you just saw was actually on a PDF.
  • 4:17 - 4:21
    In fact, hundreds and hundreds
    and hundreds of pages of PDF
  • 4:21 - 4:23
    put out by our very own NYPD,
  • 4:23 - 4:25
    and in order to access it,
    you would either have to copy and paste
  • 4:25 - 4:27
    for hundreds and hundreds of hours,
  • 4:27 - 4:29
    or you could be John Krause.
  • 4:29 - 4:31
    John Krause was like,
    I'm not going to copy and paste this data.
  • 4:31 - 4:33
    I'm going to write a program.
  • 4:33 - 4:36
    It's called the NYPD Crash Data Band-Aid,
  • 4:36 - 4:39
    and it would go to the NYPD's website
    and download PDFs.
  • 4:39 - 4:42
    Every day it would search:
    if it found a PDF, it would download it
  • 4:42 - 4:45
    and then it would run
    some PDF-scraping program,
  • 4:45 - 4:46
    and out would come the text,
  • 4:46 - 4:49
    and it would go on the Internet,
    and then people could make maps like that.
  • 4:49 - 4:52
    And the fact that the data's here,
    once again, the fact that we have access to it
  • 4:52 - 4:54
    (every accident, by the way,
    is a row in this table,
  • 4:54 - 4:57
    every single accident; you can imagine
    how many PDFs that is),
  • 4:57 - 4:59
    the fact that we
    have access to that is great,
  • 4:59 - 5:01
    but let's not release it in PDF form,
  • 5:01 - 5:04
    because then we're having our citizens
    write PDF scrapers.
  • 5:04 - 5:05
    It's not the best use
    of our citizens' time,
  • 5:05 - 5:08
    and we as a city can do better than that.
  • 5:08 - 5:11
    Now, the good news is that
    the de Blasio Administration
  • 5:11 - 5:14
    actually recently released this data
    a few months ago,
  • 5:14 - 5:15
    and so now we can
    actually have access to it,
  • 5:15 - 5:18
    but there's a lot of data
    still entombed in PDF.
  • 5:18 - 5:22
    For example, our crime data
    is still only available in PDF.
  • 5:22 - 5:23
    And not just our crime data:
  • 5:23 - 5:25
    our own city budget.
  • 5:25 - 5:29
    Our city budget is only readable
    right now in PDF form,
  • 5:29 - 5:31
    and it's not just us
    that can't analyze it:
  • 5:31 - 5:34
    our own legislators
    who vote for the budget
  • 5:34 - 5:36
    also only get it in PDF.
  • 5:36 - 5:38
    So our legislators cannot
    analyze the budget
  • 5:38 - 5:40
    that they are voting for.
  • 5:40 - 5:43
    And I think as a city we can do
    a little better than that as well.
  • 5:43 - 5:45
    Now, there's a lot of data
    that's not hidden in PDFs.
  • 5:45 - 5:47
    This is an example of a map I made,
  • 5:47 - 5:50
    and these are the dirtiest waterways
    in New York City.
  • 5:50 - 5:52
    Now, how do I measure dirty?
  • 5:52 - 5:54
    Well, it's kind of a little weird,
  • 5:54 - 5:56
    but I looked at the level
    of fecal coliform,
  • 5:56 - 5:59
    which is a measurement of fecal matter
    in each of our waterways.
  • 5:59 - 6:03
    The larger the circle,
    the dirtier the water,
  • 6:03 - 6:06
    so the large circles are dirty water,
    the small circles are cleaner.
  • 6:06 - 6:08
    What you see is, inland waterways,
  • 6:08 - 6:11
    this is all data that was sampled
    by the city over the last five years,
  • 6:11 - 6:14
    and inland waterways are,
    in general, dirtier.
  • 6:14 - 6:15
    That makes sense, right?
  • 6:15 - 6:17
    And the bigger circles are dirty.
    And I learned a few things like this.
  • 6:17 - 6:22
    Number one: never swim in anything
    that ends in "creek" or "canal."
  • 6:22 - 6:26
    But number two, I also found
    the dirtiest waterway in New York City,
  • 6:26 - 6:28
    by this measure, one measure.
  • 6:28 - 6:31
    It's Coney Island Creek, which is not
    the Coney Island you swim in, luckily.
  • 6:31 - 6:32
    It's on the other side.
  • 6:32 - 6:36
    But Coney Island Creek, 94 percent
    of samples taken over the last five years
  • 6:36 - 6:39
    have had fecal levels so high
  • 6:39 - 6:43
    that it would be against state law
    to swim in the water.
  • 6:43 - 6:44
    And this is not the kind of fact
    that you're going to see
  • 6:44 - 6:46
    boasted in a city report, right?
  • 6:46 - 6:48
    It's not going to be
    the front page on nyc.gov.
  • 6:48 - 6:50
    You're not going to see it there,
  • 6:50 - 6:52
    but the fact that we
    can get to that data is awesome.
  • 6:52 - 6:54
    But once again, it wasn't super-easy,
  • 6:54 - 6:56
    because this data was not
    on the open data portal.
  • 6:56 - 6:57
    If you were to go to the open data portal,
  • 6:57 - 6:59
    you'd see just a snippet of it,
  • 6:59 - 7:01
    a year or a few months.
  • 7:01 - 7:03
    It was actually on the Department
    of Environmental Protection's website.
  • 7:03 - 7:05
    And each one of these links
    is an Excel sheet,
  • 7:05 - 7:07
    and each Excel sheet is different.
  • 7:07 - 7:10
    Every heading is different:
    you copy, paste, reorganize, reorder.
  • 7:10 - 7:12
    And when you do, you can make maps,
    and that's great, but once again,
  • 7:12 - 7:15
    we can do better than that as a city,
    we can normalize things.
  • 7:15 - 7:18
    And we're getting there, because
    there's this website that Socrata makes
  • 7:18 - 7:20
    called the New York City
    Open Data Portal.
  • 7:20 - 7:22
    This is where 1,100 data sets live
    that don't suffer
  • 7:22 - 7:23
    from all those problems
    I just told you about,
  • 7:23 - 7:25
    and that number is growing,
    and that's great.
  • 7:25 - 7:27
    You can download data
    in any format you want,
  • 7:27 - 7:30
    be it CSV, or PDF if for some reason
    that's what you want, or an Excel document.
  • 7:30 - 7:34
    Whatever you want,
    you can download the data that way.
  • 7:34 - 7:35
    The problem is, once you do,
  • 7:35 - 7:39
    you will find that each agency
    codes their addresses differently.
  • 7:39 - 7:42
    So one is street name, intersection street,
    street, borough, address, building,
  • 7:42 - 7:45
    building, address, and so once again,
    you're spending time,
  • 7:45 - 7:47
    even when we have this portal,
    you're spending time
  • 7:47 - 7:49
    normalizing our address field.
  • 7:49 - 7:51
    And I think that's not
    the best use of our citizens' time, right?
  • 7:51 - 7:53
    We can do better than that as a city.
  • 7:53 - 7:57
    We can standardize our addresses,
    and if we do, we can get more maps like this.
  • 7:57 - 8:00
    This is a map of fire hydrants
    in New York City,
  • 8:00 - 8:02
    but not just any fire hydrants:
  • 8:02 - 8:06
    these are the 250 top-grossing fire
    hydrants in terms of parking tickets.
  • 8:06 - 8:11
    So I learned a few things from this map,
    and I really like this map.
  • 8:11 - 8:14
    Number one, just don't park
    on the Upper East Side.
  • 8:14 - 8:17
    Just don't. It doesn't matter where
    you park; you will get a hydrant ticket.
  • 8:17 - 8:22
    Number two, I found the two
    highest-grossing hydrants in all of New York City,
  • 8:22 - 8:23
    and they're on the Lower East Side,
  • 8:23 - 8:27
    and they were bringing in
    over $55,000 a year, a year,
  • 8:27 - 8:29
    in parking tickets.
  • 8:29 - 8:31
    And that seemed a little strange
    to me when I noticed it,
  • 8:31 - 8:34
    so I did a little digging and it turns out
    what you had is a hydrant
  • 8:34 - 8:36
    and then something called
    a curb extension,
  • 8:36 - 8:38
    which is like a seven-foot
    space to walk on,
  • 8:38 - 8:40
    and then a parking spot.
  • 8:40 - 8:41
    And so these cars came along,
    saw the hydrant and thought,
  • 8:41 - 8:44
    "It's all the way over there, I'm fine,"
  • 8:44 - 8:46
    and there was actually a parking spot
    painted there beautifully for them.
  • 8:46 - 8:49
    They would park there, and the NYPD
    disagreed with this designation
  • 8:49 - 8:51
    and would ticket them.
  • 8:51 - 8:53
    And it wasn't just me who found
    a parking ticket, right?
  • 8:53 - 8:55
    This is the Google Street
    View Car driving by,
  • 8:55 - 8:57
    finding the same parking ticket.
  • 8:57 - 9:02
    So I wrote about this on my blog,
    on I Quant NY, and the DOT responded,
  • 9:02 - 9:07
    and they said, "While the DOT has not
    received any complaints about this location,
  • 9:07 - 9:11
    we will review the roadway markings
    and make any appropriate alterations."
  • 9:11 - 9:14
    And I thought to myself,
    typical government response,
  • 9:14 - 9:16
    all right, and moved on with my life.
  • 9:16 - 9:20
    But then, a few weeks later,
    something incredible happened.
  • 9:20 - 9:22
    They repainted the spot,
  • 9:22 - 9:25
    and for a second I thought I saw
    the future of open data,
  • 9:25 - 9:27
    because think about what happened here.
  • 9:27 - 9:30
    For five years, for five years,
  • 9:30 - 9:33
    this spot was being ticketed,
    and it was confusing,
  • 9:33 - 9:36
    and then a citizen found something,
    they told the city, and within a few weeks
  • 9:36 - 9:38
    the problem was fixed.
  • 9:38 - 9:42
    It's amazing, and a lot of people
    see open data as being a watchdog.
  • 9:42 - 9:43
    It's not; it's about being a partner.
  • 9:43 - 9:46
    We can empower our citizens
    to be better partners for government,
  • 9:46 - 9:48
    and it's not that hard.
  • 9:48 - 9:49
    All we need are a few changes.
  • 9:49 - 9:53
    If you're FOILing data, if you're seeing
    your data being FOILed over and over again,
  • 9:53 - 9:56
    let's release it to the public. That's
    a sign that it should be made public.
  • 9:56 - 9:58
    And if we're going to release a PDF,
  • 9:58 - 9:59
    if you're a government agency
    releasing a PDF,
  • 9:59 - 10:03
    let's pass legislation that requires you
    to post it with the underlying data,
  • 10:03 - 10:05
    because that data
    is coming from somewhere.
  • 10:05 - 10:06
    I don't know where, but it's
    coming from somewhere,
  • 10:06 - 10:08
    and you can release it with the PDF.
  • 10:08 - 10:10
    And let's adopt and share
    some open data standards.
  • 10:10 - 10:12
    Let's start with our addresses
    here in New York City.
  • 10:12 - 10:14
    Let's just start
    normalizing our addresses.
  • 10:14 - 10:16
    Because you know what?
    New York is a leader in open data.
  • 10:16 - 10:19
    Despite all this, we are absolutely
    a leader in open data,
  • 10:19 - 10:21
    and if we start normalizing things,
    and we set an open data standard,
  • 10:21 - 10:24
    others will follow. The state will follow,
    and maybe the federal government,
  • 10:24 - 10:27
    and I know it's crazy,
    but other countries could follow,
  • 10:27 - 10:30
    and we're not that far off from a time
    where you could write one program
  • 10:30 - 10:33
    and map information from 100 countries.
  • 10:33 - 10:35
    It's not science fiction.
    We're actually quite close.
  • 10:35 - 10:37
    And by the way, who are we
    empowering with this?
  • 10:37 - 10:41
    Because it's not just John Krause
    and it's not just Chris Wong.
  • 10:41 - 10:43
    There are hundreds of meetups
  • 10:43 - 10:45
    going on in New York City right now,
  • 10:45 - 10:46
    active meetups.
  • 10:46 - 10:48
    There are thousands of people
    attending these meetups.
  • 10:48 - 10:51
    These people are going after work
    and on weekends,
  • 10:51 - 10:53
    and they're attending these meetups
    to look at open data
  • 10:53 - 10:55
    and make our city a better place.
  • 10:55 - 10:58
    Groups like BetaNYC, who just last week
    released something
  • 10:58 - 11:00
    called citygram.nyc.
  • 11:00 - 11:02
    It allows you to subscribe
    to 311 complaints
  • 11:02 - 11:04
    around your own home,
    or around your office.
  • 11:04 - 11:06
    You put in your address,
    you get local complaints.
  • 11:06 - 11:09
    And it's not just the tech community
    that's after these things, right?
  • 11:09 - 11:12
    It's urban planners like the students
    I teach at Pratt.
  • 11:12 - 11:14
    It's policy advocates, it's everyone,
  • 11:14 - 11:17
    it's citizens from a diverse
    set of backgrounds.
  • 11:17 - 11:19
    And with some small, incremental changes,
  • 11:19 - 11:22
    we can unlock the passion
    and the ability of our citizens
  • 11:22 - 11:25
    to harness open data
  • 11:25 - 11:26
    and make our city even better,
  • 11:26 - 11:29
    whether it's one data set,
    or one parking spot at a time.
  • 11:29 - 11:32
    Thank you.
  • 11:32 - 11:35
    (Applause)
Speaker:
Ben Wellington
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
11:48
