
How we found the worst place to park in New York City — using big data

    Six thousand miles of road,
    600 miles of subway track,
    400 miles of bike lanes,
    and half a mile of tram track,
    if you've ever been to Roosevelt Island.
    So these are the numbers that make up
    the infrastructure of New York City.
    These are the statistics
    of our infrastructure.
    They're the kind of numbers you can find
    released in reports by city agencies.
    For example, the Department
    of Transportation will probably tell you
    how many miles of road they maintain.
    The MTA will boast how many miles
    of subway track there are.
    Most city agencies give us statistics.
    This is from a report this year
    from the Taxi and Limousine Commission,
    where we learn that there are about
    13,500 taxis here in New York City.
    Pretty interesting, right?
    But did you ever think about
    where these numbers came from?
    Because for these numbers to exist,
    someone at the city agency
    had to stop and say, "Hmm, here's a number
    that somebody might want to know."
    So they go back to their raw data,
    they count, they add, they calculate,
    and then they put out reports,
    and those reports
    will have numbers like this.
    The problem is, how do they know
    all of our questions?
    We have lots of questions.
    In fact, in some ways there's literally
    an infinite number of questions
    that we can ask about our city.
    So the agencies can never keep up.
    So the paradigm isn't exactly working,
    and I think our policymakers realize that,
    because in 2012, Mayor Bloomberg
    signed into law what he called
    the most ambitious and comprehensive
    open data legislation in the country.
    In a lot of ways, he's right.
    In the last two years,
    the city has released
    a thousand data sets
    on our open data portal,
    and it's pretty awesome.
    So you go and look at data like this,
    and instead of just counting
    the number of cabs,
    we can start to ask different questions.
    So I had a question.
    When's rush hour in New York City?
    I mean, it can be pretty bothersome.
    When is rush hour exactly?
    And I thought to myself,
    well, these cabs aren't just numbers,
    these are GPS recorders
    driving around in our city streets
    recording each and every ride they take.
    There's data there,
    and I looked at that data,
    and I made a plot
    of the average speed of taxis
    in New York City throughout the day.
    Well, you can see that from about midnight
    to around 5:18 in the morning,
    speed increases, and at that point,
    things turn around,
    and they get slower and slower and slower
    until about 8:35 in the morning,
    when they end up at around
    11 and a half miles an hour.
    The average taxi is going 11 and a half
    miles per hour on our city streets,
    and it turns out it stays that way
    for the entire day.
    The entire day. (Laughter)
    So I said to myself, I guess
    there's no rush hour in New York City.
    There's just a rush day.
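The talk does not spell out how the speed curve was computed. A minimal sketch in Python, assuming each trip record carries a pickup time, a dropoff time and a distance in miles (the `Trip` type, its field names and the 15-minute bins are hypothetical, not the Taxi and Limousine Commission's actual schema), buckets rides by time of day and averages their speeds:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class Trip(NamedTuple):
    """One taxi ride. Hypothetical fields, not the TLC's actual schema."""
    pickup: datetime
    dropoff: datetime
    miles: float


def average_speed_by_time_of_day(
    trips: Iterable[Trip], bin_minutes: int = 15
) -> Dict[int, float]:
    """Map each time-of-day bin (minutes after midnight) to mean trip speed in mph."""
    speed_sum: Dict[int, float] = defaultdict(float)
    trip_count: Dict[int, int] = defaultdict(int)
    for trip in trips:
        hours = (trip.dropoff - trip.pickup).total_seconds() / 3600
        if hours <= 0 or trip.miles <= 0:
            continue  # skip zero-length or corrupt records
        # Bucket by pickup time only; the date is ignored so all days pool together.
        minute_of_day = trip.pickup.hour * 60 + trip.pickup.minute
        bin_start = minute_of_day - minute_of_day % bin_minutes
        speed_sum[bin_start] += trip.miles / hours
        trip_count[bin_start] += 1
    return {b: speed_sum[b] / trip_count[b] for b in sorted(trip_count)}


def fastest_bin(speeds: Dict[int, float]) -> int:
    """Bin start (minutes after midnight) with the highest average speed."""
    return max(speeds, key=speeds.get)
```

For example, a 4-mile ride picked up at 5:10 a.m. that takes 10 minutes averages 24 mph and lands in the 5:00 bin (minute 300). Averaging per-trip speeds weights every ride equally; dividing total miles by total hours in each bin would instead weight long rides more heavily, and either choice is an assumption here.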
    Makes sense. But this is important
    for a couple of reasons.
    If you're a transportation planner,
    this might be pretty interesting to know,
    but if you want to get somewhere quickly,
    you now know to set your alarm
    for 4:45 in the morning, you're all set.
    New York, right?
    But there's a story behind this data.
    This data wasn't
    just available, it turns out.
    It actually came from something called
    a Freedom of Information Law request,
    or a FOIL request.
    This is a form you can find on the
    Taxi and Limousine Commission website.
    In order to access this data,
    you need to go get this form,
    fill it out, and they will notify you,
    and a guy named Chris Wong
    did exactly that.
    Chris went down, and they told him,
    "Just bring a hard drive down,
    a brand new hard drive,
    bring it to our office,
    leave it here for five hours,
    we'll copy the data, and you take it back."
    And that's where this data came from.
    Now, Chris is the kind of guy
    who wants to make the data public,
    and so it ended up online for all to use,
    and that's where this graph came from.
    And the fact that it exists is amazing.
    These GPS recorders, really cool.
    But the fact that we have citizens
    walking around with hard drives
    picking up data from city agencies
    to make it public,
    where it was already kind of public,
    you could get to it, but it was "public,"
    but it wasn't public.
    And we can do better
    than that as a city, right?
    We don't need our citizens
    walking around with hard drives.
    Now, not every data set
    is behind a FOIL request. Right?
    So here is a map I made with the most
    dangerous intersections in New York City,
    based on cyclist accidents.
    So the red areas are more dangerous,
    and what it shows is first,
    the East Side of Manhattan,
    especially in the lower area of Manhattan,
    has more cyclist accidents.
    That might make sense,
    because there are more cyclists
    coming off the bridges there.
    But there are other hotspots
    worth studying, right?
    There's Williamsburg.
    There's Roosevelt Avenue in Queens.
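The talk does not say how the map was built. One simple way to rank intersections, sketched under assumptions: each crash record names its two cross streets and how many cyclists were involved (the `Collision` type and its fields are hypothetical). Counting cyclist crashes per corner, with the street pair normalized so that "Main St & Oak Ave" and "Oak Ave & Main St" match, gives a ranked list:

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Collision(NamedTuple):
    """One crash record. Hypothetical fields, not the NYPD's actual layout."""
    street_a: str
    street_b: str
    cyclists_involved: int


def intersection_key(a: str, b: str) -> Tuple[str, str]:
    """Order-independent key, so 'A & B' and 'B & A' count as the same corner."""
    x, y = a.strip().upper(), b.strip().upper()
    return (x, y) if x <= y else (y, x)


def most_dangerous(
    collisions: Iterable[Collision], top: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Intersections ranked by the number of crashes that involved a cyclist."""
    counts: Counter = Counter()
    for c in collisions:
        if c.cyclists_involved > 0:  # ignore crashes with no cyclist
            counts[intersection_key(c.street_a, c.street_b)] += 1
    return counts.most_common(top)
```

Drawing a map like the one in the talk would additionally need coordinates for each corner; this sketch only does the counting.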
    And this is exactly the kind of data
    we need for Vision Zero.
    This is exactly what we're looking for.
    But there's a story
    behind this data as well.
    This data didn't just appear.
    How many of you guys know this logo?
    Yeah, I see some shakes.
    Have you ever tried to copy
    and paste data out of a PDF
    and make sense of it?
    I see more shakes.
    More of you tried copying and pasting
    than knew the logo. I like that.
    Well, so what happened is, the data
    that you just saw was actually on a PDF.
    In fact, hundreds and hundreds
    and hundreds of pages of PDF
    put out by our very own NYPD,
    and in order to access it,
    you would either have to copy and paste
    for hundreds and hundreds of hours,
    or you could be John Krause.
    John Krause was like,
    "I'm not going to copy and paste this data.
    I'm going to write a program."
    It's called the NYPD Crash Data Band-Aid,
    and it would go to the NYPD's website
    and download PDFs.
    Every day it would search:
    if it found a PDF, it would download it
    and then it would run
    some PDF-scraping program,
    and out would come the text,
    and it would go on the Internet,
    and then people could make maps like that.
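The daily search described above can be sketched with Python's standard library. This is not John Krause's code: it assumes the reports are linked from an ordinary HTML index page (the `example.gov` address mentioned in the comments is a placeholder) and covers only spotting PDF links that an earlier run has not yet processed; downloading and text extraction are left out.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href> that points at a .pdf file."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url  # page the links came from, e.g. a placeholder https://example.gov/stats/
        self.pdf_links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare on the URL path so query strings like "?v=2" don't hide the extension.
        if href and urlparse(href).path.lower().endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def new_pdf_links(page_html: str, base_url: str, already_seen: Set[str]) -> List[str]:
    """PDF links on the index page that no earlier daily run has processed."""
    collector = PdfLinkCollector(base_url)
    collector.feed(page_html)
    collector.close()
    fresh: List[str] = []
    for link in collector.pdf_links:
        if link not in already_seen and link not in fresh:  # also drop duplicates on the page
            fresh.append(link)
    return fresh
```

Run once a day (for example from cron), the returned links are the ones to download; afterwards they would be added to `already_seen` so the next run skips them.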
    And the fact that the data's here,
    once again, the fact that we have access to it
    -- every accident, by the way,
    is a row in this table,
    every single accident, you can imagine
    how many PDFs that is --
    the fact that we
    have access to that is great,
    but let's not release it in PDF form,
    because then we're having our citizens
    write PDF scrapers.
    It's not the best use
    of our citizens' time,
    and we as a city can do better than that.
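Once a scraper has pulled text out of the PDFs, each accident still has to be turned back into a table row. A minimal sketch, assuming a hypothetical extracted-line layout of `<precinct> <street A> & <street B> <collisions> <cyclists injured>` (the talk does not describe the real report layout), keeps lines that match that pattern and writes them out as CSV, dropping page headers and totals:

```python
import csv
import io
import re

# Hypothetical layout of one extracted data line (not the real NYPD format):
# "<precinct> <street A> & <street B> <collisions> <cyclists injured>"
ROW_PATTERN = re.compile(r"^(\d{3})\s+(.+?)\s+&\s+(.+?)\s+(\d+)\s+(\d+)$")
HEADER = ["precinct", "street_a", "street_b", "collisions", "cyclists_injured"]


def text_to_csv(extracted_text: str) -> str:
    """Keep lines that look like data rows; page headers, blanks and totals fall through."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for line in extracted_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            writer.writerow(match.groups())
    return out.getvalue()
```

Any change to the PDF layout makes lines stop matching `ROW_PATTERN`, and they are then skipped without warning, which is part of the cost of publishing data this way.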
    Now, the good news is that
    the de Blasio Administration
    actually recently released this data
    a few months ago,
    and so now we can
    actually have access to it,
    but there's a lot of data
    still entombed in PDF.
    For example, our crime data
    is still only available in PDF.
    And not just our crime data:
    our own city budget.
    Our city budget is only readable
    right now in PDF form,
    and it's not just us
    that can't analyze it:
    our own legislators
    who vote for the budget
    also only get it in PDF.
    So our legislators cannot
    analyze the budget
    that they are voting for.
    And I think as a city we can do
    a little better than that as well.
    Now, there's a lot of data
    that's not hidden in PDFs.
    This is an example of a map I made,
    and it shows the dirtiest waterways
    in New York City.
    Now, how do I measure dirty?
    Well, it's kind of a little weird,
    but I looked at the level
    of fecal coliform
Speaker:
Ben Wellington
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
11:48

English subtitles
