How we found the worst place to park in New York City — using big data

  • Not Synced
    Six thousand miles of road,
  • Not Synced
    600 miles of subway track,
  • Not Synced
    400 miles of bike lanes,
  • Not Synced
    and a half a mile of tram track,
  • Not Synced
    if you've ever been to Roosevelt Island.
  • Not Synced
    So these are the numbers that make up
    the infrastructure of New York City.
  • Not Synced
    These are the statistics
    of our infrastructure.
  • Not Synced
    They're the kind of numbers you can find
    released in reports by city agencies.
  • Not Synced
    For example, the Department
    of Transportation will probably tell you
  • Not Synced
    how many miles of road they maintain.
  • Not Synced
    The MTA will boast how many miles
    of subway track there are.
  • Not Synced
    But most city agencies give us statistics.
  • Not Synced
    This is from a report this year
  • Not Synced
    from the Taxi and Limousine Commission,
  • Not Synced
    where we learn that there are about
    13,500 taxis here in New York City.
  • Not Synced
    Pretty interesting, right?
  • Not Synced
    But did you ever think about
    where these numbers came from?
  • Not Synced
    Because for these numbers to exist,
    someone at the city agency
  • Not Synced
    had to stop and say, "Hmm, here's a number
    that somebody might want to know."
  • Not Synced
    So they go back to their raw data,
  • Not Synced
    they count, they add, they calculate,
  • Not Synced
    and then they put out reports,
  • Not Synced
    and those reports
    will have numbers like this.
  • Not Synced
    The problem is, how do they know
    all of our questions?
  • Not Synced
    We have lots of questions.
  • Not Synced
    In fact, in some ways there's literally
    an infinite number of questions
  • Not Synced
    that we can ask about our city.
  • Not Synced
    So the agencies can never keep up.
  • Not Synced
    So the paradigm isn't exactly working,
    and I think our policymakers realize that,
  • Not Synced
    because in 2012, Mayor Bloomberg
    signed into law what he called
  • Not Synced
    the most ambitious and comprehensive
    open data legislation in the country.
  • Not Synced
    In a lot of ways, he's right.
  • Not Synced
    In the last two years,
    the city has released
  • Not Synced
    a thousand data sets
    on our open data portal,
  • Not Synced
    and it's pretty awesome.
  • Not Synced
    So you go and look at data like this,
    and instead of just counting
  • Not Synced
    the number of cabs,
  • Not Synced
    we can start to ask different questions.
  • Not Synced
    So I had a question.
  • Not Synced
    When's rush hour in New York City?
  • Not Synced
    I mean, it can be pretty bothersome.
    When is rush hour exactly?
  • Not Synced
    And I thought to myself,
    well, these cabs aren't just numbers,
  • Not Synced
    these are GPS recorders
    driving around in our city streets
  • Not Synced
    recording each and every ride they take.
  • Not Synced
    There's data there,
  • Not Synced
    and I looked at that data,
    and I made a plot
  • Not Synced
    of the average speed of taxis
    in New York City throughout the day.
  • Not Synced
    Well, you can see that from about midnight
    to around 5:18 in the morning,
  • Not Synced
    speed increases, and at that point,
  • Not Synced
    things turn around,
  • Not Synced
    and they get slower and slower and slower
    until about 8:35 in the morning,
  • Not Synced
    when they end up at around
    11 and a half miles an hour.
  • Not Synced
    The average taxi is going 11 and a half
    miles per hour on our city streets,
  • Not Synced
    and it turns out it stays that way
  • Not Synced
    for the entire day.
  • Not Synced
    The entire day. (Laughter)
  • Not Synced
    So I said to myself, I guess
    there's no rush hour in New York City.
  • Not Synced
    There's just a rush day.
  • Not Synced
    Makes sense. But this is important
    for a couple of reasons.
  • Not Synced
    If you're a transportation planner,
    this might be pretty interesting to know,
  • Not Synced
    but if you want to get somewhere quickly,
  • Not Synced
    you now know to set your alarm
    for 4:45 in the morning, and you're all set.
  • Not Synced
    New York, right?
  • Not Synced
    But there's a story behind this data.
  • Not Synced
    This data wasn't
    just available, it turns out.
  • Not Synced
    It actually came from something called
    a Freedom of Information Law request,
  • Not Synced
    or a FOIL request.
  • Not Synced
    This is a form you can find on the
    Taxi and Limousine Commission's website.
  • Not Synced
    In order to access this data,
    you need to go get this form,
  • Not Synced
    fill it out, and they will notify you,
  • Not Synced
    and a guy named Chris Whong
    did exactly that.
  • Not Synced
    Chris went down, and they told him,
    "Just bring a hard drive down,
  • Not Synced
    a brand new hard drive,
    bring it to our office,
  • Not Synced
    leave it here for five hours,
    we'll copy the data and you take it back."
  • Not Synced
    And that's where this data came from.
  • Not Synced
    Now, Chris is the kind of guy
    who wants to make the data public,
  • Not Synced
    and so it ended up online for all to use,
    and that's where this graph came from.
  • Not Synced
    And the fact that it exists is amazing.
    These GPS recorders, really cool.
  • Not Synced
    But the fact that we have citizens
    walking around with hard drives
  • Not Synced
    picking up data from city agencies
    to make it public,
  • Not Synced
    where it was already kind of public,
    you could get to it, but it was "public,"
  • Not Synced
    but it wasn't public.
  • Not Synced
    And we can do better
    than that as a city, right?
  • Not Synced
    We don't need our citizens
    walking around with hard drives.
  • Not Synced
    Now, not every data set
    is behind a FOIL request. Right?
  • Not Synced
    So here is a map I made with the most
    dangerous intersections in New York City,
  • Not Synced
    based on cyclist accidents.
  • Not Synced
    So the red areas are more dangerous,
    and what it shows is first,
  • Not Synced
    the East Side of Manhattan,
    especially in the lower area of Manhattan,
  • Not Synced
    has more cyclist accidents.
  • Not Synced
    That might make sense,
    because there are more cyclists
  • Not Synced
    coming off the bridges there.
  • Not Synced
    But there are other hotspots
    worth studying, right?
  • Not Synced
    There's Williamsburg.
    There's Roosevelt Avenue in Queens.
  • Not Synced
    And this is exactly the kind of data
    we need for Vision Zero.
  • Not Synced
    This is exactly what we're looking for.
  • Not Synced
    But there's a story
    behind this data as well.
  • Not Synced
    This data didn't just appear.
  • Not Synced
    How many of you guys know this logo?
  • Not Synced
    Yeah, I see some shakes.
  • Not Synced
    Have you ever tried to copy
    and paste data out of a PDF
  • Not Synced
    and make sense of it?
  • Not Synced
    I see more shakes.
  • Not Synced
    More of you tried copying and pasting
    than knew the logo. I like that.
  • Not Synced
    Well, so what happened is, the data
    that you just saw was actually in a PDF.
  • Not Synced
    In fact, hundreds and hundreds
    and hundreds of pages of PDF
  • Not Synced
    put out by our very own NYPD,
  • Not Synced
    and in order to access it,
    you would either have to copy and paste
  • Not Synced
    for hundreds and hundreds of hours,
  • Not Synced
    or you could be John Krauss.
  • Not Synced
    John Krauss was like,
    I'm not going to copy and paste this data.
  • Not Synced
    I'm going to write a program.
  • Not Synced
    It's called the NYPD Crash Data Band-Aid,
  • Not Synced
    and it would go to the NYPD's website
    and download PDFs.
  • Not Synced
    Every day it would search:
    if it found a PDF, it would download it
  • Not Synced
    and then it would run
    some PDF-scraping program,
  • Not Synced
    and out would come the text,
  • Not Synced
    and it would go on the Internet,
    and then people could make maps like that.
  • Not Synced
    And the fact that the data's here,
    once again, the fact that we have access to it
  • Not Synced
    -- Every accident, by the way,
    is a row in this table,
  • Not Synced
    every single accident, you can imagine
    how many PDFs that is --
  • Not Synced
    the fact that we
    have access to that is great,
  • Not Synced
    but let's not release it in PDF form,
  • Not Synced
    because then we're having our citizens
    write PDF scrapers.
  • Not Synced
    It's not the best use
    of our citizens' time,
  • Not Synced
    and we as a city can do better than that.
  • Not Synced
    Now, the good news is that
    the de Blasio Administration
  • Not Synced
    actually recently released this data
    a few months ago,
  • Not Synced
    and so now we can
    actually have access to it,
  • Not Synced
    but there's a lot of data
    still entombed in PDF.
  • Not Synced
    For example, our crime data
    is still only available in PDF.
  • Not Synced
    And not just our crime data:
  • Not Synced
    our own city budget.
  • Not Synced
    Our city budget is only readable
    right now in PDF form,
  • Not Synced
    and it's not just us
    that can't analyze it:
  • Not Synced
    our own legislators
    who vote for the budget
  • Not Synced
    also only get it in PDF.
  • Not Synced
    So our legislators cannot
    analyze the budget
  • Not Synced
    that they are voting for.
  • Not Synced
    And I think as a city we can do
    a little better than that as well.
  • Not Synced
    Now, there's a lot of data
    that's not hidden in PDFs.
  • Not Synced
    This is an example of a map I made,
  • Not Synced
    and these are the dirtiest waterways
    in New York City.
  • Not Synced
    Now, how do I measure dirty?
  • Not Synced
    Well, it's kind of a little weird,
  • Not Synced
    but I looked at the level
    of fecal coliform,
  • Not Synced
    which is a measurement of fecal matter
    in each of our waterways.
  • Not Synced
    The larger the circle,
    the dirtier the water,
  • Not Synced
    so the large circles are dirty water,
    the small circles are cleaner.
  • Not Synced
    What you see is, inland waterways,
  • Not Synced
    this is all data that was sampled
    by the city over the last five years,
  • Not Synced
    and inland waterways are,
    in general, dirtier.
  • Not Synced
    That makes sense, right?
  • Not Synced
    And the bigger circles are dirty.
    And I learned a few things like this.
  • Not Synced
    Number one: never swim in anything
    that ends in "creek" or "canal."
  • Not Synced
    But number two, I also found
    the dirtiest waterway in New York City,
  • Not Synced
    by this measure, one measure.
  • Not Synced
    In Coney Island Creek, which is not
    the Coney Island you swim in, luckily.
  • Not Synced
    It's on the other side.
  • Not Synced
    But Coney Island Creek, 94 percent
    of samples taken over the last five years
  • Not Synced
    have had fecal levels so high
  • Not Synced
    that it would be against state law
    to swim in the water.
  • Not Synced
    And this is not the kind of fact
    that you're going to see
  • Not Synced
    boasted in a city report, right?
  • Not Synced
    It's not going to be
    the front page on nyc.gov.
  • Not Synced
    You're not going to see it there,
  • Not Synced
    but the fact that we
    can get to that data is awesome.
  • Not Synced
    But once again, it wasn't super-easy,
  • Not Synced
    because this data was not
    on the open data portal.
  • Not Synced
    If you were to go to the open data portal,
  • Not Synced
    you'd see just a snippet of it,
  • Not Synced
    a year or a few months.
  • Not Synced
    It was actually on the Department
    of Environmental Protection's website.
  • Not Synced
    And each one of these links
    is an Excel sheet,
  • Not Synced
    and each Excel sheet is different.
  • Not Synced
    Every heading is different:
    you copy, paste, reorganize, reorder.
  • Not Synced
    And when you do, you can make maps,
    and that's great, but once again,
  • Not Synced
    we can do better than that as a city,
    we can normalize things.
  • Not Synced
    And we're getting there, because
    there's this website that Socrata makes
  • Not Synced
    called the Open Data Portal
    for New York City.
  • Not Synced
    This is where 1,100 data sets
    that don't suffer
  • Not Synced
    from all those things
    I just told you live,
  • Not Synced
    and that number is growing,
    and that's great.
  • Not Synced
    You can download data
    in any format you want,
  • Not Synced
    be it CSV or PDF if for some reason
    that's what you want, or an Excel document.
  • Not Synced
    Whatever you want,
    you can download the data that way.
  • Not Synced
    The problem is, once you do,
  • Not Synced
    you will find that each agency
    codes their addresses differently.
  • Not Synced
    So one is street name, intersection street,
    street, borough, address, building,
  • Not Synced
    building, address, and so once again,
    you're spending time,
  • Not Synced
    even when we have this portal,
    you're spending time
  • Not Synced
    normalizing our address field.
  • Not Synced
    And I think that's not
    the best use of our citizens' time, right?
  • Not Synced
    We can do better than that as a city.
  • Not Synced
    We can standardize our addresses,
    and if we do, we can get more maps like this.
  • Not Synced
    This is a map of fire hydrants
    in New York City,
  • Not Synced
    but not just any fire hydrants:
  • Not Synced
    these are the top 250 grossing fire
    hydrants in terms of parking tickets.
  • Not Synced
    So I learned a few things from this map,
    and I really like this map.
  • Not Synced
    Number one, just don't park
    on the Upper East Side.
  • Not Synced
    Just don't. It doesn't matter where
    you park, you will get a hydrant ticket.
  • Not Synced
    Number two, I found the two highest
    grossing hydrants in all of New York City,
  • Not Synced
    and they're on the Lower East Side,
  • Not Synced
    and they were bringing in
    over $55,000 a year, a year,
  • Not Synced
    in parking tickets.
  • Not Synced
    And that seemed a little strange
    to me when I noticed it,
  • Not Synced
    so I did a little digging and it turns out
    what you had is a hydrant
  • Not Synced
    and then something called
    a curb extension,
  • Not Synced
    which is like a seven-foot
    space to walk on,
  • Not Synced
    and then a parking spot.
  • Not Synced
    And so these cars came along,
    and the hydrant,
  • Not Synced
    "It's all the way over there, I'm fine,"
  • Not Synced
    and there was actually a parking spot
    painted there beautifully for them.
  • Not Synced
    They would park there, and the NYPD
    disagreed with this designation
  • Not Synced
    and would ticket them.
  • Not Synced
    And it wasn't just me who found
    a parking ticket, right?
  • Not Synced
    This is the Google Street
    View Car driving by,
  • Not Synced
    finding the same parking ticket.
  • Not Synced
    So I wrote about this on my blog,
    on I Quant NY, and the DOT responded,
  • Not Synced
    and they said, "While the DOT has not
    received any complaints about this location,
  • Not Synced
    we will review the roadway markings
    and make any appropriate alterations."
  • Not Synced
    And I thought to myself,
    typical government response,
  • Not Synced
    all right, moved on with my life.
  • Not Synced
    But then, a few weeks later,
    something incredible happened.
  • Not Synced
    They repainted the spot,
  • Not Synced
    and for a second I thought I saw
    the future of open data,
  • Not Synced
    because think about what happened here.
  • Not Synced
    For five years, five years,
  • Not Synced
    this spot was being ticketed,
    and it was confusing,
  • Not Synced
    and then a citizen found something,
    they told the city, and within a few weeks
  • Not Synced
    the problem was fixed.
  • Not Synced
    It's amazing, and a lot of people
    see open data as being a watchdog.
  • Not Synced
    It's not, it's about being a partner.
  • Not Synced
    We can empower our citizens
    to be better partners for government,
  • Not Synced
    and it's not hard.
  • Not Synced
    All we need are a few changes.
  • Not Synced
    If you're FOILing data, if you're seeing
    your data being FOILed over and over again,
  • Not Synced
    let's release it to the public, that's
    a sign that it should be made public.
  • Not Synced
    And if we're going to release a PDF,
  • Not Synced
    if you're a government agency
    releasing a PDF,
  • Not Synced
    let's pass legislation that requires you
    to post it with the underlying data,
  • Not Synced
    because that data
    is coming from somewhere.
  • Not Synced
    I don't know where, but it's
    coming from somewhere,
  • Not Synced
    and you can release it with the PDF.
  • Not Synced
    And let's adopt and share
    some open data standards.
  • Not Synced
    Let's start with our addresses
    here in New York City.
  • Not Synced
    Let's just start
    normalizing our addresses.
  • Not Synced
    Because you know what?
    New York is a leader in open data.
  • Not Synced
    Despite all this, we are absolutely
    a leader in open data,
  • Not Synced
    and if we start normalizing things,
    and we set an open data standard,
  • Not Synced
    others will follow. The state will follow,
    and maybe the federal government,
  • Not Synced
    and I know it's crazy,
    but other countries could follow,
  • Not Synced
    and we're not that far off from a time
    where you could write one program
  • Not Synced
    and map information from 100 countries.
  • Not Synced
    It's not science fiction.
    We're actually quite close.
  • Not Synced
    And by the way, who are we
    empowering with this?
  • Not Synced
    Because it's not just John Krauss
    and it's not just Chris Whong.
  • Not Synced
    There are hundreds of meetups
    going around in New York City,
  • Not Synced
    going on in New York City right now,
  • Not Synced
    active meetups.
  • Not Synced
    There are thousands of people
    attending these meetups.
  • Not Synced
    These people are going after work
    and on weekends,
  • Not Synced
    and they're attending these meetups
    to look at open data
  • Not Synced
    and make our city a better place.
  • Not Synced
    Groups like BetaNYC, who last week,
    just last week released something
  • Not Synced
    called citygram.nyc.
  • Not Synced
    That allows you to subscribe
    to 311 complaints
  • Not Synced
    around your own home,
    or around your office.
  • Not Synced
    You put in your address,
    you get local complaints.
  • Not Synced
    And it's not just the tech community
    that are after these things, right?
  • Not Synced
    It's urban planners like the students
    I teach at Pratt.
  • Not Synced
    It's policy advocates, it's everyone,
  • Not Synced
    it's citizens from a diverse
    set of backgrounds.
  • Not Synced
    And with some small, incremental changes,
  • Not Synced
    we can unlock the passion
    and the ability of our citizens
  • Not Synced
    to harness open data
  • Not Synced
    and make our city even better,
  • Not Synced
    whether it's one data set,
    or one parking spot at a time.
  • Not Synced
    Thank you.
  • Not Synced
    (Applause)
Title:
How we found the worst place to park in New York City — using big data
Speaker:
Ben Wellington
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
11:48
