How we found the worst place to park in New York City — using big data

  • 0:01 - 0:04
    Six thousand miles of road,
  • 0:04 - 0:06
    600 miles of subway track,
  • 0:06 - 0:07
    400 miles of bike lanes
  • 0:07 - 0:09
    and a half a mile of tram track,
  • 0:09 - 0:11
    if you've ever been to Roosevelt Island.
  • 0:11 - 0:14
    These are the numbers that make up
    the infrastructure of New York City.
  • 0:14 - 0:17
    These are the statistics
    of our infrastructure.
  • 0:17 - 0:21
    They're the kind of numbers you can find
    released in reports by city agencies.
  • 0:21 - 0:24
    For example, the Department
    of Transportation will probably tell you
  • 0:24 - 0:26
    how many miles of road they maintain.
  • 0:26 - 0:29
    The MTA will boast how many miles
    of subway track there are.
  • 0:29 - 0:30
    Most city agencies give us statistics.
  • 0:30 - 0:32
    This is from a report this year
  • 0:32 - 0:34
    from the Taxi and Limousine Commission,
  • 0:34 - 0:37
    where we learn that there's about
    13,500 taxis here in New York City.
  • 0:37 - 0:38
    Pretty interesting, right?
  • 0:38 - 0:41
    But did you ever think about
    where these numbers came from?
  • 0:41 - 0:44
    Because for these numbers to exist,
    someone at the city agency
  • 0:44 - 0:48
    had to stop and say, hmm, here's a number
    that somebody might want to know.
  • 0:48 - 0:50
    Here's a number
    that our citizens want to know.
  • 0:50 - 0:52
    So they go back to their raw data,
  • 0:52 - 0:54
    they count, they add, they calculate,
  • 0:54 - 0:55
    and then they put out reports,
  • 0:55 - 0:57
    and those reports
    will have numbers like this.
  • 0:57 - 1:00
    The problem is, how do they know
    all of our questions?
  • 1:00 - 1:01
    We have lots of questions.
  • 1:01 - 1:05
    In fact, in some ways there's literally
    an infinite number of questions
  • 1:05 - 1:06
    that we can ask about our city.
  • 1:06 - 1:08
    The agencies can never keep up.
  • 1:08 - 1:12
    So the paradigm isn't exactly working,
    and I think our policymakers realize that,
  • 1:12 - 1:16
    because in 2012, Mayor Bloomberg
    signed into law what he called
  • 1:16 - 1:20
    the most ambitious and comprehensive
    open data legislation in the country.
  • 1:20 - 1:21
    In a lot of ways, he's right.
  • 1:21 - 1:24
    In the last two years,
    the city has released 1,000 datasets
  • 1:24 - 1:26
    on our open data portal,
  • 1:26 - 1:27
    and it's pretty awesome.
  • 1:27 - 1:29
    So you go and look at data like this,
  • 1:29 - 1:32
    and instead of just counting
    the number of cabs,
  • 1:32 - 1:34
    we can start to ask different questions.
  • 1:34 - 1:35
    So I had a question.
  • 1:35 - 1:36
    When's rush hour in New York City?
  • 1:36 - 1:39
    It can be pretty bothersome.
    When is rush hour exactly?
  • 1:39 - 1:42
    And I thought to myself,
    these cabs aren't just numbers,
  • 1:42 - 1:44
    these are GPS recorders
    driving around in our city streets
  • 1:44 - 1:46
    recording each and every ride they take.
  • 1:46 - 1:49
    There's data there,
    and I looked at that data,
  • 1:49 - 1:53
    and I made a plot of the average speed of
    taxis in New York City throughout the day.
  • 1:53 - 1:56
    You can see that from about midnight
    to around 5:18 in the morning,
  • 1:56 - 2:00
    speed increases, and at that point,
    things turn around,
  • 2:00 - 2:04
    and they get slower and slower and slower
    until about 8:35 in the morning,
  • 2:04 - 2:06
    when they end up at around
    11 and a half miles per hour.
  • 2:06 - 2:10
    The average taxi is going 11 and a half
    miles per hour on our city streets,
  • 2:10 - 2:12
    and it turns out it stays that way
  • 2:12 - 2:15
    for the entire day.
  • 2:15 - 2:16
    (Laughter)
  • 2:16 - 2:20
    So I said to myself, I guess
    there's no rush hour in New York City.
  • 2:20 - 2:21
    There's just a rush day.
  • 2:21 - 2:24
    Makes sense. And this is important
    for a couple of reasons.
  • 2:24 - 2:28
    If you're a transportation planner,
    this might be pretty interesting to know.
  • 2:28 - 2:30
    But if you want to get somewhere quickly,
  • 2:30 - 2:33
    you now know to set your alarm for
    4:45 in the morning and you're all set.
  • 2:33 - 2:34
    New York, right?
  • 2:34 - 2:36
    But there's a story behind this data.
  • 2:36 - 2:38
    This data wasn't
    just available, it turns out.
  • 2:38 - 2:42
    It actually came from something called
    a Freedom of Information Law Request,
  • 2:42 - 2:43
    or a FOIL Request.
  • 2:43 - 2:46
    This is a form you can find on the
    Taxi and Limousine Commission website.
  • 2:46 - 2:49
    In order to access this data,
    you need to go get this form,
  • 2:49 - 2:51
    fill it out, and they will notify you,
  • 2:51 - 2:53
    and a guy named Chris Whong
    did exactly that.
  • 2:53 - 2:55
    Chris went down, and they told him,
  • 2:55 - 2:58
    "Just bring a brand new hard drive
    down to our office,
  • 2:58 - 3:01
    leave it here for five hours,
    we'll copy the data and you take it back."
  • 3:01 - 3:03
    And that's where this data came from.
  • 3:03 - 3:07
    Now, Chris is the kind of guy
    who wants to make the data public,
  • 3:07 - 3:11
    and so it ended up online for all to use,
    and that's where this graph came from.
  • 3:11 - 3:14
    And the fact that it exists is amazing.
    These GPS recorders, really cool.
  • 3:14 - 3:17
    But the fact that we have citizens
    walking around with hard drives
  • 3:17 - 3:20
    picking up data from city agencies
    to make it public,
  • 3:20 - 3:22
    but it was already kind of public,
    you could get to it,
  • 3:22 - 3:24
    but it was "public,"
    it wasn't public.
  • 3:24 - 3:26
    And we can do better than that as a city.
  • 3:26 - 3:29
    We don't need our citizens
    walking around with hard drives.
  • 3:29 - 3:31
    Now, not every data set
    is behind a FOIL request. Right?
  • 3:31 - 3:35
    So here is a map I made with the most
    dangerous intersections in New York City
  • 3:35 - 3:36
    based on cyclist accidents.
  • 3:36 - 3:39
    So the red areas are more dangerous,
    and what it shows is first,
  • 3:39 - 3:43
    the East side of Manhattan,
    especially in the lower area of Manhattan,
  • 3:43 - 3:44
    has more cyclist accidents.
  • 3:44 - 3:47
    That might make sense,
    because there are more cyclists
  • 3:47 - 3:48
    coming off the bridges there.
  • 3:48 - 3:50
    But there are other hotspots
    worth studying.
  • 3:50 - 3:53
    There's Williamsburg.
    There's Roosevelt Avenue in Queens.
  • 3:53 - 3:56
    And this is exactly the kind of data
    we need for Vision Zero.
  • 3:56 - 3:58
    This is exactly what we're looking for.
  • 3:58 - 4:00
    But there's a story
    behind this data as well.
  • 4:00 - 4:02
    This data didn't just appear.
  • 4:02 - 4:04
    How many of you guys know this logo?
  • 4:04 - 4:06
    Yeah, I see some shakes.
  • 4:06 - 4:08
    Have you ever tried to copy
    and paste data out of a PDF
  • 4:08 - 4:10
    and make sense of it?
  • 4:10 - 4:11
    I see more shakes.
  • 4:11 - 4:14
    More of you tried copying and pasting
    than knew the logo. I like that.
  • 4:14 - 4:18
    Well, so what happened is, the data
    that you just saw was actually on a PDF.
  • 4:18 - 4:21
    In fact, hundreds and hundreds
    and hundreds of pages of PDF
  • 4:21 - 4:23
    put out by our very own NYPD,
  • 4:23 - 4:26
    and in order to access it,
    you would either have to copy and paste
  • 4:26 - 4:28
    for hundreds and hundreds of hours,
  • 4:28 - 4:29
    or you could be John Krause.
  • 4:29 - 4:30
    John Krause was like,
  • 4:30 - 4:34
    I'm not going to copy and paste this data.
    I'm going to write a program.
  • 4:34 - 4:36
    It's called the NYPD Crash Data Band-Aid,
  • 4:36 - 4:39
    and it goes to the NYPD's website
    and it would download PDFs.
  • 4:39 - 4:42
    Every day it would search:
    if it found a PDF, it would download it
  • 4:42 - 4:44
    and then it would run
    some PDF-scraping program,
  • 4:44 - 4:46
    and out would come the text,
  • 4:46 - 4:49
    and it would go on the Internet,
    and then people could make maps like that.
  • 4:49 - 4:52
    And the fact that the data's here,
    the fact that we have access to it
  • 4:52 - 4:55
    -- Every accident, by the way,
    is a row in this table,
  • 4:55 - 4:58
    every single accident, you can imagine
    how many PDFs that is --
  • 4:58 - 5:00
    the fact that we
    have access to that is great,
  • 5:00 - 5:02
    but let's not release it in PDF form,
  • 5:02 - 5:05
    because then we're having our citizens
    write PDF scrapers.
  • 5:05 - 5:07
    It's not the best use
    of our citizens' time,
  • 5:07 - 5:09
    and we as a city can do better than that.
  • 5:09 - 5:11
    Now, the good news is that
    the de Blasio Administration
  • 5:11 - 5:14
    actually recently released this data
    a few months ago,
  • 5:14 - 5:16
    and so now we can
    actually have access to it,
  • 5:16 - 5:18
    but there's a lot of data
    still entombed in PDF.
  • 5:18 - 5:22
    For example, our crime data
    is still only available in PDF.
  • 5:22 - 5:23
    And not just our crime data:
  • 5:23 - 5:25
    our own city budget.
  • 5:25 - 5:29
    Our city budget is only readable
    right now in PDF form,
  • 5:29 - 5:31
    and it's not just us
    that can't analyze it:
  • 5:31 - 5:34
    our own legislators
    who vote for the budget
  • 5:34 - 5:36
    also only get it in PDF.
  • 5:36 - 5:38
    So our legislators cannot
    analyze the budget
  • 5:38 - 5:40
    that they are voting for.
  • 5:40 - 5:43
    And I think as a city we can do
    a little better than that as well.
  • 5:43 - 5:46
    Now, there's a lot of data
    that's not hidden in PDFs.
  • 5:46 - 5:47
    This is an example of a map I made,
  • 5:47 - 5:50
    and these are the dirtiest waterways
    in New York City.
  • 5:50 - 5:52
    Now, how do I measure dirty?
  • 5:52 - 5:54
    Well, it's kind of a little weird,
  • 5:54 - 5:56
    but I looked at the level
    of fecal coliform,
  • 5:56 - 5:59
    which is a measurement of fecal matter
    in each of our waterways.
  • 5:59 - 6:03
    The larger the circle,
    the dirtier the water,
  • 6:03 - 6:06
    so the large circles are dirty water,
    the small circles are cleaner.
  • 6:06 - 6:08
    What you see is, inland waterways,
  • 6:08 - 6:11
    this is all data that was sampled
    by the city over the last five years,
  • 6:11 - 6:14
    and inland waterways are,
    in general, dirtier.
  • 6:14 - 6:15
    That makes sense, right?
  • 6:15 - 6:18
    And the bigger circles are dirty.
    And I learned a few things from this.
  • 6:18 - 6:21
    Number one: never swim in anything
    that ends in "creek" or "canal."
  • 6:21 - 6:26
    But number two, I also found
    the dirtiest waterway in New York City,
  • 6:26 - 6:28
    by this measure, one measure.
  • 6:28 - 6:31
    In Coney Island Creek, which is not
    the Coney Island you swim in, luckily.
  • 6:31 - 6:32
    It's on the other side.
  • 6:32 - 6:36
    But Coney Island Creek, 94 percent
    of samples taken over the last five years
  • 6:36 - 6:39
    have had fecal levels so high
  • 6:39 - 6:42
    that it would be against state law
    to swim in the water.
  • 6:42 - 6:45
    And this is not the kind of fact
    that you're going to see
  • 6:45 - 6:46
    boasted in a city report, right?
  • 6:46 - 6:48
    It's not going to be
    the front page on nyc.gov.
  • 6:48 - 6:50
    You're not going to see it there,
  • 6:50 - 6:52
    but the fact that we can get
    to that data is awesome.
  • 6:52 - 6:54
    But once again, it wasn't super-easy,
  • 6:54 - 6:57
    because this data was not
    on the open data portal.
  • 6:57 - 6:59
    If you were to go to the open data portal,
  • 6:59 - 7:01
    you'd see just a snippet of it,
    a year or a few months.
  • 7:01 - 7:05
    It was actually on the Department
    of Environmental Protection's website.
  • 7:05 - 7:09
    And each one of these links is an Excel
    sheet, and each Excel sheet is different.
  • 7:09 - 7:11
    Every heading is different:
    you copy, paste, reorganize.
  • 7:11 - 7:14
    When you do, you can make maps
    and that's great but once again,
  • 7:14 - 7:17
    we can do better than that
    as a city, we can normalize things.
  • 7:17 - 7:21
    And we're getting there, because
    there's this website that Socrata makes
  • 7:21 - 7:23
    called the Open Data Portal
    on New York City.
  • 7:23 - 7:25
    This is where 1,100 data sets
    that don't suffer
  • 7:25 - 7:27
    from the things I just told you live,
  • 7:27 - 7:29
    and that number is growing,
    and that's great.
  • 7:29 - 7:33
    You can download data in any format,
    be it CSV or PDF or Excel document.
  • 7:33 - 7:35
    Whatever you want,
    you can download the data that way.
  • 7:35 - 7:37
    The problem is, once you do,
  • 7:37 - 7:40
    you will find that each agency
    codes their addresses differently.
  • 7:40 - 7:42
    So one is street name,
    intersection street,
  • 7:42 - 7:44
    street, borough, address, building,
    building, address,
  • 7:44 - 7:48
    so once again, you're spending time,
    even when we have this portal,
  • 7:48 - 7:50
    you're spending time
    normalizing our address field.
  • 7:50 - 7:52
    And that's not the best use
    of our citizens' time.
  • 7:52 - 7:54
    We can do better than that as a city.
  • 7:54 - 7:56
    We can standardize our addresses,
  • 7:56 - 7:58
    and if we do,
    we can get more maps like this.
  • 7:58 - 8:00
    This is a map of fire hydrants
    in New York City,
  • 8:00 - 8:02
    but not just any fire hydrants:
  • 8:02 - 8:07
    these are the top 250 grossing fire
    hydrants in terms of parking tickets.
  • 8:08 - 8:11
    So I learned a few things from this map,
    and I really like this map.
  • 8:11 - 8:14
    Number one, just don't park
    on the Upper East Side.
  • 8:14 - 8:17
    Just don't. It doesn't matter where
    you park, you will get a hydrant ticket.
  • 8:17 - 8:21
    Number two, I found the two highest
    grossing hydrants in all of New York City,
  • 8:21 - 8:23
    and they're on the Lower East Side,
  • 8:23 - 8:27
    and they were bringing in
    over $55,000 a year, a year,
  • 8:27 - 8:28
    in parking tickets.
  • 8:28 - 8:31
    And that seemed a little strange
    to me when I noticed it,
  • 8:31 - 8:34
    so I did a little digging and it turns out
    what you had is a hydrant
  • 8:34 - 8:36
    and then something called
    a curb extension,
  • 8:36 - 8:38
    which is like a seven-foot
    space to walk on,
  • 8:38 - 8:40
    and then a parking spot.
  • 8:40 - 8:42
    And so these cars came along,
    and the hydrant,
  • 8:42 - 8:44
    "It's all the way over there, I'm fine,"
  • 8:44 - 8:47
    and there was actually a parking spot
    painted there beautifully for them.
  • 8:47 - 8:50
    They would park there, and the NYPD
    disagreed with this designation
  • 8:50 - 8:51
    and would ticket them.
  • 8:51 - 8:54
    And it wasn't just me
    who found a parking ticket.
  • 8:54 - 8:56
    This is the Google Street
    View Car driving by,
  • 8:56 - 8:57
    finding the same parking ticket.
  • 8:57 - 9:02
    So I wrote about this on my blog,
    on I Quant NY, and the DOT responded,
  • 9:02 - 9:03
    and they said,
  • 9:03 - 9:06
    "While the DOT has not received
    any complaints about this location,
  • 9:06 - 9:11
    we will review the roadway markings
    and make any appropriate alterations."
  • 9:11 - 9:14
    And I thought to myself,
    typical government response,
  • 9:14 - 9:16
    all right, moved on with my life.
  • 9:16 - 9:20
    But then, a few weeks later,
    something incredible happened.
  • 9:20 - 9:22
    They repainted the spot,
  • 9:22 - 9:25
    and for a second I thought I saw
    the future of open data,
  • 9:25 - 9:27
    because think about what happened here.
  • 9:27 - 9:29
    For five years, for five years,
  • 9:29 - 9:32
    this spot was being ticketed,
    and it was confusing,
  • 9:32 - 9:36
    and then a citizen found something,
    they told the city, and within a few weeks
  • 9:36 - 9:38
    the problem was fixed.
  • 9:38 - 9:41
    It's amazing, and a lot of people
    see open data as being a watchdog.
  • 9:41 - 9:43
    It's not, it's about being a partner.
  • 9:43 - 9:46
    We can empower our citizens
    to be better partners for government,
  • 9:46 - 9:48
    and it's not that hard.
  • 9:48 - 9:49
    All we need are a few changes.
  • 9:49 - 9:50
    If you're FOILing data,
  • 9:50 - 9:53
    if you're seeing your data
    being FOILed over and over again,
  • 9:53 - 9:57
    let's release it to the public, that's
    a sign that it should be made public.
  • 9:57 - 9:59
    And if you're a government agency
    releasing a PDF,
  • 9:59 - 10:03
    let's pass legislation that requires you
    to post it with the underlying data,
  • 10:03 - 10:05
    because that data
    is coming from somewhere.
  • 10:05 - 10:07
    I don't know where, but it's
    coming from somewhere,
  • 10:07 - 10:09
    and you can release it with the PDF.
  • 10:09 - 10:11
    And let's adopt and share
    some open data standards.
  • 10:11 - 10:14
    Let's start with our addresses
    here in New York City.
  • 10:14 - 10:16
    Let's just start
    normalizing our addresses.
  • 10:16 - 10:18
    Because New York is a leader in open data.
  • 10:18 - 10:21
    Despite all this, we are absolutely
    a leader in open data,
  • 10:21 - 10:24
    and if we start normalizing things,
    and set an open data standard,
  • 10:24 - 10:28
    others will follow. The state will follow,
    and maybe the federal government.
  • 10:28 - 10:29
    Other countries could follow,
  • 10:29 - 10:32
    and we're not that far off from a time
    where you could write one program
  • 10:32 - 10:34
    and map information from 100 countries.
  • 10:34 - 10:37
    It's not science fiction.
    We're actually quite close.
  • 10:37 - 10:39
    And by the way, who are we
    empowering with this?
  • 10:39 - 10:42
    Because it's not just John Krause
    and it's not just Chris Whong.
  • 10:42 - 10:45
    There are hundreds of meetups
    going on in New York City right now,
  • 10:45 - 10:46
    active meetups.
  • 10:46 - 10:48
    There are thousands of people
    attending these meetups.
  • 10:48 - 10:51
    These people are going after work
    and on weekends,
  • 10:51 - 10:53
    and they're attending these meetups
    to look at open data
  • 10:53 - 10:55
    and make our city a better place.
  • 10:55 - 10:58
    Groups like BetaNYC, who last week,
    just last week released something
  • 10:58 - 10:59
    called citygram.nyc.
  • 10:59 - 11:02
    That allows you to subscribe
    to 311 complaints
  • 11:02 - 11:04
    around your own home,
    or around your office.
  • 11:04 - 11:06
    You put in your address,
    you get local complaints.
  • 11:06 - 11:09
    And it's not just the tech community
    that are after these things.
  • 11:09 - 11:12
    It's urban planners like
    the students I teach at Pratt.
  • 11:12 - 11:14
    It's policy advocates, it's everyone,
  • 11:14 - 11:17
    it's citizens from a diverse
    set of backgrounds.
  • 11:17 - 11:19
    And with some small, incremental changes,
  • 11:19 - 11:23
    we can unlock the passion
    and the ability of our citizens
  • 11:23 - 11:26
    to harness open data
    and make our city even better,
  • 11:26 - 11:29
    whether it's one data set,
    or one parking spot at a time.
  • 11:29 - 11:32
    Thank you.
  • 11:32 - 11:35
    (Applause)
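
The plot of average taxi speed through the day (described around 1:49) could be reproduced along the lines below. This is only a minimal sketch: the record layout `(pickup_time, distance_miles, duration_seconds)`, the function name, and the sample trips are all assumptions made for illustration, not the actual taxi data schema or the speaker's method.

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_hour(trips):
    """Return {hour_of_day: mean trip speed in mph}.

    Each trip is a (pickup_time, distance_miles, duration_seconds) tuple.
    This layout is assumed for illustration only.
    """
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of speeds, trip count]
    for pickup_time, distance_miles, duration_seconds in trips:
        if duration_seconds <= 0:
            continue  # a speed cannot be computed for this record
        mph = distance_miles * 3600.0 / duration_seconds
        bucket = totals[pickup_time.hour]
        bucket[0] += mph
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


# Made-up sample trips, not real data.
sample_trips = [
    (datetime(2013, 1, 1, 5, 10), 3.0, 600),  # 18 mph
    (datetime(2013, 1, 1, 5, 40), 2.0, 600),  # 12 mph
    (datetime(2013, 1, 1, 9, 5), 1.0, 300),   # 12 mph
    (datetime(2013, 1, 1, 9, 30), 0.5, 0),    # skipped: zero duration
]
hourly = average_speed_by_hour(sample_trips)
```

Averaging per-trip speeds is one possible choice; dividing total distance by total time within each hour would weight long trips more heavily and may give a different curve.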
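
The Coney Island Creek figure (around 6:32) is a share of samples above a limit, computed per waterway. A rough sketch of that calculation follows. The `limit` value, the site names, and the readings are placeholders invented for the example; the actual state threshold is not given in the talk and is not assumed here.

```python
def share_over_limit(samples, limit):
    """Return {site: fraction of readings strictly above `limit`}.

    `samples` is an iterable of (site, reading) pairs; this shape is an
    assumption made for illustration.
    """
    counts = {}  # site -> (readings over the limit, total readings)
    for site, reading in samples:
        over, total = counts.get(site, (0, 0))
        counts[site] = (over + (reading > limit), total + 1)
    return {site: over / total for site, (over, total) in counts.items()}


# Invented readings and an invented limit, purely for demonstration.
samples = [
    ("Creek X", 500), ("Creek X", 50), ("Creek X", 900), ("Creek X", 700),
    ("Bay Y", 10), ("Bay Y", 300),
]
shares = share_over_limit(samples, limit=200)
```

Sorting the resulting dictionary by value would surface the waterway with the highest share of failing samples.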
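
The address problem (around 7:37) is that each agency splits the same information into different fields. One way to picture the normalization step is the sketch below. Both input layouts, the field names, and the short suffix table are hypothetical; real agency data has many more variants than this handles.

```python
import re

# Hypothetical abbreviation table; a real one would be much longer.
STREET_SUFFIXES = {
    "st": "STREET",
    "ave": "AVENUE",
    "av": "AVENUE",
    "blvd": "BOULEVARD",
    "rd": "ROAD",
}


def normalize_street(name):
    """Upper-case a street name and expand a few common suffixes."""
    words = re.sub(r"[.,]", "", name).split()
    return " ".join(STREET_SUFFIXES.get(w.lower(), w.upper()) for w in words)


def normalize_record(record):
    """Map a record from one of two made-up agency layouts to one shape."""
    if "address" in record:
        # Layout A: one free-text field such as "350 5th Ave."
        number, _, street = record["address"].strip().partition(" ")
    else:
        # Layout B: separate "building" and "street" fields.
        number, street = record["building"].strip(), record["street"]
    return {
        "house_number": number,
        "street": normalize_street(street),
        "borough": record["borough"].strip().upper(),
    }


a = normalize_record({"address": "350 5th Ave.", "borough": "Manhattan"})
b = normalize_record({"building": " 350", "street": "5TH AVE", "borough": "manhattan "})
```

Once both records reduce to the same dictionary, datasets from different agencies can be joined on it. If the city published one standard form, this step would not be needed at all, which is the point the talk makes.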
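
The "top grossing hydrants" map (around 8:02) amounts to grouping tickets by location, summing the fines, and keeping the largest totals. A minimal sketch, assuming each ticket is a `(location, fine_dollars)` pair; the locations and fine amounts below are invented and do not reflect real ticket data.

```python
from collections import Counter


def top_grossing_locations(tickets, n):
    """Return the n locations with the highest total fines, largest first."""
    revenue = Counter()
    for location, fine_dollars in tickets:
        revenue[location] += fine_dollars
    return revenue.most_common(n)


# Invented tickets for demonstration only.
tickets = [
    ("hydrant A", 100), ("hydrant B", 100), ("hydrant A", 100),
    ("hydrant C", 100), ("hydrant A", 100), ("hydrant B", 100),
]
top_two = top_grossing_locations(tickets, 2)
```

This only works if every ticket for the same hydrant carries the same location key, which is why consistent addresses matter before any ranking like this can be trusted.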
Title:
How we found the worst place to park in New York City — using big data
Speaker:
Ben Wellington
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
11:48
