
The era of blind faith in big data must end

  • 0:01 - 0:03
    Algorithms are everywhere.
  • 0:04 - 0:07
    They sort and separate
    the winners from the losers.
  • 0:08 - 0:10
    The winners get the job
  • 0:10 - 0:12
    or a good credit card offer.
  • 0:12 - 0:15
    The losers don't even get an interview
  • 0:16 - 0:17
    or they pay more for insurance.
  • 0:18 - 0:22
    We're being scored with secret formulas
    that we don't understand
  • 0:23 - 0:26
    that often don't have systems of appeal.
  • 0:27 - 0:29
    That begs the question:
  • 0:29 - 0:31
    What if the algorithms are wrong?
  • 0:33 - 0:35
    To build an algorithm you need two things:
  • 0:35 - 0:37
    you need data, what happened in the past,
  • 0:37 - 0:39
    and a definition of success,
  • 0:39 - 0:41
    the thing you're looking for
    and often hoping for.
  • 0:41 - 0:46
    You train an algorithm
    by looking, figuring out.
  • 0:46 - 0:50
    The algorithm figures out
    what is associated with success.
  • 0:50 - 0:52
    What situation leads to success?
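
To make that concrete, here is a minimal sketch of the two ingredients described above -- historical data plus a chosen definition of success. Every record, feature, and name in it is hypothetical, invented only to illustrate the mechanism, and it assumes scikit-learn is available.

```python
# A minimal sketch of "data + a definition of success": the features and
# records below are hypothetical stand-ins, not anyone's real data.
from sklearn.linear_model import LogisticRegression

# Data: what happened in the past (each row is one past case).
past_cases = [
    [5.0, 1.0],   # two generic recorded features per case
    [3.0, 0.0],
    [4.5, 1.0],
    [1.0, 0.0],
]

# Definition of success: an opinion, encoded as a 0/1 label for each case.
was_successful = [1, 0, 1, 0]

# "Training" just means finding what in the data correlates with that label.
model = LogisticRegression().fit(past_cases, was_successful)

# The model then scores new cases by their resemblance to past "successes".
print(model.predict_proba([[4.0, 1.0]]))
```
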
  • 0:53 - 0:55
    Actually, everyone uses algorithms.
  • 0:55 - 0:57
    They just don't formalize them
    in written code.
  • 0:57 - 0:59
    Let me give you an example.
  • 0:59 - 1:02
    I use an algorithm every day
    to make a meal for my family.
  • 1:02 - 1:04
    The data I use
  • 1:04 - 1:06
    is the ingredients in my kitchen,
  • 1:06 - 1:08
    the time I have,
  • 1:08 - 1:09
    the ambition I have,
  • 1:09 - 1:11
    and I curate that data.
  • 1:11 - 1:15
    I don't count those little packages
    of ramen noodles as food.
  • 1:15 - 1:17
    (Laughter)
  • 1:17 - 1:19
    My definition of success is:
  • 1:19 - 1:21
    a meal is successful
    if my kids eat vegetables.
  • 1:22 - 1:25
    It's very different
    from if my youngest son were in charge.
  • 1:25 - 1:28
    He'd say success is if
    he gets to eat lots of Nutella.
  • 1:29 - 1:31
    But I get to choose success.
  • 1:31 - 1:34
    I am in charge. My opinion matters.
  • 1:34 - 1:37
    That's the first rule of algorithms.
  • 1:37 - 1:40
    Algorithms are opinions embedded in code.
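
Written out as code, the meal example makes that first rule visible. The functions and inputs below are invented for illustration; the line that defines success is exactly where the opinion lives.

```python
# The meal "algorithm" as a sketch. Everything here is illustrative;
# the key point is that the definition of success is a choice.
def plan_dinner(ingredients, minutes_available, ambition):
    # Curating the data is already an opinion: ramen doesn't count as food.
    usable = [item for item in ingredients if item != "ramen"]
    if minutes_available < 30 or ambition == "low":
        return ["frozen vegetables", "pasta"]
    return usable[:3]

def meal_succeeded(kids_ate_vegetables, nutella_servings):
    # The cook's definition of success: the kids ate vegetables.
    return kids_ate_vegetables
    # The youngest son would have written: return nutella_servings >= 2
```
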
  • 1:42 - 1:45
It's really different from how
most people think of algorithms.
  • 1:45 - 1:50
    They think algorithms are objective
    and true and scientific.
  • 1:50 - 1:52
    That's a marketing trick.
  • 1:53 - 1:55
    It's also a marketing trick
  • 1:55 - 1:59
    to intimidate you with algorithms,
  • 1:59 - 2:02
    to make you trust and fear algorithms
  • 2:02 - 2:04
    because you trust and fear mathematics.
  • 2:06 - 2:10
    A lot can go wrong when we put
    blind faith in big data.
  • 2:12 - 2:15
    This is Kiri Soares.
    She's a high school principal in Brooklyn.
  • 2:15 - 2:18
    In 2011, she told me
    her teachers were being scored
  • 2:18 - 2:20
    with a complex, secret algorithm
  • 2:20 - 2:22
    called the "value-added model."
  • 2:23 - 2:26
    I told her, "Well, figure out
    what the formula is, show it to me.
  • 2:26 - 2:27
    I'm going to explain it to you."
  • 2:27 - 2:29
    She said, "Well, I tried
    to get the formula,
  • 2:29 - 2:32
    but my Department of Education contact
    told me it was math
  • 2:32 - 2:34
    and I wouldn't understand it."
  • 2:35 - 2:37
    It gets worse.
  • 2:37 - 2:40
    The New York Post filed
    a Freedom of Information Act request,
  • 2:40 - 2:43
    got all the teachers' names
    and all their scores
  • 2:43 - 2:46
    and they published them
    as an act of teacher-shaming.
  • 2:47 - 2:51
    When I tried to get the formulas,
    the source code, through the same means,
  • 2:51 - 2:53
    I was told I couldn't.
  • 2:53 - 2:54
    I was denied.
  • 2:54 - 2:56
    I later found out
  • 2:56 - 2:58
    that nobody in New York City
    had access to that formula.
  • 2:58 - 3:00
    No one understood it.
  • 3:02 - 3:05
    Then someone really smart
    got involved, Gary Rubinstein.
  • 3:05 - 3:09
    He found 665 teachers
    from that New York Post data
  • 3:09 - 3:11
    that actually had two scores.
  • 3:11 - 3:13
    That could happen if they were teaching
  • 3:13 - 3:15
    seventh grade math and eighth grade math.
  • 3:15 - 3:17
    He decided to plot them.
  • 3:17 - 3:19
    Each dot represents a teacher.
  • 3:19 - 3:21
    (Laughter)
  • 3:22 - 3:23
    What is that?
  • 3:23 - 3:24
    (Laughter)
  • 3:24 - 3:28
    That should never have been used
    for individual assessment.
  • 3:28 - 3:30
    It's almost a random number generator.
  • 3:30 - 3:33
    (Applause)
  • 3:33 - 3:34
    But it was.
  • 3:34 - 3:35
    This is Sarah Wysocki.
  • 3:35 - 3:37
    She got fired, along
    with 205 other teachers,
  • 3:37 - 3:40
    from the Washington, DC school district,
  • 3:40 - 3:43
    even though she had great
    recommendations from her principal
  • 3:43 - 3:44
    and the parents of her kids.
  • 3:45 - 3:47
    I know what a lot
    of you guys are thinking,
  • 3:47 - 3:50
    especially the data scientists,
    the AI experts here.
  • 3:50 - 3:54
    You're thinking, "Well, I would never make
    an algorithm that inconsistent."
  • 3:55 - 3:57
    But algorithms can go wrong,
  • 3:57 - 4:01
    even have deeply destructive effects
    with good intentions.
  • 4:03 - 4:05
    And whereas an airplane
    that's designed badly
  • 4:05 - 4:07
    crashes to the earth and everyone sees it,
  • 4:07 - 4:09
    an algorithm designed badly
  • 4:10 - 4:14
    can go on for a long time,
    silently wreaking havoc.
  • 4:16 - 4:17
    This is Roger Ailes.
  • 4:17 - 4:19
    (Laughter)
  • 4:21 - 4:23
    He founded Fox News in 1996.
  • 4:23 - 4:26
    More than 20 women complained
    about sexual harassment.
  • 4:26 - 4:29
    They said they weren't allowed
    to succeed at Fox News.
  • 4:29 - 4:32
    He was ousted last year,
    but we've seen recently
  • 4:32 - 4:35
    that the problems have persisted.
  • 4:36 - 4:37
    That begs the question:
  • 4:37 - 4:40
    What should Fox News do
    to turn over another leaf?
  • 4:41 - 4:44
    Well, what if they replaced
    their hiring process
  • 4:44 - 4:46
    with a machine-learning algorithm?
  • 4:46 - 4:48
    That sounds good, right?
  • 4:48 - 4:49
    Think about it.
  • 4:49 - 4:51
    The data, what would the data be?
  • 4:51 - 4:56
    A reasonable choice would be the last
    21 years of applications to Fox News.
  • 4:56 - 4:58
    Reasonable.
  • 4:58 - 4:59
    What about the definition of success?
  • 5:00 - 5:01
    Reasonable choice would be,
  • 5:01 - 5:03
    well, who is successful at Fox News?
  • 5:03 - 5:07
    I guess someone who, say,
    stayed there for four years
  • 5:07 - 5:08
    and was promoted at least once.
  • 5:09 - 5:10
    Sounds reasonable.
  • 5:10 - 5:13
    And then the algorithm would be trained.
  • 5:13 - 5:17
    It would be trained to look for people
    to learn what led to success,
  • 5:17 - 5:22
    what kind of applications
    historically led to success
  • 5:22 - 5:23
    by that definition.
  • 5:24 - 5:26
    Now think about what would happen
  • 5:26 - 5:29
    if we applied that
    to a current pool of applicants.
  • 5:29 - 5:31
    It would filter out women
  • 5:32 - 5:36
    because they do not look like people
    who were successful in the past.
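
The mechanism in that thought experiment can be shown in a few lines. The applicant records and success labels below are fabricated purely to illustrate how a model trained on a biased history reproduces that history.

```python
# A toy version of the hiring thought experiment: the model learns whatever
# bias produced the historical "success" labels. All data here is fabricated.
from sklearn.linear_model import LogisticRegression

# Hypothetical past applicants: [years_of_experience, is_woman]
past_applicants = [
    [6, 0], [4, 0], [5, 0], [7, 0],   # men
    [6, 1], [4, 1], [5, 1], [7, 1],   # equally qualified women
]
# "Success" = stayed four years and was promoted once -- as recorded in a
# workplace where women were not allowed to succeed.
succeeded = [1, 1, 1, 1, 0, 0, 0, 0]

model = LogisticRegression().fit(past_applicants, succeeded)

# Two new applicants, identical except for gender:
man, woman = [6, 0], [6, 1]
print(model.predict_proba([man, woman]))  # the woman's "success" score is lower
```
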
  • 5:40 - 5:42
    Algorithms don't make things fair
  • 5:42 - 5:45
    if you just blithely,
    blindly apply algorithms.
  • 5:45 - 5:47
    They don't make things fair.
  • 5:47 - 5:49
    They repeat our past practices,
  • 5:49 - 5:50
    our patterns.
  • 5:50 - 5:52
    They automate the status quo.
  • 5:53 - 5:55
    That would be great
    if we had a perfect world,
  • 5:56 - 5:57
    but we don't.
  • 5:57 - 6:01
    And I'll add that most companies
    don't have embarrassing lawsuits,
  • 6:02 - 6:05
    but the data scientists in those companies
  • 6:05 - 6:07
    are told to follow the data,
  • 6:07 - 6:09
    to focus on accuracy.
  • 6:10 - 6:12
    Think about what that means.
  • 6:12 - 6:16
    Because we all have bias,
    it means they could be codifying sexism
  • 6:16 - 6:18
    or any other kind of bigotry.
  • 6:19 - 6:21
    Thought experiment,
  • 6:21 - 6:22
    because I like them:
  • 6:24 - 6:27
    an entirely segregated society --
  • 6:28 - 6:32
    racially segregated, all towns,
    all neighborhoods
  • 6:32 - 6:35
    and where we send the police
    only to the minority neighborhoods
  • 6:35 - 6:36
    to look for crime.
  • 6:36 - 6:39
    The arrest data would be very biased.
  • 6:40 - 6:42
    What if, on top of that,
    we found the data scientists
  • 6:42 - 6:47
    and paid the data scientists to predict
    where the next crime would occur?
  • 6:47 - 6:49
    Minority neighborhood.
  • 6:49 - 6:52
    Or to predict who the next
    criminal would be?
  • 6:53 - 6:54
    A minority.
  • 6:56 - 6:59
    The data scientists would brag
    about how great and how accurate
  • 7:00 - 7:01
    their model would be,
  • 7:01 - 7:02
    and they'd be right.
  • 7:04 - 7:09
Now, reality isn't that drastic,
but we do have severe segregation
  • 7:09 - 7:10
    in many cities and towns,
  • 7:10 - 7:12
    and we have plenty of evidence
  • 7:12 - 7:15
    of biased policing
    and justice system data.
  • 7:16 - 7:18
    And we actually do predict hotspots,
  • 7:18 - 7:20
    places where crimes will occur.
  • 7:20 - 7:24
    And we do predict, in fact,
    the individual criminality,
  • 7:24 - 7:26
    the criminality of individuals.
  • 7:27 - 7:31
    The news organization ProPublica
    recently looked into
  • 7:31 - 7:33
    one of those "recidivism risk" algorithms,
  • 7:33 - 7:34
    as they're called,
  • 7:34 - 7:37
    being used in Florida
    during sentencing by judges.
  • 7:38 - 7:42
    Bernard, on the left, the black man,
    was scored a 10 out of 10.
  • 7:43 - 7:45
    Dylan, on the right, 3 out of 10.
  • 7:45 - 7:48
    10 out of 10, high risk.
    3 out of 10, low risk.
  • 7:49 - 7:51
    They were both brought in
    for drug possession.
  • 7:51 - 7:52
    They both had records,
  • 7:52 - 7:55
    but Dylan had a felony
  • 7:55 - 7:56
and Bernard didn't.
  • 7:58 - 8:01
This matters, because
the higher your score,
  • 8:01 - 8:04
the more likely you are to be given
a longer sentence.
  • 8:06 - 8:08
    What's going on?
  • 8:09 - 8:10
    Data laundering.
  • 8:11 - 8:15
    It's a process by which
    technologists hide ugly truths
  • 8:15 - 8:17
    inside black box algorithms
  • 8:17 - 8:19
    and call them objective;
  • 8:19 - 8:21
    call them meritocratic.
  • 8:23 - 8:26
    When they're secret,
    important and destructive,
  • 8:26 - 8:28
    I've coined a term for these algorithms:
  • 8:28 - 8:30
    "weapons of math destruction."
  • 8:30 - 8:32
    (Laughter)
  • 8:32 - 8:35
    (Applause)
  • 8:35 - 8:37
    They're everywhere,
    and it's not a mistake.
  • 8:38 - 8:41
    These are private companies
    building private algorithms
  • 8:41 - 8:43
    for private ends.
  • 8:43 - 8:46
    Even the ones I talked about
    for teachers and the public police,
  • 8:46 - 8:48
    those were built by private companies
  • 8:48 - 8:51
    and sold to the government institutions.
  • 8:51 - 8:52
    They call it their "secret sauce" --
  • 8:52 - 8:55
    that's why they can't tell us about it.
  • 8:55 - 8:57
    It's also private power.
  • 8:58 - 9:03
They are profiting from wielding
the authority of the inscrutable.
  • 9:05 - 9:08
    Now you might think,
    since all this stuff is private
  • 9:08 - 9:09
    and there's competition,
  • 9:09 - 9:12
    maybe the free market
    will solve this problem.
  • 9:12 - 9:13
    It won't.
  • 9:13 - 9:16
    There's a lot of money
    to be made in unfairness.
  • 9:17 - 9:20
Also, we're not rational economic agents.
  • 9:21 - 9:22
    We all are biased.
  • 9:23 - 9:26
    We're all racist and bigoted
    in ways that we wish we weren't,
  • 9:26 - 9:28
    in ways that we don't even know.
  • 9:29 - 9:32
    We know this, though, in aggregate,
  • 9:32 - 9:36
    because sociologists
    have consistently demonstrated this
  • 9:36 - 9:37
    with these experiments they build,
  • 9:37 - 9:40
where they send out a bunch
of job applications,
  • 9:40 - 9:42
    equally qualified but some
    have white-sounding names
  • 9:43 - 9:44
    and some have black-sounding names,
  • 9:44 - 9:47
    and it's always disappointing,
    the results -- always.
  • 9:48 - 9:49
    So we are the ones that are biased,
  • 9:49 - 9:53
    and we are injecting those biases
    into the algorithms
  • 9:53 - 9:55
    by choosing what data to collect,
  • 9:55 - 9:57
    like I chose not to think
    about ramen noodles --
  • 9:57 - 9:59
    I decided it was irrelevant.
  • 9:59 - 10:05
    But by trusting the data that's actually
    picking up on past practices
  • 10:05 - 10:07
    and by choosing the definition of success,
  • 10:07 - 10:11
    how can we expect the algorithms
    to emerge unscathed?
  • 10:11 - 10:13
    We can't. We have to check them.
  • 10:14 - 10:16
    We have to check them for fairness.
  • 10:16 - 10:19
    The good news is,
    we can check them for fairness.
  • 10:19 - 10:22
    Algorithms can be interrogated,
  • 10:22 - 10:24
    and they will tell us
    the truth every time.
  • 10:24 - 10:27
    And we can fix them.
    We can make them better.
  • 10:27 - 10:29
    I call this an algorithmic audit,
  • 10:29 - 10:31
    and I'll walk you through it.
  • 10:31 - 10:33
    First, data integrity check.
  • 10:34 - 10:37
    For the recidivism risk
    algorithm I talked about,
  • 10:38 - 10:41
    a data integrity check would mean
    we'd have to come to terms with the fact
  • 10:41 - 10:45
    that in the US, whites and blacks
    smoke pot at the same rate
  • 10:45 - 10:47
    but blacks are far more likely
    to be arrested --
  • 10:47 - 10:50
    four or five times more likely,
    depending on the area.
  • 10:51 - 10:54
What does that bias look like
in other crime categories,
  • 10:54 - 10:56
    and how do we account for it?
  • 10:56 - 10:59
    Second, we should think about
    the definition of success,
  • 10:59 - 11:01
    audit that.
  • 11:01 - 11:03
Remember the hiring
algorithm we talked about?
  • 11:03 - 11:07
    Someone who stays for four years
    and is promoted once?
  • 11:07 - 11:08
    Well, that is a successful employee,
  • 11:08 - 11:11
    but it's also an employee
    that is supported by their culture.
  • 11:12 - 11:14
That said, it can also be quite biased.
  • 11:14 - 11:16
    We need to separate those two things.
  • 11:16 - 11:19
    We should look to
    the blind orchestra audition
  • 11:19 - 11:20
    as an example.
  • 11:20 - 11:23
    That's where the people auditioning
    are behind a sheet.
  • 11:23 - 11:25
    What I want to think about there
  • 11:25 - 11:28
    is the people who are listening
    have decided what's important
  • 11:28 - 11:30
    and they've decided what's not important,
  • 11:30 - 11:32
    and they're not getting
    distracted by that.
  • 11:33 - 11:36
    When the blind orchestra
    auditions started,
  • 11:36 - 11:39
    the number of women in orchestras
    went up by a factor of five.
  • 11:40 - 11:42
    Next, we have to consider accuracy.
  • 11:43 - 11:47
    This is where the value-added model
    for teachers would fail immediately.
  • 11:48 - 11:50
    No algorithm is perfect, of course,
  • 11:51 - 11:54
    so we have to consider
    the errors of every algorithm.
  • 11:55 - 11:59
    How often are there errors,
    and for whom does this model fail?
  • 12:00 - 12:02
    What is the cost of that failure?
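
One way to make "for whom does this model fail" concrete is to break the error rate out by group, roughly the way ProPublica compared false positive rates. The scored cases below are invented, standing in for a real model's output.

```python
# A sketch of the accuracy step of an audit: compare false positive rates
# across groups. The records are invented stand-ins for a model's output.
from collections import defaultdict

# (group, model_said_high_risk, actually_reoffended)
scored_cases = [
    ("group_a", True,  False), ("group_a", True,  True),
    ("group_a", True,  False), ("group_a", False, False),
    ("group_b", False, False), ("group_b", True,  True),
    ("group_b", False, False), ("group_b", False, True),
]

false_positives = defaultdict(int)   # flagged high risk but did not reoffend
did_not_reoffend = defaultdict(int)

for group, flagged, reoffended in scored_cases:
    if not reoffended:
        did_not_reoffend[group] += 1
        if flagged:
            false_positives[group] += 1

for group in sorted(did_not_reoffend):
    rate = false_positives[group] / did_not_reoffend[group]
    print(f"{group}: false positive rate = {rate:.0%}")
```
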
  • 12:02 - 12:05
    And finally, we have to consider
  • 12:06 - 12:08
    the long-term effects of algorithms,
  • 12:09 - 12:11
the feedback loops that they engender.
  • 12:12 - 12:13
    That sounds abstract,
  • 12:13 - 12:16
    but imagine if Facebook engineers
    had considered that
  • 12:16 - 12:21
    before they decided to show us
    only things that our friends had posted.
  • 12:22 - 12:25
    I have two more messages,
    one for the data scientists out there.
  • 12:25 - 12:29
    Data scientists: we should
    not be the arbiters of truth.
  • 12:30 - 12:33
    We should be translators
    of ethical discussions that happen
  • 12:33 - 12:35
    in larger society.
  • 12:36 - 12:38
    (Applause)
  • 12:38 - 12:39
    And the rest of you,
  • 12:40 - 12:41
    the non-data scientists:
  • 12:41 - 12:43
    this is not a math test.
  • 12:44 - 12:45
    This is a political fight.
  • 12:47 - 12:50
We need to demand accountability
from our algorithmic overlords.
  • 12:52 - 12:54
    (Applause)
  • 12:54 - 12:58
    The era of blind faith
    in big data must end.
  • 12:58 - 12:59
    Thank you very much.
  • 12:59 - 13:04
    (Applause)
Title: The era of blind faith in big data must end
Speaker: Cathy O'Neil
Description: Algorithms decide who gets a loan, who gets a job interview, who gets insurance and much more -- but they don't automatically make things fair. Mathematician and data scientist Cathy O'Neil coined a term for algorithms that are secret, important and harmful: "weapons of math destruction." Learn more about the hidden agendas behind the formulas.

Video Language: English
Team: closed TED
Project: TEDTalks
Duration: 13:18
