The era of blind faith in big data must end

  • 0:01 - 0:03
    Algorithms are everywhere.
  • 0:03 - 0:08
    They sort and separate
    the winners from the losers.
  • 0:08 - 0:10
    The winners get the job
  • 0:10 - 0:12
    or a good credit card offer.
  • 0:12 - 0:15
    The losers don't even get an interview,
  • 0:15 - 0:18
    or they pay more for insurance.
  • 0:18 - 0:21
    We're being scored with secret formulas
    that we don't understand
  • 0:21 - 0:27
    that often don't have systems of appeal.
  • 0:27 - 0:29
    That begs the question,
  • 0:29 - 0:33
    what if the algorithms are wrong?
  • 0:33 - 0:35
    To build an algorithm you need two things.
  • 0:35 - 0:37
    You need data, what happened in the past,
  • 0:37 - 0:39
    and a definition of success,
  • 0:39 - 0:41
    the thing you're looking for
    and often hoping for.
  • 0:41 - 0:44
    You train an algorithm
  • 0:44 - 0:47
    by looking, figuring out.
  • 0:47 - 0:50
    The algorithm figures out
    what is associated with success.
  • 0:50 - 0:53
    What situation leads to success?
  • 0:53 - 0:55
    Actually, everyone uses algorithms.
  • 0:55 - 0:58
    They just don't formalize them
    in written code.
  • 0:58 - 0:58
    Let me give you an example.
  • 0:58 - 1:01
    I use an algorithm every day
    to make a meal for my family.
  • 1:01 - 1:04
    The data I use
  • 1:04 - 1:06
    is the ingredients in my kitchen,
  • 1:06 - 1:09
    the time I have, the ambition I have,
  • 1:09 - 1:11
    and I curate that data.
  • 1:11 - 1:17
    I don't count those little
    packages of ramen noodles as food.
  • 1:17 - 1:19
    My definition of success is,
  • 1:19 - 1:22
    a meal is successful
    if my kids eat vegetables.
  • 1:22 - 1:25
    It's very different from
    if my youngest son were in charge.
  • 1:25 - 1:29
    He'd say success is
    if he gets to eat lots of Nutella.
  • 1:29 - 1:31
    But I get to choose success.
  • 1:31 - 1:34
    I am in charge. My opinion matters.
  • 1:34 - 1:37
    That's the first rule of algorithms.
  • 1:37 - 1:42
    Algorithms are opinions embedded in code.
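
A minimal sketch, assuming Python with pandas and scikit-learn, of the two ingredients just described: historical data plus a chosen definition of success. The meal-related columns and values are hypothetical.

```python
# Hypothetical "meal" data: curated ingredients plus the curator's chosen success label.
import pandas as pd
from sklearn.linear_model import LogisticRegression

history = pd.DataFrame({
    "vegetables_on_hand":  [2, 0, 3, 1, 4, 0],
    "prep_minutes":        [30, 10, 45, 20, 60, 15],
    "kids_ate_vegetables": [1, 0, 1, 0, 1, 0],   # success, as defined by the person in charge
})

features = history[["vegetables_on_hand", "prep_minutes"]]
success = history["kids_ate_vegetables"]

# "Training" means finding what is associated with that definition of success.
model = LogisticRegression().fit(features, success)
print(model.predict(features))   # the opinion embedded in the label now drives predictions
```
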
  • 1:42 - 1:45
    It's really different from how
    most people think of algorithms.
  • 1:45 - 1:50
    They think algorithms
    are objective and true and scientific.
  • 1:50 - 1:53
    That's a marketing trick.
  • 1:53 - 1:56
    It's also a marketing trick
  • 1:56 - 1:59
    to intimidate you with algorithms,
  • 1:59 - 2:02
    to make you trust and fear algorithms
  • 2:02 - 2:06
    because you trust and fear mathematics.
  • 2:06 - 2:07
    A lot can go wrong
  • 2:07 - 2:12
    when we put blind faith in big data.
  • 2:12 - 2:15
    This is Kiri Soares.
    She's a high school principal in Brooklyn.
  • 2:15 - 2:18
    In 2011, she told me her teachers
    were being scored
  • 2:18 - 2:20
    with a complex, secret algorithm
  • 2:20 - 2:23
    called the Value Added Model.
  • 2:23 - 2:25
    I told her, "Well, figure out
    what the formula is.
  • 2:25 - 2:27
    Show it to me.
    I'm going to explain it to you."
  • 2:27 - 2:30
    She said, "Well, I tried
    to get the formula
  • 2:30 - 2:31
    but my Department
    of Education contact
  • 2:31 - 2:35
    told me it was math
    and I wouldn't understand it."
  • 2:35 - 2:37
    It gets worse.
  • 2:37 - 2:38
    The New York Post
  • 2:38 - 2:40
    filed a Freedom
    of Information Act request,
  • 2:40 - 2:43
    got all the teachers' names
    and all their scores,
  • 2:43 - 2:47
    and they published them
    as an act of teacher shaming.
  • 2:47 - 2:51
    When I tried to get the formulas,
    the source code, through the same means,
  • 2:51 - 2:53
    I was told I couldn't.
  • 2:53 - 2:55
    I was denied.
  • 2:55 - 2:56
    I later found out
  • 2:56 - 2:59
    that nobody in New York City
    had access to that formula.
  • 2:59 - 3:01
    No one understood it.
  • 3:01 - 3:04
    Then someone really smart
    got involved, Gary Rubinstein.
  • 3:04 - 3:09
    He found 665 teachers
    from that New York Post data
  • 3:09 - 3:11
    that actually had two scores.
  • 3:11 - 3:14
    That could happen if they
    were teaching seventh grade math
  • 3:14 - 3:15
    and eighth grade math.
  • 3:15 - 3:18
    He decided to plot them.
  • 3:18 - 3:19
    Each dot represents a teacher.
  • 3:19 - 3:22
    (Laughter)
  • 3:22 - 3:24
    What is that?
  • 3:24 - 3:28
    That should never have been used
    for individual assessment.
  • 3:28 - 3:30
    It's almost a random number generator.
  • 3:30 - 3:33
    (Applause)
  • 3:33 - 3:35
    But it was. This is Sarah Wysocki.
  • 3:35 - 3:37
    She got fired, along
    with 205 other teachers,
  • 3:37 - 3:40
    from the Washington, DC school district
  • 3:40 - 3:43
    even though she had great
    recommendations from her principal
  • 3:43 - 3:46
    and the parents of her kids.
  • 3:46 - 3:48
    I know what a lot
    of you guys are thinking,
  • 3:48 - 3:50
    especially the data scientists,
    the AI experts here.
  • 3:50 - 3:52
    You're thinking, "Well, I would
    never make an algorithm
  • 3:52 - 3:55
    that inconsistent."
  • 3:55 - 4:00
    But algorithms can go wrong,
    even have deeply destructive effects,
  • 4:00 - 4:02
    with good intentions.
  • 4:02 - 4:05
    And whereas an airplane
    that's designed badly
  • 4:05 - 4:07
    crashes to the earth and everyone sees it,
  • 4:07 - 4:09
    an algorithm designed badly
  • 4:09 - 4:13
    can go on for a long time
  • 4:13 - 4:16
    silently wreaking havoc.
  • 4:16 - 4:19
    This is Roger Ailes.
  • 4:19 - 4:24
    He founded Fox News in 1996.
  • 4:24 - 4:26
    More than 20 women complained
    about sexual harassment.
  • 4:26 - 4:30
    They said they weren't allowed
    to succeed at Fox News.
  • 4:30 - 4:32
    He was ousted last year,
    but we've seen recently
  • 4:32 - 4:36
    that the problems have persisted.
  • 4:36 - 4:37
    That begs the question,
  • 4:37 - 4:41
    what should Fox News do
    to turn over another leaf?
  • 4:41 - 4:44
    Well, what if they replaced
    their hiring process
  • 4:44 - 4:46
    with a machine learning algorithm?
  • 4:46 - 4:48
    That sounds good, right?
  • 4:48 - 4:49
    Think about it.
  • 4:49 - 4:51
    The data, what would the data be?
  • 4:51 - 4:53
    A reasonable choice would be
  • 4:53 - 4:56
    the last 21 years
    of applications to Fox News.
  • 4:56 - 4:58
    Reasonable.
  • 4:58 - 5:00
    What about the definition of success?
  • 5:00 - 5:01
    Reasonable choice would be,
  • 5:01 - 5:03
    well, who is successful at Fox News?
  • 5:03 - 5:07
    I guess someone who, say,
    stayed there for four years
  • 5:07 - 5:09
    and was promoted at least once.
  • 5:09 - 5:11
    Sounds reasonable.
  • 5:11 - 5:13
    And then the algorithm would be trained.
  • 5:13 - 5:15
    It would be trained to look for people
  • 5:15 - 5:18
    to learn what led to success,
  • 5:18 - 5:20
    what kind of applications
  • 5:20 - 5:24
    historically led to success
    by that definition.
  • 5:24 - 5:27
    Now think about what would happen
    if we applied that
  • 5:27 - 5:29
    to the current pool of applicants.
  • 5:29 - 5:32
    It would filter out women,
  • 5:32 - 5:38
    because they do not look like people
    who were successful in the past.
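
A hedged sketch, assuming Python with NumPy and scikit-learn, of the hypothetical hiring model just described. The data is invented so that the chosen success label (stayed four years, promoted at least once) was effectively unavailable to women in the past, which is exactly the pattern the trained model then reproduces.

```python
# Invented historical data in which past "success" depended on skill AND on not being a woman.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
is_woman = rng.integers(0, 2, n)          # hypothetical applicant attribute
skill = rng.normal(size=n)                # distributed identically across groups

# The biased past: retention and promotion were only granted to skilled men.
stayed_and_promoted = ((skill > 0) & (is_woman == 0)).astype(int)

model = LogisticRegression().fit(np.column_stack([skill, is_woman]), stayed_and_promoted)

# Apply it to a fresh, equally skilled applicant pool: women get filtered out.
applicants = np.column_stack([rng.normal(size=200), rng.integers(0, 2, 200)])
hire = model.predict(applicants)
print("hire rate, men:  ", hire[applicants[:, 1] == 0].mean())
print("hire rate, women:", hire[applicants[:, 1] == 1].mean())
# Dropping the gender column would not fix this: proxies in the data can leak the same pattern.
```
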
  • 5:38 - 5:42
    Algorithms don't make things fair
  • 5:42 - 5:45
    if you just blithely,
    blindly apply algorithms.
  • 5:45 - 5:47
    They don't make things fair.
  • 5:47 - 5:49
    They repeat our past practices,
  • 5:49 - 5:50
    our patterns.
  • 5:50 - 5:53
    They automate the status quo.
  • 5:53 - 5:56
    That would be great if we had
    a perfect world, but we don't,
  • 5:56 - 6:01
    and I'll add that most companies
    don't have embarrassing lawsuits,
  • 6:01 - 6:05
    but the data scientists in those companies
  • 6:05 - 6:08
    are told to follow the data,
  • 6:08 - 6:10
    to focus on accuracy.
  • 6:10 - 6:12
    Think about what that means.
  • 6:12 - 6:13
    Because we all have bias, it means
  • 6:13 - 6:16
    they could be codifying sexism
  • 6:16 - 6:20
    or any other kind of bigotry.
  • 6:20 - 6:22
    Thought experiment,
  • 6:22 - 6:24
    because I like them.
  • 6:24 - 6:27
    An entirely segregated society,
  • 6:27 - 6:31
    racially segregated, all towns,
    all neighborhoods,
  • 6:31 - 6:35
    and where we send the police
    only to the minority neighborhoods
  • 6:35 - 6:37
    to look for crime.
  • 6:37 - 6:39
    The arrest data would be very biased.
  • 6:39 - 6:41
    What if on top of that
  • 6:41 - 6:44
    we found the data scientists
    and paid the data scientists
  • 6:44 - 6:48
    to predict where
    the next crime would occur?
  • 6:48 - 6:49
    Minority neighborhood.
  • 6:49 - 6:53
    Or to predict who the next
    criminal would be?
  • 6:53 - 6:56
    A minority.
  • 6:56 - 7:01
    The data scientists would brag
    about how great and how accurate
  • 7:01 - 7:01
    their model would be,
  • 7:01 - 7:03
    and they'd be right.
  • 7:03 - 7:08
    Now, reality isn't that drastic,
    but we do have severe segregations
  • 7:08 - 7:10
    in many cities and towns
  • 7:10 - 7:12
    and we have plenty of evidence
  • 7:12 - 7:16
    of biased policing
    and justice system data.
  • 7:16 - 7:19
    And we actually do predict hotspots,
  • 7:19 - 7:21
    places where crimes will occur,
  • 7:21 - 7:23
    and we do predict, in fact,
  • 7:23 - 7:25
    the individual criminality,
  • 7:25 - 7:27
    the criminality of individuals.
  • 7:27 - 7:30
    The news organization ProPublica
  • 7:30 - 7:33
    recently looked into one of those
    recidivism risk algorithms,
  • 7:33 - 7:34
    as they're called,
  • 7:34 - 7:39
    being used in Florida during sentencing
  • 7:39 - 7:40
    by judges.
  • 7:40 - 7:41
    Bernard on the left, the black man,
  • 7:41 - 7:43
    was scored a 10 out of 10,
  • 7:43 - 7:45
    Dylan, on the right, a three out of 10.
  • 7:45 - 7:49
    10 out of 10, high risk.
    Three out of 10, low risk.
  • 7:49 - 7:51
    They were both brought in
    for drug possession.
  • 7:51 - 7:52
    They both had records,
  • 7:52 - 7:55
    but Dylan had a felony
  • 7:55 - 7:58
    and Bernard didn't.
  • 7:58 - 8:01
    This matters, because
    the higher your score,
  • 8:01 - 8:07
    the more likely you are
    to be given a longer sentence.
  • 8:07 - 8:09
    What's going on?
  • 8:09 - 8:11
    Data laundering.
  • 8:11 - 8:14
    It's a process by which technologists
  • 8:14 - 8:17
    hide ugly truths inside
    black box algorithms
  • 8:17 - 8:20
    and call them objective,
  • 8:20 - 8:23
    call them meritocratic.
  • 8:23 - 8:26
    When they're secret,
    important, and destructive,
  • 8:26 - 8:28
    I've coined a term for these algorithms:
  • 8:28 - 8:31
    weapons of math destruction.
  • 8:31 - 8:35
    (Applause)
  • 8:35 - 8:38
    They're everywhere,
    and it's not a mistake.
  • 8:38 - 8:40
    These are private companies
  • 8:40 - 8:43
    building private algorithms
    for private ends.
  • 8:43 - 8:46
    Even the ones I talked about
    for teachers and for public policing,
  • 8:46 - 8:51
    those were built by private companies
    and sold to the government institutions.
  • 8:51 - 8:53
    They call it their secret sauce.
  • 8:53 - 8:55
    That's why they can't tell us about it.
  • 8:55 - 8:58
    It's also private power.
  • 8:58 - 9:05
    They are profiting from wielding
    the authority of the inscrutable.
  • 9:05 - 9:08
    Now you might think,
    since all this stuff is private
  • 9:08 - 9:09
    and there's competition,
  • 9:09 - 9:12
    maybe the free market
    will solve this problem.
  • 9:12 - 9:13
    It won't.
  • 9:13 - 9:17
    There's a lot of money
    to be made in unfairness.
  • 9:17 - 9:21
    Also, we're not rational economic agents.
  • 9:21 - 9:23
    We all are biased.
  • 9:23 - 9:27
    We're all racist and bigoted
    in ways that we wish we weren't,
  • 9:27 - 9:30
    in ways that we don't even know.
  • 9:30 - 9:31
    We know this though
  • 9:31 - 9:32
    in aggregate
  • 9:32 - 9:36
    because sociologists have
    consistently demonstrated this
  • 9:36 - 9:37
    with experiments they design
  • 9:37 - 9:40
    where they send out a batch
    of job applications,
  • 9:40 - 9:42
    equally qualified but some
    have white-sounding names
  • 9:42 - 9:43
    and some have black-sounding names,
  • 9:43 - 9:48
    and the results are always
    disappointing. Always.
  • 9:48 - 9:50
    So we are the ones that are biased,
  • 9:50 - 9:52
    and we are injecting those biases
  • 9:52 - 9:53
    into the algorithms by choosing
    what data to collect,
  • 9:53 - 9:57
    like I chose not to think
    about ramen noodles --
  • 9:57 - 9:59
    I decided it was irrelevant --
  • 9:59 - 10:03
    but also by trusting
    the data that's actually
  • 10:03 - 10:05
    picking up on past practices
  • 10:05 - 10:07
    and by choosing the definition of success.
  • 10:07 - 10:11
    How can we expect the algorithms
    to emerge unscathed?
  • 10:11 - 10:12
    We can't. We have to check them.
  • 10:12 - 10:14
    We have to check them for fairness.
  • 10:14 - 10:17
    The good news is, we can
    check them for fairness.
  • 10:17 - 10:22
    Algorithms can be interrogated,
  • 10:22 - 10:24
    and they will tell us
    the truth every time.
  • 10:24 - 10:27
    And we can fix them.
    We can make them better.
  • 10:27 - 10:29
    I call this an algorithmic audit,
  • 10:29 - 10:31
    and I'll walk you through it.
  • 10:31 - 10:33
    First, data integrity check.
  • 10:33 - 10:37
    For the recidivism risk
    algorithm I talked about,
  • 10:37 - 10:41
    a data integrity check would mean
    we have to come to terms with the fact
  • 10:41 - 10:45
    that in the US, whites and blacks
    smoke pot at the same rate
  • 10:45 - 10:48
    but blacks are far more likely
    to be arrested,
  • 10:48 - 10:49
    four or five times more likely
  • 10:49 - 10:52
    depending on the area.
  • 10:52 - 10:54
    What does that bias look like
    in other crime categories,
  • 10:54 - 10:56
    and how do we account for it?
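
A minimal sketch of that data integrity check, assuming Python with pandas. The rates are illustrative stand-ins for the figures cited above (similar marijuana use across groups, arrests roughly four to five times more likely for Black people depending on the area).

```python
# Compare the underlying behavior with what the recorded arrest data reflects.
import pandas as pd

arrests = pd.DataFrame({
    "group":       ["white", "black"],
    "use_rate":    [0.12, 0.12],     # roughly equal underlying behavior (illustrative)
    "arrest_rate": [0.010, 0.045],   # what ends up in the data (illustrative)
})

# Arrests per unit of underlying behavior: if this differs by group,
# the "arrested" label is measuring policing practice, not just crime.
arrests["arrests_per_use"] = arrests["arrest_rate"] / arrests["use_rate"]
disparity = arrests["arrests_per_use"].max() / arrests["arrests_per_use"].min()
print(arrests)
print(f"arrest disparity for the same behavior: {disparity:.1f}x")
```
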
  • 10:56 - 10:59
    Second, we should think about
    the definition of success,
  • 10:59 - 11:01
    audit that.
  • 11:01 - 11:03
    Remember, with the hiring algorithm,
  • 11:03 - 11:05
    we talked about it, someone
    who stays for four years
  • 11:05 - 11:07
    and is promoted once?
  • 11:07 - 11:08
    Well, that is a successful employee,
    but it's also an employee
  • 11:08 - 11:12
    that is supported by their culture.
  • 11:12 - 11:14
    That also can be quite biased.
    We need to separate those two things.
  • 11:14 - 11:20
    We should look to
    the blind orchestra audition
  • 11:20 - 11:23
    as an example.
  • 11:23 - 11:23
    That's where the people auditioning
    are behind a sheet.
  • 11:23 - 11:25
    What I want to think about there
  • 11:25 - 11:27
    is that the people who are listening
    have decided what's important
  • 11:27 - 11:31
    and they've decided what's not important,
  • 11:31 - 11:33
    and they're not getting
    distracted by that.
  • 11:33 - 11:36
    When the blind orchestra
    auditions started,
  • 11:36 - 11:40
    the number of women in orchestras
    went up by a factor of five.
  • 11:40 - 11:43
    Next, we have to consider accuracy.
  • 11:43 - 11:48
    This is where the Value Added Model
    for teachers would fail immediately.
  • 11:48 - 11:51
    No algorithm is perfect, of course,
  • 11:51 - 11:53
    so we have to consider
    the errors of every algorithm.
  • 11:53 - 12:00
    How often are there errors,
    and for whom does this model fail?
  • 12:00 - 12:03
    What is the cost of that failure?
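
A hedged sketch of that error audit, assuming Python with NumPy: rather than reporting one overall accuracy number, break the errors out by group and by type, because a false "high risk" and a false "low risk" carry very different costs. The arrays below are placeholders, not real scores.

```python
import numpy as np

def error_audit(y_true, y_pred, group):
    """Per-group false positive and false negative rates for a binary risk label."""
    for g in np.unique(group):
        m = group == g
        fp = np.mean(y_pred[m][y_true[m] == 0])       # labeled high risk, did not reoffend
        fn = 1 - np.mean(y_pred[m][y_true[m] == 1])   # labeled low risk, did reoffend
        print(f"group {g}: false positive rate {fp:.2f}, false negative rate {fn:.2f}")

# Made-up example: identical overall accuracy can hide opposite error types per group.
y_true = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 1])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
error_audit(y_true, y_pred, group)
```
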
  • 12:03 - 12:06
    And finally, we have to consider
  • 12:06 - 12:09
    the long-term effects of algorithms,
  • 12:09 - 12:12
    the feedback loops that are engendered.
  • 12:12 - 12:16
    That sounds abstract, but imagine
    if Facebook engineers had considered that
  • 12:16 - 12:22
    before they decided to show us
    only things that our friends had posted.
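
A hedged sketch of such a feedback loop, assuming Python and reusing the earlier policing thought experiment rather than the Facebook case: crime is only recorded where patrols go, and patrols go where crime was recorded.

```python
# Two neighborhoods with identical true crime rates; one arbitrary early arrest.
import random

random.seed(1)
true_rate = {"A": 0.3, "B": 0.3}   # identical underlying crime
recorded = {"A": 1, "B": 0}        # a single early, arbitrary recorded arrest

for day in range(365):
    patrolled = max(recorded, key=recorded.get)   # patrol follows the recorded data
    if random.random() < true_rate[patrolled]:
        recorded[patrolled] += 1                  # we only observe where we police

print(recorded)   # e.g. roughly {'A': 110, 'B': 0}: the early fluke became "evidence"
```
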
  • 12:22 - 12:26
    I have two more messages,
    one for the data scientists out there.
  • 12:26 - 12:30
    Data scientists, we should
    not be the arbiters of truth.
  • 12:30 - 12:33
    We should be translators
    of ethical discussions that happen
  • 12:33 - 12:36
    in larger society.
  • 12:36 - 12:39
    (Applause)
  • 12:39 - 12:40
    And the rest of you,
  • 12:40 - 12:44
    the non-data scientists,
    this is not a math test.
  • 12:44 - 12:47
    This is a political fight.
  • 12:47 - 12:52
    We need to demand accountability
    for our algorithmic overlords.
  • 12:52 - 12:54
    (Applause)
  • 12:54 - 12:59
    The era of blind faith
    in big data must end.
  • 12:59 - 13:00
    Thank you very much.
  • 13:00 - 13:06
    (Applause)
Title:
The era of blind faith in big data must end
Speaker:
Cathy O'Neil
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
13:18
