The era of blind faith in big data must end

Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don't even get an interview, or they pay more for insurance. We're being scored with secret formulas that we don't understand, that often don't have systems of appeal. That begs the question: What if the algorithms are wrong?
To build an algorithm you need two things: you need data, what happened in the past, and a definition of success, the thing you're looking for and often hoping for. You train an algorithm by looking, figuring out. The algorithm figures out what is associated with success. What situation leads to success?
Actually, everyone uses algorithms. They just don't formalize them in written code. Let me give you an example. I use an algorithm every day to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don't count those little packages of ramen noodles as food. (Laughter) My definition of success is: a meal is successful if my kids eat vegetables. It's very different from if my youngest son were in charge. He'd say success is if he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That's the first rule of algorithms. Algorithms are opinions embedded in code.
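As a minimal sketch of that idea, here is roughly what the dinner "algorithm" could look like if it were formalized in code. Every number, feature, and label below is invented for illustration; the point is only that the data is curated and the definition of success is chosen by whoever builds the model.

```python
# A minimal sketch: "training" means finding what, in curated past data,
# is associated with a chosen definition of success. All data is invented.
from sklearn.linear_model import LogisticRegression

# Past dinners: [minutes available, number of fresh ingredients on hand]
# (ramen noodles deliberately not counted -- the data is curated).
past_dinners = [
    [20, 1],
    [45, 4],
    [30, 3],
    [15, 0],
    [60, 5],
    [25, 2],
]

# My definition of success: 1 if the kids ate vegetables, 0 if not.
# Someone else (my son) would label these same dinners very differently.
ate_vegetables = [0, 1, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(past_dinners, ate_vegetables)

# The trained model now encodes an opinion: which situations "lead to success"
# under *my* definition, learned from *my* curated data.
print(model.predict([[40, 3]]))  # e.g. [1] -> predicted successful dinner
```

Swap in a different definition of success, or differently curated data, and the same training procedure produces a different opinion.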
That's really different from what most people think of algorithms. They think algorithms are objective and true and scientific. That's a marketing trick. It's also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.
This is Kiri Soares. She's a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the "value-added model." I told her, "Well, figure out what the formula is, show it to me. I'm going to explain it to you." She said, "Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn't understand it."

It gets worse. The New York Post filed a Freedom of Information Act request, got all the teachers' names and all their scores, and they published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn't. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it.
Then someone really smart got involved, Gary Rubinstein. He found 665 teachers from that New York Post data that actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher. (Laughter) What is that? (Laughter) That should never have been used for individual assessment. It's almost a random number generator. (Applause) But it was.
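A rough sketch of the kind of check Rubinstein ran: take the teachers who received two scores, plot one score against the other, and measure how well they agree. The file and column names below are hypothetical; with real value-added data, a scatter with no visible pattern and a near-zero correlation is what "almost a random number generator" looks like.

```python
# Consistency check on teachers who received two value-added scores.
# "two_score_teachers.csv" and its column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("two_score_teachers.csv")  # columns: score_7th, score_8th

# Each dot is one teacher: their 7th-grade score vs. their 8th-grade score.
plt.scatter(df["score_7th"], df["score_8th"], alpha=0.5)
plt.xlabel("Value-added score, 7th grade math")
plt.ylabel("Value-added score, 8th grade math")
plt.title("Same teacher, same year, two scores")
plt.show()

# If the score measured something real about the teacher, the two scores
# should agree. A correlation near zero means the score behaves like noise.
print(df["score_7th"].corr(df["score_8th"]))
```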
This is Sarah Wysocki. She got fired, along with 205 other teachers, from the Washington, DC school district, even though she had great recommendations from her principal and the parents of her kids.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You're thinking, "Well, I would never make an algorithm that inconsistent." But algorithms can go wrong, even have deeply destructive effects with good intentions. And whereas an airplane that's designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.
This is Roger Ailes. (Laughter) He founded Fox News in 1996. More than 20 women complained about sexual harassment. They said they weren't allowed to succeed at Fox News. He was ousted last year, but we've seen recently that the problems have persisted.
That begs the question: What should Fox News do to turn over another leaf? Well, what if they replaced their hiring process with a machine-learning algorithm? That sounds good, right? Think about it. The data, what would the data be? A reasonable choice would be the last 21 years of applications to Fox News. Reasonable. What about the definition of success? A reasonable choice would be, well, who is successful at Fox News? I guess someone who, say, stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to look for people, to learn what led to success, what kind of applications historically led to success by that definition.

Now think about what would happen if we applied that to a current pool of applicants. It would filter out women, because they do not look like people who were successful in the past.
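A toy sketch of why, using entirely fabricated data: if the historical "success" labels already reflect who was allowed to succeed, a model trained on them learns that pattern and applies it to new applicants.

```python
# Toy illustration: a model trained on biased historical "success" labels
# learns the bias. All applicants and labels here are fabricated.
from sklearn.tree import DecisionTreeClassifier

# Each past applicant: [years_experience, is_woman]  (a crude encoding, on purpose)
past_applicants = [
    [3, 0], [5, 0], [2, 0], [7, 0], [4, 0],
    [3, 1], [5, 1], [2, 1], [7, 1], [4, 1],
]

# "Success" = stayed four years and was promoted at least once. In this
# invented history, equally qualified women were rarely promoted, so the
# label itself carries the bias.
succeeded = [1, 1, 0, 1, 1,
             0, 0, 0, 1, 0]

model = DecisionTreeClassifier().fit(past_applicants, succeeded)

# Two new, equally qualified applicants:
print(model.predict([[5, 0]]))  # likely [1]: recommended
print(model.predict([[5, 1]]))  # likely [0]: filtered out
```

Nothing in the training step is malicious; the model is simply faithful to a past that was unfair.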
Algorithms don't make things fair if you just blithely, blindly apply algorithms. They don't make things fair. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don't. And I'll add that most companies don't have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.
Thought experiment, because I like them: an entirely segregated society -- racially segregated, all towns, all neighborhoods -- and where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we found the data scientists and paid the data scientists to predict where the next crime would occur? Minority neighborhood. Or to predict who the next criminal would be? A minority. The data scientists would brag about how great and how accurate their model would be, and they'd be right.
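A small simulation of that thought experiment, with made-up numbers: crime occurs at the same rate everywhere, but only patrolled neighborhoods generate arrest records, so a model trained on arrests looks highly accurate while only ever pointing back at the over-policed neighborhood.

```python
# Thought-experiment simulation with fabricated numbers: the data records
# arrests, not crimes -- and arrests only happen where police are sent.
import random

random.seed(0)
neighborhoods = ["A", "B"]          # A: heavily policed, B: barely policed
patrol_rate = {"A": 0.9, "B": 0.1}

arrest_log = []
for _ in range(10_000):
    place = random.choice(neighborhoods)       # crime is equally likely in both
    if random.random() < patrol_rate[place]:   # but only recorded if patrolled
        arrest_log.append(place)

# "Predict where the next crime will occur" by training on the arrest log:
prediction = max(set(arrest_log), key=arrest_log.count)
accuracy_on_arrest_data = arrest_log.count(prediction) / len(arrest_log)

print(prediction)               # "A" -- the over-policed neighborhood
print(accuracy_on_arrest_data)  # high "accuracy", measured against biased data
```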
Now, reality isn't that drastic, but we do have severe segregations in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do predict, in fact, the individual criminality, the criminality of individuals. The news organization ProPublica recently looked into one of those "recidivism risk" algorithms, as they're called, being used in Florida during sentencing by judges.
Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10. 10 out of 10, high risk. 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn't. This matters, because the higher your score, the more likely you are to be given a longer sentence. What's going on?
Data laundering. It's a process by which technologists hide ugly truths inside black box algorithms and call them objective; call them meritocratic. When they're secret, important and destructive, I've coined a term for these algorithms: "weapons of math destruction." (Laughter) (Applause)
They're everywhere, and it's not a mistake. These are private companies building private algorithms for private ends. Even the ones I talked about for teachers and the public police, those were built by private companies and sold to the government institutions. They call it their "secret sauce" -- that's why they can't tell us about it. It's also private power. They are profiting from wielding the authority of the inscrutable.

Now you might think, since all this stuff is private and there's competition, maybe the free market will solve this problem. It won't. There's a lot of money to be made in unfairness.
Also, we're not economic rational agents. We all are biased. We're all racist and bigoted in ways that we wish we weren't, in ways that we don't even know. We know this, though, in aggregate, because sociologists have consistently demonstrated this with these experiments they build, where they send out a bunch of applications for jobs, equally qualified but some have white-sounding names and some have black-sounding names, and it's always disappointing, the results -- always.

So we are the ones that are biased, and we are injecting those biases into the algorithms by choosing what data to collect, like I chose not to think about ramen noodles -- I decided it was irrelevant. But by trusting the data that's actually picking up on past practices and by choosing the definition of success, how can we expect the algorithms to emerge unscathed?
We can't. We have to check them. We have to check them for fairness. The good news is, we can check them for fairness. Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them. We can make them better. I call this an algorithmic audit, and I'll walk you through it.
First, data integrity check. For the recidivism risk algorithm I talked about, a data integrity check would mean we'd have to come to terms with the fact that in the US, whites and blacks smoke pot at the same rate but blacks are far more likely to be arrested -- four or five times more likely, depending on the area. What is that bias looking like in other crime categories, and how do we account for it?
Second, we should think about the definition of success, audit that. Remember -- with the hiring algorithm? We talked about it. Someone who stays for four years and is promoted once? Well, that is a successful employee, but it's also an employee that is supported by their culture. That said, it can also be quite biased. We need to separate those two things. We should look to the blind orchestra audition as an example. That's where the people auditioning are behind a sheet. What I want to think about there is that the people who are listening have decided what's important and they've decided what's not important, and they're not getting distracted by that. When the blind orchestra auditions started, the number of women in orchestras went up by a factor of five.
Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?
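As a sketch of what this part of an audit could look like in code, with hypothetical file and column names: instead of reporting one overall accuracy, break the errors out by group and ask who pays for each kind of mistake.

```python
# Sketch of the "for whom does this model fail?" step of an algorithmic audit.
# "predictions.csv" and its columns (group, label, prediction) are hypothetical:
# label is what actually happened, prediction is what the model said.
import pandas as pd

df = pd.read_csv("predictions.csv")

for group, rows in df.groupby("group"):
    negatives = rows[rows["label"] == 0]
    positives = rows[rows["label"] == 1]
    # False positive rate: wrongly flagged. False negative rate: wrongly missed.
    fpr = (negatives["prediction"] == 1).mean() if len(negatives) else float("nan")
    fnr = (positives["prediction"] == 0).mean() if len(positives) else float("nan")
    accuracy = (rows["prediction"] == rows["label"]).mean()
    print(f"{group}: accuracy={accuracy:.2f}, false_pos={fpr:.2f}, false_neg={fnr:.2f}")

# A single overall accuracy number can hide very different error rates --
# and very different costs -- for different groups.
```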
And finally, we have to consider the long-term effects of algorithms, the feedback loops that they engender. That sounds abstract, but imagine if Facebook engineers had considered that before they decided to show us only things that our friends had posted.
I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of ethical discussions that happen in larger society. (Applause) And the rest of you, the non-data scientists: this is not a math test. This is a political fight. We need to demand accountability for our algorithmic overlords. (Applause)

The era of blind faith in big data must end. Thank you very much. (Applause)
Title: The era of blind faith in big data must end
Speaker: Cathy O'Neil
Description: Algorithms decide who gets a loan, who gets a job interview, who gets insurance and much more -- but they don't automatically make things fair. Mathematician and data scientist Cathy O'Neil coined a term for algorithms that are secret, important and harmful: "weapons of math destruction." Learn more about the hidden agendas behind the formulas.
Video Language: English
Team: closed TED
Project: TEDTalks
Duration: 13:18