-
Algorithms are everywhere.
-
They sort and separate
the winners from the losers.
-
The winners get the job
-
or a good credit card offer.
-
The losers don't even get an interview,
-
or they pay more for insurance.
-
We're being scored with secret formulas
that we don't understand
-
that often don't have systems of appeal.
-
That raises the question,
-
what if the algorithms are wrong?
-
To build an algorithm you need two things.
-
You need data, what happened in the past,
-
and a definition of success,
-
the thing you're looking for
and often hoping for.
-
You train an algorithm
-
by looking into the past.
-
The algorithm figures out
what is associated with success.
-
What situation leads to success?
-
Actually, everyone uses algorithms.
-
They just don't formalize them
in written code.
-
Let me give you an example.
-
I use an algorithm every day
to make a meal for my family.
-
The data I use
-
is the ingredients in my kitchen,
-
the time I have, the ambition I have,
-
and I curate that data.
-
I don't count those little
packages of ramen noodles as food.
-
My definition of success is,
-
a meal is successful
if my kids eat vegetables.
-
It would be very different
if my youngest son were in charge.
-
He'd say success is
if he gets to eat lots of Nutella.
-
But I get to choose success.
-
I am in charge. My opinion matters.
-
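In code, that meal algorithm might look like the following minimal sketch (the records, ingredients, and both success definitions are illustrative assumptions, not anything from the talk):

```python
# A minimal sketch: past data plus a chosen definition of success.
# All records here are invented illustrations.

# 1) Data: what happened in the past (curated -- no ramen packets).
past_meals = [
    {"ingredients": {"pasta", "broccoli"}, "kids_ate_vegetables": True},
    {"ingredients": {"rice", "chicken"}, "kids_ate_vegetables": False},
    {"ingredients": {"lentils", "spinach"}, "kids_ate_vegetables": True},
]

# 2) A definition of success -- an opinion, chosen by whoever is in charge.
def success_parent(meal):
    return meal["kids_ate_vegetables"]

def success_youngest_son(meal):
    return "nutella" in meal["ingredients"]

# "Training" is noticing which past situations counted as successes.
wins = sum(success_parent(m) for m in past_meals)
print(f"{wins}/{len(past_meals)} past meals succeeded, by the parent's definition")
```

Swapping in success_youngest_son over the very same data changes every verdict: the chosen definition of success, not the math, decides what gets optimized.
-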
That's the first rule of algorithms.
-
Algorithms are opinions embedded in code.
-
That's really different from what
most people think of algorithms.
-
They think algorithms
are objective and true and scientific.
-
That's a marketing trick.
-
It's also a marketing trick
-
to intimidate you with algorithms,
-
to make you trust and fear algorithms
-
because you trust and fear mathematics.
-
A lot can go wrong
-
when we put blind faith in big data.
-
This is Kiri Soares.
She's a high school principal in Brooklyn.
-
In 2011, she told me her teachers
were being scored
-
with a complex, secret algorithm
-
called the Value Added Model.
-
I told her, "Well, figure out
what the formula is.
-
Show it to me.
I'm going to explain it to you."
-
She said, "Well, I tried
to get the formula
-
but my Department
of Education contact
-
told me it was math
and I wouldn't understand it."
-
It gets worse.
-
The New York Post
-
filed a Freedom
of Information Act request,
-
got all the teachers' names
and all their scores,
-
and they published them
as an act of teacher shaming.
-
When I tried to get the formulas,
the source code, through the same means,
-
I was told I couldn't.
-
I was denied.
-
I later found out
-
that nobody in New York City
had access to that formula.
-
No one understood it.
-
Then someone really smart
got involved, Gary Rubinstein.
-
He found 665 teachers
from that New York Post data
-
that actually had two scores.
-
That could happen if they
were teaching seventh grade math
-
and eighth grade math.
-
He decided to plot them.
-
Each dot represents a teacher.
-
(Laughter)
-
What is that?
-
That should never have been used
for individual assessment.
-
It's almost a random number generator.
-
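That check can be sketched as follows, with simulated scores standing in for the actual New York Post data (the numbers below are made up; only the shape of the check comes from the talk):

```python
# A sketch of the sanity check described above: a teacher's two scores
# should roughly agree. These scores are simulated, not the real data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 665  # teachers with two scores, per the talk

# Two scores that share almost no signal, as the real plot suggested.
score_7th = rng.uniform(0, 100, n)
score_8th = rng.uniform(0, 100, n)

r = np.corrcoef(score_7th, score_8th)[0, 1]
print(f"correlation between a teacher's two scores: {r:.2f}")  # near zero

plt.scatter(score_7th, score_8th, s=8)
plt.xlabel("7th grade math score")
plt.ylabel("8th grade math score")
plt.title("Same teacher, two scores (simulated)")
plt.show()
```

A score fit for individual assessment would hug the diagonal; a shapeless blob is noise.
-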
(Applause)
-
But it was. This is Sarah Wysocki.
-
She got fired, along
with 205 other teachers,
-
from the Washington, DC school district
-
even though she had great
recommendations from her principal
-
and the parents of her kids.
-
I know what a lot
of you guys are thinking,
-
especially the data scientists,
the AI experts here.
-
You're thinking, "Well, I would
never make an algorithm
-
that inconsistent."
-
But algorithms can go wrong,
even have deeply destructive effects,
-
with good intentions.
-
And whereas an airplane
that's designed badly
-
crashes to the earth and everyone sees it,
-
an algorithm designed badly
-
can go on for a long time
-
silently wreaking havoc.
-
This is Roger Ailes.
-
He founded Fox News in 1996.
-
More than 20 women complained
about sexual harassment.
-
They said they weren't allowed
to succeed at Fox News.
-
He was ousted last year,
but we've seen recently
-
that the problems have persisted.
-
That raises the question,
-
what should Fox News do
to turn over a new leaf?
-
Well, what if they replaced
their hiring process
-
with a machine learning algorithm?
-
That sounds good, right?
-
Think about it.
-
The data, what would the data be?
-
A reasonable choice would be
-
the last 21 years
of applications to Fox News.
-
Reasonable.
-
What about the definition of success?
-
Reasonable choice would be,
-
well, who is successful at Fox News?
-
I guess someone who, say,
stayed there for four years
-
and was promoted at least once.
-
Sounds reasonable.
-
And then the algorithm would be trained.
-
It would be trained on those applications
-
to learn what led to success,
-
what kind of applications
-
historically led to success
by that definition.
-
Now think about what would happen
if we applied that
-
to the current pool of applicants.
-
It would filter out women,
-
because they do not look like people
who were successful in the past.
-
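That thought experiment can be written down in a few lines. This is a bare-bones sketch with invented counts standing in for the 21 years of applications (nothing below is real hiring data):

```python
# Invented historical records: (gender, stayed_4_years_and_was_promoted).
# The imbalance is an assumption made to mirror the talk's premise.
history = [("m", True)] * 70 + [("m", False)] * 30 \
        + [("f", True)] * 5 + [("f", False)] * 45

def past_success_rate(group):
    outcomes = [s for g, s in history if g == group]
    return sum(outcomes) / len(outcomes)

# "Training": learn what was associated with success in the past.
scores = {g: past_success_rate(g) for g in ("m", "f")}
print(scores)  # {'m': 0.7, 'f': 0.1} -- the status quo, learned faithfully

# Applying it to new applicants just repeats the old pattern.
for gender in ("m", "f"):
    verdict = "interview" if scores[gender] > 0.5 else "filter out"
    print(gender, "->", verdict)
```

The model is perfectly "accurate" about the past, and that is exactly the problem.
-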
Algorithms don't make things fair
-
if you just blithely,
blindly apply algorithms.
-
They don't make things fair.
-
They repeat our past practices,
-
our patterns.
-
They automate the status quo.
-
That would be great if we had
a perfect world, but we don't,
-
and I'll add that most companies
don't have embarrassing lawsuits,
-
but the data scientists in those companies
-
are told to follow the data,
-
to focus on accuracy.
-
Think about what that means.
-
Because we all have bias, it means
-
they could be codifying sexism
-
or any other kind of bigotry.
-
Thought experiment,
-
because I like them.
-
An entirely segregated society,
-
racially segregated, all towns,
all neighborhoods,
-
and where we send the police
only to the minority neighborhoods
-
to look for crime.
-
The arrest data would be very biased.
-
What if on top of that
-
we found the data scientists
and paid the data scientists
-
to predict where
the next crime would occur?
-
Minority neighborhood.
-
Or to predict who the next
criminal would be?
-
A minority.
-
The data scientists would brag
about how great and how accurate
-
their model would be,
-
and they'd be right.
-
Now, reality isn't that drastic,
but we do have severe segregations
-
in many cities and towns
-
and we have plenty of evidence
-
of biased policing
and justice system data.
-
And we actually do predict hotspots,
-
places where crimes will occur,
-
and we do predict, in fact,
-
the individual criminality,
-
the criminality of individuals.
-
The news organization ProPublica
-
recently looked into one of those
recidivism risk algorithms,
-
as they're called,
-
being used in Florida during sentencing
-
by judges.
-
Bernard on the left, the black man,
-
was scored a 10 out of 10,
-
Dylan on the right three out of 10.
-
10 out of 10, high risk.
Three out of 10, low risk.
-
They were both brought in
for drug possession.
-
They both had records,
-
but Dylan had a felony
-
and Bernard didn't.
-
This matters, because
the higher your score,
-
the more likely you are
to be given a longer sentence.
-
What's going on?
-
Data laundering.
-
It's a process by which technologists
-
hide ugly truths inside
black box algorithms
-
and call them objective,
-
call them meritocratic.
-
When they're secret,
important, and destructive,
-
I've coined a term for these algorithms:
-
weapons of math destruction.
-
(Applause)
-
They're everywhere,
and it's not a mistake.
-
These are private companies
-
building private algorithms
for private ends.
-
Even the ones I talked about
for teachers and public policing,
-
those were built by private companies
and sold to the government institutions.
-
They call it their secret sauce.
-
That's why they can't tell us about it.
-
It's also private power.
-
They are profiting from wielding
the authority of the inscrutable.
-
Now you might think,
since all this stuff is private
-
and there's competition,
-
maybe the free market
will solve this problem.
-
It won't.
-
There's a lot of money
to be made in unfairness.
-
Also, we're not rational economic agents.
-
We all are biased.
-
We're all racist and bigoted
in ways that we wish we weren't,
-
in ways that we don't even know.
-
We know this though
-
in aggregate
-
because sociologists have
consistently demonstrated this
-
with these experiments they build
-
where they send out a bunch
of job applications,
-
equally qualified but some
have white-sounding names
-
and some have black-sounding names,
-
and the results are always
disappointing, always.
-
So we are the ones that are biased,
-
and we are injecting those biases
-
into the algorithms by choosing
what data to collect,
-
like I chose not to think
about ramen noodles --
-
I decided it was irrelevant --
-
but also by trusting data
that's actually
-
picking up on past practices
-
and by choosing the definition of success.
-
How can we expect the algorithms
to emerge unscathed?
-
We can't. We have to check them.
-
We have to check them for fairness.
-
The good news is, we can
check them for fairness.
-
Algorithms can be interrogated,
-
and they will tell us
the truth every time.
-
And we can fix them.
We can make them better.
-
I call this an algorithmic audit,
-
and I'll walk you through it.
-
First, data integrity check.
-
For the recidivism risk
algorithm I talked about,
-
a data integrity check would mean
we have to come to terms with the fact
-
that in the US, whites and blacks
smoke pot at the same rate
-
but blacks are far more likely
to be arrested,
-
four or five times more likely
-
depending on the area.
-
What is that bias looking like
in other crime categories,
-
and how do we account for it?
-
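As a back-of-the-envelope version of that integrity check, with round illustrative numbers (the rates are assumptions for the sake of arithmetic, not measurements):

```python
# Equal underlying behavior, unequal arrest rates -- illustrative numbers.
usage_rate = 0.10                  # pot use, roughly equal across groups
baseline_arrest_given_use = 0.01   # assumed arrest chance for white users
arrest_multiplier = 4.0            # black users arrested ~4x as often

p_arrest_white = usage_rate * baseline_arrest_given_use
p_arrest_black = p_arrest_white * arrest_multiplier

# A model trained on arrest records "sees" a 4x difference in crime
# where the underlying behavior is the same.
print(f"white: {p_arrest_white:.4f}   black: {p_arrest_black:.4f}")
print(f"apparent ratio in the arrest data: {p_arrest_black / p_arrest_white:.0f}x")
```

-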
Second, we should think about
the definition of success,
-
audit that.
-
Remember, with the hiring algorithm,
-
we talked about it, someone
who stays for four years
-
and is promoted once?
-
Well, that is a successful employee,
but it's also an employee
-
that is supported by their culture.
-
That also can be quite biased.
We need to separate those two things.
-
We should look to
the blind orchestra audition
-
as an example.
-
That's where the people auditioning
are behind a sheet.
-
What I want to think about there
-
is that the people who are listening
have decided what's important
-
and they've decided what's not important,
-
and they're not getting
distracted by that.
-
When the blind orchestra
auditions started,
-
the number of women in orchestras
went up by a factor of five.
-
Next, we have to consider accuracy.
-
This is where the Value Added Model
for teachers would fail immediately.
-
No algorithm is perfect, of course,
-
so we have to consider
the errors of every algorithm.
-
How often are there errors,
and for whom does this model fail?
-
What is the cost of that failure?
-
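One concrete form of those questions is to split the error rate by group rather than reporting a single number. A sketch over invented records (the groups, predictions, and outcomes are all hypothetical):

```python
# For whom does the model fail? Compute false positive rates per group.
from collections import defaultdict

# Invented records: (group, predicted_high_risk, actually_reoffended)
records = [
    ("A", True, False), ("A", True, True), ("A", True, False),
    ("A", False, False), ("B", True, True), ("B", False, False),
    ("B", False, True), ("B", False, False),
]

false_pos = defaultdict(int)  # flagged high risk but did not reoffend
negatives = defaultdict(int)  # everyone who did not reoffend

for group, predicted, actual in records:
    if not actual:
        negatives[group] += 1
        if predicted:
            false_pos[group] += 1

for group in sorted(negatives):
    rate = false_pos[group] / negatives[group]
    print(f"group {group}: false positive rate = {rate:.2f}")
```

An overall accuracy figure can look respectable while the false positives pile up on one group, which is the pattern ProPublica reported.
-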
And finally, we have to consider
-
the long-term effects of algorithms,
-
the feedback loops that are engendered.
-
That sounds abstract, but imagine
if Facebook engineers had considered that
-
before they decided to show us
only things that our friends had posted.
-
I have two more messages,
one for the data scientists out there.
-
Data scientists, we should
not be the arbiters of truth.
-
We should be translators
of ethical discussions that happen
-
in larger society.
-
(Applause)
-
And the rest of you,
-
the non-data scientists,
this is not a math test.
-
This is a political fight.
-
We need to demand accountability
for our algorithmic overlords.
-
(Applause)
-
The era of blind faith
in big data must end.
-
Thank you very much.
-
(Applause)