Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don't even get an interview, or they pay more for insurance. We're being scored with secret formulas that we don't understand, that often don't have systems of appeal. That begs the question: What if the algorithms are wrong?

To build an algorithm you need two things: you need data, what happened in the past, and a definition of success, the thing you're looking for and often hoping for. You train an algorithm by looking at the past and figuring things out. The algorithm figures out what is associated with success. What situation leads to success?

Actually, everyone uses algorithms. They just don't formalize them in written code. Let me give you an example. I use an algorithm every day to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don't count those little packages of ramen noodles as food.

(Laughter)

My definition of success is: a meal is successful if my kids eat vegetables. It's very different from what my youngest son would say if he were in charge. He'd say success is if he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That's the first rule of algorithms: algorithms are opinions embedded in code.

That's really different from what most people think of algorithms. They think algorithms are objective and true and scientific. That's a marketing trick. It's also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.

This is Kiri Soares. She's a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the "value-added model." I told her, "Well, figure out what the formula is, show it to me. I'm going to explain it to you." She said, "Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn't understand it."

It gets worse.
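As a minimal sketch of the idea above -- that an algorithm is curated data plus a chosen definition of success -- here is a toy, hypothetical Python version of the meal-planning example. The meals, the ingredients, and the `successful` function are all invented for illustration, not anyone's actual code.

```python
# A toy restatement of "an algorithm is data plus a definition of success."
# Everything here is hypothetical and only mirrors the meal-planning example.
from collections import Counter

# The data: what happened in the past, already curated (instant ramen excluded).
past_meals = [
    {"ingredients": {"broccoli", "rice", "chicken"}, "kids_ate_vegetables": True},
    {"ingredients": {"pasta", "butter"}, "kids_ate_vegetables": False},
    {"ingredients": {"carrots", "hummus", "pita"}, "kids_ate_vegetables": True},
]

# The definition of success: a choice made by whoever is in charge.
def successful(meal):
    return meal["kids_ate_vegetables"]

# "Training": figure out which ingredients are associated with success.
counts = Counter()
for meal in past_meals:
    if successful(meal):
        counts.update(meal["ingredients"])

print(counts.most_common(3))  # ingredients that co-occur with past success
```

Whoever writes `successful` is the one whose opinion gets embedded in the code.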
The New York Post filed a Freedom of Information Act request, got all the teachers' names and all their scores, and they published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn't. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it.

Then someone really smart got involved, Gary Rubinstein. He found 665 teachers from that New York Post data who actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher.

(Laughter)

What is that?

(Laughter)

That should never have been used for individual assessment. It's almost a random number generator.

(Applause)

But it was. This is Sarah Wysocki. She got fired, along with 205 other teachers, from the Washington, DC school district, even though she had great recommendations from her principal and the parents of her kids.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You're thinking, "Well, I would never make an algorithm that inconsistent." But algorithms can go wrong, even have deeply destructive effects, with good intentions. And whereas an airplane that's designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.

This is Roger Ailes.

(Laughter)

He founded Fox News in 1996. More than 20 women complained about sexual harassment. They said they weren't allowed to succeed at Fox News. He was ousted last year, but we've seen recently that the problems have persisted. That begs the question: What should Fox News do to turn over another leaf?

Well, what if they replaced their hiring process with a machine-learning algorithm? That sounds good, right? Think about it. The data: what would the data be? A reasonable choice would be the last 21 years of applications to Fox News. Reasonable. What about the definition of success? A reasonable choice would be, well, who is successful at Fox News?
I guess someone who, say, stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to look at people, to learn what led to success, what kind of applications historically led to success by that definition.

Now think about what would happen if we applied that to a current pool of applicants. It would filter out women, because they do not look like people who were successful in the past.

Algorithms don't make things fair if you just blithely, blindly apply algorithms. They don't make things fair. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don't. And I'll add that most companies don't have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.

Thought experiment, because I like them: an entirely segregated society, racially segregated, all towns, all neighborhoods, and where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we found the data scientists and paid the data scientists to predict where the next crime would occur? Minority neighborhood. Or to predict who the next criminal would be? A minority. The data scientists would brag about how great and how accurate their model would be, and they'd be right.

Now, reality isn't that drastic, but we do have severe segregation in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do, in fact, predict individual criminality, the criminality of individuals.

The news organization ProPublica recently looked into one of those "recidivism risk" algorithms, as they're called, being used in Florida during sentencing by judges. Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10.
10 out of 10, high risk. 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn't. This matters, because the higher your score, the more likely you are to be given a longer sentence.

What's going on? Data laundering. It's a process by which technologists hide ugly truths inside black box algorithms and call them objective, call them meritocratic. When they're secret, important and destructive, I've coined a term for these algorithms: "weapons of math destruction."

(Laughter)

(Applause)

They're everywhere, and it's not a mistake. These are private companies building private algorithms for private ends. Even the ones I talked about, for teachers and the public police, those were built by private companies and sold to government institutions. They call it their "secret sauce" -- that's why they can't tell us about it. It's also private power. They are profiting from wielding the authority of the inscrutable.

Now you might think, since all this stuff is private and there's competition, maybe the free market will solve this problem. It won't. There's a lot of money to be made in unfairness.

Also, we're not rational economic agents. We are all biased. We're all racist and bigoted in ways that we wish we weren't, in ways that we don't even know. We know this, though, in aggregate, because sociologists have consistently demonstrated it with the experiments they build, where they send out a bunch of job applications, equally qualified, but some have white-sounding names and some have black-sounding names, and it's always disappointing, the results -- always.

So we are the ones that are biased, and we are injecting those biases into the algorithms by choosing what data to collect, like I chose not to think about ramen noodles -- I decided it was irrelevant. But by trusting data that's actually picking up on past practices, and by choosing the definition of success, how can we expect the algorithms to emerge unscathed? We can't. We have to check them. We have to check them for fairness. The good news is, we can check them for fairness.
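As a minimal sketch of the hiring thought experiment above, and of what checking can surface, here is a toy model in Python trained on an entirely invented history in which one group was rarely counted as successful; the learned scores then filter that group out of a new applicant pool. The numbers, the group labels, and the half-sized shortlist are all assumptions made only for illustration.

```python
# A deliberately tiny sketch of the hiring thought experiment.
# All data here is synthetic and hypothetical; the only point is that a
# model trained on a biased history of "success" reproduces that history.
import random

random.seed(0)

# Historical applicants: in this invented history, women were rarely
# promoted, so they rarely meet the chosen definition of success
# ("stayed four years and was promoted at least once").
history = []
for _ in range(1000):
    gender = random.choice(["woman", "man"])
    succeeded = random.random() < (0.05 if gender == "woman" else 0.30)
    history.append({"gender": gender, "successful": succeeded})

# "Training": the success rate per group is what the past associates with success.
def success_rate(group):
    rows = [r for r in history if r["gender"] == group]
    return sum(r["successful"] for r in rows) / len(rows)

scores = {g: success_rate(g) for g in ("woman", "man")}
print(scores)

# "Deployment": rank new applicants by the learned score and keep the top half.
# Women get filtered out -- not because of anything about them today, but
# because they do not look like the people who succeeded in the past.
applicants = [{"gender": random.choice(["woman", "man"])} for _ in range(10)]
shortlist = sorted(applicants, key=lambda a: scores[a["gender"]], reverse=True)[:5]
print([a["gender"] for a in shortlist])
```

Nothing in the sketch looks at today's applicants except the biased history they are compared against, which is why the output simply automates the status quo.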
Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them. We can make them better. I call this an algorithmic audit, and I'll walk you through it.

First, a data integrity check. For the recidivism risk algorithm I talked about, a data integrity check would mean we'd have to come to terms with the fact that in the US, whites and blacks smoke pot at the same rate but blacks are far more likely to be arrested -- four or five times more likely, depending on the area. What does that bias look like in other crime categories, and how do we account for it?

Second, we should think about the definition of success, audit that. Remember the hiring algorithm we talked about? Someone who stays for four years and is promoted once? Well, that is a successful employee, but it's also an employee who is supported by their culture. That said, it can also be quite biased. We need to separate those two things. We should look to the blind orchestra audition as an example. That's where the people auditioning are behind a sheet. What I want to think about there is that the people who are listening have decided what's important and they've decided what's not important, and they're not getting distracted by that. When the blind orchestra auditions started, the number of women in orchestras went up by a factor of five.

Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?

And finally, we have to consider the long-term effects of algorithms, the feedback loops they're engendering. That sounds abstract, but imagine if Facebook engineers had considered that before they decided to show us only things that our friends had posted.

I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of ethical discussions that happen in larger society.

(Applause)

And the rest of you, the non-data scientists: this is not a math test. This is a political fight.
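A minimal sketch of the accuracy step of the audit above -- "for whom does this model fail?" -- written in Python over a handful of invented records: it computes a false positive rate per group. The group labels, predictions, and outcomes are all hypothetical; only the shape of the check matters.

```python
# A minimal sketch of the audit's accuracy step: "for whom does this model fail?"
# Every record below is invented; only the shape of the check matters.
from collections import defaultdict

# Each record: the model's prediction, what actually happened, and the group.
records = [
    {"group": "A", "predicted_high_risk": True,  "reoffended": False},
    {"group": "A", "predicted_high_risk": True,  "reoffended": True},
    {"group": "A", "predicted_high_risk": False, "reoffended": False},
    {"group": "B", "predicted_high_risk": False, "reoffended": True},
    {"group": "B", "predicted_high_risk": False, "reoffended": False},
    {"group": "B", "predicted_high_risk": True,  "reoffended": True},
]

# False positive rate per group: flagged high risk but did not reoffend,
# out of everyone in the group who did not reoffend.
def false_positive_rate(rows):
    negatives = [r for r in rows if not r["reoffended"]]
    flagged = [r for r in negatives if r["predicted_high_risk"]]
    return len(flagged) / len(negatives) if negatives else float("nan")

by_group = defaultdict(list)
for r in records:
    by_group[r["group"]].append(r)

for group, rows in sorted(by_group.items()):
    print(group, round(false_positive_rate(rows), 2))
```

The same shape of check works for false negative rates, or for any error whose cost falls unevenly across groups.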
We need to demand accountability for our algorithmic overlords.

(Applause)

The era of blind faith in big data must end. Thank you very much.

(Applause)