Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don't even get an interview, or they pay more for insurance. We're being scored with secret formulas that we don't understand, that often don't have systems of appeal. That begs the question: What if the algorithms are wrong?

To build an algorithm you need two things: you need data, what happened in the past, and a definition of success, the thing you're looking for and often hoping for. You train an algorithm by looking at the past and figuring things out. The algorithm figures out what is associated with success. What situation leads to success?

Actually, everyone uses algorithms. They just don't formalize them in written code. Let me give you an example. I use an algorithm every day to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don't count those little packages of ramen noodles as food.

(Laughter)

My definition of success is: a meal is successful if my kids eat vegetables. It's very different from what my youngest son would say if he were in charge. He'd say success is if he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That's the first rule of algorithms: algorithms are opinions embedded in code.

That's really different from what most people think of algorithms. They think algorithms are objective and true and scientific. That's a marketing trick. It's also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.

This is Kiri Soares. She's a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the "value-added model." I told her, "Well, figure out what the formula is, show it to me. I'm going to explain it to you." She said, "Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn't understand it."

It gets worse.
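As a minimal sketch of the idea above -- that an algorithm is curated data plus a chosen definition of success -- here is a toy, hypothetical Python version of the meal-planning example. The meals, the ingredients, and the `successful` function are all invented for illustration, not anyone's actual code.

```python
# A toy restatement of "an algorithm is data plus a definition of success."
# Everything here is hypothetical and only mirrors the meal-planning example.
from collections import Counter

# The data: what happened in the past, already curated (instant ramen excluded).
past_meals = [
    {"ingredients": {"broccoli", "rice", "chicken"}, "kids_ate_vegetables": True},
    {"ingredients": {"pasta", "butter"}, "kids_ate_vegetables": False},
    {"ingredients": {"carrots", "hummus", "pita"}, "kids_ate_vegetables": True},
]

# The definition of success: a choice made by whoever is in charge.
def successful(meal):
    return meal["kids_ate_vegetables"]

# "Training": figure out which ingredients are associated with success.
counts = Counter()
for meal in past_meals:
    if successful(meal):
        counts.update(meal["ingredients"])

print(counts.most_common(3))  # ingredients that co-occur with past success
```

Whoever writes `successful` is the one whose opinion gets embedded in the code.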
The New York Post filed a Freedom of Information Act request, got all the teachers' names and all their scores, and they published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn't. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it.

Then someone really smart got involved, Gary Rubinstein. He found 665 teachers from that New York Post data who actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher.

(Laughter)

What is that?

(Laughter)

That should never have been used for individual assessment. It's almost a random number generator.

(Applause)

But it was. This is Sarah Wysocki. She got fired, along with 205 other teachers, from the Washington, DC school district, even though she had great recommendations from her principal and the parents of her kids.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You're thinking, "Well, I would never make an algorithm that inconsistent." But algorithms can go wrong, even have deeply destructive effects, with good intentions. And whereas an airplane that's designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.

This is Roger Ailes.

(Laughter)

He founded Fox News in 1996. More than 20 women complained about sexual harassment. They said they weren't allowed to succeed at Fox News. He was ousted last year, but we've seen recently that the problems have persisted. That begs the question: What should Fox News do to turn over another leaf?

Well, what if they replaced their hiring process with a machine-learning algorithm? That sounds good, right? Think about it. The data: what would the data be? A reasonable choice would be the last 21 years of applications to Fox News. Reasonable. What about the definition of success? A reasonable choice would be, well, who is successful at Fox News?
I guess someone who, say, stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to look at people, to learn what led to success, what kind of applications historically led to success by that definition.

Now think about what would happen if we applied that to a current pool of applicants. It would filter out women, because they do not look like people who were successful in the past.

Algorithms don't make things fair if you just blithely, blindly apply algorithms. They don't make things fair. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don't. And I'll add that most companies don't have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.

Thought experiment, because I like them: an entirely segregated society, racially segregated, all towns, all neighborhoods, and where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we found the data scientists and paid the data scientists to predict where the next crime would occur? Minority neighborhood. Or to predict who the next criminal would be? A minority. The data scientists would brag about how great and how accurate their model would be, and they'd be right.

Now, reality isn't that drastic, but we do have severe segregation in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do, in fact, predict individual criminality, the criminality of individuals.

The news organization ProPublica recently looked into one of those "recidivism risk" algorithms, as they're called, being used in Florida during sentencing by judges. Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10.
10 out of 10, high risk. 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn't. This matters, because the higher your score, the more likely you are to be given a longer sentence.

What's going on? Data laundering. It's a process by which technologists hide ugly truths inside black box algorithms and call them objective, call them meritocratic. When they're secret, important and destructive, I've coined a term for these algorithms: "weapons of math destruction."

(Laughter)

(Applause)

They're everywhere, and it's not a mistake. These are private companies building private algorithms for private ends. Even the ones I talked about, for teachers and the public police, those were built by private companies and sold to government institutions. They call it their "secret sauce" -- that's why they can't tell us about it. It's also private power. They are profiting from wielding the authority of the inscrutable.

Now you might think, since all this stuff is private and there's competition, maybe the free market will solve this problem. It won't. There's a lot of money to be made in unfairness.

Also, we're not rational economic agents. We are all biased. We're all racist and bigoted in ways that we wish we weren't, in ways that we don't even know. We know this, though, in aggregate, because sociologists have consistently demonstrated it with the experiments they build, where they send out a bunch of job applications, equally qualified, but some have white-sounding names and some have black-sounding names, and it's always disappointing, the results -- always.

So we are the ones that are biased, and we are injecting those biases into the algorithms by choosing what data to collect, like I chose not to think about ramen noodles -- I decided it was irrelevant. But by trusting data that's actually picking up on past practices, and by choosing the definition of success, how can we expect the algorithms to emerge unscathed? We can't. We have to check them. We have to check them for fairness. The good news is, we can check them for fairness.
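As a minimal sketch of the hiring thought experiment above, and of what checking can surface, here is a toy model in Python trained on an entirely invented history in which one group was rarely counted as successful; the learned scores then filter that group out of a new applicant pool. The numbers, the group labels, and the half-sized shortlist are all assumptions made only for illustration.

```python
# A deliberately tiny sketch of the hiring thought experiment.
# All data here is synthetic and hypothetical; the only point is that a
# model trained on a biased history of "success" reproduces that history.
import random

random.seed(0)

# Historical applicants: in this invented history, women were rarely
# promoted, so they rarely meet the chosen definition of success
# ("stayed four years and was promoted at least once").
history = []
for _ in range(1000):
    gender = random.choice(["woman", "man"])
    succeeded = random.random() < (0.05 if gender == "woman" else 0.30)
    history.append({"gender": gender, "successful": succeeded})

# "Training": the success rate per group is what the past associates with success.
def success_rate(group):
    rows = [r for r in history if r["gender"] == group]
    return sum(r["successful"] for r in rows) / len(rows)

scores = {g: success_rate(g) for g in ("woman", "man")}
print(scores)

# "Deployment": rank new applicants by the learned score and keep the top half.
# Women get filtered out -- not because of anything about them today, but
# because they do not look like the people who succeeded in the past.
applicants = [{"gender": random.choice(["woman", "man"])} for _ in range(10)]
shortlist = sorted(applicants, key=lambda a: scores[a["gender"]], reverse=True)[:5]
print([a["gender"] for a in shortlist])
```

Nothing in the sketch looks at today's applicants except the biased history they are compared against, which is why the output simply automates the status quo.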
Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them. We can make them better. I call this an algorithmic audit, and I'll walk you through it.

First, a data integrity check. For the recidivism risk algorithm I talked about, a data integrity check would mean we'd have to come to terms with the fact that in the US, whites and blacks smoke pot at the same rate but blacks are far more likely to be arrested -- four or five times more likely, depending on the area. What does that bias look like in other crime categories, and how do we account for it?

Second, we should think about the definition of success, audit that. Remember the hiring algorithm we talked about? Someone who stays for four years and is promoted once? Well, that is a successful employee, but it's also an employee who is supported by their culture. That said, it can also be quite biased. We need to separate those two things. We should look to the blind orchestra audition as an example. That's where the people auditioning are behind a sheet. What I want to think about there is that the people who are listening have decided what's important and they've decided what's not important, and they're not getting distracted by that. When the blind orchestra auditions started, the number of women in orchestras went up by a factor of five.

Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?

And finally, we have to consider the long-term effects of algorithms, the feedback loops they're engendering. That sounds abstract, but imagine if Facebook engineers had considered that before they decided to show us only things that our friends had posted.

I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of ethical discussions that happen in larger society.

(Applause)

And the rest of you, the non-data scientists: this is not a math test. This is a political fight.
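A minimal sketch of the accuracy step of the audit above -- "for whom does this model fail?" -- written in Python over a handful of invented records: it computes a false positive rate per group. The group labels, predictions, and outcomes are all hypothetical; only the shape of the check matters.

```python
# A minimal sketch of the audit's accuracy step: "for whom does this model fail?"
# Every record below is invented; only the shape of the check matters.
from collections import defaultdict

# Each record: the model's prediction, what actually happened, and the group.
records = [
    {"group": "A", "predicted_high_risk": True,  "reoffended": False},
    {"group": "A", "predicted_high_risk": True,  "reoffended": True},
    {"group": "A", "predicted_high_risk": False, "reoffended": False},
    {"group": "B", "predicted_high_risk": False, "reoffended": True},
    {"group": "B", "predicted_high_risk": False, "reoffended": False},
    {"group": "B", "predicted_high_risk": True,  "reoffended": True},
]

# False positive rate per group: flagged high risk but did not reoffend,
# out of everyone in the group who did not reoffend.
def false_positive_rate(rows):
    negatives = [r for r in rows if not r["reoffended"]]
    flagged = [r for r in negatives if r["predicted_high_risk"]]
    return len(flagged) / len(negatives) if negatives else float("nan")

by_group = defaultdict(list)
for r in records:
    by_group[r["group"]].append(r)

for group, rows in sorted(by_group.items()):
    print(group, round(false_positive_rate(rows), 2))
```

The same shape of check works for false negative rates, or for any error whose cost falls unevenly across groups.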
We need to demand accountability for our algorithmic overlords.

(Applause)

The era of blind faith in big data must end. Thank you very much.

(Applause)