0:00:10.080,0:00:17.985 applause 0:00:17.985,0:00:22.900 Thank you very much, can you…[br]You can hear me? Yes! 0:00:22.900,0:00:27.620 I’ve been at this now 23 years. We[br]worked, with… My colleagues and I, 0:00:27.620,0:00:31.390 we worked in about 30 countries,[br]we’ve advised 9 Truth Commissions, 0:00:31.390,0:00:36.410 official Truth Commissions, 4 UN missions, 0:00:36.410,0:00:40.150 4 international criminal tribunals.[br]We have testified in 4 different cases 0:00:40.150,0:00:44.240 – 2 internationally, 2 domestically – and[br]we’ve advised dozens and dozens 0:00:44.240,0:00:49.120 of non-governmental Human Rights groups[br]around the world. The point of this stuff 0:00:49.120,0:00:54.180 is to figure out how to bring the[br]knowledge of the people who’ve suffered 0:00:54.180,0:00:58.770 human rights violations to bear,[br]on demanding accountability 0:00:58.770,0:01:04.960 from the perpetrators. Our job is to[br]figure out how we can tell the truth. 0:01:04.960,0:01:09.240 It is one of the moral foundations of the[br]international Human Rights movement 0:01:09.240,0:01:14.220 that we speak Truth to Power. We[br]look in the face of the powerful 0:01:14.220,0:01:19.299 and we tell them what we believe[br]they have done that is wrong. 0:01:19.299,0:01:23.639 If that’s gonna work, we[br]have to speak the truth. 0:01:23.639,0:01:29.470 We have to be right, we[br]have to get the analysis on. 0:01:29.470,0:01:33.979 That’s not always easy and to get there, 0:01:33.979,0:01:37.209 there are sort of 3 themes that[br]I wanna try to touch in this talk. 0:01:37.209,0:01:40.379 Since the talk is pretty short I’m[br]really gonna touch on 2 of them, so 0:01:40.379,0:01:43.619 at the very end of the talk I’ll invite[br]people who’d like to talk more about 0:01:43.619,0:01:49.270 the specifically technical aspects of this[br]work, about classifiers, about clustering, 0:01:49.270,0:01:53.620 about statistical estimation, about[br]database techniques. 
People who wanna talk 0:01:53.620,0:01:56.990 about that I’d love to gather and we’ll[br]try to find a space. I’ve been fighting 0:01:56.990,0:02:00.460 with the Wiki for 2 days; I think[br]I’m probably not the only one. 0:02:00.460,0:02:04.959 We can gather, we can talk about[br]that stuff more in detail. So today, 0:02:04.959,0:02:09.990 in the next 25 minutes I’m[br]going to focus specifically on 0:02:09.990,0:02:14.520 the trial of General[br]José Efraín Ríos Montt 0:02:14.520,0:02:20.200 who ruled Guatemala from[br]March 1982 until August 1983. 0:02:20.200,0:02:25.180 That’s General Ríos, there in[br]the upper corner in the red tie. 0:02:25.180,0:02:30.600 During the government[br]of General Ríos Montt 0:02:30.600,0:02:35.610 tens of thousands of people were killed by[br]the army of Guatemala. And the question 0:02:35.610,0:02:39.610 that has been facing Guatemalans[br]since that time is: 0:02:39.610,0:02:44.080 “Did the pattern of killing[br]that the army committed 0:02:44.080,0:02:49.690 constitute acts of genocide?”. Now[br]genocide is a very specific crime 0:02:49.690,0:02:54.420 in International Law. It does not[br]mean you killed a lot of people. 0:02:54.420,0:02:58.910 There are other war crimes for mass[br]killing. Genocide specifically means 0:02:58.910,0:03:03.930 that you picked out a particular group;[br]and to the exclusion of other groups 0:03:03.930,0:03:08.460 nearby them you focused[br]on eliminating that group. 
0:03:08.460,0:03:14.240 That’s key because for a statistician[br]that gives us a hypothesis we can test 0:03:14.240,0:03:18.860 which is: “What is the relative risk,[br]what is the differential probability 0:03:18.860,0:03:22.820 of people in the target group being[br]killed relative to their neighbours 0:03:22.820,0:03:28.150 who are not in the target group?”[br]So without further ado, 0:03:28.150,0:03:31.970 let’s look at the relative risk of[br]being killed for indigenous people 0:03:31.970,0:03:36.880 in the 3 rural counties of[br]Chajul, Cotzal and Nebaj 0:03:36.880,0:03:41.400 relative to their[br]non-indigenous neighbours. 0:03:41.400,0:03:45.960 We have – and I’ll talk in a moment about[br]how we have this – we have information, 0:03:45.960,0:03:51.490 and evidence, and estimations of the[br]deaths of about 2150 indigenous people. 0:03:51.490,0:03:58.550 People killed by the army in the period[br]of the government of General Ríos. 0:03:58.550,0:04:02.550 The population, the total number of[br]people alive who were indigenous 0:04:02.550,0:04:07.370 in those counties in the census[br]of 1981 is about 39,000. 0:04:07.370,0:04:14.500 So the approximate crude mortality[br]rate due to homicide by the army 0:04:14.500,0:04:18.710 is 5.5% for indigenous people in[br]that period. Now that’s relative 0:04:18.710,0:04:22.890 to the homicide rate for non-indigenous[br]people in the same place 0:04:22.890,0:04:27.200 of approximately 0.7%. So what[br]we ask is: “What is the ratio 0:04:27.200,0:04:30.530 between those 2 numbers?” And[br]the ratio between those 2 numbers 0:04:30.530,0:04:35.600 is the relative risk. It’s approximately[br]8. 
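The arithmetic behind that ratio can be sketched in a few lines of Python. This is only an illustration using the rounded figures quoted in the talk (about 2,150 deaths among roughly 39,000 indigenous people, and a rate of about 0.7% for non-indigenous people); the function and variable names are assumptions made for this sketch, not part of the original analysis.

```python
def crude_rate(deaths: float, population: float) -> float:
    """Crude mortality rate: deaths divided by the population alive at the start."""
    return deaths / population


def relative_risk(rate_target: float, rate_comparison: float) -> float:
    """Ratio of the target group's rate to the comparison group's rate."""
    return rate_target / rate_comparison


# Rounded figures as quoted in the talk.
indigenous_rate = crude_rate(2150, 39000)   # roughly 0.055
non_indigenous_rate = 0.007                 # quoted directly as about 0.7%

rr = relative_risk(indigenous_rate, non_indigenous_rate)

print(f"indigenous rate: {indigenous_rate:.1%}")   # 5.5%
print(f"relative risk:   {rr:.1f}")                # 7.9, i.e. approximately 8
```

With these rounded inputs the ratio comes out just under 8, which matches the "approximately 8" stated in the talk.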
We interpret that as: if you were 0:04:35.600,0:04:41.339 an indigenous person alive in[br]one of those 3 counties in 1982, 0:04:41.339,0:04:46.939 your probability of being killed[br]by the army was 8 times greater 0:04:46.939,0:04:51.069 than that of a person also living[br]in those 3 counties 0:04:51.069,0:04:56.179 who was not indigenous.[br]Eight times, 8 times! 0:04:56.179,0:05:00.250 To put that in relative terms: the[br]probability… the relative risk of being 0:05:00.250,0:05:04.720 a Bosniac relative to being Serb[br]in Bosnia during the war in Bosnia 0:05:04.720,0:05:09.800 was a little less than 3. So your[br]relative risk of being indigenous 0:05:09.800,0:05:13.310 was more than twice, nearly 3 times,[br]as much as your relative risk 0:05:13.310,0:05:19.200 of being Bosniac in the Bosnian War.[br]It’s an astonishing level of focus. 0:05:19.200,0:05:23.809 It shows tremendous planning[br]and coherence, I believe. 0:05:23.809,0:05:29.469 So, again coming back to the statistical[br]conclusion, how do we come to that? 0:05:29.469,0:05:32.849 How do we find that information? How do we[br]make that conclusion? First, we’re only 0:05:32.849,0:05:35.470 looking at homicides committed by the[br]army. We’re not looking at homicides 0:05:35.470,0:05:39.409 committed by other parties, by[br]the guerrillas, by private actors. 0:05:39.409,0:05:44.499 We’re not looking at excess mortality,[br]the mortality that we might find 0:05:44.499,0:05:47.709 in conflict that is in excess of[br]normal peacetime mortality. 0:05:47.709,0:05:51.470 We’re not looking at any of that,[br]only homicide. And the percentage 0:05:51.470,0:05:55.330 relates the number of people killed by the[br]army to the population that was alive. 0:05:55.330,0:05:58.650 That’s crucial here. We’re looking at[br]rates and we’re comparing the rate 0:05:58.650,0:06:02.430 for the indigenous people shown in the[br]blue bar to that for non-indigenous people 0:06:02.430,0:06:06.869 shown in the green bar. 
The width of[br]the bars show the relative populations 0:06:06.869,0:06:11.829 in each of those 2 communities. So clearly[br]there are many more indigenous people, 0:06:11.829,0:06:14.980 but a higher fraction of them are also[br]killed. The bars also show something else. 0:06:14.980,0:06:18.049 And that’s what I’ll focus on for the[br]rest of the talk. There are 2 sections 0:06:18.049,0:06:22.159 to each of the 2 bars, a dark section[br]on the bottom, a lighter section on top. 0:06:22.159,0:06:27.779 And what that indicates is what we know[br]in terms of being able to name people 0:06:27.779,0:06:31.249 with their first and last name, their[br]location and dates of death, and 0:06:31.249,0:06:35.560 what we must infer statistically. Now I’m[br]beginning to touch on the second theme 0:06:35.560,0:06:40.949 of my talk: Which is that when we are[br]studying mass violence and war crimes, 0:06:40.949,0:06:48.749 we cannot do statistical or pattern[br]analysis with raw information. 0:06:48.749,0:06:51.950 We must use the tools of mathematical[br]statistics to understand 0:06:51.950,0:06:56.080 what we don’t know! The information[br]which cannot be observed directly. 0:06:56.080,0:07:00.649 We have to estimate that in order to[br]control for the process of the production 0:07:00.649,0:07:04.989 of information. Information doesn’t just[br]fall out of the sky, the way it does 0:07:04.989,0:07:10.359 for industry. If I’m running an ISP I know[br]every packet that runs through my routers. 0:07:10.359,0:07:14.959 That’s not how the social world works. In[br]order to find information about killings 0:07:14.959,0:07:17.889 we have to hear about that killing from[br]someone, we have to investigate, 0:07:17.889,0:07:22.119 we have to find the human remains.[br]And if we can’t observe the killing 0:07:22.119,0:07:28.130 we won’t hear about it and many killings[br]are hidden. 
In my team we have a kind of 0:07:28.130,0:07:33.760 catch phrase: that the world… if a lawyer[br]is killed in a big city at high noon 0:07:33.760,0:07:38.259 the world knows about it before[br]dinner time. Every single time. 0:07:38.259,0:07:41.850 But when a rural peasant is killed 3 days’[br]walk from a road in the dead of night, 0:07:41.850,0:07:45.489 we’re unlikely to ever hear. And[br]technology is not changing this. 0:07:45.489,0:07:48.899 I’ll talk later about how technology is[br]actually making the problem worse. 0:07:48.899,0:07:53.470 So, let’s get back to Guatemala[br]and just conclude 0:07:53.470,0:07:57.950 that the little vertical bars, little[br]vertical lines at the top of each bar 0:07:57.950,0:08:03.079 indicate the confidence interval. Which is[br]similar to what lay people sometimes call 0:08:03.079,0:08:07.199 a margin of error. It is our level of[br]uncertainty about each of those estimates 0:08:07.199,0:08:10.960 and you’ll notice that the uncertainty[br]is much, much smaller than 0:08:10.960,0:08:14.509 the difference between the 2 bars. The[br]uncertainty does not affect our ability 0:08:14.509,0:08:17.970 to draw the conclusion that there[br]was a spectacular difference 0:08:17.970,0:08:21.900 in the mortality rates between the[br]people who were the hypothesized 0:08:21.900,0:08:26.630 target of genocide and those who were not. 0:08:26.630,0:08:30.520 Now the data: first we[br]had the census of 1981, 0:08:30.520,0:08:35.339 this was a crucial piece. I think there are[br]very interesting questions to ask 0:08:35.339,0:08:39.609 about why the Government of Guatemala[br]conducted a census on the eve of 0:08:39.609,0:08:44.540 committing a genocide. There is excellent[br]work done by historical demographers 0:08:44.540,0:08:47.950 about the use of censuses in mass[br]violence. It has been common 0:08:47.950,0:08:52.880 throughout history. Similarly,[br]or excuse me, in parallel 0:08:52.880,0:08:57.420 there were 4 very large[br]projects. 
First, the CIIDH 0:08:57.420,0:09:01.600 – a group of non-Governmental[br]Human Rights groups – 0:09:01.600,0:09:06.610 collected 1240 records of deaths[br]in this three-county region. 0:09:06.610,0:09:11.750 Next, the Catholic Church collected[br]a bit fewer than 800 deaths. 0:09:11.750,0:09:16.539 The truth commission – the Comisión[br]para el Esclarecimiento Histórico (CEH) – 0:09:16.539,0:09:22.000 conducted a really big research[br]project in the late 1990s and 0:09:22.000,0:09:25.810 of that we got information about a little[br]bit more than a thousand deaths. 0:09:25.810,0:09:30.450 And then the National Program for[br]Compensation is very, very large 0:09:30.450,0:09:35.370 and gave us about 4700[br]records of deaths. 0:09:35.370,0:09:40.659 Now, this is interesting[br]but this is not unique. 0:09:40.659,0:09:45.769 Many of the deaths are reported in common[br]across those data sources and so… 0:09:45.769,0:09:49.490 we think about this in terms of a Venn[br]diagram. We think of: how did these 0:09:49.490,0:09:54.329 different data sets intersect with each[br]other or collide with each other. And 0:09:54.329,0:09:59.130 we can diagram that as in the sense[br]of these 3 white circles intersecting. 0:09:59.130,0:10:05.610 But as I mentioned earlier we’re also[br]interested in what we have not observed. 0:10:05.610,0:10:09.490 And this is crucial for us because[br]when we’re thinking about 0:10:09.490,0:10:13.420 how much information we have, we have to[br]distinguish between the world on the left, 0:10:13.420,0:10:17.200 in which our intersecting circles[br]cover about a third of the reality, 0:10:17.200,0:10:21.829 versus the world on the right where our[br]intersecting circles cover all of reality. 
0:10:21.829,0:10:26.390 These are very different worlds; and the[br]reason they’re so different is not simply 0:10:26.390,0:10:29.710 because we want to know the magnitude,[br]not simply because we want to know 0:10:29.710,0:10:34.490 the total number of killings. That’s[br]important – but even more important: 0:10:34.490,0:10:40.160 we have to know that we’ve covered,[br]we’ve estimated in equal proportions 0:10:40.160,0:10:44.430 the two parties. We have to estimate in[br]equal proportions the number of deaths 0:10:44.430,0:10:48.340 of non-indigenous people and the[br]number of deaths of indigenous people. 0:10:48.340,0:10:51.510 Because if we don’t get those[br]estimates correct our comparison 0:10:51.510,0:10:56.080 of their mortality rates will be biased.[br]Our story will be wrong. We will fail 0:10:56.080,0:11:01.840 to speak Truth to Power. We can’t have[br]that. So what do we do? Algebra! 0:11:01.840,0:11:06.390 Algebra is our friend. So I’m gonna[br]give you just a tiny taste of how we 0:11:06.390,0:11:09.650 solve this problem and I’m going to[br]introduce a series of assumptions. 0:11:09.650,0:11:13.279 Those of you who would like to debate[br]those assumptions: I invite you to join me 0:11:13.279,0:11:18.359 after the talk and we will talk endlessly[br]and tediously about capture heterogeneity. 0:11:18.359,0:11:22.240 But in the short term, 0:11:22.240,0:11:27.940 we have a universe N of total killings in[br]a specific time/space/ethnicity/location. 0:11:27.940,0:11:30.690 And of that we have 2 projects A and B. 0:11:30.690,0:11:34.619 A captures some number of[br]deaths from the universe N, 0:11:34.619,0:11:40.169 and the probability with which a death is[br]captured by project A from the universe N 0:11:40.169,0:11:44.600 is by elementary probability theory the[br]number of deaths documented by A 0:11:44.600,0:11:48.740 divided by the unknown number[br]of deaths in the population N. 
0:11:48.740,0:11:52.969 Similarly, the probability with which a[br]death from N is documented by project B 0:11:52.969,0:11:58.149 is B over N, and this is the cool part:[br]the probability with which a death 0:11:58.149,0:12:01.949 is documented by both A and B is M over N. 0:12:01.949,0:12:05.579 Now we can put the 2 databases together,[br]we can compare them. Let’s talk about 0:12:05.579,0:12:09.370 the use of random forest classifiers[br]and clustering to do that later. 0:12:09.370,0:12:12.489 But we can put the 2 databases together,[br]compare them, determine the deaths 0:12:12.489,0:12:17.429 that are in M – that is, in both[br]A and B – and divide M by N. 0:12:17.429,0:12:23.060 But, also by probability theory, the[br]probability that a death occurs in M 0:12:23.060,0:12:27.740 is equal to the product of[br]the individual probabilities. 0:12:27.740,0:12:31.619 The probability of any compound event, an[br]event made up of two independent events, is 0:12:31.619,0:12:36.410 equal to the product of the probabilities of those two[br]events, so M over N is equal to 0:12:36.410,0:12:41.420 A over N times B over N. Solve for N. 0:12:41.420,0:12:45.140 Multiply it through by N squared, divide[br]by M, and we have an estimate of N 0:12:45.140,0:12:49.360 which is equal to AB over M. Now, with the[br]lights in my eyes, I can’t see, but I saw 0:12:49.360,0:12:52.740 a few light bulbs go off over people’s[br]heads. And when I showed this proof 0:12:52.740,0:12:57.180 to the judge in the trial of General Ríos 0:12:57.180,0:13:01.529 I saw a light bulb go on over her head. 0:13:01.529,0:13:04.379 It’s a beautiful thing,[br]it’s a beautiful thing. 0:13:04.379,0:13:09.509 applause 0:13:09.509,0:13:12.660 So we don’t do it in 2 systems because[br]that takes a lot of assumptions. 0:13:12.660,0:13:16.069 We do it in 4. You will recall that we[br]have 4 data sources. 
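The two-project derivation above (M over N equals A over N times B over N, so N is estimated as AB over M) can be sketched as a small Python function. This is a minimal illustration only: it assumes the two lists are independent and that records have already been matched perfectly, and the record identifiers and counts below are made up for the example rather than taken from the Guatemalan data. The function name is hypothetical.

```python
def two_list_estimate(list_a: set, list_b: set) -> float:
    """Estimate the total population N from two overlapping lists.

    Uses N = A * B / M, where A and B are the list sizes and M is the
    number of records found on both lists.
    """
    a = len(list_a)
    b = len(list_b)
    m = len(list_a & list_b)  # records documented by both projects
    if m == 0:
        raise ValueError("no overlap between the lists; N cannot be estimated")
    return a * b / m


# Made-up record identifiers, for illustration only.
project_a = set(range(0, 60))     # 60 documented deaths
project_b = set(range(40, 120))   # 80 documented deaths, 20 shared with A

n_hat = two_list_estimate(project_a, project_b)
documented = len(project_a | project_b)

print(n_hat)       # 240.0
print(documented)  # 120
```

In this toy case the two lists together name 120 distinct deaths, while the estimate of the total is 240, so about half of the deaths would be undocumented by either project. Fewer overlaps for the same list sizes would push the estimate higher, which is the "fewer collisions mean a larger space" intuition.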
We organize 0:13:16.069,0:13:21.530 the data sources in this format[br]such that we have an inclusion 0:13:21.530,0:13:26.249 and an exclusion pattern in the table on [br]the left, which… for which we can define 0:13:26.249,0:13:29.810 the number of deaths which fall into[br]each of these intersecting patterns. 0:13:29.810,0:13:33.729 And I’ll give you a very quick[br]metaphor here. The metaphor is: 0:13:33.729,0:13:38.239 imagine that you have 2 dark rooms and you[br]want to assess the size of those 2 rooms 0:13:38.239,0:13:42.049 – which room is larger? And the only[br]tool that you have to assess the size 0:13:42.049,0:13:46.359 of those rooms is a handful of little[br]rubber balls. The little rubber balls 0:13:46.359,0:13:50.400 have a property that when they hit each[br]other they make a sound. makes CLICK sound 0:13:50.400,0:13:53.390 So we throw the balls into the first[br]room and we listen, and we hear 0:13:53.390,0:13:57.190 makes several CLICK sounds. We[br]collect the balls, go to the second room, 0:13:57.190,0:14:00.490 throw them with equal force – imagining[br]a spherical cow of uniform density! 0:14:00.490,0:14:03.950 We throw the balls into the second[br]room with equal force and we hear 0:14:03.950,0:14:07.799 makes one CLICK sound[br]So which room is larger? 0:14:07.799,0:14:12.070 The second room, because we hear fewer[br]collisions, right? Well, the estimation, 0:14:12.070,0:14:15.620 the toy example I gave in the previous[br]slide is the mathematical formalization 0:14:15.620,0:14:20.070 of the intuition that fewer[br]collisions mean a larger space. 0:14:20.070,0:14:23.329 And so what we’re doing here is[br]laying out the pattern of collisions. 0:14:23.329,0:14:26.679 Not just the collisions, the pairwise[br]collisions, but the three-way and 0:14:26.679,0:14:31.409 four-way collisions. And that[br]allows us to make the estimate 0:14:31.409,0:14:37.439 that was shown in the bar graph of[br]the light part of each of the bars. 
So 0:14:37.439,0:14:41.460 we can come back to our conclusion and put[br]a confidence interval on the estimates. 0:14:41.460,0:14:45.910 And the confidence intervals are shown[br]there. Now I’m gonna move through this 0:14:45.910,0:14:50.850 somewhat more quickly to get to the end of[br]the talk but I wanna put up one more slide 0:14:50.850,0:14:56.240 that was used in the testimony[br]and that is that we divided time 0:14:56.240,0:15:01.220 into 16-month periods and[br]compared the 16-month period of 0:15:01.220,0:15:04.580 General Ríos’s governance – now it’s only[br]16 months ’cause we went April to July, 0:15:04.580,0:15:07.679 because it’s only a few days in August, a[br]few days in March, so we shaved those off, 0:15:07.679,0:15:12.310 okay… – 16-month period of General[br]Ríos’s Government and compared it 0:15:12.310,0:15:17.110 to several periods before and after. And[br]I think that the key observation here 0:15:17.110,0:15:21.809 is that the rate of killing[br]against indigenous people 0:15:21.809,0:15:26.729 is substantially higher done under General[br]Ríos’s Government than under previous 0:15:26.729,0:15:33.280 or succeeding governments. But more[br]importantly the ratio between the two, 0:15:33.280,0:15:37.950 the relative risk of being killed as an[br]indigenous person, was at its peak 0:15:37.950,0:15:42.639 during the government of General Ríos. 0:15:42.639,0:15:46.709 Have we proven genocide? No. 0:15:46.709,0:15:49.870 This is evidence consistent with the[br]hypothesis that acts of genocide 0:15:49.870,0:15:53.539 were committed. The finding of genocide[br]is a legal finding, not so much 0:15:53.539,0:15:58.580 a scientific one. So as scientists,[br]our job is to provide evidence that 0:15:58.580,0:16:02.870 the finders of fact – the judges in this[br]case – can use in their determination. 0:16:02.870,0:16:05.219 This is evidence consistent[br]with that hypothesis. 
0:16:05.219,0:16:08.189 Were this evidence otherwise, as[br]scientists we would say we would 0:16:08.189,0:16:11.480 reject the hypothesis that genocide was[br]committed. However, with this evidence 0:16:11.480,0:16:15.370 we find that the evidence,[br]the data is consistent with 0:16:15.370,0:16:18.080 the prosecution’s hypothesis. 0:16:18.080,0:16:25.320 So, it worked! 0:16:25.320,0:16:29.049 Ríos Montt was convicted on[br]genocide charges. applause 0:16:29.049,0:16:31.359 You can clap![br]applause 0:16:31.359,0:16:36.359 applause 0:16:36.359,0:16:39.499 For a week![br]mumbled, surprised laughter 0:16:39.499,0:16:42.279 Then the Constitutional Court intervened, 0:16:42.279,0:16:44.959 there are, I know, a couple of experts on[br]Guatemala here in the audience 0:16:44.959,0:16:47.839 who can tell you more about why that[br]happened and exactly what happened. 0:16:47.839,0:16:52.669 However, the Constitutional[br]Court ordered a new trial, 0:16:52.669,0:16:59.160 which is at this time scheduled[br]for the very beginning of 2015. 0:16:59.160,0:17:02.970 And I look forward to testifying again, 0:17:02.970,0:17:06.820 and again, and again, and again! 0:17:06.820,0:17:12.680 applause 0:17:12.680,0:17:16.989 Look, but I wanna come back to this point.[br]Because as a bunch of technologists… 0:17:16.989,0:17:21.589 – there are a lot of folks who really like[br]technology here, I really like it too! 0:17:21.589,0:17:25.559 Technology doesn’t get us to science[br]– you have to have science 0:17:25.559,0:17:28.770 to get you to science. Technology helps[br]you organize the data. It helps you do 0:17:28.770,0:17:32.050 all kinds of extremely great and cool[br]things without which we wouldn’t be able 0:17:32.050,0:17:36.480 to even do the science. But you[br]can’t have just technology! 0:17:36.480,0:17:40.970 You can’t just have a bunch of data[br]and make conclusions. 
That’s naive, 0:17:40.970,0:17:44.529 and you will get the wrong conclusions.[br]‘The point of rigorous statistics is 0:17:44.529,0:17:48.100 to be right’, and there is a little bit of[br]a caveat there – or to at least know 0:17:48.100,0:17:51.620 how uncertain you are. Statistics is often[br]called the ‘Science of Uncertainty’. 0:17:51.620,0:17:55.960 That is actually my favorite[br]definition of it. So, 0:17:55.960,0:18:01.509 I’m going to assume that we[br]care about getting it right. 0:18:01.509,0:18:05.489 No one laughed, that’s good. 0:18:05.489,0:18:08.890 Not everyone does, to my distress. 0:18:08.890,0:18:11.320 So if you only have some of the data 0:18:11.320,0:18:15.490 – and I will argue that we always[br]only have some of the data – 0:18:15.490,0:18:20.449 you need some kind of model that will tell[br]you the relationship between your data 0:18:20.449,0:18:23.989 and the real world.[br]Statisticians call that an inference. 0:18:23.989,0:18:26.200 In order to get from here to there[br]you’re gonna need some kind of 0:18:26.200,0:18:30.469 probability model that tells you[br]why your data is like the world, 0:18:30.469,0:18:33.960 or in what sense you have to tweak,[br]twiddle and do algebra with your data 0:18:33.960,0:18:39.309 to get from what you can[br]observe to what is actually true. 0:18:39.309,0:18:42.690 And statistics is about comparisons.[br]Yeah, we get a big number and 0:18:42.690,0:18:46.169 journalists love the big number; but[br]it’s really about these relationships 0:18:46.169,0:18:50.609 and patterns! So to get those[br]relationships and patterns, 0:18:50.609,0:18:53.560 in order for them to be right, in order[br]for our answer to be correct, 0:18:53.560,0:18:57.439 every one of the estimates we make[br]for every point in the pattern 0:18:57.439,0:19:01.700 has to be right. It’s a hard[br]problem. It’s a hard problem. 
0:19:01.700,0:19:05.070 And what I worry about is that[br]we have come into this world 0:19:05.070,0:19:09.400 in which people throw the notion of Big[br]Data around as though the data allows us 0:19:09.400,0:19:14.230 to make an end-run around problems[br]of sampling and modeling. It doesn’t. 0:19:14.230,0:19:19.120 So as technologists, the reason I’m,[br]you know, ranting at you guys about it 0:19:19.120,0:19:24.540 is that it’s very tempting to have a lot[br]of data and think you have an answer! 0:19:24.540,0:19:30.580 And it’s even more tempting because[br]in an industry context you might be right. 0:19:30.580,0:19:36.739 Not so much in Human Rights, not so[br]much. Violence is a hidden process. 0:19:36.739,0:19:39.960 The people who commit violence have[br]an enormous commitment to hiding it, 0:19:39.960,0:19:44.420 distorting it, explaining it in different[br]ways. All of those things dramatically 0:19:44.420,0:19:48.350 affect the information that is produced[br]from the violence that we’re going to use 0:19:48.350,0:19:53.730 to do our analysis. So we usually[br]don’t know what we don’t know 0:19:53.730,0:19:58.220 in Human Rights data collection.[br]And that means that we don’t know 0:19:58.220,0:20:03.829 if what we don’t know is systematically[br]different from what we do know. 0:20:03.829,0:20:06.270 Maybe we know about all the lawyers[br]and we don’t know about the people 0:20:06.270,0:20:10.070 in the countryside. Maybe we know[br]about all the indigenous people 0:20:10.070,0:20:14.130 and not the non-indigenous people.[br]If that were true, the argument 0:20:14.130,0:20:17.980 that I just made would be merely[br]an artifact of the reporting process 0:20:17.980,0:20:21.740 rather than some true analysis. Now[br]we did the estimations, which is why I believe 0:20:21.740,0:20:25.009 we can reject that critique, but that’s[br]what we have to worry about. 0:20:25.009,0:20:28.860 And let’s go back to the Venn diagram[br]and say: which of these is accurate? 
0:20:28.860,0:20:32.840 It’s not just for one of the[br]points in our pattern analysis. 0:20:32.840,0:20:36.500 The problem is that we’re[br]going to compare things. 0:20:36.500,0:20:40.890 As in Peru where we compared killings[br]committed by the Peruvian army against 0:20:40.890,0:20:44.860 killings committed by the Maoist guerrillas[br]of the Sendero Luminoso. And we found 0:20:44.860,0:20:51.460 there that in fact we knew very little[br]about what the Sendero Luminoso had done. 0:20:51.460,0:20:55.779 Whereas we knew almost everything[br]about what the Peruvian army had done. 0:20:55.779,0:20:57.970 This is called the coverage rate.[br]The ratio between what we know and 0:20:57.970,0:21:02.750 what we don’t know. And[br]raw data, however big, 0:21:02.750,0:21:07.510 does not get us to patterns.[br]And here is a bunch of… 0:21:07.510,0:21:11.569 kinds of raw data that I’ve used[br]and that I really enjoy using. 0:21:11.569,0:21:14.270 You know – truth commission testimonies,[br]UN investigations, press articles, 0:21:14.270,0:21:18.309 SMS messages, crowdsourcing, NGO[br]documentation, social media feeds, 0:21:18.309,0:21:21.180 perpetrator records, government archives,[br]state agency registries – I know those 0:21:21.180,0:21:23.570 sound all the same but they actually[br]turn out to be slightly different. 0:21:23.570,0:21:28.340 Happy to talk in tedious detail! Refugee[br]Camp records, any non-random sample. 0:21:28.340,0:21:31.990 All of those are gonna take[br]some kind of probability model 0:21:31.990,0:21:36.070 and we don’t have that many[br]probability models to use. So 0:21:36.070,0:21:40.330 raw data is great for cases – but[br]it doesn’t get you to patterns. 0:21:40.330,0:21:45.120 And patterns – again – patterns are[br]the thing that allows us to do analysis. 
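The coverage problem described above can be made concrete with a small Python sketch. Everything here is hypothetical: the group labels and counts are invented purely to illustrate the mechanism, and the sketch assumes that "coverage rate" means documented deaths divided by the estimated total, which is one reasonable reading of the talk rather than a definition it states.

```python
def coverage_rate(documented: int, estimated_total: float) -> float:
    """Share of the estimated total that appears in the raw documentation."""
    return documented / estimated_total


# Invented numbers for two hypothetical perpetrator groups.
documented = {"group_x": 900, "group_y": 300}
estimated_total = {"group_x": 1000.0, "group_y": 1500.0}

coverage = {
    group: coverage_rate(documented[group], estimated_total[group])
    for group in documented
}

# Comparing raw counts suggests group_x killed three times as many people...
raw_ratio = documented["group_x"] / documented["group_y"]

# ...but after estimating what was not documented, the pattern reverses.
estimated_ratio = estimated_total["group_x"] / estimated_total["group_y"]

print(coverage)                    # {'group_x': 0.9, 'group_y': 0.2}
print(raw_ratio)                   # 3.0
print(round(estimated_ratio, 2))   # 0.67
```

When coverage differs this much between groups, a comparison built on raw counts can point in the opposite direction from one built on estimated totals, which is why the talk insists that raw data alone does not support pattern claims.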
0:21:45.120,0:21:49.289 They are the thing… the patterns are what[br]get us to something that we can use 0:21:49.289,0:21:53.629 to help prosecutors, advocates and the… 0:21:53.629,0:21:56.409 and the victims themselves. 0:21:56.409,0:22:00.589 I gave a version of this talk, a[br]much earlier version of this talk 0:22:00.589,0:22:04.630 several years ago in Medellín, Colombia.[br]I’ve worked a lot in Colombia, 0:22:04.630,0:22:07.670 it’s really… it’s a great place to[br]work. There’s really terrific 0:22:07.670,0:22:13.569 Victims’ Rights groups there.[br]And a woman from a township, 0:22:13.569,0:22:17.310 smaller than a county, near to Medellín[br]came up to me after the talk and she said: 0:22:17.310,0:22:21.150 “You know, a lot of people… you[br]know I’m a Human Rights activist, 0:22:21.150,0:22:25.309 my job is to collect data, I tell stories[br]about people who have suffered. 0:22:25.309,0:22:28.210 But there are people in my[br]village I know who have had 0:22:28.210,0:22:32.910 people in their families disappeared and[br]they’re never gonna talk about it, ever. 0:22:32.910,0:22:38.090 We’re never going to be able to use[br]their names, because they are afraid.” 0:22:38.090,0:22:45.349 We can’t name the victims. At[br]least we’d better count them. 0:22:45.349,0:22:49.520 So about that counting: there’s[br]3 ways to do it right. You can have 0:22:49.520,0:22:54.430 a perfect census – you can have all the[br]data. Yeah it’s nice, good luck with that. 0:22:54.430,0:22:58.910 You can have a random sample[br]of the population – that’s hard! 0:22:58.910,0:23:03.029 Sometimes doable but very hard.[br]In my experience we rarely interview 0:23:03.029,0:23:07.140 victims of homicide, very rarely.[br]Laughing 0:23:07.140,0:23:09.640 And that means there’s a complicated[br]probability relationship between 0:23:09.640,0:23:13.670 the person you sampled, the interview[br]and the death that they talk to you about. 
0:23:13.670,0:23:17.300 Or you can do some kind of posterior[br]modeling of the sampling process which is… 0:23:17.300,0:23:21.260 which is in essence what[br]I proposed in the earlier slide. 0:23:21.260,0:23:25.020 So what can we do with raw data,[br]guys? We can collect a bunch of… 0:23:25.020,0:23:28.930 We can say that a case exists. Ok[br]– that’s actually important! We can say: 0:23:28.930,0:23:34.409 “Something happened” with raw data. We can[br]say: “We know something about that case”. 0:23:34.409,0:23:38.250 We can say: “There were 100 victims[br]in that case or at least 100 victims 0:23:38.250,0:23:41.570 in that case”, if we can name 100 people. 0:23:41.570,0:23:46.390 But we can’t do comparisons: “This[br]is the biggest massacre this year”. 0:23:46.390,0:23:48.350 We don’t really know. Because we[br]don’t know about the massacres 0:23:48.350,0:23:53.910 we don’t know about. No patterns. Don’t[br]talk about the hot spot of violence. 0:23:53.910,0:23:59.420 No, we don’t know that. Happy to talk[br]more about that if we gather after, 0:23:59.420,0:24:06.439 but I wanna come to a close here with[br]the importance of getting it right. 0:24:06.439,0:24:11.380 I’ve talked about one case today. This[br]is another case, the case of this man: 0:24:11.380,0:24:16.320 Edgar Fernando García. Mr. García was[br]a student Labor leader in Guatemala 0:24:16.320,0:24:19.800 early in the 1980s. He left[br]his office in February 1984 0:24:19.800,0:24:24.470 – did not come home. People reported[br]later that they saw someone 0:24:24.470,0:24:28.810 shoving Mr. García into a[br]vehicle and driving away. 0:24:28.810,0:24:33.900 His widow became a very important[br]Human Rights activist in Guatemala 0:24:33.900,0:24:38.570 and now she’s a very important, and[br]in my opinion impressive, politician. 0:24:38.570,0:24:42.240 And there’s her infant daughter. She[br]continued to struggle to find out 0:24:42.240,0:24:46.130 what had happened to[br]Mr. García for decades. 
0:24:46.130,0:24:50.400 And in 2006 documents came to light[br]in the National Archives of the… 0:24:50.400,0:24:54.429 excuse me, the Historical Archives[br]of the National Police, showing that 0:24:54.429,0:24:59.320 the Police had carried out an operation[br]in the area of Mr. García’s office 0:24:59.320,0:25:01.930 and it was very likely that[br]they had disappeared him. 0:25:01.930,0:25:07.400 These 2 guys up here in the upper[br]right were Police officers in that area; 0:25:07.400,0:25:11.359 they were arrested, charged with the[br]disappearance of Mr. García and 0:25:11.359,0:25:15.620 convicted. Part of the evidence used to[br]convict them was communications metadata 0:25:15.620,0:25:19.510 showing that documents[br]flowed through the archive. 0:25:19.510,0:25:23.699 I mean paper communications! We coded[br]it by hand. We went through and read 0:25:23.699,0:25:28.459 the ‘From’ and ‘To’ lines[br]from every Memo. And 0:25:28.459,0:25:34.229 they were convicted in 2010[br]and after that conviction 0:25:34.229,0:25:38.699 Mr. García’s infant daughter – now[br]a grown woman – was clearly joyful. 0:25:38.699,0:25:42.730 Justice brings closure to a family[br]that never knows when to start talking 0:25:42.730,0:25:48.059 about someone in the past tense.[br]Perhaps even more powerfully: 0:25:48.059,0:25:52.319 those guys’ grand boss, their boss’s[br]boss, Colonel Héctor Bol de la Cruz, 0:25:52.319,0:25:58.439 this man here, was convicted[br]of Mr. García’s disappearance 0:25:58.439,0:26:02.069 in September this year [2013].[br]applause 0:26:02.069,0:26:07.610 applause 0:26:07.610,0:26:10.789 I don’t know if any of you have[br]ever been dissident students, 0:26:10.789,0:26:15.330 but if you’ve been dissident students[br]demonstrating in the street think about 0:26:15.330,0:26:19.300 how you would feel if your friends[br]and comrades were disappeared, 0:26:19.300,0:26:23.419 and take a long look at Colonel Bol[br]de la Cruz. 
Here is the rest of the stuff 0:26:23.419,0:26:25.626 that we will talk about if we gather[br]afterwards. Thank you very much 0:26:25.626,0:26:29.086 for your attention. I really[br]have enjoyed CCC. 0:26:29.086,0:26:36.086 applause 0:26:36.086,0:26:47.203 Subtitles created by c3subtitles.de[br]in the year 2016. Join and help us![br]