0:00:06.636,0:00:09.077 Statistics are persuasive. 0:00:09.077,0:00:12.541 So much so that people, organizations,[br]and whole countries 0:00:12.541,0:00:17.747 base some of their most important [br]decisions on organized data. 0:00:17.747,0:00:19.484 But there's a problem with that. 0:00:19.484,0:00:23.301 Any set of statistics might have something[br]lurking inside it, 0:00:23.301,0:00:27.251 something that can turn the results[br]completely upside down. 0:00:27.251,0:00:30.920 For example, imagine you need to choose[br]between two hospitals 0:00:30.920,0:00:33.737 for an elderly relative's surgery. 0:00:33.737,0:00:36.434 Out of each hospital's [br]last 1000 patients, 0:00:36.434,0:00:39.612 900 survived at Hospital A, 0:00:39.612,0:00:43.021 while only 800 survived at Hospital B. 0:00:43.021,0:00:46.170 So it looks like Hospital A [br]is the better choice. 0:00:46.170,0:00:47.843 But before you make your decision, 0:00:47.843,0:00:51.411 remember that not all patients[br]arrive at the hospital 0:00:51.411,0:00:53.811 with the same level of health. 0:00:53.811,0:00:56.703 And if we divide each hospital's[br]last 1000 patients 0:00:56.703,0:01:01.132 into those who arrived in good health[br]and those who arrived in poor health, 0:01:01.132,0:01:03.772 the picture starts to look very different. 0:01:03.772,0:01:07.849 Hospital A had only 100 patients[br]who arrived in poor health, 0:01:07.849,0:01:10.325 of which 30 survived. 0:01:10.325,0:01:14.852 But Hospital B had 400,[br]and they were able to save 210. 0:01:14.852,0:01:17.169 So Hospital B is the better choice 0:01:17.169,0:01:20.741 for patients who arrive [br]at hospital in poor health, 0:01:20.741,0:01:24.526 with a survival rate of 52.5%. 0:01:24.526,0:01:28.445 And what if your relative's health[br]is good when she arrives at the hospital? 0:01:28.445,0:01:32.271 Strangely enough, Hospital B is still[br]the better choice, 0:01:32.271,0:01:35.676 with a survival rate of over 98%. 
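The hospital arithmetic above can be checked with a short script. This is only a sketch: the counts are the ones quoted in the transcript, the good-health counts are derived by subtracting the poor-health figures from each hospital's 1000-patient totals, and the names `hospitals`, `survival_rate`, and `summarize` are illustrative choices, not part of the original.

```python
# (survived, total) per hospital and arrival health, as quoted in the transcript.
# Good-health counts are derived by subtraction from the 1000-patient totals
# (900 survivors at Hospital A, 800 at Hospital B).
hospitals = {
    "A": {"poor": (30, 100), "good": (900 - 30, 1000 - 100)},
    "B": {"poor": (210, 400), "good": (800 - 210, 1000 - 400)},
}


def survival_rate(survived, total):
    """Return the survival rate as a percentage."""
    return 100 * survived / total


def summarize(groups):
    """Compute per-group survival rates plus the pooled (overall) rate."""
    rates = {name: survival_rate(s, t) for name, (s, t) in groups.items()}
    total_survived = sum(s for s, _ in groups.values())
    total_patients = sum(t for _, t in groups.values())
    rates["overall"] = survival_rate(total_survived, total_patients)
    return rates


summary = {name: summarize(groups) for name, groups in hospitals.items()}

for name, rates in summary.items():
    print(
        f"Hospital {name}: poor {rates['poor']:.1f}%, "
        f"good {rates['good']:.1f}%, overall {rates['overall']:.1f}%"
    )
```

Hospital B comes out ahead within each health group, while Hospital A comes out ahead once the two groups are pooled.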
0:01:35.676,0:01:38.733 So how can Hospital A have a better[br]overall survival rate 0:01:38.733,0:01:44.830 if Hospital B has better survival rates[br]for patients in each of the two groups? 0:01:44.830,0:01:48.589 What we've stumbled upon is a case[br]of Simpson's paradox, 0:01:48.589,0:01:51.899 where the same set of data can appear[br]to show opposite trends 0:01:51.899,0:01:54.664 depending on how it's grouped. 0:01:54.664,0:01:58.744 This often occurs when aggregated data[br]hides a conditional variable, 0:01:58.744,0:02:01.377 sometimes known as a lurking variable, 0:02:01.377,0:02:06.584 which is a hidden additional factor[br]that significantly influences results. 0:02:06.584,0:02:10.023 Here, the hidden factor is the relative[br]proportion of patients 0:02:10.023,0:02:13.264 who arrive in good or poor health. 0:02:13.264,0:02:16.544 Simpson's paradox isn't just[br]a hypothetical scenario. 0:02:16.544,0:02:18.924 It pops up from time [br]to time in the real world, 0:02:18.924,0:02:22.132 sometimes in important contexts. 0:02:22.132,0:02:24.130 One study in the UK appeared to show 0:02:24.130,0:02:27.600 that smokers had a higher survival rate[br]than nonsmokers 0:02:27.600,0:02:29.846 over a twenty-year time period. 0:02:29.846,0:02:33.307 That is, until dividing the participants[br]by age group 0:02:33.307,0:02:37.823 showed that the nonsmokers [br]were significantly older on average, 0:02:37.823,0:02:40.930 and thus, more likely[br]to die during the trial period, 0:02:40.930,0:02:44.438 precisely because they were living longer[br]in general. 0:02:44.438,0:02:47.286 Here, the age groups [br]are the lurking variable, 0:02:47.286,0:02:50.176 and are vital to correctly [br]interpret the data. 
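One way to see how a lurking variable can flip the aggregate, using only the hospital figures quoted earlier in the transcript: each overall survival rate is a weighted average of the two group rates, with weights equal to the share of patients arriving in each state of health. The symbols $r$ (survival rate) and $w$ (share of patients) are notation introduced here for illustration and do not appear in the original.

```latex
\[
r_{\text{overall}} = w_{\text{good}}\, r_{\text{good}} + w_{\text{poor}}\, r_{\text{poor}},
\qquad w_{\text{good}} + w_{\text{poor}} = 1
\]
\[
\text{Hospital A:}\quad
\frac{900}{1000}\cdot\frac{870}{900} + \frac{100}{1000}\cdot\frac{30}{100}
= 0.87 + 0.03 = 0.90
\]
\[
\text{Hospital B:}\quad
\frac{600}{1000}\cdot\frac{590}{600} + \frac{400}{1000}\cdot\frac{210}{400}
= 0.59 + 0.21 = 0.80
\]
```

Hospital B has the higher rate in both groups, but 40% of its patients fall in the low-survival group compared with 10% at Hospital A, and that difference in weights is enough to pull its overall rate below Hospital A's.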
0:02:50.176,0:02:51.559 In another example, 0:02:51.559,0:02:54.281 an analysis of Florida's [br]death penalty cases 0:02:54.281,0:02:58.265 seemed to reveal [br]no racial disparity in sentencing 0:02:58.265,0:03:01.581 between black and white defendants[br]convicted of murder. 0:03:01.581,0:03:06.396 But dividing the cases by the race[br]of the victim told a different story. 0:03:06.396,0:03:07.969 In either situation, 0:03:07.969,0:03:11.091 black defendants were more likely[br]to be sentenced to death. 0:03:11.091,0:03:15.066 The slightly higher overall sentencing [br]rate for white defendants 0:03:15.066,0:03:18.692 was due to the fact [br]that cases with white victims 0:03:18.692,0:03:21.359 were more likely [br]to elicit a death sentence 0:03:21.359,0:03:24.091 than cases where the victim was black, 0:03:24.091,0:03:28.483 and most murders occurred between[br]people of the same race. 0:03:28.483,0:03:31.319 So how do we avoid [br]falling for the paradox? 0:03:31.319,0:03:34.686 Unfortunately, [br]there's no one-size-fits-all answer. 0:03:34.686,0:03:38.504 Data can be grouped and divided[br]in any number of ways, 0:03:38.504,0:03:42.106 and overall numbers may sometimes[br]give a more accurate picture 0:03:42.106,0:03:46.638 than data divided into misleading[br]or arbitrary categories. 0:03:46.638,0:03:52.089 All we can do is carefully study the[br]actual situations the statistics describe 0:03:52.089,0:03:55.977 and consider whether lurking variables[br]may be present. 0:03:55.977,0:03:59.378 Otherwise, we leave ourselves[br]vulnerable to those who would use data 0:03:59.378,0:04:02.649 to manipulate others[br]and promote their own agendas.