[Script Info]
Title:

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:06.64,0:00:09.08,Default,,0000,0000,0000,,Statistics are persuasive.
Dialogue: 0,0:00:09.08,0:00:12.54,Default,,0000,0000,0000,,So much so that people, organizations,\Nand whole countries
Dialogue: 0,0:00:12.54,0:00:17.75,Default,,0000,0000,0000,,base some of their most important \Ndecisions on organized data.
Dialogue: 0,0:00:17.75,0:00:19.48,Default,,0000,0000,0000,,But there's a problem with that.
Dialogue: 0,0:00:19.48,0:00:23.30,Default,,0000,0000,0000,,Any set of statistics might have something\Nlurking inside it,
Dialogue: 0,0:00:23.30,0:00:27.25,Default,,0000,0000,0000,,something that can turn the results\Ncompletely upside down.
Dialogue: 0,0:00:27.25,0:00:30.92,Default,,0000,0000,0000,,For example, imagine you need to choose\Nbetween two hospitals
Dialogue: 0,0:00:30.92,0:00:33.74,Default,,0000,0000,0000,,for an elderly relative's surgery.
Dialogue: 0,0:00:33.74,0:00:36.43,Default,,0000,0000,0000,,Out of each hospital's \Nlast 1000 patients,
Dialogue: 0,0:00:36.43,0:00:39.61,Default,,0000,0000,0000,,900 survived at Hospital A,
Dialogue: 0,0:00:39.61,0:00:43.02,Default,,0000,0000,0000,,while only 800 survived at Hospital B.
Dialogue: 0,0:00:43.02,0:00:46.17,Default,,0000,0000,0000,,So it looks like Hospital A \Nis the better choice.
Dialogue: 0,0:00:46.17,0:00:47.84,Default,,0000,0000,0000,,But before you make your decision,
Dialogue: 0,0:00:47.84,0:00:51.41,Default,,0000,0000,0000,,remember that not all patients\Narrive at the hospital
Dialogue: 0,0:00:51.41,0:00:53.81,Default,,0000,0000,0000,,with the same level of health.
Dialogue: 0,0:00:53.81,0:00:56.70,Default,,0000,0000,0000,,And if we divide each hospital's\Nlast 1000 patients
Dialogue: 0,0:00:56.70,0:01:01.13,Default,,0000,0000,0000,,into those who arrived in good health\Nand those who arrived in poor health,
Dialogue: 0,0:01:01.13,0:01:03.77,Default,,0000,0000,0000,,the picture starts to look very different.
Dialogue: 0,0:01:03.77,0:01:07.85,Default,,0000,0000,0000,,Hospital A had only 100 patients\Nwho arrived in poor health,
Dialogue: 0,0:01:07.85,0:01:10.32,Default,,0000,0000,0000,,of which 30 survived.
Dialogue: 0,0:01:10.32,0:01:14.85,Default,,0000,0000,0000,,But Hospital B had 400,\Nand they were able to save 210.
Dialogue: 0,0:01:14.85,0:01:17.17,Default,,0000,0000,0000,,So Hospital B is the better choice
Dialogue: 0,0:01:17.17,0:01:20.74,Default,,0000,0000,0000,,for patients who arrive \Nat hospital in poor health,
Dialogue: 0,0:01:20.74,0:01:24.53,Default,,0000,0000,0000,,with a survival rate of 52.5%.
Dialogue: 0,0:01:24.53,0:01:28.44,Default,,0000,0000,0000,,And what if your relative's health\Nis good when she arrives at the hospital?
Dialogue: 0,0:01:28.44,0:01:32.27,Default,,0000,0000,0000,,Strangely enough, Hospital B is still\Nthe better choice,
Dialogue: 0,0:01:32.27,0:01:35.68,Default,,0000,0000,0000,,with a survival rate of over 98%.
Dialogue: 0,0:01:35.68,0:01:38.73,Default,,0000,0000,0000,,So how can Hospital A have a better\Noverall survival rate
Dialogue: 0,0:01:38.73,0:01:44.83,Default,,0000,0000,0000,,if Hospital B has better survival rates\Nfor patients in each of the two groups?
Dialogue: 0,0:01:44.83,0:01:48.59,Default,,0000,0000,0000,,What we've stumbled upon is a case\Nof Simpson's paradox,
Dialogue: 0,0:01:48.59,0:01:51.90,Default,,0000,0000,0000,,where the same set of data can appear\Nto show opposite trends
Dialogue: 0,0:01:51.90,0:01:54.66,Default,,0000,0000,0000,,depending on how it's grouped.
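The hospital figures quoted so far can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original narration: the good-health counts are not stated directly and are derived here by subtracting the poor-health figures from each hospital's totals, and the names `hospitals`, `rate`, and `overall` are illustrative.

```python
from fractions import Fraction

# (survivors, patients) per health group, using the figures from the narration.
# Assumption: good-health counts are the hospital totals minus the poor-health counts.
hospitals = {
    "A": {"poor": (30, 100), "good": (900 - 30, 1000 - 100)},
    "B": {"poor": (210, 400), "good": (800 - 210, 1000 - 400)},
}


def rate(survived, total):
    """Survival rate as an exact fraction."""
    return Fraction(survived, total)


def overall(groups):
    """Survival rate across all groups of one hospital combined."""
    survived = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return rate(survived, total)


for name, groups in hospitals.items():
    for health, (survived, total) in groups.items():
        print(f"Hospital {name}, {health} health: {float(rate(survived, total)):.1%}")
    print(f"Hospital {name}, overall: {float(overall(groups)):.1%}")
```

Under these assumptions, Hospital B comes out ahead within each group (52.5% against 30% for poor health, and roughly 98.3% against 96.7% for good health), while Hospital A still leads overall (90% against 80%), because far more of Hospital B's patients arrived in poor health.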
Dialogue: 0,0:01:54.66,0:01:58.74,Default,,0000,0000,0000,,This often occurs when aggregated data\Nhides a conditional variable,
Dialogue: 0,0:01:58.74,0:02:01.38,Default,,0000,0000,0000,,sometimes known as a lurking variable,
Dialogue: 0,0:02:01.38,0:02:06.58,Default,,0000,0000,0000,,which is a hidden additional factor\Nthat significantly influences results.
Dialogue: 0,0:02:06.58,0:02:10.02,Default,,0000,0000,0000,,Here, the hidden factor is the relative\Nproportion of patients
Dialogue: 0,0:02:10.02,0:02:13.26,Default,,0000,0000,0000,,who arrive in good or poor health.
Dialogue: 0,0:02:13.26,0:02:16.54,Default,,0000,0000,0000,,Simpson's paradox isn't just\Na hypothetical scenario.
Dialogue: 0,0:02:16.54,0:02:18.92,Default,,0000,0000,0000,,It pops up from time \Nto time in the real world,
Dialogue: 0,0:02:18.92,0:02:22.13,Default,,0000,0000,0000,,sometimes in important contexts.
Dialogue: 0,0:02:22.13,0:02:24.13,Default,,0000,0000,0000,,One study in the UK appeared to show
Dialogue: 0,0:02:24.13,0:02:27.60,Default,,0000,0000,0000,,that smokers had a higher survival rate\Nthan nonsmokers
Dialogue: 0,0:02:27.60,0:02:29.85,Default,,0000,0000,0000,,over a twenty-year time period.
Dialogue: 0,0:02:29.85,0:02:33.31,Default,,0000,0000,0000,,That is, until dividing the participants\Nby age group
Dialogue: 0,0:02:33.31,0:02:37.82,Default,,0000,0000,0000,,showed that the nonsmokers \Nwere significantly older on average,
Dialogue: 0,0:02:37.82,0:02:40.93,Default,,0000,0000,0000,,and thus, more likely\Nto die during the trial period,
Dialogue: 0,0:02:40.93,0:02:44.44,Default,,0000,0000,0000,,precisely because they were living longer\Nin general.
Dialogue: 0,0:02:44.44,0:02:47.29,Default,,0000,0000,0000,,Here, the age groups \Nare the lurking variable,
Dialogue: 0,0:02:47.29,0:02:50.18,Default,,0000,0000,0000,,and are vital to correctly \Ninterpret the data.
Dialogue: 0,0:02:50.18,0:02:51.56,Default,,0000,0000,0000,,In another example,
Dialogue: 0,0:02:51.56,0:02:54.28,Default,,0000,0000,0000,,an analysis of Florida's \Ndeath penalty cases
Dialogue: 0,0:02:54.28,0:02:58.26,Default,,0000,0000,0000,,seemed to reveal \Nno racial disparity in sentencing
Dialogue: 0,0:02:58.26,0:03:01.58,Default,,0000,0000,0000,,between black and white defendants\Nconvicted of murder.
Dialogue: 0,0:03:01.58,0:03:06.40,Default,,0000,0000,0000,,But dividing the cases by the race\Nof the victim told a different story.
Dialogue: 0,0:03:06.40,0:03:07.97,Default,,0000,0000,0000,,In either situation,
Dialogue: 0,0:03:07.97,0:03:11.09,Default,,0000,0000,0000,,black defendants were more likely\Nto be sentenced to death.
Dialogue: 0,0:03:11.09,0:03:15.07,Default,,0000,0000,0000,,The slightly higher overall sentencing \Nrate for white defendants
Dialogue: 0,0:03:15.07,0:03:18.69,Default,,0000,0000,0000,,was due to the fact \Nthat cases with white victims
Dialogue: 0,0:03:18.69,0:03:21.36,Default,,0000,0000,0000,,were more likely \Nto elicit a death sentence
Dialogue: 0,0:03:21.36,0:03:24.09,Default,,0000,0000,0000,,than cases where the victim was black,
Dialogue: 0,0:03:24.09,0:03:28.48,Default,,0000,0000,0000,,and most murders occurred between\Npeople of the same race.
Dialogue: 0,0:03:28.48,0:03:31.32,Default,,0000,0000,0000,,So how do we avoid \Nfalling for the paradox?
Dialogue: 0,0:03:31.32,0:03:34.69,Default,,0000,0000,0000,,Unfortunately, \Nthere's no one-size-fits-all answer.
Dialogue: 0,0:03:34.69,0:03:38.50,Default,,0000,0000,0000,,Data can be grouped and divided\Nin any number of ways,
Dialogue: 0,0:03:38.50,0:03:42.11,Default,,0000,0000,0000,,and overall numbers may sometimes\Ngive a more accurate picture
Dialogue: 0,0:03:42.11,0:03:46.64,Default,,0000,0000,0000,,than data divided into misleading\Nor arbitrary categories.
Dialogue: 0,0:03:46.64,0:03:52.09,Default,,0000,0000,0000,,All we can do is carefully study the\Nactual situations the statistics describe
Dialogue: 0,0:03:52.09,0:03:55.98,Default,,0000,0000,0000,,and consider whether lurking variables\Nmay be present.
Dialogue: 0,0:03:55.98,0:03:59.38,Default,,0000,0000,0000,,Otherwise, we leave ourselves\Nvulnerable to those who would use data
Dialogue: 0,0:03:59.38,0:04:02.65,Default,,0000,0000,0000,,to manipulate others\Nand promote their own agendas.