
How statistics can be misleading - Mark Liddell

  • 0:07 - 0:09
    Statistics are persuasive.
  • 0:09 - 0:13
    So much so that people, organizations,
    and whole countries
  • 0:13 - 0:18
    base some of their most important
    decisions on organized data.
  • 0:18 - 0:19
    But there's a problem with that.
  • 0:19 - 0:23
    Any set of statistics might have something
    lurking inside it,
  • 0:23 - 0:27
    something that can turn the results
    completely upside down.
  • 0:27 - 0:31
    For example, imagine you need to choose
    between two hospitals
  • 0:31 - 0:34
    for an elderly relative's surgery.
  • 0:34 - 0:36
    Out of each hospital's
    last 1000 patients,
  • 0:36 - 0:40
    900 survived at Hospital A,
  • 0:40 - 0:43
    while only 800 survived at Hospital B.
  • 0:43 - 0:46
    So it looks like Hospital A
    is the better choice.
  • 0:46 - 0:48
    But before you make your decision,
  • 0:48 - 0:51
    remember that not all patients
    arrive at the hospital
  • 0:51 - 0:54
    with the same level of health.
  • 0:54 - 0:57
    And if we divide each hospital's
    last 1000 patients
  • 0:57 - 1:01
    into those who arrived in good health
    and those who arrived in poor health,
  • 1:01 - 1:04
    the picture starts to look very different.
  • 1:04 - 1:08
    Hospital A had only 100 patients
    who arrived in poor health,
  • 1:08 - 1:10
    of which 30 survived.
  • 1:10 - 1:15
    But Hospital B had 400,
    and they were able to save 210.
  • 1:15 - 1:17
    So Hospital B is the better choice
  • 1:17 - 1:21
    for patients who arrive
    at hospital in poor health,
  • 1:21 - 1:25
    with a survival rate of 52.5%.
  • 1:25 - 1:28
    And what if your relative's health
    is good when she arrives at the hospital?
  • 1:28 - 1:32
    Strangely enough, Hospital B is still
    the better choice,
  • 1:32 - 1:36
    with a survival rate of over 98%.
  • 1:36 - 1:39
    So how can Hospital A have a better
    overall survival rate
  • 1:39 - 1:45
    if Hospital B has better survival rates
    for patients in each of the two groups?
  • 1:45 - 1:49
    What we've stumbled upon is a case
    of Simpson's Paradox,
  • 1:49 - 1:52
    where the same set of data can appear
    to show opposite trends
  • 1:52 - 1:55
    depending on how it's grouped.
  • 1:55 - 1:59
    This often occurs when aggregated data
    hides a conditional variable,
  • 1:59 - 2:01
    sometimes known as a lurking variable,
  • 2:01 - 2:07
    which is a hidden additional factor
    that significantly influences results.
  • 2:07 - 2:10
    Here, the hidden factor is the relative
    proportion of patients
  • 2:10 - 2:13
    who arrive in good or poor health.
  • 2:13 - 2:17
    Simpson's Paradox isn't just
    a hypothetical scenario.
  • 2:17 - 2:19
    It pops up from time
    to time in the real world,
  • 2:19 - 2:22
    sometimes in important contexts.
  • 2:22 - 2:24
    One study in the U.K. appeared to show
  • 2:24 - 2:28
    that smokers had a higher survival rate
    than nonsmokers
  • 2:28 - 2:30
    over a twenty-year time period.
  • 2:30 - 2:33
    That is, until dividing the participants
    by age group
  • 2:33 - 2:38
    showed that the nonsmokers
    were significantly older on average,
  • 2:38 - 2:41
    and thus, more likely
    to die during the trial period,
  • 2:41 - 2:44
    precisely because they were living longer
    in general.
  • 2:44 - 2:47
    Here, the age groups
    are the lurking variable,
  • 2:47 - 2:50
    and are vital to correctly
    interpret the data.
  • 2:50 - 2:52
    In another example,
  • 2:52 - 2:54
    an analysis of Florida's
    death penalty cases
  • 2:54 - 2:58
    seemed to reveal
    no racial disparity in sentencing
  • 2:58 - 3:02
    between black and white defendants
    convicted of murder.
  • 3:02 - 3:06
    But dividing the cases by the race
    of the victim told a different story.
  • 3:06 - 3:08
    In either situation,
  • 3:08 - 3:11
    black defendants were more likely
    to be sentenced to death.
  • 3:11 - 3:15
    The slightly higher overall sentencing
    rate for white defendants
  • 3:15 - 3:19
    was due to the fact
    that cases with white victims
  • 3:19 - 3:21
    were more likely
    to elicit a death sentence
  • 3:21 - 3:24
    than cases where the victim was black,
  • 3:24 - 3:28
    and most murders occurred between
    people of the same race.
  • 3:28 - 3:31
    So how do we avoid
    falling for the paradox?
  • 3:31 - 3:35
    Unfortunately,
    there's no one-size-fits-all answer.
  • 3:35 - 3:39
    Data can be grouped and divided
    in any number of ways,
  • 3:39 - 3:42
    and overall numbers may sometimes
    give a more accurate picture
  • 3:42 - 3:47
    than data divided into misleading
    or arbitrary categories.
  • 3:47 - 3:52
    All we can do is carefully study the
    actual situations the statistics describe
  • 3:52 - 3:56
    and consider whether lurking variables
    may be present.
  • 3:56 - 3:59
    Otherwise, we leave ourselves
    vulnerable to those who would use data
  • 3:59 - 4:03
    to manipulate others
    and promote their own agendas.
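The hospital example is the one case in the video that comes with numbers, so its arithmetic can be checked directly. Below is a minimal Python sketch of that check. The poor-health figures (30 of 100 at Hospital A, 210 of 400 at Hospital B) and the overall totals (900 and 800 survivors out of 1000) come from the video. The good-health figures (870 of 900 and 590 of 600) are not stated; they are derived here by subtraction. The function names are illustrative only.

```python
from fractions import Fraction

# (survivors, patients) per arrival-health group.
# The poor-health counts and the 1000-patient totals are quoted in the video;
# the good-health counts are derived by subtraction (an assumption of this sketch).
HOSPITALS = {
    "A": {"poor": (30, 100), "good": (900 - 30, 1000 - 100)},
    "B": {"poor": (210, 400), "good": (800 - 210, 1000 - 400)},
}


def rate(survived, total):
    """Exact survival rate for one group."""
    return Fraction(survived, total)


def overall_rate(groups):
    """Pooled survival rate: all survivors over all patients."""
    survived = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return Fraction(survived, total)


def weighted_overall(groups):
    """The same pooled rate, written as group rates weighted by patient mix."""
    total = sum(n for _, n in groups.values())
    return sum(Fraction(n, total) * rate(s, n) for s, n in groups.values())


def is_simpsons_reversal(x, y):
    """True if y beats x in every group, yet x beats y once the groups are pooled."""
    y_wins_every_group = all(rate(*y[g]) > rate(*x[g]) for g in x)
    return y_wins_every_group and overall_rate(x) > overall_rate(y)


if __name__ == "__main__":
    for name, groups in HOSPITALS.items():
        parts = ", ".join(f"{g}: {float(rate(*sn)):.1%}" for g, sn in groups.items())
        print(f"Hospital {name}: overall {float(overall_rate(groups)):.1%} ({parts})")
    print("Simpson's reversal:", is_simpsons_reversal(HOSPITALS["A"], HOSPITALS["B"]))
```

With these figures, the script reports 90.0% overall for Hospital A (30.0% poor, 96.7% good) and 80.0% for Hospital B (52.5% poor, 98.3% good), and it flags the reversal. `weighted_overall` shows where the lurking variable comes in. Each pooled rate is an average of the group rates, weighted by how the hospital's patients are split between the groups. Hospital B's patients were 40% poor-health arrivals, compared with 10% at Hospital A. That heavier weight on the low-survival group pulls B's pooled rate below A's, even though B does better within each group.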
Speaker:
Mark Liddell
Video Language:
English
Team:
closed TED
Project:
TED-Ed
Duration:
04:19
