Utilizando el poder de millones de mentes humanas | Luis von Ahn | TEDxRíodelaPlata

  • 0:02 - 0:05
    Hello. Well, let me start
    by asking you a question:
  • 0:06 - 0:09
    How many of you had to fill out
    some sort of web form
  • 0:09 - 0:13
    where you've been asked to read
    a distorted sequence of characters?
  • 0:16 - 0:18
    How many of you found it really annoying?
  • 0:19 - 0:21
    Okay, outstanding. So I invented that.
  • 0:21 - 0:22
    (Laughter)
  • 0:22 - 0:26
    (Applause)
  • 0:30 - 0:32
    That thing is called a CAPTCHA.
  • 0:32 - 0:36
    And it is there to make sure
    the entity filling out the form
  • 0:36 - 0:39
    is actually a human and not
    some sort of computer program
  • 0:39 - 0:43
    that was written to submit the form
    millions and millions of times.
  • 0:43 - 0:45
    The reason it works is because humans
  • 0:45 - 0:47
    have no trouble reading
    these squiggly characters,
  • 0:47 - 0:50
    whereas computer programs
    simply can't do it as well yet.
  • 0:50 - 0:54
    For example, when you're buying tickets
    online to attend a concert,
  • 0:54 - 0:57
    the reason you have to type
  • 0:58 - 1:02
    these distorted characters is to prevent
  • 1:02 - 1:04
    scalpers from writing a program
  • 1:04 - 1:07
    that can buy millions of tickets,
    two at a time.
  • 1:07 - 1:09
    CAPTCHAs are used all over the Internet.
  • 1:09 - 1:11
    And since they're used so often,
  • 1:11 - 1:16
    a lot of times the precise sequence
    of random characters shown to the user
  • 1:16 - 1:17
    is not so fortunate.
  • 1:17 - 1:21
    So this is an example from Yahoo.
  • 1:21 - 1:23
    The random characters that happened
  • 1:23 - 1:27
    to be shown to the user were W, A, I, T
  • 1:27 - 1:30
    which spells a word.
  • 1:30 - 1:33
    But the best part is the message
    that the Yahoo help desk
  • 1:33 - 1:36
    got about 20 minutes later.
  • 1:36 - 1:38
    ["Help! I've been waiting
    for over 20 minutes,
  • 1:38 - 1:42
    and nothing happens."]
    (Laughter)
  • 1:42 - 1:46
    This, of course, is not as bad
    as this poor person.
  • 1:46 - 1:49
    [REBOOT]
    (Laughter)
  • 1:49 - 1:52
    I could tell funny stories
    about CAPTCHAs for hours,
  • 1:52 - 1:53
    but since I cannot do that
  • 1:53 - 1:56
    let me tell you about a project
    that we did afterwards
  • 1:56 - 1:58
    which is sort of the next
    evolution of CAPTCHA.
  • 1:58 - 2:00
    We call it reCAPTCHA,
  • 2:00 - 2:03
    which is something that
    we started at the University,
  • 2:03 - 2:05
    and then we turned it
    into a startup company.
  • 2:05 - 2:07
    And then Google acquired this company.
  • 2:07 - 2:10
    So everything I'm going to say
    for the next 5 minutes
  • 2:10 - 2:12
    is owned by Google.
  • 2:12 - 2:15
    So, please, do not spread the word.
  • 2:15 - 2:19
    So let me tell you
    how this project started.
  • 2:19 - 2:23
    It turns out that about 200 million
    CAPTCHAs are typed every day.
  • 2:23 - 2:26
    When I first heard this,
    I was quite proud of myself.
  • 2:26 - 2:29
    I thought, "look at the impact
    that my research has had."
  • 2:29 - 2:31
    But then I started feeling bad.
  • 2:31 - 2:33
    They are not only obnoxious, but also
  • 2:33 - 2:37
    each time you type a CAPTCHA
  • 2:37 - 2:39
    essentially you waste
    10 seconds of your time.
  • 2:39 - 2:43
    And if you multiply that
    by 200 million you get that
  • 2:43 - 2:47
    humanity as a whole is wasting
    about 500,000 hours every day
  • 2:47 - 2:49
    typing these annoying CAPTCHAs.
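The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the two figures quoted in the talk (200 million CAPTCHAs per day, 10 seconds each); the variable names are illustrative.

```python
# Figures quoted in the talk; variable names are illustrative.
captchas_per_day = 200_000_000
seconds_per_captcha = 10

# Total time spent per day, converted from seconds to hours.
wasted_seconds = captchas_per_day * seconds_per_captcha
wasted_hours = wasted_seconds / 3600

print(f"{wasted_hours:,.0f} hours per day")  # 555,556 hours per day
```

The result is a little over the "about 500,000 hours" mentioned, which is consistent with a rounded figure.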
  • 2:49 - 2:51
    So then I started feeling bad.
  • 2:51 - 2:54
    And then I started thinking,
    is there any way
  • 2:54 - 2:58
    we can use this effort for something
    that is good for humanity?
  • 2:58 - 3:04
    While you're typing a CAPTCHA,
    during those 10 seconds,
  • 3:04 - 3:06
    your brain is doing something amazing.
  • 3:06 - 3:10
    Your brain is doing something
    that computers cannot yet do.
  • 3:10 - 3:11
    So can we get you to do some
  • 3:11 - 3:13
    useful work for mankind?
  • 3:13 - 3:14
    Putting it differently,
  • 3:14 - 3:17
    is there some humongous problem
    that we cannot yet get
  • 3:17 - 3:19
    computers to solve,
  • 3:19 - 3:21
    yet we can split into tiny chunks
  • 3:21 - 3:23
    such that each time
    somebody solves a CAPTCHA
  • 3:23 - 3:25
    they solve a little bit of this problem?
  • 3:25 - 3:28
    The answer to that is "yes,"
    and this is what we're doing now.
  • 3:28 - 3:33
    So what you may not know is that
    nowadays while you're typing a CAPTCHA,
  • 3:33 - 3:36
    not only are you authenticating
    yourself as a human,
  • 3:36 - 3:39
    but in addition you're actually
    helping us to digitize books.
  • 3:39 - 3:41
    So let me explain how this works.
  • 3:41 - 3:44
    So there are a lot of projects out there
    trying to digitize the existing books.
  • 3:44 - 3:46
    Google is digitizing books.
  • 3:46 - 3:48
    Amazon, with the Kindle,
    is digitizing books.
  • 3:48 - 3:51
    The way this works
    is you start with an old book.
  • 3:51 - 3:53
    You've seen those things, right?
  • 3:53 - 3:54
    Like a book?
  • 3:54 - 3:56
    (Laughter)
  • 3:56 - 3:58
    So you start with a book,
    and then you scan it.
  • 3:58 - 4:04
    Now scanning a book is like taking
    a digital photograph of every page.
  • 4:08 - 4:10
    The next step in the process
    is that the computer
  • 4:10 - 4:15
    needs to be able to decipher
    all of the words in this image.
  • 4:15 - 4:19
    Now the problem is that for older books
    that were written several years ago
  • 4:19 - 4:21
    the computer cannot recognize
    a lot of the words
  • 4:21 - 4:25
    because the ink has faded
    and the pages have turned yellow.
  • 4:25 - 4:27
    Thus the words look a bit different
  • 4:27 - 4:30
    and the computer cannot recognize them.
  • 4:30 - 4:32
    So, for books that were written
    more than 50 years ago,
  • 4:32 - 4:36
    the computer cannot recognize
    about 30 percent of the words.
  • 4:36 - 4:37
    So what we're doing now
  • 4:37 - 4:40
    is we're taking all of the words
    that the computer cannot recognize
  • 4:40 - 4:44
    and we're getting people to read them
    for us while they're typing
  • 4:44 - 4:45
    a CAPTCHA on the Internet.
  • 4:45 - 4:48
    So, the next time you type a CAPTCHA -
  • 4:48 - 4:54
    (Applause)
  • 4:54 - 4:58
    these words that you're typing
  • 4:58 - 5:01
    are actually words that are coming
    from books that are being digitized
  • 5:01 - 5:03
    that the computer could not recognize.
  • 5:03 - 5:07
    And now the reason we have
    two words nowadays instead of one
  • 5:07 - 5:12
    is because we need to verify
    if the answer is correct.
  • 5:14 - 5:18
    Because one of the words is such
    that the system knows what it was,
  • 5:18 - 5:21
    and the other is a word that
    the system just got out of a book,
  • 5:21 - 5:24
    it didn't know what it was,
    and it's presented to you.
  • 5:24 - 5:27
    We're going to ask you to type both words.
  • 5:27 - 5:29
    And we won't tell you which one's which.
  • 5:29 - 5:31
    And if you type the correct word
  • 5:31 - 5:33
    for the one for which the system
    already knows the answer,
  • 5:34 - 5:35
    it assumes you are human,
  • 5:35 - 5:40
    and it also gets some confidence
    that you typed the other word correctly.
  • 5:40 - 5:43
    And if we repeat this process
    with about 10 different people
  • 5:43 - 5:46
    and all of them agree
    on what the new word is,
  • 5:46 - 5:48
    we are very confident
    that this new word
  • 5:48 - 5:50
    was accurately digitized.
  • 5:50 - 5:52
    So this is how the system works.
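The two-word scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual reCAPTCHA implementation: the class name, method names, and the exact agreement rule (the first N verified answers must all match) are assumptions made for the example.

```python
from collections import Counter, defaultdict

REQUIRED_AGREEMENT = 10  # the talk mentions roughly 10 people; exact value is an assumption


class TwoWordChallenge:
    """Hypothetical sketch: one known control word plus one unknown word."""

    def __init__(self, required_agreement=REQUIRED_AGREEMENT):
        self.required_agreement = required_agreement
        self.votes = defaultdict(Counter)  # unknown word id -> typed answers
        self.digitized = {}                # unknown word id -> accepted text

    def submit(self, control_answer, typed_control, unknown_id, typed_unknown):
        """Return True if the user is treated as human."""
        # The control word decides whether the user passes at all.
        if typed_control.strip().lower() != control_answer.lower():
            return False  # failed the known word, so the other guess is ignored
        if unknown_id in self.digitized:
            return True
        # A correct control word lends some confidence to the other answer.
        guess = typed_unknown.strip().lower()
        answers = self.votes[unknown_id]
        answers[guess] += 1
        # Accept the word once enough people have typed it and all agree.
        if len(answers) == 1 and answers[guess] >= self.required_agreement:
            self.digitized[unknown_id] = guess
        return True
```

In this simplified version a single disagreeing answer blocks acceptance; a real system would need some way to resolve disagreements, which the talk does not describe.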
  • 5:52 - 5:54
    The good thing is that
    it has been very successful.
  • 5:54 - 5:58
    We're digitizing about
    100 million words a day,
  • 5:58 - 6:02
    which is the equivalent of
    about two million books a year.
  • 6:02 - 6:04
    And this is all being done
    one word at a time
  • 6:04 - 6:07
    by just people typing CAPTCHAs
    on the Internet.
  • 6:07 - 6:11
    Now, since we're doing
    so many words per day,
  • 6:11 - 6:15
    funny things can happen.
  • 6:15 - 6:18
    And this is especially true
    because now we're giving people
  • 6:18 - 6:22
    two randomly chosen English words
    next to each other.
  • 6:22 - 6:24
    So funny things can happen.
  • 6:24 - 6:26
    For example, we presented this word.
  • 6:26 - 6:30
    It's the word "Christians";
    there's nothing wrong with it.
  • 6:30 - 6:34
    But if you present it along with
    another randomly chosen word,
  • 6:34 - 6:35
    bad things can happen.
  • 6:35 - 6:38
    So we get this.
    [bad Christians]
  • 6:38 - 6:39
    (Laughter)
  • 6:39 - 6:41
    It's quite funny.
  • 6:41 - 6:43
    But it's even worse,
    because the particular website
  • 6:43 - 6:45
    where we showed this
  • 6:45 - 6:50
    actually happened to be called
    The Embassy of the Kingdom of God.
  • 6:50 - 6:52
    (Laughter)
  • 6:52 - 6:53
    Oops!
  • 6:53 - 6:55
    Here's another really bad one.
  • 6:55 - 6:59
    American politician, JohnEdwards.com
    [Damn liberal]
  • 6:59 - 7:04
    (Laughter)
  • 7:04 - 7:07
    So we keep on insulting people every day.
  • 7:07 - 7:09
    Now, we're not just insulting people.
  • 7:09 - 7:12
    Quite often,
    interesting things can happen.
  • 7:12 - 7:15
    So this actually has given rise
    to an Internet meme
  • 7:15 - 7:17
    that millions of people
    have participated in,
  • 7:17 - 7:20
    which is called CAPTCHA art.
  • 7:20 - 7:22
    Here's how it works.
  • 7:22 - 7:26
    Imagine you're using the
    Internet and you see a CAPTCHA
  • 7:26 - 7:28
    that you think is somewhat peculiar,
  • 7:28 - 7:30
    like this CAPTCHA.
  • 7:30 - 7:33
    Then what you're supposed to do
    is you take a screenshot of it.
  • 7:34 - 7:35
    Then of course,
    you fill out the CAPTCHA
  • 7:35 - 7:38
    so that you help us
    digitize a book, please.
  • 7:38 - 7:40
    But then, first you take a screenshot,
  • 7:40 - 7:44
    and then you draw something
    that is related to it, like this.
  • 7:44 - 7:46
    [invisible toaster]
  • 7:46 - 7:47
    (Laughter)
  • 7:47 - 7:51
    It's just an example of CAPTCHA art.
  • 7:51 - 7:54
    There are tens of thousands of these.
    Some of them are interesting.
  • 7:54 - 7:58
    Some of them are very cute.
    [clenched it!]
  • 7:58 - 8:01
    Some of them are funnier.
  • 8:01 - 8:08
    [stoned founders]
    (Laughter)
  • 8:08 - 8:12
    This is my favorite number
    of this whole project: 900 million.
  • 8:12 - 8:15
    This is the number of distinct people
  • 8:15 - 8:17
    that have helped us digitize
    at least one word
  • 8:17 - 8:19
    out of a book through reCAPTCHA.
  • 8:19 - 8:21
    A little over 10%
    of the world's population
  • 8:21 - 8:23
    has helped digitize human knowledge.
  • 8:23 - 8:26
    And it is numbers like these
    that motivate my research agenda.
  • 8:26 - 8:30
    So the question that motivates
    my research is the following:
  • 8:30 - 8:33
    If you look at humanity's
    large-scale achievements,
  • 8:33 - 8:35
    these really big things that humanity
    has accomplished together
  • 8:35 - 8:40
    like building the pyramids of Egypt
    or the Panama Canal
  • 8:40 - 8:42
    or putting a man on the Moon --
  • 8:42 - 8:45
    there is a curious fact about them,
  • 8:45 - 8:48
    and it is that they were all done
    with about the same number of people.
  • 8:48 - 8:51
    They were all done with
    about 100,000 people.
  • 8:51 - 8:53
    We can ask ourselves
    why it is that all of them used
  • 8:53 - 8:55
    about the same number of people.
  • 8:55 - 8:57
    The reason for that is because,
    before the Internet,
  • 8:57 - 9:01
    coordinating more than
    100,000 people was impossible.
  • 9:01 - 9:04
    But now with the Internet,
    I've just shown you a project
  • 9:04 - 9:06
    where we've coordinated
    900 million people.
  • 9:06 - 9:09
    So the question that
    motivates my research is,
  • 9:09 - 9:11
    if we can put a man on the Moon
    with 100,000 people,
  • 9:11 - 9:14
    what can we do with
    100 million people?
  • 9:14 - 9:17
    Based on this question,
    we've been working on a lot of projects.
  • 9:17 - 9:19
    I will not tell you
    about all we have done.
  • 9:19 - 9:22
    But, let me tell you about
    one that we are working on now.
  • 9:22 - 9:25
    We've been working on this
    for about two years now.
  • 9:26 - 9:30
    And we're going to launch it
    in about 30 days.
  • 9:30 - 9:33
    It's called Duolingo.
  • 9:33 - 9:35
    This project started by asking
    the following question:
  • 9:36 - 9:40
    How can we get 100 million people
  • 9:40 - 9:45
    translating the Web into
    every major language for free?
  • 9:45 - 9:48
    So there's a lot of things
    to say about this question.
  • 9:48 - 9:49
    First of all, translating the Web.
  • 9:49 - 9:52
    Right now it is partitioned
    into multiple languages.
  • 9:52 - 9:54
    A large fraction of it is in English.
  • 9:54 - 9:57
    If you don't know any English,
    you can't access it.
  • 9:57 - 9:59
    But large fractions are
    in other languages,
  • 9:59 - 10:01
    and if you don't know the languages
    you can't access them.
  • 10:01 - 10:05
    I would like to translate all
    of the Web into every major language.
  • 10:05 - 10:08
    Now some of you may say,
  • 10:08 - 10:11
    why can't we use computers to translate?
  • 10:11 - 10:15
    Machine translation nowadays is starting
    to translate some sentences.
  • 10:15 - 10:17
    Well the problem with that is that
  • 10:17 - 10:19
    it's not yet good enough,
  • 10:19 - 10:23
    and it probably won't be
    for the next 20 to 30 years.
  • 10:23 - 10:27
    So let me show you an example of something
  • 10:27 - 10:28
    that was translated by a machine.
  • 10:28 - 10:33
    Actually it was a forum
    about programming questions.
  • 10:38 - 10:40
    It was a programming question
    translated from Japanese
  • 10:40 - 10:45
    into English, and from there into Spanish,
    though my translation is good.
  • 10:45 - 10:47
    The other one is bad. You'll see.
  • 10:47 - 10:50
    So I'll just let you read.
  • 10:50 - 10:53
    This person starts by apologizing for
    the machine translation.
  • 10:53 - 10:56
    Indeed, this was done with
    the best translation program
  • 10:56 - 10:57
    from Japanese into English.
  • 10:57 - 11:03
    Remember, it's a question
    about computer programming.
  • 11:03 - 11:06
    So here you have the preamble
    to the question.
  • 11:06 - 11:12
    [At often, the goat-time install
    a error is vomit.] (Laughter)
  • 11:12 - 11:14
    Then comes the first part of the question.
  • 11:14 - 11:20
    [How many times like the wind, a pole,
    and the dragon?] (Laughter)
  • 11:20 - 11:22
    Then comes my favorite part
    of the question.
  • 11:22 - 11:26
    [This insult to father's stones?]
    (Laughter)
  • 11:26 - 11:28
    And then comes my favorite
    part of the whole thing.
  • 11:28 - 11:32
    [Please apologize for your stupidity.
    There are a many thank you.] (Laughter)
  • 11:32 - 11:35
    Okay, so computer translation
    isn't yet good enough.
  • 11:35 - 11:37
    We need people to translate.
  • 11:37 - 11:39
    So what I want is to get
    100 million people
  • 11:39 - 11:43
    translating the Web into
    every major language for free.
  • 11:43 - 11:46
    I couldn't afford to pay
    100 million people for the job,
  • 11:46 - 11:47
    so I want them to do it for free.
  • 11:47 - 11:49
    Now if this is what you want to do,
  • 11:49 - 11:51
    you pretty quickly realize
    you're going to run into
  • 11:52 - 11:56
    two pretty big obstacles
    that need to be overcome.
  • 11:56 - 12:00
    The first one is a lack of bilinguals.
  • 12:00 - 12:03
    So I don't even know if there
    exist 100 million people out there
  • 12:03 - 12:07
    using the Web who are bilingual
    enough to help us translate.
  • 12:07 - 12:08
    That's a big problem.
  • 12:08 - 12:10
    The other problem is a lack of motivation.
  • 12:10 - 12:14
    How are we going to motivate people
    to actually translate the Web for free?
  • 12:14 - 12:19
    After thinking about this for months,
  • 12:19 - 12:21
    we realized there's actually a way
  • 12:21 - 12:24
    to solve both these problems
    with the same solution.
  • 12:24 - 12:27
    We realized that there's a way
    to kill two birds with one stone.
  • 12:27 - 12:31
    And that is to transform
    language translation
  • 12:31 - 12:34
    into something that millions
    of people want to do,
  • 12:34 - 12:38
    and that also helps with
    the problem of lack of bilinguals,
  • 12:38 - 12:40
    and that is language education.
  • 12:40 - 12:44
    It turns out there are millions of people
    wanting to learn other languages.
  • 12:44 - 12:49
    Today there are over 1.2 billion people
    learning a foreign language.
  • 12:49 - 12:52
    And it's not just because
    they're being forced to do so in school.
  • 12:52 - 12:55
    For example, in the United States alone,
    there are over
  • 12:55 - 12:59
    5 million people who have paid
    over $500 for software
  • 12:59 - 13:00
    to learn a new language.
  • 13:00 - 13:03
    Many people want to learn a new language.
  • 13:03 - 13:07
    So what we've been working on
    for the last two years
  • 13:07 - 13:09
    is a new website called Duolingo,
  • 13:09 - 13:12
    where the basic idea is
    people learn a new language
  • 13:12 - 13:17
    for free, while simultaneously
    translating the Web.
  • 13:18 - 13:20
    And so they're learning by doing.
  • 13:20 - 13:22
    So this is how it works.
  • 13:22 - 13:26
    The way this works is whenever
    you're just a beginner,
  • 13:26 - 13:28
    we give you very,
    very simple sentences on the Web.
  • 13:28 - 13:32
    And if you don't know a word
    we'll tell you what each word means
  • 13:32 - 13:34
    when you are asked
    to "translate this sentence."
  • 13:34 - 13:36
    And it turns out that it really works.
  • 13:36 - 13:38
    Even though people know nothing
    of the language
  • 13:38 - 13:42
    if we explain what each word means,
    they'll be able to translate it.
  • 13:42 - 13:43
    As you see how other people translate
  • 13:43 - 13:46
    the same sentence,
    you start learning the language.
  • 13:46 - 13:48
    And as you get more and more advanced,
  • 13:48 - 13:51
    we give you more and more
    complex sentences to translate.
  • 13:51 - 13:53
    This is how
    you are going to help us translate.
  • 13:53 - 13:55
    This is how the site works.
  • 13:55 - 13:57
    We're mostly done building it,
  • 13:57 - 13:59
    and now we're testing it.
  • 13:59 - 14:01
    When we started working on this
  • 14:01 - 14:04
    I didn't think it could work, really.
  • 14:04 - 14:06
    But it turns out that
    it works, indeed. It's amazing.
  • 14:06 - 14:09
    First, people really
    can learn a language with it.
  • 14:09 - 14:12
    In this case we are testing it with people
  • 14:12 - 14:15
    who know English
    and want to learn Spanish,
  • 14:15 - 14:16
    and vice versa.
  • 14:16 - 14:18
    So people really do learn a language.
  • 14:18 - 14:20
    And they learn it about as well
  • 14:20 - 14:22
    as the leading language learning software,
  • 14:22 - 14:24
    which is very good,
    but perhaps more surprisingly,
  • 14:25 - 14:29
    the translations that we get from people
    using the site are very good.
  • 14:29 - 14:31
    They are as accurate as those
  • 14:31 - 14:33
    of professional language translators.
  • 14:33 - 14:36
    Now of course,
    we play a trick here and it is that
  • 14:36 - 14:41
    we combine the translations
    of multiple beginners, several students,
  • 14:41 - 14:42
    and choose the best.
  • 14:43 - 14:45
    But it turns out
    that that best translation
  • 14:45 - 14:48
    is as good as those of
    professional language translators.
  • 14:48 - 14:54
    Now even though we're combining
    multiple translations,
  • 14:55 - 14:57
    another good thing about Duolingo is that
  • 14:57 - 15:00
    the site actually
    can translate pretty fast.
  • 15:00 - 15:04
    So let me show you an estimate
    of how fast we could translate.
  • 15:04 - 15:07
    If we wanted to translate Wikipedia
    from English into Spanish --
  • 15:07 - 15:09
    of course, Wikipedia exists in Spanish
    but is much smaller
  • 15:09 - 15:12
    than its English counterpart,
    about 20 percent of its size --
  • 15:12 - 15:16
    If we wanted to translate Wikipedia
    from English into Spanish using Duolingo
  • 15:16 - 15:21
    we could do it in five weeks
    with 100,000 active users
  • 15:21 - 15:22
    learning English with Duolingo.
  • 15:23 - 15:26
    And we could do it in about 80 hours
    with a million active users.
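The two figures above are consistent with each other if translation throughput scales linearly with the number of active users. The sketch below makes that assumption explicit; the function name and the linear-scaling model are illustrative and are not taken from the talk.

```python
HOURS_PER_WEEK = 7 * 24


def translation_hours(active_users, baseline_users=100_000, baseline_weeks=5):
    """Estimated hours to finish, assuming throughput grows linearly with users.

    The baseline (100,000 users, five weeks) is the figure quoted in the talk.
    """
    baseline_hours = baseline_weeks * HOURS_PER_WEEK
    return baseline_hours * baseline_users / active_users


print(translation_hours(100_000))    # 840.0 hours, i.e. five weeks
print(translation_hours(1_000_000))  # 84.0 hours, close to the "about 80 hours" quoted
```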
  • 15:26 - 15:28
    Since all the projects that
    my group has worked on so far
  • 15:28 - 15:30
    have gotten millions of users,
  • 15:30 - 15:33
    we're hopeful that we'll be able
    to translate the Web for free.
  • 15:33 - 15:36
    We haven't yet launched Duolingo,
  • 15:36 - 15:39
    (Applause)
  • 15:45 - 15:47
    I'd like to leave you with --
  • 15:48 - 15:50
    we haven't yet launched Duolingo,
    but we plan to do so in 30 days.
  • 15:50 - 15:52
    If you visit Duolingo.com, you can sign up
  • 15:52 - 15:57
    to be part of our private beta
    in about 30 days.
  • 15:57 - 15:58
    Help us.
  • 15:58 - 15:59
    Thank you.
  • 15:59 - 16:00
    (Applause)
Speaker:
Luis von Ahn
Video Language:
Spanish
Duration:
16:14
