Utilizando el poder de millones de mentes humanas | Luis von Ahn | TEDxRíodelaPlata
-
0:02 - 0:05Hello. Well, let me start
by asking you a question: -
0:06 - 0:09How many of you have had to fill out
some sort of web form -
0:09 - 0:13where you've been asked to read
a distorted sequence of characters? -
0:16 - 0:18How many of you found it really annoying?
-
0:19 - 0:21Okay, outstanding. So I invented that.
-
0:21 - 0:22(Laughter)
-
0:22 - 0:26(Applause)
-
0:30 - 0:32That thing is called a CAPTCHA.
-
0:32 - 0:36And it is there to make sure
the entity filling out the form -
0:36 - 0:39is actually a human and not
some sort of computer program -
0:39 - 0:43that was written to submit the form
millions and millions of times. -
0:43 - 0:45The reason it works is because humans
-
0:45 - 0:47have no trouble reading
these squiggly characters, -
0:47 - 0:50whereas computer programs
simply can't do it as well yet. -
0:50 - 0:54For example, when you're buying tickets
online for attending a concert -
0:54 - 0:57the reason you have to type
-
0:58 - 1:02these distorted characters is to prevent
-
1:02 - 1:04scalpers from writing a program
-
1:04 - 1:07that can buy millions of tickets,
two at a time. -
1:07 - 1:09CAPTCHAs are used all over the Internet.
-
1:09 - 1:11And since they're used so often,
-
1:11 - 1:16a lot of times the precise sequence
of random characters shown to the user -
1:16 - 1:17is not so fortunate.
-
1:17 - 1:21So this is an example from Yahoo.
-
1:21 - 1:23The random characters that happened
-
1:23 - 1:27to be shown to the user were W, A, I, T
-
1:27 - 1:30which spells a word.
-
1:30 - 1:33But the best part is the message
that the Yahoo help desk -
1:33 - 1:36got about 20 minutes later.
-
1:36 - 1:38["Help! I've been waiting
for over 20 minutes, -
1:38 - 1:42and nothing happens."]
(Laughter) -
1:42 - 1:46This of course, is not as bad
as this poor person. -
1:46 - 1:49[REBOOT]
(Laughter) -
1:49 - 1:52I could tell funny stories
about CAPTCHAs for hours, -
1:52 - 1:53but since I cannot do that
-
1:53 - 1:56let me tell you about a project
that we did afterwards -
1:56 - 1:58which is sort of the next
evolution of CAPTCHA. -
1:58 - 2:00We call it reCAPTCHA,
-
2:00 - 2:03which is something that
we started at the University, -
2:03 - 2:05and then we turned it
into a startup company. -
2:05 - 2:07And then Google acquired this company.
-
2:07 - 2:10So everything I'm going to say
for the next 5 minutes -
2:10 - 2:12is owned by Google.
-
2:12 - 2:15So, please, do not spread the word.
-
2:15 - 2:19So let me tell you
how this project started. -
2:19 - 2:23It turns out that about 200 million
CAPTCHAs are typed every day. -
2:23 - 2:26When I first heard this,
I was quite proud of myself. -
2:26 - 2:29I thought, "Look at the impact
that my research has had." -
2:29 - 2:31But then I started feeling bad.
-
2:31 - 2:33They are not only obnoxious, but also
-
2:33 - 2:37each time you type a CAPTCHA
-
2:37 - 2:39essentially you waste
10 seconds of your time. -
2:39 - 2:43And if you multiply that
by 200 million you get that -
2:43 - 2:47humanity as a whole is wasting
about 500,000 hours every day -
2:47 - 2:49typing these annoying CAPTCHAs.
-
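
The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two figures quoted in the talk (200 million CAPTCHAs a day, 10 seconds each); the variable names are illustrative.

```python
# Figures quoted in the talk; the variable names are illustrative.
captchas_per_day = 200_000_000
seconds_per_captcha = 10

# Total time spent per day, converted from seconds to hours.
total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600

print(f"{total_hours:,.0f} hours per day")
```

The result is a little over 550,000 hours, which matches the "about 500,000 hours every day" quoted in the talk.
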
2:49 - 2:51So then I started feeling bad.
-
2:51 - 2:54And then I started thinking,
is there any way -
2:54 - 2:58we can use this effort for something
that is good for humanity? -
2:58 - 3:04While you're typing a CAPTCHA,
during those 10 seconds, -
3:04 - 3:06your brain is doing something amazing.
-
3:06 - 3:10Your brain is doing something
that computers cannot yet do. -
3:10 - 3:11So can we get you to do some
-
3:11 - 3:13useful work for mankind?
-
3:13 - 3:14Putting it differently,
-
3:14 - 3:17is there some humongous problem
that we cannot yet get -
3:17 - 3:19computers to solve,
-
3:19 - 3:21yet that we can split into tiny chunks
-
3:21 - 3:23such that each time
somebody solves a CAPTCHA -
3:23 - 3:25they solve a little bit of this problem?
-
3:25 - 3:28The answer to that is "yes,"
and this is what we're doing now. -
3:28 - 3:33So what you may not know is that
nowadays while you're typing a CAPTCHA, -
3:33 - 3:36not only are you authenticating
yourself as a human, -
3:36 - 3:39but in addition you're actually
helping us to digitize books. -
3:39 - 3:41So let me explain how this works.
-
3:41 - 3:44So there are a lot of projects out there
trying to digitize the existing books. -
3:44 - 3:46Google is digitizing books.
-
3:46 - 3:48Amazon, with the Kindle,
is digitizing books. -
3:48 - 3:51The way this works
is you start with an old book. -
3:51 - 3:53You've seen those things, right?
-
3:53 - 3:54Like a book?
-
3:54 - 3:56(Laughter)
-
3:56 - 3:58So you start with a book,
and then you scan it. -
3:58 - 4:04Now scanning a book is like taking
a digital photograph of every page. -
4:08 - 4:10The next step in the process
is that the computer -
4:10 - 4:15needs to be able to decipher
all of the words in this image. -
4:15 - 4:19Now the problem is that for older books
that were written many years ago -
4:19 - 4:21the computer cannot recognize
a lot of the words -
4:21 - 4:25because the ink has faded
and the pages have turned yellow. -
4:25 - 4:27Thus the words look a bit different
-
4:27 - 4:30and the computer cannot recognize them.
-
4:30 - 4:32So, for books that were written
more than 50 years ago, -
4:32 - 4:36the computer cannot recognize
about 30 percent of the words. -
4:36 - 4:37So what we're doing now
-
4:37 - 4:40is we're taking all of the words
that the computer cannot recognize -
4:40 - 4:44and we're getting people to read them
for us while they're typing -
4:44 - 4:45a CAPTCHA on the Internet.
-
4:45 - 4:48So, the next time you type a CAPTCHA -
-
4:48 - 4:54(Applause)
-
4:54 - 4:58these words that you're typing
-
4:58 - 5:01are actually words that are coming
from books that are being digitized -
5:01 - 5:03that the computer could not recognize.
-
5:03 - 5:07And now the reason we have
two words nowadays instead of one -
5:07 - 5:12is because we need to verify
if the answer is correct. -
5:14 - 5:18Because one of the words is such
that the system knows what it was, -
5:18 - 5:21and the other is a word that
the system just got out of a book, -
5:21 - 5:24it didn't know what it was,
and it's presented to you. -
5:24 - 5:27We're going to ask you to type both words.
-
5:27 - 5:29And we won't tell you which one's which.
-
5:29 - 5:31And if you type the correct word
-
5:31 - 5:33for the one for which the system
already knows the answer, -
5:34 - 5:35it assumes you are human,
-
5:35 - 5:40and it also gets some confidence
that you typed the other word correctly. -
5:40 - 5:43And if we repeat this process
with, like, 10 different people -
5:43 - 5:46and all of them agree
on what the new word is, -
5:46 - 5:48we are very confident
that this new word -
5:48 - 5:50was accurately digitized.
-
5:50 - 5:52So this is how the system works.
-
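
The two-word check described above can be sketched roughly as follows. This is not reCAPTCHA's actual code: the function names, the case-insensitive comparison, and the rule of accepting a reading only when 10 passing users all agree are assumptions made for illustration, based only on what the talk describes.

```python
from collections import Counter


def check_response(control_answer, typed_control, typed_unknown, votes):
    """Return True if the user got the known (control) word right.

    When the control word is correct, the user's reading of the unknown
    word is recorded in `votes`, a Counter of readings seen so far.
    """
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        votes[typed_unknown.strip().lower()] += 1
    return passed


def accepted_reading(votes, required_agreement=10):
    """Return the unknown word's reading once enough users agree, else None.

    Hypothetical rule: every recorded vote must be the same reading and
    there must be at least `required_agreement` of them.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    if count >= required_agreement and count == sum(votes.values()):
        return word
    return None


# Ten users each type the control word "morning" correctly and read
# the unknown word as "upon".
demo_votes = Counter()
for _ in range(10):
    check_response("morning", "morning", "upon", demo_votes)
print(accepted_reading(demo_votes))
```

A user who mistypes the control word is rejected and their reading of the unknown word is discarded, which is why the system never needs to reveal which word is which.
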
5:52 - 5:54The good thing is that
it has been very successful. -
5:54 - 5:58We're digitizing about
100 million words a day, -
5:58 - 6:02which is the equivalent of
about two million books a year. -
6:02 - 6:04And this is all being done
one word at a time -
6:04 - 6:07by just people typing CAPTCHAs
on the Internet. -
6:07 - 6:11Now, since we're doing
so many words per day, -
6:11 - 6:15funny things can happen.
-
6:15 - 6:18And this is especially true
because now we're giving people -
6:18 - 6:22two randomly chosen English words
next to each other. -
6:22 - 6:24So funny things can happen.
-
6:24 - 6:26For example, we presented this word.
-
6:26 - 6:30It's the word "Christians";
there's nothing wrong with it. -
6:30 - 6:34But if you present it along with
another randomly chosen word, -
6:34 - 6:35bad things can happen.
-
6:35 - 6:38So we get this.
[bad Christians] -
6:38 - 6:39(Laughter)
-
6:39 - 6:41It's quite funny.
-
6:41 - 6:43But it's even worse,
because the particular website -
6:43 - 6:45where we showed this
-
6:45 - 6:50actually happened to be called
The Embassy of the Kingdom of God. -
6:50 - 6:52(Laughter)
-
6:52 - 6:53Oops!
-
6:53 - 6:55Here's another really bad one.
-
6:55 - 6:59American politician, JohnEdwards.com
[Damn liberal] -
6:59 - 7:04(Laughter)
-
7:04 - 7:07So we keep on insulting people every day.
-
7:07 - 7:09Now, we're not just insulting people.
-
7:09 - 7:12Quite often,
interesting things can happen. -
7:12 - 7:15So this actually has given rise
to an Internet meme -
7:15 - 7:17that millions of people
have participated in, -
7:17 - 7:20which is called CAPTCHA art.
-
7:20 - 7:22Here's how it works.
-
7:22 - 7:26Imagine you're using the
Internet and you see a CAPTCHA -
7:26 - 7:28that you think is somewhat peculiar,
-
7:28 - 7:30like this CAPTCHA.
-
7:30 - 7:33Then what you're supposed to do
is take a screenshot of it. -
7:34 - 7:35Then of course,
you fill out the CAPTCHA -
because you're helping us
digitize a book, please. -
7:38 - 7:40But first you take a screenshot,
-
7:40 - 7:44and then you draw something
that is related to it, like this. -
7:44 - 7:46[invisible toaster]
-
7:46 - 7:47(Laughter)
-
7:47 - 7:51It's just an example of CAPTCHA art.
-
7:51 - 7:54There are tens of thousands of these.
Some of them are interesting. -
7:54 - 7:58Some of them are very cute.
[clenched it!] -
7:58 - 8:01Some of them are funnier.
-
8:01 - 8:08[stoned founders]
(Laughter) -
8:08 - 8:12This is my favorite number
of this whole project: 900 million. -
8:12 - 8:15This is the number of distinct people
-
8:15 - 8:17that have helped us digitize
at least one word -
8:17 - 8:19out of a book through reCAPTCHA.
-
8:19 - 8:21A little over 10%
of the world's population -
8:21 - 8:23has helped digitize human knowledge.
-
8:23 - 8:26And it is numbers like these
that motivate my research agenda. -
8:26 - 8:30So the question that motivates
my research is the following: -
8:30 - 8:33If you look at humanity's
large-scale achievements, -
8:33 - 8:35these really big things that humanity
has gotten together to do, -
8:35 - 8:40like building the pyramids of Egypt
or the Panama Canal -
8:40 - 8:42or putting a man on the Moon --
-
8:42 - 8:45there is a curious fact about them,
-
8:45 - 8:48and it is that they were all done
with about the same number of people. -
8:48 - 8:51They were all done with
about 100,000 people. -
8:51 - 8:53We can ask ourselves
why it is that all of them used -
8:53 - 8:55about the same number of people.
-
8:55 - 8:57The reason for that is because,
before the Internet, -
8:57 - 9:01coordinating more than
100,000 people was impossible. -
9:01 - 9:04But now with the Internet,
I've just shown you a project -
9:04 - 9:06where we've coordinated
900 million people. -
9:06 - 9:09So the question that
motivates my research is, -
9:09 - 9:11if we can put a man on the Moon
with 100,000 people, -
9:11 - 9:14what can we do with
100 million people? -
9:14 - 9:17Based on this question,
we've been working on a lot of projects. -
9:17 - 9:19I will not tell you
about all we have done. -
9:19 - 9:22But, let me tell you about
one that we are working on now. -
9:22 - 9:25We've been working on this
for about two years now. -
9:26 - 9:30And we're going to launch it
in about 30 days. -
9:30 - 9:33It's called Duolingo.
-
9:33 - 9:35This project started by asking
the following question: -
9:36 - 9:40How can we get 100 million people
-
9:40 - 9:45translating the Web into
every major language for free? -
9:45 - 9:48So there's a lot of things
to say about this question. -
9:48 - 9:49First of all, translating the Web.
-
9:49 - 9:52Right now it is partitioned
into multiple languages. -
9:52 - 9:54A large fraction of it is in English.
-
9:54 - 9:57If you don't know any English,
you can't access it. -
9:57 - 9:59But large fractions are
in other languages, -
9:59 - 10:01and if you don't know the languages
you can't access them. -
10:01 - 10:05I would like to translate all
of the Web into every major language. -
10:05 - 10:08Now some of you may say,
-
10:08 - 10:11why can't we use computers to translate?
-
10:11 - 10:15Machine translation nowadays is starting
to translate some sentences. -
10:15 - 10:17Well the problem with that is that
-
10:17 - 10:19it's not yet good enough,
-
10:19 - 10:23and it probably won't be
for the next 20 to 30 years. -
10:23 - 10:27So let me show you an example of something
-
10:27 - 10:28that was translated by a machine.
-
10:28 - 10:33Actually it was a forum
about programming questions. -
10:38 - 10:40It was a programming question
translated from Japanese -
10:40 - 10:45into English and from there into Spanish,
though my translation is good. -
10:45 - 10:47The other one is bad. You'll see.
-
10:47 - 10:50So I'll just let you read.
-
10:50 - 10:53This person starts by apologizing for
the machine translation. -
10:53 - 10:56Indeed, this was done with
the best translation program -
10:56 - 10:57from Japanese into English.
-
10:57 - 11:03Remember, it's a question
about computer programming. -
11:03 - 11:06So here is the preamble
to the question. -
11:06 - 11:12[At often, the goat-time install
a error is vomit.] (Laughter) -
11:12 - 11:14Then comes the first part of the question.
-
11:14 - 11:20[How many times like the wind, a pole,
and the dragon?] (Laughter) -
11:20 - 11:22Then comes my favorite part
of the question. -
11:22 - 11:26[This insult to father's stones?]
(Laughter) -
11:26 - 11:28And then comes my favorite
part of the whole thing. -
11:28 - 11:32[Please apologize for your stupidity.
There are a many thank you.] (Laughter) -
11:32 - 11:35Okay, so computer translation
isn't yet good enough. -
11:35 - 11:37We need people to translate.
-
11:37 - 11:39So what I want is to get
100 million people -
11:39 - 11:43translating the Web into
every major language for free. -
11:43 - 11:46I couldn't afford paying
100 million people for the job, -
11:46 - 11:47so I want them to do it for free.
-
11:47 - 11:49Now if this is what you want to do,
-
11:49 - 11:51you pretty quickly realize
you're going to run into -
11:52 - 11:56two pretty big obstacles
that need to be overcome. -
11:56 - 12:00The first one is a lack of bilinguals.
-
12:00 - 12:03So I don't even know if there
exist 100 million people out there -
12:03 - 12:07using the Web who are bilingual
enough to help us translate. -
12:07 - 12:08That's a big problem.
-
12:08 - 12:10The other problem is a lack of motivation.
-
12:10 - 12:14How are we going to motivate people
to actually translate the Web for free? -
12:14 - 12:19After thinking about this for months,
-
12:19 - 12:21we realized there's actually a way
-
12:21 - 12:24to solve both these problems
with the same solution. -
12:24 - 12:27We realized that there's a way
to kill two birds with one stone. -
12:27 - 12:31And that is to transform
language translation -
12:31 - 12:34into something that millions
of people want to do, -
12:34 - 12:38and that also helps with
the problem of lack of bilinguals, -
12:38 - 12:40and that is language education.
-
12:40 - 12:44It turns out there are millions of people
wanting to learn other languages. -
12:44 - 12:49Today there are over 1.2 billion people
learning a foreign language. -
12:49 - 12:52And it's not just because
they're being forced to do so in school. -
12:52 - 12:55For example, in the United States alone,
there are over -
12:55 - 12:595 million people who have paid
over $500 for software -
12:59 - 13:00to learn a new language.
-
13:00 - 13:03Many people want to learn a new language.
-
13:03 - 13:07So what we've been working on
for the last two years -
13:07 - 13:09is a new website called Duolingo,
-
13:09 - 13:12where the basic idea is
people learn a new language -
13:12 - 13:17for free, while simultaneously
translating the Web. -
13:18 - 13:20And so they're learning by doing.
-
13:20 - 13:22So this is how it works.
-
13:22 - 13:26The way this works is that when
you're just a beginner, -
13:26 - 13:28we give you very,
very simple sentences from the Web. -
13:28 - 13:32And if you don't know a word,
we'll tell you what each word means, -
13:32 - 13:34even though you are asked
to "translate this sentence." -
13:34 - 13:36And it turns out that it really works.
-
13:36 - 13:38Even though people know nothing
of the language -
13:38 - 13:42if we explain what each word means,
they'll be able to translate it. -
13:42 - 13:43As you see how other people translate
-
13:43 - 13:46the same sentence,
you start learning the language. -
13:46 - 13:48And as you get more and more advanced,
-
13:48 - 13:51we give you more and more
complex sentences to translate. -
13:51 - 13:53This is how
you are going to help us translate. -
13:53 - 13:55This is how the site works.
-
13:55 - 13:57We're mostly done building it,
-
13:57 - 13:59and now we're testing it.
-
13:59 - 14:01When we started working on this
-
14:01 - 14:04I didn't think it could work, really.
-
14:04 - 14:06But it turns out that
it works, indeed. It's amazing. -
14:06 - 14:09First, people really
can learn a language with it. -
14:09 - 14:12In this case we are testing it with people
-
14:12 - 14:15who know English
and want to learn Spanish, -
14:15 - 14:16and vice versa.
-
14:16 - 14:18So people really do learn a language.
-
14:18 - 14:20And they learn it about as well
-
14:20 - 14:22as the leading language learning software,
-
14:22 - 14:24which is very good,
but perhaps more surprisingly, -
14:25 - 14:29the translations that we get from people
using the site are very good. -
14:29 - 14:31They are as accurate as those
-
14:31 - 14:33of professional language translators.
-
14:33 - 14:36Now of course,
we play a trick here and it is that -
14:36 - 14:41we combine the translations
of multiple beginners, several students, -
14:41 - 14:42and choose the best.
-
14:43 - 14:45But it turns out
that that best translation -
14:45 - 14:48is as good as that of
professional language translators. -
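
The talk does not say how the "best" of several student translations is chosen. One plausible approach, shown here purely as an assumption, is to pick the candidate that is on average most similar to all the others, so that outlier translations lose out. The function name `pick_consensus` is hypothetical, and the similarity measure is Python's standard `difflib.SequenceMatcher`, not anything Duolingo is known to use.

```python
from difflib import SequenceMatcher


def pick_consensus(translations):
    """Return the candidate most similar, on average, to the other candidates.

    Assumed selection rule for illustration only; ties go to the earliest
    candidate in the list.
    """
    if not translations:
        raise ValueError("need at least one translation")
    if len(translations) == 1:
        return translations[0]

    def average_similarity(index):
        candidate = translations[index].lower()
        others = [t.lower() for j, t in enumerate(translations) if j != index]
        scores = [SequenceMatcher(None, candidate, other).ratio() for other in others]
        return sum(scores) / len(scores)

    best_index = max(range(len(translations)), key=average_similarity)
    return translations[best_index]


# Three hypothetical student translations of the same sentence.
candidates = [
    "The cat sleeps on the sofa.",
    "The cat sleeps on the sofa.",
    "Cat the sofa sleep in.",
]
print(pick_consensus(candidates))
```

A production system would likely also let learners rate each other's translations, but the idea is the same: agreement among many beginners stands in for a single expert.
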
14:48 - 14:54Now even though we're combining
multiple translations, -
14:55 - 14:57another good thing about Duolingo is that
-
14:57 - 15:00the site actually
can translate pretty fast. -
15:00 - 15:04So let me show you an estimate
of how fast we could translate. -
15:04 - 15:07If we wanted to translate Wikipedia
from English into Spanish -- -
15:07 - 15:09of course, Wikipedia exists in Spanish
but is much smaller -
15:09 - 15:12than its English counterpart,
about 20 percent of it -- -
15:12 - 15:16If we wanted to translate Wikipedia
from English into Spanish using Duolingo -
15:16 - 15:21we could do it in five weeks
with 100,000 active users -
15:21 - 15:22learning English with Duolingo.
-
15:23 - 15:26And we could do it in about 80 hours
with a million active users. -
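
The two estimates are consistent with each other if translation time is assumed to shrink in proportion to the number of active users. A minimal sketch of that scaling follows; the inverse-proportionality assumption and the function name are mine, and only the baseline of five weeks at 100,000 users comes from the talk.

```python
HOURS_PER_WEEK = 7 * 24


def estimated_hours(users, baseline_users=100_000, baseline_weeks=5):
    """Scale the talk's baseline, assuming time is inversely proportional to users."""
    baseline_hours = baseline_weeks * HOURS_PER_WEEK
    return baseline_hours * baseline_users / users


print(estimated_hours(100_000))    # the baseline itself, in hours
print(estimated_hours(1_000_000))  # ten times the users
```

With a million users this gives 84 hours, in line with the "about 80 hours" quoted in the talk.
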
15:26 - 15:28Since all the projects that
my group has worked on so far -
15:28 - 15:30have gotten millions of users,
-
15:30 - 15:33we're hopeful that we'll be able
to translate the Web for free. -
15:33 - 15:36We haven't yet launched Duolingo,
-
15:36 - 15:39(Applause)
-
15:45 - 15:47I'd like to leave you with --
-
15:48 - 15:50we haven't yet launched Duolingo;
we plan to do so in 30 days. -
15:50 - 15:52If you visit Duolingo.com, you can sign up
-
15:52 - 15:57to be part of our private beta
in about 30 days. -
15:57 - 15:58Help us.
-
15:58 - 15:59Thank you.
-
15:59 - 16:00(Applause)
- Title:
- Utilizando el poder de millones de mentes humanas | Luis von Ahn | TEDxRíodelaPlata
- Speaker:
- Luis von Ahn
- Video Language:
- Spanish
- Duration:
- 16:14