Hello. Well, let me start
by asking you a question:
How many of you had to fill out
some sort of web form
where you've been asked to read
a distorted sequence of characters?
How many of you found it really annoying?
Okay, outstanding. So I invented that.
(Laughter)
(Applause)
That thing is called a CAPTCHA.
And it is there to make sure
the entity filling out the form
is actually a human and not
some sort of computer program
that was written to submit the form
millions and millions of times.
The reason it works is because humans
have no trouble reading
these squiggly characters,
whereas computer programs
simply can't do it as well yet.
For example, when you're buying tickets
online to attend a concert,
the reason you have to type
these distorted characters is to prevent
scalpers from writing a program
that can buy millions of tickets,
two at a time.
CAPTCHAs are used all over the Internet.
And since they're used so often,
a lot of times the precise sequence
of random characters shown to the user
is not so fortunate.
So this is an example from Yahoo.
The random characters that happened
to be shown to the user were W, A, I, T
which spells a word.
But the best part is the message
that the Yahoo help desk
got about 20 minutes later.
["Help! I've been waiting
for over 20 minutes,
and nothing happens."]
(Laughter)
This of course, is not as bad
as this poor person.
[REBOOT]
(Laughter)
I can tell funny stories
about CAPTCHAs for hours,
but since I cannot do that,
let me tell you about a project
that we did afterwards
which is sort of the next
evolution of CAPTCHA.
We call it reCAPTCHA,
which is something that
we started at the University,
and then we turned it
into a startup company.
And then Google acquired this company.
So everything I'm going to say
for the next 5 minutes
is owned by Google.
So, please, do not spread the word.
So let me tell you
how this project started.
It turns out that about 200 million
CAPTCHAs are typed every day.
When I first heard this,
I was quite proud of myself.
I thought, "Look at the impact
that my research has had."
But then I started feeling bad.
They are not only obnoxious, but also
each time you type a CAPTCHA
essentially you waste
10 seconds of your time.
And if you multiply that
by 200 million you get that
humanity as a whole is wasting
about 500,000 hours every day
typing these annoying CAPTCHAs.
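The arithmetic behind that figure can be checked in a few lines. This is only a sketch using the two numbers quoted above (200 million CAPTCHAs a day, 10 seconds each); the variable names are illustrative.

```python
# Figures quoted in the talk.
captchas_per_day = 200_000_000
seconds_per_captcha = 10

# Total time spent per day, converted from seconds to hours.
total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600

print(f"{total_hours:,.0f} hours per day")
```

The exact result is a little over 555,000 hours, which the talk rounds to about 500,000.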
So then I started feeling bad.
And then I started thinking,
is there any way
we can use this effort for something
that is good for humanity?
While you're typing a CAPTCHA,
during those 10 seconds,
your brain is doing something amazing.
Your brain is doing something
that computers cannot yet do.
So can we get you to do some
useful work for mankind?
Putting it differently,
is there some humongous problem
that we cannot yet get
computers to solve,
but that we can split into tiny chunks
such that each time
somebody solves a CAPTCHA
they solve a little bit of this problem?
The answer to that is "yes,"
and this is what we're doing now.
So what you may not know is that
nowadays while you're typing a CAPTCHA,
not only are you authenticating
yourself as a human,
but in addition you're actually
helping us to digitize books.
So let me explain how this works.
So there's a lot of projects out there
trying to digitize the existing books.
Google is digitizing books.
Amazon, with the Kindle,
is digitizing books.
The way this works
is you start with an old book.
You've seen those things, right?
Like a book?
(Laughter)
So you start with a book,
and then you scan it.
Now scanning a book is like taking
a digital photograph of every page.
The next step in the process
is that the computer
needs to be able to decipher
all of the words in this image.
Now the problem is that for older books
that were written many decades ago,
the computer cannot recognize
a lot of the words
because the ink has faded
and the pages have turned yellow.
Thus the words look a bit different
and the computer cannot recognize them.
So, for books that were written
more than 50 years ago,
the computer cannot recognize
about 30 percent of the words.
So what we're doing now
is we're taking all of the words
that the computer cannot recognize
and we're getting people to read them
for us while they're typing
a CAPTCHA on the Internet.
So, the next time you type a CAPTCHA -
(Applause)
these words that you're typing
are actually words that are coming
from books that are being digitized
that the computer could not recognize.
And the reason we have
two words nowadays instead of one
is that we need to verify
whether the answer is correct.
One of the words is one for which
the system already knows the answer,
and the other is a word that
the system just got out of a book;
it doesn't know what it is,
so it presents it to you.
We're going to ask you to type both words.
And we won't tell you which one's which.
And if you type the correct word
for the one for which the system
already knows the answer,
it assumes you are human,
and it also gets some confidence
that you typed the other word correctly.
And if we repeat this process
with about 10 different people
and all of them agree
on what the new word is,
we are very confident
that this new word
was accurately digitized.
So this is how the system works.
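The two-word check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual reCAPTCHA code: the names `UnknownWord`, `check_captcha` and `REQUIRED_AGREEMENT`, the case-insensitive comparison, and the example words are all hypothetical. The only parts taken from the talk are that one word has a known answer, the other does not, and a reading is trusted once about 10 people agree on it.

```python
from collections import Counter
from typing import Optional

# The talk says "about 10 different people"; the exact threshold is an assumption.
REQUIRED_AGREEMENT = 10


def normalize(text: str) -> str:
    """Compare answers case-insensitively, ignoring surrounding spaces (assumed)."""
    return text.strip().lower()


class UnknownWord:
    """A scanned word the computer could not read, plus the readings typed so far."""

    def __init__(self) -> None:
        self.guesses = Counter()

    def accepted_reading(self) -> Optional[str]:
        """Return the reading once enough people have typed it and all of them agree."""
        total = sum(self.guesses.values())
        if total >= REQUIRED_AGREEMENT and len(self.guesses) == 1:
            return next(iter(self.guesses))
        return None


def check_captcha(control_answer: str, typed_control: str,
                  typed_unknown: str, unknown: UnknownWord) -> bool:
    """Grade only the control word; record the other word as one vote."""
    if normalize(typed_control) != normalize(control_answer):
        return False  # failed the known word: treated as not human, nothing recorded
    unknown.guesses[normalize(typed_unknown)] += 1
    return True


# Ten people pass the control word and agree on the unknown one.
word = UnknownWord()
for _ in range(10):
    check_captcha("morning", "morning", "overlooks", word)
print(word.accepted_reading())
```

In this sketch a single disagreeing answer keeps the word unresolved, which mirrors the "all of them agree" condition; a real system would presumably need some rule for handling disagreement, which the talk does not describe.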
The good thing is that
it has been very successful.
We're digitizing about
100 million words a day,
which is the equivalent of
about two million books a year.
And this is all being done
one word at a time
by just people typing CAPTCHAs
on the Internet.
Now, since we're doing
so many words per day,
funny things can happen.
And this is especially true
because now we're giving people
two randomly chosen English words
next to each other.
So funny things can happen.
For example, we presented this word.
It's the word "Christians";
there's nothing wrong with it.
But if you present it along with
another randomly chosen word,
bad things can happen.
So we get this.
[bad Christians]
(Laughter)
It's quite funny.
But it's even worse,
because the particular website
where we showed this
actually happened to be called
The Embassy of the Kingdom of God.
(Laughter)
Oops!
Here's another really bad one.
American politician, JohnEdwards.com
[Damn liberal]
(Laughter)
So we keep on insulting people every day.
Now, we're not just insulting people.
Quite often,
interesting things can happen.
So this actually has given rise
to an Internet meme
that millions of people
have participated in,
which is called CAPTCHA art.
Here's how it works.
Imagine you're using the
Internet and you see a CAPTCHA
that you think is somewhat peculiar,
like this CAPTCHA.
Then what you're supposed to do
is you take a screenshot of it.
Then of course,
please fill out the CAPTCHA,
because that helps us
digitize a book.
But first you take a screenshot,
and then you draw something
that is related to it, like this.
[invisible toaster]
(Laughter)
It's just an example of CAPTCHA art.
There are tens of thousands of these.
Some of them are interesting.
Some of them are very cute.
[clenched it!]
Some of them are funnier.
[stoned founders]
(Laughter)
This is my favorite number
of this whole project: 900 million.
This is the number of distinct people
that have helped us digitize
at least one word
out of a book through reCAPTCHA.
A little over 10%
of the world's population
has helped digitize human knowledge.
And it is numbers like these
that motivate my research agenda.
So the question that motivates
my research is the following:
If you look at humanity's
large-scale achievements,
these really big things that humanity
has gotten together to do,
like building the pyramids of Egypt
or the Panama Canal
or putting a man on the Moon --
there is a curious fact about them,
and it is that they were all done
with about the same number of people.
They were all done with
about 100,000 people.
We can ask ourselves
why it is that all of them used
about the same number of people.
The reason for that is because,
before the Internet,
coordinating more than
100,000 people was impossible.
But now with the Internet,
I've just shown you a project
where we've coordinated
900 million people.
So the question that
motivates my research is,
if we can put a man on the Moon
with 100,000 people,
what can we do with
100 million people?
Based on this question,
we've been working on a lot of projects.
I will not tell you
about all we have done.
But, let me tell you about
one that we are working on now.
We've been working on this
for about two years now.
And we're going to launch it
in about 30 days.
It's called Duolingo.
This project started by asking
the following question:
How can we get 100 million people
translating the Web into
every major language for free?
So there's a lot of things
to say about this question.
First of all, translating the Web.
Right now it is partitioned
into multiple languages.
A large fraction of it is in English.
If you don't know any English,
you can't access it.
But large fractions are
in other languages,
and if you don't know the languages
you can't access them.
I would like to translate all
of the Web into every major language.
Now some of you may say,
why can't we use computers to translate?
Machine translation nowadays is starting
to translate some sentences.
Well the problem with that is that
it's not yet good enough,
and it probably won't be
for the next 20 to 30 years.
So let me show you an example of something
that was translated by a machine.
Actually it was a forum
about programming questions.
It was a programming question
translated from Japanese
into English and from there into Spanish,
though my translation is good.
The other one is bad. You'll see.
So I'll just let you read.
This person starts by apologizing for
the machine translation.
Indeed, this was done with
the best translation program
from Japanese into English.
Remember, it's a question
about computer programming.
So here you have the preamble
to the question.
[At often, the goat-time install
a error is vomit.] (Laughter)
Then comes the first part of the question.
[How many times like the wind, a pole,
and the dragon?] (Laughter)
Then comes my favorite part
of the question.
[This insult to father's stones?]
(Laughter)
And then comes my favorite
part of the whole thing.
[Please apologize for your stupidity.
There are a many thank you.] (Laughter)
Okay, so computer translation
isn't yet good enough.
We need people to translate.
So what I want is to get
100 million people
translating the Web into
every major language for free.
I couldn't afford paying
100 million people for the job,
so I want them to do it for free.
Now if this is what you want to do,
you pretty quickly realize
you're going to run into
two pretty big hurdles.
The first one is a lack of bilinguals.
So I don't even know if there
exist 100 million people out there
using the Web who are bilingual
enough to help us translate.
That's a big problem.
The other problem is a lack of motivation.
How are we going to motivate people
to actually translate the Web for free?
After thinking about this for months,
we realized there's actually a way
to solve both these problems
with the same solution.
We realized that there's a way
to kill two birds with one stone.
And that is to transform
language translation
into something that millions
of people want to do,
and that also helps with
the problem of lack of bilinguals,
and that is language education.
It turns out there are millions of people
who want to learn other languages.
Today there are over 1.2 billion people
learning a foreign language.
And it's not just because
they're being forced to do so in school.
For example, in the United States alone,
there are over
5 million people who have paid
over $500 for software
to learn a new language.
Many people want to learn a new language.
So what we've been working on
for the last two years
is a new website called Duolingo,
where the basic idea is
people learn a new language
for free, while simultaneously
translating the Web.
And so they're learning by doing.
So this is how it works.
The way this works is that when
you're just a beginner,
we give you very,
very simple sentences from the Web.
You are asked
to translate each sentence,
and if you don't know a word,
we'll tell you what it means.
And it turns out that it really works.
Even though people know nothing
of the language,
if we explain what each word means,
they'll be able to translate it.
As you see how other people translate
the same sentence,
you start learning the language.
And as you get more and more advanced,
we give you more and more
complex sentences to translate.
This is how
you are going to help us translate.
This is how the site works.
We're mostly done building it,
and now we're testing it.
When we started working on this
I didn't think it could work, really.
But it turns out that
it works, indeed. It's amazing.
First, people really
can learn a language with it.
In this case we are testing it with people
who know English
and want to learn Spanish,
and vice versa.
So people really do learn a language.
And they learn it about as well
as with the leading language learning software,
which is very good.
But perhaps more surprisingly,
the translations that we get from people
using the site are very good.
They are as accurate as those
of professional language translators.
Now of course,
we play a trick here and it is that
we combine the translations
of multiple beginners, several students,
and choose the best.
But it turns out
that that best translation
is as good as those of
professional language translators.
Now even though we're combining
multiple translations,
another good thing about Duolingo is that
the site actually
can translate pretty fast.
So let me show you an estimate
of how fast we could translate.
If we wanted to translate Wikipedia
from English into Spanish --
of course, Wikipedia exists in Spanish
but is much smaller
than its English counterpart,
about 20 percent of its size --
If we wanted to translate Wikipedia
from English into Spanish using Duolingo
we could do it in five weeks
with 100,000 active users
learning English with Duolingo.
And we could do it in about 80 hours
with a million active users.
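The two estimates are consistent with each other if translation time shrinks in proportion to the number of active users. That proportionality is an assumption made here for illustration, not something the talk states, and the function name is hypothetical. A minimal check:

```python
HOURS_PER_WEEK = 7 * 24

# Baseline quoted in the talk: five weeks with 100,000 active users.
baseline_users = 100_000
baseline_hours = 5 * HOURS_PER_WEEK  # 840 hours


def estimated_hours(active_users: int) -> float:
    """Scale the baseline, assuming time is inversely proportional to users."""
    return baseline_hours * baseline_users / active_users


print(estimated_hours(1_000_000))  # 84.0
```

Ten times as many users gives 84 hours, which matches the "about 80 hours" quoted for a million active users.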
Since all the projects that
my group has worked on so far
have gotten millions of users,
we're hopeful that we'll be able
to translate the Web for free.
We haven't yet launched Duolingo,
(Applause)
I'd like to leave you with --
we haven't yet launched Duolingo,
but we plan to do so in 30 days.
If you visit Duolingo.com, you can sign up
to be part of our private beta
in about 30 days.
Help us.
Thank you.
(Applause)