WEBVTT

00:00:00.000 --> 00:00:04.894
Tatoeba: A bridge between languages.

00:00:05.961 --> 00:00:11.279
What is Tatoeba?

00:00:11.387 --> 00:00:14.317
Tatoeba is a language dictionary.

00:00:14.434 --> 00:00:16.010
You can search words

00:00:16.010 --> 00:00:17.926
and get translations.

00:00:18.541 --> 00:00:22.570
But it's not exactly a typical dictionary.

00:00:23.277 --> 00:00:25.415
It's all about sentences,

00:00:25.415 --> 00:00:26.717
not words.

00:00:26.717 --> 00:00:30.191
You can search sentences containing a certain word

00:00:30.191 --> 00:00:33.696
and get translations for these sentences.

00:00:34.327 --> 00:00:37.077
"Why sentences?" you may ask.

00:00:37.077 --> 00:00:40.642
Well, because sentences are more interesting.

00:00:40.688 --> 00:00:43.345
Sentences bring context to the words.

00:00:43.345 --> 00:00:45.797
Sentences have personalities.

00:00:45.797 --> 00:00:48.538
They can be funny, smart, silly,

00:00:48.538 --> 00:00:50.378
insightful, touching,

00:00:50.378 --> 00:00:51.763
hurtful.

00:00:51.886 --> 00:00:54.338
Sentences can teach us a lot,

00:00:54.338 --> 00:00:56.745
and a lot more than just words.

00:00:57.160 --> 00:00:59.628
So we love sentences.

00:01:00.074 --> 00:01:03.677
But, even more, we love languages.

00:01:03.677 --> 00:01:07.265
And what we really want is to have many sentences

00:01:07.265 --> 00:01:10.320
in many—and any—languages.

00:01:10.751 --> 00:01:14.218
This is why Tatoeba is multilingual.

00:01:14.880 --> 00:01:17.588
But not that kind of multilingual—

00:01:17.588 --> 00:01:19.618
not the kind where languages

00:01:19.618 --> 00:01:22.111
are being simply paired up together,

00:01:22.111 --> 00:01:24.637
and where some pairs are left behind.

00:01:25.067 --> 00:01:28.286
Tatoeba is really multilingual.

00:01:28.286 --> 00:01:31.726
All the languages are interconnected.

00:01:32.188 --> 00:01:36.788
If an Icelandic sentence has a translation in English,

00:01:36.788 --> 00:01:40.708
and the English sentence has a translation in Swahili,

00:01:40.708 --> 00:01:45.114
then indirectly, this will provide a Swahili translation

00:01:45.114 --> 00:01:47.452
for the Icelandic sentence.

00:01:47.883 --> 00:01:52.959
Languages that would have never found themselves together in a traditional system

00:01:52.959 --> 00:01:56.003
can be connected in Tatoeba.

00:01:56.003 --> 00:01:58.052
Awesome, right?

00:01:58.652 --> 00:02:01.717
But where do we get the sentences?

00:02:01.717 --> 00:02:04.129
And how do we translate them?

00:02:04.129 --> 00:02:08.188
Obviously, this cannot be the work of one person.

00:02:08.726 --> 00:02:12.452
This is why Tatoeba is collaborative.

00:02:12.575 --> 00:02:15.240
Everyone is free to contribute.

00:02:15.240 --> 00:02:19.243
And everyone has the ability to contribute.

00:02:19.243 --> 00:02:22.148
It doesn't require you to be a polyglot.

00:02:22.148 --> 00:02:24.262
Everyone speaks a language.

00:02:24.262 --> 00:02:26.037
Everyone can feed the database

00:02:26.037 --> 00:02:28.704
to illustrate new vocabulary.

00:02:28.704 --> 00:02:32.748
Everyone can help ensure that sentences sound correct

00:02:32.748 --> 00:02:35.082
and are correctly spelled.

00:02:35.082 --> 00:02:39.760
And actually, this project needs everyone.

00:02:39.760 --> 00:02:42.728
Languages are not carved in stone.

00:02:42.728 --> 00:02:45.766
Languages live through all of us.

00:02:45.766 --> 00:02:50.004
We want to capture all the uniqueness of each language.

00:02:50.004 --> 00:02:54.122
And we want to capture their evolution through time.

00:02:54.122 --> 00:02:56.044
But you know, it would be sad

00:02:56.044 --> 00:03:00.520
to collect all these sentences and keep them for ourselves.

00:03:00.520 --> 00:03:04.360
Because there's so much you can do with them.

00:03:04.360 --> 00:03:07.571
Which is why Tatoeba is open.

00:03:07.571 --> 00:03:09.160
Our source code is open,

00:03:09.160 --> 00:03:11.983
our data is open.

00:03:11.983 --> 00:03:13.972
We're releasing all the sentences we collect

00:03:13.972 --> 00:03:17.775
under the Creative Commons Attribution license.

00:03:18.006 --> 00:03:22.281
This means you can reuse them freely for a textbook,

00:03:22.281 --> 00:03:23.994
for an application,

00:03:23.994 --> 00:03:26.252
for a research project,

00:03:26.252 --> 00:03:29.083
for anything!

00:03:29.452 --> 00:03:31.917
So that's Tatoeba,

00:03:31.917 --> 00:03:35.019
but that's not the whole picture.

00:03:35.342 --> 00:03:38.923
Tatoeba is not just an open, collaborative,

00:03:38.923 --> 00:03:42.373
multilingual dictionary of sentences.

00:03:42.819 --> 00:03:46.382
It's part of an ecosystem that we want to build.

00:03:46.382 --> 00:03:49.951
We want to bring language tools to the next level.

00:03:49.951 --> 00:03:54.153
We want to see innovation in the language learning landscape.

00:03:54.153 --> 00:03:58.671
And this cannot happen without open language resources,

00:03:58.671 --> 00:04:02.138
which cannot be built without a community,

00:04:02.138 --> 00:04:06.231
which cannot contribute without efficient platforms.

00:04:06.877 --> 00:04:09.841
So ultimately, with Tatoeba,

00:04:09.841 --> 00:04:12.960
we are only building the foundations

00:04:12.960 --> 00:04:14.444
to make the Web

00:04:14.444 --> 00:04:23.298
a better place for language learning.