WEBVTT

00:00:00.000 --> 00:00:04.894
Tatoeba: A bridge between languages.

00:00:05.961 --> 00:00:11.279
What is Tatoeba?

00:00:11.387 --> 00:00:14.317
Tatoeba is a language dictionary.

00:00:14.434 --> 00:00:16.010
You can search words

00:00:16.010 --> 00:00:17.926
and get translations.

00:00:18.541 --> 00:00:22.570
But it's not exactly a typical dictionary.

00:00:23.277 --> 00:00:25.415
It's all about sentences,

00:00:25.415 --> 00:00:26.717
not words.

00:00:26.717 --> 00:00:30.191
You can search sentences containing a certain word

00:00:30.191 --> 00:00:33.696
and get translations for these sentences.

00:00:34.327 --> 00:00:37.077
"Why sentences?" you may ask.

00:00:37.077 --> 00:00:40.642
Well, because sentences are more interesting.

00:00:40.688 --> 00:00:43.345
Sentences bring context to the words.

00:00:43.345 --> 00:00:45.797
Sentences have personalities.

00:00:45.797 --> 00:00:48.538
They can be funny, smart, silly,

00:00:48.538 --> 00:00:50.378
insightful, touching,

00:00:50.378 --> 00:00:51.763
hurtful.

00:00:51.886 --> 00:00:54.338
Sentences can teach us a lot,

00:00:54.338 --> 00:00:56.745
and a lot more than just words.

00:00:57.160 --> 00:00:59.628
So we love sentences.

00:01:00.074 --> 00:01:03.677
But, even more, we love languages.

00:01:03.677 --> 00:01:07.265
And what we really want is to have many sentences

00:01:07.265 --> 00:01:10.320
in many, and any, languages.

00:01:10.751 --> 00:01:14.218
This is why Tatoeba is multilingual.

00:01:14.880 --> 00:01:17.588
But not that kind of multilingual,

00:01:17.588 --> 00:01:19.618
not the kind where languages

00:01:19.618 --> 00:01:22.111
are being simply paired up together,

00:01:22.111 --> 00:01:24.637
and where some pairs are left behind.

00:01:25.067 --> 00:01:28.286
Tatoeba is really multilingual.

00:01:28.286 --> 00:01:31.726
All the languages are interconnected.

00:01:32.188 --> 00:01:36.788
If an Icelandic sentence has a translation in English,

00:01:36.788 --> 00:01:40.708
and the English sentence has a translation in Swahili,

00:01:40.708 --> 00:01:45.114
then indirectly, this will provide a Swahili translation

00:01:45.114 --> 00:01:47.452
for the Icelandic sentence.

00:01:47.883 --> 00:01:52.959
Languages that would have never found themselves together in a traditional system

00:01:52.959 --> 00:01:56.003
can be connected in Tatoeba.

00:01:56.003 --> 00:01:58.052
Awesome, right?

00:01:58.652 --> 00:02:01.717
But where do we get the sentences?

00:02:01.717 --> 00:02:04.129
And how do we translate them?

00:02:04.129 --> 00:02:08.188
Obviously, this cannot be the work of one person.

00:02:08.726 --> 00:02:12.452
This is why Tatoeba is collaborative.

00:02:12.575 --> 00:02:15.240
Everyone is free to contribute.

00:02:15.240 --> 00:02:19.243
And everyone has the ability to contribute.

00:02:19.243 --> 00:02:22.148
It doesn't require you to be a polyglot.

00:02:22.148 --> 00:02:24.262
Everyone speaks a language.

00:02:24.262 --> 00:02:26.037
Everyone can feed the database

00:02:26.037 --> 00:02:28.704
to illustrate new vocabulary.

00:02:28.704 --> 00:02:32.748
Everyone can help ensure that sentences sound correct,

00:02:32.748 --> 00:02:35.082
and are correctly spelled.

00:02:35.082 --> 00:02:39.760
And actually, this project needs everyone.

00:02:39.760 --> 00:02:42.728
Languages are not carved in stone.

00:02:42.728 --> 00:02:45.766
Languages live through all of us.

00:02:45.766 --> 00:02:50.004
We want to capture all the uniqueness of each language.

00:02:50.004 --> 00:02:54.122
And we want to capture their evolution through time.

00:02:54.122 --> 00:02:56.044
But you know, it would be sad

00:02:56.044 --> 00:03:00.520
to collect all these sentences and keep them for ourselves.

00:03:00.520 --> 00:03:04.360
Because there's so much you can do with them.

00:03:04.360 --> 00:03:07.571
Which is why Tatoeba is open.

00:03:07.571 --> 00:03:09.160
Our source code is open,

00:03:09.160 --> 00:03:11.983
our data is open.

00:03:11.983 --> 00:03:13.972
We're releasing all the sentences we collect

00:03:13.972 --> 00:03:17.775
under the Creative Commons Attribution license.

00:03:18.006 --> 00:03:22.281
This means you can reuse them freely for a textbook,

00:03:22.281 --> 00:03:23.994
for an application,

00:03:23.994 --> 00:03:26.252
for a research project,

00:03:26.252 --> 00:03:29.083
for anything!

00:03:29.452 --> 00:03:31.917
So that's Tatoeba,

00:03:31.917 --> 00:03:35.019
but that's not the whole picture.

00:03:35.342 --> 00:03:38.923
Tatoeba is not just an open, collaborative,

00:03:38.923 --> 00:03:42.373
multilingual dictionary of sentences.

00:03:42.819 --> 00:03:46.382
It's part of an ecosystem that we want to build.

00:03:46.382 --> 00:03:49.951
We want to bring language tools to the next level.

00:03:49.951 --> 00:03:54.153
We want to see innovation in the language learning landscape.

00:03:54.153 --> 00:03:58.671
And this cannot happen without open language resources,

00:03:58.671 --> 00:04:02.138
which cannot be built without a community,

00:04:02.138 --> 00:04:06.231
which cannot contribute without efficient platforms.

00:04:06.877 --> 00:04:09.841
So ultimately, with Tatoeba,

00:04:09.841 --> 00:04:12.960
we are only building the foundations

00:04:12.960 --> 00:04:14.444
to make the Web

00:04:14.444 --> 00:04:23.298
a better place for language learning.