1
00:00:00,000 --> 00:00:04,894
Tatoeba: A bridge between languages.

2
00:00:05,961 --> 00:00:11,279
What is Tatoeba?

3
00:00:11,387 --> 00:00:14,317
Tatoeba is a language dictionary.

4
00:00:14,434 --> 00:00:16,010
You can search words

5
00:00:16,010 --> 00:00:17,926
and get translations.

6
00:00:18,541 --> 00:00:22,570
But it's not exactly a typical dictionary.

7
00:00:23,277 --> 00:00:25,415
It's all about sentences,

8
00:00:25,415 --> 00:00:26,717
not words.

9
00:00:26,717 --> 00:00:30,191
You can search sentences containing a certain word

10
00:00:30,191 --> 00:00:33,696
and get translations for these sentences.

11
00:00:34,327 --> 00:00:37,077
"Why sentences?" you may ask.

12
00:00:37,077 --> 00:00:40,642
Well, because sentences are more interesting.

13
00:00:40,688 --> 00:00:43,345
Sentences bring context to the words.

14
00:00:43,345 --> 00:00:45,797
Sentences have personalities.

15
00:00:45,797 --> 00:00:48,538
They can be funny, smart, silly,

16
00:00:48,538 --> 00:00:50,378
insightful, touching,

17
00:00:50,378 --> 00:00:51,763
hurtful.

18
00:00:51,886 --> 00:00:54,338
Sentences can teach us a lot,

19
00:00:54,338 --> 00:00:56,745
and a lot more than just words.

20
00:00:57,160 --> 00:00:59,628
So we love sentences.

21
00:01:00,074 --> 00:01:03,677
But, even more, we love languages.

22
00:01:03,677 --> 00:01:07,265
And what we really want is to have many sentences

23
00:01:07,265 --> 00:01:10,320
in many (and any) languages.

24
00:01:10,751 --> 00:01:14,218
This is why Tatoeba is multilingual.

25
00:01:14,880 --> 00:01:17,588
But not that kind of multilingual,

26
00:01:17,588 --> 00:01:19,618
not the kind where languages

27
00:01:19,618 --> 00:01:22,111
are being simply paired up together,

28
00:01:22,111 --> 00:01:24,637
and where some pairs are left behind.

29
00:01:25,067 --> 00:01:28,286
Tatoeba is really multilingual.

30
00:01:28,286 --> 00:01:31,726
All the languages are interconnected.

31
00:01:32,188 --> 00:01:36,788
If an Icelandic sentence has a translation in English,

32
00:01:36,788 --> 00:01:40,708
and the English sentence has a translation in Swahili,

33
00:01:40,708 --> 00:01:45,114
then, indirectly, this will provide a Swahili translation

34
00:01:45,114 --> 00:01:47,452
for the Icelandic sentence.

35
00:01:47,883 --> 00:01:52,959
Languages that would have never found themselves together
in a traditional system

36
00:01:52,959 --> 00:01:56,003
can be connected in Tatoeba.

37
00:01:56,003 --> 00:01:58,052
Awesome, right?

38
00:01:58,652 --> 00:02:01,717
But where do we get the sentences?

39
00:02:01,717 --> 00:02:04,129
And how do we translate them?

40
00:02:04,129 --> 00:02:08,188
Obviously, this cannot be the work of one person.

41
00:02:08,726 --> 00:02:12,452
This is why Tatoeba is collaborative.

42
00:02:12,575 --> 00:02:15,240
Everyone is free to contribute.

43
00:02:15,240 --> 00:02:19,243
And everyone has the ability to contribute.

44
00:02:19,243 --> 00:02:22,148
It doesn't require you to be a polyglot.

45
00:02:22,148 --> 00:02:24,262
Everyone speaks a language.

46
00:02:24,262 --> 00:02:26,037
Everyone can feed the database

47
00:02:26,037 --> 00:02:28,704
to illustrate new vocabulary.

48
00:02:28,704 --> 00:02:32,748
Everyone can help ensure that sentences sound correct

49
00:02:32,748 --> 00:02:35,082
and are correctly spelled.

50
00:02:35,082 --> 00:02:39,760
And actually, this project needs everyone.

51
00:02:39,760 --> 00:02:42,728
Languages are not carved in stone.

52
00:02:42,728 --> 00:02:45,766
Languages live through all of us.

53
00:02:45,766 --> 00:02:50,004
We want to capture all the uniqueness of each language.

54
00:02:50,004 --> 00:02:54,122
And we want to capture their evolution through time.

55
00:02:54,122 --> 00:02:56,044
But you know, it would be sad

56
00:02:56,044 --> 00:03:00,520
to collect all these sentences and keep them for ourselves.

57
00:03:00,520 --> 00:03:04,360
Because there's so much you can do with them.

58
00:03:04,360 --> 00:03:07,571
Which is why Tatoeba is open.

59
00:03:07,571 --> 00:03:09,160
Our source code is open,

60
00:03:09,160 --> 00:03:11,983
our data is open.

61
00:03:11,983 --> 00:03:13,972
We're releasing all the sentences we collect

62
00:03:13,972 --> 00:03:17,775
under the Creative Commons Attribution license.

63
00:03:18,006 --> 00:03:22,281
This means you can reuse them freely for a textbook,

64
00:03:22,281 --> 00:03:23,994
for an application,

65
00:03:23,994 --> 00:03:26,252
for a research project,

66
00:03:26,252 --> 00:03:29,083
for anything!

67
00:03:29,452 --> 00:03:31,917
So that's Tatoeba.

68
00:03:31,917 --> 00:03:35,019
But that's not the whole picture.

69
00:03:35,342 --> 00:03:38,923
Tatoeba is not just an open, collaborative,

70
00:03:38,923 --> 00:03:42,373
multilingual dictionary of sentences.

71
00:03:42,819 --> 00:03:46,382
It's part of an ecosystem that we want to build.

72
00:03:46,382 --> 00:03:49,951
We want to bring language tools to the next level.

73
00:03:49,951 --> 00:03:54,153
We want to see innovation in the language learning landscape.

74
00:03:54,153 --> 00:03:58,671
And this cannot happen without open language resources,

75
00:03:58,671 --> 00:04:02,138
which cannot be built without a community,

76
00:04:02,138 --> 00:04:06,231
which cannot contribute without efficient platforms.

77
00:04:06,877 --> 00:04:09,841
So ultimately, with Tatoeba,

78
00:04:09,841 --> 00:04:12,960
we are only building the foundations

79
00:04:12,960 --> 00:04:14,444
to make the Web

80
00:04:14,444 --> 00:04:23,298
a better place for language learning.