How is it that so many intergalactic species in movies and TV just happen to speak perfect English? The short answer is that no one wants to watch a starship crew spend years compiling an alien dictionary. But to keep things consistent, the creators of Star Trek and other science-fiction worlds have introduced the concept of a universal translator, a portable device that can instantly translate between any languages.

So is a universal translator possible in real life? We already have many programs that claim to do just that, taking a word, sentence, or entire book in one language and translating it into almost any other, whether it's modern English or ancient Sanskrit. And if translation were just a matter of looking up words in a dictionary, these programs would run circles around humans. The reality, however, is a bit more complicated.

A rule-based translation program uses a lexical database, which includes all the words you'd find in a dictionary and all the grammatical forms they can take, and a set of rules to recognize the basic linguistic elements in the input language.
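A rough Python sketch of how a lexical database and a recognition rule might work together. The tiny lexicon, the `analyze` function, and the single plural rule are illustrative assumptions, not any real system's design: an irregular form such as "children" is stored explicitly, while a regular plural such as "muffins" is recognized by stripping the suffix "-s" from a known noun.

```python
# Hypothetical mini lexical database: each lemma with its part of speech.
LEMMAS = {"the": "DET", "child": "NOUN", "muffin": "NOUN", "eat": "VERB"}

# Irregular forms cannot be derived by rule, so they are listed explicitly.
IRREGULAR_FORMS = {"children": ("child", "pl")}


def analyze(word):
    """Return (lemma, part_of_speech, number) for an English word form.

    `number` is "sg" or "pl" for nouns and None for other parts of speech.
    Raises KeyError for word forms the lexicon does not cover.
    """
    form = word.lower()
    if form in IRREGULAR_FORMS:
        lemma, number = IRREGULAR_FORMS[form]
        return lemma, LEMMAS[lemma], number
    if form in LEMMAS:
        pos = LEMMAS[form]
        return form, pos, "sg" if pos == "NOUN" else None
    # Rule: a known noun followed by the plural suffix "-s".
    if form.endswith("s") and LEMMAS.get(form[:-1]) == "NOUN":
        return form[:-1], "NOUN", "pl"
    raise KeyError(f"unknown word form: {word!r}")


for w in "The children eat the muffins".split():
    print(w, analyze(w))
```

The split between a stored table and a rule mirrors the description above: listing every form is reliable but large, while rules are compact but only cover regular patterns.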
For a seemingly simple sentence like "The children eat the muffins," the program first parses its syntax, or grammatical structure, by identifying "the children" as the subject and the rest of the sentence as the predicate, consisting of the verb "eat" and the direct object "the muffins." It then needs to recognize English morphology, or how the language can be broken down into its smallest meaningful units, such as the word "muffin" and the suffix "-s," used to indicate the plural. Finally, it needs to understand the semantics: what the different parts of the sentence actually mean.

To translate this sentence properly, the program would refer to a different set of vocabulary and rules for each element of the target language. But this is where it gets tricky. The syntax of some languages allows words to be arranged in any order, while in others, doing so could make the muffin eat the child. Morphology can also pose a problem. Slovene distinguishes between two children and three or more using a dual suffix absent in many other languages, while Russian's lack of definite articles might leave you wondering whether the children are eating some particular muffins or just eat muffins in general.
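The parsing and transfer steps can be sketched in Python for this one sentence, using Italian as the target language. Everything here is a hypothetical toy: the `Token` records stand in for the output of morphological analysis, `parse_clause` accepts only the pattern determiner, noun, verb, determiner, noun, and the Italian lexicon covers just the four words needed (its articles assume masculine nouns beginning with a consonant). Two agreement rules do the work: each article takes its noun's number, and the verb takes the subject's number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    lemma: str
    pos: str               # "DET", "NOUN" or "VERB"
    number: Optional[str]  # "sg"/"pl" for nouns, None otherwise


# Assumed output of the morphological step for "The children eat the muffins".
TOKENS = [
    Token("the", "DET", None),
    Token("child", "NOUN", "pl"),
    Token("eat", "VERB", None),
    Token("the", "DET", None),
    Token("muffin", "NOUN", "pl"),
]

# Hypothetical Italian lexicon keyed by (English lemma, number).
ITALIAN = {
    ("the", "sg"): "il", ("the", "pl"): "i",
    ("child", "sg"): "bambino", ("child", "pl"): "bambini",
    ("muffin", "sg"): "muffin", ("muffin", "pl"): "muffin",  # invariant loanword
    ("eat", "sg"): "mangia", ("eat", "pl"): "mangiano",
}


def parse_clause(tokens):
    """Split DET NOUN VERB DET NOUN into subject, verb and direct object.

    English assigns roles by position: the noun phrase before the verb is the subject.
    """
    pattern = [t.pos for t in tokens]
    if pattern != ["DET", "NOUN", "VERB", "DET", "NOUN"]:
        raise ValueError(f"unsupported pattern: {pattern}")
    return {"subject": tokens[0:2], "verb": tokens[2], "object": tokens[3:5]}


def render_np(det, noun):
    # The article agrees in number with its noun.
    return [ITALIAN[(det.lemma, noun.number)], ITALIAN[(noun.lemma, noun.number)]]


def translate(tokens):
    clause = parse_clause(tokens)
    subj_det, subj_noun = clause["subject"]
    obj_det, obj_noun = clause["object"]
    # The verb agrees in number with the subject noun.
    verb = ITALIAN[(clause["verb"].lemma, subj_noun.number)]
    # Italian keeps subject-verb-object order for this sentence type.
    words = render_np(subj_det, subj_noun) + [verb] + render_np(obj_det, obj_noun)
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


print(translate(TOKENS))  # I bambini mangiano i muffin.
```

Because roles come purely from position, swapping the two noun phrases in the input produces "I muffin mangiano i bambini." without complaint: nothing in these rules checks whether the result makes sense.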
Finally, even when the semantics are technically correct, the program might miss their finer points, such as whether the children "mangiano" (eat) the muffins or "divorano" (devour) them.

Another method is statistical machine translation, which analyzes a database of books, articles, and documents that have already been translated by humans. By finding matches between source and translated text that are unlikely to occur by chance, the program can identify corresponding phrases and patterns and use them for future translations. However, the quality of this type of translation depends on the size of the initial database and the availability of samples for certain languages or styles of writing.

The difficulty that computers have with the exceptions, irregularities, and shades of meaning that seem to come instinctively to humans has led some researchers to believe that our understanding of language is a unique product of our biological brain structure.
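One simple way to score word matches that are unlikely to occur by chance is an association measure such as the Dice coefficient, computed over aligned sentence pairs: Dice(e, f) = 2·c(e, f) / (c(e) + c(f)), where c counts the pairs containing a word or both words. The Python sketch below uses an invented three-sentence English-Italian corpus and hypothetical function names; real statistical systems relied on far larger corpora and more elaborate probabilistic alignment models, so treat it only as an illustration of the idea.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus of human translations (English, Italian), invented for illustration.
CORPUS = [
    ("the children eat", "i bambini mangiano"),
    ("the children sleep", "i bambini dormono"),
    ("the cats eat", "i gatti mangiano"),
]


def dice_scores(corpus):
    """Score each (English, Italian) word pair by the Dice coefficient.

    Counts are over sentence pairs: how many pairs contain each word, and
    how many contain both words of a candidate match.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    return {
        (e, f): 2 * n / (src_count[e] + tgt_count[f])
        for (e, f), n in pair_count.items()
    }


def best_translation(word, scores):
    """Pick the target word most strongly associated with `word`."""
    candidates = {f: s for (e, f), s in scores.items() if e == word}
    return max(candidates, key=candidates.get)  # ValueError if word is unseen


scores = dice_scores(CORPUS)
for word in ["the", "children", "cats", "eat", "sleep"]:
    print(word, "->", best_translation(word, scores))
```

With this corpus the script pairs "the" with "i", "children" with "bambini", "cats" with "gatti", "eat" with "mangiano", and "sleep" with "dormono". A word that never appears in the corpus, such as "muffins", has no candidates at all, so `best_translation` raises `ValueError`: a small-scale version of the dependence on database size.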
In fact, one of the most famous fictional universal translators, the Babel fish from "The Hitchhiker's Guide to the Galaxy," is not a machine at all but a small creature that translates the brain waves and nerve signals of sentient species through a form of telepathy.

For now, learning a language the old-fashioned way will still give you better results than any currently available computer program. But this is no easy task, and the sheer number of languages in the world, as well as the increasing interaction between the people who speak them, will only continue to spur greater advances in automatic translation. Perhaps by the time we encounter intergalactic life forms, we'll be able to communicate with them through a tiny gizmo, or we might have to start compiling that dictionary after all.