1 00:00:00,285 --> 00:00:03,178 This is an image of the planet Earth. 2 00:00:03,178 --> 00:00:06,271 It looks very much like the Apollo pictures 3 00:00:06,271 --> 00:00:07,882 that are very well known. 4 00:00:07,882 --> 00:00:09,952 There is something different; 5 00:00:09,952 --> 00:00:11,399 you can click on it, 6 00:00:11,399 --> 00:00:12,597 and if you click on it, 7 00:00:12,597 --> 00:00:15,669 you can zoom in on almost any place on the Earth. 8 00:00:15,669 --> 00:00:17,668 For instance, this is a bird's-eye view 9 00:00:17,668 --> 00:00:20,334 of the EPFL campus. 10 00:00:20,334 --> 00:00:22,442 In many cases, you can also see 11 00:00:22,442 --> 00:00:26,182 how a building looks from a nearby street. 12 00:00:26,182 --> 00:00:27,604 This is pretty amazing. 13 00:00:27,604 --> 00:00:31,031 But there's something missing in this wonderful tour: 14 00:00:31,031 --> 00:00:33,219 It's time. 15 00:00:33,219 --> 00:00:36,289 I'm not really sure when this picture was taken. 16 00:00:36,289 --> 00:00:37,701 I'm not even sure it was taken 17 00:00:37,701 --> 00:00:43,784 at the same moment as the bird's-eye view. 18 00:00:43,784 --> 00:00:45,993 In my lab, we develop tools 19 00:00:45,993 --> 00:00:47,757 to travel not only in space 20 00:00:47,757 --> 00:00:50,315 but also through time. 21 00:00:50,315 --> 00:00:52,185 The kind of question we're asking is 22 00:00:52,185 --> 00:00:53,578 Is it possible to build something 23 00:00:53,578 --> 00:00:55,756 like Google Maps of the past? 24 00:00:55,756 --> 00:00:59,066 Can I add a slider on top of Google Maps 25 00:00:59,066 --> 00:01:00,869 and just change the year, 26 00:01:00,869 --> 00:01:02,660 seeing how it was 100 years before, 27 00:01:02,660 --> 00:01:04,329 1,000 years before? 28 00:01:04,329 --> 00:01:06,452 Is that possible? 29 00:01:06,452 --> 00:01:08,704 Can I reconstruct social networks of the past? 30 00:01:08,704 --> 00:01:11,753 Can I make a Facebook of the Middle Ages? 
31 00:01:11,753 --> 00:01:15,529 So, can I build time machines? 32 00:01:15,529 --> 00:01:18,094 Maybe we can just say, "No, it's not possible." 33 00:01:18,094 --> 00:01:21,904 Or, maybe, we can think of it from an information point of view. 34 00:01:21,904 --> 00:01:25,094 This is what I call the information mushroom. 35 00:01:25,094 --> 00:01:26,677 Vertically, you have the time, 36 00:01:26,677 --> 00:01:29,417 and horizontally, the amount of digital information available. 37 00:01:29,417 --> 00:01:32,899 Obviously, in the last 10 years, we have much information. 38 00:01:32,899 --> 00:01:36,447 And obviously the more we go in the past, the less information we have. 39 00:01:36,447 --> 00:01:38,765 If we want to build something like Google Maps of the past, 40 00:01:38,765 --> 00:01:40,259 or Facebook of the past, 41 00:01:40,259 --> 00:01:41,833 we need to enlarge this space, 42 00:01:41,833 --> 00:01:43,771 we need to make that like a rectangle. 43 00:01:43,771 --> 00:01:45,281 How do we do that? 44 00:01:45,281 --> 00:01:47,379 One way is digitization. 45 00:01:47,395 --> 00:01:49,174 There's a lot of material available -- 46 00:01:49,190 --> 00:01:55,460 newspapers, printed books, thousands of printed books. 47 00:01:55,460 --> 00:01:57,228 I can digitize all these. 48 00:01:57,228 --> 00:01:59,965 I can extract information from these. 49 00:01:59,965 --> 00:02:03,508 Of course, the more you go in the past, the less information you will have. 50 00:02:03,508 --> 00:02:06,154 So, it might not be enough. 51 00:02:06,154 --> 00:02:08,562 So, I can do what historians do. 52 00:02:08,562 --> 00:02:10,086 I can extrapolate. 53 00:02:10,086 --> 00:02:14,556 This is what we call, in computer science, simulation. 54 00:02:14,556 --> 00:02:16,307 If I take a log book, 55 00:02:16,307 --> 00:02:18,711 I can consider, it's not just a log book 56 00:02:18,711 --> 00:02:21,683 of a Venetian captain going on a particular journey. 
57 00:02:21,683 --> 00:02:23,326 I can consider it is actually a log book 58 00:02:23,326 --> 00:02:25,908 which is representative of many journeys of that period. 59 00:02:25,908 --> 00:02:28,153 I'm extrapolating. 60 00:02:28,153 --> 00:02:30,191 If I have a painting of a facade, 61 00:02:30,191 --> 00:02:32,942 I can consider it's not just that particular building, 62 00:02:32,942 --> 00:02:36,874 but probably it also shares the same grammar 63 00:02:36,874 --> 00:02:40,915 of buildings where we lost any information. 64 00:02:40,915 --> 00:02:43,773 So if we want to construct a time machine, 65 00:02:43,773 --> 00:02:45,112 we need two things. 66 00:02:45,112 --> 00:02:47,346 We need very large archives, 67 00:02:47,346 --> 00:02:50,088 and we need excellent specialists. 68 00:02:50,088 --> 00:02:51,962 The Venice Time Machine, 69 00:02:51,962 --> 00:02:53,767 the project I'm going to talk to you about, 70 00:02:53,767 --> 00:02:56,787 is a joint project between the EPFL 71 00:02:56,787 --> 00:02:59,765 and the University of Venice Ca'Foscari. 72 00:02:59,765 --> 00:03:01,930 There's something very peculiar about Venice, 73 00:03:01,930 --> 00:03:04,604 that its administration has been 74 00:03:04,604 --> 00:03:06,798 very, very bureaucratic. 75 00:03:06,798 --> 00:03:08,991 They've been keeping track of everything, 76 00:03:08,991 --> 00:03:11,906 almost like Google today. 77 00:03:11,906 --> 00:03:13,420 At the Archivio di Stato, 78 00:03:13,420 --> 00:03:15,184 you have 80 kilometers of archives 79 00:03:15,184 --> 00:03:17,193 documenting every aspect 80 00:03:17,193 --> 00:03:19,439 of the life of Venice over more than 1,000 years. 81 00:03:19,439 --> 00:03:21,359 You have every boat that goes out, 82 00:03:21,359 --> 00:03:22,435 every boat that comes in. 83 00:03:22,435 --> 00:03:25,232 You have every change that was made in the city. 84 00:03:25,232 --> 00:03:28,523 This is all there. 
85 00:03:28,523 --> 00:03:32,431 We are setting up a 10-year digitization program 86 00:03:32,431 --> 00:03:34,108 which has the objective of transforming 87 00:03:34,108 --> 00:03:35,492 this immense archive 88 00:03:35,492 --> 00:03:37,918 into a giant information system. 89 00:03:37,918 --> 00:03:39,775 The type of objective we want to reach 90 00:03:39,775 --> 00:03:44,501 is 450 books a day that can be digitized. 91 00:03:44,501 --> 00:03:46,748 Of course, when you digitize, that's not enough, 92 00:03:46,748 --> 00:03:48,035 because these documents, 93 00:03:48,035 --> 00:03:50,674 most of them are in Latin, in Tuscan, 94 00:03:50,689 --> 00:03:52,204 in Venetian dialect, 95 00:03:52,204 --> 00:03:53,879 so you need to transcribe them, 96 00:03:53,879 --> 00:03:55,560 to translate them in some cases, 97 00:03:55,560 --> 00:03:56,680 to index them, 98 00:03:56,680 --> 00:03:58,844 and this is obviously not easy. 99 00:03:58,844 --> 00:04:02,688 In particular, traditional optical character recognition methods 100 00:04:02,688 --> 00:04:04,112 that can be used for printed manuscripts, 101 00:04:04,112 --> 00:04:08,116 they do not work well on handwritten documents. 102 00:04:08,116 --> 00:04:10,246 So the solution is actually to take inspiration 103 00:04:10,246 --> 00:04:13,147 from another domain: speech recognition. 104 00:04:13,147 --> 00:04:15,202 This is a domain of something that seems impossible, 105 00:04:15,202 --> 00:04:17,739 which can actually be done, 106 00:04:17,739 --> 00:04:19,933 simply by putting additional constraints. 107 00:04:19,933 --> 00:04:21,519 If you have a very good model 108 00:04:21,519 --> 00:04:23,045 of a language which is used, 109 00:04:23,045 --> 00:04:25,131 if you have a very good model of a document, 110 00:04:25,131 --> 00:04:26,563 how well they are structured. 111 00:04:26,563 --> 00:04:27,916 And these are administrative documents. 112 00:04:27,931 --> 00:04:30,063 They are well structured in many cases. 
113 00:04:30,063 --> 00:04:33,371 If you divide this huge archive into smaller subsets 114 00:04:33,371 --> 00:04:36,248 where a smaller subset actually shares similar features, 115 00:04:36,248 --> 00:04:40,279 then there's a chance of success. 116 00:04:42,761 --> 00:04:45,196 If we reach that stage, then there's something else: 117 00:04:45,196 --> 00:04:48,718 we can extract from this document events. 118 00:04:48,718 --> 00:04:51,016 Actually probably 10 billion events 119 00:04:51,016 --> 00:04:52,947 can be extracted from this archive. 120 00:04:52,947 --> 00:04:54,671 And this giant information system 121 00:04:54,671 --> 00:04:56,487 can be searched in many ways. 122 00:04:56,487 --> 00:04:57,855 You can ask questions like, 123 00:04:57,855 --> 00:05:00,615 "Who lived in this palazzo in 1323?" 124 00:05:00,615 --> 00:05:02,837 "How much did a sea bream cost at the Rialto market 125 00:05:02,837 --> 00:05:04,561 in 1434?" 126 00:05:04,561 --> 00:05:06,021 "What was the salary 127 00:05:06,021 --> 00:05:08,066 of a glass maker in Murano 128 00:05:08,066 --> 00:05:09,472 maybe over a decade?" 129 00:05:09,472 --> 00:05:10,894 You can ask even bigger questions 130 00:05:10,894 --> 00:05:13,632 because it will be semantically coded. 131 00:05:13,632 --> 00:05:15,772 And then what you can do is put that in space, 132 00:05:15,772 --> 00:05:17,945 because much of this information is spatial. 133 00:05:17,945 --> 00:05:19,880 And from that, you can do things like 134 00:05:19,880 --> 00:05:21,993 reconstructing this extraordinary journey 135 00:05:21,993 --> 00:05:25,349 of that city that managed to have a sustainable development 136 00:05:25,349 --> 00:05:27,475 over a thousand years, 137 00:05:27,475 --> 00:05:29,095 managing to have all the time 138 00:05:29,095 --> 00:05:31,956 a form of equilibrium with its environment. 139 00:05:31,956 --> 00:05:33,204 You can reconstruct that journey, 140 00:05:33,204 --> 00:05:36,100 visualize it in many different ways. 
141 00:05:36,100 --> 00:05:38,799 But of course, you cannot understand Venice if you just look at the city. 142 00:05:38,799 --> 00:05:41,195 You have to put it in a larger European context. 143 00:05:41,195 --> 00:05:44,016 So the idea is also to document all the things 144 00:05:44,016 --> 00:05:46,439 that worked at the European level. 145 00:05:46,439 --> 00:05:48,403 We can reconstruct also the journey 146 00:05:48,403 --> 00:05:50,393 of the Venetian maritime empire, 147 00:05:50,393 --> 00:05:53,559 how it progressively controlled the Adriatic Sea, 148 00:05:53,559 --> 00:05:57,305 how it became the most powerful medieval empire 149 00:05:57,305 --> 00:05:58,866 of its time, 150 00:05:58,866 --> 00:06:01,038 controlling most of the sea routes 151 00:06:01,038 --> 00:06:03,971 from the east to the south. 152 00:06:05,305 --> 00:06:07,621 But you can even do other things, 153 00:06:07,621 --> 00:06:09,898 because in these maritime routes, 154 00:06:09,898 --> 00:06:11,873 there are regular patterns. 155 00:06:11,889 --> 00:06:14,382 You can go one step beyond 156 00:06:14,382 --> 00:06:16,502 and actually create a simulation system, 157 00:06:16,502 --> 00:06:19,317 create a Mediterranean simulator 158 00:06:19,317 --> 00:06:21,910 which is capable actually of reconstructing 159 00:06:21,910 --> 00:06:24,112 even the information we are missing, 160 00:06:24,112 --> 00:06:27,100 which would enable us to have questions you could ask 161 00:06:27,100 --> 00:06:30,088 like if you were using a route planner. 162 00:06:30,088 --> 00:06:33,159 "If I am in Corfu in June 1323 163 00:06:33,159 --> 00:06:35,685 and want to go to Constantinople, 164 00:06:35,685 --> 00:06:37,828 where can I take a boat?" 165 00:06:37,828 --> 00:06:39,195 Probably we can answer this question 166 00:06:39,195 --> 00:06:43,668 with one or two or three days' precision. 167 00:06:43,668 --> 00:06:45,275 "How much will it cost?" 
168 00:06:45,275 --> 00:06:48,867 "What are the chances of encountering pirates?" 169 00:06:48,867 --> 00:06:50,678 Of course, you understand, 170 00:06:50,678 --> 00:06:53,287 the central scientific challenge of a project like this one 171 00:06:53,287 --> 00:06:57,016 is qualifying, quantifying and representing 172 00:06:57,016 --> 00:07:00,346 uncertainty and inconsistency at each step of this process. 173 00:07:00,346 --> 00:07:03,058 There are errors everywhere, 174 00:07:03,058 --> 00:07:05,547 errors in the document, it's the wrong name of the captain, 175 00:07:05,547 --> 00:07:08,760 some of the boats never actually took to sea. 176 00:07:08,760 --> 00:07:13,617 There are errors in translation, interpretative biases, 177 00:07:13,624 --> 00:07:17,090 and on top of that, if you add algorithmic processes, 178 00:07:17,090 --> 00:07:20,039 you're going to have errors in recognition, 179 00:07:20,039 --> 00:07:22,000 errors in extraction, 180 00:07:22,000 --> 00:07:26,481 so you have very, very uncertain data. 181 00:07:26,481 --> 00:07:30,238 So how can we detect and correct these inconsistencies? 182 00:07:30,238 --> 00:07:33,898 How can we represent that form of uncertainty? 183 00:07:33,898 --> 00:07:35,995 It's difficult. One thing you can do 184 00:07:35,995 --> 00:07:38,221 is document each step of the process, 185 00:07:38,221 --> 00:07:40,669 not only coding the historical information 186 00:07:40,669 --> 00:07:43,348 but what we call the meta-historical information, 187 00:07:43,348 --> 00:07:46,011 how is historical knowledge constructed, 188 00:07:46,011 --> 00:07:48,009 documenting each step. 189 00:07:48,009 --> 00:07:49,654 That will not guarantee that we actually converge 190 00:07:49,654 --> 00:07:52,104 toward a single story of Venice, 191 00:07:52,104 --> 00:07:54,242 but probably we can actually reconstruct 192 00:07:54,242 --> 00:07:57,290 a fully documented potential story of Venice. 
193 00:07:57,290 --> 00:07:58,749 Maybe there's not a single map. 194 00:07:58,749 --> 00:08:00,869 Maybe there are several maps. 195 00:08:00,869 --> 00:08:03,085 The system should allow for that, 196 00:08:03,085 --> 00:08:05,944 because we have to deal with a new form of uncertainty, 197 00:08:05,944 --> 00:08:10,585 which is really new for this type of giant database. 198 00:08:10,585 --> 00:08:12,775 And how should we communicate 199 00:08:12,790 --> 00:08:16,769 this new research to a large audience? 200 00:08:16,769 --> 00:08:19,432 Again, Venice is extraordinary for that. 201 00:08:19,432 --> 00:08:21,603 With the millions of visitors that come every year, 202 00:08:21,603 --> 00:08:23,366 it's actually one of the best places 203 00:08:23,366 --> 00:08:26,354 to try to invent the museum of the future. 204 00:08:26,354 --> 00:08:29,658 Imagine, horizontally you see the reconstructed map 205 00:08:29,658 --> 00:08:30,944 of a given year, 206 00:08:30,944 --> 00:08:33,902 and vertically, you see the document 207 00:08:33,902 --> 00:08:35,413 that served the reconstruction, 208 00:08:35,413 --> 00:08:38,813 paintings, for instance. 209 00:08:38,813 --> 00:08:41,393 Imagine an immersive system that permits you 210 00:08:41,393 --> 00:08:44,895 to go and dive and reconstruct the Venice of a given year, 211 00:08:44,895 --> 00:08:47,610 some experience you could share within a group. 212 00:08:47,610 --> 00:08:49,856 On the contrary, imagine actually that you start 213 00:08:49,856 --> 00:08:52,063 from a document, a Venetian manuscript, 214 00:08:52,063 --> 00:08:55,112 and you show, actually, what you can construct out of it, 215 00:08:55,112 --> 00:08:56,884 how it is decoded, 216 00:08:56,884 --> 00:08:59,299 how the context of that document can be recreated. 217 00:08:59,299 --> 00:09:01,184 This is an image from an exhibit 218 00:09:01,184 --> 00:09:03,460 which is currently conducted in Geneva 219 00:09:03,460 --> 00:09:05,814 with that type of system. 
220 00:09:05,814 --> 00:09:07,989 So to conclude, we can say that 221 00:09:07,989 --> 00:09:11,068 research in the humanities is about to undergo 222 00:09:11,068 --> 00:09:12,870 an evolution which is maybe similar 223 00:09:12,870 --> 00:09:17,452 to what happened to life sciences 30 years ago. 224 00:09:17,452 --> 00:09:22,128 It's really a question of scale. 225 00:09:22,130 --> 00:09:25,433 We see projects which are 226 00:09:25,433 --> 00:09:29,276 much beyond what any single research team can do, 227 00:09:29,276 --> 00:09:31,519 and this is really new for the humanities, 228 00:09:31,519 --> 00:09:35,388 which very often have the habit of working 229 00:09:35,388 --> 00:09:39,396 in small groups or only with a couple of researchers. 230 00:09:39,396 --> 00:09:41,514 When you visit the Archivio di Stato, 231 00:09:41,514 --> 00:09:44,336 you feel this is beyond what any single team can do, 232 00:09:44,336 --> 00:09:48,170 and that should be a joint and common effort. 233 00:09:48,170 --> 00:09:51,276 So what we must do for this paradigm shift 234 00:09:51,276 --> 00:09:53,178 is actually foster a new generation 235 00:09:53,178 --> 00:09:54,715 of "digital humanists" 236 00:09:54,715 --> 00:09:56,805 that are going to be ready for this shift. 237 00:09:56,805 --> 00:09:58,764 I thank you very much. 238 00:09:58,764 --> 00:10:02,764 (Applause)