How I built an information time machine

  • 0:00 - 0:03
    This is an image of the planet Earth.
  • 0:03 - 0:06
    It looks very much like the Apollo pictures
  • 0:06 - 0:08
    that are very well known.
  • 0:08 - 0:10
    There is something different;
  • 0:10 - 0:11
    you can click on it,
  • 0:11 - 0:13
    and if you click on it,
  • 0:13 - 0:16
    you can zoom in on almost any place on the Earth.
  • 0:16 - 0:18
    For instance, this is a bird's-eye view
  • 0:18 - 0:20
    of the EPFL campus.
  • 0:20 - 0:22
    In many cases, you can also see
  • 0:22 - 0:26
    how a building looks from a nearby street.
  • 0:26 - 0:28
    This is pretty amazing.
  • 0:28 - 0:31
    But there's something missing in this wonderful tour:
  • 0:31 - 0:33
    It's time.
  • 0:33 - 0:36
    I'm not really sure when this picture was taken.
  • 0:36 - 0:38
    I'm not even sure it was taken
  • 0:38 - 0:44
    at the same moment as the bird's-eye view.
  • 0:44 - 0:46
    In my lab, we develop tools
  • 0:46 - 0:48
    to travel not only in space
  • 0:48 - 0:50
    but also through time.
  • 0:50 - 0:52
    The kind of question we're asking is:
  • 0:52 - 0:54
    Is it possible to build something
  • 0:54 - 0:56
    like Google Maps of the past?
  • 0:56 - 0:59
    Can I add a slider on top of Google Maps
  • 0:59 - 1:01
    and just change the year,
  • 1:01 - 1:03
    seeing how it was 100 years before,
  • 1:03 - 1:04
    1,000 years before?
  • 1:04 - 1:06
    Is that possible?
  • 1:06 - 1:09
    Can I reconstruct social networks of the past?
  • 1:09 - 1:12
    Can I make a Facebook of the Middle Ages?
  • 1:12 - 1:16
    So, can I build time machines?
  • 1:16 - 1:18
    Maybe we can just say, "No, it's not possible."
  • 1:18 - 1:22
    Or, maybe, we can think of it from an information point of view.
  • 1:22 - 1:25
    This is what I call the information mushroom.
  • 1:25 - 1:27
    Vertically, you have the time,
  • 1:27 - 1:29
    and horizontally, the amount of digital information available.
  • 1:29 - 1:33
    Obviously, in the last 10 years, we have a lot of information.
  • 1:33 - 1:36
    And obviously the more we go in the past, the less information we have.
  • 1:36 - 1:39
    If we want to build something like Google Maps of the past,
  • 1:39 - 1:40
    or Facebook of the past,
  • 1:40 - 1:42
    we need to enlarge this space,
  • 1:42 - 1:44
    we need to make that like a rectangle.
  • 1:44 - 1:45
    How do we do that?
  • 1:45 - 1:47
    One way is digitization.
  • 1:47 - 1:49
    There's a lot of material available --
  • 1:49 - 1:55
    newspapers, printed books, thousands of printed books.
  • 1:55 - 1:57
    I can digitize all these.
  • 1:57 - 2:00
    I can extract information from these.
  • 2:00 - 2:04
    Of course, the more you go in the past,
    the less information you will have.
  • 2:04 - 2:06
    So, it might not be enough.
  • 2:06 - 2:09
    So, I can do what historians do.
  • 2:09 - 2:10
    I can extrapolate.
  • 2:10 - 2:15
    This is what we call, in computer science, simulation.
  • 2:15 - 2:16
    If I take a log book,
  • 2:16 - 2:19
    I can consider, it's not just a log book
  • 2:19 - 2:22
    of a Venetian captain going on a particular journey.
  • 2:22 - 2:23
    I can consider it is actually a log book
  • 2:23 - 2:26
    which is representative of
    many journeys of that period.
  • 2:26 - 2:28
    I'm extrapolating.
  • 2:28 - 2:30
    If I have a painting of a facade,
  • 2:30 - 2:33
    I can consider it's not just that particular building,
  • 2:33 - 2:37
    but probably it also shares the same grammar
  • 2:37 - 2:41
    of buildings for which we have lost all information.
  • 2:41 - 2:44
    So if we want to construct a time machine,
  • 2:44 - 2:45
    we need two things.
  • 2:45 - 2:47
    We need very large archives,
  • 2:47 - 2:50
    and we need excellent specialists.
  • 2:50 - 2:52
    The Venice Time Machine,
  • 2:52 - 2:54
    the project I'm going to talk to you about,
  • 2:54 - 2:57
    is a joint project between the EPFL
  • 2:57 - 3:00
    and the University of Venice Ca' Foscari.
  • 3:00 - 3:02
    There's something very peculiar about Venice,
  • 3:02 - 3:05
    that its administration has been
  • 3:05 - 3:07
    very, very bureaucratic.
  • 3:07 - 3:09
    They've been keeping track of everything,
  • 3:09 - 3:12
    almost like Google today.
  • 3:12 - 3:13
    At the Archivio di Stato,
  • 3:13 - 3:15
    you have 80 kilometers of archives
  • 3:15 - 3:17
    documenting every aspect
  • 3:17 - 3:19
    of the life of Venice over
    more than 1,000 years.
  • 3:19 - 3:21
    You have every boat that goes out,
  • 3:21 - 3:22
    every boat that comes in.
  • 3:22 - 3:25
    You have every change that was made in the city.
  • 3:25 - 3:29
    This is all there.
  • 3:29 - 3:32
    We are setting up a 10-year digitization program
  • 3:32 - 3:34
    which has the objective of transforming
  • 3:34 - 3:35
    this immense archive
  • 3:35 - 3:38
    into a giant information system.
  • 3:38 - 3:40
    The type of objective we want to reach
  • 3:40 - 3:45
    is 450 books a day that can be digitized.
  • 3:45 - 3:47
    Of course, when you digitize, that's not enough,
  • 3:47 - 3:48
    because these documents,
  • 3:48 - 3:51
    most of them are in Latin, in Tuscan,
  • 3:51 - 3:52
    in Venetian dialect,
  • 3:52 - 3:54
    so you need to transcribe them,
  • 3:54 - 3:56
    to translate them in some cases,
  • 3:56 - 3:57
    to index them,
  • 3:57 - 3:59
    and this is obviously not easy.
  • 3:59 - 4:03
    In particular, traditional optical
    character recognition methods
  • 4:03 - 4:04
    that can be used for printed documents
  • 4:04 - 4:08
    do not work well on handwritten documents.
  • 4:08 - 4:10
    So the solution is actually to take inspiration
  • 4:10 - 4:13
    from another domain: speech recognition.
  • 4:13 - 4:15
    This is a domain of something
    that seems impossible,
  • 4:15 - 4:18
    which can actually be done,
  • 4:18 - 4:20
    simply by putting additional constraints.
  • 4:20 - 4:22
    If you have a very good model
  • 4:22 - 4:23
    of a language which is used,
  • 4:23 - 4:25
    if you have a very good model of a document,
  • 4:25 - 4:27
    how well they are structured.
  • 4:27 - 4:28
    And these are administrative documents.
  • 4:28 - 4:30
    They are well structured in many cases.
  • 4:30 - 4:33
    If you divide this huge archive into smaller subsets
  • 4:33 - 4:36
    where a smaller subset
    actually shares similar features,
  • 4:36 - 4:40
    then there's a chance of success.
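The idea of constraining recognition with a language model can be illustrated with a small sketch. It is not the project's actual method: it assumes a hypothetical recognizer that returns several candidate words per position with log-probabilities, and a toy bigram language model. The function name `decode`, the word candidates, and all scores are made up for illustration.

```python
import math

def decode(candidates, bigram_logp, default_logp=-10.0):
    """Pick the most likely word sequence with a Viterbi search.

    candidates:  list of dicts {word: recognizer log-probability}, one per position
    bigram_logp: dict {(previous_word, word): language-model log-probability}
    default_logp: penalty for word pairs the language model has never seen
    """
    # best[word] = (score of the best path ending in word, that path)
    best = {w: (s, [w]) for w, s in candidates[0].items()}
    for position in candidates[1:]:
        new_best = {}
        for word, rec_score in position.items():
            # Choose the predecessor that maximizes path + language model + recognizer score.
            score, path = max(
                (
                    (prev_score + bigram_logp.get((prev, word), default_logp) + rec_score,
                     prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Invented example: the recognizer alone slightly prefers "venata" at position 2,
# but the language model knows that "nave veneta" is a plausible word pair.
candidates = [
    {"nave": math.log(0.5), "nare": math.log(0.5)},
    {"veneta": math.log(0.4), "venata": math.log(0.6)},
]
bigram_logp = {("nave", "veneta"): math.log(0.9)}
reading = decode(candidates, bigram_logp)
print(reading)
```

The point of the design is that an ambiguous shape on the page is not resolved on its own; the surrounding words and the expected structure of the document narrow down what it can be.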
  • 4:43 - 4:45
    If we reach that stage, then there's something else:
  • 4:45 - 4:49
    we can extract events from these documents.
  • 4:49 - 4:51
    Actually probably 10 billion events
  • 4:51 - 4:53
    can be extracted from this archive.
  • 4:53 - 4:55
    And this giant information system
  • 4:55 - 4:56
    can be searched in many ways.
  • 4:56 - 4:58
    You can ask questions like,
  • 4:58 - 5:01
    "Who lived in this palazzo in 1323?"
  • 5:01 - 5:03
    "How much did a sea bream cost at the Rialto market
  • 5:03 - 5:05
    in 1434?"
  • 5:05 - 5:06
    "What was the salary
  • 5:06 - 5:08
    of a glass maker in Murano
  • 5:08 - 5:09
    maybe over a decade?"
  • 5:09 - 5:11
    You can ask even bigger questions
  • 5:11 - 5:14
    because it will be semantically coded.
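A minimal sketch of what searching such semantically coded events could look like, assuming each extracted event is stored as a small structured record. The `Event` fields, the `query` helper, and every record below are hypothetical placeholders, not data from the archive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str        # e.g. "residence", "price", "salary"
    subject: str     # who or what the event is about
    place: str
    year: int
    value: str = ""  # free-form detail, e.g. an amount

def query(events, **criteria):
    """Return the events whose fields equal all the given criteria."""
    return [
        event for event in events
        if all(getattr(event, name) == wanted for name, wanted in criteria.items())
    ]

# Placeholder records standing in for extracted events.
EVENTS = [
    Event("residence", "Person X", "Palazzo A", 1323),
    Event("residence", "Person Y", "Palazzo B", 1323),
    Event("price", "sea bream", "Rialto market", 1434, "amount unknown"),
]

# "Who lived in this palazzo in 1323?"
residents = [e.subject for e in query(EVENTS, kind="residence", place="Palazzo A", year=1323)]
print(residents)
```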
  • 5:14 - 5:16
    And then what you can do is put that in space,
  • 5:16 - 5:18
    because much of this information is spatial.
  • 5:18 - 5:20
    And from that, you can do things like
  • 5:20 - 5:22
    reconstructing this extraordinary journey
  • 5:22 - 5:25
    of that city that managed to
    have a sustainable development
  • 5:25 - 5:27
    over a thousand years,
  • 5:27 - 5:29
    managing to have all the time
  • 5:29 - 5:32
    a form of equilibrium with its environment.
  • 5:32 - 5:33
    You can reconstruct that journey,
  • 5:33 - 5:36
    visualize it in many different ways.
  • 5:36 - 5:39
    But of course, you cannot understand
    Venice if you just look at the city.
  • 5:39 - 5:41
    You have to put it in a larger European context.
  • 5:41 - 5:44
    So the idea is also to document all the things
  • 5:44 - 5:46
    that worked at the European level.
  • 5:46 - 5:48
    We can reconstruct also the journey
  • 5:48 - 5:50
    of the Venetian maritime empire,
  • 5:50 - 5:54
    how it progressively controlled the Adriatic Sea,
  • 5:54 - 5:57
    how it became the most powerful medieval empire
  • 5:57 - 5:59
    of its time,
  • 5:59 - 6:01
    controlling most of the sea routes
  • 6:01 - 6:04
    from the east to the south.
  • 6:05 - 6:08
    But you can even do other things,
  • 6:08 - 6:10
    because in these maritime routes,
  • 6:10 - 6:12
    there are regular patterns.
  • 6:12 - 6:14
    You can go one step beyond
  • 6:14 - 6:17
    and actually create a simulation system,
  • 6:17 - 6:19
    create a Mediterranean simulator
  • 6:19 - 6:22
    which is capable actually of reconstructing
  • 6:22 - 6:24
    even the information we are missing,
  • 6:24 - 6:27
    which would enable us to have
    questions you could ask
  • 6:27 - 6:30
    like if you were using a route planner.
  • 6:30 - 6:33
    "If I am in Corfu in June 1323
  • 6:33 - 6:36
    and want to go to Constantinople,
  • 6:36 - 6:38
    where can I take a boat?"
  • 6:38 - 6:39
    Probably we can answer this question
  • 6:39 - 6:44
    with one or two or three days' precision.
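The route-planner question can be sketched as a shortest-path search over a graph of sea routes. This is only an illustration under stated assumptions: the intermediate ports "Port A" and "Port B" and all sailing times are invented, and a real simulator would also have to model seasons, schedules, and missing data.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_days, list_of_ports) or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        days, port, path = heapq.heappop(queue)
        if port == goal:
            return days, path
        if port in visited:
            continue
        visited.add(port)
        for next_port, leg_days in graph.get(port, {}).items():
            if next_port not in visited:
                heapq.heappush(queue, (days + leg_days, next_port, path + [next_port]))
    return None

# Invented sailing times in days between ports (hypothetical values).
ROUTES = {
    "Corfu": {"Port A": 4, "Port B": 7},
    "Port A": {"Constantinople": 9},
    "Port B": {"Constantinople": 5},
}

result = shortest_route(ROUTES, "Corfu", "Constantinople")
print(result)
```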
  • 6:44 - 6:45
    "How much will it cost?"
  • 6:45 - 6:49
    "What are the chances of encountering pirates?"
  • 6:49 - 6:51
    Of course, you understand,
  • 6:51 - 6:53
    the central scientific challenge
    of a project like this one
  • 6:53 - 6:57
    is qualifying, quantifying and representing
  • 6:57 - 7:00
    uncertainty and inconsistency
    at each step of this process.
  • 7:00 - 7:03
    There are errors everywhere,
  • 7:03 - 7:06
    errors in the document, it's
    the wrong name of the captain,
  • 7:06 - 7:09
    some of the boats never actually took to sea.
  • 7:09 - 7:14
    There are errors in translation, interpretative biases,
  • 7:14 - 7:17
    and on top of that, if you add algorithmic processes,
  • 7:17 - 7:20
    you're going to have errors in recognition,
  • 7:20 - 7:22
    errors in extraction,
  • 7:22 - 7:26
    so you have very, very uncertain data.
  • 7:26 - 7:30
    So how can we detect and
    correct these inconsistencies?
  • 7:30 - 7:34
    How can we represent that form of uncertainty?
  • 7:34 - 7:36
    It's difficult. One thing you can do
  • 7:36 - 7:38
    is document each step of the process,
  • 7:38 - 7:41
    not only coding the historical information
  • 7:41 - 7:43
    but what we call the meta-historical information,
  • 7:43 - 7:46
    how is historical knowledge constructed,
  • 7:46 - 7:48
    documenting each step.
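One way to picture this meta-historical information is a claim that carries the record of every processing step applied to it. The sketch below is an assumption-laden illustration: the class names, the step names, and the confidence values are hypothetical, and multiplying confidences assumes the steps fail independently, which real data may not satisfy.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str          # e.g. "transcription", "translation", "event extraction"
    confidence: float  # estimated probability that this step introduced no error

@dataclass
class Claim:
    statement: str
    source: str
    steps: list = field(default_factory=list)

    def record(self, name, confidence):
        """Append one processing step and return self so calls can be chained."""
        self.steps.append(Step(name, confidence))
        return self

    def confidence(self):
        """Combined confidence, assuming independent steps (a simplification)."""
        total = 1.0
        for step in self.steps:
            total *= step.confidence
        return total

# Hypothetical claim with a hypothetical processing history.
claim = Claim("Person X lived in Palazzo A in 1323", source="placeholder register")
claim.record("transcription", 0.9).record("event extraction", 0.8)
print([s.name for s in claim.steps], round(claim.confidence(), 2))
```

Keeping the steps alongside the statement means two conflicting claims can coexist in the system, each with its own documented history, rather than one silently overwriting the other.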
  • 7:48 - 7:50
    That will not guarantee that we actually converge
  • 7:50 - 7:52
    toward a single story of Venice,
  • 7:52 - 7:54
    but probably we can actually reconstruct
  • 7:54 - 7:57
    a fully documented potential story of Venice.
  • 7:57 - 7:59
    Maybe there's not a single map.
  • 7:59 - 8:01
    Maybe there are several maps.
  • 8:01 - 8:03
    The system should allow for that,
  • 8:03 - 8:06
    because we have to deal with
    a new form of uncertainty,
  • 8:06 - 8:11
    which is really new for this type of giant database.
  • 8:11 - 8:13
    And how should we communicate
  • 8:13 - 8:17
    this new research to a large audience?
  • 8:17 - 8:19
    Again, Venice is extraordinary for that.
  • 8:19 - 8:22
    With the millions of visitors that come every year,
  • 8:22 - 8:23
    it's actually one of the best places
  • 8:23 - 8:26
    to try to invent the museum of the future.
  • 8:26 - 8:30
    Imagine, horizontally you see the reconstructed map
  • 8:30 - 8:31
    of a given year,
  • 8:31 - 8:34
    and vertically, you see the document
  • 8:34 - 8:35
    that served the reconstruction,
  • 8:35 - 8:39
    paintings, for instance.
  • 8:39 - 8:41
    Imagine an immersive system that permits you
  • 8:41 - 8:45
    to go and dive and reconstruct
    the Venice of a given year,
  • 8:45 - 8:48
    some experience you could share within a group.
  • 8:48 - 8:50
    On the contrary, imagine actually that you start
  • 8:50 - 8:52
    from a document, a Venetian manuscript,
  • 8:52 - 8:55
    and you show, actually, what
    you can construct out of it,
  • 8:55 - 8:57
    how it is decoded,
  • 8:57 - 8:59
    how the context of that document can be recreated.
  • 8:59 - 9:01
    This is an image from an exhibit
  • 9:01 - 9:03
    which is currently being held in Geneva
  • 9:03 - 9:06
    with that type of system.
  • 9:06 - 9:08
    So to conclude, we can say that
  • 9:08 - 9:11
    research in the humanities is about to undergo
  • 9:11 - 9:13
    an evolution which is maybe similar
  • 9:13 - 9:17
    to what happened to life sciences 30 years ago.
  • 9:17 - 9:22
    It's really a question of scale.
  • 9:22 - 9:25
    We see projects which are
  • 9:25 - 9:29
    much beyond what any single research team can do,
  • 9:29 - 9:32
    and this is really new for the humanities,
  • 9:32 - 9:35
    which are very often in the habit of working
  • 9:35 - 9:39
    in small groups or only with a couple of researchers.
  • 9:39 - 9:42
    When you visit the Archivio di Stato,
  • 9:42 - 9:44
    you feel this is beyond what any single team can do,
  • 9:44 - 9:48
    and that it should be a joint and common effort.
  • 9:48 - 9:51
    So what we must do for this paradigm shift
  • 9:51 - 9:53
    is actually foster a new generation
  • 9:53 - 9:55
    of "digital humanists"
  • 9:55 - 9:57
    that are going to be ready for this shift.
  • 9:57 - 9:59
    I thank you very much.
  • 9:59 - 10:03
    (Applause)
Title:
How I built an information time machine
Speaker:
Frederic Kaplan
Description:

Imagine if you could surf Facebook ... from the Middle Ages. Well, it may not be as far off as it sounds. In a fun and interesting talk, researcher and engineer Frederic Kaplan shows off the Venice Time Machine, a project to digitize 80 kilometers of books to create a historical and geographical simulation of Venice across 1000 years. (Filmed at TEDxCaFoscariU.)

Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
10:20
