How I built an information time machine

  • 0:01 - 0:03
    This is an image of the planet Earth.
  • 0:03 - 0:06
    It looks very much like the Apollo pictures
  • 0:06 - 0:08
    that are very well-known.
  • 0:08 - 0:10
    There is something different;
  • 0:10 - 0:12
    you can click on it,
  • 0:12 - 0:13
    and if you click on it,
  • 0:13 - 0:16
    you can zoom in on almost any place on the Earth.
  • 0:16 - 0:17
    For instance, this is a bird's-eye view
  • 0:17 - 0:21
    of the EPFL campus.
  • 0:21 - 0:23
    In many cases, you can also see
  • 0:23 - 0:26
    how a building looks from a nearby street.
  • 0:26 - 0:28
    This is pretty amazing.
  • 0:28 - 0:31
    But there's something missing in this wonderful tour:
  • 0:31 - 0:33
    It's time.
  • 0:33 - 0:37
    I'm not really sure when this picture was taken.
  • 0:37 - 0:38
    I'm not even sure it was taken
  • 0:38 - 0:44
    at the same moment as the bird's-eye view.
  • 0:44 - 0:46
    In my lab, we develop tools
  • 0:46 - 0:48
    to not only travel in space
  • 0:48 - 0:50
    but also through time.
  • 0:50 - 0:53
    The kind of question we're asking is:
  • 0:53 - 0:54
    Is it possible to build something
  • 0:54 - 0:56
    like Google Maps of the past?
  • 0:56 - 0:59
    Can I add a slider on top of Google Maps
  • 0:59 - 1:01
    and just change the year,
  • 1:01 - 1:03
    seeing it as it was 100 years before,
  • 1:03 - 1:04
    a thousand years before,
  • 1:04 - 1:07
    is that possible?
  • 1:07 - 1:09
    Can we construct social networks of the past?
  • 1:09 - 1:12
    Can I make a Facebook of the Middle Ages?
  • 1:12 - 1:16
    So, can I build time machines?
  • 1:16 - 1:18
    You can just say, "No, it's not possible."
  • 1:18 - 1:23
    Or, maybe, we can think of it from an information point of view.
  • 1:23 - 1:26
    This is what I call the Information Mushroom.
  • 1:26 - 1:27
    Vertically, you have time,
  • 1:27 - 1:30
    and horizontally, the amount of digital information available.
  • 1:30 - 1:33
    Obviously, in the last ten years, we have a lot of information.
  • 1:33 - 1:37
    And obviously, the further we go into the past, the less information we have.
  • 1:37 - 1:39
    If we want to build something like the Google Maps of the past,
  • 1:39 - 1:41
    or the Facebook of the past,
  • 1:41 - 1:42
    we need to enlarge this space,
  • 1:42 - 1:44
    make it like a rectangle.
  • 1:44 - 1:46
    How do we do that?
  • 1:46 - 1:48
    One way is digitization.
  • 1:48 - 1:49
    There's a lot of material available.
  • 1:49 - 1:55
    Newspapers, printed books, thousands of printed books.
  • 1:55 - 1:57
    I can digitize all these.
  • 1:57 - 2:00
    I can extract information from these.
  • 2:00 - 2:03
    Of course, the further you go into the past,
    the less information you will have.
  • 2:03 - 2:06
    So, it might not be enough.
  • 2:06 - 2:09
    So, I can do what historians do.
  • 2:09 - 2:10
    I can extrapolate.
  • 2:10 - 2:14
    This is what we call, in computer science, simulation.
  • 2:14 - 2:17
    If I take a log book,
  • 2:17 - 2:19
    I can consider, it's not just a log book
  • 2:19 - 2:22
    of a Venetian captain going on a particular journey.
  • 2:22 - 2:23
    I can consider it is actually a log book
  • 2:23 - 2:26
    which is representative of
    many journeys of that period.
  • 2:26 - 2:28
    I'm extrapolating.
  • 2:28 - 2:30
    If I have a painting of a facade,
  • 2:30 - 2:33
    I can consider it's not just that particular building,
  • 2:33 - 2:37
    but probably it also shares the same grammar
  • 2:37 - 2:40
    as buildings about which we have lost all information.
  • 2:42 - 2:44
    So if we want to construct a time machine,
  • 2:44 - 2:46
    we need two things.
  • 2:46 - 2:48
    We need very large archives,
  • 2:48 - 2:51
    and we need excellent specialists.
  • 2:51 - 2:52
    The Venice Time Machine,
  • 2:52 - 2:54
    the project I'm going to talk to you about,
  • 2:54 - 2:57
    is a joint project between the EPFL
  • 2:57 - 3:00
    and the Ca' Foscari University of Venice.
  • 3:00 - 3:02
    What is very peculiar about Venice
  • 3:02 - 3:04
    is that its administration has been
  • 3:04 - 3:07
    very, very bureaucratic.
  • 3:07 - 3:09
    They've been keeping track of everything,
  • 3:09 - 3:12
    almost like Google today.
  • 3:12 - 3:14
    At the Archivio di Stato,
  • 3:14 - 3:15
    you have 80 kilometers of archives
  • 3:15 - 3:17
    documenting every aspect
  • 3:17 - 3:20
    of the life of Venice over
    more than a thousand years.
  • 3:20 - 3:22
    You have every boat that goes out,
  • 3:22 - 3:23
    every boat that comes in.
  • 3:23 - 3:25
    You have every change that was made in the city.
  • 3:25 - 3:29
    This is all there.
  • 3:29 - 3:33
    We are setting up a 10-year digitization program
  • 3:33 - 3:34
    which has the objective of transforming
  • 3:34 - 3:36
    this immense archive
  • 3:36 - 3:38
    into a giant information system.
  • 3:38 - 3:40
    The type of objective we want to reach
  • 3:40 - 3:45
    is 450 books a day that can be digitized.
  • 3:45 - 3:47
    Of course, when you digitize, that's not enough,
  • 3:47 - 3:48
    because these documents,
  • 3:48 - 3:50
    most of them are in Latin, in Tuscan,
  • 3:50 - 3:52
    in Venetian dialect,
  • 3:52 - 3:54
    so you need to transcribe them,
  • 3:54 - 3:56
    to translate them in some cases,
  • 3:56 - 3:57
    to index them,
  • 3:57 - 3:59
    and this is obviously not easy.
  • 3:59 - 4:03
    In particular, traditional optical
    character recognition methods
  • 4:03 - 4:04
    that can be used for printed documents
  • 4:04 - 4:09
    do not work well on handwritten documents.
  • 4:09 - 4:10
    So the solution is actually to take inspiration
  • 4:10 - 4:13
    from another domain: speech recognition.
  • 4:13 - 4:15
    This is a domain where something that seems impossible
  • 4:15 - 4:18
    can actually be done,
  • 4:18 - 4:20
    simply by imposing additional constraints.
  • 4:20 - 4:22
    If you have a very good model
  • 4:22 - 4:23
    of the language which is used,
  • 4:23 - 4:26
    if you have a very good model of a document,
  • 4:26 - 4:27
    of how well it is structured.
  • 4:27 - 4:28
    And these are administrative documents.
  • 4:28 - 4:30
    They are well-structured in many cases.
  • 4:30 - 4:34
    If you divide this archive into smaller subsets
  • 4:34 - 4:36
    where each smaller subset
    actually shares similar features,
  • 4:36 - 4:40
    then there's a chance of success.
  • 4:43 - 4:45
    If we reach that stage, then there's something else:
  • 4:45 - 4:49
    we can extract events from these documents.
  • 4:49 - 4:51
    Actually, probably 10 billion events
  • 4:51 - 4:53
    can be extracted from this archive.
  • 4:53 - 4:55
    And this giant information system
  • 4:55 - 4:57
    can be searched in many ways.
  • 4:57 - 4:58
    You can ask questions like,
  • 4:58 - 5:01
    "Who lived in this palazzo in 1323?"
  • 5:01 - 5:03
    "How much did a sea bream cost at the Rialto market
  • 5:03 - 5:05
    in 1434?"
  • 5:05 - 5:06
    "What was the salary
  • 5:06 - 5:08
    of a glass maker in Murano
  • 5:08 - 5:10
    maybe over a decade?"
  • 5:10 - 5:11
    You can ask even bigger questions
  • 5:11 - 5:14
    because it will be semantically coded.
  • 5:14 - 5:16
    And then what you can do is put that in space,
  • 5:16 - 5:18
    because much of this information is spatial.
  • 5:18 - 5:20
    And from that, you can do things like
  • 5:20 - 5:22
    reconstructing this extraordinary journey
  • 5:22 - 5:25
    of that city which managed to
    have a sustainable development
  • 5:25 - 5:27
    over a thousand years,
  • 5:27 - 5:29
    managing to maintain at all times
  • 5:29 - 5:32
    a form of equilibrium with its environment.
  • 5:32 - 5:33
    You can reconstruct that journey,
  • 5:33 - 5:36
    visualize it in many different ways.
  • 5:36 - 5:38
    But of course, you cannot understand Venice
  • 5:38 - 5:39
    if you just look at the city.
  • 5:39 - 5:41
    You have to put it in a larger European context.
  • 5:41 - 5:44
    So the idea is also to document all the things
  • 5:44 - 5:47
    that worked at the European level.
  • 5:47 - 5:49
    We can also reconstruct the journey
  • 5:49 - 5:51
    of the Venetian maritime empire,
  • 5:51 - 5:54
    how it progressively controlled the Adriatic Sea,
  • 5:54 - 5:58
    how it became the most powerful medieval empire
  • 5:58 - 5:59
    of its time,
  • 5:59 - 6:01
    controlling most of the sea routes
  • 6:01 - 6:04
    from the east to the south.
  • 6:06 - 6:08
    But you can even do other things,
  • 6:08 - 6:10
    because in these maritime routes,
  • 6:10 - 6:12
    there are regular patterns.
  • 6:12 - 6:15
    You can go one step beyond
  • 6:15 - 6:17
    and actually create a simulation system,
  • 6:17 - 6:20
    create a Mediterranean simulator
  • 6:20 - 6:22
    which is actually capable of reconstructing
  • 6:22 - 6:24
    even the information we are missing,
  • 6:24 - 6:27
    which would enable you to ask questions
  • 6:27 - 6:30
    as if you were using a route planner.
  • 6:30 - 6:33
    "If I am in Corfu in June 1323
  • 6:33 - 6:35
    and want to go to Constantinople,
  • 6:35 - 6:38
    when can I take a boat?"
  • 6:38 - 6:39
    Probably we can answer this question
  • 6:39 - 6:44
    with one or two or three days' precision.
  • 6:44 - 6:45
    "How much will it cost?"
  • 6:45 - 6:49
    "What are the chances of encountering pirates?"
  • 6:49 - 6:51
    Of course, you understand,
  • 6:51 - 6:53
    the central scientific challenge
    of a project like this one
  • 6:53 - 6:57
    is qualifying, quantifying, and representing
  • 6:57 - 7:00
    uncertainty and inconsistency
    at each step of this process.
  • 7:00 - 7:03
    There are errors everywhere,
  • 7:03 - 7:06
    errors in the documents: the
    wrong name of the captain,
  • 7:06 - 7:09
    some of the boats never actually took to sea.
  • 7:09 - 7:10
    There are errors in translation,
  • 7:10 - 7:14
    interpretative biases,
  • 7:14 - 7:17
    and on top of that, if you add algorithmic processes,
  • 7:17 - 7:20
    you're going to have errors in recognition,
  • 7:20 - 7:22
    errors in extraction,
  • 7:22 - 7:27
    so you have very, very uncertain data.
  • 7:27 - 7:30
    So how can we detect and
    correct these inconsistencies?
  • 7:30 - 7:34
    How can we represent that form of uncertainty?
  • 7:34 - 7:36
    It's difficult. One thing you can do
  • 7:36 - 7:38
    is document each step of the process,
  • 7:38 - 7:41
    not only coding the historical information
  • 7:41 - 7:43
    but what we call the meta-historical information,
  • 7:43 - 7:46
    how historical knowledge is constructed,
  • 7:46 - 7:48
    documenting each step.
  • 7:48 - 7:50
    That will not guarantee that we actually converge
  • 7:50 - 7:52
    toward a single story of Venice,
  • 7:52 - 7:54
    but probably we can actually reconstruct
  • 7:54 - 7:57
    a fully documented potential story of Venice.
  • 7:57 - 7:59
    Maybe there's not a single map.
  • 7:59 - 8:01
    Maybe there are several maps.
  • 8:01 - 8:03
    The system should allow for that,
  • 8:03 - 8:06
    because we have to deal with
    a new form of uncertainty,
  • 8:06 - 8:10
    which is really new for this type of giant database.
  • 8:12 - 8:13
    And how should we communicate
  • 8:13 - 8:16
    this new research to a large audience?
  • 8:16 - 8:20
    Again, Venice is extraordinary for that.
  • 8:20 - 8:22
    With the millions of visitors that come every year,
  • 8:22 - 8:24
    it's actually one of the best places
  • 8:24 - 8:27
    to try to invent the museum of the future.
  • 8:27 - 8:30
    Imagine, horizontally you see the reconstructed map
  • 8:30 - 8:31
    of a given year,
  • 8:31 - 8:34
    and vertically, you see the document
  • 8:34 - 8:36
    that served as the basis for the reconstruction,
  • 8:36 - 8:39
    paintings, for instance.
  • 8:39 - 8:41
    Imagine an immersive system that permits you
  • 8:41 - 8:45
    to go and dive and reconstruct
    the Venice of a given year.
  • 8:45 - 8:48
    An experience you could share within a group.
  • 8:48 - 8:50
    Conversely, imagine that you actually start
  • 8:50 - 8:52
    from a document, a Venetian manuscript,
  • 8:52 - 8:55
    and you show, actually, what
    you can construct out of it,
  • 8:55 - 8:57
    how it is decoded,
  • 8:57 - 9:00
    how the context of that document can be recreated.
  • 9:00 - 9:01
    This is an image from an exhibit
  • 9:01 - 9:04
    which is currently being held in Geneva
  • 9:04 - 9:06
    with that type of system.
  • 9:06 - 9:08
    So to conclude, we can say that
  • 9:08 - 9:11
    research in the humanities is about to undergo
  • 9:11 - 9:13
    an evolution which is maybe similar
  • 9:13 - 9:18
    to what happened to the life sciences 30 years ago.
  • 9:18 - 9:20
    It's really
  • 9:20 - 9:22
    a question of scale.
  • 9:22 - 9:26
    We see projects which are
  • 9:26 - 9:30
    much beyond what any single research team can do,
  • 9:30 - 9:32
    and this is really new for the humanities,
  • 9:32 - 9:36
    which very often have the habit of working
  • 9:36 - 9:40
    in small groups or only with a couple of researchers.
  • 9:40 - 9:41
    When you visit the Archivio di Stato,
  • 9:41 - 9:44
    you feel this is beyond what any single team can do,
  • 9:44 - 9:46
    and that it should be a joint
  • 9:46 - 9:48
    and common effort.
  • 9:48 - 9:51
    So what we must do for this paradigm shift
  • 9:51 - 9:54
    is actually foster a new generation
  • 9:54 - 9:55
    of "digital humanists"
  • 9:55 - 9:57
    that are going to be ready for this shift.
  • 9:57 - 9:59
    I thank you very much.
  • 9:59 - 10:03
    (Applause)
Title:
How I built an information time machine
Speaker:
Frederic Kaplan
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
10:20