
Archive your Domain

  • 0:01 - 0:03
    This is a Web Archiving Service video tutorial
  • 0:03 - 0:07
    on how to archive your domain or large website.
  • 0:07 - 0:10
    Before you archive your domain or large website,
  • 0:10 - 0:13
    you will want to set up a broad test capture.
  • 0:13 - 0:15
    Running this test capture will help you
  • 0:15 - 0:20
    get organized before you actually start archiving items.
  • 0:20 - 0:22
    Start at your homepage.
  • 0:22 - 0:26
    In this case, we will be using "berkeley.edu".
  • 0:26 - 0:31
    Check for any robots.txt exclusions.
  • 0:31 - 0:34
    To do this, go to the URL you want to capture,
  • 0:34 - 0:40
    and add "/robots.txt" at the end.
  • 0:40 - 0:43
    In this case, berkeley.edu is asking us
  • 0:43 - 0:45
    to run the crawls slowly,
  • 0:45 - 0:49
    which the WAS crawler will do automatically.
  • 0:49 - 0:52
    For more information on robots.txt exclusions,
  • 0:52 - 0:56
    check out our tutorial video on rights management.
  • 0:56 - 1:00
    Once you have checked for any robots.txt exclusions,
  • 1:00 - 1:03
    it's time to create your site.
  • 1:03 - 1:05
    From the WAS homepage, click "Create site,"
  • 1:05 - 1:12
    and enter a name and your seed URL.
  • 1:12 - 1:15
    Choose "Host Site" for your scope,
  • 1:15 - 1:18
    and click "Yes" for "Capture Linked Pages."
  • 1:18 - 1:21
    Change your maximum time to the full-capture setting
  • 1:21 - 1:23
    of 36 hours.
  • 1:23 - 1:32
    Click "Save" and run your crawl.
  • 1:32 - 1:34
    You will receive an email when your crawl is complete.
  • 1:34 - 1:39
    Log back into the WAS site to view your capture results.
  • 1:39 - 1:47
    You may use the "Reports" tab to view the "Hosts" report.
  • 1:47 - 1:50
    This report lists every host name that came up
  • 1:50 - 1:54
    during the capture of your site.
  • 1:54 - 1:57
    Similar information can also be found
  • 1:57 - 1:59
    under the "Related Sites" tab.
  • 1:59 - 2:01
    This list can be narrowed to make it easier
  • 2:01 - 2:04
    to find relevant sites.
  • 2:04 - 2:05
    For example, in this case,
  • 2:05 - 2:07
    it is advantageous to search for hosts
  • 2:07 - 2:12
    with a URL containing the word "berkeley."
  • 2:12 - 2:13
    There are two ways to quickly add
  • 2:13 - 2:17
    these related sites to your archive.
  • 2:17 - 2:19
    One option is to use the WAS browser button
  • 2:19 - 2:22
    to add sites individually.
  • 2:22 - 2:24
    The WAS browser button can be found
  • 2:24 - 2:25
    under the "Sites" tab.
  • 2:25 - 2:29
    You only need to install the button once.
  • 2:29 - 2:31
    Once the browser button is installed,
  • 2:31 - 2:33
    simply click on any individual link
  • 2:33 - 2:35
    on the list of related sites,
  • 2:35 - 2:42
    and add it to WAS.
  • 2:42 - 2:43
    Then choose whether you would like to
  • 2:43 - 2:45
    add the URL to a newly created site,
  • 2:45 - 2:49
    or to a site from the dropdown menu.
  • 2:49 - 2:51
    The second option for quickly adding
  • 2:51 - 2:53
    related sites to an archive
  • 2:53 - 2:55
    involves working together with the WAS helpdesk
  • 2:55 - 2:58
    to batch import site information
  • 2:58 - 3:01
    using a CSV file.
  • 3:01 - 3:03
    To do this, download the related sites list
  • 3:03 - 3:05
    in a comma-delimited format
  • 3:05 - 3:09
    from the "Related Sites" page.
  • 3:09 - 3:11
    After the CSV file has been downloaded,
  • 3:11 - 3:14
    open it in Excel and provide site names
  • 3:14 - 3:18
    for the URLs you want to add to your project.
  • 3:18 - 3:19
    WAS is happy to provide you with
  • 3:19 - 3:21
    a template for these purposes
  • 3:21 - 3:22
    if you are interested.
  • 3:22 - 3:25
    Simply contact the helpdesk.
  • 3:25 - 3:27
    Once you are done creating your CSV file,
  • 3:27 - 3:30
    contact the helpdesk to import the file to your project.
  • 3:30 - 3:34
    Once you are satisfied with your test project,
  • 3:34 - 3:36
    and the linked pages you have found,
  • 3:36 - 3:38
    chances are you will want to change the settings
  • 3:38 - 3:41
    so that you are no longer capturing linked pages.
  • 3:41 - 3:43
    This will eliminate the high number
  • 3:43 - 3:45
    of irrelevant pages that may otherwise be captured.
  • 3:45 - 3:51
    These techniques will get you a good start
  • 3:51 - 3:52
    on archiving your domain or large website,
  • 3:52 - 3:54
    but the archivist will still need
  • 3:54 - 3:56
    to do some legwork to ensure that the project
  • 3:56 - 3:58
    is comprehensive.
  • 3:58 - 4:01
    This has been a Web Archiving Service video tutorial
  • 4:01 - 4:04
    on how to archive your domain or large website.
  • 4:04 - 4:05
    As always, if you have questions,
  • 4:05 - 4:09
    feel free to contact us at washelp@ucop.edu.
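
The robots.txt check described at 0:26 to 0:49 can also be done with a short script. The following is a minimal sketch, not part of the video: it uses Python's standard urllib.robotparser, and the sample robots.txt content and the function name crawl_rules are assumptions made for illustration. They are not the actual contents of berkeley.edu/robots.txt.

```python
from urllib.robotparser import RobotFileParser


def crawl_rules(robots_lines, user_agent="*"):
    """Summarize what a robots.txt file asks of a given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse lines already in memory; no network access
    return {
        # Requested delay between requests, or None if not specified
        "crawl_delay": parser.crawl_delay(user_agent),
        # Whether the crawler may fetch the site's homepage
        "homepage_allowed": parser.can_fetch(user_agent, "/"),
    }


# Hypothetical robots.txt content, used only for illustration
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 1
Disallow: /private/
""".splitlines()

rules = crawl_rules(SAMPLE_ROBOTS)
print(rules)
```

To inspect a live site instead, the same parser can load a file with set_url() and read(); a crawl delay in the result corresponds to the "run the crawls slowly" request mentioned in the video.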
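
The narrowing step at 2:04 to 2:12 (keeping only hosts whose URL contains "berkeley") can also be applied to the downloaded related sites CSV before adding site names. This is a sketch under stated assumptions: the column names "host" and "url_count", the sample rows, and the function name filter_hosts are hypothetical, and the real WAS export may use a different layout.

```python
import csv
import io


def filter_hosts(csv_text, keyword, column="host"):
    """Return the rows whose host column contains the keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if keyword.lower() in row[column].lower()]


# Hypothetical export; the actual column names may differ
SAMPLE = """host,url_count
www.berkeley.edu,120
example.com,4
lib.berkeley.edu,37
"""

matches = filter_hosts(SAMPLE, "berkeley")
hosts = [row["host"] for row in matches]
print(hosts)
```

The filtered rows could then be given site names and sent to the helpdesk for import, as described in the video.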
Title:
Archive your Domain
Description:

This Web Archiving Service video tutorial will help you understand how to archive your domain or large website.

Video Language:
English