Archive your Domain

0:01 - 0:03

This is a Web Archiving Service video tutorial
0:03 - 0:07

on how to archive your domain or large website.
0:07 - 0:10

Before you archive your domain or large website,
0:10 - 0:13

you will want to set up a broad test capture.
0:13 - 0:15

Running this test capture will help you
0:15 - 0:20

get organized before you actually start archiving items.
0:20 - 0:22

Start at your homepage.
0:22 - 0:26

In this case, we will be using "berkeley.edu".
0:26 - 0:31

Check for any robots.txt exclusions.
0:31 - 0:34

To do this, go to the URL you want to capture,
0:34 - 0:40

and add "/robots.txt" at the end.
0:40 - 0:43

In this case, berkeley.edu is asking us
0:43 - 0:45

to run the crawls slowly,
0:45 - 0:49

which the WAS crawler will do automatically.
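
As a rough illustration of the robots.txt check described above, the Python sketch below builds the robots.txt URL from a seed URL and reads a crawl delay from robots.txt text. The `http://` scheme, the helper names `robots_url` and `summarize_robots`, and the `SAMPLE_ROBOTS` text are all assumptions made for this example; the sample is not the actual robots.txt served by berkeley.edu.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(seed_url: str) -> str:
    """Return the robots.txt URL for the host of seed_url."""
    return urljoin(seed_url, "/robots.txt")


def summarize_robots(robots_text: str, agent: str = "*"):
    """Parse robots.txt text and return (crawl_delay, homepage_allowed)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(agent), parser.can_fetch(agent, "/")


# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS = """User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

if __name__ == "__main__":
    print(robots_url("http://berkeley.edu/"))
    delay, allowed = summarize_robots(SAMPLE_ROBOTS)
    print("crawl delay:", delay, "homepage allowed:", allowed)
```

A crawl delay in the file corresponds to the "run the crawls slowly" request mentioned in the tutorial. To check a live site, the text could be fetched from the URL returned by `robots_url` before being passed to `summarize_robots`.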
0:49 - 0:52

For more information on robots.txt exclusions,
0:52 - 0:56

check out our tutorial video on rights management.
0:56 - 1:00

Once you have checked for any robots.txt exclusions,
1:00 - 1:03

it's time to create your site.
1:03 - 1:05

From the WAS homepage, click "Create site,"
1:05 - 1:12

input a name and your seed URL.
1:12 - 1:15

Choose "Host Site" for your scope,
1:15 - 1:18

and click "Yes" for "Capture Linked Pages."
1:18 - 1:21

Change your maximum time to a full capture
1:21 - 1:23

of 36 hours.
1:23 - 1:32

Click "Save" and run your crawl.
1:32 - 1:34

You will receive an email when your crawl is complete.
1:34 - 1:39

Log back into the WAS site to view your capture results.
1:39 - 1:47

You may use the "Reports" tab to view the "Hosts" report.
1:47 - 1:50

This report lists every host name that came up
1:50 - 1:54

during the capture of your site.
1:54 - 1:57

Similar information can also be found
1:57 - 1:59

under the "Related Sites" tab.
1:59 - 2:01

This list can be narrowed to make it easier
2:01 - 2:04

to find relevant sites.
2:04 - 2:05

For example, in this case,
2:05 - 2:07

it is advantageous to search for hosts
2:07 - 2:12

with a URL containing the word "berkeley."
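
The narrowing step described above can also be done outside WAS once you have a list of host names. The sketch below is a minimal Python example; the function name `filter_hosts` and the sample host names are made up for illustration and are not taken from an actual "Hosts" report.

```python
def filter_hosts(hosts, keyword):
    """Return the hosts whose name contains keyword, ignoring case."""
    keyword = keyword.lower()
    return [host for host in hosts if keyword in host.lower()]


# Invented host names standing in for the entries of a hosts report.
SAMPLE_HOSTS = [
    "www.berkeley.edu",
    "library.Berkeley.edu",
    "video.example.com",
    "social.example.org",
]

if __name__ == "__main__":
    for host in filter_hosts(SAMPLE_HOSTS, "berkeley"):
        print(host)
```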
2:12 - 2:13

There are two ways to quickly add
2:13 - 2:17

these related sites to your archive.
2:17 - 2:19

One option is to use the WAS browser button
2:19 - 2:22

to add sites individually.
2:22 - 2:24

The WAS browser button can be found
2:24 - 2:25

under the "Sites" tab.
2:25 - 2:29

You only need to install the button once.
2:29 - 2:31

Once the browser button is installed,
2:31 - 2:33

simply click on any individual link
2:33 - 2:35

on the list of related sites,
2:35 - 2:42

and add it to WAS.
2:42 - 2:43

Then choose whether you would like to
2:43 - 2:45

add the URL to a newly created site,
2:45 - 2:49

or to a site from the dropdown menu.
2:49 - 2:51

The second option for quickly adding
2:51 - 2:53

related sites to an archive
2:53 - 2:55

involves working together with the WAS helpdesk
2:55 - 2:58

to batch import site information
2:58 - 3:01

using a CSV file.
3:01 - 3:03

To do this, download the related sites list
3:03 - 3:05

in a comma-delimited format
3:05 - 3:09

from the "Related Sites" page.
3:09 - 3:11

After the CSV file has been downloaded,
3:11 - 3:14

open it in Excel and provide site names
3:14 - 3:18

for the URLs you want to add to your project.
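
If you prefer a script to Excel, the same preparation can be sketched in Python. This example assumes the downloaded file has a header row with a column named `url` and that the import file needs `site_name` and `url` columns. Both column layouts are assumptions, so check the template from the WAS helpdesk for the real format. The function name `add_site_names` and the sample data are hypothetical.

```python
import csv
import io


def add_site_names(csv_text, names):
    """Keep only the URLs listed in names and attach a site name to each.

    csv_text: CSV text with an assumed "url" column.
    names: dict mapping URL -> site name for the URLs to import.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["site_name", "url"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        url = row["url"].strip()
        if url in names:  # URLs without a name are left out of the import
            writer.writerow({"site_name": names[url], "url": url})
    return out.getvalue()


# Invented data standing in for a downloaded related sites list.
SAMPLE_CSV = "url\nhttp://library.example.edu/\nhttp://video.example.com/\n"
SAMPLE_NAMES = {"http://library.example.edu/": "Example Library"}

if __name__ == "__main__":
    print(add_site_names(SAMPLE_CSV, SAMPLE_NAMES))
```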
3:18 - 3:19

WAS is happy to provide you with
3:19 - 3:21

a template for these purposes
3:21 - 3:22

if you are interested.
3:22 - 3:25

Simply contact the helpdesk.
3:25 - 3:27

Once you are done creating your CSV file,
3:27 - 3:30

contact the helpdesk to import the file to your project.
3:30 - 3:34

Once you are satisfied with your test project,
3:34 - 3:36

and the linked pages you have found,
3:36 - 3:38

chances are you will want to change the settings
3:38 - 3:41

so that you are not capturing linked pages any more.
3:41 - 3:43

This will eliminate the high number
3:43 - 3:45

of irrelevant pages that may otherwise be captured.
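
To make the difference between the test capture and the later captures concrete, the sketch below models the settings named in this tutorial as a small Python data structure. WAS settings are entered through its web forms; the `SiteSettings` class, its field names, the site names, and the `http://` scheme on the seed URL are assumptions made for this illustration and do not describe a real WAS interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SiteSettings:
    """Hypothetical record of the site settings used in this tutorial."""
    name: str
    seed_url: str
    scope: str
    capture_linked_pages: bool
    max_hours: int


# Broad test capture: "Host Site" scope, linked pages on, 36-hour maximum.
test_capture = SiteSettings(
    name="Berkeley test capture",
    seed_url="http://berkeley.edu/",
    scope="Host Site",
    capture_linked_pages=True,
    max_hours=36,
)

# After reviewing the related sites, stop capturing linked pages.
regular_capture = replace(
    test_capture, name="Berkeley", capture_linked_pages=False
)

if __name__ == "__main__":
    print(test_capture)
    print(regular_capture)
```

Only the linked pages setting changes between the two records; the scope and seed URL stay the same.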
3:45 - 3:51

These techniques will get you a good start
3:51 - 3:52

on archiving your domain or large website,
3:52 - 3:54

but the archivist will still need
3:54 - 3:56

to do some legwork to ensure that the project
3:56 - 3:58

is comprehensive.
3:58 - 4:01

This has been a Web Archiving Service video tutorial
4:01 - 4:04

on how to archive your domain or large website.
4:04 - 4:05

As always, if you have questions,
4:05 - 4:09

feel free to contact us at washelp@ucop.edu.

Title: Archive your Domain
Description: This Web Archiving Service video tutorial will help you understand how to archive your domain or large website.

Video Language: English

cpwillett edited English subtitles for Archive your Domain
