-
This is a Web Archiving Service video tutorial
-
on how to archive your domain or large website.
-
Before you archive your domain or large website,
-
you will want to set up a broad test capture.
-
Running this test capture will help you
-
get organized before you actually start archiving items.
-
Start at your homepage.
-
In this case, we will be using "berkeley.edu".
-
Check for any robots.txt exclusions.
-
To do this, go to the URL you want to capture,
-
and add "/robots.txt" at the end.
-
In this case, berkeley.edu is asking us
-
to run the crawls slowly,
-
which the WAS crawler will do automatically.
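As a rough sketch of the same check done from a script instead of a browser, the Python below builds the robots.txt URL and reads any crawl delay using the standard library's urllib.robotparser. The sample robots.txt text is made up for illustration and is not the actual berkeley.edu file, and the helper names are hypothetical, not part of WAS.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def robots_url(site_url: str) -> str:
    """Append "/robots.txt" to the site URL, as described in the tutorial."""
    return site_url.rstrip("/") + "/robots.txt"


def summarize_robots(robots_text: str, user_agent: str = "*") -> dict:
    """Report the crawl delay and whether the homepage may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        "crawl_delay": parser.crawl_delay(user_agent),
        "homepage_allowed": parser.can_fetch(user_agent, "/"),
    }


if __name__ == "__main__":
    print(robots_url("http://berkeley.edu"))
    print(summarize_robots(SAMPLE_ROBOTS_TXT))
```

To check a live site, the text fetched from the robots.txt URL would take the place of the sample string.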
-
For more information on robots.txt exclusions,
-
check out our tutorial video on rights management.
-
Once you have checked for any robots.txt exclusions,
-
it's time to create your site.
-
From the WAS homepage, click "Create site,"
-
and enter a name and your seed URL.
-
Choose "Host Site" for your scope,
-
and click "Yes" for "Capture Linked Pages."
-
Change your maximum time to a full capture
-
of 36 hours.
-
Click "Save" and run your crawl.
-
You will receive an email when your crawl is complete.
-
Log back into the WAS site to view your capture results.
-
You may use the "Reports" tab to view the "Hosts" report.
-
This report lists every host name that came up
-
during the capture of your site.
-
Similar information can also be found
-
under the "Related Sites" tab.
-
This list can be narrowed to make it easier
-
to find relevant sites.
-
For example, in this case,
-
it is advantageous to search for hosts
-
with a URL containing the word "berkeley."
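If you have copied the host list out of WAS, the same narrowing can be sketched in a few lines of Python. This is only an illustration: the host names below are invented, and filter_hosts is a hypothetical helper, not a WAS feature.

```python
def filter_hosts(hosts, keyword):
    """Keep only the hosts whose name contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [host for host in hosts if needle in host.lower()]


if __name__ == "__main__":
    # Invented sample of hosts such as a "Hosts" report might list.
    sample_hosts = [
        "www.berkeley.edu",
        "lib.berkeley.edu",
        "www.example.com",
        "News.Berkeley.edu",
    ]
    for host in filter_hosts(sample_hosts, "berkeley"):
        print(host)
```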
-
There are two ways to quickly add
-
these related sites to your archive.
-
One option is to use the WAS browser button
-
to add sites individually.
-
The WAS browser button can be found
-
under the "Sites" tab.
-
You only need to install the button once.
-
Once the browser button is installed,
-
simply click on any individual link
-
on the list of related sites,
-
and add it to WAS.
-
Then choose whether you would like to
-
add the URL to a newly created site,
-
or to a site from the dropdown menu.
-
The second option for quickly adding
-
related sites to an archive
-
involves working together with the WAS helpdesk
-
to batch import site information
-
using a CSV file.
-
To do this, download the related sites list
-
in a comma-delimited format
-
from the "Related Sites" page.
-
After the CSV file has been downloaded,
-
open it in Excel and provide site names
-
for the URLs you want to add to your project.
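For a long list, the naming step could also be done with a short script instead of Excel. The sketch below is an assumption-heavy illustration: the "url" and "name" column headers are hypothetical, since the actual layout of the WAS download and of the helpdesk template is not described here, so check with the helpdesk before relying on any particular format.

```python
import csv
import io


def add_site_names(related_csv_text, names_by_url):
    """Return CSV text with a name for each URL you chose to keep.

    Assumes (hypothetically) that the downloaded file has a "url" column.
    URLs without an entry in names_by_url are left out of the result.
    """
    reader = csv.DictReader(io.StringIO(related_csv_text))
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=["name", "url"], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        url = row["url"].strip()
        if url in names_by_url:
            writer.writerow({"name": names_by_url[url], "url": url})
    return output.getvalue()


if __name__ == "__main__":
    # Invented sample data standing in for the downloaded related sites list.
    downloaded = "url\nhttp://lib.berkeley.edu/\nhttp://www.example.com/\n"
    chosen = {"http://lib.berkeley.edu/": "UC Berkeley Library"}
    print(add_site_names(downloaded, chosen))
```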
-
WAS is happy to provide you with
-
a template for these purposes
-
if you are interested.
-
Simply contact the helpdesk.
-
Once you are done creating your CSV file,
-
contact the helpdesk to import the file to your project.
-
Once you are satisfied with your test project,
-
and the linked pages you have found,
-
chances are you will want to change the settings
-
so that you are no longer capturing linked pages.
-
This will eliminate the high number
-
of irrelevant pages that may otherwise be captured.
-
These techniques will get you a good start
-
on archiving your domain or large website,
-
but the archivist will still need
-
to do some legwork to ensure that the project
-
is comprehensive.
-
This has been a Web Archiving Service video tutorial
-
on how to archive your domain or large website.
-
As always, if you have questions,
-
feel free to contact us at washelp@ucop.edu.