-
This is a Web Archiving Service video tutorial
-
on rights management.
-
This tutorial is a brief overview
-
of rights management as it relates to web archiving.
-
It is highly recommended that you utilize the additional information
-
we provide on the WAS help page.
-
We provide two very useful documents:
-
one on WAS rights management practices,
-
and one with answers to frequently asked questions about robots.txt.
-
There are two areas of copyright law
-
that are most relevant to web archiving.
-
Section 108 of the Copyright Act,
-
which relates to the limitations on exclusive rights
-
as they pertain to reproductions
-
by libraries and archives;
-
and Fair Use, addressed in Section 107
-
of the Copyright Act,
-
which states that permission
-
for reproduction and display of works
-
protected by copyright
-
isn't required in certain circumstances
-
such as scholarship and research.
-
Guidance for web archiving best practices
-
regarding copyright can be found
-
in two documents.
-
The first is the Section 108 Study Group Report,
-
which contains recommendations
-
from copyright and library experts
-
representing libraries, copyright holders,
-
and others, on how copyright law's
-
library exceptions could be updated for the digital world.
-
The second is the Association of Research Libraries
-
2012 Code of Best Practices
-
in Fair Use for Academic and Research Libraries.
-
Links to these documents can be found
-
in the WAS rights management practices file.
-
WAS has distilled 9 assertions
-
regarding the rights of website owners
-
from the Section 108 Study Group Report
-
and the ARL Code of Best Practices.
-
This tutorial will go over each assertion
-
and explain how they inform WAS policies.
-
Assertion #1 states that libraries and archives
-
should have the right to capture and archive
-
publicly available websites without requesting
-
advance permission.
-
WAS does not enforce any requirement
-
for proof of permission before capture,
-
and both the Study Group Report and
-
the Code of Best Practices support this policy.
-
Assertion #2 states that US Federal, State
-
and local government agencies,
-
political parties, political candidates and
-
political action committees
-
should not have the right to prevent
-
publicly available content from being harvested.
-
While WAS respects robots.txt exclusions by default,
-
curators do have the ability
-
to override these settings
-
in order to capture and archive content.
-
Assertion #3 states that claims of fair use
-
relating to material posted with
-
'bot exclusion' headers to ward off
-
automatic harvesting may be stronger
-
when the institution has adopted
-
and follows a consistent policy.
-
The Section 108 Study Group addressed
-
government and political sites specifically.
-
The ARL Code provides broader guidance
-
supporting WAS's practice of having default
-
settings that respect robots.txt exclusions,
-
that a curator is able to override.
-
In addition, WAS enforces consistent
-
requirements when curators choose to
-
override robots.txt.
-
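Honoring a site's robots.txt exclusions, as WAS does by default, amounts to parsing the file's rules before fetching any page. Here is a minimal sketch using Python's standard-library urllib.robotparser; the rules, user-agent name, and URLs are hypothetical, not WAS's actual crawler code:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; a real crawler would fetch the
# file from the site being captured.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 10",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch() reports whether the rules permit the given user agent
# to retrieve a URL; crawl_delay() reports the requested delay.
print(parser.can_fetch("ExampleArchiveBot", "https://example.org/index.html"))  # True
print(parser.can_fetch("ExampleArchiveBot", "https://example.org/private/a"))   # False
print(parser.crawl_delay("ExampleArchiveBot"))                                  # 10
```

Overriding robots.txt, by contrast, simply means skipping this check, which is why WAS attaches additional requirements to that choice.
-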
Assertion #4 states that,
-
to the extent reasonably possible,
-
the legal proprietors of the sites
-
in question should be identified
-
according to the prevailing conventions
-
of attribution.
-
In order to override robots.txt exclusions,
-
WAS curators must provide
-
owner attribution for a site.
-
This owner attribution will appear
-
on the site details screen for any
-
archived website where robots.txt rules
-
have been overridden.
-
Assertion #5 states that libraries and archives
-
should be required to label prominently
-
all copies of captured online content
-
that are made accessible to users,
-
stating that the content is an archived copy
-
for use only for private study, scholarship and research
-
and providing the date of capture.
-
CDL's WAS service agreement expressly states,
-
"The Curatorial Partner recognizes Web Content
-
captured with WAS is for education purposes only
-
and may not be used for commercial purposes
-
by the Curatorial Partner or other third party."
-
In addition, all publicly displayed content
-
is marked "This document is an archived copy
-
for study and research."
-
Assertion #6 states that archived materials
-
are represented as they were captured,
-
with appropriate information on mode
-
of harvesting and date.
-
It is important to know that WAS does not
-
edit or alter content in any way.
-
In fact, CDL's preservation repository
-
has an ongoing fixity checking process
-
for all harvested web content,
-
to ensure that no changes have been made.
-
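Fixity checking of this kind generally works by recording a cryptographic checksum when content is ingested and periodically recomputing it to confirm nothing has changed. A minimal illustration using Python's standard-library hashlib; the content and digest below are hypothetical, not the repository's actual process:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical archived content and the digest recorded at ingest.
content = b"<html>archived page</html>"
recorded_digest = sha256_of(content)

# A later fixity check recomputes the digest and compares it to the
# recorded value; a mismatch would indicate the content has changed.
if sha256_of(content) == recorded_digest:
    print("fixity check passed")
else:
    print("fixity check FAILED: content has changed")
```
-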
In addition, details of harvest and date
-
are available on the "Show Description" screen
-
of any publicly displayed archived file.
-
Assertion #7 states that an embargo of
-
a "reasonable period of time"
-
should be observed before archived materials
-
are made available to the general public.
-
WAS complies with this assertion
-
by enforcing an embargo between
-
the time that content is harvested and
-
when that archived material can become publicly accessible.
-
At present, this embargo period is 6 months.
-
Assertion #8 states that libraries and archives
-
should be prohibited from engaging
-
in any activities that are likely
-
to materially harm the value or operations
-
of the Internet site hosting the online content
-
that is sought to be captured and made available.
-
WAS service agreements with CDL's Curatorial Partners
-
clearly state that CDL may intervene or stop
-
a capture in progress based on an objection
-
raised by the content owner.
-
In addition, WAS crawlers are always run
-
with a significant delay, to avoid
-
impacting a site's performance.
-
Assertion #9 states that libraries
-
should provide copyright owners
-
with a simple tool for registering objections
-
to making items from such a collection
-
available online, and respond to such objections promptly.
-
WAS crawlers leave identification information
-
on each server that they interact with.
-
This includes a URL to a webpage
-
explaining the crucial role of web archiving
-
in preserving our cultural heritage and history.
-
This page contains a phone number, a webform,
-
and an email address to contact CDL.
-
It is also possible for curators and CDL
-
to block specific archived files, directories
-
or all content from public view.
-
This provides the ability to block content
-
that has been requested to be taken down
-
without blocking an entire archive.
-
You can easily select your desired robots.txt settings
-
when you create a site.
-
Under the "Capture Settings" screen
-
you have the option of leaving the default
-
"yes" to honor robots.txt,
-
or to choose "no."
-
If you are interested in changing
-
the robots.txt settings for a previously
-
created site, find the site you wish
-
to adjust on your "Manage Sites" screen.
-
Choose "Edit," and then adjust the robots.txt
-
settings as desired, and click "Save."
-
This has been a Web Archiving Service
-
video tutorial on rights management.
-
As always, if you have questions,
-
contact us at washelp@ucop.edu.