Fenton_D2S1_Overview


Overview of LOCKSS
Session Learning Objectives




Provide an overview of the LOCKSS architecture.
Describe the LOCKSS polling process.
Describe how LOCKSS private networks differ.
Provide a vocabulary of technical terms used frequently with LOCKSS networks.
Architectural Components





Provider Sites (digital collections)
LOCKSS nodes (aka “peers”)
Plugins / Plugin Repository
Cache Manager
Title Database / Conspectus Database
Provider Sites


Prepare a digital collection so that it is web accessible to the preservation nodes
Expose a “manifest” web page for each collection, according to LOCKSS specifications. The manifest page grants permission for LOCKSS to crawl and gives the starting point for the crawl.
Provide information sufficient to create a LOCKSS plugin for the collection (or else create the plugin themselves and deposit that plugin with the LOCKSS network)
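As an illustration of the two jobs a manifest page does (granting permission and giving a crawl starting point), here is a minimal Python sketch that inspects a manifest page. The permission wording, the sample HTML, and the `provider.example.org` URL are assumptions made for this example; check the LOCKSS specifications for the exact statement required.

```python
from html.parser import HTMLParser

# Permission wording assumed for this sketch; confirm the exact text
# against the LOCKSS specifications before relying on it.
PERMISSION_TEXT = (
    "LOCKSS system has permission to collect, preserve, "
    "and serve this Archival Unit"
)


class ManifestParser(HTMLParser):
    """Collects the visible text and the link targets of a manifest page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Every <a href="..."> is a candidate starting point for the crawl.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def inspect_manifest(html):
    """Return (permission_granted, start_urls) for a manifest page."""
    parser = ManifestParser()
    parser.feed(html)
    # Normalize whitespace so line breaks in the page do not hide the statement.
    text = " ".join(" ".join(parser.text_parts).split())
    return PERMISSION_TEXT in text, parser.links


# Hypothetical manifest page for one archival unit.
sample = """<html><body>
<h1>Example Collection, AU 1</h1>
<p>LOCKSS system has permission to collect, preserve, and serve this Archival Unit</p>
<a href="http://provider.example.org/collection/au1/index.html">Start of AU 1</a>
</body></html>"""

permitted, start_urls = inspect_manifest(sample)
print(permitted, start_urls)
```

A page without the statement yields `(False, [...])`, which in this sketch would mean the node should not crawl the collection.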
LOCKSS Peer Nodes




Data caches for harvested content
Caches organized into archival units (AUs)
Nodes can select which AUs to crawl and preserve
There must be >= 6 copies of an AU in order for the polling process to work properly
Plugins / Plugin Repository



Tell LOCKSS where, how and how often to crawl a provider site for AUs
Plugins are Java based
Distinct from core LOCKSS software
Cache Manager


Distributed separately from LOCKSS
Can remotely inspect and manage the caches on the various peer nodes
Title / Conspectus Databases


Title database on each node describes and manages which AUs to preserve on that node
Conspectus Database, designed for the MetaArchive Project, provides more extensive metadata about the preserved digital collections and feeds the Title database with entries
[Diagram: Digital Collection 1 and Digital Collection 2, each with a manifest page and divided into AUs (content shown includes web sites, source code, and an SQL dump); the Plugin Repository; and Private LOCKSS Network Nodes 1 to 9 holding copies of DC1 and DC2.]
The Polling Process
Polling Process resulting in “landslide loss”, AU repair
[Animated diagram: Nodes 1, 2, 4, 5, 7, 8 and 9, each holding DC2-AU1, with SHA1 digests exchanged during the poll. The captions below appear in sequence.]
Node 5 calls poll on AU 1 of Digital Collection 2
Node 5 invites some recently encountered peers to vote. (Each node maintains a reference list of the recently encountered peers.)
Those invited are the “inner circle” for this opinion poll.
An affirmative PollChallenge response allows that inner circle node to participate in the poll
The Poll Effort Proof message is cryptographically derived in reply to each affirmative voter’s challenge
Node 5 discovers new peers through the nomination process
Nominated Nodes 7 and 8 belong to the “outer circle” and can be invited to subsequent voting rounds by Node 5
Invited nodes create a fresh SHA1 digest of the AU and send it to Node 5
There is a “landslide” of valid, disagreeing votes against Node 5’s SHA1 digest of DC2-AU1
Since agreeing votes are below a threshold, Node 5 picks a random disagreeing voter from the inner circle
Once repair is completed, Node 5 immediately calls a new poll, which effectively verifies, or invalidates and corrects, the repair
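The landslide-loss scenario can be sketched in Python as follows. This is a heavily simplified model, not the LCAP protocol itself: it omits invitations, effort proofs and nominations, compares a single SHA1 digest per node, and uses a hypothetical threshold `d`. The choice of Nodes 1, 2, 4 and 9 as the inner circle and the sample content are illustrative assumptions.

```python
import hashlib
import random


def sha1_digest(content: bytes) -> str:
    """SHA1 digest of an AU's content, as a hex string."""
    return hashlib.sha1(content).hexdigest()


def poll_and_repair(poller, inner_circle, copies, d, rng):
    """Run one simplified poll; repair the poller's copy on a landslide loss.

    copies maps node number -> AU content held by that node.
    d is a hypothetical threshold for the tolerated minority of votes.
    """
    own = sha1_digest(copies[poller])
    agreeing = [n for n in inner_circle if sha1_digest(copies[n]) == own]
    disagreeing = [n for n in inner_circle if n not in agreeing]

    if len(disagreeing) <= d:
        return "landslide win"
    if len(agreeing) <= d:
        # Pick a random disagreeing voter and take its copy as the repair.
        source = rng.choice(disagreeing)
        copies[poller] = copies[source]
        return "landslide loss"
    return "inconclusive"


good = b"DC2-AU1 content as published"
copies = {1: good, 2: good, 4: good, 9: good,
          5: b"DC2-AU1 content with bit rot"}
rng = random.Random(0)

first = poll_and_repair(5, [1, 2, 4, 9], copies, d=1, rng=rng)
# The follow-up poll verifies the repair.
second = poll_and_repair(5, [1, 2, 4, 9], copies, d=1, rng=rng)
print(first, "->", second)
```

In this run the first poll ends in a landslide loss and repairs Node 5's copy, and the immediately following poll confirms the repaired copy with a landslide win.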
Polling Refresh Timer


A peer sets a refresh timer for a given AU to determine the interval between successive polls
System parameter R is the mean for the possible random values generated for the refresh timer
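A minimal Python sketch of such a timer follows. The slide only says that R is the mean of the random values, so the uniform distribution, the `spread` parameter, and the 90-day value of R used here are all assumptions for illustration.

```python
import random


def next_poll_delay(mean_r, rng, spread=0.5):
    """Draw a random delay whose mean is mean_r.

    Assumption: delays are uniform on [mean_r * (1 - spread),
    mean_r * (1 + spread)]; the real distribution may differ.
    """
    low = mean_r * (1 - spread)
    high = mean_r * (1 + spread)
    return rng.uniform(low, high)


rng = random.Random(1)
R_DAYS = 90.0  # hypothetical value of R, in days
delays = [next_poll_delay(R_DAYS, rng) for _ in range(10000)]
print(min(delays), sum(delays) / len(delays), max(delays))
```

Randomizing the interval keeps peers from polling on the same AU in lockstep, while the average rate of polls is still governed by R.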
System Parameter – ‘Quorum’



Q = # of valid inner circle votes required to conclude a poll successfully
Q = 6 is the thoroughly tested value in use
If votes < Q, poller invites additional peers, or else aborts the opinion poll
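The quorum rule can be expressed as a small decision function. This is a sketch only; the function name, the returned labels, and the `uninvited_peers` argument are hypothetical.

```python
Q = 6  # quorum value quoted on the slide


def next_action(valid_votes, uninvited_peers, q=Q):
    """Decide what the poller does once the current round of votes is in."""
    if valid_votes >= q:
        return "tally votes"
    if uninvited_peers > 0:
        # Below quorum, but there are still peers left to invite.
        return "invite additional peers"
    return "abort poll"


print(next_action(6, 0))
print(next_action(4, 3))
print(next_action(4, 0))
```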
Polling Outcome – ‘Landslide Win’



The poller considers its current copy to have integrity
This is the only scenario in which an opinion poll concludes successfully
The poller updates its reference list and then waits until the next polling period (determined by the refresh timer)
Reference List Update




Happens only after a successful poll
Poller removes the inner circle peers who had valid votes in the last opinion poll
Culls peers it has not been able to contact for some time
Adds outer circle peers whose votes were valid and eventually agreeing
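The three update steps above can be sketched as one Python function. The argument names and the example node numbers are hypothetical, and how a node decides that a peer has been unreachable "for some time" is left outside the sketch.

```python
def update_reference_list(reference_list, inner_valid_voters,
                          unreachable, outer_valid_agreeing):
    """Return the poller's new reference list after a successful poll."""
    # Remove inner circle peers whose votes were valid in the last poll,
    # and cull peers that could not be contacted for some time.
    updated = [p for p in reference_list
               if p not in inner_valid_voters and p not in unreachable]
    # Add outer circle peers whose votes were valid and eventually agreeing.
    for peer in outer_valid_agreeing:
        if peer not in updated:
            updated.append(peer)
    return updated


new_list = update_reference_list(
    reference_list=[1, 2, 3, 4, 6, 9],
    inner_valid_voters={1, 2, 4, 9},
    unreachable={3},
    outer_valid_agreeing=[7, 8],
)
print(new_list)  # [6, 7, 8]
```

Rotating voters out and nominated peers in means successive polls on the same AU sample different parts of the network.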
Polling Outcome - Inconclusive






D = max allowed “minority” votes
If Agreeing Votes > D and Agreeing Votes < Total valid votes - D, then the poll is inconclusive and raises an alarm
Human intervention needed to determine if nodes have been compromised
A peer voting in agreement with a known bad copy is blacklisted if that peer node can’t be identified or won’t cooperate
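The inconclusive condition, together with the two landslide outcomes it implies, can be sketched as below. The win and loss branches are inferred here as the complement of the inconclusive condition, the value `d=1` is hypothetical, and the sketch assumes the total number of valid votes exceeds 2 * D.

```python
def classify_poll(agreeing, total_valid, d):
    """Classify a poll from the count of agreeing votes.

    Assumes total_valid > 2 * d, so the three outcomes do not overlap.
    """
    if d < agreeing < total_valid - d:
        return "inconclusive"  # raises an alarm; humans must investigate
    if agreeing <= d:
        return "landslide loss"  # the poller's copy is presumed damaged
    return "landslide win"  # the poller's copy is presumed intact


# With 6 valid votes and a hypothetical d of 1:
for agreeing in range(7):
    print(agreeing, classify_poll(agreeing, total_valid=6, d=1))
```

With these values, 0 or 1 agreeing votes is a landslide loss, 5 or 6 is a landslide win, and 2 to 4 is inconclusive.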
Further Details on Polling Process


Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, and Yanto Muliadi, "LOCKSS: A Peer-to-Peer Digital Preservation System", ACM Transactions on Computer Systems (TOCS). http://www.eecs.harvard.edu/~mema/publications/TOCS2005.pdf
See also LOCKSS related publications at http://www.lockss.org/lockss/Publications
The LOCKSS Private Network Difference

More flexible (not appliance based)
Can run on any operating system that supports Java
LOCKSS Team maintains rpm packages for Linux installations
Peer Node administrators have greater discretion in configuring access and customizing functionality, e.g. altering system parameters
The LOCKSS Private Network Difference (cont.)

Can extend LOCKSS core functionality with supplemental tools and methods to fit new use cases
E.g. the MetaArchive Conspectus database
Vocabulary

(Please refer to the workshop binder for terminology and definitions)
Overview of LCAP version 3