slides - Disco Lab
Download
Report
Transcript slides - Disco Lab
FreeNet: A Distributed Anonymous
Information Storage and Retrieval System
Ian Clark, Oskar Sandberg, Brandon Wiley and Theodore Hong
FreeNet
• P2P network for anonymous publishing and
retrieval of data
–
–
–
–
–
Decentralized
Nodes collaborate in storage and routing
Data centric routing
Adapts to demands
Addresses privacy & availability concerns
Motivation
• Problem - Querying the network
– Source - Requestor
– Destination – Provider
• It’s a distributed search problem
– Approximating global knowledge with local
knowledge
– Other systems – Chord, Tapestry, Pastry
• Privacy and availability
– Protect authorship, prevent denial attacks
Goals of Freenet
•
•
•
•
•
Anonymity for producers and consumers
Deniability for information storers
Resistance to denial attacks
Efficient storing and routing
Does NOT provide
– Permanent file storage
– Load balancing
– Anonymity for general n/w usage
Architecture
Request:
1. key
2. Hops to live
3. ID
4. Depth
•
•
•
•
Each node – local data store + routing table
Request file through location independent keys
Routing - chain of proxy requests - decision is local
Graph structure actively evolves over time
Key Based Searching
•Keyword signed key(KSK)
•Easy for retrieval – only need ‘D’
•Minimal protection against tampering
D
‘D’– key generation Pb + Pr ; SHA(Pb)
FILE
+ Pr
E(FILE, D)
Encrypted FILE
Signature
KSK
Keys and Searching…..
• Problems with KSK – flat namespace (collisions),
key squatting, dictionary attacks
• Signed Subspace Key (SSK)
Randomly generated key pair namespace ID
SSK = SHA(‘D’) ^ SHA(Pb)
(-)Advertisement – subspace Pb + ‘D’
(+)Owner can construct hierarchical space of arbitrary
depth - using indirect files
– (+)Reduces collision greatly
–
–
–
–
Keys and Searching…
• Problems with SSK - updating, versioning
• Content Hash Keys (CHK)
– Encrypted by a random encryption key
– Publish CHK + decryption key
– CHK + SSK easily updateable files
• 2 step process – publish file, publish pointer
• Results in pointers to newer version
• Older versions accessed thru CHK
– Can be used for splitting files
Retrieving Files
• How do u locate the keys?
– Hypertext spider
– Indirect files – published with KSK of search words
– Publish bookmarks
• File retrieval
– Request forwarded to node in RT with closest
lexicographic match for the binary key
– Request routing follows steepest-ascent hill
climbing: first choice failure backtrack
second choice
Still Retrieving….
c
a
b
f
e
d
• Timers, hops - curtail request threads
• Files cached all along the retrieval path
• Self-reinforcing cycle – results in key expertise
Ring Topology
•1000 nodes in ring topology
•Datastore = 50 items
•RT = 250 items
•Keys associated with links
are hash of destn IPs
Self Reinforced Routing
• Snapshots using 300
requests with hops = 500
• As network converges it
drops to 6 - “six degrees
of separation”
Retrieval Discussion
• No controlled replication no persistence
• No correlation between keys and content
– (+) Documents related to a subject are scattered
• Geographical fault resilience
– (-) No spatial locality – search latencies can suffer
• Building indexes by other means
Publishing
• Similar to retrieval but, 2 step process
– Detect collisions – ‘all clear’ if no collision
– Publish to node in RT with closest key match
• Are CD and publish paths same?
– Can result in collision during publish step
• Inserts allow new nodes to advertise themselves
• (+) Key-squatting is not effective
Data Management
• Finite data stores - nodes resort to LRU
• Routing table entries linger after data eviction
• Outdated (or unpopular) docs disappear
automatically
• Bipartite eviction – short term policy
– New files replace most recent files
– Prevents established files being evicted by attacks
Network Growth
• New nodes have to know one or more guys
• Problem: How to consistently decide on what
key the new node specializes in?
– Needs to be consensus decision – else denial attacks
• Advertisement IP + H(random seed s0)
– Commitment - H(H(H(s0) ^ H(s1)) ^ H(s2))…….
– Key for new node = XOR of all seeds
• Each node adds a RT entry for the new node
Network Growth
• Key assigned to new
nodes = H(IP)
• Scales as log(n) until
n ~ 40000
• At 40000, RTs are full
Protocol
• Nodes with frequently changing IPs use ARKs
• Return address specified in requests – threat?
• Messages do not always terminate when hopsto-live reaches 1
• Depth is initialized by original requestor to
arbitrarily small value
• Request state maintained at each node – timers
- LRU
Fault Resilience
• Median path length
< 20 at 30% node
failures?
• N/w becomes
ineffective at 40%
failures ???
Small World
• Most nodes form local
clusters
• Few high link connecting
nodes
• Power law distribution
provides high degree of
fault tolerance
Security Concerns
• Pre- routing – mesg. encrypted by public keys which determine
path of pre-routing
• Protecting data source – using random and probabilistic methods
Security
• File integrity - KSK vulnerable to dictionary
attacks
• DOS attacks – Hash Cash to slow down
• Attempts to displace valid files are constrained
by the insert procedure
Conclusion
• Provides a n/w to anonymously store and request files
• Adaptive routing who’s efficiency increases with
experience
• Deals with privacy and data integrity in various
scenarios
• Applications?
– Freedom of speech
– Unaccountable, decentralized Napster