Privacy Issues in
Peer-To-Peer
Systems
Raj Dandage, Tim Gorton,
Ngozika Nwaneri, Mark Tompkins
4/26/01
Agenda
Introduction & Status Report
Definition of peer-to-peer
Privacy Concerns (Threat Model)
What do we care about?
Legal Issues affecting privacy on P2P systems
What it is, what it isn’t, what it used to be, what it should do
What does the law care about?
A few examples of current P2P systems
Analyze w.r.t. goals, privacy concerns, legal issues, etc.
Recommendations
Synthesis, and Conclusion
Status Report: Goals
develop criteria for evaluating peer-to-peer
applications and architectures with regard to
technical, business, and public policy goals
identify different peer-to-peer applications and
architectures
evaluate these applications and architectures in
terms of the goals set forth and privacy issues
explore legal issues surrounding p2p architectures
develop recommendations for the modification and
design of peer-to-peer systems in order to resolve
privacy concerns and encourage the design of
privacy-enhancing systems
What is P2P? What isn’t?
Old-school “P2P”
Usenet
DNS
WWW Hyperlinks
Today’s P2P
Leveraging a new Internet usage model
Transient connectivity at the “fringes”
Peer-to-Peer Defined
Peer-to-peer is NOT simply illegally sharing
copyrighted material.
Peer-to-peer computing is sharing of
computer resources and services by direct
exchange. It is about decentralized
networking applications. The “litmus test” for
peer-to-peer:
“does it allow for variable connectivity and
temporary network addresses?
does it give the nodes at the edges of the network
significant autonomy?”
Clay Shirky in Peer-to-Peer
Peer-to-Peer: Hybrid Systems
Hybrid systems (brokered peer-to-peer
systems) use a centralized server to connect
two computers together before a direct
exchange takes place.
Repeater – someone who publicly shares files that
they are not authors of; Republishing someone
else’s work.
Metadata - the collection of information from
various sources, related and managed in a central
directory for the use of linkage and file sharing.
Privacy Concerns (Threat Models)
Anonymity
… of your identity
… of your online activity
… of your publications
Authentication
Access to your data
data on your local machine
data transmitted on the ‘net
Possible “Attackers”
Malicious hacker
Governments (court order, wiretapping)
Employers
ISP’s
Operators of P2P systems (e.g., Napster)
Another everyday user
Legal Issues affecting P2P privacy
Arenas of Concern
Copyright
Libel
Censorship (more political than legal)
Who is liable/in danger?
ISP’s?
Service operators?
Individual developers?
End users?
Copyright
Direct Infringement
Contributory Infringement
when end users do Bad Things
Some act of direct infringement by someone else
Defendant “knew or should have known” of infringement
Defendant “materially contributed” to infringement
Vicarious Infringement (Napster)
Some act of direct infringement by someone else
Defendant had the “right or ability to control” the infringer
Defendant derived a “direct financial benefit” from the
infringement (Napster has no business model.)
Digital Millennium Copyright
Act of 1998 (DMCA)
Prohibits “circumvent[ing] a technological measure
that effectively controls access to a work protected
under this title”
Exempts “service providers” from copyright liability if:
they block copyrighted material after they are notified by a
copyright holder,
they identify an infringing user to a copyright holder upon
being issued a subpoena,
and they don’t interfere with “standard technical measures”
used to protect or identify copyrighted material
Who are “service providers”?
“an entity offering the transmission, routing,
or providing of connections for digital online
communications, between or among points
specified by a user, of material of the user's
choosing, without modification to the content
of the material as sent or received.” (sec. 512(k)(1))
Also “provider of online services or network
access, or the operator of facilities therefor”
ISP’s, P2P system operators… end users?
Libel: CDA
CDA immunizes providers and users of “interactive
computer services” from being treated as speakers or
publishers of information provided by a 3rd party
“‘interactive computer service’ means any information
service, system, or access software provider that
provides or enables computer access by multiple users
to a computer server, including specifically a service
or system that provides access to the Internet and
such systems operated or services offered by libraries
or educational institutions.”
so… your computer might be a “server”
Censorship
Subverting censorship of authoritarian governments
by providing anonymous publication is a stated goal
of several P2P systems
Examples of authoritarian governments:
Australian law would make supplying R-rated material
illegal
US Courts have ruled that the DMCA makes supplying the
DeCSS code or linking to a site that supplies the DeCSS
code illegal
Naturally, there are others…
Who’s in legal trouble?
P2P system operators
ISP’s
Users’ copyright violations--ISP’s must disable access when
notified by copyright holder
P2P system developers
Must disable access when notified of copyright infringement,
may serve as a circumvention of a TPM as per DMCA
DMCA: they may produce TPM circumvention technology
P2P users
They’re often doing Bad Things. But what if they’re just
forwarding content, perhaps unknowingly? Libel? Copyright?
Targeted by authoritarian regimes?
Example P2P Systems
Possible threats to privacy and usability
Example P2P systems/protocols:
What is it?
How does it work?
What are its business and public policy
goals?
How does it address the threats in our
model?
Possible Privacy Threats to
P2P Systems
Monitoring of transactions
Tracking systems placed on network
Monitoring of data at or going through a node
Manipulation of transactions
Forgery of data
Filtration of transaction information
Impersonation and misrepresentation
Identification of individuals or nodes
Legal action
Social pressure and external action
Possible Usability Threats to
P2P Systems
Denial of service
Unreliability and transient availability of
resources
Blocking of access to network resources
Malicious content
Firewalls
NATs
Viruses
Freeloading and inequitable use of resources
Example P2P Applications and
Networks
Napster
Gnutella (BearShare)
SETI@home
Freenet (Espra)
FreeHaven
Mojo Nation
Jabber / AOL Instant Messenger
Groove.net
Napster: What is it?
“The largest, most diverse online
community of music lovers in history.”
A file transfer system for music lovers to
search for and trade mp3’s
Also features:
user hotlist
chatrooms
instant messaging
Napster: How does it work?
“hybrid” P2P architecture
centralized server takes all file requests,
searches dynamically updated database
server brokers connections between clients
for decentralized downloads
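The brokered lookup described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Napster's actual protocol: the class name CentralIndex, the peer addresses, and the filenames are all hypothetical. It shows why the server sees every search and every shared-file list even though the file bytes never pass through it.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central server: stores who shares what, never the files."""

    def __init__(self):
        self.files_by_peer = defaultdict(set)  # peer address -> shared filenames

    def register(self, peer, filenames):
        # A client reports its shared files when it connects.
        self.files_by_peer[peer].update(filenames)

    def unregister(self, peer):
        # The database is updated as clients disconnect.
        self.files_by_peer.pop(peer, None)

    def search(self, term):
        # Return (peer, filename) pairs; the download itself happens peer to peer.
        term = term.lower()
        return sorted(
            (peer, name)
            for peer, names in self.files_by_peer.items()
            for name in names
            if term in name.lower()
        )


index = CentralIndex()
index.register("10.0.0.5:6699", ["blue_bayou.mp3", "notes.txt"])
index.register("10.0.0.9:6699", ["Blue_Monday.mp3"])
hits = index.search("blue")
index.unregister("10.0.0.9:6699")
hits_after = index.search("blue")
```

Taking the index offline stops all searches, which matches the later slide's point that a denial of service attack on the server would prevent searches but not transfers already brokered.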
Napster: Original Business
and Public Policy Goals
create an easy way to search for and
share music for free over the internet
take advantage of latent disk space on
edges of internet
avoid copyright issues by having each
user responsible for their own content
Napster: Current Business
and Public Policy Goals
Avoid lawsuits!
Metallica
Filename filtering
Monthly fee?
Get musicians on their side
“empower yourself!”
Get activists on their side
Napster Action Network
Napster: How does it address
the threats in our model?
Monitoring of transactions, identifying
individuals
Tracking programs
Users can log usernames/files downloaded from
them
Possible to search entire shared file directory of a
user (hotlist)
Impersonation and misrepresentation
Only one username per program – cannot change
Napster: How does it address the
threats in our model? (cont’d)
Legal action
Denial of Service Attack
Very vulnerable, as we have seen
Would prevent searches, but not file
transfers
Malicious Content
Everything is mp3 format
Gnutella: What is it?
A protocol, not an actual program
Completely decentralized architecture –
“pure” P2P
Used for file transfer
Open source, so many other programs have
built off of it
BearShare
LimeWire
GnuFrog
Gnutella: How does it work?
Works like the real world (gossip, word-of-mouth)
Makes initial connection to other hosts
in cache (ping)
Broadcasts, propagates queries to these
hosts
Responses travel back along same path
Connects directly to transfer files
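The query propagation listed above can be sketched as a breadth-first flood. This is a simplified model under stated assumptions: the hop limit (ttl) is a detail of the Gnutella protocol that the slide does not mention, and the function name, graph, and filenames are hypothetical. Each hit records the path the response would retrace, hop by hop, back to the originator.

```python
def flood_query(graph, shared, origin, term, ttl):
    """Flood a query from origin; return {responder: path the reply retraces}."""
    seen = {origin}                  # nodes drop queries they have already handled
    frontier = [(origin, [origin])]
    hits = {}
    while frontier and ttl > 0:
        next_frontier = []
        for node, path in frontier:
            for neighbor in graph[node]:
                if neighbor in seen:
                    continue
                seen.add(neighbor)
                new_path = path + [neighbor]
                if any(term in name for name in shared.get(neighbor, ())):
                    # The reply travels back along the reversed query path.
                    hits[neighbor] = list(reversed(new_path))
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
        ttl -= 1                     # each round of forwarding uses up one hop
    return hits


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
shared = {"D": ["song.mp3"], "E": ["song_live.mp3"]}
near = flood_query(graph, shared, "A", "song", ttl=2)
far = flood_query(graph, shared, "A", "song", ttl=3)
```

In the real network a forwarding node only knows the neighbor that handed it the query, not the whole path; the full path is kept here only to make the return route visible.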
Gnutella: How does it work?
(cont’d)
Gnutella: Business and Public
Policy Goals
“internet on top of the internet”
Decentralization
New real-time search engine model
No single point of failure
Open source code
Allows for new innovations, freelance
application development
Gnutella: How does it address
the threats in our model?
Monitoring of Transactions, Identification
Tracking programs
Users can see requests passed through their node,
but not original sender
Users can log IP’s of nodes with whom they
transfer files
Zeropaid.com’s Wall of Shame
Legal Action
Who can copyright holders realistically sue?
Gnutella: How does it address the
threats in our model? (cont’d)
Denial of Service Attacks
Unreliability of resources
Malicious content
Finding initial group of peers
Mandragore scare
Know what you’re downloading
Trust who you’re downloading from
Freeloading
Increases the length of search requests
Some software, like LimeWire, lets users give
preference to nodes that are also sharing
material
Gnutella: Scalability Issues and
Bandwidth Inequity
Clip2 Reflectors – “super peers”
Gnutella: Scalability Issues and
Bandwidth Inequity (cont’d)
BearShare v. 3.0.0 Alpha
3 modes
Client (low bandwidth)
Server/Defender (high bandwidth)
Peer (normal)
Centralizes system somewhat, provides
targets, but increases efficiency
Copyright Violation Trackers
on Napster and Gnutella
Copyright Agent
Roy Orbison fans beware!
Media Tracker
Masquerades as a user
Logs IP’s, ISP’s, files
Operated from outside US, so not subject
to US privacy laws
Monitoring of Transactions on
Napster and Gnutella (cont’d)
Screenshot of Media-Tracker
SETI@home: What is it?
Allows PC owners to help in the search
for extraterrestrial intelligence
Free screensaver, analyzes radio
telescope data when PC is idle
SETI@home: How does it
work?
Not “pure” P2P
Central server sends data to hosts
Hosts compute FFT’s on data, send results
back to server
No inter-host communication
Example of how processing power can
be shared among computers
SETI@home: What are its
business and public policy goals?
Find more aliens in less time
Create a community of extraterrestrial
enthusiasts using a participatory
medium
Other possible applications for
distributed computing
Code breaking
Genetic analysis
SETI@home: How does it
address the threats in our model?
Manipulation of Transactions
Doctored versions
Trying to find better ways to compute FFT’s
No open source code
Doctored result files
Encryption, checksums
SETI@home: How does it address
the threats in our model? (cont’d)
Identification of individuals or nodes
Denial of Service
Unreliability of resources
Redundant data units distributed
Malicious content
Downloads data, not executables
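One way redundant work units can guard against doctored result files is to accept a result only when enough independent hosts agree. The sketch below is an assumption for illustration, not SETI@home's actual validator: the function name accept_result, the quorum rule, and the sample result strings are hypothetical.

```python
from collections import Counter


def accept_result(reports, quorum):
    """Return the result reported by at least `quorum` hosts, else None."""
    if not reports:
        return None
    value, count = Counter(reports.values()).most_common(1)[0]
    return value if count >= quorum else None


reports = {
    "host-1": "peak@1420.4MHz",
    "host-2": "peak@1420.4MHz",
    "host-3": "ALIENS!!",          # a doctored result from a modified client
}
accepted = accept_result(reports, quorum=2)
undecided = accept_result({"host-1": "x", "host-2": "y"}, quorum=2)
```

The same redundancy also covers unreliable hosts: a work unit that never comes back from one machine can still be settled by the others.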
Freenet: What is it?
Distributed, decentralized, anonymous
publishing system
Like one enormous, shared hard drive
Freenet: How does it work?
Every piece of data has a key
Need to know key to access data
No effective search mechanism yet
Key search: uses a depth-first search along
nodes
If a node does not have the key, it forwards the request to
the node with the “closest” key
Unique ID’s, routing data back, nodes cache data
along way
more scalable, efficient than broadcast – routes
you closer each hop
Freenet: How does it work?
(cont’d)
Every node allocates space to be used by
network
Cannot update files
Sends key request w/ unique ID
InsertRequest
Checks if data already exists
DataRequest
If next node contains key, returns data along
same path
If not, finds the “closest key”, forwards to
that node
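The DataRequest steps above can be sketched as a depth-first search with caching on the return path. This is a toy model under stated assumptions: keys are plain integers and "closest" means smallest numeric distance, the hops_to_live limit and the Node class are hypothetical names, and real Freenet keys and routing tables are more involved.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> data held locally
        self.routes = {}   # known key -> neighbor believed to hold it

    def request(self, key, hops_to_live, visited=None):
        """Depth-first search for key; returns the data or None."""
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if key in self.store:
            return self.store[key]
        if hops_to_live == 0:
            return None
        # Try neighbors in order of how close their known key is to the target.
        for known in sorted(self.routes, key=lambda k: abs(k - key)):
            neighbor = self.routes[known]
            if neighbor.name in visited:
                continue
            data = neighbor.request(key, hops_to_live - 1, visited)
            if data is not None:
                self.store[key] = data        # cache on the way back
                self.routes[key] = neighbor   # remember where it came from
                return data
        return None


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.routes = {10: b, 90: c}
b.routes = {20: d}
d.store = {25: "doc"}
result = a.request(25, hops_to_live=5)
```

After the request, nodes a and b both hold a copy of key 25, so frequently requested data spreads toward the requesters and a node holding it cannot tell whether it is the original source.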
Freenet: How does it work?
(cont’d)
Key/data stack model
Freenet: What are its business
and public policy goals?
Prevent censorship of documents
Provide anonymity of users
Plausible deniability for node operators
Must trace back requests through every
node in path
Remove any single point of control
Keep most requested data, not most
“acceptable” data
Freenet: How does it address
the threats in our model?
Monitoring of transactions
Manipulation of transactions
Hard unless you have control of many nodes
Attacker cannot forge data or update it
Every node checks key for validity of document
while it is being forwarded back
Impersonation and misrepresentation
No way to know where data comes from anyway
Identification of individuals or nodes
Legal action
Plausible deniability for requests
Raj’s pictures
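The validity check mentioned in this list ("every node checks key for validity of document") can be illustrated with a content-derived key. The sketch assumes the key is simply the SHA-1 digest of the document bytes; the slides do not state the key scheme, and the function names are hypothetical.

```python
import hashlib


def content_key(data: bytes) -> str:
    # Assumed key scheme: the key is the SHA-1 digest of the document bytes.
    return hashlib.sha1(data).hexdigest()


def verify(key: str, data: bytes) -> bool:
    # A forwarding node recomputes the key and drops data that does not match.
    return content_key(data) == key


doc = b"an unpopular pamphlet"
key = content_key(doc)
ok = verify(key, doc)
forged = verify(key, b"a forged replacement")
```

Under this assumption, changing a single byte changes the key, which is consistent with the slides' statements that files cannot be updated and that an attacker cannot forge data under an existing key.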
FreeHaven: What is it?
Network that allows users to publish
documents
Provides anonymity, server
accountability, and equitability of
resource distribution
FreeHaven: How does it work?
Distributed network of servers
Servers communicate through anonymous
channels, such as reply blocks sent via remailers
Data enters and propagates through the
network through the process of trading
Files are divided into pieces and distributed
among servers, only a subset of which are
needed to reconstruct the file
All data is encrypted and signed before
transfer or storage
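The idea that only a subset of the pieces is needed to rebuild a file can be shown with simple XOR parity. This is a stand-in chosen for brevity, not the dispersal scheme FreeHaven itself uses (the slides do not name one): it splits data into k chunks plus one parity chunk and survives the loss of any single share. The function names are hypothetical.

```python
def split_with_parity(data: bytes, k: int):
    """Split data into k equal chunks plus one XOR parity chunk (k + 1 shares)."""
    size = -(-len(data) // k)                       # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for chunk in chunks:
        parity = bytes(x ^ y for x, y in zip(parity, chunk))
    return chunks + [parity], len(data)


def reconstruct(shares, length):
    """Rebuild the data from shares in which at most one entry is None."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one lost share")
    if missing:
        size = len(next(s for s in shares if s is not None))
        rebuilt = bytes(size)
        for s in shares:
            if s is not None:
                rebuilt = bytes(x ^ y for x, y in zip(rebuilt, s))
        shares = list(shares)
        shares[missing[0]] = rebuilt                # XOR of the rest restores it
    return b"".join(shares[:-1])[:length]           # drop parity and padding


shares, length = split_with_parity(b"publish me anonymously", 3)
lost = list(shares)
lost[1] = None                                      # one server disappears
recovered = reconstruct(lost, length)
```

A production system would use a scheme that tolerates several lost shares and would encrypt and sign each share, as the slide notes.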
FreeHaven: What are its business
and public policy goals?
Business goals
To be used in conjunction with services such as
Freenet to provide long-term, popularity-independent
data storage
Public policy goals
Anonymity of author, publisher, reader, document,
server, and query
System accountability (as opposed to user
accountability)
Equity of resource distribution
FreeHaven: How does it address
the threats in our model?
Monitoring of transactions
Manipulation of transactions
All FreeHaven traffic is encrypted in transit and in
storage
Document requests are forwarded through the
system via anonymous re-mailers
All data segments are signed
Only a subset of the segments is required to
reconstruct the data
Impersonation and misrepresentation
FreeHaven: How does it address
the threats in our model? (cont’d)
Identification of individuals or nodes
Author/publisher anonymity through trading
Server anonymity through pseudonyms and
anonymous communication via re-mailer reply
blocks
Legal action, social pressure, external action
No central authority to be held accountable
“Plausible deniability:” server does not know what
data it is storing or what is being requested
Only a subset of the servers must be available to
reconstruct the data
Data cannot be revoked from the network
FreeHaven: How does it address the
threats in our model? (Cont’d)
Denial of service, unreliability of resources
Only a subset of the servers must be available to
reconstruct data
Accountability mechanisms for servers
Blocking of access to network resources
Malicious content
Freeloading and inequitable resource use
Must donate space to publish data
Mojo Nation: What is it?
Distributed, micro-payment based
publishing/resource distribution system
Resource consumers and providers
make “capitalist” exchanges of
resources (storage space, computation)
Mojo Nation: How does it
work?
Content trackers keep list of content
pieces and addresses of nodes that
have them
Query different nodes until you have all
of the parts needed to reconstruct the
file
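The tracker lookup described above can be sketched as follows. This is an illustration only: the function name fetch_file, the dictionary layout of the tracker, and the node addresses are hypothetical, and payment in Mojo is left out entirely.

```python
def fetch_file(piece_ids, tracker, nodes):
    """Collect every piece by asking the nodes the tracker lists for it."""
    pieces = {}
    for pid in piece_ids:
        for address in tracker.get(pid, []):
            # None models an offline node or a node that no longer has the piece.
            blob = nodes.get(address, {}).get(pid)
            if blob is not None:
                pieces[pid] = blob
                break
        else:
            raise LookupError(f"no reachable node holds piece {pid}")
    return b"".join(pieces[pid] for pid in piece_ids)


tracker = {"p0": ["n1", "n2"], "p1": ["n3"]}        # piece id -> node addresses
nodes = {"n2": {"p0": b"hello "}, "n3": {"p1": b"world"}}  # n1 is offline
content = fetch_file(["p0", "p1"], tracker, nodes)
```

Because each node holds only some pieces, no single node in this model can reconstruct the file, which is the basis for the "plausible deniability" point on a later slide.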
Mojo Nation: What are its business
and public policy goals?
Business goals
Public policy goals
Mojo Nation: How does it address
the threats in our model?
Monitoring of transactions
Manipulation of transactions
Impersonation and misrepresentation
Identification of individuals
May be addressed in future by payment for “hops”
over a number of nodes, but not currently
addressed
Legal action
“Plausible deniability” because server does not
have enough of a document to reconstruct it
Jabber/AIM: What are they?
Instant messaging platforms
Jabber provides universal connectivity
to other IM services, including AIM,
ICQ, MSN Messenger
Jabber designed as protocol to allow for
person-to-person as well as app-to-app
communication
Jabber/AIM: How do they
work?
AIM
Client/server: almost all data relayed through AOL
servers
Jabber
Distributed system of servers, each presiding over
a namespace
When a server receives a message, it will forward
it to its peers if recipient not in its namespace
Communicate via XML or proprietary protocols
where necessary
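The namespace-based forwarding described for Jabber can be sketched as servers that deliver locally or hand the message to the server responsible for the recipient's domain. This is a toy model: the Server class, the shared peers registry, and the example domains are hypothetical, and real Jabber servers exchange XML streams and locate each other through DNS.

```python
class Server:
    def __init__(self, domain, peers):
        self.domain = domain
        self.peers = peers           # shared registry: domain -> Server
        self.inboxes = {}            # local user -> list of message bodies
        peers[domain] = self

    def deliver(self, to, body):
        """Deliver locally or forward; returns the domain that stored the message."""
        user, _, domain = to.partition("@")
        if domain == self.domain:
            self.inboxes.setdefault(user, []).append(body)
            return self.domain
        peer = self.peers.get(domain)
        if peer is None:
            raise LookupError(f"unknown domain {domain}")
        return peer.deliver(to, body)   # recipient is outside our namespace


peers = {}
home = Server("mit.example", peers)
away = Server("aol.example", peers)
stored_at = home.deliver("tim@aol.example", "hi")
```

Both servers handle the message body in this model, which illustrates the later concern that data sent in clear text passes through possibly untrusted servers.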
Jabber/AIM: What are their
business and public policy goals?
AIM
Business goals
Large scale IM solution, centralized
Supported by advertisements
Public policy goals
Jabber
Business goals
Open source, open structure for naming, presence, and
"roster" (buddy list) information
Allow users to have one client for multiple IM protocols
Public policy goals
Jabber/AIM: How do they address
the threats in our model?
Monitoring of transactions
Data generally sent clear-text through (possibly)
untrusted servers
Jabber’s XML structure allows for security for
certain apps using encryption and vCard, but not
supported in the standard
Manipulation of transactions
Impersonation and misrepresentation
There have been several cases of ID theft and
password fraud on AIM
Jabber allows for dialback to prevent spoofing
Identification of individuals or nodes
Jabber/AIM: How do they address
the threats in our model? (Cont’d)
Legal action, social pressure, denial of
service
AIM servers all centralized
Jabber servers distributed, each presides
over separate namespace
Blocking of access to resources
Unreliability of resources
Malicious content
Groove: What is it?
“Shared space” for real-time
collaboration
Chat, IM, whiteboard, group web
browsing, calendar, discussion board,
integration with other applications
Groove: How does it work?
End-user application connects directly
with peers, but can use gateway
servers if necessary
All data in XML format
Different modes of operation to provide
different levels of anonymity of
participants
Groove: What are its business
and public policy goals?
Business goals
Public policy goals
Groove: How does it address
the threats in our model?
Monitoring of transactions
Manipulation of transactions
All data is signed so it cannot be manipulated
Impersonation and misrepresentation
All data is encrypted in transit and in storage
Key distribution system uses SDSI-type attributes
All invitation messages are signed and sent with
signer’s public key
Recipient can compute “fingerprint” from public
key and check it against previously known value
Identification, legal action, etc.
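The fingerprint check on invitation messages can be sketched as hashing the sender's public key and comparing it with a value learned earlier through a trusted channel. The hash function (SHA-256), the function names, and the sample key bytes are assumptions for illustration; the slides do not say how Groove computes fingerprints.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    # Assumed scheme: the fingerprint is the SHA-256 digest of the key bytes.
    return hashlib.sha256(public_key).hexdigest()


def sender_is_known(public_key: bytes, known_fingerprint: str) -> bool:
    # Constant-time comparison against the previously known value.
    return hmac.compare_digest(fingerprint(public_key), known_fingerprint)


known = fingerprint(b"alice-public-key")     # learned earlier, e.g. in person
genuine = sender_is_known(b"alice-public-key", known)
impostor = sender_is_known(b"mallory-public-key", known)
```

The check only helps if the known fingerprint was obtained out of band; a fingerprint taken from the same message as the key proves nothing about the sender.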
Groove: How does it address the
threats in our model? (Cont’d)
Denial of service
Blocking of access to network
Central servers used only when necessary
Can work through gateway servers designed to
tunnel through firewalls, etc.
Unreliability and transient availability
All communication is mirrored locally for all
participants
Malicious content
Freeloading and inequity