Kim's Presentation
An Analysis of P3P Deployment
Hyun Jin Kim
Sensitive Information in a Wired World
November 11, 2003
Introduction
Privacy Policies
US self-regulatory approach to online privacy protection
Description of a company’s data practices
What information they collect from individuals and what they do with it
P3P Specifications
Developed by the World Wide Web Consortium (W3C) over 5 years of work
Became an official W3C “Recommendation” on April 16, 2002
P3P Evaluation System Design
Automated process to measure P3P adoption and gather data from P3P-enabled web sites
By Lorrie Faith Cranor, Simon Byers, and David Kormann (AT&T Labs-Research)
Five major components
URL Collection Mechanism
P3P Policy Retriever
Scripted Interface to the W3C P3P Validator
P3P Policy Evaluator
Generic Data Analysis Tools
URL Collector
To identify sets of sites of interest
Existing lists of URLs
Newly constructed lists that focus on particular web sites
Web spidering technique
Gather information from web directories and other sources
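One step of the spidering technique, pulling candidate sites out of a directory page, can be sketched as below. This is a minimal illustration in Python, not the authors' tool: `LinkCollector` and `external_domains` are hypothetical names, page fetching and recursive crawling are omitted, and treating "every host linked from the directory except the directory itself" as a site of interest is an assumption.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the directory page URL.
                    self.links.append(urljoin(self.base_url, value))


def external_domains(page_html: str, page_url: str) -> Set[str]:
    """Host names of http(s) links on a directory page, excluding the
    directory's own host (assumption: those are the sites of interest)."""
    collector = LinkCollector(page_url)
    collector.feed(page_html)
    own_host = urlparse(page_url).hostname
    hosts = set()
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme in ("http", "https") and parsed.hostname and parsed.hostname != own_host:
            hosts.add(parsed.hostname)
    return hosts
```

A real crawler would fetch each directory page, call `external_domains` on it, and deduplicate the results into a URL list for the retriever.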
P3P Policy Retriever
Perl script to retrieve P3P information
All policies, policy reference files, and compact policies sent in HTTP headers
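Two of the retrieval steps can be sketched in Python (the authors used Perl): locating the policy reference file at the P3P 1.0 well-known location `/w3c/p3p.xml`, and splitting a `P3P` HTTP response header into its `policyref` and `CP` (compact policy) parts. The function names are hypothetical, HTTP fetching is left out, and the header parsing is a simplified regular-expression approach rather than a full parser for the header grammar.

```python
import re
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin

# P3P 1.0 well-known location of a site's policy reference file.
WELL_KNOWN_PATH = "/w3c/p3p.xml"

# Optional one-letter consent suffix on some compact-policy tokens (e.g. "TELo").
CONSENT_SUFFIX = {"a": "always", "i": "opt-in", "o": "opt-out"}


def well_known_url(site_url: str) -> str:
    """URL where the site's policy reference file is expected."""
    return urljoin(site_url, WELL_KNOWN_PATH)


def parse_p3p_header(value: str) -> Dict[str, str]:
    """Split a header value such as 'policyref="/w3c/p3p.xml", CP="CUR ADM"'
    into a dict with lowercase keys."""
    return {key.lower(): val for key, val in re.findall(r'(\w+)\s*=\s*"([^"]*)"', value)}


def compact_tokens(cp: str) -> List[Tuple[str, Optional[str]]]:
    """(token, consent) pairs; consent is None when no a/i/o suffix is present."""
    pairs: List[Tuple[str, Optional[str]]] = []
    for token in cp.split():
        if len(token) == 4 and token[3] in CONSENT_SUFFIX:
            pairs.append((token[:3], CONSENT_SUFFIX[token[3]]))
        else:
            pairs.append((token, None))
    return pairs
```

For example, `parse_p3p_header('policyref="/w3c/p3p.xml", CP="CUR ADM OUR TELo NOR"')` yields the reference-file path plus the compact policy, whose `TELo` token marks telemarketing as opt-out.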
P3P Validator
W3C P3P Validator
Fetches P3P policy reference files, policy files, and compact policies
Checks them for compliance with the P3P 1.0 Specification
Stops validation upon encountering an error
Scripted interface to the W3C P3P Validator
Retrieves P3P policies even from sites with errors in their policy reference files
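Why a scripted interface helps can be seen by contrasting fail-fast checking with error-collecting checking. The Python sketch below is illustrative only and does not reproduce the W3C validator: its single check (every `POLICY-REF` needs an `about` attribute) stands in for the validator's full rule set, and `strict_policy_urls`, `lenient_policy_urls`, and `ValidationError` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"


class ValidationError(Exception):
    pass


def strict_policy_urls(ref_xml: str) -> List[str]:
    """Fail-fast: the first POLICY-REF without an 'about' URL aborts the run,
    so policies named by later references are never reached."""
    urls = []
    for ref in ET.fromstring(ref_xml).iter(P3P_NS + "POLICY-REF"):
        about = ref.get("about")
        if not about:
            raise ValidationError("POLICY-REF without an 'about' attribute")
        urls.append(about)
    return urls


def lenient_policy_urls(ref_xml: str) -> Tuple[List[str], List[str]]:
    """Record each problem and keep going, so the remaining well-formed
    references still yield policy URLs to fetch and evaluate."""
    urls, errors = [], []
    for i, ref in enumerate(ET.fromstring(ref_xml).iter(P3P_NS + "POLICY-REF")):
        about = ref.get("about")
        if about:
            urls.append(about)
        else:
            errors.append(f"POLICY-REF #{i}: missing 'about' attribute")
    return urls, errors
```

Both functions still raise `ET.ParseError` on XML that is not well-formed; the lenient path only helps with structural errors inside otherwise parseable files.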
P3P Policy Evaluator
Compares a web site’s policy with a user’s privacy preferences
Reports any mismatch between the P3P policy and the privacy preferences
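The comparison can be sketched with a much-simplified rule model. Real preferences are written in APPEL, which is far richer; here each hypothetical `PreferenceRule` only says that certain P3P purposes are acceptable under certain consent modes (the values of P3P's `required` attribute). All class and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Statement:
    """One P3P STATEMENT reduced to purpose -> 'required' value
    ("always", "opt-in" or "opt-out")."""
    purposes: Dict[str, str]


@dataclass
class PreferenceRule:
    """Hypothetical stand-in for an APPEL rule: the listed purposes are
    acceptable only under the listed consent modes."""
    name: str
    purposes: Set[str]
    accepted_consent: Set[str]


def mismatches(policy: List[Statement], rules: List[PreferenceRule]) -> List[str]:
    """Names of the rules the policy violates (an empty list means a match)."""
    violated = []
    for rule in rules:
        if any(
            purpose in rule.purposes and consent not in rule.accepted_consent
            for statement in policy
            for purpose, consent in statement.purposes.items()
        ):
            violated.append(rule.name)
    return violated
```

With a policy that declares `telemarketing` as `"always"` and a rule accepting telemarketing only under opt-in or opt-out, `mismatches` reports that rule.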
Data Analysis
Outputs of policy evaluations gathered in a rectangular matrix
Row – policy from a web site
Column – APPEL rule set file
Run a Perl script over the matrix
Produce various tabulations
e.g., the number of sites that returned a mismatch between privacy preferences and P3P policies
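The tabulation over the site-by-rule-set matrix amounts to counting mismatches per column. A Python sketch follows (the original analysis used Perl). The nested-dict layout, the function names, and the site and rule-set names in the example are hypothetical, and the example values are toy data rather than survey results.

```python
from typing import Dict


def tabulate(matrix: Dict[str, Dict[str, bool]]) -> Dict[str, int]:
    """matrix[site][rule_set] is True when evaluating the site's policy
    against that APPEL rule set reported a mismatch. Returns, per rule set
    (column), the number of sites (rows) with a mismatch."""
    counts: Dict[str, int] = {}
    for row in matrix.values():
        for rule_set, mismatch in row.items():
            counts[rule_set] = counts.get(rule_set, 0) + (1 if mismatch else 0)
    return counts


def percentages(counts: Dict[str, int], n_sites: int) -> Dict[str, float]:
    """Convert per-column counts into percentages of all evaluated sites."""
    return {rule_set: 100.0 * c / n_sites for rule_set, c in counts.items()}


# Toy matrix: three sites evaluated against two rule sets.
toy = {
    "a.example": {"low": False, "high": True},
    "b.example": {"low": True, "high": True},
    "c.example": {"low": False, "high": False},
}
```

Here `tabulate(toy)` gives one mismatch under `low` and two under `high`, which `percentages` turns into shares of the three sites.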
Web Site Selection
Focus on the sites frequently visited by users
PFF Most Popular
85 of the 100 busiest sites as determined by the October 2001 Nielsen/NetRatings ranking of sites with the most unique visitors per month
Excludes adult sites, children’s sites, business-to-business sites, and sites not in the .com top-level domain
PFF Random
Random sample of 302 of the 7,821 domains with at least 39,000 unique monthly visitors in October 2001, according to Nielsen/NetRatings
PFF Refined Random
209 domains from the PFF Random list that were among the top 5,625 domains in October 2001, according to Nielsen/NetRatings
Excludes adult sites, children’s sites, business-to-business sites, and non-dot-coms
Netscore Top 500
500 domains with the most unique visitors during July 2002, according to the comScore Media Metrix netScore Standard Traffic Measurement report
Key Measures
Top 500 domains with the most unique visitors during July 2002, according to the comScore Media Metrix Key Measures report
Includes “third-party” sites
Web Site Selection (Cont.)
Alexa
Top 500 domains by Alexa Traffic Ranking on Feb. 4, 2003
Includes non-US domains and adult sites
Froogle
1,017 sites obtained by crawling the www.froogle.com web site in April 2003
Sites offering products for sale
Firstgov
344 government sites indexed at www.firstgov.gov in April 2003
Includes US federal and state government sites and sites for some quasi-governmental organizations
Yahooligans
900 sites obtained by crawling www.yahooligans.com in April 2003
Sites for children ages 7-12
News
2,429 sites obtained from news.google.com in April 2003
Includes a variety of news-reporting organizations from the US and other countries
P3P Adoption as of May 2003
P3P Adoption (Cont.)
P3P adoption increasing over time
Highest for the most popular web sites
Adoption on the Key Measures list higher than on the Netscore list
Presence of “third-party” sites
To avoid having their cookies blocked by IE6
Adoption on the Alexa top 500 list lowest
International nature
Large number of adult sites
One-third of the P3P-enabled sites had errors flagged by the W3C P3P Validator
7% had errors that prevented their evaluation by the Privacy Bird evaluation engine
Omitted required components of a P3P policy
Improperly referenced data elements
Privacy Bird Evaluation
Definition of not sharing data
Sites share data only with agents that use it only to complete the transaction for which it was provided, or with delivery companies
Data sharing occurs only under an opt-in policy
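The "not sharing" definition can be expressed as a check over a statement's P3P recipients. The sketch below rests on a labeled simplification: it treats P3P's `<ours/>` recipient as "agents that use the data only to complete the transaction" and `<delivery/>` as "delivery companies", which is close to but not identical with the specification's wording. The function name is hypothetical.

```python
from typing import Dict

# Simplification (assumption): <ours/> ~ agents completing the transaction,
# <delivery/> ~ delivery companies. Neither counts as sharing.
NON_SHARING = {"ours", "delivery"}


def shares_data(recipients: Dict[str, str]) -> bool:
    """recipients maps a P3P recipient element name to its 'required' value
    (P3P's default for that attribute is "always")."""
    for recipient, required in recipients.items():
        if recipient in NON_SHARING:
            continue
        if required == "opt-in":  # shared only if the user explicitly opts in
            continue
        return True
    return False
```

So `{"ours": "always", "unrelated": "opt-in"}` does not count as sharing, while `{"same": "opt-out"}` does, because the data flows unless the user acts.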
Three standard settings
Low
Triggers a red bird (the policy does not match the preferences) for sites that
Collect health/medical info and
Share it with other companies, or
Use it for analysis, marketing, or to make decisions about what content or ads the user sees
Engage in marketing but do not provide a way to opt out
Privacy Bird Evaluation (Cont.)
Medium
Same as low
Sites sharing PII (physical contact info, online contact info, government-issued identifier), financial info, or purchase info with other companies
Sites that collect PII but provide no access provisions
High
Same as medium
Sites sharing any personal info (including non-identified info) with other companies
Sites using it to determine the user’s habits, interests, or other characteristics
Sites contacting users for marketing
Sites using financial or purchase info for analysis, marketing, or to make decisions that may affect what content or ads the user sees
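Because each setting includes everything in the one below it ("same as low", "same as medium"), the three settings behave like nested rule sets. The sketch assumes a policy has already been summarized into hypothetical behavior flags; the flag names and the `bird` function are illustrative and are not Privacy Bird's actual rule files.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical flags summarizing what a policy declares.
LOW: FrozenSet[str] = frozenset({
    "health_info_shared_or_used_for_marketing",
    "marketing_without_opt_out",
})
MEDIUM = LOW | {"shares_pii_financial_or_purchase_info", "pii_without_access"}
HIGH = MEDIUM | {
    "shares_any_personal_info",
    "profiles_habits_or_interests",
    "contacts_for_marketing",
    "financial_or_purchase_info_for_marketing_or_decisions",
}
SETTINGS: Dict[str, FrozenSet[str]] = {"low": LOW, "medium": MEDIUM, "high": HIGH}


def bird(policy_flags: Set[str], setting: str) -> str:
    """'red' if the policy triggers any condition of the chosen setting."""
    return "red" if policy_flags & SETTINGS[setting] else "green"
```

A site whose only flag is `profiles_habits_or_interests` gets a green bird on low and a red bird on high, which is the pattern reported for the most popular sites.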
Privacy Bird Evaluation (Cont.)
Red bird on 24% of the evaluated sites
No ability to opt out of marketing and/or telemarketing offered
Most popular sites received both a green bird on the low setting and a red bird on the high setting
Green bird: greater awareness of the importance of the “choice” principle
Red bird: most offer rich e-commerce environments that rely heavily on targeted marketing and visitor profiling
Red birds most likely on Froogle and Yahooligans sites
Collect health and medical info
Types of Data Collected
Types of Data Collected (Cont.)
Most collected data
HTTP protocol used for retrieving content from website
Less often by Froogle and government web sites
Most often by news web sites
Fewer sites collected financial info (excluding the purchase process)
Demographic data
Online contact info, physical contact info, interactive data, unique ids
Preference info, purchase info, and state management info (cookies)
Least collected data
Computer info and click stream info
Content (email msgs, bulletin board postings, etc.)
Government-issued identifiers
Health information
Political information
Location information (e.g., GPS positioning data)
Information not falling into any other pre-defined categories
No government websites collect government-issued identifiers
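Counts like these come from the category elements declared in each policy. The Python sketch reads every `<CATEGORIES>` block in a P3P 1.0 policy and tallies categories across sites. One limitation is stated plainly: categories that the P3P base data schema fixes for particular data elements are not looked up here, so the sketch undercounts for policies that rely on them. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, Set

P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"


def declared_categories(policy_xml: str) -> Set[str]:
    """Category names listed inside <CATEGORIES> elements of a P3P policy.
    Categories implied by the base data schema are NOT looked up."""
    root = ET.fromstring(policy_xml)
    found = set()
    for block in root.iter(P3P_NS + "CATEGORIES"):
        for child in block:
            found.add(child.tag.split("}", 1)[-1])  # drop the namespace prefix
    return found


def category_counts(policies: Iterable[str]) -> Counter:
    """Number of policies that declare each category."""
    return Counter(cat for xml in policies for cat in declared_categories(xml))
```

For a policy that describes `#dynamic.cookies` with `<state/>` and `<preference/>`, `declared_categories` returns those two names.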
Data Usage
Data Usage (Cont.)
Almost all websites used data for
Majority of sites used data for
Telemarketing
Profiling in which individuals are identified by name or other PII
Very few sites used data for
Email and postal mail marketing
One-time tailoring of the site content
Two forms of pseudonymous profiling
Fewer sites used data for
Completion and support of the activity for which data was provided
Web site and system administration
Research and development
Historical preservation (Not by government sites)
Other purposes that do not fall into these categories
News web sites use data for almost every purpose.
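The usage labels on this slide correspond to P3P 1.0 purpose elements. The Python sketch below records that correspondence as a lookup table and counts how many sites declare each usage. The table is a reading of the slide's wording, not an official grouping, and `label_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# P3P 1.0 purpose elements mapped to the labels used on this slide.
PURPOSE_LABELS: Dict[str, str] = {
    "current": "completion and support of the activity",
    "admin": "web site and system administration",
    "develop": "research and development",
    "tailoring": "one-time tailoring",
    "pseudo-analysis": "pseudonymous profiling",
    "pseudo-decision": "pseudonymous profiling",
    "individual-analysis": "profiling with identified data",
    "individual-decision": "profiling with identified data",
    "contact": "email and postal mail marketing",
    "historical": "historical preservation",
    "telemarketing": "telemarketing",
    "other-purpose": "other purposes",
}


def label_counts(sites: Iterable[Set[str]]) -> Counter:
    """For each label, the number of sites declaring at least one purpose
    that maps to it (a site counts once per label)."""
    counts: Counter = Counter()
    for purposes in sites:
        counts.update({PURPOSE_LABELS[p] for p in purposes if p in PURPOSE_LABELS})
    return counts
```

Collapsing each site's purposes into a set of labels first ensures that a site declaring both pseudonymous purposes is still counted once under "pseudonymous profiling".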
Data Recipients and Sharing
Data Recipients and Sharing (Cont.)
Half the web sites share PII with parties other than agents who use the data for the purpose for which it was provided
Most likely by
News web sites
Froogle list sites (sharing with delivery companies)
Least likely by
Government web sites
Choice Options
Choice Options (Cont.)
Top sites more likely to engage in marketing than less popular sites
Top sites also more likely to offer choices (opt-in/opt-out)
Internal choices (telemarketing and other marketing) offered more often as opt-out than opt-in
Third-party choices offered more often as opt-in than opt-out
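In a P3P policy these choices are expressed with the `required` attribute (`always`, `opt-in`, or `opt-out`, defaulting to `always`) on purpose and recipient elements. The Python sketch gathers the values used for the marketing purposes (`contact`, `telemarketing`) and for recipients other than `ours`. Treating exactly these elements as "internal" and "third-party" choices is an assumption based on the slide's wording, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"
MARKETING_PURPOSES = {"contact", "telemarketing"}  # "internal" choices
THIRD_PARTY_RECIPIENTS = {"delivery", "same", "other-recipient", "unrelated", "public"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on element names."""
    return tag.split("}", 1)[-1]


def choice_modes(policy_xml: str) -> Dict[str, Set[str]]:
    """'required' values used for marketing purposes and third-party recipients."""
    root = ET.fromstring(policy_xml)
    modes: Dict[str, Set[str]] = {"internal": set(), "third_party": set()}
    for block in root.iter(P3P_NS + "PURPOSE"):
        for purpose in block:
            if local_name(purpose.tag) in MARKETING_PURPOSES:
                modes["internal"].add(purpose.get("required", "always"))
    for block in root.iter(P3P_NS + "RECIPIENT"):
        for recipient in block:
            if local_name(recipient.tag) in THIRD_PARTY_RECIPIENTS:
                modes["third_party"].add(recipient.get("required", "always"))
    return modes
```

A policy with `<telemarketing required="opt-out"/>` and `<unrelated required="opt-in"/>` matches the pattern on this slide: opt-out for internal marketing, opt-in for third-party sharing.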
Access Provisions
Access Provisions (Cont.)
92% of sites collecting identified data provide some access provisions
Most provide access to both contact info and other data
A smaller number provide access only to contact info or to all identified data
Very few provide no access
None provide access only to non-contact info
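The access categories above correspond to the six values of the P3P 1.0 `<ACCESS>` element. The Python sketch records that correspondence and tallies only sites that collect identified data, which means skipping `nonident`. The labels paraphrase the slide, and `access_summary` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable

# P3P 1.0 <ACCESS> values and the wording used on this slide.
ACCESS_LABELS: Dict[str, str] = {
    "contact-and-other": "contact info and other data",
    "ident-contact": "contact info only",
    "all": "all identified data",
    "other-ident": "non-contact info only",
    "none": "no access",
    "nonident": "no identified data collected",
}


def access_summary(access_values: Iterable[str]) -> Counter:
    """Tally access provisions among sites that collect identified data,
    i.e. skipping sites whose ACCESS value is 'nonident'."""
    return Counter(ACCESS_LABELS[v] for v in access_values if v != "nonident")
```

The "some access provisions" share is the total minus the "no access" count, divided by the number of sites remaining after `nonident` sites are excluded.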
Dispute Resolution Options and Remedies
Dispute Resolution Options and Remedies (Cont.)
Individuals can contact customer service to resolve their disputes on most sites
About one-third offered resolution via an independent organization (e.g., a privacy seal provider)
Mostly by the most popular sites
Very few indicated resolution of disputes under an applicable law
Almost none indicated resolution in court
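The four options above match the `resolution-type` values that P3P 1.0 defines for the `DISPUTES` element: `service` (customer service), `independent` (e.g., a seal program), `court`, and `law`. The Python sketch extracts the declared types from a policy. The function name is hypothetical, and the example URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Set

P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"
RESOLUTION_TYPES = {"service", "independent", "court", "law"}


def resolution_types(policy_xml: str) -> Set[str]:
    """Dispute-resolution mechanisms declared in a policy's DISPUTES elements."""
    root = ET.fromstring(policy_xml)
    return {
        d.get("resolution-type")
        for d in root.iter(P3P_NS + "DISPUTES")
        if d.get("resolution-type") in RESOLUTION_TYPES
    }
```

Running this over every policy in a list and counting how often each type appears gives the shares summarized on this slide.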
Data Retention Policies
Data Retention Policies (Cont.)
Majority did not have a data retention policy for all of the data they collected
Government web sites more likely to have a policy of not retaining info or a retention policy based on a legal requirement
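Retention is declared per `STATEMENT` through the `RETENTION` element, whose P3P 1.0 values are `no-retention`, `stated-purpose`, `legal-requirement`, `business-practices`, and `indefinitely`. The Python sketch lists each statement's value and flags the pattern that the slide associates with government sites. `government_style` is a hypothetical helper, and requiring every statement to use one of the two values is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"
RETENTION_VALUES = {"no-retention", "stated-purpose", "legal-requirement",
                    "business-practices", "indefinitely"}


def retention_per_statement(policy_xml: str) -> List[str]:
    """Retention value of each STATEMENT in document order
    ('unspecified' if no recognized RETENTION child is present)."""
    values = []
    for statement in ET.fromstring(policy_xml).iter(P3P_NS + "STATEMENT"):
        value = "unspecified"
        retention = statement.find(P3P_NS + "RETENTION")
        if retention is not None:
            for child in retention:
                name = child.tag.split("}", 1)[-1]
                if name in RETENTION_VALUES:
                    value = name
                    break
        values.append(value)
    return values


def government_style(values: List[str]) -> bool:
    """True if every statement uses no-retention or legal-requirement."""
    return bool(values) and all(v in {"no-retention", "legal-requirement"} for v in values)
```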
Conclusion
P3P adoption is increasing over time, especially for the most popular web sites
Yahooligans (sites for children) most likely to offer opt-in policies
A large number of web sites have technical errors in their P3P policies
Debates continue about the need for further privacy legislation and the effectiveness of industry self-regulation in the privacy area
Good statistics on web site privacy policies are essential to inform these debates
US government web sites began posting P3P policies to comply with the privacy requirements of Section 208 of the E-Government Act of 2002
Continue web sweeps of government web sites to monitor compliance with these requirements