WebWatching The UK Higher Education Community

Technical Issues Concerning
The Use Of Personal Data
On The Internet
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
Email: [email protected]
URL: http://www.ukoln.ac.uk/
UKOLN is funded by the British Library Research and Innovation Centre, the Joint
Information Systems Committee of the Higher Education Funding Councils, as well as
by project funding from the JISC’s Electronic Libraries Programme and the European
Union. UKOLN also receives support from the University of Bath where it is based.
Contents
About UK Web Focus
Personal Information and the Internet
• End User issues
• Information Provider issues
• System Administrator issues
• Management Issues
Solutions
• Technical
• Protocol developments
• Organisational
Conclusions
About UK Web Focus
UK Web Focus:
• Three year post funded by JISC
• Provides advice and support to the UK HE
community on Web matters
• Activities:
– Monitoring web developments
– Talks and presentations (e.g. Technical Threats to
Copyright and IPR at Talisman seminar on Legal
Risks on the Internet in January 1998)
– Representing JISC on the World Wide Web Consortium
(W3C)
– Other related activities
Personal Data and The Internet
What are the issues peculiar to the Internet?
Information Providers
• Ease of use
• Ability to reuse data
Systems Managers
• Log files
• Preventing misuse
End Users
• Junk email
• Big brother
• Searching
Management
• Liability
• Central Policies vs. Departmental action
• Student use
• Confidentiality
What Else?
End User
What are the privacy implications for end
users of Internet services:
• A user of email or Usenet News
• A student who uses a public access
PC to access Web resources
• A member of staff who uses a PC in
his office
Mailing Lists and Usenet
Mailing Lists
Mailbase, for example, provides
search facilities for finding:
• Membership of lists
• Details of postings
http://www.mailbase.ac.uk/search.html
Usenet News
Usenet News articles:
• Are archived
• Can be searched
http://www.altavista.digital.com/
Institutional Mailing Lists
• Many institutions use the HyperMail software to archive internal
mailing lists
• Robot software can index these archives.
Using ACDC to search for "Brian Kelly" reveals contributions to
mailing lists.
UK Directory Services
Many Universities run an X.500 directory service
• X.500 is a distributed directory protocol
• Originally dedicated clients were used to access X.500
• Now it's much easier using the Web
http://www.brunel.ac.uk/x500/search-form-gb.html
Finding People
Various other directory searching services
are available:
Whowhere
<URL: http://www.whowhere.com/>
BigFoot
<URL: http://www.bigfoot.com/>
IAF (Internet Address Find)
<URL: http://www.iaf.net/>
Advertising revenue can make these a commercial proposition
Ahoy!
Ahoy! is a research project which uses AI techniques to find
(a small number of) personal home pages
AI techniques will make it easier to find personal information
http://www.ahoy.cs.washington.edu:6060/
Web Browsers and Privacy
Client Caches
Web browsers store viewed resources in a local cache (on
hard disk or network drive).
These resources can be re-used.
Potentially these files could be accessed by other users of
the PC or by a system administrator
Web Browsers and Privacy
Cookies
Cookies enable information to be stored on your local PC which
can be reused by the remote server.
Cookies are useful in applications, such as "shopping baskets",
CBL, etc.
However there are privacy implications, since cookies can be used
to record paths through a website.
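To illustrate the privacy point, here is a minimal sketch of how a server could use a cookie to record a visitor's path through a website. It uses Python's standard `http.cookies` module. The cookie name `visitor`, the two function names and the in-memory `paths_by_visitor` store are assumptions made for illustration and do not describe any particular server.

```python
from collections import defaultdict
from http.cookies import SimpleCookie

# Hypothetical in-memory record of the pages requested under each cookie ID.
paths_by_visitor = defaultdict(list)


def make_set_cookie_header(visitor_id: str) -> str:
    """Build the Set-Cookie header value a server might send on a first visit."""
    cookie = SimpleCookie()
    cookie["visitor"] = visitor_id
    cookie["visitor"]["path"] = "/"  # browser returns the cookie for every page
    return cookie["visitor"].OutputString()


def record_request(cookie_header: str, page: str) -> None:
    """Log the requested page against the visitor ID found in a Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "visitor" in cookie:
        paths_by_visitor[cookie["visitor"].value].append(page)


if __name__ == "__main__":
    print(make_set_cookie_header("abc123"))
    record_request("visitor=abc123", "/courses/")
    record_request("visitor=abc123", "/fees/")
    print(dict(paths_by_visitor))
```

Because the browser returns the same identifier with every request, the server can reconstruct the order of pages viewed without the user ever entering any personal details.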
Information Providers
What personal information is provided on
the web?
Corporate Information
Individual / Societies
Changing Context
Technologies such as Frames can change the context of resources
on the web by:
• Pointing to text
• Pointing to graphics
There has reportedly been a "Babes on the Web" page.
[Figure label: Document held remotely]
Web Forms
Web forms are now trivial to set up
• Save time and effort
• Information may be reused easily
• Are information providers aware of implications of reusing
information?
System Administrators
System Administrators can:
• Read incoming and outgoing
messages and Usenet postings
• Analyse cache log files to find
popular websites - and potentially
who's been accessing them
• Deny access to specified websites
• Publish statistics on hits to pages
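As a rough illustration of the cache log analysis mentioned above, the sketch below counts hits per website and collects the client machines that requested each one. The two-field log format (`client url`) and the function name are assumptions made for illustration; real cache and proxy logs carry more fields and differ between products.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def summarise_cache_log(lines):
    """Return (hits per site, clients per site) from 'client url' log lines."""
    hits = Counter()
    clients = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        client, url = parts
        site = urlsplit(url).hostname
        if site is None:
            continue  # no host name could be extracted from the URL
        hits[site] += 1
        clients[site].add(client)
    return hits, clients


if __name__ == "__main__":
    sample = [
        "pc-01.example.ac.uk http://www.ukoln.ac.uk/",
        "pc-02.example.ac.uk http://www.ukoln.ac.uk/web-focus/",
        "pc-01.example.ac.uk http://www.w3.org/",
    ]
    hits, clients = summarise_cache_log(sample)
    print(hits.most_common())
    print({site: sorted(names) for site, names in clients.items()})
```

The second return value is the sensitive part: the same pass over the log that gives harmless popularity figures also links individual machines to the sites they visited.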
Web Statistics
Many web administrators publish their web statistics:
• Access by country
• Access by domain name
• Most popular pages
Restricting Access
It is possible to restrict access to sites containing
dubious content
It is also possible to record a user's email address and take
action if persistent access is attempted
Is this:
• Sensible action
• Breach of privacy?
Solutions
There are a variety of solutions to the
issues concerned with Personal Data and
the Internet:
• Don't use the Internet
• Information providers' "tricks"
• System administrators' "tricks"
• Protocol Developments
• Auditing
Education is important throughout
Solutions - Denying Access
• Information published on
the web can be easily
processed by robots
• Can prevent (well-behaved)
robots from accessing
resources using the Robot
Exclusion Protocol (REP)
(robots.txt file)
User-agent: *
disallow: /stats/
Alta Vista search for "Brian Kelly"
gives 2,800 hits
But:
• Not widely used: ~30% of UK universities
• Not easily scalable (single file at web root)
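The effect of the robots.txt rule shown on this slide can be checked with Python's standard `urllib.robotparser` module, which applies the same test a well-behaved robot would. This is a minimal sketch: the agent name `ExampleBot`, the function name and the sample paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule from the slide: all robots are asked to stay out of /stats/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved robot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/stats/1998.html", "/courses/index.html"):
        print(path, "allowed" if allowed(path) else "disallowed")
```

Note that the check is entirely voluntary on the robot's side: the file asks robots to stay away but does nothing to stop one that ignores it.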
Solutions - For Info Providers
• REP implemented by system administrator
• Possible (but not easy?) to create master
robots.txt file by merging departmental ones
• HTML 4.0 <META NAME="ROBOTS"
CONTENT="NOINDEX, NOFOLLOW"> element
enables individual files to contain robot directives
  • New and not yet widely supported
• Since robots tend not to follow CGI programs, could hide
information behind a button
  • Not elegant
[Figure: example page with "Conference Details", "Campus map" and
"Participants" buttons]
<FORM ACTION="part.html">
<INPUT TYPE="submit" VALUE="...">
</FORM>
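As an illustration of how a robot might honour the ROBOTS META element, here is a minimal sketch using Python's standard `html.parser` module. The class and function names are assumptions made for illustration, and real robots may interpret the directives differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <META NAME="ROBOTS"> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for item in attributes.get("content", "").split(","):
            if item.strip():
                self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robot directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<HTML><HEAD><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></HEAD></HTML>'
    print(sorted(robots_directives(page)))
```

Unlike robots.txt, this check can only be made after the page has been fetched, so it controls indexing and link-following rather than access.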
Preventing Misuse
There are technical ways of:
• Preventing resources from being used
in frames
• Preventing images from being "stolen"
Solutions are being considered mainly for
copyright protection
However such solutions aren't widely
deployed as:
• They may prevent the resource from
being reused in valid ways
• No user / political pressure?
Political Developments
Global Information Networks
• European Conference in Bonn in June 97
• Raised issues of:
– Data protection
– Technological solutions
• See <URL: http://www2.echo.lu/bonn/conference.html>
W3C Response
World Wide Web Consortium (W3C)
responded to Bonn paper:
• Summarised technological solutions:
– DSig: a web of trust
– PICS: content selection without censorship
– P3P: privacy project
– IPR: intellectual property rights
• See <URL: http://www.w3.org/TR/NOTE-eu-conf-970711>
DSig
DSig:
• W3C's Digital Signature Initiative
• Helps users to decide who to trust
• Based on digitally signed assertions:
"This web page comes from Bath University Courses
office and gives a legally binding list of courses"
• See <URL: http://www.w3.org/Security/DSig/Activity.html>
PICS
PICS:
• Platform for Internet Content Selection
• Mechanism for rating web pages
e.g. X, A, PG, U
• Decision to accept resource made by
end user (or end user organisation)
• Choice devolved - no censorship of
originating resource
• See <URL: http://www.w3.org/PICS/>
IPR
W3C's IPR activity:
• Intellectual Property Rights and the Web:
– Does use of a cache infringe copyright?
– Can links to resources be made freely?
–…
• Asks the contentious question:
Does the nature of the technology require us to
change the legal understanding or status of
copyright as it stands now?
• See <URL: http://www.w3.org/IPR/Activity.html>
P3P
P3P:
• Platform for Privacy Preferences
• Will develop a specification and
demonstration of a way for Web sites and
users to express privacy practices and
preferences
• Architecture and grammar work
complete (Oct 1997)
• See <URL: http://www.w3.org/Privacy/Activity.html>
P3P Deliverables
General Overview of the P3P Architecture
• Document describes the P3P model
Grammatical Model
• Grammar and vocabulary for machine-readable
statements:
Data Categories: e.g. name, email, ...
Practices:
Use: e.g. system admin, research, customisation
Transfer: divulge information within organisation
Release: divulge info to other organisation
Access: ability of data subject to view information
See <URL: http://www.w3.org/TR/NOTE-IPWG-Practices.html>
JTAP Calls
Digital Signatures
Studies to identify appropriate protocols and to test
deployment. Seeking to fund an overview report
and a technology deployment pilot
Certificate Based Infrastructure Services
Technical overview and pilot. Seeking to fund an
overview and technology watch project at a cost of
£25,000, followed by one or two deployment pilots
Work to start in Dec 1998
See <URL: http://www.jtap.ac.uk/bid/c14_98.html>
Privacy Services
TRUSTe:
• An "independent, non-profit, privacy initiative
dedicated to building users' trust .. on the Internet"
• TRUSTe sites agree to:
– Maintain an approved Privacy Statement
– Explain information gathering practices:
    – What personal information will be used for
    – Whether information will be disclosed
– Display the TRUSTe Mark
• TRUSTe will periodically check conformance
• See <URL: http://www.etrust.org/>
What's Happening in UK?
A number of universities have provided guidelines
governing Internet use:
• Data Protection
• Computer Misuse
• ..
But:
• Is work being duplicated?
• Is it still relevant?
http://www.cam.ac.uk/CS/DPA.html
What's Needed?
Auditing Software
WebWatch
• Project based at UKOLN
• Monitors web technologies (not content)
• Potential for auditing robots.txt files?
Do we want software for auditing at a
national or institutional level?
Can we follow the TRUSTe model?
What's Needed?
Catalogue of Guidelines
A catalogue of UK HE web resources is being produced:
• Uses ROADS (cf. SOSIG, OMNI, etc.)
• Various categories planned:
– AUP
– Guidelines for authors
– Local search engines
• Feedback welcome
What's Needed?
Education
Need for education for:
• End users
• Information providers
• System administrators
• Managers
Who provides training materials?
Who delivers the training?
Conclusions
• Widespread use of the Internet / ease of
publishing has increased privacy concerns
• Need for education and awareness:
– End users
– Information providers
– System administrators (central & departmental)
• Do we want a system like TRUSTe?
• Need for auditing tools locally / nationally?
• Need to share experiences
• Need to be aware of (implement?) technical solutions