From Physics to .com
Next Generation Information Systems
Avi Silberschatz
Department of Computer Science
Yale University
URL: www.cs.yale.edu/~avi
The Digital Age
Digital information forms the glue for blending the fields of computing,
communication and entertainment.
At the center of this revolution is data that is stored, accessed and
delivered in digital format. Some of the major issues surrounding this
type of data are:
Data is to be available to the users anytime and anywhere and
with the desired QoS.
Data access must adhere to privacy and security policies.
Data Interoperability.
Fast access to data, which implies support for queries with
approximate answers.
Data analysis and mining capabilities over very large datasets.
Many of the advances in information systems are due to the
development of new technologies. These advances, in turn, are
driving the development of even newer technologies.
Research Challenges
Storage, retrieval, and delivery of multimedia data
Storage System Issues
QoS issues of continuous media data (e.g., video and audio)
Approximate answers
useful for very large data sets
useful for Web searching
Data mining
Discovering “interesting” patterns in very large data sets
Discovering “interesting” patterns from incomplete information
Data Interoperability
Privacy and security
Next generation Networks
Converged networks
Network Management
Multimedia Data
Regular Data
text, binary, image
Database Data
tuples, objects
Continuous Media Data
Video Data
The display (playback) of the data must be continuous with a
fixed rate, which is typically 30 frames/second.
A viewer may wish to control the way the data is to be displayed
by applying various VCR-type operations to the video data.
Audio Data
The playback must be continuous with a fixed rate, which is
dependent on the sample rate.
A listener may wish to control the way the data is played back.
Storage System Issues
Rapid growth in storage capacity demand
world-wide installed storage:
738 PetaByte in 2000
over 75% per year storage capacity increase over the next 5 years
reaches ZettaByte in 2009
data stored at Global 2500 companies doubles every 18 months
data stored at e-commerce companies grows at 400% a year
Management
40-50% of company IT budget is spent on storage
fraction of IT budget spent on storage is expected to grow
cost for storage management exceeds cost of storage equipment
management: $300 per GB per year
low-end storage: $14 - $50 per GB (packaged, powered,
networked)
management cost is expected to grow
Storage Requirement
24 x 7
Disaster recovery
Storage is Moving Into the Network
Motivation
Use commodity IP based networks
IT staff know-how
Distance and universal access
Applications
Disaster recovery
Archiving
Backups
Content Distribution
Managed storage
Value added storage services
Consolidation of storage
IP-Based Network Storage
Storage is managed, possibly by different domains
Storage devices are connected over networking infrastructure
[Figure: client site #1 and client site #2, each on its own LAN, connect over a Metro/WAN to LANs with file servers and their SANs.]
IP-based Network Storage (Cont.)
IETF standards are being drafted
Most popular: iSCSI and FCIP
Almost all networking and storage companies are participating in these
standards
Issues
Performance
Reliability
Future
end-to-end iSCSI;
end-to-end IP storage networking?
demise of FC?
Hybrid?
FC (InfiniBand) SAN islands connected over IP networks
FC SANs in data centers accessed by IP networks
Network Storage Security
Customers may not trust the storage service provider (SSP)
Storage consolidation over different customers is essential to make
storage outsourcing viable. However, customers may not trust each
other
Threat model
Disclosure of data to an eavesdropper intercepting communication
Disclosure of data to storage service provider (SSP) and to other
customers of the SSP
Manipulation of communication by an attacker
Manipulation of data by the SSP or other customers of the SSP
Challenges
high throughput encryption (e.g., 1Gbps, 10 Gbps)
security without hindering performance
Multimedia Storage and Delivery Issues
The size of some databases is enormous, especially those that
are used for data mining (e.g., cash register transactions).
Largest commercial database: 30 terabytes
Some information sources generate data at an astonishing rate
(e.g., satellite images).
EOS – 1-2 terabytes per day
The BBC is planning to digitize the last 50 years of programming.
Continuous media data is voluminous:
100 minute MPEG-1 video requires 1.125GB
100 minute HDTV video requires 15GB
Continuous media data require support for QoS.
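The two video sizes quoted above follow from plain bit-rate arithmetic. The sketch below assumes constant bit rates of 1.5 Mbps for MPEG-1 and 20 Mbps for HDTV, and decimal units; the slide does not state the rates, so these are assumed values that happen to reproduce its figures. The function name `clip_size_gb` is illustrative only.

```python
def clip_size_gb(minutes: float, mbps: float) -> float:
    """Storage needed for a constant-bit-rate clip, in decimal gigabytes."""
    bits = minutes * 60 * mbps * 1_000_000   # seconds * bits per second
    return bits / 8 / 1_000_000_000          # bits -> bytes -> GB


# Assumed rates: 1.5 Mbps (MPEG-1) and 20 Mbps (HDTV).
mpeg1_gb = clip_size_gb(100, 1.5)
hdtv_gb = clip_size_gb(100, 20.0)
print(f"MPEG-1: {mpeg1_gb:.3f} GB, HDTV: {hdtv_gb:.1f} GB")
```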
System Resources to be Managed for QoS
Storage Server Resources
Tertiary storage
I/O bus
Secondary storage
I/O bus
Buffer space
Processor(s)
Network
Research Issues
Admission control
Disk Scheduling
Buffer Management
Storage Management
data layout
varying disk transfer rates
disk striping
meta data
fault-tolerance
Tertiary storage
Cycle-based Scheduling
Let T be the length of a service cycle
Maintain a queue of requests $R_1, R_2, \ldots, R_n$, each $R_i$
corresponding to a request to view a CM clip. Each request has
an associated rate $r_i$.
For each request, a buffer of size $2 \cdot T \cdot r_i$ is allocated.
Requests in the queue are served in a cyclic order using double
buffering. In each cycle $I$:
get data from disk to buffer $(I \bmod 2)$
transfer data from buffer $((I + 1) \bmod 2)$ to the client
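The double-buffering loop described on this slide can be sketched as a small simulation. This is only an illustration under simplifying assumptions: the disk always manages to fill a half-buffer within one cycle, and the name `run_cycles` is hypothetical.

```python
def run_cycles(rates, T, num_cycles):
    """Simulate cycle-based service with double buffering.

    rates: consumption rate r_i of each request, in bits/second.
    T: length of a service cycle, in seconds.
    Returns the total number of bits delivered to each client.
    """
    # Two half-buffers of T * r_i bits each, i.e. 2 * T * r_i per request.
    buffers = [[0.0, 0.0] for _ in rates]
    delivered = [0.0 for _ in rates]
    for cycle in range(num_cycles):
        fill, drain = cycle % 2, (cycle + 1) % 2
        for i, r in enumerate(rates):
            # Send the data that was read during the previous cycle ...
            delivered[i] += buffers[i][drain]
            buffers[i][drain] = 0.0
            # ... while the disk refills the other half-buffer.
            buffers[i][fill] = T * r
    return delivered


totals = run_cycles([1.5e6, 64e3], T=1.0, num_cycles=5)
```

The first cycle only fills a buffer, so after five cycles each client has received four cycles' worth of data.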
Disk Scheduling
Requests are serviced in service cycles (rounds).
At the beginning of a service cycle, requests are ordered in
C-SCAN order.
At the beginning of every service cycle, it is ensured that
\[ \sum_{i} 2 \cdot T \cdot r_i \le B \]
and
\[ \sum_{i} \left( \frac{T \cdot r_i}{r_{disk}} + t_{rot} + t_{settle} \right) + 2 \cdot t_{seek} \le T \]
hold (where $t_{rot}$, $t_{settle}$, $t_{seek}$ are the rotational delay, settle time, and seek time,
respectively, $r_{disk}$ is the disk transfer rate, and $B$ is the buffer pool size).
The value of $T$ is adjusted depending on the workload.
In every service cycle,
\[ \min\left( T \cdot r_i,\; 2 \cdot T \cdot r_i - (\text{offset of last retrieved} - \text{offset of last consumed}) \right) \]
bits of data are retrieved for each request.
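A minimal sketch of the per-cycle checks on this slide. It assumes the two conditions read "total buffer demand is at most B" and "total transfer, rotation, and settle time plus two seeks is at most T", and that the amount retrieved is the smaller of one cycle's worth of data and the free buffer space. Function and parameter names are illustrative.

```python
def cycle_is_feasible(rates, T, B, r_disk, t_rot, t_settle, t_seek):
    """Check the buffer and disk-time conditions for one service cycle."""
    buffer_ok = sum(2 * T * r for r in rates) <= B
    service = sum(T * r / r_disk + t_rot + t_settle for r in rates)
    return buffer_ok and service + 2 * t_seek <= T


def bits_to_retrieve(T, r, last_retrieved, last_consumed):
    """Bits to read for one request: a cycle's worth, capped by free buffer."""
    free_space = 2 * T * r - (last_retrieved - last_consumed)
    return min(T * r, free_space)


# Assumed example disk: 80 Mbps transfer, 8 ms rotation, 1 ms settle, 15 ms seek.
disk = dict(r_disk=80e6, t_rot=0.008, t_settle=0.001, t_seek=0.015)
ok_10 = cycle_is_feasible([1.5e6] * 10, T=1.0, B=40e6, **disk)
ok_40 = cycle_is_feasible([1.5e6] * 40, T=1.0, B=40e6, **disk)
```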
Admissions Control
Queue is bounded by an admission control scheme
The service time of each request is estimated.
A request is admitted only if the sum of the estimated service times for
all admitted requests does not exceed the duration of service cycle T.
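The admission rule above can be sketched in a few lines. How the service time is estimated is left open by the slide, so the caller supplies the estimate here; the class name `AdmissionController` is hypothetical.

```python
class AdmissionController:
    """Admit a request only if all estimated service times fit in one cycle."""

    def __init__(self, cycle_length):
        self.cycle_length = cycle_length  # T, in seconds
        self.admitted = []                # estimated service time per request

    def try_admit(self, est_service_time):
        if sum(self.admitted) + est_service_time <= self.cycle_length:
            self.admitted.append(est_service_time)
            return True
        return False


ac = AdmissionController(cycle_length=1.0)
decisions = [ac.try_admit(t) for t in (0.25, 0.5, 0.5, 0.25)]
```

The third request is rejected because it would push the total to 1.25 seconds, while the fourth still fits exactly.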
Admission Control (cont.)
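Reserve a fraction of the service cycle $T$, say $\alpha T$ ($0 < \alpha < 1$), for continuous
media requests.
A request (real-time or non-real-time) is admitted if
\[ \sum_{i \in \mathrm{RT}} \left( \frac{T \cdot r_i}{r_{disk}} + t_{rot} + t_{settle} \right) + \sum_{i \in \mathrm{NRT}} \left( \frac{n_i}{r_{disk}} + t_{rot} + t_{settle} \right) + 2 \cdot t_{seek} \le T \]
A real-time request is admitted if
\[ \sum_{i \in \mathrm{RT}} \left( \frac{T \cdot r_i}{r_{disk}} + t_{rot} + t_{settle} \right) + 2 \cdot t_{seek} \le \alpha T \]
where RT and NRT denote the sets of real-time and non-real-time requests.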
Above scheme ensures
both continuous and non-continuous media requests are allocated time
during a service cycle.
any time during a service cycle unused by continuous media requests is
allocated to non-continuous media requests.
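A hedged sketch of the two-class admission test. It assumes the reserved fraction is called alpha, that real-time requests must fit within alpha * T, and that all requests together must fit within T; the per-request cost is taken as transfer time plus rotation and settle time, with two seeks per cycle. All names are illustrative, and the lists passed in already include the candidate request.

```python
def service_time(bits, r_disk, t_rot, t_settle):
    """Time to fetch `bits` for one request: transfer plus rotation and settle."""
    return bits / r_disk + t_rot + t_settle


def can_admit(rt_rates, nrt_bits, new_is_real_time, T, alpha,
              r_disk, t_rot, t_settle, t_seek):
    """rt_rates and nrt_bits include the candidate request."""
    rt = sum(service_time(T * r, r_disk, t_rot, t_settle) for r in rt_rates)
    nrt = sum(service_time(n, r_disk, t_rot, t_settle) for n in nrt_bits)
    fits_cycle = rt + nrt + 2 * t_seek <= T
    if new_is_real_time:
        # Real-time load must also stay inside the reserved share of the cycle.
        return fits_cycle and rt + 2 * t_seek <= alpha * T
    return fits_cycle


# Overheads set to zero here only to keep the arithmetic easy to follow.
cfg = dict(T=1.0, alpha=0.5, r_disk=100e6, t_rot=0.0, t_settle=0.0, t_seek=0.0)
```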
Length of T
What about the length of T?
Buffer Space Constraints
Let B be the available buffer size
Let N be the number of admitted clients
Assume infinite disk bandwidth
Requirements:
\[ \sum_{i=1}^{N} 2 \cdot T \cdot r_i \le B \]
[Figure: plot of N against T]
For a given buffer size B, the larger T, the fewer clients can be admitted.
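To see the trade-off numerically, the sketch below assumes every client has the same rate r (an assumption made here to get a closed form) and that the requirement is that the double buffers of all N clients fit in B. The function name is illustrative.

```python
import math


def max_clients_buffer(B, T, r):
    """Largest N with N * 2 * T * r <= B, for N clients of equal rate r."""
    return math.floor(B / (2 * T * r))


# 64 Mbit of buffer, 1 Mbps streams: doubling T halves the admissible clients.
by_cycle = {T: max_clients_buffer(64_000_000, T, 1_000_000) for T in (1, 2, 4)}
```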
Disk Bandwidth Constraints
Assume infinite buffer space
Use C-SCAN disk scheduling
Requirements:
\[ 2 \cdot t_{seek} + N \cdot (t_{rot} + t_{settle}) + \sum_{i=1}^{N} \frac{T \cdot r_i}{r_{disk}} \le T \]
[Figure: plot of N against T]
The larger T is, the larger N can be.
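The same kind of numeric check works for the disk side. This sketch assumes equal client rates r and takes the fixed per-cycle overhead to be two seeks, as on the Disk Scheduling slide; the disk parameters are made-up example values.

```python
import math


def max_clients_disk(T, r, r_disk, t_rot, t_settle, t_seek):
    """Largest N with 2*t_seek + N*(t_rot + t_settle + T*r/r_disk) <= T."""
    per_client = T * r / r_disk + t_rot + t_settle
    return max(0, math.floor((T - 2 * t_seek) / per_client))


# Assumed example disk: 80 Mbps transfer, 8 ms rotation, 1 ms settle, 15 ms seek.
disk = dict(r=1.5e6, r_disk=80e6, t_rot=0.008, t_settle=0.001, t_seek=0.015)
n_short = max_clients_disk(1.0, **disk)
n_long = max_clients_disk(4.0, **disk)
```

A longer cycle amortizes the fixed rotation, settle, and seek overheads over more data, which is why N grows with T here.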
Combining Disk & Buffer Constraints
[Figure: plot of N against T showing the disk constraint and the buffer constraint curves]
The optimal T is obtained by solving a quadratic equation of the disk
and buffer space constraints.
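One way to make the quadratic concrete, assuming equal client rates r: setting the buffer bound N = B / (2 T r) equal to the disk bound N = (T - 2 t_seek) / (t_rot + t_settle + T r / r_disk) gives 2 r T^2 - (4 r t_seek + B r / r_disk) T - B (t_rot + t_settle) = 0. The sketch below returns its positive root. This is a derivation under the stated assumptions, not necessarily the exact form used in the talk.

```python
import math


def optimal_cycle_length(B, r, r_disk, t_rot, t_settle, t_seek):
    """Positive root of a*T^2 - b*T - c = 0, where the two bounds on N meet."""
    a = 2 * r
    b = 4 * r * t_seek + B * r / r_disk
    c = B * (t_rot + t_settle)
    return (b + math.sqrt(b * b + 4 * a * c)) / (2 * a)


# Assumed example: 256 Mbit buffer pool, 1.5 Mbps streams, 80 Mbps disk.
p = dict(B=256e6, r=1.5e6, r_disk=80e6, t_rot=0.008, t_settle=0.001, t_seek=0.015)
T_opt = optimal_cycle_length(**p)
n_buffer = p["B"] / (2 * T_opt * p["r"])
n_disk = (T_opt - 2 * p["t_seek"]) / (
    p["t_rot"] + p["t_settle"] + T_opt * p["r"] / p["r_disk"])
```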
Minimizing Response Time
Under some workloads (e.g., requests with small $r_i$'s such as 64
Kbps), the value of $T$ that maximizes throughput can be high
(e.g., 20 secs.).
This might yield high response times.
Solution:
maintain small $T$ values
in order not to degrade throughput, for each request $R_i$ data
is prefetched from disk once every $k_i$ service cycles (instead of
in every service cycle)
The maximum amount of data prefetched is $k_i \cdot T \cdot r_i$
buffer space allocated to $R_i$ is $(k_i + 1) \cdot T \cdot r_i$
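A small worked example of the two quantities above, assuming the buffer allocation is (k_i + 1) * T * r_i, which reduces to the double buffer 2 * T * r_i when k_i = 1. The function name is illustrative.

```python
def prefetch_plan(T, r, k):
    """Return (max bits prefetched per visit, buffer bits allocated)."""
    max_prefetch = k * T * r
    buffer_size = (k + 1) * T * r
    return max_prefetch, buffer_size


# A 64 Kbps stream served once every 20 one-second cycles.
plan = prefetch_plan(T=1.0, r=64_000, k=20)
```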
Minimizing Response Time (contd.)
Issues:
Calculation of the $k_i$'s
Admission control:
$\mathrm{lcm}(k_1, k_2, \ldots, k_n)$ service cycles to manage
For a request $R_i$, finding the least loaded service cycles
\[ u_i + k_i \cdot l, \quad 0 \le l \le \frac{\mathrm{lcm}(k_1, k_2, \ldots, k_n)}{k_i} - 1 \]
In order to reduce response time, start a new request $R_i$ in
the first possible service cycle and then move it
incrementally to the selected least loaded service cycle.
This solution also provides higher throughput for workloads with
small ri’s
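The bookkeeping described on this slide can be sketched as follows. The load schedule repeats every lcm(k_1, ..., k_n) cycles, and a request with period k is placed at the offset u whose cycles u, u + k, u + 2k, ... are least loaded. "Least loaded" is interpreted here as the smallest peak load over those cycles, which is an assumption; the slide does not define the measure.

```python
import math


def hyperperiod(ks):
    """Number of distinct service cycles to track (math.lcm needs Python 3.9+)."""
    return math.lcm(*ks)


def least_loaded_offset(load, k):
    """Pick the offset u in [0, k) minimizing the peak load over u, u+k, u+2k, ...

    load[c] is the current load of cycle c; len(load) must be a multiple of k.
    """
    def peak(u):
        return max(load[c] for c in range(u, len(load), k))
    return min(range(k), key=peak)


cycles = hyperperiod([2, 3, 4])
load = [5, 1, 4, 2, 5, 1, 3, 2, 5, 1, 4, 2]  # made-up load per cycle
```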
Querying Huge Data Sets
Give me all objects (e.g., images) that look like this.
If we are dealing with PetaBytes of data, this may take days or
weeks.
One solution is to capture “meta data” information about the
stored objects as the objects are stored in the database.
Querying is done against the “meta data”.
Major issue – nature of the meta data.
Another solution is to provide support for “approximate
answers”.
Providing Approximate Answers
Traditional databases provide exact answers to queries, but...
In massive data environments, can take minutes to hours due to
disk I/Os
In distributed environments, data may be remote or currently
unavailable
In real-time environments, even single I/O may be too slow
Providing Approximate Answers (Cont.)
Trade off accuracy for performance: e.g., 30 minutes for an exact
answer vs. 3 seconds for an approximate answer with 5% error
Examples where fast approximate answers are preferred:
drill-down query sequence in data mining: searching for the
“interesting” queries
tentative answer when base data unavailable
leading digits suffice (e.g., 3.5 million vs. 3.512 million)
Can proceed to the exact answer, if desired
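To illustrate the accuracy/performance trade-off, here is a minimal sketch of one generic technique: estimating a SUM from a uniform random sample and reporting a 95% error bound from the normal approximation. It is not taken from the talk; the function name and the fixed seed are illustrative choices.

```python
import math
import random


def approximate_sum(values, sample_size, seed=0):
    """Estimate sum(values) from a uniform sample; return (estimate, half_width)."""
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    n, total_rows = sample_size, len(values)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    estimate = total_rows * mean
    # 95% bound under the normal approximation (ignores the finite-population term).
    half_width = 1.96 * total_rows * math.sqrt(variance / n)
    return estimate, half_width


data = list(range(1, 100_001))
estimate, half_width = approximate_sum(data, sample_size=2_000)
exact = sum(data)
```

Only 2% of the rows are read, yet the estimate typically lands within a few percent of the exact sum, and the exact answer can still be computed afterwards if needed.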
The AQUA System
Approximate Query Engine for data warehousing
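[Figure: Aqua architecture. A client (browser or Excel) sends SQL query Q over the network. Aqua rewrites it as SQL query Q' and runs it as a (fast) query on the Aqua synopses instead of a (slow) query on the warehouse data, both held in the DBMS for the large data warehouse. The result (with error bounds) is returned as HTML or XML.]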
Aqua precomputes and maintains small synopses of the data
Aqua provides approximate answers with accuracy guarantees, by
rewriting user queries as depicted above
Aqua Synopses: The Key Ingredient
(Small) Surrogate for the actual data.
Must accurately estimate the exact answers from the synopses.
As data is updated, must keep synopses up-to-date.
We developed new techniques for summarizing data,
and for adapting these summaries to changes in
both the data and the query mix.
First system to provide fast, highly-accurate approximate answers
for a broad class of queries arising in data warehousing scenarios
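One classical way to keep a small surrogate up to date as rows arrive is reservoir sampling, sketched below. It is offered only as an illustration of the idea of a maintained synopsis; the slides do not say which summarization techniques Aqua uses, and this sketch handles insertions only. The class name `ReservoirSample` is hypothetical.

```python
import random


class ReservoirSample:
    """Fixed-size uniform sample of a growing table (Algorithm R)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.seen = 0          # rows inserted so far
        self.items = []        # the synopsis itself
        self._rng = random.Random(seed)

    def insert(self, row):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(row)
        else:
            # Keep the new row with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = row

    def estimate_count(self, predicate):
        """Scale the matching sample rows up to the full table size."""
        if not self.items:
            return 0.0
        hits = sum(1 for row in self.items if predicate(row))
        return hits * self.seen / len(self.items)


synopsis = ReservoirSample(capacity=100)
for value in range(10_000):
    synopsis.insert(value)
```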
Private, Public, and Sensitive Information in a Wired World
Private information
Only the data subject has a right to it.
Public information
Everyone has a right to it.
Sensitive information
“Legitimate users” have a right to it.
It can harm data subjects, data owners, or data users if it is
misused.
Erosion of Privacy
“You have zero privacy. Get over it.” – Scott McNealy, 1999
Changes in technology are making privacy harder.
increased use of computers and networks
reduced cost for data storage
increased ability to process large amounts of data
Becoming more critical as public awareness, potential misuse, and
conflicting goals increase.
“Public Records” in the Internet Age
Depending on State and Federal law, “public records” can include:
Birth, death, marriage, and divorce records
Court documents and arrest warrants (including those who were acquitted)
Property ownership and tax-compliance records
Driver’s license information
Occupational certification
They are, by definition, “open to inspection by any person.”
Traditionally: Many public records were “practically obscure.”
Stored at the local level on hard-to-search media, e.g., paper, microfiche, or
offline computer disks.
Not often accurately and usefully indexed.
Now: More and more public records, especially Federal records, are being put on
public web pages in standard, searchable formats.
Issues
Should some Internet-accessible public records be only conditionally
accessible?
Should data subjects have more control?
Should data collectors be legally obligated to correct mistakes?
Examples of Sensitive Information
Copyright works
Certain financial information
Health Information
Question: Should some information now in “public records” be
reclassified as “sensitive”?
State of Technology
We have the ability (if not always the will) to prevent improper
access to private information. Encryption is very helpful here.
We have little or no ability to prevent improper use of sensitive
information. Encryption is less helpful here.
The PORTIA Project
PORTIA: Privacy, Obligations, and Rights in Technologies of Information
Assessment
Large ITR grant from NSF. It is a five-year, multi-institutional,
multidisciplinary, multi-modal research project on end-to-end handling of
sensitive information in a wired world.
Researchers from:
Stanford: Dan Boneh, Hector Garcia-Molina, John Mitchell, Rajeev Motwani
Yale: Joan Feigenbaum, Ravi Kannan, Avi Silberschatz
University of NM: Stephanie Forrest
Stevens Institute: Rebecca Wright
NYU: Helen Nissenbaum
Plus participation by software industry, key user communities,
advocacy organizations, and non-CS academics.
http://crypto.stanford.edu/portia
PORTIA Goals
Produce a next generation of technology for handling sensitive
information that is qualitatively better than the current generation’s.
Enable end-to-end handling of sensitive information over the course of
its lifetime.
Formulate an effective conceptual framework for policy making and
philosophical inquiry into the rights and responsibilities of data
subjects, data owners, and data users.
Five Major Research Themes
Privacy-preserving data mining and privacy-preserving surveillance
Database policy enforcement tools
Sensitive data in P2P systems
Policy-enforcement tools for database systems
Identity theft and identity privacy
Privacy and Security on the Web
An increasing number of web sites require user registration, which enables
personalized services. This, however, raises some concerns.
Privacy concerns: providing the same user name
(or e-mail) allows creation of comprehensive dossiers; providing your e-mail address reveals your true identity
Security concerns: using the same user name and password at
multiple web sites enables passwords from insecure sites to be used to
help determine passwords at secure sites
Junk e-mail: giving your e-mail address makes you susceptible to junk
e-mail
Inconvenience: people have to invent and remember multiple user
names and passwords
The LPWA system
A tool for combining privacy, security, and convenience. Enables personalized
services by generating consistent, untraceable aliases for use on the web.
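[Figure: a user (Arun Netravali) reaches quote.com, my.yahoo.com, and expedia through LPWA, which presents a different alias user name and password to each site. Example alias pairs shown: "axyz, x45t", "Czar, 4rt5", "Boss, 56yh".]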
The LPWA Proxy
Properties
Privacy: web sites cannot collude to create dossiers
Security: different passwords for different web sites
Convenience: no need to remember multiple user names and
passwords
Alias e-mail addresses support communication from web sites back
to users and allow control of junk e-mail
Generation of Aliases
At the first invocation of the LPWA proxy
User provides:
user’s e-mail address id
a secret S (random string)
Registering
User types \u, \p, \@ for username, password, and e-mail
address, respectively.
LPWA uses id, S, and the domain name of the web site
being visited to compute the user's alias
Repeat Visits
User again types \u and \p for username and password
LPWA computes the same alias-username/password.
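The property that matters here is that the alias is a deterministic function of (id, S, domain), so repeat visits reproduce it while different sites see unrelated values. The sketch below illustrates that property with HMAC-SHA-256 from the Python standard library. The choice of HMAC, the output lengths, and the `lpwa.example` mail domain are assumptions made for illustration; the slides do not specify the function LPWA actually uses.

```python
import hashlib
import hmac


def make_alias(user_id: str, secret: str, domain: str):
    """Derive a per-site (username, password, e-mail) alias triple."""
    key = secret.encode("utf-8")

    def derive(label: str) -> str:
        # A distinct label per field keeps the three outputs unrelated.
        msg = f"{label}|{user_id}|{domain.lower()}".encode("utf-8")
        return hmac.new(key, msg, hashlib.sha256).hexdigest()

    username = "u" + derive("username")[:10]
    password = derive("password")[:16]
    email = "m" + derive("email")[:10] + "@lpwa.example"  # hypothetical domain
    return username, password, email


first_visit = make_alias("avi@example.org", "my secret S", "my.yahoo.com")
repeat_visit = make_alias("avi@example.org", "my secret S", "my.yahoo.com")
other_site = make_alias("avi@example.org", "my secret S", "quote.com")
```

Because nothing is stored, the proxy can recompute the same alias on every visit from the user's id and secret alone.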
Network System Challenges
Next-generation network -- will be simpler, lower cost, and will provide
customized services for consumers and businesses
Converged networks -- will incorporate the best features of today’s
voice and data networks
Network management – automate many of the functions that are
currently done by people.
Next-generation networks
Yesterday's Networks
Point-to-point optical links
Circuit switched, centrally managed
Separate networks for voice, data, video
Fixed, closed
Next-Generation Networks
All-optical mesh backbone
Packet switched, distributed
Unified network for customized multimedia services
Open APIs for ISV services
[Figure: side-by-side diagrams of the two architectures across the service, electronic, and optical layers. Labels include NM, 5E, Local, ISP, CLEC, PSTN, ADM, DCS, and separate video, data, and voice networks.]
Next generation converged networks
[Figure: a data network and a voice network merge into the next-generation network, which carries converged applications.]
Network Management Challenges
Managing today’s networks is extremely challenging due to their increased
complexity
Networks contain hundreds of network elements and thousands of
physical links
Network elements follow a multitude of protocols (e.g., BGP, OSPF,
ISIS, RIP)
Networks are heterogeneous and contain equipment from multiple
different vendors
Manually managing networks
is tedious, labor-intensive, time-consuming and error-prone
is not cost-effective due to severe shortages of and high costs of
skilled labor
Critical need for software tools that automate network management tasks
Next-Generation Network Management
Next-Generation network management software functionality includes
Keeping track of network inventory and topology
Monitoring network link bandwidth and latency
Storing, analyzing and reporting network performance data
Load balancing by appropriately configuring network parameters
Automating and simplifying network configuration tasks (e.g., VPNs)
Value Proposition:
Ease management and configuration of ISP networks
Optimize utilization of network resources
Goal: Make networks self-administering and self-tuning
There are many approaches to predicting the future
I think there is a world market
for maybe five computers.
(Thomas Watson, 1943)
Video won’t be able to hold onto
any market it captures after the
first six months. People will soon
get tired of staring at a plywood
box every night.
(Darryl F. Zanuck, head of 20th
Century Fox, 1946)
640K ought to be enough for
anybody. (Bill Gates, 1981)
“How do you want it – the crystal
mumbo-jumbo or statistical probability?”
Five predictions for the new millennium
1
A mega-network of networks will
enfold the earth in a communications
“skin” with ubiquitous connectivity and
enormous bandwidth.
Five predictions for the new millennium
2
By 2010, there will be so many
interconnected devices that the
volume of “infrachatter” among
communicating machines will
surpass communications among
humans.
Five predictions for the new millennium
3
Bandwidth will be too
cheap to meter.
Five predictions for the new millennium
4
Consumers and businesses will
have a vast variety of
individualized, custom services,
written by countless programmers
on an open mega-network.
Five predictions for the new millennium
5
Virtual reality will become a reality
and will transform the way people
live and conduct their business.
This lecture will be given from the
comfort of my office without me
having to travel.