Data, Data Everywhere

Why We Need Broadband
Connectivity
By Ruzena Bajcsy
Who Generates the Data?
• Astronomers
• Biologists
• High Energy Physicists
• Geophysicists
• Archeologists and Anthropologists
• Psychologists
• Engineers
• Artists
Center for Information Technology Research
in the Interest of Society
A Year of Innovation and
Accomplishment
UC Santa Cruz
Solving Societal-Scale Problems
• Energy Conservation
• Emergency Response and Homeland Defense
• Transportation Efficiency
• Monitoring Health Care
• Land and Environment
• Education
Societal-Scale Systems
• Secure, non-stop utility
• Diverse components
• Adapts to interfaces/users
• Always connected
[Diagram labels: Massive Cluster; Gigabit Ethernet; “Server”; “Client”; Information Appliances; MEMS Sensors; Clusters; Scalable, Reliable, Secure Services]
February 2000
August 2001
February 2001
February 2002
Seismic Monitoring of Buildings:
Before CITRIS
$8,000 each
Seismic Monitoring of Buildings:
With CITRIS Wireless Motes
$70 each
Ad-hoc sensor networks work
• 29 Palms Marine Base, March 2001
– 10 Motes dropped from an airplane landed, formed a wireless network, detected passing vehicles, and radioed information back
• Intel Developers Forum, Aug 2001
– 800 Motes running TinyOS hidden in auditorium seats started up and formed a wireless network as participants passed them around
• tinyos.millennium.berkeley.edu
Recent Progress: Energy Efficiency and Smart Buildings
The Inelasticity of California’s Electrical Supply
[Figure: Power-exchange market price for electricity ($/MWh, axis 0 to 800) versus load (MW, axis 20,000 to 45,000), California, Summer 2000]
How to Address the Inelasticity of the Supply
• Spread demand over time (or reduce peak)
– Make cost of energy visible to end-user and a function of load curve (e.g. hourly pricing)
– “demand-response” approach
• Reduce average demand (demand side)
– Eliminate wasteful consumption
– Improve efficiency of equipment and appliances
• Improve efficiency of generation and distribution network (supply side)
Enabled by Information!
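The demand-response idea above can be illustrated with a small calculation. This is only a sketch under assumed numbers: the hourly loads and prices are invented for illustration, and `shift_load` is a hypothetical helper, not part of any CITRIS system. It shows how moving flexible load out of the most expensive hour lowers the bill once the end-user sees hourly prices.

```python
def energy_cost(load_kwh, price_per_kwh):
    """Total cost when each hour's load is billed at that hour's price."""
    if len(load_kwh) != len(price_per_kwh):
        raise ValueError("load and price series must have the same length")
    return sum(l * p for l, p in zip(load_kwh, price_per_kwh))


def shift_load(load_kwh, price_per_kwh, flexible_kwh):
    """Move up to flexible_kwh from the most expensive hour to the cheapest.

    Hypothetical helper: a real demand-response scheme would respect comfort
    and equipment constraints that are ignored here.
    """
    shifted = list(load_kwh)
    hours = range(len(price_per_kwh))
    peak = max(hours, key=price_per_kwh.__getitem__)
    trough = min(hours, key=price_per_kwh.__getitem__)
    moved = min(flexible_kwh, shifted[peak])
    shifted[peak] -= moved
    shifted[trough] += moved
    return shifted


# Assumed example data: four hours of load (kWh) and prices ($/kWh).
load = [2.0, 3.0, 5.0, 4.0]
price = [0.05, 0.10, 0.60, 0.20]

shifted_load = shift_load(load, price, flexible_kwh=2.0)
before = energy_cost(load, price)
after = energy_cost(shifted_load, price)
print(f"cost before: ${before:.2f}, after shifting 2 kWh: ${after:.2f}")
```

With these assumed numbers the total energy used is unchanged, but the bill drops from $4.20 to $3.10 because 2 kWh moved from the peak-price hour to the cheapest hour.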
Energy Consumption in Buildings (US 1997)
End Use                     Residential   Commercial
Space heating                   6.7           2.0
Space cooling                   1.5           1.1
Water heating                   2.7           0.9
Refrigerator/Freezer            1.7           0.6
Lighting                        1.1           3.8
Cooking                         0.6            -
Clothes dryers                  0.6            -
Color TVs                       0.8            -
Ventilation/Furnace fans        0.4           0.6
Office equipment                 -            1.4
Miscellaneous                   3.0           4.9
Total                          19.0          15.2
(Units: quads per year; 1 quad per year = 1.05 EJ per year)
Source: Interlaboratory Working Group, 2000
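To put the end-use figures in perspective, the short sketch below computes what share of each sector's total one end use represents. It uses only numbers from the slide (space heating 6.7 of 19.0 quads residential, lighting 3.8 of 15.2 quads commercial) and assumes the flattened columns read in row order; the function name `share` is illustrative.

```python
def share(end_use_quads, total_quads):
    """Fraction of a sector's total energy consumed by one end use."""
    if total_quads <= 0:
        raise ValueError("total must be positive")
    return end_use_quads / total_quads


# Values taken from the table, in quads per year (US 1997).
residential_heating = share(6.7, 19.0)
commercial_lighting = share(3.8, 15.2)

print(f"Residential space heating: {residential_heating:.0%} of residential use")
print(f"Commercial lighting: {commercial_lighting:.0%} of commercial use")
```

On these figures, space heating is roughly a third of residential consumption and lighting a quarter of commercial consumption, which is why both are natural targets for sensing and control.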
A Three-Phase Approach
• Phase 1: Passive Monitoring
– The availability of cheap, connected (wired or wireless) sensors makes it possible for the end-user to monitor energy usage of buildings and individual appliances and act thereon.
– Primary feedback on usage
– Monitor health of the system (30% inefficiency!)
• Phase 2: Quasi-Active Monitoring and Control
– Combining the monitoring information with instantaneous
feedback on the cost of usage closes the feedback loop
between end-user and supplier.
• Phase 3: Active Energy-Management through
Feedback and Control—Smart Buildings and
Cory Hall Energy Monitoring
Network
50 nodes on 4th floor
30 sec sampling
250K samples to database over 6 weeks
Moved to Intel Lab – come play!
Smart Buildings
Dense wireless network of
sensor, control, and
actuator nodes
• Task/ambient conditioning systems allow conditioning in small,
localized zones, to be individually controlled by building occupants
and environmental conditions
• Joint projects among BWRC/BSAC, Center for the Built
Environment (CBE), IEOR, Intel Lab, LBNL
Control of HVAC systems
[Figure: Conventional Overhead System versus Underfloor Air Distribution]
Control of HVAC Systems
• An underfloor system can save energy because the air near the ceiling is allowed to get hotter
• Project with CBE (Arens, Federspiel)
• Need temperature sensors at different heights
• Simulation results
– Hot August day in Sacramento
– Underfloor HVAC saves 46% of energy
• Future: test in instrumented room
More sensors – air velocity
• Uses time of flight of sound to
determine 3D air velocity
• Significance
– Heat transfer (energy)
– Air quality
– Perception of temperature
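The slide says only that the sensor uses the time of flight of sound. As a hedged illustration, the sketch below applies the standard sonic-anemometer relation: sound travelling a path of length d with the air takes d/(c+v) and against it d/(c-v), so v = (d/2)(1/t_forward - 1/t_backward). The three orthogonal measurement paths, the path length, and the function names are assumptions, not details of the actual CITRIS sensor.

```python
def axis_velocity(path_m, t_forward_s, t_backward_s):
    """Air speed along one path from the two opposite times of flight.

    The speed of sound cancels out of the difference of reciprocals,
    so no temperature correction is needed for this estimate.
    """
    return 0.5 * path_m * (1.0 / t_forward_s - 1.0 / t_backward_s)


def velocity_3d(path_m, times):
    """Combine three orthogonal paths into an (x, y, z) velocity in m/s.

    times is a sequence of three (t_forward_s, t_backward_s) pairs.
    """
    return tuple(axis_velocity(path_m, tf, tb) for tf, tb in times)


# Synthetic check: generate times of flight from a known velocity.
SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
PATH = 0.10              # m, assumed transducer spacing
true_velocity = (0.5, -0.2, 0.0)
times = [(PATH / (SPEED_OF_SOUND + v), PATH / (SPEED_OF_SOUND - v))
         for v in true_velocity]

estimate = velocity_3d(PATH, times)
print("estimated velocity (m/s):", estimate)
```

The synthetic check simply confirms that the formula recovers the velocity used to generate the timings; a real sensor would also have to deal with timing noise and transducer geometry.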
Smart Dust Goes National
• Academia: UCSD, UCLA, USC, MIT, Rutgers, Dartmouth, U. Illinois UC, NCSA, U. Virginia, U. Washington, Ohio State
• Industry: Intel, Crossbow, Bosch, Accenture, Mitre, Xerox PARC, Kestrel
• Government: National Center of Supercomputing, Wright Patterson AFB
Why Broadband Connectivity
When Memory Is So Cheap?
• Because users want to interact with the data
in real time
• Users need to access the data at the right
time and at the right place
• They need to access data in the right format
• They want the right amount of data
Examples
• Distributed computation
• Cluster technology
• The Berkeley Millennium Project
Cluster Counts
• NOW (circa 1994): 4-proc HP -> 36-proc SPARC10 -> 100-proc Ultra1
• Millennium Central Cluster (Intel Donation)
– 99 Dell 2300/6400/6450 Xeon Dual/Quad: 332
processors
– Total: 211GB memory, 3TB disk
– Myrinet 2000 + 1000Mb fiber ethernet
• OceanStore/ROC cluster, Astro cluster, Math cluster, Cory
cluster, more
• CITRIS Pilot Cluster : 3/2002 deployment (Intel Donation)
– 4 Dell Precision 730 Itanium Duals: 8 processors
– Total: 20GB memory, 128GB disk
Current Network
CITRIS Network Rollout
Network Rollout
• Millennium Cluster
– Keep existing Nortel 1200/1100/8600
– New Foundry FastIron 1500
• CITRIS Cluster
– New Foundry FastIron 1500
• Backbone
– 2 Foundry BigIron 8000
• Cost of expansion $280K (SimMillennium)
Millennium Cluster Tools
• Rootstock Installation
• Ganglia Cluster Monitoring
• gEXEC – remote execution/load balancing
• Pcp – parallel copying/job staging
All in production, open source, cluster community development on sourceforge.net
Rootstock Installation Tool
• Installation configuration stored centrally
• Build local cluster specific root from central root
• Install/reinstall cluster nodes from local rootstock
• http://rootstock.millennium.berkeley.edu/
• Has become basis for http://rocks.npaci.edu/ cluster distribution.
Ganglia Monitoring
• Coherent distributed hash of cluster information
– Static: cpu speed, total memory, software versions, boottime, upgradetime etc.
– Dynamic: load, cpu idle, memory available, system clock, etc.
– Heartbeat
– Customizable with simple API for any other metric
• Data is exchanged in well defined XML and XDR
• Lightweight – small memory footprint and minimal communication (tunable).
• Scalable – tested on several 512+ node clusters
• Trusted hosts – feature allows clusters of clusters to be linked within a single monitoring and execution domain.
• Ported to Linux, FreeBSD, Solaris, AIX, and IRIX, plus active development by community for other ports
• Dell Open Cluster Group seriously evaluating this as basis for their cluster computing tool distribution. “The only monitoring that scales over 64 nodes”
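Because the monitoring data is exchanged as XML, a client can read cluster state with an ordinary XML parser. The sketch below is illustrative only: the sample document is a simplified, hand-written stand-in, and its element and attribute names (`CLUSTER`, `HOST`, `METRIC`, `NAME`, `VAL`) are assumptions, since the slide does not give the schema. The function `host_metrics` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the structure is an assumption, not captured output.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="2.5" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float"/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16"/>
    </HOST>
    <HOST NAME="node2" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float"/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def host_metrics(xml_text):
    """Return {host name: {metric name: value string}} from the XML text."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result


metrics = host_metrics(SAMPLE_XML)
for name, values in sorted(metrics.items()):
    print(name, "load_one =", values["load_one"])
```

Values are kept as strings here; a real consumer would convert them according to each metric's declared type.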
gEXEC – remote execution
• History
– Glunix from NOW
– rEXEC from Millennium
– gEXEC UCB/CalTech collaboration
• Lightweight – minimal number of threads on frontend + fanout
• Decentralized – no central point of failure
• Fault tolerant – fallback ability + failure checks at runtime
• Interactive – feels like a single machine
• Load balanced from Ganglia Monitoring data
• Scalable to at least 512 nodes.
• Unix authorization plus cluster keys
e.g.
gexec -n 3 hostname
gexec -n 0 render -in input.${VNN} -out output.${VNN}
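The two ideas in this slide, choosing nodes from monitoring data and giving each job its own virtual node number, can be sketched in a few lines. This is not gEXEC's implementation: `pick_nodes` and `build_commands` are hypothetical helpers, the load figures are invented, and treating `n == 0` as "all nodes" is an assumption read off the second example command.

```python
from string import Template


def pick_nodes(load_by_host, n):
    """Return the n least-loaded hosts; n == 0 is taken to mean every host."""
    if n < 0:
        raise ValueError("n must be non-negative")
    ranked = sorted(load_by_host, key=lambda host: (load_by_host[host], host))
    return ranked if n == 0 else ranked[:n]


def build_commands(command_template, hosts):
    """Pair each host with the command, substituting ${VNN} by its index."""
    template = Template(command_template)
    return [(host, template.substitute(VNN=vnn))
            for vnn, host in enumerate(hosts)]


# Assumed one-minute load averages, as a monitoring system might report them.
loads = {"node1": 0.9, "node2": 0.1, "node3": 0.4, "node4": 2.0}

chosen = pick_nodes(loads, 3)
commands = build_commands("render -in input.${VNN} -out output.${VNN}", chosen)
for host, command in commands:
    print(f"{host}: {command}")
```

Each selected host receives a distinct input and output file name, which is how a single command line can fan out into a partitioned job.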
Pcp – parallel copy
• Newest addition to cluster suite
• Fanout copy of files/directories to nodes
• Scalable
• Used for job staging
• Future of this tool is to wrap it up as an option into gEXEC.
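A fanout copy scales because nodes that already hold the file can forward it to others. The slide does not describe Pcp's actual copy tree, so the sketch below assumes the simplest doubling schedule: in each round, every node that has the file sends it to one node that does not. The function `fanout_schedule` and the node names are hypothetical.

```python
def fanout_schedule(source, targets):
    """Plan copy rounds as lists of (sender, receiver) pairs.

    Each round, every node that already has the file sends to one pending
    node, so the number of holders roughly doubles per round.
    """
    have = [source]
    pending = list(targets)
    rounds = []
    while pending:
        transfers = []
        for sender in list(have):      # snapshot: new holders send next round
            if not pending:
                break
            transfers.append((sender, pending.pop(0)))
        have.extend(receiver for _, receiver in transfers)
        rounds.append(transfers)
    return rounds


rounds = fanout_schedule("frontend", [f"node{i}" for i in range(1, 8)])
for number, transfers in enumerate(rounds, start=1):
    print(f"round {number}: {transfers}")
```

With seven target nodes the copy finishes in three rounds (1, 2, then 4 transfers) instead of seven sequential copies from the frontend.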
Known Sites Using Ganglia Cluster Toolkit
Most popular cluster and distributed computing software on sourceforge.net
Over 7000 downloads since release of 1/2002
Clinica Sierra Vista http://www.clinicasierravista.org
LondonTown http://www.londontown.com/
National Hellenic Research Foundation http://www.eie.gr
RightNow Technologies http://www.rightnow.com/
Idaho National Engineering and Environmental Laboratory http://www.inel.gov
WesternGeco http://www.westerngeco.com
80/20 Software Tools http://rc.explosive.net
Optiglobe Brazil http://www.optiglobe.com.br
Brunel University http://www.brunel.ac.uk
Cinvestav Instituto Politecnico Nacional http://www.ira.cinvestav.mx
Conexant http://www.hotrail.com
Dell http://www.dell.com/
SuSE Linux http://www.suse.de
Arabic on Linux http://www.planux.com
Delgado Community College, New Orleans http://www.dcc.edu
Boeing http://www.boeing.com
RedHat http://www.redhat.com/
University of Pisa, Italy http://www.df.unipi.it
Ecole Normale Superieure De Lyon http://www.ens-lyon.fr
iMedium http://www.imedium.com
Moving Picture Company http://www.moving-picture.com
Professional Service Super Computers http://www.pssclabs.com
AlgoNomics http://www.algonomics.com
Ocimum Biosolutions http://www.ocimumbio.com
Caltech http://www.caltech.edu
VitalStream http://www.publichost.com
Sandia National Laboratory http://www.sandia.gov/
UC Irvine http://www.uci.edu
Guide Corporation http://www.guidecorp.com/
Matav http://www.matav.hu
Math Tech, Denmark http://www.math-tech.dk
Istituto Trentino Di Cultura http://www.itc.it
Compaq http://www.compaq.com/
National Research Council Canada http://www.nrc.ca
Overture http://www.overture.com
Petroleum Geo-Services http://www.pgs.com
National Research Laboratory of the US Navy http://www.nrl.navy.mil
White Oak Technologies, Inc. http://www.woti.com/
Centre National De La Recherche Scientifique http://www.in2p3.fr
SDSC http://www.sdsc.edu
IE&M http://iew3.technion.ac.il/
GMX http://www.gmx.fr
CAS, Chemical Abstracts Service http://www.cas.org
Keldysh Institute of Applied Mathematics (Russia) http://www.kiam1.rssi.ru
LUCIE (Linux Universal Config. & Install Engine) http://matsuwww.is.titech.ac.jp/~takamiya/lucie/
Mellanox Technologies http://www.mellanox.co.il/
TerraSoft Solutions (PowerPC Linux) http://terraplex.com/tss_about.shtml
Intel http://www.intel.com/
BellSouth Internet Services http://services.bellsouth.net/external/
ArrayNetworks http://www.clickarray.com/
MandrakeSoft http://www.mandrakesoft.com
Technische Universitat Graz http://www.TUGraz.at/
GeoCrawler http://www.geocrawler.com/
Cray http://www.cray.com/
Unlimited Scale http://www.unlimitedscale.com/
USF Computer Science http://cs.usfca.edu/
RoadRunner http://www.houston.rr.com
Veritas Geophysical Integrity http://www.veritasdgc.com
Dow http://www.dow.com/
The Max Planck Society for the Advancement of Science http://www.mpg.de
Lockheed Martin http://www.lockheedmartin.com
Duke University http://www.duke.edu
Framestore Computer Film Company http://www.framestore-cfc.com
nVidia http://www.nvidia.com/
SAIC http://www.saic.com
Paralogic http://www.plogic.com/
Singapore Computer Systems Limited http://www.scs.com.sg/
Hughes Network Solutions http://www.hns.com
University of Washington, Computer Science http://www.cs.washington.edu
Experian http://www.experian.com
L'Universite de Geneva http://www.unige.ch
Purdue Physics Department http://www.physics.purdue.edu/
Atos Origin Engineering Services http://www.aoes.nl/
Teraport http://www.teraport.se
Daresbury Laboratory http://www.dl.ac.uk
Grid computing
• Working with key cluster software developers from research and
industry to standardize cluster tools within the Global Grid Forum
(GGF).
CITRIS Cluster
• Goal is to build a production level cluster
environment that supports and is driven by
CITRIS applications
– NOW mostly experimental
– Millennium ½ developmental ½ production
• Clusters adopted as primary compute platform
– ~800 current Millennium users
– 65% average CPU utilization on Millennium cluster,
many times 100% utilization
– 50% of top 20 PACI users compute on Linux clusters
for development and production runs.
[Figure: CITRIS cluster architecture. Campus core and Foundry 8000 switches; 2 frontend nodes; Foundry 1500; 100 dual Itanium compute nodes (1 TFlop, 1.6 TB memory); 10 storage nodes; 50 TB Fibre Channel storage; Myrinet 2000. Link types: 1 Gigabit Ethernet, Myrinet, Fibre Channel]
Steve Brenner Project
Large Molecular Sequence and
Structure Databases
• These databases are in gigabytes
• They provide web services in which low latency is
important
• They often work remotely
• The campus 70Mbit limit is increasingly saturated,
making it impossible to effectively provide
services and do the work
• They need tele/video conferencing over IP
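To see why a saturated 70 Mbit link is a problem for gigabyte-scale databases, a back-of-the-envelope transfer-time estimate helps. The sketch below is an illustration only: the 1 GB database size and the 25% share of the link are assumed figures, and `transfer_seconds` is a hypothetical helper that ignores protocol overhead and latency.

```python
def transfer_seconds(size_bytes, link_bits_per_second, available_fraction=1.0):
    """Ideal time to move size_bytes over a link, given the usable fraction."""
    if not 0 < available_fraction <= 1:
        raise ValueError("available_fraction must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * available_fraction)


GB = 10**9      # decimal gigabyte, assumed
MBIT = 10**6    # decimal megabit, assumed

best_case = transfer_seconds(1 * GB, 70 * MBIT)
shared = transfer_seconds(1 * GB, 70 * MBIT, available_fraction=0.25)
print(f"1 GB over an idle 70 Mbit/s link: {best_case:.0f} s")
print(f"1 GB with a quarter of the link:  {shared:.0f} s")
```

Even with the whole link to itself, a single 1 GB transfer takes roughly 114 seconds, and the time grows in proportion as other traffic takes its share.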
Background of the
Brain Imaging Center at Berkeley
• Campus-wide resource dedicated to Functional
Magnetic Resonance Imaging (FMRI) research
• Non-invasive “neuroimaging” technique used to
investigate the blood flow correlates of neural
activity
• BIC houses a Varian 4 Tesla scanner and
Neuroimaging Computational Facility providing
collaboration among neuroscientists, physicists,
chemists, statisticians, EE and CS scientists
Current LAN
• Due to high volume of data, we established
high speed connections between computers
in buildings around the campus
• LAN consists of two Cisco Catalyst 6500 switches that are connected by optical fiber and communicate at Gigabit Ethernet speed
• Workstations connected to network at Fast
Ethernet speed (100 Mbits/sec, full duplex)
WAN Needs
• Geographically distributed collaborative
researchers and immense data sets make high
speed networking a priority.
• Collaborations exist between researchers at
UCSD, UCSF, UC Davis, Stanford, Varian Inc.
and NASA Ames.
• With spiral imaging, we will soon be capable of
generating data in excess of 1MB/s per scanner
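The 1 MB/s figure is easier to compare with network capacities once it is converted to a sustained bit rate and an hourly volume. The sketch below does that conversion, assuming decimal units (1 MB = 10^6 bytes) and a continuously running scanner; the function names are illustrative, and the 100 Mbit/s figure is the Fast Ethernet workstation speed quoted for the LAN.

```python
def sustained_mbit_per_s(megabytes_per_second):
    """Convert a data rate in MB/s to Mbit/s (8 bits per byte)."""
    return megabytes_per_second * 8


def gigabytes_per_hour(megabytes_per_second):
    """Data volume produced in one hour at a constant MB/s rate."""
    return megabytes_per_second * 3600 / 1000


scanner_rate = sustained_mbit_per_s(1.0)
hourly_volume = gigabytes_per_hour(1.0)
fast_ethernet_share = scanner_rate / 100.0   # fraction of a 100 Mbit/s link

print(f"one scanner: {scanner_rate:.0f} Mbit/s sustained, "
      f"{hourly_volume:.1f} GB per hour, "
      f"{fast_ethernet_share:.0%} of a Fast Ethernet link")
```

A single scanner is a modest load on the local network, but the hourly volume adds up quickly when data sets must be shared with collaborators at other sites.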
NASDAQ vs. O'Reilly Tech Book Sales at Amazon
January 1, 1999 through September 30, 2001
[Figure: Normalized O'Reilly unit sales at Amazon and normalized NASDAQ index value (both on a 0 to 1 scale) versus date]
CITRIS Network in Smart Classroom