Discovery Net
Yike Guo, John Darlington (Dept. of Computing),
John Hassard (Depts. of Physics and Bioengineering)
Bob Spence (Dept. of Electrical Engineering)
Tony Cass (Department of Biochemistry),
Sevket Durucan (T. H. Huxley School of Environment)
Imperial College London
AIM
To design, develop and implement an
infrastructure to support real time
processing, interaction, integration,
visualisation and mining of massive
amounts of time critical data generated
by high throughput devices.
The Consortium
Industry Connection : 4 spin-off companies + related companies
(AstraZeneca, Pfizer, GSK, Cisco, IBM, HP, Fujitsu, Gene Logic,
Applera, Evotec, International Power, Hydro Quebec, BP,
British Energy, …)
Industrial Contribution
Hardware : sensors (photodiode arrays, hybrid
photodiodes, PMTs), systems (optics, mechanical systems,
DSPs, FPGAs)
Software (analysis packages, algorithms, data warehousing
and mining systems)
Intellectual Property: access to IP portfolio suite at no
cost
Data: raw and processed data from biotechnology,
pharmacogenomic, remote sensing (GUSTO installations,
satellite data from geo-hazard programmes) and
renewable energy data (from our own remote tidal power
systems)
High Throughput Sensing
Characteristics
Different devices but the same computational characteristics:
•data intensive & data dispersive
•large scale
•heterogeneous
•distributed data
•real-time data manipulation
Need to
•calibrate
•integrate
•analyse
Distributed components: reference DBs, users, devices, warehousing, collaborative applications
Discovery issues: Distributed Knowledge Discovery, Management; Incremental, Interactive Discovery & Collaborative Discovery
Information issues: annotations, semantics, reference, integrated view of data
Data issues: different measurements for the same object: data registration, normalisation, calibration & quality control
GRID issues: wide area, high volume, scalability (data, users), collaboration
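The "different measurements for the same object" data issue can be illustrated with a minimal sketch. It assumes two hypothetical devices that report the same objects on different scales and uses z-score normalisation, one common choice among many; the slides do not say which normalisation method Discovery Net uses, and the names `z_normalise`, `device_a` and `device_b` are invented for illustration.

```python
from statistics import mean, pstdev

def z_normalise(readings):
    """Rescale readings to zero mean and unit (population) standard deviation."""
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        # All readings identical: nothing to scale, return zeros.
        return [0.0 for _ in readings]
    return [(r - mu) / sigma for r in readings]

# Hypothetical readings of the same three objects from two devices
# whose output scales differ by a factor of 10 (assumed data).
device_a = [10.0, 12.0, 14.0]
device_b = [100.0, 120.0, 140.0]

norm_a = z_normalise(device_a)
norm_b = z_normalise(device_b)

# After normalisation the two series are directly comparable.
for a, b in zip(norm_a, norm_b):
    print(f"{a:+.4f}  {b:+.4f}")
```

Normalisation of this kind only removes differences in scale and offset; registration (aligning which reading belongs to which object) and calibration against a reference would be separate steps.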
DNet Architecture
High Throughput Sensing (HTS) Applications
Large-scale Dynamic Realtime Decision support
Large-scale Dynamic System
Knowledge Discovery
Grid-based Data Mining, Collaborative Visualisation
Information Structuring
Information Integration & Composition,
Semantics & Domain-based Ontologies, Sharing
Distributed Data Engineering
Data Registration, Data Normalisation, Data Quality
Utilising Grid Infrastructure for HT Computing
Grid Basic Infrastructure: Globus/Condor/SRB
High Throughput Computing Services: based on Globus & ORB infrastructure
Grid-based Knowledge Discovery: based on Kensington Discovery Platform
Testbed Applications
Columns: HTS applications; large-scale dynamic real-time decision support (throughput in GB/s, size in petabytes, node number); large-scale dynamic system knowledge discovery (operations)
Renewable energy applications
•Scope: tidal energy; connections to other renewable initiatives (solar, biomass, fuel cells), & to CHP and baseload stations
•Throughput (GB/s): 1-10
•Size (petabytes): 1-10
•Node number: >20000
•Operations: structuring, mining, optimisation, RT decisions
Remote sensing applications
•Scope: air sensing, GUSTO; geological, geohazard analysis
•Throughput (GB/s): 1-100
•Size (petabytes): 10-100
•Node number: >50000
•Operations: image registration, visualisation, predictive modelling, RT decisions
Bio chip applications
•Scope: protein-folding chips: SNP chips, Diff. Gene chips using LFII; protein-based fluorescent micro arrays
•Throughput (GB/s): 1-1000
•Size (petabytes): 10-1000
•Node number: >10000
•Operations: data quality, visualisation, structuring, clustering, distributed dynamic knowledge management
Large-scale urban air sensing applications
Each GUSTO air pollution system produces 1 kbit per second, or
10^10 bits per year. We expect to increase the number (from the
present 2 systems) to over 20,000 over the next 3 years, to reach a
total of 0.6 petabytes of data within the 3-year ramp-up.
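The figures above can be checked with a back-of-envelope calculation. The sketch below assumes a constant 1 kbit/s per system, a 365-day year and decimal prefixes (1 peta = 10^15); it does not model the ramp-up from 2 to 20,000 systems. Under those assumptions one system produces about 3 x 10^10 bits per year, the same order as the quoted figure, and 20,000 systems produce about 0.63 x 10^15 bits per year. That matches the quoted 0.6 only if it is read as petabits per year at full deployment, which is an interpretation rather than something the slide states.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, assuming a 365-day year

def yearly_bits(rate_bits_per_s, n_systems=1):
    """Bits produced in one year by n_systems, each at a constant bit rate."""
    return rate_bits_per_s * SECONDS_PER_YEAR * n_systems

per_system = yearly_bits(1_000)          # one GUSTO system at 1 kbit/s
full_fleet = yearly_bits(1_000, 20_000)  # 20,000 systems at 1 kbit/s each

print(f"{per_system:.2e} bits/year per system")
print(f"{full_fleet / 1e15:.2f} petabits/year for 20,000 systems")
print(f"{full_fleet / 8 / 1e15:.3f} petabytes/year for 20,000 systems")
```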
[Figure: GUSTO stations, NO simulant, 6.7.2001, with a "You are here" marker]
The useful
information comes
from time-resolved
correlations among
remote stations, and
with other
environmental data
sets.
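A time-resolved correlation between two remote stations can be sketched as a lagged Pearson correlation: shift one series against the other and look for the delay at which they agree best. This is only an illustration of the idea, not the analysis GUSTO or Discovery Net performs; the function names and the sample readings are hypothetical.

```python
from statistics import mean

def lagged_correlation(a, b, lag):
    """Pearson correlation between a[t] and b[t + lag], for lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    x = a[:len(a) - lag]
    y = b[lag:]
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = mean(x), mean(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (var_x * var_y) ** 0.5

def best_lag(a, b, max_lag):
    """Lag in 0..max_lag at which b correlates most strongly with a."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(a, b, k))

# Hypothetical readings: the same pollution peak reaches station B
# two samples after it passes station A.
station_a = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
station_b = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0]

print(best_lag(station_a, station_b, max_lag=4))
```

With many stations the same comparison would be run pairwise, and against other environmental data sets, which is where the volume and distribution issues raised earlier come in.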
Electrical grid
Renewables characterised by
•large number of small units,
•often in remote areas
•wireless connectivity
•fluctuating, unpredictable loading
As the total exceeds 12%, grid control becomes very difficult
without an RT e-grid.
There is large potential in embedded generation renewable sources;
they will dominate in new build (nuclear, hydro and carbon)
power stations.
Decentralised power is the new paradigm:
•active management,
•RT monitoring,
•RT control,
•minute to minute security,
•pan network optimisation.
This requires very high bandwidth RT remote station data acquisition,
warehousing and analysis.
The IC Advantage
The IC infrastructure: microgrid for the testbed
[Network diagram: end devices, Cat 5 floor wiring, floor switches, building riser fibre, building router switches, core to building fibre, core router switches, core fibre, central computing facilities (workstation cluster, SMP, storage), wireless, proposed firewalls, London MAN / JANET]
•More than 12000 end devices
•10 Mb/s - 1 Gb/s to end devices
•1 Gb/s between floors
•10 Gb/s to backbone
•10 Gb/s between backbone router matrix and wireless capability
•Access to disparate off-campus sites: IC hospitals, Wye College etc.
ICPC Resource
•150 Gflops processing
•>100 GB memory
•5 TB of disk storage
£3m SRIF funding
•Network upgrade
•2x1Gb/s to LMAN II (10Gb/s scheduled 2004)
•+20 TB of disk storage
•+25 TB of tape storage
•3 clusters (> 1 Tera Flops)
Particle Physics and Astronomy
Research Council (PPARC)
ASTROGRID (http://www.astrogrid.ac.uk/)
a ~£5M project aimed at building a datagrid for UK astronomy, which will form the
UK contribution to a global Virtual
Observatory
Particle Physics and Astronomy
Research Council (PPARC)
GridPP (http://www.gridpp.ac.uk/)
•to develop the Grid technologies required
to meet the LHC computing challenge
•collaboration with international grid
developments in Europe and the US
EPSRC Testbeds (1)
MyGrid : Personalised extensible environments
for data-intensive in silico experiments in
biology
Distributed Aircraft Maintenance Environment
RealityGrid : closely coupling high performance
computing, high throughput experiment and
visualization
EPSRC Testbeds (2)
GEODISE : Grid Enabled Optimisation
and DesIgn Search for Engineering
CombiChem : Combinatorial Chemistry
Structure-Property Mapping
Discovery Net : High Throughput Sensing