nam - AstroGrid wiki


AstroGrid
http://www.astrogrid.ac.uk
Belfast
Cambridge
Edinburgh
Jodrell
Leicester
MSSL
RAL
NAM 2001
Andy Lawrence
Cambridge
AstroGrid
http://www.astrogrid.ac.uk
Optical
Infrared
X-ray
Radio
Solar
Space Plasma
collectivisation
• thirty year trend.....
– facility class (common-user) instruments
– central development of data reduction s/w
– calibrated archives with simple tools
– information services (ADS, NED)
– large consortium projects (MACHO, 2dF, SLOAN, VISTA...)
• next steps
– inter-operable archives (joint queries)
– communal exploration and analysis tools (data mining)
– information discovery tools
the archive is the sky
– large fraction of astro-papers based on archives
– HST : retrieval growing faster than ingest
– ISO : whole archive downloaded twice
[Figure: archive ingest and retrievals in Gbytes/Day (axis 0 to 30) against year, 1994.8 to 1999.3. Already more retrieval than ingest! Graphics from US NVO project.]
Large database science
• Rare object searches
• modelling populations
• statistical manipulation
• large sample monitoring
• the UNKNOWN
next steps in use of archives
• inter-operability and joint queries
– e.g. retrieve Sloan, UKIDSS and XMM images from single query
– click on image and get spectrum
– give me all objects redder than X that have no radio counterpart but already have a spectrum
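The last joint query above can be sketched as a single SQL statement. This is only an illustration: the table names (optical, radio, spectra), the colour column and the shared object id are all hypothetical, and it assumes the archives have already been cross-matched onto a common identifier, which in practice is the hard part.

```python
import sqlite3

def build_toy_archives():
    """Create three tiny in-memory 'archives' sharing a hypothetical object id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE optical (id INTEGER PRIMARY KEY, colour REAL);
        CREATE TABLE radio   (object_id INTEGER);
        CREATE TABLE spectra (object_id INTEGER);
        INSERT INTO optical VALUES (1, 0.4), (2, 1.8), (3, 2.1), (4, 2.5);
        INSERT INTO radio   VALUES (3);
        INSERT INTO spectra VALUES (1), (2), (3);
    """)
    return conn

def red_radio_quiet_with_spectrum(conn, colour_cut):
    """Ids of objects redder than colour_cut, with no radio match, that have a spectrum."""
    sql = """
        SELECT o.id
        FROM optical AS o
        WHERE o.colour > ?
          AND NOT EXISTS (SELECT 1 FROM radio   AS r WHERE r.object_id = o.id)
          AND     EXISTS (SELECT 1 FROM spectra AS s WHERE s.object_id = o.id)
        ORDER BY o.id
    """
    return [row[0] for row in conn.execute(sql, (colour_cut,))]

conn = build_toy_archives()
matches = red_radio_quiet_with_spectrum(conn, 1.5)
```

In the toy data, objects 2, 3 and 4 pass the colour cut, object 3 is dropped for having a radio counterpart, and object 4 for lacking a spectrum.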
next steps in use of archives
• exploration and visualisation tools
– large image scrolling and projection
– N-d parameter space plotting
– VR headsets
next steps in use of archives
• large data-set manipulation tools
– Fourier transforms
– Finding outliers
– Data compression
– PCA analysis
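As one concrete instance of "finding outliers", here is a minimal sketch using the median absolute deviation (MAD). The slides do not name a method, so the choice of MAD, the function name and the 3.5 threshold are assumptions made for illustration.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to a Gaussian z-score
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]

magnitudes = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 9.0]
outlier_indices = mad_outliers(magnitudes)
```

A median-based score is used here because a single extreme value barely shifts it, whereas it would inflate a mean and standard deviation.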
next steps in use of archives
• information discovery tools
– intelligent search agents
– networked NED
the scary bit.....
• SDSS science archive a few TB
• WFCAM will produce a TB/week
• VISTA even worse...
• Peta-Byte databases coming your way ...
data intensive computing
• search SuperCOS data : few hours
• search VISTA DB : few months !
• need clever DB structures / query memory
• need parallel machines
– simple PC farms for simple queries ?
– shared memory architecture for manipulations ?
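The "PC farm for simple queries" idea can be sketched as a partitioned scan: each node filters its own slice of the catalogue and only the matches are merged. In this sketch threads stand in for separate machines, and the function names and the toy catalogue are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_partition(rows, predicate):
    """Filter one partition; on a real farm this would run on one node."""
    return [row for row in rows if predicate(row)]

def farm_query(partitions, predicate, workers=4):
    """Scan all partitions concurrently and merge matches in partition order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda rows: scan_partition(rows, predicate), partitions)
        return [row for part in partial for row in part]

# Toy catalogue of (object id, magnitude), split into four equal partitions.
catalogue = [(i, 10.0 + (i % 15)) for i in range(1000)]
partitions = [catalogue[i:i + 250] for i in range(0, 1000, 250)]
bright = farm_query(partitions, lambda row: row[1] < 11.0)
```

A simple selection like this needs no communication between partitions, which is why cheap independent machines suit it; manipulations that touch the whole dataset at once are where the shared-memory question arises.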
remote analysis services
• Janet delivers 10 Mb/s to door
– 10TByte dataset takes 93 days to download
• lesson : shift the results not the data
– i.e. data centres must also be service providers
• data subset access
• database query service
• analysis tools on line OR ability to upload code
• remote visualisation service
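The 93-day figure follows directly from the link speed, taking 1 TByte as 10^12 bytes. The sketch below reproduces it and compares it with shipping a query result instead; the 100 MByte result size is a hypothetical value chosen only for the comparison.

```python
SECONDS_PER_DAY = 86_400

def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time, ignoring protocol overhead and contention."""
    return size_bytes * 8 / link_bits_per_second

janet_bps = 10 * 10**6            # 10 Mb/s to the door
dataset_bytes = 10 * 10**12       # 10 TByte dataset
result_bytes = 100 * 10**6        # hypothetical 100 MByte query result

full_download_days = transfer_seconds(dataset_bytes, janet_bps) / SECONDS_PER_DAY
result_seconds = transfer_seconds(result_bytes, janet_bps)
```

Under these assumptions the full download takes about 92.6 days, while the result arrives in 80 seconds.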
Grids
• services remote .... also distributed ?
• computational grids
– web is distributed information ; grid is distributed CPU
– networked users have supercomputers at their fingertips
– don't even need to know where they are
– like plugging into the electrical power grid
technical issues
• data format standards
• metadata and annotation standards
• information exchange protocols
• presentation service standards
• request translation middleware
• workload scheduling, resource allocation
• mass storage management
• computing fabric management
• differentiated service network technology
• distributed data management - caching, file replication, file migration
• visualisation technology and algorithms
• data discovery methods
• search agents and AI
• database structure and query methods
• data mining algorithms
• s/w libraries and tools for upload requests
• data quality assurance (levels of club membership ?)
– all science-wide and commerce-wide issues ...
context
• Global Grids work
– basic computer science and technology development
• grids work in other sciences
– CERN grid
– Earth observation grid
• international astro-plans
– US National Virtual Observatory (NVO) project
– UK AstroGrid project
– European Astrophysical Virtual Observatory (AVO) project
AstroGrid project
• developed during LTSR
• proposal to PPARC October 2000
• three year project
• one year Phase A study
– community consultation
– science requirements analysis
– benchmark tests
– pilot database federations
• we need use-cases....
Phase B - preliminary
• uniform AstroGrid interface
• data-mining machines connected in grid
• tool for simultaneous browsing
– plus advanced visualisation, links to spectra etc.
• tools for advanced database analysis
– advanced querying, mixture fitting, statistical manipulations etc.
• tools for on-line data analysis
– statistics, model fitting
• system for uploading code
FIN