PowerPoint Presentation - INSTITUTE OF ASTRONOMY


Basic Propositions of the
RVO Information
Infrastructure Project
On behalf of the RVOII project report co-authors
Leonid Kalinichenko
Institute of Informatics Problems of RAS
RVO Information Infrastructure Project
Report

In May 2005 the RVO Information Infrastructure (RVOII) project report was
published in Russia as a result of the joint efforts of the Special
Astrophysical Observatory of RAS (SAO RAS), the Institute of Astronomy of
RAS (INASAN) and the Institute of Informatics Problems of RAS (IPI RAS),
supported by a one-year grant from the Russian Foundation for Basic
Research (RFBR).

RVOII is aimed at integrated representation of information in various
problem domains of astronomy and at support for solving scientific problems.

The project report contains analyses of: the various kinds of astronomical
information resources accumulated around the world, and specifically in
Russia; the technological and architectural recommendations of the IVOA;
the classes of scientific problems that need the VO facilities; the
correspondence and sufficiency of the IVOA standards for the identified
RVO activities; and the existing components and services that can be
re-used for the RVOII implementation.

Based on the analysis performed, a structural design of the RVO
information infrastructure has been developed. Strategically, the RVOII
development program is oriented towards tight coordination with the work
on developing the International VO.
Talk outline
o Objectives of the RVO project
o Representation of problem domains in natural sciences in information systems
o Information Resources in Astronomy; Russian astronomical resources
o Analysis of Projects of Virtual Observatories
o Information Infrastructure Forming Standards
o Classes of astrophysical problems for VO
o Virtual observatory architecture according to IVOA
o Subject mediation infrastructure planned for problem domains representation in RVO
o Information infrastructure of RVO
o AstroGrid as the core of the RVOII infrastructure
o First trial of AstroGrid community centre
o Analysis of the possibility of extending AstroGrid with subject mediation facilities
Objectives of the RVO project (1)
Main objectives of the RVO project:
o to provide the Russian astronomical community with facilities for
  integrating Russian astronomical resources into the VO;
o to provide the Russian astronomical community with integrated access
  to the data accumulated in international astronomical data resources;
o to provide the Russian astronomical community with facilities for
  defining problem domains for solving various classes of astronomical
  problems, computational facilities, facilities for information analysis
  and data mining, and facilities for automating scientific research in
  astronomy;
o to support a set of standards agreed with the international community
  that provide for the interoperability of heterogeneous data and of
  problem-solving facilities;
Objectives of the RVO project (2)
o to develop strategically important classes of astronomical problems
  based on the VO technology, and to develop processes (workflows) and
  mediators to support the respective research;
o to develop organizational measures, agreed with the international
  community, for the development and usage of the VO technology in
  Russia, for coordinating the Astronomical Data Centers in Russia and
  abroad, and for coordinating research based on the VO technology;
o to develop a set of measures for establishing RVO as an important
  educational resource for Russian universities;
o to form in Russia a sustainable community of astronomers actively
  using the VO in their scientific research;
o to contribute to a high level of research based on the VO technology
  in Russia in the strategically important areas of astronomy.
Subject Domain in Natural Science
[Diagram: structure of a subject domain in natural science. Recoverable labels:]
o Material system definition in NL semantics
o Domain terminology and concepts (abstract, methodological, concrete)
o Semantics of the T1…Tn constituents; interpretations
o Theory (Model) 1: T1 signature (attributes, types, classes, processes);
  T1 measurable characteristics (attributes, types, classes, procs);
  simulators; concretizations A and B of T1
o Theories (Models) T2, …, Tn and their measurable characteristics
o Simulation; explaining, forecasting
o Observations, simulations, measurements for T1
o Observable/measurable characteristics
o Methods and instruments for observation, experimentation, measurement,
  data analysis, discovery
o Problems, methods of solutions, algorithms, programs, workflows
From the Report to the President of USA
“Computational Science: Ensuring America’s
Competitiveness”
The President’s Information Technology Advisory Committee (PITAC) completed
a report in May 2005 which states:
“No single researcher has the skills required to master all the computational and
application domain knowledge needed to gather data from databases or
experimental devices, create geometric and mathematical models, create new
algorithms, implement the algorithms efficiently on modern computers, and
visualize and analyze the results. To model such complex systems faithfully
requires a multidisciplinary team of specialists, each with complementary
expertise and an appreciation of the interdisciplinary aspects of the system,
and each supported by a software infrastructure that can leverage specific
expertise from multiple domains and integrate the results into a complete
application software system.
Computational researchers need enabling, scalable, interoperable application
software to conduct examinations of their ideas and data”.
Information Resources in
Astronomy
o World-wide resources overview
o Optical surveys and catalogs
o Infrared and radio range surveys
o Archives of observations
o Data centers
o Surveys
o Robotic Telescopes
o Russian astronomical resources
From Tera to Petabytes
o The Large Synoptic Survey Telescope (LSST) will cover objects ranging
  from Earth's vicinity to the edge of the optical universe.
o It will reach 24th magnitude in 10 seconds and will survey up to 14,000
  square degrees three times per month. Over a period of years, 30,000
  square degrees will be surveyed in multiple bands, and the co-added
  images will reach 27th magnitude.
o It relies on high technology in microelectronics, large optics
  fabrication and metrology, and software.
o Comparing the LSST (8.4 m) telescope with the SDSS, and allowing also
  for its increased pixel sampling and resolution, the advantage in
  figure of merit is a factor of close to 200.
o Data products will consist of photometric catalogs that are
  continuously updated during the survey, a moving object database,
  images in at least 5 bands (updated on a regular schedule), and a huge
  time-tagged processed image database; in total they will climb to
  around 15 petabytes.
Russian astronomical resources
Main providers of astronomical data in Russia:
o Special Astrophysical Observatory of RAS (SAO RAS)
o Sternberg Astronomical Institute of the Moscow State University (SAI MSU)
o Main (Pulkovo) Astronomical Observatory of RAS (MAO)
o Institute of Applied Astronomy (IAA RAS)
o Institute of Terrestrial Magnetism, Ionosphere and Radiowave Propagation
  of the RAS (IZMIRAN)
o Institute of Solar-Terrestrial Physics of the Siberian Branch of the
  Russian Academy of Sciences (ISTP SB RAS)
o Space Research Institute (IKI) of the RAS
o Astronomical Institute of Saint Petersburg State University (AI SPbSU)
o Ural State University (USU)
o Pushchino Radioastronomical Observatory of the Astro Space Center of the
  LPI RAS (PrAO ASC LPI RAS)
o Russian Robotic Telescopes
Russian astronomical resources
Russian and former Soviet Union (fSU) astronomical data resources
classified by subject

Subject                  Number of resources   Number of institutions
Stellar systems          7                     3
Stars                    22                    9
Solar system             21                    8
Sun                      23                    8
Radioastronomy           7                     4
Cosmic rays              4                     3
Multi subject archives   7                     5
TOTAL                    91                    19 (Russia and fSU)
Analysis of Projects of Virtual
Observatories
o NVO
o AstroGrid
o EURO-VO
EURO-VO Participants
o French VO, as represented by the Centre de Données astronomiques de
  Strasbourg (CDS), Strasbourg, France
o European Southern Observatory, Garching, Germany
o European Space Agency, Paris, France
o UK AstroGrid Consortium, as represented by the University of Edinburgh,
  Edinburgh, UK
o German Astrophysical Virtual Observatory (GAVO), as represented by the
  Max Planck Institute for Extraterrestrial Physics (MPE), Garching, Germany
o Istituto Nazionale di Astrofisica, Rome, Italy
o Nederlandse Onderzoekschool voor Astronomie, Leiden, The Netherlands
o Laboratorio de Astrofísica Espacial y Física Fundamental, Madrid, Spain
Information Infrastructure
Forming Standards
o The OAI Protocol for Metadata Harvesting (OAI-PMH) defines a mechanism
  for harvesting records containing metadata from repositories.
o A Web service is defined as a standardized way of integrating Web-based
  applications using the XML, SOAP, WSDL, and UDDI open standards over an
  Internet protocol backbone.
o Grid technology:
  o Compute/File Grid
  o Information Grid
  o Hybrid Grid
  o Semantic Grids
o The Web Services Resource Framework (WSRF) makes grid resources
  accessible within a web services architecture.
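
As an illustration of the OAI-PMH harvesting mechanism mentioned above, here is a minimal Python sketch. It builds a `ListRecords` request (a standard OAI-PMH verb) and extracts record identifiers from a response. The base URL `http://registry.example.org/oai`, the helper names and the sample response are assumptions made for this example; they do not come from the RVOII report, and the sample XML is hand-written so the sketch runs offline.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Return the identifiers of all records in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [el.text for el in root.iterfind(path, OAI_NS)]


# Hand-written sample response (not real registry data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:catalog/1</identifier></header></record>
    <record><header><identifier>oai:example.org:catalog/2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://registry.example.org/oai")
ids = record_identifiers(SAMPLE)
```

A real harvester would fetch `url` over HTTP and follow resumption tokens for large result sets; both are omitted here.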
Classes of astrophysical problems
for VO
o Class of problems solvable by applying database search techniques
o Classes of general problems for VO (cosmology, formation
and development of galaxies, formation and evolution of stars,
sun and planets, etc.)
o Theoretical research and VO (VirtU – the Virtual Universe
project as an example)
o Co-existence of theoretical and observational archives and
services in VO
The relationship between the
TVO, TOI and AstroGrid
From the Report to the President of USA
“Computational Science: Ensuring America’s
Competitiveness”
Astrophysical scientific problems mentioned in the PITAC Report:
Discovering Brown Dwarfs via Data Mining
Scientists creating the NVO confirmed the existence of a new brown dwarf in 2003.
The discovery came quite unexpectedly from data that had been publicly available for at
least 18 months. NVO researchers emphasized that a single new brown dwarf is not as
scientifically significant as the rapidity of the discovery and the tantalizing
hint it offers of the potential of the NVO.
Dark Matter, Dark Energy, and the Structure of the Universe
A team at the University of Illinois has conducted large-scale cosmological
computational simulations that show the distribution of cold dark matter in a model of
cosmic structure formation incorporating the effects of a cosmological constant (Lambda)
on the expansion of the universe. The simulation contained 17 million dark matter particles
in a cubic model universe that is 300 million light-years on a side.
Supernova Modeling
The TeraScale Supernova Initiative (TSI) is a national, multi-institution,
multidisciplinary collaboration of astrophysicists, nuclear physicists, applied
mathematicians, and computer scientists. TSI’s principal goals are to understand the
mechanism(s) responsible for the explosions of core collapse supernovae and all the
phenomena associated with these stellar explosions.
Requirements for scientific results
publishing
To publish means to make data/service products in repositories available through
services that are accessible via VO-supplied sites.
o To allow independent checks of conclusions based on theoretical results
  by reproducing certain results.
o To allow comparisons with similar results/methodologies or with the
  corresponding data by observers/theoreticians.
o To make theoretical results more easily accessible and understandable
  for observers.
o Journals may require links to the actual data products and/or software
  used in published work.
o To allow querying of publications and of real and simulated data
  products in a uniform manner (joint queries on structured content items
  and on metadata, for observations and publications).
o To check observable classes as interpretations of theories (models), and
  to analyze inconsistencies between observations and theoretical models.
Data Mining as a part of PSE
o Two basic classes of models: predictive and descriptive.
o Predictive: one of the observational features is chosen as the target.
  The model provides a way of calculating the target as a function of the
  rest of the features: Y = F(X1, …, Xn). There are two approaches:
  classification (predicts the class to which an object may belong with a
  certain probability) and regression (predicts a value of the target).
  Examples: Naïve Bayes, Adaptive Bayes, Support Vector Machines (SVM),
  regression, searching for essential attributes, etc.
o Descriptive: a) clustering by certain similarity criteria (in contrast
  with classification, the features and classes of the partitioning are
  unknown); b) associative models (looking for stable associations).
o For each model many algorithms exist (classification and regression
  decision trees, genetic algorithms, neural networks, discriminant
  analysis, enhanced k-means, O-cluster, association search, etc.).
o Technology of data mining: 1) problem statement, 2) data preparation,
  3) model development and choice of algorithm, 4) evaluation and
  interpretation. Not all models allow interpretation (e.g., neural
  networks), but when rules are used, they provide a way to interpret the
  result.
o Problem statements are required!
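
The two predictive approaches, Y = F(X1, …, Xn) as regression and as classification, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic toy values, the class names are made up, and a nearest-centroid rule stands in for the classifiers named on the slide (it returns a class label, not a probability).

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def fit_centroids(rows, labels):
    """Classification training: one mean feature vector per class."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: tuple(mean(col) for col in zip(*members))
            for label, members in groups.items()}


def classify(centroids, row):
    """Assign the class whose centroid is nearest (squared Euclidean)."""
    return min(centroids,
               key=lambda c: sum((p - q) ** 2 for p, q in zip(centroids[c], row)))


# Synthetic toy data, not real measurements.
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])  # points on y = 1 + 2x
colors = [(0.2, 0.1), (0.3, 0.2), (1.5, 1.1), (1.6, 1.3)]
classes = ["star", "star", "brown_dwarf", "brown_dwarf"]
centroids = fit_centroids(colors, classes)
predicted = classify(centroids, (1.4, 1.0))
```

In the regression case the target is a number (`a + b*x`); in the classification case it is one of the known class labels.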
VO architecture according to IVOA
o VO architecture overview
o Data Modeling
o A unified domain model for astronomy, for use in VO
o Data model for quantity
o IVOA Observation data model
o Simple Spectral Data Model
o Simulation Data Model
o Unified Content Descriptors
o Metadata Registries for VO
o VOTable Format Definition
o Data Access Layer
o DAL Architecture
o Simple Image Access Protocol Specification
o Simple Spectral Access Specification
o IVOA Query Language
o IVOA SkyNode Interface
International Virtual
Observatory Alliance Partners
o AstroGrid (UK) (http://www.astrogrid.org);
o Australian Virtual Observatory (http://avo.atnf.csiro.au);
o Astrophysical Virtual Observatory (EU) (http://www.euro-vo.org);
o Virtual Observatory of China (http://www.china-vo.org);
o Canadian Virtual Observatory (http://services.cadc-ccda.hia-iha.nrccnrc.gc.ca/cvo/);
o German Astrophysical Virtual Observatory (http://www.g-vo.org/);
o Hungarian Virtual Observatory (http://hvo.elte.hu/en/);
o Italian Data Grid for Astronomical Research (http://wwwas.oat.ts.astro.it/idgar/IDGAR-home.htm);
o Japanese Virtual Observatory (http://jvo.nao.ac.jp/);
o Korean Virtual Observatory (http://kvo.kao.re.kr/);
o National Virtual Observatory (USA) (http://us-vo.org/);
o Russian Virtual Observatory (http://www.inasan.rssi.ru/eng/rvo/);
o Spanish Virtual Observatory (http://laeff.esa.es/svo/);
o Virtual Observatory of India (http://vo.iucaa.ernet.in/~voi/).
IVOA Infrastructure Controversies
(just one example)
1. Euro-VO and NVO objectives: how to consolidate them and support them
   with a complete system of standards
2. Controversies in understanding what a Data Centre is (e.g., CDS vs
   AstroGrid definitions)
3. Absence of a Data Centre concept in the IVOA standards
4. Controversy between the SkyQuery idea and Data Centres
Subject mediation infrastructure
for problem domains
representation in RVO
o Information sources integration approaches
o Principles of subject mediation
o Subject mediation tools
Information sources integration
approaches
o Virtual integration:
  o A global schema is formed by integrating a pre-selected set of source
    schemas (Global as View, GAV)
  o The global schema is defined independently of existing sources, as a
    subject domain schema (Local as View, LAV)
o Materialized integration (data warehouses)
o Combined methods (GLAV, applying partial materialization)
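
The difference between the two virtual integration styles can be made concrete with a toy Python sketch. Everything here is an assumption for illustration: the source rows are synthetic, the attribute names are invented, and LAV is reduced to simple attribute renaming (real LAV systems answer queries by rewriting them over source views).

```python
# Two toy sources with different local schemas (synthetic rows).
SOURCE_A = [{"ra_deg": 10.0, "dec_deg": 5.0, "vmag": 12.1}]
SOURCE_B = [{"alpha": 20.0, "delta": -3.0, "mag_v": 14.7}]


# GAV: the global relation is written directly as a view over the
# pre-selected sources; adding a source means editing this definition.
def global_objects_gav():
    rows = [{"ra": r["ra_deg"], "dec": r["dec_deg"], "mag": r["vmag"]}
            for r in SOURCE_A]
    rows += [{"ra": r["alpha"], "dec": r["delta"], "mag": r["mag_v"]}
             for r in SOURCE_B]
    return rows


# LAV: the global (subject domain) schema exists first; each source is
# described separately in its terms (global attribute -> local attribute).
GLOBAL_SCHEMA = ("ra", "dec", "mag")
LAV_VIEWS = {
    "A": (SOURCE_A, {"ra": "ra_deg", "dec": "dec_deg", "mag": "vmag"}),
    "B": (SOURCE_B, {"ra": "alpha", "dec": "delta", "mag": "mag_v"}),
}


def global_objects_lav(views):
    """Answer 'all objects' by translating every described source."""
    return [{g: row[mapping[g]] for g in GLOBAL_SCHEMA}
            for rows, mapping in views.values()
            for row in rows]


# Under LAV a new source is added by describing it; nothing else changes.
extended = dict(LAV_VIEWS)
extended["C"] = ([{"x": 1.0, "y": 2.0, "m": 9.9}], {"ra": "x", "dec": "y", "mag": "m"})
```

Both functions return the same rows for sources A and B; the design difference shows up when source C arrives, which under GAV would require rewriting `global_objects_gav`.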
Subject Mediator Concept
There are two fundamentally different approaches to the problem of integrated
representation of multiple information resources for a researcher solving
scientific problems:
1) moving from resources to a problem (an integrated representation of
multiple resources is created independently of the problem) and
2) moving from a problem to the resources (a description of the subject domain
of a problem class is created in terms of concepts, data structures, functions
and processes of problem solving, and the resources relevant to the problem
are mapped into it).
The first approach (used in SkyQuery) is not scalable with respect to the number
of resources: the global schema becomes too large for a researcher to survey,
and the completeness of information is doubtful.
To implement the second approach a mediation technology has to be created. The
mediator supports interaction between a researcher and resources by applying a
description of the problem class subject domain (the description of the mediator).
The subject mediator approach (a new technology) is considered a part of
RVOII.
Mediator Definition as a Subject
Metainformation Consolidation
To make the mediator scalable, two separate phases of its functioning are
distinguished: consolidation and operational.
• During the consolidation phase the efforts of the scientific community
are focused on defining the mediator's subject by declaring its
metainformation. The metainformation created at the consolidation
phase constitutes the definition of the subject domain of the mediator.
• During the operational phase arbitrary information collections,
expressed in terms of the mediator, can be registered at the mediator.
Registration is autonomous and can be done by collection providers
independently of each other. Users of the mediator know only the
metainformation defining the mediator’s subject and formulate their
queries in terms of the mediator’s subject.
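
The two phases can be illustrated with a small Python sketch. It is a toy model under explicit assumptions: the class `SubjectMediator`, its methods, the attribute names and the sample rows are all hypothetical and are not the RVOII mediator tools; collections are in-memory lists, and "expressed in terms of the mediator" is reduced to a mapping from subject attributes to local columns.

```python
class SubjectMediator:
    """Toy mediator: a consolidated subject definition plus registered collections."""

    def __init__(self, subject_attributes):
        # Consolidation phase: the community agrees on the subject definition.
        self.subject_attributes = frozenset(subject_attributes)
        self.collections = {}

    def register(self, name, rows, mapping):
        # Operational phase: a provider expresses its collection in mediator
        # terms (subject attribute -> local column), independently of others.
        unknown = set(mapping) - self.subject_attributes
        if unknown:
            raise ValueError(f"not in subject definition: {sorted(unknown)}")
        self.collections[name] = (rows, mapping)

    def query(self, select, where=lambda row: True):
        # Users see only subject attributes; each registered collection is
        # translated into subject terms before filtering.
        results = []
        for rows, mapping in self.collections.values():
            if not set(select) <= set(mapping):
                continue  # this collection cannot answer the query
            for local in rows:
                row = {attr: local[col] for attr, col in mapping.items()}
                if where(row):
                    results.append({attr: row[attr] for attr in select})
        return results


# Synthetic example: two providers register at different times.
mediator = SubjectMediator(["ra", "dec", "flux_1400mhz"])
mediator.register("radio_survey",
                  [{"RAJ2000": 150.1, "DEJ2000": 2.2, "S1.4": 35.0}],
                  {"ra": "RAJ2000", "dec": "DEJ2000", "flux_1400mhz": "S1.4"})
before = mediator.query(["ra", "flux_1400mhz"])
mediator.register("second_survey",
                  [{"a": 10.0, "d": -5.0, "f": 80.0}],
                  {"ra": "a", "dec": "d", "flux_1400mhz": "f"})
bright = mediator.query(["ra", "flux_1400mhz"],
                        where=lambda r: r["flux_1400mhz"] > 50)
```

The query issued after the second registration sees both collections without any change to the subject definition or to the user's query vocabulary.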
Advantages of subject domain
mediation
1. Semantic integration of heterogeneous information collections is
   achieved.
2. Users need to know only the subject definitions consolidated by a
   community.
3. Information providers can disseminate their information for
   integration independently of each other and at any time.
4. Autonomous information collections are absolutely independent of the
   mediators and their consolidated metainformation definitions.
5. Users have integrated access to all information registered up to the
   moment of a query.
6. Mediators form a recursive structure. Multiple subjects can be
   semantically integrated by defining mediators of a higher level.
Subject mediation tools
(operational phase)
[Diagram: architecture of the subject mediation tools in the operational
phase; the numbered arrows between components are not recoverable.
Components shown: Portal; Web Browser; Web Pages; Application Client;
Application Server (Servlets/JSP); Mediator (EJB / WS); Metadata Access;
ADQL2SYFS; Supervisor; Rewriter; Planner; Collection Adapters;
Collections; Synth2Oracle; Oracle 10g; Metainformation Repository;
SOAPWrapper; Registration Client; Tool Adapter; Software Tools; Data
Repository.]
Information infrastructure of the
RVO
o Basic principles for the RVO infrastructure
o The RVO layered infrastructure
o Components of RVO
Basic principles for the RVO
infrastructure
o The basic RVO infrastructural principle is to represent the architecture
  as a network of interoperating web services (Grid services once a
  suitable OGSA-DAI or WSRF standard matures). A multilevel hierarchy of
  services is the basis for the RVO architecture. The handling of remote
  and virtual data sources should be provided. The core will be a set of
  simple, low-level services that are easy to implement even by small
  projects, so the threshold to join the VO will be low. Large data
  providers may implement more complex, high-speed services as well. The
  services can be combined into more complex compositions that talk to
  several services and create more complex results.
o Moving processing to the data is another principle, motivated by the
  large volume of the data and the data-intensive character of VO
  applications.
o A modular architecture that encourages code reuse and composition is
  another guiding principle for the RVO infrastructure.
o The conventional practice of applying the global-as-view approach to
  data integration in VO projects (e.g., SkyQuery) does not appear to be
  scalable.
o Emphasizing subject mediators to support representation of and access
  to various subject domains in astronomy is a basic RVO principle.
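
The idea of simple low-level services combined into more complex compositions can be sketched in Python as follows. This is a toy model under stated assumptions: services are plain functions over in-memory catalogs rather than web services, the catalogs and identifiers are synthetic, and distances use a flat-sky approximation that is only reasonable for small fields.

```python
import math


def make_cone_search(catalog):
    """Wrap an in-memory catalog as a simple low-level 'service'."""
    def cone_search(ra, dec, radius):
        # Filtering runs where the data lives; only the matches are returned.
        return [s for s in catalog
                if math.hypot(s["ra"] - ra, s["dec"] - dec) <= radius]
    return cone_search


def cross_match(service_a, service_b, ra, dec, radius, tol):
    """Composed service: queries two services and pairs nearby sources."""
    a_hits = service_a(ra, dec, radius)
    b_hits = service_b(ra, dec, radius)
    return [(a["id"], b["id"])
            for a in a_hits for b in b_hits
            if math.hypot(a["ra"] - b["ra"], a["dec"] - b["dec"]) <= tol]


# Synthetic catalogs (degrees); not real survey data.
optical = make_cone_search([
    {"id": "opt-1", "ra": 10.00, "dec": 20.00},
    {"id": "opt-2", "ra": 10.30, "dec": 20.10},
    {"id": "opt-3", "ra": 50.00, "dec": -5.00},
])
radio = make_cone_search([
    {"id": "rad-7", "ra": 10.001, "dec": 20.001},
    {"id": "rad-8", "ra": 10.60, "dec": 20.40},
])
pairs = cross_match(optical, radio, ra=10.0, dec=20.0, radius=0.5, tol=0.01)
```

Each catalog owner only has to implement the small `cone_search` interface; the composition lives outside the data providers and exchanges only the filtered subsets.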
The RVO layered infrastructure
[Diagram: the RVO layered infrastructure, top to bottom:]
o Researcher/Problem Layer: Simulators; Mediators; Data Analysis Programs;
  Collaboratory Data Spaces; Workflow Support
o Virtual Observatory: Searchable Metadata Registry; Catalogs Warehouse;
  Integrated Catalog Search; Access Services; Data Analysis Facilities;
  Computational Grid Facilities; Portals and Workflow Support
o Data Centers: Searchable Metadata Registries; Catalogs Warehouse;
  Integrated Catalog Search; Access Services; Data Analysis Facilities;
  Computational Grid Facilities; Portals and Workflow Support
o Resource Layer: Local Metadata Registries; Catalogs; Local Catalog
  Search; Access Services; Data Analysis Facilities; Interface for
  Integrated Search; Portals and Workflow Support
o Ground Layer: Archives; Simulations; Telescopes; Publications
Searchable metadata registries
at Data Center and Virtual
Observatory layers
SAO Data Center Infrastructure
INASAN Data Center Infrastructure
RVO Infrastructure
AstroGrid as the core of the
RVOII infrastructure
AstroGrid as the architectural core for
implementation of RVOII

Analysis shows that using AstroGrid as the RVOII core provides for
implementation of the RVOII principles (such as modularity of the
architecture, grid interoperability of services, the possibility of
re-use and composition of services, and development of a multilayered
architecture). The following AstroGrid components were found to be
directly applicable as the RVOII architecture core:
o Registry: for metadata-based resource registration and search;
o MySpace: for management of data spaces shared by researchers and tools;
o Workbench: for the VO user interface during problem solving;
o Community: for administration and management of VO users;
o JES: for the workflow engine;
o CEA: for constructing interoperable applications (services);
o DSA: for including data storage functionality into AstroGrid at the
  required level of system (task) implementation.
AstroGrid existing and planned
components
[Diagram: AstroGrid existing and planned components. Labels shown: CLI;
Portal; Workbench; Science Application Tools; Dataset Access; Workflow;
Registry; VObs Support Services; Community; Resource Discovery; Agent
Framework; Data Mining Framework; Visualization Framework; MySpace;
Astronomer Interface; Auth/Auth Security; Virtual Observatory
Infrastructure; Middleware; Grid & Web Services Middleware; Astronomical
Datasets.
Legend: Existing Component; AstroGrid-2 Component; External Component;
Data.]
Community centre in Moscow (IPI RAS)
for support of scientific astronomical
problem solving over distributed
repositories of astronomical information
o One of the first steps in implementing RVOII is the installation of a
  Community centre in Moscow (at IPI RAS) to support scientific
  astronomical problem solving over distributed repositories of
  astronomical information (containing observational data, problem
  solving results, and services for data and knowledge analysis). This
  Centre is positioned at the top layer of RVOII, making it immediately
  usable for problem solving by astronomers.
o The Centre was created in October 2005 as an installation of AstroGrid
  (1.1), developed recently in the UK and generously provided by its
  authors for use in RVO.
First trial: application of AstroGrid
for data analysis for the distant
galaxy discovery problem
Superposition of radio images contours
and optical images in Aladin
RVO facilities as a part of the
International VO
[Diagram labels: Data Center SAO RAS; AstroGrid RVO (IPI RAS); Tools for
astrophysical problems definition; Metainformation of AstroGrid
information sources; Data Center INASAN; Tools for management of problem
solving; Information Grid; AstroGrid Leicester (Great Britain); AstroGrid
Edinburgh (Great Britain); Data Center Strasbourg (France); Data Centers
in USA.]
Analysis of possibility of extending
AstroGrid with subject mediation
facilities
Basic preliminary decisions:
o Mediators are registered in the Registry as CEA applications;
o The mediator interface is planned to provide methods for submitting
  ADQL queries and mediator programs written in a subset of the SYNTHESIS
  language;
o CEA applications can be used as functions in the mediator programs;
o The results of the mediator programs are represented in the form of
  VOClass, of which VOTable is a strict subset; the results are stored in
  MySpace;
o The mediator programs can be used as tasks in AstroGrid workflows;
o Adapters are embedded into AstroGrid either by means of the built-in
  application server for Java applications or by means of the DSA
  application server;
o At the initial stage, Portal and Workbench can be used as mediator
  clients; at later stages, a specific mediator client based on the ACR
  capabilities can be developed;
o Facilities for calling external applications are planned (e.g., the
  data mining facilities of Weka and/or Oracle).
Composed architecture
[Diagram: composed architecture of AstroGrid with the mediator.
Column headings: Clients; Applications; Servers.
Components shown: Registry; Aladin; Portal; Workbench; Mediator client;
Command-line CEA; Java CEA; Mediator CEA; DSA CEA; Http CEA; Weka; JES;
CEC; MySpace; Mediator; Adapter; SIA; markers (m-I), (m-II), (a-I), (a-II).
Arrow labels: application list; resolve application; submit workflow;
save/load workflow; view data (VOTable, VOClass); save/load data (VOTable,
VOClass); transmit query, receive result (VOClass).]

RVOII Report
Briukhov D.O., Kalinichenko L.A., Zakharov V.N., Panchuk V.E.,
Vitkovsky V.V., Zhelenkova O.P., Dluzhnevskaya O.B., Malkov O.Yu.,
Kovaleva D.A. Information Infrastructure of the Russian Virtual
Observatory (RVO). Second Edition. IPI RAN, May 2005
http://synthesis.ipi.ac.ru/synthesis/publications/rvoii/rvoii.pdf

Announcement of the RVO AstroGrid as a shared-use centre, with
registration instructions
http://synthesis.ipi.ac.ru/synthesis/projects/astromedia/astroannounce
BASIC INFORMATION TECHNOLOGY FOR VO IS COMING.
SCIENTIFIC PROBLEM STATEMENTS AND MULTIDISCIPLINARY WORK ON SOLVING THEM
USING VO ARE REQUIRED.
IVOA Architecture Diagram