Roy Williams - National e

US National Virtual Observatory
Semantic Grid
+ Data Federation
Roy Williams
California Institute of Technology
NVO co-director
What is NVO?
– Standard protocols, standard data types
• XML transfer protocol (VOTable)
• Resource description (VOResource etc)
• Publish/discover to federated registry (OAI)
• Semantic Types (UCD)
• Services: Cone search, Simple Image Access
– Computing with big data on the Grid
• Database Crossmatch
• Image Federation: Atlases
First NVO Discovery
Database Fuzzy Join
Billion Source Cross-Identification: A Computational Challenge
2MASS versus SDSS cross-identification with
- j_m as 2MASS magnitude
and
- I_mtotn as SDSS magnitude
2MASS: j_m <= 15
SDSS: I_mtotn <= 18
SDSS unmatched
2MASS matched
SDSS matched
2MASS unmatched
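A fuzzy join of this kind can be sketched as a nearest-neighbour match within an angular tolerance. This is a minimal brute-force illustration, not the NVO crossmatch service: the row layout (`id`, `ra`, `dec`), the sample rows and the 2-arcsecond tolerance are all assumptions made for the example. A billion-source match would need a spatial index rather than the double loop shown here.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def fuzzy_join(cat_a, cat_b, tol_deg):
    """Pair each row of cat_a with its nearest cat_b row within tol_deg."""
    matched, unmatched = [], []
    for a in cat_a:
        best, best_sep = None, tol_deg
        for b in cat_b:
            sep = angular_sep_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= best_sep:
                best, best_sep = b, sep
        if best is None:
            unmatched.append(a["id"])
        else:
            matched.append((a["id"], best["id"]))
    return matched, unmatched

# Hypothetical rows, for illustration only.
twomass = [{"id": "T1", "ra": 10.0, "dec": 20.0},
           {"id": "T2", "ra": 11.0, "dec": 21.0}]
sdss = [{"id": "S1", "ra": 10.0002, "dec": 20.0001},
        {"id": "S2", "ra": 50.0, "dec": -5.0}]

matched, unmatched = fuzzy_join(twomass, sdss, tol_deg=2.0 / 3600.0)
```

The `matched` and `unmatched` lists correspond to the matched/unmatched populations named on the slide, here for the 2MASS side only.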
Crossmatch Services
SDSS
database
query
NVO protocols
Crossmatch
service
2MASS
database
query
query
scientific
knowledge!
First NVO Discovery
Database crossmatch
of two massive
databases creates new
science
“The sum is
greater than
the parts”
Semantic Grid
Cone Search
• First VO standard service
• Input: RA, DEC, SR must be present
– decimal degrees J2000
• Output: VOTable of sky-located data records
– must have columns with UCDs:
POS_EQ_RA_MAIN, POS_EQ_DEC_MAIN, ID_MAIN
ID RA DEC x y z
RA=300
DEC=25
SR=0.1
Response
Request
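The cone test itself can be sketched with unit vectors. The assumption here is that the x, y, z columns on the slide are Cartesian unit vectors derived from RA and DEC; the function names are hypothetical and this is not the code of any particular cone search server.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert J2000 decimal degrees to a Cartesian unit vector (x, y, z)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def in_cone(ra_deg, dec_deg, center_ra, center_dec, sr_deg):
    """True if the point lies within sr_deg of the cone centre."""
    p = unit_vector(ra_deg, dec_deg)
    c = unit_vector(center_ra, center_dec)
    dot = sum(pi * ci for pi, ci in zip(p, c))
    # A point is inside the cone when the angle to the centre is <= SR,
    # i.e. when the dot product is >= cos(SR).
    return dot >= math.cos(math.radians(sr_deg))

# Request from the slide: RA=300, DEC=25, SR=0.1
inside = in_cone(300.05, 25.0, 300.0, 25.0, 0.1)
outside = in_cone(300.0, 25.2, 300.0, 25.0, 0.1)
```

Storing x, y, z alongside RA and DEC lets a database apply this test with a single dot product per row.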
Cone Search Registry
A collection of services that have the same shape
Request: HTTPget of shape:
URLbase RA=200&DEC=20&SR=2
Response: VOTable of shape:
ID, POS_EQ_RA_MAIN, POS_EQ_DEC_MAIN
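Because every registered service accepts the same request shape, one helper can build the query for any of them. A minimal sketch, assuming the base URL is joined to the parameters with `?` or `&`; the `example.org` address is a placeholder, not a real service.

```python
from urllib.parse import urlencode

def cone_search_url(base_url, ra, dec, sr):
    """Append RA, DEC, SR (decimal degrees) to a cone search base URL."""
    query = urlencode({"RA": ra, "DEC": dec, "SR": sr})
    if base_url.endswith(("?", "&")):
        sep = ""          # base URL is already prepared for parameters
    elif "?" in base_url:
        sep = "&"         # base URL already carries other parameters
    else:
        sep = "?"
    return f"{base_url}{sep}{query}"

# Hypothetical base URL, request values from the slide.
url = cone_search_url("http://example.org/cone", 200, 20, 2)
```

The same function works unchanged for every service in the registry, which is what makes the density probe and other federations possible.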
Cone Search + Density Probe
Federation of Multiple Services
baseURL
Spacing
Search radius
Density
Probe
interoperating NVO-compliant services!
Cone
Search
NVO Image Protocol
SIAP
• Specify box by position and size
• SIAP server returns relevant images
• Footprint
• Logical Name
• URL
Can choose:
standard URL:
http://.......
SRB URL
srb://nvo.npaci.edu/…..
Simple Image Access Service
• Query is sky region
• May query on image type, image geometry
• Response is VOTable of images
• Each has WCS (geometry) parameters
• Plus a URL to fetch the image
• Designed for
• Set of pointed observations (eg Hubble)
• Wide-area survey (eg Sloan)
• Image service
– Mosaicking
– Reprojection
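An image query can be sketched the same way as a cone search. The parameter names `POS`, `SIZE` and `FORMAT` are taken from the Simple Image Access 1.0 convention as best recalled, and should be checked against the specification; the base URL is a placeholder.

```python
from urllib.parse import urlencode

def sia_query_url(base_url, ra, dec, size_deg, image_format="image/fits"):
    """Build a query for a box on the sky given by position and size."""
    params = {
        "POS": f"{ra},{dec}",     # box centre, decimal degrees
        "SIZE": size_deg,         # box width, decimal degrees
        "FORMAT": image_format,   # requested image type
    }
    # Keep ',' and '/' readable instead of percent-encoding them.
    return f"{base_url}?{urlencode(params, safe=',/')}"

# Hypothetical service address.
url = sia_query_url("http://example.org/sia", 300.0, 25.0, 0.5)
```

The response would be a VOTable with one row per image, each carrying WCS parameters and a URL from which to fetch the image.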
Data Inventory Service
• What data covers a
position in the sky?
JHU/STScI
NCSA
Registry
Registry
Publish
OAI
OAI
4
Caltech
Goddard
Registry
Publish
1
Query
OAI
2
DIS
3
Data Inventory Service
Request is a
cone on the
sky
Data Inventory Service
Relevant
Images
and
Catalogs
NVSS
Image
ROSAT
catalog
Image Federation
VO Registry
R
md server for ivo://
R
VORegistry
OAI
VOResourceID
ivo://me.com/file123
Portals Tools
& Services
Aladin
OASIS
DIS
Query
service
Databases
Grid
Virtual Data
Schemas &
Service Types
VOView
Publishing
Publish
service
Fill-in forms
Visualization
Reports
What is in the Registry?
• Answer: “Entities”
• It has a global identifier ivo://…….
– Must be resolved by authority
• It has “VOViews”
– Queries return these
• …..and that’s all!
3 Views of an Entity
Transportation metadata:
<weight>4000 kg</weight>
<poisonous>no</poisonous>
<claws>no</claws>
<food>carrots</food>
<waste-mgmt>heavy</waste-mgmt>
“entity”
Zoo-keeper metadata:
Zoo-manager metadata:
<popularity>9</popularity>
<visitors>2500 per day</visitors>
<feeding>carrots</feeding>
<diet>carrots</diet>
<excrement>yes</excrement>
<fencing>strong</fencing>
VOResource
A mandatory form plus other supporting forms
Schemas and Service Types
• VOResource
– Entity description form
• Organization, project, data collection, service
• Has ivo:// identifier
• VORegion
– sky coverage form (α/δ/λ)
• VOTable
– star catalog, image list, other tables
• OAI
– Registry harvesting
– Distributed virtual registry
• CONE
– Request-response for catalog
• SIAP
– Request-response for images
When can I
publish my own
schema to VO?
Dublin Core Metadata
Curation data for “any human creation”
Title: A name given to the resource.
Creator: An entity primarily responsible for making the content of the resource.
Subject: A topic of the content of the resource.
Description: An account of the content of the resource.
Publisher: An entity responsible for making the resource available.
Contributor: An entity responsible for making contributions to the content of the resource.
Date: A date of an event in the lifecycle of the resource.
Type: The nature or genre of the content of the resource.
Format: The physical or digital manifestation of the resource.
Identifier: An unambiguous reference to the resource within a given context.
Source: A reference to a resource from which the present resource is derived.
Language: A language of the intellectual content of the resource.
Relation: A reference to a related resource.
Coverage: The extent or scope of the content of the resource.
Rights: Information about rights held in and over the resource.
Dublin Core
Dublin Core is how
the VO will
interoperate with
libraries of the
world
A global metadata
standard
Prototype Registry
Organization
Data Collection
Project
Service
SIA service
VOViews
VOResource view
Dublin Core view
OAI: Open Archives Initiative
Harvesting Protocol
OAI is popular
– Ask your University librarian
Distributed Comprehensive Registry
– Harvesting
Different views for different purposes
– Six blind men and the elephant
OAI Harvesting Protocol
6 magic verbs of OAI
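The six verbs are those of the OAI Protocol for Metadata Harvesting, and a harvesting request is an HTTP query naming one of them. A minimal sketch; the registry address is a placeholder, and `oai_dc` is the Dublin Core metadata prefix defined by the protocol.

```python
from urllib.parse import urlencode

# The six request verbs of OAI-PMH.
OAI_VERBS = ("Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord")

def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for one of the six verbs."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI verb: {verb}")
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"

# Hypothetical registry endpoint: harvest all records as Dublin Core.
url = oai_request_url("http://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc")
```

A harvesting registry would issue such `ListRecords` requests against each peer and merge the results, which is how the distributed registries stay comprehensive.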
VO Identifiers
ivo://mydomain.com
/
Authority ID
• Registered with IVOA
• Must correspond to a registry
mySkySurvey
Resource ID
• Created by Authority
• Resolved by registry
delimiter
• URI form
• Still in flux
#
file00037.fits
Record ID
• Not known to registry
delimiter
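The three parts of an identifier can be pulled apart with two splits. This sketch assumes the `/` and `#` delimiters shown on the slide, which the slide itself describes as still in flux; the function and key names are hypothetical.

```python
def parse_ivo_id(identifier):
    """Split ivo://authority/resource#record into its three parts."""
    prefix = "ivo://"
    if not identifier.startswith(prefix):
        raise ValueError("identifier must start with ivo://")
    rest = identifier[len(prefix):]
    rest, _, record = rest.partition("#")        # record ID: not known to registry
    authority, _, resource = rest.partition("/")  # resource ID: resolved by registry
    return {
        "authority": authority,
        "resource": resource or None,
        "record": record or None,
    }

parts = parse_ivo_id("ivo://mydomain.com/mySkySurvey#file00037.fits")
```

Only the authority and resource parts would be sent to a registry for resolution; the record part is interpreted by the resource itself.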
Image Federation
Multispectral Imagery
Moffett Field, California.
224 channels from 400 nm to 2500 nm
Crab Nebula.
3 channels: X-ray in blue, optical in
green, and radio in red.
Image Federation
Images of the same galaxy taken
several days apart are automatically
subtracted from one another, and
remaining bright spots may be
supernova candidates. (NEAT project)
Stacking allows detection of faint
sources. A 1-sigma detection in
each of many bands becomes a 3-sigma detection.
Image subtraction allows detection
of narrow-line features that are not
also wide-band (eg Hα but not R-band)
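The stacking claim follows from the usual square-root rule: if the noise in each image is independent and of equal size, co-adding N images raises the signal-to-noise ratio by a factor of sqrt(N). A small sketch under that assumption (the slide does not say how many bands are stacked):

```python
import math

def stacked_snr(per_image_snr, n_images):
    """S/N after co-adding n_images with equal, independent noise."""
    return per_image_snr * math.sqrt(n_images)

def images_needed(per_image_snr, target_snr):
    """Smallest number of images whose stack reaches target_snr."""
    return math.ceil((target_snr / per_image_snr) ** 2)

snr = stacked_snr(1.0, 9)          # nine 1-sigma images
needed = images_needed(1.0, 3.0)   # images for a 3-sigma detection
```

Under these assumptions, nine 1-sigma images are enough for a 3-sigma detection.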
Principal Components
SDSS (5 channel)
SDSS+2MASS (8 channel)
Mosaicking and Federation
Infrared map
Mosaicking
Xray map today
• We want to mosaic different images
• We want to federate different information
Compute intensive:
flux in each pixel is carefully
distributed into a new pixel grid
Xray map last year
Federation
Every Astronomical image has a different
projection
• different pointing of the telescope
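The idea of distributing flux into a new pixel grid can be shown in one dimension: each input pixel's flux is shared among the output pixels in proportion to how much of it they overlap. This is a toy illustration only; real reprojection works on the sphere in two dimensions, and the function name is hypothetical.

```python
def rebin_flux_1d(flux, in_edges, out_edges):
    """Redistribute flux from input pixels to output pixels by overlap."""
    out = [0.0] * (len(out_edges) - 1)
    for i, f in enumerate(flux):
        lo, hi = in_edges[i], in_edges[i + 1]
        width = hi - lo
        for j in range(len(out)):
            # Length of the overlap between input pixel i and output pixel j.
            overlap = min(hi, out_edges[j + 1]) - max(lo, out_edges[j])
            if overlap > 0:
                out[j] += f * overlap / width
    return out

# Three input pixels regridded onto two wider output pixels.
new_flux = rebin_flux_1d([1.0, 2.0, 3.0], [0, 1, 2, 3], [0, 1.5, 3])
```

Total flux is conserved as long as the output grid covers the input grid, which is why the middle pixel above is split evenly between the two output pixels.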
Atlasmaker
Uses Montage, YourSky
Project
Estimate & correct Background
Co-Add
David Hockney Pearblossom Highway 1986
Project
Data
Chart
Images and Charts
Image
• Big data
Chart
• Map: sphere → plane
• FITS-WCS header
• small data
An atlas is a collection of charts
Hyperatlas is an attempt to standardize atlases
Hyperatlas
Standard naming for atlases and vcharts
TM-5-SIN-20
Vchart TM-5-SIN-20-1589
TM-5 layout
Standard Scales:
scale s means
2^(20-s) arcseconds
per pixel
Standard
Layout
Standard
Projections
HV-4 layout
SIN projection
TAN projection
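The naming scheme can be unpacked mechanically. This sketch reads the scale rule as 2^(20-s) arcseconds per pixel (an interpretation of the superscript on the slide) and assumes names have the form layout-projection-scale, with an optional trailing chart number; the field names are hypothetical.

```python
def pixel_scale_arcsec(scale):
    """Pixel size in arcseconds for standard scale s, read as 2^(20-s)."""
    return 2.0 ** (20 - scale)

def parse_atlas_name(name):
    """Split a name such as TM-5-SIN-20 or TM-5-SIN-20-1589 into fields."""
    parts = name.split("-")
    return {
        "layout": "-".join(parts[0:2]),   # e.g. TM-5
        "projection": parts[2],           # e.g. SIN or TAN
        "scale": int(parts[3]),
        "chart_number": int(parts[4]) if len(parts) > 4 else None,
    }

vchart = parse_atlas_name("TM-5-SIN-20-1589")
arcsec_per_pixel = pixel_scale_arcsec(vchart["scale"])
```

On this reading, scale 20 is one arcsecond per pixel, and each step down in scale doubles the pixel size.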
Parallel Atlasmaker
Making a single Image
MPI Parallelism
• ~2% serial work (Amdahl)
• Projection is parallel
• All nodes share filespace
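The effect of the ~2% serial work can be estimated with Amdahl's law, speedup = 1 / (s + (1 - s) / N) for serial fraction s on N nodes. A short sketch; the node count used in the example is an arbitrary choice, not a figure from the talk.

```python
def amdahl_speedup(serial_fraction, n_nodes):
    """Ideal speedup on n_nodes when serial_fraction cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

# With 2% serial work the speedup can never exceed 1 / 0.02 = 50.
speedup_64 = amdahl_speedup(0.02, 64)
```

With 2% serial work, 64 nodes give a speedup of roughly 28 rather than 64, and no number of nodes can push it past 50.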

Making an Atlas of 1736 Images
Teragrid Distributed
• Federated Scheduling wanted
• SRB as Virtual Data Catalog
Atlasmaker Architecture
NVO Protocol
NVO/IVO
NED
Sloan
DPOSS
FIRST
[2MASS]
making
atlas
pages
Hyperatlas service
SIAP services
scale
reproject
compress
sky index
Virtual
Data
System
data
mining
VIEW Bus
federation
YourSky
VirtualSky
Oasis
Atlasmaker
Virtual Data System
User
request
Request
manager
Metadata repositories
Federated by OAI
Mosaicked
data is on
file
Data repositories
Federated by SRB
2d: Store
result &
return result
2a. Mosaicked
data is not on file
2c: Compute
on TG/IPG
2b. Get raw
data from
NVO
resources
Compute resources
Federated by TG/IPG
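The request manager's decision amounts to "return the stored result if there is one, otherwise fetch, compute and store". A minimal sketch of that flow, with a dictionary standing in for the SRB-federated data repository and plain callables standing in for the NVO data services and the TG/IPG compute step; all names are hypothetical.

```python
def serve_mosaic(request_key, store, fetch_raw, compute):
    """Return a mosaic, computing and storing it only when not on file."""
    if request_key in store:          # 1. mosaicked data is on file
        return store[request_key]
    raw = fetch_raw(request_key)      # 2b. get raw data from NVO resources
    result = compute(raw)             # 2c. compute on the grid
    store[request_key] = result       # 2d. store result and return it
    return result

# Stand-in services that count how often they are called.
calls = {"fetch": 0, "compute": 0}

def fake_fetch(key):
    calls["fetch"] += 1
    return [1, 2, 3]

def fake_compute(raw):
    calls["compute"] += 1
    return sum(raw)

store = {}
first = serve_mosaic("TM-5-SIN-20-1589", store, fake_fetch, fake_compute)
second = serve_mosaic("TM-5-SIN-20-1589", store, fake_fetch, fake_compute)
```

The second request is answered from the store without touching the data or compute resources, which is the point of treating the mosaics as virtual data.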
Atlasmaker stack
Virtual
Data
System
-- Chimera?
Atlasmaker
(script)
Mosaicking
(executables)
Montage
NVO Image Access
(service)
YourSky
web
Hyperatlas
(service)
SRB
(service)
Charts and Pages
Page – an organization for data
SIN projection
Chart – a frame for specific data
The virtual disk is 400,000 pixels wide
Background Correction
Uncorrected
Corrected
Montage Background
Correction
Project pixels
to output chart
Fit ramps on
overlap regions
Fit ramps on
projected images
Subtract from
Pixel values
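The ramp-fitting step can be illustrated in one dimension with two overlapping strips: fit a straight line to the difference between them, then subtract that line from one strip. This is a toy version of the steps listed above under simplifying assumptions (one dimension, two images, one treated as the reference), not the Montage algorithm itself; the function names are hypothetical.

```python
def fit_ramp(xs, ys):
    """Least-squares straight line through (xs, ys): returns (offset, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def correct_background(reference, image):
    """Remove the ramp that separates image from reference on their overlap."""
    xs = list(range(len(image)))
    diffs = [b - a for a, b in zip(reference, image)]
    offset, slope = fit_ramp(xs, diffs)
    return [v - (offset + slope * x) for x, v in zip(xs, image)]

# Same sky seen twice; the second strip carries an extra background ramp.
reference = [10.0, 12.0, 11.0, 13.0]
image = [r + 0.5 + 0.1 * x for x, r in enumerate(reference)]
corrected = correct_background(reference, image)
```

After correction the two strips agree on the overlap, so co-adding them no longer leaves a visible seam.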