Slide 1 - Scope
VO-Neural Group
G. Longo – P.I.
M. Brescia – P.M.
Team
A. Corazza (models)
O. Laurino (System and models for image segmentation)
S. Cavuoti & E. Russo (Models – SVM)
N. Deniskina (Grid manager and interfacing with V.O.)
G. d’Angelo (Grid developments, JAVA clients and documentation)
M. Garofalo & A. Nocella (UML and models: PPS+NEC)
B. Skordovski & C. Donalek (models: MLP)
S. Cavuoti, E. de Filippis, R. D’Abrusco (Test & Validation)
The problem
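[Figure: Band 1, Band 2, Band 3, ..., Band n]
Cf. isophotal, Petrosian and aperture magnitudes, concentration indexes, shape parameters, etc.
$p^{1} = \{RA^{1}, \delta^{1}, t, \{\lambda_{1}, \Delta\lambda_{1}, f^{1}_{1,1}, \Delta f^{1}_{1,1}, \ldots, f^{1}_{1,m}, \Delta f^{1}_{1,m}\}, \ldots, \{\lambda_{n}, \Delta\lambda_{n}, f^{1}_{n,1}, \Delta f^{1}_{n,1}, \ldots, f^{1}_{n,m}, \Delta f^{1}_{n,m}\}\}$
$p^{2} = \{RA^{2}, \delta^{2}, t, \{\lambda_{1}, \Delta\lambda_{1}, f^{2}_{1,1}, \Delta f^{2}_{1,1}, \ldots, f^{2}_{1,m}, \Delta f^{2}_{1,m}\}, \ldots, \{\lambda_{n}, \Delta\lambda_{n}, f^{2}_{n,1}, \Delta f^{2}_{n,1}, \ldots, f^{2}_{n,m}, \Delta f^{2}_{n,m}\}\}$
.........................
$p^{N} = \{RA^{N}, \delta^{N}, t, \{\lambda_{1}, \Delta\lambda_{1}, f^{N}_{1,1}, \Delta f^{N}_{1,1}, \ldots, f^{N}_{1,m}, \Delta f^{N}_{1,m}\}, \ldots\}$
$D \geq 3 + m \times n$
The scientific exploitation of a multi-band, multi-epoch (K epochs) survey requires searching for patterns, trends, etc. among N points in a D×K-dimensional parameter space:
N > 10^9, D >> 100, K > 10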
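To give a sense of scale for the lower bounds quoted above, the short calculation below estimates the size of the raw parameter table and the number of object pairs. The 4-byte value size and all variable names are assumptions made for illustration; they do not come from the slides.

```python
# Lower bounds quoted on the slide: N > 10^9, D >> 100, K > 10.
N, D, K = 10**9, 100, 10

# Assumption: each parameter is stored as a 4-byte float.
bytes_per_value = 4

# Size of the N x (D*K) table of parameters.
table_bytes = N * D * K * bytes_per_value
table_terabytes = table_bytes / 1e12

# Number of distinct object pairs, the quantity behind any N^2 algorithm.
pair_count = N * (N - 1) // 2

print(f"table size : {table_terabytes:.1f} TB")
print(f"object pairs: {pair_count:.2e}")
```

Even at these minimum values the table is of the order of terabytes, before any algorithm is run on it.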
The mixed blessing of data richness
Data Mining algorithms scale very badly:
– Clustering ~ N log N → N^2, ~ D^2
– Correlations ~ N log N → N^2, ~ D^k (k ≥ 1)
– Likelihood, Bayesian ~ N^m (m ≥ 3), ~ D^k (k ≥ 1)
Dimensionality reduction (without a significant loss of information) is a critical need!
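As a minimal sketch of what "dimensionality reduction without a significant loss of information" can mean in practice, the example below applies principal component analysis to synthetic data. PCA is used here only as a familiar stand-in: it is not one of the models listed in this deck, and the function name, the 99% variance threshold and the synthetic data are all assumptions.

```python
import numpy as np

def pca_reduce(X, var_keep=0.99):
    """Project X (N x D) onto the fewest principal components that
    retain at least `var_keep` of the total variance."""
    Xc = X - X.mean(axis=0)                      # centre each parameter
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)        # cumulative variance fraction
    d = min(int(np.searchsorted(frac, var_keep)) + 1, len(s))
    return Xc @ Vt[:d].T, float(frac[d - 1])

# Synthetic data: 50 observed parameters driven by only 3 hidden ones.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 1e-3 * rng.normal(size=(1000, 50))

Z, kept = pca_reduce(X)
print(Z.shape, round(kept, 4))
```

Here the 50 measured parameters collapse to a handful of components because the data were built from three hidden variables; real survey data are not guaranteed to be this compressible, and a linear method may miss nonlinear structure.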
International Virtual Observatory Alliance
Started in 2000
• User-friendly access to distributed computing
• Transparent homogenization of multiwavelength, multiepoch standards
• Similar standards for real and simulated data
• Common learning framework (no need to adapt know-how to specific data: experimental work focused on science and not on technicalities)
Tasks in Progress
• Data Mining Models
  – MLP (Multilayer Perceptron): FANN library completed by including SOFT-MAX and Cross-Entropy
  – SVM (Support Vector Machines)
  – PPS (Probabilistic Principal Surfaces)
  – NEC (Negative Entropy Clustering & Dendrogram)
• Additional problems
  – Star/Galaxy Classification (in coll. with Caltech)
  – NExt (Neural Extractor) for image segmentation and extraction of object parameters
  – Simulation of cosmic string signatures on the Cosmic Microwave Background
  – N-body simulations (mesh code)
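The softmax output and cross-entropy error mentioned for the MLP can be sketched in a few lines. This is a generic NumPy illustration of the two functions, not the FANN API and not the group's implementation; the function names and the toy inputs are assumptions.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw network outputs into class probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    rows = np.arange(probs.shape[0])
    return float(-np.log(probs[rows, labels]).mean())

# Toy example: 2 objects, 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

probs = softmax(logits)
loss = cross_entropy(probs, labels)

# With softmax + cross-entropy the gradient w.r.t. the logits is simply
# (probabilities - one-hot targets) / number of objects.
grad = probs.copy()
grad[np.arange(len(labels)), labels] -= 1.0
grad /= len(labels)

print(probs.round(3), round(loss, 4))
```

The pairing is convenient for classification because the outputs can be read as class probabilities and the gradient takes the simple form shown in the last step.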
Implementation of the interface between ASTROGRID and GRID-SCOPE with different CAs
ASTROGRID – GRID Launcher
(N. Deniskina)
1. Forms the job directory on Lupalberto (executable file, input data) and wraps it
2. Makes the connection with the SCOPE U.I. (checking the certificate)
3. Sends the wrapped directory from Lupalberto to the SCOPE U.I.
4. Unzips the wrapped job directory on the SCOPE U.I. and forms the JDL job
5. Sends the job to the GRID and waits for the results
6. Wraps the output and sends it to Lupalberto
Flow chart
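Steps 1 and 4 of the launcher (wrapping the job directory and forming the JDL job) can be sketched locally as follows. This is a hypothetical illustration, not the launcher's actual code: the function names, file names and archive format are assumptions, and the certificate check, the transfer to the SCOPE U.I. and the job submission are deliberately left out. The JDL attributes used (Executable, StdOutput, StdError, InputSandbox, OutputSandbox) are standard ones.

```python
import tarfile
import tempfile
from pathlib import Path

def wrap_job_dir(job_dir, archive):
    """Step 1: pack the job directory (executable + input data) into one archive."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(job_dir, arcname=Path(job_dir).name)
    return archive

def build_jdl(executable, inputs, outputs):
    """Step 4: return a minimal JDL description for one job."""
    def quoted(names):
        return ", ".join('"' + name + '"' for name in names)
    in_sandbox = quoted([executable] + inputs)
    out_sandbox = quoted(["std.out", "std.err"] + outputs)
    return "\n".join([
        'Executable = "' + executable + '";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = {" + in_sandbox + "};",
        "OutputSandbox = {" + out_sandbox + "};",
    ])

# Hypothetical job: one script and one input file.
with tempfile.TemporaryDirectory() as tmp:
    job_dir = Path(tmp) / "job_001"
    job_dir.mkdir()
    (job_dir / "run_model.sh").write_text("#!/bin/sh\necho hello\n")
    (job_dir / "input.dat").write_text("1 2 3\n")
    archive = wrap_job_dir(job_dir, Path(tmp) / "job_001.tar.gz")
    with tarfile.open(archive, "r:gz") as tar:
        members = sorted(tar.getnames())

jdl = build_jdl("run_model.sh", ["input.dat"], ["result.dat"])
print(members)
print(jdl)
```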
Scientific cases in progress
• Physical classification of galaxies
• Search for QSOs at intermediate-high redshifts
• Search for cosmic strings in the CMB
• Characterization of cosmic large-scale structure
http://people.na.infn.it/~astroneural/
First results: AGN classification
(Cavuoti, D’Abrusco & D’Angelo)
Different orientations
Different parameters become significant
Different clusters in parameter space
BUT STILL THE SAME OBJECT!
First Scientific Experiments on SCOPE GRID
SVM on AGN
Dataset extracted from SDSS for automatic classification of galaxies
BoK (base of knowledge) from a spectroscopically confirmed sample
SVM code implemented from LIBSVM
13 parameters for 89,000 objects
SVM with RBF kernel needs optimization over 2 parameters (C and γ)
The maximum of the classification rate must be found in a given range
110 grid points in parameter space (each taking at least 1 h)
110 computers in GRID-SCOPE (Na-CT-CA)
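The grid search over the two RBF parameters lends itself to one independent job per grid point. The sketch below builds such a grid and the corresponding LIBSVM command lines. The slides do not state which ranges were used: the powers-of-two ranges here are the ones commonly suggested for LIBSVM and happen to give 11 × 10 = 110 points, so they are an assumption, as are the data file name, the 5-fold cross-validation and the helper names.

```python
from itertools import product

# Assumed ranges: C = 2^-5 ... 2^15 and gamma = 2^-15 ... 2^3, exponent step 2.
c_exponents = range(-5, 16, 2)    # 11 values
g_exponents = range(-15, 4, 2)    # 10 values
grid = [(2.0 ** c, 2.0 ** g) for c, g in product(c_exponents, g_exponents)]

def svm_train_command(C, gamma, data_file="agn_train.txt", folds=5):
    """Command line for one grid point: C-SVC (-s 0), RBF kernel (-t 2),
    n-fold cross-validation (-v). The file name is hypothetical."""
    return f"svm-train -s 0 -t 2 -c {C!r} -g {gamma!r} -v {folds} {data_file}"

# One independent job per grid point, e.g. one per GRID node.
jobs = [svm_train_command(C, gamma) for C, gamma in grid]

def best_point(accuracies):
    """Pick the (C, gamma) pair with the highest cross-validation accuracy.
    `accuracies` maps (C, gamma) -> accuracy collected from the finished jobs."""
    return max(accuracies, key=accuracies.get)

print(len(jobs))
print(jobs[0])
```

Because the grid points do not depend on each other, running them on 110 machines brings the wall-clock time down to roughly that of the slowest single job.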
RESULTS:
First Experiment: Seyfert 1 vs Seyfert 2
Second Experiment: AGN vs non-AGN
Thanks to all the other WPs
and special thanks to S. Pardi
String_simulation (flow chart)
INPUT: string velocity β, string direction k, observer-string distance ?
Create dT/T map
MAP.FITS: HEALPix dT/T map
INPUT: smoothing radius r
Create smoothed dT/T map
SMOOTHED.FITS: HEALPix dT/T map
END PROGRAM
The problem: a huge Parameter space
Applications:
High-dimensionality massive data sets (from astronomical surveys, but also any other high-dimensionality data space)
[Figure: a datum p in the parameter space; recoverable labels: R.A., t, N, "N 100"]
Any observed (or simulated) datum p is defined by a set of parameters, e.g.:
• RA and dec
• time
• experimental setup (spatial and spectral resolution, limiting magnitude, limiting surface brightness, etc.)
• polarization
• etc.
The parameter space concept is crucial to:
1. Guide the quest for new discoveries
2. Find new physical laws (patterns) in
[Diagram: ASTROGRID to GRID SCOPE job flow. Box labels: USER; ASTROGRID; ACR; resource registry; myspace; CEC; Lupalberto; Astrogrid Middleware; GRID SCOPE; User interface; Resource Broker; Computational element; Working Node; Execution of job; Output of results]