JSOC Pipeline Processing Environment
Rasmus Munk Larsen, Stanford University
[email protected]
650-725-5485
HMI Science Team Meeting – January, 2005
Rasmus Munk Larsen / Pipeline Processing 1

Overview
• JSOC data series organization
• Pipeline execution environment
• Pipeline software architecture
• Co-I analysis module contribution
• Pipeline Data Products

JSOC logical data organization
• Evolved from the MDI dataset concept to
  – Fix known limitations/problems
  – Accommodate more complex data models required by higher-level processing
• Main design features
  – Separation of meta-data (keywords) and image data
    • No need to re-write large image files when only keywords change (lev1.8 problem)
    • No (fewer) out-of-date keyword values in FITS headers
    • Can bind to most recent values on export
  – Easier data access
    • All access in terms of (collections of) data records, which are the “atomic units” of a data series
    • A dataset name is a query specifying a set of data records (possibly from multiple data series):
      – jsoc:hmi_lev0_cam1_fg?recordnum=12345 (a specific filtergram with unique record number 12345)
      – jsoc:hmi_lev0_cam1_fg[12300-12330] (a minute’s worth of filtergrams from camera 1)
      – jsoc:hmi_lev1_fd_V?"T_OBS>='2008-11-01' AND T_OBS<'2008-12-01' AND N_MISSING<100"
  – Storage and tape management must be transparent to the user
    • Chunking of data records into storage units for efficient tape/disk usage is done internally
    • Completely separate storage and catalog (i.e. series & record) databases: more modular design
    • Legacy MDI modules should run on top of the new storage service
  – Storing keywords in a relational database system (Oracle)
    • Can use the power of the relational database to rapidly find data records
    • Easy and fast to create a time series of any keyword value (for trending etc.)
    • Consequence: data records for a given series must be well defined (e.g. have a fixed set of keywords)
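The bracketed record-range form of a dataset name can be illustrated with a small C parser. This is a minimal sketch, not the JSOC implementation: the struct `RecordRange` and the function `parse_record_range` are hypothetical, and only the `jsoc:<series>[<first>-<last>]` form is handled; the `?recordnum=` and SQL-condition forms would need a real query parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing "jsoc:<series>[<first>-<last>]". */
typedef struct {
    char series[64];
    long first;
    long last;
} RecordRange;

/* Parse a record-range dataset name. Returns 0 on success, -1 on error.
   Only the bracketed range form is handled in this sketch. */
int parse_record_range(const char *name, RecordRange *out)
{
    assert(name != NULL && out != NULL);
    const char *prefix = "jsoc:";
    size_t plen = strlen(prefix);
    if (strncmp(name, prefix, plen) != 0)
        return -1;                       /* not a jsoc: dataset name */

    const char *series = name + plen;
    const char *lb = strchr(series, '[');
    if (lb == NULL)
        return -1;                       /* no record range given */
    size_t slen = (size_t)(lb - series);
    if (slen == 0 || slen >= sizeof out->series)
        return -1;                       /* empty or over-long series name */
    memcpy(out->series, series, slen);
    out->series[slen] = '\0';

    char *end;
    out->first = strtol(lb + 1, &end, 10);
    if (end == lb + 1 || *end != '-')
        return -1;                       /* missing first record number */
    const char *second = end + 1;
    out->last = strtol(second, &end, 10);
    if (end == second || *end != ']' || end[1] != '\0')
        return -1;                       /* missing last number or trailing junk */
    if (out->last < out->first)
        return -1;                       /* empty range */
    return 0;
}
```

For example, parsing jsoc:hmi_lev0_cam1_fg[12300-12330] gives series hmi_lev0_cam1_fg with first record 12300 and last record 12330, i.e. 31 records when the range is read as inclusive (which is itself an assumption of this sketch).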

Logical Data Organization
JSOC Data Series: hmi_lev0_cam1_fg, aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V, aia_lev0_FE171, …
Data records for series hmi_lev1_fd_V: hmi_lev1_fd_V#12345, hmi_lev1_fd_V#12346, hmi_lev1_fd_V#12347, hmi_lev1_fd_V#12348, hmi_lev1_fd_V#12349, hmi_lev1_fd_V#12350, hmi_lev1_fd_V#12351, hmi_lev1_fd_V#12352, hmi_lev1_fd_V#12353, …
Single hmi_lev1_fd_V data record:
  Keywords:
    RECORDNUM = 12345 # Unique serial number
    SERIESNUM = 5531704 # Slots since epoch.
    T_OBS = '2009.01.05_23:22:40_TAI'
    DATAMIN = -2.537730543544E+03
    DATAMAX = 1.935749511719E+03
    ...
    P_ANGLE = LINK:ORBIT,KEYWORD:SOLAR_P
    …
  Links:
    ORBIT = hmi_lev0_orbit, SERIESNUM = 221268160
    CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7
    L1 = hmi_lev0_cam1_fg, RECORDNUM = 42345232
    R1 = hmi_lev0_cam1_fg, RECORDNUM = 42345233
    …
  Data Segments:
    V_DOPPLER = (image)
  Storage Unit = Directory
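The record layout above, in particular a keyword such as P_ANGLE whose value is fetched through the ORBIT link, can be modelled roughly as follows. This is a simplified C sketch under stated assumptions: all names (Record, add_value, add_linked, add_link, get_keyword) are hypothetical, keyword values are restricted to doubles, and links point directly at in-memory records instead of being resolved through the record catalogs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEYS  8
#define MAX_LINKS 4
#define MAX_DEPTH 4            /* guard against cyclic links */

typedef struct Record Record;

typedef struct {
    char name[32];
    double value;              /* used when link_name is empty */
    char link_name[32];        /* non-empty: value lives in a linked record */
    char target_key[32];       /* keyword to read in the linked record */
} Keyword;

typedef struct {
    char name[32];
    const Record *target;      /* record the link currently points to */
} Link;

struct Record {
    Keyword keys[MAX_KEYS];
    int nkeys;
    Link links[MAX_LINKS];
    int nlinks;
};

void init_record(Record *r) { memset(r, 0, sizeof *r); }

/* Add a keyword whose value is stored in this record. */
int add_value(Record *r, const char *name, double value)
{
    if (r->nkeys >= MAX_KEYS) return -1;
    Keyword *k = &r->keys[r->nkeys++];
    snprintf(k->name, sizeof k->name, "%s", name);
    k->value = value;
    k->link_name[0] = '\0';
    return 0;
}

/* Add a keyword of the form LINK:<link>,KEYWORD:<target>. */
int add_linked(Record *r, const char *name, const char *link, const char *target)
{
    if (r->nkeys >= MAX_KEYS) return -1;
    Keyword *k = &r->keys[r->nkeys++];
    snprintf(k->name, sizeof k->name, "%s", name);
    snprintf(k->link_name, sizeof k->link_name, "%s", link);
    snprintf(k->target_key, sizeof k->target_key, "%s", target);
    return 0;
}

/* Add a named link to another record. */
int add_link(Record *r, const char *name, const Record *target)
{
    if (r->nlinks >= MAX_LINKS) return -1;
    Link *l = &r->links[r->nlinks++];
    snprintf(l->name, sizeof l->name, "%s", name);
    l->target = target;
    return 0;
}

static int lookup(const Record *r, const char *name, double *out, int depth)
{
    if (depth > MAX_DEPTH) return -1;
    for (int i = 0; i < r->nkeys; i++) {
        const Keyword *k = &r->keys[i];
        if (strcmp(k->name, name) != 0) continue;
        if (k->link_name[0] == '\0') { *out = k->value; return 0; }
        for (int j = 0; j < r->nlinks; j++)      /* follow the named link */
            if (strcmp(r->links[j].name, k->link_name) == 0 && r->links[j].target)
                return lookup(r->links[j].target, k->target_key, out, depth + 1);
        return -1;             /* link missing or unset */
    }
    return -1;                 /* no such keyword */
}

/* Returns 0 and stores the value in *out, or -1 if it cannot be resolved. */
int get_keyword(const Record *r, const char *name, double *out)
{
    assert(r != NULL && name != NULL && out != NULL);
    return lookup(r, name, out, 0);
}
```

Because the linked value is read at lookup time, changing the orbit record changes what P_ANGLE reports without rewriting the Doppler record, which mirrors the "bind to most recent values" motivation on the previous slide.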

JSOC Series Definition (JSD)
Creating a new Data Series:
testclass1.jsd → JSD parser → Oracle database
  SQL: INSERT INTO series_catalog VALUES('testclass1','rmunk', …
  SQL: CREATE TABLE testclass1 (recnum integer not null unique, keywd0 binary_float, …

Contents of testclass1.jsd:
#======================= Global series information ===========================
Seriesname: "testclass1"
Description: "This is a small example of a JSOC series definition."
Author: "Rasmus Munk Larsen"
Owners: "rmunk"
Unitsize: 10
Archive: 1
Retention: permanent
Tapegroup: 127
Primary Index:
#============================ Keywords =================================
# Format:
#   Keyword: <name>, link, <linkname>, <target keyword name>
# or
#   Keyword: <name>, <type>, <default value>, <format>, <unit>, <comment>
#
Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
Keyword: "keywd2", datetime, "1970-01-01 00:00:00", "%-s", "unit5", "Comment5"
Keyword: "keywd3", timestamp, "19700101000000", "%-s", "unit6", "Comment6"
Keyword: "keywd4", string, "", "%-s", "unit7", "Comment7"
Keyword: "keywd5", link, "link1", "keywd0"
Keyword: "keywd6", char, '\0', "%d", "unit1", "Comment1"
Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"
#============================ Links =====================================
# Format:
#   Link: <name>, <target series>, { static | dynamic }
#
Link: "link0", "testclass0", static
Link: "link1", "testclass0", dynamic
#============================ Data segments ===============================
# Data: <name>, <type>, <naxis>, <axis dims>, <unit>, <protocol>
#
Data: "x-axis", float, 1, 100, "m", fits
Data: "y-axis", float, 1, 200, "m", fits
Data: "z-axis", float, 1, 50, "m", fits
Data: "pressure", float, 3, 100, 200, 50, "kg/(s^2*m)", fitz
Data: "velocity", float, 4, 100, 200, 50, 3, "m/s", fitz
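The JSD parser's step of turning keyword definitions into a CREATE TABLE statement can be sketched in C. Only the recnum column and the float → binary_float mapping come from the slide; mapping double to binary_double and int to integer, and emitting no column for link keywords, are assumptions made for this illustration. The names KeywordDef and make_create_table are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { KW_FLOAT, KW_DOUBLE, KW_INT, KW_LINK } KwType;

typedef struct {
    const char *name;
    KwType type;
} KeywordDef;

/* Column type for a keyword, or NULL if the keyword gets no column. */
static const char *column_type(KwType t)
{
    switch (t) {
    case KW_FLOAT:  return "binary_float";   /* as shown on the slide */
    case KW_DOUBLE: return "binary_double";  /* assumption */
    case KW_INT:    return "integer";        /* assumption */
    default:        return NULL;             /* link keyword: resolved via link (assumption) */
    }
}

/* Append formatted text at buf + *used; fails instead of truncating. */
static int append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= size - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* Writes the CREATE TABLE statement into buf.
   Returns its length, or -1 if buf is too small. */
int make_create_table(char *buf, size_t size, const char *series,
                      const KeywordDef *kw, int n)
{
    assert(buf != NULL && size > 0 && series != NULL);
    size_t used = 0;
    if (append(buf, size, &used,
               "CREATE TABLE %s (recnum integer not null unique", series) != 0)
        return -1;
    for (int i = 0; i < n; i++) {
        const char *t = column_type(kw[i].type);
        if (t != NULL && append(buf, size, &used, ", %s %s", kw[i].name, t) != 0)
            return -1;
    }
    if (append(buf, size, &used, ")") != 0)
        return -1;
    return (int)used;
}
```

For keywd0 (float), keywd1 (double), keywd5 (link) and keywd7 (int) this yields CREATE TABLE testclass1 (recnum integer not null unique, keywd0 binary_float, keywd1 binary_double, keywd7 integer). Failing on overflow rather than silently truncating keeps a malformed statement from ever reaching the database.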

Pipeline batch processing (a.k.a. MDI mapfile)
• Pipeline processing is scheduled in batches by PUI+, a data-driven pipeline scheduler inherited from MDI
• A pipeline batch is a single atomic transaction:
  – If no module fails, all data records are committed and become visible to other clients of the archive
  – If a failure occurs, all data records are deleted and the database is rolled back
[Figure: Pipeline batch = atomic transaction. Register session → Module 1 → Module 2 → … → Module N → Commit data & deregister. Each step goes through the JSOC API; modules read input data records from and write output data records to the JSOC archive (disk).]
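The commit-or-roll-back behaviour of a pipeline batch can be illustrated with a toy in-memory archive. This C sketch is a simplified model under assumptions, not PUI+ or the JSOC API: records written by modules are only staged, and they move to the visible set only if every module in the batch succeeds. Archive, stage_record, run_batch and the example modules are hypothetical names.

```c
#include <assert.h>

#define MAX_RECS 16

typedef struct {
    long visible[MAX_RECS];  /* committed records, visible to all clients */
    int nvisible;
    long staged[MAX_RECS];   /* records written by the batch in progress */
    int nstaged;
} Archive;

typedef int (*Module)(Archive *a);  /* a module returns 0 on success */

/* Called by a module to write an output record; the record is only staged. */
int stage_record(Archive *a, long recnum)
{
    if (a->nstaged >= MAX_RECS) return -1;
    a->staged[a->nstaged++] = recnum;
    return 0;
}

/* Runs modules[0..n-1] as one atomic batch: commit all output or none. */
int run_batch(Archive *a, const Module *modules, int n)
{
    assert(a != 0 && modules != 0);
    a->nstaged = 0;                              /* register session */
    for (int i = 0; i < n; i++) {
        if (modules[i](a) != 0) {
            a->nstaged = 0;                      /* roll back: discard staged output */
            return -1;
        }
    }
    if (a->nvisible + a->nstaged > MAX_RECS) {   /* cannot commit: roll back */
        a->nstaged = 0;
        return -1;
    }
    for (int i = 0; i < a->nstaged; i++)         /* commit: publish staged records */
        a->visible[a->nvisible++] = a->staged[i];
    a->nstaged = 0;                              /* deregister session */
    return 0;
}

/* Example modules (hypothetical). */
int module_doppler(Archive *a)  { return stage_record(a, 12345); }
int module_magnetic(Archive *a) { return stage_record(a, 12346); }
int module_failing(Archive *a)  { (void)a; return -1; }
```

Running {module_doppler, module_magnetic} publishes two records; running {module_doppler, module_failing} leaves the visible set unchanged even though the first module had already written its output, which is the all-or-nothing property the slide describes.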

Pipeline Client-Server Architecture
• Pipeline client process
  – Analysis code (C/Fortran/IDL/Matlab) calling the JSOC Library:
    • OpenRecords, CloseRecords
    • GetKeyword, SetKeyword, GetLink, SetLink
    • OpenDataSegment, CloseDataSegment
  – JSOC Library internals: Record Cache (Keywords+Links+Data paths), Data Segment I/O, File I/O to the JSOC Disks
• Data Record Management Service (DRMS): sends SQL queries to the Oracle Database Server; calls SUMS via AllocUnit, GetUnit, PutUnit
• Storage Unit Management Service (SUMS): manages storage units on the JSOC Disks; storage unit transfer to and from the Tape Archive Service
• Oracle Database Server (accessed via SQL queries): Series Catalog, Record Catalogs, Storage Database
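The point of the client-side record cache is that repeated access to the same record does not cost another round trip to DRMS and the database. The C sketch below is a hypothetical illustration: a small direct-mapped cache of data paths in which a miss triggers a simulated query. The path layout /SUM/D<unit> and the record-to-storage-unit mapping (recnum / 10, borrowing Unitsize: 10 from the example JSD) are invented for illustration; RecordCache and get_data_path are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 4
#define UNIT_SIZE   10   /* records per storage unit (assumed, cf. Unitsize) */

typedef struct {
    int valid;
    long recnum;
    char data_path[64];  /* directory of the storage unit holding the record */
} CacheEntry;

typedef struct {
    CacheEntry slot[CACHE_SLOTS];
    int queries;         /* simulated round trips to DRMS */
} RecordCache;

/* Stand-in for a DRMS/SQL lookup: counts the query and builds a path. */
static void simulated_query(RecordCache *c, long recnum, CacheEntry *e)
{
    c->queries++;
    e->valid = 1;
    e->recnum = recnum;
    snprintf(e->data_path, sizeof e->data_path, "/SUM/D%ld", recnum / UNIT_SIZE);
}

/* Returns the data path for recnum, querying only on a cache miss.
   The returned pointer is invalidated when its slot is reused. */
const char *get_data_path(RecordCache *c, long recnum)
{
    assert(c != NULL);
    if (recnum < 0) return NULL;
    CacheEntry *e = &c->slot[recnum % CACHE_SLOTS];  /* direct-mapped */
    if (!e->valid || e->recnum != recnum)
        simulated_query(c, recnum, e);
    return e->data_path;
}
```

A direct-mapped table keeps the sketch short; records 12345 and 12349 share a slot and evict each other, so a real cache would more likely use a hash table or LRU policy sized to a batch's working set.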

Co-I contributions and collaboration
• Contributions from Co-I teams:
  – Software for intermediate and high level analysis modules
  – Output data series definition
    • Keywords, links, data segments, size of storage units etc.
  – Documentation (detailed enough to understand the contributed code)
  – Test data and intended results for verification
  – Time
    • Explain algorithms and implementation
    • Help with verification
    • Collaborate on improvements if required (e.g. performance or maintainability)
• Contributions from HMI team:
  – Pipeline execution environment
  – Software & hardware resources (development environment, libraries, tools)
  – Time
    • Help with defining data series
    • Help with porting code to JSOC API
    • If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization
    • Verification

HMI module status and MDI heritage
[Figure: flow chart from the primary observables to the intermediate and high level data products, with each module marked by its status.]
• Observables and inputs shown: Doppler velocity; Stokes I,V; Stokes I,Q,U,V; continuum brightness; line-of-sight magnetic field maps; vector magnetic field maps
• Intermediate and high level data products shown: heliographic Doppler velocity maps; spherical harmonic time series; mode frequencies and splitting; internal rotation; internal sound speed; tracked tiles of Dopplergrams; ring diagrams; local wave frequency shifts; full-disk velocity and sound speed maps (0-30Mm); Carrington synoptic v and cs maps (0-30Mm); time-distance cross-covariance function; wave travel times; high-resolution v and cs maps (0-30Mm); deep-focus v and cs maps (0-200Mm); egression and ingression maps; wave phase shift maps; far-side activity index; line-of-sight magnetograms; full-disk 10-min averaged maps; vector magnetograms (fast algorithm); tracked tiles; vector magnetograms (inversion algorithm); coronal magnetic field extrapolations; coronal and solar wind models; brightness images; tracked full-disk 1-hour averaged continuum maps; brightness feature maps; solar limb parameters
• Module status legend:
  – Standalone “production” code routinely used
  – Research code currently used
  – MDI pipeline modules exist
  – Research code exists in the community
  – New codes under development (HAO)
  – Instrument specific code, Stanford is primary developer

Questions this meeting should address
• List of all science data products
  – Which data products, including intermediate ones, should be produced by JSOC?
  – What cadence, resolution, coverage etc. will/should each data product have?
    • Eventually a JSOC series description must be written for each one.
  – Which data products should be computed on the fly and which should be archived?
  – Have we got the basic pipeline right? Are there maturing new techniques that have been overlooked?
• Detailing each branch of the processing pipeline
  – What are the detailed steps in each branch?
  – Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)?
  – What are the computer resource requirements of the computational steps?
• Contributed analysis modules
  – Who will contribute code?
  – Which codes are mature enough for inclusion? They should be at least working research code now, since integration has to begin by c. mid 2006.
Example: Global Seismology Pipeline