RobertJ.DevelopingJWSTPipelines - stsdas


Developing JWST Pipelines at STScI
Robert Jedrzejewski
Who we are
• The Science Software Branch at STScI
• 16 members
• Most have an astronomy background
• 6 have PhDs
• Combined experience in group: 125 years
• Combined experience at STScI: 200 years
What we do
• Develop HST calibration pipelines
• STSDAS/TABLES
• PyRAF, PyFITS, STScI_Python
• HST Exposure Time Calculators
• Other smaller projects (Gemini/GOODS/Hubble Legacy Archive/GoogleSky/JWST Backplane Stability…)
Development Experience
• Python
• Java
• C/C++
• Fortran
• spp/cl
• IDL
• (Perl/Assembly/Tcl…)
Our preferred development model
• Python!
• We find we can be extremely productive writing in Python
• Speed is occasionally an issue, so we use C extensions when necessary
• Very little pipeline code requires performance optimization
Development style
• Use version control (Subversion)
• Use regression tests + nightly builds + web reporting tools
• Trac for problem tracking, wiki for information dissemination
• Unit/doc tests
• Multiple platforms (Linux/Mac/Solaris/Windows)
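As an illustration of the doc-test style listed above, a function can carry usage examples in its docstring, and Python's standard `doctest` module runs them as tests. The function here is a hypothetical toy, not taken from any STScI package.

```python
import doctest


def subtract_bias(pixels, bias):
    """Subtract a constant bias level from a list of pixel values.

    >>> subtract_bias([105.0, 110.0, 98.0], 100.0)
    [5.0, 10.0, -2.0]
    >>> subtract_bias([], 100.0)
    []
    """
    return [p - bias for p in pixels]


if __name__ == "__main__":
    # Run every example embedded in this module's docstrings
    doctest.testmod()
```

The same examples then double as documentation and as a unit test that a nightly build can run.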
How we did HST pipelines
• Calfoc, calfos, calhrs, calwfpc, calwp2
– First-generation pipelines, written in spp, that read GEIS files
• Calstis, calnic(a/b)
– Second generation, written in C using hstio (which wraps the IRAF imio libraries) to read multi-extension FITS files
• Calacs
– Borrowed much code from calstis imaging
• Calwfc3
– Borrowed much code from calacs and calnic
• Calcos
– Third generation, written in Python (+ C where needed)
• Later pipelines were more likely to be used by IDTs (Instrument Development Teams) for calibrating ground test data
More on HST pipelines
• Pipeline operation is data-driven
– Calibration steps as header keywords:
• FLATCORR=PERFORM/OMIT/COMPLETE/SKIPPED
– Reference file names as header keywords
• FLATFILE=oref$g2342212_flt.fits
• This decouples some of the intelligence from the code
– No need to rebuild code if a step or reference file changes
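To make the data-driven idea concrete, the sketch below treats a header as a Python dict, runs a step only when its switch keyword says PERFORM, and first expands the IRAF-style `oref$` reference-file name into a path. The directory mapping, the in-memory "reference file" contents and all function names are hypothetical; a real pipeline would read both the header and the reference file from FITS files.

```python
# Hypothetical mapping from IRAF-style directory prefixes ("oref$") to directories
REF_DIRS = {"oref": "/refdata/oref"}

# In-memory stand-in for reference files on disk (made-up flat-field values)
REFERENCE_DATA = {"/refdata/oref/g2342212_flt.fits": [2.0, 2.0, 2.0]}


def resolve_reference(name):
    """Expand 'oref$g2342212_flt.fits' into a full path using REF_DIRS."""
    if "$" not in name:
        return name
    prefix, filename = name.split("$", 1)
    return REF_DIRS[prefix] + "/" + filename


def flat_correct(pixels, reffile):
    """Divide each pixel by the corresponding flat-field value."""
    flat = REFERENCE_DATA[reffile]
    return [p / f for p, f in zip(pixels, flat)]


# Each entry: (switch keyword, reference-file keyword, step function)
STEPS = [("FLATCORR", "FLATFILE", flat_correct)]


def run_pipeline(header, pixels):
    """Run every step whose switch is PERFORM; updates the header in place."""
    for switch, refkey, step in STEPS:
        if header.get(switch) != "PERFORM":
            continue  # OMIT, COMPLETE, SKIPPED or absent: leave the data alone
        pixels = step(pixels, resolve_reference(header[refkey]))
        header[switch] = "COMPLETE"
    return pixels


header = {"FLATCORR": "PERFORM", "FLATFILE": "oref$g2342212_flt.fits"}
calibrated = run_pipeline(header, [10.0, 20.0, 30.0])
print(calibrated, header["FLATCORR"])  # [5.0, 10.0, 15.0] COMPLETE
```

Changing which steps run, or which flat is used, only means editing the header; `run_pipeline` itself is untouched, which is the decoupling the slide describes.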
Multidrizzle
• Multidrizzle is used by the ACS and WFPC2 pipelines to combine images with small position offsets (dithered images), removing cosmic rays
• It is a Python application that can be used with ACS, STIS, WFPC2, NICMOS and WFC3 data
• This breaks from our ‘tradition’ of having one calibration pipeline program for each instrument
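Multidrizzle maps each input onto a common output grid with the drizzle algorithm; the sketch below does not attempt that. Assuming the exposures are already registered to the same pixel grid, it shows only the basic idea behind cosmic-ray rejection when combining dithered images: compare each pixel with the median of the stack and drop outliers. The function name and the `threshold` parameter are hypothetical.

```python
from statistics import median


def combine_rejecting_outliers(images, threshold):
    """Average registered images pixel by pixel, rejecting cosmic-ray hits.

    images: list of equal-length lists of pixel values, already aligned.
    threshold: hypothetical cut; values more than this above the per-pixel
    median are treated as cosmic rays and excluded from the average.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    combined = []
    for values in zip(*images):  # all exposures' values for one pixel
        m = median(values)
        # Cosmic rays only add charge, so reject only values well above the median.
        # With threshold >= 0 the lower-middle value always survives.
        good = [v for v in values if v - m <= threshold]
        combined.append(sum(good) / len(good))
    return combined


exposures = [[10.0, 12.0, 20.0],
             [10.0, 12.0, 500.0],  # cosmic-ray hit in the third pixel
             [13.0, 12.0, 20.0]]
print(combine_rejecting_outliers(exposures, threshold=50.0))  # [11.0, 12.0, 20.0]
```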
How we see the JWST Pipelines
• A series of calibration steps
[Diagram: Input stage + Reference File → Calibration Step → Output stage]
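The input stage → calibration step → output stage flow can be sketched as a chain of objects in which each step takes the previous step's output plus its own reference file. The class names, step names and reference file names below are hypothetical, chosen only to illustrate the structure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stage:
    """Data flowing between steps, with a record of the steps applied."""
    pixels: List[float]
    history: List[str] = field(default_factory=list)


@dataclass
class CalibrationStep:
    name: str
    reference_file: str  # hypothetical reference file name
    apply: Callable[[List[float], str], List[float]]

    def run(self, input_stage: Stage) -> Stage:
        # Build a new output stage; the input stage is left unchanged
        pixels = self.apply(input_stage.pixels, self.reference_file)
        history = input_stage.history + [f"{self.name} ({self.reference_file})"]
        return Stage(pixels, history)


def run_steps(steps: List[CalibrationStep], stage: Stage) -> Stage:
    for step in steps:
        stage = step.run(stage)  # each output stage feeds the next step
    return stage


# Two toy steps; the lambdas ignore the reference file name for brevity
steps = [
    CalibrationStep("bias", "bias_ref.fits", lambda px, ref: [p - 100.0 for p in px]),
    CalibrationStep("gain", "gain_ref.fits", lambda px, ref: [p * 2.0 for p in px]),
]
result = run_steps(steps, Stage([110.0, 120.0]))
print(result.pixels)   # [20.0, 40.0]
print(result.history)  # ['bias (bias_ref.fits)', 'gain (gain_ref.fits)']
```

Keeping each step's output as a separate stage object makes it easy to inspect intermediate products when a team is checking one step in isolation.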
Early design ideas
• No need to have separate pipeline programs for each JWST instrument
– Many calibration steps depend on the detector, and JWST instruments use detectors of the same type
– We can use the same code, instead of having to replicate (and maintain) it in more than one place
– Some calibration steps will probably be identical for all JWST data (e.g. the MASKCORR step, where a static mask from a reference file is applied to the DQ array of the data)
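A minimal sketch of such a mask step, assuming the DQ array stores integer bit flags (as HST data quality arrays do) and that the static mask reference uses the same bit values. Combining them with a bitwise OR sets every flag from the mask while preserving flags already in the data. The function name and the bit values in the example are hypothetical.

```python
def mask_correct(dq, static_mask):
    """Merge a static mask into a 2-D DQ array.

    Both arguments are lists of rows of integer bit flags. A bitwise OR sets
    every flag present in the mask while keeping flags already in the data.
    """
    if len(dq) != len(static_mask) or any(
            len(d_row) != len(m_row) for d_row, m_row in zip(dq, static_mask)):
        raise ValueError("DQ array and mask must have the same shape")
    return [[d | m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(dq, static_mask)]


dq = [[0, 4],      # 4: a flag already set earlier in processing (hypothetical)
      [0, 0]]
mask = [[1, 1],    # 1: hypothetical "bad pixel" bit from the reference mask
        [0, 0]]
print(mask_correct(dq, mask))  # [[1, 5], [0, 0]]
```

With NumPy integer arrays the same operation is simply `dq | mask`, and nothing in it depends on which instrument produced the data.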
Try not to make the mistakes we made with HST
• Use the same keywords for the same quantities
• Use the same file/association structure
• Use the same algorithms to do the same calibration
– Unless a team shows that a given algorithm does not work for their instrument
– Even then, try to keep as much code common as possible, breaking out only the code that is different
– Sometimes it is possible to encapsulate the differences in the reference files, keeping the code the same
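As a sketch of keeping the code identical while the instrument differences live in the reference files, consider a polynomial correction (a non-linearity correction, say) whose coefficients come from a per-instrument reference file. The file names and coefficients are made up, and the dict stands in for reading real reference files.

```python
# Hypothetical reference-file contents: coefficients c0, c1, c2 of the correction
# polynomial. The numbers are made up; only the file differs between instruments.
REFERENCE_FILES = {
    "miri_lin.fits": [0.0, 1.0, 2.0e-6],
    "nircam_lin.fits": [0.0, 1.0, 5.0e-7],
}


def polynomial_correct(pixels, reffile):
    """Return c0 + c1*p + c2*p**2 + ... for each pixel p, coefficients from reffile."""
    coeffs = REFERENCE_FILES[reffile]
    corrected = []
    for p in pixels:
        value = 0.0
        for c in reversed(coeffs):  # Horner's method, highest order first
            value = value * p + c
        corrected.append(value)
    return corrected


# Identical code, different behaviour, selected purely by the reference file
miri = polynomial_correct([1000.0, 2000.0], "miri_lin.fits")
nircam = polynomial_correct([1000.0, 2000.0], "nircam_lin.fits")
```

Even the polynomial order can differ between instruments without a code change, since it is set by the number of coefficients in the file.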
JWST Pipelines (continued…)
• Python gives us object-oriented capabilities
– ‘input_stage’ and ‘output_stage’ are objects that encapsulate information on their state and on how to calibrate themselves
– For example, they might be NIRSpec IFU data objects, or MIRI imaging data objects
– When executing a given step, they may use their own custom method, or else defer to a method that they inherit from a more ‘generic’ datatype
– E.g. MIRI imaging data and NIRCam imaging data may both use the flatfield() method of the JWSTImagingData class, from which they both inherit
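The inheritance idea can be sketched in a few classes. `JWSTImagingData` and `flatfield()` come from the slide; `JWSTData`, `MIRIImagingData`, `NIRCamImagingData` and `NIRSpecIFUData` are hypothetical names for the data types it mentions, and the `flat_method` attribute exists only to show which implementation ran.

```python
class JWSTData:
    """Behaviour shared by every JWST data type."""

    def __init__(self, pixels):
        self.pixels = pixels
        self.flat_method = None  # records which flatfield() implementation ran

    def flatfield(self, flat):
        raise NotImplementedError(f"{type(self).__name__} defines no flat field")


class JWSTImagingData(JWSTData):
    def flatfield(self, flat):
        # Generic imaging flat field: divide pixel by pixel
        self.pixels = [p / f for p, f in zip(self.pixels, flat)]
        self.flat_method = "JWSTImagingData.flatfield"
        return self


class MIRIImagingData(JWSTImagingData):
    pass  # no override: uses the inherited generic method


class NIRCamImagingData(JWSTImagingData):
    pass  # likewise


class NIRSpecIFUData(JWSTData):
    def flatfield(self, flat):
        # Placeholder for IFU-specific logic; here it divides like the imaging
        # case but records that its own custom method was used
        self.pixels = [p / f for p, f in zip(self.pixels, flat)]
        self.flat_method = "NIRSpecIFUData.flatfield"
        return self


for data in (MIRIImagingData([4.0, 9.0]), NIRCamImagingData([4.0, 9.0]),
             NIRSpecIFUData([4.0, 9.0])):
    print(type(data).__name__, data.flatfield([2.0, 3.0]).flat_method)
```

The calling code just invokes `flatfield()`; Python's method lookup decides whether the generic or the custom version runs.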
JWST Pipelines (continued…)
• The inheritance hierarchy encapsulates information about what is the same and what is different about JWST data types
– We can mix in behaviors from different types of object, as necessary
– But, as far as possible, we try to keep things the same across data types
– The people who inherit this project will thank us
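Mixing in behaviours can be sketched with Python's multiple inheritance: a small class that supplies one capability is combined with a data type, and the method resolution order (MRO) decides which implementation is used. All class names here, and the row-summing "extraction", are hypothetical illustrations.

```python
class ImagingData:
    """Hypothetical base type: a 2-D image stored as a list of rows."""

    def __init__(self, pixels):
        self.pixels = pixels

    def describe(self):
        return "imaging data"


class SpectralExtractionMixin:
    """One capability that can be mixed into any type with a `pixels` attribute."""

    def extract_spectrum(self, rows):
        # Sum the selected rows column by column to get a 1-D spectrum
        return [sum(column) for column in zip(*(self.pixels[r] for r in rows))]


class SlitlessData(SpectralExtractionMixin, ImagingData):
    """Imaging-like data that also needs spectral extraction."""

    def describe(self):
        return "slitless spectroscopy, based on " + super().describe()


data = SlitlessData([[1, 2], [3, 4], [5, 6]])
print(data.extract_spectrum([0, 2]))  # [6, 8]
print(data.describe())                # slitless spectroscopy, based on imaging data
print([cls.__name__ for cls in SlitlessData.__mro__])
# ['SlitlessData', 'SpectralExtractionMixin', 'ImagingData', 'object']
```

Keeping each mixin small and focused limits how much a reader has to trace through the MRO to see where a method comes from.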
What goes in?
• IDTs and instrument teams at STScI will figure out:
– Which steps are needed, and their ordering
– Which instruments/modes use the steps
– What each step does
– What calibration reference data are needed
– What tests the code needs to pass
Facilitating the process
• Calibration data will be in a “public” repository
• This will include:
– Code
– Test data
– Documentation
Facilitating…
• We will encourage everyone to try out our algorithms as we develop them
• And we encourage everyone to contribute their own algorithms
• We’ll keep teams synchronized by versioning and providing different builds
– E.g. Team A may still be testing build X when Team B needs to test the next stage of functionality in build X.1
– When Team B is ready to test the functionality in build X.1, there may already be a build X.2 (which includes the functionality in build X.1 as well as new functionality)
– In the end, all the teams will test the same code
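The X, X.1, X.2 naming above is schematic. Assuming builds are numbered with dotted integers, one simple way to check whether a build already contains a given stage of functionality is to compare version numbers as tuples of integers, component by component. The function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted build number such as '3.10' into a tuple of ints: (3, 10)."""
    return tuple(int(part) for part in text.split("."))


def includes(build, required):
    """True if `build` is at or beyond `required`, assuming (as described above)
    that each build contains all the functionality of the builds before it."""
    return parse_version(build) >= parse_version(required)


print(includes("3.2", "3.1"))   # True: X.2 contains X.1's functionality
print(includes("3.1", "3.2"))   # False
print(includes("3.10", "3.9"))  # True, although "3.10" < "3.9" as strings
```

Comparing integer tuples rather than raw strings is the design point here: string comparison would put build 3.10 before 3.9.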
Facilitating
• How do we know that the code does the ‘right’ thing?
– Teams provide test data with test results
– Then we know that the result is correct because it reproduces team-supplied answers
– Test results could be actual data (e.g. FITS files)
• Pixels in pipeline-calibrated data should be identical within +/-
– Or results of analysis
• Aperture photometry should be the same to within +/-
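The tolerances are left unspecified above, so the sketch below takes the tolerance as a parameter. It compares pipeline output with team-supplied expected pixel values using an absolute tolerance and reports which pixels fail; `math.isclose` from the standard library does the comparison. The function name is hypothetical, and photometry results could be checked the same way, perhaps with `rel_tol` instead of `abs_tol`.

```python
import math


def compare_to_truth(result, expected, abs_tol):
    """Return the indices of pixels where pipeline output and the team-supplied
    expected values differ by more than abs_tol (an empty list means a pass)."""
    if len(result) != len(expected):
        raise ValueError("result and expected data differ in size")
    return [i for i, (r, e) in enumerate(zip(result, expected))
            if not math.isclose(r, e, rel_tol=0.0, abs_tol=abs_tol)]


failures = compare_to_truth([1.0, 2.0, 3.5], [1.0, 2.001, 3.0], abs_tol=0.01)
print(failures)  # [2]
```

Returning the failing indices, rather than a single pass/fail flag, gives the nightly regression report something concrete to show.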
Interfacing with other languages
• If teams develop code that does a lot of fancy processing, we can try to include it by wrapping it
• Python talks to C/C++ using C extensions
• An existing C function can be wrapped so that Python objects can be passed to C/C++, and C objects passed back to Python
• We can wrap relatively simple C functions
– Arguments are arrays or primitive datatypes (integer/float/string…)
– No objects as arguments
– Structs are OK, as long as they are simple (flat)
– Play nice with memory
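Building a true C extension needs a compiler and the Python C API. As a runnable stand-in that shows the same kind of interface (an array plus primitive arguments crossing the Python/C boundary), the sketch below uses the standard-library `ctypes` module, not a C extension, to call the C library's `qsort` on an array of doubles. It assumes a POSIX system (Linux or macOS) where the C library can be loaded; `c_sort` is a hypothetical helper name. A flat struct could be passed in a similar way by declaring a `ctypes.Structure` with only primitive fields.

```python
import ctypes
import ctypes.util

# Load the C standard library; if find_library returns None, CDLL(None) falls
# back to the symbols already loaded into the Python process (POSIX only)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C comparison callback type: int cmp(const double *a, const double *b)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_double),
                           ctypes.POINTER(ctypes.c_double))

# Declare the C signature: void qsort(void *base, size_t n, size_t size, cmp)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


@CMPFUNC
def _compare(a, b):
    # Dereference the pointers and return -1, 0 or 1, as qsort expects
    return (a[0] > b[0]) - (a[0] < b[0])


def c_sort(values):
    """Copy a Python list into a C double array, sort it in C, copy it back."""
    array = (ctypes.c_double * len(values))(*values)
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_double), _compare)
    return list(array)


print(c_sort([3.0, -1.5, 2.0]))  # [-1.5, 2.0, 3.0]
```

The explicit copy into a ctypes array keeps memory ownership simple: Python allocates and frees the buffer, and the C code only reads and reorders it.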
Wishlists
• We don’t need to feel constrained by HST
• What are the biggest deficiencies in HST?
– Best reference files and best calibration steps can be determined by querying a service
• Don’t need to rely on the HST archive to find these out
– Reference files can be downloaded as needed
– Even calibration code can be updated as needed (don’t need to wait 6 months for the next STSDAS release)
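Such a service could answer a question like "which flat field applies to this exposure?" by matching the exposure's configuration against a set of rules and picking the newest applicable file. The sketch keeps the rules in a local list; the field names, the date-based rule (similar in spirit to the USEAFTER dates of HST reference files), the dates and the file names are all hypothetical, and a real service would sit behind a network API.

```python
from datetime import date

# Hypothetical rule table: each file applies to one configuration and to
# exposures taken on or after its use_after date
RULES = [
    {"reftype": "FLAT", "detector": "MIRIMAGE", "filter": "F770W",
     "use_after": date(2014, 1, 1), "file": "flat_a.fits"},
    {"reftype": "FLAT", "detector": "MIRIMAGE", "filter": "F770W",
     "use_after": date(2015, 6, 1), "file": "flat_b.fits"},
    {"reftype": "FLAT", "detector": "MIRIMAGE", "filter": "F1000W",
     "use_after": date(2014, 1, 1), "file": "flat_c.fits"},
]


def best_reference(reftype, detector, filter_name, obs_date):
    """Return the newest matching reference file, or None if none applies."""
    candidates = [r for r in RULES
                  if r["reftype"] == reftype and r["detector"] == detector
                  and r["filter"] == filter_name and r["use_after"] <= obs_date]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["use_after"])["file"]


print(best_reference("FLAT", "MIRIMAGE", "F770W", date(2016, 3, 1)))  # flat_b.fits
print(best_reference("FLAT", "MIRIMAGE", "F770W", date(2013, 1, 1)))  # None
```

Because the selection lives in data rather than in pipeline code, a new flat only requires a new rule, not a new software release.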
Wishlists
• Tell us what you want!
– The earlier the better
– Some aspects of the overall architecture are still flexible
• And not just pipeline calibration code
– We are going to need tools for data analysis, evaluation, interpretation, visualization
– Reference file generation