SCIAMACHY features and usage with respect to end-users
The typical fate of retrieval people dealing with large
datasets…
C. Frankenberg, SRON team, IUP Heidelberg team
SCIAMACHY on ENVISAT, a brief introduction
SCIAMACHY
ADAGUC meeting, KNMI, De Bilt, 03/04 October 2006
SCIAMACHY data viewer (1 orbit = 300 MB)
Scientific question in my case:
Retrieval of CH4 and CO2
Spectra → vertical column densities of CO2 and CH4 → xVMR(CH4)
CH4 VMR August through November 2003
Frankenberg et al., Assessing methane emissions from global space borne observations, Science 2005
Issues related to ADAGUC
• SCIAMACHY data access, 5 GB/day: direct download from the Netherlands SCIAMACHY data center
• Data access, binary PDS file
  • No library available at that time
  • Official reading tool not useful for nearly operational retrievals
  • Own C/C++ access routine was written
  • Complex code structure, retrieval and data access are difficult to separate
Too instrument-specific to be of general interest in ADAGUC
Issues related to ADAGUC
• General procedure:
1) Level 1 PDS file: each geographic entity (usually a 60 × 120 km rectangle) comprises spectra and numerous auxiliary datasets
2) Retrieval via own C++ code, results stored in a so-called level 2 file
3) Level 2 file (own format, so far ASCII): each geographic entity comprises e.g. the CH4 total column and additional parameters such as cloud cover, albedo, fit error, etc.
4) Generating gridded plots of the level 2 files depending on filter criteria (e.g. CloudTopHeight < 1 km, fitError < 2%)
5) Comparing data (raw and gridded) with other datasets (e.g. model output, retrievals of other groups, other satellite sensors)
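Step 4 (filtering, then lat/lon box gridding) can be sketched in a few lines of Python. This is only an illustration, not the routines actually used for the retrievals: the record layout and the field names (`lat`, `lon`, `ch4`, `cloud_top_height_km`, `fit_error`) are hypothetical, and the default thresholds simply mirror the example criteria above.

```python
import math
from collections import defaultdict


def grid_mean(records, value_key="ch4", cell_deg=1.0,
              max_cloud_top_km=1.0, max_fit_error=0.02):
    """Average filtered level 2 records on a regular lat/lon box grid.

    Each record is a dict with hypothetical keys: lat, lon, the value to
    average, cloud_top_height_km and fit_error (as a fraction, 0.02 = 2%).
    Returns {(cell_south_edge, cell_west_edge): mean value}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell index -> [sum, count]
    for rec in records:
        # Apply the filter criteria before gridding.
        if rec["cloud_top_height_km"] >= max_cloud_top_km:
            continue
        if rec["fit_error"] >= max_fit_error:
            continue
        # Assign the pixel centre to the box that contains it.
        cell = (math.floor(rec["lat"] / cell_deg),
                math.floor(rec["lon"] / cell_deg))
        sums[cell][0] += rec[value_key]
        sums[cell][1] += 1
    return {(i * cell_deg, j * cell_deg): total / count
            for (i, j), (total, count) in sums.items()}
```

Changing a filter means re-running the whole loop over all files, which is where the slow ASCII access hurts most.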
What is of general interest?
• Points 3-5:
3) Output file generation (file format, no standards!)
4) Gridding and plotting data based on predefined selection criteria
5) Comparing datasets
Output file generation
• Why ASCII?
  • Human readable
  • Easiest exchange between different groups (preferred format for the comparison between SRON, IUP Bremen, IUP Heidelberg)
  • Variety of Linux tools available for processing, most notably awk
• Drawbacks…
  • Slow access, big files, files not self-describing
• Why didn't I use HDF/netCDF/GIS format?
  • Lazy (additional work, new skills necessary)
  • Awk tools not available
Gridding, projections, plotting
• What did I use?
  • Admittedly very simple methods: lat/lon box gridding with own routines, IDL plotting/projection routines
• What would be nice?
  • Better gridding options (e.g. weighting by the overlapping area)
  • Data conversion tools for easier access to tools such as GMT (Generic Mapping Tools)
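A rough sketch of what "weighting by the overlapping area" could look like, in Python. It rests on strong simplifying assumptions that are mine, not the talk's: each pixel is treated as a rectangle aligned with the lat/lon axes (real footprints are tilted quadrilaterals), overlap is measured in degree space, and date-line wrapping is ignored. The tuple layout `(lat0, lat1, lon0, lon1, value)` is hypothetical.

```python
import math
from collections import defaultdict


def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap of the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def grid_area_weighted(pixels, cell_deg=1.0):
    """Area-weighted mean per grid cell.

    pixels: iterable of (lat0, lat1, lon0, lon1, value) with lat0 < lat1
    and lon0 < lon1. Each pixel contributes to every cell it touches,
    weighted by the overlap area in square degrees.
    Returns {(cell_south_edge, cell_west_edge): weighted mean}.
    """
    acc = defaultdict(lambda: [0.0, 0.0])  # cell -> [sum(w * v), sum(w)]
    for lat0, lat1, lon0, lon1, value in pixels:
        i_range = range(math.floor(lat0 / cell_deg),
                        math.floor(lat1 / cell_deg) + 1)
        j_range = range(math.floor(lon0 / cell_deg),
                        math.floor(lon1 / cell_deg) + 1)
        for i in i_range:
            for j in j_range:
                weight = (
                    overlap_1d(lat0, lat1, i * cell_deg, (i + 1) * cell_deg)
                    * overlap_1d(lon0, lon1, j * cell_deg, (j + 1) * cell_deg)
                )
                if weight > 0.0:
                    acc[(i, j)][0] += weight * value
                    acc[(i, j)][1] += weight
    return {(i * cell_deg, j * cell_deg): s / w
            for (i, j), (s, w) in acc.items()}
```

Compared with assigning each pixel to the box containing its centre, a large pixel now spreads over all boxes it covers instead of counting fully in one of them.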
Comparing datasets
• A headache
  • Even within SCIA: different pixel sizes → comparing different species needs averaging to the lowest resolution; how to do the averaging?
  • Processing a lot of files is slow due to the ASCII format
• Data exchange
  • In my case only within the atmospheric community, so no direct problems, as people were experienced with the formats; ASCII is no problem anyway (but slow and large)
  • What is needed for the GIS community: level 2 and/or level 3 (gridded) data?
What I find ideal…
• Results stored in a relational database management system (RDBMS), with routines for extracting subsets to HDF, netCDF, ASCII
• Why? Database systems are meant for large datasets and complex queries to derive subsets
• Simple example in SQL:
select avg(CH4) from results where latitude>50 and latitude<51 … and albedo>0.2 and cloudCover<0.05
• FAST due to indexing (tested with a test database with 5 million entries, a query returns almost instantly)!
• Selection criteria easy (no awk necessary)
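To make the idea concrete, here is a minimal, self-contained sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite is swapped in purely so the example runs anywhere; it is not the database system tested for the talk. The table and column names follow the query above, the `longitude` column and the index name are assumptions, and the conditions elided by "…" in the original query are not reproduced.

```python
import sqlite3


def build_results_db(rows):
    """Create an in-memory 'results' table and index it on latitude.

    rows: iterable of (latitude, longitude, CH4, albedo, cloudCover).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE results ("
        "latitude REAL, longitude REAL, CH4 REAL, "
        "albedo REAL, cloudCover REAL)"
    )
    con.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)
    # The index lets the database narrow the latitude range without
    # scanning every row.
    con.execute("CREATE INDEX idx_results_latitude ON results (latitude)")
    con.commit()
    return con


def mean_ch4(con, lat_min, lat_max, min_albedo=0.2, max_cloud_cover=0.05):
    """Mean CH4 in a latitude band; None if no row passes the filters."""
    row = con.execute(
        "SELECT avg(CH4) FROM results "
        "WHERE latitude > ? AND latitude < ? "
        "AND albedo > ? AND cloudCover < ?",
        (lat_min, lat_max, min_albedo, max_cloud_cover),
    ).fetchone()
    return row[0]
```

Changing a selection criterion then means changing a query parameter, e.g. `mean_ch4(con, 50, 51, max_cloud_cover=0.1)`, rather than re-parsing ASCII files.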
Even better: Spatial SQL
• Spatial SQL: spatial extension of the database systems (e.g. points, polygons, etc.)
• Example syntax (Postgres):
SELECT ch4_total_column FROM results WHERE distance(
center_point, GeomFromText( 'POINT(10.0 20.0)', -1 ) ) < 100
• Dumpers to e.g. "shape files" available:
pgsql2shp [<options>] <database> <query>
• Direct connection to data viewers such as QGIS possible
• Web interface to the interactive plotting tool MapServer
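For readers without a spatial database at hand, the kind of radius search shown in the Postgres example can be mimicked in plain Python. This sketch is not equivalent to the query above: it assumes distances are wanted in kilometres on a spherical Earth (mean radius 6371 km) and uses the haversine formula, whereas the units of the database query depend on how the geometries are stored. The record keys `lon`, `lat` and `ch4_total_column` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(records, lon, lat, radius_km):
    """Return the records whose pixel centre lies within radius_km."""
    return [rec for rec in records
            if haversine_km(rec["lon"], rec["lat"], lon, lat) < radius_km]
```

Unlike the database, this scans every record; a spatial index is what would keep such a query fast on millions of pixels.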
What takes most of the time?
• SCIA data format
  • Especially the level 2 files for validation are far too complex and frustrate people
• Data filtering → plotting → interpreting → change filters → and so forth
  • An interactive data viewer would be great (such as in GIS: click on a point and you get additional information)
Lots of time for discussion
Website for spatial RDBMS:
www.postgis.org