Applicable use cases of OPeNDAP


Session 2: Using OPeNDAP-enabled Applications to
Access Australian Data Services and Repositories
eResearch Australasia 2011, ½ Day Morning Workshop, Thursday 10th November 2011
GENERAL INFORMATION
• This is a half-day workshop (9am to 12:30pm)
• 9:00am Introductions and Participants Goals
• 9:15am Session 1: Discovering OPeNDAP data access services
• 10:00am Session 2: Applicable use cases of OPeNDAP data services
− 10:30am Tea Break for 15 minutes
• 11:00am Session 3: OPeNDAP service protocols and features
• 11:45am Session 4: Accessing complementary features and services
• 12:30pm End of Workshop
Session 2
• Applicable use cases of OPeNDAP data services for data cataloging
and data access using a variety of applications and tools.
• A short tutorial exploring data access using an OPeNDAP-enabled
tool within a scripting language such as python.
• 45 minutes in length + 15 minute Tea Break at 10:30am
Spectrum of Use Cases
**DAP4 features listed are my estimate, not the official specification
**DAP4 data model
domain neutral
new data types and data
structures
streaming, compressed,
chunked
Common Data Model (CDM)
domain specific
Future data model
domain neutral??
Interactive Data Viewer
IDV, Panoply, IDL, MATLAB,
iPython (matplotlib), NCL, web
browser (metadata)
Interactive Analysis
MATLAB, IDL, iPython, NCL
Custom Application: Inundation
Modeller
Web Application
Live Access Server
IMOS Data Portal (WMS)
Custom Java Servlet
**DAP4 programming
legacy code support
**DAP4 programming
new data model and protocols
streaming support
**DAP4 programming
Asynchronous access modes,
server-side processing
NcML Data Request
aggregation, virtual data sets
**DAP4
server-side operations, async
access mode, new data model,
posting
Return file translations
file.nc.netcdf - NetCDF file
Server-side operations
file.nc?GEOLOC()
Async access mode
??
OGC data model
domain specific
geospatial, 1-D, 2-D
DAP2 data model
domain neutral
n-D, time series
Application Types
Programmatic / Language
API
FORTRAN, C/C++, JAVA,
Python, NetCDF, Java NetCDF
Programmatic / Tools
NetCDF, NCO, PyDAP
Custom Tools: OPeNDAP
crawler, ocean_prep
Programming
DAP2 Legacy Code
existing tools
DAP2 New Code
New tools
Data Access
Protocol
Metadata Request
das, dds, ddx
ASCII/Binary Data Request
DAP Binary Object Request
Simple data representation
Syntax
Return data set info
file.nc.dds - readable
file.nc.ddx - XML
file.nc.asc - ASCII data return
Select variables
file.nc.dods?var1,var2,var3
Clients
Programmatic Access
Tsunami inundation modeller,
NetCDF,
NCO, PyDAP, PyNetCDF,
MATLAB, IDL, …
Interactive Access
Web browser - Catalog
MATLAB, IDL, Python,
Panoply, …
Application Data
Representation
Service Capabilities
**DAP4 Response
async access mode, server-side, streaming
DAP2 response
metadata, dods, ASCII / Binary
subset arrays
file.dods?var1[0:1:10]
Data Library & Catalog
Service
metadata harvesting
directory listings
remote THREDDS services
Web Service
Java servlet, Java applet
Analysis Service
Geospatial Information Service
Live Access Server
OPeNDAP data service
NcML
Aggregation service
Virtual Data Set Service
Remote Data Access
Metadata Conversion and
translations (-> ISO)
CF->ISO, CF->WMS, CF->WCS
RDF
metadata definitions, semantics,
ontology
Layered Services
Catalogue service
WMS, WCS services
Authentication
Conformance checks
CF metadata check
ISO metadata check
Workshop Use-Cases
DAP2 data model
domain neutral
n-D, time series
Application Data
Representation
Application Types
Programmatic / Language
API
FORTRAN, C/C++, JAVA,
Python, NetCDF, Java NetCDF,
PyDAP
Programmatic / Tools
NetCDF, NCO, PyDAP
Custom Tools: OPeNDAP
crawler
Programming
DAP2 Legacy Code
existing tools:
DAP2 New Code
New tools
Data Access
Protocol
Metadata Request
das, dds, ddx
ASCII/Binary Data Request
DAP Binary Object Request
Simple data representation
NcML Data Request
aggregation
Syntax
Return metadata info
file.nc.das - readable
file.nc.dds - readable
file.nc.ddx - XML metadata
file.nc.help - help info
subset arrays, return data
Select vars and return data
file.asc?var1[0:1:10]
file.nc.asc?var1,var2,var3
file.dods?var1[0:1:10]
file.nc.dods?var1,var2,var3
Return file translations
file.nc.netcdf - NetCDF file
Clients
Programmatic Access
NetCDF, NCO, PyDAP,
PyNetCDF
Interactive Access
Web browser - Catalog
Python, MATLAB, Panoply
Service Capabilities
DAP2 response
THREDDS data service
Hyrax data service
Interactive Data Viewer
Panoply, MATLAB, NCL, web
browser
NcML
Aggregation service
Server-side operations
file.nc?GEOLOC()
Layered Services
Catalog service
WMS
Use Case limitations
• Time to access data is dependent on the following factors:
• Hardware and network performance
• Selection of variables and dimensions
• Number of data requests to be issued
− Latency inherent in the data request
• Number of concurrent accesses to the server
Performance limitations to data delivery
Network connection    Network Bandwidth    Data Transfer (MB per second)    *Elapsed Time (seconds)
WiFi                  2 – 56 Mbps          0.2 – 5 MBps                     500+
ADSL modem            2 – 14 Mbps          0.2 – 1.4 MBps                   357+
Home LAN              100 Mbps             10 MBps                          50
SATA Disk             -                    20 – 40 MBps                     12.5+
Office LAN            1000 Mbps            100 MBps                         ~5.00
Disk Array            -                    120 – 240 MBps                   ~3.00
Backbone Ethernet     10 Gbps              1,000 MBps                       ~0.50
QDR Infiniband        40 Gbps              4,000 MBps                       ~0.12
Lustre Parallel FS    -                    10,000 MBps                      0.05+
*Time to transfer a 500 MB data object
Performance limitations to data delivery
Data Request                              Data Size    *Elapsed Time (seconds)    Improved Access
Complete File, 3 x fields (3D) doubles    250 MB       178                        1.0x
One 3D field, 5 vertical levels           50 MB        35.7                       5.0x
3 x 2D fields                             30 MB        21.4                       8.3x
One 2D field (1250 x 1000)                10 MB        7.14                       24.9x
Subset 3D field (500 x 500 x 5)           10 MB        7.14                       24.9x
Subset 2D field (500 x 500)               2 MB         1.43                       124x
Vertical Column (100 x 100 x 5)           0.4 MB       0.28                       635x
*Time to transfer a data object on an ADSL2 modem = 14 Mbps
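The elapsed times above follow from dividing the data size by the effective transfer rate. The sketch below reproduces that arithmetic, assuming (as the tables appear to) that 10 Mbps of line rate delivers about 1 MBps of data. The function name and the optional per-request latency term are illustrative and do not come from the slides.

```python
def transfer_seconds(size_mb, bandwidth_mbps, requests=1, latency_s=0.0):
    """Estimate the seconds needed to fetch size_mb megabytes.

    Assumption: 10 line bits per payload byte (100 Mbps -> 10 MBps), the
    conversion the tables appear to use. latency_s is a hypothetical
    per-request overhead and defaults to zero.
    """
    rate_mb_per_s = bandwidth_mbps / 10.0
    return requests * latency_s + size_mb / rate_mb_per_s


ADSL2_MBPS = 14
full_file = transfer_seconds(250, ADSL2_MBPS)
for label, size_mb in [("Complete file", 250), ("One 2D field", 10),
                       ("Subset 2D field", 2)]:
    seconds = transfer_seconds(size_mb, ADSL2_MBPS)
    print(f"{label}: {seconds:.1f} s, {full_file / seconds:.1f}x faster")
```

The printed speed-ups can differ from the table in the last digit, most likely because the table's ratios were taken from already rounded times. Setting requests and latency_s to non-zero values shows how many small requests can cost more than one larger request.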
DAP-enabled client tools/applications
OPeNDAP Clients (partial list)
http://opendap.org/whatClients
To be demonstrated today:
1. Web browser returning ASCII data
2. Pydap – a pure Python library implementing DAP2
3. NetCDF – a set of software libraries and self-describing,
machine-independent data formats with interfaces to Python,
FORTRAN, C/C++, and Java
4. NCO – a dozen standalone, command-line programs
that take netCDF files as input
5. MATLAB – session 3
6. Panoply – session 4
Web Browser demo
• “.ascii” tells the OPeNDAP service to return the data in ASCII format.
− http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon
• Try accessing multiple variables, for example by adding latitude
− http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon,lat
• What other variables are available in the file?
− Try accessing “sst” and download to ascii
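The browser requests above all share one pattern: the dataset URL, a response suffix, and an optional comma-separated list of variables. A minimal Python sketch of that pattern follows. The helper names dap_url and fetch_text are hypothetical, and fetch_text is defined but not called, so the sketch runs without network access.

```python
from urllib.request import urlopen


def dap_url(dataset_url, response, *projections):
    """Build a DAP2 request URL: <dataset>.<response>?<var1>,<var2>,..."""
    url = f"{dataset_url}.{response}"
    if projections:
        url += "?" + ",".join(projections)
    return url


def fetch_text(url, timeout=30):
    """Download a text response (.ascii, .dds or .das) and decode it."""
    with urlopen(url, timeout=timeout) as reply:
        return reply.read().decode("utf-8", errors="replace")


BASE = ("http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/"
        "20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc")

print(dap_url(BASE, "dds"))                  # lists the variables in the file
print(dap_url(BASE, "ascii", "lon", "lat"))  # the two-variable request
```

Requesting the .dds response first is one way to answer the question of which other variables the file holds before asking for their data.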
Tutorial - Subsetting
Modify the variable indices (C array syntax, 0..n-1) and request ASCII data in the
web browser:
• http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[10:1:50]
What happens if the middle index number (the stride) is changed from “1” to “2”?
• http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[10:2:50]
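A DAP2 index constraint has the form [start:stride:stop], with zero-based indices and an inclusive stop. The short sketch below lists the indices that such a constraint selects. The function name is illustrative, and nothing is fetched from the server.

```python
def hyperslab_indices(start, stride, stop):
    """Indices selected by a DAP2 hyperslab [start:stride:stop].

    Indices are zero-based and, unlike a Python slice, stop is inclusive.
    """
    return list(range(start, stop + 1, stride))


for constraint in [(10, 1, 50), (10, 2, 50)]:
    picked = hyperslab_indices(*constraint)
    print("lon[%d:%d:%d]" % constraint, "->", len(picked), "values,",
          "first", picked[0], "last", picked[-1])
```

Doubling the stride roughly halves the number of values returned while covering the same index range, which is a cheap way to thin a large request.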
Tutorial – subsetting continued
http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:1439]
Add a new variable to the above URL, separated by a comma, and request
ASCII data in the web browser:
• http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:50],lat[0:1:30]
Now do the same thing in the form and modify the index range
• Watch out for large index ranges returning large amounts of data
Tutorial: .dods response
Try the binary response “.dods”
• “.dods” tells the OPeNDAP service to return the data in binary format
− http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.dods?lon
This is a two-part binary DAP data object containing 1) metadata
and 2) a binary data structure.
This is the typical response for OPeNDAP enabled client applications.
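To make the two parts concrete, the sketch below splits a .dods body at the "Data:" line that separates the text description (the DDS) from the binary payload, then decodes one Float32 array. It rests on stated assumptions about the DAP2 wire encoding (big-endian XDR, with the array length sent twice), handles only this simplest case, and uses a synthetic response rather than real server output. The function names are illustrative.

```python
import struct


def split_dods(body):
    """Split a .dods response into its DDS text and binary payload."""
    dds, _, payload = body.partition(b"\nData:\n")
    return dds.decode("ascii"), payload


def decode_float32_array(payload):
    """Decode one XDR-encoded Float32 array from the start of payload.

    Assumption: the array length is sent twice as big-endian 32-bit
    integers, followed by big-endian 32-bit floats.
    """
    length, repeat = struct.unpack(">II", payload[:8])
    if length != repeat:
        raise ValueError("unexpected array header")
    return list(struct.unpack(">%df" % length, payload[8:8 + 4 * length]))


# Synthetic response for illustration only, not real server output.
sample = (b"Dataset {\n    Float32 lon[lon = 3];\n} example.nc;"
          b"\nData:\n" + struct.pack(">II3f", 3, 3, 0.5, 1.0, 1.5))
dds_text, binary = split_dods(sample)
print(dds_text)
print(decode_float32_array(binary))
```

In practice a client library such as Pydap or netCDF performs this decoding, so the sketch is only meant to show what the response contains.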
Pydap
Pydap is a pure Python library implementing the Data Access Protocol,
also known as DODS or OPeNDAP. You can use Pydap as a client
or server.
• http://pydap.org/
To install Pydap on Windows, see the Windows installation slide
To install Pydap on Mac OS X, see the Mac OS X installation slide
To install Pydap on Linux, follow the Mac OS X installation slide
Pydap installation for Windows
To install Pydap on Windows:
1. Install Python on Windows
2. Install easy_install: ez_setup.py
3. Install Pydap: easy_install Pydap
Pydap installation for Mac OS X
To install Pydap on Mac OS X:
1. Python is installed by default on Mac OS X 10.5 and 10.6
2. Install easy_install: ez_setup.py
3. Install Pydap: easy_install Pydap
Test Pydap client installation
>>> from pydap.client import open_url
>>> dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc')
>>> var = dataset['SST']
>>> var.shape
(12, 90, 180)
>>> var.type
<class 'pydap.model.Float32'>
>>> print var[0,10:14,10:14] # this will download data from the server
<class 'pydap.model.GridType'>
with data
[[ -1.26285708e+00 -9.99999979e+33 -9.99999979e+33 -9.99999979e+33]
[ -7.69166648e-01 -7.79999971e-01 -6.75454497e-01 -5.95714271e-01]
[ 1.28333330e-01 -5.00000156e-02 -6.36363626e-02 -1.41666666e-01]
[ 6.38000011e-01 8.95384610e-01 7.21666634e-01 8.10000002e-01]]
and axes
366.0
[-69. -67. -65. -63.]
[ 41. 43. 45. 47.]
More Pydap client features
See Pydap client: http://pydap.org/client.html
NetCDF API and Tools
NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and
sharing of array-oriented scientific data.
• http://www.unidata.ucar.edu/software/netcdf/
To install, go to
• http://www.unidata.ucar.edu/downloads/netcdf/index.jsp
To use with Python, build netCDF4 and its Python module, or …
• easy_install netCDF4
NetCDF demo
>>> import netCDF4
>>> url = 'http://test.opendap.org/dap/data/nc/coads_climatology.nc'
>>> dataset = netCDF4.Dataset(url)
>>> var = dataset.variables['SST']
>>> var.shape
(12, 90, 180)
>>> print var[0,10:14,10:14] # this will download data from the server
[[-1.26285707951 -- -- --]
[-0.769166648388 -0.77999997139 -0.675454497337 -0.595714271069]
[0.128333330154 -0.0500000156462 -0.0636363625526 -0.141666665673]
[0.638000011444 0.895384609699 0.721666634083 0.810000002384]]
>>> print var
<type 'netCDF4.Variable'>
float32 SST('TIME', 'COADSY', 'COADSX')
…
NetCDF demo
Get metadata information about the following data set:
• ncdump -h http://opendap.bom.gov.au:8080/thredds/dodsC/nmoc/oceanmaps2_ofam_fc/latest/ocean_fc_20111108_000_surface.nc
NCO Tools
The netCDF Operators (NCO) comprise a dozen standalone,
command-line programs that take netCDF files as input, then
operate (e.g., derive new data, average, print, hyperslab, manipulate
metadata) and output the results to screen or files in text, binary, or
netCDF formats. NCO aids manipulation and analysis of gridded
scientific data.
• http://nco.sourceforge.net/
To install NCO tools, go to
• http://nco.sourceforge.net/#Binaries
NCO tool demo
Download the initial conditions for a regional ocean model
• ncks -O -F -d xt_ocean,648,979 -d yt_ocean,467,798 http://opendap.bom.gov.au:8080/thredds/dodsC/oceanmaps_access_analysis_ogcm/temp/2010/ocean_an_20100312_temp.nc -o ocean_temp_2010_03_12.nc
View the original file metadata:
• ncdump -h http://opendap.bom.gov.au:8080/thredds/dodsC/oceanmaps_access_analysis_ogcm/temp/2010/ocean_an_20100312_temp.nc
View the subsetted file’s metadata:
• ncdump -h ocean_temp_2010_03_12.nc
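The -F switch asks ncks to use Fortran (1-based) indexing, so the index ranges in the command count from 1, whereas DAP2 constraints count from 0. The sketch below converts between the two conventions. The function name is illustrative, and the resulting constraint strings are what a direct DAP2 request for the same region would look like under that reading.

```python
def fortran_range_to_dap(first, last, stride=1):
    """Convert a 1-based inclusive index range to a DAP2 hyperslab string."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return "[%d:%d:%d]" % (first - 1, stride, last - 1)


xt = fortran_range_to_dap(648, 979)
yt = fortran_range_to_dap(467, 798)
print("xt_ocean" + xt, "yt_ocean" + yt)
print("points per axis:", 979 - 648 + 1, 798 - 467 + 1)
```

Comparing the point counts with the dimension sizes reported by ncdump -h on the subsetted file is a quick check that the request selected the intended region.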
NCO demo
Download the initial conditions for a regional ocean model using
longitude and latitude ranges for the dimensions
• ncks -O -F -d xt_ocean,143.55,176.66 -d yt_ocean,-28.35,4.75 http://opendap.bom.gov.au:8080/thredds/dodsC/oceanmaps_access_analysis_ogcm/temp/2010/ocean_an_20100312_temp.nc -o ocean_temp2_2010_03_12.nc
Are the files the same (dimensions and lon/lat range)?
ncdump -v xt_ocean ocean_temp_2010_03_12.nc
ncdump -v xt_ocean ocean_temp2_2010_03_12.nc
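Whether the two files match depends on how the coordinate limits map onto grid indices. The sketch below shows one way to find the index range covered by a coordinate range, assuming an increasing axis and that every point inside the closed range is kept. The axis values are hypothetical and are not the grid of the ocean model file.

```python
from bisect import bisect_left, bisect_right


def coordinate_range_to_indices(axis, low, high):
    """Return the 1-based (first, last) indices of axis values in [low, high].

    axis must be sorted in increasing order.
    """
    first = bisect_left(axis, low)        # first value >= low (0-based)
    last = bisect_right(axis, high) - 1   # last value <= high (0-based)
    if first > last:
        raise ValueError("no axis values inside the requested range")
    return first + 1, last + 1


# Hypothetical axis: cell centres every 0.5 degrees from 140.25 to 179.75.
axis = [140.25 + 0.5 * i for i in range(80)]
print(coordinate_range_to_indices(axis, 143.55, 176.66))
```

Running the same function on the xt_ocean values printed by ncdump -v would show which index range the longitude limits correspond to on the real grid.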
Tutorial: Pick a demo to try
Please select from the Pydap, NetCDF, and NCO demos
1. Install the software on your machine
2. Run a test case and see if the software is installed correctly
3. Access a different file from a TDS or Hyrax data service
4. Get the metadata information
5. Get the coordinate axes data
6. Get a subset of data from an array
Thank you
Authors:
Tim F. Pugh (1), James Gallagher (2), Dave Fulker (3)
(1) Australian Bureau of Meteorology, Melbourne, Australia
(2) OPeNDAP, Butte, Montana, USA
(3) OPeNDAP, Boulder, Colorado, USA