Using Python to Retrieve Data
from the CUAHSI HIS Web
Services
Jon Goodall
Hydroinformatics
Fall 2014
This work was funded by National Science
Foundation Grants EPS 1135482 and EPS
1208732
Class Plan
• Introduction (setting this material within context)
• Set up
– Software requirements
• In class demos
– Example 1: Getting the site name for a USGS NWIS station using
the CUAHSI HIS Web Services
– Example 2: Getting the minimum streamflow over the past five
days for a USGS NWIS station using the CUAHSI HIS Web
Services
• Challenge problems
• Wrap up
– Further reading
– Thursday’s class
Big Picture Context
Data life cycle -> data modeling -> database
design -> database implementation -> ODM ->
SQL Querying of an ODM -> Python
programming against an ODM -> sharing data
from an ODM using CUAHSI HIS Web services
(WaterOneFlow) and WaterML -> Accessing
CUAHSI HIS Web services using HydroDesktop ->
This week: Accessing CUAHSI HIS Web services
using Python
Set up …
Install the suds package
• Start Canopy
• Open the Package Manager
• Search for the ‘suds 0.4-1’ package
• Install this package
– Requires you to sign in with an account that has an
academic license.
• Close the Package Manager
Test the installation of the suds
package
• Within Canopy
– Open the Editor
– At the command line, type ‘import suds’ as shown
below
– In [1]: import suds
– If you don’t get an error, then suds is installed
correctly.
– If you do get an error, repeat the instructions to
install the suds package.
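The same check can also be run as a short script instead of at the prompt. This is a minimal sketch; the helper name is_installed is hypothetical and is not part of the class materials.

```python
import importlib


def is_installed(package_name):
    """Return True if the named package can be imported, False otherwise."""
    try:
        importlib.import_module(package_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if is_installed("suds"):
        print("suds is installed")
    else:
        print("suds is missing; repeat the installation steps")
```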
What is the suds package?
• “Suds is a lightweight SOAP Python client for
consuming Web Services.”
https://fedorahosted.org/suds/
• SOAP and WSDL are standards for creating web
services. You don’t need to know the details behind
these standards, but if you are interested, Wikipedia
has a good summary of both:
– SOAP: http://en.wikipedia.org/wiki/SOAP
– WSDL: http://en.wikipedia.org/wiki/Web_Services_Description_Language
In class examples …
Example 1: Get the site name for a USGS NWIS station using the
CUAHSI HIS Web Services
• Use the GetSiteInfo method on the CUAHSI HIS WaterOneFlow web
services
– http://river.sdsc.edu/wateroneflow/NWIS/UnitValues.asmx?op=GetSiteInfo
• Use the suds Client object to call the web service method
– https://fedorahosted.org/suds/wiki/Documentation
• We will use siteCode = USGS:10109000.
• Suds will automatically parse the WaterML response from the web service
call.
• We will need to find the siteName property in the response and print it to
the console.
• The answer for USGS:10109000 is: LOGAN RIVER ABOVE STATE DAM, NEAR
LOGAN, UT
• I will walk you through building the code in class, but you can download an
up-to-date version of the code here.
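The steps in Example 1 might be sketched as follows. This is not the class code, and several details are assumptions: that the WSDL is served at the same address with ?WSDL appended, that the GetSiteInfoObject variant of the method (rather than GetSiteInfo, which returns XML text) is what yields a parsed object, that the second argument is an empty authentication token, and that the site name sits at response.site[0].siteInfo.siteName as in the WaterML 1.0 layout. The function names are hypothetical.

```python
from types import SimpleNamespace

# Assumption: the WSDL is published at the service address with "?WSDL" appended.
WSDL_URL = "http://river.sdsc.edu/wateroneflow/NWIS/UnitValues.asmx?WSDL"


def fetch_site_info(site_code, wsdl_url=WSDL_URL):
    """Call the web service and return the parsed response.

    Needs the suds package and a reachable service.
    """
    from suds.client import Client  # imported here so the rest runs without suds

    client = Client(wsdl_url)
    # Assumption: GetSiteInfoObject(site, authToken) returns a parsed object.
    return client.service.GetSiteInfoObject(site_code, "")


def extract_site_name(response):
    """Pull the site name out of a parsed site-info response.

    Assumed layout: response.site is a list whose items carry siteInfo.siteName.
    """
    return response.site[0].siteInfo.siteName


if __name__ == "__main__":
    # Offline stand-in that mimics the assumed response layout.
    fake = SimpleNamespace(
        site=[SimpleNamespace(siteInfo=SimpleNamespace(siteName="EXAMPLE SITE"))]
    )
    print(extract_site_name(fake))
    # With suds installed and the service reachable, one would instead try:
    # print(extract_site_name(fetch_site_info("USGS:10109000")))
```

If the attribute path differs for a given service, printing the response object returned by suds shows its actual structure.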
Pros and Cons of using Web Services
vs a local ODM database
• Pros:
– Access to the entire NWIS database without the need to
store the data locally
– No need to keep a local copy in sync with the USGS version
• Cons:
– Requires Internet connection
– Speed: Getting data via web services will almost certainly
be slower than getting data from your own database
– Data is outside your control: breaking changes, unavailable
services, etc.
– (Some of these could be improved with a data caching
strategy)
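One very simple caching strategy is to save each response to a local file and reuse it on later requests. The sketch below illustrates the idea only; the function name cached_call, the JSON file format, and the cache layout are all hypothetical choices, and the cache never expires.

```python
import json
import os
import re
import tempfile


def cached_call(cache_dir, key, fetch):
    """Return the cached result for key if present; otherwise call fetch() and store it.

    fetch is any zero-argument function that returns JSON-serializable data.
    """
    # Make the key safe to use as a file name (e.g. "USGS:10109000").
    safe_key = re.sub(r"[^A-Za-z0-9_.-]", "_", key)
    path = os.path.join(cache_dir, safe_key + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch()
    with open(path, "w") as f:
        json.dump(data, f)
    return data


if __name__ == "__main__":
    calls = []

    def slow_fetch():
        # Stand-in for a web service call.
        calls.append(1)
        return {"siteName": "EXAMPLE SITE"}

    with tempfile.TemporaryDirectory() as cache_dir:
        cached_call(cache_dir, "USGS:10109000", slow_fetch)
        cached_call(cache_dir, "USGS:10109000", slow_fetch)
    print(len(calls))  # the second request is served from the cache
```

A cache like this helps with speed and short outages, but the cached copy can fall out of date, so a real strategy would also need a rule for when to refresh.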
Example 2: Getting the minimum streamflow over the past five
days for a USGS NWIS station using the CUAHSI HIS Web Services
• Use the GetValuesObject method on the CUAHSI HIS WaterOneFlow web services
– http://river.sdsc.edu/wateroneflow/NWIS/UnitValues.asmx?op=GetValuesObject
• As in Example 1, use the suds Client object to call the web service method.
• We will use siteCode = USGS:10109000 and ParameterCode = USGS:00060.
• DateTimes should be in the format YYYY-MM-DD.
• We will extract the values and dateTimes and create a Pandas Series object to
store the time series.
• We will use the min and idxmin methods on the Series object to get the minimum
streamflow and the datetime when the minimum streamflow occurred.
– http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Series.min.html
– http://pandas.pydata.org/pandas-docs/dev/generated/
• I will walk through this in class, but you can download an up-to-date version of this
code here.
• Note: This works with Pandas version 0.14.1 and with version 0.15.0, but may
cause errors with earlier versions of Pandas. If you get an error, check your version
of Pandas in the Canopy Package Manager and upgrade the version of Pandas if
needed.
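The Example 2 steps might look roughly like the sketch below. It is not the class code. Assumptions: GetValuesObject takes (site code, variable code, start date, end date, authentication token); the parsed WaterML 1.0 response exposes the records at response.timeSeries.values.value; and each record carries its number in .value and its timestamp in ._dateTime (suds prefixes XML attributes with an underscore). The function names are hypothetical, and the demo uses made-up stand-in records rather than real streamflow.

```python
from datetime import date, timedelta
from types import SimpleNamespace

import pandas as pd


def date_window(days, today=None):
    """Return (start, end) as YYYY-MM-DD strings covering the past `days` days."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return start.isoformat(), today.isoformat()


def fetch_value_records(wsdl_url, site_code, variable_code, days=5):
    """Call GetValuesObject and return the list of value records.

    Needs the suds package and a reachable service.
    """
    from suds.client import Client  # imported here so the rest runs without suds

    start, end = date_window(days)
    client = Client(wsdl_url)
    response = client.service.GetValuesObject(site_code, variable_code, start, end, "")
    # Assumed WaterML 1.0 layout.
    return response.timeSeries.values.value


def records_to_series(records):
    """Build a pandas Series of floats indexed by timestamp."""
    index = pd.to_datetime([rec._dateTime for rec in records])
    data = [float(rec.value) for rec in records]
    return pd.Series(data, index=index)


if __name__ == "__main__":
    # Stand-in records (illustrative numbers only).
    records = [
        SimpleNamespace(value="80.5", _dateTime="2014-10-01T00:00:00"),
        SimpleNamespace(value="75.0", _dateTime="2014-10-01T00:15:00"),
        SimpleNamespace(value="90.25", _dateTime="2014-10-01T00:30:00"),
    ]
    series = records_to_series(records)
    print("minimum:", series.min())
    print("occurred at:", series.idxmin())
```

Converting each value with float() keeps the sketch working whether suds hands back numbers or text for the value field.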
Using these principles for other
WaterOneFlow Web services
• CUAHSI HIS Central lists other available web
services that can be accessed in a similar way
– http://hiscentral.cuahsi.org/
– There are over 400 billion observations available
through HIS Central!
• The HydroServers you created last week with Dr.
Ames can also be accessed using this same
approach.
– If you would like to do this for your class project, get
in touch with Dr. Ames.
Challenge problems …
Coding Challenges
• For the same station (USGS:10109000), modify
the example 2 script so that it prints the daily
min, max, and average streamflow for the past 5
days.
• (time permitting) Modify your script so that it
prints out the min, max, and average streamflow
for EACH DAY during the past 5 days. This is
part of what you will need to do for Assignment
8.
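As a starting hint for the per-day challenge, one possible approach (among several) is to group a time-indexed Series by calendar day. The sketch below assumes the Series has a DatetimeIndex, as in Example 2; the function name daily_stats is hypothetical, and it assumes a reasonably recent version of Pandas.

```python
import pandas as pd


def daily_stats(series):
    """Return a table of min, max, and mean for each calendar day in the series."""
    # normalize() sets every timestamp to midnight, so all readings
    # from the same day share one group key.
    days = series.index.normalize()
    return series.groupby(days).agg(["min", "max", "mean"])


if __name__ == "__main__":
    # Illustrative numbers only.
    index = pd.to_datetime(
        ["2014-10-01 00:00", "2014-10-01 12:00", "2014-10-02 00:00", "2014-10-02 12:00"]
    )
    series = pd.Series([1.0, 3.0, 10.0, 20.0], index=index)
    print(daily_stats(series))
```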
Wrap up …
Further reading
• Details on the CUAHSI HIS web services are available here:
http://his.cuahsi.org/wofws.html
– Read about WaterOneFlow (the WSDL) to learn more about the
methods
– Read about WaterML version 1.0 to learn more about the structure of
the responses
• The WaterML example responses are particularly helpful for seeing the structure
of the response
– Note that some services will return WaterML in version 1.1 format.
The code shown in this lecture may require slight modification to work
with these services.
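One kind of "slight modification" can be sketched as below. It assumes that the main difference seen from suds is that WaterML 1.1 allows several timeSeries and values elements, so those attributes arrive as lists, while a WaterML 1.0 response has single objects. Both that assumption and the helper names are hypothetical; check them against a real response from the service in question.

```python
from types import SimpleNamespace


def first_if_list(obj):
    """Return the first item when given a list or tuple, else the object itself."""
    return obj[0] if isinstance(obj, (list, tuple)) else obj


def value_records(response):
    """Return the value records from either response shape (assumed layouts)."""
    series = first_if_list(response.timeSeries)
    values = first_if_list(series.values)
    return values.value


if __name__ == "__main__":
    record = SimpleNamespace(value="75.0", _dateTime="2014-10-01T00:15:00")
    # Stand-in for the assumed WaterML 1.0 shape: single objects.
    v10 = SimpleNamespace(
        timeSeries=SimpleNamespace(values=SimpleNamespace(value=[record]))
    )
    # Stand-in for the assumed WaterML 1.1 shape: lists of objects.
    v11 = SimpleNamespace(
        timeSeries=[SimpleNamespace(values=[SimpleNamespace(value=[record])])]
    )
    print(len(value_records(v10)), len(value_records(v11)))
```

Note that this sketch only looks at the first time series in a response; a service that returns several would need a loop instead.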
Thursday’s Class
• Dr. Horsburgh will be leading class.
• Introduce Assignment 8.
• Work on the assignment in class.