Using Python to Retrieve Data
from the CUAHSI HIS Web
Services
Jeffery S. Horsburgh
Hydroinformatics
Fall 2015
Slides adapted from original version by Jon Goodall
University of Virginia, Hydroinformatics, Fall 2014
This work was funded by National Science
Foundation Grants EPS 1135482 and EPS
1208732
Objectives
• Discover and access data from major hydrologic
data sources
• Create reproducible data visualizations
• Write and execute computer code to automate
difficult and repetitive data related tasks
• Manipulate data and transform it across file
systems, flat files, databases, programming
languages, etc.
• Retrieve and use data from Web services
Class Plan
• Introduction (setting this material within context)
• Set up
– Software requirements
• In class demos
– Example 1: Getting the site name for a USGS NWIS station using
the CUAHSI HIS Web Services
– Example 2: Getting the minimum streamflow over the past 5
days for a USGS NWIS station using the CUAHSI HIS Web
Services
– Example 3: Making a plot of the streamflow data for the past 5
days
• Challenge problems
• Wrap up
Big Picture Context
1. Data life cycle
2. Data modeling
3. Database design
4. Database implementation and ODM
5. SQL Querying of an ODM database
6. Python programming against an ODM database
7. Sharing data from an ODM using CUAHSI HIS Web services
(WaterOneFlow) and WaterML
8. Accessing CUAHSI HIS Web services using HydroDesktop
9. This week: Accessing CUAHSI HIS Web services using
Python
Set up …
Required Packages
• “suds” – a package for making requests to
SOAP web services
(https://fedorahosted.org/suds/)
• “pandas” – A data analysis library with high
performance data structures
(http://pandas.pydata.org/)
• “matplotlib” – A package for scientific plotting
(http://matplotlib.org/)
What is the “suds” package?
• “Suds is a lightweight SOAP Python client for
consuming Web Services.”
https://fedorahosted.org/suds/
• SOAP and WSDL are standards for creating web
services. You don’t need to know the details
behind these standards, but if you are interested,
Wikipedia has a good summary of both:
– SOAP: http://en.wikipedia.org/wiki/SOAP
– WSDL: http://en.wikipedia.org/wiki/Web_Services_Description_Language
What is the “pandas” package?
• pandas is an open source library providing high
performance data structures
• Some pandas data structures you may be
interested in:
– Series – a one-dimensional, labeled array capable of
holding any data type – axis labels are collectively
referred to as the “index”
– DataFrame – a 2-dimensional data structure with
columns of potentially different types (essentially a
high performance table object)
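The two structures described above can be sketched with a few made-up streamflow readings. The values, timestamps, and column names below are illustrative assumptions, not data from the web service.

```python
import pandas as pd

# Hypothetical readings: three daily streamflow values (made-up numbers).
times = pd.to_datetime(["2015-10-27", "2015-10-28", "2015-10-29"])

# A Series is a one-dimensional labeled array; here the labels (the "index")
# are timestamps.
flow = pd.Series([102.0, 98.5, 101.2], index=times, name="streamflow_cfs")
print(flow.index)

# A DataFrame is a table whose columns can hold different types.
# "quality_code" is a hypothetical text column added for illustration.
table = pd.DataFrame({
    "streamflow_cfs": flow,
    "quality_code": ["P", "P", "A"],
})
print(table)
```

Both objects share the same timestamp index, which is what makes time series operations such as min() and plot() convenient later on.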
In class examples …
Example 1: Get the site name for a USGS NWIS gage
station using the CUAHSI HIS Web Services
• Use the GetSiteInfoObject method on the CUAHSI HIS WaterOneFlow
USGS Unit Values web service:
– http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx
• Use the suds Client object to call the web service method
– We will use siteCode = “NWISUV:10109000”
– Suds will automatically parse the WaterML response from the web
service call.
– We will need to find the siteName property in the response and print
it to the console.
• The answer for “NWISUV:10109000” is:
LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT
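The steps in Example 1 might look roughly like the sketch below. It assumes the WSDL is published at the service URL with "?WSDL" appended, that GetSiteInfoObject takes the site code plus an empty authToken, and that suds exposes the parsed WaterML as response.site[0].siteInfo.siteName. The function names are hypothetical, and the live call is wrapped in a function so that nothing touches the network until you call it.

```python
from types import SimpleNamespace

# Assumption: the WSDL lives at the service URL plus "?WSDL".
WSDL_URL = "http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx?WSDL"


def get_site_info(site_code, wsdl_url=WSDL_URL):
    """Call GetSiteInfoObject on the live service (needs suds and a network)."""
    # Imported here so the rest of the sketch runs without suds installed.
    from suds.client import Client

    client = Client(wsdl_url)
    # Second argument is the authToken; an empty string is assumed to work.
    return client.service.GetSiteInfoObject(site_code, "")


def extract_site_name(response):
    """Pull siteName out of the parsed response (assumed attribute path)."""
    return response.site[0].siteInfo.siteName


# Offline stand-in with the same assumed shape as the suds response,
# so the extraction step can be tried without the web service.
fake_response = SimpleNamespace(
    site=[SimpleNamespace(
        siteInfo=SimpleNamespace(
            siteName="LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT"))]
)
print(extract_site_name(fake_response))
```

Against the live service, the equivalent call would be print(extract_site_name(get_site_info("NWISUV:10109000"))). If the attribute path differs, printing the whole response shows the structure suds actually built.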
Example 2: Getting the minimum streamflow over the
past five days for a USGS NWIS station using the
CUAHSI HIS Web Services
• Use the GetValuesObject method on the CUAHSI HIS WaterOneFlow web services
– http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx
• Like example 1, use the suds Client object to call the web service method
• We will use siteCode = “NWISUV:10109000” and ParameterCode = “NWISUV:00060”.
DateTimes should be in the format “YYYY-MM-DD”.
– Example GetValuesObject web service call in a browser that returns a WaterML file:
http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx/GetValuesObject?location=NWISUV:10109000&variable=NWISUV:00060&startDate=2015-10-20&endDate=2015-10-31&authToken=
• We will extract the values and dateTimes and create a pandas Series object to
store the time series
• We will use the min() and idxmin() methods on the Series object to get the
minimum streamflow and datetime when the minimum streamflow occurred.
– http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.min.html
• Note: I tested this with Pandas version 0.17.0. If you get an error, check your
version of Pandas in the PyCharm Package Manager and upgrade the version of
Pandas if needed.
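A minimal sketch of the Example 2 workflow follows. It assumes GetValuesObject takes (location, variable, startDate, endDate, authToken) in that order, as the browser URL suggests, and that suds exposes each reading as an object with a value attribute and a _dateTime attribute under response.timeSeries[0].values[0].value. The helper names are hypothetical, and the demo at the bottom uses made-up readings so it runs offline.

```python
from datetime import date, timedelta
from types import SimpleNamespace

import pandas as pd

# Assumption: the WSDL lives at the service URL plus "?WSDL".
WSDL_URL = "http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx?WSDL"


def last_n_days(n, today=None):
    """Return (start, end) as YYYY-MM-DD strings covering the past n days."""
    end = today or date.today()
    start = end - timedelta(days=n)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


def get_values(site_code, variable_code, start_date, end_date,
               wsdl_url=WSDL_URL):
    """Call GetValuesObject on the live service (needs suds and a network)."""
    from suds.client import Client  # imported lazily; see lead-in

    client = Client(wsdl_url)
    return client.service.GetValuesObject(
        site_code, variable_code, start_date, end_date, "")


def response_to_series(response):
    """Build a time-indexed Series from the assumed response structure."""
    records = response.timeSeries[0].values[0].value
    flows = [float(r.value) for r in records]
    times = pd.to_datetime([r._dateTime for r in records])
    return pd.Series(flows, index=times)


# Offline stand-in shaped like the assumed suds response (made-up readings).
fake_records = [
    SimpleNamespace(value="98.0", _dateTime="2015-10-27T00:00:00"),
    SimpleNamespace(value="95.0", _dateTime="2015-10-27T00:15:00"),
    SimpleNamespace(value="97.5", _dateTime="2015-10-27T00:30:00"),
]
fake_response = SimpleNamespace(
    timeSeries=[SimpleNamespace(values=[SimpleNamespace(value=fake_records)])]
)

demo_series = response_to_series(fake_response)
print("Minimum streamflow:", demo_series.min())
print("Occurred at:", demo_series.idxmin())
```

With the live service, the series would come from something like start, end = last_n_days(5) followed by response_to_series(get_values("NWISUV:10109000", "NWISUV:00060", start, end)).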
Example 3: Create a Time Series Plot of
Streamflow Values for the Past 5 Days
• Use the GetValuesObject method on the CUAHSI HIS WaterOneFlow web services
– http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx
• Like examples 1 and 2, use the suds Client object to call the web
service method
• We will use SiteCode = NWISUV:10109000 and ParameterCode = NWISUV:00060.
DateTimes should be in the format YYYY-MM-DD.
• We will extract the values and dateTimes and create a pandas Series object to
store the time series.
• We will create a “figure” object within which we can create our time series plot
– http://matplotlib.org/api/figure_api.html
• We will use the plot() method on the pandas Series object to plot the time series.
– http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.plot.html
• Note: I tested this with Pandas version 0.17.0. If you get an error, check your
version of Pandas in the PyCharm Package Manager and upgrade the version of
Pandas if needed.
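The plotting step in Example 3 could be sketched as below. The series here is synthetic (made-up values at 15-minute spacing) so the sketch runs without the web service. The function name, figure size, and axis labels are illustrative choices, and the "cfs" unit assumes parameter 00060 is discharge in cubic feet per second.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd


def plot_streamflow(series, title):
    """Create a figure and draw the time series on a single set of axes."""
    fig = plt.figure(figsize=(8, 4))
    ax = fig.add_subplot(1, 1, 1)
    series.plot(ax=ax)  # pandas draws the Series on the axes we created
    ax.set_xlabel("Date")
    ax.set_ylabel("Streamflow (cfs)")
    ax.set_title(title)
    fig.tight_layout()
    return fig, ax


# Synthetic stand-in for five days of 15-minute readings (made-up values).
index = pd.date_range("2015-10-27", periods=5 * 96, freq="15min")
values = [100.0 + (i % 96) * 0.1 for i in range(len(index))]
demo_series = pd.Series(values, index=index)

fig, ax = plot_streamflow(demo_series, "Synthetic streamflow, past 5 days")
```

To keep the result, call fig.savefig("streamflow.png"), or remove the Agg backend line and call plt.show() to view it interactively.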
Using these principles for other
WaterOneFlow Web services
• CUAHSI HIS Central lists other available web
services that can be accessed in a similar way
– http://hiscentral.cuahsi.org/
– There are over 400 billion observations available
through HIS Central!
• The HydroServers you created last week with
Dr. Ames can also be accessed using this same
approach
Challenge problems …
Coding Challenges
• For the same station (NWISUV:10109000), modify
the example 2 script so that it prints the daily
min, max, and average streamflow for the past 5
days.
• (time permitting) Modify your script so that it
prints out the min, max, and average streamflow
for EACH DAY during the past 5 days. This is
part of what you will need to do for Assignment
8.
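One possible starting point for these challenges, shown on made-up readings rather than the real station data, is to compute summary statistics on the whole Series and then resample by day. The resample("D").agg(...) form below assumes a reasonably recent pandas; older versions such as 0.17 used a different resample syntax.

```python
import pandas as pd

# Made-up readings: three per day over two days.
times = pd.to_datetime([
    "2015-10-27 00:00", "2015-10-27 08:00", "2015-10-27 16:00",
    "2015-10-28 00:00", "2015-10-28 08:00", "2015-10-28 16:00",
])
series = pd.Series([10.0, 12.0, 14.0, 20.0, 19.0, 21.0], index=times)

# Statistics over the whole period.
overall = {
    "min": series.min(),
    "max": series.max(),
    "mean": series.mean(),
}
print(overall)

# Statistics for each calendar day: one row per day, one column per statistic.
daily = series.resample("D").agg(["min", "max", "mean"])
print(daily)
```

Swapping the made-up Series for the one built from the web service response should give the station's actual statistics.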
Wrap up …
Pros and Cons of using Web Services
vs. a local ODM database
• Pros:
– Access to the entire database on which the service is
based without the need to store the data locally
– No need to keep a local copy in sync with the USGS version
• Cons:
– Requires Internet connection
– Speed: Getting data via web services will almost certainly
be slower than getting data from your own database
– Data is outside your control: breaking changes, unavailable
services, etc.
– (Some of these could be improved with a data caching
strategy)
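As one illustration of the caching idea, the sketch below stores each result in a local JSON file keyed by the request parameters, so a repeated request is answered from disk instead of the web service. Everything here is hypothetical: cached_fetch and fake_fetch are made-up names, the fake fetch stands in for a real web service call, and a real strategy would also need some rule for expiring stale files.

```python
import hashlib
import json
import os
import tempfile


def cached_fetch(fetch, params, cache_dir):
    """Return fetch(**params), reusing a JSON file on disk when one exists."""
    # Build a stable file name from the request parameters.
    key = hashlib.sha256(
        json.dumps(params, sort_keys=True).encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".json")

    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # cache hit: no web service call

    result = fetch(**params)  # cache miss: call the (slow) service
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(result, f)
    return result


# Stand-in for a web service call; counts how often it is really invoked.
calls = {"count": 0}


def fake_fetch(site_code, variable_code, start_date, end_date):
    calls["count"] += 1
    return [["2015-10-27T00:00:00", 98.0], ["2015-10-27T00:15:00", 95.0]]


cache_dir = tempfile.mkdtemp()
params = {
    "site_code": "NWISUV:10109000",
    "variable_code": "NWISUV:00060",
    "start_date": "2015-10-26",
    "end_date": "2015-10-31",
}
first = cached_fetch(fake_fetch, params, cache_dir)
second = cached_fetch(fake_fetch, params, cache_dir)
print(calls["count"])  # 1: the second request was served from the cache
```

A cache like this helps with speed and with short service outages, but it reintroduces the problem of keeping local data in sync, so it is a trade-off rather than a free improvement.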
Summary
• You can use Python to automate the retrieval
of hydrologic data via web services
• The “suds” package enables you to retrieve
data as Python objects
• “pandas” has some nice data structures that
make analysis and visualization easier
• “matplotlib” allows you to make nice plots
Thursday’s Class
• Introduce Assignment 7
• Work on the assignment in class