snotelqc0305 - PRISM Climate Group
A Probabilistic-Spatial Approach to the
Quality Control of Climate Observations
Christopher Daly, Wayne Gibson, Matthew Doggett,
Joseph Smith, and George Taylor
Spatial Climate Analysis Service
Oregon State University
Corvallis, Oregon, USA
Traditional QC Systems are
Categorical and Deterministic
Designed to Work With
Human Observing Systems
• Data subjected to categorical quality checks
– Designed to uncover mistakes
• Validity determined from test results
– Mistake = flag / toss
– No mistake = no flag / keep
Alien Electronic Devices are Invading the
Climate Observing World!
They’re Everywhere!
Electronic Sensors and Modern Applications
Create Challenges for Traditional QC Systems
Situation:
• Errors tend to be continuous drift, rather than categorical mistakes
• Increasing usage of computer applications that rely on climate observations

Need:
• Continuous estimates, rather than categorical tests, of observation validity
• Quantitative estimates of observational uncertainty, not just flags
More Challenges…
Situation:
• Range of applications is increasing rapidly, and each has a different tolerance for outliers
• Data are often more voluminous and disseminated in a more timely manner

Need:
• Probabilistic information from which a decision to use an obs can be made, not an upfront decision
• Automated QC methods
An Opportunity
Advances in climate mapping
technology now make it possible
to estimate a reasonably
accurate “expected value” for an
observation based on
surrounding stations.
Assumption: Spatial consistency is
related to observation validity
Useful Characteristics for a Next-Generation
Climate QC System
continuous
quantitative
probabilistic
automated
spatial
PRISM Probabilistic-Spatial QC (PSQC) System
for SNOTEL Data
Uses climate mapping technology and climate statistics to provide a
continuous, quantitative confidence probability for each observation,
estimate a replacement value, and provide a confidence interval for
that replacement.
• Start with daily max/min temperature for all
SNOTEL sites, period of record
• Move to precipitation, SWE, soil
temperature and moisture
• Develop automated system for near-real
time operation at NRCS
Climatological Grid Development
– PRISM must produce a high-quality estimate of temperature at each SNOTEL station each day
– Highest interpolation skill obtained by using a high-quality predictive grid that represents the long-term climatological temperature for that day, rather than a digital elevation grid
– Climatological grid: 0.8 km resolution, 1971-2000
[Figure: Oregon annual climatology shown at 4 km and 0.8 km grid resolutions]
Leveraging Information Content of High-Quality Climatologies to Create New Maps with Fewer Data and Less Effort
[Figure: precipitation climatology example]
Climatology used in place of DEM as PRISM predictor grid
PRISM Regression of “Weather vs Climate”: PRISM Results
[Figure: scatter plot of 20 July 2000 daily maximum temperature (C) vs. 1971-2000 mean July maximum temperature, with fitted regression line, for stations 21D12S, 21D35S, 21D13S, 353402, 21D08S, 5211C70E, 324045CC, 3240335C, and 21D14S.
Example regression – Stn: 21D12S, Date: 2000-07-20, Climate: 21.53, Obs: 26.0, Prediction: 25.75, Slope: 1.4, Y-Intercept: -4.37]
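The regression example in the figure can be checked arithmetically. This is a minimal sketch (not the operational PRISM code) that recomputes the prediction from the slope, intercept, and climatological predictor shown on the slide:

```python
# Values read from the slide (station 21D12S, 2000-07-20).
slope = 1.4          # fitted regression slope
intercept = -4.37    # fitted y-intercept
climate = 21.53      # 1971-2000 mean July Tmax at the station (C)
obs = 26.0           # observed Tmax on 2000-07-20 (C)

# Daily prediction from the climatological predictor.
prediction = slope * climate + intercept   # ~25.77, matching the slide's 25.75
residual = prediction - obs                # small residual -> spatially consistent
```

The small difference from the slide's printed 25.75 reflects rounding of the displayed slope and intercept.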
PRISM
Parameter-elevation Regressions on Independent Slopes Model
– Generates gridded estimates of climatic parameters
– Moving-window regression of climate vs. elevation for each grid cell
– Uses nearby station observations
– Spatial climate knowledge base (KBS) weights stations in the regression function by their climatological similarity to the target grid cell
PRISM
Parameter-elevation Regressions on Independent Slopes Model
PRISM KBS accounts for spatial variations in climate due to:
– Elevation
– Terrain orientation
– Terrain steepness
– Moisture regime
– Coastal proximity
– Inversion layer
– Long-term climate patterns
PRISM Moving-Window Regression Function
Weighted linear regression
[Figure: 1961-90 mean April precipitation, Qin Ling Mountains, China]
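The moving-window weighted regression can be sketched in a few lines. This is an illustrative stand-in, not the PRISM implementation: the function name and the toy weights are assumptions, with the weights playing the role of the KBS climatological-similarity weights described above:

```python
import numpy as np

def weighted_regression_prediction(climo, obs, weights, target_climo):
    """Weighted least-squares fit of daily observations against the
    climatological predictor value at each station, evaluated at the
    target grid cell's climatology.  `weights` stands in for the PRISM
    knowledge-base weights (climatological similarity to the target)."""
    climo = np.asarray(climo, dtype=float)
    obs = np.asarray(obs, dtype=float)
    W = np.diag(np.asarray(weights, dtype=float))
    X = np.column_stack([np.ones_like(climo), climo])
    # Solve the weighted normal equations (X'WX) beta = X'W y.
    intercept, slope = np.linalg.solve(X.T @ W @ X, X.T @ W @ obs)
    return intercept + slope * target_climo

# Toy example: three neighboring stations, the climatologically most
# similar one carrying the highest weight.
pred = weighted_regression_prediction(
    climo=[20.0, 21.5, 23.0],      # long-term mean Tmax (C)
    obs=[24.0, 26.0, 27.5],        # today's observed Tmax (C)
    weights=[0.2, 1.0, 0.5],       # illustrative KBS-style weights
    target_climo=21.53,
)
```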
Rain Shadows: 1961-90 Mean Annual Precipitation, Oregon Cascades
Dominant PRISM KBS components: elevation, terrain orientation, terrain steepness, moisture regime
[Figure: 1961-90 mean annual precipitation map, Cascade Mtns, OR, USA, with Portland, Eugene, Redmond, Bend, and Sisters labeled; roughly 2500 and 2200 mm/yr on the windward crests near Mt. Hood and Mt. Jefferson vs. about 350 mm/yr near Sisters in the rain shadow]
Coastal Effects: 1971-00 July Maximum Temperature, Central California Coast
Dominant PRISM KBS components: elevation, coastal proximity, inversion layer
[Figure: map with Sacramento, Stockton, San Francisco, Oakland, Fremont, San Jose, Santa Cruz, Monterey, Salinas, and Hollister labeled, showing preferred marine-air trajectories; July maxima range from about 20° at the coast (Monterey) through 27° inland to 34° in the Central Valley]
Inversions: 1971-00 July Minimum Temperature, Northwestern California
Dominant PRISM KBS components: elevation, inversion layer, topographic index, coastal proximity
[Figure: map with Willits, Ukiah, Lake Pillsbury, Cloverdale, and Lakeport labeled; July minimum temperatures range from about 9-12° to 16-17° over short distances across the inversion layer]
PRISM PSQC System: Confidence Probability (CP)
Definition of CP: Given the difference between an observation and an expected value (the residual), CP is the probability that another observation and expected value from the same time of year would differ by at least as much.
[Figure: residual distribution with standard deviation Sx; P marks the two-tailed probability of a residual at least as extreme as X]
+/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)
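Under a normal model for the windowed residuals, the CP defined above is a two-sided z-test p-value. A minimal sketch (the sigma value below is illustrative, not from the slides):

```python
import math

def confidence_probability(residual, mean_resid, sigma):
    """Two-sided z-test p-value: the probability that another residual
    from the same time-of-year window deviates from the window mean by
    at least as much as this one.  `sigma` is the standard deviation of
    the windowed residual distribution (Sx on the slide)."""
    z = abs(residual - mean_resid) / sigma
    return math.erfc(z / math.sqrt(2.0))   # = 2 * P(Z > z) for Z ~ N(0,1)

# A residual at the window mean is maximally consistent (CP = 1);
# a residual 4 sigma out gets a CP near 0.
cp_center = confidence_probability(0.0, 0.0, 1.5)
cp_outlier = confidence_probability(6.0, 0.0, 1.5)
```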
Confidence Probability Takes into Account Uncertainty in the System
X = Residual (P-O)
P-value is higher for a given deviation from the mean when Sx is large (low skill).
[Figure: two residual distributions – low overall skill (wide, large Sx) vs. high overall skill (narrow, small Sx) – showing the same residual X yielding a larger P under the wide distribution]
Interpreting Confidence Probability
Continuous values from 0 – 100%
0% = highly spatially inconsistent observation, reflected in
a PRISM prediction that is unusually different from the
observation
100% = highly consistent observation, reflected in a PRISM
prediction that is relatively close to the observation
Guidelines to date
CP > 30: Use observation as-is
10 < CP < 30: Blend prediction and observation
CP < 10: Use prediction instead of observation
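The guidelines above can be expressed as a small decision function. The linear blend between CP 10 and CP 30 is an illustrative assumption; the slides say only that prediction and observation are blended:

```python
def apply_cp_guidelines(obs, pred, cp):
    """Apply the CP guidelines (cp in percent).  The linear blend
    between CP 10 and CP 30 is an assumption, not the operational
    blending rule."""
    if cp > 30:
        return obs                     # use observation as-is
    if cp < 10:
        return pred                    # use prediction instead
    w = (cp - 10.0) / 20.0             # observation weight: 0 at CP=10, 1 at CP=30
    return w * obs + (1.0 - w) * pred

value = apply_cp_guidelines(obs=26.0, pred=23.5, cp=20)   # halfway blend
```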
PRISM PSQC Process
1. Create Database Records
Goal: Enter daily tmax/tmin observations for all networks into database and prepare data
Current Actions:
1. Ingest daily tmin/tmax observations from SNOTEL, COOP, RAWS, Agrimet, ASOS, and first-order networks.
2. Shift AM COOP observations of tmax to previous day (assumes standard diurnal curve, which does not always apply).
3. Convert units to degrees Celsius.
PRISM PSQC Process
2. Single-Station Checks
Goal: Take all QC actions possible at the single-station level, before entering the spatial QC process.
Current Checks:
1. Temperature observation is well above the all-time record maximum or well below the all-time record minimum for the state – flag set and CP set to 0
2. Maximum temperature is less than the minimum temperature – flag set and CP set to 0
3. First daily tmax/tmin observation after a period of missing data – flag set and CP set to 0 (COOP only?)
4. More than 10 consecutive observations with the same value (<+/-1F COOP, <+/-0.1C others), or more than 5 consecutive zero values, is a definite flatliner – flag set and CP set to 0
5. 5-10 consecutive observations with the same value is a potential flatliner, to be assessed by the spatial QC system – flag set and CP unchanged
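A sketch of checks 1, 2, 4, and 5. The `margin` for "well above/below" the state record and the exact zero-run handling are assumptions, since the slides give no thresholds for them:

```python
def single_station_checks(tmax, tmin, record_max, record_min,
                          recent_values, margin=5.0, tol=0.1):
    """Sketch of single-station checks 1, 2, 4, and 5.  Returns
    (flag, cp): cp=0 is a definite rejection, cp=None defers to the
    spatial QC system.  `margin` and `tol` are illustrative values."""
    # Check 1: far outside the all-time state records.
    if tmax > record_max + margin or tmin < record_min - margin:
        return "record_exceeded", 0
    # Check 2: maximum below minimum.
    if tmax < tmin:
        return "max_lt_min", 0
    # Checks 4-5: length of the trailing run of (near-)identical values.
    run = 1
    for prev, cur in zip(recent_values, recent_values[1:]):
        run = run + 1 if abs(cur - prev) < tol else 1
    if run > 10 or (run > 5 and recent_values[-1] == 0.0):
        return "flatliner", 0                  # definite flatliner
    if 5 <= run <= 10:
        return "potential_flatliner", None     # assessed by spatial QC
    return None, None                          # passes all single-station checks
```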
PRISM PSQC Process
3. Spatial QC System
Goal: Through a series of iterations, gradually and systematically “weed out” spatially inconsistent observations from consistent ones
Overview:
1. PRISM is run for each station location for each day, and summary statistics are accumulated
2. Once all days have been run, frequency distributions are developed and confidence probabilities (CP) for each daily station observation are estimated
3. These CP values are used to weight the daily observations in a second iteration of PRISM daily runs
4. Obs with lower CP values are given lower weight, and thus have less influence, in the second set of PRISM predictions, and are also given lower weight in the calculation of the second set of summary statistics
5. CP values are again calculated and passed back to the daily PRISM runs
6. This iterative process continues for about 5 iterations, at which time the CP values have reached equilibrium
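The iteration above can be sketched as a fixed-point loop. In this toy version the CP-weighted mean of the other stations stands in for the PRISM prediction, and a fixed sigma stands in for the windowed statistics; both are gross simplifications of the real system:

```python
import math

def cp_from_residual(resid, sigma):
    # Two-sided z-test p-value, expressed in percent.
    return 100.0 * math.erfc(abs(resid) / (sigma * math.sqrt(2.0)))

def spatial_qc(obs, n_iterations=5, sigma=2.0):
    """Toy fixed-point version of the spatial QC iteration: CP values
    from one pass down-weight suspect observations in the next."""
    cp = [100.0] * len(obs)            # first pass: all obs weighted equally
    for _ in range(n_iterations):
        new_cp = []
        for i, o in enumerate(obs):
            others = [j for j in range(len(obs)) if j != i]
            wsum = sum(cp[j] for j in others)
            pred = sum(cp[j] * obs[j] for j in others) / wsum
            new_cp.append(cp_from_residual(pred - o, sigma))
        cp = new_cp                    # low-CP obs lose influence next pass
    return cp

# A gross outlier among consistent neighbors converges to CP near 0,
# while the consistent stations recover high CP values.
cp = spatial_qc([10.0, 10.5, 9.8, 25.0])
```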
QC Iteration
For each station-day:
• Run PRISM for each station location in its absence, estimating its obs for each day
• PRISM omits nearby stations, singly and in pairs, to try to better match observation
• Prediction closest to obs is accepted
– Raw PRISM variables: Observation (O), Prediction (P), Residual (R=P-O), PRISM Regression Standard Deviation (S)
Once all station-days are run:
• Calculate summary statistics for each station for each day
– Mean and std dev of O (Os), P (Ps), R (Rs), and S (Ss)
– +/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)
– 5-day running Standard Deviation (RunSD) as a measure of day-to-day variability (time shifting)
– Potential flatliners: calculate V, the ratio of station’s RunSD (set to 0.3) to that of surrounding stations
• Determine “effective” standard deviation for frequency distribution
– Sigma = Max ( Rs, S, Ss, RunSD, 2 )
• Calculate probability statistics for O, P, R, S, and V for each day
– Probability statistics are p-values from z-tests
– Residual Probability (RP) used as an estimate of overall Confidence Probability (CP) for an observation
– Except in the case of potential flatliners, where CP = min(RP, VP)
• CP used to weight stations in next iteration
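The effective standard deviation and the flatliner exception translate directly to code; the rationale comment on the floor of 2 is a presumption, not stated on the slide:

```python
def effective_sigma(Rs, S, Ss, RunSD, floor=2.0):
    """Effective standard deviation for the residual frequency
    distribution: Sigma = Max(Rs, S, Ss, RunSD, 2).  The floor of 2
    presumably keeps near-zero windowed variability from producing
    spuriously low CPs."""
    return max(Rs, S, Ss, RunSD, floor)

def observation_cp(rp, vp=None, potential_flatliner=False):
    """Residual Probability (RP) serves as the observation's CP, except
    for potential flatliners, where CP = min(RP, VP), with VP the
    p-value of the variance ratio V."""
    if potential_flatliner and vp is not None:
        return min(rp, vp)
    return rp

sigma = effective_sigma(Rs=1.2, S=1.8, Ss=0.9, RunSD=2.6)   # the max, 2.6
```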
Drifting sensor : MCKENZIE PASS (21E07S)
Observations and CP values, Date: 1996-02-08
Drifting sensor : MCKENZIE PASS (21E07S)
Climatology vs Observation and Prediction, Date: 1996-02-08
Warm Bias: SALT CREEK FALLS (22F04S)
Observations, Date: 2000-07-14
Warm Bias: SALT CREEK FALLS (22F04S)
Anomalies and CP values, 7-21 July 2000
14 July
Warm Bias: SALT CREEK FALLS (22F04S)
[Figure: scatter plot of climatology vs. observation, Tmax observations, Date: 2000-07-14, highlighting station 22F04S and the nearby Odell Lake COOP station]
Computing Obstacles
• Computing – currently takes about 60 hours to run PRISM PSQC
system for SNOTEL sites in the western US
– 14-processor cluster
• Disk space – we now have > 1 TB, but will probably need more
• Funds are insufficient to “do it right”
Issues to Consider
• How far can the assumption be taken that spatial
consistency equates with validity?
• Are continuous and probabilistic QC systems useful
for manual observing systems?
• Can a high-quality QC system ever be completely
automated?