Project Plan Task 8
and
VERSUS2 Installation problems
Anatoly Myravyev and Anastasia Bundel,
Hydrometcenter of Russia
March 2010
Task 8: Statistical features like
confidence intervals and the
Bootstrap method
Formal definition of confidence
intervals (CIs):
• Estimation concerns an unknown value θ that defines
the distribution P of a random sample X from the
population 𝒫 = {P}.
• If for a given α > 0 there exist random
variables θ± = θ±(α, X) such that P(θ– < θ < θ+) ≥ 1 – α,
then the interval (θ–, θ+) is called the confidence
interval for θ of level 1 – α.
• The interval is random and contains, with probability
at least 1 – α, the unknown value θ, which is not random.
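The definition can be checked by simulation. The sketch below is only an illustration in Python (standard library), not part of the original slides: it assumes normally distributed data with a known true mean `theta`, builds a normal-approximation interval from each random sample, and counts how often the random interval covers the fixed value. The function names, sample size and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_ci(sample, alpha):
    """Normal-approximation CI of level 1 - alpha for the mean of `sample`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(theta, alpha, n=50, trials=2000, seed=1):
    """Fraction of random intervals that contain the fixed value theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        lo, hi = normal_ci(sample, alpha)
        hits += lo < theta < hi  # True counts as 1
    return hits / trials

cov = coverage(theta=2.0, alpha=0.05)
print(f"empirical coverage: {cov:.3f} (nominal level 0.95)")
```

The empirical coverage should come out close to the nominal level 1 – α; it is the interval that varies from sample to sample, not θ.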
The statistical problem lies
in the construction of CIs
• Cases with known probability distribution
function of the population:
parametric CIs
• Cases where the pdf is not known:
non-parametric CIs
Parametric CIs
• Normal distribution assumption is most frequent.
The underlying sample must be an iid-sample
(independent and identically distributed).
• Pluses:
– Easy and not computer-intensive
• Minuses:
– Scores with non-normal distributions (proportions,
odds ratio, correlation coefficients, …) either need
some normalization first or require complicated
calculation formulas
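As an example of such a normalization, a correlation coefficient is commonly handled with the Fisher z-transform: the interval is built on the atanh scale, where the estimate is approximately normal with standard error 1/sqrt(n – 3), and then mapped back with tanh. The Python sketch below assumes an iid bivariate-normal sample; the function name and the input values are illustrative only.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, alpha=0.05):
    """Approximate CI of level 1 - alpha for a correlation coefficient.

    Uses the Fisher z-transform; assumes an iid bivariate-normal sample
    of size n > 3.
    """
    z = math.atanh(r)                      # transform to the z scale
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

r_lo, r_hi = correlation_ci(r=0.6, n=50)
print(f"95% CI for r = 0.6, n = 50: ({r_lo:.3f}, {r_hi:.3f})")
```

The resulting interval is not symmetric about r, which is exactly why a plain normal interval on the original scale would be inappropriate here.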
Non-parametric CIs
• Construction of artificial datasets from a given
collection of real data by resampling the
observations.
• Pluses:
– Highly adaptable to different testing situations because
no assumptions regarding an underlying theoretical
distribution of data are required
– Computational ease
• Minuses:
– The assumptions for sample statistics must not be
overlooked: representativeness, iid
Bootstrapping
• Operates by constructing the artificial data
using sampling with replacement from the
original data (Efron 1979, Wasserman
2006)
• Well-developed computational technique
(R project)
• The most common and popular
resampling method in verification (Wilks
1995)
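Sampling with replacement can be sketched in a few lines. The example below is a plain-Python illustration (the R boot package is not used here); `bootstrap_replicates` and the sample of forecast errors are made up for the purpose of the sketch.

```python
import random
from statistics import mean

def bootstrap_replicates(data, statistic, n_boot=1000, seed=0):
    """Return n_boot values of `statistic`, each computed on a resample.

    Every resample has the same size as `data` and is drawn with
    replacement, so some observations repeat and others are left out.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(statistic(resample))
    return reps

# Hypothetical forecast errors, for illustration only.
errors = [0.3, -1.2, 0.8, 2.1, -0.4, 1.5, 0.0, -0.9, 1.1, 0.6]
reps = bootstrap_replicates(errors, mean)
print(f"{len(reps)} replicates, from {min(reps):.2f} to {max(reps):.2f}")
```

The spread of the replicates is what the different CI constructions then summarize.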
Different bootstrap methods – how to
construct CIs from the samples obtained
• Percentile CIs (used at present in the MET package)
• Bias-corrected CIs (BCa)
• Normal approximation CIs
• Basic bootstrap CIs
• Bootstrap-t CIs
• Approximate bootstrap CIs (ABC), etc.
• A compromise between their accuracy and
computational burden must be made.
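To make the difference between the constructions concrete, the Python sketch below derives three of the simpler intervals (percentile, basic, normal approximation) from one set of bootstrap replicates. It is a simplified illustration under stated assumptions: quantiles are taken by plain index lookup, the normal interval applies no bias correction, and the data are invented. BCa, bootstrap-t and ABC need more machinery and are not shown.

```python
import random
from statistics import NormalDist, mean, stdev

def resample_stat(data, statistic, n_boot, rng):
    """Sorted bootstrap replicates of `statistic` (sampling with replacement)."""
    n = len(data)
    return sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))

def percentile_ci(sorted_reps, alpha):
    """Percentile CI: empirical alpha/2 and 1 - alpha/2 quantiles."""
    k = round(alpha / 2 * len(sorted_reps))
    return sorted_reps[k], sorted_reps[-1 - k]

def basic_ci(sorted_reps, estimate, alpha):
    """Basic CI: percentile limits reflected about the sample estimate."""
    lo, hi = percentile_ci(sorted_reps, alpha)
    return 2 * estimate - hi, 2 * estimate - lo

def normal_ci(sorted_reps, estimate, alpha):
    """Normal approximation with the bootstrap standard error (no bias correction)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = stdev(sorted_reps)
    return estimate - z * se, estimate + z * se

# Invented, right-skewed sample, for illustration only.
data = [0.1, 0.2, 0.2, 0.4, 0.5, 0.7, 0.9, 1.3, 1.8, 2.6, 3.9, 6.0]
est = mean(data)
reps = resample_stat(data, mean, 2000, random.Random(42))

p_ci = percentile_ci(reps, 0.05)
b_ci = basic_ci(reps, est, 0.05)
n_ci = normal_ci(reps, est, 0.05)
for name, ci in [("percentile", p_ci), ("basic", b_ci), ("normal", n_ci)]:
    print(f"{name:>10}: ({ci[0]:.3f}, {ci[1]:.3f})")
```

All three reuse the same replicates, so their cost is dominated by the resampling step; the more accurate methods add further computation on top.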
Implementation of CIs using
R package boot
• boot is one of the packages required by the R
verification package
• The intention is to introduce commands
analogous to the MySQL v_index table in a
form like
• index_booted <- boot(index(fcs, obs), 1000)
• index_ci <- boot.ci(index_booted,
conf = c(0.95, 0.99),
type = c("perc", "bca"))
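The intended workflow can be mimicked outside R to show what the two commands would do. The Python sketch below is not the boot package and not VERSUS code: `rmse` stands in for a generic index, `boot_index` and `index_ci` are hypothetical helpers, and the forecast/observation values are invented. Forecast and observation are resampled as pairs so that each forecast stays matched with its own observation, and only percentile intervals are computed.

```python
import math
import random

def rmse(fcs, obs):
    """Root-mean-square error, used here as an example index."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fcs, obs)) / len(fcs))

def boot_index(index, fcs, obs, n_boot=1000, seed=0):
    """Sorted bootstrap replicates of index(fcs, obs), resampling pairs."""
    rng = random.Random(seed)
    n = len(fcs)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for both series
        reps.append(index([fcs[i] for i in idx], [obs[i] for i in idx]))
    return sorted(reps)

def index_ci(sorted_reps, conf=(0.95, 0.99)):
    """Percentile CIs for each requested confidence level."""
    out = {}
    for level in conf:
        k = round((1 - level) / 2 * len(sorted_reps))
        out[level] = (sorted_reps[k], sorted_reps[-1 - k])
    return out

# Invented forecast/observation pairs, for illustration only.
fcs = [12.1, 14.0, 9.8, 11.5, 15.2, 13.3, 10.4, 12.9, 14.8, 11.0]
obs = [11.4, 14.9, 10.5, 11.1, 13.8, 13.0, 11.6, 12.2, 15.5, 10.1]

index_booted = boot_index(rmse, fcs, obs, n_boot=1000)
ci = index_ci(index_booted, conf=(0.95, 0.99))
print(f"RMSE = {rmse(fcs, obs):.3f}")
for level, (lo, hi) in ci.items():
    print(f"{level:.2f} CI: ({lo:.3f}, {hi:.3f})")
```

Note that the pairs are treated as iid here; serially correlated forecast errors would violate that assumption.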
Conclusions
• The accuracy of statistical scores depends
among other things on the following:
– Sampling uncertainty
– Validity of assumptions about representativeness
and iid of the sample
– Observational uncertainty
– Uncertainty in the physical
processes (Gilleland, 2008)
(Bayesian prediction intervals?)
• Different α can be used (e.g. CIs of level
0.95, 0.99, even 0.70, etc) depending on the
scope of analysis
Conclusions (2)
• Since it is unclear which method of CI
construction is "most precise", we should try
several procedures on the real forecast (frc) and
observation (obs) data available. Both parametric and
nonparametric statistics are legitimate (MET
experience!)
• Decisions about what is good and what is bad
should be made on a multi-criteria basis
Problems with VERSUS2 functioning
In the Hydrometcenter of Russia
• Installation in the Red Hat environment
completes without errors
• The new data leave traces in the MySQL
tables and the test (Pirmin-) files are acquired
• However, the data information gets lost
somewhere around the Data Availability tab (Model?
Date Intervals?...)
• A tutorial version of the package with valid
obs and frc data is urgently needed
Thank you for
your attention!