Environmental statistics
Assoc. Prof. Dr. Aleksandar Markoski
Technical Faculty – Bitola
2008
Enviromatics 2008 - Environmental statistics
Introduction
• Statistical analysis of environmental data is an important means of
extracting information on past and current states of ecosystems.
The resulting estimates are known as sample statistics and form a basis for
forecasts of environmental system development.
• Topics of statistical analysis of environmental data are
1. Data analysis for the requirements of environmental
administrations and associations (descriptive statistics,
frequency distributions, averages, variances, error corrections,
significance tests),
2. Data analysis for the requirements of different users such as
companies, farmers and tourists (explanatory statistics,
multivariate statistics, time series analysis),
3. Basic research (regression and correlation analysis,
multivariate statistics, advanced statistical techniques).
Environmental data
• Environmental data are obtained by field samples and/or
laboratory analysis.
• They are observed directly (direct observations) or indirectly
(via calibrated analytical instruments and sensors).
• Summary data are derived from statistics or from a restricted set of
observable indicators.
• Simulated data are obtained from simulation models.
• Measurement errors and outliers have to be removed from
data sets so that they are not taken into account in subsequent data
processing.
Probability distributions of environmental data
• Environmental data series represent the time- and space-varying
behaviour of environmental processes. Some indicators show a long-wave cycle overlaid by short-term variations. Other indicators exhibit
stochastic fluctuations. Some indicators show unique behaviour
with a few peak events.
Statistical measures
• Statistical measures of environmental data are
represented by
– averages,
– variances and
– measures of correlation.
Averages
1. Arithmetic mean: $x^* = \frac{1}{n}\sum_{i=1}^{n} x_i$
2. Empirical median: $\tilde{x}$
3. Empirical mode: $M$
4. Geometric mean: $x^\circ$
5. Weighted arithmetic mean: $x^*_g$
6. Weighted geometric mean: $\lg x^\circ$
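The averages listed above can be sketched in Python as follows. The slide gives a formula only for the arithmetic mean, so the weighted forms below use the standard textbook definitions as an assumption: weights g_i, and the weighted geometric mean computed through base-10 logarithms, which is what the notation lg x° suggests. The data values and weights are hypothetical.

```python
import math
import statistics


def weighted_arithmetic_mean(values, weights):
    """Sum of g_i * x_i divided by the sum of the weights g_i (assumed definition)."""
    return sum(g * x for x, g in zip(values, weights)) / sum(weights)


def weighted_geometric_mean(values, weights):
    """Assumed definition: lg x° = sum(g_i * lg x_i) / sum(g_i), then x° = 10**(lg x°)."""
    log_mean = sum(g * math.log10(x) for x, g in zip(values, weights)) / sum(weights)
    return 10 ** log_mean


# Hypothetical measurements (for example concentrations in mg/L) and weights
x = [2.0, 4.0, 4.0, 8.0]
g = [1, 1, 1, 1]

arithmetic = statistics.mean(x)            # x*
median = statistics.median(x)              # empirical median
mode = statistics.mode(x)                  # empirical mode M
geometric = statistics.geometric_mean(x)   # x° (requires Python 3.8+)
weighted_mean = weighted_arithmetic_mean(x, g)
weighted_geo = weighted_geometric_mean(x, g)
```

With equal weights the weighted forms reduce to the unweighted ones, which gives a quick sanity check on any implementation.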
Variances:
1. Range: $R = x_{\max} - x_{\min}$
2. Empirical variance: $s^2$
3. Empirical standard deviation: $s = \sqrt{s^2}$
4. Empirical coefficient of variation: $v = \frac{s}{x^*} \cdot 100\,\%$
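A minimal sketch of these four measures, assuming the empirical variance uses the usual n − 1 denominator (the slide does not show its formula). The function name `dispersion` and the data are hypothetical.

```python
import statistics


def dispersion(values):
    """Return range R, empirical variance s^2, standard deviation s and
    coefficient of variation v (in percent) for a list of numbers."""
    r = max(values) - min(values)          # range: largest minus smallest value
    s2 = statistics.variance(values)       # sample variance, n - 1 denominator (assumption)
    s = statistics.stdev(values)           # square root of the sample variance
    v = s / statistics.mean(values) * 100  # relative spread in percent
    return r, s2, s, v


# Hypothetical measurement series
data = [2, 4, 4, 4, 5, 5, 7, 9]
r, s2, s, v = dispersion(data)
```

The coefficient of variation is only meaningful for data on a ratio scale with a mean well away from zero.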
Coefficients of correlation
1. Bivariate correlation coefficient $r$
2. Performance index (coefficient of determination): $B = r^2$
3. Multiple correlation coefficient
4. Multiple performance index
5. Spearman’s rank correlation (suitable for small sample sizes; a normal
probability distribution is not required)
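The bivariate coefficient, the performance index and Spearman's rank correlation can be sketched as below. The slide gives no formulas, so this assumes the standard Pearson product-moment definition for r, and computes Spearman's coefficient as the Pearson correlation of average ranks. All function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired observations with a monotone but non-linear relation
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
b = r ** 2                 # performance index (coefficient of determination)
rho = spearman_rho(x, y)
```

For this monotone series the rank correlation is 1 while r stays below 1, which illustrates why the rank-based measure does not depend on a linear relation or a normal distribution.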
Statistical tests
• In sample statistics the characteristics of interest are often
expressed in terms of parameters such as the mean
μ or the variance σ². Other questions arise from comparing two
or more samples; these may be expressed as
differences of means.
• A statistical hypothesis is a statement about the
distribution of a random ecological variable.
• Hypothesis testing consists of comparing statistical
measures called test criteria (or test statistics), computed
from a data sample, with the values these criteria would take
under the assumption that a given hypothesis is correct.
Hypothesis testing
• In hypothesis testing one examines a Null hypothesis H0
against one or more alternative hypotheses H1, H2, …, Hn,
which are stated explicitly or implicitly.
• To reach a decision about the hypothesis, a
significance level α is selected in advance (commonly 0.05, 0.01 or 0.001). The
confidence coefficient ε is given by ε = 1 – α. For hypothesis
testing a test criterion (or test statistic) is set up. If this
statistic falls into the region of acceptance, then the Null
hypothesis cannot be rejected.
• On the other hand, when this statistic falls into the region of
rejection, the Null hypothesis is rejected. If the Null hypothesis is true, the
probability of the test statistic falling into the region of rejection
is equal to α. It is usually expressed as a percentage.
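The link between the significance level and the region of rejection can be checked with a small simulation: when H0 is true, a test at α = 0.05 should reject in roughly 5 % of repeated samples. The sketch below is an illustration under assumptions that are not from the slides: normally distributed data, a one-sample t statistic, sample size n = 10, and the tabulated two-sided critical value 2.262 for 9 degrees of freedom.

```python
import math
import random
import statistics


def t_statistic(sample, mu0):
    """Absolute one-sample t statistic |mean - mu0| / s * sqrt(n)."""
    n = len(sample)
    return abs(statistics.mean(sample) - mu0) / statistics.stdev(sample) * math.sqrt(n)


def rejection_rate(trials=2000, n=10, t_tab=2.262, seed=1):
    """Fraction of samples, drawn with H0 true, whose statistic lands in the
    region of rejection (t >= t_tab)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0, so H0 holds
        if t_statistic(sample, 0.0) >= t_tab:
            rejected += 1
    return rejected / trials


rate = rejection_rate()
```

The observed rate fluctuates around α from run to run; the remaining share of samples, about ε = 1 − α, falls into the region of acceptance.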
Procedure for hypothesis testing
• The Null hypothesis H0 and an alternative hypothesis H1
are formulated.
• The significance level α is selected and the test statistic
is chosen. The region of rejection of the test statistic is determined on the
basis of its probability distribution and the significance level.
• The test statistic is calculated from the data set. The Null hypothesis
is rejected and the alternative hypothesis is accepted when
the value of the test statistic falls into the region of rejection.
• The Null hypothesis is retained (not rejected) if the value of the test statistic
does not fall into the region of rejection.
Example
• From sampled data a mean m was calculated and is
now compared with an expected value K (a fixed number).
• The Null hypothesis H0: m = K is tested against the
alternative hypothesis H1: m ≠ K. The significance level α =
0.05 is selected and the test statistic is chosen:
• $t = \frac{|m - K|}{s}\sqrt{n}$. If the test statistic falls into the region of
acceptance of the Null hypothesis, that is, $t < t_{1-\alpha/2}$ (equivalently, the signed statistic $(m - K)\sqrt{n}/s$ lies between $t_{\alpha/2}$ and $t_{1-\alpha/2}$), H0 cannot be rejected.
• The power of the test depends on the sample size n. The larger
the sample size (the more information is available), the greater
the power of the test.
t-test (Student's t-test)
• The test statistic is $t_{calc} = \frac{|x^* - \mu_0|}{s}\sqrt{n}$, where
– $x^*$ is the sample mean,
– $\mu_0$ is the expected value of the population,
– $s$ is the sample standard deviation,
– $n$ is the sample size.
• Decision: H0 is accepted if $t_{calc} < t_{tab}$, otherwise it is rejected.
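A minimal sketch of this one-sample test in Python. The pH readings and the reference value 7.0 are hypothetical; the table value 2.365 is the two-sided critical value of Student's t for α = 0.05 and n − 1 = 7 degrees of freedom, taken from a standard table rather than computed.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t_calc = |x* - mu0| / s * sqrt(n), with s the sample standard deviation."""
    n = len(sample)
    return abs(statistics.mean(sample) - mu0) / statistics.stdev(sample) * math.sqrt(n)


# Hypothetical pH readings compared with an expected value of 7.0
sample = [7.1, 6.9, 7.4, 7.0, 7.3, 7.2, 6.8, 7.1]
t_tab = 2.365                     # tabulated value for alpha = 0.05, 7 degrees of freedom
t_calc = one_sample_t(sample, mu0=7.0)
h0_retained = t_calc < t_tab      # decision rule: accept H0 if t_calc < t_tab
```

Here the sample mean is 7.1 and s is 0.2, so t_calc is about 1.41 and H0 is retained at the 5 % level.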
Comparison of means (t-test)
• The test statistic is $t = \frac{|x^* - x^{**}|}{s_d}\sqrt{\frac{n^* n^{**}}{n^* + n^{**}}}$, where $x^*$ is the first sample mean, $x^{**}$ the second sample mean, $s^*$ the first standard deviation, $s^{**}$ the second standard deviation, $n^*$ the first sample size, $n^{**}$ the second sample size, and $n^* + n^{**} - 2$ the number of degrees of freedom.
• The pooled standard deviation is $s_d = \sqrt{\frac{(n^* - 1)s^{*2} + (n^{**} - 1)s^{**2}}{n^* + n^{**} - 2}}$.
• Decision: H0 is accepted if $t_{calc} < t_{tab}$, otherwise it is rejected.
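The comparison of two means can be sketched as follows, using the pooled standard deviation and the scaling factor √(n*·n**/(n* + n**)). The function name `pooled_t`, the two samples and the choice of α are hypothetical; the table value 2.306 is the two-sided critical value for α = 0.05 and 8 degrees of freedom from a standard table.

```python
import math
import statistics


def pooled_t(a, b):
    """Return (t_calc, degrees_of_freedom) for two independent samples,
    assuming both populations share the same variance."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # pooled standard deviation s_d
    sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    t = abs(statistics.mean(a) - statistics.mean(b)) / sd * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


# Hypothetical measurements from two sampling sites
site_a = [1, 2, 3, 4, 5]
site_b = [3, 4, 5, 6, 7]
t_calc, dof = pooled_t(site_a, site_b)
t_tab = 2.306                  # tabulated value for alpha = 0.05, 8 degrees of freedom
h0_retained = t_calc < t_tab
```

The pooled form presumes similar variances in both samples, which can be checked beforehand with a comparison of variances.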
Comparison of variances (F-test)
• The test statistic is $F = (s^*/s^{**})^2 \ge 1$, where $s^*$ is the standard deviation of the first sample and $s^{**}$ is the standard deviation of the second sample; the samples are labelled so that the larger variance is in the numerator.
• Decision: H0 is accepted if $F_{calc} < F_{tab}$, otherwise it is rejected.
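A short sketch of the variance comparison. The function orders the two samples so that the larger variance is in the numerator, which keeps F ≥ 1. The function name and data are hypothetical; the table value 6.39 is the upper 5 % point of the F distribution with (4, 4) degrees of freedom from a standard table, and whether a one-sided or two-sided table value is appropriate depends on the question asked.

```python
import statistics


def f_statistic(a, b):
    """Return (F_calc, numerator_dof, denominator_dof) with the larger
    sample variance in the numerator, so that F_calc >= 1."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


# Hypothetical measurement series from two instruments
series_1 = [1, 2, 3, 4, 5]
series_2 = [2, 4, 6, 8, 10]
f_calc, dof_num, dof_den = f_statistic(series_1, series_2)
f_tab = 6.39                   # upper 5 % point of F(4, 4), from a table
h0_retained = f_calc < f_tab
```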
Outlier test (Nalimov test)
• The test statistic is $r = \frac{|x^+ - x^*|}{s}\sqrt{\frac{n}{n-1}}$, where $x^+$ is the suspected outlier, $x^*$ is the sample mean, $s$ is the standard deviation of the sample, and $n$ is the sample size.
• Decision: $x^+$ is accepted as a regular value if $r_{calc} < r_{tab}$, otherwise it is rejected as an outlier.
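The outlier statistic can be computed as in the sketch below. It assumes the mean and standard deviation are taken over the whole sample, including the suspected value, and it leaves the critical value r_tab as a parameter to be looked up in a Nalimov table, since no table values are given here. The function names and data are hypothetical.

```python
import math
import statistics


def nalimov_r(sample, suspect):
    """r_calc = |x+ - x*| / s * sqrt(n / (n - 1)) for a suspected outlier x+.
    Mean and standard deviation are computed over the full sample (assumption)."""
    n = len(sample)
    return (abs(suspect - statistics.mean(sample))
            / statistics.stdev(sample) * math.sqrt(n / (n - 1)))


def is_outlier(sample, suspect, r_tab):
    """Decision rule: the value is treated as an outlier when r_calc >= r_tab."""
    return nalimov_r(sample, suspect) >= r_tab


# Hypothetical series in which the last reading looks suspicious
readings = [10, 10, 10, 10, 20]
r_calc = nalimov_r(readings, suspect=20)
```

For these readings the mean is 12 and s is about 4.47, so r_calc comes out at 2.0; the decision then depends on the tabulated r_tab for the chosen significance level.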
Environmental statistics
The End