Sample Statistics We are interested in describing this population

Statistical Concepts
Basic Principles
An Overview of Today’s Class
What: Inductive inference on characterizing a population
Why: How will doing this allow us to better inventory and
monitor natural resources?
Examples
Relevant Readings: Elzinga pp. 77-85, White et al.
Key points to get out of today’s lecture:
Description of a population based on sampling
Understanding the concept of variation and uncertainty
By the end of today’s lecture/readings you should understand
and be able to define the following terms:
Population parameters
Accuracy/Bias
Sample statistics
Precision
Mean
Standard Error
Variance / Standard Deviation
Confidence Interval
Steps in Conducting an Assessment using
Inventory and Monitoring
1. Develop Problem Statement—may include goals
2. Develop specific objectives
3. Determine important data to collect
4. Determine how to collect and analyze data
Principles of statistics allow us to better plan how to collect the data
AND analyze it; the two work in tandem
5. Collect data
6. Analyze data
7. Assess data in context of objectives
The Relation between Sampling and Statistics
Can you make perfect generalizations
from a sample to the population?
There is uncertainty in inductive inference.
The field of statistics provides techniques for
making inductive inference AND for providing means of
assessing uncertainty.
Why sample?
Inductive inference:
“…process of generalizing to the population from the sample..”
Elzinga, p. 76
Target/Statistical Population
Sample Unit
Individual objects
(in this case, plants)
Elzinga et al. (2001:76)
We are interested in describing this population:
• its total population size
• mean density/quadrat
• variation among plots
At any point in time, these
measures are fixed and a
true value exists.
These descriptive measures
are called?
Population Parameters
The estimates of these parameters
obtained through sampling are called?
Sample Statistics
We are interested in describing this population:
• its total population size
• mean density/quadrat
• variation among plots
How did we obtain the sample statistics?
ALL sample statistics are calculated through an estimator
“An estimator is a mathematical expression that indicates
how to calculate an estimate of a parameter from the sample
data.”
White et al. (1982)
No Way!
You do this all the time!
The Mean (average):
$\bar{y}$
(standard expression, but often
denoted by some other character)
What is the formal estimator you use?
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
What operations does this tell you to perform?
Is $\bar{y}$ a sample statistic or a population parameter?
$\bar{y}$ is a sample statistic that estimates the population mean
$\bar{y}$ = population mean if all N units in the population are sampled (n = N)
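The estimator of the mean can be sketched in a few lines of Python. The quadrat counts below are hypothetical values made up for illustration, not data from the lecture, and `sample_mean` is just an illustrative name.

```python
# Hypothetical plant counts for a tiny "population" of 10 quadrats.
population = [3, 7, 2, 9, 4, 6, 5, 8, 1, 5]

def sample_mean(values):
    """Estimator of the mean: add up the observations, then divide by n."""
    n = len(values)
    return sum(values) / n

sample = population[:4]        # pretend only the first 4 quadrats were sampled
y_bar = sample_mean(sample)    # sample statistic (an estimate)
mu = sample_mean(population)   # every unit measured, so this is the parameter

print(y_bar, mu)
```

With only four quadrats, the estimate differs from the true mean; applied to every unit in the population, the same expression returns the parameter itself.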
Estimating the amount of variability
Why?
Recall:
There is uncertainty in inductive inference.
The field of statistics provides techniques for
making inductive inference AND for providing means of
assessing uncertainty.
Two key reasons for estimating variability:
• a key characteristic of a population
• allows for the estimation of uncertainty of a sample
Think about this conceptually, before mathematically:
Recall Wednesday's lab:
Each group collected
data from five 4 m² plots
Did each group get
identical results?
What characteristic of
the population would
affect the level of similarity
among the groups’ samples?
How about sampling method?
Estimating the Amount of Variation within a Population
The true population standard deviation is a measure of how
similar each individual observation (e.g., number of plants
in a quadrat) is to the true mean
Populations with lots of variability will have a large standard
deviation, whereas those with little variation will have a low value
High or low?
Counts of dock from Wednesday's lab?
What would the standard deviation
be if there were absolutely no variability, that is, if every quadrat in the population
had exactly the same number?
The Computation of the Standard Deviation
• key is to get differences among observations, right?
• so the mean is subtracted from each observation,
consistent with the definition
First, we calculate the population variance
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2$$
Does this make sense?
For the population standard deviation, we take the square root of the variance
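As a rough sketch of this computation, the Python below follows the population variance expression step by step. The quadrat counts are hypothetical and `population_variance` is an illustrative name.

```python
import math

# Hypothetical plant counts for every quadrat in a small population.
population = [3, 7, 2, 9, 4, 6, 5, 8, 1, 5]

def population_variance(values):
    """Average squared difference between each observation and the true mean."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

sigma2 = population_variance(population)   # population variance
sigma = math.sqrt(sigma2)                  # population standard deviation

# A population in which every quadrat holds the same number of plants.
no_variation = population_variance([4, 4, 4, 4])

print(sigma2, sigma, no_variation)
```

The last line illustrates the limiting case: when every quadrat has exactly the same count, every difference from the mean is zero, so the variance and the standard deviation are both zero.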
The Computation of the Standard Deviation
The estimator of the variance (that is, what produces the
sample statistic) simply replaces N with the number of units actually sampled (n),
and the true population mean with the sample mean
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
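The estimator of the standard deviation is simply the square root of the variance.
Because dividing by n underestimates the population variance on average
(a bias that matters most in small samples), n-1 is usually used
rather than n as the divisor in both the variance and the standard deviation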
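The small-sample bias can be checked with a quick simulation. This is a minimal sketch under stated assumptions: the population is made up (normally distributed values with mean 10 and standard deviation 3), samples of n = 5 are drawn with replacement, and the helper name `variance` is illustrative.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 quadrat values (not data from the lecture).
population = [random.gauss(10, 3) for _ in range(10_000)]

def variance(values, divisor):
    """Sum of squared deviations from the mean of `values`, over `divisor`."""
    centre = sum(values) / len(values)
    return sum((x - centre) ** 2 for x in values) / divisor

true_var = variance(population, len(population))  # population variance (divisor N)

n, trials = 5, 20_000
total_n = total_n_minus_1 = 0.0
for _ in range(trials):
    sample = random.choices(population, k=n)    # n draws, with replacement
    total_n += variance(sample, n)              # divisor n
    total_n_minus_1 += variance(sample, n - 1)  # divisor n-1

ratio_n = total_n / trials / true_var
ratio_n_minus_1 = total_n_minus_1 / trials / true_var
print(f"divisor n:   average estimate / true variance = {ratio_n:.2f}")
print(f"divisor n-1: average estimate / true variance = {ratio_n_minus_1:.2f}")
```

Averaged over many samples, the divisor-n estimates should land near (n-1)/n = 0.8 of the true variance, while the divisor-(n-1) estimates should land near 1.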
Where Are We?
We have computed a mean value of a population and a sample
We have computed the variability of a population and a sample
We can now use the variability of the sample to tell us something
about uncertainty, and the way we sampled to tell us something
about accuracy.
Bias vs Precision
Bias (accuracy): Essentially, the “closeness” of a measured
value to its true value; the average performance
of an estimator
Precision:
The “closeness” of repeated measurements
of the same quantity; the repeatability of a
result.
The level of bias is a function of your sampling scheme and
the estimator used. You are in control of this!
Precision is a function of the variance of the population and
how you sample:
• Number of samples
• Variability within samples (so quadrat SIZE and SHAPE
matters) compared to among samples
• analytical techniques
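The distinction between bias and precision can be illustrated with a small simulation. Everything here is assumed for illustration: the population is invented, and the "biased scheme" (an observer who only places quadrats where plants are dense) is a hypothetical example of a poor sampling scheme.

```python
import random
from statistics import mean, stdev

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: plant counts in 2,000 quadrats.
population = [random.randint(0, 30) for _ in range(2_000)]
true_mean = mean(population)

def repeat_estimates(draw_sample, trials=1_000):
    """Sample means obtained by repeating the same sampling scheme many times."""
    return [mean(draw_sample()) for _ in range(trials)]

# Unbiased scheme: simple random samples of 10 or 40 quadrats.
small = repeat_estimates(lambda: random.sample(population, 10))
large = repeat_estimates(lambda: random.sample(population, 40))

# Biased scheme (assumed): only quadrats with at least 10 plants get sampled.
dense_only = [x for x in population if x >= 10]
biased = repeat_estimates(lambda: random.sample(dense_only, 40))

bias_small = mean(small) - true_mean    # average error of the estimator
bias_large = mean(large) - true_mean
bias_dense = mean(biased) - true_mean
spread_small = stdev(small)             # repeatability of the estimates
spread_large = stdev(large)

print(f"bias:   n=10 {bias_small:+.2f}, n=40 {bias_large:+.2f}, dense-only {bias_dense:+.2f}")
print(f"spread: n=10 {spread_small:.2f}, n=40 {spread_large:.2f}")
```

More quadrats should tighten the spread of the estimates (better precision) without changing their average, whereas the dense-only scheme stays off target however many quadrats are used.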
Why do bias and precision matter in
inventory and monitoring of natural resources?
Let's imagine monitoring the density of dock in Ron’s
pasture through time
The effect of sampling variation: a function of precision
All estimates come from the same population
So how “good” are your parameter estimates?
Let's examine this with the estimation of the population mean
What influences the reliability of the estimate of the mean value?
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
Estimating the Reliability of a Sample Mean
Standard error:
the standard deviation of independent sample means
Measures precision from a single sample
(e.g., from a collection of quadrats)
Quantifies the certainty with which the mean computed
from a random sample estimates the true population mean
Estimating the Reliability of a Sample Mean
Formally, the SE is a function of the standard deviation
of the sample (s) and the number of sampling units in it (n)
$SE = s / \sqrt{n}$
Does this make sense?
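One way to see whether the formula makes sense is to compare it with the definition: the standard deviation of many independent sample means. The sketch below assumes an invented population (normally distributed values with mean 20 and standard deviation 6) and samples of 25 quadrats; none of these numbers come from the lecture.

```python
import math
import random
from statistics import mean, stdev

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population of 5,000 quadrat values.
population = [random.gauss(20, 6) for _ in range(5_000)]
n = 25

# Standard error estimated from ONE sample: s / sqrt(n).
one_sample = random.sample(population, n)
se_single = stdev(one_sample) / math.sqrt(n)   # stdev uses the n-1 divisor

# Standard deviation of MANY independent sample means.
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]
sd_of_means = stdev(sample_means)

print(f"SE from a single sample:       {se_single:.2f}")
print(f"SD of 2,000 sample means:      {sd_of_means:.2f}")
```

The two values should be of similar size, which is the point of the standard error: a single collection of quadrats is enough to estimate how much sample means would vary if the sampling were repeated.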
Consider this example:
Communicating the Reliability of a Sample Mean
Confidence Intervals
Provides an estimate of precision around a sample mean
or other estimated parameter
Includes two components:
confidence interval width
confidence level: the probability that the interval
includes the true value
What’s the relation between the two?
Communicating the Reliability of a Sample Mean
Estimating the Confidence Interval
95% CI = Mean +/- 1.96(SE)
Intervals can be computed for any level of confidence
desired in a particular study
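The interval can be computed directly from the mean and the SE. In this sketch the counts are hypothetical, `confidence_interval` is an illustrative name, and the multipliers 1.96 and 1.645 are the normal-distribution values for 95% and 90% confidence; for small samples a Student's t value is commonly substituted, which gives a wider interval.

```python
import math
from statistics import mean, stdev

# Hypothetical counts of dock in 10 quadrats (not data from the lecture).
counts = [12, 15, 9, 20, 14, 11, 17, 13, 16, 13]

def confidence_interval(values, z=1.96):
    """Mean +/- z * SE, using the normal multiplier z (1.96 for 95%)."""
    n = len(values)
    y_bar = mean(values)
    se = stdev(values) / math.sqrt(n)   # stdev uses the n-1 divisor
    return y_bar - z * se, y_bar + z * se

lower_95, upper_95 = confidence_interval(counts)            # 95% level
lower_90, upper_90 = confidence_interval(counts, z=1.645)   # 90% level

print(f"95% CI: {lower_95:.2f} to {upper_95:.2f}")
print(f"90% CI: {lower_90:.2f} to {upper_90:.2f}")
```

Comparing the two intervals shows the trade-off between the two components: for the same data, a lower confidence level gives a narrower interval, and a higher level gives a wider one.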
The interpretation of this chart (p. 76) should now
(or soon!) be clear
How was this computed?
Key points to get out of today’s lecture:
Description of a population based on sampling
Understanding the concept of variation and uncertainty
Ability to define (and understand) the following terms:
Population parameters
Accuracy/Bias
Sample statistics
Precision
Mean
Standard Error
Variance / Standard Deviation
Confidence Interval