Random Samples


What is a Random Sample?
(and what if it’s not?)
©Dr. B. C. Paul 2005
Some Commentary on Random
Samples

We are using mathematical models as
surrogates for a reality we either don’t have data
for or can’t afford to collect data on


We’ve already discussed that we assumed a normal
distribution (the t distribution is just an adaptation
that accounts for uncertainty in the stdev).
What does it mean to say our sample was
random?


1- No one cherry-picked the data set (can be a
problem when visual appearance is different –
humans are born cherry pickers)
2- The value of one sample has no bearing on what the
next sample value will be
When is That Not True?

When taking the sample alters the nature
of the remaining population

Example – Playing Blackjack
When a card is drawn and played, the number of
that particular card left in the deck changes
 Casinos may play with several decks to more
closely approximate a random chance draw
(because the house has an advantage in a random
game)


Casinos also tend to get upset if they find that someone
is trying to recalculate the odds based on what has
been played
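
To put numbers on the card example, here is a minimal Python sketch. The function name ace_probability and the choice of aces as the card to track are illustrative assumptions, not something from the slides. It compares the chance that the next card is an ace before and after one ace has been played, for one deck and for six decks.

```python
from fractions import Fraction

def ace_probability(decks, aces_seen=0, cards_seen=0):
    """Chance that the next card is an ace, given what has been played."""
    aces_left = 4 * decks - aces_seen      # 4 aces per 52-card deck
    cards_left = 52 * decks - cards_seen
    return Fraction(aces_left, cards_left)

for decks in (1, 6):
    before = ace_probability(decks)
    after = ace_probability(decks, aces_seen=1, cards_seen=1)
    # The drop from 'before' to 'after' is how far one draw moved the odds
    print(decks, float(before), float(after), float(before - after))
```

With six decks a single played card moves the odds much less than with one deck, so the draw behaves more like the random case.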
Another Time it is not True

In the presence of spatial correlation



Spatial correlation is commonly seen in Mining Ore
Grade problems and Environmental Engineering
If I take a soil sample and find it loaded with
dioxin what are the chances that a soil sample
taken two inches away will show no dioxin?
Under the random-sample formula for the variance of the
mean, a truck load of ore contains so many little samples
that every truck load of ore should
have the average grade of the deposit – IF
THINGS WERE RANDOM
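
A quick sketch of what the random formula predicts, assuming independent samples. The sample stdev of 1.0 and the sample counts are made-up numbers for illustration only.

```python
import math

def stdev_of_mean(sample_stdev, n):
    """Standard deviation of the mean of n independent random samples."""
    return sample_stdev / math.sqrt(n)

sample_stdev = 1.0  # hypothetical stdev of individual small samples
for n in (1, 100, 10_000, 1_000_000):
    # As n grows, the predicted spread of the truck-load average collapses
    print(n, stdev_of_mean(sample_stdev, n))
```

If a truck load really were a million independent little samples, the formula says truck-load averages would barely vary at all, which is not what happens when grades are spatially correlated.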
Variance of Means with Spatial
Correlation


First, one must define how
correlation is influenced by distance and
direction.
Take the samples and create a
“Semivariogram”

Plot the average half squared difference for all
samples a distance X apart
The Semivariogram
[Figure: semivariogram plot. Vertical axis: ½ squared difference; horizontal axis: distance. A model line is fit to the data points from the samples.]
½ squared difference has the same units as sample variance and levels out at the sample variance
Measures correlation using ½ the squared difference between samples a distance X apart
½ squared difference is named Gamma (symbol – γ)
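
The plotted points can be computed along these lines. This is a minimal sketch that assumes evenly spaced samples along a single line, so "distance X" becomes a whole number of sample spacings. The function name and the grade values are hypothetical.

```python
def semivariogram(values, lag):
    """Average half squared difference for all pairs `lag` spacings apart.

    Assumes evenly spaced samples and 1 <= lag < len(values).
    """
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Hypothetical grades from samples taken at equal spacing along a line
grades = [1.0, 1.2, 1.1, 1.5, 1.9, 2.0, 1.7, 1.4, 1.0, 0.9]
for lag in range(1, 5):
    print(lag, semivariogram(grades, lag))  # one plotted point per distance
```

Real data sets are scattered in two or three dimensions, so in practice pairs are grouped by distance and direction rather than by a fixed index offset.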
Variance of Means
[Figure: an itsy bitsy sample, of the size used to plot the semivariogram, next to a big block of ore loaded in a truck.]
We know the big block of ore has a lower stdev than the samples – but how much lower?
It’s not σ/sqrt(n)
Using Numerical Methods and
Computers
Computer creates a grid of points – about 25 is usually enough.
Computer then exhaustively measures all combinations of
distances between points (all 625 of them, 25 × 25)
For each distance it uses the semivariogram model to
calculate the expected variability of the points
It keeps a running total and then calculates the average
value of gamma.
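
The procedure above can be sketched roughly as follows. The slides do not say which semivariogram model is fit, so the spherical model here is an assumption, as are the function names, the block dimensions, the sill and the range. Each point is paired with every point, itself included (a self pair has distance zero and contributes zero).

```python
import itertools
import math

def spherical(h, sill, range_):
    """Spherical semivariogram model with no nugget (an assumed model)."""
    if h >= range_:
        return sill
    r = h / range_
    return sill * (1.5 * r - 0.5 * r ** 3)

def gamma_bar(width, height, model, n_side=5):
    """Average model gamma over all pairs of grid points in a block."""
    # n_side x n_side points at cell centres (5 x 5 = 25 by default)
    xs = [(i + 0.5) * width / n_side for i in range(n_side)]
    ys = [(j + 0.5) * height / n_side for j in range(n_side)]
    points = list(itertools.product(xs, ys))
    total = 0.0  # running total of gamma
    count = 0
    for (x1, y1), (x2, y2) in itertools.product(points, repeat=2):
        total += model(math.hypot(x2 - x1, y2 - y1))
        count += 1
    return total / count

# Hypothetical model and block: sill 1.0, range 50 m, 3 m x 3 m block
model = lambda h: spherical(h, sill=1.0, range_=50.0)
print(gamma_bar(3.0, 3.0, model))
```

A block that is small compared with the range of correlation gives a small average gamma, and a block much larger than the range gives an average gamma close to the sill.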
The Variance of Big Blocks is

$$\sigma^2_{\text{BigBlocks}} = \sigma^2_{\text{samples}} - \bar{\gamma}(W:W)$$

Remember variance is just standard deviation squared
We know the variance of samples because we have the sample set
and have calculated it
That gamma bar thing up there is the number our computer just
chugged out for us
Hey I can subtract even on a bad day!!
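
The subtraction as a tiny worked example. The sample stdev of 0.5 and the gamma bar of 0.21 are made-up numbers, and the function name is just for illustration.

```python
import math

def block_variance(sample_variance, gamma_bar_value):
    """Variance of big blocks = sample variance minus gamma bar."""
    return sample_variance - gamma_bar_value

sample_stdev = 0.5       # hypothetical stdev of the small samples
gamma_bar_value = 0.21   # hypothetical average gamma for the block

variance = block_variance(sample_stdev ** 2, gamma_bar_value)
# Convert back to a standard deviation for comparison with the samples
print(variance, math.sqrt(variance))
```

With these numbers the block stdev comes out lower than the sample stdev, but nowhere near as low as σ/sqrt(n) would suggest for a truck load full of little samples.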
We’ll look more at Spatial Statistics Later
Randomness and Using Normal
Distribution Statistics

Use ordinary normal distribution (or t) statistics if you
are using random samples



Don’t cherry pick your samples
Don’t determine what the test is after you collect your test
statistics
Watch Out for Conditions that make a random sample
impossible to take



Cases where your sample actually changed the remaining
population in a noticeable way (the Blackjack example)
Cases where your samples are in fact related to each other by
virtue of how close together and in what direction they were taken (i.e., spatial correlation)
We can handle these non-random sampling events but it does
take a different mathematical model (don’t use the wrong
model)