The Normal Distribution
Scientific Practice
The Detail of the Normal Distribution
@UWE_JT9
@dave_lush
The Binomial Distribution
This distribution can be seen when the
outcomes have discrete values…
eg rolling dice
Assumptions…
Fixed number of trials
Independent trials
one roll cannot influence another
Two different classifications
eg we will roll the dice 10 times
rolled/didn’t roll a 12 = ‘success/failure’
Probability of success stays the same for all trials
didn’t add extra dice half way through
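The four assumptions are what let us write down the probability of any number of 'successes'. A minimal sketch, assuming each trial is a throw of two fair dice so that a 12 has probability 1/36 (the slide does not say how many dice are thrown; `binomial_pmf` is just an illustrative name):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_trials = 10          # fixed number of trials
p_success = 1 / 36     # assumption: two fair dice, 'success' = rolling a 12

# Chance of seeing 0, 1, 2 or 3 twelves in the 10 throws
for k in range(4):
    print(k, round(binomial_pmf(k, n_trials, p_success), 4))
```

The probabilities over all possible k (0 to 10) add up to 1, as they must.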
Rolling Dice
One die…
outcome values are 1, 2, 3, 4, 5 or 6
each equally probable (1 in 6)
distribution is…
boring!
Rolling Dice
Two dice…
outcome values are 2,3,4,5,6,7,8,9,10,11,12
each not equally probable
36 ways of making these
only 1 way to get 2 (1+1), 3 ways to get 4, etc
distribution is…
slightly less boring!
Rolling Dice
Three dice…
outcome values are
3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18
each not equally probable
216 ways of making these
27 ways to throw a 10 or 11, only 1 to get a 3 or 18
distribution is…
starting to curve
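The 'ways of making these' counts can be checked by brute force. A small sketch that lists every ordered roll and tallies the totals (the function name `ways_to_make_totals` is illustrative, and fair six-sided dice are assumed):

```python
from collections import Counter
from itertools import product

def ways_to_make_totals(n_dice, sides=6):
    """Count how many ordered rolls of n_dice give each total."""
    faces = range(1, sides + 1)
    return Counter(sum(roll) for roll in product(faces, repeat=n_dice))

two = ways_to_make_totals(2)
three = ways_to_make_totals(3)

print(sum(two.values()), two[2], two[4])          # 36 1 3
print(sum(three.values()), three[10], three[11])  # 216 27 27
```

Brute force is fine for a few dice; the number of rolls grows as 6 to the power of the number of dice, so it quickly becomes impractical.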
Rolling Dice
Four dice…
outcome values are
4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,
22,23,24
1296 ways of making these
each not equally probable
distribution is…
looking familiar!
Rolling Dice
24 dice…
outcome values are 24 to 144
4.73838134 × 10^18 ways of making these!
each not equally probable
distribution is…
looking very familiar!
still discrete outcomes
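To see how 'familiar' the 24-dice distribution looks, the exact probabilities can be set beside a Normal curve with the same mean and SD. This is only a sketch: it builds the counts one die at a time rather than listing every roll, and it assumes fair six-sided dice (mean 3.5 and variance 35/12 per die). `count_ways` and `normal_pdf` are illustrative names.

```python
from math import exp, pi, sqrt

def count_ways(n_dice, sides=6):
    """Number of ordered rolls giving each total, built one die at a time."""
    ways = {0: 1}
    for _ in range(n_dice):
        nxt = {}
        for total, count in ways.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0) + count
        ways = nxt
    return ways

def normal_pdf(x, mu, sigma):
    """Height of the Normal curve at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

n_dice = 24
ways = count_ways(n_dice)
total_rolls = sum(ways.values())      # equals 6 ** 24

mean = n_dice * 3.5                   # one fair die has mean 3.5
sd = sqrt(n_dice * 35 / 12)           # one fair die has variance 35/12

# Exact probability of each total next to the Normal curve height
for total in (60, 72, 84, 96, 108):
    exact = ways[total] / total_rolls
    print(total, round(exact, 5), round(normal_pdf(total, mean, sd), 5))
```

The two columns agree closely, even though the dice totals are still whole numbers.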
Rolling Dice
Infinite number of dice…
outcome values are no longer discrete but
continuous
the Binomial Distribution becomes known as…
…the Normal Distribution/Bell Curve
Substitute dice for something like height and…
height, being determined by the sum effect of a
large number of factors (genes, nutrition, etc)…
looks like a continuous variable
approximates the Normal Distribution
Ie its variation becomes definable/predictable
we can expect our data to behave in a certain way
The Normal Distribution
Represents the idealised distribution of a large
number of things we measure in biology
many parameters approximate to the ND
Is defined by just two things…
population mean
µ (mu)
the centre of the distribution (mean=median=mode)
population standard deviation (SD)
σ (sigma)
the distribution ‘width’ (distance from mean to point of inflexion)
one σ either side of mean encompasses 68% of the area under the curve
95% of area found within 1.96 σ either side of mean
The Normal Distribution
Is symmetrical
mean=median=mode
The Normal Distribution
One SD either side of mean includes 68% of
represented population
SD boundary is inflexion point
curvature changes direction
the ‘s’ bit
2 SD covers 95%
3 SD covers 99.7%
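These coverage figures come straight from the area under the standard Normal curve. A minimal check using Python's standard library (`coverage` is an illustrative helper, not a library function):

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Fraction of a Normal Distribution within k SDs either side of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(k, round(coverage(k) * 100, 1))
```

Note that 2 SD actually covers slightly more than 95%; the exact 95% boundary is at 1.96 SD.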
The Normal Distribution
All Normal Distributions are similar
differ in terms of…
mean
SD (governs how ‘spikey’ curve is)
Fig below…
4 different SDs, 2 different means
Standardising Normal Distributions
Regardless of what they measure, all Normal
Distributions can be made identical by…
subtracting the mean from every reading
then dividing each result by the SD
the mean then becomes zero
a reading one SD bigger than the mean becomes +1
Called Standard Scores or z-scores
amazing! Different measurements, same ‘view’
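A quick way to convince yourself that the units drop out: standardise the same readings recorded in metres and in centimetres. The heights below are made up purely for illustration, and `z_scores` is an illustrative helper.

```python
from statistics import mean, pstdev

def z_scores(readings):
    """Subtract the mean, then divide by the SD."""
    mu = mean(readings)
    sigma = pstdev(readings)
    return [(x - mu) / sigma for x in readings]

heights_m = [1.52, 1.60, 1.68, 1.75, 1.83]    # made-up heights in metres
heights_cm = [h * 100 for h in heights_m]     # same heights in centimetres

z_m = z_scores(heights_m)
z_cm = z_scores(heights_cm)

for a, b in zip(z_m, z_cm):
    print(round(a, 3), round(b, 3))           # the two columns match
```

After standardising, the readings have mean zero and SD one whichever unit they started in.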
Standard (z) Scores
A ‘pure’ way to represent data distribution
the actual measurements (mg, m, sec) disappear!
replaced by number of SDs from the mean (zero)
For any reading, z = (x - µ) / σ
A survey of daily travel time had these results (in minutes):
26,33,65,28,34,55,25,44,50,36,26,37,43,62,35,38,45,32,28,34
The Mean is 38.8 min, and the SD is 11.4 min
To convert the values to z-scores…
eg to convert 26
first subtract the mean: 26 - 38.8 = -12.8,
then divide by the Standard Deviation: -12.8/11.4 = -1.12
So 26 is -1.12 Standard Deviations from the Mean
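The travel-time numbers can be reproduced in a few lines. One assumption to flag: the SD of 11.4 quoted above matches the population formula (`pstdev`); the sample formula would give roughly 11.7.

```python
from statistics import mean, pstdev

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mu = mean(times)          # 38.8
sigma = pstdev(times)     # about 11.4 (population SD)

z_26 = (26 - mu) / sigma  # about -1.12
print(round(mu, 1), round(sigma, 1), round(z_26, 2))

# Every reading converted to a z-score
all_z = [round((x - mu) / sigma, 2) for x in times]
print(all_z)
```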
Familiarity with the Normal Distribution
95% of the class are between 1.1 and 1.7m tall
what is the mean and SD?
Assuming normal distribution…
the distribution is symmetrical, so mean height is
(1.7 + 1.1) / 2 = 1.4m
the range 1.1 to 1.7m covers 95% of the class,
which equals ± 2 SDs
one SD = (1.7 – 1.1) / 4
= 0.6 / 4
= 0.15m
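The same reasoning as a short function. This is a sketch that uses the rule of thumb that 95% is about ±2 SDs (strictly 1.96); the name `mean_and_sd_from_95_range` is made up for illustration.

```python
def mean_and_sd_from_95_range(low, high):
    """Estimate mean and SD from the range that holds 95% of readings."""
    mean = (low + high) / 2   # symmetrical, so the mean is the midpoint
    sd = (high - low) / 4     # the range spans about 4 SDs (2 either side)
    return mean, sd

class_mean, class_sd = mean_and_sd_from_95_range(1.1, 1.7)
print(round(class_mean, 2), round(class_sd, 2))   # 1.4 0.15
```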
Familiarity with the Normal Distribution
One of that class is 1.85m tall
what is the z-score of that measurement?
Assuming normal distribution…
z-score = (x - µ) / σ
z = (1.85m - 1.4m) / 0.15m
= 0.45m / 0.15m
=3
note there are no units
3 SDs cover 99.7% of the population
only 1.5 in 1000 of the class will be as tall/taller
a big class, with fractional students!
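A sketch of the same calculation, using the class mean (1.4m) and SD (0.15m) worked out earlier. The exact upper-tail area beyond z = 3 is a little smaller than the 1.5 in 1000 obtained from the rounded 99.7% figure.

```python
from statistics import NormalDist

mu, sigma = 1.4, 0.15          # class mean and SD in metres
x = 1.85                       # the tall student

z = (x - mu) / sigma           # units cancel, leaving a pure number
upper_tail = 1 - NormalDist().cdf(z)

print(round(z, 2))                     # 3.0
print(round(upper_tail * 1000, 2))     # about 1.35 per 1000
```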
Familiarity with the Normal Distribution
36 students took a test; you were 0.5 SD above
the average; how many students did better?
from the curve, 50% sit above zero
from the curve, 19.1% sit between 0 and 0.5 SD
so 30.9% sit above you
30.9% of 36 is about 11
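Instead of reading the percentages off the curve, they can be computed. A minimal sketch, assuming the test scores follow a Normal Distribution:

```python
from statistics import NormalDist

class_size = 36
your_z = 0.5                                   # 0.5 SD above the average

between = NormalDist().cdf(your_z) - 0.5       # area between 0 and 0.5 SD
above = 1 - NormalDist().cdf(your_z)           # area above you

print(round(between * 100, 1), round(above * 100, 1))   # 19.1 30.9
print(round(above * class_size))                          # 11
```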
Familiarity with the Normal Distribution
Need to have a ‘feel’ for this…
Populations and Samples – a Diversion
A couple of seemingly pedantic but important
points about distributions…
population
the potentially infinite group on which measurements
might be made
don’t often measure the whole population
sample
a sub-set of the population on which measurements
are actually made
most studies will sample the population
n is the number studied
n-1 called the ‘degrees of freedom’
often extrapolate sample results to the population
Populations and Samples – so what?
The two are described/calculated differently…
μ is the population mean, x̄ (x-bar) is the sample mean
σ or σn is population SD, s or σn-1 is sample SD
Calculating the SD is different for each
most calculators do it for you…
as long as you choose the right type (pop vs samp)
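The only difference between the two calculations is the divisor: n for a population, n-1 for a sample. A sketch written out by hand and checked against Python's built-in `pstdev` and `stdev` (the readings are made up, and `sd` is an illustrative helper):

```python
from math import sqrt
from statistics import pstdev, stdev

def sd(readings, sample):
    """SD with divisor n-1 (sample) or n (population)."""
    n = len(readings)
    m = sum(readings) / n
    sum_sq = sum((x - m) ** 2 for x in readings)
    divisor = n - 1 if sample else n
    return sqrt(sum_sq / divisor)

readings = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up data

print(sd(readings, sample=False), pstdev(readings))   # population SD
print(sd(readings, sample=True), stdev(readings))     # sample SD (larger)
```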
Populations and Samples – choosing
Analysing the results of a class test…
population, since you don’t intend extrapolating the
results to all students everywhere
Analysing the results of a drug trial…
sample, since you expect the conclusions to apply
to the larger population
A national census collects information about
age
population, since by definition the census is about
the population taking part in the survey
If in doubt, use the sample SD
and as n increases, the difference decreases
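For the same set of readings, the sample SD is always the population SD multiplied by √(n / (n-1)). A short sketch showing how quickly that factor approaches 1 (`sample_to_population_ratio` is an illustrative name):

```python
from math import sqrt

def sample_to_population_ratio(n):
    """How much bigger the sample SD is than the population SD for n readings."""
    return sqrt(n / (n - 1))

for n in (2, 5, 10, 30, 100, 1000):
    print(n, round(sample_to_population_ratio(n), 4))
```

With only a handful of readings the choice matters; with hundreds it barely changes the answer.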
Populations and Samples – implications
The sample mean and SD are estimates of
the population mean and the population SD
ie you calculate σn-1 (or s)
If the sample observed is the population, then
the mean and SD of that sample are the
population mean and the population SD
ie you calculate σ (or σn)
Implications of Estimating Pop Mean
For a sample, the ‘quality’ of the estimate of
the population mean and SD depends on the
number of observations made
if you sampled, say, 1 member of the population,
it’s unlikely to be close to the population mean
if you sampled the whole population, your
estimate is the population mean
in between, adding extra samples will improve
estimate
sampling different amounts gives a variety of means
that set of means will have its own SD (!)
called the Standard Error of the Mean (SEM)
The Standard Error of the Mean
Recap…
each sampling of a distribution will produce a
different estimate of the population mean
the variation in those estimates is called the SEM
Surprisingly easy to calculate
SEM = sample standard dev / square root of
number of observations in the sample
SEM = s / √ N
eg if N=16, then SEM is 4x smaller than SD
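The formula can be checked by simulation: draw many samples of N=16, take each sample's mean, and look at the SD of those means. This is only a sketch; the population (mean 100, SD 15) is made up for illustration, and the seed is fixed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(1)                       # fixed seed so the sketch is repeatable
POP_MEAN, POP_SD = 100.0, 15.0       # hypothetical population
N = 16                               # observations per sample

def draw_sample(n):
    """One random sample of n readings from the hypothetical population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]

# SD of many sample means: the 'true' spread of the estimate
sample_means = [mean(draw_sample(N)) for _ in range(5000)]
spread_of_means = pstdev(sample_means)

# What the formula predicts, and what one sample alone would estimate
predicted = POP_SD / sqrt(N)                    # 15 / 4 = 3.75
one_sample = draw_sample(N)
sem_estimate = stdev(one_sample) / sqrt(N)      # SEM = s / sqrt(N)

print(round(spread_of_means, 2), round(predicted, 2), round(sem_estimate, 2))
```

The simulated spread of the means sits close to the predicted 3.75, which is 4x smaller than the population SD of 15. The single-sample SEM is itself only an estimate, so it will wander a little from run to run.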
Summary
The Binomial Distribution is a basic distribution
eg rolling dice
With lots of dice, Binomial Dist becomes Normal Dist
Normal Dist fully defined just by mean and SD
Transformation to z-scores makes all NDs identical
SD calculation differs for sample vs population
sample is a subset of the whole population
population is, erm, the whole population
Estimation of population mean from a sample is
always prone to uncertainty
Standard Error of Mean (s/√N) reflects uncertainty