1 - Statistical Analysis
STATISTICAL
ANALYSIS
IB Biology Year 1
CURRICULUM OUTCOMES
Topic 1: Statistical analysis (2 hours)
1.1.1 State that error bars are a graphical representation
of the variability of data.
1.1.2 Calculate the mean and standard deviation of a set
of values.
1.1.3 State that the term standard deviation is used to
summarize the spread of values around the mean, and
that 68% of the values fall within one standard deviation
of the mean.
1.1.4 Explain how the standard deviation is useful for
comparing the means and the spread of data between two
or more samples.
1.1.5 Deduce the significance of the difference between
two sets of data using calculated values for t and the
appropriate tables.
1.1.6 Explain that the existence of a correlation does not
establish that there is a causal relationship between two
variables.
LET’S START WITH AN EXAMPLE
Imagine you want to study
some aspect of bean plants.
What sorts of things could
you study?
Create a hypothesis
How will you test and
measure your hypothesis?
Obviously you can’t measure every bean
plant that exists!
Even thousands of bean plants are
unrealistic in terms of time...
We must use samples of bean plants that
represent the entire population.
So what we do is grow
enough bean plants in
order to get a sample that
is small enough to efficiently
get our data but large
enough to represent the
population as a whole.
STATISTICS IS A BRANCH OF MATH!
It allows us to take small portions from habitats,
communities and populations and draw
conclusions about the larger population.
Stats measures the differences and relationships
between sets of data.
As for our experiment...
Small sample compared
to large population.
Depending on our sample size,
we can draw conclusions with a
certain level of confidence.
We can be 95% confident...
We may even be 99% confident...
But nothing is 100% certain in
science (Yikes... That makes my
scientific brain hurt... TOK
application!!!).
DESCRIPTIVE STATISTICS
The mean and the standard
deviation describe the data – they
show us a picture that helps with
interpretation of the data.
MEAN
The MEAN is the average of your
data points. It is calculated by adding
your data points together and dividing
by how many points there are.
Example:
Look at these numbers:
3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
The sum of these numbers is equal to 330
There are fifteen numbers.
The mean is equal to 330 ÷ 15 = 22
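If you want to check this arithmetic without a calculator, here is a minimal Python sketch using the same fifteen numbers. The variable names are just illustrative choices.

```python
# Data points copied from the example above.
values = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]

total = sum(values)    # add the data points together
count = len(values)    # how many points there are
mean = total / count   # divide the sum by the count

print(total, count, mean)  # 330 15 22.0
```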
RANGE
The RANGE is a measure of the spread of data. It is
calculated by finding the difference between the
largest and smallest values.
The range can give us an idea of how variable the
data is.
Example:
Largest value is 15, smallest value is 5
The range is 10 (15 – 5 = 10)
Note: very large and very small values, called
outliers, can have a very dramatic effect on the
range.
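The range calculation, and the effect of an outlier on it, can be sketched in Python as follows. The data values are made up for illustration; they are chosen only so that the largest is 15 and the smallest is 5, as in the example.

```python
# Hypothetical measurements (not from the lesson): largest is 15, smallest is 5.
values = [5, 8, 9, 11, 12, 15]
data_range = max(values) - min(values)
print(data_range)  # 10

# Add one hypothetical outlier and the range changes dramatically.
with_outlier = values + [60]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)  # 55
```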
STANDARD DEVIATION (SD)
Measures how the individual
observations of a data set are
dispersed (spread) around the mean.
We will learn how to calculate SD by
hand but usually you will use your
graphing calculator or Excel.
CALCULATING SD
Try the example on your handout.
I find a table helps to organize the
calculation of SD.
NOTE: In Biology, we are calculating
the sample standard deviation. In
math, you will calculate the
population standard deviation.
σx (the lowercase Greek sigma) for math, Sx for biology on
your graphing calculators!
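Here is a rough Python sketch of the table method, showing where the sample and population versions differ: the sample SD divides by n - 1, the population SD divides by n. The data set is hypothetical (it is not the one on the handout), and the last two lines simply cross-check against Python's built-in statistics module.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not the handout values

n = len(values)
mean = sum(values) / n

# The "table" columns: each value's deviation from the mean, then squared.
squared_deviations = [(x - mean) ** 2 for x in values]
sum_of_squares = sum(squared_deviations)

sample_sd = math.sqrt(sum_of_squares / (n - 1))   # Sx on the calculator
population_sd = math.sqrt(sum_of_squares / n)     # sigma-x on the calculator

print(round(sample_sd, 3), round(population_sd, 3))

# Cross-check with the standard library.
library_sample_sd = statistics.stdev(values)
library_population_sd = statistics.pstdev(values)
```

The sample SD is always a little larger than the population SD for the same numbers, because it divides by a smaller number.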
We use standard deviation to measure
the spread of our values around the
mean.
If our data has a normal distribution
(meaning our values are clustered
around the mean) then we assume
that:
About 68% of our values lie within ± 1
SD of the mean.
This number rises to 95% for ± 2 SD
from the mean.
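These percentages can be checked against an ideal normal distribution. The sketch below uses Python's built-in NormalDist class (Python 3.8 or later) with a mean of 0 and an SD of 1; any other mean and SD would give the same percentages.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Fraction of values between -1 SD and +1 SD, and between -2 SD and +2 SD.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(within_1_sd * 100, 1), round(within_2_sd * 100, 1))  # 68.3 95.4
```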
ERROR BARS
Error bars are graphical representations of the variability
of the data. They can show either the
range of the data or the SD on a graph.
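To see what each kind of error bar would span, here is a small Python sketch that works out the bottom and top of the bar both ways. The plant heights are hypothetical, and the code only calculates the bar limits; it does not draw the graph.

```python
import statistics

heights_cm = [10, 12, 11, 14, 13]  # hypothetical sample of plant heights

mean = statistics.mean(heights_cm)
sd = statistics.stdev(heights_cm)  # sample standard deviation

# Option 1: the bar runs from the mean minus 1 SD to the mean plus 1 SD.
sd_bar = (mean - sd, mean + sd)

# Option 2: the bar runs from the smallest value to the largest value.
range_bar = (min(heights_cm), max(heights_cm))

print(mean, sd_bar, range_bar)
```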
BACK TO OUR BEAN PLANT...
A sample of 100 bean plants
Some extremes (some very small, some very large)
But when plotted our data should look something
like a bell curve with the majority of our data
centred around the mean.
THE NORMAL DISTRIBUTION
A flat bell curve indicates that the data is
spread out widely from the mean.
Thus, the standard deviation would be
large.
A bell curve that is very
tall and narrow shows
that the data is very
close to the mean.
Thus, the standard
deviation would be
very small.
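The link between the SD and the shape of the curve can be illustrated with a short Python sketch. The two SD values (1 and 5) and the mean of 50 are arbitrary choices for illustration; the code compares the height of each curve at its peak.

```python
from statistics import NormalDist

narrow = NormalDist(mu=50, sigma=1)  # small SD (hypothetical values)
wide = NormalDist(mu=50, sigma=5)    # large SD (hypothetical values)

# Height of each bell curve at its peak, which sits at the mean.
narrow_peak = narrow.pdf(50)
wide_peak = wide.pdf(50)

print(round(narrow_peak, 3), round(wide_peak, 3))  # 0.399 0.08
```

The curve with the small SD is five times taller at the mean, because the same total amount of data is packed into a narrower space.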
SIGNIFICANT DIFFERENCE BETWEEN TWO
MEANS
To determine if a difference between two data
sets is significant, a t-test is commonly used.
A t-test compares two sets of data.
T TABLES
Along one side of the table of critical
values of t, you see probability (p). This is
the likelihood that chance alone could
produce your results.
If p = 0.50, there is a 50% probability that
chance alone would produce a difference this
large. This is not significant.
If p = 0.05, there is only a 5% probability that
chance alone would produce a difference this
large, so we can be 95% confident that one
set of data is actually different from the other.
This is considered to be a significant
difference.
The mean, standard deviation and
sample size are all used to calculate the
value of t.
In the left column you will notice the
“degrees of freedom”. This is calculated by
adding the two sample sizes together
and subtracting 2.
Line up the degrees of freedom and the
0.05 level of significance and this will
give you the critical value of t.
Compare this critical value with the
calculated value of t.
***If the calculated t value is larger than
the number on the chart then the two
groups are significantly different from
each other!
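The slides say that the mean, standard deviation and sample size are used to calculate t, but they do not give the formula. The Python sketch below assumes the pooled-variance form of the two-sample t-test, which is the version that matches degrees of freedom of N1 + N2 - 2. The function name pooled_t and both data sets are hypothetical, invented for illustration.

```python
import math
import statistics

# Hypothetical measurements for two independent samples (not from the lesson).
sample_1 = [10, 12, 11, 14, 13]
sample_2 = [15, 17, 16, 19, 18]

def pooled_t(a, b):
    """Return (t, degrees of freedom), assuming a pooled-variance t-test."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = statistics.mean(a), statistics.mean(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample SD squared
    df = n1 + n2 - 2
    # Combine the two variances, weighting each by its own degrees of freedom.
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = abs(mean1 - mean2) / standard_error
    return t, df

t_value, df = pooled_t(sample_1, sample_2)
print(round(t_value, 3), df)  # 5.0 8
```

You would then look up the critical value for 8 degrees of freedom in the table and compare it with the calculated t.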
LET’S DO AN EXAMPLE TO CLEAR THE
MUD...
Ms. Chris conducted an experiment. She
wanted to study the effect of a hair growth
product on the length of toe hair. She
measured the length of the hair on the toes
of students in her biology class (Sample X)
and then she had the students apply the
growth product daily for one week and
measured the length of the hair again
(Sample Y).
Both groups were normally distributed
N for sample X was 23
N for sample Y was 19
Ms. Chris did some fancy
math and calculated t = 2.956.
Use a level of significance of 0.05
Can we conclude that the hair
growth product resulted in
significant hair growth?
We will be testing the null
hypothesis, which is that the two groups are
the same.
Step 1: Calculate the degrees of
freedom:
df = (N1 + N2) - 2
df = (23 + 19) - 2 = 40
Step 2: Use the chart of critical values of t
Line up 0.05 with 40 degrees of freedom
This gives us a critical value of t of
2.021
Step 3: Evaluate
Remember*** If the calculated t value is
larger than the number on the chart, then
the groups are significantly different from
each other
So... We were given a calculated t of 2.956
which is larger than the table value so....
We can conclude that the two
groups are significantly different
from each other and that the
hair growth formula resulted in
significant hair growth!
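The three steps can be written out as a short Python sketch. The sample sizes, the calculated t and the critical value are the numbers from this example; the critical value is typed in by hand from the table rather than computed, and the variable names are illustrative.

```python
# Values taken from the worked example.
n_x = 23              # sample X
n_y = 19              # sample Y
calculated_t = 2.956  # given in the example
critical_t = 2.021    # read from the t table at p = 0.05

# Step 1: degrees of freedom.
df = (n_x + n_y) - 2

# Steps 2 and 3: compare the calculated t with the critical value.
significant = calculated_t > critical_t

print(df, significant)  # 40 True
```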
TRY ANOTHER...
Are two samples with df = 17 and
a calculated t value of 1.935
significantly different at the 0.05 level?
CORRELATION DOES NOT MEAN
CAUSATION!
We make observations about the
world around us all the time.
We might notice that our grass turns
yellow when its soil is dry. This is a
simple observation.
We might do an experiment to see if
watering our grass prevents
yellowing.
Observing that the yellowing occurs when soil is
dry is a simple correlation, but the experiment
gives us evidence that a lack of water is the
cause of the yellowing.
Experiments provide a test that shows cause;
observations without a test only show a
correlation.
ODD EXAMPLES...
Ice cream sales and the number of shark attacks
on swimmers are correlated.
The number of cavities in elementary school
children and vocabulary size have a strong
positive correlation.
Clearly neither factor causes the other. The correlation
appears because a third factor (such as hot weather, or the
age of the children) affects both.
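To see how strong such a correlation can look, here is a Python sketch that calculates the Pearson correlation coefficient (r) by hand. Both data sets are invented for illustration, as is the function name pearson_r; a value of r close to +1 means a strong positive correlation, and it tells us nothing about cause.

```python
import math

# Hypothetical monthly figures (invented for illustration only).
ice_cream_sales = [120, 150, 185, 210, 250, 275]
shark_attacks = [1, 2, 2, 3, 4, 5]

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from each mean.
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(spread_x * spread_y)

r = pearson_r(ice_cream_sales, shark_attacks)
print(round(r, 2))
```

The result is close to +1 even though buying ice cream does not cause shark attacks.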
APPLICATION AND CONSOLIDATION
Now it’s time to apply what we have
learned.
Please complete the worksheet called
“Statistical Analysis Application Practice”.
You can put your heads together with the
person sitting next to you.
We will go over everything when you have
had a chance to try – the best way to learn
is to try!