Statistics made simple - Dr. Jennifer Capers, PhD

Statistics made simple
Modified from Dr. Tammy Frank’s presentation, NOVA
Why do we need statistics?
• Example:
– A chemical may increase an animal's growth
– It will be tested on houseflies
– A colony of 20,000 houseflies is divided
into 2 groups
– Group 1 gets chemical in food
– Group 2 gets a placebo in same food
What comes next?
• 2 weeks later, take a random sample of
25 houseflies from each group and
measure their wingspans
• What are the results?
Housefly results
• 25 houseflies from each group
• Group 1 (with chemical): average wingspan 7.5 mm
• Group 2 (without): average wingspan 7.2 mm
• What does this mean?
• Are group 1 flies really bigger?
• Some might say yes, some might say no
• Did you, by chance, happen to pick unusually large
or unusually small flies for one of the groups?
• Was there sampling error or bias?
• One way to be sure is to measure all
20,000 flies, but that is not feasible
• So what do we do?
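The short Python sketch below shows how chance alone can make two groups look different. It draws two samples of 25 flies from the same population, so by construction the chemical has no effect. The population mean (7.2 mm) and standard deviation (0.5 mm) are made-up assumptions for illustration, not data from the experiment. The 0.3 mm threshold matches the 7.5 mm vs 7.2 mm gap in the results.

```python
import random
import statistics

# ASSUMPTION (hypothetical): wingspans ~ Normal(7.2 mm, 0.5 mm) in BOTH groups,
# i.e. the chemical has no real effect.
POP_MEAN_MM = 7.2
POP_SD_MM = 0.5
SAMPLE_SIZE = 25

def sample_mean_gap(seed):
    """Sample 25 flies per group from the identical population and
    return mean(group 1) - mean(group 2)."""
    rng = random.Random(seed)
    group_1 = [rng.gauss(POP_MEAN_MM, POP_SD_MM) for _ in range(SAMPLE_SIZE)]
    group_2 = [rng.gauss(POP_MEAN_MM, POP_SD_MM) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(group_1) - statistics.mean(group_2)

# Repeat the "experiment" 1,000 times with no real effect.
gaps = [sample_mean_gap(seed) for seed in range(1000)]
share_big = sum(abs(gap) >= 0.3 for gap in gaps) / len(gaps)
print(f"largest chance gap: {max(abs(gap) for gap in gaps):.2f} mm")
print(f"runs with a gap of 0.3 mm or more: {share_big:.1%}")
```

The sample means almost never match exactly, even though both groups come from the same population. That is the sampling error the slides warn about.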
Statistics
• You say the flies are bigger, I say not
• Statistics provide rules to help us find out
• Statistics will help tell us whether these differences
are significant (real)
• Is there bias? Were unusually large or small flies
picked by chance?
• Statistics will tell us how likely it is that a
difference this large would arise from random sampling
error (chance) alone
Significant Difference
• Real difference
• Due to chemical, not chance
• If the test shows that the probability of getting
such results by chance or random error is 5% or less,
we accept the claim that the chemical produced
larger flies
• If the test shows that the probability of
getting such results by chance or random error
is greater than 5%, we cannot accept the claim that the chemical
produced larger flies
• 5% is an arbitrary but generally accepted
cut-off point
• However, if the cost of making an incorrect
decision is very high, a stricter (lower) cutoff such as 1% is used
» e.g., research on cancer drugs
• The probability value is called the p-value
• It measures how likely we would be to see a pattern at least as
strong as the one in our data if only sampling error or random
chance were at work
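A permutation test is one way to make the p-value concrete. It is sketched here in Python as an illustrative choice, not necessarily the test the slides have in mind. The test shuffles the group labels many times and counts how often the shuffled difference in means is at least as large as the observed one. The wingspan numbers are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_1, group_2, n_shuffles=10_000, seed=0):
    """Two-tailed permutation test for a difference in means.

    Returns the fraction of random relabelings whose difference in means is
    at least as large (in absolute value) as the observed difference."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_1) - statistics.mean(group_2))
    pooled = list(group_1) + list(group_2)
    n_1 = len(group_1)
    tol = 1e-9  # absorbs floating-point round-off when differences tie
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign "group" labels at random
        diff = abs(statistics.mean(pooled[:n_1]) - statistics.mean(pooled[n_1:]))
        if diff >= observed - tol:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical wingspans in mm (illustrative values, not real data)
with_chemical = [7.6, 7.4, 7.8, 7.5, 7.3, 7.7, 7.6, 7.5]
placebo = [7.1, 7.3, 7.2, 7.0, 7.4, 7.2, 7.3, 7.1]
p = permutation_p_value(with_chemical, placebo)
print(f"p ≈ {p:.4f}")
```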
Scientific Method
• Remember that we cannot “prove”
anything. We can only accept or reject
a hypothesis
• A theory is the closest that a biologist
can come to “proving” a hypothesis
• Supported and validated by data and scientific
community
Null and Alternative Hypotheses
• For any experiment/survey/study, there must
be a null hypothesis and an alternative
hypothesis
• Set up so that one of them must be true, and one must be
false
• Null hypothesis (H0): = or ≤ or ≥
• Example:
– The average weight of hermit crab group A is the same as that
of hermit crab group B (=)
– OR
– The average weight of hermit crab group A is the same as or
greater than that of hermit crab group B (≥)
– OR
– The average weight of hermit crab group A is the same as or less
than that of hermit crab group B (≤)
If null is true, then alternative
must be false
• H0: average weight of hermit crab group A = average weight of
group B
• HA: average weight of hermit crab group A ≠ average weight
of group B
Two-tailed hypotheses
• Use if you have no expectation about the direction of a difference
– You are trying to find out whether the weights differ
but have no reason to expect either group to be
heavier
• H0: average weight of hermit crab group A = average weight of
group B
• HA: average weight of hermit crab group A ≠ average weight
of group B
One-tailed hypothesis
• Use if you have an expectation of the
outcome, based on previous studies or
information
• For example, previous studies have demonstrated
that the Group A area has more hermit crab food than
the Group B area
• H0: average weight of hermit crab group A ≤ average weight of
group B
• HA: average weight of hermit crab group A > average weight of
group B
• The alternative hypothesis corresponds to what you expect
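The same permutation idea can show how one-tailed and two-tailed p-values differ. This is an illustrative choice, not a method from the slides. The one-tailed count only includes shuffles in which group A comes out at least as much heavier as observed, matching HA: A > B. The two-tailed count includes large differences in either direction. The crab weights are hypothetical.

```python
import random
import statistics

def tailed_p_values(group_a, group_b, n_shuffles=10_000, seed=0):
    """Permutation estimates of (two_tailed_p, one_tailed_p) for mean(A) - mean(B).

    The one-tailed p-value tests HA: mean(A) > mean(B)."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    tol = 1e-9  # absorbs floating-point round-off when differences tie
    two_tailed = one_tailed = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - tol:  # extreme in either direction
            two_tailed += 1
        if diff >= observed - tol:  # extreme only in the expected direction
            one_tailed += 1
    return two_tailed / n_shuffles, one_tailed / n_shuffles

# Hypothetical crab weights in grams (illustrative values, not real data)
group_a = [32.1, 35.4, 30.8, 36.2, 33.5, 34.9]
group_b = [31.0, 29.8, 33.2, 30.5, 32.4, 31.7]
two_p, one_p = tailed_p_values(group_a, group_b)
print(f"two-tailed p ≈ {two_p:.3f}, one-tailed p ≈ {one_p:.3f}")
```

When the data fall in the expected direction, the one-tailed p-value is never larger than the two-tailed one. This is why a one-tailed test should only be used when an expectation exists before looking at the data.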
• Always reject or accept (fail to reject) the null
hypothesis; the decision is never made on the alternative directly
• If you accept or support the null, then don’t
mention the alternative
• If you reject the null, then accept or support
the alternative
• We never prove a hypothesis
• We just gain a measure of how confident we are
in our hypothesis
p-value
• The probability of seeing a pattern at least as strong as
the one in our data if only random chance or sampling
error were at work
• 0.05 is the most commonly used cutoff (significance level)
• If the p-value is > 0.05 (high p-value), accept the null
» Weight is not significantly different
• If the p-value is ≤ 0.05 (low p-value), reject the null and
accept the alternative
» Weight is significantly different
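The decision rule can be written as a tiny Python function. The `alpha` parameter is an added, hypothetical name for the cutoff. Setting it to 0.01 models the stricter 1% cutoff mentioned earlier for decisions where errors are costly. "Accept H0" here follows the slides' wording and means the data did not give grounds to reject it.

```python
def decide(p_value, alpha=0.05):
    """Decision rule from the slides: reject H0 if p <= alpha, else accept H0."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0 (significant difference)"
    return "accept H0 (no significant difference)"

for p in (0.003, 0.03, 0.2):
    # alpha=0.01 stands in for the stricter cutoff used when errors are costly
    print(p, decide(p), "|", decide(p, alpha=0.01))
```

A p-value of 0.03 is significant at the 5% cutoff but not at the 1% cutoff.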
Important terms:
x = measurement value
∑ = sum of
n = sample size
df = degrees of freedom = n – 1
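x̄ = mean or average: $\bar{x} = \sum x / n$
s = standard deviation: $s = \sqrt{s^2}$, roughly the typical distance
of values from the mean
• s² = variance: $s^2 = \sum (x - \bar{x})^2 / df$, the sum of squared
deviations from the mean divided by the degrees of freedom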
• Tells you how much your values vary from the mean
– A large variance means there is a large spread in the data; a small
variance means data points are closer to the mean
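The definitions of mean, variance, and standard deviation translate directly into Python. The sketch uses df = n – 1 as the variance denominator. It cross-checks the result against the standard library's `statistics.variance` and `statistics.stdev`, which also use n – 1. The wingspan values are hypothetical.

```python
import math
import statistics

def describe(values):
    """Return (mean, variance, standard deviation) using
    mean = Σx/n, s² = Σ(x - mean)²/df with df = n - 1, and s = √s²."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements so that df = n - 1 > 0")
    mean = sum(values) / n
    df = n - 1
    variance = sum((x - mean) ** 2 for x in values) / df
    sd = math.sqrt(variance)
    return mean, variance, sd

wingspans = [7.1, 7.4, 7.6, 7.2, 7.5]  # hypothetical measurements in mm
mean, variance, sd = describe(wingspans)
# The standard library's sample functions use the same n - 1 denominator.
assert math.isclose(variance, statistics.variance(wingspans))
assert math.isclose(sd, statistics.stdev(wingspans))
print(f"mean = {mean:.2f} mm, variance = {variance:.4f} mm², s = {sd:.3f} mm")
```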
• What test do you use to get p?
• Depends on what type of data you are collecting
– Measurement variables or nominal variables?
• Measurement variables
• Something that can be counted or measured
• Involves numbers
• Examples: length, weight, quantity
• What are examples of tests that can be used?
t-test
• Used to test whether two sets of data
have significantly different means
• Paired t-test – when measurements are
linked
• The same patients measured before and after taking a drug
• The null would state there is no difference
• Unpaired t-test – when the measurements come
from 2 independent groups
• Patients with drug (group 1) and patients
without drug (group 2)
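Here is a Python sketch of the two t statistics. The paired version works on the before/after differences. The unpaired version shown is Student's pooled-variance form, which assumes the two groups have similar variances; Welch's variant would be a different choice. The sketch stops at the t statistic and degrees of freedom. The p-value would then come from a t-table or from a library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.ttest_ind`). All measurement values are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean difference / (SD of differences / √n)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1  # (t statistic, degrees of freedom)

def unpaired_t(group_1, group_2):
    """Unpaired (Student's) t statistic with a pooled variance; assumes the
    two independent groups have roughly equal variances."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group_1)
                  + (n2 - 1) * statistics.variance(group_2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / se, df

# Hypothetical measurements for 5 patients before and after a drug
before = [140, 152, 138, 145, 150]
after = [135, 147, 136, 140, 146]
print("paired:", paired_t(before, after))

# Hypothetical measurements for two independent groups of patients
with_drug = [135, 147, 136, 140, 146]
without_drug = [141, 150, 139, 147, 152]
print("unpaired:", unpaired_t(with_drug, without_drug))
```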
What do you do when there are
more than 2 sets of data?
• ANOVA – analysis of variance
• The null would state that all group means are equal
• Example would be if you had 5 groups of
patients taking drugs at different dosages
per group
• Single factor ANOVA
• Only vary one parameter – drug dosage
• Two factor ANOVA with or without
replication
• Vary dosage and time of day
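A single-factor ANOVA compares the spread between group means with the spread within groups. The Python sketch below computes the F statistic and its two degrees of freedom. It uses 3 hypothetical dosage groups rather than the 5 in the example, to keep it short. Turning F into a p-value needs an F-table or a library routine such as SciPy's `scipy.stats.f_oneway`.

```python
import statistics

def one_way_anova_f(*groups):
    """Single-factor ANOVA: F = between-group mean square / within-group mean square."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]
    k, n_total = len(groups), len(all_values)
    # Spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical responses for 3 dosage groups (illustrative values, not real data)
low = [4.1, 3.8, 4.5, 4.0]
medium = [5.2, 4.9, 5.6, 5.1]
high = [6.0, 6.4, 5.8, 6.2]
print(one_way_anova_f(low, medium, high))
```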
• Nominal variables
• Usually involves categories
• Values are usually words (category names); results are often summarized as counts or percentages
• Examples: color, sex, genotypes
• What are examples of tests that can be used?
• Goodness-of-fit test
• Chi-square test
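A chi-square goodness-of-fit test compares observed category counts with the counts expected under the null hypothesis. The Python sketch below uses hypothetical genotype counts and an assumed expected 1:2:1 ratio. It computes the statistic Σ(O – E)²/E and df = number of categories – 1. SciPy's `scipy.stats.chisquare` offers a library version that also returns the p-value.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same categories")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1  # (statistic, degrees of freedom)

# Hypothetical genotype counts for AA, Aa, aa (illustrative values, not real data)
observed = [28, 52, 20]
ratio = [1, 2, 1]  # ASSUMPTION: null hypothesis predicts a 1:2:1 ratio
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]
chi2, df = chi_square_gof(observed, expected)
print(f"chi-square = {chi2:.2f}, df = {df}")  # prints chi-square = 1.44, df = 2
```

With df = 2, the critical value at the 0.05 level is about 5.99. A statistic of 1.44 falls below it, so in this made-up example the null hypothesis (a 1:2:1 ratio) would be accepted.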