Statistical Analysis

Topic – 1.1.1-1.1.6
Math skills requirements
Syllabus Statements
• 1.1.1: State that error bars are graphical
representations of the variability of data
• 1.1.2: Calculate the mean and standard deviation of a
set of values
• 1.1.3: State that the term standard deviation is used to
summarize the spread of values around the mean and
that 68% of the values fall within one standard
deviation of the mean
• 1.1.4: Explain how the standard deviation is useful for
comparing the means and the spread of the data
between two or more samples
• 1.1.5: Deduce the significance of the difference
between two sets of data using calculated values for t
and appropriate tables
• 1.1.6: Explain that the existence of a correlation does
not establish a causal relationship between two
variables.
Error Bars
• Bars on a graph only show means and can
be misleading
• Error bars show variability around the
mean
• Can be used to show range, standard
deviation or standard error
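As a rough sketch of what each kind of error bar would span, the snippet below computes the range, standard deviation, and standard error for the eight example values that appear on the calculator slide later in this transcript. The variable names are illustrative assumptions, and the standard error is taken here as the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Example values from the calculator slide later in this transcript.
values = [170, 160, 150, 175, 180, 175, 190, 165]

mean = statistics.mean(values)
value_range = max(values) - min(values)               # full spread: max minus min
sample_sd = statistics.stdev(values)                  # sample standard deviation
standard_error = sample_sd / math.sqrt(len(values))   # standard error of the mean

# Each pair is the (lower, upper) end of an error bar drawn around the mean.
range_bar = (min(values), max(values))
sd_bar = (mean - sample_sd, mean + sample_sd)
se_bar = (mean - standard_error, mean + standard_error)

print(f"range bar: {range_bar}")
print(f"stdev bar: {sd_bar[0]:.2f} to {sd_bar[1]:.2f}")
print(f"standard error bar: {se_bar[0]:.2f} to {se_bar[1]:.2f}")
```

The three bars have very different lengths for the same data, which is why a graph should always say which one it is showing.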
Means can look different
[Figure: bar chart "Junior Class Student Heights"; x-axis: Student Sex (Boys, Girls); y-axis: Height (inches), 60 to 72]
But may not really be different
[Figure: the same "Junior Class Student Heights" bar chart with the y-axis running from 0 to 90]
Given a set of data can you
calculate mean and stdev?
• On your calculator
• Stat key
• Edit and enter your
list(s)
• Stat key again
• Calc and 1-var stats
then specify your list
• Which one is the
mean?
• Which one is the
standard deviation?
• Use the following data
as an example
• 170, 160, 150, 175,
180, 175, 190, 165
• The mean is 170.63
• The standard deviation is 12.37 (use the Sx value,
the sample standard deviation)
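For anyone without the calculator to hand, the same numbers can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the names `sx` and `sigma_x` are assumptions chosen to mirror the two standard deviation values a calculator typically reports.

```python
import math
import statistics

data = [170, 160, 150, 175, 180, 175, 190, 165]

mean = statistics.mean(data)        # 170.625, which the slide rounds to 170.63
sx = statistics.stdev(data)         # sample standard deviation (divides by n - 1)
sigma_x = statistics.pstdev(data)   # population standard deviation (divides by n)

# The sample value written out step by step, to show what is being computed.
squared_deviations = [(x - mean) ** 2 for x in data]
sx_by_hand = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(f"mean = {mean:.3f}")
print(f"Sx (sample) = {sx:.2f}")
print(f"sigma x (population) = {sigma_x:.2f}")
```

The sample value is the one quoted on the slide; the population value is always a little smaller because it divides by n rather than n - 1.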
So what is the Standard Deviation?
Standard Deviation is just
• A numerical measure of the spread of the data
around the mean
• The absolute number doesn't mean a lot
• Look at the number in relation to the mean
• If your mean is 100 and your standard deviation
is 1 then it's tiny
• If your mean is 1.5 and your standard deviation
is 1 then that is a pretty large spread
• Rule of thumb: if Sx/mean > 0.20 then the spread is
getting large
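The rule of thumb above can be sketched as a small check. The function name `relative_spread` and the constant `THRESHOLD` are assumptions introduced for illustration; the 0.20 cutoff and the example numbers come from the slides.

```python
def relative_spread(sd, mean):
    """Return the standard deviation as a fraction of the mean (Sx / mean)."""
    return sd / mean

THRESHOLD = 0.20  # rule-of-thumb cutoff quoted on the slide

examples = [
    ("mean 100, stdev 1", 1, 100),
    ("mean 1.5, stdev 1", 1, 1.5),
    ("calculator example", 12.37, 170.63),
]

for label, sd, mean in examples:
    ratio = relative_spread(sd, mean)
    verdict = "large" if ratio > THRESHOLD else "small"
    print(f"{label}: Sx/mean = {ratio:.3f} ({verdict} spread)")
```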
So by definition the Standard deviation marks off
discrete intervals under a bell curve
• In a normal distribution (bell curve) remember
the 68, 95, 99.7 RULE
• 68% of observations are within 1 stdev of the
mean, 95% within 2 stdev, 99.7% within 3 stdev
• Example: with a mean of 18 and a stdev of 4.5, 68% of values fall between 13.5 and 22.5
• Now can compare mean and spread of 2
distributions
small stdev = values tightly cluster around the
mean (little variability)
large stdev = values spread out around the
mean (large variability)
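The 68, 95, 99.7 rule can be checked numerically. The sketch below uses `NormalDist` from Python's standard library (available from Python 3.8) with the mean of 18 and stdev of 4.5 from the slide; the helper name `empirical_intervals` is an assumption.

```python
from statistics import NormalDist

def empirical_intervals(mean, sd):
    """Return {k: (low, high)} for k = 1, 2 and 3 standard deviations."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

mean, sd = 18, 4.5   # values from the slide
dist = NormalDist(mu=mean, sigma=sd)

for k, (low, high) in empirical_intervals(mean, sd).items():
    # Share of a normal distribution that lies between low and high.
    share = dist.cdf(high) - dist.cdf(low)
    print(f"within {k} stdev: {low} to {high} covers {share:.1%}")
```

This only holds for data that are roughly normally distributed; a skewed sample will not follow the same percentages.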
Using Standard Deviation to compare
Variability around means
Step 8: Does your data really show
an effect?
• Statistics give power to your results
• Is your result just chance or is it caused by
your Independent Variable (IV)?
• Statistics uses probability to determine
how likely it is that your results are just
random
• You should understand the t-test and linear
regression analysis
Statistics: T-test
• Compares the means of two populations
which are normally distributed, with
sample size of at least 10.
• A way to tell if means of two groups are
actually different from each other. (Or
conversely looks at the amount of overlap
between the two)
• Accounts for the mean and variability of
the data
• A two-tailed, unpaired t-test is expected
• You are not expected to calculate t by hand
• We usually say that if the p value is < .05 then
there is a significant difference, but you are
expected to read this from a t table as follows
t table with right-tail probabilities
df \ p   0.40     0.25     0.10     0.05     0.025    0.01     0.005    0.0005
1        0.3249   1.0000   3.0776   6.3137   12.706   31.820   63.656   636.61
2        0.2886   0.8164   1.885    2.9199   4.3026   6.9645   9.9248   31.599
3        0.2766   0.7648   1.637    2.3533   3.1824   4.5407   5.8409   12.924
4        0.2707   0.7406   1.5332   2.1318   2.7764   3.7469   4.6040   8.6103
5        0.2671   0.7266   1.4758   2.0150   2.5705   3.3649   4.0321   6.8688
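To calculate the df, take the total number of samples in both groups and
subtract 2
The t value must exceed the value in a given cell to be significant at that p value
Think of those p values as percentages. Because this table lists right-tail probabilities, a two-tailed test at p = .05 uses the 0.025 column (2.5% in each tail)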
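Reading the table can be sketched as a lookup. The snippet below copies the right-tail values for df 1 to 5 from the table above; the function names and the group sizes in the usage line are hypothetical. It assumes that a two-tailed test splits the chosen p value evenly between the two tails, so it looks up p / 2 in a right-tail table.

```python
import math

# Right-tail probabilities and critical t values copied from the table (df 1 to 5).
RIGHT_TAIL_P = [0.40, 0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.0005]
T_TABLE = {
    1: [0.3249, 1.0000, 3.0776, 6.3137, 12.706, 31.820, 63.656, 636.61],
    2: [0.2886, 0.8164, 1.885, 2.9199, 4.3026, 6.9645, 9.9248, 31.599],
    3: [0.2766, 0.7648, 1.637, 2.3533, 3.1824, 4.5407, 5.8409, 12.924],
    4: [0.2707, 0.7406, 1.5332, 2.1318, 2.7764, 3.7469, 4.6040, 8.6103],
    5: [0.2671, 0.7266, 1.4758, 2.0150, 2.5705, 3.3649, 4.0321, 6.8688],
}

def degrees_of_freedom(n_group_a, n_group_b):
    """Total number of samples in both groups, minus 2."""
    return n_group_a + n_group_b - 2

def critical_t(df, p=0.05, two_tailed=True):
    """Look up the critical t value for a given df and significance level."""
    tail_p = p / 2 if two_tailed else p   # assumption: equal split between tails
    for column, table_p in enumerate(RIGHT_TAIL_P):
        if math.isclose(table_p, tail_p):
            return T_TABLE[df][column]
    raise ValueError(f"no column for right-tail probability {tail_p}")

# Hypothetical group sizes of 3 and 4 samples, giving df = 5.
df = degrees_of_freedom(3, 4)
print(f"df = {df}, critical t = {critical_t(df)}")
```

Only the first five rows are included, so larger samples would need the corresponding rows from a full table.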
So back to our graphs
[Figures: the two "Junior Class Student Heights" bar charts of Height (inches) by Student Sex (Boys, Girls), one with the y-axis from 60 to 72 and one from 0 to 90]
• Is there an actual difference between the means?
• Conduct a t-test: if p < 0.05 then there is an actual
difference; otherwise it's just a chance event
• So mean boys' height was 71
• And mean girls' height was 64
• And the t value was t = 0.082
• And the critical t value in the table for
p = .05 was t = 2.002, using appropriate
degrees of freedom
• So our t was too small meaning that the
means are NOT significantly different
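The comparison on this slide comes down to one check: does the calculated t exceed the critical t? A minimal sketch, using the two numbers quoted on the slide and a hypothetical helper named `is_significant`:

```python
def is_significant(t_value, t_critical):
    """True when the size of t exceeds the critical value from the table."""
    # abs() is used because a two-tailed test ignores the sign of t.
    return abs(t_value) > t_critical

t_value = 0.082      # t value reported on the slide
t_critical = 2.002   # critical value quoted on the slide for p = .05

if is_significant(t_value, t_critical):
    print("means are significantly different")
else:
    print("means are NOT significantly different")
```

With the slide's numbers the check fails, which matches the conclusion that the difference in mean heights is not significant.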
Statistics: Linear Regression
• Is there a relationship between two variables
that are measured in an experiment?
• Works with scatterplots with a line of best fit
e.g. Height and weight data, age and weight
data
• Does change in one variable predict change
in another?
Statistics: Linear Regression
[Figure: two scatterplots, "Conch Vital Statistics" and "Fish Vital Statistics"; x-axis: Length (cm), 0 to 10; y-axis: Weight (g), 0 to 140]
• Does a change in length predict a change in weight?
• Is there a positive or negative correlation (slope)?
• r = correlation coefficient: measures the strength of the linear
association between 2 quantitative variables
Correlation
• df = number of points in scatterplot – 2 (one for
each variable, x and y)
• Calculate r with an equation or a program
• Use a table to determine the critical value for
the number of points you are using
• The absolute value of r must exceed that number for a
significant relationship (correlation) to be present
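As a sketch of the "calculate r with a program" step, the function below computes the Pearson correlation coefficient from paired measurements. The length and weight values are made up purely for illustration and are not taken from the scatterplots; the function name `pearson_r` is also an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical measurements, made up for illustration only.
lengths_cm = [2, 4, 6, 8, 10]
weights_g = [20, 45, 55, 90, 110]

r = pearson_r(lengths_cm, weights_g)
df = len(lengths_cm) - 2   # number of points minus 2

print(f"r = {r:.3f}, df = {df}")
```

The resulting r would then be compared against the critical value for that df in a correlation table, which is not reproduced in this transcript.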
But remember
• The existence of a correlation does not
indicate causation
• So if people with bigger hands have bigger
feet that does not mean that a change in
hand size causes a change in foot size
• Rather they are both caused by something
else…