Box and Whisker Plot


Statistical Significance of Data
Box and Whisker Plot
…can be useful when dealing with many data values. Rather than showing all of
the data, it shows five statistics, known together as the five-number summary.
The box-and-whisker plot is the visual representation of that summary.
The five statistics consist of the:
• Median
• Quartiles (lower and upper)
• Minimum
• Maximum
Make a Box and Whisker Plot from these numbers:
54 68 18 93 87 27 100 91 52 85 34 61 56 78 82
1. Put the numbers in numerical order.
2. Find the Minimum (smallest value of the entire set)
3. Find the Maximum (largest value of the entire set)
4. Find the Median number (the number in the middle of the ordered set of numbers).
5. Lower Quartile (Q1) - take the numbers to the left of the median and find their median.
6. Upper Quartile (Q3) - take the numbers to the right of the median and find their median.
7. Draw a box to represent the IQR (interquartile range) and solve: (Q3 – Q1) = IQR
* If you are finding the median of a set with an even number of values - find the two middle numbers,
add them together and divide by two to get the median.
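The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function names are made up for this example, and it assumes the quartile convention described in steps 5 and 6 (the median itself is left out of both halves when the count is odd). Other software may use a different quartile convention and give slightly different Q1 and Q3 values.

```python
def median(values):
    """Median of a list; averages the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the values strictly to the left
    and right of the median, as in steps 5 and 6.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)               # step 1
    half = len(ordered) // 2
    lower = ordered[:half]                 # values to the left of the median
    upper = ordered[-half:]                # values to the right of the median
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [54, 68, 18, 93, 87, 27, 100, 91, 52, 85, 34, 61, 56, 78, 82]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1                              # step 7
print("min", minimum, "Q1", q1, "median", med, "Q3", q3, "max", maximum, "IQR", iqr)
```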
[Number line: 0 to 100 in steps of 10]
Make a Box and Whisker Plot:
54 68 18 93 87 27 100 91 52 85 34 61 56 78 82
1. Put the numbers in numerical order. 18 - 27 - 34 - 52 - 54 - 56 - 61 - 68 - 78 - 82 - 85 - 87 - 91 - 93 - 100
2. Find the Minimum (smallest value of the entire set) 18
3. Find the Maximum (largest value of the entire set) 100
4. Find the Median number (the number in the middle of the ordered set of numbers). 68
5. Lower Quartile (Q1) - # to the left of the median, find its median. 18 - 27 - 34 - 52 - 54 - 56 - 61, so Q1 = 52
6. Upper Quartile (Q3) - # to the right of the median, find its median. 78 - 82 - 85 - 87 - 91 - 93 - 100, so Q3 = 87
7. Draw a box to represent the IQR (interquartile range) and solve: (Q3 - Q1) = IQR 87 - 52 = 35
[Number line: 0 to 100 in steps of 10]
Make a Box and Whisker Plot:
46 39 10 48 46 45 51 42 49
1. Put the numbers in numerical order. 10 - 39 - 42 - 45 - 46 - 46 - 48 - 49 - 51
2. Find the Minimum (smallest value of the entire set) 10 (the whisker is drawn to 39 instead if 10 turns out to be an outlier; see step 8)
3. Find the Maximum (largest value of the entire set) 51
4. Find the Median number (the number in the middle of the ordered set of numbers). 46
5. Lower Quartile (Q1) - # to the left of the median, find its median. (39 + 42)/2 = 40.5
6. Upper Quartile (Q3) - # to the right of the median, find its median. (48 + 49)/2 = 48.5
7. Draw a box to represent the IQR (interquartile range) and solve: (Q3 – Q1) = IQR 48.5 – 40.5 = 8
8. Is 10 an outlier that should be ignored?
Multiply 1.5 x IQR = 1.5 x 8 = 12; then Q1 – 12 = 40.5 – 12 = 28.5 (10 is well below 28.5 = an outlier)
So change the minimum to the next smallest number (39) and draw the whisker to it.
Is 51 an outlier?
Multiply 1.5 x IQR = 1.5 x 8 = 12; then Q3 + 12 = 48.5 + 12 = 60.5 (51 is below 60.5 = NOT an outlier)
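The outlier check in step 8 can be written as a short Python sketch. The function name `outlier_fences` is made up for this illustration, and the quartiles are taken from steps 5 and 6 rather than recomputed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [46, 39, 10, 48, 46, 45, 51, 42, 49]
q1, q3 = 40.5, 48.5                        # quartiles from steps 5 and 6

low, high = outlier_fences(q1, q3)
outliers = [x for x in data if x < low or x > high]
kept = [x for x in data if low <= x <= high]

# The whiskers end at the most extreme values that are not outliers.
whisker_min, whisker_max = min(kept), max(kept)
print("limits:", low, high)
print("outliers:", outliers)
print("whiskers:", whisker_min, whisker_max)
```

Any value outside the two limits is plotted as a separate point, and the whiskers stop at the nearest values inside them.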
[Number line: 0 to 60 in steps of 10, with * marking the outlier]
Dot Plot
…can be useful when trying to find patterns, trends, or clusters of data.
In some cases it may show a discrepancy in data that could be due to
unavoidable error or avoidable/user error.
Using this data, create a:
1. Box and Whisker Plot
2. Dot Plot
Box and Whisker
Ordered data (grams of sugar per serving):
1 3 4 4 4 5 5 6 7 7 7 7 8 10 10 11 11 12 13 22
Median = (7 + 7)/2 = 7 grams of sugar/serving
Lowest = 1 gram of sugar/serving
Highest = 22 grams of sugar/serving
Lower Quartile = (4 + 5)/2 = 4.5 grams of sugar/serving
Upper Quartile = (10 + 11)/2 = 10.5 grams of sugar/serving
Dot Plot
[Number line: 0 to 25 in steps of 5]
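A dot plot stacks one mark per data value above its position on the number line. As a rough sketch, assuming whole-number data, the sugar values can be turned into a sideways text dot plot in Python (the function name `text_dot_plot` is made up for this example):

```python
from collections import Counter


def text_dot_plot(values):
    """One row per whole number from min to max, one '*' per occurrence."""
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        row = f"{value:>3} | " + "*" * counts[value]
        rows.append(row.rstrip())          # drop the trailing space on empty rows
    return "\n".join(rows)


sugar = [1, 3, 4, 4, 4, 5, 5, 6, 7, 7, 7, 7, 8, 10, 10, 11, 11, 12, 13, 22]
plot = text_dot_plot(sugar)
print(plot)
```

Rows with many marks show clusters, and long runs of empty rows make an isolated value such as 22 stand out.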
Chi-Squared
Introduction:
The Chi Square test (χ²) is often used in science to test whether the data you observe in an experiment
match the data you would expect from the experiment. Calculating the χ² value allows you to
determine whether the difference between them can be attributed to randomness. If the data differ greatly
and the difference is not due to randomness, other factors must be influencing your results.
Objectives:
• Determine the degrees of freedom (df) for this investigation: number of categories (or classes) – 1 = n – 1.
• Calculate the χ² value for a given set of data:
χ² = ∑ [ (observed value – expected value)² / expected value ]
• Use the Chi Square Table to determine if the calculated value is equal to or less than the critical value.
• Determine if the Chi Square value exceeds the critical value & if the null hypothesis is accepted or rejected.
Biologists generally use a probability value of 0.05 (p = 0.05) in a Chi Square Table. That p-value
means that a difference at least this large would be produced by random chance alone fewer than 1 time in 20.
This is like saying you are 95% certain that the results are not due to random chance.
Degree of Freedom is the number of choices (n) minus 1
df = n – 1
The P-Value & the Degree of Freedom are used to determine the Critical Value
If the chi square value ≤ the critical value, the Null Hypothesis is accepted as statistically reasonable.
If the chi square value is > the critical value, the difference is “statistically significant”:
the results are “unlikely to have occurred by chance,” so the Null Hypothesis is called into question
and rejected.
Difference Between Null Hypothesis and Experimental Hypothesis:
The null hypothesis is the hypothesis that the dependent variable in an experiment is not affected
by the independent variable.
The experimental hypothesis is that the dependent variable is affected by the independent variable.
For example, let's say that you are testing whether playing violent video games affects
aggressiveness. Whether or not the subjects play video games is the independent variable, and
aggressiveness is the dependent variable.
1) Your experimental hypothesis could be that playing violent video games affects aggressiveness
in the test subjects.
The null hypothesis is that the violent video games do not in any way affect aggressiveness.
2) Alternatively, you could make an experimental hypothesis saying that playing violent video games
makes people more aggressive, in which case the null hypothesis would be that it does not
make people more aggressive.
In this case you'd get a directional null hypothesis, e.g. that there is either no effect or the effect is
the opposite of what you expect.
Question: Is there a statistically significant difference in the data?
Scenario 1:
While reviewing zoo records, a zookeeper notices that the baboon exhibits each average 42
incidents of aggressive behavior a month. He hypothesizes that changing the intensity of the light
in the primate exhibit will reduce the amount of aggression between the baboons.
In exhibit A, with a lower light intensity, he observes 32 incidents of aggression over a one-month
period. In exhibit B, with normal lights, he observes 45 incidents of aggression.
Would you accept or reject his experimental hypothesis?
Exhibit A: (32 – 42)² / 42 = 100 / 42 = 2.38
Exhibit B: (45 – 42)² / 42 = 9 / 42 = 0.21
χ² = 2.38 + 0.21 = 2.59
P-value = 0.05
Degree of Freedom (df) = number of choices – 1
(n is aggressive or not aggressive ) 2-1 = 1 df
Critical Value = 3.84
Accept or Reject his experimental hypothesis?
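The Scenario 1 calculation can be checked with a short Python sketch. The function name `chi_square` is made up for this illustration; the observed counts, expected counts, and critical value are the ones given above.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [32, 45]       # exhibit A (lower light), exhibit B (normal light)
expected = [42, 42]       # monthly average from the zoo records
critical_value = 3.84     # p = 0.05, df = 1

chi2 = chi_square(observed, expected)
reject_null = chi2 > critical_value
print("chi-square:", chi2)
print("reject the null hypothesis:", reject_null)
```

Adding the unrounded terms gives a value just under 2.60 rather than 2.59, because the worksheet rounds each term before adding. Either way the value is below 3.84, so under this rule the null hypothesis is not rejected and the data do not support the zookeeper's experimental hypothesis.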
Scenario 2:
A behavioral psychologist notices that gate #4 in the polar bear exhibit is preferred over gate 1, 2
and 3. If there were no impetus for the polar bears to go through that particular gate, then one would
expect each gate to be used equally, 25% of the time. However, based on her observations, the
polar bears entered gate 1 (9%), gate 2 (20%), gate 3 (25%), and gate 4 (46%). She believes there
is something making them select that door in such great numbers.
Would you accept or reject the null hypothesis (that nothing is impacting their decision)?
Gate 1: (9 – 25)² / 25 = 256 / 25 = 10.24
Gate 2: (20 – 25)² / 25 = 25 / 25 = 1.00
Gate 3: (25 – 25)² / 25 = 0 / 25 = 0.00
Gate 4: (46 – 25)² / 25 = 441 / 25 = 17.64
χ² = 10.24 + 1.00 + 0.00 + 17.64 = 28.88
P-value = 0.05
Degree of Freedom (df) = number of choices – 1
(n is 4 gates) 4-1 = 3 df
Critical Value = 7.82
Accept or Reject the null hypothesis?
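For Scenario 2, a sketch along the same lines shows how each gate contributes to the total and how the degrees of freedom select the critical value. It treats the percentages as counts, as the worksheet does. The small lookup table is an assumption added for this example: it holds the standard p = 0.05 critical values for 1 to 3 degrees of freedom, of which 3.84 and 7.82 appear in this worksheet.

```python
# Critical values at p = 0.05, keyed by degrees of freedom (rounded to 2 places).
CRITICAL_VALUES_P05 = {1: 3.84, 2: 5.99, 3: 7.82}

observed = {"Gate 1": 9, "Gate 2": 20, "Gate 3": 25, "Gate 4": 46}
expected = 25                               # equal use of four gates

# Contribution of each gate: (observed - expected)^2 / expected
terms = {gate: (obs - expected) ** 2 / expected for gate, obs in observed.items()}
chi2 = sum(terms.values())

df = len(observed) - 1                      # number of categories minus 1
critical_value = CRITICAL_VALUES_P05[df]
reject_null = chi2 > critical_value

for gate, term in terms.items():
    print(f"{gate}: {term:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}, critical value = {critical_value}")
print("reject the null hypothesis:", reject_null)
```

Gates 1 and 4 account for almost all of the total, which is far above 7.82, so under this rule the null hypothesis (that nothing is influencing the bears' choice) is rejected.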
Statistics Worksheet
Mean (x̄): (same as the average) add up the values / number of trials
Sum of the Squares (SS): SS = ∑ (xi – x̄)²
Variance (s²): s² = SS / (n – 1)
Standard Deviation (s): s = √(s²)
Standard Error of the Mean (SE): SE = s / √n
N: Total number of individuals in a population
n: Total number of individuals in a sample of the population
xi: a single measurement
∑: Summation
x̄: sample mean
Data of Non-Survivors
Data on 100 medium ground finches from Peter and Rosemary Grant’s 40 years of Research in the Galápagos

Sample #    Non-Survivor Beak Depth (mm)    Squared Difference
            xi measurement                  (xi – x̄1)²
1           7.52
2           9.31
3           8.20
4           8.39
5           10.50

Mean x̄1 = (total/sample #) = 43.92/5 = 8.78
Sum of Squares SS = xxxx
Variance s² = xxxx
Standard deviation s = xxxx
Standard error of the mean SE = xxxx
95% CI = 2s/√n; CI = xxxx
n = 5 samples
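The blanks in this table can be checked with a short Python sketch. It assumes the usual sample formulas (variance = SS divided by n – 1, standard error = s/√n) and the worksheet's 95% CI of 2s/√n; the function name `describe` is made up for this example.

```python
import math


def describe(sample):
    """Mean, sum of squares, sample variance, SD, SE and 95% CI half-width."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)    # sum of squared differences
    variance = ss / (n - 1)                      # sample variance (assumes n - 1)
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    ci = 2 * se                                  # worksheet's 2s / sqrt(n)
    return {"n": n, "mean": mean, "ss": ss, "variance": variance,
            "sd": sd, "se": se, "ci": ci}


beak_depths = [7.52, 9.31, 8.20, 8.39, 10.50]    # values from the table (mm)
stats = describe(beak_depths)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The code keeps the unrounded mean (8.784) when computing the squared differences, so results may differ slightly in the last digit from a hand calculation that uses 8.78.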
Data of Non-Survivors
Data on 100 medium ground finches from Peter and Rosemary Grant’s 40 years of Research in the Galápagos

Sample #    Non-Survivor Beak Depth (mm)    Squared Difference
            xi measurement                  (xi – x̄2)²
1           9.10
2           8.80
3           9.15
4           11.01
5           10.86

Mean x̄2 = (total/sample #) = 48.92/5 = 9.78
Sum of Squares SS = xxxx
Variance s² = xxxx
Standard deviation s = xxxx
Standard error of the mean SE = xxxx
95% CI = 2s/√n; CI = xxxx
n = 5 samples
t–Test Statistics
The t-Test determines the probability (p) that any observed difference between
the means of the two samples (i.e. non-survivors and survivors) occurred simply
by chance, and not because of natural selection.
| | = absolute value, always a positive number
n = 5 birds
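The t formula itself is not reproduced in this transcript. As a hedged sketch, the Python below assumes a commonly used form, t = |x̄1 – x̄2| / √(s1²/n1 + s2²/n2), and applies it to the two five-bird samples from the tables above. The function names are made up for this example, and the result would still have to be compared with a critical t value from a table.

```python
import math


def sample_variance(sample):
    """Sum of squared differences from the mean, divided by n - 1."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)


def t_statistic(sample_1, sample_2):
    """Assumed form: |mean1 - mean2| / sqrt(s1^2/n1 + s2^2/n2)."""
    mean_1 = sum(sample_1) / len(sample_1)
    mean_2 = sum(sample_2) / len(sample_2)
    spread = math.sqrt(sample_variance(sample_1) / len(sample_1)
                       + sample_variance(sample_2) / len(sample_2))
    return abs(mean_1 - mean_2) / spread


first_table = [7.52, 9.31, 8.20, 8.39, 10.50]     # beak depths (mm), first table
second_table = [9.10, 8.80, 9.15, 11.01, 10.86]   # beak depths (mm), second table

t = t_statistic(first_table, second_table)
print(f"t = {t:.3f}")
```

The absolute value makes t positive regardless of which sample is listed first.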