Standard Deviation

TOPIC 1
STATISTICAL ANALYSIS
MAKING A SCIENTIFIC INVESTIGATION
STEP 1: HAVE A RESEARCH QUESTION
STEP 2: HAVE A HYPOTHESIS
STEP 3: WRITE A METHOD TO TEST YOUR HYPOTHESIS
(design a controlled experiment)
STEP 4: COLLECT DATA
STEP 5: ORGANIZE THE DATA
STEP 6: ILLUSTRATE THE DATA USING AN APPROPRIATE DIAGRAM
STEP 7: ANALYZE THE DATA USING THE CORRECT STATISTICAL
METHODS, ENABLING A CONCLUSION TO BE DRAWN
STEP 4: DATA COLLECTION
The collection of all things being investigated is
called the population.
It is usually impossible for us to collect data from
every member of the population.
We must therefore choose a sample from the
population.
We must try to make sure that the sample is
representative of the population from which it is
drawn, so that we can generalize any findings about
the sample to the population.
Random sampling ensures that every member of
the population has an equal chance of being
included in the sample.
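A minimal Python sketch of simple random sampling, assuming the whole population can be listed and numbered. The population of 200 numbered limpets and the sample size of 20 are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical population: 200 limpets, each identified by a number
population = list(range(1, 201))

# A fixed seed is used only so that the example is repeatable
rng = random.Random(1)

# sample() draws without replacement; every member is equally likely to be picked
sample = rng.sample(population, k=20)
print(sorted(sample))
```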
I. QUALITATIVE DATA (descriptive)
II. QUANTITATIVE DATA (numerical)
• CONTINUOUS ex. length
• DISCRETE ex. number of eggs
STEP 5: ORGANIZING DATA
Ways to Organize Raw Data:
Constructing tables
- Ranking
- Tally chart
- Frequency distribution
Use the table below to answer the following questions:
Is discrete or continuous data represented?
What type of data organization is below?
Is the data table complete?
How will you process this data? (What does this data ‘say’ to you?)
Shell length / mm    Number of limpets
8-11                 2
12-15                5
16-19                8
20-23                10
24-27                9
28-31                5
32-35                1
[Image: quadrat sampling in the marine intertidal zone]
SPREADSHEET ACTIVITY 1: NORMAL DISTRIBUTION
1) Input the data from Limpet Shell Lengths in your spreadsheet
2) GRAPH: frequency distribution (normal distribution)
3) What does this graph tell you?
Shell length / mm    Number of limpets
8-11                 2
12-15                5
16-19                8
20-23                10
24-27                9
28-31                5
32-35                1
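If no spreadsheet is at hand, the same frequency distribution can be sketched as a text histogram in Python. This is only an illustration of the limpet table above; the variable names are assumptions of this sketch.

```python
# Frequency distribution from the limpet table:
# (shell length class in mm, number of limpets)
limpets = [
    ("8-11", 2),
    ("12-15", 5),
    ("16-19", 8),
    ("20-23", 10),
    ("24-27", 9),
    ("28-31", 5),
    ("32-35", 1),
]

total = sum(count for _, count in limpets)
modal_class = max(limpets, key=lambda row: row[1])[0]

# One "x" per limpet gives a rough picture of the shape of the distribution
for interval, count in limpets:
    print(f"{interval:>5} | {'x' * count}")

print("total limpets:", total)
print("modal class (mm):", modal_class)
```

The bars rise to a single peak at 20-23 mm and fall away on both sides, which is the roughly bell-shaped pattern expected of a normal distribution.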
Normal Distribution
Skewed Distribution
Descriptive Statistics Includes:
• Calculating the:
– Mean
– Median
– Mode
– Range
– Standard deviation (variability)
– P value (level of confidence from a T-Test)
– PEARSON correlation coefficient (correlation/cause)
Mean (average): the sum of all data entries divided by the number of
entries; a good measure of central tendency for normally distributed data.
Median: middle value when data entries are placed in rank order;
good measure of central tendency for skewed distributions.
Mode: the most frequently occurring value (the most common data
value).
Range: the difference between the smallest and largest data values.
This gives a simple measure of the spread of data. (Note: the range
depends only on the two extremes, so it is strongly affected by
outliers, values which are very different from all other values.)
SPREADSHEET ACTIVITY 2
1) Input the following data in your spreadsheet
Sample 1: 30 45 45 60 75 75 75 80 90 90 100
Sample 2: 60 60 70 70 80 80 90 90 100 100 120 120
2) Calculate the mean, median, mode & range
a) manually (using scientific calculator)
b) using your spreadsheet
Note: you need to know how to complete all stats.
calculations using: 1) formula 2) spreadsheet 3) calculator.
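As a cross-check for the spreadsheet and calculator results, the four values can also be computed with Python's standard `statistics` module. This is a sketch only; the helper name `summarize` is an assumption of this example.

```python
import statistics

sample_1 = [30, 45, 45, 60, 75, 75, 75, 80, 90, 90, 100]
sample_2 = [60, 60, 70, 70, 80, 80, 90, 90, 100, 100, 120, 120]

def summarize(data):
    """Return the mean, median, mode(s) and range of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # multimode() returns every value tied for most frequent
        "modes": statistics.multimode(data),
        "range": max(data) - min(data),
    }

summary_1 = summarize(sample_1)
summary_2 = summarize(sample_2)
print(summary_1)
print(summary_2)
```

In Sample 2 every value occurs exactly twice, so there is no single mode; `multimode` lists all six values rather than picking one.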
Do we stop data analysis at
calculating the Mean, Median &
Mode?
• No!
• The mean does not give us a complete picture of
variation in our data.
• We need to calculate standard deviation
– The STDEV is a more complete measure of variation. It
considers every value in the set.
– It is a measure of the spread of data around the mean
SPREADSHEET ACTIVITY 3: Standard Deviation
1) Input the following data in your spreadsheet.
Mass (g) of mice bred in different environments
Sample A (isolated mice)
22, 22, 23, 24, 24, 24, 24, 25, 26, 26
Sample B (bred together)
16, 17, 20, 23, 24, 25, 27, 28, 29, 31
2) Calculate the means for samples A & B
3) Calculate standard deviation (STDEVP) for A & B
a) with formula b) with spreadsheet c) with calculator
4) Is variation high or low in Sample A? Sample B?
5) What does this variation tell us?
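The "formula" route for step 3 can be sketched in Python as follows. The function `population_sd` is a hypothetical helper written for this example; it divides by n, which is what the spreadsheet function STDEVP does, and it is checked against `statistics.pstdev` from the standard library.

```python
import math
import statistics

sample_a = [22, 22, 23, 24, 24, 24, 24, 25, 26, 26]  # isolated mice, mass in g
sample_b = [16, 17, 20, 23, 24, 25, 27, 28, 29, 31]  # mice bred together, mass in g

def population_sd(data):
    """Population standard deviation: square root of the mean squared deviation."""
    mean = sum(data) / len(data)
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / len(data))

for name, data in (("A", sample_a), ("B", sample_b)):
    print(
        name,
        "mean =", statistics.mean(data),
        "SD (formula) =", round(population_sd(data), 2),
        "SD (pstdev) =", round(statistics.pstdev(data), 2),
    )
```

Both samples have a mean of 24 g, but the standard deviation is about 1.34 g for Sample A and about 4.80 g for Sample B.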
Analyzing Values from Mice Samples
• Looking at the calculated values for mean alone
for sample A and B, it appears that there is no
difference between the two populations of
mice. (we cannot recognize variability of data)
• However, when looking at STDEV, we can see:
• For sample A – STDEV is low
• For sample B – STDEV is high
– Wide variation in this data set makes us question
the experimental design. Is it possible that mice
bred in environment ‘B’ were subject to other
environmental factors ? What is causing wide
variation of data?
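[Figure: dot plots of the two samples. Sample A: axis from 22 to 26, values clustered around 24. Sample B: axis from 16 to 31, values spread widely around 24.]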
• Standard Deviation: A measure of how the individual
observations of a data set are dispersed or spread out
around the mean (average).
• For normally distributed data:
– about 68% of all values lie within ±1 standard deviation of the mean
– about 95% of all values lie within ±2 standard deviations of the
mean
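These two percentages can be checked against the normal distribution itself. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are assumptions of this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values lying within 1 and within 2 standard deviations of the mean
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"within 1 SD: {within_1_sd:.1%}")
print(f"within 2 SD: {within_2_sd:.1%}")
```

The exact proportions are close to 68.3% and 95.4%, which is why the figures 68% and 95% are usually quoted.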
Reasons for Using Statistics
• In a population, we usually find that not all the values are
identical. Instead, there are differences between the
values even inside a population.
• We call this VARIATION.
• The data we obtain from a study has variability.
• We often need to describe the variation within a
population to help us decide whether a difference
between sample means truly represents a difference
between population means.
• How can we describe this variation? (via statistics)
Why Use Standard Deviation?
• The value provides a description of the
variation which considers every data item.
• Large differences in the sizes of the standard
deviation between samples being compared
can indicate:
– 1) that control variables are not constant
– 2) that there is a problem with validity of the
investigation.
• The standard deviation can be used as a
support in hypothesis testing.
We can graphically represent STDEV as ERROR BARS
Error Bars
• In many charts and
graphs, we show the
mean values of our
samples.
• It is useful to show a
measure of the
variation inside each of
these samples. We do
this by adding error
bars to the chart or
graph.
Error Bars
• An error bar is a line that extends above and
below a bar in a chart or a data point in a
graph. It could represent the range for that
sample, or the standard deviation.
• The length of the line represents the size of
the range or size of standard deviation – it
extends an equal distance above and below
the value of the mean.
• Error bars are graphical representations of
the variability of data.
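As a sketch of what a standard deviation error bar covers, the lower and upper ends of each bar (mean minus and plus one standard deviation) can be computed for the two mice samples from Spreadsheet Activity 3. Using the population standard deviation (`pstdev`, the equivalent of STDEVP) is an assumption carried over from that activity.

```python
import statistics

samples = {
    "A (isolated)": [22, 22, 23, 24, 24, 24, 24, 25, 26, 26],
    "B (bred together)": [16, 17, 20, 23, 24, 25, 27, 28, 29, 31],
}

error_bars = {}
for name, masses in samples.items():
    mean = statistics.mean(masses)
    sd = statistics.pstdev(masses)
    # The bar extends an equal distance above and below the mean
    error_bars[name] = (mean - sd, mean + sd)
    print(f"Sample {name}: mean {mean} g, bar from {mean - sd:.2f} g to {mean + sd:.2f} g")
```

The two bars are centred on the same mean, but the bar for Sample B is much longer, which makes the greater variability visible at a glance.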
Significance
Significance: a real, true difference between two
or more samples in the phenomenon that we
are examining (testing to see whether findings are
not just due to chance)
Note: statistical significance is our main tool in
deciding whether the data supports the
hypothesis.
What information do the means of data give?
What additional information do error bars give?
How does this affect interpretation of the figures?
- Error bars help us determine whether or not the
difference between two sets of data is significant
(real).
- A large difference between the means of samples, and
small standard deviations for these samples, indicates
that it is likely that the difference between the means
is statistically significant.
- A small difference between these means and large
standard deviations for these samples indicates that it
is likely that the difference between these means is
not statistically significant.
Confidence Levels
• It is seldom possible to say with complete certainty
(100% confidence) that the difference between sample
means is significant.
• Instead, we determine if the difference between the
sample means is probably significant.
• Most often, scientists/biologists want to be 95% confident
that the difference between the samples is significant.
• This means that there is only a 5% probability that a
difference this large would arise purely by chance and not
because of a real difference between the populations.
• We could say: p = 0.05 (the probability (p) that chance
alone produced the difference between our sample
means is 5%).
Determining Confidence of
Significance with T-Test
How do we determine if our findings are significant?
We need to calculate our t value and find the p value.
Apply t-test to calculate t-value – will help determine
p-value (significance at a certain level of
confidence):
• Data should be normally distributed
• Sample size should be at least 10
T-Test
• Need to include the following information for T-Test
calculation:
• 1) size of the difference between means of the samples
• 2) number of items in each sample
• 3) the amount of variation about the mean of each sample
(standard deviation)
• Value for t from data can be calculated using:
– Formula
– Scientific calculator
– Spreadsheet (Microsoft Excel)
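The three ingredients listed above can be combined as in the following Python sketch. The slides do not state which t formula is used; the pooled-variance (Student's) form is assumed here because it goes with degrees of freedom of (n1 + n2) - 2. The function name `pooled_t` is hypothetical, and the data are the mice samples from Spreadsheet Activity 3.

```python
import math
import statistics

def pooled_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    # 1) size of the difference between the sample means
    difference = statistics.mean(sample_1) - statistics.mean(sample_2)
    # 3) variation about each mean (sample variance, divisor n - 1)
    var_1 = statistics.variance(sample_1)
    var_2 = statistics.variance(sample_2)
    # 2) the number of items in each sample weights the pooled variance
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return difference / standard_error, df

sample_a = [22, 22, 23, 24, 24, 24, 24, 25, 26, 26]
sample_b = [16, 17, 20, 23, 24, 25, 27, 28, 29, 31]

t_value, df = pooled_t(sample_a, sample_b)
print("t =", t_value, "df =", df)
```

Because the two mice samples have the same mean, t is 0 here: the t-test compares means, so it does not detect the difference in spread between the samples.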
SPREADSHEET ACTIVITY 4: T-TEST (P-value)
1) Input data from Clegg Text Chapter 21 Page 681
2) Calculate: mean and standard deviation
3) Calculate: P-value (from T-Test)
a) spreadsheet
b) calculator
4) What does this P-value tell you?
T-Test & P-Value using a Calculator
• Need to use table of t-values!
• Calculate T-Test Value (t-value)
• Identify Degrees of Freedom for your experiment
DF = (size of sample 1 + size of sample 2) - 2
Example: (10+10)-2 = 18
• Find row 18 in DF column
• Find t value in row 18 under “t values” column
• Once you found your t value, look to the bottom
row in that column for p value.
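The table lookup can be sketched in Python as below. Only a few rows of a standard two-tailed t table at p = 0.05 are included, and the t values passed in at the end (2.5 and 1.2) are hypothetical, used only to show the comparison; the function names are assumptions of this example.

```python
# Two-tailed critical t values at p = 0.05 for a few degrees of freedom,
# taken from a standard t table (not a complete table)
CRITICAL_T_P05 = {10: 2.228, 18: 2.101, 20: 2.086, 21: 2.080}

def degrees_of_freedom(n1, n2):
    """DF = (size of sample 1 + size of sample 2) - 2."""
    return (n1 + n2) - 2

def significant_at_p05(t_value, df):
    """True if |t| exceeds the critical value for this DF, i.e. p < 0.05."""
    return abs(t_value) > CRITICAL_T_P05[df]

df = degrees_of_freedom(10, 10)
print("DF =", df)
print("t = 2.5 significant?", significant_at_p05(2.5, df))  # hypothetical t value
print("t = 1.2 significant?", significant_at_p05(1.2, df))  # hypothetical t value
```

A calculated t larger than the critical value in the table means p is below 0.05, so the difference between the means is significant at the 95% confidence level.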
Two – tailed test
• A two-tailed test will test both if the mean is significantly
greater than x and if the mean is significantly less than x.
• The mean is considered significantly different from x if the
test statistic is in the top 2.5% or bottom 2.5% of its
probability distribution, resulting in a p-value less than 0.05.
• We would use a two-tailed test to see if two means are
different from each other (ie from different populations), or
from the same population.
[Figure: probability distribution of the test statistic, labelled "most likely observation" and "observed or more extreme result arising by chance"]
Cause & Correlation
• Correlation: a relationship or connection between two
or more things. (observations without an experiment
can only show a correlation)
• Cause: a phenomenon that gives rise to a result.
(experimentation gives evidence for cause of result)
• Example: we might do an experiment to see if watering
bean plants prevents wilting. Observing that wilting
occurs when the soil is dry is a simple correlation, but
the experiment gives us evidence that the lack of water
is the cause of the wilting. Experiments provide a test
which shows cause.
SPREADSHEET ACTIVITY 5:
1) Input the following data
2) Calculate the PEARSON Correlation Coefficient (r value)
LIGHT INTENSITY (X UNITS)    PLANT HEIGHT (CM)
0                            6
5                            7
10                           9
15                           10
20                           11
25                           12
30                           15
3) Explain what this r-value tells you.
4) Explain that existence of a correlation does not establish that there is
a causal relationship between two variables.
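For checking the spreadsheet result, the Pearson coefficient can be computed directly from its definition, as in this Python sketch. The function name `pearson_r` and its internal variable names are assumptions of this example.

```python
import math

light_intensity = [0, 5, 10, 15, 20, 25, 30]  # x units
plant_height_cm = [6, 7, 9, 10, 11, 12, 15]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(spread_x * spread_y)

r = pearson_r(light_intensity, plant_height_cm)
print("r =", round(r, 3))
```

The result is about 0.98, a strong positive correlation. On its own this value says nothing about whether light intensity causes the change in height.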
Positive Correlation:
The two variables change in the same direction: if one variable increases, the other
also increases, and if one variable decreases, the other also decreases. For example,
the length of an iron bar increases as the temperature increases.
Negative Correlation:
The two variables change in opposite directions: if one variable increases, the other
decreases, and vice versa. For example, the volume of a gas decreases as the pressure
increases, and the demand for a particular commodity increases as its price decreases.
No Correlation or Zero Correlation:
There is no relationship between the two variables: a change in the value of one
variable is not accompanied by any consistent change in the other.