standard deviation
Statistical Analysis
Topic 1
Statistics
1.1.1 State that error bars are a
graphical representation of the
variability of data.
1.1.2 Calculate the mean and standard
deviation of a set of values.
1.1.3 State that the term standard
deviation is used to summarize the
spread of values around the mean, and
that 68% of values fall within one
standard deviation of the mean.
1.1.4 Explain how the standard deviation is
useful for comparing the means and spread
of data between two or more samples.
1.1.5 Deduce the significance of the
difference between two sets of data using
calculated values for t and the appropriate
tables.
1.1.6 Explain that the existence of a
correlation does not establish that there is a
causal relationship between two variables.
What is data?
Information, in the form of facts
or figures obtained from
experiments or surveys, used
as a basis for making
calculations or drawing
conclusions
Encarta dictionary
2 types of Data
Qualitative
Quantitative
Statistics in Science
Data can be collected about a population (surveys)
Data can be collected about a process (experimentation)
Qualitative Data
Information that relates to characteristics
or description (observable qualities)
Information is often grouped by descriptive
category
Examples
Species of plant
Type of insect
Shades of color
Rank of flavor in taste testing
Remember: qualitative data can be “scored” and
evaluated numerically
Qualitative data, manipulated
numerically
Survey results, teens and need for environmental action
Quantitative data
Quantitative – measured using a naturally occurring
numerical scale
Examples
Chemical concentration
Temperature
Length
Weight…etc.
Quantitation
Measurements are often displayed
graphically
Quantitation = Measurement
In data collection for Biology, data must be
measured carefully, using laboratory
equipment
(e.g. timers, metre sticks, pH meters, balances, pipettes, etc.)
The limits of the equipment used add some
uncertainty to the data collected. All
equipment has a certain magnitude of
uncertainty. For example, is a ruler that is
mass-produced a good measure of 1 cm?
1mm? 0.1mm?
For quantitative testing, you must indicate
the level of uncertainty of the tool that
you are using for measurement!!
Finding the level of
uncertainty
As a “rule-of-thumb”, if not specified, use +/- 1/2 of the
smallest measurement unit (e.g. a metric ruler is lined to
1 mm, so the limit of uncertainty of the ruler is +/- 0.5 mm.)
If the room temperature is read as 25
degrees C, with a thermometer that is scored
at 1 degree intervals – what is the range of
possible temperatures for the room?
(ans. +/- 0.5 degrees Celsius - if you read
25 °C, it may in fact be anywhere from 24.5 to 25.5 degrees)
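The rule of thumb can be sketched in a few lines of Python. The function name `reading_range` is an illustrative choice, not part of the original material, and it assumes the uncertainty is half of the smallest marked division.

```python
def reading_range(reading, smallest_division):
    """Return (lowest, highest) plausible value for a reading.

    Assumes the rule of thumb: uncertainty = +/- half the smallest division.
    """
    half = smallest_division / 2
    return reading - half, reading + half


# Thermometer scored at 1 degree intervals, reading 25 degrees C
low, high = reading_range(25, 1)
print(low, high)
```

The same call with `reading_range(10, 1)` would describe a 10 mm length read from a ruler lined to 1 mm.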
Definition of statistics
Branch of mathematics which allows us
to sample small portions from habitats,
communities, or biological populations,
and draw conclusions about the larger
population.
Statistics measure the differences and
relationships between sets of data
Nothing is 100% certain in science
Mean
An average of data
points
Central tendency of
the data
Find the mean of the given data³:

Country               # of reported HIV cases
Argentina             27517
Bahamas               4548
Canada                19468
Dominican Republic    7167
Ecuador               6297

Answer: 12999.4
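As a quick check of the answer, the mean can be computed with Python's standard `statistics` module. The dictionary name `cases` is only an illustrative label for the five values in the table.

```python
from statistics import mean

# Reported HIV cases per country, copied from the table
cases = {
    "Argentina": 27517,
    "Bahamas": 4548,
    "Canada": 19468,
    "Dominican Republic": 7167,
    "Ecuador": 6297,
}

# Mean = sum of the values divided by the number of values
average = mean(list(cases.values()))
print(average)
```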
Range
A measure of the
spread of data
Difference between the
largest and the smallest
observed values
Find the range of the
given data:
Answer: 22969
If one data point were
unusually large or
unusually small, it
would have a great
effect on the range.
Such points are called
outliers.
Country               # of reported HIV cases
Argentina             27517
Bahamas               4548
Canada                19468
Dominican Republic    7167
Ecuador               6297
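The range is simply the largest value minus the smallest. A minimal Python sketch using the same five values (the variable names are illustrative):

```python
# Reported HIV cases, in the order listed in the table
reported_cases = [27517, 4548, 19468, 7167, 6297]

# Range = largest observed value - smallest observed value
data_range = max(reported_cases) - min(reported_cases)
print(data_range)
```

Because only the two extreme values enter the calculation, a single outlier changes the result directly.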
Looking at Data
How accurate is the data? (How close
are the data to the “real” results?) A
consistent departure from the real results is called BIAS
How precise is the data? (All test
systems have some uncertainty, due to
limits of measurement) Estimation of
the limits of the experimental
uncertainty is essential.
Comparing Averages
Once an average has been calculated for each of the
2 sets of data, the average values can be
plotted together on a graph, to
visualize the relationship
between the 2
Drawing error bars
The simplest way to draw an error bar
is to use the mean as the central point,
and to use the distance of the
measurement that is furthest from the
average as the endpoints of the data
bar
[Figure: error bar diagram labelled with the average value, the value farthest from the average, and the calculated distance between them]
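The simple error bar described above can be sketched in Python as follows. The function name `simple_error_bar` and the sample values are hypothetical; the sketch assumes the bar extends the same distance above and below the mean.

```python
from statistics import mean


def simple_error_bar(values):
    """Return (centre, half_length) for the simplest kind of error bar.

    centre      : the mean of the values
    half_length : distance from the mean to the value farthest from it
    """
    centre = mean(values)
    half_length = max(abs(v - centre) for v in values)
    return centre, half_length


# Hypothetical measurements, for illustration only
centre, half_length = simple_error_bar([10, 12, 15, 11])
print(centre - half_length, centre, centre + half_length)
```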
What do error bars suggest?
If the bars show extensive overlap, it is
likely that there is not a significant
difference between those values
Error bars
Graphical
representation of
the variability of
data
Can be used to
show either the
range of data or the
standard deviation
on a graph
Standard deviation
A measure of how the individual observations
of a data set are dispersed or spread out
around the mean.
Determined by a mathematical formula which
is programmed into your calculator
In a normal distribution, about 68% of all
values lie within ±1 standard deviation of the
mean. This rises to about 95% for ±2
standard deviations from the mean.
How is Standard Deviation
calculated?
With this formula!
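The formula itself is not reproduced in this transcript. Assuming the sample ("n-1") form, which is the method Excel's STDEV uses, it can be written as below, where x_i are the individual values, x̄ is their mean and n is the number of values:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n - 1}}
```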
How to calculate SD
TI-86
http://www.saintmarys.edu/~cpeltier/calcforstat/StatTI-86.html
TI-83 and 84
http://www.saintmarys.edu/~cpeltier/calcforstat/StatTI-83.html
In Microsoft Excel, type the following code into the
cell where you want the Standard Deviation result,
using the "unbiased," or "n-1" method:
=STDEV(A1:A30) (substitute the cell name of the
first value in your dataset for A1, and the cell name
of the last value for A30.)
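For readers working outside Excel, Python's standard `statistics.stdev` also uses the "n-1" method, while `statistics.pstdev` divides by n instead. The values below are hypothetical and chosen only to show the difference between the two.

```python
from statistics import pstdev, stdev

# Hypothetical data set, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = stdev(values)        # "n-1" method, like =STDEV(...)
population_sd = pstdev(values)   # divides by n instead of n-1

print(round(sample_sd, 3), round(population_sd, 3))
```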
How can leaf lengths be displayed
graphically?
Simply measure the lengths of each and plot how
many are of each length
If smoothed, the histogram data
assumes this shape
This Shape?
Is a classic bell-shaped curve, AKA
Gaussian Distribution Curve, AKA a
Normal Distribution curve.
Essentially it means that in many studies
with an adequate number of data points
(>30), most results tend to lie near
the mean. Fewer
results are found farther from the mean
The standard deviation is a statistic
that tells you how tightly all the various
examples are clustered around the
mean in a set of data
A typical normal distribution
curve
According to this curve:
One standard deviation away from the
mean in either direction on the
horizontal axis (the red area on the
preceding graph) accounts for
somewhere around 68 percent of the
data in this group.
Two standard deviations away from the
mean (the red and green areas)
account for roughly 95 percent of the
data.
Three Standard Deviations?
Three standard deviations (the red,
green and blue areas) account for
about 99.7 percent of the data
[Figure: normal distribution curve with the horizontal axis marked at -3 SD, -2 SD, +/-1 SD, +2 SD and +3 SD]
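The percentages quoted for one, two and three standard deviations can be checked for an ideal normal distribution with `statistics.NormalDist` (Python 3.8 or later). The helper name `fraction_within` is illustrative; real data will only approximate these figures.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of values lying within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
```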
NRT Example
100 tests taken
Grades plotted on a
graph
Graph likely to be a bell
curve
When data points are
clustered together, the
standard deviation is
small; when they are
spread apart, the
standard deviation is
large
How is SD useful?
Many extremes = large SD
Few extremes = small SD
Comparing the means and
standard deviation between two
or more samples
Height of bean plants in centimetres ±0.1 cm

         In the sunlight    In the shade
         124                131
         120                60
         153                160
         98                 212
         123                117
         142                65
         156                155
         128                160
         139                145
         117                95
Total    1300               1300

Mean (both groups): 1300/10 = 130.0 cm
Answers
SD for sunlight data: 17.68 cm
SD for shade data: 47.02 cm
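The two answers can be reproduced in Python. This sketch reads the listed heights alternately as sunlight and shade values, which is the split that gives a total of 1300 for each group; the variable names are illustrative.

```python
from statistics import mean, stdev

# Heights in the order listed: sunlight, shade, sunlight, shade, ...
heights = [124, 131, 120, 60, 153, 160, 98, 212, 123, 117,
           142, 65, 156, 155, 128, 160, 139, 145, 117, 95]

sunlight = heights[0::2]   # every other value, starting with the first
shade = heights[1::2]      # every other value, starting with the second

for name, group in (("sunlight", sunlight), ("shade", shade)):
    # stdev uses the "n-1" method
    print(name, mean(group), round(stdev(group), 2))
```

Both groups have the same mean, yet the shade group's standard deviation is far larger, which is the point of the comparison.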
Wide variation makes us question experimental
design
The mean alone is not sufficient
Significant difference between
two data sets using the t-test
The t-test compares two sets of data to see
whether the difference between them could be due to chance alone
Scientists like to be at least 95%
certain of their findings before drawing
conclusions
Mean, SD, and sample size are used to
calculate the value of t
Degrees of freedom = sum of sample
sizes of each of the two groups minus 2
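How mean, SD and sample size combine into t can be sketched as follows. This assumes the pooled-variance (Student's) form of the two-sample t-test, which matches the degrees of freedom given above; the function name `pooled_t` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Assumes the pooled-variance form of the t-test.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # variance() is the square of the "n-1" standard deviation
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Hypothetical samples, for illustration only
t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```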
T-test calculation
For all data values:
http://www.graphpad.com/quickcalcs/ttest1.cfm
For means:
http://www.dimensionresearch.com/resources/calculators/ttest.html
Worked example
Compare two groups of barnacles living
on a rocky shore. Measure the width
of their shells to see if a significant size
difference is found depending on how
close they live to the water. One group
lives between 0 and 10 metres from
the water level. The second group
lives between 10 and 20 metres above
the water level.
Measurement was taken of the width of
the shells in millimetres. 15 shells were
measured from each group. The mean
shell width of the group closer to the water
is larger, which suggests that barnacles living
closer to the water have a larger
shell. If the value of t is 2.25, is that a
significant difference?
Steps to determining significant
difference when given value of t
Determine the degrees of freedom (sum of the two sample
sizes minus 2)
Ex. 15 + 15 – 2 = 28
Use the given value of t
Ex. 2.25
Use a table of t values to determine the probability
(p) that the difference is due to chance
Ex. 0.05 or 5%
The confidence level is 95%
Ex. We are 95% confident that the difference
between barnacles is significant. Barnacles living
nearer the water have a significantly larger shell
than those living 10 metres or more away from
the water.
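The table lookup in the steps above can be sketched in Python. The critical values are copied from a standard two-tailed t table at p = 0.05, and only a few rows are included; the dictionary and function names are illustrative.

```python
# Two-tailed critical values of t at p = 0.05, keyed by degrees of freedom.
# Only a few rows of a standard t table are reproduced here.
CRITICAL_T_P05 = {10: 2.228, 20: 2.086, 28: 2.048, 30: 2.042}


def is_significant(t, df):
    """True if |t| exceeds the tabulated critical value for df at p = 0.05."""
    return abs(t) > CRITICAL_T_P05[df]


# Barnacle example: two groups of 15 shells, t = 2.25
df = 15 + 15 - 2
print(df, is_significant(2.25, df))
```

Since 2.25 is greater than 2.048, the probability that chance alone produced the difference is below 5%.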
Website help
http://graphpad.com/quickcalcs/ttest1.cfm
Correlation does not mean
causation
Experiments provide a test which
shows cause
Observations without an experiment
can only show a correlation
Correlation test
Correlation signified by value of r
+1 (completely positive correlation)
0 (no correlation)
-1 (completely negative correlation)
http://www.argyll.epsb.ca/jreed/math9/strand4/scatterplot.htm
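The value of r can be computed directly from paired measurements. The sketch below assumes the Pearson correlation coefficient, the most common meaning of r; the function name and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))    # rises with x
print(pearson_r(x, [8, 6, 4, 2]))    # falls as x rises
print(pearson_r(x, [1, -1, -1, 1]))  # no linear trend
```

A value of r near +1 or -1 describes how closely the points follow a line; it says nothing about whether one variable causes the other.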
Correlation or causation?
1. Cars with low gas mileage per gallon of fuel cause global warming.
2. Drinking red wine protects against heart disease.
3. Tanning beds can cause skin cancer.
4. UV rays increase the risk of cataracts.
5. Vitamin C cures the common cold.
Resources
¹http://www.globalissues.org/TradeRelated/Facts.asp#src1
²http://www.globalissues.org/TradeRelated/Consumption.asp
³http://www.who.int/globalatlas/includeFiles/generalIncludeFiles/listInstances.asp