Statistics Notes
STATISTICS!!!
The science of data
What is data?
Information, in the form of facts or
figures obtained from
experiments or surveys, used as a
basis for making calculations or
drawing conclusions
Encarta dictionary
Statistics in Science
Data can be collected about a population (surveys)
Data can be collected about a process (experimentation)
2 types of Data
Qualitative
Quantitative
Qualitative Data
Information that relates to characteristics or
description (observable qualities)
Information is often grouped by descriptive
category
Examples
Species of plant
Type of insect
Shades of color
Rank of flavor in taste testing
Remember: qualitative data can be “scored” and evaluated numerically
Qualitative data, manipulated numerically
[Figure: survey results, teens and the need for environmental action]
Quantitative data
Quantitative: measured using a naturally occurring numerical scale
Examples
Chemical concentration
Temperature
Length
Weight…etc.
Quantitation
Measurements are often displayed
graphically
Quantitation = Measurement
In data collection for Biology, data must be
measured carefully, using laboratory equipment
(e.g. timers, metersticks, pH meters, balances, pipettes, etc.)
The limits of the equipment used add some
uncertainty to the data collected. All equipment
has a certain magnitude of uncertainty. For
example, is a mass-produced ruler a good
measure of 1 cm? 1 mm? 0.1 mm?
For quantitative testing, you must indicate
the level of uncertainty of the tool that you
are using for measurement!!
How to determine uncertainty?
Usually the instrument manufacturer will indicate
this. Read what is provided by the manufacturer.
Be sure that the number of significant digits in
the data table/graph reflects the precision of the
instrument used. For example, if the manufacturer
states that a balance is accurate to 0.1 g and
your average mass is 2.06 g, be sure to round the
average to 2.1 g. Your data must be consistent
with your measurement tool regarding significant
figures.
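The rounding rule above can be sketched in a few lines of Python. This is only an illustration: the helper name round_to_instrument and the choice of half-up rounding are assumptions made for this example, not something the notes specify.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_instrument(value, precision):
    """Round a value to the instrument's precision.

    Hypothetical helper. `precision` is assumed to be a power of ten
    (1, 0.1, 0.01, ...), e.g. 0.1 for a balance that reads to 0.1 g.
    Decimal is used so that halves always round up.
    """
    step = Decimal(str(precision))
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP))

average_mass = 2.06  # grams, the example value from the notes
print(round_to_instrument(average_mass, 0.1))  # 2.1
```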
Finding the limits
As a “rule-of-thumb”, if not specified, use +/- 1/2 of the smallest measurement unit
(ex. a metric ruler is lined to 1 mm, so the limit of
uncertainty of the ruler is +/- 0.5 mm).
If the room temperature is read as 25 degrees
C, with a thermometer that is scored at 1 degree
intervals, what is the range of possible
temperatures for the room?
(Answer: +/- 0.5 degrees Celsius. If you read
25 degrees C, it may in fact be anywhere from 24.5 to 25.5 degrees.)
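The half-of-the-smallest-unit rule of thumb can be tried out with a short Python sketch. The function name reading_range is made up for this illustration.

```python
def reading_range(reading, smallest_division):
    """Return (low, high) using +/- half of the smallest scale division."""
    half = smallest_division / 2
    return reading - half, reading + half

# Thermometer scored at 1 degree intervals, reading 25 degrees C
low, high = reading_range(25, 1)
print(low, high)  # 24.5 25.5
```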
Looking at Data
How accurate is the data? (How close are
the data to the “real” results?) A consistent
offset from the real value is called BIAS.
How precise is the data? (All test systems
have some uncertainty, due to limits of
measurement.) Estimation of the limits of
the experimental uncertainty is essential.
Comparing Averages
Once the average is calculated for each of
the 2 sets of data, the average values can be
plotted together on a graph, to visualize
the relationship between the 2.
Drawing error bars
The simplest way to draw an error bar is
to use the mean as the central point, and
to use the distance between the mean and
the measurement that is farthest from it as
the length of the bar on each side.
[Figure: error bar diagram labeling the average value, the value farthest from the average, and the calculated distance between them]
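As a rough sketch of this simplest kind of error bar, the Python below finds the mean and the largest distance of any measurement from it. The function name and the leaf lengths are made up for illustration.

```python
def max_deviation_error_bar(measurements):
    """Return (mean, lower end, upper end) of a maximum-deviation error bar."""
    mean = sum(measurements) / len(measurements)
    # Distance from the mean to the measurement farthest from it
    half_length = max(abs(x - mean) for x in measurements)
    return mean, mean - half_length, mean + half_length

leaf_lengths_cm = [4.0, 5.0, 6.0, 9.0]  # made-up values
mean, lower, upper = max_deviation_error_bar(leaf_lengths_cm)
print(mean, lower, upper)  # 6.0 3.0 9.0
```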
What do error bars suggest?
If the bars show extensive overlap, it is
likely that there is not a significant
difference between those values
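One simple way to check for overlap is sketched below in Python. It only tests whether two bars touch at all, which is weaker than the "extensive overlap" the slide describes, and it is a visual rule of thumb rather than a formal significance test. The function name and numbers are assumptions for this example.

```python
def error_bars_overlap(mean_a, half_a, mean_b, half_b):
    """True if the bars mean_a +/- half_a and mean_b +/- half_b overlap at all."""
    return abs(mean_a - mean_b) <= half_a + half_b

print(error_bars_overlap(6.0, 3.0, 7.0, 2.0))   # True: 3 to 9 overlaps 5 to 9
print(error_bars_overlap(6.0, 1.0, 10.0, 1.5))  # False: 5 to 7 misses 8.5 to 11.5
```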
Quick Review: 3 measures of “Central Tendency”
Mode: the value that appears most frequently
Median: when all data are listed from least to greatest,
the value at which half of the observations are greater,
and half are lesser
Mean: the arithmetic average (sum of data points
divided by the number of points); the most commonly
used measure of central tendency
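All three measures can be checked with Python's built-in statistics module. The data values here are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7]  # made-up values

print(statistics.mode(data))    # 3 (appears most frequently)
print(statistics.median(data))  # 3 (middle value of the sorted list)
print(statistics.mean(data))    # 4 (sum of 20 divided by 5 points)
```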
How can leaf lengths be displayed
graphically?
Simply measure the length of each leaf and plot how many
leaves are of each length
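A quick way to see such a plot without graphing software is a text histogram. The Python sketch below counts how many leaves have each length; the leaf lengths are made up for illustration.

```python
from collections import Counter

leaf_lengths_cm = [5, 6, 6, 7, 7, 7, 8, 8, 9]  # made-up values

# Count how many leaves have each length
counts = Counter(leaf_lengths_cm)

# Print one row per length, with one star per leaf
for length in sorted(counts):
    print(f"{length} cm | {'*' * counts[length]}")
```

With these values the longest row is in the middle (7 cm), and the rows get shorter toward both ends.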
If smoothed, the histogram data
assumes this shape
This Shape?
It is a classic bell-shaped curve, AKA a
Gaussian distribution curve, AKA a normal
distribution curve.
Essentially it means that in many studies with
an adequate number of datapoints (>30),
most results tend to be near the mean.
Fewer results are found farther from the mean.
Standard Deviation
The standard deviation is a statistic that
tells you how tightly all the various
examples are clustered around the mean
in a set of data
Standard deviation
The STANDARD DEVIATION is a more
sophisticated indicator of the precision of
a set of measurements.
The standard deviation is like an average
deviation of measurement values from the
mean. In large studies, the standard deviation
is used to draw error bars, instead of the
maximum deviation.
A typical normal distribution curve
According to this curve:
One standard deviation away from the
mean in either direction on the horizontal
axis (the red area on the preceding graph)
accounts for somewhere around 68
percent of the data in this group.
Two standard deviations away from the
mean (the red and green areas) account
for roughly 95 percent of the data.
Three Standard Deviations?
Three standard deviations (the red, green
and blue areas) account for about 99.7
percent of the data
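[Figure: normal distribution curve with the horizontal axis marked at -3, -2, -1, +1, +2 and +3 standard deviations from the mean]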
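These percentages can be checked for an ideal normal curve with a short Python sketch. It uses the error function from the math module; the function name fraction_within is made up for this example.

```python
import math

def fraction_within(k):
    """Fraction of an ideal normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
```

For 1, 2 and 3 standard deviations this gives roughly 68.3, 95.4 and 99.7 percent. Real datasets only approximate these values.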
How is Standard Deviation
calculated?
With this formula!
AGHHH! MRS R- DO I NEED TO KNOW THIS FOR THE TEST?????
Not the formula!
This can be calculated on a scientific calculator
OR... in Microsoft Excel, type the following formula into the
cell where you want the Standard Deviation result, using
the "unbiased," or "n-1" method: =STDEV(A1:A30)
(Substitute the cell name of the first value in your
dataset for A1, and the cell name of the last value for
A30.)
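For anyone curious about what the calculator or spreadsheet is doing, here is a minimal Python sketch of the "n-1" method. It assumes the usual sample standard deviation (the square root of the sum of squared deviations from the mean, divided by n-1) and compares the result with Python's built-in statistics.stdev. The data values and the function name are made up for illustration.

```python
import math
import statistics

def sample_standard_deviation(values):
    """Standard deviation by the "n-1" (unbiased) method."""
    n = len(values)
    mean = sum(values) / n
    # Add up the squared distance of each value from the mean
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, mean is 5

print(sample_standard_deviation(data))
print(statistics.stdev(data))  # built-in version of the same method
```

Both lines should print the same value, a little over 2.1.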
You DO need to know the concept!
Standard deviation is a statistic that tells
how tightly all the various datapoints are
clustered around the mean in a set of data.
When the datapoints are tightly bunched
together and the bell-shaped curve is steep, the
standard deviation is small (precise results,
smaller sd).
When the datapoints are spread apart and the
bell curve is relatively flat, the standard
deviation is large (less precise results, larger sd).
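To see the concept in numbers, the Python sketch below compares two made-up datasets that share the same mean but differ in how spread out they are.

```python
import statistics

tight = [9.8, 9.9, 10.0, 10.1, 10.2]    # made-up values, bunched near 10
spread = [6.0, 8.0, 10.0, 12.0, 14.0]   # made-up values, spread around 10

for name, values in (("tight", tight), ("spread", spread)):
    print(name, statistics.mean(values), statistics.stdev(values))
```

Both means come out at about 10, but the standard deviation of the tight set is far smaller than that of the spread set.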
THE END
For today...