STATISTICS!!!

The science of data
What is data?
Information, in the form of facts or
figures obtained from experiments
or surveys, used as a basis for
making calculations or drawing
conclusions
Encarta dictionary
Statistics in Science
• Data can be collected about a
population (surveys)
• Data can be collected about a
process (experimentation)
2 types of Data
Qualitative
Quantitative
Qualitative Data
• Information that relates to characteristics or
description (observable qualities)
• Information is often grouped by descriptive category
• Examples
– Species of plant
– Type of insect
– Shades of color
– Rank of flavor in taste testing
Remember: qualitative data can be “scored” and evaluated
numerically
Qualitative data, manipulated numerically
• Survey results, teens and need for environmental action
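As a rough illustration of scoring qualitative answers numerically, the sketch below maps survey responses to numbers and averages them. The response labels, the 1 to 5 scale and the answers themselves are assumptions made up for this example; they are not the survey data from the slide.

```python
# Hypothetical scoring scale (an assumption for illustration only)
SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Made-up survey answers
responses = ["agree", "strongly agree", "neutral", "agree", "disagree"]

# Convert each descriptive answer to its numeric score, then average
numeric = [SCORES[r] for r in responses]
average_score = sum(numeric) / len(numeric)

print(numeric)
print(average_score)
```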
Quantitative data
• Quantitative – measured using a
naturally occurring numerical scale
• Examples
–Chemical concentration
–Temperature
–Length
–Weight…etc.
Quantitation
• Measurements are often displayed graphically
Quantitation = Measurement
• In data collection for Biology, data must be measured
carefully, using laboratory equipment
(e.g., timers, metersticks, pH meters, balances, pipettes, etc.)
• The limits of the equipment used add some
uncertainty to the data collected. All equipment has
a certain magnitude of uncertainty. For example, is a
mass-produced ruler a good measure of 1 cm?
1 mm? 0.1 mm?
• For quantitative testing, you must indicate the level
of uncertainty of the tool that you are using for
measurement!!
How to determine uncertainty?
• As a “rule-of-thumb”, if not specified, use +/- 1/2 of
the smallest division marked on the instrument
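The rule of thumb above can be sketched in a few lines. The function name `reading_uncertainty` and the ruler reading are hypothetical, chosen only for this example.

```python
def reading_uncertainty(smallest_division):
    """Rule of thumb: half of the smallest unit marked on the instrument."""
    return smallest_division / 2


# Hypothetical example: a ruler marked in millimetres (0.1 cm divisions)
ruler_division_cm = 0.1
length_cm = 12.3  # made-up reading

uncertainty_cm = reading_uncertainty(ruler_division_cm)
print(f"{length_cm} cm +/- {uncertainty_cm} cm")
```

For the millimetre ruler assumed here, the reading would be reported as 12.3 cm +/- 0.05 cm.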
Looking at Data
• How accurate is the data? (How close are the
data to the “real” results?) A consistent offset
from the real value is known as BIAS
• How precise is the data? (How close are repeated
measurements to each other? All test systems have
some uncertainty, due to limits of
measurement) Estimation of the limits of the
experimental uncertainty is essential.
Comparing Averages
• Once an average is calculated
for each of the 2 sets of data, the average
values can be plotted together on a
graph, to visualize the relationship
between the 2
Drawing error bars
• The simplest way to draw an error bar is to
use the mean as the central point, and to use
the distance of the measurement that is
furthest from the average as the endpoints of
the data bar
[Figure: error bar diagram labeling the average value, the value farthest from the average, and the calculated distance between them]
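The simple error bar described above can be sketched as follows. The helper name `simple_error_bar` and the leaf lengths are assumptions for illustration only.

```python
def simple_error_bar(values):
    """Return (mean, distance from the mean to the farthest measurement)."""
    mean = sum(values) / len(values)
    max_distance = max(abs(v - mean) for v in values)
    return mean, max_distance


# Made-up measurements
leaf_lengths_cm = [4.0, 5.0, 6.0, 5.0]

mean, half_width = simple_error_bar(leaf_lengths_cm)
# The bar is centered on the mean and extends half_width in each direction
print(f"bar from {mean - half_width} to {mean + half_width}, centered on {mean}")
```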
What do error bars suggest?
• If the bars show extensive overlap, it is likely
that there is not a significant difference
between those values
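One way to check for overlap numerically, rather than by eye, is sketched below. The function name `bars_overlap` and the numbers are hypothetical. Overlap is only a quick visual guide, not a formal test of significance.

```python
def bars_overlap(mean_a, half_width_a, mean_b, half_width_b):
    """True if the two error bars share at least one value."""
    low_a, high_a = mean_a - half_width_a, mean_a + half_width_a
    low_b, high_b = mean_b - half_width_b, mean_b + half_width_b
    return low_a <= high_b and low_b <= high_a


# Made-up averages and error bar half-widths
print(bars_overlap(5.0, 1.0, 5.5, 1.0))  # bars 4.0-6.0 and 4.5-6.5
print(bars_overlap(5.0, 1.0, 8.0, 1.0))  # bars 4.0-6.0 and 7.0-9.0
```

The first pair overlaps, so a significant difference is unlikely; the second pair does not overlap.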
Quick Review – 3 measures of “Central
Tendency”
• mode: value that appears most frequently
• median: When all data are listed from least to
greatest, the value at which half of the
observations are greater, and half are lesser.
• The most commonly used measure of central
tendency is the mean, or arithmetic average
(sum of data points divided by the number of
points)
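The three measures can be computed with Python's standard `statistics` module. This is a minimal sketch and the data values are made up.

```python
import statistics

# Made-up data set
data = [2, 3, 3, 5, 7]

mode_value = statistics.mode(data)      # value that appears most frequently
median_value = statistics.median(data)  # middle value of the sorted data
mean_value = statistics.mean(data)      # sum of data points / number of points

print(mode_value, median_value, mean_value)
```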
How can leaf lengths be displayed
graphically?
Simply measure the length of each leaf and plot how many leaves are of
each length
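A text-only version of such a plot can be sketched by counting how many leaves have each length. The leaf lengths below are made up, and rounding to the nearest centimetre is an assumption.

```python
from collections import Counter

# Made-up leaf lengths in cm, rounded to the nearest cm
leaf_lengths_cm = [4, 5, 5, 6, 5, 4, 7, 6, 5]

# Count how many leaves have each length
counts = Counter(leaf_lengths_cm)

# Print one row per length, with one '#' per leaf
for length in sorted(counts):
    print(f"{length} cm | {'#' * counts[length]}")
```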
If smoothed, the histogram data assumes this
shape
This Shape?
• Is a classic bell-shaped curve, AKA Gaussian
Distribution Curve, AKA a Normal Distribution
curve.
• Essentially it means that in many studies with an
adequate number of datapoints (>30) most
results tend to be near
the mean. Fewer results are found farther
from the mean
• The standard deviation is a statistic that tells
you how tightly all the various examples are
clustered around the mean in a set of data
Standard deviation
• The STANDARD DEVIATION is a more
sophisticated indicator of the precision of a
set of measurements
– The standard deviation is like an average deviation
of measurement values from the mean. In large
studies, the standard deviation is used to draw
error bars, instead of the maximum deviation.
A typical normal distribution curve
According to this curve:
• One standard deviation away from the mean
in either direction on the horizontal axis (the
red area on the preceding graph) accounts for
somewhere around 68 percent of the data in
this group.
• Two standard deviations away from the mean
(the red and green areas) account for roughly
95 percent of the data.
Three Standard Deviations?
• Three standard deviations (the red, green and
blue areas) account for about 99.7 percent of
the data
[Figure: normal curve with the horizontal axis marked at -3sd, -2sd, -1sd, +1sd, +2sd and +3sd]
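The percentages quoted for one, two and three standard deviations can be checked with a short calculation. This sketch assumes an ideal normal distribution; the function name `fraction_within` is hypothetical.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(k) * 100:.1f}%")
```

This gives roughly 68.3%, 95.4% and 99.7%. Real data sets only approximate these values.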
How is Standard Deviation calculated?
With this formula!
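The formula shown on the slide is not reproduced in this transcript. As a stand-in, the usual "n-1" (sample) form of the standard deviation is given here, on the assumption that it matches the "n-1" method named for Excel in this deck. Here x_i is each measurement, x-bar is the mean, and n is the number of measurements.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```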
AGHHH!
•DO I NEED TO
KNOW THIS FOR
THE TEST?????
Not the formula!
• This can be calculated on a scientific calculator
• OR…. In Microsoft Excel, type the following code into the cell
where you want the Standard Deviation result, using the
"unbiased," or "n-1" method: =STDEV(A1:A30) (substitute the
cell name of the first value in your dataset for A1, and the cell
name of the last value for A30.)
• OR….Try this! http://www.pages.drexel.edu/~jdf37/mean.htm
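As another option alongside the calculator and the spreadsheet, the same "n-1" calculation can be sketched in Python. The data values are made up and stand in for the spreadsheet cells.

```python
import statistics

# Made-up measurements standing in for a column of spreadsheet cells
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(data)       # divides by n - 1 ("unbiased" method)
population_sd = statistics.pstdev(data)  # divides by n, shown only for contrast

print(round(sample_sd, 3))
print(round(population_sd, 3))
```

The "n-1" result is slightly larger than the divide-by-n result; the difference shrinks as the number of datapoints grows.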
You DO need to know the concept!
• standard deviation is a statistic that tells how
tightly all the various datapoints are clustered
around the mean in a set of data.
• When the datapoints are tightly bunched together
and the bell-shaped curve is steep, the standard
deviation is small. (precise results, smaller sd)
• When the datapoints are spread apart and the bell
curve is relatively flat, a large standard deviation
value suggests less precise results
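To see the concept with numbers, the sketch below compares two made-up data sets that share the same mean but differ in how tightly the values cluster around it.

```python
import statistics

# Two made-up data sets with the same mean
tight = [9.8, 10.0, 10.2, 10.0]   # values bunched close to the mean
spread = [5.0, 10.0, 15.0, 10.0]  # values far from the mean

for name, values in (("tight", tight), ("spread", spread)):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    print(f"{name}: mean = {mean:.1f}, sd = {sd:.2f}")
```

The tightly bunched set has the much smaller standard deviation, which is what "more precise" means here.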
THE END

• For today……….