statistics!!!

STATISTICS!!!
I. The science of data
What are data?
Information, in the form of facts or
figures obtained from experiments
or surveys, used as a basis for
making calculations or drawing
conclusions
Encarta dictionary
II. Statistics in Science
• Data can be collected about a
population (surveys)
• Data can be collected about a
process (experimentation)
2 types of Data
A. Qualitative
B. Quantitative
A. Qualitative Data
• Information that relates to
characteristics or description
(observable qualities)
• Information is often grouped by
descriptive category
•Qualitative data are forms of information
gathered in a nonnumeric form.
•Examples:
1. Interview transcripts
2. Field notes (notes taken in the
field being studied)
3. Video
4. Audio recordings
5. Images
6. Documents (reports, meeting
minutes, e-mails)
Some types of qualitative data:
Species of plant
Type of insect
Shades of color
Rank of flavor in taste testing
Remember: qualitative data can be
“scored” and evaluated numerically
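As a minimal sketch of what "scoring" qualitative data could look like, the example below tallies taste-test verdicts and converts them to numbers. The responses and the 1-to-3 scoring scheme are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical taste-test responses: one verdict per taster.
responses = ["good", "excellent", "fair", "good", "good", "excellent"]

# Assumed scoring scheme (not from the slides): higher means better flavor.
scores = {"fair": 1, "good": 2, "excellent": 3}

counts = Counter(responses)                  # number of responses per category
numeric = [scores[r] for r in responses]     # each verdict converted to a score
average_score = sum(numeric) / len(numeric)  # a numerical summary of the verdicts

print(counts)
print(average_score)
```

The categories themselves stay qualitative; only the tally and the assigned scores are numerical.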
Qualitative data, manipulated
numerically
• Survey results: teens and the need for
environmental action
B. Quantitative data
• Quantitative: measured using a
naturally occurring numerical scale
• Examples
• Chemical concentration
• Temperature
• Length
• Mass, etc.
Quantitative
• Measurements are often displayed
graphically
Quantitation = Measurement
• In data collection for Biology, data must
be measured carefully, using laboratory
equipment
(e.g., timers, meter sticks, pH meters,
balances, etc.)

•The limits of the equipment used add
some uncertainty to the data
collected.
•All equipment has a certain
magnitude of uncertainty. For
example, is a mass-produced ruler a good measure of 1 cm?
1 mm? 0.1 mm?
•For quantitative testing, you must
indicate the level of uncertainty of the
tool that you are using for
measurement.
How to determine uncertainty?
• Usually the instrument manufacturer will indicate
this; read what the manufacturer provides.
• Be sure that the number of significant figures in
the data table/graph reflects the precision of the
instrument used.
• For example, if the manufacturer states that a
balance is accurate to 0.1 g and your average mass is
2.06 g, be sure to round the average to 2.1 g.
• Your data must be consistent with your
measurement tool regarding significant figures.
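The rounding step in the balance example can be sketched as follows. The function name round_to_instrument is hypothetical, and the precision is passed as a string such as "0.1" so that the result keeps exactly that many decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_instrument(value, precision):
    """Round value to the instrument's precision, given as a string like '0.1'."""
    return Decimal(str(value)).quantize(Decimal(precision), rounding=ROUND_HALF_UP)

average_mass = 2.06  # grams, the example value from the slide
print(round_to_instrument(average_mass, "0.1"))  # 2.1
```

Decimal is used here instead of the built-in round() so that halves always round up, as they do in hand calculations.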
Finding the limits
• As a “rule of thumb”, if not specified, use
+/- 1 of the smallest measurement unit (e.g., a
metric ruler is marked to 1 mm, so the limit of
uncertainty of the ruler is +/- 1 mm).
• If the room temperature is read as 25
degrees C with a thermometer that is marked
at 1 degree intervals, what is the range of
possible temperatures for the room?
• (Ans. +/- 1 degree Celsius. If you read
25 °C, it may in fact be anywhere from 24 to 26 degrees.)
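The thermometer question can be worked as a small calculation. This is only a sketch of the rule of thumb above; the function name reading_range is made up for illustration.

```python
def reading_range(reading, uncertainty):
    """Return the lowest and highest values consistent with a reading."""
    return reading - uncertainty, reading + uncertainty

# Thermometer marked at 1 degree intervals, reading 25 degrees C.
low, high = reading_range(25, 1)
print(low, high)  # 24 26
```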
III. Looking at Data
• How accurate are the data? (How close are
the data to the “real” results?) A consistent
offset from the real value is called BIAS.
• How precise are the data? (All test systems
have some uncertainty, due to limits of
measurement.) Estimating the limits of the
experimental uncertainty is essential.
Comparing Averages
• Once the 2 averages (means) are
calculated for each set of data, the
mean values can be plotted
together on a graph, to visualize
the relationship between the 2.
•Biological systems are subject to a genetic
program and environmental variation.
• When we collect a set of data for a given
variable it shows variation.
•When displaying data in graphical formats
we can show the variation by using error
bars.
• Error bars can be used to show either the
range of the data or the standard deviation.
Drawing error bars
• The simplest way to draw an error bar is to
use the mean as the central point, and to
use the distance of the measurement that is
farthest from the average as the endpoints of
the error bar. The ends of the error bar are
equidistant from the mean at the center.
[Figure: error bar centered on the average value; its half-length is the calculated distance to the value farthest from the average]
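The simplest error bar described above can be computed as in this sketch. The measurements are hypothetical, and max_deviation_error_bar is an illustrative name, not a standard function.

```python
from statistics import mean

def max_deviation_error_bar(values):
    """Return (center, lower end, upper end) of a maximum-deviation error bar."""
    center = mean(values)
    # Distance from the mean to the measurement farthest from it.
    half_length = max(abs(v - center) for v in values)
    return center, center - half_length, center + half_length

lengths = [4, 5, 6, 9]  # hypothetical measurements
print(max_deviation_error_bar(lengths))  # (6, 3, 9)
```

Both ends sit the same distance from the mean, even though only one of them corresponds to an actual measurement.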
What do error bars suggest?
• If the bars do overlap, there is not a significant
difference between those values (the numbers
in the data).
Another way of stating this:
When SE bars do overlap, you can be sure
the difference between the two means is not
statistically significant.
What can you conclude when standard
error bars do not overlap?
•When standard error (SE) bars do not
overlap, you cannot be sure that the
difference between two means is
statistically significant.
•A t-test is commonly used to compare these
groups.
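To make the overlap check concrete, the sketch below computes the standard error of each group as the sample standard deviation divided by the square root of the number of measurements, then tests whether the two SE bars overlap. It also computes a t statistic in the unequal-variance (Welch) form, which is one common variant and an assumption here, since the slides do not say which t-test is meant. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def se_bars_overlap(a, b):
    """True if the mean +/- 1 SE intervals of the two groups overlap."""
    low_a, high_a = mean(a) - standard_error(a), mean(a) + standard_error(a)
    low_b, high_b = mean(b) - standard_error(b), mean(b) + standard_error(b)
    return low_a <= high_b and low_b <= high_a

def welch_t(a, b):
    """t statistic for two groups without assuming equal variances."""
    return (mean(a) - mean(b)) / sqrt(standard_error(a) ** 2 + standard_error(b) ** 2)

group_a = [10, 12, 11, 13, 14]  # hypothetical measurements
group_b = [11, 13, 12, 14, 15]
group_c = [20, 22, 21, 23, 24]

print(se_bars_overlap(group_a, group_b))  # bars overlap
print(se_bars_overlap(group_a, group_c))  # bars do not overlap
print(welch_t(group_a, group_c))
```

The t statistic still has to be compared against a t table (or converted to a P value by statistical software) before concluding anything about significance.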
Quick Review – 3 measures of “Central
Tendency”
• mode: value that appears most frequently
• median: When all data are listed from least to
greatest, the value at which half of the
observations are greater, and half are lesser.
• The most commonly used measure of central
tendency is the mean, or arithmetic average
(sum of data points divided by the number of
points)
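The three measures can be checked on a small, hypothetical set of leaf lengths using Python's standard statistics module:

```python
from statistics import mean, median, mode

leaf_lengths = [3, 4, 4, 5, 9]  # hypothetical data, in cm

print(mode(leaf_lengths))    # 4: the value that appears most often
print(median(leaf_lengths))  # 4: the middle value of the sorted list
print(mean(leaf_lengths))    # 5: sum (25) divided by the number of points (5)
```

Note how the single large value (9) pulls the mean above the median and the mode.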
How can leaf lengths be displayed graphically?
Simply measure the lengths of each and plot how many are of each
length
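A rough text version of such a plot can be sketched as below, assuming lengths already rounded to whole centimeters. The data are hypothetical.

```python
from collections import Counter

leaf_lengths_cm = [5, 6, 6, 7, 7, 7, 8, 8, 9]  # hypothetical measurements

tally = Counter(leaf_lengths_cm)  # how many leaves have each length

# One row per length, one star per leaf of that length.
for length in sorted(tally):
    print(f"{length} cm | {'*' * tally[length]}")
```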
If smoothed, the histogram data assumes this
shape
This Shape?
• It is a classic bell-shaped curve, AKA a
Gaussian distribution curve, AKA a
normal distribution curve.
• Essentially, it means that in many studies with an
adequate number of data points (>30), most
results tend to be near
the mean. Fewer results are found farther
from the mean.
Standard Deviation
• The standard deviation is a statistic that tells
you how tightly the data points are
clustered around the mean in a set of data.
Standard deviation
• The STANDARD DEVIATION is an indicator
of the precision of a set of
measurements.

The standard deviation is like an average
deviation of measurement values from the
mean. In large studies, the standard deviation
is used to draw error bars, instead of the
maximum deviation.
A typical normal distribution curve
According to this curve:
• One standard deviation away from the mean
in either direction on the horizontal axis (the
red area on the preceding graph) accounts
for somewhere around 68 percent of the data
in this group.
• Two standard deviations away from the mean
(the red and green areas) account for roughly
95 percent of the data.
Three Standard Deviations?
• Three standard deviations (the red, green
and blue areas) account for about 99.7 percent
of the data.
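These percentages can be checked numerically. For an ideal normal distribution, the fraction of data within k standard deviations of the mean is erf(k / sqrt(2)); the sketch below evaluates that expression with Python's math module. Real data sets only approximate these values.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.1f} percent")
```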
[Figure: normal curves with the horizontal axis marked at -3, -2, +/-1, +2 and +3 SD, and curves drawn for SD = 1, SD = 2 and SD = 3]
Graphs from: http://www.childrensmercy.org/stats/definitions/stdev.htm
How is Standard Deviation calculated?
With this formula!
AGHHH! MRS. C.-
DO I NEED TO
KNOW THIS FOR
THE TEST?????
Not the formula!
• This can be calculated on a scientific
calculator.
• OR… In Microsoft Excel, type the following
formula into the cell where you want the
standard deviation result, using the
"unbiased," or "n-1" method:
=STDEV(A1:A30) (substitute the cell name of
the first value in your dataset for A1, and the
cell name of the last value for A30).
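For readers working outside Excel, the same "n-1" calculation can be sketched in Python. The function sample_sd below spells out the steps (subtract the mean from each value, square, sum, divide by n - 1, take the square root) and is compared with the standard library's statistics.stdev, which uses the same n - 1 method. The data are hypothetical.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation using the n - 1 ("unbiased") method."""
    n = len(values)
    m = sum(values) / n                               # the mean
    squared_deviations = sum((x - m) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

print(sample_sd(data))  # step-by-step version
print(stdev(data))      # standard library version
```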
You DO need to know the concept!
Standard deviation is a statistic that tells
how tightly all the various datapoints are
clustered around the mean in a set of data.
• When the data points are tightly bunched
together and the bell-shaped curve is steep,
the standard deviation is small (precise
results, smaller SD).
• When the data points are spread apart and
the bell curve is relatively flat, the
standard deviation is large, which suggests less
precise results.
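As a small illustration of the concept, the two hypothetical data sets below have the same mean but very different standard deviations:

```python
from statistics import mean, stdev

tight = [9, 10, 10, 10, 11]   # tightly bunched around 10
spread = [2, 6, 10, 14, 18]   # spread far from 10

print(mean(tight), stdev(tight))    # same mean, small SD
print(mean(spread), stdev(spread))  # same mean, large SD
```

Reporting the mean alone would make the two sets look identical; the standard deviation is what shows the difference in precision.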
