Descriptive Statistics Used in Biology
• It is rarely practical for scientists to measure every
event or individual in a population.
• Instead, they typically collect data on a sample of a
population and use these data to draw conclusions (or
make inferences) about the entire population.
• Statistics is a mathematical discipline that relates to
this type of analysis.
One of your first steps in analyzing a small data set
is to graph the data and examine the distribution.
• Here are two graphs of beak
measurements taken from two
samples of medium ground finches
that lived on the island of Daphne
Major, one of the Galápagos Islands,
during a major drought in 1977.
• The measurements tend to be more or
less symmetrically distributed across a
range, with most measurements around
the center of the distribution.
• This is a characteristic of a normal
distribution.
Measures of Average: Mean, Median, and Mode
• A description of a group of observations can include a value for the
mean, median, or mode.
• These are all measures of central tendency—in other words, they
represent a number close to the center of the distribution.
The Mean
• You calculate the mean (also referred to as the average or arithmetic
mean) by summing all the data points in a data set (ΣX) and then
dividing this number by the total number of data points (N):
𝑥̅ = ΣX / N
What scientists want to understand is the mean of the entire population, which
is represented by µ. They use the sample mean, represented by 𝑥̅, as an
estimate of µ.
Calculate the mean for two subsets of the Grants’ Data

Nonsurvivors                                            | Survivors
5-bird sample              | 15-bird sample             | 5-bird sample              | 15-bird sample
Bird ID #  Beak Depth (mm) | Bird ID #  Beak Depth (mm) | Bird ID #  Beak Depth (mm) | Bird ID #  Beak Depth (mm)
12         7.52            | 283        11.20           | 943        9.10            | 316        9.85
347        9.31            | 288        9.10            | 1643       8.80            | 623        8.80
413        8.20            | 294        10.50           | 1884       9.15            | 673        10.10
522        8.39            | 315        8.80            | 2244       11.01           | 678        9.70
609        10.50           | 321        8.48            | 8191       10.86           | 891        8.00
                           | 352        7.70            |                            | 1019       11.21
                           | 413        8.20            |                            | 1477       10.10
                           | 468        9.02            |                            | 1528       8.55
                           | 503        9.10            |                            | 1797       9.31
                           | 507        8.85            |                            | 1850       10.40
                           | 561        10.20           |                            | 1884       9.15
                           | 610        9.00            |                            | 2242       9.45
                           | 619        9.25            |                            | 2378       9.86
                           | 621        7.60            |                            | 2249       10.68
                           | 676        9.70            |                            | 2939       8.31
Mean       ____            | Mean       ____            | Mean       ____            | Mean       ____
• Note that the mean values
are different for the five-bird
and fifteen-bird samples.
• Which is a better estimate
of the true mean, µ?
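The mean calculation can be sketched in a few lines of Python. This is only an illustration, not part of the original activity: the function name `sample_mean` is made up here, and the values are the beak depths of the two five-bird samples from the table.

```python
# Beak depths (mm) copied from the two five-bird samples in the table.
nonsurvivors_5 = [7.52, 9.31, 8.20, 8.39, 10.50]
survivors_5 = [9.10, 8.80, 9.15, 11.01, 10.86]


def sample_mean(values):
    """Sum all data points and divide by the number of data points (N)."""
    return sum(values) / len(values)


mean_nonsurvivors = sample_mean(nonsurvivors_5)
mean_survivors = sample_mean(survivors_5)

print(f"Nonsurvivors (N=5): mean beak depth = {mean_nonsurvivors:.2f} mm")
print(f"Survivors (N=5): mean beak depth = {mean_survivors:.2f} mm")
```

The same function works unchanged on the fifteen-bird samples, which makes it easy to compare how the estimate of µ shifts with sample size.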
Median
• When the data are ordered from the largest to
the smallest, the median is the midpoint of the
data.
• It is not distorted by extreme values, and it remains
informative even when the distribution is not normal.
• For this reason, it may be more useful for you to
use the median as the main descriptive statistic
for a sample of data in which some of the
measurements are extremely large or extremely
small.
• Find the median for each of your four sets of
finch data.
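One possible way to find the median in Python is shown below. The helper name `median_by_hand` is an assumption made for this sketch; the standard library's `statistics.median` is used only as a cross-check. The data are the two five-bird samples.

```python
import statistics

# Beak depths (mm) from the two five-bird samples.
nonsurvivors_5 = [7.52, 9.31, 8.20, 8.39, 10.50]
survivors_5 = [9.10, 8.80, 9.15, 11.01, 10.86]


def median_by_hand(values):
    """Order the data and return the midpoint.

    With an odd number of values the midpoint is the middle value;
    with an even number it is the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median_nonsurvivors = median_by_hand(nonsurvivors_5)
median_survivors = median_by_hand(survivors_5)

print(f"Nonsurvivors median: {median_nonsurvivors} mm")
print(f"Survivors median: {median_survivors} mm")

# Cross-check against the standard library.
print(statistics.median(nonsurvivors_5), statistics.median(survivors_5))
```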
Range
• The simplest measure of variability in a sample of normally
distributed data is the range, which is the difference between the
largest and smallest values in a set of data. You can also use the range
for data that are not normally distributed.
• For any data, a larger range value indicates a greater spread of the
data—in other words, the larger the range, the greater the variability.
• An extremely large or small value in the data set will make the
variability appear high.
• Calculate the range for your four samples of the Grants’ data.
• The standard deviation provides a more reliable measure of the
“true” spread of the data.
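As a rough sketch, the range can be computed as the largest value minus the smallest. The function name `data_range` is invented for this example, and the data are the two five-bird samples.

```python
# Beak depths (mm) from the two five-bird samples.
nonsurvivors_5 = [7.52, 9.31, 8.20, 8.39, 10.50]
survivors_5 = [9.10, 8.80, 9.15, 11.01, 10.86]


def data_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


range_nonsurvivors = data_range(nonsurvivors_5)
range_survivors = data_range(survivors_5)

print(f"Nonsurvivors range: {range_nonsurvivors:.2f} mm")
print(f"Survivors range: {range_survivors:.2f} mm")
```

Because the range depends only on the two most extreme values, a single unusual bird changes it directly, which is why the standard deviation is usually preferred.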
Definitions for Median and Range on BRT
Standard Deviation and Variance
• The standard deviation is the most widely used measure of variability.
• The sample standard deviation (s) is essentially the average
deviation between each measurement in the sample and the sample
mean (𝑥̅).
• The sample standard deviation is an estimate of the standard
deviation in the larger population.
The formula for calculating the sample standard
deviation follows:
s = √( Σ(X − 𝑥̅)² / (N − 1) )
What does standard deviation indicate?
• If a population has a normal
distribution, about 68% of the
measurements should be within one
standard deviation of the mean.
• About 95% of all measurements
should fall within 2 standard
deviations of the mean.
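These percentages can be checked numerically. The sketch below assumes an idealized normal distribution and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist

# An idealized normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1_sd = fraction_within(1)
within_2_sd = fraction_within(2)
within_1_96_sd = fraction_within(1.96)

print(f"Within 1 SD:    {within_1_sd:.1%}")
print(f"Within 2 SD:    {within_2_sd:.1%}")
print(f"Within 1.96 SD: {within_1_96_sd:.1%}")
```

Exactly 95% corresponds to 1.96 standard deviations; rounding to 2 gives a slightly larger fraction.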
Variance and Standard Deviation
• Note that the number calculated just before taking the square root is a
statistic called variance (s²). Variance is an important measure of
variability that is used in certain statistical methods. It is the square
of the standard deviation.
6. Take the square root to calculate the standard deviation (s) for the
sample.
Calculate the standard deviation for the two five-bird samples
(survivors and non-survivors)
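A minimal sketch of the calculation for the two five-bird samples follows. It assumes the usual sample formula with N − 1 in the denominator; the function names are illustrative, and `statistics.stdev` serves only as a cross-check.

```python
import math
import statistics

# Beak depths (mm) from the two five-bird samples.
nonsurvivors_5 = [7.52, 9.31, 8.20, 8.39, 10.50]
survivors_5 = [9.10, 8.80, 9.15, 11.01, 10.86]


def sample_variance(values):
    """Variance (s squared): sum of squared deviations divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_sd(values):
    """Standard deviation (s): the square root of the variance."""
    return math.sqrt(sample_variance(values))


variance_nonsurvivors = sample_variance(nonsurvivors_5)
variance_survivors = sample_variance(survivors_5)
sd_nonsurvivors = sample_sd(nonsurvivors_5)
sd_survivors = sample_sd(survivors_5)

print(f"Nonsurvivors: variance = {variance_nonsurvivors:.3f}, s = {sd_nonsurvivors:.3f} mm")
print(f"Survivors:    variance = {variance_survivors:.3f}, s = {sd_survivors:.3f} mm")

# Cross-check against the standard library's sample standard deviation.
print(statistics.stdev(nonsurvivors_5), statistics.stdev(survivors_5))
```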
Results
• Note that the standard
deviation is smaller for the
larger samples.
• This is often, but not
always, the case.
• Now let’s look at a larger
data set:
Data for 50 Finches
• Four variables, including beak
depth
• Calculate the mean for each
variable
• Variance is given; you can
calculate the standard deviation
easily
• Enter in table
Fill in everything except 95% confidence
interval
Measures of Confidence: Standard Error of the
Mean and 95% Confidence Interval
• The standard deviation provides a measure of the spread of the data
from the mean.
• A different type of statistic reveals the uncertainty in the calculation
of the mean.
• The sample mean is not necessarily identical to the mean of the
entire population.
• Every time you take a sample and calculate a sample mean, you
would expect a slightly different value.
• In other words, the sample means themselves have variability.
Standard Error of the Mean
• This variability can be expressed by calculating the standard error of the
mean (abbreviated as SEM) or the 95% confidence interval (95% CI).
Why are SEM and 95% CI useful?
• The standard error of the mean represents the standard deviation of
the distribution of sample means and estimates how close the sample
mean is to the population mean.
• The greater the sample size (e.g., 50 rather than 15 or 5 finches), the
more closely the sample mean will estimate the population mean,
and therefore the standard error of the mean becomes smaller.
• The 95% confidence interval (95% CI) extends 1.96 (typically
rounded to 2) standard errors of the mean above and below the sample mean.
• Because the sample means are assumed to be normally distributed,
95% of all sample means should fall within about 2 standard errors
of the population mean, the range that the 95% CI estimates.
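The sketch below shows one way to compute both statistics for the two five-bird samples. It assumes the common definition SEM = s / √N and the 1.96 multiplier given in this section. With only five birds, many statisticians would use a larger multiplier from the t-distribution, so treat the output as illustrative. The function names are made up for this example.

```python
import math
import statistics

# Beak depths (mm) from the two five-bird samples.
nonsurvivors_5 = [7.52, 9.31, 8.20, 8.39, 10.50]
survivors_5 = [9.10, 8.80, 9.15, 11.01, 10.86]


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def ci95_half_width(values, multiplier=1.96):
    """Half-width of the 95% CI: about 1.96 standard errors of the mean."""
    return multiplier * standard_error(values)


sem_nonsurvivors = standard_error(nonsurvivors_5)
sem_survivors = standard_error(survivors_5)
ci_nonsurvivors = ci95_half_width(nonsurvivors_5)
ci_survivors = ci95_half_width(survivors_5)

for label, sample, half_width in (
    ("Nonsurvivors", nonsurvivors_5, ci_nonsurvivors),
    ("Survivors", survivors_5, ci_survivors),
):
    mean = statistics.mean(sample)
    print(f"{label}: mean = {mean:.2f} mm, "
          f"95% CI = {mean - half_width:.2f} to {mean + half_width:.2f} mm")
```

The half-widths returned by `ci95_half_width` are the lengths you would draw above and below each bar when plotting the 95% CI as error bars.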
Calculate the 95% Confidence Interval for
each variable mean.
• Both SEM and 95% CI can be illustrated as error bars in a bar graph of the
means of two or more samples that are being compared.
• Depicting SEM or the 95% CI as error bars in a bar graph provides a clear
visual clue to the uncertainty of the calculations of the sample means.
Make a Bar Graph with Error Bars
• On a sheet of graph paper, construct four bar graphs that compare
the means of non-survivors and survivors for each physical
characteristic (wing length, body mass, tarsus length, and beak size).
• Label both axes of each graph and show the 95% CI as error bars.
• Once you complete your four bar graphs, describe any differences
between non-survivors and survivors you observe in each graph.
What have we learned?
• Measurements often cluster around the mean in a form known as a
normal distribution.
• The average variability about the mean is known as the standard
deviation.
• Since it is usually not possible to sample every individual in a
population, there is also variability in the measurement of the mean.
• If the variation of the mean follows a normal distribution, we can
estimate how close our sample mean is to the actual mean.
• 95% confidence intervals allow us to compare sample means of
different groups and infer statistically significant differences.
Observation: seeds of weeds seem to need
light as well as water to germinate
• You will investigate this observation using garden seeds of various
species
• You have the following: lots of seeds, petri dishes, filter paper, water
• Design an experiment to address the observation.
• Each petri dish should have exactly 30 seeds; each group should use
at least 10 petri dishes
• We will collect germination data and analyze it next week.
• A written report, in scientific format, will be completed for homework
in your lab notebook…
Observation: seeds of weeds seem to need
light as well as water to germinate
• You will investigate this observation using dill weed
(Anethum graveolens)
• You have the following: lots of dill seeds, petri
dishes, filter paper, water
• Design an experiment to address the observation.
• Each petri dish should have exactly 30 seeds; each
group should use at least 6 petri dishes
• We will collect germination data and analyze it
next week.
• A written report, in scientific format, will be
completed for homework…