Stats Review (MMSI Sat. Session)

Statistics
Review for AP Biology
From BSCS: Interaction of experiments and
ideas, 2nd Edition. Prentice Hall, 1970 and
Statistics for the Utterly Confused by Lloyd
Jaisingh, McGraw-Hill, 2000
What is statistics?
• a branch of mathematics that provides
techniques to analyze whether or not your data is
significant (meaningful)
• Statistical applications are based on probability
statements
• Nothing is “proved” with statistics
• Statistics are reported
• Statistics report the probability that similar
results would occur if you repeated the
experiment
Statistics deals with numbers
• Need to know nature of numbers collected
– Continuous variables: type of numbers associated
with measuring or weighing; any value in a
continuous interval of measurement.
• Examples:
– Weight of students, height of plants, time to flowering
– Discrete variables: type of numbers that are
counted or categorical
• Examples:
– Numbers of boys, girls, insects, plants
Can you figure out…
• Which type of numbers? (discrete or
continuous)
– Numbers of persons preferring Brand X in 5
different towns
– The weights of high school seniors
– The lengths of oak leaves
– The number of seeds germinating
– 35 tall and 12 dwarf pea plants
– Answers: all are discrete except the 2nd and 3rd
examples, which are continuous.
Populations and Samples
• Population includes all members of a group
– Example: all 9th grade students in America
– Number of 9th grade students at W-H
• Sample
– Used to make inferences about large populations
– Samples are a selection of the population
– Example: 2nd period AP Biology
• Why the need for statistics?
– Statistics are used to describe sample populations as estimators
of the corresponding population
– Many times, finding complete information about a population is
costly and time consuming. We can use samples to represent a
population.
Sample Populations avoiding Bias
• Individuals in a sample population
– Must be a fair representation of the entire pop.
– Therefore sample members must be randomly
selected (to avoid bias)
– Example: if you were looking at strength in
students - picking students from the football team
would NOT be random
Is there bias?
• A cage has 1000 rats, you pick the first 20 you can catch for
your experiment
• A public opinion poll is conducted using the telephone
directory
• You are conducting a study of a new diabetes drug; you
advertise for participants in the newspaper and TV
• All are biased: Rats: you grab the slower rats. Telephone: you
call only people with a phone (wealth?) and people who are
listed (responsible?). Newspaper/TV: you reach only people
with a newspaper (wealth/educated?) and a TV (wealth?).
Statistical Computations (the Math)
• If you are using a sample population
– Arithmetic Mean (average)
The sum of all the scores
divided by the total number of scores: x̄ = Σ xi / n
– The mean estimates the center of the data; when the distribution is
symmetric (normal), about ½ the members of the pop fall on either side of the mean
http://en.wikipedia.org/wiki/Table_of_mathematical_symbols
Looking at profile of data: Distribution
• What is the frequency of distribution, where
are the data points?
Distribution Chart of Heights of 100 Control Plants
Class (height of plants, cm)    Number of plants in each class
0.0-0.9                         3
1.0-1.9                         10
2.0-2.9                         21
3.0-3.9                         30
4.0-4.9                         20
5.0-5.9                         14
6.0-6.9                         2
Histogram: Frequency Distribution Charts
[Histogram: Number of Plants in each Class (y axis, 0 to 35) for each height class in cm (x axis, 0.0-0.9 through 6.0-6.9)]
The shape of this distribution is called a “normal” curve or a bell curve
This is an “idealized,” theoretical curve: it describes an infinitely large
population, and the distribution of a sample only approximates it
One of the first steps in data analysis is to
create graphical displays of the data. Visual
displays can make it easy to see patterns
and can clarify how two variables affect
each other.
Line Graphs
• Used when data on both
scales of the graph (the
x and y axes) are
continuous.
• The dots indicate
measurements that
were actually made.
Basic Traits of A Good Graph
1. A Good Title
• A good title is one
that tells exactly
what information
the author is
trying to present
with the graph.
Relation Between Study Time and
Score on a Biology Exam in 2011
-or-
Study Time vs. Score on a Biology
Exam in 2011
2. Axes should be
consistently
numbered.
3. Axes should
contain labels,
including units.
4. A frame should
be put around
the outside of
the graph.
5. Small marks,
called index
marks, can be
drawn in.
6. The
independent
variable is
always shown
on the x axis.
7. The dependent
variable is
always shown
on the y axis.
8. The line should
not be extended
to the origin if
the data do not
start there.
Bar Graphs
• Used to visually compare two samples of
categorical or count data.
• Are also used to visually compare the
calculated means, with error bars, of normally
distributed data.
Sample standard error bars (showing the
standard error of the sample mean) are
the notations at the top of each shaded
bar that show the sample standard
error (SE).
Mode and Median
• Mode: most frequently seen value
(if no numbers repeat then there is no mode)
• Median: the middle number
– If you have an odd number of data then the
median is the value in the middle of the set
– If you have an even number of data then the
median is the average between the two middle
values in the set.
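The three measures of center can be checked with a short Python sketch using the standard `statistics` module. The height values here are made up for illustration and are not from the slides.

```python
from statistics import mean, median, multimode

# Hypothetical plant heights in cm (illustrative values, not from the slides)
heights = [3, 5, 5, 6, 8, 9]

print(mean(heights))       # arithmetic mean: sum of the values / number of values
print(median(heights))     # even number of values: average of the two middle values
print(multimode(heights))  # list of the most frequently seen value(s)

# When no value repeats, every value ties for "most frequent",
# so the data set has no single mode.
no_repeats = [1, 2, 3]
print(multimode(no_repeats))
```

`multimode` returns every value tied for most frequent, which makes the "no repeats" case easy to spot: the list it returns is as long as the data set itself.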
Standard Deviation
• An important statistic that is also used to
measure variation in biased samples.
• s is the symbol for standard deviation
• Calculated by taking the square root of the
variance (Bozeman)
• Say a sample of pea plants has the following:
Mean = 8 cm; Variance = 2.5; s = 1.6 cm
• Thus the measurements typically vary
+/- 1.6 cm from the mean
What does “s” mean?
• We can predict the probability of finding a pea
plant at a predicted height… the probability of
finding a pea plant above 12.8 cm or below
3.2 cm (more than 3 s from the mean of 8 cm) is less than 1%
• s is a valuable tool because it reveals
predicted limits of finding a particular value
The Normal Curve and Standard Deviation
A normal curve:
Each vertical line
is a unit of
standard deviation
68% of values fall
within +1 or -1 SD
of the mean
95% of values fall
within +2 & -2 SD
units
Nearly all
members (>99%)
fall within 3 std
dev units
http://classes.kumc.edu/sah/resources/sensory_processing/images/bell_curve.gif
Pea Plant Normal Distribution Curve with Std Dev
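The 68 / 95 / 99+ percentages can be reproduced with Python's built-in `statistics.NormalDist`. This is a minimal sketch that assumes the pea plant heights follow an ideal normal curve with the mean (8 cm) and standard deviation (1.6 cm) from the earlier example; the helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist

mean_cm, sd_cm = 8.0, 1.6  # pea plant example: mean 8 cm, s = 1.6 cm
dist = NormalDist(mu=mean_cm, sigma=sd_cm)

def fraction_within(k):
    """Fraction of an ideal normal population within k SD units of the mean."""
    return dist.cdf(mean_cm + k * sd_cm) - dist.cdf(mean_cm - k * sd_cm)

for k in (1, 2, 3):
    low, high = mean_cm - k * sd_cm, mean_cm + k * sd_cm
    print(f"within {k} SD ({low:.1f} to {high:.1f} cm): {fraction_within(k):.1%}")
```

The 3 SD limits come out to 3.2 cm and 12.8 cm, the same limits quoted for the pea plants.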
Standard Error of the Sample Means
AKA Standard Error
• The mean, the variance, and the std dev help
estimate characteristics of the population from a
single sample
• So if many samples were taken then the means of
the samples would also form a normal distribution
curve that would be close to the whole population.
• The larger the samples the closer the means would
be to the actual value
• But taking many samples would most likely be impossible, so
use a simple method to estimate the standard deviation of the
sample means from a single sample
A Simple Method for estimating
standard error
Standard error is the calculated standard deviation divided by the square root
of the sample size (n, the number of individuals in the sample): SE = s / √n
Standard error of the means is used to test the reliability of the data
Example… If there are 10 corn plants with a standard deviation of 0.2
SE = 0.2 / sq root of 10 = 0.2/3.16 = 0.063
0.063 represents one std dev of the sample means for samples of 10 plants
If there were 100 plants the standard error would drop to 0.2/10 = 0.02
Why?
Because when we take larger samples, our sample means get closer
to the true mean value of the population. Thus, the distribution of the
sample means would be less spread out and would have a lower
standard deviation.
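The effect of sample size on standard error can be checked with a few lines of Python. This sketch applies SE = s / √n to the corn plant example (s = 0.2); the function name `standard_error` is an illustrative choice.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

se_10 = standard_error(0.2, 10)    # 10 corn plants; sqrt(10) is about 3.16
se_100 = standard_error(0.2, 100)  # 100 corn plants; sqrt(100) is 10

print(round(se_10, 3), round(se_100, 3))
```

Going from 10 to 100 plants multiplies n by 10 but shrinks the standard error only by a factor of √10 (about 3.16).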
Probability Tests
• What to do when you are comparing two samples to
each other and you want to know if there is a
significant difference between the two sample
populations
• (example: the control and the experimental setup)
• How do you know there is a difference?
• How large is a “difference”?
• How do you know the “difference” was caused by a
treatment and not due to “normal” sampling
variation or sampling bias?
Laws of Probability
• The results of one trial of a chance event do not affect the
results of later trials of the same event. p = 0.5 (a coin always
has a 50:50 chance of coming up heads)
• The chance that two or more independent events will occur
together is the product of their chances of occurring
separately. (one outcome has nothing to do with the other)
• Example: What’s the likelihood of a 3 coming up on a die? Six
sides to a die: p = 1/6
• Roll two dice and get 3’s on both: p = 1/6 * 1/6 = 1/36, which means there’s
a 35/36 chance of rolling something else…
• Note the probabilities of all possible outcomes must add up to 1.0
Laws of Probability (continued)
• The probability that either of two or more mutually
exclusive events will occur is the sum of their
probabilities (only one can happen at a time).
• Example: What is the probability of rolling a total of
either 2 or 12?
• Probability of rolling a 2 means a 1 on each of the
dice; therefore p = 1/6*1/6 = 1/36
• Probability of rolling a 12 means a 6 and a 6 on each
of the dice; therefore p = 1/36
• So the likelihood of rolling either is 1/36+1/36 = 2/36
or 1/18
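Both laws can be verified by listing every outcome of two dice. The sketch below uses Python's `fractions` and `itertools` modules; the helper name `probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
rolls = list(product(die, die))  # all 36 equally likely outcomes of two dice

def probability(event):
    """Fraction of the 36 outcomes for which event(roll) is true."""
    hits = sum(1 for roll in rolls if event(roll))
    return Fraction(hits, len(rolls))

# Product rule: independent events occurring together (a 3 on both dice)
p_two_threes = probability(lambda r: r == (3, 3))

# Sum rule: mutually exclusive events (a total of 2 OR a total of 12)
p_total_2_or_12 = probability(lambda r: sum(r) in (2, 12))

print(p_two_threes, p_total_2_or_12)
```

Counting outcomes directly gives the same answers as multiplying (1/6 * 1/6) and adding (1/36 + 1/36).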
The Use of the Null Hypothesis
• Is the difference in two sample populations
due to chance or a real statistical difference?
• The null hypothesis assumes that there will be
no “difference” or no “change” or no “effect”
of the experimental treatment.
• If treatment A is no better than treatment B
then the null hypothesis is supported (it is not rejected).
• If there is a significant difference between A
and B then the null hypothesis is rejected...
Chi square
• Used with discrete values
• Phenotypes, choice chambers, etc.
• Not used with continuous variables (like
height… use t-test for samples less than 30
and z-test for samples greater than 30)
• O= observed values
• E= expected values
• Chi square = Σ (O - E)² / E, added up over every category
http://www.jspearson.com/Science/chiSquare.html
http://course1.winona.edu/sberg/Equation/chi-squ2.gif
Interpreting a chi square
• Calculate degrees of freedom
• # of events, trials, phenotypes -1
• Example 2 phenotypes-1 =1
• Generally use the column labeled 0.05 (which means that, if the
null hypothesis is true, a chi square value this large or larger
would occur by random chance only 5% of the time).
• Any value calculated that is larger means you reject
your null hypothesis and there is a difference
between observed and expected values.
How to use a chi square chart
http://faculty.southwest.tn.edu/jiwilliams/probab2.gif
Q1: Chi Square
• A heterozygous red eyed female was crossed with a red
eyed male. The results are shown below. Red
eyes are sex-linked dominant to white; determine
the chi square value. Round to the nearest
hundredth.
Phenotype     # flies observed
Red Eyes      134
White Eyes    66
Chi Square Strategy
• Given: observed
• You have to figure out expected. Usually you need
to do a Punnett square to figure this out
• Plug in to the chi square formula
Chi-Square
Observed: 134 red eyes, 66 white eyes
Punnett square (female XR Xr x male XR Y):
        XR        Xr
XR      XR XR     XR Xr
Y       XR Y      Xr Y
(Xr Y is white; the other three are red)
Expected: 3:1 ratio
134 + 66 = 200
150 red, 50 white
Chi square = (134-150)²/150 + (66-50)²/50
= 1.70666 + 5.12
= 6.83
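The same calculation can be sketched in Python. The critical value 3.84 (1 degree of freedom, 0.05 column) is taken from a standard chi square table and is not stated on this slide; the function name `chi_square` is an illustrative choice.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over every category."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [134, 66]                       # red eyes, white eyes
total = sum(observed)                      # 200 flies
expected = [total * 3 / 4, total * 1 / 4]  # 3:1 ratio from the Punnett square

chi2 = chi_square(observed, expected)
critical_value = 3.84  # assumed: standard table value for df = 1, p = 0.05

print(round(chi2, 2))
print("reject null hypothesis" if chi2 > critical_value else "fail to reject null hypothesis")
```

With 2 phenotypes there is 1 degree of freedom, so the calculated value is compared against the single critical value above.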
chi square problems
2013 AP Exam
AP Biology Math Review 2015
1) Take out an APPROVED calculator and formula
sheet.
2) You will solve each problem and grid in the
answer.
Tips
• Grid LEFT to right
• Use the formula sheet
• Don’t round until the end
• Look at HOW the answer should be given
“round to nearest…”
.123
The 1 is in the tenths place
The 2 is in the hundredths place
The 3 is in the thousandths place
Q2: Surface Area and Volume
• What is the SA/V for this cell? Round your
answer to the nearest hundredths.
Q2
SA = 4πr²
= 4(3.14)(5²)
= 314
Volume of a sphere = 4/3 πr³
= 4/3 (3.14)(5³)
= 523.33
SA/V = 314/523.33
= .60
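A short Python sketch of the same calculation, using the radius of 5 from the worked answer and `math.pi` instead of 3.14. The function name `sphere_sa_to_v` is made up for illustration.

```python
import math

def sphere_sa_to_v(radius):
    """Surface-area-to-volume ratio of a sphere."""
    surface_area = 4 * math.pi * radius ** 2
    volume = 4 / 3 * math.pi * radius ** 3
    return surface_area / volume

ratio = sphere_sa_to_v(5)
print(round(ratio, 2))
```

Because π and r² cancel, the ratio for any sphere simplifies to 3/r, so a larger cell always has a smaller SA/V.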
Q3: Water Potential and Solute Potential
• Solute potential= –iCRT
• i = The number of particles the molecule will make in water; for NaCl this
would be 2; for sucrose or glucose, this number is 1
• C = Molar concentration (from your experimental data)
• R = Pressure constant = 0.0831 liter bar/mole K
• T = Temperature in degrees Kelvin = 273 + °C of solution
Sample Problem
• The molar concentration of a sugar solution in
an open beaker has been determined to be
0.3M. Calculate the solute potential at 27
degrees celsius. Round your answer to the
nearest tenths.
Q3
• Solute potential= –iCRT
i = 1
C = 0.3
R = Pressure constant = 0.0831
T = 27 + 273 = 300 K
Solute potential = –(1)(0.3)(0.0831)(300) = –7.479, which rounds to –7.5 bars
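The formula –iCRT can be written as a small Python function. This is a sketch using the constants given above (R = 0.0831 liter bar/mole K, T = 273 + °C); the function and parameter names are illustrative.

```python
def solute_potential(i, molar_conc, temp_celsius, r=0.0831):
    """Solute potential in bars: -iCRT, with T converted to Kelvin."""
    temp_kelvin = temp_celsius + 273
    return -i * molar_conc * r * temp_kelvin

# Sugar solution: i = 1, C = 0.3 M, 27 degrees Celsius
psi_s = solute_potential(i=1, molar_conc=0.3, temp_celsius=27)
print(round(psi_s, 1))
```

For NaCl at the same concentration, i would be 2, which doubles the size of the (negative) solute potential.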
Q4: Hardy Weinberg
• A census of birds nesting on a Galapagos
Island revealed that 24 of them show a rare
recessive condition that affected beak
formation. The other 63 birds in this
population show no beak defect. If this
population is in HW equilibrium, what is the
frequency of the dominant allele? Give your
answer to the nearest hundredth
Hardy Weinberg Strategy
• Figure out what you are given
• Allele (p or q) or Genotypes (p², 2pq, q²)
• Figure out what you are solving for
• Manipulate formulas to go from given to
solving for
• Always dealing with decimals
Q4: Looking for p (the dominant allele)
• Total birds = 24 + 63 = 87
• Homozygous Recessive = q² = 24/87 = .2759
q = square root of .2759 = .5252
p + q = 1
p = 1 - .5252 = .4748, which rounds to .47
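The Hardy-Weinberg steps can be sketched in Python, assuming (as the problem states) that the population is in equilibrium. The variable names are illustrative.

```python
import math

recessive_birds = 24  # show the recessive condition (genotype frequency q^2)
other_birds = 63      # show no beak defect
total = recessive_birds + other_birds

q_squared = recessive_birds / total  # frequency of homozygous recessive birds
q = math.sqrt(q_squared)             # frequency of the recessive allele
p = 1 - q                            # frequency of the dominant allele (p + q = 1)

print(round(q_squared, 4), round(q, 4), round(p, 2))
```

As a check, p² + 2pq + q² should add up to 1 for the values found.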
Q5: Rate
Hydrogen peroxide is broken down to water and
oxygen by the enzyme catalase. The following data
were taken over 5 minutes. What is the rate of
enzymatic reaction in mL/min from 2 to 4 minutes?
Round to the nearest hundredth.
Time (mins)    Amount of O2 produced (mL)
1              2.3
2              3.6
3              4.2
4              5.5
5              5.9
Q5
• Rise/run = rate = (5.5 - 3.6)/(4 - 2)
• Rise/run = rate = 1.9/2
• Rise/run = rate = .95 mL/min
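Rise over run between any two time points can be sketched in Python from the data table. The dictionary layout and the function name `rate` are illustrative choices.

```python
# Time (min) -> amount of O2 produced (mL), from the data table
o2_ml = {1: 2.3, 2: 3.6, 3: 4.2, 4: 5.5, 5: 5.9}

def rate(data, t_start, t_end):
    """Average rate (rise / run) between two time points, in mL/min."""
    rise = data[t_end] - data[t_start]
    run = t_end - t_start
    return rise / run

print(round(rate(o2_ml, 2, 4), 2))
```

Note the parentheses implied by rise/run: the two differences are computed first, then divided.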
Q6: Laws of Probability
• Calculate the probability of
tossing three coins
simultaneously and
obtaining three heads.
Express in fraction form.
Q6
• Probability of a heads is ½
• Probability of heads AND a heads AND a heads
½ X ½ X ½=1/8
Q7: Population Growth
• N: total number in pop
• r: per capita rate of growth
• There are 2,000 mice living in a field. If 1,000
mice are born each month and 200 mice die
each month, what is the per capita growth
rate of mice over a month? Round to the
nearest tenths.
• N = 2000
• Change in population per month = births - deaths = 1000 - 200 = 800
• Per capita growth rate = 800/2000 = 0.4
Q8
• The net annual primary productivity of a
particular wetland ecosystem is found to be
8,000 kcal/m² per year. If respiration by the aquatic
producers is 12,000 kcal/m² per year, what is
the gross annual primary productivity for this
ecosystem, in kcal/m² per year? Round to the
nearest whole number.
Q8
• NPP=GPP-R
• 8,000 = GPP – 12,000
• 8,000+ 12,000= GPP
• 20,000=GPP
Q9: Q10
• Data taken to determine the effect of temperature on
the rate of respiration in a goldfish is given in the table
below. Calculate Q10 for this data. Round to the nearest
whole number.
Temperature (°C)    Respiration Rate (per minute)
16                  16
21                  22
Q9
Q10 = (22/16)^(10/(21-16))
Q10 = (1.375)² = 1.89
Q10 = 2
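The Q10 formula, (rate 2 / rate 1) raised to the power 10 / (T2 - T1), can be sketched in Python with the goldfish data. The function name `q10` and its parameter names are illustrative.

```python
def q10(rate1, temp1, rate2, temp2):
    """Q10 temperature coefficient: (rate2 / rate1) ** (10 / (temp2 - temp1))."""
    return (rate2 / rate1) ** (10 / (temp2 - temp1))

# Goldfish data: rate 16 at 16 degrees C, rate 22 at 21 degrees C
value = q10(rate1=16, temp1=16, rate2=22, temp2=21)
print(round(value, 2), round(value))
```

The exponent is 10/5 = 2 here, so the ratio 1.375 is simply squared before rounding.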
Q10:Standard Deviation
• Grasshoppers in Madagascar show variation in
their back-leg length. Given the following
data, determine the standard deviation for
this data. Round the answer to the nearest
hundredth.
Length (cm): 2.0, 2.2, 2.2, 2.1, 2.0, 2.4 and 2.5
• Average = (2.0 + 2.2 + 2.2 + 2.1 + 2.0 + 2.4 + 2.5)/7 = 15.4/7 = 2.2
• Dev = -.2, 0, 0, -.1, -.2, .2, .3
• Dev Squared = .04 + 0 + 0 + .01 + .04 + .04 + .09
• Sum of the Devs Squared = 0.22
• Variance = 0.22/(7 - 1) = 0.0367
• Standard deviation = square root of 0.0367 = 0.19
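Carrying the calculation through to the standard deviation takes two more steps: divide the sum of the squared deviations by n - 1, then take the square root. A Python sketch of every step, cross-checked against the standard library's `statistics.stdev` (which also divides by n - 1):

```python
import math
from statistics import stdev

lengths_cm = [2.0, 2.2, 2.2, 2.1, 2.0, 2.4, 2.5]

n = len(lengths_cm)
mean_cm = sum(lengths_cm) / n
squared_devs = [(x - mean_cm) ** 2 for x in lengths_cm]
sum_sq = sum(squared_devs)    # sum of the squared deviations
variance = sum_sq / (n - 1)   # sample variance: divide by n - 1
s = math.sqrt(variance)       # standard deviation

print(round(mean_cm, 2), round(sum_sq, 2), round(s, 2))
print(round(stdev(lengths_cm), 2))  # library cross-check
```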
Q11: Dilution
• Joe has a 2 g/L solution. He
dilutes it and creates 3 L of a 1
g/L solution. How much of the
original solution did he dilute?
Round to the nearest tenths.
We are looking for V1:
C1V1 = C2V2
2V1 = 1(3)
V1 = 1.5 L
Q12: log
• What is the hydrogen
ion concentration of a
solution of pH 8?
Round to the nearest
whole number
Q12
• [H+] if pH = 8.0
• [H+] = 10^(-pH)
• [H+] = 10^(-8.0)
• 1÷10⁸ = 0.00000001
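The relationship [H+] = 10^(-pH) and its inverse, pH = -log10[H+], can be sketched in Python. The function name is an illustrative choice.

```python
import math

def hydrogen_ion_concentration(ph):
    """[H+] in moles per liter for a given pH: 10 ** (-pH)."""
    return 10 ** (-ph)

h = hydrogen_ion_concentration(8.0)
print(h)

# Going the other way: pH is the negative log (base 10) of [H+]
print(-math.log10(h))
```

Each step of 1 on the pH scale is a 10-fold change in [H+], so pH 7 has ten times the hydrogen ion concentration of pH 8.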
Q13:Gibbs Free Energy
PICK THE BEST CHOICE:
A chemical reaction is most likely to occur
spontaneously if the
a) Free energy is negative
b) Entropy change is negative
c) Activation energy is positive
d) Heat of reaction is positive
Q13
•Answer: A
Variance (s²)
• Mathematically expressing the degree of
variation of scores (data) from the mean
• A large variance means that the individual
scores (data) of the sample deviate a lot from
the mean.
• A small variance indicates the scores (data)
deviate little from the mean
Calculating the variance for a whole population
σ² = Σ (X - µ)² / N
Σ = sum of; X = score, value,
µ = mean, N= total of scores or values
OR use the VAR.P function in Excel (VARP in older versions); the VAR function
calculates the sample variance (n - 1) shown next
http://www.mnstate.edu/wasson/ed602calcvardevs.htm
Calculating the variance for a SAMPLE of a population
s² = Σ (xi - x̄)² / (n - 1)
Σ = sum of; xi = score, value,
n -1 = total of scores or values-1
x̄ (often read as “x bar”) is the mean (average value of xi).
Note the sample variance is larger…why?
http://www.mnstate.edu/wasson/ed602calcvardevs.htm
Heights in Centimeters of Five Randomly Selected Pea Plants Grown at 8-10 °C
Plant    Height (cm)    Deviations from mean    Squares of deviation from mean
         (xi)           (xi - x̄)                (xi - x̄)²
A        10             2                       4
B        7              -1                      1
C        6              -2                      4
D        8              0                       0
E        9              1                       1
         Σ xi = 40      Σ (xi - x̄) = 0          Σ (xi - x̄)² = 10
xi = score or value; x̄ = mean; Σ = sum of
Finish Calculating the Variance
Σ xi = 40
Σ (xi - x̄) = 0
Σ (xi - x̄)² = 10
There were five plants; n=5; therefore n-1=4
So s² = 10/4 = 2.5
Variance helps to characterize the data concerning a sample by indicating
the degree to which individual members within the sample vary from the
mean
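The pea plant table can be reproduced in Python, which also shows the difference between dividing by n - 1 (sample) and by N (whole population). The standard library's `statistics.variance` uses n - 1 and `statistics.pvariance` uses N; the variable names are illustrative.

```python
from statistics import mean, pvariance, variance

heights_cm = [10, 7, 6, 8, 9]  # plants A to E from the table

x_bar = mean(heights_cm)
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights_cm)

sample_var = sum_sq_dev / (len(heights_cm) - 1)  # divide by n - 1
population_var = sum_sq_dev / len(heights_cm)    # divide by N

print(x_bar, sum_sq_dev, sample_var, population_var)

# Library cross-check: variance() is the sample formula, pvariance() the population formula
print(variance(heights_cm), pvariance(heights_cm))
```

Dividing the same sum of squares by the smaller number (n - 1) always gives the larger result, which is why the sample variance comes out larger than the population variance for the same data.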