Statistics Presentation


Introduction to Statistics
Steven A. Jones
Biomedical Engineering
Louisiana Tech University
(Created for our NSF-funded
Research Experiences in Micro/Nano
Engineering Program)
Experimental Design
1. Develop a Hypothesis
2. Design Statistical Analysis
3. Set up Experiment
4. Test Experiment (Positive Control)
5. Collect Data
6. Perform Statistical Analysis
Step 2 must not be done after the data have been
collected!
Some Common Probability Distributions
[Figure: three panels titled Uniform Distribution, Gaussian Distribution, and Rayleigh Distribution, each plotting probability density against the value of x]
Uniform: e.g., a dice throw.
Gaussian: e.g., a sum of many random numbers.
Rayleigh: the square root of the sum of the squares of two Gaussians.
Introductory Question
There is a 60% chance of rain on Friday and
a 45% chance of rain on Saturday.
What is the probability that it will rain on
Friday and Saturday?
Introductory Question
There is a 60% chance of rain on Friday and
a 45% chance of rain on Saturday.
What is the probability that it will rain on
either Friday or Saturday?
Axioms of Probability
For independent outcomes A and B:
$P(A \,\&\, B) = P(A)\,P(B)$
$P(\bar{A}) = 1 - P(A)$
$P(A \text{ or } B) = 1 - P(\bar{A})\,P(\bar{B})$
$\qquad = 1 - [1 - P(A)]\,[1 - P(B)]$
$\qquad = P(A) + P(B) - P(A)\,P(B)$
If the probabilities are small, the product P(A)P(B) is negligible, so they can simply be added when “or” is used: $P(A \text{ or } B) \approx P(A) + P(B)$.
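Applied to the introductory questions, and assuming (as these rules require) that rain on Friday and rain on Saturday are independent, the rules give the following minimal Python sketch. The function names `p_and` and `p_or` are illustrative, not from the slides.

```python
def p_and(p_a: float, p_b: float) -> float:
    """P(A and B) for independent events A and B."""
    return p_a * p_b


def p_or(p_a: float, p_b: float) -> float:
    """P(A or B) for independent events: 1 minus the chance that neither occurs."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


p_fri, p_sat = 0.60, 0.45
both = p_and(p_fri, p_sat)    # 0.6 * 0.45
either = p_or(p_fri, p_sat)   # 1 - 0.4 * 0.55
print(f"Rain on both days:  {both:.2f}")    # 0.27
print(f"Rain on either day: {either:.2f}")  # 0.78

# The expanded form P(A) + P(B) - P(A)P(B) gives the same answer.
assert abs(either - (p_fri + p_sat - both)) < 1e-12

# For small probabilities, simple addition is a good approximation to "or".
print(f"{p_or(0.01, 0.02):.4f} vs {0.01 + 0.02:.4f}")  # 0.0298 vs 0.0300
```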
Relationships Among Probability Distributions
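Assume that the $x_i$ are independent and uniformly distributed. Then:
$y = \sum_{i=1}^{N} A_i x_i$ is Gaussian (Normal) distributed for N sufficiently large.
$z = y_1^2 + y_2^2$ is $\chi^2$ distributed (with 2 degrees of freedom when $y_1$ and $y_2$ are independent, zero-mean, unit-variance Gaussians).
$w = \sqrt{z}$ is Rayleigh distributed.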
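These relationships can be checked numerically. The sketch below assumes all $A_i = 1$ and N = 12 (both choices are illustrative): each uniform(0, 1) term has mean 1/2 and variance 1/12, so the sum of 12 terms minus 6 is approximately a standard Gaussian. It uses only the Python standard library.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N_TERMS = 12       # assumption: N = 12 terms, all A_i = 1
N_SAMPLES = 20_000


def gaussian_from_uniforms() -> float:
    """Sum of N_TERMS uniform(0, 1) values, shifted to zero mean.

    With 12 terms the result has mean 0 and variance 12 * (1/12) = 1,
    so it is approximately a standard Gaussian.
    """
    return sum(random.random() for _ in range(N_TERMS)) - N_TERMS / 2


y1 = [gaussian_from_uniforms() for _ in range(N_SAMPLES)]
y2 = [gaussian_from_uniforms() for _ in range(N_SAMPLES)]

z = [a * a + b * b for a, b in zip(y1, y2)]  # chi-squared, 2 degrees of freedom
w = [math.sqrt(v) for v in z]                # Rayleigh

# Theory: y has mean 0 and SD 1; z has mean 2; w has mean sqrt(pi / 2).
print("y: mean, SD =", statistics.fmean(y1), statistics.stdev(y1))
print("z: mean =", statistics.fmean(z))
print("w: mean =", statistics.fmean(w), "theory =", math.sqrt(math.pi / 2))
```

Plotting histograms of `y1`, `z`, and `w` should reproduce the shapes of the Gaussian and Rayleigh panels shown earlier.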
Simple Distribution
You expect that the correct value for a
height measurement is 12 meters, and the
standard deviation is 3 meters. One way to
determine whether or not you are correct is
to take some measurements.
What could you conclude if you made one measurement
and the value fell as follows on the expected distribution?
[Figure: Gaussian Distribution; probability density vs. value of x]
What if three values fell as follows on the expected
distribution?
[Figure: Gaussian Distribution; probability density vs. value of x]
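For the height example (expected mean 12 m, standard deviation 3 m), one way to quantify these two questions is a z-test with known σ: the mean of n measurements has standard deviation $\sigma/\sqrt{n}$, so several values that land in the same place carry more weight than one. This is a sketch; the measurement values below are hypothetical, not read from the figures.

```python
import math
from typing import Sequence

MU = 12.0     # expected height (m), from the example
SIGMA = 3.0   # expected standard deviation (m), from the example


def two_tailed_p(values: Sequence[float], mu: float = MU, sigma: float = SIGMA) -> float:
    """Probability of a sample mean at least this far from mu, assuming the
    values are independent draws from a Gaussian with mean mu and known sigma."""
    n = len(values)
    mean = sum(values) / n
    z = (mean - mu) / (sigma / math.sqrt(n))  # standard error shrinks as 1/sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))


# Hypothetical measurements: one value at 17 m, then three values averaging 17 m.
print(f"{two_tailed_p([17.0]):.4f}")              # z = 1.67, p about 0.096
print(f"{two_tailed_p([16.5, 17.0, 17.5]):.4f}")  # z = 2.89, p about 0.0039
```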
Did the two data sets below come from different distributions?
[Figure: dot plot of Set 1 and Set 2; axis: Value of X (9 to 12)]
What about these two?
[Figure: dot plot of Set 1 and Set 2; axis: Value of X (9 to 12)]
How confident are you that the data sets in
each plot below come from different
distributions?
[Figure: dot plots of two data sets each; axis: Value of X (9 to 12); panels labeled "Lower standard deviation," "Smaller difference in means," and "Fewer data points"]
Student’s T Test
A Student’s T test measures the confidence you can have
that two means are inherently different, based on three
parameters:
1. Difference of the means
2. Standard deviations
3. Number of data points obtained
Particularly useful when many uncontrolled (confounding)
variables make individual measurements vary.
E.g., blood pressure drugs: are we, on average, lowering
blood pressure?
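The dependence on these three parameters can be seen directly in the two-sample Student's t statistic. This sketch uses the pooled-variance form, which assumes both groups share the same underlying variance; the summary numbers are hypothetical and only mimic the three panels above (lower standard deviation, smaller difference in means, fewer data points).

```python
import math


def student_t(mean1: float, sd1: float, n1: int,
              mean2: float, sd2: float, n2: int) -> float:
    """Two-sample Student's t statistic with a pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean1 - mean2) / se


# Hypothetical summary statistics (not from the slides).
base = student_t(11.0, 0.5, 10, 10.0, 0.5, 10)
print(f"baseline:               t = {base:.2f}")
print(f"lower SD:               t = {student_t(11.0, 0.25, 10, 10.0, 0.25, 10):.2f}")
print(f"smaller mean diff:      t = {student_t(10.5, 0.5, 10, 10.0, 0.5, 10):.2f}")
print(f"fewer data points:      t = {student_t(11.0, 0.5, 4, 10.0, 0.5, 4):.2f}")
```

A larger |t| corresponds to a smaller p, i.e. more confidence that the means differ.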
A Student’s T test is used to answer the following question:
Given:
• Difference of the means
• Standard deviations
• Number of data points obtained
• That these data come from normal
distributions
What is the probability (p) of getting a difference in means at least
this large if the data came from the same underlying distribution?
Example
Given the mean and standard deviation for pressure, along
with the number of points measured in a clinical drug
trial, how confident can we be that the drug had an effect on
the distribution (i.e., that it changed the blood pressure of
these individuals on average)?
Sample Mean: Mean from the sample that was
taken (the 2000 people in the drug trial).
Distribution Mean: Mean that would occur if you
could give the drug to everyone in the world
and do the measurement.
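The difference between the two means can be simulated by pretending the distribution is known. In this sketch the distribution mean (120 mmHg) and standard deviation (15 mmHg) are hypothetical values chosen for illustration; only the sample size of 2000 comes from the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical values (not from the slides): in practice the distribution
# mean is unknown; here it is assumed known so the sample can be compared.
DIST_MEAN = 120.0  # distribution mean, mmHg
DIST_SD = 15.0     # distribution standard deviation, mmHg
N = 2000           # sample size, as in the drug trial above

sample = [random.gauss(DIST_MEAN, DIST_SD) for _ in range(N)]
sample_mean = statistics.fmean(sample)
print(f"sample mean = {sample_mean:.2f} mmHg (distribution mean = {DIST_MEAN})")

# The sample mean typically misses the distribution mean by about SD / sqrt(N),
# the standard error of the mean.
print(f"standard error of the mean = {DIST_SD / N ** 0.5:.2f} mmHg")

# Repeating the "trial" gives a slightly different sample mean each time.
repeats = [statistics.fmean(random.gauss(DIST_MEAN, DIST_SD) for _ in range(N))
           for _ in range(5)]
print([round(m, 2) for m in repeats])
```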
Underlying and Sample Distributions
[Figure: histograms of sampled values (Frequency vs. Bin, from -3 to 3) for a Uniform Distribution and a Gaussian Distribution, each overlaid with its theoretical curve: Uniform (Theory), Gaussian (Theory)]
Statistical Tests You Should Know
T-test: Are the means of two data sets the same?
F-test: Are the standard deviations of two data
sets the same?
Chi-Squared Test: Does the distribution of a data
set match a proposed distribution?
ANOVA: Are the means of more than two groups the same?
(It compares variances using an F statistic.)
Pearson’s Correlation Coefficient: Is one
variable linearly related to another?
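Two of these tests start from statistics that are simple to compute. The sketch below computes only the test statistics, not the p-values (which require the F and t distributions): the F statistic as a ratio of sample variances, using the common convention of putting the larger variance on top, and Pearson's r from paired data. The data are hypothetical.

```python
import statistics


def f_statistic(a, b):
    """Ratio of sample variances used by the F-test (larger over smaller)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def pearson_r(x, y):
    """Pearson's correlation coefficient between paired samples x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(x, [2 * v + 1 for v in x]))  # exactly linear, increasing: r = 1
print(f_statistic(x, [2 * v for v in x]))    # doubling the spread: F = 4
```

An F near 1 suggests similar standard deviations; an r near +1 or -1 indicates a strong linear relationship.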
To Run a T-Test
Calculate the mean of the data.
Calculate the standard deviation of the data.
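Determine the T statistic (e.g., $T = \sqrt{N}\,(\bar{x} - \mu)/\sigma$ for
N data points with mean $\bar{x}$ and standard deviation $\sigma$, compared with an expected mean $\mu$).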
From T determine p.
p is “the probability that you would get a
difference in means this large or larger, given
that the two measurement sets come from the
same distribution.”
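The four steps can be sketched for a one-sample test using only the Python standard library. Because the standard library has no Student's t distribution, this sketch computes p by integrating the t density numerically with Simpson's rule; a statistics package would normally be used instead. The data and the expected mean of 12 are hypothetical.

```python
import math
import statistics


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p_from_t(t: float, df: int, steps: int = 2000) -> float:
    """p = P(|T| >= |t|) = 1 - 2 * (integral of the pdf from 0 to |t|),
    with the integral done by Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * total * h / 3.0


def one_sample_t_test(data, mu):
    """Mean, standard deviation, T statistic, then p (the steps listed above)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    t = math.sqrt(n) * (mean - mu) / sd
    return t, two_tailed_p_from_t(t, df=n - 1)


# Hypothetical heights (m): do they differ from an expected mean of 12 m?
heights = [13.1, 14.2, 12.8, 15.0, 13.7, 12.9]
t, p = one_sample_t_test(heights, mu=12.0)
print(f"T = {t:.3f}, p = {p:.4f}")
```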
Interpretation of T test
You set the value that you consider
significant.
Medical applications: p < 0.05 is
“significant.”
Since p < 0.05 is a 1/20 probability, when there is no
real difference you will typically get a falsely
“significant” result once in every 20 T tests.
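This one-in-twenty rate can be checked by simulation: repeatedly draw two data sets from the same distribution (so the null hypothesis is true) and count how often p < 0.05. To stay within the standard library, this sketch substitutes a z-test with known σ for the T test; the sample size, σ, and number of trials are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable


def z_test_p(a, b, sigma):
    """Two-tailed p for a difference in means when sigma is known."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


TRIALS, N, SIGMA = 4000, 10, 1.0
false_positives = 0
for _ in range(TRIALS):
    # Both sets come from the SAME distribution, so any "significant" result is wrong.
    a = [random.gauss(0.0, SIGMA) for _ in range(N)]
    b = [random.gauss(0.0, SIGMA) for _ in range(N)]
    if z_test_p(a, b, SIGMA) < 0.05:
        false_positives += 1

rate = false_positives / TRIALS
print(f"fraction called 'significant': {rate:.3f}")  # should be close to 0.05
```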
Hypothesis
Null hypothesis: statement that the two
distributions are the same, e.g., “Altase
causes no change in blood pressure.”
Alternative hypothesis: can vary.
• Altase reduces the mean blood pressure.
• Altase changes the mean blood pressure.
One-Tailed vs Two-Tailed
Depends on the “alternative hypothesis.”
• One tail: if the alternative hypothesis is that one mean is
greater than the other.
• Two tail: if the alternative hypothesis is that the means
are different.
Saying that one of the means is greater is more
restrictive.
The confidence you have in your result depends
on the prediction (1st law of the frisbee).
Example
A friend throws a frisbee. It bounces off a pole,
goes to the roof of the house, rolls along an arc,
flips off the gutter, and then lands in the fountain.
Are you impressed?
A friend predicts that the frisbee will do the above,
and then it happens. Are you impressed?
As with the frisbee, statistical analysis depends on
how far you are willing to stick your neck out.
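The numerical effect of sticking your neck out can be shown with a normal (z) test statistic, used here for simplicity; the same halving holds for the symmetric t distribution. The value z = 1.8 is hypothetical and assumed to fall in the predicted direction. Note that the one-tailed p is only legitimate if the direction was predicted before the data were collected.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 1.8  # hypothetical test statistic, in the predicted direction
one_tailed = upper_tail(z)           # alternative: mean A is greater than mean B
two_tailed = 2 * upper_tail(abs(z))  # alternative: the means are different
print(f"one-tailed p = {one_tailed:.4f}")  # 0.0359: "significant" at 0.05
print(f"two-tailed p = {two_tailed:.4f}")  # 0.0719: not significant at 0.05
```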
Confidence Interval
States the range of values that contains the true
value with a given percent confidence.
Depends on σ, the number of samples, and the
desired confidence.
Not a statistical test of significance, but related
to the T-test
• The more samples
we have, the narrower
the confidence interval.
[Figure: Gaussian Distribution; probability density vs. value of x]
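How the interval narrows with more samples can be sketched for the simple case where σ is known (a z-interval); when σ is estimated from the data, a critical value from the t distribution would replace the normal one. The sample mean of 12 m and σ of 3 m reuse the earlier height example as hypothetical inputs.

```python
import math
from statistics import NormalDist


def ci_known_sigma(mean: float, sigma: float, n: int, confidence: float = 0.95):
    """Confidence interval for the true mean when sigma is known (z-interval)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


# Quadrupling the number of samples halves the width of the interval.
for n in (4, 16, 64):
    lo, hi = ci_known_sigma(12.0, 3.0, n)
    print(f"n = {n:2d}: [{lo:.2f}, {hi:.2f}]  width = {hi - lo:.2f}")
```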