Review and Intro to Probability
QM 2113 -- Fall 2003
Statistics for Decision Making
Descriptive Statistics & Excel
Intro to Probability
Instructor: John Seydel, Ph.D.
Student Objectives
Use Excel to assist in performing basic
univariate data analysis (for description)
Perform descriptive bivariate analyses with
help from Excel
Quantitative variables
Qualitative variables
Understand concepts of events and
probability
Use standard probability notation
Relate probability to relative frequency
Define probability distribution
Miscellaneous Items
Exam 1
Collect homework version
Return exam
Grading
Comments
Need help with Excel basics? (Probably!)
Homework:
Data analyses (using Excel)
Probability distribution material
Univariate Analysis: Questions
About the Exercise/Homework?
Looking at the variation in Credits (from
Exam 1)
Univariate analysis tools
Histogram (informal, visual analysis)
Descriptive statistics
Measures of location
Measures of variation
The analysis (via Excel)
Basic descriptive statistics (n, min, max, xbar, s)
Histogram:
No good Excel function
Need to create a flexible table/chart
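The univariate tools listed above can be sketched outside Excel as well. The following is a minimal Python illustration, assuming a made-up list of credit-hour values (the actual Exam 1 data is not reproduced here); the helper names `describe` and `frequency_table` are hypothetical. The class start and width are parameters, which is one way to get the "flexible" table the slide asks for.

```python
import statistics

# Hypothetical credit-hour values; NOT the actual Exam 1 data.
credits = [12, 15, 30, 45, 45, 60, 62, 75, 90, 96, 110, 120]

def describe(values):
    """Return n, min, max, sample mean (xbar), and sample standard deviation (s)."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "xbar": statistics.mean(values),
        "s": statistics.stdev(values),  # sample (n - 1) standard deviation
    }

def frequency_table(values, start, width, classes):
    """Count values in equal-width classes [lower, upper), beginning at start."""
    counts = [0] * classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < classes:
            counts[k] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

summary = describe(credits)
table = frequency_table(credits, start=0, width=30, classes=5)

# A crude text histogram: one asterisk per observation in each class.
for lower, upper, count in table:
    print(f"{lower:>3} to <{upper:<3} {'*' * count}")
```

Changing `start`, `width`, or `classes` redraws the histogram without touching the data, which is the point of building the table separately from the chart.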
Bivariate Analysis: Questions
About the Exercise/Homework?
Hypothesis: experienced students are likely to be
more familiar with issues
Appropriate analysis
Examine Level vs Credits
That is, Level = b0 + b1 × Credits
Tools
Scatterplot (i.e., XY chart)
Regression (using Excel functions)
b0
b1
R²
Syx (standard error of the estimate)
Reference material: see Handouts page
Data from Exam 1
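For readers who want to see where b0, b1, R², and Syx come from, here is a minimal least-squares sketch in Python. The (Credits, Level) pairs are invented for illustration and are not the Exam 1 data, and the function name `simple_regression` is hypothetical. In Excel, the INTERCEPT, SLOPE, RSQ, and STEYX worksheet functions should give the corresponding values.

```python
import math

# Hypothetical data; NOT the actual Exam 1 responses.
credits = [15, 30, 45, 60, 75, 90, 105, 120]
level = [1, 2, 2, 3, 3, 4, 4, 5]

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r2, syx)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = ybar - b1 * xbar               # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst                  # proportion of variation explained
    syx = math.sqrt(sse / (n - 2))      # standard error of the estimate
    return b0, b1, r2, syx

b0, b1, r2, syx = simple_regression(credits, level)
print(f"Level = {b0:.3f} + {b1:.4f} x Credits, R^2 = {r2:.3f}, Syx = {syx:.3f}")
```

A positive b1 with a high R² would be consistent with the hypothesis that Level rises with Credits, though the numbers here only reflect the made-up data.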
Bivariate Analysis: Qualitative
Data
Hypothesis: students with certain majors may be
more likely to favor publishing instructor evaluations
(from Exam 1)
Appropriate analysis
Examine Favor vs Major
But not Favor = b0 + b1 × Major
That is, regression applies only to quantitative data
Tools
Crosstabulation (i.e., joint frequency table)
Contingency analysis
Special versions of crosstabulation
Chi-square analysis
Beyond our scope, at least for now
Excel feature that’s helpful: PivotTable
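A crosstabulation is just a count of each (row, column) combination. The sketch below shows the idea in Python, assuming a handful of invented (Major, Favor) responses rather than the Exam 1 data; the `crosstab` helper is hypothetical. A PivotTable produces the same kind of joint frequency table in Excel.

```python
from collections import Counter

# Hypothetical (Major, Favor) responses; NOT the actual Exam 1 data.
responses = [
    ("Accounting", "Yes"), ("Accounting", "No"), ("Marketing", "Yes"),
    ("Marketing", "Yes"), ("Finance", "No"), ("Finance", "Yes"),
    ("Accounting", "Yes"), ("Marketing", "No"),
]

def crosstab(pairs):
    """Return sorted row labels, sorted column labels, and joint counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, counts

rows, cols, counts = crosstab(responses)

# Print the joint frequency table with names (not codes) as headings.
print(f"{'Major':<12}" + "".join(f"{c:>6}" for c in cols))
for r in rows:
    print(f"{r:<12}" + "".join(f"{counts[(r, c)]:>6}" for c in cols))
```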
Now, Let’s Consider Probability
Just a numeric way of expressing how
certain we feel that a particular event will
occur; measures chance
Uses a scale of 0 to 1 (computations)
Conversationally: 0% to 100%
Alternatively, in terms of odds
Can determine probability
Theoretically
Subjectively
Empirically (i.e., using relative frequencies)
Probability allows us to develop inferences
based upon descriptive statistics
Some Foundations
Basic notation:
P( . . . ) is the probability that whatever’s inside
the parentheses will occur, e.g.,
P(B) = probability that event B will occur
P(x=5) = probability that x will be 5
P(Raise) = probability that JoJo will get a raise
P(75) = probability that exam score will be 75
Definitive rules:
0.00 ≤ P( . . . ) ≤ 1.00 or 0% ≤ P( . . . ) ≤ 100%
For exhaustive & mutually exclusive set of events
ΣP( . . . ) = 1.00
Keep these in mind when doing calculations (i.e.,
the voice of reason)
Relative Frequency
Regardless of method used to determine
probability, it can be interpreted as relative
frequency
Recall that relative frequency is observed
proportion of time some event has occurred
Sites developed in-house
Incomes between $10,000 and $20,000
Probability is just the proportion of time we
expect something to happen in the future given
similar circumstances
Note also, proportions are probabilities
Example: ASU student credit-hours
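The link between relative frequency and probability can be illustrated with a short Python sketch. The credit-hour values below are invented (they are not real ASU data), and `relative_frequency` is a hypothetical helper. It also shows the rule that probabilities for an exhaustive, mutually exclusive set of events sum to 1.

```python
# Hypothetical credit-hour values; NOT real ASU student data.
credits = [12, 30, 45, 58, 60, 75, 90, 96, 110, 120]

def relative_frequency(values, event):
    """Observed proportion of values for which the event occurred."""
    return sum(1 for v in values if event(v)) / len(values)

# Two mutually exclusive, exhaustive events.
p_under_60 = relative_frequency(credits, lambda c: c < 60)
p_60_or_more = relative_frequency(credits, lambda c: c >= 60)

print(f"P(credits < 60)  = {p_under_60:.2f}")
print(f"P(credits >= 60) = {p_60_or_more:.2f}")
print(f"Sum              = {p_under_60 + p_60_or_more:.2f}")
```

Read empirically, the first figure is the proportion of students we would expect to have fewer than 60 credits under similar circumstances.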
Getting some use out of Probability:
Distributions
Random variables are either
Discrete
Limited possible outcomes
Examples: daily sales, defects, emergencies, . . .
Or continuous
Infinite possible outcomes
Examples: waiting time, gas mileage, earnings/share, . . .
Normal distribution: the best-known
continuous distribution
Let’s take a closer look at normally distributed
random variables . . .
An Example
You need a car that gets at least 30 mpg
Suppose a particular model of car has been tested
Average mpg = μ (not x-bar) = 34
Standard deviation = σ = 3 mpg
Typically histograms for this type of thing look like
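[Figure: histogram of mpg; vertical axis shows relative frequency from 0% to 25%, horizontal axis shows mpg with ticks at 32, 34, and 38]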
That is, mpg is approximately normally distributed (aka
the “bell curve”)
Note: Percentage of area indicates probability!
If Something’s Normally Distributed
It’s described by
μ (the population/process average)
σ (the population/process standard deviation)
Histogram is symmetric
Thus no skew (average = median)
So P(x < μ) = P(x > μ) = . . . ?
Shape of histogram can be described by
f(x) = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))
We determine probabilities based upon
distance from the mean (i.e., the number of
standard deviations)
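To make the shape of the curve concrete, the normal density can be evaluated directly. This is a small Python sketch using the mpg example's mean of 34 and standard deviation of 3; the function name `normal_density` is hypothetical. It checks that the curve is symmetric about the mean and highest at the mean.

```python
import math

def normal_density(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 34, 3
peak = normal_density(mu, mu, sigma)            # height at the mean
left = normal_density(mu - sigma, mu, sigma)    # one sigma below the mean
right = normal_density(mu + sigma, mu, sigma)   # one sigma above the mean

print(f"f(31) = {left:.4f}, f(34) = {peak:.4f}, f(37) = {right:.4f}")
```

Note that these heights are not probabilities themselves; probabilities come from areas under the curve.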
Back to Our Problem at Hand
We need a car that gets at least 30 mpg
How likely is it that this model of vehicle will
meet our needs?
That is, P(x > 30) = . . . ?
First, sketch
Number line with
Average
Also x value of concern
Curve approximating histogram
Identify areas of importance
Then determine how many sigma 30 is from mu
Now use the table
Finally, put it all together
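The steps above can be traced numerically. The Python sketch below assumes the table in use gives the area between the mean and z (as the later "Keep In Mind" slide describes) and mimics that lookup with the standard library's `statistics.NormalDist`; the helper `area_mean_to_z` is a hypothetical stand-in for the printed table.

```python
from statistics import NormalDist

mu, sigma = 34, 3      # population mean and standard deviation (mpg)
x = 30                 # value of concern

# Step 1: how many sigma is 30 from mu?
z = (x - mu) / sigma   # negative, because 30 lies below the mean

# Step 2: "use the table" (area between the mean and z standard deviations).
def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z."""
    return abs(NormalDist().cdf(z) - 0.5)

area = area_mean_to_z(z)

# Step 3: put it together. The region x > 30 is the area from 30 up to
# the mean plus the entire upper half of the curve (0.5).
p_at_least_30 = area + 0.5

print(f"z = {z:.2f}, area from 30 to the mean = {area:.4f}")
print(f"P(x > 30) = {p_at_least_30:.4f}")
```

The result is roughly 0.91, so about nine out of ten cars of this model would be expected to meet the 30 mpg requirement. A printed table, which rounds z to two decimals, may differ slightly in the last digit.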
Comments on the Problem
A sketch is essential!
Use to identify regions of concern
Enables putting together results of calculations,
lookups, etc.
Doesn’t need to be perfect; just needs to indicate
relative positioning
Make it large enough to work with; needs
annotation (probabilities, comments, etc.)
Now, what do we do with the probability
we’ve just determined?
Make a decision!
Some Other Exercises
Let x ~ N(34,3) as with the mpg problem
Determine
Tail probabilities
F(30) which is the same as P(x ≤ 30)
P(x > 40)
Tail complements
P(x > 30)
P(x < 40)
Other
P(32 < x < 33)
P(30 < x < 35)
P(20 < x < 30)
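One way to check answers to these exercises is to compute them from the cumulative distribution function F(x). The sketch below uses Python's `statistics.NormalDist` with the mean of 34 and standard deviation of 3 from the mpg problem; the `between` helper is hypothetical. Table-based answers may differ slightly because tables round z to two decimals.

```python
from statistics import NormalDist

mpg = NormalDist(mu=34, sigma=3)

def between(dist, low, high):
    """P(low < x < high) as a difference of cumulative probabilities."""
    return dist.cdf(high) - dist.cdf(low)

results = {
    # Tail probabilities
    "F(30) = P(x <= 30)": mpg.cdf(30),
    "P(x > 40)": 1 - mpg.cdf(40),
    # Tail complements
    "P(x > 30)": 1 - mpg.cdf(30),
    "P(x < 40)": mpg.cdf(40),
    # Other (intervals)
    "P(32 < x < 33)": between(mpg, 32, 33),
    "P(30 < x < 35)": between(mpg, 30, 35),
    "P(20 < x < 30)": between(mpg, 20, 30),
}

for label, p in results.items():
    print(f"{label:<20} {p:.4f}")
```

Each tail complement is 1 minus the matching tail probability, which is a quick sanity check on a sketch.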
Keep In Mind
Probability = proportion of area under the
normal curve
What we get when we use tables is always the
area between the mean and z standard
deviations from the mean
Because of symmetry
P(x > μ) = P(x < μ) = 0.5000
Tables show probabilities rounded to 4 decimal
places
If z < -3.89 then probability ≈ 0.5000
If z > 3.89 then probability ≈ 0.5000
Theoretically, P(x = a) = 0
P(30 ≤ x ≤ 35) = P(30 < x < 35)
Summary of Objectives
Use Excel to assist in performing basic
univariate data analysis (for description)
Perform descriptive bivariate analyses with
help from Excel
Quantitative variables
Qualitative variables
Understand concepts of events and
probability
Use standard probability notation
Relate probability to relative frequency
Define probability distribution
Appendix
Exam Comments
Overall: pretty good!
Exhibits
Titles
Main
Vertical axis (or columns)
Horizontal axis (or rows)
Names, not codes
Units of measure (exhibits & answers)
Main trouble spots
Empirical Rule
Histogram interpretation
Confidence interval estimate
Other issues/questions?