Review and Intro to Probability


QM 2113 -- Fall 2003
Statistics for Decision Making
Descriptive Statistics & Excel
Intro to Probability
Instructor: John Seydel, Ph.D.
Student Objectives
Use Excel to assist in performing basic
univariate data analysis (for description)
Perform descriptive bivariate analyses with
help from Excel


Quantitative variables
Qualitative variables
Understand concepts of events and
probability
Use standard probability notation
Relate probability to relative frequency
Define probability distribution
Miscellaneous Items
Exam 1


Collect homework version
Return exam
 Grading
 Comments
Need help with Excel basics? (Probably!)
Homework:


Data analyses (using Excel)
Probability distribution material
Univariate Analysis: Questions
About the Exercise/Homework?
Looking at the variation in Credits (from
Exam 1)
Univariate analysis tools


Histogram (informal, visual analysis)
Descriptive statistics
 Measures of location
 Measures of variation
The analysis (via Excel)


Basic descriptive statistics (n, min, max, xbar, s)
Histogram:
 No good Excel function
 Need to create a flexible table/chart
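For readers working outside Excel, here is a minimal Python sketch of the same univariate analysis: the basic descriptive statistics (n, min, max, x-bar, s) and a frequency table whose class start, width, and count can be changed freely. The credit-hour values are made up for illustration (the Exam 1 data are not reproduced here), and the function names `describe` and `frequency_table` are hypothetical.

```python
import statistics

# Hypothetical credit-hour values; NOT the actual Exam 1 data.
credits = [12, 15, 30, 45, 45, 60, 62, 75, 90, 96, 110, 120]


def describe(values):
    """Return n, min, max, sample mean (x-bar), and sample std dev (s)."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "xbar": statistics.mean(values),
        "s": statistics.stdev(values),  # sample formula: n - 1 in the denominator
    }


def frequency_table(values, start, width, bins):
    """Count values in `bins` equal-width classes [lower, upper) beginning at `start`.

    Changing start, width, or bins rebuilds the table, which is the
    "flexible table" idea behind the histogram.
    """
    counts = [0] * bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < bins:  # values outside the table range are ignored
            counts[k] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


summary = describe(credits)
table = frequency_table(credits, start=0, width=30, bins=5)

print(summary)
for lower, upper, count in table:
    # A row of asterisks serves as a crude text histogram.
    print(f"{lower:>3} to <{upper:<3} {'*' * count}")
```

Trying a different `width` shows how much the apparent shape of a histogram depends on the classes chosen.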
Bivariate Analysis: Questions
About the Exercise/Homework?
Hypothesis: experienced students are likely to be
more familiar with issues
Appropriate analysis


Examine Level vs Credits
That is, Level = b0 + b1 × Credits
Tools


Scatterplot (i.e., XY chart)
Regression (using Excel functions)




b0 (intercept)
b1 (slope)
R² (coefficient of determination)
Syx (standard error of the estimate)
Reference material: see Handouts page
Data from Exam 1
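The four regression quantities listed above can be computed from their textbook least-squares formulas. The sketch below does that in Python with made-up (Credits, Level) pairs, not the Exam 1 data; the function name `simple_regression` is hypothetical. In Excel the corresponding worksheet functions are INTERCEPT, SLOPE, RSQ, and STEYX.

```python
import math

# Hypothetical (Credits, Level) observations; NOT the actual Exam 1 data.
credits = [15, 30, 45, 60, 75, 90, 105, 120]
level = [1, 2, 2, 3, 3, 4, 4, 5]


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, R^2, Syx)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)

    b1 = sxy / sxx            # slope
    b0 = ybar - b1 * xbar     # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / syy        # share of variation in y explained by x
    syx = math.sqrt(sse / (n - 2))  # standard error of the estimate
    return b0, b1, r2, syx


b0, b1, r2, syx = simple_regression(credits, level)
print(f"Level = {b0:.3f} + {b1:.4f} * Credits, R^2 = {r2:.3f}, Syx = {syx:.3f}")
```

A positive b1 with a reasonably high R² would be consistent with the hypothesis; a scatterplot should still be examined first.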
Bivariate Analysis: Qualitative
Data
Hypothesis: students with certain majors may be
more likely to favor publishing instructor evaluations
(from Exam 1)
Appropriate analysis



Examine Favor vs Major
But not Favor = b0 + b1 × Major
That is, regression applies only to quantitative data
Tools


Crosstabulation (i.e., joint frequency table)
Contingency analysis
 Special versions of crosstabulation
 Chi-square analysis
 Beyond our scope, at least for now
Excel feature that’s helpful: PivotTable
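A crosstabulation is just a count of how often each (row, column) combination occurs, which is what a PivotTable produces. The Python sketch below shows the idea with made-up (Major, Favor) responses; the major codes, the answers, and the function name `crosstab` are all hypothetical.

```python
from collections import Counter

# Hypothetical (Major, Favor) responses; NOT the actual Exam 1 data.
responses = [
    ("ACCT", "Yes"), ("ACCT", "No"), ("ACCT", "Yes"),
    ("MKTG", "Yes"), ("MKTG", "Yes"), ("MKTG", "No"),
    ("MGMT", "No"), ("MGMT", "Yes"),
]


def crosstab(pairs):
    """Return row labels, column labels, and joint frequencies for the pairs."""
    joint = Counter(pairs)  # missing combinations count as 0
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, joint


rows, cols, joint = crosstab(responses)

# Print the joint frequency table with row totals.
print("Major " + " ".join(f"{c:>5}" for c in cols) + " Total")
for r in rows:
    cells = [joint[(r, c)] for c in cols]
    print(f"{r:<6}" + " ".join(f"{v:>5}" for v in cells) + f" {sum(cells):>5}")
```

Dividing each cell by its row total turns the counts into the proportion of each major that favors publishing.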
Now, Let’s Consider Probability
Just a numeric way of expressing how
certain we feel that a particular event will
occur; measures chance



Uses a scale of 0 to 1 (computations)
Conversationally: 0% to 100%
Alternatively, in terms of odds
Can determine probability



Theoretically
Subjectively
Empirically (i.e., using relative frequencies)
Probability allows us to develop inferences
based upon descriptive statistics
Some Foundations
Basic notation:
P( . . . ) is the probability that whatever’s inside
the parentheses will occur, e.g.,




P(B) = probability that event B will occur
P(x=5) = probability that x will be 5
P(Raise) = probability that JoJo will get a raise
P(75) = probability that exam score will be 75
Definitive rules:



0.00 ≤ P( . . . ) ≤ 1.00 or 0% ≤ P( . . . ) ≤ 100%
For exhaustive & mutually exclusive set of events
ΣP( . . . ) = 1.00
Keep these in mind when doing calculations (i.e.,
the voice of reason)
Relative Frequency
Regardless of method used to determine
probability, it can be interpreted as relative
frequency

Recall that relative frequency is observed
proportion of time some event has occurred
 Sites developed in-house
 Incomes between $10,000 and $20,000

Probability is just the proportion of time we
expect something to happen in the future, given
similar circumstances
Note also, proportions are probabilities
Example: ASU student credit-hours
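To make the link between relative frequency and probability concrete, the sketch below turns observed counts into proportions and checks the two rules stated earlier: each value lies between 0 and 1, and the values for an exhaustive, mutually exclusive set of events sum to 1. The class-level observations are made up for illustration, and `relative_frequencies` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical observations of student class level; illustration only.
observed = ["Fr", "So", "So", "Jr", "Jr", "Jr", "Sr", "Sr", "Sr", "Sr"]


def relative_frequencies(outcomes):
    """Proportion of observations falling in each category."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {category: count / n for category, count in counts.items()}


rel = relative_frequencies(observed)

# Each relative frequency is a valid probability estimate.
all_in_range = all(0.0 <= p <= 1.0 for p in rel.values())
# The categories are exhaustive and mutually exclusive, so they sum to 1.
total = sum(rel.values())

for category, p in rel.items():
    print(f"P({category}) is estimated as {p:.2f}")
print("all between 0 and 1:", all_in_range, "| sum:", round(total, 10))
```

Read this way, an estimated P(Sr) is the proportion of similar students we would expect to be seniors in future samples.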
Getting some use out of Probability:
Distributions
Random variables are either

Discrete
 Limited possible outcomes
 Examples: daily sales, defects, emergencies, . . .

Or continuous
 Infinite possible outcomes
 Examples: waiting time, gas mileage, earnings/share, . . .
Normal distributions: the most well known
continuous distribution
Let’s take a closer look at normally distributed
random variables . . .
An Example
You need a car that gets at least 30 mpg
Suppose a particular model of car has been tested


Average mpg = μ (not x-bar) = 34
Standard deviation = σ = 3 mpg
Typically histograms for this type of thing look like
[Figure: histogram of mpg; vertical axis is relative frequency (0% to 25%), horizontal axis is mpg (tick marks at 32, 34, 38)]
That is, mpg is approximately normally distributed (aka
the “bell curve”)
Note: Percentage of area indicates probability!
If Something’s Normally Distributed
It’s described by


μ (the population/process average)
σ (the population/process standard deviation)
Histogram is symmetric


Thus no skew (average = median)
So P(x < μ) = P(x > μ) = . . . ?
Shape of histogram can be described by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$
We determine probabilities based upon
distance from the mean (i.e., the number of
standard deviations)
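As a rough illustration of the two ideas on this slide, the sketch below codes the normal density function and the "number of standard deviations from the mean" (the z value). It uses the mpg figures from the example (average 34, standard deviation 3); the function names `normal_pdf` and `z_score` are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and std dev sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma


mu, sigma = 34, 3  # mpg example: average 34, standard deviation 3

# Symmetry: points equally far below and above the mean have equal height.
print(normal_pdf(31, mu, sigma), normal_pdf(37, mu, sigma))

# 30 mpg lies about 1.33 standard deviations below the mean.
print(round(z_score(30, mu, sigma), 2))
```

The density gives the height of the curve, not a probability; probabilities come from areas under it.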
Back to Our Problem at Hand
We need a car that gets at least 30 mpg
How likely is it that this model of vehicle will
meet our needs?
That is, P(x > 30) = . . . ?

First, sketch
 Number line with


Average
Also x value of concern
 Curve approximating histogram




Identify areas of importance
Then determine how many sigma 30 is from mu
Now use the table
Finally, put it all together
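The steps above can be checked with a short Python sketch that uses `statistics.NormalDist` from the standard library in place of the printed table. With the table, z = (30 - 34)/3 rounds to -1.33, the area between the mean and z is 0.4082, and adding the 0.5000 above the mean gives about 0.9082. Software does not round z first, so its answer differs slightly in the fourth decimal place. The variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 34, 3   # mpg example: average 34, standard deviation 3
threshold = 30      # we need at least 30 mpg

# Step 1: how many standard deviations is 30 from the mean?
z = (threshold - mu) / sigma

# Step 2: area to the right of 30 under the normal curve.
mpg = NormalDist(mu, sigma)
p_at_least_30 = 1 - mpg.cdf(threshold)

print(f"z = {z:.2f}")
print(f"P(x > 30) = {p_at_least_30:.4f}")
```

Either way, roughly nine cars in ten of this model would be expected to meet the 30 mpg requirement.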
Comments on the Problem
A sketch is essential!




Use to identify regions of concern
Enables putting together results of calculations,
lookups, etc.
Doesn’t need to be perfect; just needs to indicate
relative positioning
Make it large enough to work with; needs
annotation (probabilities, comments, etc.)
Now, what do we do with the probability
we’ve just determined?
Make a decision!
Some Other Exercises
Let x ~ N(μ = 34, σ = 3) as with the mpg problem
Determine

Tail probabilities
 F(30) which is the same as P(x ≤ 30)
 P(x > 40)

Tail complements
 P(x > 30)
 P(x < 40)

Other
 P(32 < x < 33)
 P(30 < x < 35)
 P(20 < x < 30)
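One way to check hand-and-table answers to these exercises is the Python sketch below, again using `statistics.NormalDist` with mean 34 and standard deviation 3. The helper `prob_between` is a hypothetical name. Small differences from table answers in the third or fourth decimal are expected because the table requires rounding z to two places.

```python
from statistics import NormalDist

mpg = NormalDist(mu=34, sigma=3)  # mean 34, standard deviation 3


def prob_between(dist, a, b):
    """P(a < x < b): area under the curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)


results = {
    # Tail probabilities
    "P(x <= 30)": mpg.cdf(30),       # this is F(30)
    "P(x > 40)": 1 - mpg.cdf(40),
    # Tail complements
    "P(x > 30)": 1 - mpg.cdf(30),
    "P(x < 40)": mpg.cdf(40),
    # Other intervals
    "P(32 < x < 33)": prob_between(mpg, 32, 33),
    "P(30 < x < 35)": prob_between(mpg, 30, 35),
    "P(20 < x < 30)": prob_between(mpg, 20, 30),
}

for label, p in results.items():
    print(f"{label:<16} = {p:.4f}")
```

Each tail probability and its complement should sum to 1, which is a quick sanity check on a sketch.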
Keep In Mind
Probability = proportion of area under the
normal curve
What we get when we use tables is always the
area between the mean and z standard
deviations from the mean
Because of symmetry
P(x > μ) = P(x < μ) = 0.5000
Tables show probabilities rounded to 4 decimal
places


If z < -3.89 then probability ≈ 0.5000
If z > 3.89 then probability ≈ 0.5000
Theoretically, P(x = a) = 0
P(30 ≤ x ≤ 35) = P(30 < x < 35)
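The table convention described on this slide (area between the mean and z) can be mimicked in Python as a hedged sketch. The functions `mean_to_z_area` and `upper_tail` are hypothetical names, and `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def mean_to_z_area(z):
    """Area between the mean and z standard deviations from it.

    By symmetry the sign of z does not matter, so abs(z) is used,
    just as a table lists only positive z values.
    """
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5


def upper_tail(z):
    """P(Z > z), assembled from the table-style area and the 0.5 halves."""
    area = mean_to_z_area(z)
    return 0.5 - area if z >= 0 else 0.5 + area


print(round(mean_to_z_area(1.33), 4))   # table-style lookup for z = 1.33
print(round(mean_to_z_area(-1.33), 4))  # same area on the other side of the mean
print(round(upper_tail(-1.33), 4))      # area to the right of z = -1.33
print(round(mean_to_z_area(3.90), 4))   # far from the mean the area nears one half
```

The sketch is why a sketch of the curve matters: the code has to decide whether to add or subtract the 0.5 half, exactly as one does by hand.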
Summary of Objectives
Use Excel to assist in performing basic
univariate data analysis (for description)
Perform descriptive bivariate analyses with
help from Excel


Quantitative variables
Qualitative variables
Understand concepts of events and
probability
Use standard probability notation
Relate probability to relative frequency
Define probability distribution
Appendix
Exam Comments
Overall: pretty good!
Exhibits

Titles
 Main
 Vertical axis (or columns)
 Horizontal axis (or rows)

Names, not codes
Units of measure (exhibits & answers)
Main trouble spots



Empirical Rule
Histogram interpretation
Confidence interval estimate
Other issues/questions?