Class3 - NYU Stern School of Business

Statistics & Data Analysis
Course Number: B01.1305
Course Section: 31
Meeting Time: Wednesday 6-8:50 pm
CLASS #3
Class #3 Outline
 Brief review of last class + Probability Trees
 Questions on homework
 Chapter 4: Probability Distributions
 Chapter 5: Some Special Distributions
Professor S. D. Balkin -- Feb. 12, 2003
Review of Last Class
 Introduction to probability
 Ways of determining probabilities
 Rules for combining probabilities
 Conditional probabilities
 Probability Trees
Combining Events
 The union $A \cup B$ is the event consisting of all outcomes in A
or in B or in both.
 The intersection $A \cap B$ is the event consisting of all
outcomes in both A and B.
 If $A \cap B$ contains no outcomes then A, B are said to be
mutually exclusive.
 The complement $\bar{A}$ of the event A consists of all outcomes
in the sample space S which are not in A.
Conditional Probability (cont.)
Conditional probability $P(B \mid A)$:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
Statistical Independence
Events A and B are statistically independent if and
only if P(B|A) = P(B). Otherwise, they are
dependent.
If events A and B are independent, then
$P(A \cap B) = P(A)P(B)$
Probability Trees
Constructing Probability Trees
1. Events forming the first set of branches must have known
marginal probabilities, must be mutually exclusive, and
should exhaust all possibilities
2. Events forming the second set of branches must be entered
at the tip of each of the sets of first branches. Conditional
probabilities, given the relevant first branch, must be entered,
unless assumed independence allows the use of
unconditional probabilities
3. Branches must always be mutually exclusive and exhaustive
Let’s Make a Deal
 In the show Let’s Make a Deal, a prize is hidden behind one of
three doors. The contestant picks one of the doors.
 Before the chosen door is opened, the host opens one of the other
two doors and shows that the prize isn’t behind it.
 The contestant is offered the chance to switch to the
remaining door.
 Should the contestant switch?
 Solve by making a tree…
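The tree for this problem can be walked exhaustively. The sketch below is one way to do it, under assumptions the slide does not spell out: the host knows where the prize is, always opens a non-prize door other than the contestant's pick, and chooses at random when two such doors are available. The door labels and the function name `win_probabilities` are illustrative.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def win_probabilities():
    """Return (P(win if staying), P(win if switching)) from the full tree."""
    stay = Fraction(0)
    switch = Fraction(0)
    # First branches: prize location. Second branches: contestant's pick.
    for prize, pick in product(DOORS, DOORS):
        branch = Fraction(1, 3) * Fraction(1, 3)
        # Third branches: doors the host may open (assumed: never the pick,
        # never the prize, chosen uniformly if there are two candidates).
        openable = [d for d in DOORS if d != pick and d != prize]
        for opened in openable:
            p = branch / len(openable)
            remaining = next(d for d in DOORS if d not in (pick, opened))
            if pick == prize:
                stay += p
            if remaining == prize:
                switch += p
    return stay, switch

stay, switch = win_probabilities()
print("stay:", stay, "switch:", switch)
```

Under these assumptions staying wins with probability 1/3 and switching wins with probability 2/3, so the contestant should switch.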
Employee Drug Testing
 A firm has a mandatory, random drug testing policy
 The testing procedure is not perfect.
• If an employee uses drugs, the test will be positive with probability
0.90.
• If an employee does not use drugs, the test will be negative 95% of the
time.
• Confidential sources say that 8% of the employees are drug users
 8% is an unconditional probability; 90% and 95% are
conditional probabilities
Employee Drug Testing (cont.)
 Create a probability tree and verify the following probabilities:
• Probability of randomly selecting a drug user who tests positive =
0.072
• Probability of randomly selecting a non-user who tests positive = 0.046
• Probability of randomly selecting someone who tests positive = 0.118
• Conditional probability of testing positive given a non-drug user = 0.05
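The four figures above follow from multiplying along the branches of the tree. A minimal sketch of that arithmetic, using only the three probabilities given on the previous slide (the variable names are illustrative):

```python
# Given on the slide
p_user = 0.08                 # P(drug user), unconditional
p_pos_given_user = 0.90       # P(positive | user)
p_neg_given_nonuser = 0.95    # P(negative | non-user)

# Complements supply the missing branches of the tree
p_nonuser = 1 - p_user
p_pos_given_nonuser = 1 - p_neg_given_nonuser

# Joint probabilities: multiply along each path
p_user_and_pos = p_user * p_pos_given_user
p_nonuser_and_pos = p_nonuser * p_pos_given_nonuser

# Total probability of a positive test: add the two mutually exclusive paths
p_pos = p_user_and_pos + p_nonuser_and_pos

print(round(p_user_and_pos, 3), round(p_nonuser_and_pos, 3),
      round(p_pos, 3), round(p_pos_given_nonuser, 2))
```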
Statistical Modeling
 Statistical modeling is the process of creating mathematical
representations that reflect physical phenomena and make use of
any available data or information
 Possible uses:
• Description
• Prediction
• Optimization
• Uncertainty analysis
 Statistical models may integrate real-world data with results
from physical experiments and computer codes.
CHAPTER 4
Chapter Goals
 To understand the concepts of probability distributions
 To be able to calculate the expected value and standard
deviation of a distribution
 To understand the difference between sample mean and
expected value
Random Variables
 A quantity that takes on different values depending on chance
• Next quarter’s sales for a given company
• The proportion of interviewees that express an intention to buy
• Your day-trading profits for next year
• The number of free throws out of 10 a player makes
• Number of defective products produced next week
 A random variable is the result of a random experiment in the abstract
sense, before the experiment is performed
 The value the random variable actually assumes is called an observation
 A probability distribution is the pattern of probabilities a random variable
assumes
Random Variables (cont)
 You can think of your data set as observations of a random
variable resulting from several repetitions of a random
experiment
 We associate the random variable with a population and view
observations of the random variable as data
 Example:
• Suppose we toss a coin five times. The observed data set is a
sequence of zeros and ones, such as 1 1 0 1 0. Each of the five digits
in this sequence represents the outcome of the random experiment of
tossing a coin once, where 1 denotes Heads and 0 denotes Tails. We
have five repetitions of the experiment.
Types of Random Variables
RANDOM VARIABLE
DISCRETE: Can assume only a finite or countably infinite number
of values
• Number of Heads on two coin tosses
• Sum of the roll of two dice
• Number of customers who enter a store in an hour
• Number of cars sold in a day
CONTINUOUS: Can assume any value in an interval (uncountably
many values)
• Daily return of a stock
• Ounces of liquid in a bottle of beer
• A person's weight
• Stock portfolio volatility
Discrete Probability Distribution
 A list of the possible values of a discrete random variable,
together with their associated probabilities
 The probability distribution tells us everything we can know
about a random variable, before it becomes an observation
 Example: Distribution of # Heads in Two Tosses
• S = {HH, HT, TH, TT}
x    Prob{Number of Heads = x}
0    1/4
1    1/2
2    1/4
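The table can be derived mechanically from the sample space. The sketch below lists the four equally likely outcomes, counts Heads in each, and groups the outcomes by that count; it assumes a fair coin, as the 1/4, 1/2, 1/4 values imply.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses: HH, HT, TH, TT (equally likely for a fair coin)
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Value of the random variable (number of Heads) for each outcome
heads_counts = Counter(outcome.count("H") for outcome in sample_space)

# Probability of each value = outcomes giving that value / total outcomes
distribution = {
    x: Fraction(count, len(sample_space))
    for x, count in sorted(heads_counts.items())
}

for x, p in distribution.items():
    print(x, p)
```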
Example: Discrete Distribution
 When Quality Control testing entails destroying the tested product, for
obvious economic reasons, only a sample of items is tested.
 A plant produces cell phones in equal quantities on two production
lines.
 Quality control experts determined the plant should test three randomly
selected phones; 2 from one line and 1 from the other, where the number
from each line is chosen by flipping a coin three times
 Construct the probability distribution of the number of phones chosen from
Line 1
• Construct sample space
• Probability
• Random variable value
Example: Discrete Distribution
 We want to conduct two one-on-one interviews with neurologists to get
their opinions on an existing drug
 Suppose that a random sample of two neuros is to be selected from all
neuros consisting of 70% who have ever prescribed the drug and 30%
who have not
 Questions
• List all possible outcomes of the selection
• Assign probabilities
• Define the quantitative variable Y as the number of neuros who have
prescribed the drug in the sample. Specify the possible values that the
random variable can assume and determine the probability of each
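One way to work these questions, assuming the two selections are independent (reasonable when the population of neurologists is large relative to a sample of two; the slide does not state this). The labels R (has prescribed) and N (has not prescribed) are introduced here for illustration:

```latex
\begin{align*}
P(RR) &= 0.7 \times 0.7 = 0.49, & P(RN) &= 0.7 \times 0.3 = 0.21,\\
P(NR) &= 0.3 \times 0.7 = 0.21, & P(NN) &= 0.3 \times 0.3 = 0.09,\\
P_Y(0) &= P(NN) = 0.09, & P_Y(1) &= P(RN) + P(NR) = 0.42,\\
P_Y(2) &= P(RR) = 0.49.
\end{align*}
```

The three probabilities sum to 1, as any probability distribution must.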
Distribution Notation
1. The probability $P_Y(y)$ associated with each value of
Y must lie in the interval $0 \le P_Y(y) \le 1$.
2. The sum of the probabilities for all values of Y equals 1:
$\sum_y P_Y(y) = 1$
3. Because different values of Y are mutually exclusive, their
probabilities are additive: $P(Y = a \text{ or } Y = b) = P_Y(a) + P_Y(b)$
Cumulative Distribution Function
 The cumulative distribution function $F_Y(y)$ for a discrete
random variable Y is a function that specifies the probability that
$Y \le y$ for all values of y.
 By the addition law of probabilities:
$$F_Y(y) = P(Y \le y) = P_Y(0) + P_Y(1) + \cdots + P_Y(y)$$
Example: CDF
 Suppose a financial firm plans to release a new fund. The
fund manager has assessed the following subjective
probabilities for the first-year return for a $10,000 investment
x:       $0    $100  $200  $300  $400  $500  $600  $700  $800
F_X(x):  0.05  0.20  0.40  0.60  0.75  0.85  0.90  0.95  1.00
 Find the following probabilities, as assessed by the fund
manager:
 P ( X  $500)
 P ($200  X  $400)
 P ( X  $100)
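The inequality signs in the three questions did not survive in this transcript, so the sketch below only illustrates the method: recover the point probabilities from the jumps in the CDF, then add up the ones that satisfy an event. The three events evaluated (at most $500, between $200 and $400 inclusive, more than $100) are assumed readings, not necessarily the ones on the original slide.

```python
values = [0, 100, 200, 300, 400, 500, 600, 700, 800]
cdf = [0.05, 0.20, 0.40, 0.60, 0.75, 0.85, 0.90, 0.95, 1.00]
F = dict(zip(values, cdf))

# Point probabilities are the jumps in the CDF: P(X = x) = F(x) - F(previous x)
pmf = {}
previous = 0.0
for x in values:
    pmf[x] = F[x] - previous
    previous = F[x]

def prob(event):
    """Add up P(X = x) over the values x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

# Assumed readings of the three questions (the original signs are lost)
p_at_most_500 = prob(lambda x: x <= 500)        # same as F(500)
p_200_to_400 = prob(lambda x: 200 <= x <= 400)  # same as F(400) - F(100)
p_above_100 = prob(lambda x: x > 100)           # same as 1 - F(100)

print(round(p_at_most_500, 2), round(p_200_to_400, 2), round(p_above_100, 2))
```

Under those readings the answers are 0.85, 0.55, and 0.80. Changing a strict inequality to a non-strict one (or the reverse) changes the answer, because each listed value carries positive probability.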
Expected Value of Discrete Distribution
 If Y is a discrete random variable, then $E(Y) = \sum_y y \, P_Y(y)$
 E(Y) is a weighted average of the possible values of Y
 The weights are the probabilities of occurrence of those values
 E(Y) is the long-run average value of Y if the experiment is
repeated many times. Therefore, E(Y) may be thought of as the
theoretical mean of the random variable Y.
 E(Y) is the mean of the population of values of Y obtained over
many repetitions of the experiment. So we can write $E(Y) = \mu$,
where $\mu$ is the theoretical mean of the distribution
Example: Expected Value
 A firm is considering two possible investments. The firm
assigns rough probabilities of losing 20% per dollar invested,
losing 10%, breaking even, gaining 10%, and gaining 20%.
 Let Y be the return per dollar invested in the first project and Z
the return per dollar invested in the second.
y:       -20%  -10%  0%    10%   20%
P_Y(y):  0.1   0.2   0.4   0.2   0.1
z:       -20%  -10%  0%    10%   20%
P_Z(z):  0.01  0.04  0.10  0.50  0.35
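Applying the definition of expected value (each possible return weighted by its probability) to the two projects gives the following, with returns written as decimals:

```latex
\begin{align*}
E(Y) &= (-0.20)(0.1) + (-0.10)(0.2) + (0)(0.4) + (0.10)(0.2) + (0.20)(0.1) = 0 \\
E(Z) &= (-0.20)(0.01) + (-0.10)(0.04) + (0)(0.10) + (0.10)(0.50) + (0.20)(0.35) = 0.114
\end{align*}
```

The first project breaks even on average, while the second has an expected return of 11.4% per dollar invested.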
Standard Deviation
 Measure of probability dispersion, variability, or risk of a
random variable
If Y is a discrete random variable,
$$\sigma_Y^2 = \mathrm{Var}(Y) = \sum_y (y - \mu_Y)^2 P_Y(y), \quad \text{where } \mu_Y = E(Y).$$
The standard deviation of Y, denoted $\sigma_Y$, is the square
root of the variance: $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$
Example: Standard Deviation
 A firm is considering two possible investments. The firm
assigns rough probabilities of losing 20% per dollar invested,
losing 10%, breaking even, gaining 10%, and gaining 20%.
 Let Y be the return per dollar invested in the first project and Z
the return per dollar invested in the second.
y:       -20%  -10%  0%    10%   20%
P_Y(y):  0.1   0.2   0.4   0.2   0.1
z:       -20%  -10%  0%    10%   20%
P_Z(z):  0.01  0.04  0.10  0.50  0.35
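A short sketch of the standard deviation calculation for the two projects, computed as the square root of the probability-weighted squared deviations from the mean. Returns are written as decimals; the function names `mean` and `std_dev` are illustrative.

```python
from math import sqrt

returns = [-0.20, -0.10, 0.00, 0.10, 0.20]
p_y = [0.10, 0.20, 0.40, 0.20, 0.10]   # first project
p_z = [0.01, 0.04, 0.10, 0.50, 0.35]   # second project

def mean(values, probs):
    """Expected value: sum of value * probability."""
    return sum(v * p for v, p in zip(values, probs))

def std_dev(values, probs):
    """Square root of the probability-weighted squared deviations from the mean."""
    mu = mean(values, probs)
    variance = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    return sqrt(variance)

sd_y = std_dev(returns, p_y)
sd_z = std_dev(returns, p_z)
print(round(sd_y, 4), round(sd_z, 4))
```

The variances work out to 0.012 for Y and 0.006804 for Z, so the standard deviations are roughly 0.110 and 0.082. The second project has both the higher expected return and the lower dispersion.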
Standard Deviation (cont)
 Note: σ² and σ defined above are the theoretical variance and
standard deviation of Y. You don’t need any data to compute
them. You just need to know the distribution of Y.
 The mean of a random variable is NOT the same thing as a
sample mean. The variance of a random variable is NOT the
same thing as a sample variance.
Empirical Rule for Random Variables
If the random variable Y has a roughly mound-shaped
probability histogram,
P(Y falls within one $\sigma_Y$ of its mean $\mu_Y$) $\approx$ 0.68
and
P(Y falls within two $\sigma_Y$ of its mean $\mu_Y$) $\approx$ 0.95
Continuous Random Variables
 As with discrete probability distribution functions, one can also
determine the expected value and standard deviation of
continuous PDFs
 Mathematical definitions for continuous random variables
necessarily involve calculus
 Sums are simply replaced by integrals
 This is beyond the expectations for this class
Example
 A call option on a stock is being evaluated.
 If the stock goes down, the option expires and is worthless. If it goes up,
the payoff depends on how high the stock goes.
 Assume a discrete payoff distribution:
Payoff:       0     10    15    20
Probability:  0.5   0.25  0.15  0.1
 Questions:
• What is the expected value of the payoff?
• What is the standard deviation of the payoff?
• Find the probability that the option will pay at least $15
• Find the probability that the option will pay less than $20
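The four questions can be answered directly from the payoff table. A minimal sketch, with illustrative variable names:

```python
from math import sqrt

# Payoff distribution from the slide: payoff -> probability
payoff_dist = {0: 0.50, 10: 0.25, 15: 0.15, 20: 0.10}

# Expected payoff: probability-weighted average of the payoffs
expected = sum(x * p for x, p in payoff_dist.items())

# Standard deviation: root of the probability-weighted squared deviations
variance = sum((x - expected) ** 2 * p for x, p in payoff_dist.items())
sd = sqrt(variance)

# Event probabilities: add the probabilities of the payoffs in the event
p_at_least_15 = sum(p for x, p in payoff_dist.items() if x >= 15)
p_less_than_20 = sum(p for x, p in payoff_dist.items() if x < 20)

print(round(expected, 2), round(sd, 2),
      round(p_at_least_15, 2), round(p_less_than_20, 2))
```

The expected payoff is $6.75 with a standard deviation of about $7.29; the option pays at least $15 with probability 0.25 and less than $20 with probability 0.90.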
Chapter 5
Some Special Probability Distributions
Chapter Goals
 Introduce some special, often used distributions
 Understand methods for counting the number of sequences
 Understand situations consisting of a specified number of
distinct success/failure trials
 Understand random variables that follow a bell-shaped
distribution
Counting Possible Outcomes
 In order to calculate probabilities, we often need to count how
many different ways there are to do some activity
 For example, how many different outcomes are there from
tossing a coin three times?
 To help us to count accurately, we need to learn some
counting rules
 Multiplication Rule : If there are m ways of doing one thing
and n ways of doing another thing, there are m times n ways
of doing both
Example
 An auto dealer wants to advertise that for $20,000 you can buy
either a convertible or a 4-door car with your choice of either
wire or solid wheel covers.
 How many different arrangements of models and wheel
covers can the dealer offer?
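The multiplication rule can be checked by listing every combination. The sketch below does this for the three coin tosses mentioned earlier (2 × 2 × 2) and for the dealer's offer (2 models × 2 wheel covers):

```python
from itertools import product

# Three coin tosses: 2 ways per toss, so 2 * 2 * 2 outcomes
tosses = list(product("HT", repeat=3))

# Dealer's offer: 2 models times 2 wheel-cover styles
models = ["convertible", "4-door"]
wheel_covers = ["wire", "solid"]
offers = list(product(models, wheel_covers))

print(len(tosses), len(offers))
for model, cover in offers:
    print(model, "with", cover, "wheel covers")
```

There are 8 toss sequences and 4 distinct model and wheel-cover arrangements.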
Counting Rules
 Recall the classical interpretation of probability:
P(event) = number of outcomes favoring event / total number of outcomes
 Need methods for counting possible outcomes without the labor of listing
entire sample space
 Counting methods arise as answers to:
• How many sequences of k symbols can be formed from a set of r distinct
symbols using each symbol no more than once?
• How many subsets of k symbols can be formed from a set of r distinct symbols
using each symbol no more than once?
 Difference between a sequence and a subset is that order matters for a
sequence, but not for a subset
Counting Rules (cont)
 Create all k=3 letter subsets and sequences of the r=5 letters:
A, B, C, D and E
 How many sequences are there?
 How many subsets are there?
Example
 A group of three electronic parts is to be assembled into a plug-in unit for a
TV set
• The parts can be assembled in any order
• How many different ways can they be assembled?
 There are eight machines but only three spaces on the machine shop
floor.
• How many different ways can eight machines be arranged in the three
available spaces?
 The paint department needs to assign color codes for 42 different parts.
Three colors are to be used for each part. How many colors, taken three
at a time, would be adequate to color-code the 42 parts?
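A sketch of the three counts, using the standard library's counting functions (`math.perm` and `math.comb` require Python 3.8 or later). It assumes the parts and machines are distinguishable, that the three floor spaces are distinct positions, and that a color code is an unordered set of three different colors; the slide leaves that last point open, and if the order of colors mattered fewer colors would suffice.

```python
from math import comb, factorial, perm

# Three parts assembled in any order: all orderings of 3 distinct parts
assembly_orders = factorial(3)

# Eight machines, three spaces: ordered choice of 3 machines out of 8
machine_arrangements = perm(8, 3)

# Smallest number of colors whose 3-color subsets cover 42 parts
# (assumption: a code is an unordered set of three distinct colors)
colors_needed = 3
while comb(colors_needed, 3) < 42:
    colors_needed += 1

print(assembly_orders, machine_arrangements, colors_needed)
```

Seven colors give only 35 three-color subsets, so under this reading eight colors (56 subsets) are needed.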
Review: Sequence and Subset
 For a sequence, the order of the objects matters: each different
ordering counts as a different outcome
 For a subset, the order of the objects is not important
Counting Rules (cont)
 Choose a sequence of k = 3 letters from r = 5 letters
 Number of sequences is called the number of permutations of
r symbols taken k at a time
$${}_rP_k = \frac{r!}{(r-k)!}$$
 Choose a subset of k = 3 letters from r = 5 letters
 Number of subsets is called the number of combinations of
r symbols taken k at a time
$${}_rC_k = \frac{r!}{k!\,(r-k)!}$$
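The two formulas can be checked against a brute-force listing for the letters A through E. This sketch enumerates every 3-letter sequence and subset and compares the counts with `math.perm` and `math.comb` (Python 3.8 or later):

```python
from itertools import combinations, permutations
from math import comb, perm

letters = "ABCDE"
k = 3

# Sequences: order matters, no letter used more than once
sequences = ["".join(s) for s in permutations(letters, k)]

# Subsets: order does not matter
subsets = ["".join(s) for s in combinations(letters, k)]

print(len(sequences), perm(len(letters), k))   # r! / (r - k)!
print(len(subsets), comb(len(letters), k))     # r! / (k! (r - k)!)
print(subsets)
```

There are 60 sequences but only 10 subsets, because each subset of three letters can be ordered in 3! = 6 ways.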
Binomial Distribution
 Percentages play a major role in business
 When a percentage is determined by counting the number of times
something happens out of the total possibilities, the occurrences might
follow a binomial distribution
 Examples:
• Number of defective products out of 10 items
• Of 100 people interviewed, number who expressed intention to buy
• Number of female employees in a group of 75 people
• Number of Independent Party votes cast in the next election
Binomial Distribution (cont)
 Each time the random experiment is run, either the event happens or it
doesn’t
 The random variable X, defined as the number of occurrences of a
particular event out of n trials, has a binomial distribution if:
1. For each of the n trials, the event always has the same probability
$\pi$ of happening
2. The trials are independent of one another
Binomial proportion:
$$p = \frac{X}{n} = \frac{\text{Number of occurrences}}{\text{Number of trials}}$$
Example: Binomial Distribution
 You are interested in the next n=3 calls to a catalog order desk and
know from experience that 60% of calls will result in an order
 What can we say about the number of calls that will result in an
order?
 Create a probability tree
 Questions:
• What is the expected number of calls resulting in an order?
• What is the standard deviation?
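A sketch of the tree approach for this example: enumerate all 2³ paths of order / no order, multiply the branch probabilities along each path, and group the paths by the number of orders. It assumes the calls are independent with the same 60% order probability, which is the binomial setting; the name `pi` for that probability is a choice made here.

```python
from itertools import product
from math import sqrt

n = 3        # number of calls
pi = 0.60    # assumed constant probability that a call results in an order

# Walk every path of the tree: 1 = order, 0 = no order
distribution = {k: 0.0 for k in range(n + 1)}
for path in product([1, 0], repeat=n):
    path_prob = 1.0
    for outcome in path:
        path_prob *= pi if outcome else 1 - pi
    distribution[sum(path)] += path_prob

# Mean and standard deviation computed from the distribution itself
mean = sum(k * p for k, p in distribution.items())
sd = sqrt(sum((k - mean) ** 2 * p for k, p in distribution.items()))

for k, p in distribution.items():
    print(k, round(p, 3))
print(round(mean, 2), round(sd, 4))
```

The distribution is 0.064, 0.288, 0.432, 0.216 for 0 through 3 orders, with an expected 1.8 orders and a standard deviation of about 0.85.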
Binomial Distribution the Easy Way
                     Number of Occurrences, X           Proportion or Percentage
Mean                 $E(X) = n\pi$                      $E(p) = \pi$
Standard Deviation   $\sigma_X = \sqrt{n\pi(1-\pi)}$    $\sigma_p = \sqrt{\pi(1-\pi)/n}$
Finding Binomial Probabilities
$$P(X = a) = \binom{n}{a} \pi^{a} (1 - \pi)^{n-a}$$
Example: Binomial Probabilities
 How many of your n=6 major customers will call tomorrow?
 There is a 25% chance that each will call
 Questions:
• How many do you expect to call?
• What is the standard deviation?
• What is the probability that exactly 2 call?
• What is the probability that more than 4 call?
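A sketch of these four answers using the binomial mean, standard deviation, and probability formulas, assuming the six customers call independently of one another. The helper name `binomial_pmf` and the variable `pi` (the per-customer call probability) are illustrative.

```python
from math import comb, sqrt

def binomial_pmf(a, n, pi):
    """P(X = a) for a binomial with n trials and success probability pi."""
    return comb(n, a) * pi ** a * (1 - pi) ** (n - a)

n = 6        # major customers
pi = 0.25    # chance that each one calls

expected_calls = n * pi
sd_calls = sqrt(n * pi * (1 - pi))
p_exactly_2 = binomial_pmf(2, n, pi)
p_more_than_4 = binomial_pmf(5, n, pi) + binomial_pmf(6, n, pi)

print(expected_calls, round(sd_calls, 4),
      round(p_exactly_2, 4), round(p_more_than_4, 4))
```

You expect 1.5 calls with a standard deviation of about 1.06. The probability of exactly 2 calls is about 0.297, and the probability of more than 4 calls is about 0.0046.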
Example
 It’s been a terrible day for the capital markets with losers
beating winners 4 to 1
 You are evaluating a mutual fund comprised of 15 randomly
selected stocks and will assume a binomial distribution for the
number of securities that lost value
 Questions:
• What assumptions are being made?
• How many securities do you expect to lose value?
• Find the probability that 8 securities lose value
• What is the probability that 12 or more lose value?
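A worked sketch of the numerical questions. It reads "losers beating winners 4 to 1" as a probability of 4/5 = 0.8 that any one stock lost value, writes $\pi$ for that probability (a symbol chosen here), takes "8 securities" to mean exactly 8, and relies on the binomial assumptions of independent stocks with a common $\pi$:

```latex
\begin{align*}
E(X) &= n\pi = 15(0.8) = 12 \\
P(X = 8) &= \binom{15}{8}(0.8)^{8}(0.2)^{7} \approx 0.0138 \\
P(X \ge 12) &= \sum_{a=12}^{15} \binom{15}{a}(0.8)^{a}(0.2)^{15-a}
  \approx 0.2501 + 0.2309 + 0.1319 + 0.0352 \approx 0.648
\end{align*}
```

The independence assumption is the one most open to question here, since stocks tend to move together on a bad market day.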
Computing Tutorial
 Simulation
 Calculating probabilities
Homework #3
 To be handed out in class
Next Time
 Normal distribution
 Statistical Inference