Basic Review of Statistics

By this point in your college career, the BB students
should have taken STAT 171 and perhaps DS 303/ECON
387 (core requirements for the BB degree).
For the BA students, deficiency MA students, and those of
you who haven’t completed your statistics requirements,
we will review the key topics needed for applications
directly related to Econ 330.
Population Parameters vs. Sample Statistics
Population Parameters: descriptive measures of the entire population
that you’re interested in examining.
◦ Ex: All US households
◦ Ex: All Illinois households with income (m) > $25,000
In the absence of complete and detailed information on every
household you are interested in, you must estimate the population
parameters. The most common way is to use sample statistics.
Sample Statistics: descriptive measures of a representative sample, or
subset, of the population.
◦ Ex: instead of surveying every US household, we send out surveys
to a subset of the population and use that information to
estimate what the values would be for the overall population.
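The relationship between a population parameter and a sample statistic can be sketched in Python. The household incomes below are simulated purely for illustration (they are not real survey data), and the sample size of 400 and the seed are arbitrary assumptions.

```python
import random
from statistics import mean

random.seed(330)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 household incomes between $20,000 and $120,000.
population = [random.uniform(20_000, 120_000) for _ in range(100_000)]

# Population parameter: uses every household (rarely possible in practice).
population_mean = mean(population)

# Sample statistic: uses a random subset of 400 households.
sample = random.sample(population, 400)
sample_mean = mean(sample)

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
```

The two numbers will usually be close but not identical, which is why a sample statistic is treated as an estimate of the population parameter.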
Measures of Central Tendency
1. Mean (or “arithmetic mean” or “average”): the sum
of the numbers included in the sample divided by the
number of observations, n.
◦ Ex: calculate the average cost per unit (AC) across
different firms given cost data: $20.6, $40.3, $15.8, $23.7
• Typically written as:
\( \bar{x} = \frac{\sum x_i}{n} = \frac{20.6 + 40.3 + 15.8 + 23.7}{4} = 25.10 \)
◦ Limitation of the Mean: because it is only an average, you
can expect that actual data will rarely coincide exactly
with your estimate. If there is high variation in your data
the average may not be very useful in estimation.
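As a quick check of the average-cost example, here is a minimal Python sketch using the four cost figures from the slide. The variable names are illustrative only.

```python
from statistics import mean

# Cost per unit reported by the four firms in the example.
costs = [20.6, 40.3, 15.8, 23.7]

# Mean = sum of the observations divided by the number of observations.
average_cost = sum(costs) / len(costs)

print(round(average_cost, 2))   # 25.1
print(round(mean(costs), 2))    # same result from the standard library
```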
Measures of Central Tendency continued
2. Median: the middle observation in your data.
• Indicates that half of your observations are above this
value and half of your observations are below this value.
• To find the value of the median, rank your observations
in ascending or descending order by value. The
observation in the middle is the median.
• Ex: 40, 80, 18, 32, 50
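The rank-then-pick-the-middle procedure can be sketched in Python for the five observations in the example. Picking a single middle element works here because the number of observations is odd.

```python
from statistics import median

observations = [40, 80, 18, 32, 50]

ranked = sorted(observations)           # ascending order: [18, 32, 40, 50, 80]
middle_index = len(ranked) // 2         # index 2 when there are five observations
median_by_hand = ranked[middle_index]   # the middle observation

print(ranked)
print(median_by_hand, median(observations))  # 40 40
```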


Measures of Central Tendency continued
3. Mode: the most frequent value in the sample.
• Useful when there is little variation in the data (values
tend to be continuous and close to one another, e.g.
sales)
• Ex: sales data of ice cream in gallons over 8 weeks:
100, 99, 100, 102, 97, 110, 100, 103
• Aids in identifying the most common value for
marketing purposes, such as the color or size of an item
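For the ice cream example, the mode can be found by counting how often each weekly value occurs. A minimal Python sketch, with illustrative variable names:

```python
from collections import Counter
from statistics import mode

# Weekly ice cream sales in gallons from the example.
weekly_gallons = [100, 99, 100, 102, 97, 110, 100, 103]

# Count occurrences of each value and take the most frequent one.
counts = Counter(weekly_gallons)
most_common_value, frequency = counts.most_common(1)[0]

print(most_common_value, frequency)  # 100 3
print(mode(weekly_gallons))          # 100
```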
Measures of Dispersion
1. Range: difference between the largest and the
smallest sample observation value
◦ Our firm’s highest profit this year was $20 million, and the
lowest profit this year was $12 million, so the range is
$20 million - $12 million = $8 million.
◦ The larger the range, the more variation or dispersion.
◦ Often used for “best case” and “worst case” scenario
projections.
◦ Limitation: only focuses on the extreme values and may
not be really representative of the entire sample.
2. Variance and Standard Deviation:
Variance (σ² or s²): arithmetic mean of the squared deviation of each observation
from the overall mean.
• How far observation values are from the average, or how far they deviate from the
average value; whether they are above or below doesn’t matter; squaring the
deviations makes sure positive and negative deviations don’t cancel each other out.
\( \sigma^2 = \frac{\sum (x_i - \mu)^2}{n} \)
◦ Where x_i is each value in your data; μ is the population average or mean, so (x_i - μ) is how far your value deviates from the average; n is the number of
observations.
Standard Deviation (σ or s): the square root of the variance.
\( s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \)
◦ Where x̄ is the sample mean; the sample version divides by n - 1 rather than n.
Often used as a measure of potential risk when there is uncertainty.
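A minimal Python sketch of the variance and standard deviation, reusing the cost data from the mean example. It shows both divisors: n (treating the four firms as the whole population) and n - 1 (treating them as a sample, which is what Excel's "Sample Variance" reports).

```python
import math
from statistics import mean, pvariance, variance, stdev

costs = [20.6, 40.3, 15.8, 23.7]
x_bar = mean(costs)

# Squared deviation of each observation from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in costs]

population_variance = sum(squared_deviations) / len(costs)        # divide by n
sample_variance = sum(squared_deviations) / (len(costs) - 1)      # divide by n - 1
sample_std_dev = math.sqrt(sample_variance)

print(round(population_variance, 3))  # 84.935
print(round(sample_variance, 3))
print(round(sample_std_dev, 3))
```

The standard library functions `pvariance`, `variance`, and `stdev` give the same three quantities directly.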
3. Coefficient of Variation (V): compares the standard deviation to the
mean.
• Often used by managers because the value is unaffected by the size or the
unit of measure (such as thousands of dollars vs. millions of dollars).
◦ For example: a manager is comparing two projects, one that costs
thousands of dollars and one that costs millions of dollars, and
projecting profits for each. Comparing the standard deviations directly
doesn’t allow you to compare apples to apples. You need
a measure that isn’t affected by the measurement unit. The Coefficient of
Variation is such a measure.
◦ \( V = \sigma / \mu \) or \( v = s / \bar{x} \)
◦ The numerator is a measure of risk; the denominator is a central tendency
measure (the average outcome).
◦ Hence, in capital budgeting it is used to compare “risk-reward” ratios
for different projects that differ widely in profitability or investment
requirements.
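To illustrate why the coefficient of variation is useful for comparing projects of very different size, here is a minimal Python sketch. The profit scenarios for both projects are hypothetical numbers invented for this illustration.

```python
from statistics import mean, stdev

# Hypothetical profit scenarios in dollars (assumed values, not course data).
small_project = [40_000, 50_000, 60_000]
large_project = [4_500_000, 5_000_000, 5_500_000]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)

cv_small = coefficient_of_variation(small_project)
cv_large = coefficient_of_variation(large_project)

print(stdev(small_project), stdev(large_project))  # the large project has the bigger s
print(round(cv_small, 2), round(cv_large, 2))      # 0.2 0.1
```

In this made-up case the large project has a much bigger standard deviation in dollars, yet less risk relative to its average profit.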
Measure of Goodness of Fit
• R² or “coefficient of determination”:
measures how much of the variation in the
dependent variable is explained by our
independent variables.
• Higher numbers mean greater explanatory power
and that deviations from the equation will be
smaller.
• Coefficient of determination values are
bounded between 0 and 1.
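One common way to compute R² is as one minus the ratio of unexplained variation to total variation. The sketch below uses a small hypothetical data set; the fitted values are assumed to come from a least-squares line through those points.

```python
from statistics import mean

# Hypothetical observed values and least-squares fitted values (assumed data).
actual = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

y_bar = mean(actual)

# Unexplained variation: squared gaps between actual and fitted values.
ss_residual = sum((y - f) ** 2 for y, f in zip(actual, fitted))
# Total variation: squared gaps between actual values and their mean.
ss_total = sum((y - y_bar) ** 2 for y in actual)

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))  # 0.6
```

Here 60% of the variation in the dependent variable is explained by the fitted line.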
Variable Significance
• t-statistics and p-values are commonly used to measure
significance (whether an independent variable has a discernible
influence on the dependent variable).
• Excel provides both. However, p-values are more
commonly used, so this is the measure we will use.
• You define your research question: Is there a difference in
blood pressure between those in group A (receiving a drug)
and those in group B (receiving a sugar pill, no drug)?
• The null hypothesis is usually a hypothesis of "no difference".
◦ For example: no difference between blood pressures in group A
and group B.
◦ You then test this hypothesis with data, including the blood pressures
of members of group A and group B.
• The “p-value”, sometimes called the
“calculated probability”, is the estimated
probability of observing a result at least as extreme
as the one in your data when the null hypothesis
(H0) of a study question is true.
◦ Here: the probability of seeing a difference in
blood pressures this large when in fact
the drug makes no difference.
◦ A small p-value therefore means little risk of saying
there is a difference (rejecting the null) when in fact
there is not.
◦ Standard practice in the field defines “statistically
significant” as a p-value below 0.05 (a smaller
number such as 0.01 means greater significance).
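To make the idea of a p-value concrete, the sketch below runs an exact permutation test on the drug vs. sugar pill question. The blood pressure readings are hypothetical, and the permutation test is a stand-in chosen because it needs only the standard library; it is not necessarily the test Excel or a statistics package would report.

```python
from itertools import combinations
from statistics import mean

# Hypothetical systolic blood pressure readings (assumed data).
group_a = [118, 121, 115, 119, 117]   # received the drug
group_b = [128, 131, 126, 133, 129]   # received the sugar pill

def permutation_p_value(a, b):
    """Share of all possible regroupings whose difference in means is at
    least as large (in absolute value) as the observed difference."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    pooled_sum = sum(pooled)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (pooled_sum - sum_a) / n_b)
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total

p_value = permutation_p_value(group_a, group_b)
print(round(p_value, 4))   # 0.0079
print(p_value < 0.05)      # True: reject "no difference" at the 0.05 level
```

If the drug truly made no difference, only 2 of the 252 possible regroupings would produce a gap this large, so the null hypothesis of "no difference" is rejected.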
Regression Analysis (OLS)
Regression Analysis: uses data to describe how variables are related to one another.
• In markets, many variables change simultaneously, and regression analysis accounts for
multiple changes.
Example: Q = f(P, Psub, ADV, m, POP, t)

Where Q = sales of brand name ice cream (dependent variable)
P = price of brand name ice cream
Psub = price of a substitute, competing brand
ADV = advertising dollars
m = income
POP = population
t = time (sales quarter, to show trends or seasonality)

• The right-hand side variables are called “independent variables”.

• Using data gathered on all variables, regression analysis allows us to see the relative
importance of each independent variable (price, income, etc.) for the dependent variable,
sales or quantity.
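A heavily simplified sketch of what ordinary least squares does, using only one independent variable (price) and the price and unit-sales figures for the first eight quarters listed under Sample Data. The course model uses six independent variables and 48 observations, so the slope here is for illustration only and will not match the full regression output.

```python
from statistics import mean

# Price ($) and unit sales (Q) for the first eight quarters of the sample data.
price = [6.39, 7.21, 5.75, 6.75, 6.36, 5.98, 6.64, 5.30]
sales = [193_334, 170_041, 247_709, 183_259, 282_118, 203_396, 167_447, 361_677]

p_bar, q_bar = mean(price), mean(sales)

# OLS slope = sum((P - P_bar) * (Q - Q_bar)) / sum((P - P_bar)^2)
s_pq = sum((p - p_bar) * (q - q_bar) for p, q in zip(price, sales))
s_pp = sum((p - p_bar) ** 2 for p in price)
slope = s_pq / s_pp

# The fitted line passes through the point of means.
intercept = q_bar - slope * p_bar

# Residuals: actual sales minus sales predicted by the fitted line.
residuals = [q - (intercept + slope * p) for p, q in zip(price, sales)]

print(f"Q = {intercept:,.0f} + ({slope:,.0f}) * P")
```

The slope comes out negative, consistent with higher prices going along with lower unit sales in these eight quarters.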
Sample Data
Year-Quarter  Unit Sales (Q)  Price ($)  Advertising Expenditures ($)  Competitors' Price ($)     Income ($)  Population  Time Variable
2003-1               193,334       6.39                        15,827                    6.92         33,337   4,116,250              1
2003-2               170,041       7.21                        20,819                    4.84         33,390   4,140,338              2
2003-3               247,709       5.75                        14,062                    5.28         33,599   4,218,965              3
2003-4               183,259       6.75                        16,973                    6.17         33,797   4,226,070              4
2004-1               282,118       6.36                        18,815                    6.36         33,879   4,278,912              5
2004-2               203,396       5.98                        14,176                    4.88         34,186   4,359,442              6
2004-3               167,447       6.64                        17,030                    5.22         35,691   4,363,494              7
2004-4               361,677       5.30                        14,456                    5.80         35,950   4,380,084              8
2003-1               401,805       6.08                        27,183                    4.99         34,983   9,184,926              1
2003-2               412,312       6.13                        27,572                    6.13         35,804   9,237,683              2
2003-3               321,972       7.24                        34,367                    5.82         35,898   9,254,182              3
2003-4               445,236       6.08                        26,895                    6.05         36,113   9,272,758              4
2004-1               479,713       6.40                        30,539                    5.37         36,252   9,300,401              5
2004-2               459,379       6.00                        26,679                    4.86         36,449   9,322,168              6
2004-3               444,040       5.96                        26,607                    5.29         37,327   9,323,331              7
2004-4               376,046       7.21                        32,760                    4.89         37,841   9,348,725              8
2003-1               255,203       6.55                        19,880                    6.97         34,870   5,294,645              1
2003-2               270,881       6.11                        19,151                    6.25         35,464   5,335,816              2
2003-3               330,271       5.62                        15,743                    6.03         35,972   5,386,134              3
2003-4               313,485       6.06                        17,512                    5.08  36,843 37,573   5,409,350              4
Excel: Summary Stats and
Regression Analysis
• Show in Excel how to create summary statistics
(mean, median, mode, range, etc.)
• Show in Excel how to run the regression
◦ Copy data into Excel
◦ Under the Data tab use “Data Analysis”
◦ Select “Regression” from the drop-down list
◦ Select the Y range of data (dependent variable Q; select
only the data, not the title)
◦ Select the X range of data (all independent variable data)
◦ Click OK
◦ Results pop into another window showing coefficients
for our variables
SUMMARY STATS (1ST 3 VARIABLES)
Statistic                Column1     Column2       Column3
Mean                 391917.3125    6.237292   29203.64583
Standard Error       25371.20712    0.091812   1869.317184
Median                    356929        6.12         26643
Mode                        #N/A        7.02          #N/A
Standard Deviation   175776.8791    0.636091   12951.00935
Sample Variance      30897511243    0.404612   167728643.2
Kurtosis            -0.424039741     -0.8518   -0.38947906
Skewness             0.564081425    0.082874    0.83605057
Range                     689728        2.38         46388
Minimum                    75396        5.03         13896
Maximum                   765124        7.41         60284
Sum                     18812031      299.39       1401775
Count                         48          48            48
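Several rows of the summary statistics follow directly from others. The Python sketch below checks those relationships using the Column1 values above; the small tolerances allow for rounding in the displayed figures.

```python
import math

# Values copied from Column1 of the summary statistics.
count = 48
total = 18812031
mean_reported = 391917.3125
std_dev = 175776.8791
std_error_reported = 25371.20712
variance_reported = 30897511243
minimum, maximum, range_reported = 75396, 765124, 689728

mean_check = total / count                     # Mean = Sum / Count
std_error_check = std_dev / math.sqrt(count)   # Standard Error = s / sqrt(n)
variance_check = std_dev ** 2                  # Sample Variance = s squared
range_check = maximum - minimum                # Range = Maximum - Minimum
coefficient_of_variation = std_dev / mean_reported

print(mean_check, range_check)
print(round(std_error_check, 2))
print(round(coefficient_of_variation, 3))
```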
REGRESSION OUTPUT
Regression Statistics
Multiple R           0.946559307
R Square             0.895974522
Adjusted R Square    0.880751281
Standard Error       60699.98879
Observations         48

                        Coefficients   Standard Error         t Stat       P-value
Intercept                647041.9264      154303.7628    4.193299726   0.000143148
X Variable 1 (P)         -127436.167      15106.87319   -8.435641532   1.68439E-10
X Variable 2 (ADV)       5.352343471      1.114830567    4.801037601   2.12221E-05
X Variable 3 (Pcomp)     29339.75679      12388.80657    2.368247224   0.022668872
X Variable 4 (m)         0.340280347      3.184070945    0.106869587   0.915413667
X Variable 5 (POP)       0.023965899      0.002349065    10.20231336   8.12444E-13
X Variable 6 (t)         4407.716892      4401.822046    1.001339183   0.322536268

                           Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept                 335419.159    958664.6939     335419.159    958664.6939
X Variable 1 (P)        -157945.1159   -96927.21787   -157945.1159   -96927.21787
X Variable 2 (ADV)       3.100897491    7.603789452    3.100897491    7.603789452
X Variable 3 (Pcomp)     4320.054607    54359.45896    4320.054607    54359.45896
X Variable 4 (m)        -6.090081309    6.770642002   -6.090081309    6.770642002
X Variable 5 (POP)       0.019221865    0.028709932    0.019221865    0.028709932
X Variable 6 (t)        -4481.942977    13297.37676   -4481.942977    13297.37676
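Two relationships inside the regression output can be verified by hand: each t Stat is the coefficient divided by its standard error, and Adjusted R Square follows from R Square, the number of observations, and the number of independent variables. The sketch below uses the figures reported above. The adjusted R² formula is the standard textbook one and is an addition here, since the slides do not state it.

```python
import math

# (coefficient, standard error, reported t Stat) from the regression output.
rows = {
    "Intercept":    (647041.9264, 154303.7628, 4.193299726),
    "X Variable 1": (-127436.167, 15106.87319, -8.435641532),
    "X Variable 2": (5.352343471, 1.114830567, 4.801037601),
    "X Variable 3": (29339.75679, 12388.80657, 2.368247224),
    "X Variable 4": (0.340280347, 3.184070945, 0.106869587),
    "X Variable 5": (0.023965899, 0.002349065, 10.20231336),
    "X Variable 6": (4407.716892, 4401.822046, 1.001339183),
}

# t Stat = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _) in rows.items()}
for name, t in t_stats.items():
    print(f"{name:<13} t = {t:.4f}")

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r_square, n, k = 0.895974522, 48, 6
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
print(round(adjusted_r_square, 6))  # 0.880751
```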
Regression equation (using coefficients above)
Q = 647,042 - 127,436P + 5.35ADV + 29,340Pcomp + 0.3403m + 0.024POP + 4,408t
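The regression equation can be used to predict sales for any set of inputs. The sketch below plugs in the first row of the sample data (2003-1) and compares the prediction with the actual unit sales for that quarter. The function and dictionary names are illustrative, not part of the course material.

```python
# Coefficients from the regression output (unrounded).
INTERCEPT = 647041.9264
COEFFICIENTS = {
    "P": -127436.167,      # price
    "ADV": 5.352343471,    # advertising expenditures
    "Pcomp": 29339.75679,  # competitors' price
    "m": 0.340280347,      # income
    "POP": 0.023965899,    # population
    "t": 4407.716892,      # time variable
}

def predict_sales(values):
    """Predicted Q = intercept + sum of coefficient * variable value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)

# First row of the sample data (2003-1).
first_quarter = {"P": 6.39, "ADV": 15827, "Pcomp": 6.92,
                 "m": 33337, "POP": 4116250, "t": 1}
actual_sales = 193334

predicted_sales = predict_sales(first_quarter)
residual = actual_sales - predicted_sales

print(f"predicted: {predicted_sales:,.0f}")
print(f"actual:    {actual_sales:,}")
print(f"residual:  {residual:,.0f}")
```

The gap between the predicted and actual values is the residual for that quarter; the regression's Standard Error (about 60,700 units) gives a sense of how large such gaps typically are.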
Statistically significant variables
• Price, advertising, competitors’ price, and population all have
p-values below 0.05 in the regression output.
• This means changes in price have a statistically significant impact on
sales (same with competitors’ price and advertising).
◦ Note each coefficient is ∆Q/∆variable
◦ Example: if the firm increased price by $1.00, then the estimated impact on
sales is: ∆Q = -127,436 × 1.00 = -127,436 units
◦ If asked for a $0.50 change, it would be: ∆Q = -127,436 × 0.50 = -63,718 units
• Income has no discernible effect in this model (p-value of about 0.92), so predictions about
changes in income would result in zero impact on quantity.
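The significance screen and the ∆Q calculations can be sketched in Python using the coefficients and p-values from the regression output. The 0.05 cutoff is the conventional threshold and is an assumption here; the names are illustrative.

```python
ALPHA = 0.05  # assumed significance threshold

# variable: (coefficient, p-value) from the regression output
regression_output = {
    "P":     (-127436.167, 1.68439e-10),
    "ADV":   (5.352343471, 2.12221e-05),
    "Pcomp": (29339.75679, 0.022668872),
    "m":     (0.340280347, 0.915413667),
    "POP":   (0.023965899, 8.12444e-13),
    "t":     (4407.716892, 0.322536268),
}

# Keep only variables whose p-value is below the threshold.
significant = [name for name, (_, p) in regression_output.items() if p < ALPHA]
print(significant)  # ['P', 'ADV', 'Pcomp', 'POP']

def change_in_sales(variable, change):
    """Estimated change in Q = coefficient * change in the variable."""
    coefficient, _ = regression_output[variable]
    return coefficient * change

print(round(change_in_sales("P", 1.00)))  # -127436
print(round(change_in_sales("P", 0.50)))  # -63718
```

Income (m) and the time variable (t) do not pass the screen, so their estimated effects cannot be distinguished from zero in this model.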