Athens University of Economics & Business
International Marketing Research
Discriminant Analysis and Logistic Regression
Profiling Customers: Factor Analysis
Dr Charalampos (Babis) Saridakis
Associate Professor of Marketing
University of Leeds
Leeds University Business School
[email protected]
Learning objectives
At the end of the discussion you should be able to:
1. Understand when and how to use Discriminant Analysis
2. Explain the basics of Logistic Regression
3. Run and interpret Factor Analysis
Discriminant Analysis

Discriminant vs ANOVA and Regression Analysis

Technique             | Scaling of dependent (criterion) variable                  | Scaling of independent (predictor, explaining) variables
ANOVA                 | Ratio/Interval: income, turnover, cost, agreement scale    | Nominal/Categorical: gender, occupation, brand
Multiple Regression   | Ratio/Interval: income, turnover, cost, agreement scale    | Ratio/Interval: age, income, turnover, cost, agreement scale
Discriminant Analysis | Nominal/Categorical: occupation, brand                     | Ratio/Interval: age, income, turnover, cost, agreement scale
Discriminant vs ANOVA and Regression Analyses
If we compare ANOVA and DA, we observe that the scaling is reversed!
Subscriber analysis: Modern Majority magazine
[Scatter plot: Income on the horizontal axis, Age on the vertical axis; * = subscriber, # = non-subscriber. Subscribers cluster at higher ages than non-subscribers.]
Subscriber analysis: Modern Majority magazine
[Same scatter plot with group averages added: subscribers (S) average 59 years, non-subscribers (N-S) average 36 years. A discrimination line separates the two groups on age.]
Subscriber analysis: Forbes magazine
[Scatter plot: here income, not age, discriminates between the groups. Non-subscribers (N-S) average $60,000; subscribers (S) average $140,000.]
Small numerical example

A test is conducted at the end of a focus group interview with respondents in Britain and Denmark, respectively. In each country, five respondents are asked to express their degree of consent to two statements on a "12-point magnitude scale", stretching from "1 = definitely disagree" to "12 = definitely agree". The statements are:

• (X1) "The Royal Family of my home country is doing a good job" (abbreviated 'RoyalsOK')
• (X2) "It is OK to drink up to three beers a day" (abbreviated 'BeerOK')

The country of origin of the respondents (Y) is coded 1 to indicate a British respondent and 2 to identify a Danish respondent.
Obs. | Country (Y) | RoyalsOK (X1) | BeerOK (X2)
1    | British     | 2             | 4
2    | British     | 3             | 2
3    | British     | 4             | 5
4    | British     | 5             | 4
5    | British     | 6             | 7
British group mean |  | 4.0        | 4.4
6    | Danish      | 7             | 6
7    | Danish      | 8             | 4
8    | Danish      | 9             | 7
9    | Danish      | 10            | 6
10   | Danish      | 11            | 9
Danish group mean  |  | 9.0        | 6.4
Grand mean         |  | 6.5        | 5.4
Grand SD           |  | 3.028      | 2.011
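The discriminant weights for these data can be reproduced from scratch. The sketch below is my own illustration (standard library only, not SPSS's routine): it uses Fisher's two-group formulation, i.e. w = S⁻¹(mean difference) with a pooled within-group covariance matrix, and classifies by the midpoint between the group centroids.

```python
# Fisher's two-group discriminant analysis on the Britain/Denmark data,
# a from-scratch sketch (SPSS arrives at the same direction up to scaling).
britons = [(2, 4), (3, 2), (4, 5), (5, 4), (6, 7)]   # (RoyalsOK, BeerOK)
danes   = [(7, 6), (8, 4), (9, 7), (10, 6), (11, 9)]

def mean_vec(data):
    n = len(data)
    return [sum(p[i] for p in data) / n for i in (0, 1)]

def ss_matrix(data, m):
    # sums of squares and cross-products around the group mean
    return [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in data)
             for j in (0, 1)] for i in (0, 1)]

mb, md = mean_vec(britons), mean_vec(danes)
ssb, ssd = ss_matrix(britons, mb), ss_matrix(danes, md)

# pooled within-group covariance matrix, divided by n1 + n2 - 2
dof = len(britons) + len(danes) - 2
S = [[(ssb[i][j] + ssd[i][j]) / dof for j in (0, 1)] for i in (0, 1)]

# w = S^-1 (mean_danes - mean_britons), via the 2x2 matrix inverse
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
d = [md[0] - mb[0], md[1] - mb[1]]
w = [(S[1][1] * d[0] - S[0][1] * d[1]) / det,
     (-S[1][0] * d[0] + S[0][0] * d[1]) / det]

score = lambda p: w[0] * p[0] + w[1] * p[1]
cutoff = (score(mb) + score(md)) / 2   # midpoint between group centroids
print(round(w[0] / w[1], 2))           # -2.5, same ratio as 0.368 / -0.147
```

Every Briton scores below the cutoff and every Dane above it, so the two groups separate perfectly, matching the zero-misclassification result reported below.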
Small numerical example
[Scatter plot, "Graphical illustration of responses": X1 (RoyalsOK) on the horizontal axis and X2 (BeerOK) on the vertical axis, both running from 0 ("disagree strongly") to 12 ("agree strongly"). Britons occupy the lower-left region, Danes the upper-right.]
Running DA in SPSS
• The range of the dependent variable must be defined
• Stepwise method
• Basic statistics must be requested
• Classification tables for model fit
Running DA in SPSS (output)
- X1 significantly determines the level of Y.
- The higher the value on X1, the greater the tendency (probability) that the level of Y is 2 (Danish) rather than 1.
- X2 is not significant.

Running DA in SPSS (output)
We also look at this table in the output. It tells us that, regarding level 2 of Y (Danes):
- X1 (RoyalsOK) is absolutely critical, while
- X2 (BeerOK) works counterproductively.

In this small example we are able to classify all cases correctly; there are no misclassifications.
[Diagram: Britons and Danes plotted in the (X1, X2) plane, separated by a cut-off line.]

Underlying rationale of discriminant analysis
[Diagram: the discriminant axis runs through the point cloud at an angle of 23.5° to the X1 axis, since 0.147/0.368 = 0.3995 and sin⁻¹(0.3995) ≈ 23.5°. The cut-off line is perpendicular to the discriminant axis.]

Z = 0.368X1 - 0.147X2; at the grand means: 0.368(6.5) - 0.147(5.4) ≈ 1.596
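Applying the slide's discriminant function directly shows the separation. A quick check (my own script, using the rounded coefficients from the slide):

```python
# Score each respondent with Z = 0.368*X1 - 0.147*X2; the cutting score is
# Z evaluated at the grand means (6.5, 5.4).
Z = lambda x1, x2: 0.368 * x1 - 0.147 * x2
britons = [(2, 4), (3, 2), (4, 5), (5, 4), (6, 7)]
danes   = [(7, 6), (8, 4), (9, 7), (10, 6), (11, 9)]
cutoff = Z(6.5, 5.4)
print(round(cutoff, 2))   # 1.6 (the slide, using unrounded weights, gets 1.596)
```

The largest British score (1.25, for respondent 4) lies below the cutoff and the smallest Danish score (1.69, for respondent 6) above it, so the cut-off line classifies everyone correctly.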
Plotting the discriminant function
[Diagram: the two groups' scores plotted as overlapping normal distributions along the discriminant axis; the overlapping region contains the misclassifications.]
Note: The overlap of the normal distributions (shaded area) corresponds to the probability of committing misclassifications.
Discriminant analysis with misclassifications
[Diagram: Britons and Danes with overlapping distributions.]

Multiple discriminant analysis
[Diagram: three groups - Britons, Danes, and Spaniards.]
Logistic regression

Logistic Regression: Example of data
[Plot: probability of dying (binomial response) on the vertical axis against severity of infection on the horizontal axis.]

Logistic Regression: functional form
The odds of the probability of dying are π/(1 - π); the log odds of the probability of dying is

    g(π) = log( π / (1 - π) )
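The logit link and its inverse can be sketched in a few lines (my own illustration, standard library only):

```python
import math

# The logit link g(pi) = log(pi / (1 - pi)) maps a probability in (0, 1)
# onto the whole real line; its inverse is the logistic function.
def logit(p):
    return math.log(p / (1 - p))

def inv_logit(g):
    return 1 / (1 + math.exp(-g))

print(logit(0.5))   # 0.0 -> even odds
```

Because the link is invertible, any linear predictor a + bx on the log-odds scale translates back into a probability between 0 and 1, which is what makes the functional form suitable for a binomial response.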
Logistic Regression: functional form
Logistic Regression is easy to run but difficult to interpret. Since the estimation is based on a logarithmic transformation of the dependent measure, parameter estimates cannot be interpreted as in ordinary regression. Instead, they may be interpreted as 'tendencies' or 'elasticities'.
Logistic Regression: Interpretation of results

Logistic Regression Model Parameters
• Interpreting coefficient β:

    odds(x) = π(x) / (1 - π(x));    odds(x + 1) / odds(x) = e^β

– For a unit increase in the independent variable, logit(p) increases by β (here, 0.142).
• Estimating e^β, reported by SPSS as EXP(B), provides a more interpretable result than β itself.
• e^β is the ratio of odds and represents their increase for a unit change of x.
• For the example given above, β = 0.142 and e^β = 1.15. This means that for a unit increase of x, the odds (die/survive) increase from 1 to 1.15 (representing a 15% increase).
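The arithmetic behind that interpretation is just an exponentiation, as this one-liner confirms:

```python
import math

beta = 0.142                  # the slide's example coefficient
odds_ratio = math.exp(beta)   # e^beta, reported by SPSS as EXP(B)
print(round(odds_ratio, 2))   # 1.15 -> odds increase by ~15% per unit of x
```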
Logistic regression example - Rizatriptan for Migraine
• Response - complete pain relief at 2 hours (Yes/No)
• Predictor - dose (mg): placebo (0), 2.5, 5, 10

Dose | # Patients | # Relieved | % Relieved
0    | 67         | 2          | 3.0
2.5  | 75         | 7          | 9.3
5    | 130        | 29         | 22.3
10   | 145        | 40         | 27.6
Logistic regression example - Rizatriptan for Migraine (SPSS)
[SPSS "Variables in the Equation" table: the dose coefficient is 0.165 and the constant is -2.490.]

The fitted model is

    π̂(x) = e^(-2.490 + 0.165x) / (1 + e^(-2.490 + 0.165x))
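Plugging the four doses into the fitted curve shows the predicted relief probabilities rising with dose (a short check of my own):

```python
import math

def p_hat(x):
    # fitted response from the SPSS output: pi(x) = e^(a+bx) / (1 + e^(a+bx))
    z = -2.490 + 0.165 * x
    return math.exp(z) / (1 + math.exp(z))

doses = [0, 2.5, 5, 10]
probs = [p_hat(d) for d in doses]
# predicted probabilities increase monotonically with dose, e.g. about
# 0.30 at the 10 mg dose versus the observed 27.6% relief rate
```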
Logistic Regression – Coupon discount example
• Assume we run a pizza delivery company and have given a 30% discount coupon to the customers subscribed to our website.
• Say we took a sample of 100 customers, and we want to predict (dependent variable) whether a customer will use the coupon based on 3 independent decision variables:
– Family size (X1)
– Age of the customer (X2)
– Whether the customer has used a coupon in the past (X3)
Logistic Regression – Coupon discount example

Variable | Coefficient | Standard error | Z (critical value 1.96) | Sig (p-value)
Constant | 1.633       | 1.168          | 1.40                    | 0.162
X1       | -1.077      | 0.291          | -3.69                   | 0.000
X2       | -0.013      | 0.019          | -0.67                   | 0.499
X3       | 2.031       | 0.611          | 3.32                    | 0.001

• logL (full model) = -35.536
• logL (constant only) = -61.910
• Index ρ² (likelihood ratio index) = 0.426
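The likelihood ratio index follows directly from the two log-likelihoods, ρ² = 1 - logL(full)/logL(constant only), which a two-line check confirms:

```python
# Likelihood ratio index (McFadden's rho-squared) from the two log-likelihoods
logL_full = -35.536
logL_null = -61.910   # constant-only model
rho_sq = 1 - logL_full / logL_null
print(round(rho_sq, 3))   # 0.426, matching the table
```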
Logistic Regression – Coupon discount example
Profiling customers – Factor Analysis

What is FA?
• Tests for clusters of variables or measures and identifies underlying dimensions, or factors, which explain the correlations among a set of variables.
• Data reduction:
– Creates a smaller set of new variables (factors), uncorrelated with each other.
– This set of factors represents a larger set of variables that are correlated with each other.
• A tool to get rid of the multicollinearity problems that appear in dependence models such as regression and discriminant analysis.
• Factors can be used as input in other multivariate techniques, without the problem of multicollinearity.
What is FA?
Factor Analysis assumes that relationships between variables are due to the effects of underlying factors, and that observed correlations are the result of variables sharing common factors.

          | Maths | Physics | Computing | Art  | Drama | English
Maths     | 1.00  |         |           |      |       |
Physics   | 0.80  | 1.00    |           |      |       |
Computing | 0.78  | 0.73    | 1.00      |      |       |
Art       | 0.12  | 0.14    | 0.15      | 1.00 |       |
Drama     | 0.04  | 0.21    | 0.13      | 0.68 | 1.00  |
English   | 0.24  | 0.15    | 0.07      | 0.91 | 0.79  | 1.00

What conclusions can be drawn from the hypothetical correlation matrix above?
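One way to see the answer is to compare average correlations within and between the two apparent clusters. This small script (my own illustration of the matrix above) makes the block structure explicit:

```python
# Average correlations inside and between the two clusters suggested by the
# hypothetical matrix: {Maths, Physics, Computing} and {Art, Drama, English}.
sciences = [0.80, 0.78, 0.73]   # pairwise correlations within cluster 1
arts = [0.68, 0.91, 0.79]       # pairwise correlations within cluster 2
between = [0.12, 0.14, 0.15, 0.04, 0.21, 0.13, 0.24, 0.15, 0.07]

avg = lambda xs: sum(xs) / len(xs)
print(round(avg(sciences), 2), round(avg(arts), 2), round(avg(between), 2))
# 0.77 0.79 0.14
```

High correlations within each triplet and low correlations across them are exactly the pattern FA attributes to two underlying factors.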
What is FA?
• If several variables correlate highly with each other, they might measure aspects of a common underlying dimension.
– These dimensions are called factors.
• Factors are classification axes along which the measures/variables can be plotted.
– The greater the loading of variables on a factor, the more that factor explains the relationships between those variables.
Graphical representation of factors
[Diagram: Maths, Physics, and Computing load on one axis ("performance in positive sciences"); Drama, Art, and English load on the other ("performance in classical sciences").]
An example of factor analysis
• A bank needs to know the triggering point that causes many customers to change to another bank.
• A questionnaire is sent to a small random sample of the bank's customers, who are encouraged to participate in the survey. A credit was given for participating.
• 114 responses, 15 variables (105 pairwise combinations)
Selected bivariate correlations

Variable            | Correlation with Friendly (x2)
Modern (x1)         | 0.65
Successful (x6)     | 0.57
Good advice (x7)    | 0.64
Stagnant (x8)       | -0.51
Kind (x9)           | 0.72
A bank for me (x14) | 0.62
Variable                  | Factor 1 | Factor 2
Modern (x1)               | 0.76     | 0.16
Friendly (x2)             | 0.82     | 0.16
Successful (x6)           | 0.77     | 0.20
Good advice (x7)          | 0.77     | 0.42
Stagnant (x8)             | -0.75    | -0.17
Kind (x9)                 | 0.80     | 0.21
Good housing advice (x12) | 0.31     | 0.77
No good tax advice (x13)  | -0.12    | -0.87
A bank for me (x14)       | 0.76     | 0.24

- Loadings: simple correlations between a variable and the 'unknown' factor
Factor analysis of the bank data
[The slide repeats the loading table above and names Factor 1 "Customer orientation" and Factor 2 "Financial advice".]
Factor analysis of the bank data
[Plot of the variables in the loading space, Factor 1 on the horizontal axis and Factor 2 on the vertical axis: e.g., Modern (x1) at (0.76, 0.16), Stagnant (x8) at (-0.75, -0.17), with Good housing advice (x12) high on Factor 2 and No good tax advice (x13) low on Factor 2.]
Graphical illustration
[Plot: along Factor 1 ("Customer orientation"), Bank for me, Kind, Successful, Friendly, Modern, and Good advice load positively while Stagnant loads negatively; along Factor 2 ("Financial advice"), Good housing advice and Good advice load positively while No good tax advice loads negatively.]
* Good housing advice
* Good advice
* Bank for me
* Kind
* Successful
* * Friendly
Modern
Dimension 1:
Customer orientation (56%)
Negative
*
Positive
Stagnant
Dimension 2:
Financial advice (11%)
No good tax advice *
Negative
Keywords in factor analysis
• Eigenvalue: tells how much of the explained variance can be attributed to each factor
• Factor loading: correlation between a variable and a factor (values in the F-matrix)
• Factor score: estimated value (Z-matrix); the columns in Z are uncorrelated!
• Communality: the variance of a variable accumulated across factors
• Bartlett's Test of Sphericity: test of independence (H0: R = I)
• Kaiser Criterion: extraction of factors with eigenvalues ≥ 1
• Cattell's scree plot: helps determine the best number of factors to be extracted
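The Kaiser Criterion reduces to counting eigenvalues of at least 1. A small sketch with made-up eigenvalues (illustrative numbers only; for a correlation matrix they sum to the number of variables):

```python
# Choosing the number of factors by the Kaiser Criterion: keep every factor
# whose eigenvalue is >= 1. Eigenvalues below are hypothetical, for six
# variables, so they sum to 6.
eigenvalues = [2.2, 1.4, 1.2, 1.05, 0.10, 0.05]
kaiser_factors = sum(1 for e in eigenvalues if e >= 1)
print(kaiser_factors)   # 4
```

A scree-plot "elbow" inspection of the same eigenvalues could well suggest fewer factors, which is exactly the Kaiser-versus-Cattell tension discussed below.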
[Scree plot: size of eigenvalues for F1-F6, showing a kinked curve or "elbow" structure (Cattell's advice), with the Kaiser Criterion line drawn at an eigenvalue of 1.]

In this hypothetical case, applying Cattell's visual recommendation would imply extracting 3 factors (up to just before the kink sets in). However, choosing the number of factors according to the Kaiser Criterion (extract factors with an eigenvalue exceeding unity) would qualify four factors.
Cattell's scree plot in SPSS
Example of FA in SPSS
• 36 persons rated their abilities on 12 activities:
– Baseball
– Crossword
– Soccer
– Scrabble
– Football
– Spelling
– ……..
SPSS Factor Analysis
• Select: Analyze >> Data (or Dimension) Reduction >> Factor
• Add the 12 variables to the list
• From the "Descriptives" menu, tick the required statistics
SPSS Factor Analysis
• From the "Extraction" menu, select the extraction settings
• From the "Rotation" menu, select the rotation method
Interpretation of SPSS output
• Appropriateness of applying factor analysis
• Evaluation of the solution
• Interpretation of factor solution
Appropriateness of applying FA
• MSA values should be > 0.50
• If not, the respective variable should be removed

Appropriateness of applying FA
• KMO = 0.684 > 0.50
• H0: correlations among variables are insignificant
• We reject H0, since sig. < 0.05
Evaluation of the factor solution
• Kaiser's Extraction:
– Kaiser (1960): retain factors with eigenvalues > 1.
• Scree Plot:
– Cattell (1966): use the 'point of inflexion' of the scree plot.
• Which rule?
– Use Kaiser's Extraction when:
• there are fewer than 30 variables and communalities after extraction are > 0.7, or
• the sample size is > 250 and mean communality ≥ 0.6.
– The scree plot is good if the sample size is > 200.
Evaluation of the factor solution
• Two factors with eigenvalues > 1 have been generated
• These two factors explain 90.99% of the dataset's variance, well above 50%
Evaluation of the factor solution: Communality
• Common variance: variance that a variable shares with other variables.
• Unique variance: variance that is unique to a particular variable.
• The proportion of common variance in a variable is called the communality.
• Communality = 1: all variance is shared.
• Communality = 0: no variance is shared.
• 0 < communality < 1: some variance is shared.
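With an orthogonal solution, a variable's communality is the sum of its squared loadings across the extracted factors. A quick sketch using the bank-example loadings shown earlier:

```python
# Communality = sum of squared loadings across the extracted factors.
# Loadings are the bank-example values (Factor 1, Factor 2) from earlier.
loadings = {
    "Modern (x1)": (0.76, 0.16),
    "Friendly (x2)": (0.82, 0.16),
    "No good tax advice (x13)": (-0.12, -0.87),
}
communality = {v: round(f1**2 + f2**2, 3) for v, (f1, f2) in loadings.items()}
print(communality["Modern (x1)"])   # 0.603 -> 60.3% of its variance is shared
```

So the two bank factors reproduce about 60% of the variance of Modern (x1); the sign of a loading does not matter, since it is squared.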
Evaluation of the factor solution
This table shows the percentage of the variability of each observed variable that is explained by the extracted components (all communalities should be > 0.50). For example, the two extracted components explain:
• 94.7% of 'basketball' variability,
• 88.4% of 'crossword' variability,
• 91.9% of 'soccer' variability, etc.
Interpretation of the factor solution
• This table shows the rotated solution and presents the final "factor loadings" (i.e., simple correlations between a component and an original variable).
• For example, the correlation between basketball and component 1 is 0.97.
• To identify which variables are grouped under which component, we simply search for the highest loading of each variable.
Interpretation of the factor solution
To make this easier, we can select "Suppress absolute values less than .45" from the 'Options' menu.
Interpretation of the factor solution
• To name a component, we simply identify the variables that load highly on it. More specifically:
• The following variables have their highest loadings on Factor 1: Baseball, Soccer, Football, Basketball, Sailing, Jogging.
• We may interpret Factor 1 as 'activities that require physical exercise'.
• The following variables have their highest loadings on Factor 2: Crossword, Scrabble, Spelling, Chess, Mind-games, Quizzes.
• We may interpret Factor 2 as 'activities that require mental exercise'.
Interpretation of the factor solution
• Since the two extracted factors passed all relevant tests and have a meaningful interpretation, we save them as new variables from the "Scores" menu.
• We can see that in the Data View two new variables have now been generated.
• These two new variables can be used instead of the 12 original ones, without the danger of multicollinearity.