Analytical methods in
marketing research
L 12
Ing. Jiří Šnajdar
2013
Analytical methods in marketing research
Methods used to analyse data obtained by
marketing research can be divided into:
• analysis of secondary data,
• analysis of primary data.
This division is only a working one. In practice,
some of the procedures described below appear
in both areas.
Possibilities of analysing secondary data
The main directions of data analysis are:
time series analysis, cross-sectional
analysis, and the combination of cross-sectional and
time series analysis.
• Time series analysis
Evaluation of absolute increases
The first step in time series analysis is to
examine how the absolute values of the
given variable move over time.
The methods used for this are absolute
increases, moving averages, relative values and
regression analysis of the time series.
Moving totals, moving averages
Moving averages offer another analytical view
of a time series; their first stage
is the moving total.
• moving totals = moving sums of a few adjacent
values (the number of values can be even or odd)
• moving averages = the moving total divided by the
number of periods it covers
The advantage of moving totals and averages is that they
reduce the influence of extreme values and
express the trend better.
An interesting analytical option is to
combine them with the analysis of absolute
differences (we use moving averages of the first or
second differences).
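The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sales figures are hypothetical, and the window length n is chosen arbitrarily.

```python
def moving_totals(values, n):
    """Return the sums of every run of n adjacent values."""
    return [sum(values[i:i + n]) for i in range(len(values) - n + 1)]


def moving_averages(values, n):
    """Return each moving total divided by the window length n."""
    return [total / n for total in moving_totals(values, n)]


# Hypothetical monthly sales containing one extreme value (40)
sales = [10, 12, 11, 40, 13, 14, 15]

totals = moving_totals(sales, 3)      # [33, 63, 64, 67, 42]
averages = moving_averages(sales, 3)  # the extreme value is spread over 3 windows

print(totals)
print(averages)
```

The largest moving average is well below the extreme value of 40, which is the smoothing effect described above.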
Moving averages
Averages of the last „n“ periods, moving forward with
each new period. The value of the moving average can
also be assigned to the last period.
Exponentially weighted moving averages
(EWMA)
These are moving averages that allow historically
more remote data to be given a different weight than
data from the most recent periods.
In statistical programs these procedures are
called exponential smoothing.
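A minimal sketch of simple exponential smoothing follows. The smoothing constant alpha and the choice to start from the first observation are assumptions made for the illustration; statistical packages offer several initialisation options.

```python
def exponential_smoothing(values, alpha):
    """Exponentially weighted moving average.

    alpha (0 < alpha <= 1) is the weight of the newest observation;
    older observations receive geometrically decreasing weights.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    smoothed = [values[0]]  # assumption: start from the first observation
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical series
sales = [10, 20, 30]
print(exponential_smoothing(sales, 0.5))  # [10, 15.0, 22.5]
```

A higher alpha makes the smoothed series follow the latest data more closely; a lower alpha gives more weight to history.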
Evaluation of relative increases, values
The development of a time series should also be
examined in context, that is, by relating the size of
the change in absolute terms to its base. We can
use either relative increases or
their adapted form, chain indexes.
Relative increases are absolute differences
related to the value of the preceding period.
Chain indexes express the ratio of the values of two
adjacent periods (relative increase + 1).
A constant chain index greater than 1 means
that the series grows progressively, with ever
larger absolute increases.
The relative development of a time series is
evaluated by the average growth rate =
geometric mean of the chain indexes, or the (n-1)-th
root of the base index.
Base index = ratio of the value of the last period of
the given time series to the value of the first period.
Relative measures are dimensionless. They are
expressed as a plain ratio or in percent.
Their advantage is comparability among indicators
measured in different units.
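The relations among these measures can be checked numerically. The sketch below uses hypothetical data and function names; it assumes that a relative increase is related to the preceding period, so that chain index = relative increase + 1.

```python
import math


def relative_increases(values):
    """Absolute difference divided by the value of the preceding period."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def chain_indexes(values):
    """Ratio of each value to the value of the preceding period."""
    return [cur / prev for prev, cur in zip(values, values[1:])]


def base_index(values):
    """Ratio of the last value to the first value."""
    return values[-1] / values[0]


def average_growth_rate(values):
    """(n-1)-th root of the base index."""
    return base_index(values) ** (1 / (len(values) - 1))


# Hypothetical yearly sales
sales = [100, 120, 150]

chains = chain_indexes(sales)              # ratios 120/100 and 150/120
geometric_mean = math.prod(chains) ** (1 / len(chains))

print(relative_increases(sales))
print(chains)
print(base_index(sales))                   # 1.5
print(average_growth_rate(sales), geometric_mean)  # the two agree
```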
- Regression analysis (smoothing) of time
series
Smoothing of a time series most often
means the use of regression analysis. Using the
method of least squares we try to fit a curve
that best expresses the past movement of the
observed quantity over time.
The most frequently mentioned functions for time
series smoothing:
linear: Y = a + bt
quadratic: Y = a + bt + ct^2
cubic: Y = a + bt + ct^2 + dt^3
hyperbolic: Y = a/t
exponential: Y = ab^t
The choice of a specific type should be justified by
the accuracy with which it follows the development
of the observed values over time.
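As an illustration of least-squares fitting, the sketch below fits the linear trend Y = a + bt with the closed-form formulas, and the exponential trend Y = ab^t by fitting a line to the logarithms of the values. The time index t = 1, 2, ..., n and the data are assumptions; fitting on the log scale is one common shortcut, not the only way to estimate an exponential trend.

```python
import math


def fit_linear_trend(values):
    """Least-squares estimates (a, b) of Y = a + b*t for t = 1..n."""
    n = len(values)
    times = range(1, n + 1)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    s_tt = sum((t - mean_t) ** 2 for t in times)
    b = s_ty / s_tt
    a = mean_y - b * mean_t
    return a, b


def fit_exponential_trend(values):
    """Estimates (a, b) of Y = a * b**t via a linear fit to ln(Y)."""
    log_a, log_b = fit_linear_trend([math.log(y) for y in values])
    return math.exp(log_a), math.exp(log_b)


# Hypothetical series that lies exactly on Y = 1 + 2t
a, b = fit_linear_trend([3, 5, 7, 9])
print(a, b)

# Hypothetical series that lies exactly on Y = 2 * 1.5**t
exp_a, exp_b = fit_exponential_trend([2 * 1.5 ** t for t in range(1, 5)])
print(exp_a, exp_b)
```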
In the analysis of time series, functions describing
the equipment of households have a specific position
in marketing research, for example the Gompertz
function and the best known logistic function (both
belong to the so-called limit functions).
Form of the logistic function: Y = k / (1 + b e^{-at})
The parameter “k” is the upper limit of saturation.
The critical point is the inflection point, the
transition from progressive to regressive growth.
The course of the logistic curve is interpreted in
terms of the possible phases of the equipment process
(from the phase of initial equipment, through the
phases of accelerating growth, fast growth and slow
growth, to the phase of saturation).
With the logistic function we try to capture the
market situation in terms of its saturation,
primarily for so-called durable consumer goods
(PDS).
We mostly do not work with the absolute number
of items in households, but with the so-called degree
of equipment (the number of households that own any
number of the given PDS divided by the number of all
households).
In addition to this measure, the so-called extent of
equipment is useful: the number of all items of the
observed type of PDS divided by the number of all
households.
In addition to the degree and extent of equipment we
track the age of the items, their structure, the
replacement cycle, or the average price.
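The difference between the two indicators is easiest to see on a small example. The household data below are hypothetical, as are the function names.

```python
def degree_of_equipment(items_per_household):
    """Share of households owning at least one item of the given PDS."""
    owners = sum(1 for items in items_per_household if items > 0)
    return owners / len(items_per_household)


def extent_of_equipment(items_per_household):
    """Number of all items of the given PDS per household."""
    return sum(items_per_household) / len(items_per_household)


# Hypothetical sample: number of TV sets owned by each of 10 households
tv_sets = [0, 1, 1, 2, 0, 3, 1, 1, 0, 1]

print(degree_of_equipment(tv_sets))  # 0.7: 7 of 10 households own a TV set
print(extent_of_equipment(tv_sets))  # 1.0: 10 TV sets per 10 households
```

The extent can exceed the degree because households with several items are counted with all of them.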
- Cross-sectional analysis
We try to analyse the structure of the observed
phenomenon from chosen points of view.
Cross-sectional analysis can be based, for
example, on:
• characteristics of consumers (see segmentation
criteria):
- traditional criteria (demographic, geographic),
- behavioural criteria (for example the structure of
expenditure in family budget statistics),
• structure of distributors and their offers,
• structure of the product mix (own, competitors'),
• structure of the communication mix,
• price structure.
When examining structure we also try to capture the
degree of concentration or, conversely, dispersion of
the values of the observed variable. This is the
purpose of measures of variability (variance,
standard deviation, coefficient of variation).
There are areas of market research and
consumer behaviour where these methods are
used (the relationship between demand and income,
price elasticity, …).
One specific method of generalising the relationship
between income and demand is the so-called
Törnquist curves.
These are three functional relationships, each
characterising a certain group of goods:
• goods of indispensable character,
• goods of dispensable character,
• goods of luxury character.
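For orientation, the three Törnquist curves are commonly written in the following form. The notation is an assumption, not taken from the slides: y is demand (expenditure on the good), x is income, and a, b, c are positive parameters estimated from data.

```latex
\begin{align*}
\text{indispensable goods:} \quad & y = \frac{a\,x}{x + b} \\
\text{dispensable goods:}   \quad & y = \frac{a\,(x - c)}{x + b}, \qquad x > c \\
\text{luxury goods:}        \quad & y = \frac{a\,x\,(x - c)}{x + b}, \qquad x > c
\end{align*}
```

In the first two forms demand approaches the saturation level a as income grows; in the second and third, demand only appears once income exceeds the threshold c.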
- Combination of cross-sectional and time
analysis
The development of structure over time is analysed.
- One specific approach is the so-called cohort
analysis.
A cohort in the demographic sense is a social
category of individuals who experienced the same
event at the same time (or in the same time interval).
The most commonly used cohorts are those where the
event is birth.
The purpose of using cohorts in the study of
consumer behaviour is explained by the following
hypothetical example.
We know the numbers of people in the age
groups 20-29, 30-39 and 40-49 years. We also
know their expenditure on books. Tracking
cohorts by date of birth makes it possible to show
the relations between the size of the
cohorts or groups (that is, segments defined by age,
because a cohort is de facto a possible
market segment), their movement over time and
their purchase behaviour (towards books).
The most striking illustration of cohort movement
is the passage of the “baby boomers” generation
(children of the post-war birth-rate boom) through
the age categories.
Correlation and regression analysis
The natural form of combining time and
cross-sectional analysis is the use of correlation and
regression analysis for two or more phenomena over time.
Content analysis
By its character, the method known as content
analysis lies on the border between the analysis of
secondary and primary data, and also between the
phases of collecting and processing data.
It is an objective and quantitative
analysis of any kind of communication.
Content analysis is based on:
• a decision about the type of investigated
communications and media,
• a decision about the recorded elements:
- a decision about the recording units (the content
items relevant to the given problem),
- a decision about the contextual units (the
conditions in which the recording units occur; when
content analysis is adapted for marketing, these are
the characteristics of the advertised product, its
type, category and brand, and the media
characteristics),
- a decision about the categories (the possible forms
of a given unit),
• recording the occurrence of the elements in a
database,
• the analysis of the database itself.
Preparation and analysis of primary data
Before the obtained data can be analysed,
they must be transferred into an appropriate form.
This means editing the individual records and forms
(questionnaires), assigning codes, tabulating and
entering the data into a database.
- Editing
The purpose of editing is to check the completeness
and legibility of answers and their consistency:
• continuous editing in the field, during the
interviewer's own data collection,
• central editing, when the forms and questionnaires
are received by the research agency.
- Coding
• determination of categories, classes, groups (in
the case of open questions),
• assignment of codes (preferably numeric) to the
classes of answers.
- Tabulating and entering
Tabulating means designing the database structure.
Entering means transferring the individual data into
the (computer) database of the research.
When the database structure is created it is necessary
to decide on the width of intervals for the observed
variables, on the number of entries (columns) for
individual questions in the database, and on the
categories to be entered.
* Basic directions of analysis
After the data have been entered and checked we can
carry out the analysis of the research results itself:
• summarisation,
• analysis of differences,
• analysis of dependence.
* Summarisation
Also referred to as first-level analysis: a general
evaluation of individual questions, recorded items
etc.
The main measures used are:
• Frequencies, shares (most often in %)
• Basic measures of central tendency:
- mode = the most frequent category,
- median = the value in the middle of the
ascending order of categories (usable for
ordinal scales and higher),
- mean: usable only for interval and ratio
scales (sometimes also computed for
ordinal scales, with a more problematic
interpretation).
• Measures of variability: range,
variance, standard deviation, coefficient of
variation (if the scales used allow it).
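A first-level summary of one question can be sketched with the Python standard library. The answers are hypothetical ratings on a 1 to 5 scale, and the sample (n-1) variance is used here as an assumption; whether the mean and variance are meaningful depends on the type of scale, as noted above.

```python
import statistics
from collections import Counter


def summarise(values):
    """Frequencies, shares, central tendency and variability of one variable."""
    counts = Counter(values)
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "frequencies": dict(counts),
        "shares_percent": {k: 100 * v / n for k, v in counts.items()},
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": mean,
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "stdev": stdev,
        "coefficient_of_variation": stdev / mean,
    }


# Hypothetical answers of 9 respondents on a 1-5 scale
answers = [1, 2, 2, 3, 3, 3, 4, 4, 5]
summary = summarise(answers)
for name, value in summary.items():
    print(name, value)
```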
- Types of scales
• Nominal (also categorical) scales: categories of
objects (man, woman); nominal scales allow
only the mode.
• Ordinal: we are able to determine the
order of values (for example: very fast, fast,
slow, very slow).
• Interval: unlike ordinal scales, the distances
between values are known, but there is
no natural zero, only an arbitrary one.
• Ratio: the most suitable from the point of view of
quantification (age, weight, …).
Nominal and ordinal scales are sometimes
referred to as non-metric (non-metric data),
interval and ratio scales as metric (metric data).
These are the basic types of scales; in practice,
specific modifications are developed from them.
In marketing, for example, a series of scales in the
form of the so-called semantic differential is used
successfully. It is based on Osgood's finding
that each concept can be characterised by its
features with a different intensity,
typical for that concept.
* Analysis of differences in primary
data
Analytical procedures for the analysis of differences
and the analysis of relationships can be partly
classified by a few criteria; meeting them
makes it possible to use a particular
technique.
These criteria are:
• number of samples
• independence of the samples from each
other (yes, no)
• number of variables:
univariate techniques
multivariate techniques
• assumptions of the technique or test:
type of distribution
knowledge of the variance in the population
Primary data come from sample surveys.
Two groups of tests are used for this purpose:
non-parametric tests and parametric tests
(non-parametric tests are less demanding in their
assumptions).
- Non-parametric tests
Chi-square (χ²) goodness-of-fit test.
Purpose: do the observed frequencies differ
statistically significantly from the expected values
(characterising the conditions in the population)?
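The test statistic can be sketched as below. The observed counts and the population shares are hypothetical; the critical value 5.991 is the tabulated chi-square value for 2 degrees of freedom at the 5 % significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical sample of 100 respondents in three income groups
observed = [30, 50, 20]
# Hypothetical shares of the same groups in the population
population_shares = [0.25, 0.50, 0.25]
expected = [share * sum(observed) for share in population_shares]

chi2 = chi_square_statistic(observed, expected)
critical_value = 5.991  # chi-square, df = 3 - 1 = 2, alpha = 0.05

print(chi2)                   # 2.0
print(chi2 > critical_value)  # False: the difference is not significant
```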
McNemar test: for two dependent samples (pre-test,
post-test).
It is a modified goodness-of-fit test.
Example: We have to evaluate the effectiveness of an
advertising campaign for chocolate Milka Nestlé.
We have at our disposal records of chocolate
purchases before and after the campaign, obtained by
interviewing.
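For such a before/after design the McNemar statistic uses only the respondents who changed their behaviour. The counts below are hypothetical, and the sketch uses the basic form of the statistic without continuity correction; 3.841 is the tabulated chi-square value for 1 degree of freedom at the 5 % level.

```python
def mcnemar_statistic(gained, lost):
    """McNemar chi-square statistic (no continuity correction).

    gained: respondents who did not buy before but bought after
    lost:   respondents who bought before but did not buy after
    """
    return (gained - lost) ** 2 / (gained + lost)


# Hypothetical counts of respondents who changed their behaviour
gained, lost = 30, 10

statistic = mcnemar_statistic(gained, lost)
critical_value = 3.841  # chi-square, df = 1, alpha = 0.05

print(statistic)                   # 10.0
print(statistic > critical_value)  # True: the change is significant
```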
Kolmogorov-Smirnov test: for ordinal data in
questionnaires.
Used, among other things, to evaluate preferences,
utilities etc.
Example: A bicycle producer wants to know whether
there is a stronger preference for darker shades.
The following data were obtained from a sample of 100
persons: 35 preferred black, 25 dark, 20 neutral,
10 light and 10 very light.
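The example can be worked through as follows. The sketch assumes a null hypothesis of no preference (each shade chosen by 20 % of the population) and uses the usual large-sample approximation 1.36 / sqrt(n) for the critical value at the 5 % level; both are assumptions not stated on the slide.

```python
import math
from itertools import accumulate


def ks_statistic(observed_counts, expected_shares):
    """Largest absolute difference between the observed and the
    expected cumulative distribution over the ordered categories."""
    n = sum(observed_counts)
    observed_cum = accumulate(count / n for count in observed_counts)
    expected_cum = accumulate(expected_shares)
    return max(abs(o - e) for o, e in zip(observed_cum, expected_cum))


# Data from the example: black, dark, neutral, light, very light
observed = [35, 25, 20, 10, 10]
# Assumption: under the null hypothesis every shade is equally preferred
expected = [0.2, 0.2, 0.2, 0.2, 0.2]

d = ks_statistic(observed, expected)
critical_value = 1.36 / math.sqrt(sum(observed))  # large-sample, alpha = 0.05

print(round(d, 3), round(critical_value, 3))
print(d > critical_value)
```

The largest difference between the cumulative shares is about 0.20, which exceeds the critical value of 0.136, so under these assumptions the hypothesis of equal preference would be rejected.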
- Parametric tests
• a group of tests used primarily with interval
and ratio scales,
• based on the assumption that the data have a
normal distribution.
* Measurement of association – analysis of
relationships
- Cross tables
The basic tools for investigating relationships
between primary data are cross tables.
* Contingency tables
Used for phenomena with more than two possible
categories. All measures by which we assess the
joint occurrence of two phenomena are based on
comparing the observed and the expected (in the
sense of proportional distribution) frequencies.
The degree of joint occurrence can be compared at
different levels. A verbal interpretation of the data
in the contingency table is also possible.
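The comparison of observed and expected frequencies can be sketched as follows. The table is hypothetical (two age groups by three preferred brands); expected frequencies are computed as row total times column total divided by the grand total, and the chi-square statistic is one of the measures built on this comparison.

```python
def expected_frequencies(table):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Chi-square statistic comparing observed and expected counts."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )


# Hypothetical contingency table: rows = age groups, columns = brands
table = [
    [20, 30, 10],
    [10, 30, 20],
]

print(expected_frequencies(table))  # [[15.0, 30.0, 15.0], [15.0, 30.0, 15.0]]
print(chi_square(table))            # compare with 5.991 (df = 2, alpha = 0.05)
```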
- Analysis of variance
The procedure examines the relationship between
phenomena on the basis of the relation between
within-group, between-group and total variance.
Analysis of variance uses the F-statistic:
F = explained (between-group) variance / unexplained (within-group) variance
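A one-way analysis of variance can be sketched as below. The groups are hypothetical, and the variances in the F ratio are taken as mean squares, that is, sums of squares divided by their degrees of freedom (k - 1 between groups, n - k within groups).

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, F) for a list of groups."""
    all_values = [value for group in groups for value in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(group) / len(group) for group in groups]

    # Explained variation: group means around the grand mean
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Unexplained variation: values around their own group mean
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, means)
        for value in group
    )
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, f


# Hypothetical purchases per customer in three stores
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_between, ss_within, f = one_way_anova(groups)
print(ss_between, ss_within, f)
```

The between-group and within-group sums of squares add up to the total sum of squares, which is the decomposition the method relies on.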
- Multivariate analysis
Used to examine the relationships among
several variables simultaneously.
Classification of multivariate techniques:
• Techniques that examine dependence on other
variables. For example:
- for interval and ratio scales: multiple regression
and correlation analysis
- for nominal and ordinal scales: discriminant
analysis, regression analysis with binary
variables
• Techniques where dependent and independent
variables are not clearly distinguished. For example:
factor analysis, cluster analysis, conjoint analysis,
multidimensional scaling (if it is based, for example,
on factor analysis or probability models)
* Nature of selected multivariate techniques
- Use of binary variables in multiple regression
analysis (for nominal scales):
Y = a + b_1 x_1 + b_2 x_2 + … + b_k x_k
Basic idea: qualitative variables are replaced by
binary ones. The number of binary variables is one
less than the number of categories of the scale. If we
examine, for example, the influence of sex, one
binary variable is enough:
x_1 = 0 … man
x_1 = 1 … woman
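The substitution of a qualitative variable by binary variables can be sketched as follows. The function name and the choice of the first category as the reference (coded with all zeros) are assumptions made for the illustration.

```python
def dummy_code(values, categories):
    """Code a nominal variable into len(categories) - 1 binary variables.

    The first category is the reference and is coded with all zeros.
    """
    _reference, *others = categories
    return [
        [1 if value == category else 0 for category in others]
        for value in values
    ]


# Two categories need one binary variable
print(dummy_code(["man", "woman", "woman"], ["man", "woman"]))
# [[0], [1], [1]]

# Three categories (hypothetical residence types) need two binary variables
print(dummy_code(["city", "town", "village"], ["city", "town", "village"]))
# [[0, 0], [1, 0], [0, 1]]
```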
- Factor analysis
The purpose of factor analysis is:
• to find deeper underlying “factors”, broader than
the individual criteria influencing the phenomenon,
• to reduce the number of variables or criteria
(criteria that act similarly enter the same factor).
A factor is understood as a variable that is not
directly observed. It is the underlying influence that
has to be uncovered.
Example: we examine which benefits are associated
with dish-washing products (we examine the
relationships between the benefits, i.e. the variables).
5 observed criteria (variables) of expected benefits:
k1 – price, k2 – effectiveness, k3 – shine,
k4 – aroma, k5 – colour
Correlation matrix: the input to factor analysis;
we examine which correlations (correlation
coefficients) all pairs of variables reach in the
respondents' answers.
Note: if the matrix contains only low values, factor
analysis has no practical sense.
In evaluating dish-washing products by the given
criteria, the respondents' answers varied along two
dimensions, or factors. The criteria of price,
effectiveness and shine behaved similarly (rationality
factor). The criteria of aroma and colour form mainly
the second factor.
Some important characteristics:
• factor loadings: the correlation between a factor
and a variable
• factor score: the result of each respondent on each
factor
• communality: the share of a variable's variance
explained by the common factors
• explained variance: how much of the total
variance of all variables a given factor explains
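The correlation matrix that enters factor analysis can be sketched as follows. The ratings are invented for the illustration (six hypothetical respondents rating the five criteria on a 1 to 5 scale); the code only computes the input matrix, not the factor extraction itself.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(variables):
    """Correlations of all pairs of variables, keyed by (name, name)."""
    return {
        (p, q): pearson(variables[p], variables[q])
        for p in variables
        for q in variables
    }


# Hypothetical ratings by six respondents
ratings = {
    "price":         [5, 4, 5, 2, 1, 2],
    "effectiveness": [5, 5, 4, 2, 2, 1],
    "shine":         [4, 5, 5, 1, 2, 2],
    "aroma":         [1, 5, 2, 4, 1, 5],
    "colour":        [2, 5, 1, 5, 1, 4],
}

matrix = correlation_matrix(ratings)
for p in ratings:
    print(p.ljust(14), " ".join(f"{matrix[p, q]:6.2f}" for q in ratings))
```

In this invented data set price, effectiveness and shine correlate strongly with each other, and so do aroma and colour, while correlations across the two groups are weak; this is the pattern that would lead to two factors.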
- Cluster analysis
The purpose of cluster analysis is to find clusters of
objects according to several variables used
simultaneously. It assumes at least ordinal scales, or
conversion to binary variables (yes/no …
marked/not marked).
The basic idea of cluster analysis is to use the
distance between objects on the individual criteria.
The clustering of objects itself can use different
methods; the basic ones are:
hierarchical clustering (from the top, from the most
distant to the closest, or from the bottom, from the
closest to the most distant), k-means, FQ analysis
(it is the objects, not the variables, that are
“factored”).
The number of clusters chosen for the purpose of
uncovering market segments depends on the
strategic marketing task: how strong a homogeneity
within a segment we require, or how loose a
segment, with low internal homogeneity, we
allow.
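The role of distance can be illustrated with a small sketch that finds the two closest objects, which is the first merge in bottom-up hierarchical clustering. The respondents, their scores and the use of Euclidean distance are assumptions for the illustration.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two objects described by the same criteria."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_pair(objects):
    """Names of the two objects with the smallest mutual distance."""
    return min(
        combinations(objects, 2),
        key=lambda pair: euclidean(objects[pair[0]], objects[pair[1]]),
    )


# Hypothetical respondents scored on two criteria
respondents = {
    "A": (1.0, 1.0),
    "B": (1.0, 2.0),
    "C": (5.0, 5.0),
    "D": (6.5, 5.0),
}

print(closest_pair(respondents))                      # ('A', 'B')
print(euclidean(respondents["A"], respondents["C"]))  # distance across clusters
```

Repeating the merge step, with a rule for the distance between already merged groups, yields the whole hierarchy; where to stop corresponds to the choice of the number of clusters discussed above.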
- Discriminant analysis
Used where several variables of nominal or ordinal
character act simultaneously, to find out
what differentiates different groups of
consumers.
The dependent variable is membership in a group.
The purpose of discriminant analysis is
to find the total effect on differentiating group
membership and to find which variables have the
greatest influence.
Technically it is the use of a regression relation for
nominal and ordinal scales.
Example for two variables:
Y = v_1 X_1 + v_2 X_2
Y … frequency of reading the magazine
X_1 … character of residence
X_2 … education
Y = 0.05 X_1 + 0.1 X_2
Education has twice the discriminating
weight of residence for classification among the
readers of the given magazine.
Discriminant analysis is also used to extend the
profile of market segments.
- Conjoint analysis
Conjoint analysis investigates preferences for
particular combinations of attributes. The input data
are based on how respondents evaluate different
combinations of product attributes.
The procedure is iterative. It is used
mainly to determine suitable attributes of a
product.
- Multidimensional scaling
The purpose of multidimensional scaling is to find out
how an object is perceived in a multidimensional
space.
The procedure has two phases: determining the
dimensions of the space and placing the observed
objects in these dimensions.
Methods of multidimensional scaling are based on
perception:
• attribute-based:
- using factor analysis
- using discriminant analysis
• not attribute-based:
- on the basis of perceived similarity
- on the basis of preferences.
In consumer marketing, multidimensional scaling is
used, for example, to study brand image, ideal
objects and market segmentation.
Ways of data mining
- Data mining – introduction and meaning
Recent years have seen a rapid expansion of
information technologies. The performance of
computers grows steadily and the capabilities of
systems keep expanding.
The wide spread of information systems makes it
possible to collect, process and store enormous
amounts of data.
Companies and institutions in different sectors
store in these systems data on different aspects of
their activities: production companies record data
ranging from stock items, through production
information, to data about sales, customers and
accounting; health organisations collect
extensive data about the health of the population;
financial institutions keep detailed data about their
clients, business transactions, etc.
On the other hand, the development of information
technologies offers a large number of new ways to
analyse data. Current computers can quickly run
demanding algorithms for data analysis and for the
presentation of the results of these analyses.
Information is at present the most valuable
business commodity. One way to use these systems
effectively and to obtain important,
valuable, interesting and new information is data
mining.
There are many definitions of data mining, depending
on the author. A very apt one is, for example:
“Data mining is the analysis of (often large)
observational data with the aim of finding
unsuspected relationships and summarising the data
in new ways that are understandable and useful
to their owners.”
This definition contains a few basic parts that
demonstrate the meaning and the biggest advantages
of data mining.
The definition talks about the “analysis of large data”.
The task of data mining is to find interesting
dependences hidden in these large data sets (for
example, which types of clients have more problems
with payment than other types of clients) and so to
provide a view into the data that brings the maximum
amount of useful information.
In data mining it is possible to use methods in which
potentially interesting hypotheses are specified only
in general terms, as a set of potentially
interesting hypotheses. Using a special algorithm,
the data mining system itself generates and
formulates all hypotheses from the given set,
automatically verifies their validity on the data, and
outputs a list of the hypotheses that hold (are
hidden) in the data, supplemented with statistical
characteristics.
In a relatively short time it is possible to define and
test automatically hundreds or thousands of
hypotheses and to find “unsuspected”, surprising
dependences hidden in the data.
Another essential part of the above definition is
“…understandable and useful to their
owners.”
The task of data mining is not only to find information
in large data sets, but also to provide the information
found in a form that is understandable and useful for
the owner of the data. The results of data analysis
have to be interpreted properly, so that the
conclusions are properly understood and can be used
correctly by the person who ordered the analysis.
* CRISP-DM methodology
Data mining is not only a method of data analysis;
it is a process that consists of many partial phases.
Solving a data mining project requires carrying out
a whole set of tasks and operations.
The idea of creating the CRISP-DM methodology arose
in 1996 as a result of cooperation among people
from the firms Daimler-Benz, SPSS and NCR.
In 1997 the CRISP-DM consortium was founded,
financed and supported by the European
Commission. The name CRISP-DM is an acronym
of the words Cross Industry Standard
Process for Data Mining.
Its aim is to model the process of knowledge
extraction from databases as a standard universal
process, independent of the industry in which
the data mining analysis is performed.
CRISP-DM models the life cycle of every data
mining project in 6 phases:
Business understanding, Data understanding,
Data preparation, Modelling, Evaluation, Deployment
The purpose of the business understanding phase is
to define the problem that will be solved with the help
of data mining and to determine the targets and tasks
that the project should fulfil.
The task of the data understanding phase is to collect
the input data for the data analysis that leads
to the fulfilment of the targets determined in the
preliminary phase of the project.
In the data preparation phase all input data are
transformed into the form necessary for the
modelling itself. This phase of a data mining project
is usually the most time-consuming.
The next phase of a data mining project is the
modelling itself: the selection of suitable methods and
algorithms for obtaining knowledge and the creation
of models with the help of these algorithms.
The results obtained in the modelling phase have to
be interpreted; this is the phase of model
evaluation. It is necessary to assess whether the
results found are correct and contribute to the
fulfilment of the targets defined in the introductory
phase of the whole project.
Only verified models obtained in the previous
phases can be used in practice, in the
deployment phase.
• Data mining in market research
Data mining can be used to analyse data from any
area, provided a sufficient amount of data is
available.
In market research, especially in quantitative
research, large amounts of data are collected.
Research is often carried out in periodically
repeated waves and a large number of respondents is
interviewed.
The advantage of data obtained in typical
quantitative research is the form and format of the
data itself.
For data mining this greatly reduces the work in the
data pre-processing phase: it is not necessary to
transform data from different time periods into a
common format. All collected data are in the same
format and are therefore fully comparable.
Data obtained in quantitative market research
are suitable for data mining methods
if a sufficient amount of data is available
in which hidden, potentially interesting and useful
information can be sought.
Data mining is very fashionable today and appears in
the offers of institutions dealing with data analysis
and of market research agencies.