CERAM
February-March-April 2008
Quantitative Methods
For Social Sciences
Lionel Nesta
Observatoire Français des Conjonctures Economiques
Objective of The Course
 The objective of the class is to provide students with a set
of techniques to analyze quantitative data. It concerns the
application of quantitative and statistical approaches as
developed in the social sciences, for future decision
makers, policy makers, stakeholders, managers, etc.
 All courses are computer-based classes using the SPSS
statistical package. The objective is to reach levels of
competence which provide the student with skills to both
read and understand the work of others and to carry out
one's own research.
 Class Password: stmarec123
Examples
 Rise in biotechnology
 Should the EU fund fundamental research in biotechnology?
 Has biotechnology increased the productivity of firm-level R&D?
 Did it increase the speed of discovery in pharmaceutical R&D?
 Increasing university-industry collaborations
 Does it facilitate innovation by firms?
 Does it increase the production of new knowledge by academics?
 Does it modify the fundamental/applied nature of research?
Examples
 Economic (productivity) Growth
 Does it come mainly from new firms or improving existing firms?
 Is market selection operating correctly?
 Why do good firms exit the market?
 How does the organisation of knowledge impact on performance?
 How do knowledge stock and specialisation impact on productivity?
 How do firms enter into new technological fields?
 Do firms diversify in new technologies/businesses purposively?
Structure of the Class
 Class 1 : Descriptive Statistics
 Mean, variance, standard deviation
 Data management
 Class 2 : Statistical Inference
 Distributions
 Comparison of means
 Class 3 : Relationship Between Variables
 ANOVA, Chi-Square
 Correlation
 Class 4 : Ordinary Least Squares (OLS)
 Correlation coefficient, simple regression
 Multiple regression
 Class 5 : Extension to OLS
 Regression diagnostics
 Qualitative explanatory variables
 Class 6 : Qualitative Dependent Variables
 Linear probability model
 Maximum likelihood (logit, probit)
Class 1
Descriptive Statistics
Types of Data
Descriptive statistics is the branch of statistics which gathers all
techniques used to describe and summarize quantitative and
qualitative data.
Quantitative data
 Continuous
 Measured on a scale (the value lies within a range)
 The size of the number reflects the amount of the variable
 Age; wage, sales; height, weight; GDP
Qualitative data
 Discrete, categorical
 The number reflects the category of the variable
 Type of work; gender; nationality
Descriptive Statistics
Any tool that summarizes the data concisely is useful: graphs;
charts; tables.
Quantitative data
 Graphs: scatter plots; line plots; histograms
 Central tendency
 Dispersion
Qualitative data
 Graphs: pie graphs; histograms
 Tables, frequency, percentage, cumulative percentage
 Cross tables
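For qualitative data, a frequency table with percentages and cumulative percentages can be sketched in a few lines. The example below uses Python's standard library rather than SPSS, and the list of firm types is made up for illustration.

```python
from collections import Counter

# Hypothetical firm types (illustrative values, not the course data).
types = ["LDF", "DBF", "LDF", "LDF", "DBF", "LDF", "LDF", "DBF"]

counts = Counter(types)   # frequency of each category
total = len(types)

cumulative = 0.0
table = []                # rows: (category, frequency, percentage, cumulative percentage)
for category in sorted(counts):
    pct = 100 * counts[category] / total
    cumulative += pct
    table.append((category, counts[category], pct, cumulative))

for row in table:
    print("%-4s %3d %6.1f %6.1f" % row)
```

A cross table follows the same idea, counting pairs of categories instead of single categories.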
Central Tendency and Dispersion
 A distribution is an ordered set of numbers showing how many
times each occurred, from the lowest to the highest number or the
reverse
 Central tendency: measures of the value around which the scores
of a distribution are clustered
 Dispersion: measures of how much the scores fluctuate around that
central value
 In other words, measures of central tendency produce stylized
facts, whereas measures of dispersion indicate how representative
a given stylized fact is.
Central Tendency
 The mode
 The most frequent score in a distribution is
called the mode.
 The median
 The middle value of all observed values: 50% of
observed values are higher and 50% of
observed values are lower than the median
 The mean
 The sum of all of the values divided by the
number of values
1 i n
X   xi
N i 1
The mode, the mean and the median are equal if the distribution is symmetrical and unimodal.
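The three measures of central tendency can be compared on a small example. The sketch below uses Python's standard `statistics` module rather than SPSS, with made-up patent counts; the single large value pulls the mean well above the median and the mode.

```python
import statistics

# Hypothetical patent counts for seven firms (illustrative values only).
patents = [0, 2, 2, 3, 5, 9, 42]

mode = statistics.mode(patents)      # most frequent value
median = statistics.median(patents)  # middle value of the sorted data
mean = statistics.mean(patents)      # sum of the values divided by their number

print(mode, median, mean)
```

In a skewed distribution like this one, the three measures disagree, which is why it is useful to report more than one of them.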
Dispersion
 The range
 Difference between the maximum and
minimum values
$$R = x_{\max} - x_{\min}$$
 The variance
 Average of the squared differences between
data points and the mean (average
quadratic deviation)
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{X})^2}{N}$$
 The standard deviation
 Square root of variance, therefore measures
the spread of data about the mean,
measured in the same units as the data
i n
  
2
 x
i 1
i
X
N

2
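The range, variance and standard deviation can be computed directly from their definitions. The sketch below is in Python rather than SPSS and uses made-up patent counts. It divides by N, as in the formulas of this slide; many statistical packages report the N-1 (sample) version instead, which is shown for comparison.

```python
import math
import statistics

# Hypothetical patent counts for seven firms (illustrative values only).
patents = [0, 2, 2, 3, 5, 9, 42]
n = len(patents)
mean = sum(patents) / n

value_range = max(patents) - min(patents)               # R = x_max - x_min
variance = sum((x - mean) ** 2 for x in patents) / n    # average squared deviation (divides by N)
std_dev = math.sqrt(variance)                           # same units as the data

# The N-1 (sample) variance, for comparison with the N version above.
sample_variance = statistics.variance(patents)

print(value_range, round(variance, 3), round(std_dev, 3), sample_variance)
```

The difference between the two variances shrinks as N grows, so it matters mostly in small samples.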
Research Productivity in the
Bio-pharmaceutical Industry
EU Framework Programme 7
Stylised Facts about Modern Biotech
1. Innovations emerge from uncertain, complex processes
involving knowledge and markets: Roles of networks.
2. Economic value is created in many ways, globally and
in geographical agglomerations
3. Various linkages exist among diverse actors (LDFs,
DBFs, Univ, Venture Capital) in innovation processes,
but the firm plays a particularly important role.
4. Regulations, social structures and institutions affect ongoing
innovation processes as well as their impacts on
society: Importance of IPR.
SPSS
Statistical Package for the Social Sciences
The SPSS software
 Statistical Package for the Social Sciences (1968)
 Among the most widely used programs for statistical analysis
in social sciences.
 Market researchers, health researchers, survey companies,
government, education researchers, and others.
 Data management (case selection, file reshaping, creating
derived data)
 Features of SPSS are accessible via pull-down menus
 The pull-down menu interface generates command syntax.
SPSS : Opening SPSS
SPSS : Importing data
 Settings in the “import text” dialogue box
 No predefined format (1)
 Delimited (2)
 First line contains the variable names (2)
 One observation per line // all observations (3)
 Tab delimited only (4)
 Finish (6)
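The same import settings (tab delimited, variable names on the first line, one observation per line) can be illustrated outside SPSS. The sketch below reads a small tab-delimited text with Python's `csv` module; the column `firm` and all values are invented for the example.

```python
import csv
import io

# Stand-in for a tab-delimited text file. The "firm" column and the
# values are hypothetical; "type" and "patent" follow the course variables.
raw = "firm\ttype\tpatent\nA\tbiotechnologie\t3\nB\tpharmaceutique\t12\n"

# delimiter="\t": tab delimited only; the first line gives the variable names.
reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
rows = list(reader)   # one dictionary per observation (one observation per line)

for row in rows:
    print(row["firm"], row["type"], row["patent"])
```

Values are read as text, so numeric columns such as `patent` still need to be converted before any computation.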
SPSS windows
 SPSS automatically opens two windows
 The datasheet window
 Observe, manage, modify and create data
 The results window
 Everything you do will be stored there
 The syntax window can be opened
SPSS : Data sheet (1)
SPSS : Data sheet (2)
SPSS : Result / Journal
SPSS : Saving data
SPSS : working, at last!
Recoding Variables
 Changing existing values to new values (biotechnologie → DBF,
pharmaceutique → LDF)
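Recoding amounts to a lookup from old values to new values. The sketch below shows the idea in Python rather than through the SPSS menus; the list of raw values is made up, while the mapping is the one given above.

```python
# Mapping from the original labels to the new codes.
recode_map = {"biotechnologie": "DBF", "pharmaceutique": "LDF"}

# Hypothetical raw values of the type variable.
raw_types = ["biotechnologie", "pharmaceutique", "pharmaceutique"]

# Values missing from the mapping are left unchanged.
types = [recode_map.get(value, value) for value in raw_types]

print(types)
```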
Computing New Variables
 Taking logarithm (normalization of continuous variables)
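Computing a logged variable can be sketched as follows, in Python rather than SPSS. The values are made up, and `safe_log` is a hypothetical helper: the logarithm is undefined at zero, so the sketch treats non-positive values as missing, which is one possible convention among others.

```python
import math

def safe_log(x):
    """Natural log of x, or None (missing) when x is not strictly positive."""
    return math.log(x) if x > 0 else None

# Hypothetical patent counts; note the zero.
patent = [0, 1, 10, 100]
lnpatent = [safe_log(x) for x in patent]

print(lnpatent)
```

Variables that can take the value zero, such as counts, lose observations under this convention, which is worth checking before comparing means.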
Creating Dummy Variables
 Turning a category into a 0/1 variable (pharma = 1 for LDFs, biotech = 1 for DBFs)
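A dummy variable takes the value 1 when an observation belongs to a category and 0 otherwise. The sketch below builds the `pharma` and `biotech` dummies from a made-up list of firm types, in Python rather than SPSS, assuming `pharma` flags LDFs and `biotech` flags DBFs.

```python
# Hypothetical firm types (illustrative values only).
types = ["DBF", "LDF", "LDF", "DBF", "LDF"]

# Assumed coding: pharma = 1 for LDFs, biotech = 1 for DBFs.
pharma = [1 if t == "LDF" else 0 for t in types]
biotech = [1 if t == "DBF" else 0 for t in types]

# The mean of a dummy is the share of observations in that category.
share_pharma = sum(pharma) / len(pharma)

print(pharma, biotech, share_pharma)
```

Because each firm here is either an LDF or a DBF, the two dummies always sum to 1.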
Computation of Descriptive Statistics
Descriptive Statistics
Variable            N    Range        Minimum     Maximum      Mean         Std. dev.     Variance
patent              457  286          0           286          11.92        22.901        524.470
assets              457  35788473.97  4422.18     35792896.15  4358371.54   6086530.85    3.705E+013
rd                  457  1917997.980  858.53204   1918856.512  330236.630   405160.516    164155043889
spe                 457  2.0235309    -1.1298400  .8936909     -.056808610  .3374751802   .114
pharma              457  1            0           1            .63          .482          .232
biotech             457  1            0           1            .37          .482          .232
Valid N (listwise)  457
Splitting Database
Descriptive Statistics (by type)
Type  Variable            N    Range      Minimum    Maximum   Mean        Std. dev.    Variance
DBF   patent              167  202        0          202       12.11       21.066       443.764
DBF   assets              167  2442619    4422.18    2447041   342934.49   478511.938   2E+011
DBF   rd                  167  495443.5   858.53204  496302.1  58116.590   88638.5347   8E+009
DBF   spe                 167  1.7544527  -1.12984   .6246127  -.10630582  .343286812   .118
DBF   pharma              167  0          0          0         .00         .000         .000
DBF   biotech             167  0          1          1         1.00        .000         .000
DBF   Valid N (listwise)  167
LDF   patent              290  286        0          286       11.81       23.929       572.609
LDF   assets              290  4E+007     218006.47  4E+007    6670709.4   6605972.68   4E+013
LDF   rd                  290  1912600    6256.248   1918857   486940.24   432514.940   2E+011
LDF   spe                 290  1.6904465  -.7967556  .8936909  -.02830504  .331330781   .110
LDF   pharma              290  0          1          1         1.00        .000         .000
LDF   biotech             290  0          0          0         .00         .000         .000
LDF   Valid N (listwise)  290
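Splitting the database by type and then computing descriptive statistics amounts to grouping the observations and summarizing each group. The sketch below shows that logic in Python rather than SPSS, on made-up (type, patent) records; the standard deviation uses the N-1 denominator of `statistics.stdev`.

```python
import statistics
from collections import defaultdict

# Hypothetical (type, patent) records, not the course data.
records = [("DBF", 4), ("LDF", 10), ("DBF", 8), ("LDF", 2), ("LDF", 6)]

# Split the observations by firm type.
groups = defaultdict(list)
for firm_type, patent in records:
    groups[firm_type].append(patent)

# Summarize each group separately.
summary = {}
for firm_type, values in groups.items():
    summary[firm_type] = {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample standard deviation (N-1)
    }

for firm_type in sorted(summary):
    print(firm_type, summary[firm_type])
```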
Assignments
 Compute the logarithm of the quantitative variables patent, assets
and rd, and name the results lnpatent, lnassets and lnrd, respectively.
 Compute descriptive statistics for both LDFs and DBFs.
 Draw conclusions by comparing the means.
Logarithm
 Normalization
 Taking the logarithm is a transformation which usually brings a
skewed distribution closer to a normal one.
 Elasticities http://en.wikipedia.org/wiki/Elasticity_(economics)
 A small change in the log of x is approximately a relative change in x itself.
 Cobb-Douglas production function
$$\frac{\partial \log x}{\partial x} = \frac{1}{x} \;\Longrightarrow\; \partial \log x = \frac{\partial x}{x}$$
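The link between a change in log x and a relative change in x can be checked numerically. The sketch below uses made-up values in Python: for a small change the two are nearly equal, while for a large change the approximation becomes loose.

```python
import math

# Small change: x goes from 100 to 101 (illustrative values).
x0, x1 = 100.0, 101.0
relative_change = (x1 - x0) / x0             # 1 percent
log_change = math.log(x1) - math.log(x0)     # close to the relative change

# Large change: x goes from 100 to 150.
big_relative_change = (150.0 - x0) / x0      # 50 percent
big_log_change = math.log(150.0) - math.log(x0)

print(relative_change, log_change)
print(big_relative_change, big_log_change)
```

This is why differences in logs are read as approximate percentage changes only when the changes are small.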