Lecture 2 - 13-10-2015 - UTH e


UNIVERSITY OF THESSALY
FACULTY OF ENGINEERING
DEPARTMENT OF PLANNING AND REGIONAL DEVELOPMENT
MASTER «EUROPEAN REGIONAL DEVELOPMENT STUDIES»
METHODS OF SPATIAL ECONOMIC ANALYSIS
LECTURE 02
Dr. Marie-Noëlle Duquenne (Μαρί-Νοέλ Ντυκέν), Associate Professor,
[email protected]
Tel. 24210-74438
Office Γ.6
METHODS OF SPATIAL ECONOMIC ANALYSIS: STATISTICAL TREATMENT OF SPATIAL DATA, A FIRST APPROACH

OBJECTIVE OF THE LECTURE
1. Exploratory statistical analysis of regional data.
2. Familiarization with Eurostat regional data.
3. Familiarization with statistical treatment through SPSS.

Main terms used in Statistics
- Population: the complete set of data elements (e.g. census, register).
- Sample: a portion of elements selected from a reference population.
- Parameter: a measured characteristic of the whole population.
- Statistic: an estimated characteristic of the sample.
STATISTICAL TREATMENT
Types of Data
- Categorical data:
  - Non-ordinal: family status, employment status, etc. (the values have no measurement meaning).
  - Ordinal: rating-score variables (Likert scale). In this case, the scores have a measurement meaning (their order matters).
- Numeric data: the values have a clear meaning as measurements.
  - Discrete data
  - Continuous data
Most of the data used in regional analysis are numeric, allowing a cartographic visualization.
[Figure: Data Visualization, Gross Domestic Expenditure on R&D (% of GDP). Source: Eurostat, EU 2020 indicators. See LECTURE_02_DATA.xls]
STATISTICAL TREATMENT
A specific case of ordinal data: the Likert items
Originally, the Likert scale is a psychometric scale measuring the level of agreement or disagreement.
[Figure: Representation of a Likert scale, typical five-level psychometric scale]
This scale has a more general use and allows characteristics to be evaluated according to objective or subjective criteria.
[Figure: Five-level scale for the classification of regions]
The most commonly used scales have five, seven, nine and sometimes eleven levels.
Likert, R. (1932), "A Technique for the Measurement of Attitudes". Archives of Psychology 140: 1–55.
DATA FOR ANALYSIS
Presentation of Data for Statistical Treatment
See LECTURE_02_DATA.xls, which has two sheets:
1. Analytical Data
2. Data_SPSS
The second sheet has the appropriate format to open the data with SPSS, i.e.:
- The 1st row contains the variables' names.
- The following 28 rows concern the 28 countries (without the EU28 total).
- Each column concerns one variable.
Source: Eurostat, EU 2020 indicators
STATISTICAL TREATMENT
Central Parameters [01]
- Arithmetic Mean: the sum of all elements of the data set divided by the number of elements.
  $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
- Weighted Mean: the sum of the weighted scores divided by the sum of the weights.
  $\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$
- Geometric Mean: the nth root of the product of the data elements.
  $\bar{X}_g = \sqrt[n]{\prod_{i=1}^{n} X_i}$

Examples: Statistical Analysis with Excel
The two variables examined are RD_TOT04 and RD_TOT12, i.e. the total R&D expenditures as % of GDP in 2004 and 2012.

Central parameters for total R&D expenditures (% of GDP)
                    Excel                2004   2012
E.U. 28                                  1,82   2,07
Arithmetic Mean     =AVERAGE(.. , ..)    1,34   1,67
Weighted Mean                            1,56   1,83
Geometric Mean      =GEOMEAN(.. , ..)    1,08   1,41

Conclusions:
______________________________________
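As a cross-check of the Excel formulas above, a minimal Python sketch (standard library only) can reproduce the 2012 arithmetic and geometric means from the 28 RD_TOT12 values listed in the country table of this lecture. The population weights behind the weighted mean are not given here, so the weighted-mean call uses hypothetical toy numbers only to show the formula.

```python
import math

# RD_TOT12: total R&D expenditure (% of GDP) in 2012 for the 28 EU countries,
# copied from the country ranking table in this lecture.
RD_TOT12 = [0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
            1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
            2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55]


def arithmetic_mean(xs):
    """Sum of all elements divided by their number (like Excel AVERAGE)."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Sum of the weighted scores divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def geometric_mean(xs):
    """n-th root of the product of the elements (like Excel GEOMEAN).

    Computed as exp(mean(log x)) to avoid overflow; all values must be > 0.
    """
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


print(f"Arithmetic mean 2012: {arithmetic_mean(RD_TOT12):.2f}")  # 1.67
print(f"Geometric mean 2012:  {geometric_mean(RD_TOT12):.2f}")   # 1.41
# Hypothetical toy example: two regions with weights 3 and 1.
print(weighted_mean([1.0, 3.0], [3, 1]))  # 1.5
```

The geometric mean (1,41) stays below the arithmetic mean (1,67), as it always does for positive data that are not all equal.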
STATISTICAL TREATMENT
Central Parameters [02]
- Mode: the observed value that occurs most frequently, i.e. the most frequent value of the variable. The mode is not necessarily a single value.
- Median: the value of the variable (with the data arranged in order of magnitude) below which 50% of the elements fall (50% of the elements have a value lower than the median).
  Median = Arithmetic Mean when the variable follows the Laplace-Gauss distribution (Normal distribution).

Examples: Statistical Analysis with Excel
                    Excel                2004   2012
Mode                =MODE(.. , ..)       0,51   2,98
Median              =MEDIAN(.. , ..)     1,08   1,48

In 2012, while the mean is 1,67% of GDP, the median (1,48%) is noticeably smaller.
Be careful: when the mode is not a single value, the MODE command returns only one value (here 2,98, the highest of the modes). In 2012, the mode is indeed not a single value: 0,90, 1,30, 1,72 and 2,98 each occur twice. The mode is therefore of very limited interest.

Country             RD_TOT12
Cyprus              0,46
Romania             0,49
Bulgaria            0,64
Latvia              0,66
Greece              0,69
Croatia             0,75
Slovakia            0,82
Malta               0,84
Lithuania           0,90
Poland              0,90
Italy               1,27
Spain               1,30
Hungary             1,30
Luxembourg          1,46
Portugal            1,50
Ireland             1,72
United Kingdom      1,72
Czech Republic      1,88
Netherlands         2,16
Estonia             2,18
Belgium             2,24
France              2,29
Slovenia            2,80
Austria             2,84
Denmark             2,98
Germany             2,98
Sweden              3,41
Finland             3,55
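The same 2012 values can be used to check the median and to list all the modes, which a single MODE result hides. A minimal sketch with Python's standard `statistics` module (Python 3.8 or later assumed, for `multimode`); in Excel 2010 and later, MODE.MULT plays a similar role.

```python
import statistics

# RD_TOT12 values (% of GDP, 2012) from the country table above.
RD_TOT12 = [0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
            1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
            2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55]

median = statistics.median(RD_TOT12)    # mean of the 14th and 15th sorted values
modes = statistics.multimode(RD_TOT12)  # every value that reaches the top frequency
mean = statistics.fmean(RD_TOT12)

print(f"median = {median:.2f}, mean = {mean:.2f}")  # median = 1.48, mean = 1.67
print(modes)  # [0.9, 1.3, 1.72, 2.98]
```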
STATISTICAL TREATMENT
Measures of dispersion [01]
- Range: the difference between the highest and the lowest data element.
  $\text{Range} = X_{max} - X_{min}$
- Dispersion Ratio: the quotient between the highest and the lowest data element.
  $DR = \frac{X_{max}}{X_{min}}$
- Percentile (p%): the value of the variable below which p% of the elements fall. For dispersion analysis, the 5% and 95% percentiles are very useful.

Examples: Statistical Analysis with Excel
                    Excel                2004   2012
Minimum             =MIN(.. , ..)        0,37   0,46
Maximum             =MAX(.. , ..)        3,58   3,55
Range               =MAX - MIN           3,21   3,09
DR Ratio            =MAX / MIN           9,68   7,72

Conclusions:
______________________________________
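A short sketch of the range, the dispersion ratio and the 5% / 95% percentiles for the 2012 values. The percentile interpolates linearly between sorted values at position p·(n - 1), which is the convention of Excel's PERCENTILE (PERCENTILE.INC); other software may interpolate differently, so small differences between tools are possible.

```python
import math

# RD_TOT12 values (% of GDP, 2012) from the country table in this lecture.
RD_TOT12 = [0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
            1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
            2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55]


def percentile_inc(xs, p):
    """Value below which a share p (0..1) of the elements falls,
    interpolating linearly between sorted values at position p*(n-1)."""
    s = sorted(xs)
    h = p * (len(s) - 1)
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


x_min, x_max = min(RD_TOT12), max(RD_TOT12)
value_range = x_max - x_min   # Range = Xmax - Xmin
dr = x_max / x_min            # Dispersion ratio = Xmax / Xmin
p05 = percentile_inc(RD_TOT12, 0.05)
p95 = percentile_inc(RD_TOT12, 0.95)

print(f"range={value_range:.2f} DR={dr:.2f} P5={p05:.4f} P95={p95:.4f}")
# range=3.09 DR=7.72 P5=0.5425 P95=3.2595
```

Unlike the range, the 5%–95% interval is not driven by the two most extreme countries alone.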
STATISTICAL TREATMENT
Measures of dispersion [02]
- Variance: the average squared distance of each score from the mean.
  $\sigma^2 = V[X] = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$
- Weighted Variance: the weighted average squared distance of each score from the mean.
  $\sigma^2 = V[X] = \sum_{i=1}^{n} w_i (X_i - \bar{X})^2$, with $\sum_{i=1}^{n} w_i = 1$
- Standard deviation: σ = square root of the variance.
- Coefficient of Variation (CV):
  $CV = \frac{\sigma}{\bar{X}}$

Examples: Statistical Analysis with Excel
                    Excel                2004          2012
Arithmetic Mean     =AVERAGE(.. , ..)    1,34          1,67
Variance            =VAR(.. , ..)        0,809         0,880
Standard Deviation  =STDEV(.. , ..)      0,900         0,938
CV coefficient      =STDEV / AVERAGE     0,67 (67%)    0,56 (56%)

Note: Excel's VAR and STDEV divide by (n - 1) (sample estimates), whereas the variance formula above divides by n; VARP and STDEVP return the n-divisor versions.

Conclusions:
______________________________________
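To make the n versus (n - 1) distinction concrete, here is a minimal sketch computing both variances, the standard deviation and the CV for the 2012 values; the (n - 1) versions are the ones that reproduce the Excel VAR and STDEV figures in the table.

```python
import math

# RD_TOT12 values (% of GDP, 2012) from the country table in this lecture.
RD_TOT12 = [0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
            1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
            2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55]


def population_variance(xs):
    """sigma^2 = sum((X_i - mean)^2) / n, as in the slide formula (Excel VARP)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Same sum of squares divided by (n - 1): what Excel VAR returns."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


mean = sum(RD_TOT12) / len(RD_TOT12)
var_p = population_variance(RD_TOT12)
var_s = sample_variance(RD_TOT12)
sd_s = math.sqrt(var_s)   # Excel STDEV
cv = sd_s / mean          # Excel =STDEV / AVERAGE

print(f"VARP={var_p:.3f} VAR={var_s:.3f} STDEV={sd_s:.3f} CV={cv:.2f}")
# VARP=0.848 VAR=0.880 STDEV=0.938 CV=0.56
```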
STATISTICAL TREATMENT
Measures of dispersion [03]
- Weighted Coefficient of Variation (wCV):
  $wCV = \frac{\sqrt{\sum_{i=1}^{n} w_i (X_i - \bar{X})^2}}{\bar{X}}$
With spatial units, w_i is generally the population weight of spatial unit i in the total area under examination:
  $w_i = \frac{Pop_i}{Pop}$
Considering the 28 EU countries, Pop_i = population of country i and Pop = EU population.

Examples: Statistical Analysis with Excel
                    Excel                2004          2012
Arithmetic Mean     =AVERAGE(.. , ..)    1,34          1,67
Variance            =VAR(.. , ..)        0,809         0,880
Standard Deviation  =STDEV(.. , ..)      0,900         0,938
CV coefficient      =STDEV / AVERAGE     0,67 (67%)    0,56 (56%)

                        Excel                  2004          2012
Arithmetic Mean         =AVERAGE(.. , ..)      1,32          1,57
Weighted variance                              0,662         0,680
Weighted St. Deviation                         0,813         0,825
wCV                     = wSTDEV / AVERAGE     0,607 (61%)   0,494 (49%)
See the calculation in columns J & K.

Conclusions:
______________________________________
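The population data needed for the weights are not reproduced in this lecture, so the sketch below uses three hypothetical regions only to show the mechanics: weights w_i = Pop_i / Pop, the weighted variance around the unweighted mean X̄ (as in the wCV formula above), and wCV = weighted standard deviation / X̄.

```python
import math

# Hypothetical toy data (not from Eurostat): indicator values and populations.
values = [1.0, 2.0, 3.0]
populations = [100, 100, 200]

total_pop = sum(populations)
weights = [p / total_pop for p in populations]   # w_i = Pop_i / Pop, summing to 1

mean = sum(values) / len(values)                 # unweighted arithmetic mean
weighted_var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values))
weighted_sd = math.sqrt(weighted_var)
wcv = weighted_sd / mean                         # wCV = wSTDEV / AVERAGE

print(f"weights={weights} wVar={weighted_var:.3f} wCV={wcv:.3f}")
# weights=[0.25, 0.25, 0.5] wVar=0.750 wCV=0.433
```

Giving more weight to the most populated regions means that their deviations dominate the weighted dispersion, whereas small regions count for little.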
STATISTICAL TREATMENT
Normal / Gaussian Distribution: Representation [01]
- Perfectly symmetric distribution of the random variable around the mean value.
- Mean = Median = Mode.
- Standard Normal Distribution: if $X \sim N(\mu, \sigma^2)$ (Normal distribution), then the standardized variable $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
[Figure: Normal density curve, with P(X < μ) = 0,5 (50%)]
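Python's standard library includes `statistics.NormalDist` (Python 3.8+), which is enough to check the standardization Z = (X - μ)/σ numerically. The parameters below are hypothetical, chosen close to the 2012 mean and standard deviation of RD_TOT12 purely for illustration; this does not imply that RD_TOT12 is Normally distributed.

```python
import math
from statistics import NormalDist

mu, sigma = 1.67, 0.94          # hypothetical parameters of X ~ N(mu, sigma^2)
X = NormalDist(mu, sigma)
Z = NormalDist()                # standard Normal N(0, 1)


def standardize(x):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


print(X.cdf(mu))                # P(X < mu) = 0.5
x = 2.5
# P(X < x) equals P(Z < (x - mu) / sigma)
print(math.isclose(X.cdf(x), Z.cdf(standardize(x))))  # True
# About 68.3% of a Normal variable lies within one standard deviation of the mean.
print(round(Z.cdf(1) - Z.cdf(-1), 4))
```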
STATISTICAL TREATMENT
Normal / Gaussian Distribution: Representation [02]
- The shape of the distribution of a Normal variable depends on the specific values of its two parameters: mean and variance.
- High value of variance → flattened curve (see blue curve): there is no concentration of values around the mean.
- Small value of variance → high concentration around the mean value, low degree of variability (see red curve).
[Figure: Normal curves with a high variance (blue) and a small variance (red)]
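The effect of the variance on the shape of the curve can be quantified by the probability mass within a fixed distance of the mean and by the height of the curve at the mean. A small sketch, again with `statistics.NormalDist` and hypothetical parameters (mean 0, standard deviations 0.5 and 2):

```python
from statistics import NormalDist

narrow = NormalDist(0, 0.5)   # small variance (like the red curve)
wide = NormalDist(0, 2.0)     # high variance (like the blue curve)

for name, dist in (("small variance", narrow), ("high variance", wide)):
    share = dist.cdf(1) - dist.cdf(-1)   # P(-1 < X < 1)
    peak = dist.pdf(0)                   # height of the curve at the mean
    print(f"{name}: P(|X| < 1) = {share:.3f}, peak height = {peak:.3f}")
```

With the small variance, about 95% of the probability lies within one unit of the mean; with the high variance, less than 40% does, and the peak is four times lower.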
STATISTICAL TREATMENT
Confidence Interval
- Confidence interval: an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
  $(1-\alpha)\,C.I. = \bar{X} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
  where $z_{\alpha/2}$ is the two-sided critical value of the standard Normal distribution (1,96 for α = 5%).
- Confidence level: the probability value (1 - α) associated with a confidence interval. If α = 5%, the confidence level is (1 - 0,05) = 0,95, i.e. a 95% confidence level.
- Confidence limits: the lower and upper boundaries of a confidence interval, that is, the values which define the range of the confidence interval.
A confidence interval is very informative because its width gives us an idea of how uncertain we are about the unknown parameter.

Examples: Statistical Analysis with Excel
                    Excel                                   2004    2012
Arithmetic Mean     =AVERAGE(.. , ..)                       1,34    1,67
Margin of Error     =CONFIDENCE(α, STDEV, sample size)      0,333   0,347
Lower bound         = Mean - Margin of Error                1,008   1,322
Upper bound         = Mean + Margin of Error                1,674   2,016

In this example, we choose α = 0,05 (5%), i.e. a 95% CI, with sample size = 28.
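A minimal sketch reproducing the 2012 margin of error and confidence limits: the critical value comes from `statistics.NormalDist().inv_cdf`, and the margin z·s/√n is the quantity Excel's CONFIDENCE function returns, with s the sample standard deviation (Excel STDEV).

```python
import math
from statistics import NormalDist

# RD_TOT12 values (% of GDP, 2012) from the country table in this lecture.
RD_TOT12 = [0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
            1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
            2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55]

alpha = 0.05
n = len(RD_TOT12)
mean = sum(RD_TOT12) / n
s = math.sqrt(sum((x - mean) ** 2 for x in RD_TOT12) / (n - 1))  # sample SD

z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.96
margin = z * s / math.sqrt(n)             # margin of error
lower, upper = mean - margin, mean + margin

print(f"margin={margin:.3f} CI=[{lower:.3f}, {upper:.3f}]")
# margin=0.347 CI=[1.322, 2.016]
```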
STATISTICAL TREATMENT
Measures of Trends
- Skewness [a3]: a measure of the asymmetry of the probability distribution of a random variable.
  $a_3 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{(n-1) \cdot s^3}$
  a3 = 0: symmetric distribution, as in the Normal case
- Kurtosis [a4]: a measure of the "peakedness" of the probability distribution of a random variable.
  $a_4 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{(n-1) \cdot s^4} - 3$
  a4 = 0: Normal distribution
  a4 > 0: Peaked distribution
  a4 < 0: Flat distribution
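A direct transcription of the a3 and a4 formulas above, with s taken as the sample standard deviation (an assumption, since the slide does not specify which s is meant). Excel's SKEW and KURT apply different small-sample corrections, so their results will not match these formulas exactly.

```python
import math

# RD_TOT12 values (% of GDP, 2012) from the country table in this lecture.
RD_TOT12 = [0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
            1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
            2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55]


def _moments(xs):
    """Return n, the mean and the sample standard deviation s."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return n, m, s


def skewness(xs):
    """a3 = sum((X_i - mean)^3) / ((n - 1) * s^3)."""
    n, m, s = _moments(xs)
    return sum((x - m) ** 3 for x in xs) / ((n - 1) * s ** 3)


def kurtosis(xs):
    """a4 = sum((X_i - mean)^4) / ((n - 1) * s^4) - 3."""
    n, m, s = _moments(xs)
    return sum((x - m) ** 4 for x in xs) / ((n - 1) * s ** 4) - 3


print(skewness([1, 2, 3, 4, 5]))   # 0.0 for a symmetric data set
print(kurtosis([1, 2, 3, 4, 5]))   # negative: a flat distribution
print(f"a3(RD_TOT12) = {skewness(RD_TOT12):.2f}")  # positive: longer right tail
```

The positive skewness of RD_TOT12 agrees with the 2012 mean (1,67) being above the median (1,48).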
STATISTICAL TREATMENT
Measures of Correlation
- Pearson coefficient of correlation r_p: it indicates the strength and the direction of a linear relationship between two random variables (X and Y).
  $r_p = \frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_X \cdot \sigma_Y}$
- Spearman coefficient of correlation r_s: it indicates the strength and the direction of a relationship (not necessarily linear) between two random variables.
  $r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$
  where d_i is the difference between the ranks of observation i in the two variables.

Examples: Statistical Analysis with Excel
Question: to what extent are the R&D expenditures in 2012 strongly correlated with the R&D expenditures in 2004?
Pearson Correlation     =CORREL(D2:D29,E2:E29)     0,914
Normally, we expect a strongly positive coefficient: countries with initially high expenditures tend to continue having high expenditures.
The correlation coefficient does not indicate a cause-and-effect relationship.
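The 2004 values are not reproduced in this lecture, so the sketch below uses a hypothetical toy pair to contrast the two coefficients: Y = X³ is a perfectly monotonic but non-linear relationship, so Spearman gives 1 while Pearson stays below 1. The Spearman function assumes there are no tied values, since the simple d_i formula is only exact without ties.

```python
import math


def pearson(xs, ys):
    """r_p = (1/n) * sum((X_i - mean_X)(Y_i - mean_Y)) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)  # n-divisor SD, as in the formula
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def ranks(vs):
    """Rank 1 = smallest value; assumes there are no ties."""
    order = sorted(range(len(vs)), key=lambda i: vs[i])
    r = [0] * len(vs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(xs, ys):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), with d_i the rank difference."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]            # hypothetical toy data
y = [v ** 3 for v in x]        # monotonic, non-linear
print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")
```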
INTRODUCTION TO METHODS OF SPATIAL ECONOMIC ANALYSIS: USING SPSS

DATA FOR ANALYSIS
From EXCEL to SPSS
- The Excel file LECTURE_02_DATA.xls has to be closed.
- The worksheet with the appropriate format is Data_SPSS.
- The data that we are going to open through SPSS are in the range A1:M29.
- The 1st row has to contain the names of the variables.
- The names of the variables cannot contain special characters such as space, %, @, $, *, / etc.
It is suggested to use short names for the variables, because each variable can be defined in detail in the Label column of the sheet describing the variables.
Example: Population in 2004 = POP04 (POP 04 is not allowed because of the space).
Source: Eurostat, EU 2020 indicators
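The naming rule can be checked before importing. The sketch below is a hypothetical helper that encodes only the rule stated on this slide, using a deliberately conservative assumption (a letter first, then letters, digits or "_"); it does not check other SPSS constraints such as reserved words or maximum length.

```python
import re

# Hypothetical helper: enforces only the slide's rule (no spaces or special
# characters such as %, @, $, *, /). Conservative assumption: a name starts
# with a letter and otherwise uses letters, digits or "_".
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_safe_name(name):
    """True if the whole name matches the conservative pattern."""
    return _NAME.fullmatch(name) is not None


for candidate in ["POP04", "POP 04", "RD_TOT12", "RD%GDP"]:
    print(candidate, is_safe_name(candidate))
```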
DATA FOR ANALYSIS
From EXCEL to SPSS
Command: File > Open > Data
Reminder: the Excel file LECTURE_02_DATA.xls has to be closed; the worksheet with the appropriate format is Data_SPSS; the data are in the range A1:M29, and the 1st row has to contain the names of the variables.
1. Select the Excel file type.
2. Then select the file LECTURE_02_DATA.xls.
3. Open.
4. A new window appears, where we can select the appropriate worksheet (Data_SPSS). You will also have to check the range.
Source: Eurostat, EU 2020 indicators
DATA FOR ANALYSIS
Data in SPSS
As you can observe, the names of the variables are in the initial row, which has no number. Consequently, row 1 gives us the data of the first country (Belgium).
Each SPSS data file has two sheets:
- Data View, with the data;
- Variable View, where you can enter information about your data and specify the nature and the meaning of the data.
DATA FOR ANALYSIS
Statistical Treatment with SPSS
How can we obtain the most important statistical parameters of our variables in order to proceed to an exploratory analysis (descriptive statistics)?
Use the following command: Analyze > Descriptive Statistics > Explore
DATA FOR ANALYSIS
Statistical Treatment with SPSS
1. Select the variables to be explored from the left-hand list. It is possible to select more than one variable and to produce all the results for the various selected variables.
2. Move the variables to the right pane: Dependent List.
3.
With Explore, the statistical parameters are calculated, as well as the box plot, through which we can detect the presence of outliers.
In some cases, we will examine the statistical parameters of one or more variables for sub-groups of the total population. In this case, we will have to move the variable defining the sub-groups to the pane: Factor List.
DATA FOR ANALYSIS
Results from Explore
The results appear in a new sheet, the Output, which is completely independent from the data sheet. This output can be saved or converted to Word, Excel, etc.
All the results are summarized in the table.
DATA FOR ANALYSIS
Results from Explore
With Explore, we also obtain the box plot for each variable. This diagram allows us to verify to what extent the variables present a roughly "Normal" distribution, while it also allows us to detect the presence of outliers (values that are below or above the accepted thresholds). In this case, there is no outlier.
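Box plots commonly flag outliers with Tukey's rule: values more than 1,5 times the interquartile range (IQR) below the first quartile or above the third quartile. A minimal sketch applying this rule to the 2012 values, assuming inclusive (linear-interpolation) quartiles from `statistics.quantiles`; SPSS may compute its quartiles with a different convention, so the fences can differ slightly, but for these data none of the 28 values comes close to either fence.

```python
import statistics

# RD_TOT12 values (% of GDP, 2012) from the country table in this lecture.
RD_TOT12 = [0.46, 0.49, 0.64, 0.66, 0.69, 0.75, 0.82, 0.84, 0.90, 0.90,
            1.27, 1.30, 1.30, 1.46, 1.50, 1.72, 1.72, 1.88, 2.16, 2.18,
            2.24, 2.29, 2.80, 2.84, 2.98, 2.98, 3.41, 3.55]

q1, _, q3 = statistics.quantiles(RD_TOT12, n=4, method="inclusive")
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # Tukey's fences
outliers = [x for x in RD_TOT12 if x < low_fence or x > high_fence]

print(f"Q1={q1:.4f} Q3={q3:.4f} fences=({low_fence:.3f}, {high_fence:.3f})")
print("outliers:", outliers)  # outliers: []
```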