A Multivariate Analysis of the 2004 Summer Olympic Games
Wei Xiong, M.Sc. Student,
Department of Mathematics and Statistics,
University of Guelph
May 12-13, 2005
OUTLINE
1. Introduction
• 2004 Summer Olympic Games
• Multivariate techniques: cluster analysis, multivariate analysis of variance, multivariate regression analysis
• Literature review of analyses of the Olympic Games
2. Data Analysis and Discussion
3. Conclusions
2004 Summer Olympic Games
• The largest event: about 11,000 athletes from 202 countries; 929 medals won by 75 countries/regions.
Multivariate Techniques (more than one response variable)
• Cluster Analysis: observations (countries) are classified into clusters (groups) according to their similarity on several variables (numbers of gold, silver, bronze and total medals), by measuring the distance, or dissimilarity, between any two clusters.
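To make the idea concrete, here is a small sketch in Python rather than SAS. It measures the Euclidean distance between the medal vectors of the six countries in Table 1 and repeatedly merges the two closest clusters (average linkage) until three remain. The distance measure, the linkage rule and the choice of three clusters are assumptions made only for this illustration. The study itself uses SAS PROC CLUSTER with METHOD=EML (maximum-likelihood clustering) on all 75 countries, so its groups differ from these.

```python
import math
from itertools import combinations

# (gold, silver, bronze, total) for the six countries listed in Table 1
MEDALS = {
    "USA": (35, 39, 29, 103),
    "CHN": (32, 17, 14, 63),
    "RUS": (27, 27, 38, 92),
    "CAN": (3, 6, 3, 12),
    "SYR": (0, 0, 1, 1),
    "TRI": (0, 0, 1, 1),
}


def euclidean(a, b):
    """Euclidean distance between two medal vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2):
    """Mean distance over all cross-cluster pairs of countries."""
    total = sum(euclidean(MEDALS[p], MEDALS[q]) for p in c1 for q in c2)
    return total / (len(c1) * len(c2))


def agglomerate(k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [frozenset([name]) for name in MEDALS]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1]))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters


print(round(euclidean(MEDALS["USA"], MEDALS["RUS"]), 2))  # 20.25
print(round(euclidean(MEDALS["USA"], MEDALS["CHN"]), 2))  # 48.15
print(sorted(sorted(c) for c in agglomerate(3)))
# [['CAN', 'SYR', 'TRI'], ['CHN'], ['RUS', 'USA']]
```

Syria and Trinidad merge first because their vectors are identical. Canada joins them next, and then USA and Russia merge. China stays on its own at k = 3. This matches the intuition behind grouping USA with Russia: their medal vectors are much closer to each other than either is to China's.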
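• Multivariate Analysis of Variance (MANOVA):
a generalization of ANOVA to several response variables, used to compare the mean vectors of t populations
Hypothesis:
$H_0: \boldsymbol{\mu}_1 = \cdots = \boldsymbol{\mu}_t$
versus
$H_a: \boldsymbol{\mu}_j \neq \boldsymbol{\mu}_k$ for some $j \neq k$
H0 is rejected if H = SSCP(Treatment) is large relative to E = SSCP(Error), where SSCP denotes a matrix of sums of squares and cross-products.
Wilks' Lambda: $\Lambda = |\mathbf{E}| / |\mathbf{E} + \mathbf{H}|$; small values of Λ lead to rejecting H0.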
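The statistic can be computed by hand for a small case. The Python sketch below makes two assumptions: there are only two response variables, so the determinants are 2 × 2, and the two-group data set is made up. It forms E from the deviations of each observation about its own group mean and H from the deviations of each group mean about the grand mean, then returns Λ.

```python
# One-way MANOVA with two response variables: build the error SSCP
# matrix E and the hypothesis SSCP matrix H, then Wilks' Lambda.


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sscp(vectors):
    """Sum of outer products v v' over a list of 2-vectors."""
    return [[sum(v[i] * v[j] for v in vectors) for j in range(2)] for i in range(2)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    grand = mean_vector([r for g in groups for r in g])
    # E: deviations of each observation from its own group mean
    within = [[x - m for x, m in zip(r, mean_vector(g))] for g in groups for r in g]
    # H: deviation of each group mean from the grand mean, counted once per observation
    between = [[m - gm for m, gm in zip(mean_vector(g), grand)] for g in groups for _ in g]
    E, H = sscp(within), sscp(between)
    E_plus_H = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(E_plus_H)


groups = [
    [(2, 3), (3, 4), (4, 4)],   # hypothetical group A
    [(6, 7), (7, 8), (8, 8)],   # hypothetical group B
]
print(round(wilks_lambda(groups), 4))  # 0.04
```

Here the group means differ a lot relative to the within-group scatter, so Λ is close to 0. If all group means were equal, H would be the zero matrix and Λ would be exactly 1.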
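• Multivariate Regression
Model: $\mathbf{Y}_{(n \times p)} = \mathbf{X}_{(n \times q)} \, \boldsymbol{\beta}_{(q \times p)} + \mathbf{E}_{(n \times p)}$
where n: number of observations,
p: number of response variables,
q: number of explanatory variables (columns of X)
The least squares estimator of β is:
$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{Y}$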
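The estimator can be evaluated directly with a few matrix helpers. The Python sketch below uses hypothetical data in which Y equals X β exactly, with an intercept column, one predictor and two responses, so the formula should recover β. Gauss-Jordan inversion is used only as a simple stand-in, and X is assumed to have full column rank.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (m must be nonsingular)."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ls_estimate(X, Y):
    """beta_hat = (X'X)^{-1} X'Y; each column of beta_hat fits one response."""
    Xt = transpose(X)
    return matmul(inverse(matmul(Xt, X)), matmul(Xt, Y))


# Hypothetical data: intercept column plus one predictor; two responses
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [[1.0, 2.0], [1.5, 1.0], [2.0, 0.0], [2.5, -1.0]]
beta_hat = ls_estimate(X, Y)
print([[round(v, 6) for v in row] for row in beta_hat])  # [[1.0, 2.0], [0.5, -1.0]]
```

Each column of the estimate is the same as fitting that response alone by ordinary least squares. This is why a multivariate regression can report one column of coefficients per response variable.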
Literature review
• Condon et al. [1] predicted a country's success at the Olympic Games using linear regression models and neural network models.
• Lins et al. [2] developed a Data Envelopment Analysis (DEA)-based model that ranks each country by its ability to win medals relative to its available resources.
• Churilov and Flitman [3] improved the DEA-based model by combining different sets of input parameters with the DEA model.
• This study uses multivariate techniques to analyze the 2004 Summer Olympic Games and to explore the factors that influence the number of medals won.
Table 1: Rankings for Participating Countries
Country              Gold   Silver  Bronze  Total   Ranking       Ranking
                     (y1)   (y2)    (y3)    (y4)    (by Gold)     (by Cluster
                                                    [4]           Analysis)
USA                  35     39      29      103     1             1
China                32     17      14      63      2             2
Russia               27     27      38      92      3             1
Canada               3      6       3       12      21            4
Syria                0      0       1       1       71            5
Trinidad and Tobago  0      0       1       1       71            5
Note: clusters 1, 2, 3, 4 and 5 contain 2, 3, 7, 7 and 56 countries, respectively.
Table 2: Least Squares Means of Medals by Group
Group                Gold (y1)   Silver (y2)   Bronze (y3)   Total (y4)
1 (USA, RUS)         31.00       33.00         33.50         97.50
2 (CHN, AUS, GER)    21.00       16.33         16.00         53.33
3                    10.43        8.86 #       11.00         30.29
4                     4.86        7.00 #        5.29         17.14
5                     1.23        1.34          1.75          4.32
Note: # the silver means of groups 3 and 4 are close to each other.
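In a one-way model with a single class variable, the least squares means are simply the group means. A quick Python check reproduces the first two rows of Table 2. It uses the medal counts of the group 1 and group 2 countries from Appendix Table 1, with group membership taken from the row labels of Table 2.

```python
# Medal counts (gold, silver, bronze, total) from Appendix Table 1
GROUPS = {
    1: {"USA": (35, 39, 29, 103), "RUS": (27, 27, 38, 92)},
    2: {"CHN": (32, 17, 14, 63), "AUS": (17, 16, 16, 49), "GER": (14, 16, 18, 48)},
}


def group_means(members):
    """Column means of the medal vectors, rounded to 2 decimals as in Table 2."""
    rows = list(members.values())
    return [round(sum(col) / len(rows), 2) for col in zip(*rows)]


for g, members in GROUPS.items():
    print(g, group_means(members))
# 1 [31.0, 33.0, 33.5, 97.5]
# 2 [21.0, 16.33, 16.0, 53.33]
```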
Multivariate Analysis of Variance (MANOVA):
compares the medal mean vectors of the 5 groups
proc glm;
class group;
model y1-y4=group;
manova h=group;          /* test H0: equal mean vectors across groups */
lsmeans group/pdiff;     /* pairwise comparisons of group LS means */
run;
MANOVA Test: Hypothesis of No Overall Group Effect
Statistic        Value         F Value   Pr > F
Wilks' Lambda    0.02126952    49.34     <.0001
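Least Squares Means for Effect Group, Silver (y2)
Pr > |t| for H0: LSMean(i) = LSMean(j)
i/j    2        3        4        5
1      <.0001   <.0001   <.0001   <.0001
2               <.0001   <.0001   <.0001
3                        0.0572   <.0001
4                                 <.0001
Note: for the other medal variables (y1, y3, y4), all pairwise p-values are < .0001. The only difference that is not significant at the 5% level is silver between groups 3 and 4 (p = 0.0572).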
WHY?
• Why did some countries win more medals than others?
• Hypothesis: the larger a country's population and GDP, the more medals it wins.
Population (x1): the larger the population, the more outstanding athletes are available.
GDP (Gross Domestic Product, x2): the higher the GDP, the more funding is available for athlete training.
Table 3: Multivariate Regression of Medals on Population (x1) [5] and GDP (x2) [6]
proc glm;
model y1-y4 = x1-x2/xpx i;   /* XPX: print the X'X crossproducts matrix; I: print its (generalized) inverse */
run;
                 Gold (y1)           Silver (y2)         Bronze (y3)         Total (y4)
                 Estimate  p-value   Estimate  p-value   Estimate  p-value   Estimate  p-value
x1 (million)     0.0116    0.0002    0.0043    0.1223    0.0031    0.3712    0.0190    0.0317
x2 ($ billion)   0.0031    <.0001    0.0033    <.0001    0.0027    <.0001    0.0091    <.0001
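Each coefficient is a partial slope, so the expected change in a medal count is the sum of coefficient × change in each predictor. The Python sketch below applies the Table 3 estimates to a hypothetical pair of countries that differ by 100 million in population and $1,000 billion in GDP. These increments are illustrative assumptions, not data from the study. Recall also that the population slopes for silver and bronze are not significant.

```python
# Estimated coefficients from Table 3: (per million people, per $ billion GDP)
COEF = {
    "gold":   (0.0116, 0.0031),
    "silver": (0.0043, 0.0033),
    "bronze": (0.0031, 0.0027),
    "total":  (0.0190, 0.0091),
}


def predicted_change(d_pop_million, d_gdp_billion):
    """Expected change in each medal count for the given predictor changes."""
    return {
        medal: round(b1 * d_pop_million + b2 * d_gdp_billion, 2)
        for medal, (b1, b2) in COEF.items()
    }


print(predicted_change(100, 1000))
# {'gold': 4.26, 'silver': 3.73, 'bronze': 3.01, 'total': 11.0}
```

The total-medal change (11.0) equals the sum of the gold, silver and bronze changes. This is because, in Table 3, the total-medal coefficients are the sums of the per-medal coefficients (for example 0.0116 + 0.0043 + 0.0031 = 0.0190), as expected when y4 = y1 + y2 + y3.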
Conclusions
The 2004 Summer Olympic Games were analyzed with three multivariate methods: cluster analysis, multivariate analysis of variance, and multivariate regression analysis.
Participating countries were classified into 5 groups based on the numbers of medals won. The MANOVA shows that the groups differ significantly in their mean medal counts (Wilks' Lambda = 0.0213, p < .0001). The only pairwise difference that is not significant is silver between groups 3 and 4 (p = 0.0572).
GDP is a significant predictor of every medal count (p < .0001). Population is significant for gold (p = 0.0002) and total medals (p = 0.0317), but not for silver or bronze. An increase of 1 million in population increases the expected number of gold medals by 0.0116 and of total medals by 0.019. An increase of $1 billion in GDP increases the expected number of gold medals by 0.0031, silver by 0.0033, bronze by 0.0027, and total medals by 0.0091.
References
[1] Edward M. Condon, Bruce L. Golden and Edward A. Wasil (1999). Predicting the success of nations at the Summer Olympics using neural networks. Computers & Operations Research, 26(13), 1243-1265.
[2] Marcos P. Estellita Lins, Eliane G. Gomes, João Carlos C. B. Soares de Mello and Adelino José R. Soares de Mello (2003). Olympic ranking based on a zero sum gains DEA model. European Journal of Operational Research, 148(2), 312-322.
[3] L. Churilov and A. Flitman (2004). Towards fair ranking of Olympics achievements: the case of Sydney 2000. Computers & Operations Research. Available online 6 November 2004.
[4] http://www.athens2004.com/en/OlympicMedals/medals, accessed May 11, 2005.
[5] http://www.geohive.com/global/index.php, accessed Nov. 25, 2004.
[6] http://www.geohive.com/global/geo.php?xml=ec_gdp1&xsl=ec_gdp1, accessed May 11, 2005.
Appendix 1
Table 1. Number of medals for each country/region
Country/Region,Gold,Silver,Bronze,Total
USA United States,35,39,29,103
CHN China,32,17,14,63
RUS Russia,27,27,38,92
AUS Australia,17,16,16,49
JPN Japan,16,9,12,37
GER Germany,14,16,18,48
FRA France,11,9,13,33
ITA Italy,10,11,11,32
KOR Korea,9,12,9,30
GBR Great Britain,9,9,12,30
CUB Cuba,9,7,11,27
UKR Ukraine,9,5,9,23
HUN Hungary,8,6,3,17
ROM Romania,8,5,6,19
GRE Greece,6,6,4,16
NOR Norway,5,0,1,6
NED Netherlands,4,9,9,22
BRA Brazil,4,3,3,10
SWE Sweden,4,1,2,7
ESP Spain,3,11,5,19
CAN Canada,3,6,3,12
TUR Turkey,3,3,4,10
POL Poland,3,2,5,10
NZL New Zealand,3,2,0,5
THA Thailand,3,1,4,8
BLR Belarus,2,6,7,15
AUT Austria,2,4,1,7
ETH Ethiopia,2,3,2,7
IRI I.R. Iran,2,2,2,6
SVK Slovakia,2,2,2,6
TPE Chinese Taipei,2,2,1,5
GEO Georgia,2,2,0,4
BUL Bulgaria,2,1,9,12
JAM Jamaica,2,1,2,5
UZB Uzbekistan,2,1,2,5
MAR Morocco,2,1,0,3
DEN Denmark,2,0,6,8
ARG Argentina,2,0,4,6
CHI Chile,2,0,1,3
KAZ Kazakhstan,1,4,3,8
KEN Kenya,1,4,2,7
CZE Czech Republic,1,3,4,8
RSA South Africa,1,3,2,6
CRO Croatia,1,2,2,5
LTU Lithuania,1,2,0,3
EGY Egypt,1,1,3,5
SUI Switzerland,1,1,3,5
INA Indonesia,1,1,2,4
ZIM Zimbabwe,1,1,1,3
AZE Azerbaijan,1,0,4,5
BEL Belgium,1,0,2,3
BAH Bahamas,1,0,1,2
ISR Israel,1,0,1,2
CMR Cameroon,1,0,0,1
DOM Dominican Rep.,1,0,0,1
IRL Ireland,1,0,0,1
UAE United Arab Emirates,1,0,0,1
PRK DPR Korea,0,4,1,5
LAT Latvia,0,4,0,4
MEX Mexico,0,3,1,4
POR Portugal,0,2,1,3
FIN Finland,0,2,0,2
SCG Serbia and Montenegro,0,2,0,2
SLO Slovenia,0,1,3,4
EST Estonia,0,1,2,3
HKG Hong Kong,0,1,0,1
IND India,0,1,0,1
PAR Paraguay,0,1,0,1
NGR Nigeria,0,0,2,2
VEN Venezuela,0,0,2,2
COL Colombia,0,0,1,1
ERI Eritrea,0,0,1,1
MGL Mongolia,0,0,1,1
SYR Syrian Arab Rep.,0,0,1,1
TRI Trinidad and Tobago,0,0,1,1
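SAS coding-1
/* Read medal counts, one row per country (data from Appendix Table 1) */
data Anthemn2004SummerOlympic;
input Country $ y1-y4;
cards;
see Table 1 for data
;
/* Hierarchical clustering (maximum-likelihood method) on standardized medal counts */
proc cluster data=Anthemn2004SummerOlympic method=eml standard rmsstd rsquare outtree=tree;
var y1-y4;
id country;
run;
/* Cut the tree into 5 clusters and save each country's cluster membership */
proc tree data=tree noprint n=5 out=countryout;
id country;
run;
/* Draw the dendrogram */
proc tree data=tree n=5;
id country;
run;
/* Both data sets must be sorted by country before the merge */
proc sort data=countryout;
by country;
run;
proc sort data=Anthemn2004SummerOlympic out=new;
by country;
run;
data temp;
merge new countryout;
by country;
run;
proc sort data=temp;
by cluster;
run;
proc print data=temp;
id country;
run;
/* Factor analysis within each cluster; ROTATE= takes a single method */
proc factor data=temp heywood rotate=quartimax;
var y1-y4;
by cluster;
run;
proc princomp data=temp;
var y1-y4;
run;
proc factor data=temp heywood rotate=quartimax;
var y1-y4;
run;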
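SAS coding-2
/* group = cluster code; y1-y4 = gold, silver, bronze, total;
   x1 = population (million); x2 = GDP ($ billion) */
data Anthemn2004SummerOlympic;
input group y1-y4 x1-x2;
cards;
5 35 39 29 103  273 10882
5 27 27 38  92  146   433
4 32 17 14  63 1247  1410
4 17 16 16  49   19   518
4 14 16 18  48   82  2401
;
proc glm;
class group;
model y1-y4=group;
manova h=group/printe printh;   /* PRINTE, PRINTH: print the E and H matrices */
lsmeans group/pdiff;            /* pairwise tests of group LS means */
run;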
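SAS coding-3
data Anthemn2004SummerOlympic;
input group y1-y4 x1-x2;
cards;
……………..
;
/* Pearson correlations between medal counts and population/GDP */
proc corr;
var y1-y4 x1-x2;
run;
/* Multivariate regression of the four medal counts on x1 and x2 */
proc glm;
model y1-y4 = x1-x2/xpx i;
manova h=x1 x2/printe printh;   /* tests of x1 and x2 across all four responses jointly */
run;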
Cluster Analysis: Countries Classified into 5 Groups
[Figure: dendrogram from the cluster analysis, cut into groups 1 to 5; vertical axis: log likelihood; horizontal axis: country]
Table 2: Factor Analysis on Medals
Group            1          2          3                     4                     5
Latent factor    1          1          1          2          1          2          1          2
(%) *            (94.71)    (95.99)    (61.35)    (86.50)    (52.10)    (83.89)    (58.04)    (83.09)
Gold (y1)        0.9634 #   0.9997     0.8783     -.0055     -.0378     0.9844     0.7654     0.0052
Silver (y2)      0.9694     0.9839     0.1470     0.9873     0.6388     -.5449     0.1409     0.9893
Bronze (y3)      0.9595     -.9414     0.8314     -.0629     0.8151     -.1716     0.8546     -.1100
Total (y4)       0.9999     0.9928     0.8682     0.4932     0.9551     0.2727     0.9186     0.3908
Note: * cumulative eigenvalues, i.e. the cumulative percentage of the total variation in the four medal variables explained by the latent factors.
# Factor loading: the correlation between a latent factor and a variable. (Factor analysis with quartimax rotation, which makes each variable correlate either strongly or weakly with each latent factor.)
Correlation Between y's and x's
x1 (population) [5], x2 (GDP, Gross Domestic Product) [6]; both population and GDP are for 2003.
Pearson Correlation Coefficients (Prob > |r| under H0: Rho=0 in parentheses)
       x1                  x2
y1     0.46543 (<.0001)    0.70219 (<.0001)
y2     0.3038  (0.0081)    0.76180 (<.0001)
y3     0.23199 (0.0452)    0.60769 (<.0001)
y4     0.34887 (0.0022)    0.71640 (<.0001)
Note: the medal counts are moderately correlated with population (r = 0.23 to 0.47) and strongly correlated with GDP (r = 0.61 to 0.76).
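The Python sketch below shows how each entry in the table is obtained. It computes the Pearson coefficient r and the statistic t = r√(n − 2)/√(1 − r²), which has n − 2 degrees of freedom and underlies Prob > |r|. The sample size n = 75 (the number of medal-winning countries in Appendix Table 1) is an assumption about the data used. The p-value itself needs a t-distribution routine, which is left out to keep the sketch stdlib-only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Smallest correlation in the table: y3 (bronze) with x1 (population)
print(round(t_statistic(0.23199, 75), 3))  # 2.038
```

The value of about 2.04 lies just above the two-sided 5% critical value of t with 73 degrees of freedom (about 1.99). This agrees with the reported p = 0.0452 for that cell.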