Discovery Challenge –
ECML/PKDD2004
September 20, 2004, Pisa, Italy
Atherosclerosis
Marie Tomečková
EuroMISE Centre – Cardio
Institute of Computer Science, Academy of Sciences of the CR,
Prague, The Czech Republic
Supported by the project LN00B107 of the Ministry of
Education of the Czech Republic
Atherosclerosis
• a complex disease affecting the vessels of the whole
organism
• a dynamic process that begins in childhood and
adolescence and continues throughout life
• views on the origin and progression of
the disease are still evolving
• interaction of genetic predisposition
and the external environment
• the influence of so-called risk factors continues to be recognized
• on the other hand, there are also so-called protective
factors
Risk factors of atherosclerosis
• non-affectable: sex, age, family history
• affectable:
• factors of life style
• physical activity
• smoking
• reaction to stress
• blood pressure, metabolic factors - levels of lipids and glucose,
homocysteine
• many other factors: coagulopathies, infections, inflammation,
factors changing the function of the endothelium, social and psychological
factors
• combinations, clustering and interactivity: Reaven's
syndrome
STULONG
(acronym)
• LONGitudinal twenty-year STUdy
of risk factors of atherosclerosis
• The study was carried out in the years 1975-2000 at the 2nd
Dept. of Internal Medicine, 1st Faculty of Medicine of
Charles University, Prague, the Czech Republic
• The data were transferred to electronic form by the
European Centre of Medical Informatics, Statistics, and
Epidemiology of Charles University and Academy of
Sciences of the Czech Republic
STULONG
Main aims of the study:
• To determine the prevalence of the risk factors of
atherosclerosis in middle-aged men
• To follow the development of the risk factors
• To assess the feasibility and the effect of complex
intervention on the incidence and values of the risk factors
and on cardiovascular mortality
Population:
• urban population of middle-aged men (centre of Prague)
• 2370 men were invited
• 1417 men were examined; the response rate was 59%
• Middle-aged men are the population most threatened
by atherosclerosis and its consequences
Definition of risk factors
blood pressure            ≥ 160/95 mm Hg
cholesterol               ≥ 260 mg% (6.7 mmol/l)
smoking                   ≥ 15 cigarettes/day
obesity                   ≥ 15% above optimal weight
positive family history   premature death from atherosclerotic
                          disease (parents, siblings)
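The thresholds above can be turned into a simple counting rule. The following is a minimal sketch, not the study's own procedure. It rests on several assumptions that the slides do not confirm: the record type `EntryExam` and its field names are hypothetical, the blood pressure criterion is read as "systolic or diastolic at or above the limit", and "optimal weight" is taken to be the Broca weight (height in cm minus 100), so that 15% above optimal weight corresponds to a Broca index of 115%.

```python
from dataclasses import dataclass


@dataclass
class EntryExam:
    """Hypothetical record of one entry examination (field names are illustrative)."""
    systolic_mm_hg: float
    diastolic_mm_hg: float
    cholesterol_mmol_l: float
    cigarettes_per_day: float
    weight_kg: float
    height_cm: float
    positive_family_history: bool


def broca_index(weight_kg: float, height_cm: float) -> float:
    """Weight as a percentage of the Broca optimal weight (assumed: height_cm - 100)."""
    return 100.0 * weight_kg / (height_cm - 100.0)


def risk_factors(exam: EntryExam) -> list[str]:
    """Return the names of the risk factors present, using the slide's thresholds."""
    found = []
    # Assumption: either the systolic or the diastolic value reaching the limit counts.
    if exam.systolic_mm_hg >= 160 or exam.diastolic_mm_hg >= 95:
        found.append("hypertension")
    if exam.cholesterol_mmol_l >= 6.7:
        found.append("hypercholesterolemia")
    if exam.cigarettes_per_day >= 15:
        found.append("smoking")
    # Assumption: obesity means a Broca index of at least 115%.
    if broca_index(exam.weight_kg, exam.height_cm) >= 115.0:
        found.append("obesity")
    if exam.positive_family_history:
        found.append("positive family history")
    return found


# Invented example values, not taken from the STULONG data.
example = EntryExam(165, 90, 6.8, 20, 95, 175, False)
print(len(risk_factors(example)), risk_factors(example))
```

A man with an empty list would fall outside the risk group as defined on the slides (at least 1 RF).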
STULONG - analysis
Statistical
- descriptive statistics
- logistic regression
- survival analysis
Data mining
- different methods
- resulting in different conclusions
Basic characteristics of men in STULONG
(risk group: at least 1 RF, without the disease)
• Prevalence of risk factors at entry
RF                        n      %
hypercholesterolemia      290    34.2
hypertension              287    34.0
smoking                   543    63.3 !!!
obesity                   196    23.0
positive family history   216    25.3
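Prevalence of risk factors in risk group
[Bar chart, 0-100% scale: share of men with (yes) and without (no) each risk factor. Bars: smoking (n=860), TCH (n=851), HT (n=848), RA (n=855), obesity (n=856). Value pairs as printed (%): 63.1 / 36.9, 34.3 / 65.7, 34.2 / 65.8, 25.4 / 74.6, 23.2 / 76.8.]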
Basic characteristics of men in STULONG
(risk group, age 46.1±3.6)
                                   mean    s (±)
Nr of RF                           1.7     2.0
cholesterol (mmol/l)               6.25    5.4
systolic blood pressure (mm Hg)    134.4   67.3
diastolic blood pressure (mm Hg)   85.3    47.5
Nr of cig/day                      9.4     25.0
Broca index (%)                    106.8   47.4
Mortality depending on the number of RF
(atherosclerotic cardiovascular diseases)
[Bar chart: mortality per 1000 (y-axis, 0-25) by number of RF (x-axis, 0-4). Bar labels as printed: 1.6, 5.0, 8.5, 10.8, 20.0.]
Survival analysis
            N      1 RF   2 RF   3 RF   4 RF
10th year   99 %   98 %   93 %   93 %   91 %
20th year   97 %   89 %   84 %   80 %   63 %
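The slides do not say which estimator produced the survival percentages above. One standard choice for this kind of follow-up data is the Kaplan-Meier product-limit estimator, sketched below in plain Python. The function name `kaplan_meier` and the toy follow-up times are assumptions made for illustration only; they are not STULONG data.

```python
def kaplan_meier(times, died):
    """Kaplan-Meier survival curve.

    times: follow-up time of each man (e.g. years since entry)
    died:  True if death was observed at that time, False if censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, died))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data, invented for illustration.
toy_times = [5, 10, 10, 20, 20]
toy_died = [True, False, True, True, False]
for t, s in kaplan_meier(toy_times, toy_died):
    print(f"year {t}: survival {s:.2f}")
```

Running the estimator separately for each group (N, 1 RF, ..., 4 RF) and reading the curve at years 10 and 20 would give a table of the same shape as the one above.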
The relative risk of death caused by
atherosclerotic CVD
variable    category              RR     p
age         35 - 44               1.00
            45 - 55               2.08   0.001
education   basic                 1.00
            vocational            0.80   0.398
            middle                0.67   0.151
            university            0.36   0.003
smoking     no                    1.00
            yes                   2.36   <0.001
BP          <=140/90 mm Hg        1.00
            141/91-159/94 mm Hg   1.36   0.301
            >=160/95 mm Hg        2.49   <0.001
TCH         <5.2 mmol/l           1.00
            5.2-6.6 mmol/l        1.50   0.154
            >=6.7 mmol/l          1.87   0.034
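The slides do not state how the RR and p values were estimated. As a simplified illustration of what a relative risk expresses, the sketch below computes a crude (unadjusted) relative risk from death counts in an exposed group and a reference group, with an approximate 95% confidence interval on the log scale. The function name `relative_risk` and the counts are hypothetical, and an adjusted model would generally give different numbers.

```python
import math


def relative_risk(deaths_exposed, n_exposed, deaths_reference, n_reference):
    """Crude relative risk with an approximate 95% confidence interval.

    The reference category has RR = 1.00 by definition, as in the table.
    """
    risk_exposed = deaths_exposed / n_exposed
    risk_reference = deaths_reference / n_reference
    rr = risk_exposed / risk_reference
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / deaths_exposed - 1 / n_exposed
        + 1 / deaths_reference - 1 / n_reference
    )
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return rr, lower, upper


# Hypothetical counts, not taken from the STULONG data.
rr, lower, upper = relative_risk(30, 500, 15, 500)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1.00 corresponds roughly to p < 0.05 for the comparison with the reference category.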
Discovery Challenges
Atherosclerosis: growing number of papers
• 2002, Helsinki: 5 papers
• 2003, Cavtat: 9 papers
• 2004, Pisa: 11 papers
Four data files for
analysis (data mining)
• Entry: attributes obtained from the entry examination; 1417 men,
244 attributes for each man
• Control: attributes recorded during the follow-up (changes in
social and health status, values of the followed risk factors, therapy …);
10 600 examinations, each with 66 attributes
• Letter: additional information collected at the end of the study by
postal questionnaire (men who were discharged from the follow-up); 403 men,
62 attributes for each man
• Death: date and cause of death; 389 men
Four groups of
analytic questions
Related to
• the entry examination
• the long-term observation (follow-up)
• the postal questionnaire at the end of the study
• the relations concerning entry examination, control
examination, and death
Approaches to solve the analytic
questions – 1:
given in the past Discovery Challenges
• Univariate and bivariate data analysis
• Association rules
• SDS rules (Set Differs from Set)
• Trend analysis
• Time windows analysis
• ROC analysis
• Discriminant function
Approaches to solve the analytic
questions – 2:
• Fuzzy approximate dependencies, fuzzy logic
• Functional dependencies
• Inductive logic programming technique
• Explicit relations
• The selection of the strongest emerging patterns
• Genetic approach
• Approach to generate a mathematical algebraic model
Analytic questions - some results
• Protective influence of the number of visits
• Protective influence of beer drinking,
but not of wine drinking
• Correlation of Body Mass Index with the
skinfolds: very good discrimination of
the three basic groups of men (normal, risk,
pathological)
Further use and publications of the
STULONG data
are possible only under the condition of the following explicit quotation:
„The study (STULONG) was realized at the 2nd Department of
Internal Medicine, 1st Faculty of Medicine of Charles University
and University Hospital, Prague 2, Czech Republic (head Prof.
M. Aschermann, MD, SDr, FECS), under the supervision of Prof.
F. Boudík, MD, SDr, with the collaboration of M. Tomečková, MD,
PhD, and Ass. Prof. J. Bultas, MD, PhD. The data were transferred
to the electronic form by the European Centre of Medical Informatics,
Statistics, and Epidemiology of Charles University and Academy of
Sciences of Czech Republic (head Prof. RNDr J. Zvárová, SDr).”
At present, the data analysis is supported by the project Nr.
LN 00B 107 of the Ministry of Education of the CR.
Thank you
for your effort in the STULONG data set
analysis and for your attention
Marie Tomečková
EuroMISE Centre – Cardio
Pod Vodárenskou věží 2
182 07 Prague, The Czech Republic
http://www.euromise.cz