Thematic structuring of the ESPON 2013 DB

ESPON 2013 DATABASE
Malmö Seminar, 2-3 December 2009
Thematic structuring of the ESPON 2013 DB
Geoffrey Caruso and Nuno Madeira
Outline
• Towards an ESPON thesaurus?
• Text mining methods for organising knowledge
• Techniques to increase visual perception: first results
• Short-term solution
Towards an ESPON thesaurus?
• The draft technical report describes some of the main features of thesaurus construction
• It presents some examples developed by international organisations (ILO, UNESCO, FAO, EUROSTAT, …)
• It stresses the importance of harmonising vocabulary
• It explores how useful text mining methods could be in further supporting the thematic structuring
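Thesauri are commonly built from hierarchical links (broader term BT / narrower term NT) and associative links (related term RT). The minimal Python sketch below illustrates that distinction. The class and method names are hypothetical, and it assumes each term has at most one broader term. The example terms come from the draft ESPON DB categories, but the related-term link is only an illustration and not an ESPON decision.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal thesaurus with hierarchical (BT/NT) and associative (RT) links.

    Hypothetical sketch: assumes a monohierarchy (at most one BT per term).
    """

    def __init__(self):
        self.broader = {}                 # term -> its broader term (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)

    def add_narrower(self, broad, narrow):
        # Hierarchical link, stored in both directions.
        self.broader[narrow] = broad
        self.narrower[broad].add(narrow)

    def add_related(self, a, b):
        # Associative links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def path_to_top(self, term):
        # Follow BT links upwards until a top term is reached.
        path = [term]
        while path[-1] in self.broader:
            parent = self.broader[path[-1]]
            if parent in path:
                raise ValueError(f"cycle detected at {parent!r}")
            path.append(parent)
        return path


# Terms from the draft ESPON DB hierarchy; the RT link is hypothetical.
th = Thesaurus()
th.add_narrower("Population", "Natural population change")
th.add_narrower("Population", "Life expectancy at birth")
th.add_narrower("Environment", "Landscape fragmentation")
th.add_related("Landscape fragmentation", "Potential accessibility by road")

print(th.path_to_top("Life expectancy at birth"))
print(sorted(th.narrower["Population"]))
```

Keeping BT and NT as two mirrored indexes makes upward and downward navigation equally cheap, while RT links stay symmetric and outside the hierarchy.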
Text mining methods for organising knowledge
• Textual data is usually regarded as a collection of unstructured information that must be carefully prepared before any method can be applied
• Text mining methods transform text into standard numerical forms
• For this purpose, we have collected approximately 200 reports, studies and policy notes addressing ESPON evidence and results
• The dependency and ambiguity of textual data made data preparation our primary focus
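One common way to turn text into a standard numerical form is a bag-of-words term-document matrix. The slides do not say which representation was used, so treat the following as an assumption. This minimal Python sketch uses simple lowercase tokenisation (ASCII letters only) and a small hypothetical stop-word list. The two example sentences are invented and are not taken from the ESPON corpus.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real corpus needs a fuller one).
STOP_WORDS = {"the", "of", "and", "in", "to", "a", "by", "for", "on", "is"}


def tokenize(text):
    """Lowercase, keep runs of ASCII letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical example documents.
docs = [
    "Accessibility by road and accessibility by air",
    "Landscape fragmentation and environmental quality",
]
vocab, m = term_document_matrix(docs)
print(vocab)
print(m)
```

Every preparation choice here (case folding, stop words, no stemming) changes the resulting matrix, which is one reason data preparation dominates the effort.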
Techniques to increase visual perception
• Explore visualisation tools, such as keyword maps based on co-occurrence data, to communicate outputs more effectively
• First results reveal highly complex structures, although some interpretable patterns can be discerned
• However, these results raise questions about the completeness of our corpus for analysis, especially in terms of cluster stability
• For instance, how many reports and studies are sufficient to guarantee consistent results?
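A keyword map needs pairwise co-occurrence counts, which serve as edge weights between keywords. The minimal Python sketch below assumes that two keywords co-occur when they appear in the same document. Other definitions, such as a sentence or a fixed word window, are equally possible. The keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_keywords):
    """Count in how many documents each unordered keyword pair appears together."""
    pairs = Counter()
    for keywords in doc_keywords:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical keyword sets extracted from three documents.
docs = [
    ["accessibility", "transport", "cohesion"],
    ["accessibility", "transport"],
    ["cohesion", "population"],
]
counts = cooccurrence(docs)
for (a, b), n in counts.most_common(3):
    print(f"{a} -- {b}: {n}")
```

Re-running this on random subsets of the corpus and comparing the resulting maps is one simple way to probe the stability question raised above.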
Short-term solution
ESPON 2013 Database
  Population
    Natural population change
    Life expectancy at birth
  Transport
    Potential accessibility by air
    Potential accessibility by road
  Environment
    Landscape fragmentation
    Environmental quality
  Agriculture
    ?
    ?
• A first hierarchical structure that does not derive from text mining methods but adapts the previous ESPON DB to the indicators delivered so far
• Investigate the degree of resemblance between some important database classifications (EUROSTAT, OECD, EEA, UNEP, WPI) and the ESPON 2006 DB
• Identify patterns that could contribute to the harmonisation of categories or themes
• Employ matrix visualisation techniques for cluster analysis
• Knowledge acquired from text mining methods will form the basis for improving both hierarchical and associative relationships
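The slides do not say how resemblance between classifications will be measured. One standard option, shown here as an assumption, is the Rand index. For the indicators that both schemes classify, it gives the share of indicator pairs on which the two schemes agree: both group the pair together, or both keep it apart. In the sketch below, the ESPON grouping follows the draft hierarchy. The alternative grouping is hypothetical and does not represent EUROSTAT, OECD, EEA, UNEP or WPI.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of item pairs on which two classifications agree.

    labels_a, labels_b: dicts mapping item -> category. Only shared items count.
    """
    items = sorted(labels_a.keys() & labels_b.keys())
    pairs = list(combinations(items, 2))
    if not pairs:
        raise ValueError("need at least two shared items")
    agree = sum(
        (labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)


# Draft ESPON grouping (from the slide).
espon = {
    "Natural population change": "Population",
    "Life expectancy at birth": "Population",
    "Potential accessibility by air": "Transport",
    "Potential accessibility by road": "Transport",
    "Landscape fragmentation": "Environment",
}
# Hypothetical alternative classification, for illustration only.
other = {
    "Natural population change": "Demography",
    "Life expectancy at birth": "Health",
    "Potential accessibility by air": "Transport",
    "Potential accessibility by road": "Transport",
    "Landscape fragmentation": "Transport",
}
print(rand_index(espon, other))
```

Computing the index for every pair of classifications yields a similarity matrix, which could then be explored with the matrix visualisation techniques mentioned above.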
Thank you for your attention!