Detecting domain dynamics


Detecting domain dynamics:
Association Rule Extraction and
diachronic clustering techniques in
support of expertise
Ivana Roche
Maha Ghribi
Nathalie Vedovotto
Claire François
Dominique Besagni
Pascal Cuxac
Dirk Holste
Marianne Hörlesberger
Edgar Schiebel
Work context
European project DBF:
– Development and Verification of a Bibliometric Model
for the identification of Frontier Research
– Part of a Coordination and Support Action (CSA) for the
European Research Council (ERC)
– following the requirements of the High Level Expert
Group of the ERC
• we developed several indicators,
• including a Proximity indicator
Detecting domain dynamics
GTM 2011 – Atlanta, GA - September 14th, 2011
ERC framework (1/2)
First European funding body to support investigator-driven
research through open and direct competition
Main goals:
– Scientific excellence as the sole selection criterion,
– Major grants for the very best and most creative researchers,
– To identify and explore new opportunities and directions in all fields.
Scientific domains (panels):
– Physics and Engineering (PE): 10 panels
– Life Sciences (LS): 9 panels
– Social Sciences and Humanities (SH): 6 panels
Grant Application schemes:
– Starting researcher grants (StGs)
– Advanced investigator grants (AdGs)
ERC framework (2/2)
ERC annual budget evolution (2007-2013):
[Bar chart: annual budget in Mio. €, vertical axis from 0 to 1800]
Rate of selected proposals:
– StGs (2009): about 10% (244 out of 2,503 submitted proposals)
– AdGs (2009): about 15% (244 out of 1,584 submitted proposals)
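The rounded rates can be checked directly from the counts on the slide. A minimal sketch; the function name `selection_rate` is ours, introduced only for illustration:

```python
def selection_rate(selected, submitted):
    """Share of submitted proposals that were selected, in percent."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * selected / submitted

# Counts taken from the slide (2009 calls).
stg_2009 = selection_rate(244, 2503)
adg_2009 = selection_rate(244, 1584)

print(f"StG 2009: {stg_2009:.1f}%")  # StG 2009: 9.7%
print(f"AdG 2009: {adg_2009:.1f}%")  # AdG 2009: 15.4%
```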
Definition of the Proximity indicator
Scope:
– it infers the « innovative degree » of a proposal from the dynamic change of the
scientific landscape covered by the panel to which the proposal is allocated
Data sources:
– ERC data:
• Panels description
• Projects summary
– Bibliographic databases
Hypothesis:
– the closer a proposal is to regions of positive dynamic change, the more innovative
it is
Description of the proximity indicator
[Workflow diagram]
Bibliographic side:
– Panel description (ERC database) → translation of main panels into bibliographic database queries
– Construction of two indexed corpora of bibliographic references, one per time window (T1, T2)
– Diachronic clustering analysis of T1 and T2
– Ranking of clusters by novelty degree
Proposal side:
– Data from proposals (ERC database) → data pre-processing and text mining: extraction of terminological information
– Similarity of proposal with regard to T2 clusters
Result:
– Calculation of the PROXIMITY indicator
Tools
Assisted indexing
– Terminological resources
– TreeTagger
– FastR
Clustering
– Axial K-means (NEURODOC)
– Principal Components Analysis
Fuzzy Association Rule Extraction
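The slides do not detail how the fuzzy association rules are computed. As a rough illustration only, here is one common textbook formulation, in which the support and confidence of a rule A → B are computed from membership degrees, with min as the conjunction. The function name, the choice of min, and the example keywords and degrees are all assumptions, not the authors' method:

```python
def fuzzy_rule_strength(mu_a, mu_b):
    """Support and confidence of the fuzzy rule A -> B.

    mu_a, mu_b: dicts mapping an item (here, a keyword) to its membership
    degree in [0, 1]; items missing from a dict have membership 0.
    The conjunction is min (an assumption, not taken from the slides).
    """
    items = set(mu_a) | set(mu_b)
    if not items:
        return 0.0, 0.0
    joint = sum(min(mu_a.get(i, 0.0), mu_b.get(i, 0.0)) for i in items)
    total_a = sum(mu_a.values())
    support = joint / len(items)
    confidence = joint / total_a if total_a > 0 else 0.0
    return support, confidence

# Hypothetical membership degrees of three keywords in a T1 and a T2 cluster.
t1_cluster = {"optical sensor": 0.9, "thin film": 0.2, "robotics": 0.6}
t2_cluster = {"optical sensor": 0.8, "thin film": 0.7, "robotics": 0.1}
support, confidence = fuzzy_rule_strength(t1_cluster, t2_cluster)
print(f"support={support:.3f} confidence={confidence:.3f}")
```

A high confidence for a rule from a T1 cluster to a T2 cluster would suggest that the earlier cluster's vocabulary largely carries over into the later one.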
Clusters map
Clusters relationships
Novelty index: combines an Inter-period index and an Intra-period index
The lower a cluster's Novelty index value, the higher its degree of innovativeness
Calculation of Proximity indicator
– T2 clusters are ranked by Novelty index and categorized as AAA, AA or A (decreasing innovation)
– Proposal keywords are obtained by text mining / assisted indexing
– Similarities between the proposal and the clusters are sorted by decreasing value (N clusters)
– Proximity = geometric mean of the similarities
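The geometric-mean step can be sketched as follows. The slides do not say which similarity measure is used or how the AAA/AA/A categories enter the calculation, so this sketch assumes cosine similarity between keyword sets and ignores the categories. The names `cosine_similarity` and `proximity` and the example keywords are hypothetical:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two keyword sets (binary weights)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def proximity(proposal_keywords, cluster_keyword_sets, n):
    """Geometric mean of the n highest proposal-to-cluster similarities."""
    sims = sorted(
        (cosine_similarity(proposal_keywords, c) for c in cluster_keyword_sets),
        reverse=True,
    )[:n]
    # A geometric mean is zero as soon as one factor is zero.
    if not sims or min(sims) <= 0.0:
        return 0.0
    return math.exp(sum(math.log(s) for s in sims) / len(sims))

# Hypothetical proposal and T2 clusters, each described by a set of keywords.
proposal = {"optical sensor", "thin film", "photonics", "robotics"}
clusters = [
    {"optical sensor", "photonics", "integrated optics", "thin film"},
    {"robotics", "actuator", "control system", "process control"},
    {"wireless LAN", "teletraffic"},
]
print(round(proximity(proposal, clusters, n=2), 4))  # 0.433
```

Because a geometric mean collapses to zero when any factor is zero, the choice of N matters: here N = 3 would include the unrelated third cluster and give a proximity of 0, which is consistent with the remark that the number of clusters used is still being refined.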
Case study
Starting grant 2009, panel PE07
– Systems and communication engineering: electronic, communication, optical and
systems engineering
– 29 proposals, 4 successful
Database: PASCAL from INIST
First corpus:
– Year 2000
– 20,568 records, 21,781 keywords
Second corpus:
– Year 2009
– 19,827 records, 18,475 keywords
Clusters ranking
High
Intermediate
Low
Angiospermae
Optical method
Decision support system
Space remote sensing
Thin film
Optoelectronic device
Statistical simulation
Nanoelectronics
Imagery
Decision aid
Non destructive test
Image processing
Radio frequency identification
Chemical sensor
Computer network
Complementary MOS technology
Smart material
Closed feedback
Data analysis
Microelectromechanical device
System identification
Discrete event system
Wavelet transformation
Photonics
Discrete system
Neural network
Fiber optic sensors
Process control
Particle swarm optimization
Wireless network
Ultrasonic transducer
User interface
Optical fiber network
Control system
Optical sensor
Integrated optics
Hyperspectral imaging sensor
Video signal processing
Signal detection
Microelectronic fabrication
Piezoelectric sensor
Teletraffic
Real time system
Constrained optimization
Wireless LAN
Radiation detector
Actuator
Diffraction grating
Robotics
Noise reduction
Top-ten results
Project proposal ID   Innovativeness degree rank   Expert panel choice (0/1)
PROP_19               1                            0
PROP_23               2                            0
PROP_14               3                            0
PROP_02               4                            0
PROP_08               5                            1
PROP_07               6                            0
PROP_22               7                            0
PROP_06               8                            0
PROP_12               9                            0
PROP_01               10                           1
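One way to read these results against the 4 successful proposals out of 29 reported for the case study is to count how many of the expert panel's choices fall within the top ten. The slides do not name an evaluation metric; precision and recall at k are our choice here, and the helper name is hypothetical:

```python
# (proposal ID, innovativeness degree rank, expert panel choice), from the slide.
TOP_TEN = [
    ("PROP_19", 1, 0), ("PROP_23", 2, 0), ("PROP_14", 3, 0), ("PROP_02", 4, 0),
    ("PROP_08", 5, 1), ("PROP_07", 6, 0), ("PROP_22", 7, 0), ("PROP_06", 8, 0),
    ("PROP_12", 9, 0), ("PROP_01", 10, 1),
]
TOTAL_PROPOSALS = 29   # panel PE07, Starting grant 2009
TOTAL_SUCCESSFUL = 4

def precision_recall_at_k(ranked, k, total_positive):
    """Precision and recall of the k best-ranked proposals."""
    top = sorted(ranked, key=lambda row: row[1])[:k]
    hits = sum(choice for _, _, choice in top)
    return hits / k, hits / total_positive

precision, recall = precision_recall_at_k(TOP_TEN, 10, TOTAL_SUCCESSFUL)
baseline = TOTAL_SUCCESSFUL / TOTAL_PROPOSALS  # success rate over the whole panel

print(f"precision@10={precision:.2f} recall@10={recall:.2f} baseline={baseline:.2f}")
# precision@10=0.20 recall@10=0.50 baseline=0.14
```

With a single panel of 29 proposals these figures are only indicative.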
Remarks
Proximity is only one of 4 indicators.
The process is still being refined:
– categorization of clusters,
– number of clusters used to calculate the indicator.
Limits of the system:
– A concept is found only when it is explicitly stated.
– Because we rely on a terminological resource, a new
concept is added only once it has become mainstream.
Conclusion
We used Association Rule Extraction and diachronic
clustering to detect the evolution of a domain and rate
projects according to those dynamics.
But how good is it?
We need to:
– run more tests on other panels,
– meet with the panel experts,
– improve our assisted indexing,
– add terminological extraction.
Acknowledgements
This work was partially funded by the « Ideas » specific
Programme of the EU’s 7th Framework Programme for
Research and Technological Development (project
reference no. 240765)
Project website: http://www.ait.ac.at/dbf
Thank you