Transcript wp4

Overview
Clustering
Text analysis
Results
Profiling
Profile parts
Intelligent maintenance and use
WP Intelligent maintenance and use will provide
Information on: content
Information on: activities
Information on: status
Intelligent management of content
Analyze the characteristics and explore algorithms
Support the activities of different types of actors
Develop effective tools for intelligent use and maintenance
Advanced user management
Enhance user experience
Provide personalized support for the use of sciX services
Will enable customization to a reasonable level
Help users reduce the time needed to retrieve, exchange, and publish information
Provide a basis for easier scientific collaboration
Use of clustering
Assist different types of users
Allow users to browse through the papers by topic
Help editors to categorize scientific contributions
Develop maps of the topics covered in different repositories
Canonical form
Detect the most appropriate canonical forms for similar documents in the field of scientific and technical papers
Efficiency of clustering depends on the selected algorithms
Will perform local as well as global analysis
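What a canonical form might look like can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `canonical_form` and `crude_stem` and the suffix list are hypothetical and are not the normalization actually used in the work package.

```python
import re

# Hypothetical suffix list for illustration; a real stemmer has many more rules.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def canonical_form(text):
    """Lowercase the text, keep alphabetic tokens, and reduce each to a crude stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens]


print(canonical_form("Designing product models"))
```

Two texts that differ only in case, punctuation, or simple inflection map to the same list of stems, which is what makes them comparable for clustering.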
Results
Characteristics of the repositories
Efficiency of text mining algorithms strongly depends on the characteristics of the text
[Figure: Distribution of sorted word-stem frequencies. Vertical axis: number of occurrences (0 to 2500). Horizontal axis: word stems such as univers, design, investig, altern, environmen, risk, plant, scan, enlarg.]
Time-dependent word-stem frequencies do not affect the growth of the vocabulary, but are very important for an overview of the developments in a given scientific field
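A sorted word-stem frequency distribution of this kind can be computed along the following lines. This is a minimal sketch: the function name `sorted_stem_frequencies` and the toy documents are invented for illustration and do not come from the analysed repositories.

```python
from collections import Counter


def sorted_stem_frequencies(documents):
    """Count stems over all documents and return (stem, count) pairs, most frequent first."""
    counts = Counter()
    for stems in documents:
        counts.update(stems)
    # Sort by descending count, then alphabetically so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input: each document is already a list of word stems (invented values).
docs = [
    ["design", "process", "model"],
    ["design", "product", "model"],
    ["design", "risk"],
]
for stem, count in sorted_stem_frequencies(docs):
    print(stem, count)
```

Plotting the counts in this order gives the steeply falling curve typical of such distributions: a few stems occur very often and most occur rarely.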
Results
More results from the analysis
Analysis based on multi-word frequencies
[Figure: Share (0% to 100%) of multi-word stem frequencies per year, y1988 to y2002. Series: process model, build process, product model, product data, data model, data exchang, design construct, design process, comput aid, object orient, inform system, knowledg base, project manag, construct project, life cycl, construct process.]
From the analysis of word frequencies, the evolution of research topics can be determined
Evaluation of different clustering algorithms
[Figure: Comparison of Crossbow clusters 1 to 8 with Clustifier clusters 1 to 8.]
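The yearly shares of multi-word frequencies can be sketched as follows. The function name `bigram_shares_by_year` and the tiny corpus are assumptions made for illustration; the sketch only covers two-stem phrases and is not the project's own analysis tool.

```python
from collections import Counter


def bigram_shares_by_year(stems_by_year):
    """For each year, return the share of each two-stem phrase among all phrases of that year."""
    shares = {}
    for year, documents in stems_by_year.items():
        counts = Counter()
        for stems in documents:
            # Adjacent pairs of stems form the two-word phrases.
            counts.update(" ".join(pair) for pair in zip(stems, stems[1:]))
        total = sum(counts.values())
        shares[year] = {phrase: n / total for phrase, n in counts.items()} if total else {}
    return shares


# Invented toy corpus: year -> list of documents, each a list of stems.
corpus = {
    1988: [["product", "model", "data"], ["product", "model"]],
    2002: [["process", "model"], ["knowledg", "base"]],
}
shares = bigram_shares_by_year(corpus)
for year in sorted(shares):
    print(year, shares[year])
```

Comparing the share of one phrase across years shows whether a topic is gaining or losing ground, independently of how many papers were published in each year.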
Text mining
Text Mining Tools
Custom made analysis tools
Open Source Software
Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
Clustifier (k-means, semantic analysis)
Commercial:
Text Analyst
Intelligent Miner for Text
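The k-means method named in the tool list can be illustrated with a small pure-Python sketch. It is not the implementation of any of the listed tools; the function name `kmeans`, the four-term vocabulary, and the document vectors are invented for illustration, and the initial centroids are fixed so that the run is deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid by squared Euclidean distance.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
            )
            for point in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Invented term-count vectors over the stems (product, model, project, manag).
documents = [(2, 1, 0, 0), (1, 2, 0, 0), (0, 0, 2, 1), (0, 0, 1, 2)]
labels, centers = kmeans(documents, centroids=[documents[0], documents[2]])
print(labels)
print(centers)
```

The result depends on the initial centroids and on the number of clusters, which is one reason why different clustering tools can group the same papers differently.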
Use of results of clustering
SCIX profiling
Profiling shouldn’t
Limit general use of the sciX services
Interfere with privacy issues
Force the user to customize each and every page
Do things without users’ knowledge
Profiling should
Ease the use of the sciX services
Help users (reduce time) to retrieve, exchange, publish, etc. information
Provide basis for easier scientific collaboration
Enhance the overall user experience
User profile
User profile will provide personalized support for the use of sciX services
Intelligent maintenance and use will be based on advanced usage tracking as well as text analysis
Will provide different modalities, e.g. push technologies
Customization to the extent that will help information browsing and monitoring
Personal Profile
personal data category
gathering data category
document content category
document structure category
document source category
delivering data category
delivery means category
delivery time category
actions data category
security data category
Personal data about users
Personal Profile
personal data category
gathering data category
document content category
document structure category
document source category
delivering data category
delivery means category
delivery time category
actions data category
security data category
Data model
username
Given Name
Family Name
Email
Title
Affiliation
Country
Address
Security services for scientific publishing are not comparable to industry standards
Will provide a secure way of exchanging personal information
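The personal data model can be written down as a simple record type. This is a sketch under assumptions: the class name `PersonalData`, the choice of which fields are optional, and the `public_view` rule that hides contact details are all hypothetical and only illustrate how the listed fields might be held and partly withheld.

```python
from dataclasses import asdict, dataclass


@dataclass
class PersonalData:
    """Fields of the personal data category, as listed in the data model."""

    username: str
    given_name: str
    family_name: str
    email: str
    title: str = ""
    affiliation: str = ""
    country: str = ""
    address: str = ""

    def public_view(self):
        """Return the fields without contact details (a hypothetical privacy rule)."""
        hidden = {"email", "address"}
        return {key: value for key, value in asdict(self).items() if key not in hidden}


# Invented example user.
user = PersonalData("jdoe", "Jane", "Doe", "jane@example.org", affiliation="Example University")
print(user.public_view())
```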
Results
Actions: advanced usage tracking
[Figure: Navigation graph of the service with numeric counts. Page groups: Welcome Pages, Browse Pages, Search Forms, Search Results, Record Details, Basket manipulation, Misc Pages. Pages: index.htm, about.htm, BrowseAZ, BrowseKeywords, SearchForm, AdvancedSearchForm, Search, Go, Show, BasketShow, BasketAdd, BasketAddOne, Delete, DisplayStructure, LoginForm, Login, Reg3.]

Session by paper matrix
Session  Paper 1  Paper 2  ...  Paper i  ...  Paper n-1  Paper n
1        0        1             1             0          1
2        1        1             0             0          0
...
j        0        0             1             1          0
...
s-1      1        1             0             1          0
s        0        0             1             0          0

Example with seven sessions and five papers
Session  1  2  3  4  5  6  7
Paper 1  0  1  0  1  0  1  1
Paper 2  1  0  1  1  0  0  0
Paper 3  1  0  0  0  1  0  0
Paper 4  0  1  1  0  0  1  0
Paper 5  0  1  0  1  0  1  0

Selected sessions
Session  2  4  6  Sum
Paper 1  1  1  1  3
Paper 2  0  1  0  1
Paper 3  0  0  0  0
Paper 4  1  0  1  2
Paper 5  1  1  1  3

Ranking
Paper    Sum
Paper 5  3
Paper 4  2
Paper 2  1
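The ranking from the session and paper tables can be reproduced with a short co-occurrence count. This is a sketch, not the project's tracking code: the function name `related_papers` is hypothetical, the matrix is transcribed from the seven-session example as far as it can be read, and the sketch keeps every session that contains the reference paper (for Paper 1 that also includes session 7, which contains no other paper and so adds nothing to the counts).

```python
# Rows are sessions 1..7, columns are Paper 1..Paper 5 (1 = paper accessed in that session).
SESSIONS = [
    [0, 1, 1, 0, 0],  # session 1
    [1, 0, 0, 1, 1],  # session 2
    [0, 1, 0, 1, 0],  # session 3
    [1, 1, 0, 0, 1],  # session 4
    [0, 0, 1, 0, 0],  # session 5
    [1, 0, 0, 1, 1],  # session 6
    [1, 0, 0, 0, 0],  # session 7
]


def related_papers(sessions, paper):
    """Rank other papers by how often they share a session with `paper` (0-based index)."""
    counts = [0] * len(sessions[0])
    for row in sessions:
        if row[paper]:  # only sessions in which the reference paper was accessed
            for j, accessed in enumerate(row):
                counts[j] += accessed
    ranked = [(j, c) for j, c in enumerate(counts) if j != paper and c > 0]
    # Highest co-occurrence first; ties broken by paper index.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


for index, total in related_papers(SESSIONS, 0):
    print(f"Paper {index + 1}: {total}")
```

For Paper 1 this yields Paper 5, Paper 4, and Paper 2 with sums 3, 2, and 1, the same order as in the ranking table.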