Lost in the Ocean of Text Documents?


Rome, 24 November 2005
Visual Text Mining with SWAPit
Detection of semantic relationships among text documents
and associated data sources
Andreas Becks
Fraunhofer-Institute of
Applied Information Technology
Sankt Augustin & Aachen, Germany
Lost in the Ocean of Text Documents?
A huge amount of organisational knowledge is stored in text documents
• 85 to 90 percent of all corporate data, according to Merrill Lynch and Gartner studies
Even when DMS and desktop search are used, a huge amount of time is necessary to find important information
• 80% of companies and 40% of public administrations need more than one day [Zylab survey]
Text Mining helps to explore and analyse natural-language texts
• uncover relationships, recognize trends
• group, condense pieces of knowledge
• categorize text information
© Fraunhofer-FIT 2005
SWAPit Helps You to Navigate Through Your Text Data
The tool visualises semantic relationships among text documents...
X-ray view for document archives
SWAPit Integrates Text and Data Mining
... and lets you navigate, search, browse and analyse text documents and their associated data and metadata
[Figure labels: text documents; related structured data; catalogue of text categories; associations; categorization; Fact View; Similarity View; Category View; tools for analysis and search]
Application Example: Document Management
Document similarity helps to create ‘fascicoli’ (case files) and find misclassified documents
[Figure labels: document registration (Protocollazione); project selection; new text documents; DL-based categorization; information about type, AOO/UO, ‘fascicoli’, etc.; classification scheme (Titolario)]
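The idea of using document similarity to spot misclassified documents can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say which similarity measure SWAPit uses, so the sketch swaps in a plain TF-IDF cosine similarity, and the function names, example documents and category labels are all hypothetical. A document is flagged when its most similar neighbour carries a different category.

```python
import math
import re
from collections import Counter

# Hypothetical example data; the last document is deliberately mislabelled.
EXAMPLE_DOCS = [
    "invoice payment supplier invoice amount",
    "supplier invoice payment due amount",
    "building permit application construction site",
    "construction permit application for building",
    "payment of supplier invoice overdue",
]
EXAMPLE_LABELS = ["finance", "finance", "permits", "permits", "permits"]


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumption of this sketch) keeps all weights positive.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs):
    """For each document, return the index of its most similar other document."""
    vecs = tfidf_vectors(docs)
    nearest = []
    for i, u in enumerate(vecs):
        scores = [(cosine(u, v), j) for j, v in enumerate(vecs) if j != i]
        nearest.append(max(scores)[1])
    return nearest


def flag_suspects(docs, labels):
    """Return indices of documents whose nearest neighbour has another label."""
    nearest = most_similar(docs)
    return [i for i, j in enumerate(nearest) if labels[i] != labels[j]]


if __name__ == "__main__":
    print(flag_suspects(EXAMPLE_DOCS, EXAMPLE_LABELS))
```

On the example data, the fifth document is closest to the finance documents although it is labelled "permits", so it is the one a reviewer would be asked to check. Grouping each document with its nearest neighbours is also one simple way to propose candidate case files.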
SWAPit as a Single Point of Access
From scattered information...
• multi-schema databases, distributed & data-centred access
...to integrated information
• intuitive, user-centred access
[Figure labels: text documents; operational databases; DL-based integration; DL-based categorization; Virtual Integrated Database; user-specific schema & integrated access]
Monitoring Documents with SWAPit and DL
From information overflow...
→ 3 news items in 1 minute
...to information overview
→ 1 document map per day
[Figure labels: unfiltered and unstructured text documents; DL-based filter; conceptually filtered, relevant text documents; DL-based catalogue builder; intuitively structured text documents]
Displaying XML Documents in SWAPit
From complex, machine-readable documents...
• data with technically rich structural annotation
...to a human-oriented presentation
• customized, task-oriented view
[Figure labels: XML documents; metadata (selected attributes and elements); text content from specified attributes and elements; web ontology; ontology-context of specified elements]
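The step from an annotated XML document to a task-oriented view (metadata from selected attributes and elements, text content from specified attributes and elements) can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration, not SWAPit's actual mechanism: the sample document, the `extract_view` function and the shape of the two selection specs are assumptions made for this example, and the ontology context shown on the slide is not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
SAMPLE_XML = """<report id="r-17" lang="en">
  <header>
    <title>Quarterly claims overview</title>
    <author>J. Doe</author>
  </header>
  <body>
    <section name="Summary">Claims rose slightly in the third quarter.</section>
    <section name="Outlook">No major change is expected.</section>
  </body>
</report>"""

# Metadata spec: label -> (element path, attribute name or None for element text).
METADATA_SPEC = {
    "id": (".", "id"),
    "title": ("header/title", None),
    "author": ("header/author", None),
}

# Text spec: (element path, attributes whose values are prepended to the text).
TEXT_SPEC = [("body/section", ["name"])]


def select(root, path):
    """Return all elements matching an ElementTree path; '.' means the root."""
    return [root] if path == "." else root.findall(path)


def extract_view(xml_text, metadata_spec, text_spec):
    """Build a simple view: a metadata dict plus the selected text content."""
    root = ET.fromstring(xml_text)

    metadata = {}
    for label, (path, attr) in metadata_spec.items():
        # Only the first matching element contributes a metadata value.
        for element in select(root, path)[:1]:
            value = element.get(attr) if attr else "".join(element.itertext()).strip()
            if value:
                metadata[label] = value

    lines = []
    for path, attrs in text_spec:
        for element in select(root, path):
            # Selected attribute values first, then the element's own text.
            parts = [element.get(a) for a in attrs if element.get(a)]
            body = "".join(element.itertext()).strip()
            if body:
                parts.append(body)
            if parts:
                lines.append(": ".join(parts))

    return {"metadata": metadata, "text": "\n".join(lines)}


if __name__ == "__main__":
    view = extract_view(SAMPLE_XML, METADATA_SPEC, TEXT_SPEC)
    print(view["metadata"])
    print(view["text"])
```

Keeping the selection in two small specs rather than in code means the same document can be shown in different views for different tasks by changing only the specs.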
Conclusion: Visual and Intuitive Text Mining with SWAPit
SWAPit combines views on text documents and associated data sources on a single screen
• Overview instead of overflow
• Improves quality of text access tasks
• Leverages knowledge sources
Flexible architecture
• Designed to integrate Semantic Web technology
• Derives additional power from integration of DL technologies
• Can be integrated easily into existing infrastructures or company portals
• Can be tailored to specific needs of different market segments
Long-standing experience in research and practical applications
• Document Management, Business Intelligence, Customer Relationship Management, ...
• Main sectors: Insurance, Textile, Engineering, Social Science
• Technology has been extended in a joint project with Maurizio Lenzerini (SEWASIE)
Thank you for your attention!