SemanticTalk - The Innovation Accelerator

Automatically Building Concept Structures
and Displaying Concept Trails for the Use
in Brainstorming Sessions and Content
Management Systems
Chris Biemann, Karsten Böhm, Gerhard Heyer, Ronny Melz
University of Leipzig
I2CS 2004 – Guadalajara - Mexico
06-23-2004
1
Support for Creativity
Acquisition of Knowledge
Gathering information from structured
and unstructured texts, databases,
document collections, the web, etc.
[Figure: example semantic map of German financial terms, grouped around product type (e.g. funds, shares, securities, insurance), sector (e.g. technology, energy, banks, pharma) and market (e.g. Germany, Europe, Asia, Emerging Markets)]
Processing Knowledge
Generation of semantic maps and
associations in cooperative teamwork
meetings
Using Knowledge
Visualization of terms and relations.
Filters define views on semantically
relevant contents and structures.
2
Goal 1:
Computer-aided Associating
Software realizes:
• Protocol function by displaying identified keywords
• Adding associations from database
• Displaying keywords reflecting semantic similarity
Desired effects:
• Users can easily recall the session later
• During the session, associations remind users of terms
they might otherwise have forgotten
• The weight and the relatedness of different topics in a
session become visible
3
Goal 2:
Semantic Map and Red Thread
Software realises:
• Calculation and visualisation of large document collections
by using important terms (keywords)
• Positioning of terms reflects semantic closeness
• Small documents can be drawn into the semantic map: red
thread functionality
Desired effects:
• A fixed map helps users orient themselves in the contents of the
document collection
• Important terms can be surveyed at a glance
• Red thread functionality can be used for "fast reading"
4
Data Sources
• Projekt Deutscher Wortschatz:
Word list and co-occurrences
- for associations
- as a reference corpus for the semantic map
• Manual Annotation: typed (coloured) edges and nodes
- Semantic primitives
- Semantic relations
5
Calculating Associations:
Statistical Co-occurrences
• Co-occurrence: occurrence of two or more words within a
well-defined unit of information (sentence, nearest
neighbors)
• Significant Co-occurrences reflect relations between
words
• Significance measure (log-likelihood)
$\mathrm{sig}(A, B) = x - k \log x + \log k!$
with $n$ = number of sentences and $x = \frac{ab}{n}$.
• This measure defines the association degree between all
words. High degrees result in edges in the semantic map
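A minimal sketch of the significance measure above. The slide does not define a, b and k; the sketch assumes that a and b are the numbers of sentences containing A and B, and that k is the number of sentences containing both. The function name and the example counts are hypothetical.

```python
import math

def significance(a: int, b: int, k: int, n: int) -> float:
    """sig(A, B) = x - k*log(x) + log(k!), with x = a*b/n.

    Assumed meaning of the symbols (not stated on the slide):
    a, b = number of sentences containing A resp. B,
    k    = number of sentences containing both,
    n    = total number of sentences.
    """
    x = a * b / n
    # math.lgamma(k + 1) equals log(k!) and avoids computing huge factorials.
    return x - k * math.log(x) + math.lgamma(k + 1)

# Hypothetical counts: A in 10 sentences, B in 20, both in 5, out of 1000.
print(round(significance(10, 20, 5, 1000), 2))  # prints 13.03
```

The value grows when two words appear together far more often than the expected x = ab/n, which is what makes a pair a candidate for an edge.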
6
Example for Co-occurrences
Significant Co-occurrences of Guadalajara:
Camarena (194), Mexico (104), Mexican (58), kidnapped (43), Zavala (40), ranch (40), Avelar (37),
abducted (35), Alvarez (33), drug (33), Camarena's (32), pilot (32), Caro (30), Enrique (29), agent
(27), Enforcement (25), Quintero (25), gynecologist (23), tortured (23), Jalisco (22), DEA (21), Drug
(21), miles (21), torture (21), Alfredo (20), Machain (18), Feb (16), bodies (16), southeast (16),
Monterrey (15), Rafael (15), found (15), Radelat (14), Paso (13), consulate (13), Administration (12),
Salazar (12), body (12), killed (12), outside (12), Vasquez (11), Verdugo (11), bullet-riddled (11),
murder (11), El (10), Humberto (10), Lopez (10), lord (10), Felix (9), Gallardo (9), Hernandez (9),
Mexico's (9), arrested (9), cartel (9), Alberto (8), City (8), March (8), Zuno (8), city (8), homicide (8),
indictment (8), kidnapping (8), Caro-Quintero (7), February (7), Tijuana (7), Zuno-Arce (7), buried (7),
marijuana (7), racketeering (7), slayings (7), 31-year-old (6), April (6), Consulate (6), Culiacan (6),
Javier (6), Machain's (6), agents (6), office (6)
Significant left Neighbours of Guadalajara:
outside (12), near (5)
Significant right Neighbours of Guadalajara:
gynecologist (27), office (8), street (8), home (6),
Haggadah (5), drug (5)
7
Calculating Semantic Maps
Requires: document collection
• Calculate co-occurrences and keywords by differential
frequency analysis: important words are much more
frequent in the document collection than in a large
reference corpus
• Take the highest ranked words from the differential
frequency analysis as nodes
• Take highly significant co-occurrences to existing nodes as
further nodes
• Remove stopwords (functional words, determiners...)
• Insert edges between nodes that have a high association
degree by co-occurrence significance
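The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the SemanticTalk implementation: the frequency ratio with add-one smoothing, the thresholds, the function name `build_map` and the toy data are all hypothetical, and the co-occurrence significances are taken as given.

```python
from collections import Counter

def build_map(collection_tokens, reference_tokens, cooc_sig, stopwords,
              top_n=3, sig_threshold=10.0):
    """Return (nodes, edges) for a semantic map.

    cooc_sig maps word pairs (w1, w2) to a precomputed significance.
    """
    coll = Counter(collection_tokens)
    ref = Counter(reference_tokens)
    n_coll, n_ref = len(collection_tokens), len(reference_tokens)

    # Differential frequency (assumed form): relative frequency in the
    # collection divided by the add-one smoothed reference frequency.
    def ratio(word):
        return (coll[word] / n_coll) / ((ref[word] + 1) / (n_ref + 1))

    # Stopwords are filtered up front instead of in a separate later step.
    candidates = [w for w in coll if w not in stopwords]
    keywords = sorted(candidates, key=lambda w: (-ratio(w), w))[:top_n]
    nodes = set(keywords)

    # Further nodes: words co-occurring highly significantly with a keyword.
    for (w1, w2), sig in cooc_sig.items():
        if sig >= sig_threshold:
            if w1 in keywords and w2 not in stopwords:
                nodes.add(w2)
            if w2 in keywords and w1 not in stopwords:
                nodes.add(w1)

    # Edges between nodes with a high association degree.
    edges = {tuple(sorted(pair)) for pair, sig in cooc_sig.items()
             if sig >= sig_threshold and pair[0] in nodes and pair[1] in nodes}
    return nodes, edges

# Toy data, purely illustrative.
collection = "the fund buys shares the fund sells bonds the market falls".split()
reference = "the cat sat on the mat the dog saw the market".split()
cooc = {("fund", "shares"): 15.0, ("fund", "the"): 30.0,
        ("shares", "bonds"): 4.0, ("fund", "investor"): 12.0}
nodes, edges = build_map(collection, reference, cooc, {"the"}, top_n=1)
print(sorted(nodes))  # prints ['fund', 'investor', 'shares']
```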
8
Positioning in Semantic Maps
force-directed: nodes and edges are thrown on a plane and
then driven to equilibrium by minimizing the energy
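A minimal sketch of such a force-directed layout. The slide does not specify the energy function; this example assumes forces in the style of Fruchterman and Reingold (all nodes repel with k²/d, connected nodes attract with d²/k), and the parameters `step`, `max_move` and `seed` are illustrative choices, not values from SemanticTalk or TouchGraph.

```python
import math
import random

def force_layout(nodes, edges, iterations=300, k=1.0, step=0.05,
                 max_move=0.1, seed=0):
    """Place nodes (a list) in the plane and relax them towards equilibrium."""
    rng = random.Random(seed)
    # "Throw" the nodes onto the plane at random positions.
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in nodes}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                rep = k * k / d
                force[u][0] += dx / d * rep
                force[u][1] += dy / d * rep
                force[v][0] -= dx / d * rep
                force[v][1] -= dy / d * rep
        # Attraction along edges.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            att = d * d / k
            force[u][0] -= dx / d * att
            force[u][1] -= dy / d * att
            force[v][0] += dx / d * att
            force[v][1] += dy / d * att
        # Move each node a small, capped step along its net force.
        for v in nodes:
            fx, fy = force[v]
            move = step * math.hypot(fx, fy)
            scale = step if move <= max_move else step * max_move / move
            pos[v][0] += fx * scale
            pos[v][1] += fy * scale
    return pos

# Hypothetical mini-map: two connected pairs of terms.
layout = force_layout(["fund", "share", "Europe", "Asia"],
                      [("fund", "share"), ("Europe", "Asia")])
for term, (x, y) in layout.items():
    print(f"{term}: ({x:.2f}, {y:.2f})")
```

With these forces, two connected nodes settle at a distance of about k, so semantically associated terms end up close together.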
9
Domain Adjustment
Session knowledge / Project knowledge
Enrichment of database by incorporating
task-specific knowledge and know-how.
PARTIAL
OVERLAP
Community knowledge / Domain knowledge
Generation of a semantic map by processing
domain-relevant documents and incorporating
existing ontologies.
PARTIAL
OVERLAP
Wortschatz Database
(Very Large Corpus)
10
Visualization
Extension of Touchgraph (www.touchgraph.com):
• Force-directed model for positioning
• Label fill colours indicate the runtime type (keyword, associated,
red thread)
• Label border colours indicate semantic primitives
• Edge colours indicate semantic relations
• Nodes can be displayed as labels or dots
Legend:
Is-A relation
co-hyponymy relation
white: keyword by user
grey: association from DB
primitive: noun
primitive: organisation
11
Zooms
• Conceptual zoom:
lexicalize nodes or
display them as dots
• Granularity: reduce
number of visible nodes
• Optical zoom: size of
window compared to
total size of the map
12
Adding nodes in Association Mode
• User keywords are added to the graph. They fade out if they
remain unconnected for a certain time.
• Grey words are added if they are associated with at least two
user keywords
Let's talk about Mexico.
Mexicans wear sombreros,
which are hats for sun protection.
It is a country in Central America.
I would like to have
a hat like that too!
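The rule for adding grey words in association mode can be sketched as follows. The dictionary `associations` is a hypothetical stand-in for the association database, and the function name and example entries are invented for illustration; the fading of unconnected keywords is not modelled here.

```python
def grey_words(user_keywords, associations, min_links=2):
    """Return associated words linked to at least `min_links` user keywords.

    associations maps a keyword to the set of words associated with it.
    """
    keywords = set(user_keywords)
    counts = {}
    for kw in keywords:
        for assoc in associations.get(kw, ()):
            # Words the users said themselves stay white, not grey.
            if assoc not in keywords:
                counts[assoc] = counts.get(assoc, 0) + 1
    return {word for word, c in counts.items() if c >= min_links}

# Hypothetical association data.
associations = {
    "Mexico": {"country", "sombrero", "Central America"},
    "hat": {"sombrero", "sun"},
}
print(grey_words(["Mexico", "hat"], associations))  # prints {'sombrero'}
```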
13
... it knows lots of countries
14
Semantic Map Example
15
Red Thread Functionality
• Given: semantic map, additional input
• Terms from the additional input that are found in the semantic map are
coloured in red and connected in sequence of their occurrence
- red connection: the edge already existed in the semantic map
- yellow connection: the edge is new
• Long-range yellow edges visualize topic shifts
[Figure: red thread example with the nodes Afghanistan, Georgia and Iraq]
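A minimal sketch of the red thread colouring described above, assuming the semantic map is given as a set of nodes and a list of undirected edges. The function name, the data layout and the example edges are hypothetical and do not claim to reproduce the map shown on the slide.

```python
def red_thread(map_nodes, map_edges, input_terms):
    """Return the trail of map terms and the coloured connections between them."""
    existing = {frozenset(edge) for edge in map_edges}
    # Terms from the additional input that are found in the semantic map,
    # kept in the order of their occurrence.
    trail = [term for term in input_terms if term in map_nodes]
    connections = []
    for prev, cur in zip(trail, trail[1:]):
        if prev == cur:
            continue  # a repeated term needs no connection to itself
        # red: the edge already existed in the map; yellow: the edge is new
        colour = "red" if frozenset((prev, cur)) in existing else "yellow"
        connections.append((prev, cur, colour))
    return trail, connections

# Hypothetical map and input text.
nodes = {"Afghanistan", "Iraq", "Georgia"}
edges = [("Afghanistan", "Iraq")]
text = "troops leave Afghanistan for Iraq while Georgia protests".split()
trail, connections = red_thread(nodes, edges, text)
for connection in connections:
    print(connection)
# prints ('Afghanistan', 'Iraq', 'red')
# prints ('Iraq', 'Georgia', 'yellow')
```

A yellow connection between terms that lie far apart on the map is what the slide calls a topic shift.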
16
SemanticTalk GUI
topic survey window
zoom rulers
local context window
17
Embedding in the system
• Implementation as a Java servlet on a Tomcat web server
• MySQL database for graphs and associations
• Linguatec VoicePro 10 – interface for speech recognition
• Several speech recognition clients can be connected via
LAN
18
Interfaces
• Import/Export
- various formats for text files
- XML/RDF/RDB for maps
- PNG for maps
The results obtained with SemanticTalk can be saved, loaded
and exported to other tools for further processing
• Retrieval:
- words (nodes) with links to occurrences in the
document collection
- associations (edges) with links to occurrences in the
document collection
- explicit links, e.g. pictures for words
19
Further Processing of Net
Topology Structures
Variants
Product model
Exchange format
(e.g. rdf)
Process model
Semantic Map
Transformation in application models
Resource model
20
Questions?
THANK YOU!
21