Transcript Slide 1

Semantic Memory Knowledge
Acquisition Through Active Dialogues
Włodzisław Duch, Julian Szymański
Knowledge representation based on relations between concepts and keywords is a
relatively simple model of language. However, it makes it possible to implement
quite interesting linguistic competences that have not been demonstrated by more
sophisticated knowledge models, for example the frames used in CYC. One of the
presented linguistic abilities is a twenty questions game based on a semantic
memory built on a relational model of knowledge representation. Another linguistic
competence of the implemented system is the ability to talk about the knowledge it possesses.
The presented interaction with the human user is organized in the form of an active
dialog. It shows how an artificial system uses predefined sentence templates to
acquire new knowledge. We present dialog scenarios for mining knowledge and
discuss the data acquired into semantic memory structures using them.
Psycholinguistic models of the
Semantic Memory
Endel Tulving „Episodic and Semantic Memory” 1972.
Semantic memory refers to the memory of meanings and understandings.
It stores concept-based, generic, context-free knowledge.
Permanent container for general knowledge (facts, ideas, words, etc.).
Hierarchical Model
Collins & Quillian, 1969
Semantic network
Collins & Loftus, 1975
Semantic knowledge representation
wCRK
weight Concept Relation Keyword
CDV – Concept Description Vector
forms Semantic Matrix
Cobra
is_a: animal, beast, being, brute, creature, entity, fauna, object, organism, reptile, serpent, snake, vertebrate
has: belly, body part, cell, chest, costa
…
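The wCRK entries and the Semantic Matrix can be illustrated with a minimal Python sketch. The triples, the weights (1.0 for true, -1.0 for false) and the function name `build_semantic_matrix` are illustrative assumptions, not data or code from the presented system.

```python
# Hypothetical wCRK entries: (weight, concept, relation, keyword).
# Weights are illustrative: 1.0 = true, -1.0 = false.
WCRK = [
    (1.0, "cobra", "is_a", "snake"),
    (1.0, "cobra", "is_a", "reptile"),
    (1.0, "cobra", "has", "belly"),
    (1.0, "giraffe", "is_a", "mammal"),
    (-1.0, "giraffe", "is_a", "reptile"),
]


def build_semantic_matrix(entries):
    """Return (concepts, features, matrix); matrix[i] is the CDV of concepts[i]."""
    features = sorted({(rel, kw) for _, _, rel, kw in entries})
    concepts = sorted({c for _, c, _, _ in entries})
    col = {f: j for j, f in enumerate(features)}
    row = {c: i for i, c in enumerate(concepts)}
    # None marks "no knowledge" for a concept/feature pair.
    matrix = [[None] * len(features) for _ in concepts]
    for w, c, rel, kw in entries:
        matrix[row[c]][col[(rel, kw)]] = w
    return concepts, features, matrix


concepts, features, matrix = build_semantic_matrix(WCRK)
```

Each row of `matrix` is one Concept Description Vector; each column corresponds to one (relation, keyword) pair.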
Idea for semantic data acquisition
Play 20 questions with Avatar!
http://diodor.eti.pg.gda.pl/p420q/newAjaxInterface.aspx
Think of an animal: the system tries to guess it,
asking no more than 20 questions
that should be answered only with Yes or No.
The answers given narrow the subspace of the most
probable objects.
The system learns from the games: it obtains new
knowledge
from interaction with human users.
Is it vertebrate? Y
Is it mammal? Y
Does it have hoof? Y
Is it equine? N
Is it bovine? N
Does it have horn? N
Does it have long neck? Y
I guess it is giraffe.
Algorithm for 20 questions game
$$I(\text{keyword}) = -\sum_{i=0}^{K} p(\text{keyword} = v_i) \log p(\text{keyword} = v_i)$$
, where p(keyword = v_i) is the fraction of concepts for which the keyword has value v_i
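Question selection by information content can be sketched as follows. This is a minimal sketch: it assumes the keyword with the highest entropy over the current candidates is asked next and uses a base-2 logarithm; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def keyword_information(values):
    """Entropy of one keyword; values holds its value for each candidate concept."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_question(candidates):
    """candidates: dict concept -> dict keyword -> value.
    Return the keyword whose values split the candidates most evenly."""
    keywords = sorted({k for cdv in candidates.values() for k in cdv})
    return max(
        keywords,
        key=lambda k: keyword_information([cdv.get(k) for cdv in candidates.values()]),
    )


# Toy candidate set (illustrative values: 1 = yes, -1 = no).
toy = {
    "giraffe": {"vertebrate": 1, "mammal": 1, "long neck": 1},
    "horse": {"vertebrate": 1, "mammal": 1, "long neck": -1},
    "cobra": {"vertebrate": 1, "mammal": -1, "long neck": -1},
    "eagle": {"vertebrate": 1, "mammal": -1, "long neck": -1},
}
question = best_question(toy)
```

In the toy data "vertebrate" carries no information (all candidates agree), while "mammal" splits the candidates in half and is therefore chosen.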
The subspace of candidate concepts O(A) is selected according to:
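O(A) = {i : d = |CDV_i - ANSW| is minimal}
$$d(CDV, ANSW) = \frac{\sum_{n=1}^{N} \bigl(1 - \mathrm{dist}(CDV_n, ANSW_n)\bigr)}{\mathrm{len}(ANSW)}$$
, where:
$$\mathrm{dist}(x, y) = \begin{cases} 0 & \text{if } y = \mathrm{NULL} \\ 2 & \text{if } x = \mathrm{NULL} \\ |x - y| & \text{otherwise} \end{cases}$$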
, where CDV_i is the vector for the i-th concept and ANSW is the partial vector of
answers retrieved so far
● We can handle user mistakes by also accepting concepts with d greater than the minimum
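Candidate selection with tolerance for user mistakes can be sketched as below. This sketch does not reproduce the distance from the slide: it assumes a simplified stand-in (mean absolute difference over answered keywords, with an assumed penalty of 1.0 where the concept has no knowledge). The names `distance`, `select_candidates` and `tolerance` are hypothetical.

```python
def distance(cdv, answers):
    """Simplified stand-in distance between a CDV and the partial answer vector.
    None in answers means "not asked yet"; None in cdv means "no knowledge"."""
    total, asked = 0.0, 0
    for x, y in zip(cdv, answers):
        if y is None:
            continue  # unanswered keywords do not contribute
        asked += 1
        total += 1.0 if x is None else abs(x - y)  # assumed penalty for missing knowledge
    return total / asked if asked else 0.0


def select_candidates(matrix, answers, tolerance=0.0):
    """Indices of concepts whose distance is within `tolerance` of the minimum."""
    dists = [distance(cdv, answers) for cdv in matrix]
    best = min(dists)
    return [i for i, d in enumerate(dists) if d <= best + tolerance]


matrix = [[1, 1, 1], [1, 1, -1], [1, -1, -1]]
answers = [1, 1, None]  # third keyword not asked yet
strict = select_candidates(matrix, answers)
lenient = select_candidates(matrix, answers, tolerance=1.0)
```

With `tolerance=0.0` only the closest concepts remain; a positive tolerance keeps concepts that contradict some answers, so a single wrong answer from the user does not eliminate the right object.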
Automatic data acquisition
Basic semantic data obtained by aggregating machine-readable
dictionaries: WordNet, ConceptNet, SUMO Ontology
– Relations used for the semantic category: animal
– Semantic space truncated using word popularity rank:
IC  GR  BNC
Rank ( word ) 
max( Rank )
• IC – information content: the number of occurrences of the word
in WordNet descriptions
• GR – GoogleRank: the number of web pages returned by the Google search
engine for the word
• BNC – word statistics taken from the British National Corpus
● Initial semantic space reduced to 94 objects and 72 features
Human interaction knowledge acquisition
• Data obtained from machine-readable dictionaries:
– Not complete
– Not common sense
– Sometimes specialised concepts
– Some errors
• Knowledge correction in the semantic space:
$$w = \frac{w_0 \cdot \beta + \sum^{N} ANS}{N + \beta}$$
, where:
w0 – initial weight, the initial knowledge (from dictionaries)
ANS – answer given by the user
N – number of answers
β – parameter indicating the importance of the initial knowledge
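The weight correction can be sketched in Python, assuming the update is the weighted average w = (β · w0 + ΣANS) / (N + β) and that answers are encoded as 1 (yes) and -1 (no). The function name `corrected_weight` and the encoding are assumptions.

```python
def corrected_weight(w0, answers, beta):
    """Blend the dictionary weight w0 with user answers.
    beta acts like a count of "virtual" answers that all agree with w0."""
    n = len(answers)
    if n + beta == 0:
        return w0  # nothing to blend: keep the initial knowledge
    return (beta * w0 + sum(answers)) / (n + beta)


# The dictionary says "true" (1.0), but three users answered "no" (-1).
w = corrected_weight(1.0, [-1, -1, -1], beta=1.0)
```

A larger β makes the system trust the dictionaries longer; with β = 1 three contradicting answers are already enough to flip the sign of the weight.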
Active Dialogues
Dialogues with the user for obtaining new knowledge:
When the system fails to guess the object:
I give up. Tell me what did you think of?
The concepts used in the game correct the semantic space.
When two concepts have the same CDV:
Tell me what is characteristic for <concept1/2> ?
The new keywords for the specified concepts are stored in the semantic memory.
When the system needs more knowledge about some concept:
I dont have any particular knowledge about <concept>. Tell
me more about <concept>.
The system obtains new keywords for the given concept.
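The three dialogue scenarios can be sketched as a simple template selector. This is a minimal sketch: the order of the checks, the `min_keywords` threshold for "needs more knowledge" and the function name `next_prompt` are hypothetical; only the sentence templates come from the slide.

```python
# Sentence templates from the slide; {concept} stands in for <concept>.
TEMPLATES = {
    "give_up": "I give up. Tell me what did you think of?",
    "distinguish": "Tell me what is characteristic for {concept}?",
    "more": "I dont have any particular knowledge about {concept}. "
            "Tell me more about {concept}.",
}


def next_prompt(guess_failed, concept, cdvs, min_keywords=3):
    """Pick a knowledge-acquisition prompt, or None if no dialogue is needed.
    cdvs: dict concept -> dict keyword -> weight."""
    if guess_failed:
        return TEMPLATES["give_up"]
    # Another concept with an identical CDV cannot be told apart from this one.
    twins = [c for c, v in cdvs.items() if c != concept and v == cdvs[concept]]
    if twins:
        return TEMPLATES["distinguish"].format(concept=concept)
    if len(cdvs[concept]) < min_keywords:  # assumed threshold
        return TEMPLATES["more"].format(concept=concept)
    return None


cdvs = {
    "horse": {"hoof": 1, "mammal": 1, "equine": 1},
    "zebra": {"hoof": 1, "mammal": 1, "equine": 1},
    "cobra": {"snake": 1},
}
prompt = next_prompt(False, "horse", cdvs)
```

The answer to each prompt would then be parsed into new keywords for the concept; that step is outside this sketch.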