Information Inference
Mimicking human text-based reasoning
P.D. Bruza & D. Song
Information Ecology Project
Distributed Systems Technology Centre
[Title-slide images: Penguin Books U.K.; “Why Linus chose a penguin” (Linux Online); Surfing the Himalayas]
Introductory remarks
- Information inference is a common and real phenomenon
- It can be modelled by symbolic inference, but this isn’t satisfying
- The inferences are often latent associations triggered by seeing a word (or words) in the context of other words, so inference is not deductive but about producing implicit associations appropriate to the context
- We need to look at the problem from a cognitive perspective….
Since last time….
- (Philosophical) positioning of the work is clearer
- Some encouraging experimental results using information inference to derive query models
- Some initial ideas about how information inference fits into an abductive logic for text-based knowledge discovery
Dretske’s Information Content
To a person with prior knowledge K, r being F carries the information
that s is G if and only if the conditional probability of s being G
given r is F is 1 (and less than one given K alone)
We can say that s being G is inferred (informationally) from r is F and K
T = “Why Linus chose a penguin”
K = {Linus Torvalds invented Linux, the Linux logo is a penguin, Linus is a cartoon character in “Peanuts”}
Pr(“Linus” is “Linus Torvalds” | K) < 1
Pr(“Linus” is “Linus Torvalds” | K, “Linus” occurs with “penguin” in T) < 1
So Dretske’s definition does not permit the inference
“Linus” is “Linus Torvalds”, though a human being may proceed
under this “hasty” judgment.
Dretske’s information content “sets too high a standard”
(Barwise & Seligman)
Inferential information content
(Barwise & Seligman)
To a person with prior knowledge K, r being F carries the information that
s is G, if the person could legitimately infer that s is G from r being F
together with K (but could not from K alone)
T = “Why Linus chose a penguin”
K = {Linus Torvalds invented Linux, the Linux logo is a penguin, Linus is a cartoon character in “Peanuts”}
“Linus” being “Linus Torvalds” can’t be legitimately inferred from K alone.
“Linus” being with “penguin” in T, together with K, carries the information that “Linus” is “Linus Torvalds”.
Barwise & Seligman (con’t)
“… by relativizing information flow to human inference, this definition
makes room for different standards in what sorts of inferences the person
is able and willing to make”
Remarks:
- Psychologistic stance taken
- Onerous from an engineering standpoint: “different standards” implies
“nonmonotonicity”. Consider,
“Linux Online: Why Linus chose a penguin” (willing)
vs.
“Why Linus chose a penguin” (not willing)
Consequences of psychologism
- Representations of information need not be propositional
- Semantics is not a model-theoretic issue but a cognitive one: the “meanings” stored and manipulated by the system should accord with what we have in our heads.
Gärdenfors’ cognitive model
symbolic level – propositional representation
conceptual level – geometric representation
associationist (sub-conceptual) level – connectionist representation
Conceptual spaces: the property
“red”
[Figure: the property “red” as a region red(x) in a space spanned by hue, chromaticity, and brightness]
Properties and concepts are dimensional (geometric) objects.
Dimensions may be integral: the value in one dimension (or dimensions) determines the value in another.
Barwise & Seligman’s real valued
state spaces
Observation function: red ↦ ⟨hue: 445, chrom: 0.6, brightness: 0.7⟩
Gärdenfors’ cognitive model: how
we realize it
symbolic – propositional representation: keywords
conceptual – geometric representation: LSA, HAL
associationist (sub-conceptual) – connectionist representation
Geometric representations of
words via Hyperspace Analogue to
Language (HAL)
reagan = < administration: 0.45, bill: 0.05, budget: 0.07,
house: 0.06, president: 0.83, reagan: 0.21, trade: 0.05, veto:
0.06, … >
This example shows how a word is represented as a weighted vector whose dimensions are other words.
The weights represent the strengths of association between “reagan” and other words seen in the same context(s).
How HAL vectors are constructed
…….Kemp urges Reagan to oppose stock tax…..
- Slide a window of width n across the corpus
- For each word, compute a weight of association with every other word within the window; the weight is inversely proportional to the distance between the words
- HAL space: each word in the corpus is represented by a multi-dimensional vector, a weighted sum of the contexts the word appeared in. (Burgess et al. refer to it as a “high dimensional context space” or a “high dimensional semantic space”)
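
To make the construction concrete, here is a minimal Python sketch. The window width of 5 and the linear ramp weight n − d + 1 are assumptions for illustration; the slides say only that the weight is inversely proportional to distance, and (per the later “Differences with Burgess et al.” slide) that pre- and post-vectors are added into a single vector.

```python
from collections import defaultdict

def build_hal_space(tokens, n=5):
    """Sketch of HAL construction: word -> {co-occurring word: summed weight}."""
    hal = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        # Look at the n words following each word; writing the weight into
        # both vectors merges the pre- and post- directions into one vector.
        for d, other in enumerate(tokens[i + 1 : i + 1 + n], start=1):
            w = n - d + 1            # weight inversely proportional to distance
            hal[word][other] += w    # 'other' follows 'word'  (post-vector)
            hal[other][word] += w    # 'word' precedes 'other' (pre-vector)
    return hal

corpus = "kemp urges reagan to oppose stock tax".split()
print(dict(build_hal_space(corpus)["reagan"]))
```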
Remarks about HAL
- A HAL space is easy to construct
- Cognitive compatibility with human information processing
  – “word representations learned by HAL account for a variety of semantic phenomena” (Burgess et al.)
  – Therefore a good candidate for representing “meanings” in accord with our psychologistic stance
- A HAL space is a real-valued state space, thus opening the door to driving information inference according to Barwise & Seligman’s definition
  – A HAL vector represents a word’s “state” in the context of the text corpus it was derived from
Differences with Burgess et al.
- We (often) normalize the weights
- Pre- and post- vectors are added into a single vector
- HAL vectors derived from small text corpora (e.g., Reuters-21578) seem to be OK
- HAL vectors are “summed” representations, similar in spirit to “prototypical concepts” (which are averaged representations)
Reagan traces
President Reagan was ignorant about much of the Iran arms scandal
Reagan says U.S. to offer missile treaty
REAGAN SEEKS MORE AID FOR CENTRAL AMERICA
Kemp urges Reagan to oppose stock tax
Prototypical concepts
Prototypical “Reagan” = average of vectors from traces:
⟨president: 3.23, administration: 1.82, trade: 0.40, budget: 0.37, veto: 0.34, bill: 0.31, congress: 0.31, tax: 0.29, …⟩
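
A prototypical concept is the dimension-wise average of the trace vectors. A minimal sketch; the trace values below are made up for illustration:

```python
from collections import defaultdict

def prototype(traces):
    """Average a list of HAL trace vectors (dicts) into a prototypical concept."""
    summed = defaultdict(float)
    for trace in traces:
        for dim, w in trace.items():
            summed[dim] += w
    return {dim: w / len(traces) for dim, w in summed.items()}

traces = [  # hypothetical trace vectors for "reagan"
    {"president": 0.9, "iran": 0.4, "scandal": 0.5},
    {"president": 0.8, "missile": 0.3, "treaty": 0.4},
]
print(prototype(traces))
```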
Concept combination: “Pink
Elephant”
[Figure: the HAL vector of “elephant”, whose dimensions are re-weighted when combined with “pink”]
Heuristic concept combination:
“Star wars”
Observation: “star” dominates “wars”
star = <trek: 0.2, episode: 0.05, soviet: 0.3, bush: 0.4, missile: 0.25>
wars = <soviet: 0.1, missile:0.2, iran: 0.33, iraq: 0.28, gulf: 0.4>
starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65,
iran: 0.2, iraq: 0.18, gulf: 0.25>
How do we weight dimensions appropriately according to context?
Weights are affected by how one concept appears in the light of another: intersecting dimensions are emphasized, and weights are adjusted according to the degree of dominance. (NB: moving prototypical concepts in the HAL space is a cleaner way of dealing with context.)
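
The slide gives the inputs and output but not the formula, so the sketch below only illustrates the shape of such a heuristic: intersecting dimensions are boosted and dimensions unique to the dominated concept are attenuated. The factors 1.4 and 0.6 are invented for illustration; they do not reproduce the slide’s exact numbers (which also boost the dominant concept’s unique dimensions).

```python
def combine(dominant, subordinate, emphasis=1.4, attenuation=0.6):
    """Hedged sketch of a dominance-based concept combination heuristic."""
    combined = {}
    for dim in set(dominant) | set(subordinate):
        w1, w2 = dominant.get(dim, 0.0), subordinate.get(dim, 0.0)
        if w1 and w2:
            combined[dim] = emphasis * (w1 + w2)  # intersecting dims emphasized
        elif w1:
            combined[dim] = w1                    # dominant-only dims kept
        else:
            combined[dim] = attenuation * w2      # subordinate-only dims attenuated
    return combined

star = {"trek": 0.2, "episode": 0.05, "soviet": 0.3, "bush": 0.4, "missile": 0.25}
wars = {"soviet": 0.1, "missile": 0.2, "iran": 0.33, "iraq": 0.28, "gulf": 0.4}
print(combine(star, wars))   # "star" dominates "wars"
```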
Theoretical background: Information inference via HAL-based
information flow computations
Barwise & Seligman: state-based “information flow”:
on, live ⊢ light iff s(on) ∩ s(live) ⊆ s(light)
HAL-based “information flow” (at the conceptual rather than the symbolic level):
i1, …, in ⊢ j iff degree(ci ⊑ cj) > λ
where ci is the (combined) HAL vector of i1, …, in, cj the vector of j, and λ a threshold.
Example: reagan, iran ⊢ scandal

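The state-space condition is easy to state operationally: the states supporting all the source types must be a subset of those supporting the target type. A toy sketch, with invented circuit states:

```python
s = {                      # hypothetical states of a simple circuit
    "on":    {1, 2, 3},    # states in which the switch is on
    "live":  {2, 3, 4},    # states in which the circuit is live
    "light": {2, 3, 5},    # states in which the bulb lights
}

def flows(sources, target):
    """on, live |- light  iff  s(on) ∩ s(live) ⊆ s(light)."""
    common = set.intersection(*(s[t] for t in sources))
    return common <= s[target]

print(flows(["on", "live"], "light"))   # True: {2, 3} is a subset of {2, 3, 5}
```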
Degree of inclusion (flow)
computation
degree(ci ⊑ cj) = Σ_{pl ∈ QP(ci) ∩ QP(cj)} w_{ci,pl} / Σ_{pk ∈ QP(ci)} w_{ci,pk}
where ci is the source concept and cj the target concept.
The “quality properties” QP(ci) are the dimensions whose weight is above the mean weight in the source concept. (Intuition: how much of the salient aspects of the source are contained in the target.)
In other words: compute the ratio of the weight of the dimensions shared by source and target to the total weight of the quality dimensions of the source concept.
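
This translates directly into code. A sketch, following the slide’s definition of QP(c) as the dimensions whose weight exceeds the concept’s mean weight; the source and target vectors are hypothetical:

```python
def quality_properties(c):
    """Dimensions whose weight is above the mean weight of the concept."""
    mean = sum(c.values()) / len(c)
    return {dim for dim, w in c.items() if w > mean}

def degree_of_inclusion(ci, cj):
    """degree(ci ⊑ cj): weight of ci's quality properties shared with cj,
    over the total weight of ci's quality properties."""
    qp_i, qp_j = quality_properties(ci), quality_properties(cj)
    shared = sum(ci[p] for p in qp_i & qp_j)
    total = sum(ci[p] for p in qp_i)
    return shared / total if total else 0.0

source = {"president": 0.83, "administration": 0.45, "iran": 0.40, "trade": 0.05}
target = {"president": 0.70, "scandal": 0.60, "iran": 0.50, "veto": 0.10}
print(degree_of_inclusion(source, target))   # ≈ 0.65
```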
Visualizing degree of inclusion
between HAL vectors
Source concept quality properties: {A, F, K, Q}
Target concept dimensions: {A, B, C, D, F, G, K, L, M}
Many of the above-average “quality properties” of the source concept are present in the target, so the degree of inclusion will be high.
Information Inference in practice:
deriving query models
- Construct HAL vectors for all vocabulary terms from the document collection
- Given a query such as “space program”, compute the information flows from it and use these to expand the query, e.g. space, program ⊢ nasa, where the query expansion term “nasa” is derived via the information flow computation (see the sketch below)
- (We used the top 80 information flows for expansion without feedback, 65 with feedback)
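
Putting the pieces together, a hedged sketch of the query-model pipeline, reusing combine() and degree_of_inclusion() from the earlier sketches; the idf-based dominance ranking is from the composition-model slide below, and the top-k cutoff follows the 80/65 figures above:

```python
def expand_query(query_terms, hal, idf, k=80):
    """Sketch: derive query expansion terms via information flow computations."""
    # Dominance ranking: higher-idf query terms dominate the combination
    ranked = sorted(query_terms, key=lambda t: idf[t], reverse=True)
    # Recursively combine the query terms' HAL vectors into one concept
    concept = hal[ranked[0]]
    for term in ranked[1:]:
        concept = combine(concept, hal[term])
    # Rank every vocabulary concept by its degree of information flow
    flows = {w: degree_of_inclusion(concept, v) for w, v in hal.items()}
    return sorted(flows, key=flows.get, reverse=True)[:k]
```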
The experiments
- Associated Press 88/89 collections
- TREC topics 1-50, 101-150, 151-200 (titles only)
- Models for comparison: Baseline, Composition, Relevance Model, Markov chain model
Baseline Model
- BM-25 term weighting (terms were stemmed)
- Replication of Lafferty & Zhai’s baseline (SIGIR 2001)
- Dot product matching function
Composition model
- Combine the HAL vectors of individual query terms by recursively applying the concept combination heuristic; query terms are ranked according to idf (dominance ranking)
starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65,
iran: 0.2, iraq: 0.18, gulf: 0.25>
Results
          Baseline Model   Composition Model   Info flow Model
AvgPr     0.182            0.197 (+8%)         0.247 (+35%)
InitPr    0.476            0.520 (+10%)        0.544 (+14%)
Recall    1667/3301        1996/3301 (+15%)    2269/3301 (+35%)
The effect of information inference
26% of the 35% improvement in precision of the HAL-based information flow model is due to information inference.
For example, for the query “space program”, the information flow model infers query expansion terms such as “Reagan”, “satellites”, “scientists”, “pentagon”, “mars”, “moon”.
These are real inferences with respect to “space program”: these terms do not appear as dimensions in the HAL vector of the concept combination “space program”.
Comparison with probabilistic
query language models

MC: Markov chain model (Lafferty & Zhai, SIGIR 2001); IM: information flow model; wP: with pseudo-feedback

                     MC      IM      MCwP    IMwP
Topics 1-50, AP89    0.201   0.247   0.232   0.258

Scores are average precision.
Comparison with probabilistic
query language models (con’t)

RM: Relevance model (Lavrenko & Croft, SIGIR 2001)
                     IM      IMwP    RM
Topics 101-150, AP   0.265   0.301   0.261
Topics 151-200, AP   0.298   0.344   0.319

Scores are average precision.
Text-based scientific discovery
[Figure: Swanson’s literature-based discovery: Fish Oil (A) links to Raynaud (C) via B1 (blood viscosity), B2 (platelet aggregation), B3 (vascular reactivity)]
“…, he made the connection between these literatures and formulated the hypothesis that fish oil may be used for treating Raynaud’s disease…”
Weeber et al., “Using Concepts in Literature-Based Discovery”, JASIST 52(7):548-557
Logic of Abduction (Gabbay &
Woods)
An abductive logic comprises:
– a logic of discovery: HAL-based info flow?
– a logic of justification: hypothesis testing?
Raw material for abduction?
Information flows from “Raynaud”
Raynaud: 1.0, myocardial: 0.56, coronary: 0.54, renal: 0.52, ventricular: 0.52, …, oil: 0.23, …, fish: 0.20, …
Some promise, but the lack of representation of integral dimensions is a problem.
Index expressions
“Beneficial effects of fish oil on blood viscosity”
[Figure: index expression tree: “beneficial” modifies “effects”, which connects via “of” to “fish oil” and via “on” to “blood viscosity”]
Power index expressions for
representing integral dimensions
Sub-expressions such as “effects of fish oil” (fish, oil) and “effects on blood viscosity” (effects, blood, viscosity) capture integral dimensions.
Information flows are single terms; power index expressions determine how they may be combined into higher-order syntactic structures.
Initial results from using
information flow computations as a
logic of discovery
27  ventricular (0.52) infarction (0.46)
27  thromboplastin (0.17)
27  pulmonary (0.51) arteries (0.25)
27  placental (0.19) protein (0.42)
27  monoamine (0.17) oxidase (0.18)
27  lupus (0.37) nephritis (0.17)
27  instruments (0.17)
27  coagulant (0.21)
27  blood (0.63) coagulation (0.29)
26  umbilical (0.24) vein (0.32)
25  fish (0.20)
23  viscosity (0.21)
23  cigarette (0.26) smokers (0.22)
4   fish (0.20) oil (0.23)
Summary
- Barwise & Seligman and Gärdenfors take a very similar stance with respect to human reasoning (Gabbay and Woods also)… psychologism is alive….
- An integration of a primitive approximation of a conceptual space with an information inference mechanism driven by information flow computations
- An initial attempt towards realizing Gärdenfors’ conceptual spaces
  – A HAL space is only a primitive approximation
  – We are looking at Voronoi tessellations
- A tiny contribution to Barwise & Seligman’s call for a “distinctively different model of human reasoning”
- (We are looking beyond IR)