
Concept Space Construction
Todd Littell
27SEP06
Roadmap
• Last Year:
  – High-quality concept space construction performed offline.
  – First version of inference engine via graph operations.
• This Year:
  – High-quality concept space construction performed online.
  – Second implementation of reasoning engine, or a graph-mining application?
• Future:
  – Online semantic net construction.
  – Sophisticated reasoning engine.
Semantic Net
• Goal: efficient, high-quality learning of a semantic network from text.
• A Semantic Net is a kind of Knowledge Representation (KR) structure that is typically represented by a graph model.
• What does a Semantic Net consist of?
  – Concepts
  – Entities
  – Types
  – Relationships: associative, categorical, functional, structural, mechanical, temporal, spatial, …
• A Concept Space (or Association Network) is a simplified SN that captures concepts and concept-associations.
Ref: www.jfsowa.com: “A cat is on the mat”.
Applications
• Uses of Semantic Nets:
  – Knowledge base for reasoning and inference: Fuzzy ER Model, Markov Net, Bayesian Net, Causal Net, etc.
  – Information browsing: navigation through the space via drill-up, drill-down, drill-across.
  – Domain modeling: communication, information-system construction, etc.
  – Query formulation for retrieval: feedback, user modeling, etc.
Related Knowledge Models
• Other kinds of knowledge models:
  – Concept Graphs: see John Sowa’s site, http://www.jfsowa.com/cg/index.htm; defines rules for assertions, composition, and limited inference. Also OWL/RDF.
  – Graphical Models: belief nets, causal diagrams, signed directed graphs, lattices.
  – Concept Lattices: FCA, a mathematical theory of objects, attributes, and mappings.
  – UML: formal modeling notation and semantics defined specifically for the software industry.
  – Express-G: formal modeling notation and semantics defined for engineering.
  – Moore/Mealy/Petri Nets: actionable semantics for modeling and simulation.
• Aspects: domain, expressiveness, ad hoc vs. well-defined semantics, discrete vs. continuous, typing, purpose/application, underlying theory, etc.
Why Bother?
Pyramid of Knowledge
[Figure: two parallel pyramids comparing text mining and data mining.
Text-mining pyramid, bottom to top: Raw Text (Lit.) → Semantics (Nodes) → Hidden Relationships (Network) → Intelligent-driven Research (Profit) → Computer-Automated Research (Auto-pilot); side axis: Text → Concepts → Relations → Knowledge.
Data-mining pyramid, bottom to top: Raw Data (Txns) → Aggregations (Reports) → Predictions (Trends) → Intelligent-driven Business (Profit) → Computer-Automated Business (Auto-pilot); side axis: Data → Information → Patterns → Knowledge.]
Levels of Complexity
Ordering of Graphical Models
[Figure: graphical models ordered from simplest to most expressive.
– Associative Network (objects + binary associations)
– Semantic Network / Conceptual Graph (objects + relations)
– Weighted Associative Network (objects + weighted associations)
– Belief Net (WAN + rules)
– Causal Diagrams (WAN + rules)
– Fuzzy ER Model (WAN + operators)]
Example Semantic Net
Concept Graph
[Figure: example concept graph. Nodes: Computer, Data Compression, School, College, Matrix computation, OS, library, Scientific software, Math library. Edge labels include uses, processes, runs, purchases, operates, and kind-of (e.g., College kind-of School; Math library kind-of library).]
Associative Net (1)
Associative Network
[Figure: the same graph as an associative network. Nodes: Data Compression, Computer, School, College, Matrix computation, OS, library, Scientific software, Math library; edges are unlabeled binary associations.]
Associative Net (2)
Weighted Associative Network
[Figure: the same graph with weighted associations; edge weights shown include 0.9, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, and 0.1.]
Associative Net (3)
Ref: Mark Steyvers and Joshua B. Tenenbaum, “The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth”, Cognitive Science 29, 2005.
Basic Algorithm
• Calculate a measure of association between two terms using a similarity measure and output the best associations (a Python sketch follows this slide):

  Let T = set of terms, D = set of documents, f(t,d) = frequency of term t in document d.
  Let adj(t) := {d | f(t,d) > 0} be the term adjacency list.
  Let adj(d) := {t | f(t,d) > 0} be the document adjacency list.

  For each t1 in T:
      For each d in adj(t1):
          For each t2 in adj(d):
              s1[t2] += g(f(t1,d), f(t2,d))
      For each t2 with s1[t2] > 0:
          s := h(s1[t2], params)
          if s >= thresh:
              output (t1, t2, s)
• Notes:
  – Term vectors are sparse; there is no need to iterate through all dimensions.
  – Many similarity/distance measures exist, as well as other kinds of measures.
  – The characteristics of the “ideal similarity metric” are tied to the application.
  – All calculations are independent, hence easily parallelizable.
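A minimal Python sketch of the loop above, assuming a small in-memory corpus. The choices of g (count of co-occurring documents) and h (Dice coefficient) are illustrative stand-ins, not the specific measures used in BeeSpace:

    from collections import defaultdict

    def mine_associations(docs, thresh=0.2):
        """docs: dict doc_id -> {term: freq}. Yields (t1, t2, score)."""
        # adj(t): inverted lists mapping each term to the documents containing it.
        adj_t = defaultdict(set)
        for d, terms in docs.items():
            for t in terms:
                adj_t[t].add(d)
        for t1, d_list in adj_t.items():
            s1 = defaultdict(int)              # accumulator indexed by t2
            for d in d_list:
                for t2 in docs[d]:
                    if t2 != t1:
                        s1[t2] += 1            # g(): count co-occurring documents
            for t2, co in s1.items():
                s = 2.0 * co / (len(d_list) + len(adj_t[t2]))   # h(): Dice
                if s >= thresh:
                    yield (t1, t2, s)

    docs = {1: {"cat": 2, "mat": 1}, 2: {"cat": 1, "dog": 3}, 3: {"dog": 1, "mat": 2}}
    for t1, t2, s in mine_associations(docs):
        print(t1, t2, round(s, 2))

Because the outer loop over t1 touches only that term’s posting list, each t1 can be mined independently, which is what makes the computation embarrassingly parallel.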
BeeSpace Variations
• Only interested in calculating for representative terms with thresh1 < freq < thresh2.
• In some cases, only interested in calculating for user-specified documents. This implies

  \hat{F} = D_R \, F \, D_C

  where D_R, D_C are selection matrices and F is the term-by-document frequency matrix.
• Only need to output the top K similar terms.
• Only need to output terms with sim > thresh3.
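A small NumPy sketch of the selection-matrix restriction above; the matrix sizes and entries are illustrative:

    import numpy as np

    F = np.array([[2, 0, 1, 0],        # term-by-document frequency matrix
                  [0, 3, 0, 1],
                  [1, 1, 0, 0]])
    D_R = np.array([[1, 0, 0],         # selects terms (rows) 0 and 2
                    [0, 0, 1]])
    D_C = np.array([[1, 0],            # selects documents (columns) 0 and 2
                    [0, 0],
                    [0, 1],
                    [0, 0]])
    F_hat = D_R @ F @ D_C              # restricted matrix F-hat
    print(F_hat)                       # [[2 1]
                                       #  [1 0]]

In practice F is sparse, so the same restriction is just a row/column slice of a sparse matrix rather than two materialized multiplications.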
Co-occurrence Metrics
  Set-based:          Dice, Jaccard, …
  Distance-based:     Cosine, chord, p-norm (0 < p <= Inf), …
  Information-based:  PWMI, MI, KL, Jensen-Shannon, …
  Statistical tests:  Chi^2, z-score, …
  Heuristic:          many
• Many extensions are possible for incorporating weighting functions and features such as POS tags, context, word distance, window size, etc.
• Ref: Terra & Clarke, “Frequency Estimates for Statistical Word Similarity Measures”.
• Similarity requirements: s(x,y) >= 0; s(x,y) > s(x,z) implies y is more similar to x than z; optionally s(x,y) = s(y,x).
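A short sketch of three measures from the table above, computed over the document sets adj(x) and adj(y); a hedged illustration, not the exact BeeSpace definitions:

    import math

    def dice(A, B):
        return 2 * len(A & B) / (len(A) + len(B))

    def jaccard(A, B):
        return len(A & B) / len(A | B)

    def cosine(A, B):
        # Cosine over binary incidence vectors reduces to |A∩B| / sqrt(|A|·|B|).
        return len(A & B) / math.sqrt(len(A) * len(B))

    A, B = {1, 2, 5}, {2, 3, 5}
    print(dice(A, B), jaccard(A, B), cosine(A, B))   # ≈0.667, 0.5, ≈0.667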
MI & PWMI
Contingency table (document counts):

            x          ~x
   y     f(x,y)     f(~x,y)     f(y)
   ~y    f(x,~y)    f(~x,~y)    f(~y)
         f(x)       f(~x)       N

Assuming MLE:

  p(x) = f(x)/N,   p(y) = f(y)/N,   p(x,y) = f(x,y)/N

  I(X;Y) = \sum_{x,y \in \{0,1\}} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}

  PWMI(X;Y) = \log \frac{p(x,y)}{p(x)\,p(y)}

  f(x,y) := |adj(x) \cap adj(y)| = \sum_d \mathbf{1}[f(x,d) > 0 \wedge f(y,d) > 0]
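A short Python sketch of MI and PWMI from the 2x2 contingency table above, using the MLE estimates as stated; the counts are illustrative:

    import math

    # Illustrative counts: f(x,y), f(x,~y), f(~x,y), f(~x,~y).
    f_xy, f_xny, f_nxy, f_nxny = 30, 10, 20, 940
    N = f_xy + f_xny + f_nxy + f_nxny

    # p[x][y] with index 1 = "term present", 0 = "term absent" (MLE).
    p = [[f_nxny / N, f_nxy / N],
         [f_xny / N,  f_xy / N]]
    px = [p[0][0] + p[0][1], p[1][0] + p[1][1]]    # marginals p(x)
    py = [p[0][0] + p[1][0], p[0][1] + p[1][1]]    # marginals p(y)

    mi = sum(p[i][j] * math.log(p[i][j] / (px[i] * py[j]))
             for i in (0, 1) for j in (0, 1) if p[i][j] > 0)
    pwmi = math.log(p[1][1] / (px[1] * py[1]))     # PWMI at x=1, y=1
    print(mi, pwmi)                                # ≈0.0788, ≈2.7081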
MI & PWMI Generalizations
• Obvious generalizations: utilize parameters p(d):

  p(x,y) = \sum_d p(d)\, p(x,y \mid d), \qquad p(x) = \sum_d p(d)\, p(x \mid d)

• Non-obvious generalizations:

  I(a;b;\dots;z) = H(a) + \dots + H(z) - H(a,b,\dots,z)

  PWMI(a;b;\dots;z) = \log \frac{o(a;b;\dots;z)}{e(a;b;\dots;z)} \approx \zeta(s, o(a;b;\dots;z)) - \zeta(s, e(a;b;\dots;z))

  where o and e are observed and expected counts (the zeta form follows Robson, below).
Ref: Barry Robson, “Clinical and Pharmacogenomic Data Mining…”, Journal of
Proteome Research, 2003.
Ref: Jonathan Wren, “Extending the mutual information measure to rank inferred
literature relationships”, BMC Bioinformatics, 2004.
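An illustrative numerical check of the multi-way identity above, I(a;b;c) = H(a) + H(b) + H(c) - H(a,b,c), on a toy joint sample; the data is made up:

    import math
    from collections import Counter

    # Toy samples of three binary variables (a, b, c).
    samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 0), (0, 0, 0)]

    def H(events):
        """Empirical entropy (nats) of a list of hashable outcomes."""
        n = len(events)
        return -sum(c / n * math.log(c / n) for c in Counter(events).values())

    ha, hb, hc = (H([s[i] for s in samples]) for i in range(3))
    habc = H(samples)
    print(ha + hb + hc - habc)   # I(a;b;c) per the identity above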
Results
• Look at results in spreadsheet…
• “Metric Closeness”: PWMI, DICE, MI, NMI3, COS.
“Make a Faster Wheel”
• Optimized I/O
• Parallelize
• Better Algorithm
• Better Code
• Smarter Data Structures
Optimize I/O
• Only use formatted I/O for human consumption; use binary I/O for all other cases.
• Use buffered I/O if reading/writing small chunks at a time.
• See handout (and the sketch below).
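A small Python sketch contrasting the two tips above: binary, buffered writes for bulk records; formatted text only for the human-facing report. The file names and record layout are illustrative:

    import struct

    pairs = [(1, 2, 0.5), (3, 4, 0.25)] * 1000   # (term1, term2, score) records

    # Binary, buffered: one fixed-size struct per record.
    with open("assoc.bin", "wb", buffering=1 << 16) as f:
        for t1, t2, s in pairs:
            f.write(struct.pack("<iif", t1, t2, s))

    # Formatted text: only for human consumption.
    with open("assoc.txt", "w") as f:
        for t1, t2, s in pairs[:10]:
            f.write(f"{t1}\t{t2}\t{s:.3f}\n")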
Parallelize
• Do the problems split naturally?
• Does divide-and-conquer apply?
• Level of parallelization (see the sketch below):
  – Very coarse grained: distributed agents.
  – Coarse grained: parallel jobs.
  – Medium grained: forked processes.
  – Fine grained: multi-threaded.
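A minimal sketch of the “medium grained: forked processes” option, splitting the per-term association loop across worker processes with Python’s multiprocessing; mine_term is a hypothetical stand-in for the per-t1 inner loops of the basic algorithm:

    from multiprocessing import Pool

    def mine_term(t1):
        # Placeholder for the per-t1 inner loops of the basic algorithm;
        # returns (term, result) so workers need no shared state.
        return (t1, t1 * t1)

    if __name__ == "__main__":
        terms = range(1000)
        with Pool(processes=4) as pool:
            # Each t1 is independent, so an unordered map needs no coordination.
            for t1, result in pool.imap_unordered(mine_term, terms, chunksize=64):
                pass   # write (t1, result) to the output stream here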
Better Algorithm
• How to compare algorithms?
  a. Time complexity.
  b. Space complexity.
  c. Parallelizability.
  d. Time-to-develop.
Better Code
• Know your language.
• Factor invariant expressions outside of loops.
• Pre-compute whenever possible: cache results.
• Sacrifice OO-ness.
• Customized data structures.
• Optimized I/O.
• Avoid “long” calls (e.g. network, disk, etc.).
• Tune to the memory hierarchy.
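Two of the tips above in miniature: hoisting a loop-invariant expression and caching a repeated computation. The function names and formulas are illustrative:

    import math
    from functools import lru_cache

    def scores_slow(freqs, N):
        # log(N) and the division are recomputed on every iteration.
        return [f / N * math.log(N) for f in freqs]

    def scores_fast(freqs, N):
        log_n = math.log(N)        # invariant: hoisted out of the loop
        inv_n = 1.0 / N
        return [f * inv_n * log_n for f in freqs]

    @lru_cache(maxsize=None)       # pre-compute/cache: repeated calls are free
    def doc_norm(doc_id):
        return math.sqrt(doc_id + 1)   # stand-in for an expensive lookup

    print(scores_fast([1, 2, 3], 100), doc_norm(7))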
Smarter Data Structures
• Understand your language’s built-in collections library.
• Roll-your-own data structures can often outperform generic libraries. Why?
• Hybrid techniques.
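A tiny illustration of the roll-your-own point above: a posting list stored as two parallel typed arrays avoids the per-item object overhead of generic containers. A sketch, not a benchmark:

    from array import array

    class PostingList:
        """Append-only (doc_id, freq) pairs for one term, in typed arrays."""
        __slots__ = ("doc_ids", "freqs")

        def __init__(self):
            self.doc_ids = array("i")   # packed 32-bit ints, no boxed objects
            self.freqs = array("i")

        def add(self, doc_id, freq):
            self.doc_ids.append(doc_id)
            self.freqs.append(freq)

        def __iter__(self):
            return zip(self.doc_ids, self.freqs)

    pl = PostingList()
    pl.add(3, 2); pl.add(7, 1)
    print(list(pl))   # [(3, 2), (7, 1)]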
To-Do/Unresolved
a. Decide what the complete set of applications will be for this component: browsing, inference, retrieval, etc.
b. Evaluate metrics using SMEs.
c. Decide which of the mined relations are significant for the applications in (a).
d. Investigate more advanced methods and compare trade-offs.