Unified Models of
Information Extraction and Data Mining
with Application to Social Network Analysis
Andrew McCallum
Information Extraction and Synthesis Laboratory
Computer Science Department
University of Massachusetts Amherst
Joint work with David Jensen
Site Visit
March 2005
Intelligence Technology Innovation Center
ITIC
Goal:
Mine actionable knowledge
from unstructured text.
Extracting Job Openings from the Web
foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source: www.foodscience.com/jobs_midwest.htm
OtherCompanyJobs: foodscience.com-Job1
A Portal for Job Openings
Category = High Tech
Keyword = Java
Location = U.S.
Job Openings:
Data Mining the Extracted Job Information
IE from
Chinese Documents regarding Weather
Department of Terrestrial System, Chinese Academy of Sciences
200k+ documents
several millennia old
- Qing Dynasty Archives
- memos
- newspaper articles
- diaries
IE from Cargo Container Ship Manifests
Cargo Tracking Div.
US Navy
IE from Research Papers
[McCallum et al ‘99]
IE from Research Papers
Mining Research Papers
[Rosen-Zvi, Griffiths, Steyvers,
Smyth, 2004]
What is “Information Extraction”
As a family
of techniques:
Information Extraction =
segmentation + classification + clustering + association
October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill
Gates railed against the economic philosophy
of open-source software with Orwellian fervor,
denouncing its communal licensing as a
"cancer" that stifled technological innovation.
Today, Microsoft claims to "love" the open-source concept, by which software code is
made public to encourage improvement and
development by outside programmers. Gates
himself says Microsoft will gladly disclose its
crown jewels--the coveted code behind the
Windows operating system--to select
customers.
"We can be open source. We love the concept
of shared source," said Bill Veghte, a
Microsoft VP. "That's a super-important shift
for us in terms of code access."
Richard Stallman, founder of the Free
Software Foundation, countered saying…
Microsoft Corporation
CEO
Bill Gates
Microsoft
Gates
Microsoft
Bill Veghte
Microsoft
VP
Richard Stallman
founder
Free Software Foundation
What is “Information Extraction”
As a family
of techniques:
Information Extraction =
segmentation + classification + association + clustering
October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill
Gates railed against the economic philosophy
of open-source software with Orwellian fervor,
denouncing its communal licensing as a
"cancer" that stifled technological innovation.
Today, Microsoft claims to "love" the open-source concept, by which software code is
made public to encourage improvement and
development by outside programmers. Gates
himself says Microsoft will gladly disclose its
crown jewels--the coveted code behind the
Windows operating system--to select
customers.
"We can be open source. We love the concept
of shared source," said Bill Veghte, a
Microsoft VP. "That's a super-important shift
for us in terms of code access."
Richard Stallman, founder of the Free
Software Foundation, countered saying…
* Microsoft Corporation
CEO
Bill Gates
* Microsoft
Gates
* Microsoft
Bill Veghte
* Microsoft
VP
Richard Stallman
founder
Free Software Foundation
Larger Context
[Diagram: Spider → Filter → Document collection → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links / relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support)]
Outline
• Examples of IE and Data Mining.
• Brief review of Conditional Random Fields
• Joint inference: Motivation and examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Iterated Conditional Samples)
Hidden Markov Models
HMMs are the standard sequence modeling tool in
genomics, music, speech, NLP, …
[Figure: the same model drawn as a finite state model and as a graphical model. States S_{t-1}, S_t, S_{t+1} are linked by transitions, and each state S_t emits an observation O_t. The model generates a state sequence and an observation sequence o_1, o_2, …, o_8.]
$$P(\vec{s}, \vec{o}) \propto \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t)$$
Parameters: for all states S = {s1, s2, …}
Start state probabilities: P(s_t)
Transition probabilities: P(s_t | s_{t-1})
Observation (emission) probabilities: P(o_t | s_t), usually a multinomial over an atomic, fixed alphabet
Training:
Maximize probability of training observations (w/ prior)
IE with Hidden Markov Models
Given a sequence of observations:
Yesterday Rich Caruana spoke this example sentence.
and a trained HMM:
person name
location name
background
Find the most likely state sequence: (Viterbi)
Yesterday Rich Caruana spoke this example sentence.
Any words said to be generated by the designated “person name”
state extract as a person name:
Person name: Rich Caruana
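The decoding step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the trained model from the talk: the three states come from the slide, but every probability below is a made-up value chosen only so that the example sentence decodes sensibly, and the "Pittsburgh" emission is a hypothetical placeholder.

```python
import math

STATES = ["background", "person", "location"]

# Hypothetical parameters (not from the talk).
START = {"background": 0.8, "person": 0.1, "location": 0.1}
TRANS = {
    "background": {"background": 0.8, "person": 0.1, "location": 0.1},
    "person": {"background": 0.4, "person": 0.5, "location": 0.1},
    "location": {"background": 0.5, "person": 0.1, "location": 0.4},
}
EMIT = {
    "background": {w: 0.2 for w in ["Yesterday", "spoke", "this", "example", "sentence"]},
    "person": {"Rich": 0.5, "Caruana": 0.5},
    "location": {"Pittsburgh": 1.0},
}
UNSEEN = 1e-6  # small probability for words a state never emitted


def viterbi(words):
    """Return the most likely state sequence for `words`."""
    # best[t][s] = (log-probability of the best path ending in s, previous state)
    best = [{s: (math.log(START[s]) + math.log(EMIT[s].get(words[0], UNSEEN)), None)
             for s in STATES}]
    for word in words[1:]:
        column = {}
        for s in STATES:
            emit = math.log(EMIT[s].get(word, UNSEEN))
            column[s] = max(
                (best[-1][r][0] + math.log(TRANS[r][s]) + emit, r) for r in STATES
            )
        best.append(column)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


words = "Yesterday Rich Caruana spoke this example sentence".split()
path = viterbi(words)
person_name = " ".join(w for w, s in zip(words, path) if s == "person")
print(person_name)
```

Words assigned to the "person" state are joined into the extracted name, which mirrors the extraction rule stated on the slide.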
We want More than an Atomic View of Words
Would like richer representation of text:
many arbitrary, overlapping features of the words.
[Figure: graphical model with states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}; each observation carries many features, for example:]
identity of word
ends in “-ski”
is capitalized
is part of a noun phrase
is “Wisniewski”
is in a list of city names
is under node X in WordNet
is in bold font
is indented
is in hyperlink anchor
last person name was female
next two words are “and Associates”
Problems with Richer Representation
and a Joint Model
These arbitrary features are not independent.
– Multiple levels of granularity (chars, words, phrases)
– Multiple dependent modalities (words, formatting, layout)
– Past & future
Two choices:
1) Model the dependencies. Each state would have its own Bayes Net. But we are already starved for training data!
2) Ignore the dependencies. This causes “over-counting” of evidence (à la naïve Bayes). Big problem when combining evidence, as in Viterbi!
[Figure: two graphical models over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}, one for each choice.]
Conditional Sequence Models
• We prefer a model that is trained to maximize a
conditional probability rather than joint probability:
P(s|o) instead of P(s,o):
– Can examine features, but not responsible for generating
them.
– Don’t have to explicitly model their dependencies.
– Don’t “waste modeling effort” trying to generate what we are
given at test time anyway.
(Linear Chain) Conditional Random Fields
[Lafferty, McCallum, Pereira 2001]
Undirected graphical model,
trained to maximize conditional probability of outputs given inputs
[Figure: finite state model and graphical model. Output seq (FSM states) y_{t-1} … y_{t+3} = OTHER, PERSON, OTHER, ORG, TITLE …; input seq (observations) x_{t-1} … x_{t+3} = said, Veght, a, Microsoft, VP …]
$$p(\vec{y} \mid \vec{x}) = \frac{1}{Z(\vec{x})} \prod_{t=1}^{T} \Phi_y(y_t, y_{t-1}) \, \Phi_{xy}(x_t, y_t)$$
where
$$\Phi(\cdot) = \exp\Big( \sum_k \lambda_k f_k(\cdot) \Big)$$
Fast-growing, wide-spread interest, many positive experimental results.
Noun phrase, Named entity [HLT’03], [CoNLL’03]
Protein structure prediction [ICML’04]
IE from Bioinformatics text [Bioinformatics ‘04],…
Asian word segmentation [COLING’04], [ACL’04]
IE from Research papers [HLT’04]
Object classification in images [CVPR ‘04]
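To make the linear-chain formula concrete, here is a small Python sketch that scores a label sequence as a product of transition and observation potentials and computes Z(x) with the forward algorithm. The label set, the feature functions and all weights are hypothetical, and the first transition is taken from an assumed START label; none of these come from the talk.

```python
import itertools
import math

LABELS = ["OTHER", "PERSON", "ORG"]

# Hypothetical weights lambda_k; any feature not listed has weight 0.
WEIGHTS = {
    ("word", "said", "OTHER"): 2.0,
    ("word", "microsoft", "ORG"): 2.0,
    ("capitalized", "PERSON"): 1.5,
    ("capitalized", "ORG"): 1.0,
    ("trans", "OTHER", "PERSON"): 0.5,
}


def log_phi_y(prev_label, label):
    """log of the transition potential for (y_t, y_{t-1})."""
    return WEIGHTS.get(("trans", prev_label, label), 0.0)


def log_phi_xy(word, label):
    """log of the observation potential for (x_t, y_t)."""
    total = WEIGHTS.get(("word", word.lower(), label), 0.0)
    if word[0].isupper():
        total += WEIGHTS.get(("capitalized", label), 0.0)
    return total


def log_score(words, labels):
    """Unnormalized log-score: sum of log-potentials along the chain."""
    prev, total = "START", 0.0
    for word, label in zip(words, labels):
        total += log_phi_y(prev, label) + log_phi_xy(word, label)
        prev = label
    return total


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_z(words):
    """log Z(x) by the forward algorithm, linear in the sequence length."""
    alpha = {y: log_phi_y("START", y) + log_phi_xy(words[0], y) for y in LABELS}
    for word in words[1:]:
        alpha = {
            y: logsumexp([alpha[p] + log_phi_y(p, y) for p in LABELS]) + log_phi_xy(word, y)
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))


def prob(words, labels):
    """Conditional probability p(y | x)."""
    return math.exp(log_score(words, labels) - log_z(words))


WORDS = ["said", "Veght", "a", "Microsoft", "VP"]
# Sanity check: the probabilities of all label sequences should sum to 1.
TOTAL = sum(prob(WORDS, ys) for ys in itertools.product(LABELS, repeat=len(WORDS)))
print(round(TOTAL, 6))
```

The brute-force sum over all label sequences is only feasible for toy inputs; it is included as a check that the forward recursion normalizes correctly.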
Table Extraction from Government Reports
Cash receipts from marketings of milk during 1995 at $19.9 billion dollars, was
slightly below 1994. Producer returns averaged $12.93 per hundredweight,
$0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds,
1 percent above 1994. Marketings include whole milk sold to plants and dealers
as well as milk sold directly to consumers.
An estimated 1.56 billion pounds of milk were used on farms where produced,
8 percent less than 1994. Calves were fed 78 percent of this milk with the
remainder consumed in producer households.
Milk Cows and Production of Milk and Milkfat:
United States, 1993-95
--------------------------------------------------------------------------------
      :              :        Production of Milk and Milkfat 2/
      :    Number    :----------------------------------------------------------
 Year :      of      :   Per Milk Cow    :  Percentage   :       Total
      : Milk Cows 1/ :-------------------: of Fat in All :----------------------
      :              :  Milk  : Milkfat  : Milk Produced :   Milk   :  Milkfat
--------------------------------------------------------------------------------
      :  1,000 Head     --- Pounds ---       Percent         Million Pounds
      :
 1993 :     9,589      15,704     575         3.66         150,582    5,514.4
 1994 :     9,500      16,175     592         3.66         153,664    5,623.7
 1995 :     9,461      16,451     602         3.66         155,644    5,694.3
--------------------------------------------------------------------------------
1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.
Table Extraction from Government Reports
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
100+ documents from www.fedstats.gov
CRF
Labels:
• Non-Table
• Table Title
• Table Header
• Table Data Row
• Table Section Data Row
• Table Footnote
• ... (12 in all)
Features:
• Percentage of digit chars
• Percentage of alpha chars
• Indented
• Contains 5+ consecutive spaces
• Whitespace in this line aligns with prev.
• ...
• Conjunctions of all previous features, time offset: {0,0}, {-1,0}, {0,1}, {1,2}.
[Figure: the milk report from the previous slide, with each line labeled.]
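The per-line features listed above are simple to compute. The sketch below is one possible reading of the first five features, not the feature extractor used in the cited paper; in particular, the definition of "aligns with prev." (a token starting at the same column after a gap of two or more spaces) is an assumption, and the function names are hypothetical.

```python
import re


def column_starts(line):
    """Positions where a token starts after a gap of two or more spaces."""
    return {m.start() for m in re.finditer(r"(?<= {2})\S", line)}


def line_features(line, prev_line=""):
    """Layout features for one line of text, given the previous line."""
    length = max(len(line), 1)  # avoid division by zero on empty lines
    return {
        "pct_digit": sum(c.isdigit() for c in line) / length,
        "pct_alpha": sum(c.isalpha() for c in line) / length,
        "indented": line.startswith((" ", "\t")),
        "five_spaces": " " * 5 in line,
        "aligns_with_prev": bool(column_starts(line) & column_starts(prev_line)),
    }


ROW_1993 = "1993   :    9,589      15,704"
ROW_1994 = "1994   :    9,500      16,175"
PROSE = "Marketings include whole milk sold to plants and dealers"

table_feats = line_features(ROW_1994, ROW_1993)
prose_feats = line_features(PROSE, ROW_1994)
print(table_feats)
print(prose_feats)
```

A table data row scores high on digits and column alignment, while a prose line scores high on alphabetic characters, which is the kind of contrast a line classifier can exploit.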
Table Extraction Experimental Results
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
                            HMM     Stateless MaxEnt    CRF
Line labels, % correct      65 %    85 %                95 %
Table segments, F1          64 %                        92 %
IE from Research Papers
[McCallum et al ‘99]
IE from Research Papers
Field-level F1
Hidden Markov Models (HMMs)         75.6    [Seymore, McCallum, Rosenfeld, 1999]
Support Vector Machines (SVMs)      89.7    [Han, Giles, et al, 2003]
Conditional Random Fields (CRFs)    93.9    [Peng, McCallum, 2004]
Δ error (SVMs → CRFs): 40%
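The 40% figure can be checked with a line of arithmetic, assuming that "error" here means 100 minus F1 (an interpretation, not something the slide states):

```python
def error_reduction(old_f1, new_f1):
    """Relative reduction in error, treating error as 100 - F1."""
    old_error = 100.0 - old_f1
    new_error = 100.0 - new_f1
    return (old_error - new_error) / old_error


# SVMs (89.7) to CRFs (93.9), from the table on this slide.
reduction = error_reduction(89.7, 93.9)
print(f"{reduction:.1%}")
```

Under that reading, the error drops from 10.3 to 6.1 points, which is roughly a 40% relative reduction.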
Named Entity Recognition
CRICKET MILLNS SIGNS FOR BOLAND
CAPE TOWN 1996-08-22
South African provincial side
Boland said on Thursday they
had signed Leicestershire fast
bowler David Millns on a one
year contract.
Millns, who toured Australia with
England A in 1992, replaces
former England all-rounder
Phillip DeFreitas as Boland's
overseas professional.
Labels:   Examples:
PER       Yayuk Basuki, Innocent Butare
ORG       3M, KDP, Cleveland
LOC       Cleveland, Nirmal Hriday, The Oval
MISC      Java, Basque, 1,000 Lakes Rally
Named Entity Extraction Results
[McCallum & Li, 2003, CoNLL]
Method                                                   F1
HMMs (BBN's Identifinder)                                73%
CRFs w/out Feature Induction                             83%
CRFs with Feature Induction based on LikelihoodGain      90%
Outline
• Examples of IE and Data Mining.
• Brief review of Conditional Random Fields
• Joint inference: Motivation and examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Iterated Conditional Samples)
Problem:
[Diagram: Document collection → IE (Segment, Classify, Associate, Cluster) → Database → Knowledge Discovery (Discover patterns: entity types, links / relations, events) → Actionable knowledge]
Combined in serial juxtaposition, IE and DM are unaware of each other's weaknesses and opportunities.
1) DM begins from a populated DB, unaware of where the data came from, or its inherent uncertainties.
2) IE is unaware of emerging patterns and regularities in the DB.
The accuracy of both suffers, and significant mining of complex text sources is beyond reach.
Solution:
[Diagram: Spider → Filter → Document collection → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links / relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support). Uncertainty Info flows from IE to Data Mining; Emerging Patterns flow back from Data Mining to IE.]
Solution:
Unified Model
[Diagram: Spider → Filter → Document collection → one Probabilistic Model that combines IE (Segment, Classify, Associate, Cluster) and Data Mining (Discover patterns: entity types, links / relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support)]
Discriminatively-trained undirected graphical models
Conditional Random Fields [Lafferty, McCallum, Pereira]
Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]
Complex Inference and Learning
Just what we researchers like to sink our teeth into!
Larger-scale Joint Inference for IE
• What model structures will capture salient dependencies?
• Will joint inference improve accuracy?
• How to do inference in these large graphical models?
• How to efficiently train these models,
which are built from multiple large components?
1. Jointly labeling cascaded sequences
Factorial CRFs
[Sutton, Rohanimanesh, McCallum, ICML 2004]
Named-entity tag
Noun-phrase boundaries
Part-of-speech
English words
But errors cascade--must be perfect at every stage to do well.
Joint prediction of part-of-speech and noun-phrase in newswire,
matching accuracy with only 50% of the training data.
Inference:
Tree reparameterization BP
[Wainwright et al, 2002]
2. Jointly labeling distant mentions
Skip-chain CRFs [Sutton, McCallum, SRL 2004]
[Figure: linear-chain CRF over “… Senator Joe Green said today … . Green ran for …”]
Dependency among similar, distant mentions ignored.
2. Jointly labeling distant mentions
Skip-chain CRFs [Sutton, McCallum, SRL 2004]
[Figure: skip-chain CRF over “… Senator Joe Green said today … . Green ran for …”, with an added edge connecting the two mentions of “Green”.]
14% reduction in error on most repeated field in email seminar announcements.
Inference:
Tree reparameterization BP
[Wainwright et al, 2002]
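One way to picture the skip-chain structure is as an ordinary chain plus extra edges between repeated mentions. The sketch below adds a skip edge between every pair of identical capitalized tokens; that rule and the function names are illustrative assumptions, not a statement of how the cited system selects its edges.

```python
def chain_edges(tokens):
    """Edges between adjacent positions, as in a linear-chain CRF."""
    return [(i, i + 1) for i in range(len(tokens) - 1)]


def skip_edges(tokens):
    """Extra edges (i, j), i < j, between identical capitalized tokens."""
    seen = {}   # token -> earlier positions where it occurred
    edges = []
    for j, token in enumerate(tokens):
        if token[:1].isupper():
            for i in seen.get(token, []):
                edges.append((i, j))
            seen.setdefault(token, []).append(j)
    return edges


tokens = ["Senator", "Joe", "Green", "said", "today", ".", "Green", "ran", "for"]
print(chain_edges(tokens))
print(skip_edges(tokens))
```

With these edges in place, evidence that the first "Green" is a person (it follows "Senator Joe") can influence the label of the second, distant "Green". The resulting graph has loops, which is why the slide turns to approximate inference.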
3. Joint co-reference among all pairs
Affinity Matrix CRF
“Entity resolution”
“Object correspondence”
[Figure: mentions “Mr Powell”, “Powell” and “she”, pairwise connected by Y/N variables with affinities 45, 99 and 11.]
~25% reduction in error on co-reference of proper nouns in newswire.
Inference:
Correlation clustering graph partitioning
[Bansal, Blum, Chawla, 2002]
[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]
Coreference Resolution
AKA "record linkage", "database record deduplication",
"entity resolution", "object correspondence", "identity uncertainty"
Input
News article, with named-entity "mentions" tagged:
Today Secretary of State Colin Powell met with . . . he . . . Condoleezza Rice . . . Mr Powell . . . she . . . Powell . . . President Bush . . . Rice . . . Bush . . .
Output
Number of entities, N = 3
#1: Secretary of State Colin Powell, he, Mr. Powell, Powell
#2: Condoleezza Rice, she, Rice
#3: President Bush, Bush
Inside the Traditional Solution
Pair-wise Affinity Metric
Mention (3): . . . Mr Powell . . .
Mention (4): . . . Powell . . .
Coreferent? Y/N?

Feature                                           Y/N    Weight
Two words in common                               N        29
One word in common                                Y        13
"Normalized" mentions are string identical        Y        39
Capitalized word in common                        Y        17
> 50% character tri-gram overlap                  Y        19
< 25% character tri-gram overlap                  N       -34
In same sentence                                  Y         9
Within two sentences                              Y         8
Further than 3 sentences apart                    N        -1
"Hobbs Distance" < 3                              Y        11
Number of entities in between two mentions = 0    N        12
Number of entities in between two mentions > 4    N        -3
Font matches                                      Y         1
Default                                           Y       -19
OVERALL SCORE = 98 > threshold=0
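The overall score is the sum of the weights of the features that fire. A short Python sketch, pairing each feature with the Y/N value and weight in the order they appear on the slide, reproduces the total of 98 (the pairing is read off the slide layout, so treat it as a reconstruction):

```python
# (feature, fires?, weight)
FEATURES = [
    ("Two words in common", False, 29),
    ("One word in common", True, 13),
    ('"Normalized" mentions are string identical', True, 39),
    ("Capitalized word in common", True, 17),
    ("> 50% character tri-gram overlap", True, 19),
    ("< 25% character tri-gram overlap", False, -34),
    ("In same sentence", True, 9),
    ("Within two sentences", True, 8),
    ("Further than 3 sentences apart", False, -1),
    ('"Hobbs Distance" < 3', True, 11),
    ("Number of entities in between two mentions = 0", False, 12),
    ("Number of entities in between two mentions > 4", False, -3),
    ("Font matches", True, 1),
    ("Default", True, -19),
]

THRESHOLD = 0

# Only features that fire contribute their weight.
score = sum(weight for _, fires, weight in FEATURES if fires)
coreferent = score > THRESHOLD
print(score, coreferent)
```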
The Problem
[Figure: mentions “Mr Powell”, “Powell” and “she”; the three pair-wise decisions carry affinities 98, 104 and 11 and Y/N labels Y, N and Y.]
Affinity measures are noisy and imperfect.
Pair-wise merging decisions are being made independently from each other.
They should be made in relational dependence with each other.
A Markov Random Field for Co-reference
(MRF)
[McCallum & Wellner, 2003, ICML]
[Figure: mentions “Mr Powell”, “Powell” and “she”, pairwise connected by Y/N variables with edge weights 45, -30 and 11.]
Make pair-wise merging decisions in dependent relation to each other by
- calculating a joint prob.
- including all edge weights
- adding dependence on consistent triangles.
$$P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\Big( \sum_{i,j} \sum_l \lambda_l f_l(x_i, x_j, y_{ij}) + \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \Big)$$
A Markov Random Field for Co-reference
(MRF)
[McCallum & Wellner, 2003]
Three joint assignments of the Y/N variables among “Mr Powell”, “Powell” and “she”, with edge weights in parentheses:
(45)    (-30)    (11)    Score
N       N        Y       -4
Y       N        Y       -infinity
Y       N        N       64
$$P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\Big( \sum_{i,j} \sum_l \lambda_l f_l(x_i, x_j, y_{ij}) + \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \Big)$$
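The three scores on these slides can be reproduced with a small sketch. It assumes that a Y decision contributes +w and an N decision contributes -w for an edge of weight w, and that an inconsistent triangle (exactly two Y decisions) scores minus infinity. The extracted text shows only the magnitudes 45, 30 and 11; the negative sign on 30 and the assignment of weights to mention pairs are assumptions chosen because they reproduce the slide totals.

```python
import itertools
import math

# Assumed edge weights (sign of -30 and pair assignment are reconstructions).
WEIGHTS = {
    ("Mr Powell", "Powell"): 45,
    ("Mr Powell", "she"): -30,
    ("Powell", "she"): 11,
}
EDGES = list(WEIGHTS)


def score(decisions):
    """Score one joint Y/N assignment; `decisions` maps edge -> bool."""
    # Two Y's and one N on a triangle violates transitivity.
    if sum(decisions[e] for e in EDGES) == 2:
        return -math.inf
    return sum(w if decisions[e] else -w for e, w in WEIGHTS.items())


def assignment(*values):
    return dict(zip(EDGES, values))


print(score(assignment(False, False, True)))
print(score(assignment(True, False, True)))
print(score(assignment(True, False, False)))

# Exhaustive search over all 8 assignments (fine for three mentions only).
best = max(
    (assignment(*values) for values in itertools.product([True, False], repeat=3)),
    key=score,
)
print(best)
```

Under these assumptions the best joint assignment merges “Mr Powell” with “Powell” and keeps “she” separate.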

Inference in these MRFs = Graph Partitioning
[Boykov, Veksler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]
[Figure: mentions “Mr Powell”, “Powell”, “Condoleezza Rice” and “she”, fully connected, with edge weights 45, -106, -30, -134, 11 and 10.]
$$\log P(\vec{y} \mid \vec{x}) \propto \sum_{i,j} \sum_l \lambda_l f_l(x_i, x_j, y_{ij}) = \sum_{i,j \text{ within partitions}} w_{ij} - \sum_{i,j \text{ across partitions}} w_{ij}$$
Two example partitionings shown on the slides score -22 and 314.
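For four mentions the partitioning objective can be checked by brute force. The sketch below enumerates every partition and scores it as the sum of within-partition weights minus the sum of across-partition weights. The extracted slide text gives only the magnitudes 45, 106, 30, 134, 11 and 10; the signs and the assignment of weights to mention pairs below are assumptions, chosen because they reproduce the two totals on the slides. Real inputs need an approximate graph-partitioning method, since the number of partitions grows far too fast to enumerate.

```python
MENTIONS = ["Mr Powell", "Powell", "Condoleezza Rice", "she"]

# Assumed edge weights (signs and pair assignment are reconstructions).
WEIGHTS = {
    ("Mr Powell", "Powell"): 45,
    ("Mr Powell", "Condoleezza Rice"): -106,
    ("Mr Powell", "she"): -30,
    ("Powell", "Condoleezza Rice"): -134,
    ("Powell", "she"): 11,
    ("Condoleezza Rice", "she"): 10,
}


def partitions(items):
    """Yield every way to split `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + part


def score(part):
    """Within-partition weights minus across-partition weights."""
    block_of = {m: i for i, block in enumerate(part) for m in block}
    return sum(
        w if block_of[a] == block_of[b] else -w for (a, b), w in WEIGHTS.items()
    )


BEST = max(partitions(MENTIONS), key=score)
print(BEST, score(BEST))
```

Under these assumptions the best partition groups “Mr Powell” with “Powell” and “Condoleezza Rice” with “she”.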
Co-reference Experimental Results
[McCallum & Wellner, 2003]
Proper noun co-reference
DARPA ACE broadcast news transcripts, 117 stories
                            Partition F1    Pair F1
Single-link threshold       16 %            18 %
Best prev match [Morton]    83 %            89 %
MRFs                        88 %            92 %
                            Δerror=30%      Δerror=28%
DARPA MUC-6 newswire article corpus, 30 stories
                            Partition F1    Pair F1
Single-link threshold       11 %            7 %
Best prev match [Morton]    70 %            76 %
MRFs                        74 %            80 %
                            Δerror=13%      Δerror=17%
4. Joint segmentation and co-reference
Extraction from and matching of research paper citations.
Citation 1: Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.
Citation 2: Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.
[Figure: graphical model over variables o, s, c, y and p for each citation, annotated Segmentation, Citation attributes, Co-reference decisions, Database field values and World Knowledge.]
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Inference:
Variant of Iterated Conditional Modes [Besag, 1986]
[Wellner, McCallum, Peng, Hay, UAI 2004]
see also [Marthi, Milch, Russell, 2003]
4. Joint segmentation and co-reference
Joint IE and Coreference from Research Paper Citations
Textual citation mentions (noisy, with duplicates) → Paper database, with fields, clean, duplicates collapsed
AUTHORS               TITLE        VENUE
Cowell, Dawid…        Probab…      Springer
Montemerlo, Thrun…    FastSLAM…    AAAI…
Kjaerulff             Approxi…     Technic…
Citation Segmentation and Coreference
Laurel, B. Interface Agents: Metaphors with Character , in The Art of Human-Computer Interface Design , T. Smith (ed) , Addison-Wesley , 1990 .
Brenda Laurel . Interface Agents: Metaphors with Character , in Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .
Coreferent? Y / ? / N
1) Segment citation fields
2) Resolve coreferent citations
3) Form canonical database record (resolving conflicts)
AUTHOR = Brenda Laurel
TITLE = Interface Agents: Metaphors with Character
PAGES = 355-366
BOOKTITLE = The Art of Human-Computer Interface Design
EDITOR = T. Smith
PUBLISHER = Addison-Wesley
YEAR = 1990
Perform jointly.
IE + Coreference Model
Example citation mentions:
Smyth , P Data mining…
Smyth . 2001 Data Mining…
J Besag 1986 On the…
Model structure, built up layer by layer:
- Observed citation (x): J Besag 1986 On the…
- CRF Segmentation (s): AUT AUT YR TITL TITL
- Citation mention attributes (c): AUTHOR = “J Besag”, YEAR = “1986”, TITLE = “On the…”
- The same structure is repeated for each citation mention.
- Binary coreference variables for each pair of mentions (y / n / n in the example).
- Research paper entity attribute nodes: AUTHOR = “P Smyth”, YEAR = “2001”, TITLE = “Data Mining…”, ...
Such a highly connected graph makes exact inference intractable…
Inference:
- Exact inference on these linear-chain regions.
- From each chain pass an N-best List into coreference.
- Approximate inference by graph partitioning… integrating out uncertainty in samples of extraction.
- Make scale to 1M citations with Canopies [McCallum, Nigam, Ungar 2000]
Inference:
Sample = N-best List from CRF Segmentation
Name                    | Title                                                | Book Title                                   | Year
Laurel, B. Interface    | Agents: Metaphors with Character                     | The Art of Human Computer Interface Design   | 1990
Laurel, B.              | Interface Agents: Metaphors with Character The Art   | of Human Computer Interface Design           | 1990
Laurel, B. Interface    | Agents: Metaphors with Character                     | The Art of Human Computer Interface Design   |
When calculating similarity with another citation, have more opportunity to find correct, matching fields.
Name                    | Title                                                | …
Laurel, B               | Interface Agents: Metaphors with Character The       | …
Laurel, B.              | Interface Agents: Metaphors with Character           | …
Laurel, B. Interface    | Agents Metaphors with Character                      | …
[Figure: segmentations from the two N-best lists are compared to make the y / ? / n co-reference decision.]
IE + Coreference Model
- Exact (exhaustive) inference over entity attributes.
- Revisit exact inference on IE linear chain, now conditioned on entity attributes.
Parameter Estimation
Separately for different regions
IE Linear-chain: Exact MAP
Coref graph edge weights: MAP on individual edges
Entity attribute potentials: MAP, pseudo-likelihood
In all cases: Climb MAP gradient with quasi-Newton method
Outline
• Examples of IE and Data Mining.
• Brief review of Conditional Random Fields
• Joint inference: Motivation and examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Iterated Conditional Samples)
Piecewise Training
Efficiently Learning Large
Probabilistic Relational Models
Charles Sutton and Andrew McCallum
Information Extraction and Synthesis Laboratory
Computer Science Department
University of Massachusetts Amherst
Intelligence Technology Innovation Center
ITIC
Piecewise Training
Piecewise Training with NOTA
Experimental Results
Named entity tagging (CoNLL-2003)
Training set = 15k newswire sentences
9 labels
          Test F1    Training time
CRF       89.87      9 hours
MEMM      88.90      1 hour
CRF-PT    90.50      5.3 hours
stat. sig. improvement at p = 0.001
Experimental Results 2
Part-of-speech tagging (Penn Treebank, small subset)
Training set = 1154 newswire sentences
45 labels
          Test F1    Training time
CRF       88.1       14 hours
MEMM      88.1       2 hours
CRF-PT    88.8       2.5 hours
stat. sig. improvement at p = 0.001
“Parameter Independence Diagrams”
Graphical models = formalism for representing
independence assumptions among variables.
Here we represent
independence assumptions among parameters (in factor graph)
Piecewise Training Research Questions
• How to select the boundaries of “pieces”?
• What choices of limited interaction are best?
• How to sample sparse subsets of NOTA instances?
• Application to simpler models (classifiers)
• Application to more complex models (parsing)
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
Emailed seminar announcement entities
Email English words
60k words training.
Too little labeled training data.

GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell
School of Computer Science
Carnegie Mellon University
3:30 pm
7500 Wean Hall
Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
Train on “related” task with more data.
Newswire named entities
Newswire English words
200k words training.
CRICKET MILLNS SIGNS FOR BOLAND
CAPE TOWN 1996-08-22
South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract.
Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
At test time, label email with newswire NEs...
Newswire named entities
Email English words
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
…then use these labels as features for final task
Emailed seminar announcement entities
Newswire named entities
Email English words
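The cascaded transfer described on these slides (label the email with a newswire named-entity tagger, then feed those labels in as features for the seminar-announcement task) can be sketched as below. This is only an illustration under stated assumptions: cascade_features and toy_newswire_tagger are hypothetical names, the capitalization rule stands in for a trained newswire CRF, and the feature strings are invented for the example.

```python
def toy_newswire_tagger(tokens):
    """Hypothetical stand-in for a trained newswire NE tagger.
    Assumption: capitalized tokens are tagged PER, everything else O."""
    return ["PER" if tok[:1].isupper() else "O" for tok in tokens]

def cascade_features(tokens, newswire_tagger):
    """Build per-token features for the final task.
    Step 1: label the email tokens with the newswire tagger.
    Step 2: add each predicted label as a feature next to the word itself."""
    ne_labels = newswire_tagger(tokens)
    features = []
    for tok, ne in zip(tokens, ne_labels):
        features.append({
            "word=" + tok.lower(): 1.0,
            "newswire_ne=" + ne: 1.0,
        })
    return features

email_tokens = ["Jaime", "Carbonell", "speaks", "at", "3:30", "pm"]
email_features = cascade_features(email_tokens, toy_newswire_tagger)
```

The feature dictionaries would then be passed to whatever sequence model is trained on the seminar-announcement labels. The cascade commits to the newswire tagger's single best labeling, so tagging errors propagate to the final task.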
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
Piecewise training of a joint model.
Seminar Announcement entities
Newswire named entities
English words
CRF Transfer Experimental Results
[Sutton, McCallum, 2005]
Seminar Announcements Dataset [Freitag 1998]

CRF                 stime   etime   location   speaker   overall
No transfer         99.1    97.3    81.0       73.7      87.8
Cascaded transfer   99.2    96.0    84.3       74.2      88.4
Joint transfer      99.1    96.0    85.3       76.3      89.2

New “best published” accuracy on common dataset
Social Network Analysis
from Textual Message Data
Andrew McCallum, Andres Corrada, Xuerui Wang
Information Extraction and Synthesis Laboratory
Computer Science Department
University of Massachusetts Amherst
Intelligence Technology Innovation Center
ITIC
Managing and Understanding
Connections of People in our Email World
Workplace effectiveness ~ Ability to leverage network of acquaintances
But filling Contacts DB by hand is tedious, and incomplete.
[Figure: Email Inbox and WWW, feeding the Contacts DB automatically]
System Overview
[Figure: system pipeline. Labels: Email, WWW, CRF, Person Name Extraction, Name Coreference, Homepage Retrieval, names, Contact Info and Person Name Extraction, Keyword Extraction, Social Network Analysis]
An Example
To: “Andrew McCallum” [email protected]
Subject ...
Search for new people
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,…
Key Words: Information extraction, social network,…
Example keywords extracted
Person: Keywords
William Cohen: Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller: Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuinness: Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell: Machine learning, Cognitive states, Learning apprentice, Artificial intelligence
Summary of Results
Contact info and name extraction performance (25 fields)
       Token Acc   Field Prec   Field Recall   Field F1
CRF    94.50       85.73        76.33          80.76
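As a quick consistency check on the numbers above, field F1 is the harmonic mean of field precision and field recall. The helper name f1_score is an assumption for this sketch; the inputs are the precision and recall reported on the slide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

field_f1 = f1_score(85.73, 76.33)
print(round(field_f1, 2))  # 80.76, matching the reported Field F1
```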
Expert Finding:
When solving some task, find friends-of-friends with relevant expertise.
Avoid “stove-piping” in large org’s by automatically suggesting collaborators.
Given a task, automatically suggest the right team for the job. (Hiring aid!)
Social Network Analysis:
Understand the social structure of your organization.
Suggest structural changes for improved efficiency.
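The expert-finding idea above (find friends-of-friends with relevant expertise) can be sketched as a two-hop lookup over the extracted contact graph, filtered by extracted keywords. All names and data here are hypothetical placeholders; the slides do not specify how the system ranks or filters candidates.

```python
def friends_of_friends_experts(graph, keywords, me, topic):
    """Return people exactly two hops from `me` whose keywords include `topic`.

    graph:    dict mapping a person to the set of people they correspond with
    keywords: dict mapping a person to the set of keywords extracted for them
    """
    direct = graph.get(me, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    # Keep only genuine friends-of-friends: not me, not already a direct contact.
    candidates -= direct
    candidates.discard(me)
    return sorted(p for p in candidates if topic in keywords.get(p, set()))

# Hypothetical toy network
toy_graph = {
    "me": {"alice", "bob"},
    "alice": {"me", "carol"},
    "bob": {"me", "dave", "alice"},
    "carol": {"alice"},
    "dave": {"bob"},
}
toy_keywords = {
    "alice": {"topic models"},
    "carol": {"topic models"},
    "dave": {"parsing"},
}
experts = friends_of_friends_experts(toy_graph, toy_keywords, "me", "topic models")
```

Here alice is skipped because she is already a direct contact, so the only suggestion is carol.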
Clustering words into topics with
Latent Dirichlet Allocation
[Blei, Ng, Jordan 2003]
Example topics
induced from a large collection of text
(one topic per line, words in slide order)
JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY, EARN, ABLE
SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST, STUDYING, SCIENCES
BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS, BAT, TERRY
FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM, POLE, INDUCED
STORY, STORIES, TELL, CHARACTER, CHARACTERS, AUTHOR, READ, TOLD, SETTING, TALES, PLOT, TELLING, SHORT, FICTION, ACTION, TRUE, EVENTS, TELLS, TALE, NOVEL
MIND, WORLD, DREAM, DREAMS, THOUGHT, IMAGINATION, MOMENT, THOUGHTS, OWN, REAL, LIFE, IMAGINE, SENSE, CONSCIOUSNESS, STRANGE, FEELING, WHOLE, BEING, MIGHT, HOPE
DISEASE, BACTERIA, DISEASES, GERMS, FEVER, CAUSE, CAUSED, SPREAD, VIRUSES, INFECTION, VIRUS, MICROORGANISMS, PERSON, INFECTIOUS, COMMON, CAUSING, SMALLPOX, BODY, INFECTIONS, CERTAIN
WATER, FISH, SEA, SWIM, SWIMMING, POOL, LIKE, SHELL, SHARK, TANK, SHELLS, SHARKS, DIVING, DOLPHINS, SWAM, LONG, SEAL, DIVE, DOLPHIN, UNDERWATER
[Tenenbaum et al]
From LDA to Author-Recipient-Topic (ART)
Inference and Estimation
Gibbs Sampling:
- Easy to implement
- Reasonably fast
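To make the "easy to implement" point concrete, here is a rough sketch of a collapsed Gibbs sampler for an author-recipient-topic style model. It is written under assumptions not stated on the slide: symmetric Dirichlet hyperparameters alpha and beta, one latent (recipient, topic) pair per word token, and a recipient chosen uniformly from the message's recipient list. The function name art_gibbs, its argument layout, and the toy messages are hypothetical.

```python
import random
from collections import defaultdict

def art_gibbs(messages, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=50, seed=0):
    """messages: list of (author, recipient_list, word_id_list)."""
    rng = random.Random(seed)
    pair_topic = defaultdict(int)   # (author, recipient, topic) -> token count
    pair_total = defaultdict(int)   # (author, recipient) -> token count
    topic_word = defaultdict(int)   # (topic, word) -> token count
    topic_total = [0] * num_topics  # topic -> token count

    def update(a, r, z, w, delta):
        pair_topic[(a, r, z)] += delta
        pair_total[(a, r)] += delta
        topic_word[(z, w)] += delta
        topic_total[z] += delta

    # Random initialization of a (recipient, topic) pair for every token.
    assign = []
    for a, recips, words in messages:
        cur = []
        for w in words:
            r, z = rng.choice(recips), rng.randrange(num_topics)
            update(a, r, z, w, +1)
            cur.append((r, z))
        assign.append(cur)

    for _ in range(iters):
        for (a, recips, words), cur in zip(messages, assign):
            for i, w in enumerate(words):
                r_old, z_old = cur[i]
                update(a, r_old, z_old, w, -1)  # remove token from the counts
                choices, weights = [], []
                for r in recips:
                    denom = pair_total[(a, r)] + num_topics * alpha
                    for z in range(num_topics):
                        p_topic = (pair_topic[(a, r, z)] + alpha) / denom
                        p_word = ((topic_word[(z, w)] + beta)
                                  / (topic_total[z] + vocab_size * beta))
                        choices.append((r, z))
                        weights.append(p_topic * p_word)
                r_new, z_new = rng.choices(choices, weights=weights)[0]
                update(a, r_new, z_new, w, +1)  # add it back with the new pair
                cur[i] = (r_new, z_new)
    return pair_topic, pair_total, topic_word, topic_total

# Hypothetical toy corpus: 3 messages, 5 word types, 9 tokens in total
toy_messages = [("a", ["b", "c"], [0, 1, 2, 0]),
                ("b", ["a"], [3, 4, 3]),
                ("a", ["b"], [0, 2])]
counts = art_gibbs(toy_messages, num_topics=2, vocab_size=5, iters=20)
```

The returned counts can be normalized (with the same smoothing) to estimate per-pair topic distributions and per-topic word distributions.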
Enron Email Corpus
• 250k email messages
• 23k people
Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Enron/TransAltaContract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested an
electronic copy of our final draft? Are you OK with this? If
so, the only version I have is the original draft without
revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
[email protected]
Topics, and prominent sender/receivers
discovered by ART
Beck = “Chief Operations Officer”
Dasovich = “Government Relations Executive”
Shapiro = “Vice President of Regulatory Affairs”
Steffes = “Vice President of Government Affairs”
Comparing Role Discovery
Traditional SNA
ART
Author-Topic
distribution over
authored topics
distribution over
authored topics
connection strength (A,B) =
distribution over
recipients
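Each method above represents a person as a probability distribution (over recipients, or over authored topics), so comparing two people's roles reduces to comparing two distributions. The slide does not name the comparison measure; the sketch below assumes Jensen-Shannon divergence, one common symmetric choice, and the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical distributions,
    and at most log(2) (reached when the supports are disjoint)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical topic distributions for two people
person_a = [0.7, 0.2, 0.1]
person_b = [0.1, 0.2, 0.7]
role_distance = js_divergence(person_a, person_b)
```

A small divergence would suggest similar roles and a large one different roles, under whichever representation (recipients or topics) is plugged in.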
Comparing Role Discovery
Tracy Geaconne vs. Dan McCarty
Traditional SNA: Similar roles
ART: Different roles
Author-Topic: Different roles
Geaconne = “Secretary”
McCarty = “Vice President”
Comparing Role Discovery
Tracy Geaconne vs. Rod Hayslett
Traditional SNA: Different roles
ART: Not very similar
Author-Topic: Very similar
Geaconne = “Secretary”
Hayslett = “Vice President & CTO”
Comparing Role Discovery
Lynn Blair vs. Kimberly Watson
Traditional SNA: Different roles
ART: Very similar
Author-Topic: Very different
Blair = “Gas pipeline logistics”
Watson = “Pipeline facilities planning”
McCallum Email Corpus 2004
• January - October 2004
• 23k email messages
• 825 people
From: [email protected]
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: [email protected]
There is pertinent stuff on the first yellow folder that is
completed either travel or other things, so please sign that
first folder anyway. Then, here is the reminder of the things
I'm still waiting for:
NIPS registration receipt.
CALO registration receipt.
Thanks,
Kate
McCallum Email Blockstructure
Four most prominent topics
in discussions with ____?
Two most prominent topics
in discussions with ____?
Words: love, house, time, great, hope, dinner, saturday, left, ll, visit, evening, stay, bring, weekend, road, sunday, kids, flight
Prob: 0.030514, 0.015402, 0.013659, 0.012351, 0.011334, 0.011043, 0.00959, 0.009154, 0.009154, 0.009009, 0.008282, 0.008137, 0.008137, 0.007847, 0.007701, 0.007411, 0.00712, 0.006829, 0.006539, 0.006539
Words      Prob
today      0.051152
tomorrow   0.045393
time       0.041289
ll         0.039145
meeting    0.033877
week       0.025484
talk       0.024626
meet       0.023279
morning    0.022789
monday     0.020767
back       0.019358
call       0.016418
free       0.015621
home       0.013967
won        0.013783
day        0.01311
hope       0.012987
leave      0.012987
office     0.012742
tuesday    0.012558
Role-Author-Recipient-Topic Models
Information Extraction and Mining
the Research Literature
Information Extraction and Synthesis Laboratory
Computer Science Department
University of Massachusetts Amherst
Intelligence Technology Innovation Center
ITIC
Previous Systems
Previous Systems
[Figure: diagram labels: Research Paper, Cites]
More Entities and Relations
[Figure: diagram labels: Research Paper, Cites, Expertise, Grant, Venue, Person, University, Groups]
Summary
• Conditional Random Fields
– Conditional probability models of structured data
• Data mining complex unstructured text suggests the
need for joint inference IE + DM.
• Early examples
– Factorial finite state models
– Jointly labeling distant entities
– Coreference analysis
– Segmentation uncertainty aiding coreference
• Piecewise Training
– Faster + higher accuracy
• Current projects
– Email, contact management, expert-finding, SNA
– Mining the scientific literature
End of Talk