I3.2-Subtask 2


Joint Enhancement of Topic Modeling and
Information Network Mining
Mid-Year PI Report focusing on I3.2
Heng Ji
City University of New York
NSCTA/INARC
March 24, 2011
INARC Project Major Contributions

I3.2-Subtask 1:
– Disambiguate objects with rich semantic structures extracted from interconnected texts (ACL2011)
– A new Collaborative Network Ranking Theory for Coreference Resolution (EMNLP2011-sub)
– Markov Logic Networks and Learning-to-Rank to Enhance Open Domain Role Discovery (TAC2010, LNCS, SIGIR2011-sub, EMNLP2011-sub)
– 16.4% improvement over state-of-the-art entity linking and 13%-22% improvement over link discovery

I3.2-Subtask 2 (with H. Deng (UIUC) and J. Han (UIUC); Focus of this Talk)
– Novel topic modeling: Multi-typed objects are treated differently along with their inherent textual information and the rich semantics of the heterogeneous information network (KDD2011-sub, IEEE Journal invited-sub)
– Exploit the power of extended topic modeling for event network partitioning and refinement through active learning and topic cluster driven inferences (ACL2011-sub, IEEE Journal invited-sub)
– Model the dynamics of information networks through a new temporal event network representation theory, evaluation metric and corresponding kernel methods (ACL2011-sub, EMNLP2011-sub)

I3.2-Subtask 3 (with H. Deng (UIUC) and J. Han (UIUC))
– Self-Boosting Terrorism Network Search and Browsing (Springer Book Chapter, SIGIR2011-sub)

*I3.1: Uncovering Hierarchical Relationships among Linked Objects (with C. Wang (UIUC) and J. Han (UIUC), KDD'11 sub, presented by J. Han)
CUNY Students and Post-docs: Q. Li, X. Li, W. Lin, Z. Chen, S. Tamang, S. Anzaroot, J. Artiles
Mining and Modeling
Interconnected Information Networks

Text-rich heterogeneous information network
– Textual documents (news, blogs, twitter, papers, reports) are getting richer

Approximately 80% of all data in information networks is held in an unstructured
format; thousands of "attack" events and hundreds of "arrest" events can be mined from
one week's unstructured textual data

Identify topics and events from documents using topic models
– Interconnect with users and other objects

How do topics propagate from documents to objects?
A Starting Point: ‘Isolated’ Information Network
[Figure: example information network built from the website "We are all Khaled Said"; example link shown: Residence: Tahrir (Feb 18th, 2001-present)]
Link Types in Open-domain Information Network
[Table: static and dynamic link types for node pairs in the InfoNet among PER, ORG and GPE nodes]
Static: Spouse, Parents, Children, Siblings; Member; Birth-Place, Death-Place, Nationality, Origin; Subsidiaries, Parents; Location, Headquarter, Political-Affiliation; Located-Country, Capital
Dynamic: Contact-Meet, Contact-Phone_Write, Justice, Sport; Leader, Schools-Attended, Employee, Founder, Shareholder, Justice; Resides-Place, Leader, Conflict-Attack, Conflict-Demonstrate, Justice, Movement-Transport, Injure; Business-Merge, Sport, Transaction; Conflict-Attack
Joint Enhancement of Topic Modeling and
Heterogeneous Information Network Mining

Fundamental Theory: InforNet construction and knowledge discovery capability can
be mutually enhanced by network analysis on text and interconnected data

Q1: How to discover latent topics and identify clusters of multi-typed objects simultaneously?
A1: Probabilistic Topic Modeling with Biased Propagation to take advantage of inter-connectivity
in InforNets
Q2: How can text data and heterogeneous InforNet mutually enhance each other in topic
modeling and other text mining tasks?
A2: Incorporate topic clusters to partition and refine InforNets, yield new representation,
evaluation metric and modeling theory

Biased propagation
Topic model
Preliminaries

– Maximize the log likelihood of a collection of docs
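The objective itself is not shown in the text of this slide. As a sketch only, assuming the standard PLSA formulation (PLSA is the baseline named in the experiments), with n(d, w) the count of word w in document d, V the vocabulary and K the number of topics (all symbols are assumptions, not taken from the slide), the log likelihood to maximize would be:

```latex
\mathcal{L} = \sum_{d \in D} \sum_{w \in V} n(d, w) \, \log \sum_{k=1}^{K} P(w \mid z_k) \, P(z_k \mid d)
```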
Probabilistic Topic Models with Biased Propagation
Intuition:
– InforNet provides valuable information
– Different objects have their own inherent information (e.g., D with rich text and U without explicit text)
– Treat documents with rich text and other objects without explicit text in different ways
  Topic(D) ← inherent text + connected U
  Topic(U) ← connected D
Basic Idea: (Biased Topic Propagation)
– Propagate the topic probabilities obtained by topic models from documents to other objects through the heterogeneous InforNet
– A simple and unbiased topic propagation does not make much sense
Biased Random Walk

Basic criterion
– The topic of an object without explicit text depends on the topics of the documents it connects to
  E.g., the research topic of an author could be characterized by his/her published papers
– The topic of a document is correlated with its objects to some extent, and should be principally determined by the inherent content of its text
[Equations, with annotations:]
– The topic distribution of an object is determined by the average topic distribution of its connected documents
– The topic distribution of a document combines the inherent topic distribution of the doc and the propagated topic distribution
– ξ: controls the balance between the inherent topic distribution and the propagated topic distribution
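The criterion above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes uniform averaging over neighbours, a fixed number of iterations, and inherent document topic distributions that are already given (in the full model they are learned jointly with the topic model). The names `biased_propagation`, `inherent`, `doc_objects` and `xi` are hypothetical.

```python
def _average(vectors, n_topics):
    """Element-wise mean of topic vectors (zero vector if there are none)."""
    if not vectors:
        return [0.0] * n_topics
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(n_topics)]


def biased_propagation(inherent, doc_objects, xi=0.8, n_iter=10):
    """inherent: doc id -> inherent topic distribution (list of floats).
    doc_objects: doc id -> list of ids of text-less objects linked to the doc.
    xi: weight of the inherent distribution (assumed to lie in [0, 1])."""
    n_topics = len(next(iter(inherent.values())))
    # Invert the doc -> objects map so each object knows its documents.
    object_docs = {}
    for doc, objs in doc_objects.items():
        for obj in objs:
            object_docs.setdefault(obj, []).append(doc)

    doc_topics = {d: list(v) for d, v in inherent.items()}
    for _ in range(n_iter):
        # Object topics: average over the connected documents.
        obj_topics = {o: _average([doc_topics[d] for d in docs], n_topics)
                      for o, docs in object_docs.items()}
        updated = {}
        for doc, own in inherent.items():
            objs = doc_objects.get(doc, [])
            if not objs:  # isolated document: keep its inherent topics
                updated[doc] = list(own)
                continue
            propagated = _average([obj_topics[o] for o in objs], n_topics)
            # Biased mix: mostly inherent text, partly propagated topics.
            updated[doc] = [xi * a + (1 - xi) * b
                            for a, b in zip(own, propagated)]
        doc_topics = updated
    obj_topics = {o: _average([doc_topics[d] for d in docs], n_topics)
                  for o, docs in object_docs.items()}
    return doc_topics, obj_topics
```

With xi = 1 the documents keep their inherent topics and objects simply average them; lowering xi lets linked objects pull document topics toward each other.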
Biased Regularization: Put All Together

TMBP for InforNet Partitioning
[Figure: documents Doc 1 to Doc N are partitioned into four topic clusters (Cluster 1 to Cluster 4); each cluster is shown with its topic words, event types and example network links]
Event types shown in the figure:
– Event Type: "Business"; Trigger: form, dissolve; Arguments: "Org", "Place", "TimeWithin", "Agent"
– Event Type: "Attack"; Trigger: blew, attack; Arguments: "Attacker", "Target", "Place", "TimeWithin"
– Event Type: "Justice"; Trigger: Arrest, Jail; Arguments: "Defendant", "Time-Within", "Adjudicator", "Place"
– Event Type: "Transaction"; Trigger: Borrow, Launch; Arguments: "Giver", "Recipient", "Money", "Seller", "Artifact", "Buyer"
– Event Type: "Contact"; Trigger: talk, meet etc.; Arguments: "Entity", "Instrument", "Place", "Time-Within"
Example nodes: the leaders of Germany and France; former Chinese president Jiang Zemin; Russian President Vladimir Putin; US President George W. Bush; troops; Evian, France; Saint Petersburg; March 2004
Example links: TYPE="PER-SOC" SUBTYPE="Business"; TYPE="Personnel" SUBTYPE="Elect"; TYPE="Movement" SUBTYPE="Transport"; TYPE="Contact" SUBTYPE="Meet"; TYPE="PHYS" SUBTYPE="Located"
Topic words shown with the documents:
– north, talks, Korea, south, Putin, Pyongyang, officials, Washington, nuclear, program, China, weapons, United States
– Iraqi, Saddam, fighting, army, regime, British, forces, military, city, Kurdish, control, Baghdad
– court, York, dollars, case, million, AFP, government, media, convicted, billion, company, sentence
Top words of the four topic clusters:
– Palestinian, Israel, police, Israeli, people, bank, Monday, killed, west, security, Attack
– Iraq, war, United, States, Bush, Nations, Iraqi, minister, council, resolution, country
– north, nuclear, Korea, weapons, Korean, talks, officials, Washington, Putin, south, China
– court, dollars, year, appeal, million, years, government, convicted, billion, sentence, AFP
TMBP for InforNet Refinement


– Across a heterogeneous information network, a particular object can sometimes be an event trigger and sometimes not, and can represent different event types
– Within a cluster of topically-related documents, the distribution is much more convergent
  E.g., in the overall information network only 7% of the occurrences of "fire" indicate "End-Position" events, while all occurrences of "fire" in a topic cluster are "End-Position" events
– Topic modeling can enhance information network construction by grouping similar objects, event types and roles together
Open-domain Progressive Information Network
Analysis with TMBP

Bombing Threats Tracking and Dynamic Terrorism Networks Construction
[Figure: example network with link weights 0.8, 0.9, 0.3, 0.6 and 0.4 connecting Islamic Republic of Iran Broadcasting, Tehran University, Iran, Supreme National Security Council, Ali Larijani, Farideh Motahari and Hassan Rowhani]
– Most information obtained from text-rich InforNet construction so far is viewed as static, ignoring the temporal dimension of many links in the networks
– It is not enough to rely on information reporting time (publication years, blog post dates, news release time, narrative order, etc.) for open-domain real-world scenarios: only 3.71% correlation with gold standards
– Temporal information in individual documents can be sparse, incomplete and inaccurate. About 50% of events do not include explicit time arguments
TMBP based Information Aggregation

Toward deep analysis and global aggregation across information
networks
– Partition InforNet based on topic modeling
– Within a topic cluster, we can recover temporal information by gleaning
knowledge across networks and reach a global estimation of time boundaries

Research Methods
– Novel representation of complex temporal information
– Meaningful comparison of approaches through InforNet-specific metrics
– Design novel dependency path based kernel methods to capture long contexts
– Global inference and aggregation over text-rich InforNet in order to reduce
vagueness and over-constraining, resolve contradiction, and improve information
quality
New Representation Theory and Evaluation Metric

4-tuple representation
– T1 = earliest possible start / T2 = latest possible start / T3 = earliest possible end / T4 = latest possible end
– Can represent punctual start/end points (T1 = T2, T3 = T4)
– Captures uncertainty when necessary (T1 < T2, T3 < T4)
– Consistency restrictions: T1 <= T2, T3 <= T4, T1 <= T3, T2 <= T4
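The 4-tuple and its consistency restrictions map directly onto a small data type. The sketch below is illustrative only; the class and function names are hypothetical, and times are assumed to be plain numbers (for example years or day offsets).

```python
from typing import NamedTuple


class TemporalTuple(NamedTuple):
    """4-tuple temporal representation of a link or event."""
    t1: float  # earliest possible start
    t2: float  # latest possible start
    t3: float  # earliest possible end
    t4: float  # latest possible end


def is_consistent(t: TemporalTuple) -> bool:
    """Check the four restrictions T1<=T2, T3<=T4, T1<=T3, T2<=T4."""
    return t.t1 <= t.t2 and t.t3 <= t.t4 and t.t1 <= t.t3 and t.t2 <= t.t4


def is_punctual(t: TemporalTuple) -> bool:
    """True when both start and end are known exactly (T1=T2, T3=T4)."""
    return t.t1 == t.t2 and t.t3 == t.t4
```

For example, `TemporalTuple(2001, 2001, 2005, 2005)` is consistent and punctual, while `TemporalTuple(2003, 2001, 2005, 2006)` violates T1 <= T2.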
A new quality of information metric based on formal constraints:
– Detect cases of non-informative nodes and links in information networks
– Allow independent parameterization of vagueness and over-constraining errors
– Error penalization can be tuned for more coarse or fine grained penalization
– t_i: automatic output; g_i: gold-standard

c = \begin{cases} c_{overconstraining}, & \text{if } (i \in \{1,3\} \wedge t_i > g_i) \vee (i \in \{2,4\} \wedge t_i < g_i) \\ c_{vagueness}, & \text{otherwise} \end{cases}

[Figure: a vague model compared with an over-constraining model]
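A hedged sketch of how such a metric could be computed. The slide only defines the choice of the constant c; the per-element score c / (c + |t_i - g_i|) and the averaging over the four tuple elements are assumptions made here for illustration, as are the function names and the direction of the comparisons (a system bound is treated as over-constraining when it is tighter than the gold bound).

```python
def penalty_constant(i, t, g, c_cons, c_vag):
    """Pick the constant for tuple element i (1..4).

    Assumption: a system bound is over-constraining when it is tighter than
    the gold bound, i.e. a later 'earliest' bound (i in {1, 3}) or an earlier
    'latest' bound (i in {2, 4}); every other error counts as vagueness."""
    over_constraining = (i in (1, 3) and t > g) or (i in (2, 4) and t < g)
    return c_cons if over_constraining else c_vag


def quality(system, gold, c_cons=1.0, c_vag=1.0):
    """Average per-element score in (0, 1]; 1.0 means an exact match.

    system, gold: 4-tuples (T1, T2, T3, T4) of finite numbers.
    c_cons, c_vag: positive constants; smaller values penalize more."""
    total = 0.0
    for i, (t, g) in enumerate(zip(system, gold), start=1):
        c = penalty_constant(i, t, g, c_cons, c_vag)
        total += c / (c + abs(t - g))
    return total / 4
```

Setting c_cons smaller than c_vag penalizes over-constraining errors more heavily than vague ones, which is the independent parameterization the slide refers to.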
Dependency Paths based Kernel Method and
Information Aggregation with CCMs

Dependency paths based kernel method for local network prediction

Maximize global network quality by aggregating temporal information across
documents over the entire information network, using Constrained Conditional
Models (CCMs) for optimization (collaboration with Dan Roth (UIUC))
– Aggregation of two 4-tuples:
  T \wedge T^{(i)} = \langle \max(T_1, T_1^{(i)}), \min(T_2, T_2^{(i)}), \max(T_3, T_3^{(i)}), \min(T_4, T_4^{(i)}) \rangle
– Optimization:
  \max \sum_i \sum_k \ln(p_{i,k}) \, x_{i,k} \quad \text{s.t.}
  \forall i, j, \; i \neq j, \text{ and } T^{(i)} \text{ conflicts with } T^{(j)}: \; x_i + x_k \le 1
  \forall i, k: \; x_{i,k} \in \{0, 1\}, \quad \sum_{k=1}^{K} x_{i,k} = 1
Topic Modeling Experiments
Compared to State-of-the-art
Data Collection
– DBLP
– NSF-Awards
Metrics
– Accuracy (AC)
– Normalized mutual information (NMI)
Results: 20%-40% improvement over Probabilistic Latent Semantic Analysis (PLSA)
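For reference, NMI between predicted clusters and gold labels can be computed as in the sketch below. NMI has several normalization variants; dividing by the larger of the two entropies is an assumption here, since the slide does not say which variant was used. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
               for (x, y), n_xy in joint.items())


def nmi(predicted, gold):
    """NMI normalized by max(H(predicted), H(gold)) (assumed variant).
    Returns 0.0 in the degenerate case where both entropies are zero."""
    denom = max(entropy(predicted), entropy(gold))
    return mutual_information(predicted, gold) / denom if denom > 0 else 0.0
```

NMI is invariant to cluster relabeling, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect match even though the cluster ids differ.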
Topic Modeling based Active Learning
for Event and Role Mining (Enhance Portability)




– Data: open-domain news with gold-standard information annotation
– Learning algorithm: combining pattern matching and Maximum Entropy based classification of triggers, arguments and roles
– Automatically select topically-related documents as event training data for annotation
– Using topic modeling, with only 1/4 of the training data we can achieve performance comparable to passive learning
Topic Modeling based MLN Inference
(Enhance Quality)

Topic-cluster wide cross-document inference based on Markov Logic
Networks (MLN) to enhance event and role mining




One trigger sense per topic cluster / One argument role per topic cluster
Remove events and roles with low local and cluster-wide confidence
Adjust event and role labeling to achieve cluster-wide consistency
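The three steps above can be illustrated with a much simpler stand-in for MLN inference: a confidence-weighted vote inside each topic cluster. This is explicitly not the Markov Logic Network approach used in the work; the data layout, the threshold and the function name are hypothetical.

```python
from collections import defaultdict


def enforce_one_sense_per_cluster(mentions, min_confidence=0.3):
    """mentions: list of dicts with keys 'cluster', 'trigger',
    'event_type' and 'confidence' (local classifier confidence in [0, 1]).

    Returns a new list in which low-confidence mentions are removed and the
    remaining ones carry the dominant event type of their (cluster, trigger)."""
    votes = defaultdict(lambda: defaultdict(float))
    for m in mentions:
        votes[(m["cluster"], m["trigger"])][m["event_type"]] += m["confidence"]

    cleaned = []
    for m in mentions:
        by_type = votes[(m["cluster"], m["trigger"])]
        total = sum(by_type.values())
        # Cluster-wide support for this mention's own label.
        support = by_type[m["event_type"]] / total if total > 0 else 0.0
        # Remove only when both local and cluster-wide confidence are low.
        if m["confidence"] < min_confidence and support < min_confidence:
            continue
        # Otherwise relabel to the dominant sense for cluster-wide consistency.
        best_type = max(by_type, key=by_type.get)
        cleaned.append({**m, "event_type": best_type})
    return cleaned
```

A weighted vote ignores the dependencies between triggers, arguments and roles that an MLN can express, so it should be read as a baseline illustration of the consistency idea only.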
Results: Precision (P), Recall (R), F-Measure (F)
Approach                                                   | Event Discovery (%)  | Role Discovery (%)
                                                           | P     R     F        | P     R     F
Baseline                                                   | 74.1  49.6  59.4     | 50.4  28.7  36.6
State-of-the-art (Information Retrieval based Clustering)  | 66.5  67.4  66.9     | 60.8  32.2  42.1
Topic Modeling                                             | 73.3  66.3  69.6     | 59.4  36.5  45.2
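The F columns follow from the P and R columns, assuming the usual balanced F-measure (the harmonic mean of precision and recall). A small check, with a hypothetical function name:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Event discovery rows from the table above: (P, R) -> reported F
rows = {
    "Baseline": (74.1, 49.6),
    "State-of-the-art": (66.5, 67.4),
    "Topic Modeling": (73.3, 66.3),
}
scores = {name: round(f_measure(p, r), 1) for name, (p, r) in rows.items()}
```

Rounded to one decimal, the recomputed values agree with the reported 59.4, 66.9 and 69.6.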
Progressive Temporal Infornet Mining Results

Data
– 1.3 million newswire documents and 0.4 million web blogs/forum documents


Overall Comparison with State-of-the-Art
Approach       | Exploit InforNet Structures? | Accuracy | Quality
1-gram kernel  | No                           | 54.9     | 0.66
2-gram kernel  | No                           | 56.8     | 0.67
3-gram kernel  | No                           | 56.5     | 0.66
Our Approach   | Yes                          | 61.5     | 0.76
Impact of Information Aggregation
[Figure: impact of information aggregation; labels: No InforNet, Aggregation over 2 tuples, Aggregation over 10 tuples, Exploit InforNet]
What’s New in Network Science?
Previous Approaches: only considered the textual information while ignoring the network structures, or could merely integrate with homogeneous networks
Our Approaches: Declaratively model the inter-connectivity in information networks using probabilistic topic modeling with biased propagation; multi-typed objects are treated differently along with their inherent textual information and the rich semantics of the heterogeneous information network

Previous Approaches: analyzed text documents and information networks separately
Our Approaches: Text data and heterogeneous information network mutually enhance each other in topic modeling and event/role discovery based on information network partitioning and refinement

Previous Approaches: focused on the analysis of one or a small set of documents
Our Approaches: Leverage information redundancy and semantic links across documents in information networks through cross-document aggregation and reasoning; reach global quality optimization in multi-dimensional space (topic, entity, event, time, place)

Previous Approaches: treated static and dynamic information discovered from ambiguous and uncertain information networks equally
Our Approaches: Develop a new temporal event network representation theory and evaluation metric with formal constraints that can account for uncertain temporal ranges, and a new kernel method based on dependency paths to capture long contexts
Potential Army Impact and
Technology Transition

Enrich and enhance the quality of information gathering from daily events
and trends, and detect terrorism or other potential threats by exploring
information networks that integrate unstructured text messages, blogs, tweets,
news and reports

Improved information quality has the potential to point soldiers and
military data analysts to more relevant information, going beyond keyword
based Information Retrieval approaches

Multi-facet object search can provide methods for finding groups of soldiers
with certain expertise and finding characteristics of enemies that may pose
an imminent threat (An example: Web-scale Terrorism Network Search and
Browsing)
– Developed methods to efficiently trace membership relations, attack/arrest/die
activities and information clusters involving any specific entities
– Improve the quality of information by the interconnected network itself (self-boosting information networks)
Collaborations

Within Task:
– With J. Han on subtask 2 and 3, >2 teleconferences every week, frequent
teleconferences/emails among students/post-docs, submitted 2 joint research
papers (1 SIGIR2011 submission and 1 ACL2011 submission), preparing 3 new joint
research papers
– With D. Roth, collaboration on Constrained Conditional Models (I1.1) for
Information Aggregation, entity coreference resolution and event extraction

Cross-Task:
– With J. Han on I3.1, weekly teleconferences, regular emails, submitted 1 joint
research paper to KDD2011
– With T. Huang on I1.1, on multi-media InforNet construction and utilization,
published 2 joint research papers, submitted a joint NSF proposal

Cross-Center:
– With S. Parsons (SCNARC and T1.4), on using text-rich information networks for trust
prediction and dynamic social network analysis, co-advising a PhD student
Research Plans for Next Six Months

Continue research conducted in the current I3.2 APP
– Explore topic correlation and social correlation from neighbors for improving topic
modeling (with Hongbo Deng, Jiawei Han and collaboration with SCNARC)
– Introduce more constraints in cross-link inferences (with D. Roth)
– Exploit new graph alignment algorithms for text mining (with X. Yan)
– Exploit implicit links for InforNet analysis, such as the response structures in twitter data
– Technology Transition: Apply all of the successful approaches to military applications, e.g.
conduct tight collaborations with ARL (e.g. Dr. Robert Cole) to make terrorism network
search engine deliverable; with ARL (Dr. Robert Winkler) on entity coreference resolution;
with A. Leung on military data topic and event analysis

Collaborations with researchers in other tasks and networks
– I3.1 APP: Continue collaborations with Jiawei Han (UIUC), to extend the work of
uncovering hierarchical relationships to more general relation types, data genres and
domains
– Work with Thomas Huang (UIUC, I1.1) on cross-media transfer learning
– Work with Jiawei Han (UIUC, E2.3) on evolution of information networks
– Work with Simon Parsons (T1.4) on automatic social network analysis, and exploit logic
reasoning to enhance entity disambiguation and information aggregation
A Research Path Ahead to 2012

Next year research planned if funded:
– Effective theories and methods for mining text-rich heterogeneous
networks involving social and communication networks
– Leverage topic modeling for improving expert finding (expertise ranking
problem) on heterogeneous information network
– Continue to exploit network structures to enhance knowledge discovery
and population
– Multi-dimensional, hierarchical abstractive summarization based on
information network analysis
– Explore collaborations with information fusion tasks in I1
– Explore collaborations with social network and trust projects on automatic
social network construction and mining
– Application of effective theories and methods in military applications
Research Papers
I3.1
(UIUC+CUNY) C. Wang, J. Han, X. Li, Q. Li, W. Lin, A. Lee, H. Li and H. Ji. 2011. Uncovering Hierarchical
Relationships among Linked Objects: A Probabilistic Modeling Approach. Submitted to KDD2011.
I3.2
Accepted/Published:

Z. Chen, S. Tamang, A. Lee, X. Li, W. Lin, J. Artiles, M. Snover, M. Passantino and H. Ji. CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description. Proc. TAC2010.

H. Li, X. Li, H. Ji and Y. Marton. Domain-Independent Novel Event Discovery and Semi-Automatic Event
Annotation. Proc. PACLIC 2010.

H. Ji, R. Grishman. Knowledge Base Population: Successful Approaches and Challenges. Proc. ACL-HLT2011.

H. Ji, Adam Lee and Wen-Pin Lin. Information Network Construction and Alignment from Automatically
Acquired Comparable Corpora. Invited book chapter for Building and Using Comparable Corpora. Springer.

H. Ji, B. Favre, W. Lin, D. Gillick, D. Hakkani-Tur and R. Grishman. Open-domain Multi-document Summarization
via Information Extraction: Challenges and Prospects. Invited book chapter for Multi-source, Multilingual
Information Extraction and Summarisation. Springer.
Submitted

(CUNY + UIUC) H. Ji and J. Han. 2011. Web-Scale Knowledge Discovery and Information Extraction. Invited
Paper for IEEE Special Issue on Web-Scale Multimedia Processing and Applications.

(CUNY + UIUC) H. Li, H. Ji, H. Deng and J. Han. 2011. Topically Related Data is Better Data: Topic Modeling for
Event Extraction. ACL-HLT2011.

(CUNY + UIUC) S. Anzaroot, J. Artiles, H. Ji, H. Deng and J. Han. 2011. Search and Browsing Self-Boosting
Information Networks. SIGIR2011.

J. Artiles, Q. Li, E. Amigo and H. Ji. 2011. Leveraging Cross-document Redundancy for Temporal Information
Extraction. EMNLP2011.

J. Artiles, E. Amigo, Q. Li and H. Ji. 2011. Evaluating Temporal Information Extraction. ACL-HLT2011

Z. Chen and H. Ji. 2011. Collaborative Ranking: A Case Study in Entity Linking. EMNLP2011.

Q. Li, J. Artiles and H. Ji. 2011. Dependency Paths Kernel for Temporal Relation Classification. ACL-HLT2011.

S. Tamang and H. Ji. 2011. Learning-to-Rank for Slot Filling System Combination and Assessment. EMNLP2011.

Z. Chen, S. Tamang, A. Lee and H. Ji. 2011. A Toolkit for Knowledge Base Population. SIGIR2011.

X. Li and H. Ji. 2011. Comment-guided Learning for Automatic Assessment. EMNLP2011.
Awards and Keynote Speech





Heng Ji. CUNY Chancellor's "Salute to Scholar" Award, November 2010.
Heng Ji. National Science Foundation Research Experiences for
Undergraduates, March 2011
Heng Ji, Web-Scale Knowledge Discovery and Population from Unstructured
Data, Keynote Speech ACLCLP 2010 Information Retrieval Conference,
December 2010.
Heng Ji. Overview of the TAC2010 Knowledge Base Population Track, Keynote
Speech at Web People Search (WePS-3) Conference, September 2010.
Five students received university-wide awards
Brief Summary of My Team’s Other Research
Work in I3.1 and I3.2
Leverage Semantic Information Network to Enhance
Entity Coreference Resolution / Entity Identification
Disambiguation
Name Variant Clustering
Apply Graph-cutting based algorithms on semantic information networks
9.4% absolute improvement in micro-averaged accuracy
Micro and Macro Collaborative Networks Ranking for
Entity and Event Coreference Resolution
oA( q ) Previous methods only focused on the
0.7
0.4
cq2 q
cq1
cq4 q
cq3
cq7
target node and one learning theory itself
Propose a new collaborative network
ranking theory which imitates human
collaborative learning
Leverage inter-connections among
collaborative entities in information
networks
cq5
Automatic
profiling for each node
Construct a collaborative network for each
entity based on graph-based clustering
Rank multiple decisions from collaborative
entities (micro) and algorithms (macro) based
on global prediction
0.3
cq6 0.6
(q)
correct rank :
oB
7%
absolute improvement in microaveraged accuracy
On-going CUNY+UIUC work: using topic
modeling for entity clustering
30
30
Markov Logic Networks and Learning-to-Rank to Enhance
Open Domain Role Discovery
[Figure: 9/11 suspect terrorist information network (nodes V3 to V16) built from forum, twitter, news page and web blog sources; nodes include Wail Al-Shehri, Waleed Al-Shehri, Al-Qaeda, Khamis Mushait, Boston, Abdul Rahman Al-Omari, Abdul Aziz Al-Omari, Mohamed Atta and Saudi Arabian Airlines; link labels include member, origin, residence, sibling and pilot]
Discovered 26 roles for persons, 16 roles for organizations and 13 roles for locations
Markov Logic Networks for cross-slot and cross-query reasoning based on InfoNet and textual
linkages to resolve conflicts and predict missing links
 Weight=15:  x , y , z Ambiguous ( X , Y )  Textual  Linkage(Y , Z )    Pilot ( X )  Pilot ( Z )  Remove X
– Weight=100: \forall x, y, z: \; Sibling(X, Y) \wedge Origin(Y, Z) \Rightarrow Origin(X, Z)
Maximum Entropy based Learning-to-rank model to re-rank candidate answers
13%-22% absolute F-measure improvement
(CUNY) Chen et al. "CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System
Description". Proc. TAC2010 and Lecture Notes in Computer Science, 2010
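The sibling/origin rule on this slide can be illustrated as plain forward chaining. This sketch applies the rule as a hard constraint, which is a simplification: in a Markov Logic Network the rule carries a weight (100 here) and is combined probabilistically with other evidence. The function name and data layout are hypothetical, and the sibling relation is assumed to be symmetric.

```python
def infer_origins(sibling_pairs, origins):
    """Apply Sibling(X, Y) and Origin(Y, Z) => Origin(X, Z) until no change.

    sibling_pairs: iterable of (person, person) tuples, treated as symmetric.
    origins: dict mapping person -> set of known origin places.
    Returns a new dict that also contains the inferred origins."""
    inferred = {person: set(places) for person, places in origins.items()}
    changed = True
    while changed:
        changed = False
        for x, y in sibling_pairs:
            for target, source in ((x, y), (y, x)):
                missing = inferred.get(source, set()) - inferred.get(target, set())
                if missing:
                    inferred.setdefault(target, set()).update(missing)
                    changed = True
    return inferred


# Placeholder names only; not facts taken from the slide's network.
result = infer_origins([("person_a", "person_b")], {"person_b": {"place_p"}})
```

Here `result["person_a"]` contains "place_p", the missing link predicted from the sibling relation.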
Uncovering Hierarchical Relationships among
Linked Objects




Parent-child, manager-subordinate, organizational, initiator-follower
DAG underlying tree
– Data: nodes, links, labeled trees
– Jointly learn the importance of features and rules (challenge: joint learning)
– Infer the tree structures of unlabeled data (challenge: model & feature design)
Develop a general model & summarize typical features w/ uncertain importance
– Local feature (singleton potential)
– Dependency rule (pairwise potential)
Test on two tasks
– Uncover family tree structure
– Uncover online discussion structure
[Figure: a candidate DAG over nodes v1 to v4, with one possible result tree and another possible result tree]
[Figure panels: Examples of features and rules; Inference performance in diff. measures]
Practical usefulness and generality
– Our model > state-of-the-art text mining (2-3X)
– Does not require many labels for training
– Joint model > two-stage model (5% - 381%)
– Good adaptability for generalization
(UIUC + CUNY) Chi Wang, Jiawei Han, Xiang Li, Qi Li, Wen-Pin Lin, Adam Lee, Hao Li, Heng Ji, "Uncovering
Hierarchical Relationships among Linked Objects: A Probabilistic Modeling Approach", KDD'11 (sub)
Uncovering Hierarchical Relationships among
Linked Objects

Using a novel discriminative model CRF-Hier
– optimized for joint modeling of tree structure learning and reasoning
– 10%-12% higher performance than state-of-the-art
[Figure: uncovered family tree with nodes Mohammed bin Awad bin Laden, Salem bin Laden, Bakr bin Laden, Osama bin Laden, Abdullah Osama bin Laden, Saad bin Laden and Omar Osama bin Laden]
Potential Transition Example: Terrorism Networks
Search and Browsing Engine
• In many scenarios, a user may
only know information about
limited portions of objects or
dimensions of links in
information networks and thus
have difficulty at creating
informative queries
• For example, a military data
analyst may have a list of
famous terrorism organizations
without knowing their detailed
person member names, but still
wish to track activities about
these members
Multi-Facet Search in Self-Boosting Information Networks
(Example: Terrorism Network Search and Browsing)
Demo Video: http://nlp.cs.qc.cuny.edu/terrorism.m4v
• Facilitate a military analyst in expert finding and terrorist information search gathering,
control and analysis for any given query
• Entity-topic analyzer for self-expansion and self-boosting: Terrorism organization →
members → status of members (die, arrest, ...) and information networks associated
with each member
(CUNY + UIUC) Sam Anzaroot, Javier Artiles, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Search and
Browsing Self-Boosting Information Networks. SIGIR2011 [SUB]