Transcript Slide 1

Graph-Based Methods for Automatic
Text Summarization
Lin Ziheng1, Kan Min-Yen2 and Lee Wee Sun2
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
A paper has been submitted to and accepted by HLT-NAACL 2007 TextGraphs-2 workshop. We also participated in DUC 2007 and have submitted a paper for our system.
1. Abstract
3. Timestamped Graph
Current graph-based approaches to text
summarization assume static graphs. A
suitable evolutionary text graph model
may impart a better understanding of
the texts. We propose a timestamped
graph (TSG) model that is based on
evolving networks and human writing
and reading processes.
Assumptions
1.Writers write articles from the first sentence to the last;
2.Readers read articles from the first sentence to the last.
Approach
Add sentences into the graph in chronological order.
Suitable in modeling the growth of single documents; for multi-document, treat
it as multiple instances of the single documents, which evolve in parallel.
2. Motivation
No existing text graph approach that
models how texts emerge. (LexRank,
TextRank)
Natural evolving networks.
An example
Definition
The growth of TSG
The example is just one instance of TSG with specific parameter settings. We
generalize and formalize the TSG algorithm.
Citation network
The WWW
Human writing and reading processes.
A timestamped graph algorithm tsg(M) is a 9-tuple (d, e, u, f,σ, t, i, s,τ) that
specifies a resulting algorithm that takes as input the set of texts M and outputs
a graph G.
e - #edges to add per vertex per time step;
u - unweighted or weighted edges;
σ- vertex selection function σ(u, G);
s - skew degree.
The success of graph ranking
algorithms, such as PageRank.
Skew degree
4. Evaluation
5. Conclusion
Dataset: DUC 2005, 2006 and 2007. Evaluation tool: ROUGE.
Each dataset contains 50 clusters, each cluster contains a query and 25 documents.
Summarization system: (1) Graph construction phase – TSG; (2) Sentence ranking
phase – PageRank; (3) Sentence extraction phase – MMR re-ranker.
e=2
e=2
12th
Proposed a timestamped graph
model for text understanding and
summarization.
Applied TSG on DUC 05, 06 and
07, and achieved comparable
results.
Best performance achieved with
specific parameter settings.
N
Topicsensitive
Weighted
edges
ROUGE-1
No
No
0.39358
0.07690
Yes
No
0.39443
No
Yes
Yes
Yes
TSG subsumes the graphs used
by LexRank and TextRank.
N
ROUGE-2
Skew degree
ROUGE-1
ROUGE-2
0
0.36982
0.07580
0.07838
1
0.37268
0.07682
0.39823
0.08072
2
0.36998
0.07489
0.39845
0.08282
3rd
6. Future Works
Results of participation in DUC 2007
Optimal performance: e = 2; topic-sensitive PageRank and weighted edges; s = 1.
DUC results show: TSG is better tailored to deal with update summaries.
Currently looking further on
skewed timestamped graphs.
Analyzing in-degree distribution
of timestamped graphs.
1. Student
2. Supervisor