Transcript Slide 1
Graph-Based Methods for Automatic
Text Summarization
Lin Ziheng1, Kan Min-Yen2 and Lee Wee Sun2
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
A paper has been submitted to and accepted by HLT-NAACL 2007 TextGraphs-2 workshop. We also participated in DUC 2007 and have submitted a paper for our system.
1. Abstract
3. Timestamped Graph
Current graph-based approaches to text
summarization assume static graphs. A
suitable evolutionary text graph model
may impart a better understanding of
the texts. We propose a timestamped
graph (TSG) model that is based on
evolving networks and human writing
and reading processes.
Assumptions
1.Writers write articles from the first sentence to the last;
2.Readers read articles from the first sentence to the last.
Approach
Add sentences into the graph in chronological order.
Suitable in modeling the growth of single documents; for multi-document, treat
it as multiple instances of the single documents, which evolve in parallel.
2. Motivation
No existing text graph approach that
models how texts emerge. (LexRank,
TextRank)
Natural evolving networks.
An example
Definition
The growth of TSG
The example is just one instance of TSG with specific parameter settings. We
generalize and formalize the TSG algorithm.
Citation network
The WWW
Human writing and reading processes.
A timestamped graph algorithm tsg(M) is a 9-tuple (d, e, u, f,σ, t, i, s,τ) that
specifies a resulting algorithm that takes as input the set of texts M and outputs
a graph G.
e - #edges to add per vertex per time step;
u - unweighted or weighted edges;
σ- vertex selection function σ(u, G);
s - skew degree.
The success of graph ranking
algorithms, such as PageRank.
Skew degree
4. Evaluation
5. Conclusion
Dataset: DUC 2005, 2006 and 2007. Evaluation tool: ROUGE.
Each dataset contains 50 clusters, each cluster contains a query and 25 documents.
Summarization system: (1) Graph construction phase – TSG; (2) Sentence ranking
phase – PageRank; (3) Sentence extraction phase – MMR re-ranker.
e=2
e=2
12th
Proposed a timestamped graph
model for text understanding and
summarization.
Applied TSG on DUC 05, 06 and
07, and achieved comparable
results.
Best performance achieved with
specific parameter settings.
N
Topicsensitive
Weighted
edges
ROUGE-1
No
No
0.39358
0.07690
Yes
No
0.39443
No
Yes
Yes
Yes
TSG subsumes the graphs used
by LexRank and TextRank.
N
ROUGE-2
Skew degree
ROUGE-1
ROUGE-2
0
0.36982
0.07580
0.07838
1
0.37268
0.07682
0.39823
0.08072
2
0.36998
0.07489
0.39845
0.08282
3rd
6. Future Works
Results of participation in DUC 2007
Optimal performance: e = 2; topic-sensitive PageRank and weighted edges; s = 1.
DUC results show: TSG is better tailored to deal with update summaries.
Currently looking further on
skewed timestamped graphs.
Analyzing in-degree distribution
of timestamped graphs.
1. Student
2. Supervisor