
Graph-based Approaches to NLP
Al Akhwayn Summer School
2015
Horacio Rodríguez
TALP Research Center
Dept. Computer Science
Universitat Politècnica de Catalunya
[email protected]
http://www.lsi.upc.edu/~horacio
Outline
• Introduction
• Graph basics
• Main techniques
• (Probabilistic) Graphical Models
• Applications
Introduction
• Using Graphs
  – Consolidated theory
  – Efficient algorithms
• Applications
  – Automatic Summarization
  – IR
  – WSD
  – Entity clustering
  – Relevance detection
  – Sentiment classification
Introduction
• What to read
  – Rada Mihalcea
    • http://lit.csci.unt.edu/index.php/Graph-based_NLP
  – TextGraphs workshop, HLT-NAACL 2006
  – TextGraphs-2 workshop, HLT-NAACL 2007
  – Theses
    • Toutanova 2005, Murphy 2002, Zhu 2005
  – Tutorials
    • Rada Mihalcea, Dragomir Radev, Smaranda Muresan
Graphs Basics
• Graph
  – G = <V, E>
    • V: set of vertices (nodes)
    • E: set of edges (arcs)
  – An edge is a pair of vertices
  – Digraphs
    • Edges are ordered pairs of vertices
      – Source, target
  – Undirected graphs
  – (u,v) ∈ E, u, v ∈ V: u and v are adjacent
  – (u,v) ∈ E is incident on u and v
  – Degree of a vertex
    • Number of incident edges
    • For digraphs
      – in-degree, out-degree
Graphs Basics
• Paths in graphs
  – Simple path
  – Cycle
• Connected (sub)graphs
• Reduced graphs
• Articulation nodes
Graphs Basics
• Subgraph
• Tree
• DAG
• Complete Graph
  – All nodes are adjacent
• Clique
• Bipartite Graph
• Hypergraph
Graphs Basics
• Labeled Graphs
  – On nodes
  – On edges
• Weighted Graphs
• Eccentricity of a node of a graph
• Center of a graph
Graphs Basics
• Spanning Trees
• Minimum Cost Spanning Trees
• Representation of a graph
  – Adjacency matrix
  – Adjacency lists
  – Pros and cons
  – Sparse graphs
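A minimal Python sketch (not from the slides) contrasting the two representations for the same small undirected graph; the vertex numbering and edges are illustrative:

```python
# Sketch: two common graph representations for the same small graph,
# assuming vertices 0..n-1 (illustrative only).
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: O(n^2) memory, O(1) edge lookup; wasteful for sparse graphs.
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[u][v] = 1
    adj_matrix[v][u] = 1          # undirected graph: store both directions

# Adjacency lists: O(n + |E|) memory, efficient iteration over neighbours.
adj_list = {u: [] for u in range(n)}
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

print(adj_list[2])                # neighbours of vertex 2 -> [0, 1, 3]
```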
Graphs Basics
• Algorithms over graphs
  – Dijkstra
    • For a weighted digraph, minimum cost paths from one node to all the others
    • Path length
    • Path cost
    • Dynamic programming
    • O(n²)
  – All-pairs shortest path problem
    • Iterate Dijkstra over all the nodes
    • Floyd
    • O(n³)
  – Transitive Closure
    • Warshall
    • O(n³)
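A compact Python sketch of Dijkstra's single-source shortest paths, here with a binary heap (so O((n + |E|) log n) rather than the O(n²) array version cited above); the toy digraph is my own illustration:

```python
import heapq

# Sketch of Dijkstra's algorithm on a weighted digraph stored as adjacency
# lists {u: [(v, cost), ...]}; returns minimum path cost from src to every node.
def dijkstra(graph, src):
    dist = {u: float("inf") for u in graph}
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale queue entry
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

graph = {"s": [("a", 2), ("b", 5)], "a": [("b", 1)], "b": []}
print(dijkstra(graph, "s"))                # {'s': 0, 'a': 2, 'b': 3}
```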
Graphs Basics
• Algorithms over graphs
  – Center
    • Using Floyd
  – Graph traversal
    • Depth First Search (DFS)
    • Breadth First Search (BFS)
  – Path finding
    • Find a path between two nodes
    • Using DFS or BFS
  – Minimum Cost Spanning Trees
    • Prim
    • Kruskal
    • O(n²)
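A small Python sketch of path finding with BFS on an unweighted graph, as mentioned above; the adjacency lists are illustrative:

```python
from collections import deque

# Sketch: find a path between two nodes with BFS (graph as adjacency lists).
# BFS also yields the path with the fewest edges.
def bfs_path(adj, start, goal):
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None                      # no path exists

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_path(adj, "a", "d"))       # ['a', 'b', 'd']
```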
Graphs Basics
• Algorithms over graphs
  – Graph matching
    • For a graph G = (V,E), a matching is a subset of E not containing two edges incident to the same node
    • Maximal matching problem
    • Complete matching problem
    • Interesting for bipartite graphs
Graphs Basics
• Spectral theorem
  – Conditions for a matrix to be diagonalizable
  – Spectral decomposition of A
    • Diagonalization
    • If A is normal (e.g. Hermitian, and thus, if real, symmetric), then a decomposition exists:
      – A = U Λ U*
      – Λ is diagonal, its entries being the eigenvalues of A
      – U is unitary, its columns being the eigenvectors of A
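A short NumPy check (my own illustration) of the spectral decomposition A = U Λ U* for a real symmetric, hence normal, matrix:

```python
import numpy as np

# Sketch: spectral decomposition A = U Λ U* of a real symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, U = np.linalg.eigh(A)          # eigh: for symmetric/Hermitian matrices
Lam = np.diag(eigvals)                  # Λ holds the eigenvalues
print(np.allclose(A, U @ Lam @ U.T))    # True: A = U Λ U*
```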
Graphs Basics
• Matrix decompositions
  – SVD
    • Application to dimensionality reduction
  – Principal Components Analysis
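A NumPy sketch (the matrix is illustrative, not from the slides) of truncated SVD for dimensionality reduction, the idea behind PCA/LSA:

```python
import numpy as np

# Sketch: keep the top-k singular values of a toy term-document-style matrix X;
# this gives the best rank-k approximation of X.
X = np.array([[3.0, 1.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 4.0, 2.0],
              [1.0, 1.0, 1.0]])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-2 approximation of X
print(np.linalg.norm(X - X_k))                # small reconstruction error
```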
Specific techniques
• Graph-based semi-supervised learning
  – Zhu, 2005
  – Goldberg, Zhu, 2006
  – L labeled instances x1, ..., xl
  – U unlabeled instances xl+1, ..., xl+u
  – C classes = {c1, ..., cC}
  – f(x) rating of x, f: X → R
  – Classification: mapping f(x) to the nearest discrete rating in C
Specific techniques
• Graph-based semi-supervised learning
  – f(x) should be smooth with respect to the graph
  – (Un)smoothness of an edge (xi, xj) = wij (f(xi) - f(xj))²
  – Summing over all edges gives L(f), the (un)smoothness of the graph
  – L(f) is the energy or loss
  – The optimization problem is to find the f that minimizes L(f)
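A toy Python sketch in the spirit of Zhu (2005): unlabeled node scores are iteratively replaced by the weighted average of their neighbours' scores, which drives down the quadratic (un)smoothness loss while labeled nodes stay clamped. The graph, weights, and labels are my own illustration:

```python
import numpy as np

# Graph-based semi-supervised sketch: minimize sum_ij w_ij (f(x_i) - f(x_j))^2
# with labeled nodes fixed, via fixed-point (harmonic) iteration.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)   # symmetric edge weights w_ij
f = np.array([1.0, 0.0, 0.0, 0.0])          # node 0 labeled 1, node 3 labeled 0
labeled = [0, 3]

for _ in range(100):
    f_new = W @ f / W.sum(axis=1)            # weighted average of neighbours
    f_new[labeled] = f[labeled]              # clamp the labeled nodes
    f = f_new

print(np.round(f, 3))                        # unlabeled nodes get scores in [0, 1]
```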
Specific techniques
• Random Walk Algorithms
– Decide the importance of a vertex within a graph
– A link between two vertices = a vote
• Iterative voting => Ranking over all vertices
– Model a random walk on the graph
• A walker takes random steps
• Converges to a stationary distribution of probabilities
• Guaranteed to converge if the graph is
– Aperiodic – holds for non-bipartite graphs
– Irreducible – holds for strongly connected graphs
– Examples
• PageRank (Brin, Page, 1998)
• HITS (Kleinberg, 1999)
Specific techniques
• HITS (Hyperlink-Induced Topic Search)
  – Kleinberg, 1999
  – Defines 2 types of importance:
    • Authority
      – Can be thought of as being an expert on the subject
      – "You are important if important hubs point to you"
    • Hub
      – Can be thought of as being knowledgeable about the subject
      – "You are important if you point to important authorities"
  – Hubs and Authorities mutually reinforce each other
  – Random walk where:
    • On odd steps you walk from hubs to authorities (follow an outlink)
    • On even steps you walk from authorities to hubs (follow an inlink)
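A minimal NumPy sketch of the HITS mutual-reinforcement updates on a toy directed graph (the adjacency matrix is illustrative):

```python
import numpy as np

# HITS sketch: A[i, j] = 1 iff there is a link i -> j; hub and authority
# scores are updated from each other and renormalized until they stabilize.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
hubs = np.ones(3)
auths = np.ones(3)
for _ in range(50):
    auths = A.T @ hubs                 # authority: pointed to by good hubs
    hubs = A @ auths                   # hub: points to good authorities
    auths /= np.linalg.norm(auths)
    hubs /= np.linalg.norm(hubs)

print(np.round(auths, 3), np.round(hubs, 3))
```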
Specific techniques
• PageRank
  – Brin, Page, 1998
  – Original Google algorithm
  – Random walks over the Web following hyperlinks
  – Some probability of jumping randomly out of the page
  – The fraction of time spent on each page is its PageRank
  – PageRank forms a probability distribution over the Web
  – PageRank is the principal eigenvector of the normalized link matrix of the Web
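A short Python sketch of PageRank by power iteration, with the usual damping factor d = 0.85 (the random-jump probability is 1 - d); the toy link graph is my own illustration:

```python
import numpy as np

# PageRank sketch by power iteration on a toy link graph.
links = {0: [1, 2], 1: [2], 2: [0]}     # page -> pages it links to
n, d = 3, 0.85
pr = np.full(n, 1.0 / n)
for _ in range(100):
    new = np.full(n, (1 - d) / n)        # random-jump mass
    for page, outlinks in links.items():
        for target in outlinks:
            new[target] += d * pr[page] / len(outlinks)
    pr = new

print(np.round(pr, 3))                   # a probability distribution over pages
```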
Specific techniques
• Max Flow networks
  – G = (V, E, s, t, u)
  – s ∈ V, source
  – t ∈ V, sink
  – u(e) capacity of edge e
[Figure: example flow network with numbered nodes, source s, sink t, and edge capacities]
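A self-contained Python sketch of s-t maximum flow via Edmonds-Karp (BFS augmenting paths); the capacity dictionary is an illustrative example, not the network from the slide's figure:

```python
from collections import deque

# Max-flow sketch: cap[u][v] is the capacity of edge (u, v).
def max_flow(cap, s, t):
    flow = 0
    residual = {u: dict(vs) for u, vs in cap.items()}
    for u in list(residual):
        for v in residual[u]:
            residual.setdefault(v, {}).setdefault(u, 0)   # add reverse edges
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:                  # BFS for an augmenting path
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:                                  # push flow, update residuals
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

cap = {"s": {"a": 10, "b": 5}, "a": {"t": 7, "b": 4}, "b": {"t": 8}, "t": {}}
print(max_flow(cap, "s", "t"))           # 15
```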
(Probabilistic) Graphical Models
• Graphs whose nodes represent random variables and where the absence of an edge represents conditional independence.
• GM
  – Undirected (Markov Networks, MN; Markov Random Fields, MRF)
  – Directed (Bayesian Networks, BN)
• Survey in K.P. Murphy's thesis, 2002
• Daphne Koller's book
• Daphne Koller's Coursera course
(Probabilistic) Graphical Models
• Undirected GM
  – Global Markov property
    • XA ⊥ XB | XC
    • A, B, C sets of nodes
    • A is conditionally independent of B given C
    • All paths between nodes in A and nodes in B are blocked by some node in C
  – Local Markov property
    • Xi is independent of all other nodes in the graph given its neighbours
(Probabilistic) Graphical Models
• Joint distribution of an MRF
  – C is the set of maximal cliques
  – ψC(XC) potential function
  – Z normalization factor
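The formula itself did not survive the transcript; the standard MRF factorization referred to here is, in LaTeX:

```latex
P(X) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(X_C),
\qquad
Z = \sum_{X} \prod_{C \in \mathcal{C}} \psi_C(X_C)
```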
(Probabilistic) Graphical Models
• Example
(Probabilistic) Graphical Models
• Directed GM (BN)
  – No cycles (DAG)
  – Edge (A,B) means that A causes B
  – X is conditionally independent of its non-descendants given its parents
  – X is conditionally independent of all other nodes given its Markov blanket (parents, children, and the children's other parents)
(Probabilistic) Graphical Models
• Factor Graphs
  – Unify directed and undirected GM
  – Bipartite
    • Factors
    • Variables
  – An MRF or a BN can be converted to a FG, and vice versa
Applications
• Applications
  – Summarization
  – Name disambiguation
  – IE
  – Novelty detection
  – Word dependency distributions
  – Textual Entailment
  – QA
  – Sentiment categorization
  – WSD
  – Substructure learning
Applications
• Summarization Approaches
  – Salton et al., 1999
  – LexRank (Erkan, Radev, 2004)
  – TextRank (Mihalcea, Tarau, 2004)
  – Zha, 2002
Applications
• Name disambiguation
• Application to email
  – Minkov, Cohen, Ng, 2006
  – Representing email as graphs
  – Using lazy graph walks as a similarity measure (similar to PageRank)
Applications
• Semi-Supervised Approach for IE
  – Hassan et al., 2006
  – Representing both labeled and unlabeled examples in the graph
  – Pattern induction
  – Reinforcement of weights
  – Based on HITS
Applications
• IE
  – "American vice President Al Gore visited ...".
  – Template
    • COUNTRY NOUN_PHRASE PERSON VERB_PHRASE
  – Pattern
    • COUNTRY(E2) NOUN_PHRASE(R) PERSON(E1) VERB_PHRASE
  – Tuple
    • Application of a pattern to a text
      – entity 1: Al Gore
      – entity 2: United States
      – relation: vice President
Applications
• Novelty Detection
  – Gamon, 2006
  – TREC 2004 novelty track
    • Given the relevant sentences in the complete document set (for a given topic), identify all novel sentences
    • Supervised classification task
  – Baseline
    • BOW with KL divergence
      – d: document
      – R: document set
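A small Python sketch of the KL-divergence baseline mentioned above, comparing smoothed unigram distributions of a candidate document d and the background set R (the smoothing constant and toy texts are my own choices):

```python
import math
from collections import Counter

# Novelty baseline sketch: KL divergence between the word distribution of d
# and that of the background set R, with simple additive smoothing.
def kl_divergence(d_tokens, r_tokens, alpha=0.01):
    vocab = set(d_tokens) | set(r_tokens)
    d_counts, r_counts = Counter(d_tokens), Counter(r_tokens)
    kl = 0.0
    for w in vocab:
        p = (d_counts[w] + alpha) / (len(d_tokens) + alpha * len(vocab))
        q = (r_counts[w] + alpha) / (len(r_tokens) + alpha * len(vocab))
        kl += p * math.log(p / q)
    return kl            # higher divergence from R suggests more novelty

print(kl_divergence("a new graph method".split(),
                    "a known graph method it is".split()))
```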
Applications
• Distributions of word/word dependencies
  – e.g. POBJ(n|v)
  – Toutanova, 2005 (thesis)
  – PP attachment
  – Learning a Markov Chain (MC) model (random walk)
    • Nodes = words
Applications
• RTE
  – Zanzotto, Moschitti, 2006
  – Textual entailment pairs as pairs of syntactic trees with co-indexed nodes
  – Considers both the structural similarity between syntactic tree pairs and the similarity between relations among sentences within a pair
  – Similarities
    • Cross-pair
      – K((T',H'), (T'',H''))
      – Structural and lexical similarity between T', T'' and H', H''
      – Intra-pair word movement compatibility between (T',H') and (T'',H'')
    • Intra-pair
      – Novel kernel function
Applications
• QA
  – Diego Mollá, 2006
  – Graph-based representation of question and text sentences
  – QA rule
    • Pattern matching Q
    • Pattern matching the A sentence
    • Pointer to A in the A sentence
    • Rules define an extension graph
  – Pattern
    • Graph with (some) nodes assigned to variables
Applications
• Substructure Learning
  – Leskovec et al., 2005
  – Semantic Graphs for representing documents
  – SVM for learning sub-structures
  – Approach
    • Linguistic Analysis
      – NLPWin
      – Portrait
      – Logical Form Triples (Subj, Pred, Obj)
    • Semantic Graph building
      – 3 sets of linguistic features
        » (ANV) adjectives, nouns, verbs
        » (NPV) head nouns and verbs
        » (LF) heads of LFT
      – Two different graph representations for each of these three sets
        » NamedLink
        » Nodes