Young-RaeCho

Download Report

Transcript Young-RaeCho

KOCSEA Technical Symposium 2010
Systematic Analysis of Interactome:
A New Trend in Bioinformatics
Young-Rae Cho, Ph.D.
Assistant Professor
Department of Computer Science
Baylor University
History of Bioinformatics
Stage 1.
Sequence Analysis
•
Gene sequencing
•
Sequence alignment
•
Homolog search
•
Motif finding
History of Bioinformatics
Computational Biology
Stage 1.
Sequence Analysis
Stage 2.
Structure Analysis
•
Gene sequencing
•
Protein folding
•
Sequence alignment
•
Homolog search
•
Homolog search
•
Binding site prediction
•
Motif finding
•
Function prediction
History of Bioinformatics
Computational Biology
Stage 1.
Sequence Analysis
Stage 2.
Structure Analysis
Functional Genomics
Stage 3.
Expression Analysis
•
Gene sequencing
•
Protein folding
•
Function prediction
•
Sequence alignment
•
Homolog search
•
Gene clustering
•
Homolog search
•
Binding site prediction
•
Sample classification
•
Motif finding
•
Function prediction
History of Bioinformatics
Computational Biology
Stage 1.
Sequence Analysis
Stage 2.
Structure Analysis
Functional Genomics
Systems Biology
Stage 3.
Expression Analysis
Stage 4.
Network Analysis
•
Gene sequencing
•
Protein folding
•
Function prediction
•
Network modeling
•
Sequence alignment
•
Homolog search
•
Gene clustering
•
Interaction prediction
•
Homolog search
•
Binding site prediction
•
Sample classification
•
Function prediction
•
Motif finding
•
Function prediction
•
Pathway identification
•
Module detection
Biological Networks
 Definition
 Maps of biochemical reactions, interactions, regulations between genes or proteins
 Importance
 Provide insights into the mechanisms of molecular function within a cell
 Significant resource for functional characterization of genes or proteins
 Require computational and systematic approaches
 Examples
 Metabolic networks
 Protein-protein interaction networks
 Genetic interaction networks
 Gene regulatory networks (Signal transduction networks)
Protein Interaction Networks
 Determination
 Experimental methods: Y2H, MS, Protein Microarray
 Computational methods: Homolog search, Gene fusion analysis, Phylogenetic profiles
 Genome-scale protein-protein interactions  Interactome
 Representation
 Un-weighted, undirected graph
 Challenges
 Unreliability
 Large scale
 Complex connectivity
Network Re-structuring
 Strategy
unweighted network
 To resolve complex connectivity
 Converts the complex graph to
a hierarchical tree structure
 Uses the concepts of path strength,
functional linkage, and centrality
 Process
 Input: a protein interaction network
edge weighting
weighted network
functional linkage measurement
score matrix
network restructuring
structured network
 Output: a list of functional modules
hub confidence measurement
hubs
network clustering
clusters
Path Strength
 Path Strength Model
 Assumption: each node in a path chooses a succeeding edge based on the weighted
probability

 Path Strength Factors
 Edge weights
 Path length
 Node weighted degree
Functional Linkage
 Measurements
 Path strength of the strongest path between two nodes
 Computational problem
 Needs a heuristic approach
 Uses a user-specified threshold of the max path length
 Formula
 k-length path strength:
 Functional linkage:
shortest path length
threshold
Network Restructuring
 Centrality
 Weighted closeness:
 Algorithm
 Computes centrality for each node a
 Selects a set of ancestor nodes, T(a), of a by
 Selects a parent node, p(a), of a by
 Example
Hub Confidence
 Measurement
 Selects a set of child nodes, D(a), of a by
 Selects a set of descendent nodes, La, of a by
 Computes the hub confidence, H(a), of a by
 Example
Clustering
 Algorithm
 Iteratively select a hub a with the highest hub confidence
 Output the sub-tree La including a as a cluster (functional module)
 Cluster Depth
 The max path length from the root of the sub-tree to a leaf
 Example
Topological Assessment of Hubs
 Network Vulnerability
 Random attack: repeatedly disrupt a randomly selected node
 Degree-based hub attack: repeatedly disrupt the highest degree node
 Structural hub attack: repeatedly disrupt the node with the highest hub confidence
 For each iteration, observes the largest component
 Results
fraction of largest component
1.00
0.95
0.90
0.85
0.80
0.75
random attack
degree-based hub attack
structural hub attack
0.70
0.65
0.60
0
20
40
60
80
100
number of nodes
120
140
160
Biological Assessment of Hubs
 Protein Lethality
 Determines lethal / viable proteins by knock-out experiment
 Lethality represents functional essentiality
 Orders proteins by degree and hub confidence
 Observes the cumulative proportion of lethal proteins for every 10 proteins
 Results
1.0
structural hubs
average lethality
0.9
degree-based hubs
0.8
0.7
0.6
0.5
0.4
0
20
40
60
80
number of hubs
100
120
140
Topological Assessment of Clusters
 Modularity
 A combined measure of density within each cluster and separability among clusters
 Estimated by the ratio of the number of edges within a cluster (sub-graph)
to the number of all edges starting from the nodes in the cluster (sub-graph)
 Observes the average modularity of clusters with respect to the cluster depth
 Results
 More specific function module has
 Justify the general-to-specific concepts
of hierarchical functional modules
160
average modularity
higher modularity
180
140
120
100
80
60
40
20
0
1
2
3
4
5
6
7
8
cluster depth
9
10 11 12
Biological Assessment of Clusters
 f-Measure
 Compares each output cluster X with the real functional annotation Y (from MIPS)
 Recall = (# of common proteins of X and Y) / (# of proteins in Y)
 Precision = (# of common proteins of X and Y) / (# of proteins in X)
 f-measure = 2 × Recall × Precision / (Recall + Precision)
 Results
 Compared with the results from previous hierarchical clustering methods, e.g.,
edge-betweenness (top-down approach) and ProDistIn (bottom-up-approach)
Conclusion
 Motivation
 Significant functional knowledge in protein interaction networks (interactome)
 Complex connectivity
 Contributions
 Convert an unstructured network to a structured network
 Conserve functional information through pathways
 High network vulnerability, low functional lethality at hubs as a drug target
 Applicable to various fields, e.g., social networks, WWW
 Foundation of structural dynamics during network evolution
Questions ?
 Reference
 Y.-R. Cho and A. Zhang, “Identification of functional modules by converting
interactome networks into hierarchical ordering of proteins”. BMC Bioinformatics,
11(Suppl 3):S3, 2010