CSE891 Selected Topics in Bioinformatics

Download Report

Transcript CSE891 Selected Topics in Bioinformatics

CSE891-002 Selected Topics in
Bioinformatics
Jin Chen
232 Plant Biology Bld.
2011 Spring
1
About me…
•
Jin Chen, Assistant Professor in CSE and PRL from 2009
•
Office: 232 Plant Biology Lab. Tel: (517) 355-5015. Email: [email protected]
2
Outline
• Course Description
• Introduction to Computational Network
Biology
3
Course Description
• Course objectives: study interesting computational network biology
problems and their algorithms, with a focus on the principles used to
design those algorithms. (3 credits)
• Instructor: Jin Chen, Office: 232 Plant Biology Bld. Email:
[email protected]
• Office hours: Thursday 2PM-3PM. If you cannot attend office hours, email
me about scheduling a different time.
• Web page: http://www.msu.edu/~jinchen/cse891a
4
Course Description
• Course work: One 80 minutes lecture, and 80 minutes of
discussion & student presentations each week
• Grading policies: The course will be graded on attendance
(10%), participation (20%), and presentation (70%).
• No Final Exam
5
Course Description
• Prerequisites: Graduate students in science or engineering.
Note: an override is necessary for non-CSE graduate students;
please send your PID & NetID to me.
• No prior knowledge of biology is required. Computationally
inclined biology graduate students are encouraged to take the
class as well.
6
Suggested books
• A.-L. Barabási, Linked: The new science of networks
• U. Alon, An Introduction to Systems Biology
• B. Palsson. Systems Biology: Properties of Reconstructed
Networks
• K. Kaneko, Life: An Introduction to Complex Systems Biology
7
Course Description
Graph model
Graph clustering subgraph mining
Protein-protein
interaction
network
Network
Biology
Gene regulatory
network
Metabolic network
Integrative study
Graph Mining
8
Date Title
1/11/11 Course Organization. Introduction.
1/13/11 Protein-Protein Interaction networks I
1/18/11 Student Presentation & Discussion 1
Topics
Introduction to Computational Network Biology
PPI network construction and false positive detection
1/20/11 Protein-Protein Interaction networks II
1/25/11 Student Presentation & Discussion 2
Topological analysis in PPI networks. Network motif.
Applications of PPI network (protein function prediction,
1/27/11 Protein-Protein Interaction networks III
network comparison)
2/1/11 Student Presentation & Discussion 3
2/3/11 Gene Correlation Networks I
Gene co-expression study
2/8/11 Student Presentation & Discussion 4
2/10/11 Gene Correlation Networks II
Gene co-regulation study
2/15/11 Student Presentation & Discussion 5
2/17/11 Gene Transcriptional Regulation Networks I cis-elements and gene co-regulation
2/22/11 Student Presentation & Discussion 6
2/24/11 Gene Transcriptional Regulation Networks II Bayesian network for GRN construction
3/1/11 Student Presentation & Discussion 7
3/3/10 Gene Transcriptional Regulation Networks III ChIP-seq and its applications in GRN construciton
3/7-11 Spring Break
3/15/11 Student Presentation & Discussion 8
3/17/11 Gene Transcriptional Regulation Networks IV GRN topological study
3/22/11 Student Presentation & Discussion 9
3/24/11 Metabolic Networks I
Flux balance analysis and metabolic control analysis
3/29/11 Student Presentation & Discussion 10
3/31/11 Metabolic Networks II
Integrative study: r-FBA model
4/5/11 Student Presentation & Discussion 11
4/7/11 Graph Mining I
Graph models
4/12/11 Student Presentation & Discussion 12
4/14/11 Graph Mining II
Graph clustering and partitioning
4/19/11 Student Presentation & Discussion 13
4/21/11 Graph Mining III
Frequent subgraph mining
4/26/11 Student Presentation & Discussion 14
9
Paper list
1.
Chua et al. Exploiting indirect neighbours and topological weight to predict protein function from protein–protein interactions.
Bioinformatics (2006) 22 (13): 1623-1630.
2.
Kashani et al. Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics 2009, 10:318
3.
Deng et al. Prediction of Protein Function Using Protein–Protein Interaction Data. Journal of Computational Biology. December 2003,
10(6): 947-960.
4.
Hu et al. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics. Vol. 21 Suppl. 1
pp. i213–i221. 2005
5.
Xu et al. Mining Shifting-and-Scaling Co-Regulation Patterns on Gene Expression Profiles. ICDE 2006
6.
Xu et al, Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines. PLoS Computational Biology. 5(4) 2009
7.
Huang et al. Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule
mining. Decision Support Systems. 43. 1207–1225. 2007
8.
Honkela et al. Model-based method for transcription factor target identification with limited data. PNAS vol 107(17) pp. 7793–7798.
2009
9.
Vermeirssen et al. Transcription factor modularity in a Gene-Centered C. elegans Protein-DNA interaction network. Genome Research
17, 061-1071. 2007
10.
Covert et al, Transcriptional Regulation in Constraints-Based Metabolic Models of Escherichia coli, Journal of Biological Chemistry,
277(31): pp. 28058-28064. 2002
11.
Herrgard et al. Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces
cerevisiae. Genome Research. 16:627–635. 2006
12.
Barabási et al. Network Biology: Understanding the Cell's Functional Organization. Nature Reviews Genetics 5, 101-113. 2004
13.
Dongen. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer
Science in the Netherlands, Amsterdam, May 2000
14.
Huan et al. Mining Family Specific Residue Packing Patterns from Protein Structure Graphs, RECOMB, pp. 308-315, 2004
10
Course Description
• Select at least one paper for presentation from the paper list.
Email me which paper you will present by next Mon
(1/17/2011)
• Each presentation is 45 min, including 15 min Q&A, followed
with a discussion
• Your grade will be largely determined by the presentation
(70%)
• Presentation starts from next Tue (1/18/2011)
11
Important Days:
Class Begins
Open adds end
Last day to drop with refund
Last day to drop with no grade reported
Class Ends
1/10/2011
1/14/2011
2/3/2011
3/2/2011
5/6/2011
12
Introduction to Computational
Network Biology
• Network biology belongs to systems biology, which
belongs to genomics
• Interested in the relations between entities rather
than the entities themselves
http://bionet.bioapps.biozentrum.uni-wuerzburg.de/
13
Network’s everywhere
•
Internet, social network, anti-terrorism network
•
Biological networks
–
–
–
–
–
–
Protein-protein interaction (PPI) network
protein-DNA interaction network
gene correlation network
gene regulatory network
metabolic network
signaling network…
•
Network is a tool for under standing complex systems
•
Network models explains network properties and support network behavior
study
•
Network measures provide quantitative analysis for complex systems
14
Definition of network (graph)
Self-loop
Multi-set of edges
Edge
G(V,E)
Node (vertex)
Simple graph: does not have loops (self-edges)
and does not have multi-edges.
15
Definition of network (graph)
Directed graph
vs.
Undirected graph
Labeled graph
vs.
Unlabeled graph
Symmetric graph
vs.
Asymmetric graph
16
Webpage layout
Pages on a web site
and the hyperlinks
between them
17
M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 2004
Adopted from R Albert’s
18 slides
Biological networks
19
Yeast Protein-Protein Interaction network
Hawoong Jeong
20
Gene regulation network
of sea urchin
Eric Davidson
21
Metabolic flux analysis of E. coli
Abhishek Murarka
22
Why study networks?
• Complex systems cannot be described in a reductionist view
• Behavior study of complex systems starts with understanding the network
topology
• Network - related questions:
– How do we reconstruct a network?
– How can we quantitatively describe large networks?
– How did networks get to be the way they are?
23
Simple measures
• Node Degree: the number of edges connected to the node
– In-degree & Out-degree
– Total in-degree == total out-degree
• Average Degree: the average of node degrees for all the nodes in the
network, denoted as:
where N is the number of nodes in the network, ki is the
node degree of node i
• Degree distribution: the degree distribution P(k) gives the fraction of
nodes that have k edges
24
Simple measures
• Shortest path: to find a path between two nodes such that the
sum of the weights of its constituent edges is minimized
• Graph diameter: the longest shortest path between any pair
of nodes in the graph.
• Connected graph: any two vertices can be joined by a path
• Bridge: if we erase the edge, the graph becomes disconnected
25
Simple measures
•
Betweenness centrality: for all node pairs (i, j), find all the shortest paths between
nodes i and j, denoted as C(i,j), and determine how many of these pass through
node k, denoted as Ck(i,j). Betweenness centrality of node k is
•
Calculating the betweenness involves calculating the shortest paths between all
pairs of vertices on a graph. O(V2logV + VE) for sparse graph with Johnson’s
algorithm.
L. C. Freeman, Sociometry 40, 35 (1977); P. E. Black, Dictionary of Algorithms and Data Structures (2004)
26
Complex measures
• Frequent subgraph mining
• Graph comparison & classification
• Graph isomorphic testing
28
Useful software
• Visualization & Topological Analysis
– Cytoscape (www.cytoscape.org)
– Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek)
• Graph related programming
– LEDA (www.algorithmic-solutions.com)
– Nauty
(www.cs.sunysb.edu/~algorith/implement/nauty/impleme
nt.shtml)
29
1960
1999
2002
Real networks are much more
complex
• Transcription regulatory networks of Yeast and E. coli show an
interesting example of mixed characteristics
– how many genes a TF interacts with
- scale-free
– how many TFs interact with a given gene
- exponential
Modularity and network motif
• Cellular function are likely to be carried out in a highly
modular manner
• Modular -- a group of genes/proteins that work together to
achieve distinct functions
• Biology is full of examples of modularity
Remaining challenges
• Discovery of network motifs is closely related to the
generation of random networks
• Structure of network motifs does not necessary determine
function
• Relation between higher-level organizational, functional
states and networks has not yet been studied
Voigt, W. et al. Genetics 2005
Ingram P.J.et al. BMC Genomics 2006
Eric Werner. Nature 2007
Next class
• PPI network construction
• False-positive detection
38