NetworkAnalysis_2012

Networks
A series of entities or NODES (genes, proteins, metabolites,
individuals, ecosystems, etc, etc) and the interactions or EDGES
between them.
Directed graph: connections have directionality
(e.g. kinase – substrate connections)
Undirected graph: connections have no directionality
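A minimal sketch of how the two graph types might be stored, assuming a plain adjacency-dictionary representation. The node names (KinaseA, SubstrateB, ProteinC) are hypothetical placeholders and are not from the lecture.

```python
from collections import defaultdict

def build_graph(edges, directed):
    """Return an adjacency dict {node: set(neighbors)} from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # touch the key so the target exists even with no outgoing edges
        if not directed:
            adjacency[target].add(source)  # undirected: store the edge in both directions
    return dict(adjacency)

# Hypothetical example edges (placeholder names, not real data).
edges = [("KinaseA", "SubstrateB"), ("SubstrateB", "ProteinC")]

directed_graph = build_graph(edges, directed=True)
undirected_graph = build_graph(edges, directed=False)
```

In the directed version SubstrateB only points forward to ProteinC; in the undirected version it is also linked back to KinaseA.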
Network Analysis
Goal: to turn a list of genes/proteins/metabolites into a network to
capture insights about the biological system
Today:
1. Types of high-throughput data amenable to network analysis
2. Network theory and its relationship to biology
Physical Interactions: protein-protein interactions
Data from:
1. Large-scale yeast two-hybrid assay:
recovers binary (1:1) interactions
(figure: Giorgini & Muchowski, Gen. Biol. 2005)
2. Protein immunoprecipitation &
mass-spec identification: recovers complexes
(figure labels: peptide tag; mass spectrometry to identify recovered proteins)
3. Literature curation
Nature 2005
Y2H + literature curation
Protein Arrays
Proteins or antibodies immobilized onto a solid surface
Antibody arrays: for identification & quantification of fluorescently labeled proteins
in complex mixtures … proteins bind to immobilized Ab.
Functional arrays: for measuring protein function
* PPI: detect binding of fluorescent protein to immobilized peptides/proteins
* kinase targets: detect phosphorylation of immobilized peptides/proteins by query kinase
* ligand binding: detect DNA/carbohydrate/small molecule bound to immobilized proteins
Reverse-phase arrays (lysate arrays): cells lysed in situ and immobilized cell lysate is screened
Challenges:
1. Large-scale protein purification
2. Protein structure/stability requirements
vary widely (unlike DNA)
3. Conditions for protein function vary widely
4. Protein epitope/binding domain must be
displayed properly
From Hall, Ptacek, & Snyder review 2007
High-throughput identification of gene/protein function:
Functional Genomics
Gene knock-out libraries: library of single-gene deletions for every gene
done in yeast, E. coli, other fungi/bacteria
S. cerevisiae libraries:
heterozygous deletion of all genes
OR homozygous deletion (nonessential genes)
* Each gene replaced with a short, unique ‘barcode’ sequence
Strains can be phenotyped individually (screening)
OR
selected for particular phenotypes –
strains surviving the selection can be read out on
DNA arrays designed against the barcode sequences
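A rough sketch of how a barcode readout might be turned into a list of strains lost during selection, assuming the readout is a signal intensity per barcode before and after selection. The strain names, intensities, pseudocount, and the cutoff of -2 are all made up for illustration.

```python
import math

def barcode_log2_ratios(before, after, pseudocount=1.0):
    """Log2(after / before) barcode signal per strain; negative values mean depletion."""
    return {
        strain: math.log2(
            (after.get(strain, 0.0) + pseudocount) / (before[strain] + pseudocount)
        )
        for strain in before
    }

# Hypothetical barcode intensities (arbitrary units), not real data.
before_selection = {"geneA_del": 1023.0, "geneB_del": 1023.0, "geneC_del": 511.0}
after_selection = {"geneA_del": 1023.0, "geneB_del": 63.0, "geneC_del": 511.0}

ratios = barcode_log2_ratios(before_selection, after_selection)

# Strains whose barcode dropped at least 4-fold (illustrative cutoff).
depleted = sorted(strain for strain, ratio in ratios.items() if ratio <= -2.0)
```

Strains in `depleted` are candidates for carrying a deletion in a gene required under the selection condition.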
Yeast deletion library used to:
a) Identify ‘essential’ yeast genes and genes required for normal growth
b) Genes required for survival of particular conditions/drugs
c) Features of functional genomics, gene networks, etc
* Screened deletion libraries for >700 conditions
* Found ‘phenotype’ for nearly all yeast genes
* Characterized which genes could be functionally profiled by which assays
(e.g. phenotype, gene expression, etc)
Challenges:
1. Difficult to probe ‘essential’ and slow-growing strains
2. Cells likely to pick up secondary mutations to complement missing gene
(chromosomal aneuploidy in yeast)
Science 2010
Pairwise deletions to measure genetic interactions for 75% of yeast genes
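One common way to score a genetic interaction from pairwise deletions is the multiplicative model: the observed double-mutant fitness is compared with the product of the two single-mutant fitnesses. The sketch below assumes that definition; the fitness values and the 0.1 cutoff are invented for illustration.

```python
def interaction_score(fitness_a, fitness_b, fitness_ab):
    """Epsilon = observed double-mutant fitness minus the multiplicative expectation."""
    return fitness_ab - fitness_a * fitness_b

def classify(epsilon, threshold=0.1):
    """Label an interaction; the threshold is an arbitrary illustrative cutoff."""
    if epsilon <= -threshold:
        return "negative (e.g. synthetic sick/lethal)"
    if epsilon >= threshold:
        return "positive (e.g. suppression)"
    return "no interaction"

# Hypothetical fitness values relative to wild type (= 1.0).
# Expected double-mutant fitness is 0.8 * 0.5 = 0.4; observed is much lower.
eps = interaction_score(fitness_a=0.8, fitness_b=0.5, fitness_ab=0.1)
label = classify(eps)
```

Each scored pair can then be treated as an edge between two genes, with the sign of epsilon as the edge type.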
High-throughput identification of gene/protein function:
Functional Genomics
RNAi knock-down libraries (C. elegans, flies, humans)
Small double-stranded RNAs complementary to mRNA
can be injected (or fed) …
… these are targeted by the RNAi pathway to
inhibit mRNA stability/translation of target gene
… knocks down protein abundance/function
Challenges:
1. Doesn’t work for all genes/dsRNAs
2. Doesn’t work in all tissues
3. Delay in protein decrease, timing
different for different proteins
Image from David Shapiro
Insights from whole-genome knockout / knockdown studies
* Screens for genes important for specific phenotypes/processes
* Identifying off-target drug effects
* Clustering of genes based on common phenotypes from knockdowns
* Clustering/analysis of phenotypes with similar underlying genetics/processes
* Integrative analysis with genomic expression, etc
* Network analysis
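As a sketch of the phenotype-based clustering idea, genes can be compared by correlating their phenotype profiles across conditions. The example assumes Pearson correlation as the similarity measure and a cutoff of 0.9; the gene names and scores are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length phenotype profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical phenotype scores of three knockdowns across four conditions.
profiles = {
    "gene1": [-2.0, -1.0, 0.0, 1.0],
    "gene2": [-4.0, -2.0, 0.0, 2.0],   # same pattern as gene1, scaled
    "gene3": [1.0, 0.0, -1.0, -2.0],   # opposite pattern
}

genes = sorted(profiles)
similar_pairs = [
    (g, h)
    for i, g in enumerate(genes)
    for h in genes[i + 1:]
    if pearson(profiles[g], profiles[h]) > 0.9   # illustrative cutoff
]
```

Pairs that pass the cutoff can be drawn as edges, which turns a table of knockdown phenotypes into a network.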
Network structures
Random network:
Each node has roughly equal number
of connections k, distributed according
to Poisson distribution
Scale-free network:
Some nodes with few connections,
other nodes (‘hubs’) with
many connections
(distributed according to Power Law)
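The contrast between the two structures can be illustrated with a small simulation. This sketch assumes two standard generative models that the slide does not name: an Erdos-Renyi graph for the random network and degree-proportional (preferential) attachment for the scale-free network. Network size, m, and the random seed are arbitrary choices.

```python
import random
from collections import Counter

def random_network(n, p, rng):
    """Erdos-Renyi graph: each possible edge is present with probability p."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

def preferential_attachment_network(n, m, rng):
    """Each new node links to m existing nodes chosen in proportion to their degree."""
    degree = [0] * n
    stubs = []  # every node appears once per edge end, so sampling is degree-proportional
    for i in range(m + 1):                 # small fully connected starting core
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            stubs += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            stubs += [new, t]
    return degree

rng = random.Random(0)
n = 1000
scale_free = preferential_attachment_network(n, m=2, rng=rng)
mean_k = sum(scale_free) / n
# Match the average degree <k> so that only the shape of the distribution differs.
poisson_like = random_network(n, p=mean_k / (n - 1), rng=rng)

print("max degree, random network:    ", max(poisson_like))
print("max degree, scale-free network:", max(scale_free))
print("lowest degrees (scale-free):", sorted(Counter(scale_free).items())[:5])
```

With the same average degree, the random network should have no node far above the mean, while the preferential-attachment network should contain a few hubs with many more connections.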
Directed vs. Undirected Graphs
Network Terminology
Connectivity (Degree) k: number of
connections of a given node
(average degree of all nodes <k>)
Degree distribution: probability that
a selected node has k connections
Shortest path l: fewest number of
links connecting two given nodes
(average shortest path <l> between all node pairs)
Clustering coefficient: # of links n connecting
the k neighbors of Node X together, as a fraction
of the k(k-1)/2 links possible: C = 2n / k(k-1)
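The terms above can be computed directly on a small example. This is a minimal sketch on a hypothetical five-node undirected network; the clustering coefficient is normalized by the number of possible links among the neighbors, which is the usual convention.

```python
from collections import deque

# Small hypothetical undirected network as an adjacency dict.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree(graph, node):
    """Connectivity k: number of connections of a node."""
    return len(graph[node])

def degree_distribution(graph):
    """P(k): fraction of nodes that have exactly k connections."""
    counts = {}
    for node in graph:
        k = degree(graph, node)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / len(graph) for k, c in counts.items()}

def shortest_path_length(graph, start, goal):
    """Breadth-first search: fewest links between start and goal, or None if unreachable."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return distance[node]
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return None

def clustering_coefficient(graph, node):
    """Links among the k neighbors divided by the k(k-1)/2 links possible."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i, u in enumerate(neighbors)
        for v in neighbors[i + 1:]
        if v in graph[u]
    )
    return links / (k * (k - 1) / 2)

average_degree = sum(degree(graph, node) for node in graph) / len(graph)
```

For node A, only one of the three possible links among its neighbors (B to C) exists, so its clustering coefficient is 1/3.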
Scale-free Networks
Connectivity: most nodes have few connections
but joined by ‘hub’ nodes with many connections
‘Small world’ effect: each node can be
connected to any other node through
relatively few connections
‘Disassortative’: hubs tend NOT to
directly connect to one another
‘Robust’: network structure remains despite
random node removal (up to 80% removal!)
‘Hub vulnerability’: network structure is
particularly reliant on few nodes (hubs)
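Robustness and hub vulnerability can be compared in a toy simulation. The sketch assumes a preferential-attachment model for the scale-free network and uses the size of the largest connected component as the measure of remaining structure. The network size, the 20% removal fraction, and the seed are arbitrary choices.

```python
import random

def preferential_attachment_graph(n, m, rng):
    """Adjacency dict for a network grown by degree-proportional attachment."""
    graph = {i: set() for i in range(n)}
    stubs = []  # one entry per edge end, so sampling is degree-proportional
    for i in range(m + 1):                 # fully connected starting core
        for j in range(i + 1, m + 1):
            graph[i].add(j)
            graph[j].add(i)
            stubs += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            graph[new].add(t)
            graph[t].add(new)
            stubs += [new, t]
    return graph

def largest_component_size(graph, removed):
    """Size of the largest connected component after deleting the 'removed' nodes."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best

rng = random.Random(1)
graph = preferential_attachment_graph(1000, m=2, rng=rng)
n_remove = 200  # 20% of nodes

random_nodes = rng.sample(sorted(graph), n_remove)
hub_nodes = sorted(graph, key=lambda node: len(graph[node]), reverse=True)[:n_remove]

after_random = largest_component_size(graph, random_nodes)
after_hubs = largest_component_size(graph, hub_nodes)
print("largest component after random removal:", after_random)
print("largest component after hub removal:   ", after_hubs)
```

Removing the same number of nodes should leave most of the network connected when nodes are picked at random, and a much smaller connected piece when the highest-degree nodes are removed first.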
Networks Challenges
1. Identifying relevant subnetworks
2. Integrating multiple data types (see #1 above)
3. Capturing temporary interactions and dynamic relationships
4. Using network structure/subnetworks to infer new insights about biology
Networks Challenges
1. Infer hypothetical functions based on network connectivity
2. Reveal new connections between functional groups and complexes
3. Identify motifs and understand motif behaviors (more next time)
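One simple way to infer a hypothetical function from connectivity is "guilt by association": an unannotated gene is assigned the most common annotation among its network neighbors. The sketch below assumes that majority-vote rule; the network, gene names, and annotations are placeholders.

```python
from collections import Counter

def predict_function(graph, annotations, gene):
    """Return the most common annotation among a gene's annotated neighbors, or None."""
    votes = Counter(
        annotations[neighbor]
        for neighbor in graph.get(gene, ())
        if neighbor in annotations
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical interaction network and annotations (placeholder names).
graph = {
    "geneX": {"gene1", "gene2", "gene3"},
    "gene1": {"geneX"},
    "gene2": {"geneX"},
    "gene3": {"geneX"},
}
annotations = {
    "gene1": "ribosome biogenesis",
    "gene2": "ribosome biogenesis",
    "gene3": "DNA repair",
}

prediction = predict_function(graph, annotations, "geneX")
```

A majority vote ignores edge confidence and annotation frequency, so a prediction like this is a hypothesis to test rather than an assignment.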
http://www.cytoscape.org/
A gazillion plugins for Cytoscape …
Inferred NaCl-activated Signaling Network
430 proteins, 1,199 edges
(starting network: 5,855 proteins, 25,906 edges)
Legend: Kinase; Transcription Factor; Target Gene/Module
Debbie Chasman & Mark Craven
Orthologs of human disease genes are enriched in the network
Of 430 proteins, 188 have one-to-one human orthologs
95% of ‘reviewed’ orthologs are disease associated
Legend: Disease-associated ortholog; Human ortholog not linked to disease
