Gene Ontology-based hypothesis testing


Examples of functional modeling.
NCSU GO Workshop
29 October 2009
Tools and materials from this workshop will be available online at the AgBase database Educational Resources link.
For continuing support and assistance please contact: [email protected]

This workshop is supported by USDA CSREES grant number MISV-329140.
Reference | Accession | Peptides (Hits)
ALBU_CHICK Serum albumin precursor (Alpha-livetin) (Allergen Gal d 5) | 113575 | 255 (244 6 1 3 1)
APA1_CHICK Apolipoprotein A-I precursor (Apo-AI) | 113990 | 94 (84 4 4 2 0)
FIBA_CHICK Fibrinogen alpha/alpha-E chain precursor [Contains: Fibrinopeptide | 1706798 | 40 (37 1 1 0 1)
Mol_id: 1; Molecule: Ovotransferrin; Chain: Null; Synonym: Conalbumin; Hetero | 1127086 | 34 (30 3 0 1 0)
PB2 protein [Influenza A virus (A/chicken/Taiwan/7-5/99(H6N1))] | 9954387 | 28 (0 22 3 2 1)
C Chain C, Crystal Structure Of Native Chicken Fibrinogen | 8569623 | 15 (15 0 0 0 0)
I50711 complement C3 precursor - chicken | 2118406 | 11 (9 2 0 0 0)
TTHY_CHICK Transthyretin precursor (Prealbumin) (TBPA) | 136463 | 10 (10 0 0 0 0)
TIM2_CHICK Metalloproteinase inhibitor 2 precursor (TIMP-2) (Tissue inhibitor | 3122960 | 13 (0 11 1 0 1)
AAA6469 | 3645997 | 10 (9 0 1 0 0)
MYH9_CHICK Myosin heavy chain, nonmuscle (Cellular myosin heavy chain) (NMMHC) | 127759 | 14 (5 0 5 3 1)
S19188 myosin-V - chicken | 104779 | 13 (4 3 3 2 1)
FIBB_CHICK Fibrinogen beta chain precursor [Contains: Fibrinopeptide B] | 399491 | 9 (8 1 0 0 0)
A Chain A, Crystal Structure Of Wild Type Turkey Delta 1 Crystallin (Eye Lens | 14278427 | 12 (0 2 10 0 0)
type I polyketide synthase AVES 2 [Streptomyces avermitilis MA-4680] | 29827480 | 8 (6 1 0 1 0)
Hyperion protein, 419 kD isoform [Gallus gallus] | 4582571 | 11 (0 5 3 3 0)
vitronectin [Gallus gallus] | 1922282 | 7 (7 0 0 0 0)
CA36_CHICK Collagen alpha 3(VI) chain precursor | 1345652 | 10 (3 2 3 1 1)
paired-type homeobox Atx [Gallus gallus] | 18252581 | 11 (0 4 5 1 1)
I51298 transforming protein sno-N - chicken | 2147397 | 7 (6 1 0 0 0)
TP2A_CHICK DNA topoisomerase II, alpha isozyme | 13959708 | 7 (5 2 0 0 0)
ITA6_CHICK Integrin alpha-6 precursor (VLA-6) | 124948 | 12 (0 5 1 4 2)
glucose regulated thiol oxidoreductase protein precursor [Gallus gallus] | 22651801 | 11 (1 3 4 0 3)
spectrin alpha chain [Gallus gallus] | 1334744 | 9 (3 2 1 2 1)
ATP-binding cassette transporter 1 [Gallus gallus] | 18028983 | 9 (2 3 2 1 1)
cone-type transducin alpha subunit [Gallus gallus] | 11066401 | 8 (1 6 0 1 0)
condensin complex subunit [Gallus gallus] | 26801168 | 12 (0 2 4 4 2)
BA2B_CHICK Bromodomain adjacent to zinc finger domain 2B (Extracellular matri | 22653663 | 8 (3 1 3 1 0)
ryanodine receptor type 3 [Gallus gallus] | 1212912 | 9 (0 5 2 1 1)
type I polyketide synthase AVES 4 [Streptomyces avermitilis MA-4680] | 29827484 | 7 (2 4 1 0 0)
structural muscle protein titin [Gallus gallus] | 7024535 | 9 (0 5 2 0 2)
breast cancer susceptibility protein [Gallus gallus] | 19568157 | 7 (4 1 1 0 1)
FAS_CHICK Fatty acid synthase [Includes: EC 2.3.1.38; EC 2.3.1.39; EC 2.3.1.41 | 1345958 | 8 (1 4 1 1 1)
"Today’s challenge is to realise
greater knowledge and
understanding from the data-rich
opportunities provided by modern
high-throughput genomic
technology."
Professor Andrew Cossins,
Consortium for Post-Genome Science, Chairman.
Bio-ontologies

Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers.
• necessary for high-throughput “omics” datasets
• allows data sharing across databases

Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.

The ontology shows how the objects relate to each other.
What is the Gene Ontology?
“a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
The de facto standard for functional annotation:
• assigns functions to gene products at different levels, depending on how much is known about a gene product
• is used for a diverse range of species
• is structured to be queried at different levels, e.g.:
  • find all the chicken gene products in the genome that are involved in signal transduction
  • zoom in on all the receptor tyrosine kinases
• each human-readable GO function has a digital tag to allow computational analysis of large datasets

COMPUTATIONALLY AMENABLE ENCYCLOPEDIA OF GENE FUNCTIONS AND THEIR RELATIONSHIPS
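Querying at different levels can be illustrated with a toy example. The sketch below is not the GO itself: the term names, the is_a links and the gene products (geneA to geneD) are all invented for illustration. It only shows the idea that a gene product annotated to a specific term is also found when querying a broader term.

```python
# Toy ontology: child term -> parent terms (is_a links).
# Term names are illustrative labels, not real GO identifiers.
IS_A = {
    "receptor tyrosine kinase signaling": ["signal transduction"],
    "G protein-coupled receptor signaling": ["signal transduction"],
    "signal transduction": ["cellular process"],
    "lipid metabolism": ["metabolic process"],
}

# Hypothetical annotations: gene product -> most specific known terms.
ANNOTATIONS = {
    "geneA": ["receptor tyrosine kinase signaling"],
    "geneB": ["G protein-coupled receptor signaling"],
    "geneC": ["signal transduction"],  # less is known, so the term is broader
    "geneD": ["lipid metabolism"],
}


def ancestors(term):
    """Return the term itself plus every term reachable through is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def gene_products_for(term):
    """Gene products annotated to the term or to any more specific term."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(term in ancestors(t) for t in terms)
    )


# Broad query, then "zoom in" on a more specific term.
print(gene_products_for("signal transduction"))
print(gene_products_for("receptor tyrosine kinase signaling"))
```

The broad query returns geneA, geneB and geneC; the specific query returns only geneA.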
Who uses GO? http://www.ebi.ac.uk/GOA/users.html
Use GO for:
1. Determining which classes of gene products are over-represented or under-represented.
2. Grouping gene products.
3. Relating a protein’s location to its function.
4. Focusing on particular biological pathways and functions (hypothesis-testing).
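Over- and under-representation (item 1) is commonly assessed with a hypergeometric test. The slides do not say which test the workshop tools use, so the following is only a minimal sketch under that assumption; the function names and the example counts are invented for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated gene products
    in a list of n, drawn from N gene products of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def hypergeom_lower_tail(k, n, K, N):
    """P(X <= k): the matching test for under-representation."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1)) / total


# Hypothetical counts: 1000 gene products in the background, 50 annotated
# to the term, 100 in the differentially-expressed list, 15 of them annotated.
p_over = hypergeom_upper_tail(k=15, n=100, K=50, N=1000)
print(f"over-representation p-value: {p_over:.3g}")
```

With these counts about 5 annotated gene products would be expected by chance, so observing 15 gives a small p-value. In practice the p-values for many GO terms would also need a multiple-testing correction.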
Translation to clinical research: Pig
Bindu Nanduri
Global mRNA and protein expression was measured from quadruplicate samples of control, X- and Y-treated tissue.
Differentially-expressed mRNAs and proteins were identified from Affymetrix microarray data and DDF shotgun proteomics using Monte-Carlo resampling*.
* Nanduri, B., P. Shah, M. Ramkumar, E. A. Allen, E. Swiatlo, S. C. Burgess*, and M. L. Lawrence*. 2008. Quantitative analysis of Streptococcus pneumoniae TIGR4 response to in vitro iron restriction by 2-D LC ESI MS/MS. Proteomics 8, 2104-14.
Using network and pathway analysis as well as Gene Ontology-based hypothesis testing, differences in specific physiological processes between X- and Y-treated tissues were quantified and reported as net effects.
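The slides name Monte-Carlo resampling but do not give the procedure. One generic form of it is a permutation test on the difference of group means, sketched below as an assumption rather than the method of the cited paper. The function name, the number of resamples and the expression values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=2000, seed=0):
    """Two-sided Monte-Carlo permutation test on the difference of means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign sample labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical quadruplicate measurements for one gene product.
control = [5.0, 5.2, 4.9, 5.1]
treated = [10.1, 10.3, 9.9, 10.2]
print(permutation_p_value(control, treated))
```

With only four samples per group there are just 70 distinct label assignments, so the smallest two-sided p-value this design can reach is about 2/70.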
Proportional distribution of mRNA functions differentially-expressed by X- and Y-treated tissues
[Figure: one pie chart per treatment (Treatment X, Treatment Y). Categories: immunity (primarily innate), inflammation, wound healing, lipid metabolism, response to thermal injury, angiogenesis. Totals of differentially-expressed mRNAs shown: 4302 and 1960.]
Net functional distribution of differentially-expressed mRNAs: X- vs. Y-Treatment
[Figure: bar chart of relative bias toward X or Y, with axis ticks running from 35 through 0 to 5. Categories: sensory response to pain, angiogenesis, response to thermal injury, lipid metabolism, wound healing, classical inflammation (heat, redness, swelling, pain, loss of function), immunity (primarily innate).]
Proportional distribution of protein functions differentially-expressed by X- and Y-treated tissues
[Figure: one pie chart per treatment (Treatment X, Treatment Y). Categories: immunity (primarily innate), inflammation, wound healing, lipid metabolism, response to thermal injury, angiogenesis, hemorrhage. Totals of differentially-expressed proteins shown: 509 and 433.]
Net functional distribution of differentially-expressed proteins: X- vs. Y-Treatment
[Figure: bar chart of relative bias toward Treatment X or Treatment Y, with axis ticks running from 8 through 0 to 6. Categories: hemorrhage, sensory response to pain, angiogenesis, response to thermal injury, lipid metabolism, wound healing, classical inflammation (heat, redness, swelling, pain, loss of function), immunity (primarily innate).]
[Figure: B-cells and stroma, labeled with the functions apoptosis, immune response and cell-cell signaling. (Looking at function, not gene.)]
Relating a protein’s location to its function.
Focusing on particular biological pathways and functions (hypothesis-testing).
Shyamesh Kumar BVSc
The critical time point in MD lymphomagenesis
Non-MHC associated resistance and susceptibility
[Figure: mean total lesion score (axis 0 to 18) against days post infection (axis 0 to 100) for two genotypes, Susceptible (L72) and Resistant (L61). Burgess et al, Vet Pathol 38:2, 2001. Accompanying images are labeled a-CD30 mab and a-CD8 mab.]
Marek’s Disease Lymphoma Model: Chicken
Tissue
CD30hi, neoplastically-transformed
CD30lo/-, hyperplastic
The neoplastically-transformed (CD30hi) Marek’s disease lymphoma cell phenotype most closely resembles T-regulatory cells. LA Shack, T. Buza, SC Burgess. Cancer Immunology and Immunotherapy, 2008
[Figure: two bar charts of mRNA expression comparing L6 (R) and L7 (S), with asterisks marking individual bars. Panels: Whole tissue mRNA expression; Microscopic lesion mRNA expression. y-axis: 40 – mean Ct value (0 to 25). mRNAs on the x-axis: IL-4, IL-12, IL-18, TGFβ, GPR-83, SMAD-7, CTLA-4.]
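CYTOKINES AND T HELPER CELL DIFFERENTIATION
[Figure: diagram of an APC and a naive CD4+ T cell differentiating into Th-1, Th-2 or T reg, with the cytokines IL 4, IL 12, IL 18, IL10, IFN γ and TGFβ and the cell types CTL, macrophage and NK cell. Overlays labeled L6 Whole, L7 Whole and L7 Micro carry the labels T reg, Smad 7, IL 12, IL 4, Th-2, Th-1 and "Inflammatory?". The slide poses the question: Th-1, Th-2, T-reg ?]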
Step I. GO-based Phenotype Scoring.
Gene product | Th1 | Th2 | Treg | Inflammation
IL-2 | 1 | ND | 1 | -1
IL-4 | -1 | 1 | 1 | ND
IL-6 | [missing] | 1 | -1 | 1
IL-8 | ND | ND | 1 | 1
IL-10 | -1 | 1 | 1 | 0
IL-12 | 1 | -1 | ND | ND
IL-13 | -1 | 1 | ND | ND
IL-18 | 1 | 1 | 1 | 1
IFN-g | 1 | -1 | 1 | 1
TGF-b | -1 | 0 | 1 | -1
CTLA-4 | -1 | -1 | 1 | -1
GPR-83 | -1 | -1 | 1 | -1
SMAD-7 | 1 | 1 | -1 | 1
ND = No data

Step II. Multiply by quantitative data for each gene product.

Step III. Inclusion of quantitative data to the phenotype scoring table and calculation of net effect.
Gene product | Th1 | Th2 | Treg | Inflammation
IL-2 | 1.58 | 0.00 | 1.58 | -1.58
IL-4 | 0.00 | 0.00 | 0.00 | 0.00
IL-6 | 0.00 | -1.20 | 1.20 | -1.20
IL-8 | 0.00 | 0.00 | 1.18 | 1.18
IL-10 | 0.00 | 0.00 | 0.00 | 0.00
IL-12 | 0.00 | 0.00 | 0.00 | 0.00
IL-13 | 1.51 | -1.51 | 0.00 | 0.00
IL-18 | 0.91 | 0.91 | 0.91 | 0.91
IFN-g | 0.00 | 0.00 | 0.00 | 0.00
TGF-b | -1.71 | 0.00 | 1.71 | -1.71
CTLA-4 | -1.89 | -1.89 | 1.89 | -1.89
GPR-83 | -1.69 | -1.69 | 1.69 | -1.69
SMAD-7 | 0.00 | 0.00 | 0.00 | 0.00
Net Effect | -1.29 | -5.38 | 10.15 | -5.98
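The three steps can be sketched in a few lines of code. This is an illustration, not the workshop's own tool. The scores are the Step I values from the slide. The per-gene quantitative values are not listed on the slide; they are inferred here by comparing the Step I and Step III entries, and the missing Th1 score for IL-6 is treated as no data. Both are assumptions.

```python
ND = None  # "no data" in the scoring table; contributes nothing to the sum

PHENOTYPES = ["Th1", "Th2", "Treg", "Inflammation"]

# Step I: GO-based phenotype scores (+1 promotes, -1 opposes, 0 no effect).
SCORES = {
    "IL-2":   [1, ND, 1, -1],
    "IL-4":   [-1, 1, 1, ND],
    "IL-6":   [ND, 1, -1, 1],  # Th1 entry missing in the source: assumed ND
    "IL-8":   [ND, ND, 1, 1],
    "IL-10":  [-1, 1, 1, 0],
    "IL-12":  [1, -1, ND, ND],
    "IL-13":  [-1, 1, ND, ND],
    "IL-18":  [1, 1, 1, 1],
    "IFN-g":  [1, -1, 1, 1],
    "TGF-b":  [-1, 0, 1, -1],
    "CTLA-4": [-1, -1, 1, -1],
    "GPR-83": [-1, -1, 1, -1],
    "SMAD-7": [1, 1, -1, 1],
}

# Quantitative value per gene product, inferred from the Step III table
# (an assumption; 0.0 where every Step III entry is 0.00).
QUANT = {
    "IL-2": 1.58, "IL-4": 0.0, "IL-6": -1.20, "IL-8": 1.18, "IL-10": 0.0,
    "IL-12": 0.0, "IL-13": -1.51, "IL-18": 0.91, "IFN-g": 0.0,
    "TGF-b": 1.71, "CTLA-4": 1.89, "GPR-83": 1.69, "SMAD-7": 0.0,
}


def net_effect(scores, quant):
    """Step II: multiply each score by the gene product's quantitative value.
    Step III: sum each phenotype column to get its net effect."""
    totals = [0.0] * len(PHENOTYPES)
    for gene, row in scores.items():
        for i, score in enumerate(row):
            if score is not ND:
                totals[i] += score * quant[gene]
    return dict(zip(PHENOTYPES, totals))


result = net_effect(SCORES, QUANT)
for phenotype in PHENOTYPES:
    print(f"{phenotype}: {result[phenotype]:.2f}")
```

Because the inferred inputs are already rounded to two decimals, a total can differ from the slide in the last digit; the Treg total comes out at 10.16 here against 10.15 on the slide.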
[Figure: two bar charts of Net Effect by phenotype (Th-1, Th-2, T-reg, Inflammation) comparing L6 (R) and L7 (S). Panels: Whole Tissue (axis -40 to 120); Microscopic lesions (axis -20 to 60, with an image label reading 5mm).]
Key points
• Modeling is subordinate to the biological questions/hypotheses.
• Together the Gene Ontology and canonical genetic networks/pathways provide the central and complementary foundation for modeling functional genomics data.
• The strategy you use to model your data will depend upon
  • what information is readily available for your species of interest
  • what biological system you are studying