From Functional Genomics to Physiological Model: the
Prioritization of Avian GO Annotation
Structural Annotation
Species      Genome Build [2]   No. Entrez Genes   No. Proteins (NRPD)   % predicted proteins   proteins/gene
Human        36.3               36,437             415,830               4.91                   11.41
Mouse        37.1               64,018             228,696               9.28                   3.57
Rat [1]      3.4                49,516             108,069               29.99                  2.18
Chicken      2.1                19,979 [3]         31,819 [3]            46.62 [4]              1.59 [5]
NRPD: Non-redundant Protein Database
1. The rat genome was published only 8 months before the chicken genome, yet rat has about 2x as many genes in Entrez Gene and 3x as many proteins.
2. After two genome builds, 5% of the chicken genomic sequence has still not been assigned to a chromosome, and the mini-chromosomes have not been sequenced.
3. Chicken genes and proteins are under-represented in public databases.
4. Of the chicken proteins available from NRPD, almost half are predicted from computational analysis.
5. On average, chicken has fewer than 2 proteins per gene (1.59), so very little is known about isoforms and alternate transcripts of chicken gene products.
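The proteins/gene figures and the rat-versus-chicken comparison in note 1 follow directly from the gene and protein counts in the Structural Annotation table. A minimal sketch of that arithmetic is given here; the dictionary layout and the function name `proteins_per_gene` are illustrative assumptions, and only the counts come from the slide.

```python
# Counts copied from the Structural Annotation table (Entrez Gene and NRPD).
# The data layout and helper name are illustrative, not from the slides.
counts = {
    "Human":   {"genes": 36_437, "proteins": 415_830},
    "Mouse":   {"genes": 64_018, "proteins": 228_696},
    "Rat":     {"genes": 49_516, "proteins": 108_069},
    "Chicken": {"genes": 19_979, "proteins": 31_819},
}


def proteins_per_gene(genes: int, proteins: int) -> float:
    """Average number of database proteins per Entrez gene, to 2 decimals."""
    return round(proteins / genes, 2)


ratios = {
    species: proteins_per_gene(c["genes"], c["proteins"])
    for species, c in counts.items()
}

# Rat relative to chicken, as discussed in note 1.
rat_vs_chicken_genes = counts["Rat"]["genes"] / counts["Chicken"]["genes"]
rat_vs_chicken_proteins = counts["Rat"]["proteins"] / counts["Chicken"]["proteins"]

for species, ratio in ratios.items():
    print(f"{species:8s} {ratio:.2f} proteins/gene")
print(f"Rat/Chicken genes:    {rat_vs_chicken_genes:.2f}x")
print(f"Rat/Chicken proteins: {rat_vs_chicken_proteins:.2f}x")
```

The computed ratios match the table's last column, and the rat-to-chicken ratios fall between 2x and 3x for genes and between 3x and 4x for proteins.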
Phase 1: “Breadth”
7,478 chicken entries in UniProtKB
GOA provides IEA mapping for UniProtKB entries
The initial strategy for AgBase biocurators was to add GO to chicken gene products that had none.
Since 46% of the chicken proteins in NRPD were predicted, these proteins would have no GO.
IEA, ISS, ISO….
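The "breadth" strategy amounts to selecting gene products that carry no GO annotation and queuing them for curation. The sketch below illustrates that selection only; the `GeneProduct` record, the `needs_annotation` helper, and the `EXAMPLE_*` identifiers are hypothetical placeholders and do not reflect AgBase's actual pipeline or data.

```python
from dataclasses import dataclass, field


@dataclass
class GeneProduct:
    accession: str   # placeholder identifier, not a real database accession
    predicted: bool  # True if the protein is a computational prediction
    # Each annotation is a (GO id, evidence code) pair, e.g. ("GO:0005515", "IEA").
    go_terms: list = field(default_factory=list)


def needs_annotation(products):
    """Return the gene products that have no GO annotation at all."""
    return [p for p in products if not p.go_terms]


# Hypothetical records for illustration only.
products = [
    GeneProduct("EXAMPLE_1", predicted=False, go_terms=[("GO:0005515", "IEA")]),
    GeneProduct("EXAMPLE_2", predicted=True),
    GeneProduct("EXAMPLE_3", predicted=False),
]

queue = needs_annotation(products)
print([p.accession for p in queue])
```

In this toy input, the two records without any GO terms end up in the curation queue, while the record that already has an IEA annotation is skipped.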
Functional Annotation
[Figure: bar chart of % of gene products annotated (0 to 100) for Human, Mouse, Rat, and Chicken. Legend: no GO, computational GO, manual GO, AgBase.]
The proportion of chicken gene products with GO annotation is overstated, because chicken gene products are under-represented in public databases.
Phase 2: “Depth”
What are the community needs?
GO Annotation of Arrays
DelMar14K, FHCRC, Tgu array
44K Agilent oligo array
AIIM array, Affymetrix
Should we be focusing on arrays?
What arrays should we do?
GO Annotation Priorities?
Provide “breadth” of coverage
Annotate products represented on arrays
Reference Genome targets
Subject areas (immunity, nutrition/metabolism, development)
Ad hoc as requested