ABM2008_LitCur

LITERATURE CURATION
Overview & Integrated Phenotype Curation
SAB 2008
Literature Curation - Data Flow for First Pass
Paper collection
-> First Pass: data flagged with comments for 32 different data types
-> Web Interface
-> Postgres DB (some data types stored for future curation)
-> Active Curation: Data Extraction
-> Database Input Files
-> Local Databases (St. Louis DB, Caltech DB, Sanger DB)
-> Complete DB
First-Pass Curation Fields (based on 5787 Papers)
[Bar chart: counts per first-pass data type, y-axis 0 to 1600. Bar values in descending order: 1546, 1291, 958, 917, 877, 811, 794, 789, 697, 493, 472, 320, 293, 281, 279, 194, 172, 166, 147, 140, 93, 81, 57, 41, 34, 20, 18, 18, 12. Data types on the x-axis include RNAi, mutant phenotype, overexpression, antibody, gene regulation, microarray, mosaic analysis, SNPs, mapping data, chemicals and human diseases.]
Data type                               Objects in WS170  Objects in WS190/91  % Change  % Complete*
Gene Identity and Function:
  Mutant Phenotype (total alleles)      2736              4675                 71%       20%
  RNAi (Large and small scale)          64461             74427                15%       53%
  Overexpression                        5                 9                    80%       < 1%
  Nomenclature Data                     Data from 78 papers since WS170                  100%
Interactions:
  Genetic interactions                  4920              6795                 38%
  Gene Product Interaction (Y2H)        11573             11573
Cell Data:
  Cell Function (ablation and mosaics)  0                 183                  NA        15%
Gene Expression and Function:
  Expression Data                       6355              9744                 53%       100%
  Gene Regulation on Expression Level   642               2044                 218%      100%
  Microarray                            40                53**                 33%       71%
Sequence Data:
  Feature Data***                       Sanger Request Tracker - 67 since WS170          Start up phase
  Sequence Change                       Data from 97 papers since WS170                  100%
Reagents:
  Transgene                             4062              5151                 27%       100%
  C. elegans Antibodies                 1084              1324                 22%       100%
Concise Description:
  Total Descriptions                    4398              5335                 21%
  Genes w/> 5 references                                                       7%        85%
  Genes w/> 1 reference                                                        5%        53%
Gene Ontology:
  Total GO annotations                  75,065            141,937
  Total non-IEA GO annotation           25,634            33,045
* Based on first pass papers completed unless otherwise noted
** Includes one tiling array
*** Data from Sanger RT, not only first pass
† Outside of first pass
Two further values, 47% and 22%, follow the Gene Ontology rows on the slide; their column is not recoverable from the layout.
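The % Change figures on this slide are consistent with a simple relative increase in object count between WS170 and WS190/91. The sketch below recomputes a few of them from the counts given on the slide. The function name `percent_change` and the choice to return `None` for a zero baseline (the slide shows NA) are assumptions made for illustration, not part of the WormBase pipeline.

```python
def percent_change(old, new):
    """Rounded percent change from old to new; None when there is no baseline."""
    if old == 0:
        return None  # change from zero is undefined (the slide shows NA)
    return round((new - old) / old * 100)


# (WS170, WS190/91) object counts copied from the slide.
counts = {
    "Mutant Phenotype (total alleles)": (2736, 4675),
    "RNAi (Large and small scale)": (64461, 74427),
    "Overexpression": (5, 9),
    "Expression Data": (6355, 9744),
    "Gene Regulation on Expression Level": (642, 2044),
    "Transgene": (4062, 5151),
}

changes = {name: percent_change(old, new) for name, (old, new) in counts.items()}

for name, change in changes.items():
    print(f"{name}: {change}%")
```

The recomputed values match the percentages reported for these rows (71%, 15%, 80%, 53%, 218%, 27%).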
Many Data Types Include a Phenotype Assignment
Gene Identity and Function:
Mutant Phenotype (total alleles)
RNAi (Large and small scale)
Overexpression
Nomenclature Data
Interactions:
Genetic interactions
Gene Product Interaction (Y2H)
Cell Data:
Cell Function (ablation and mosaics)
Gene Expression and Function:
Expression Data
Gene Regulation on Expression Level
Microarray
Sequence Data:
Feature Data***
Sequence Change
Reagents:
Transgene
C. elegans Antibodies
Concise Description:
Total Descriptions
Genes w/> 5 references
Genes w/> 1 reference
Gene Ontology:
Total GO annotations
Total non-IEA GO annotation
Phenotype Annotations
- Consistency
- Efficiency
Phenotype Ontology
Provides a controlled vocabulary for phenotypic descriptions, organized hierarchically
Can annotate phenotypes to a very granular level, preserving associations with more general terms
The WormBase Phenotype Ontology is Hierarchical:
Annotate to a very granular level, preserving associations with more general terms (OBO-EDIT)
reproductive_system_development_abnormal
  vulva_development_abnormal
    vulva_cell_fate_specification_abnormal
      vulval_cell_induction_abnormal
        vulval_cell_induction_increased
          multivulva
Term Name: multivulva
Definition (w/ references): Multiple vulva-like protrusions are present along the ventral side of the animal. This is usually a result of all six vulval precursor cells adopting vulval (1° or 2°) fates.
Synonyms: Muv
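To illustrate how a granular annotation keeps its association with more general terms, here is a minimal sketch that walks the branch shown on this slide from a leaf term up to the root. The single-parent dictionary is a simplifying assumption (ontologies can have several parents per term), and the helper names `ancestors` and `implied_terms` are hypothetical.

```python
# Child -> parent links copied from the branch shown on the slide.
# One parent per term is a simplification made for this sketch.
PARENT = {
    "multivulva": "vulval_cell_induction_increased",
    "vulval_cell_induction_increased": "vulval_cell_induction_abnormal",
    "vulval_cell_induction_abnormal": "vulva_cell_fate_specification_abnormal",
    "vulva_cell_fate_specification_abnormal": "vulva_development_abnormal",
    "vulva_development_abnormal": "reproductive_system_development_abnormal",
}


def ancestors(term):
    """Return the more general terms above `term`, nearest first."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def implied_terms(term):
    """A term plus every ancestor: what a single annotation implies."""
    return [term] + ancestors(term)


# An object annotated to the leaf term is also retrievable via any ancestor.
annotated_to = "multivulva"
implied = implied_terms(annotated_to)
for depth, name in enumerate(implied):
    print("  " * depth + name)
```

An annotation made once at `multivulva` can therefore be returned by a query on `vulva_development_abnormal` without being curated a second time.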
Using the Ontological Structure for Data Retrieval
Query by Name, WB ID or Synonym (example query: Vulva development)
Also for:
Gene Ontology
Anatomy Ontology
Output: Showing children of parent term and annotations
See individual annotations with references
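The retrieval described above (look a term up by name, ID or synonym, then show the terms below it) can be sketched as follows. This is only an illustration under stated assumptions: the `Term` class, the exact-match lookup and the `EX:` identifiers are hypothetical placeholders, not real WormBase IDs or the actual WormBase query code.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str                                   # placeholder ID, not a real WB ID
    name: str
    synonyms: list = field(default_factory=list)
    children: list = field(default_factory=list)   # names of direct child terms


TERMS = [
    Term("EX:1", "vulva_development_abnormal",
         children=["vulva_cell_fate_specification_abnormal"]),
    Term("EX:2", "vulva_cell_fate_specification_abnormal",
         children=["vulval_cell_induction_abnormal"]),
    Term("EX:3", "vulval_cell_induction_abnormal",
         children=["vulval_cell_induction_increased"]),
    Term("EX:4", "vulval_cell_induction_increased", children=["multivulva"]),
    Term("EX:5", "multivulva", synonyms=["Muv"]),
]

# One lookup table keyed by ID, name and every synonym (case-insensitive).
INDEX = {}
for t in TERMS:
    for key in (t.term_id, t.name, *t.synonyms):
        INDEX[key.lower()] = t


def find(query):
    """Exact, case-insensitive match on ID, name or synonym; None if absent."""
    return INDEX.get(query.strip().lower())


def descendants(term):
    """Names of all terms below `term`, depth first."""
    found = []
    for child_name in term.children:
        child = INDEX[child_name.lower()]
        found.append(child.name)
        found.extend(descendants(child))
    return found


hit = find("Muv")
below = descendants(find("vulva_development_abnormal"))
print(hit.name)
print(below)
```

Indexing synonyms alongside names is what lets a curator's shorthand such as Muv resolve to the same term as the full name.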
Phenotype Ontology Overview:
Release                          Phenotype Terms  Defined terms  Percent Defined  Terms used (%)
WS160 - Jul, 2006 (prior to PO)  119              0              0                ---
WS170 - Feb, 2007                1394             237            17%              40%
Current                          1677             708            42%              60%
Development:
Refined by usage
We will continue development in parallel w/ curation
- reflects the developing complexity with which terms are described in literature
(Currently there are 4,675 alleles curated w/ 10,468 phenotype associations, a ~125% increase over WS170)
Maintained with OBO-EDIT and registered with OBO foundry (NCBO)
OBO-Edit is developed by the Berkeley Bioinformatics and Ontologies Project, and is funded
by the Gene Ontology Consortium.
Community Input
Seek input from experts in certain fields to develop ontology
Expert input leads to granularity that reflects term usage
The embryonic_lethal branch - Fabio Piano and Kris Gunsalus
355 annotations
Integrated Phenotype Curation
Initial paradigm - one curator = one data type
First Pass -> Gene Regulation (change in expression)
First Pass -> Interactions (genetic)
First Pass -> RNAi
Paper:
Chromatin regulation and sumoylation in the inhibition of Ras-induced vulval development
in Caenorhabditis elegans.
Poulin et al - EMBO J. 2005 Jul 20;24(14):2613-23.
smo-1:
RNAi  Phenotype
“RNAi of smo-1 on its own induces a low percentage of Muv animals”.
RNAi based Interaction (Enhancement)
“let-60(n2021) increase in the percentage of Muv animals compared to smo-1(RNAi) alone”
RNAi based Interaction (Synthetic)
“smo-1 displays synMuv activity in both class A and class B backgrounds”
RNAi based Gene Regulation (Ectopic)
“RNAi of the sumoylation pathway gene smo-1 leads to ectopic lag-2 expression”
Need for curation integration: RNAi curation as an example
First Pass -> Gene Regulation, RNAi, Interactions
RNAi based Interactions
RNAi curation form has functionality to generate interaction objects
Enter number of interacting genes
“let-60(n2021) increase in the percentage of Muv animals compared to smo-1(RNAi) alone”
smo-1(RNAi)
let-60(n2021)
Enhancement
Muv
- Keep track in Postgres database - avoids redundant curation
- Currently there are 2493 RNAi-based interactions in WormBase
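As a rough picture of what an RNAi-based interaction object might hold, the sketch below models the example from this slide and keeps a registry so the same interaction is not entered twice. The class and field names are hypothetical, and the in-memory set merely stands in for the Postgres tracking mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RNAiInteraction:
    """One RNAi-based interaction; field names are illustrative only."""
    rnai_gene: str          # gene knocked down by RNAi
    interactor: str         # interacting gene or allele
    interaction_type: str   # e.g. Enhancement, Synthetic
    phenotype: str          # phenotype term or synonym
    paper: str              # source paper


class InteractionRegistry:
    """Remembers interactions already curated to avoid redundant entries."""

    def __init__(self):
        self._seen = set()

    def add(self, interaction):
        """Return True if newly recorded, False if it was already present."""
        if interaction in self._seen:
            return False
        self._seen.add(interaction)
        return True

    def __len__(self):
        return len(self._seen)


registry = InteractionRegistry()
example = RNAiInteraction(
    rnai_gene="smo-1",
    interactor="let-60(n2021)",
    interaction_type="Enhancement",
    phenotype="Muv",
    paper="Poulin et al., EMBO J. 2005",
)
first_attempt = registry.add(example)    # recorded
second_attempt = registry.add(example)   # rejected as a duplicate
print(first_attempt, second_attempt, len(registry))
```

Making the dataclass frozen gives it value-based hashing, so two curators entering the same interaction produce equal objects and the second entry is rejected.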
Coordination of RNAi based Gene Regulation
· If an RNAi object is created first:
I enter information here so that Xiaodong can create a gene
regulation object for the RNAi object
· If a gene regulation object is created first:
Xiaodong creates a gene regulation object and I input the object name here
· Currently there are 365 RNAi-based gene regulations (a 46% increase from WS170)
· Need to set up a tracking system in Postgres
Towards Integrating RNAi and Allele Curation
First pass
Alleles
RNAi
Postgres
RNAi Checkout
Allele Checkout
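One way a shared checkout could prevent two curators from working on the same paper for the same data type is a table with a uniqueness constraint. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere. The table name, column names, and the paper and curator values are all hypothetical, not the actual WormBase schema.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres database named on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE checkout (
        paper     TEXT NOT NULL,
        data_type TEXT NOT NULL,   -- e.g. 'RNAi' or 'Allele'
        curator   TEXT NOT NULL,
        PRIMARY KEY (paper, data_type)
    )
    """
)


def check_out(paper, data_type, curator):
    """Claim a paper for one data type; False if someone already holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO checkout (paper, data_type, curator) VALUES (?, ?, ?)",
                (paper, data_type, curator),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Placeholder paper and curator names.
rnai_first = check_out("paper_1", "RNAi", "curator_a")
rnai_again = check_out("paper_1", "RNAi", "curator_b")
allele_same_paper = check_out("paper_1", "Allele", "curator_b")
print(rnai_first, rnai_again, allele_same_paper)
```

Keying the table on paper plus data type lets RNAi and allele curation of the same paper proceed independently while still blocking duplicate work within each type.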