Core 2 first round

Core 2: Bioinformatics
CBio-Berkeley
Outline
• Berkeley group background
• Core 2 first round
– what: aims, milestones
– how: software lifecycle, interaction w/ other
cores
• Current progress
• Discussion
Berkeley group: genomics
• Formerly BDGP (Berkeley Drosophila
Genome Project) Informatics
– Genome sequencing, analysis and
annotation
– Genomic application development
– Database development
• FlyBase
• Generic Model Organism Database
Apollo
GBrowse
In-situ expression database
Genomics applications
• GadFly
– analysis and annotation database
– pipeline software
• BOP
– computational analysis integration
• CGL
– Comparative Genomics Software Library
SO and SOFA
• Sequence Ontology for Feature
Annotation
• Ontology for genomics
– Sequence feature classes:
• mRNA, intron, UTR, sequence_variant, …
– Sequence feature relations
• exon part_of transcript
• polypeptide derives_from mRNA
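The typed features and relations on this slide can be sketched as plain (subject, relation, object) triples. This is only a minimal illustration, not SO tooling: the feature identifiers (`exon1`, `tx1`, `pep1`) and the helper `related` are hypothetical names introduced here.

```python
# Hypothetical feature instances, each typed by a Sequence Ontology class name.
feature_type = {
    "exon1": "exon",
    "tx1": "mRNA",
    "pep1": "polypeptide",
}

# Relations between features as (subject, relation, object) triples.
triples = [
    ("exon1", "part_of", "tx1"),
    ("pep1", "derives_from", "tx1"),
]

def related(relation, obj):
    """Return the features that stand in `relation` to `obj`."""
    return [s for (s, r, o) in triples if r == relation and o == obj]

print(related("part_of", "tx1"))       # features that are part of the transcript
print(related("derives_from", "tx1"))  # features derived from the transcript
```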
Chado
• Model organism relational database schema
– FlyBase, GMOD
• Modules
– sequence annotations
– expression
– map
– genotype
– phenotype
– ontology/cv
– …
• Generic schema
– Uses ontologies for strong typing
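"Strong typing" through ontologies can be illustrated with a heavily simplified sketch in which every feature row must point at a controlled-vocabulary term. The table and column names `feature`, `cvterm` and `type_id` follow Chado's naming, but the two tables below are cut down to a few columns for illustration, and the row values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
INSERT INTO cvterm (cvterm_id, name) VALUES (1, 'gene'), (2, 'mRNA');
INSERT INTO feature (uniquename, type_id)
    VALUES ('example_gene', 1), ('example_transcript', 2);
""")

# Every feature carries its ontology type via type_id.
rows = conn.execute("""
    SELECT feature.uniquename, cvterm.name
    FROM feature JOIN cvterm ON feature.type_id = cvterm.cvterm_id
    ORDER BY feature.feature_id
""").fetchall()

# A feature whose type is not a known term is rejected by the schema.
try:
    conn.execute("INSERT INTO feature (uniquename, type_id) VALUES ('untyped', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The point of the design is that the schema stays generic (one `feature` table) while the ontology term decides what kind of thing each row is.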
Berkeley group: GO
• Gene Ontology - Informatics
– Database, web portal
– Ontology editing tools
– Ontology QC and integration
– OBO
OBO-Edit (formerly DAG-Edit)
AmiGO and GO Database
Obol
• Problem: large ontologies of composite terms
are difficult to manage
• Solution: partial automation (reasoners)
• Requires logical definitions
– how do we obtain them?
• Solution: Obol
– Parses logical definitions from class names
– Logical definitions can be reasoned over
• detects errors and enables automation
– Integrates OBO ontologies
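The idea of parsing a logical definition out of a class name can be sketched with a toy pattern matcher. This is not Obol's grammar: the three-word process vocabulary, the relation name `acts_on`, and the function `parse_class_name` are all assumptions made for illustration.

```python
import re

# Toy grammar: a class name of the form "<entity> <process>", where the process
# word comes from a small fixed vocabulary (an illustrative assumption).
PROCESS_WORDS = ("biosynthesis", "catabolism", "transport")
PATTERN = re.compile(
    r"^(?P<entity>.+) (?P<process>%s)$" % "|".join(PROCESS_WORDS)
)

def parse_class_name(name):
    """Return a genus-differentia style definition, or None if no rule matches."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return {
        "genus": match.group("process"),
        # "acts_on" is a placeholder relation name, not taken from any ontology.
        "differentia": [("acts_on", match.group("entity"))],
    }

print(parse_class_name("cysteine biosynthesis"))
print(parse_class_name("mitochondrion"))  # no rule applies
```

Once names are decomposed this way, the resulting definitions can be handed to a reasoner; names that no rule matches are left for manual curation.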
OBO Relations Ontology
• Common relations used across ontologies
must mean the same thing
– is_a
– part_of
– derives_from
– has_participant
– …
• OBO relations ontology provides precise
definitions
– defines class-level relations in terms of their
instances
• http://obo.sourceforge.net/relationship
– collaboration with core5, Manchester & others
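Defining a class-level relation "in terms of instances" is commonly read as an all-some pattern: C part_of D holds when every instance of C is part of some instance of D. The sketch below checks that pattern over a small hand-made set of instances; it ignores time, and the instance names and the function `class_level_holds` are hypothetical.

```python
def class_level_holds(instances_of_c, instances_of_d, instance_relation):
    """True if every instance of C is related to at least one instance of D."""
    return all(
        any((c, d) in instance_relation for d in instances_of_d)
        for c in instances_of_c
    )

# Hypothetical instances: two nuclei, three cells (one cell has no nucleus).
nuclei = {"n1", "n2"}
cells = {"c1", "c2", "c3"}
part_of = {("n1", "c1"), ("n2", "c2")}
has_part = {(whole, part) for (part, whole) in part_of}

print(class_level_holds(nuclei, cells, part_of))   # nucleus part_of cell
print(class_level_holds(cells, nuclei, has_part))  # cell has_part nucleus
```

The second check fails because one cell lacks a nucleus, which shows why class-level `part_of` and `has_part` are not simply inverses of each other even though the instance-level relations are.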
Outline
• Berkeley group background
• Core 2 first round
– what: aims, milestones
– how: software lifecycle, interaction w/ other
cores
• Current progress
• Open questions
Core 2 specific aims
• Aims
1. Capture and describe data
2. Reconcile annotation and ontology changes
3. Store, view and compare annotations
4. Link disease genes
• First round
– phenotypes: Fly and Zebrafish
– HIV clinical trial data
Aim 1: Capture and describe data
• Phenotype data capture
– OBO-Edit plug-ins
– Combine classes from multiple ontologies
• PATO, anatomical ontologies
– NLP tools?
• Clinical trial data capture
– what are the appropriate tools?
Aim 1: Capture and describe data
• Zebrafish, fly
– PaTO: Phenotype and trait ontology
• phenotype ‘primitives’
– ‘Entity-Attribute-Value’ model
– Phenotype ontologies
– Genetic data
– Orthologs
• Clinical trial data
– generic instance model
– what are the appropriate ontologies here?
PATO
• An ontology of attributes and attribute
values
– e.g. morphology, structure, placement
• Current status of PATO?
– needs work to conform to sound ontology
principles
• definitions
• formalisation of attributes
– working with core3-cambridge (Gkoutos)
and core5 (Neuhaus)
Phenotype annotation
• Entity-attribute structured annotations
– Entity term; PATO term
• brain FBbt:00005095; fused PATO:0000642
• gut MA:0000917; dysplastic PATO:0000640
• tail fin ZDB:020702-16; ventralized PATO:0000636
• kidney ZDB:020702-16; hypertrophied PATO:0000636
• midface ZDB:020702-16; hypoplastic PATO:0000636
• Pre-composed phenotype terms
– Mammalian Phenotype Ontology
• “increased activated B-cell number” MPO:0000319
• “pink fur hue” MPO:0000374
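The "Entity term; PATO term" lines above have a regular shape, so a structured annotation can be read from them directly. The following is a minimal sketch; the class and function names (`Term`, `PhenotypeAnnotation`, `parse_annotation`) are made up for illustration, and the parser assumes the identifier is always the last whitespace-separated token.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    label: str
    term_id: str

@dataclass(frozen=True)
class PhenotypeAnnotation:
    entity: Term     # anatomical (or other) entity term
    pato_term: Term  # attribute/value term from PATO

def parse_term(text):
    """Split 'tail fin ZDB:020702-16' into label and identifier."""
    label, term_id = text.strip().rsplit(" ", 1)
    return Term(label, term_id)

def parse_annotation(line):
    """Parse one 'entity term; PATO term' line."""
    entity_text, pato_text = line.split(";")
    return PhenotypeAnnotation(parse_term(entity_text), parse_term(pato_text))

ann = parse_annotation("brain FBbt:00005095; fused PATO:0000642")
print(ann.entity.label, ann.entity.term_id, ann.pato_term.label, ann.pato_term.term_id)
```

A pre-composed term such as those in the Mammalian Phenotype Ontology would instead be a single `Term`, which is the contrast the slide draws.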
Example (Fly)
A481G
Gene: Jra
Allele: Jra[bZIP.Scer\UAS]
Allele Description:
bZIP
defects in head and dorsal cuticle.
Scer\GAL4[hs.PB] induces…..
Entity          Attribute  Value     Background/Environment
embryo          viability  lethal    Scer\GAL4[hs.PB]
dorsal cuticle  shape      abnormal
…               …          …         …
wing vein L2    shape      branched  temperature sensitive
Genotype-Phenotype datamodel
• Need to model complex genotypes
• Environment
• Phenotype
– E-A-V is not enough
• Relational attributes
• Complex phenotypes
• Measurements and assays
– CSHL 2005 Phenotype meeting
Aim 2: Reconcile annotation and ontology changes
• Ontology evolution can trigger
annotation changes
• Identifiers
– all classes and annotations will have stable
identifiers
– Cores 1 and 2 to decide on identifier model
• LSID URNs
• OntoTrack
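One way to picture how an ontology change triggers annotation changes is a pass over existing annotations that sorts them by what happened to their term. This is a sketch under assumptions, not the project's reconciliation procedure: the identifiers (`EX:...`), the `replaced_by` mapping, and the function `reconcile` are all hypothetical.

```python
def reconcile(annotations, obsolete, replaced_by):
    """Split annotations into (unchanged, remapped, needs_review).

    annotations: list of (subject, term_id) pairs
    obsolete:    set of term ids retired in the new ontology version
    replaced_by: dict mapping some obsolete ids to their successor ids
    """
    unchanged, remapped, needs_review = [], [], []
    for subject, term_id in annotations:
        if term_id not in obsolete:
            unchanged.append((subject, term_id))
        elif term_id in replaced_by:
            remapped.append((subject, replaced_by[term_id]))
        else:
            needs_review.append((subject, term_id))
    return unchanged, remapped, needs_review

annotations = [("allele_a", "EX:0000001"), ("allele_b", "EX:0000002"), ("allele_c", "EX:0000003")]
obsolete = {"EX:0000002", "EX:0000003"}
replaced_by = {"EX:0000002": "EX:0000010"}

unchanged, remapped, needs_review = reconcile(annotations, obsolete, replaced_by)
```

Stable identifiers are what make this possible: the pass relies on a term id meaning the same thing across ontology versions.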
Aim 3: Store, view and compare annotations
• OBO: ontologies
• OBD: data annotated using ontologies
– genotype-phenotype
– clinical trials
– others
OBD: A Database for OBO
• Data warehouse
– collected from MODs and other sources
• Annotation versioning
• Generic data model
– Any data typed by OBO classes can be stored
• Specific annotation data views
– Clinical trial data view
– Phenotype data view
• Chado-compliant
• Entity-attribute-(value) model
Key technologies
• ‘Semantic Web’ database technology
– ontology-aware
• ontologies are part of meta-model
• higher level query languages
– SPARQL, SeRQL, …
• tool interoperability
– Protégé-OWL, Jena, ..
– SQL compatibility
• optionally layered on relational model
– Standards? Maturity?
• Many implementations
– Sesame, Kowari,
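An "ontology-aware" query is one where asking for a class also returns data annotated to its subclasses. The slide names SPARQL and SeRQL; the sketch below instead uses a recursive SQL query to illustrate the "layered on relational model" option. The table names (`is_a`, `annotation`), the rows, and the helper `annotated_to` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE is_a (child TEXT, parent TEXT);
CREATE TABLE annotation (subject TEXT, term TEXT);
INSERT INTO is_a VALUES ('motor neuron', 'neuron'), ('neuron', 'cell');
INSERT INTO annotation VALUES
    ('allele_a', 'motor neuron'), ('allele_b', 'neuron'), ('allele_c', 'brain');
""")

# Collect the query term plus all of its is_a descendants, then match annotations.
QUERY = """
WITH RECURSIVE descendant(term) AS (
    SELECT ?
    UNION
    SELECT is_a.child FROM is_a JOIN descendant ON is_a.parent = descendant.term
)
SELECT annotation.subject
FROM annotation
WHERE annotation.term IN (SELECT descendant.term FROM descendant)
ORDER BY annotation.subject
"""

def annotated_to(term):
    """Subjects annotated to `term` or to any of its subclasses."""
    return [row[0] for row in conn.execute(QUERY, (term,))]

print(annotated_to("cell"))
print(annotated_to("motor neuron"))
```

A plain `WHERE term = 'cell'` would return nothing here, which is the gap that ontology-aware storage is meant to close.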
Aim 3: Store, view and compare annotations
• Browsing
– AmiGO-2
• Advanced visualization
– work with core 1 (University of Victoria)
Comparing annotations
• process vs state
– regulatory processes:
• acidification of midgut has_quality reduced rate
• midgut has_quality low acidity
• development vs behavior
– wing development has_quality abnormal
– flight has_quality intermittent
• granularity (scale)
– chemical vs molecular vs cell vs tissue vs
anatomical part
Integrating anatomical ontologies
• Annotations should be comparable between species
– phenotype annotations are composed of anatomical terms
• Multiple species-centric anatomical ontologies
– Problem: how do we compare across species?
– XSPAN (Bard et al): creating mappings
– Core 1: ontology mappings
Aim 4: Linking disease genes
• Homology data
– Orthologous genes
• Genomic data
– SNPs, sequence variants
• Ontologies
– Disease ontologies
– Semantic similarity
– Ontology integration
• Obol, XSPAN
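The slide does not say which semantic similarity measure is intended. As one simple possibility, two terms can be compared by the overlap of their ancestor sets (Jaccard similarity). The small hierarchy and the function names below are assumptions made for illustration.

```python
def ancestors(term, parents):
    """Return `term` plus every class reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def jaccard_similarity(a, b, parents):
    """Shared ancestors divided by all ancestors of either term."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

# Hypothetical is_a hierarchy: child -> list of parents.
parents = {
    "motor neuron": ["neuron"],
    "sensory neuron": ["neuron"],
    "neuron": ["cell"],
    "hepatocyte": ["cell"],
}

print(jaccard_similarity("motor neuron", "sensory neuron", parents))
print(jaccard_similarity("motor neuron", "hepatocyte", parents))
```

Sibling terms share more ancestors than distant ones, so they score higher; information-content based measures are a common alternative when annotation frequencies are available.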
Linking disease to phenotype
• Relationship of phenotype to diseases and
disorders
– essentialist
– statistical
• Disease ontologies
– OBO disease ontology (Northwestern)
– EVOC disease ontology (EVOC)
– Others
• Disease ontology workshop (core 5)
– November 2006
Outline
• Berkeley group background
• Core 2 first round
– what: aims, milestones
– how: software lifecycle, interaction w/
other cores
• Current progress
• Open questions
Software lifecycle
• Software is developed in phases
• Different phases require interaction with
different cores
• Iterative “Agile” methodology
– fast cycles
– involve ‘customer’ (core3) at all phases
Outline
• Berkeley group background
• Core 2 first round
– what: aims, milestones
– how: software lifecycle, interaction w/ other
cores
• Current progress
Current progress
• Meetings
– CSHL November 2005
• Phenotype ontology meeting
• Phenotype tools workshop
– Berkeley, UVic, Core 3
• OBO-Edit complex class plug-in
• Phenotype browser prototype
• Genotype-Phenotype datamodel
OBO-Edit complex class plugin
• Combinatorial composition of classes
• Current use-cases:
– plant anatomical structures
– integrating GO and OBO-Cell
• Ideal for phenotype classes
– extend to make ‘phenotype’ plug-in
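Combinatorial composition can be pictured as crossing a list of entity classes with a list of quality classes and emitting one composed class per pair. The sketch below reuses identifiers that appear earlier in this deck; the function `compose`, the dictionary layout, and the relation name `inheres_in` are assumptions made here, not the plug-in's actual output format.

```python
from itertools import product

# (label, id) pairs taken from the phenotype annotation examples in this deck.
entities = [("brain", "FBbt:00005095"), ("gut", "MA:0000917")]
qualities = [("fused", "PATO:0000642"), ("dysplastic", "PATO:0000640")]

def compose(entity, quality):
    """Build a composed class: the quality is the genus, the entity the differentia."""
    (e_label, e_id), (q_label, q_id) = entity, quality
    return {
        "name": f"{q_label} {e_label}",
        "genus": q_id,
        # "inheres_in" is an assumed relation name for linking quality to entity.
        "differentia": [("inheres_in", e_id)],
    }

composed = [compose(e, q) for e, q in product(entities, qualities)]
for cls in composed:
    print(cls["name"], cls["genus"], cls["differentia"])
```

The full cross-product includes combinations nobody would use, so in practice a curator picks pairs as needed rather than generating all of them up front.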
OBD Progress
• Genotype-Phenotype data model
defined
• Prototype implemented
• Evaluating technologies
Phenotype browser
• Experimental branch of AmiGO code
• Allows browsing and querying of
combinatorial phenotype annotations
• Experimental dataset
• Demo
– http://yuri.lbl.gov/amigo/obd