richardson_W3CSW.short


Data Integration, Gene Ontology, and the Mouse*
Joel Richardson, Ph.D.
Mouse Genome Informatics Group
The Jackson Laboratory
Bar Harbor, Maine 04609
* Not necessarily in that order.
We have the human sequence: OK, now what?
One species is not enough:
The sequence is just the beginning
model organisms (one strain is not enough)
comparative studies
sequence variants
gene regulation and interaction networks
non-coding functional elements
environmental effects
Genotype to phenotype
The Mouse
the premier animal model for studying human disease
> 95% same genes
same diseases, similar reasons (e.g., cancer, hypertension, diabetes, osteoporosis, …)
1000s of lab strains with different characteristics
precise genetic control
The Jackson Laboratory
Private nonprofit research institution (est. 1929)
Studying mouse as a model of human biology and disease
National Cancer Research Center
Supplier of laboratory strains to researchers worldwide
Areas: metabolism, development, cancer, immune response
www.jax.org
Bar Harbor, ME 04609
Mouse Genome Informatics (MGI)
Consortium of NIH-funded projects
Housed at TJL
Integrates and disseminates public data resources covering selected aspects of mouse biology
First program project funding 1989
> $10M/y total, >60 people
Online since 1994.
www.informatics.jax.org
MGI Concept Map
[Diagram. Node labels: Phenotypes; Genotypes; Strains; Alleles; Expression Data; Anatomy; Genes and other loci; Mapping Data; Variants; DNA and Protein Sequences; References; Molecular Fragments; Accession IDs]
Integration in MGI
Identifying objects.
Resolving or noting discrepancies.
Integration is key to knowledge discovery in the age of genomics
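The two integration steps named above (identifying objects, then resolving or noting discrepancies) can be sketched roughly as follows. This is only an illustration under stated assumptions: the source names, accession IDs, object IDs, and gene symbols are all invented, and the `integrate` function is hypothetical rather than anything taken from the MGI system.

```python
from collections import defaultdict

# Hypothetical records from two external sources. Every ID and symbol here
# is invented for illustration; none is a real accession.
records = [
    {"source": "SourceA", "accession": "A:0001", "object_id": "OBJ:1", "symbol": "Abc1"},
    {"source": "SourceB", "accession": "B:9001", "object_id": "OBJ:1", "symbol": "Abc1"},
    {"source": "SourceB", "accession": "B:9002", "object_id": "OBJ:2", "symbol": "Xyz2"},
    {"source": "SourceA", "accession": "A:0002", "object_id": "OBJ:2", "symbol": "Xyz2b"},
]


def integrate(records):
    """Group records by the object they identify and note symbol conflicts."""
    by_object = defaultdict(list)
    for rec in records:
        by_object[rec["object_id"]].append(rec)

    merged = {}
    discrepancies = []
    for object_id, recs in by_object.items():
        symbols = sorted({r["symbol"] for r in recs})
        merged[object_id] = {
            "accessions": sorted(r["accession"] for r in recs),
            "symbols": symbols,
        }
        # More than one symbol for one object is noted, not silently resolved.
        if len(symbols) > 1:
            discrepancies.append((object_id, symbols))
    return merged, discrepancies


merged, discrepancies = integrate(records)
```

In this sketch a conflict is recorded for a curator to review instead of being overwritten, which is one reading of "resolving or noting".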
The Power Of Integration: Queries
What transcription factors are expressed in a 2-cell embryo and not in a blastocyst?
 integration of multiple expression assay data sets and data types
 standardization of anatomical references and developmental stages
What development QTLs contain these TFs?
 integration of expression data and mapping data
 genetic map is the result of integrating lots of mapping data
What strains are distinguished by SNPs in this region?
And so on…
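As a rough sketch of how the first two questions combine once expression and mapping data sit in one relational store, the example below uses an in-memory SQLite database. The schema (tables `gene`, `expression`, `qtl` and their columns), the gene and QTL names, and all positions are assumptions made up for illustration; they are not the MGI schema or real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema and toy data (not the MGI schema).
conn.executescript("""
CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, is_tf INTEGER,
                   chromosome TEXT, pos_cm REAL);
CREATE TABLE expression (gene_id INTEGER, stage TEXT, detected INTEGER);
CREATE TABLE qtl (name TEXT, trait TEXT, chromosome TEXT,
                  start_cm REAL, end_cm REAL);

INSERT INTO gene VALUES (1, 'TfA', 1, '2', 10.0);
INSERT INTO gene VALUES (2, 'TfB', 1, '5', 40.0);
INSERT INTO gene VALUES (3, 'GeneC', 0, '2', 12.0);

INSERT INTO expression VALUES (1, '2-cell', 1);
INSERT INTO expression VALUES (1, 'blastocyst', 0);
INSERT INTO expression VALUES (2, '2-cell', 1);
INSERT INTO expression VALUES (2, 'blastocyst', 1);
INSERT INTO expression VALUES (3, '2-cell', 1);

INSERT INTO qtl VALUES ('DevQ1', 'development', '2', 5.0, 15.0);
INSERT INTO qtl VALUES ('DevQ2', 'development', '5', 30.0, 50.0);
""")

# Transcription factors detected at the 2-cell stage, not detected in the
# blastocyst, and located inside a development QTL interval.
QUERY = """
SELECT g.symbol, q.name
FROM gene AS g
JOIN qtl AS q
  ON q.chromosome = g.chromosome
 AND g.pos_cm BETWEEN q.start_cm AND q.end_cm
WHERE g.is_tf = 1
  AND q.trait = 'development'
  AND EXISTS (SELECT 1 FROM expression AS e
              WHERE e.gene_id = g.id AND e.stage = '2-cell' AND e.detected = 1)
  AND NOT EXISTS (SELECT 1 FROM expression AS e
                  WHERE e.gene_id = g.id AND e.stage = 'blastocyst' AND e.detected = 1)
ORDER BY g.symbol
"""

rows = conn.execute(QUERY).fetchall()
```

The query only works because stage names are standardized across expression data sets and gene positions share one map with the QTL intervals, which is the point the slide makes about integration.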
The MGI System (from 40,000 feet)
[Diagram. Component labels: Data Downloads; Literature Curation; Editing Interface; Load scripts; MGI RDBMS; Servlets; CGI Scripts; Files; Web; SQL; Report Scripts; Files]
MGI in Context
[Diagram. Labels: Unigene; TIGR; GO; DoTS; Ensembl; Anatomy; Interpro; OMIM; SwissProt; RefSeq; RPCI; NCBI; I.M.A.G.E.; LocusLink; MGC; GenBank; NIA; Mutagenesis Centers; Scientific Literature; MGI db; RIKEN; ATCC; RatMap]
Integration relies on Standard Vocabularies
Structured vocabularies
The common semantic frameworks
 Structured into is-a/part-of hierarchies
Evidence-based annotation
Associations of vocabulary terms with objects
 Evidence (codes), citations, etc., decorate the associations
Structured annotations and queries
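A minimal sketch of these three ideas together: a vocabulary structured by is-a/part-of links, annotations that carry an evidence code and a citation, and a query that follows the hierarchy. The anatomy terms, gene names, and citations are illustrative placeholders, and the data structures are assumptions, not MGI's representation. IDA and TAS are evidence codes used by the Gene Ontology.

```python
from dataclasses import dataclass

# Hypothetical mini-vocabulary: term -> list of (relation, parent term).
PARENTS = {
    "left ventricle": [("is-a", "heart ventricle")],
    "heart ventricle": [("part-of", "heart")],
    "heart": [("part-of", "cardiovascular system")],
    "cardiovascular system": [],
}


@dataclass(frozen=True)
class Annotation:
    obj: str       # annotated object, e.g. a gene (placeholder names below)
    term: str      # vocabulary term
    evidence: str  # evidence code decorating the association
    citation: str  # supporting reference (placeholder)


def ancestors(term):
    """Return every term reachable from `term` via is-a or part-of links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_to(term, annotations, evidence=None):
    """Annotations to `term` or any descendant, optionally filtered by evidence code."""
    hits = []
    for ann in annotations:
        in_subtree = ann.term == term or term in ancestors(ann.term)
        evidence_ok = evidence is None or ann.evidence in evidence
        if in_subtree and evidence_ok:
            hits.append(ann)
    return hits


annotations = [
    Annotation("GeneA", "left ventricle", "IDA", "Ref:1"),
    Annotation("GeneB", "heart", "TAS", "Ref:2"),
    Annotation("GeneC", "cardiovascular system", "IDA", "Ref:3"),
]

heart_objects = sorted(a.obj for a in annotated_to("heart", annotations))
heart_ida_objects = sorted(a.obj for a in annotated_to("heart", annotations, {"IDA"}))
```

A query for "heart" picks up the annotation made to "left ventricle" because the hierarchy is walked upward from each annotated term, while the annotation to the broader "cardiovascular system" is correctly left out.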
Structured Vocabularies in MGI
Gene Ontology (GO)
 Functional gene annotations
Mammalian Phenotype (MP)
 Annotations to genotypes (e.g. knockouts)
Mouse Anatomical Dictionary
 Annotations of expression
Other standardized, non-structured vocabularies
 Mouse strains
 cell lines
 clone libraries
 tissues
 lots of smaller ones
Challenges
Domain very difficult to frame
Huge variability, variety of data, formats, providers, update schedules & semantics, etc…
Biologists and Computer Scientists think differently.
 communication is paramount, but difficult
Rapid changes, e.g., in the last 10 years:
 genetic crosses -> YAC/BAC mapping -> RH mapping -> genome sequence
 northern blots -> microarrays -> MPSS
System Evolution
The system is a software ecosystem
Maintenance is the cost of success
Changes and cost/benefit
If it ain’t broke, don’t fix it
Commitments/agenda/priorities
Credits
Richard Baldarelli
Matt Baya
Jon Beal
Dale Begley
Judy Blake
John Boddy
Dirck Bradt
Carol Bult
Nancy Butler
Donna Burkart
Jeff Campbell
Lori Corbani
Rebecca Corey
Sharon Cousins
Diane Dahmen
Harold Drabkin
Janan Eppig
Jackie Finger
David Garippa
Lucette Glass
Carroll Goldsmith
Pat Grant
Terry Hayamizu
David Hill
Jim Kadin
Ben King
Debbie Krupke
Moyha Lennon-Pierce
Jill Lewis
Ira Lu
Cathy Lutz
Lois Maltais
Prita Mani
Mike McCrossin
Louise McKenzie
David Miers
Daniel Modrusan
Dieter Naf
Li Ni
Janice Ormsby
Sridhar Ramachandran
Deborah Reed
Joel Richardson
Martin Ringwald
David Shaw
Bob Sinclair
Cynthia Smith
Connie Smith
Paul Szauter
Leslie Trombley
Pierre Vanden Borre
Michael Walker
Linda Washburn
Josh Winslow
Iry Witham
Sophia Zhu