2009-10-21-NCBO-Twigger - National Center for Biomedical
Accelerating Candidate Gene Discovery
through Ontological Indexing of Large
Scale Data Repositories
Simon Twigger, Ph.D.
MCW Department of
Physiology
Human & Molecular
Genetics Center
http://rgd.mcw.edu
Meet the client
Rat researchers ask...
Has anyone done any expression studies using congenic rats?
What tissue is this gene expressed in?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Are any of these genes associated with my phenotype?
Has this gene been seen in the brain?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the breast, breast carcinoma...)?
Biological Data Warehouse
Really important piece of data...
Problem...
Where, what, when?
(one) Solution?
Where, what, when?
How to create the index?
Examine One by One?
Analysis of anterior pituitary glands of ACI,
Copenhagen, and Brown Norway males
following treatment with the synthetic
estrogen diethylstilbestrol (DES).
Copenhagen = COP
Brown Norway = BN
NCBO ontology services
http://bioportal.bioontology.org/annotator
Open Biomedical
Annotator
http://www.bioontology.org/wiki/index.php/Annotator_Web_service
Initial Ontologies &
Workflow
• Datasets
• Series
• Samples
Phase 1
Small Scale Testing
Initial Test Load:
30 Rat Dataset records (GDS) out of 236
32 Series records (GSE) out of 750
587 Sample records (GSM) out of 7288
Ruby on Rails web application to view data
http://gminer.mcw.edu/
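To put the initial test load in proportion, the record counts above can be turned into coverage percentages. This is a small arithmetic sketch using only the numbers on the slide; the names `TEST_LOAD` and `coverage_percent` are illustrative and not part of GMiner.

```python
# Phase 1 test load: (records loaded, total rat records) per GEO record type.
TEST_LOAD = {
    "GDS": (30, 236),    # Dataset records
    "GSE": (32, 750),    # Series records
    "GSM": (587, 7288),  # Sample records
}


def coverage_percent(loaded, total):
    """Return the share of records loaded, as a percentage rounded to 0.1."""
    return round(100 * loaded / total, 1)


COVERAGE = {kind: coverage_percent(*counts) for kind, counts in TEST_LOAD.items()}

for kind, pct in COVERAGE.items():
    print(f"{kind}: {pct}% of rat records loaded")
```

Each record type was sampled at a different rate, so the test load is best read as a smoke test rather than a representative sample.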
Parallel Annotation Workflow
Concurrent Annotation
Results
August
October
Cloud-enabled Workflow?
Results/Demo
Initial Observations Synonyms
DES
Ept6
Searching with synonyms can be great:
Ept6 = ACI.COP-(D3Mgh16D3Rat119)/Shul
DES = Diethylstilbestrol
Initial Observations Synonyms
Searching with synonyms can cause problems:
Estrogen-induced pituitary tumorigenesis = EPT
Ethanolaminephosphotransferase activity = EPT
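Both synonym effects can be illustrated with a toy lookup table. The sketch below assumes a simple case-insensitive dictionary index, which is not necessarily how the NCBO Annotator matches terms; the function names are hypothetical, and the entries are the examples from these slides.

```python
from collections import defaultdict

# (synonym, full term) pairs taken from the slide examples.
SYNONYMS = [
    ("Ept6", "ACI.COP-(D3Mgh16D3Rat119)/Shul"),
    ("DES", "Diethylstilbestrol"),
    ("EPT", "Estrogen-induced pituitary tumorigenesis"),
    ("EPT", "Ethanolaminephosphotransferase activity"),
]


def build_index(pairs):
    """Map each lower-cased synonym to every term that claims it."""
    index = defaultdict(list)
    for synonym, term in pairs:
        index[synonym.lower()].append(term)
    return index


def lookup(index, token):
    """Return all terms a token could refer to (empty list if none)."""
    return index.get(token.lower(), [])


def is_ambiguous(index, token):
    """A token is ambiguous when more than one term shares it as a synonym."""
    return len(lookup(index, token)) > 1


INDEX = build_index(SYNONYMS)
print(lookup(INDEX, "DES"))
print(lookup(INDEX, "EPT"))
```

A synonym that resolves to one term expands the search usefully; one that resolves to several needs extra context before it can be trusted.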
Initial Observations 2
Rat Strain symbols
AT, AN, AS, A, B, CD
G (1000 x g)
C (°C)
TX (abbreviation for Texas)
Train classifier on real strain phrases? Look for relevant neighboring terms?
...pituitary gland of the ACI, Copenhagen and Brown Norway Rat.
...16 month-old Sprague-Dawley females that...
...expression data from female SD rats with access to lifelong...
...Strain or Line: F344/NCrl ...
...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat....
...kidneys from Dahl salt-sensitive males...
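The "neighboring terms" idea can be sketched as a simple window check: accept a strain symbol only when a rat-related word appears nearby. This is one possible heuristic under stated assumptions, not the project's method; the `CONTEXT_WORDS` list, the window size, and the function names are all hypothetical.

```python
import re

# Short strain symbols from the slides that collide with ordinary text,
# plus SD (Sprague Dawley) as a genuine example.
STRAIN_SYMBOLS = {"AT", "AN", "AS", "A", "B", "CD", "G", "C", "TX", "SD"}

# Assumed context vocabulary; a real system would need a curated list.
CONTEXT_WORDS = {"rat", "rats", "strain", "congenic", "male", "males",
                 "female", "females"}


def tokenize(text):
    """Split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def naive_matches(text):
    """Every token that matches a strain symbol, ignoring case."""
    return [t for t in tokenize(text) if t.upper() in STRAIN_SYMBOLS]


def contextual_matches(text, window=3):
    """Keep a symbol only if a context word lies within `window` tokens."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.upper() not in STRAIN_SYMBOLS:
            continue
        neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(n.lower() in CONTEXT_WORDS for n in neighbors):
            hits.append(tok)
    return hits


protocol = "Samples were centrifuged at 1000 x g and stored at 4 C in TX"
sample = "expression data from female SD rats with access to lifelong"
print(naive_matches(protocol), contextual_matches(protocol))
print(naive_matches(sample), contextual_matches(sample))
```

The window check still lets through short words such as "at" or "a" when they happen to sit next to "rat", which is one reason a classifier trained on real strain phrases may be worth exploring.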
Initial Observations Anatomy
In GEO records          Corresponding MA term
White Adipose Tissue    White Fat
Brown Adipose Tissue    Brown Fat
Ulnar bone              Ulna bone
Skeletal Muscle         Set of Skeletal Muscle
Anterior Pituitary      Anterior Pituitary Gland
Calvarial Bone          Chondrocranium
Left Ventricle          Heart Left Ventricle
Potential synonyms that could be added to MA
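Until such synonyms exist in MA itself, a local lookup table can bridge the gap. The sketch below assumes a plain dictionary keyed on the normalized GEO phrase; `GEO_TO_MA` and `to_ma_term` are illustrative names, and the pairs are the ones listed above.

```python
# GEO phrase (lower-cased) -> corresponding Mouse Adult Gross Anatomy term.
GEO_TO_MA = {
    "white adipose tissue": "White Fat",
    "brown adipose tissue": "Brown Fat",
    "ulnar bone": "Ulna bone",
    "skeletal muscle": "Set of Skeletal Muscle",
    "anterior pituitary": "Anterior Pituitary Gland",
    "calvarial bone": "Chondrocranium",
    "left ventricle": "Heart Left Ventricle",
}


def to_ma_term(phrase):
    """Normalize case and whitespace, then look up the MA term (None if absent)."""
    key = " ".join(phrase.lower().split())
    return GEO_TO_MA.get(key)


print(to_ma_term("White Adipose Tissue"))
print(to_ma_term("Left  Ventricle"))
```

Phrases that return `None` are candidates for curator review rather than automatic mapping.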
Phase 2
All Rat Affy Samples
1 ontology (Anatomy)
Larger scale data load
0 Rat Dataset records (GDS)
479 Series records (GSE)
12,012 Sample records (GSM)
Targeted Indexing
Mouse Adult Gross Anatomy Ontology
Results/Demo
Linking annotations to data
Tm2d1
RGD1306410
Svs4
Hbb
Scgb2a1
Alb
Linking annotations to data
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
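Statements of this form are subject-predicate-object triples, so they can be written out in an RDF serialization such as N-Triples. The sketch below is an assumption-laden illustration: the `http://example.org/` namespace, the `gene`/`rel`/`tissue` path segments, and the function names are hypothetical and are not the identifiers used by the project's RDFizer.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def uri(kind, label):
    """Build an angle-bracketed URI; spaces become underscores."""
    return f"<{BASE}{kind}/{quote(label.replace(' ', '_'))}>"


def to_ntriple(gene, predicate, tissue):
    """Serialize one gene-tissue statement as a single N-Triples line."""
    return f"{uri('gene', gene)} {uri('rel', predicate)} {uri('tissue', tissue)} ."


STATEMENTS = [
    ("Hbb", "is_expressed_in", "rat kidney"),
    ("Tm2d1", "is_expressed_in", "rat kidney"),
]

NTRIPLES = [to_ntriple(*s) for s in STATEMENTS]
print("\n".join(NTRIPLES))
```

Lines in this shape can be loaded into a triple store and joined with other gene or ontology data through shared URIs.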
Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples x ca. 25,000 genes/sample = 1.5B data points
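The estimate above is a straightforward product; the short check below reproduces it from the two figures on the slide (both of which are approximate).

```python
samples = 62_000            # samples across the human, mouse and rat arrays
genes_per_sample = 25_000   # approximate genes measured per sample

data_points = samples * genes_per_sample
billions = f"{data_points / 1e9:.2f}B"

print(f"{data_points:,} data points (about {billions})")
```

The exact product is slightly above the rounded 1.5B quoted on the slide, which is consistent with the "ca." qualifier.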
Probeset results on GMiner
Gabdr
RDF Data integration
Probeset to MA
Rat Genes & xrefs
Probeset to Mouse Anatomy Ontology
RGD ID
OpenRDF Sesame
Virtuoso Open Source
Triple Store
Ongoing
• Work on term recognition, strains, etc.
• Evaluation of Probeset-to-Anatomy results
• Curation interface to add additional terms
• RDF formats, Triple Store implementation
• Integrate Strain and tissue results into RGD
Education &
Outreach
Meet the student
Video #3 is being shot this week
Future Videos
Target is the scientist!
• Solve common tasks
• Use annotation tools
• Evaluate annotations
• Intro to specific ontologies
• Interview ontology teams
• Ideas?
• What does your community
Acknowledgements
• Joey Geiger - Development of GMiner
• Jennifer Smith - Video creation, data curation
• Rajni Nigam - Rat Strain Ontology
• Clement Jonquet - NCBO OBA tools
• Trish Whetzel - Video script feedback
• Mark Musen & NIH Roadmap Initiative - Our Funding!
Links
• http://twigger.hmgc.mcw.edu/ncbo/ - Project webpage
• http://gminer.mcw.edu - Web application
• http://github.com/mcwbbc/gminer - Gminer Code
• http://github.com/simont/MCW-RDF - RDFizer code
[email protected]