EXPO - Buffalo Ontology Site
How Philosophy of Science Can Help Biomedical Research
Barry Smith
http://ontology.buffalo.edu/smith
1
How to Do Biology across the Genome?
2
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES
IPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVIS
VMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVY
TLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLER
CHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKY
GYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERL
KRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRAC
ALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVC
KLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDD
NNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGI
SLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLK
TLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPW
MDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY
ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS
RFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSG
TTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV
sequence of X chromosome in baker’s yeast
3
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR
KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL
SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM
FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA
CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC
TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR
RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP
NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS
RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS
FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI
YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV
RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS
QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF
NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV
WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG
LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE
RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST
NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT
TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS
ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN
SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN
MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL
AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR
GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG
GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM
LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG
RFDILLCRDSSREVGE
4
5
Stelzl et al., Cell, 2005
6
network of gene interactions in E. coli
http://moebio.com/santiago/gnom/ingles.html
8
9
what cellular component?
what molecular function?
what biological process?
10
11
12
The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
sphingolipid transporter activity
DiabetInGene
GluChem
13
The Idea of Common Controlled Vocabularies
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
14
Gene Ontology
male courtship behavior, orientation prior to leg tapping and wing vibration
15
Benefits of GO
1. based in biological science
2. links data to biological reality
3. links people to software
4. links data together
• across species (human, mouse, yeast, fly ...)
• across granularities (molecule, cell, organ, organism, population)
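One way to picture how a shared vocabulary "links data together across species" is a lookup keyed by the common term. The sketch below is illustrative only: the gene names and the annotation table are hypothetical placeholders, not real GO annotations, and the term labels are simply the ones that appear on these slides.

```python
from collections import defaultdict

# Hypothetical annotation records: (species, gene, GO aspect, GO term label).
# Gene names are made-up placeholders; term labels are taken from the slides.
ANNOTATIONS = [
    ("human", "gene_h1", "molecular_function", "sphingolipid transporter activity"),
    ("mouse", "gene_m1", "molecular_function", "sphingolipid transporter activity"),
    ("yeast", "gene_y1", "cellular_component", "Holliday junction helicase complex"),
    ("fly", "gene_f1", "biological_process",
     "male courtship behavior, orientation prior to leg tapping and wing vibration"),
]


def genes_by_term(annotations):
    """Group (species, gene) pairs under the shared term that annotates them."""
    index = defaultdict(list)
    for species, gene, _aspect, term in annotations:
        index[term].append((species, gene))
    return dict(index)


def species_linked_by(term, annotations):
    """Return the sorted species whose genes carry the given term."""
    pairs = genes_by_term(annotations).get(term, [])
    return sorted({species for species, _gene in pairs})


if __name__ == "__main__":
    print(species_linked_by("sphingolipid transporter activity", ANNOTATIONS))
```

Because both the human and the mouse record use the same term label, a single query retrieves them together; with lab-specific vocabularies the same query would need a hand-built mapping between terms.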
16
The goal
all biological (biomedical) research data
should cumulate to form a single,
algorithmically processible, whole
http://obofoundry.org
17
Ontologies already being applied to
achieve this goal
Sjöblom T, et al. analyzed 13,023 genes in 11
breast and 11 colorectal cancers.
GO supplies the standard functional information
for these genes.
By tracking deviations from this standard, 189 genes
were identified as mutated at significant
frequency, and thus as potential targets for
diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
18
Towards Empirical Philosophy
• processualist vs. 3-dimensionalist
• reductionist vs. non-reductionist
• realist vs. nominalist
If ontologies based on different philosophical
principles are tested for their utility in
support of scientific research, which types of
ontologies will prove most useful?
19
Some sample ontologies
Cell Ontology (CL)
Foundational Model of Anatomy (FMA)
Environment Ontology (EnvO)
Gene Ontology (GO)
Infectious Disease Ontology
Phenotypic Quality Ontology (PATO)
Protein Ontology (PRO)
RNA Ontology (RNAO)
Sequence Ontology (SO)
20
21
22
23
24
The problem
High-throughput experimental data are
meaningless unless the researcher is
provided with detailed information
about how they were obtained
25
To make experimental data computationally
accessible we need ontologies to describe
the data
(1) from the point of view of their relation
to reality
(2) from the point of view of their relation
to experiments
26
Three solutions
The MGED Ontology
OBI: The Ontology for Biomedical Investigations
EXPO: The Experiment Ontology
27
MGED (Microarray Gene Expression Data) Ontology
28
MGED Ontology
Individual =def. name of the individual
organism from which the biomaterial was
derived
Experiment =def. The complete set of
bioassays and their descriptions
performed as an experiment for a common
purpose. ... An experiment will be often
equivalent to a publication.
29
MGED Ontology
Chromosome =def. An abstraction used for annotation
Chromosome =def. A biological sequence that can be placed on an array
30
OBI
The Ontology for Biomedical Investigations
with thanks to Trish Whetzel and Richard Scheuermann
31
Purpose of OBI
To provide a resource for the unambiguous
description of the components of
biomedical investigations such as the
design, protocols and instrumentation,
material, data and types of analysis and
statistical tools applied to the data
NOT designed to model biology
32
Hypothesis
That it is possible to create ontology
resources of genuine utility by drawing on
logical and philosophical principles e.g.
pertaining to consistency of definitions,
avoidance of use-mention confusions.
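Checks of this kind can be partly mechanized. The sketch below is a rough heuristic written for illustration, not part of any OBI or MGED tooling: it flags definitions that describe a name or an abstraction rather than the entity itself (a use-mention warning sign) and terms that carry more than one definition. The marker phrases and function names are assumptions; the definitions are the ones quoted from the MGED Ontology on the earlier slides.

```python
# Phrases suggesting that a definition talks about a representation rather
# than the thing itself. This list is an illustrative assumption.
MENTION_MARKERS = ("name of", "an abstraction", "identifier of")

# (term, definition) pairs as quoted on the MGED Ontology slides.
DEFINITIONS = [
    ("Individual",
     "name of the individual organism from which the biomaterial was derived"),
    ("Chromosome", "An abstraction used for annotation"),
    ("Chromosome", "A biological sequence that can be placed on an array"),
]


def use_mention_suspects(definitions):
    """Return terms whose definition begins with a mention-style phrase."""
    return [
        term
        for term, text in definitions
        if text.lower().startswith(MENTION_MARKERS)
    ]


def inconsistent_terms(definitions):
    """Return terms that are given more than one distinct definition."""
    seen = {}
    for term, text in definitions:
        seen.setdefault(term, set()).add(text)
    return sorted(term for term, texts in seen.items() if len(texts) > 1)


if __name__ == "__main__":
    print(use_mention_suspects(DEFINITIONS))
    print(inconsistent_terms(DEFINITIONS))
```

A string match like this only surfaces candidates for human review; deciding whether a definition really confuses an organism with its name still requires reading it.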
33
OBI Collaborating Communities
Crop sciences: Generation Challenge Programme (GCP)
Environmental genomics: MGED RSBI Group, www.mged.org/Workgroups/rsbi
Genomic Standards Consortium (GSC), www.genomics.ceh.ac.uk/genomecatalogue
HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net
Immunology Database and Analysis Portal, www.immport.org
Immune Epitope Database and Analysis Resource (IEDB), http://www.immuneepitope.org/home.do
International Society for Analytical Cytology, http://www.isac-net.org/
Metabolomics Standards Initiative (MSI)
Neurogenetics: Biomedical Informatics Research Network (BIRN)
Nutrigenomics: MGED RSBI Group, www.mged.org/Workgroups/rsbi
Polymorphism
Toxicogenomics: MGED RSBI Group, www.mged.org/Workgroups/rsbi
Transcriptomics: MGED Ontology Group
34
OBI – Tools and Documentation
Open source, standards-compliant and version-managed
• Web Ontology Language (OWL) using the Protégé editor
• OBI.owl files are available from the OBI SVN Repository
The Problem of Clinical Investigations
Regulatory bodies such as the FDA need
to assess the evidentiary value of
enormous volumes of data collected e.g.
in trials on specific drug formulations
For this, they need to impose
standardization of terminologies used to
express these data, e.g. as developed by
the Clinical Data Interchange Standards
Consortium (CDISC)
36
37
Clinical Investigations
terminologies
“Study Design”
Descriptive research
– Case study – description of one or more patients
– Developmental research – description of pattern of change
over time
– Qualitative research – gathering data through interview or
observation
Exploratory research
– Secondary analysis – exploring new relationships in old data
– Historical research – reconstructing the past through an
assessment of archives or other records
Experimental research
– Randomized clinical trial
– Meta-analysis – statistically combining findings from several
different studies to obtain a summary analysis
“Population”
Recruited population
– Randomized population
– Eligible population
– Screened population
– Premature termination population
Excluded population
– Excluded post-randomization population
– Not-eligible-population
Analyzed population
– Study arm population
– Crossover population
– Subgroup population
– Intent-to-treat population - based on randomization
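The "Population" terminology above can be held as a simple child-to-parent table so that software can answer grouping questions. This is a minimal sketch: the grouping is read off the slide layout, which is an assumption about the intended hierarchy, and the function names are made up for illustration.

```python
# Child term -> parent term, as read off the "Population" slide.
# The grouping is an assumed reading of the slide layout.
PARENT = {
    "Randomized population": "Recruited population",
    "Eligible population": "Recruited population",
    "Screened population": "Recruited population",
    "Premature termination population": "Recruited population",
    "Excluded post-randomization population": "Excluded population",
    "Not-eligible-population": "Excluded population",
    "Study arm population": "Analyzed population",
    "Crossover population": "Analyzed population",
    "Subgroup population": "Analyzed population",
    "Intent-to-treat population": "Analyzed population",
}


def ancestors(term):
    """Return the chain of broader terms above the given term."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def children(parent):
    """Return the sorted terms grouped directly under the given term."""
    return sorted(child for child, p in PARENT.items() if p == parent)


if __name__ == "__main__":
    print(ancestors("Crossover population"))
    print(children("Excluded population"))
```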
Overview of OCI
Development plan (CDISC)
Standard operating procedures (CDISC)
Statistical analysis plan (CDISC)
Meta-analysis (CDISC)
Quality assurance (CDISC)
Quality control (CDISC)
Baseline assessment (CDISC)
Validation (CDISC)
Coding (MUSC)
Permuted block randomization (MUSC)
Secondary-study-protocol (RCT)
Intervention-step (RCT)
Blinding-method (RCT)
Study design
Negative findings (MUSC)
Positive findings (MUSC)
Primary-outcome (RCT)
Secondary-outcome (RCT)
EXPO
The Ontology of Experiments
L. Soldatova, R. King
Department of Computer Science
The University of Wales, Aberystwyth
46
EXPO: Experiment Ontology
47
EXPO: Experiment Ontology
48
EXPO: Experiment Ontology
49
experimental actions part_of experimental design
subject of experiment part_of experimental design
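The two part_of assertions above can be written as subject-relation-object triples and queried. The sketch below is an assumption about how such assertions might be stored, not EXPO's actual serialization; the function names are hypothetical.

```python
# (subject, relation, object) triples as stated on the slide.
TRIPLES = [
    ("experimental actions", "part_of", "experimental design"),
    ("subject of experiment", "part_of", "experimental design"),
]


def parts_of(whole, triples):
    """Return the sorted entities asserted to be direct parts of `whole`."""
    return sorted(s for s, rel, o in triples if rel == "part_of" and o == whole)


def is_part_of(part, whole, triples):
    """True if `part` reaches `whole` by following part_of links transitively."""
    frontier = [part]
    visited = set()
    while frontier:
        current = frontier.pop()
        if current in visited:
            continue
        visited.add(current)
        for s, rel, o in triples:
            if rel == "part_of" and s == current:
                if o == whole:
                    return True
                frontier.append(o)
    return False


if __name__ == "__main__":
    print(parts_of("experimental design", TRIPLES))
    print(is_part_of("experimental actions", "experimental design", TRIPLES))
```

Treating part_of as transitive is a common modelling choice; with only the two triples from the slide the transitive search never goes beyond one step, but it would if further parts were added.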
50
Role of Philosophy of Science
EXPO: Experiment Ontology
51
Towards Empirical Philosophy of Science
• rational statistical models of induction
• case-based / domain-based reasoning
• falsifiabilism
• Humeanism vs. laws
• logical, relative frequency, Bayesian, objective (chance) and epistemic theories of probability
These generate different ontologies of
scientific evidence
– which one is correct?
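A small numerical case shows how two of the listed theories of probability can assess the same evidence differently. The sketch below is an illustration under stated assumptions, not something the slides specify: it compares a plain relative-frequency estimate with the posterior mean of a Bayesian model that starts from a uniform Beta(1, 1) prior. The function names and the choice of prior are assumptions.

```python
from fractions import Fraction


def relative_frequency(successes, trials):
    """Relative-frequency estimate: observed successes over trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return Fraction(successes, trials)


def bayesian_posterior_mean(successes, trials, alpha=1, beta=1):
    """Posterior mean under a Beta(alpha, beta) prior and binomial data.

    The default alpha = beta = 1 is a uniform prior (an assumption here).
    """
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + alpha, trials + alpha + beta)


if __name__ == "__main__":
    # Three experiments, three positive outcomes.
    print(relative_frequency(3, 3))       # 1
    print(bayesian_posterior_mean(3, 3))  # 4/5
```

After three positive outcomes in three trials the frequency estimate is 1, while the Bayesian estimate under this prior is 4/5; an ontology of evidence built on one theory would record a different degree of support than one built on the other.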
52
Environment Ontology +
Phenotypic Quality Ontology +
Ontology for Personalized and Community
Medicine
‘Racial’ Phenotypes: Social, Phylogenetic,
Essentialistic ...
53
Ontology for Personalized and Community
Medicine
to support studies of differential effects on health
1. of environmental qualities of different neighborhoods
and
2. of different community behavior phenotypes
54