has_part - National Center for Ontological Research

Ontologies for Immunology
Barry Smith
https://immport.niaid.nih.gov/
NIAID Division of Allergy, Immunology
and Transplantation (DAIT)
Areas of Research
• Allergic Diseases
• Asthma
• Autoimmune Diseases
• Food Allergy
• Immune Tolerance
• Medical Countermeasures Against Radiological
and Nuclear Threats
• Transplantation
DAIT-Funded Projects
Depositing Data into ImmPort
• Immune Tolerance Network (ITN)
• Atopic Dermatitis and Vaccinia Network (ADVN)
• Population Genetics Analysis Program
• Immune Function and Biodefense in Children,
Elderly, and Immunocompromised Populations
• HLA Region Genetics in Immune-Mediated
Diseases
• Modeling Immunity for Biodefense
Goals of ImmPort
• Accelerate a more collaborative and coordinated
research environment
• Create an integrated database that broadens the
usefulness of scientific data
• Advance the pace and quality of scientific discovery
while extending the value of scientific data in all areas of
immunological research
• Integrate relevant data sets from participating
laboratories, public and government databases, and
private data sources
• Promote rapid availability of important findings
• Provide analysis tools to advance immunological
research
Standards and ontologies are needed
especially to advance this goal:
• Integrate relevant data sets from participating
laboratories, public and government
databases, and private data sources
SDY 165: Characterization of in vitro Stimulated
B Cells from Human Subjects shared to SemiPublic Workspace (SPW) Project
During the human B cell (Bc) recall response, rapid cell division
results in multiple Bc subpopulations. RNA microarray and
functional analyses showed that proliferating CD27lo cells are a
transient pre-plasmablast population, expressing genes associated
with Bc receptor editing. Undivided cells had an active
transcriptional program of non-ASC B cell functions, including
cytokine secretion and costimulation, suggesting a link between
innate and adaptive Bc responses. Transcriptome analysis suggested
a gene regulatory network for CD27lo and CD27hi Bc
differentiation.
Functionally Distinct Subpopulations of CpG-Activated Memory B
Cells, Alicia D. Henn … & Martin S. Zand, PubMed 2246822
Figure 5: The Fate of CD27lo Cells
ImmPort Antibody Registry
BD Lyoplate Screening Panels Human Surface Markers
http://pir.georgetown.edu/cgibin/pro/entry_pro?id=PR:000001963
Discoverability
• Find all ImmPort studies involving genes
associated with B cell receptor editing
• Find all data in public and government
databases relating to B cell receptor editing
GOPubMed: 179 documents
(B cell receptor editing Zand) AND ("Zand"[au])
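Discoverability through ontologies means that a query for a term also returns data annotated with any of its subtypes, which a plain keyword search cannot do. A minimal sketch, assuming a hypothetical in-memory term hierarchy and hypothetical study annotations; only the label "B cell receptor editing" and the ID SDY165 come from the slides, and the edges and the other terms and IDs are placeholders, not GO or ImmPort content:

```python
from collections import deque

# Hypothetical term hierarchy (child -> parents). Only the label
# "B cell receptor editing" comes from the text; the other terms and
# all edges are placeholders, not copied from GO.
PARENTS = {
    "B cell receptor editing": ["hypothetical parent process"],
    "hypothetical editing subtype": ["B cell receptor editing"],
    "hypothetical parent process": [],
}

# Hypothetical study annotations (study ID -> annotated term labels).
ANNOTATIONS = {
    "SDY165": {"B cell receptor editing"},
    "SDY-EXAMPLE-1": {"hypothetical editing subtype"},
    "SDY-EXAMPLE-2": {"hypothetical parent process"},
}


def descendants(term):
    """Return the term together with all of its transitive subtypes."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def studies_about(term):
    """Sorted study IDs annotated with the term or any of its subtypes."""
    wanted = descendants(term)
    return sorted(sid for sid, terms in ANNOTATIONS.items() if terms & wanted)
```

Here studies_about("B cell receptor editing") returns SDY-EXAMPLE-1 as well as SDY165, although SDY-EXAMPLE-1's annotation never contains that string; a keyword query such as the GOPubMed one above would miss it.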
SDY 165 Discoverability
NIAID Sample Data Sharing Plan
(Last Reviewed February 12, 2013)
• Sharing of data generated by this project is an essential part of our proposed
activities and will be carried out in several different ways.
• Presentations at national scientific meetings. … it is expected that approximately
four presentations at national meetings would be appropriate. …
• Annual lectureship. A lectureship has brought to the University distinguished
scientists and clinicians …
• Newsletter. The [disease interest group] publishes a newsletter …
• Web site of the Interest Group. The [interest group] currently maintains a Web
site where information [about the disease] is posted. Summaries of the scientific
presentation from the [quarterly project] meetings will be posted on this Web
site, written primarily for a general audience. [Link to Web site]
• Annual [Disease] Awareness week …
• SAGE Library Data. It is our explicit intention that these [Serial analysis of gene
expression] data will be placed in a readily accessible public database. …
Plan addressing Key Elements for a Data
Sharing Plan under NIH Extramural Support
(Last Reviewed August 09, 2012)
What data that will be shared [sic]: I will share phenotypic data associated with the
collected samples by depositing these data at _______which is an NIH-funded repository.
... Additional data documentation and de-identified data will be deposited for sharing
along with phenotypic data, which includes demographics, family history of XXXXXX
disease, and diagnosis, consistent with applicable laws and regulations. … Meta-analysis
data and associated phenotypic data, along with data content, format, and organization,
will be available at ___. Submitted data will confirm [sic] with relevant data and
terminology standards.
Who will have access to the data: … Where will the data be available: … When will the
data be shared: …
How will researchers locate and access the data: I agree that I will identify where the data
will be available and how to access the data in any publications and presentations that I
author or co-author about these data … repository has policies and procedures in place
that will provide data access to qualified researchers, fully consistent with NIH data
sharing policies and applicable laws and regulations.
What should be required /
recommended
• LOINC
• CDISC
• BRIDG
• SNOMED
• Minimal Information Checklists
• OBI
• Immunology Ontologies
Lab resources do not recommend use
of LOINC
http://www.niaid.nih.gov/LabsAndResources/resources/DAIDSClinRsrch/Documents/gclp.pdf
Can some of these be built into:
• LDMS
• LIS
• CTMS
• EHRs
Need to explore the relation of SDY
data in ImmPort to CDISC
From the Summary specification
CDISC definitions
• A study arm represents one planned path
through the study. The path is composed of a
study cell for each epoch in the study.
• A study cell is the part of study design that
describes what happens in a particular epoch
for a particular arm. The cell describes how
the purpose of its epoch is fulfilled for each
arm.
CDISC Structural Elements
Building blocks of a study design: Epochs,
Cells, Arms, Segments, Activities
• A study arm represents a path
• A path is composed of a cell
• A cell is a part of a study design
• A cell describes what happens in an epoch
• An Activity represents a point in a study at
which a specific action is to be taken.
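Read together, the CDISC definitions imply a checkable constraint: an arm's path contains exactly one cell for each epoch. A minimal sketch, assuming hypothetical Python classes (Epoch, Cell, StudyDesign) that are not part of the CDISC or BRIDG models; segments and activities are left out:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Epoch:
    name: str


@dataclass
class Cell:
    arm: str            # the arm this cell belongs to
    epoch: Epoch        # the epoch in which it takes place
    description: str    # what happens in that epoch for that arm


@dataclass
class StudyDesign:
    epochs: list                              # epochs in chronological order
    cells: list = field(default_factory=list)

    def arms(self):
        return sorted({c.arm for c in self.cells})

    def path(self, arm):
        """The arm's path: its cells ordered by epoch."""
        order = {e: i for i, e in enumerate(self.epochs)}
        return sorted((c for c in self.cells if c.arm == arm),
                      key=lambda c: order[c.epoch])

    def check(self):
        """Report every (arm, epoch) pair that does not have exactly one cell."""
        problems = []
        for arm in self.arms():
            for epoch in self.epochs:
                n = sum(1 for c in self.cells if c.arm == arm and c.epoch == epoch)
                if n != 1:
                    problems.append(f"arm {arm!r} has {n} cells in epoch {epoch.name!r}")
        return problems
```

A design with arms A and B over screening and treatment epochs passes check() only when both arms have a cell in both epochs; dropping B's treatment cell yields the message "arm 'B' has 0 cells in epoch 'treatment'".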
BRIDG
• http://bridgmodel.nci.nih.gov/files/BRIDG_Model_3.2_html/index.htm
BRIDG Adverse Event
Product Brief - Biomedical Research
Integrated Domain Group (BRIDG)
Implementations/ Case Studies (Actual Users)
• The BRIDG Project is a collaborative effort
engaging stakeholders from four organizations:
– Clinical Data Interchange Standards Consortium
(CDISC)
– HL7 Regulated Clinical Research Information
Management Working Group (HL7 RCRIM WG)
– National Cancer Institute (NCI), including the Cancer
Biomedical Informatics Grid (caBIG™) project
– Food and Drug Administration (FDA)
Does BRIDG have any users?
2010 May Ballot Cycle Info: INFORMATIVE
Ballot results: Met basic vote requirements for approval. 24 Negative votes to reconcile
Document Name: HL7 Version 3 Domain Analysis Model: Biomedical Research Integrated Domain (BRIDG),
Release 1
Ballot Code: V3DAM_BRIDG_R1_TBD
NIB Submitted By: Edward Tripp
May 2013: Published NE 2013; PMO archiving.
March 2013: Received ANSI approval for Technical Report of HL7 Version 3 Domain Analysis Model:
Biomedical Research Integrated Domain (BRIDG), Release 1. PMO set status as Ready for Normative Edition
Publication. Project Team does not need to do any further work.
Nov 2012: TSC approved a publication request for HL7 Version 3 Domain Analysis Model: Biomedical Research
Integrated Domain (BRIDG), Release 1 as an informative document and registration with ANSI as a Technical
Report, from TSC tracker # 2406.
As of 2012-09-01: Project restarted. BRIDG project team re-writing new NWIP for submission September
2012.
2011Dec: Per L. Laakso, the BRIDG project at PI 538 is an ISO/JIC project (ISO/CD 14199). PMO changed it to be
an ISO/JIC project type.
2010Nov(LL): updated dates per RCRIM three-year plan. Cyclical development indicative of informative ballot:
development in January cycles, ballot in May cycles, publication and renewed development in September cycles.
2010July: added repository URL
Sept 2009: PMO changed from 3-Year Plan item to Active Project
Domain Analysis Model (DAM)
http://wiki.hl7.org/index.php?title=BRIDG_as_DAM
MIFLOWCYT: Minimal Information for
a Flow Cytometry Experiment
Minimal Information about a Cellular Assay
Project Header
Source: Contact details of researcher/person in charge of the project (name, affiliation/institution,
department, address, Email, etc.).
Project: Description (text) of the project within a larger context (biological process that is addressed;
description of measured effect, controls, etc.).
Application: Description (text) of the specific application of this project (abstract; reference to
publication).
Array(s): Culturing and reaction container(s) that are used during the project (name;
identifier; type (e.g. 384, 96, 24 well, flask, glass slide for cell arrays); vendor or
manufacturer; order number; surface area/feature size).
CellLine(s): Description of cell lines employed (name; identifier; ATCC number (if applicable) or details:
species, tissue, organ, contact details (when from a different lab);
reference to publication; passage number; mycoplasma test (Y/N) and other validation;
modifications (optional, if any made, e.g. stably transfected, induced resistance, etc.)).
Reagents: Media, supplements, kits, buffers, and solutions (name, identifier, vendor or manufacturer,
order number, lot number).
Perturbator(s): Description of materials/conditions that are used in the project to perturb the cells
(type, e.g. siRNA, cDNA, small chemical compound; name; external references; gene/protein identifiers/order numbers; sequences if applicable).
Instrument(s): Description of the data acquisition station and other instruments utilized in the project,
e.g. for transfection (name, type, model, manufacturer).
Minimal Information about a Cellular Assay
Experimental modules
Treatment(s): Description of the conditions that are applied to the cells during culturing
(name, identifier, time-stamp, materials used, volume, actual passage number of cells,
seeding density, temperature, CO2-content, humidity).
Perturbation(s): Description of the perturbation (a special case of a treatment), which
describes the application of ‘perturbator(s)’, e.g. transfection (siRNA, expression clone),
treatment with small compound, temperature shift, etc.
PostTreatment(s): Description of the conditions that are applied after culturing and prior
to data acquisition, e.g. lysis, fixation, staining, antibody incubation, etc.
DataAcquisition(s): Detection of the effect(s) induced by the perturbation (identifier, time
stamp, reference to instrument above, instrument-settings, e.g. excitation and emission
wavelengths with filter sets, lamp/Laser energy).
DataProcessing
Description of the processes applied to analyze the raw data in order to generate a hit list.
Reference to publication(s) describing the procedures and/or to software utilized (incl.
version and settings).
Links to the raw, the processed, and to the interpreted data.
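Checklists such as MIACA become machine-checkable once every field has a fixed key. A minimal sketch covering two of the modules above, assuming hypothetical snake_case keys (MIACA does not prescribe them) and a made-up example record; it is not an official MIACA validator:

```python
# Required fields for two MIACA modules, transcribed from the lists above.
# The snake_case keys are hypothetical; "vendor" stands for "vendor or manufacturer".
MIACA_REQUIRED = {
    "Reagent": ["name", "identifier", "vendor", "order_number", "lot_number"],
    "Treatment": ["name", "identifier", "timestamp", "materials", "volume",
                  "passage_number", "seeding_density", "temperature",
                  "co2_content", "humidity"],
}


def missing_fields(module, record):
    """Return the required fields of `module` that are absent or empty in `record`."""
    return [f for f in MIACA_REQUIRED[module] if record.get(f) in (None, "")]


# Hypothetical reagent record with an empty lot number.
example_reagent = {"name": "example medium", "identifier": "R-1",
                   "vendor": "ExampleCo", "order_number": "123", "lot_number": ""}
```

For example_reagent, missing_fields("Reagent", example_reagent) reports only "lot_number", which is the kind of gap a repository could flag at submission time.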
Minimal Information about a T-Cell
Assay
• Janetzki S, Britten CM, Kalos M, Levitsky HI,
Maecker HT, Melief CJ, Old LJ, Romero P, Hoos
A, Davis MM. 2009. "MIATA"-Minimal
Information about T Cell Assays. Immunity. 31:
527-8
http://www.miataproject.org/
http://mibbi.sourceforge.net/portal.shtml
MIBBI= Minimal Information about a
Biological or Biomedical Investigation
• How to make MIBBI checklists non-redundant and
factorable, so as to support interoperability
/ sharing of data?
• Need to use the same words for the same
things and events in each checklist
• OBI = Ontology for Biomedical Investigations
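"Using the same words for the same things" can be made concrete by mapping each checklist's field labels to shared term identifiers; once mapped, the overlap can be factored out into a common module. A minimal sketch, assuming hypothetical field labels and placeholder "term:..." keys standing in for ontology (e.g. OBI) identifiers:

```python
# Hypothetical field labels from two checklists, each mapped to a shared term.
# The "term:..." keys are placeholders for ontology (e.g. OBI) identifiers.
CHECKLIST_A = {
    "instrument model": "term:instrument",
    "reagent lot": "term:lot_number",
    "gating strategy": "term:gating_strategy",
}
CHECKLIST_B = {
    "Instrument(s)": "term:instrument",
    "lot number": "term:lot_number",
    "seeding density": "term:seeding_density",
}


def factor(*checklists):
    """Split checklists into one shared module plus a remainder per checklist."""
    term_sets = [set(c.values()) for c in checklists]
    shared = set.intersection(*term_sets)
    return shared, [terms - shared for terms in term_sets]
```

Although the two checklists use different labels, factor(CHECKLIST_A, CHECKLIST_B) finds that instrument and lot number are the same things, so they can live in one shared module rather than being defined twice.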
OBI representation of a neuroscience study
OBI: Vaccine Protection Investigation
OBI ontology terms (examples)
OBI Relations
Advantages of OBI
• Federal reporting (drug trials)
• Enhancement of plans for data sharing
• Supports retrieval and meta-analysis by allowing
searches over protocols, metadata about
experiments, imaging processes, statistical
processes, sources of analytes, equipment
vendors, …
• Supports comparison of runs performed by
different labs on the same machines, using the
same sorts of settings, stainings, samples …
ImmPort (recommended) Ontologies
Chemical Entities of Biological Interest (CHEBI)
The Protein Ontology (PRO)
The Gene Ontology (GO)
GO Annotation for the Immune System
The Cell Ontology (CL)
Beta Cell Genomics Ontology (BCGO)
The Immune Epitope Ontology (ONTIE)
The Infectious Disease Ontology (IDO)
Ontology for General Medical Science (OGMS)
Desiderata
Allergy Ontology
Immunology Ontology
Autoimmune Disease Ontology
How to advance interoperability of
immunology (clinical trial) data
How to connect
• experimental data
• ontology resources?
• 1. identify ways that submitters of data can
benefit early on through use of ontologies
• 2. identify ways to make it easier for submitters
of data to use ontologies
• 3. identify ways to annotate data using ontologies
post-submission
1. identify ways that submitters of data can
benefit early on through use of ontologies
a. help to satisfy NIH mandates
– can we influence NIH mandates?
b. help to satisfy regulatory reporting
requirements (FDA)
c. help to analyze their data (and to incorporate
other sorts of data)
d. help them to do better science by imitating
successes of others who have exploited
ontologies
2. identify ways to make it easier for
submitters of data to use ontologies
a. build them into EHRs
– can we influence EHR vendors?
b. build them into Lab Information Systems
c. build them into Clinical Trial Management
Systems
3. identify ways to annotate data using
ontologies post-submission
a. NLP
- New tools for classification and monitoring of autoimmune diseases. Maecker
HT, Lindstrom TM, Robinson WH, Utz PJ, Hale M, Boyd SD, Shen-Orr SS,
Fathman CG. Nat Rev Rheumatol. 2012 May 31;8(6):317-28
- Towards a Cytokine-Cell Interaction Knowledgebase of the Adaptive Immune
System. Shen-Orr SS, Goldberger O, Garten Y, Rosenberg-Hasson Y, Lovelace
PA, Hirschberg DL, Altman RB, Davis MM, Butte AJ. Pacific Symposium on
Biocomputing 2009:439-450
b. manual annotation of ImmPort summaries
c. ?
The Infectious
Disease Ontology
with thanks to Albert Goldfain and
Lindsay G. Cowell
IDO-Core
• OBO Foundry ontology based on BFO and OGMS
• Contains general terms in the ID domain:
• E.g., ‘colonization’, ‘pathogen’, ‘infection’
• Intended to represent information along several
dimensions:
• biological scale (gene, cell, organ, organism,
population)
• discipline (clinical, immunological, microbiological)
• organisms involved (host, pathogen, and vector types)
• A hub for further extension ontologies
• A contract between IDO extension ontologies and the datasets that
use them.
“Toward Precision Medicine: Building a Knowledge Network for
Biomedical Research and a New Taxonomy of Disease”
ICD 9: Catch-all Codes and
Scattered Exclusions
• 041 Bacterial infection in conditions classified elsewhere and of unspecified
site
• Note: This category is provided to be used as an additional code to identify the
bacterial agent in diseases classified elsewhere. This category will also be used to
classify bacterial infections of unspecified nature or site.
• Excludes: septicemia (038.0-038.9)
• 041.1 Staphylococcus
• 041.10 Staphylococcus, unspecified
• 041.11 Methicillin susceptible Staphylococcus aureus
• MSSA
• Staphylococcus aureus NOS
• 041.12 Methicillin resistant Staphylococcus aureus
• Methicillin-resistant staphylococcus aureus (MRSA)
• 041.19 Other Staphylococcus
• 038 Septicemia
• 038.1 Staphylococcal septicemia
• 038.10 Staphylococcal septicemia, unspecified
• 038.11 Methicillin susceptible Staphylococcus aureus septicemia
• MSSA septicemia
• Staphylococcus aureus septicemia NOS
• 038.12 Methicillin resistant Staphylococcus aureus septicemia
• 038.19 Other staphylococcal septicemia
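The "scattered exclusion" problem is that the septicemia exclusion is stated once, at 041, and reaches 041.19 only by inheritance, so a coder looking at 041.19 alone does not see it. A minimal sketch that makes the inheritance explicit; the parent links and the exclusion note are transcribed from the excerpt above, while the dictionary layout and function names are hypothetical:

```python
# Parent links and exclusion notes transcribed from the ICD-9 excerpt above.
PARENT = {
    "041.1": "041", "041.10": "041.1", "041.11": "041.1",
    "041.12": "041.1", "041.19": "041.1",
    "038.1": "038", "038.10": "038.1", "038.11": "038.1",
    "038.12": "038.1", "038.19": "038.1",
}
EXCLUDES = {"041": ["septicemia (038.0-038.9)"]}


def ancestors(code):
    """Codes above `code`, nearest first."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


def effective_exclusions(code):
    """Exclusions stated on the code itself or on any of its ancestors."""
    notes = []
    for c in [code] + ancestors(code):
        notes.extend(EXCLUDES.get(c, []))
    return notes
```

Here effective_exclusions("041.19") surfaces the septicemia exclusion inherited from 041, while the 038 codes carry none.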
ICD 9: Catch-all Codes and
Scattered Exclusions
Hierarchy path for [041.19] Other Staphylococcus:
• [041] Bacterial infection in conditions classified elsewhere and of unspecified site
• Excludes: septicemia (038.0-038.9)
• 041.1 Staphylococcus
• [041.19] Other Staphylococcus
IDO: Core and Extensions Framework
A Lattice of Lightweight
Application-Specific Ontologies
IDO-Staph: Introduction
• Initial Release Candidate:
http://purl.obolibrary.org/obo/ido/sa.owl
• Google Code Page: http://code.google.com/p/ido-staph/
• Scope
• Entities specific to Staphylococcus aureus (Sa) infectious diseases at
multiple granularities
• Biological and clinical terms describing host-Sa interactions
• An IDO extension ontology
• Extends IDO-Core, OGMS
• BFO as an upper ontology
• Built on OBO Foundry principles
• Applications
• Duke Staph aureus Bacteremia Group data annotation
• Lattice of infectious diseases
Sa Organism: Parts and Products
• Molecular Entities: Toxins, Invasins, Adhesins
from Shetty, Tang, and Andrews, 2009
Source: http://textbookofbacteriology.net/themicrobialworld/staph.html
Toxic Shock Syndrome
• Staphylococcal TSS is a ido:‘infectious disease’
• has_material_basis SOME
(Sa infectious disorder AND (has_part SOME TSST))
• TSST is a pr:protein
• has_disposition SOME ‘exotoxin disposition’ [INF: is an exotoxin]
• tstH is a so:gene
• has_gene_product SOME TSST
• part_of SOME (SaPI2 OR SaPI3)
• SaPI2 is a so:‘pathogenic island’
• SaPI3 is a so:‘pathogenic island’
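Chaining these axioms answers questions no single axiom states, e.g. which pathogenicity islands may carry the gene behind the material basis of staphylococcal TSS. A minimal sketch, assuming the axioms are stored as (subject, relation, filler) triples with shortened labels (prefixes such as ido:, pr: and so: dropped) and a hypothetical name for the anonymous material-basis class; it is a hand-written traversal, not OWL reasoning:

```python
# Class-level axioms from the slide as (subject, relation, filler) triples,
# one triple per SOME restriction. A tuple filler stands for a union
# (SaPI2 OR SaPI3). "TSST-bearing Sa disorder" is a hypothetical name for
# the anonymous class (Sa infectious disorder AND (has_part SOME TSST)).
AXIOMS = [
    ("Staphylococcal TSS", "is_a", "infectious disease"),
    ("Staphylococcal TSS", "has_material_basis", "TSST-bearing Sa disorder"),
    ("TSST-bearing Sa disorder", "is_a", "Sa infectious disorder"),
    ("TSST-bearing Sa disorder", "has_part", "TSST"),
    ("TSST", "is_a", "protein"),
    ("TSST", "has_disposition", "exotoxin disposition"),
    ("tstH", "is_a", "gene"),
    ("tstH", "has_gene_product", "TSST"),
    ("tstH", "part_of", ("SaPI2", "SaPI3")),
    ("SaPI2", "is_a", "pathogenic island"),
    ("SaPI3", "is_a", "pathogenic island"),
]


def fillers(subject, relation):
    return [o for s, r, o in AXIOMS if s == subject and r == relation]


def subjects(relation, filler):
    return [s for s, r, o in AXIOMS if r == relation and o == filler]


def candidate_islands(disease):
    """Islands that may carry a gene whose product is part of the disease's material basis."""
    islands = set()
    for basis in fillers(disease, "has_material_basis"):
        for protein in fillers(basis, "has_part"):
            for gene in subjects("has_gene_product", protein):
                for filler in fillers(gene, "part_of"):
                    # a union filler contributes each of its alternatives
                    islands.update(filler if isinstance(filler, tuple) else (filler,))
    return islands
```

candidate_islands("Staphylococcal TSS") yields SaPI2 and SaPI3 as alternatives; keeping the union as one filler avoids the wrong reading that tstH is part of both islands at once.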
Sa Diseases: Asserted Hierarchy
• Primary classification of staphylococcal diseases
• These are first and foremost infectious diseases
• Use DOIDs for disease terms
• Assert ido:‘infectious disease’ as a parent term for these
diseases
Sa Diseases: Inferred Hierarchy
• Secondary classification as Sa Infectious Diseases
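The secondary classification is inferred rather than asserted: a disease falls under "Sa infectious disease" because of what its material basis contains. In OWL this would be an equivalence axiom classified by a reasoner; the minimal sketch below hand-codes that one rule, and its disease-to-organism facts are hypothetical, not taken from DOID or IDO:

```python
# Asserted hierarchy: each disease term gets ido:'infectious disease' as parent.
ASSERTED_PARENT = {
    "staphylococcal toxic shock syndrome": "infectious disease",
    "staphylococcal pneumonia": "infectious disease",
    "pneumococcal pneumonia": "infectious disease",
}

# Hypothetical facts: organisms that are part of each disease's material basis.
BASIS_HAS_PART = {
    "staphylococcal toxic shock syndrome": {"Staphylococcus aureus"},
    "staphylococcal pneumonia": {"Staphylococcus aureus"},
    "pneumococcal pneumonia": {"Streptococcus pneumoniae"},
}


def classify(disease):
    """Asserted parent plus any parent inferred from the material basis."""
    parents = {ASSERTED_PARENT[disease]}
    # Hand-coded stand-in for the defined class: an infectious disease whose
    # material basis has_part some Staphylococcus aureus.
    if "Staphylococcus aureus" in BASIS_HAS_PART.get(disease, set()):
        parents.add("Sa infectious disease")
    return parents
```

Only the diseases whose material basis includes Staphylococcus aureus acquire the second parent; the asserted hierarchy is left untouched.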
Ways of differentiating infectious diseases
• High-level types
• By host type (species)
• By anatomical site of infection
• By signs and symptoms
• By mode of transmission
• By (sub-)species of pathogen
• Differentiation based on host features
• Clinical phenotype
• Strain (e.g. A/J)
• Gene types (e.g. C5-deficient)
• SNP alleles
• Differentiation based on pathogen features
• By phenotype (e.g. drug resistance)
• By genotype
• By banding patterns (e.g. PFGE)
• By typing of house-keeping genes (e.g. MLST)
• By virulence factor typing (e.g. spa, SCCmec)
• By whole genome?
“Methicillin-Susceptible Staphylococcus aureus Endocarditis Isolates Are Associated with
Clonal Complex 30 Genotype and a Distinct Repertoire of Enterotoxins and Adhesins”
Nienaber et al. 2011 J Infect Dis. 204(5):704-13.
Ways of differentiating Staph aureus
infectious diseases
• Sa Infectious Disease
• By SCCmec type
• By ccr type
• By mec class
• By spa type
• IWG-SCC
• Maintains up-to-date
SCCmec types
• General guidelines for
reporting novel SCCmec
elements
http://www.sccmec.org/Pages/SCC_ClassificationEN.html
SCCmec
(Staphylococcal Cassette Chromosome mec)
• A mobile genetic element in Staphylococcus aureus that
carries the central determinant for broad-spectrum
beta-lactam resistance encoded by the mecA gene and
has the following features:
• (1) carriage of mecA in a mec gene complex,
• (2) carriage of ccr gene(s) (ccrAB or ccrC) in a ccr gene complex,
• (3) integration at a specific site in the staphylococcal chromosome,
referred to as the integration site sequence for SCC (ISS), which
serves as a target for ccr-mediated recombination, and
• (4) the presence of flanking direct repeat sequences containing the
ISS.
Representing SCCmec: IDO-Staph + SO
[Figure: is_a and has_part links among SCCmec, SCCmec Type IV, mec complex, mec complex class B, ccr complex, ccr complex Type 2, mecA, IS1272, ccrA2 and ccrB2, and the SO classes gene group, pathogenic island and insertion sequence gene.]
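Because has_part is transitive in the OBO relations, such a representation lets a query for the parts of SCCmec Type IV return mecA without that being asserted directly. A minimal sketch; the node labels come from the figure, while the specific edges are an assumption based on the standard composition of SCCmec type IV (class B mec complex with mecA and IS1272, type 2 ccr complex with ccrA2 and ccrB2):

```python
# has_part and is_a edges for SCCmec Type IV. Labels come from the figure;
# the exact edges are assumed from the standard composition of type IV.
HAS_PART = {
    "SCCmec Type IV": ["mec complex class B", "ccr complex Type 2"],
    "mec complex class B": ["mecA", "IS1272"],
    "ccr complex Type 2": ["ccrA2", "ccrB2"],
}
IS_A = {
    "SCCmec Type IV": "SCCmec",
    "SCCmec": "pathogenic island",
    "mec complex class B": "mec complex",
    "ccr complex Type 2": "ccr complex",
    "mec complex": "gene group",
    "ccr complex": "gene group",
    "IS1272": "insertion sequence gene",
}


def all_parts(node):
    """Transitive closure of has_part starting from `node`."""
    parts, stack = [], list(HAS_PART.get(node, []))
    while stack:
        part = stack.pop()
        if part not in parts:
            parts.append(part)
            stack.extend(HAS_PART.get(part, []))
    return parts


def superclasses(node):
    """is_a ancestors of `node`, nearest first."""
    chain = []
    while node in IS_A:
        node = IS_A[node]
        chain.append(node)
    return chain
```

all_parts("SCCmec Type IV") reaches mecA, IS1272, ccrA2 and ccrB2 through the two complexes, and superclasses("SCCmec Type IV") climbs to SCCmec and then to the SO class pathogenic island.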
NARSA Isolate Data
• Isolate data from the Network on Antimicrobial Resistance
in Staphylococcus aureus
• CDC Active Bacterial Core surveillance (ABCs) Isolates
Subset
• Known Clinically Associated Strains
• 101 Sa isolates
• Isolate data
• Culture source (e.g. bone/joint)
• Antimicrobial profile (e.g. erythromycin resistant)
• Virulence factors expressed (e.g. TSST-1+)
• PFGE type (e.g. USA300)
• Genomic typing (e.g. MLST type 8, SCCmec type IV)
Building the Lattice
• For each NARSA Isolate we extract
• SCCmec Type (IDO-STAPH)
• TSST +/- (IDO-STAPH)
• PVL +/- (IDO-STAPH)
• Culture Source (FMA)
• Antimicrobial Profile
• Drug (CHEBI)
• Minimum Inhibitory Concentration (OBI)
• CLSI Interpretation of Resistance (IDO)
• Each particular isolate can be part of a particular Staph
aureus infectious disorder.
• Each particular Staph aureus isolate can be the material
basis for a particular Staph aureus infectious disease.
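One simple way to build such a lattice from extracted facets: every combination of facet values becomes a node holding the isolates that share it, and one node lies below another when its constraints include the other's. A minimal sketch with hypothetical isolate IDs, values and facet keys (not NARSA records); real facet values would carry the IDO-Staph, FMA, CHEBI and OBI terms listed above:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical isolate records (placeholder IDs and values, not NARSA data).
ISOLATES = {
    "ISO-1": {"sccmec": "IV", "tsst": False, "pvl": True, "source": "skin"},
    "ISO-2": {"sccmec": "IV", "tsst": True, "pvl": False, "source": "blood"},
    "ISO-3": {"sccmec": "II", "tsst": False, "pvl": False, "source": "blood"},
}


def lattice(isolates, facets):
    """Map each combination of facet values (over every subset of facets) to its isolates."""
    nodes = defaultdict(set)
    for iso_id, record in isolates.items():
        for k in range(len(facets) + 1):
            for subset in combinations(facets, k):
                key = tuple((f, record[f]) for f in subset)
                nodes[key].add(iso_id)
    return dict(nodes)


def is_below(key_a, key_b):
    """Node A lies below node B when A's facet constraints include all of B's."""
    return set(key_b) <= set(key_a)
```

With facets ["sccmec", "tsst"], the empty key is the top node holding all three isolates, and the node for SCCmec IV with TSST positive holds only ISO-2. Each isolate contributes 2^k keys for k facets, which is fine for the handful of facets on the slide.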
Resistance of NRS701 to clindamycin
• resistance_of_Iso_to_D instanceOf resistance_to_D
• Iso has_disposition ‘resistance_of_Iso_to_D’
• Iso_D_MIC instanceOf ‘MIC data item’
• Iso_D_MIC has_measurement_value M
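The pattern above can be generated from data: record the MIC data item, interpret it against breakpoints, and assert the resistance disposition only when the interpretation is "resistant". A minimal sketch that keeps the slide's variables Iso, D and M; the breakpoints are hypothetical placeholders, not CLSI values, which should be taken from the current CLSI tables:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MICDataItem:
    isolate: str            # Iso
    drug: str               # D
    value_ug_per_ml: float  # the measurement value M


# Hypothetical breakpoints in µg/mL: (susceptible if <=, resistant if >=).
# Placeholders only; real values come from the current CLSI tables.
BREAKPOINTS = {"D": (0.5, 4.0)}


def interpret(mic):
    s_max, r_min = BREAKPOINTS[mic.drug]
    if mic.value_ug_per_ml <= s_max:
        return "susceptible"
    if mic.value_ug_per_ml >= r_min:
        return "resistant"
    return "intermediate"


def assertions(mic):
    """Instance-level statements from the slide; the disposition only if resistant."""
    item = f"{mic.isolate}_{mic.drug}_MIC"
    triples = [(item, "instanceOf", "MIC data item"),
               (item, "has_measurement_value", mic.value_ug_per_ml)]
    if interpret(mic) == "resistant":
        disposition = f"resistance_of_{mic.isolate}_to_{mic.drug}"
        triples += [(disposition, "instanceOf", f"resistance_to_{mic.drug}"),
                    (mic.isolate, "has_disposition", disposition)]
    return triples
```

An MIC of 8.0 yields all four statements, while a susceptible value yields only the two about the data item, so the disposition is never asserted without the measurement that supports it.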
Faceted Browser
• http://awqbi.com/LATTICE/narsa-complete.html
• http://purl.obolibrary.org/obo/ido/sa/narsa-isolates.owl
Conclusion
• Good web resources on Staph aureus exist…
• IWG-SCC
• NARSA
• Comprehensive Antibiotic Resistance Database (CARD)
• …but currently in information silos and flat HTML
• Disease specific application ontologies can be induced
from isolate data
• Each such application ontology
• Has a well-defined place in the lattice beneath IDO-Core
• Can be used to make Sa specific genetic-phenotypic assertions.
• We believe an IDO-based lattice of application ontologies
can contribute to a new taxonomy of (infectious) disease.
Acknowledgements
• This work was funded by the National Institutes of Health
through Grant R01 AI 77706-01. Smith’s contributions
were funded through the NIH Roadmap for Medical
Research, Grant U54 HG004028 (National Center for
Biomedical Ontology).
• Duke SABG
• IDO Consortium
• OBO Foundry