Clinical-Genomics-Atlanta-Shabo-September-2004

HL7 Clinical-Genomics SIG:
A Shared Genotype Model
HL7 V3 Compliant
Amnon Shabo (Shvo)
IBM Research Lab in Haifa
HL7 Clinical-Genomics SIG Facilitator
Atlanta, September 2004
Haifa Research Lab
Current Work
[Diagram: four Clinical-Genomics storyboards (Tissue Typing, BRCA, Cystic Fibrosis, Pharmacogenomics) shown together with the Genotype shared model, the Clinical Statement shared model, and Family History.]
The Genotype CMET
• Represents genomic data in HL7 RIM classes
• Not meant to be a biological model
• Concise and targeted at healthcare use for personalized medicine
• Consists of:
  o A Genotype (entry point)
  o 1..3 alleles
  o Polymorphisms
  o Mutations
  o SNPs
  o Haplotypes
  o DNA sequencing
  o Gene expression
  o Proteomics
  o Phenotypes (clinical data such as diseases, allergies, etc.)
The Genotype CMET (cont.)
• Design principles:
  o Shared model (a reusable component in different use cases)
  o Basic encapsulation of genomic data that might be used in healthcare, regardless of the use case
  o Stemmed from looking for commonalities in specific use cases
  o Presented as the CG SIG DIM (Domain Information Model) in ballots #6 and #8
  o Most of the clones are optional, thus allowing the representation of merely a genotype with a minimum of one allele (a typical use by early adopters)
  o At the same time, allows the use of finer-grained or raw genomic data, thus accommodating more complex use cases such as tissue typing or clinical trials
  o Its use is currently illustrated in four R-MIMs:
    - Tissue Typing
    - Cystic Fibrosis
    - Viral Genotyping
    - Pharmacogenomics
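The minimal use described above (a genotype with at least one and at most three alleles, everything else optional) can be sketched in Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the allele codes are placeholders, and the sketch is not an HL7 artifact or a generated message type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndividualAllele:
    # The model marks the allele code as mandatory; text is optional.
    code: str
    text: Optional[str] = None


@dataclass
class Genotype:
    alleles: List[IndividualAllele]
    code: Optional[str] = None  # e.g. "HETEROZYGOTE"

    def __post_init__(self) -> None:
        # The slides state that a genotype carries 1..3 alleles.
        if not 1 <= len(self.alleles) <= 3:
            raise ValueError(
                f"Genotype needs 1..3 alleles, got {len(self.alleles)}"
            )


def make_genotype(*allele_codes: str) -> Genotype:
    """Hypothetical helper: build a genotype from placeholder allele codes."""
    return Genotype([IndividualAllele(c) for c in allele_codes])
```

A typical early-adopter instance would be `make_genotype("allele-A", "allele-B")`; calling it with zero or four codes raises `ValueError`.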
The Genotype Model
Entry Point:
Genotype
Genotype
HL7 Clinical Genomics SIG
Document:
Subject:
Facilitator:
0..* priorClinicalPhenotype
Individual
Allele (1..3)
sequelTo
Haplotype
ClinicalPhenotype
(POCG_RM000004)
Individual Genotype DIM (to be registered as a CMET)
Genomics Data
Rev:
0.5
Date:
April 24, 2004
Amnon Shabo (Shvo), IBM Research in Haifa
Entry point to the
Clinical-Genomics
Genotype Model
0..* priorClinicalPhenotype
sequelTo
typeCode*: <= SEQL
Haplotype
ClinicalPhenotype
ClinicalPhenotype
typeCode*: <= SEQL
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
componentOf
Allele
Sequence
Genotype
0..* haplotype
Note:
A related allele that is on a
different haplotype, and still
has significant interrelation
with the source allele.
Haplotype
typeCode*: <= COMP
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1] (e.g., HETEROZYGOTE)
text: ED [0..1]
effectiveTime: IVL<TS> [0..1] (the time of genotyping)
component
Note:
The classCode should be OBSGENPOLSNP, which stands for SNP-polymorphism genomic observation, a subtype of OBSGENPOL (polymorphism genomic observation), which is a subtype of OBSGEN (genomic observation).
componentOf
typeCode*: <= COMP
typeCode*: <= SEQL
There must be at least one
IndividualAllele and three
at the most. The typical case
would be an allele pair, one
on the paternal chromosome and
one on the maternal chromosome.
IndividualAllele
pertinentInformation5
sequelTo
Note:
typeCode*: <= COMP
0..* haplotype
0..* priorClinicalPhenotype
The third allele could be present if the patient has three copies of a chromosome, as in Down syndrome.
0..* pertinentIndividualAllele
typeCode*: <= PERT
DeterminantPeptide
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE
(identifier and classification of the
determinant, e.g., Entrez)
text: ED
pertinentInformation2
SNP
Note:
Use methodCode if
you don’t use the
associated method
procedure.
Sequencing
0..* pertinentDeterminantPeptide
typeCode*: <= PERT
1..3 individualAllele
AlleleSequence
SNP
classCode*: <= OBS
0..* pertinentSNP
moodCode*: <= EVN
pertinentInformation1
id: II [0..1]
typeCode*: <= PERT
code: CE CWE [0..1]
(SNP identifier & classification, e.g.
Entrez dbSNP)
text: ED [0..1]
value: BAG<ED> [0..*] (the SNP itself)
methodCode: SET<CE> CWE [0..*]
pertinentInformation
typeCode*: <= PERT
IndividualAllele
0..1 pertinentAlleleSequence
classCode*: <= OBS
moodCode*: <= EVN
code*: CE CWE [1..1]
(allele identifier & classification, e.g. GenBank)
text: ED [0..1]
methodCode: SET<CE> CWE [0..*]
(The method by which the code was determined)
pertinentInformation2
typeCode*: <= PERT
pertinentInformation3
pertinentInformation6
pertinentInformation4
typeCode*: <= PERT
typeCode*: <= PERT
typeCode*: <= PERT
Method
Method
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: [1..1]
(the sequence standard code, e.g.
BSML, GMS)
text: (the annotated sequence)
effectiveTime: [1..1]
value: ED [1..1] (the actual sequence)
methodCode: (the sequencing
method)
classCode*: <= PROC
moodCode*: <= EVN
id: II [0..1]
0..* pertinentMethod
code: CD CWE [0..1] <= ActCode (type of method)
pertinentInformation1
typeCode*: <= PERT
text: ED [0..1]
(free text description of the
method used)
methodCode: SET<CE> CWE [0..*]
outcome
typeCode*: <= OUTC
0..* pertinentMutation
Mutation
0..1 pertinentMutation
Mutation
GeneExpression
0..* pertinentMethod
Method
Mutation
0..1 pertinentGeneExpression
pertinentInformation
typeCode*: <= PERT
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE <= ActCode
(the standard's code, e.g., MAGE-ML identifier)
text:
effectiveTime:
value: ED [1..1] (the actual gene
expression levels)
methodCode:
Polymorphism
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CD CWE [0..1] <= ActCode
text: ED [0..1]
value: ANY [0..1]
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
0..* pertinentPolymorphism
code: CE CWE
(mutation identifier and
classification,
e.g. LOINC MOLECULAR
GENETICS NAMING)
text:
sequelTo
typeCode*: <= SEQL
Constraint: GeneExpression.value
Gene
Expression
Note:
The classCode should be OBSGENPOLMUT, which stands for mutation-polymorphism genomic observation, a subtype of OBSGENPOL (polymorphism genomic observation), which is a subtype of OBSGEN (genomic observation).
Polypeptide
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code*: CE CWE [1..1]
(identifier & classification of the protein, e.g., SwissProt, PDB, PIR, HUPO)
text:
0..* priorClinicalPhenotype
Constrained to a restricted MAGE-ML
content model, specified elsewhere.
Proteomics
0..* outcomePolypeptide
Constraint: AlleleSequence.value
Constrained to a restricted
BSML or GMS content model,
specified elsewhere.
Note:
Usually this is a computed outcome, i.e.,
the lab does not produce the actual protein.
ClinicalPhenotype
Note:
The classCode should be OBSGENPOL, which stands for polymorphism genomic observation, a subtype of OBSGEN (genomic observation).
Note:
Could refine ActRelationship typeCode
to elaborate on different types of genomic
to phenotype effects.
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1] (disease, allergy, sensitivity, ADE, etc.)
text: ED [0..1]
uncertaintyCode: CE CNE [0..1]
value: ANY [0..1]
Note:
An observation of a clinical condition
represented internally in this model.
reference
typeCode*: <= x_ActRelationshipExternalReference
0..* referredToExternalClinicalPhenotype
ExternalClinicalPhenotype
Polymorphism
classCode*: <= OBS
moodCode*: <= EVN
id*: II [1..1]
(the id of an external observation, e.g., in a problem list)
Note:
An external observation is a valid Observation
instance existing in any other HL7-compliant
artifact, e.g., a document or a message.
Clinical
Phenotype
Note: Shadowed observations
are copies of other observations
and thus have all of the original
act attributes.
Coexistence of HL7 Objects and
Bioinformatics Markup
Genomic Data
Sources
Clinical Practice
Knowledge
(KBs, Ontologies, registries,
Evidence-Based, Papers, etc.)
EHR
System
Bubbling up the clinically significant raw
genomic data into specialized HL7 objects and
linking them with clinical data from the patient EHR
Decision Support
Applications
Coexistence of HL7 Objects and
Bioinformatics Markup (cont.)
Genetic Counseling
DNA Lab
Sequencing Example…
EHR
System
Bubbling up the clinically significant SNP data into
HL7 SNP and Mutation objects and
linking them with clinical data from the patient EHR
Decision Support
Applications
Coexistence of HL7 Objects and
Bioinformatics Markup (cont.)
IndividualAllele
classCode*: <= OBS
The patient's
moodCode*: <= EVN
id: II [0..1]
allele
code*: CE CWE [1..1] (allele classification)
text: ED [0..1]
value: ANY [0..1] (e.g. accession no. in GenBank)
methodCode: SET<CE> CWE [0..*] (The method by which the code was determined)
HL7 genomic-specialized objects
Bubbling-up…
Bubbling-up…
AlleleSequence
SNP
SNP
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CD CWE [1..1]
(the sequence standard code, e.g.
BSML, GMS)
text: ED [0..1] (sequence's
annotations)
effectiveTime: GTS [1..1]
value: ED [1..1] (the actual sequence)
methodCode: SET<CE> CWE [0..*]
(the sequencing method)
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
(SNP classification, e.g. from Entrez
dbSNP)
text: ED [0..1]
value: BAG<ED> [0..*] (the SNP itself)
methodCode: SET<CE> CWE [0..*]
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
(SNP classification, e.g. from Entrez
dbSNP)
text: ED [0..1]
value: BAG<ED> [0..*] (the SNP itself)
methodCode: SET<CE> CWE [0..*]
Bubbling-up…
bioinformatics
markup
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
(mutation classification)
text: ED [0..1]
value: ANY [0..1]
(mutation code, e.g. drawn
from LOINC MOLECULAR
GENETICS NAMING)
ClinicalPhenotype
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
(e.g., disease, allergy, sensitivity, ADE, etc.)
text: ED [0..1]
value: ANY [0..1]
Bubbling-up…
Sequencing
data
encapsulated as
Mutation
The Family History Model
FamilyHistory
(COCT_RM999999)
Patient
This model is intended to
be a CMET and has the
capability of representing
any part of the patient
pedigree.
classCode*: <= PAT
id*: SET<II> [1..*]
Note:
First-degree relatives.
FAMMEMB could be used
for unidentified relatives, but
also any of the more specific
codes like PRN (parent) or
NMTH (natural mother).
Note:
Person holds details that are
not specific to the family role.
Person is also the scoper of
the relative roles (for more
details see the V3 RoleCode
vocabulary, domain =
PersonalRelationshipRoleType).
typeCode*: <= SBJ
0..* clinicalGenomicChoice
1..1 patientPerson
Person
classCode*: <= PSN
determinerCode*: <= INSTANCE
id: SET<II> [0..*] (e.g., SSN)
name: BAG<EN> [0..*]
telecom: BAG<TEL> [0..*]
administrativeGenderCode: CE
CWE [0..1]
<= AdministrativeGender
birthTime: TS [0..1]
deceasedInd: BL [0..1] "false"
deceasedTime: TS [0..1]
raceCode: SET<CE> CWE [0..*] <=
Race
ethnicGroupCode: SET<CE> CWE
[0..*] <= Ethnicity
0..1 relationshipHolder
Note:
This should be replaced with the
Clinical-Genomics Genotype model
(as a CMET) to deal with all types
of genomic data.
classCode*: <= PRS
id: SET<II> [0..*] (use this attribute to hold pedigree ID)
code: CE CWE [0..1] <= RoleCode "FAMMEMB"
0..*
Note:
Clinical Genomics choice similar to the
choice associated with the Patient role
(the entry point of this model).
classCode*: <= ACT
moodCode*: <= EVN
code: CD CWE [1..1] <= ActCode
negationInd: BL [0..1]
effectiveTime: GTS [0..1]
confidentialityCode: SET<CE> CWE [0..*] <= Confidentiality
uncertaintyCode: CE CNE [0..1] <= ActUncertainty
classCode*: <= OBS
moodCode*: <= EVN
subjectOf
Note:
A shadow of Person allows for recursive
representation of any higher level
degree of relations, e.g., grandfather,
through the same clone - PersonalRelationship,
nesting in Person.
ClinicalStatement
Genotype
0..* relationshipHolder
PersonalRelationship
Person
ClinicalGenomicChoice
subjectOf1
Genotype
CMET
Note:
Should be replaced with a
generic clinical statement
CMET so it is capable of
holding any pertinent
clinical data of the patient
or his/her relative.
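The recursive idea in the notes above (a Person scopes PersonalRelationship roles, and each role is held by another Person who can scope further roles) can be sketched as a small Python structure. The names below mirror the clone names but are hypothetical illustrations, not generated HL7 classes; the role codes are the ones mentioned in the notes (FAMMEMB, PRN, NMTH).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Person:
    name: str
    # Roles scoped by this person, i.e. this person's relatives.
    relationships: List["PersonalRelationship"] = field(default_factory=list)


@dataclass
class PersonalRelationship:
    code: str       # RoleCode value, e.g. "FAMMEMB", "PRN", "NMTH"
    holder: Person  # the relative who holds this role


def walk_pedigree(person: Person, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (degree, role code, relative name) for every nested relative."""
    for rel in person.relationships:
        yield depth + 1, rel.code, rel.holder.name
        # Recursion through the nested Person gives higher-degree relatives.
        yield from walk_pedigree(rel.holder, depth + 1)
```

With a patient whose mother (`NMTH`) in turn has a parent (`PRN`), `walk_pedigree` yields the mother at degree 1 and the grandparent at degree 2, which is the nesting the Person shadow is meant to allow.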
Family History – Harmonization Proposals
• Age:
  o Age of subject when subject’s diagnosis was made
  o Age at time of death
  o Proposed solution: a new data type to refer to from effectiveTime:
<effectiveTime xsi:type="TSR"> <!-- TSR = Time Stamp Relative -->
  <epoch code="B"/>
  <offset value="20" unit="mo"/>
</effectiveTime>
• Vocabulary proposals
  o Observation Interpretation (Deleterious, Unknown significance, Polymorphism, No mutation)
  o Personal relation codes and qualifiers
• Personal Relationship association names
  o A naming algorithm problem (HL7 tooling issue)
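To make the proposed relative time stamp (TSR) more concrete, here is a hedged Python sketch that reads the `effectiveTime` fragment shown on this slide. Several things are assumptions: TSR is only a proposal, the `xsi` namespace declaration is added so the fragment parses on its own, the epoch code is treated as an opaque string because the slide does not define it, and the unit table covers only the UCUM units `mo` (month) and `a` (year).

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The slide's fragment, with the xsi namespace declared so it is well-formed alone.
FRAGMENT = f"""
<effectiveTime xmlns:xsi="{XSI}" xsi:type="TSR">
  <epoch code="B"/>
  <offset value="20" unit="mo"/>
</effectiveTime>
"""

# Assumption: only month and year offsets are handled in this sketch.
MONTHS_PER_UNIT = {"mo": 1, "a": 12}


def parse_tsr(xml_text: str) -> dict:
    """Return the epoch code and the offset expressed in months."""
    root = ET.fromstring(xml_text)
    if root.get(f"{{{XSI}}}type") != "TSR":
        raise ValueError("not a TSR effectiveTime")
    epoch = root.find("epoch").get("code")
    offset = root.find("offset")
    months = float(offset.get("value")) * MONTHS_PER_UNIT[offset.get("unit")]
    return {"epoch": epoch, "offset_months": months}
```

Calling `parse_tsr(FRAGMENT)` gives the epoch code `"B"` and an offset of 20 months, which is how an age at diagnosis or death could be carried without an absolute date.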
The Genotype Model in Tissue Typing
BMT Tissue Typing
Donor Banks
BMT
Ward
Tissue Typing Observation
Individual1 HLA
TissueTyping Lab
Genotype
Matching
Allele
SNP
Haplotype
Individual2 HLA
How the Genotype fits to Tissue-Typing
Tissue Typing in the context of Bone-Marrow Transplantation:
BMT
Center
BMT unique
Order/Entry
Tissue
Typing
Observation
Donor
Bank
How the Genotype fits to Tissue-Typing
Note:
This module is developed by the Clinical-Genomics SIG.
It will be registered as a CMET, but for now it appears here as
an observation. For details, see the Genotype R-MIM.
All genomic data are encapsulated in this CMET, including
mutations, which are the essence of CF testing, for example.
Genotype CMET
Note:
The no. of genotypes is dependent
on the no. of loci examined in each
HLA class (usually, class I includes
A, B and C antigens and class II
includes the DR antigen family)
classCode*: <= OBS
moodCode*: <= EVN
Class I
Antigens
component
1..* hLA_AntigenGenotype
typeCode*: <= COMP
1..* hLA_AntigenGenotype
component
typeCode*: <= COMP
TissueTypingResultLetter
Class II
Antigens
TTObservation
(UUDD_RMnnnnnn)
Tissue-Typing
Observation
TissueTypingFacility
Single
classCode*: <= ENT
determinerCode*: <= INSTANCE
TT-TestingLab
classCode*: <= QUAL
0..1
Class II Antigens
classCode*: <= OBS
moodCode*: <= EVN
code: CD CWE [0..1] <= ActCode
classCode*: <= OBS
moodCode*: <= EVN
code: CD CWE [0..1] <= ActCode
component2
0..* tissueTypingResultLetter
typeCode*: <= DOC
0..1 class II Antigens
typeCode*: <= COMP
TissueTypingObservation
primaryPerformer TissueTypingObservation
typeCode*: <= PPRF
classCode*: <= PSN
determinerCode*: <= INSTANCE
SubjectChoice
Donor
0..1 participant
subject
typeCode*: <= SBJ
(COCT_MT050000)
component1
documentationOf
0..1 tT-TestingLab
Person
CMET: (PAT)
R_Patient
[universal]
0..1 class I Antigens
typeCode*: <= COMP
0..1 playingPerson
classCode*: <= ROL
classCode*: <= DOCCLIN
moodCode*: <= EVN
code: CE CWE <=
TissueTypingDocumentType
Class I Antigens
Tissue Typing
Observation
The Genotype
model is used
for each HLA
Antigen
classCode*: <= OBS
moodCode*: <= EVN
id:
code: CS CWE <= TissueTypingTestingClass
1..1 priorTissueTypingObservation
sequelTo
typeCode*: <= SEQL
1..1 priorTissueTypingObservation
TissueTypingMatchingObservation
classCode*: <= OBS
sequelTo1
moodCode*: <= EVN
typeCode*: <= SEQL
code: CS CWE <= TissueTypingMatchingClass
text:
component
typeCode*: <= COMP
Note:
TissueTypingMatchingClass should be a new vocabulary in HL7, e.g., 2-haplotype match
TT-Matching
(UUDD_RMnnnnnn)
Description
Constraint: LocusMatching
The number of LocusMatching
Observations is dependent on
the no. of loci examined
in the tissue-typing testing
Note:
TissueTypingLocusMatchingClass
should be a new vocabulary
in HL7 (may use recent NMDP effort)
0..* locusMatching
LocusMatching
classCode*: <= OBS
moodCode*: <= EVN
code: CS CWE <=
TissueTypingLocusMatchingClass
text:
Tissue Typing
Matching
Observation
Tissue Typing Scenario Simulation
• Real case with…
  o A Hutch patient, and
  o sibling and unrelated donor candidates at Hadassah
• Information exchange…
  o is simulated through a series of XML files
  o following the TT storyboard activity diagram, and
  o using the HL7 R-MIMs + Genotype CMET
• Documented in the following document:
  o HL7-Clinical-Genomics-TissueTypingInfoExchangeSimulation.doc
  o Contact Amnon Shabo to get the document
The Genotype Model in Cystic Fibrosis
Provider
EMR System
Entry Point:
Blood Sample
MGS
Report
MLG
Counselor
Patient
ML
Consultant
DNA
Genotype
CMET
Molecular
Genetic lab
The Genotype Model in Viral Genotyping
Report
Patient
Sponsor
Pathogen
Viral DNA Sequencing
Entry Point:
Specimen
Test
Panel
Resistance
Profile
Genotype
CMET
Viral DNA
Regions
DNA
Lab
The Genotype Model in Pharmacogenomics-Based Clinical Trial & Submission
Data
Analysis
Patient
Report
Pharmacogenomics
testing
CRO
Gene
Selection
Analysis device
CRO
SNP/Hap
Discovery
Genotype
CMET
Data
Validation
Trial
design
Regulator
Sponsor
Genomic
data
Submission
Constrained-BSML Schema
• BSML: Bioinformatics Sequence Markup Language
• Aimed at any biological sequence, for example:
  o DNA
  o RNA
  o Protein
• Constraining the BSML DTD to fit healthcare needs
  o Leave out research and display markup
  o Ensure patient identification
• Creating an XML Schema, set up as the content model of an HL7 attribute of type ED
Constrained-MAGE-ML Schema
• Cope with data outside of the XML (referenced)
• Shared issues:
  o Eliminate research and display elements and require the presence of certain elements, for example patient identifiers
  o Require that one and only one patient be the subject of the data, to avoid bringing another patient's data into the HL7 message
  o Require that the data refer to only the one allele with which the encapsulating HL7 object is associated
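The "one and only one patient" requirement above is the kind of rule the constrained schemas are meant to enforce. As a rough illustration, the Python sketch below checks it on an arbitrary XML payload. The element name `Patient` and its `id` attribute are hypothetical placeholders, not actual BSML or MAGE-ML element names, and a real deployment would express the rule in the constrained XML Schema rather than in application code.

```python
import xml.etree.ElementTree as ET


def check_single_patient(markup: str, patient_tag: str = "Patient") -> str:
    """Return the patient identifier if exactly one identified patient is present.

    patient_tag is a hypothetical element name used only for this sketch.
    """
    root = ET.fromstring(markup)
    patients = list(root.iter(patient_tag))
    if len(patients) != 1:
        raise ValueError(
            f"expected exactly one <{patient_tag}>, found {len(patients)}"
        )
    patient_id = patients[0].get("id")
    if not patient_id:
        raise ValueError("patient identifier is missing")
    return patient_id
```

A payload such as `<Data><Patient id="p1"/></Data>` passes and returns `p1`; a payload with two patient elements, or with a patient lacking an identifier, is rejected before it would be placed in the ED-typed HL7 attribute.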
OBS Specialization Examples
• PublicHealthCase
  o detectionMethodCode :: CE
  o transmissionModeCode :: CE
  o diseaseImportedCode :: CE
• Diagnostic Image
  o subjectOrientationCode :: CE
• The above examples are relatively ‘simple’ considering the uniqueness of the genomic observation attributes
• Proposal: add a genomic specialization to the RIM Observation class
  o Rationale: it has additional attributes that are unique to genomics (LSID, bioinformatics markup, etc.)
Genomic Specializations of Observation
GenomicObservation
LSID
Polymorphism
Gene Expression
Bio Sequence
type
position
length
reference
region
MAGE
BSML
SNP
tagSNP
Mutation
knownAssociatedDiseases
(not the actual phenotype)
New Class Codes Proposal
classCode       Class name
OBSGEN          GenomicObservation
OBSGENPOL       Polymorphism
OBSGENPOLMUT    Mutation
OBSGENPOLSNP    SNP
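The proposed class codes form a small subtype chain under the existing Observation class code (OBS). The Python sketch below encodes that chain as a lookup table so a receiver could test whether a code is a kind of genomic observation. The table and function names are hypothetical, and the codes themselves are proposals from this deck rather than registered HL7 vocabulary.

```python
# Parent of each proposed class code; OBS is the existing Observation class code.
PARENT = {
    "OBSGEN": "OBS",
    "OBSGENPOL": "OBSGEN",
    "OBSGENPOLMUT": "OBSGENPOL",
    "OBSGENPOLSNP": "OBSGENPOL",
}


def ancestors(code: str) -> list:
    """Return the chain of parent codes, nearest first."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


def is_a(code: str, ancestor: str) -> bool:
    """True if code equals ancestor or specializes it."""
    return code == ancestor or ancestor in ancestors(code)
```

For example, `is_a("OBSGENPOLSNP", "OBSGEN")` holds because a SNP is a polymorphism and a polymorphism is a genomic observation, while `is_a("OBSGENPOLMUT", "OBSGENPOLSNP")` does not.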
New Attributes Proposal
• GenomicObservation: LSIDIdentifier
• AlleleSequence: moleculeSequence
  A constrained XML markup based on the BSML markup.
• Polymorphism:
  o type (SNP, Mutation, Other)
  o position (the position of the polymorphism)
  o length (the length of the polymorphism)
  o reference (the base reference for the above attributes)
  o region (when the polymorphism scope is a specific gene region)
• SNP: tagSNP
  A Boolean field indicating whether this SNP is part of a small SNP set that determines a SNP haplotype.
• GeneExpression: expressionLevels
  A constrained XML markup based on the MAGE markup.
• Proteomic clones: TBD.
Proposed HL7 Vocabularies
• Genomics vocabularies:
  o Polymorphism:
    - General types (SNP, Mutation, Sequence Variation)
    - Nucleotide-based types (substitution, insertion, deletion, etc.)
  o Alleles relation (recessive / dominant, homozygote / heterozygote)
  o Genotype-to-phenotype types of effects
  o Genomic observation interpretation (Deleterious, Unknown significance, Polymorphism, No mutation)
  o SequencingMethodCode (example on the next slide)
HL7 Vocabulary Example
SequencingMethodCode:
• SSOPH - Sequence-specific oligonucleotide probe hybridization
• SSP - Sequence-specific primers
• SBT - Sequence-based typing
• RSCA - Reference strand conformation analysis
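A vocabulary such as the proposed SequencingMethodCode is essentially a closed code-to-description table. The Python sketch below shows one way a receiving application might validate a `methodCode` value against it. The function name is hypothetical, and the table holds only the four codes listed on this slide, which is an example and may not be the complete vocabulary.

```python
# The four example codes from the slide; the full vocabulary may be larger.
SEQUENCING_METHOD_CODE = {
    "SSOPH": "Sequence-specific oligonucleotide probe hybridization",
    "SSP": "Sequence-specific primers",
    "SBT": "Sequence-based typing",
    "RSCA": "Reference strand conformation analysis",
}


def describe_method(code: str) -> str:
    """Return the display name for a code, rejecting unknown codes."""
    try:
        return SEQUENCING_METHOD_CODE[code]
    except KeyError:
        raise ValueError(f"unknown SequencingMethodCode: {code!r}") from None
```

Since the attribute is coded CWE (coded with extensions) in the model, a production system would probably accept unknown codes with their original text instead of rejecting them; the strict lookup here is only for illustration.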
Proposed HL7 Vocabularies (cont.)
• Tissue Typing related vocabularies:
  o TissueTypingLocusMatchingClass
  o TissueTypingMatchingClass
  o TissueTypingTestingClass
  o TissueTypingTestingMethod
  o TissueTypingDocumentType
  o TissueTypingOrderClass
  o DonorType (allogeneic, autologous, etc.)
  o Class I & II antigens classification
XML Examples
• Genotype examples:
  o GenotypeSample1.xml
    A genotype of two HLA alleles in the B locus
  o GenotypeSample2.xml
    A genotype of two HLA alleles in the B locus, along with a SNP designation in the first allele
• Tissue Typing Observation examples:
  o TissueTypingObservationSample1.xml
    Consists of a single tissue typing observation of a patient or a donor
  o TissueTypingObservationSample2.xml
    Consists of two tissue typing observations of a patient and a donor, leading to a tissue typing matching observation
• Donor Search examples:
  o TissueTypingDonorBankSample1.xml
    Illustrates an unsolicited message from a BMT Center to a donor bank, sending a patient's tissue typing observation for the purpose of searching for an appropriate donor
Next Steps
• HL7
  o Formal submission of our harmonization proposals
  o Continue with two alternatives until harmonization is resolved
  o Register the Genotype and Family History models as CMETs
  o Hand-craft sample instances (for review and experimental use)
  o Derive a Genetic Testing model from the HL7 Lab SIG models
• Vocabularies
  o HL7: develop
  o External: get HL7 to recognize them
• Constraining bioinformatics markup (continue the effort and include markup in the next ballot)
  o MAGE-ML or MIAME
  o BSML (done)
  o caBIO (?)
Linking to the NCI Rembrandt Model
Use-case driven
modeling, designed with
the HL7-Genotype
model as a starting
point and will eventually
extend the caBio model.
Alternative Genotype Models
Genotype
HL7 Clinical Genomics SIG
A model without
genomic
specializations of
the HL7 RIM
Observation class:
Document:
Subject:
Facilitator:
ClinicalPhenotype
(POCG_RM000004)
Individual Genotype DIM (to be registered as a CMET) - Genomic Attributes as HL7 Clones
Genomics Data
Rev:
0.17
Date:
September 14, 2004
Amnon Shabo (Shvo), IBM Research in Haifa
Entry Point:
Genotype
Constraint: translationalData.value
Constrained to a restricted caBio
content model, specified elsewhere.
Entry point to the
Clinical-Genomics
Genotype Model
0..* causedClinicalPhenotype
causeOf
typeCode*: <= CAUS
0..* causedClinicalPhenotype
causeOf
Haplotype
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
value: ANY [0..1]
ClinicalPhenotype
ClinicalPhenotype
typeCode*: <= CAUS
0..* haplotype
typeCode*: <= COMP
Note:
The presence of this clone indicates that the source SNP clone is a tag SNP (note that it has a DEF mood).
Note:
A related allele that is on a
different haplotype, and still
has significant interrelation
with the source allele.
Haplotype
componentOf
translationalData
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1] (e.g., HETEROZYGOTE)
text: ED [0..1]
effectiveTime: IVL<TS> [0..1] (the time of genotyping)
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CD CWE [0..1] <= ActCode
value: ANY [0..1]
0..* pertinenttranslationalData
pertinentInformation
typeCode*: <= PERT
typeCode*: <= COMP
typeCode*: <= CAUS
DeterminantPeptide
typeCode*: <= COMP
tagSNP
IndividualAllele
classCode*: <= OBS
moodCode*: <= DEF
There must be at least one
IndividualAllele and three
at the most. The typical case
would be an allele pair, one
on the paternal chromosome and
one on the maternal chromosome.
0..* referredToIndividualAllele
reference
typeCode*: <= SUBJ
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1] (classification of the
determinant)
text: ED [0..1]
value: ANY [0..1]
0..* derivedDeterminantPeptide
derivation
Note:
0..1 tagSNP
subject
The third allele could be present if the patient has three copies of a chromosome, as in Down syndrome.
typeCode*: <= REFR
typeCode*: <= DRIV
Note:
Use methodCode if
you don’t use the
associated method
procedure.
1..3 individualAllele
AlleleSequence
SNP
Polymorphism
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
(SNP classification, e.g. from Entrez
dbSNP)
text: ED [0..1]
value: BAG<ED> [0..*] (the SNP itself)
methodCode: SET<CE> CWE [0..*]
IndividualAllele
0..* sNP
subject6
typeCode*: <= SUBJ
sequelTo
subject9
typeCode*: <= SEQL
typeCode*: <= SUBJ
0..1 alleleSequence
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code*: CE CWE [1..1] (allele classification)
text: ED [0..1]
value: ANY [0..1] (e.g. accession no. in GenBank)
methodCode: SET<CE> CWE [0..*] (The method by which the code was determined)
subject7
typeCode*: <= SUBJ
0..* causeDeterminantPeptide
manifestationOf
DeterminantPeptide
typeCode*: <= MFST
subject5
subject4
subject
typeCode*: <= SUBJ
typeCode*: <= SUBJ
typeCode*: <= SUBJ
Note:
These diseases are not the actual
phenotype for the patient, rather they
are the known risks of this mutation.
0..1 polymorphismAttributes
PolymorphismAttributes
0..* mutation
0..1 priorMutation
Mutation
Polymorphism
Attributes
Container
pertinentInformation
typeCode*: <= PERT
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE <= ActCode
(the standard's code, e.g., MAGE-ML identifier)
text:
effectiveTime:
value: ED [1..1] (the actual gene
expression levels)
methodCode:
Constrained to a restricted MAGE-ML
content model, specified elsewhere.
Note:
A container of common
polymorphism attributes.
0..* polymorphism
Polymorphism
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CD CWE [0..1] <= ActCode
text: ED [0..1]
value: ANY [0..1]
subject
typeCode*: <= RISK
classCode*: <= OBS
moodCode*: <= DEF
code: CD CWE [0..1] <= ActCode
text: ED [0..1]
value: ANY [0..1]
Constraint: AlleleSequence.value
Constrained to a restricted
BSML content model,
specified elsewhere.
0..* causePolypeptide
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code*: CE CWE [1..1]
(classification of the protein, e.g.,
SwissProt, PDB, PIR, HUPO)
text: ED [0..1]
value: ANY [0..1]
Note:
This might be a computed outcome, i.e.,
the lab does not provide the actual protein,
but secondary processes populate this
clone with the translational protein.
0..* causedClinicalPhenotype
Note:
An observation of a clinical condition
represented internally in this model.
ClinicalPhenotype
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
(e.g., disease, allergy, sensitivity, ADE, etc.)
text: ED [0..1]
value: ANY [0..1]
reference
component4
component5
component6
component7
typeCode*: <= COMP
typeCode*: <= COMP
typeCode*: <= COMP
typeCode*: <= COMP
0..1 polyPosition
component8
typeCode*: <= x_ActRelationshipExternalReference
typeCode*: <= COMP
0..1 polyReference
PolyType
PolyLength
PolyPosition
PolyReference
PolyRegion
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
value: CE CWE [1..1]
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
value: INT [1..1]
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
value: INT [1..1]
classCode*: <= OBS
moodCode*: <= EVN
id: SET<II> [0..*]
value: ED [0..1]
classCode*: <= OBS
moodCode*: <= EVN
id: SET<II> [0..*]
value: ED [0..1]
Note: A code attribute was not added to any of the polymorphism
attribute clones as this seems to be implicit from the clone name.
typeCode*: <= SUBJ
classCode*: <= PROC
moodCode*: <= EVN
id: II [0..1]
code: CD CWE [0..1] <=
ActCode
(type of method)
text: ED [0..1]
(free text description of the
method used)
methodCode: SET<CE> CWE [0..*]
Polypeptide
Note:
Should refine ActRelationship typeCode
to elaborate on different types of genomic
to phenotype interrelations.
PolymorphismAttributes
classCode*: <= ActContainer
moodCode*: <= EVN
0..1 polyLength
subject
typeCode*: <= MFST
Note:
The classCode should be OBSGENPOLMUT, which stands for mutation-polymorphism genomic observation.
typeCode*: <= CAUS
0..1 polymorphismAttributes
0..1 polyType
0..* method
manifestationOf
0..1 polymorphismAttributes
PolymorphismAttributes
Polymorphism
Attributes
0..* riskKnownAssociatedDiseases
risk
causeOf
typeCode*: <= SUBJ
typeCode*: <= SUBJ
Note:
The classCode should be OBSGENPOL, which stands for polymorphism genomic observation, a subtype of OBSGEN (genomic observation).
Method
knownAssociatedDiseases
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CE CWE [0..1]
(mutation classification)
text: ED [0..1]
value: ANY [0..1]
(mutation code, e.g. drawn
from LOINC MOLECULAR
GENETICS NAMING)
subject
Constraint: GeneExpression.value
classCode*: <= OBS
moodCode*: <= EVN
id: II [0..1]
code: CD CWE [1..1]
(the sequence standard code, e.g.
BSML, GMS)
text: ED [0..1] (sequence's
annotations)
effectiveTime: GTS [1..1]
value: ED [1..1] (the actual sequence)
methodCode: SET<CE> CWE [0..*]
(the sequencing method)
Mutation
0..1 geneExpression
GeneExpression
0..* pertinentMethod
Method
0..* causedClinicalPhenotype
causeOf
component
0..* haplotype
componentOf
Note:
The classCode should be
OBSGENPOLSNP which
stands for
SNP-polymorphism
genomic observation.
Genotype
Polymorphism
Attributes Shadow
assoc. w/ Mutation
0..1 polyRegion
0..* referredToExternalClinicalPhenotype
ExternalClinicalPhenotype
classCode*: <= OBS
moodCode*: <= EVN
id*: II [1..1]
(the id of an external observation, e.g., in a problem list)
Note:
An external observation is a valid Observation
instance existing in any other HL7-compliant
artifact, e.g., a document or a message.
Note: Shadowed observations
are copies of other observations
and thus have all of the original
act attributes as well as all
‘outbound’ associations.
Comments received on the Genotype Model
• Revalidate/collapse the polymorphism hierarchy
  o Add a RIM class “SequenceVariance”
  o Representing all types of polymorphisms
  o Type could be placed in the code attribute
  o ‘position’ and ‘length’ could be parts of a boundary in a RegionOfInterest type of Observation
  o Could represent any bio-sequence (DNA, RNA, protein, etc.)
• Patient data vs. generic knowledge
  o tagSNP, knownAssociatedDiseases and haplotype are a type of knowledge
  o Should they only be referenced (pointing to KBs)?
• Types of relationships between the various Genotype observations: Pertinent, Component, Subject,…?
  o It’s tricky, as it should apply to the observations and not to the observed entities
Comments on the Genotype Model (cont.)
• Distinguishing the encapsulating objects from the bubbled-up ones
  o Associate encapsulated objects with bubbled-up objects, with options: XFRM (transformation), XCRPT (excerpt), SUMM (summary), DRIV (derived from)… what’s best?
• Should the Method object be in DEF mood?
  o Could there be a need to describe a method per patient?
• Is the SNP → Mutation association useful?
  o Changed the association type to XFRM to demonstrate a possible “bubbled-up” association, i.e., a SNP was encountered as a mutation
SLIST Data Type
• Use HL7 data types to represent bio-sequences
• SLIST<CV> (applied to CV = Coded Value) could hold either of the following:
  o ACGTCGGTTCA…
  o Leu-Ala-Met-Gly-Ala-…
Table 37: Components of Sampled Sequence
Name    Type       Description
origin  T          The origin of the list item value scale, i.e., the physical quantity that a zero-digit in the sequence would represent.
scale   T.diff     A ratio-scale quantity that is factored out of the digit sequence.
digits  list<int>  A sequence of raw digits for the sample values. This is typically the raw output of an A/D converter.
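The three components in Table 37 define how a sampled sequence is decoded: each raw digit is multiplied by the scale and shifted by the origin. A minimal Python sketch of that decoding for numeric values follows. The class name is hypothetical, and how the same structure would carry coded values (SLIST<CV>) for nucleotides or amino acids is left open, as it is on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampledSequence:
    origin: float      # value represented by a zero digit
    scale: float       # quantity factored out of the digit sequence
    digits: List[int]  # raw digits, e.g. A/D converter output

    def values(self) -> List[float]:
        # Each sample is reconstructed as origin + scale * digit.
        return [self.origin + self.scale * d for d in self.digits]
```

For instance, `SampledSequence(origin=0.0, scale=0.5, digits=[0, 2, 4]).values()` reconstructs the samples 0.0, 1.0 and 2.0 from the compact digit list.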
Issues with just SequenceVariation…
• SNP:
  o Link to Haplotype is valid only for the SNP type of Polymorphism
  o tagSNP is valid only for SNP
• Mutation:
  o code & value are constrained to LOINC or another medical-oriented taxonomy rather than to an LS taxonomy as in Polymorphism
  o The attribute knownAssociatedDiseases moves to the phenotype choice, so it’s resolved
• The SNP → Mutation association now needs a recursive association within SequenceVariation
  o Technical issue: cannot shadow a choice box
The End…
Thank you…