The NCI Thesaurus:
A Controlled Vocabulary
Of NCI Functions
Gilberto Fragoso
Center for Bioinformatics
National Cancer Institute
National Institutes of Health (U.S.)
Overview
• Center for Bioinformatics (NCICB)
• Mission
• Supported NCI Activities
• http://ncicb.nci.nih.gov
• NCI Thesaurus
• Challenges & Issues
NCI Center for Bioinformatics
• Support NCI Bioinformatics/Research Activities
Cancer Genome Annotation Project (CGAP)
Clinical Trials
Cancer Molecular Analysis Project (CMAP)
Molecular Analysis of Cancer (Director's Challenge)
Mouse Models of Human Cancer Consortium (MMHCC)
• Cancer Core Infrastructure (caCORE)
Cancer Data Standards Repository (caDSR)
Cancer Bioinformatics Infrastructure Objects (caBIO)
Enterprise Vocabulary Services (EVS)
NCI Center for Bioinformatics
caBIO
• Model of cancer research domain
• Sample caBIO classes
Gene
Protein
Sequence
SNP
Chromosome
Clone
Library
Taxon
Agent
Pathway
Tissue
Organ
Disease
ClinicalTrialProtocol
• Applicable to other biomedical domains
• Released as part of caCORE version 1
• http://ncicb.nci.nih.gov, follow Infrastructure
NCI Center for Bioinformatics
Enterprise Vocabulary Services
• Collection of services and resources that address
NCI's needs for controlled vocabulary
• Main vocabulary products
• NCI Thesaurus
• NCI Metathesaurus
• http://ncimeta.nci.nih.gov
• Specialty vocabularies
• MMHCC (Mouse Models of Human Cancer Consortium)
• CTRM (Core Terminology Reference Model)
NCI Thesaurus
• Reference biomedical vocabulary for the NCI
• Contains all the codes, keywords, and special
purpose vocabulary used in the Institute
• Features
• NCI-only vocabulary sources
• No licensing restrictions
• Description logic-based
• Consistency checks
• Auto-classification
• Primary purpose is to support coding and
retrieval applications
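The description logic features listed above (consistency checks, auto-classification) can be illustrated with a toy sketch. This is not the classifier used for the NCI Thesaurus; it only shows the general idea that a concept can be placed under another when its role restrictions include all of the other's. The concept names and the `DEFINITIONS` table are hypothetical; the role names are taken from the Disease Roles slide. A real description logic classifier also reasons over the hierarchy of the target concepts, which this sketch ignores.

```python
from typing import Dict, FrozenSet, List, Tuple

# A restriction is a (role name, target concept name) pair.
Restriction = Tuple[str, str]

# Hypothetical definitions, for illustration only.
DEFINITIONS: Dict[str, FrozenSet[Restriction]] = {
    "Disease of Lung": frozenset({
        ("Disease has Associated Anatomy", "Lung"),
    }),
    "Metastatic Disease of Lung": frozenset({
        ("Disease has Associated Anatomy", "Lung"),
        ("Disease has Modifier", "Metastatic"),
    }),
}


def subsumes(general: str, specific: str) -> bool:
    """True if every restriction of `general` also holds for `specific`,
    and `specific` has at least one more (proper subset test)."""
    return DEFINITIONS[general] < DEFINITIONS[specific]


def inferred_parents(concept: str) -> List[str]:
    """Concepts whose restrictions are a proper subset of this concept's."""
    return sorted(
        other for other in DEFINITIONS
        if other != concept and subsumes(other, concept)
    )


print(inferred_parents("Metastatic Disease of Lung"))
```

In this sketch the more specific concept is placed under the more general one without an editor asserting that parent link by hand.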
NCI Thesaurus
• Freely available
Online access via browser application and API
• API extensions provided by caBIO
Flat file format
• Terminology and hierarchies only
• FTP site will be advertised on our Web site
DAML+OIL (Future)
Elements of the NCI Thesaurus
Concepts
Identifiers
Name, Code, ID
Kind (Group)
Defined / Primitive tag
Parent Concept
Roles
Properties
Preferred Name
Synonyms
CUI (UMLS)
Semantic Type
Definition
External DB ID
OMIM
LocusID
GenBank
SwissProt
Comments
Editor’s Notes
Design Notes
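The elements listed above can be pictured as a simple record. The following is a minimal sketch, assuming a Python representation that the slides do not describe; the class names, field names, and the example code `X0001` are hypothetical and chosen only to mirror the slide headings (identifiers, kind, defined/primitive tag, parent concept, roles, properties).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Role:
    """A semantic relation from one concept to another."""
    name: str         # e.g. "Gene Found in Organism"
    target_code: str  # code of the concept the role points to


@dataclass
class Concept:
    # Identifiers
    name: str
    code: str
    concept_id: int
    # Kind (group) the concept belongs to, e.g. "Gene"
    kind: str
    # Defined / primitive tag: False means primitive
    is_defined: bool = False
    parent_codes: List[str] = field(default_factory=list)
    roles: List[Role] = field(default_factory=list)
    # Property name -> values, e.g. "Synonyms" -> [...]
    properties: Dict[str, List[str]] = field(default_factory=dict)

    def synonyms(self) -> List[str]:
        """Return the values of the Synonyms property, if any."""
        return self.properties.get("Synonyms", [])


# Hypothetical example; the codes are made up.
oncogene = Concept(
    name="Oncogene",
    code="X0001",
    concept_id=1,
    kind="Gene",
    parent_codes=["X0000"],
    properties={"Preferred Name": ["Oncogene"], "Synonyms": ["Oncogenes"]},
)
print(oncogene.synonyms())
```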
"Kinds" in the NCI Thesaurus
Organism
Chemicals And Drugs
Anatomy
Findings And Disorders
Clinical Or Research Activity
Cancer Science
Protein
Gene
Occupation Or Discipline
Diagnostic And Prognostic Factors
Biological Process
Technique
Properties Or Attributes
NCI
Equipment
Top Level Concepts in Various Kinds
NCI Kind
Business Rules
Conceptual Entities
Funding
Patient or Public Education
Social Concepts
Training and Education
Anatomy Kind
Anatomic Structures and Systems
Anatomic Sites
Anatomic Structure
Body Part, Organ, or Tissue
Body Region
Cell Structure
Embryonic Structures
Extracellular Structure
Macromolecular Structure
Miscellaneous Anatomy Terms
Organ Systems
Top Level Concepts in Various Kinds
Chemicals And Drugs Kind
Drugs and Chemicals
Drugs and Chemicals, Structural Classification
Inorganic Chemicals
Organic Chemicals
Drugs and Chemicals, Functional Classification
Chemical Modifiers
Drug of Abuse
Foods and Food Products
Immunologics
Industrial Products
Pharmacologic Substances
Physiology - Regulatory Factors
Reagents
Top Level Concepts in Various Kinds
Gene Kind
Genes
Cancer Gene
Oncogene
Oncogene, G-Protein
Oncogene, Growth Factor
Oncogene, Transcription Factors
Proto-Oncogene
Susceptibility / Resistance Gene
Tumor Promoter Induced Gene
Tumor Suppressor Gene
Candidate Disease Gene
Reporter Gene
Top Level Concepts in Various Kinds
Findings And Disorders Kind
Diseases, Disorders, and Findings
Findings
Diseases and Disorders
Familial Neoplastic Syndrome
Lymphoproliferative Disorder
Molecular Disease
Neoplasm
Neoplasm by Morphology
Neoplasm by Site
Neoplasm by Special Category
Neoplasm by Disease NEC
Non-Neoplastic Disease, Syndrome, or Condition
Precancerous Condition
Top Level Concepts in Various Kinds
Biological Process Kind
Biological Processes
Cell Processes
Intercellular Processes
Metabolic Processes
Organismal Processes
Pathologic Processes
Physiologic Processes
Population Processes
Viral Functions and Activities
"Roles" in the NCI Thesaurus
• Semantic relations between concepts are
expressed via roles
• We define a small number of roles for the various
kinds
Support classification
Support specific use cases
Reproducible usage by domain experts
• A concept is tagged as "defined" when a specific
set of roles is asserted
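The last bullet above can be sketched as a simple check. The slides do not say which roles make up the "specific set" for each kind, so the `REQUIRED_ROLES` table below is a hypothetical assumption; only the role names themselves come from the role slides.

```python
from typing import Dict, Set

# Hypothetical: roles that must be asserted before a concept of a given
# kind is tagged "defined". The actual sets are not given in the slides.
REQUIRED_ROLES: Dict[str, Set[str]] = {
    "Gene": {"Gene Found in Organism", "Gene Has Function"},
    "Protein": {"Protein Encoded by Gene", "Protein Has Organism Source"},
}


def definition_tag(kind: str, asserted_roles: Set[str]) -> str:
    """Return "defined" if all required roles for the kind are asserted,
    otherwise "primitive". Kinds without a required set stay primitive."""
    required = REQUIRED_ROLES.get(kind)
    if required is not None and required <= asserted_roles:
        return "defined"
    return "primitive"


print(definition_tag("Gene", {"Gene Found in Organism"}))
print(definition_tag("Gene", {"Gene Found in Organism", "Gene Has Function"}))
```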
Roles in the Anatomy Kind
Anatomic Structure is Physical Part of
Anatomic Structure Has Location
Exp Organism Anatomic Structure is Physical Part of
Kinds in the diagram: Anatomy
Disease Roles
Disease has Associated Anatomy
Disease has Associated Cell Type
Disease Metastatic to Site
Exp Org Disease has Associated Exp Org Anatomy
Disease has Modifier
Kinds in the diagram: Findings And Disorders, Anatomy, Properties Or Attributes
Roles in the Chemicals & Drugs Kind
CD has Biochemical Class or Structure
CD is Part of CD
CD has Target Anatomy
CD FDA Approved for Disease
CD Plays Role in Biological Process
CD has Target Protein
CD has Target Organism
CD has Source
Kinds in the diagram: Chemicals And Drugs, Anatomy, Findings And Disorders, Protein, Biological Process, Organism
Roles in the Gene Kind
Gene Found in Organism
Gene in Chromosomal Location
Gene Associated With Disease
Gene is Biomarker Type
Gene Has Function
Kinds in the diagram: Gene, Organism, Findings And Disorders, Anatomy, Biological Process, Diagnostic And Prognostic Factors
Roles in the Protein Kind
Protein is Physical Part Of
Protein Plays Role in Biological Process
Protein Has Chemical Classification
Protein Has Biochemical Function
Protein Encoded by Gene
Protein Has Organism Source
Protein is Biomarker Type
Protein Has Structural Domain or Motif
Protein Has Associated Anatomy
Protein Expressed in Tissue
Protein Malfunction Associated with Disease
Kinds in the diagram: Protein, Biological Process, Gene, Organism, Diagnostic And Prognostic Factors, Findings And Disorders, Anatomy
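The role slides pair each role with a source kind and a target kind. A minimal sketch of how such pairings could be checked is shown below. The `ROLE_SIGNATURES` table and the function name are hypothetical, and the source and target kinds are inferred from the role names (for example, "Protein Encoded by Gene" runs from Protein to Gene); the slides do not give them in tabular form.

```python
from typing import Dict, NamedTuple


class RoleSignature(NamedTuple):
    domain_kind: str  # kind of the concept the role is asserted on
    range_kind: str   # kind of the concept the role points to


# Inferred from the role names; an assumption, not an official table.
ROLE_SIGNATURES: Dict[str, RoleSignature] = {
    "Gene Found in Organism": RoleSignature("Gene", "Organism"),
    "Protein Encoded by Gene": RoleSignature("Protein", "Gene"),
    "CD has Target Protein": RoleSignature("Chemicals And Drugs", "Protein"),
    "Disease has Associated Anatomy": RoleSignature(
        "Findings And Disorders", "Anatomy"
    ),
}


def check_assertion(role: str, source_kind: str, target_kind: str) -> bool:
    """True if the role is used between the kinds it is declared for.
    Raises KeyError for a role that is not in the table."""
    signature = ROLE_SIGNATURES[role]
    return (
        signature.domain_kind == source_kind
        and signature.range_kind == target_kind
    )


print(check_assertion("Gene Found in Organism", "Gene", "Organism"))
print(check_assertion("Protein Encoded by Gene", "Gene", "Protein"))
```

A check of this shape is one way to keep role usage reproducible across editors, since a role asserted between the wrong kinds is rejected.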
Issues
• Increasing number of non-cancer genes
• Hierarchies for multiple organisms
Unified, or one subtype per organism?
• Tangled hierarchies
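A tangled hierarchy is one in which a concept can have more than one parent, so the structure is a directed graph and not a tree. The sketch below shows how ancestors might be collected in that case. The `PARENTS` table is hypothetical: the "Neoplasm" concept names appear on the Findings And Disorders slide, but the parent links and the "Lung Carcinoma" entry are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical parent links; one concept has two parents.
PARENTS: Dict[str, List[str]] = {
    "Neoplasm": [],
    "Neoplasm by Site": ["Neoplasm"],
    "Neoplasm by Morphology": ["Neoplasm"],
    "Lung Carcinoma": ["Neoplasm by Site", "Neoplasm by Morphology"],
}


def ancestors(concept: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every ancestor of a concept, visiting each one once even
    when it is reachable through several parent paths."""
    seen: Set[str] = set()
    stack = list(parents.get(concept, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("Lung Carcinoma", PARENTS)))
```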
Acknowledgements
NCICB
Frank Hartel
Sherri de Coronado
James Oberthaler
Gilberto Fragoso
Peter Covitz
Ken Buetow
Office of Communications
Larry Wright
Margaret Haber
Contractors
Kevric, Inc.
Apelon, Inc.