Disease Informatics: Brush up the terms describing techniques and
R. P. Deolankar
Half knowledge is always dangerous
Wet lab
A laboratory for hands-on scientific research,
equipped with:
Appropriate plumbing
Ventilation
Equipment
High-throughput technology
Technology that handles a high volume of data or
material
Large-scale methods to purify, identify, and
characterize DNA, RNA, proteins and other molecules.
These methods are usually automated, allowing rapid
analysis of very large numbers of samples.
Microarray
A tool used to sift through and analyze the
information contained within a genome. A microarray
consists of different nucleic acid probes that are
chemically attached to a substrate, which can be a
microchip, a glass slide or a microsphere-sized bead.
DNA microarray
A microarray of immobilized single-stranded DNA
fragments of known nucleotide sequence that is used
especially in the identification and sequencing of DNA
samples and in the analysis of gene expression (as in a
cell or tissue)
Protein microarray
A protein microarray is a piece of glass on which
different protein molecules have been affixed at
separate locations in an ordered manner, thus forming
a microscopic array.
Mass spectrometry
An instrumental method for identifying the chemical
constitution of a substance by means of the separation
of gaseous ions according to their differing mass and
charge -- called also mass spectroscopy
Mass spectrometry: A method used to determine the
masses of atoms or molecules in which an electrical
charge is placed on the molecule and the resulting ions
are separated by their mass-to-charge ratio
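To make "mass-to-charge ratio" concrete, here is a minimal sketch that computes m/z for a molecule that has picked up z protons (an [M+zH]^z+ ion, the kind of multiply charged ion produced, for example, by electrospray ionization). The helper name and the assumption that all charge comes from added protons are illustrative only; other ion types exist.

```python
# Mass of a proton in daltons (Da); assumes the charge comes from added protons
PROTON_MASS_DA = 1.007276


def mz_protonated(neutral_mass_da: float, charge: int) -> float:
    """Return m/z for an [M + zH]^z+ ion (hypothetical helper).

    The ion mass is the neutral mass plus z protons; dividing by the
    charge z gives the mass-to-charge ratio used to separate the ions.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


if __name__ == "__main__":
    # The same hypothetical 1000 Da molecule appears at different m/z
    # values depending on how many charges it carries.
    for z in (1, 2, 3):
        print(z, round(mz_protonated(1000.0, z), 4))
```

This is why one molecule can give several peaks in a spectrum: each charge state lands at a different m/z.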
Tandem mass spectrometry
Multiple steps of mass spectrometry selection, with
some form of fragmentation occurring in between the
stages
Immunofluorescence and immunocytochemistry,
ELISA, immunoblotting
Dry lab
A laboratory for making computer simulations or for
data analysis especially by computers (as in
bioinformatics); also called dry laboratory
Gene prioritization
The results of experimental or computational analyses
in the post-genomic era (e.g., those from microarrays,
proteomics, ChIP-chip, genome-wide in silico
searches, genetic linkages, etc.) often consist of long
lists of candidate genes. Gene prioritization is the
process of assigning a score to each candidate gene
and ranking the genes by that score.
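One simple way to score and rank candidates is a weighted sum of normalized evidence values. This is a toy illustration of the idea, not the scoring scheme of any particular prioritization tool; the gene names, evidence types and weights below are hypothetical.

```python
from typing import Dict, List, Tuple


def prioritize(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each candidate gene as a weighted sum of its evidence values
    and return (gene, score) pairs ranked from best to worst.

    Evidence missing for a gene is treated as 0.
    """
    scored = []
    for gene, evidence in candidates.items():
        score = sum(w * evidence.get(name, 0.0) for name, w in weights.items())
        scored.append((gene, score))
    # Highest score first; ties broken by gene name for a reproducible order
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    # Hypothetical candidates with evidence values scaled to [0, 1]
    candidates = {
        "GENE_A": {"expression": 0.9, "annotation": 0.2},
        "GENE_B": {"expression": 0.4, "annotation": 0.8},
        "GENE_C": {"expression": 0.1, "annotation": 0.1},
    }
    weights = {"expression": 0.5, "annotation": 0.5}
    for rank, (gene, score) in enumerate(prioritize(candidates, weights), start=1):
        print(rank, gene, round(score, 2))
```

Real methods differ mainly in which evidence they use and how they combine it; the ranking step at the end is the common part.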
PhenoGO
PhenoGO is a multiorganism database that adds
phenotypic context, such as cell type, disease, tissue
and organ, to existing associations between gene
products and Gene Ontology (GO) terms as specified
in the Gene Ontology Annotations (GOA).
BioMedLEE
BioMedLEE is an existing Natural Language
Processing (NLP) system that automatically extracts
biological information, consisting of biomolecular
substances and phenotypic data, from text.
MeSH
Medical Subject Headings
MeSH is the National Library of Medicine's controlled
vocabulary thesaurus. It consists of sets of terms
naming descriptors in a hierarchical structure that
permits searching at various levels of specificity.
PhenOS
PhenOS (Phenotype Organizer System) is a system
under development by the Lussier research group with
the purpose of bridging the gap between heterogeneous
biomedical terminologies.
Inparanoid algorithm
The protein interaction networks of two species are
aligned by assigning proteins to sequence homology
clusters using the Inparanoid algorithm
POCUS
Prioritization of candidate genes using statistics
Reference: Turner FS, Clutterbuck DR, Semple CA.
POCUS: mining genomic sequence annotation to
predict disease genes. Genome Biol. 2003;4(11):R75.
OMIM
Online Mendelian Inheritance in Man
A catalog
of human genes and genetic disorders authored and
edited by Dr. Victor A. McKusick and his colleagues at
Johns Hopkins and elsewhere, and provided through
NCBI. The database contains information on disease
phenotypes and genes, including extensive
descriptions, gene names, inheritance patterns, map
locations and gene polymorphisms.
TOM
Transcriptomics of OMIM: a web-based integrated
approach for the identification of candidate disease genes
Reference: Rossi S, Masotti D, Nardini C, Bonora E,
Romeo G, Macii E, Benini L, Volinia S. TOM: a web-based
integrated approach for identification of candidate
disease genes. Nucleic Acids Res. 2006 Jul 1;34
Data mining
Data mining (sometimes called data or knowledge
discovery) is the process of analyzing data from
different perspectives and summarizing it into useful
information
Online Predicted Human Interactions Database (OPHID)
Designed both as a resource for laboratory scientists
to explore known and predicted protein-protein
interactions and to facilitate bioinformatics initiatives
exploring protein interaction networks.
Single nucleotide polymorphisms (SNPs)
A single nucleotide polymorphism (SNP, pronounced
snip) is a DNA sequence variation occurring when a
single nucleotide - A, T, C, or G - in the genome (or
other shared sequence) differs between members of a
species (or between paired chromosomes in an
individual).
Synonymous and nonsynonymous substitutions
Substitutions that result in amino acid replacements
are said to be nonsynonymous while substitutions that
do not cause an amino acid replacement (such as a
GGG to GGC change - both codons still encode
glycine) are said to be synonymous substitutions.
Because of the difference in their effects on the
physiology of the organism, synonymous and
nonsynonymous substitutions can have quite different
dynamics. For example, synonymous substitutions
usually occur at a much faster rate than do
nonsynonymous substitutions. Hence, for coding
sequences it is often desirable to analyze these two
classes of substitution separately.
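Whether a codon change is synonymous can be checked mechanically with the standard genetic code. The sketch below builds the standard codon table (NCBI translation table 1, written in the DNA alphabet with T rather than U) and labels a substitution; the function name is hypothetical. Under this rule a change into or out of a stop codon ("*") counts as nonsynonymous.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons are enumerated with
# bases in the order T, C, A, G at each position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_from: str, codon_to: str) -> str:
    """Label a codon change as synonymous (same amino acid) or nonsynonymous."""
    aa_from = CODON_TABLE[codon_from.upper()]
    aa_to = CODON_TABLE[codon_to.upper()]
    return "synonymous" if aa_from == aa_to else "nonsynonymous"


if __name__ == "__main__":
    print(classify_substitution("GGG", "GGC"))  # both codons encode glycine
    print(classify_substitution("GGG", "AGG"))  # glycine changes to arginine
```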
Ka/Ks values
In genetics, the Ka/Ks ratio or dN/dS ratio is the ratio
of the rate of non-synonymous substitutions (Ka) to
the rate of synonymous substitutions (Ks), which can
be used as an indication of selection on a protein-coding gene.
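Once synonymous and nonsynonymous sites and differences have been counted (for example with the Nei-Gojobori method, which this sketch assumes has already been done), Ka and Ks are commonly estimated as proportions of differing sites with a Jukes-Cantor correction for multiple hits. A ratio above 1 suggests positive selection, below 1 purifying selection, and close to 1 neutral evolution. The function names, the counts and the tolerance used for "approximately neutral" are hypothetical; real analyses test whether the ratio differs significantly from 1.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from precomputed difference and site counts (sites may be fractional)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("Ks is zero, so Ka/Ks is undefined")
    return ka / ks


def interpret(ratio: float, tolerance: float = 0.05) -> str:
    """Rough reading of the ratio; the tolerance is an arbitrary illustration."""
    if ratio > 1.0 + tolerance:
        return "positive selection"
    if ratio < 1.0 - tolerance:
        return "purifying selection"
    return "approximately neutral"


if __name__ == "__main__":
    # Hypothetical counts: 10 of 300 nonsynonymous sites and 20 of 100
    # synonymous sites differ between the two sequences.
    r = ka_ks(10, 300, 20, 100)
    print(round(r, 3), interpret(r))
```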
dbSNP
Database of Single Nucleotide Polymorphisms
A public-domain archive of a broad collection of
single nucleotide polymorphisms (SNPs), hosted at the
National Center for Biotechnology Information
(NCBI).
OrthoDisease
OrthoDisease is a comprehensive database of model
organism genes that are orthologous to human disease
genes
OrthoDisease is constructed primarily using
Inparanoid analysis. Inparanoid is a program that
automatically detects orthologs (or groups of
orthologs) between two species
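Inparanoid starts from pairs of genes that are each other's best match between the two species (reciprocal best hits) and then clusters in-paralogs around them. The sketch below covers only that first reciprocal-best-hit step, not the full Inparanoid algorithm. It assumes similarity scores (for example BLAST bit scores) have already been computed in both directions; the gene names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """For each query gene, return the subject gene with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical human (H*) vs. mouse (M*) similarity scores
    human_vs_mouse = {("H1", "M1"): 250, ("H1", "M2"): 90,
                      ("H2", "M2"): 180, ("H2", "M1"): 60,
                      ("H3", "M1"): 100}
    mouse_vs_human = {("M1", "H1"): 245, ("M1", "H2"): 70,
                      ("M2", "H2"): 175, ("M2", "H1"): 95}
    print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
```

H3's best hit is M1, but M1's best hit is H1, so H3 is not paired; Inparanoid's later steps would decide whether such a gene is an in-paralog.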
Field Biology
Biology of organisms living in their natural
environments
Applications in Ecology and Evolutionary Biology
Epidemiology
Epidemiology is the study of how often diseases occur
in different groups of people and why
Used to plan and evaluate strategies to prevent illness
Used as a guide to the management of patients in whom
disease has already developed
Reference: Epidemiology for the Uninitiated by
Coggon, Rose and Barker
Population at risk
The population at risk is the group of people, healthy
or sick, who would be counted as cases if they had the
disease being studied
It defines the denominator for calculating incidence
and prevalence rates
It is the number of persons potentially capable of
experiencing the event or outcome of interest
Floating numerator
A numerator floating without its denominator
A common error in field investigations
The error occurs when the number of cases is not
related to the “at risk” population
Epidemiological conclusions (on risk) cannot be
drawn from purely clinical data (on the number of sick
people seen)
Target population
It is the population about which the conclusions are to
be drawn
Sometimes measurements can be made on the full
target population; otherwise, study samples are used
Study population and study sample
The group of individuals in a study
In a clinical trial, the participants make up the study
population
The study sample is chosen from the study population
Aetiology
The study of the factors that predispose to or
precipitate the disease
An external agent, a susceptible host, and an
environment that brings the host and agent together
make up the disease aetiology triad
Surveillance
Watching over a population and recording data likely
to have epidemiological significance, usually with the
aim of early detection of disease. Essentially an
interventionist exercise compared with monitoring,
which is passive.
Case
Disease in populations exists as a continuum of
severity rather than as an all or none phenomenon
The real question in population studies is not “has the
person got the disease?” but “How much of the disease
has he or she got?”
The diagnostic continuum is dichotomized into “cases”
and “non-cases” on the basis of statistical, clinical,
prognostic or operational criteria
Hence, case definitions should be precise and
unambiguous.
Epidemiological case definitions are narrower and
more rigid than clinical ones
Incidence
It is the rate at which new cases occur in a population
during a specified period
Incidence = (number of new cases) / [(population at
risk) * (time during which cases were ascertained)]
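The incidence formula can be applied directly. This minimal sketch assumes the population at risk stays roughly constant over the observation period, so that population times time approximates the person-time at risk; the function name and the numbers are hypothetical.

```python
def incidence_rate(new_cases: int, population_at_risk: int, years: float) -> float:
    """New cases per person-year: cases / (population at risk * time).

    Assumes a roughly constant population at risk over the period.
    """
    if population_at_risk <= 0 or years <= 0:
        raise ValueError("population at risk and time must be positive")
    return new_cases / (population_at_risk * years)


if __name__ == "__main__":
    # Hypothetical: 30 new cases among 10,000 people followed for 2 years
    rate = incidence_rate(new_cases=30, population_at_risk=10_000, years=2)
    print(f"{rate * 1000:.1f} new cases per 1,000 person-years")
```

Counting all 30 cases without the denominator would be exactly the “floating numerator” error described earlier.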
Prevalence
Point prevalence
The proportion of a population that are cases at a point
in time
Period prevalence
The proportion of a population that are cases at any
time within a stated period
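The difference between point and period prevalence can be sketched by recording each illness episode as an (onset day, end day) pair, with no end day for an ongoing episode. This assumes each person contributes at most one episode and a fixed population size; the day numbering and data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# One illness episode: (onset_day, end_day); end_day is None if still ongoing
Episode = Tuple[int, Optional[int]]


def is_case_on(episode: Episode, day: int) -> bool:
    """True if the episode is in progress on the given day."""
    onset, end = episode
    return onset <= day and (end is None or day <= end)


def point_prevalence(episodes: Sequence[Episode], population: int, day: int) -> float:
    """Proportion of the population that are cases on a single day."""
    return sum(is_case_on(e, day) for e in episodes) / population


def period_prevalence(episodes: Sequence[Episode], population: int,
                      start: int, stop: int) -> float:
    """Proportion of the population that are cases at any time in [start, stop]."""
    overlapping = sum(
        1 for onset, end in episodes
        if onset <= stop and (end is None or end >= start)
    )
    return overlapping / population


if __name__ == "__main__":
    episodes = [(0, 10), (5, None), (20, 30), (40, 45)]  # hypothetical data
    print(point_prevalence(episodes, population=100, day=25))
    print(period_prevalence(episodes, population=100, start=0, stop=30))
```

Period prevalence is at least as large as the point prevalence on any day inside the period, because it also counts episodes that started or ended within it.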
Attributable risk and relative risk
Attributable risk is the difference between the disease
rate in exposed persons and that in people who are
unexposed
Relative risk is the ratio of the disease rate in exposed
persons to that in people who are unexposed
Attributable risk = rate of disease in unexposed
persons * (relative risk – 1)
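Both measures, and the identity linking them, can be checked numerically. In this sketch the rates are hypothetical and expressed per 1,000 person-years; the function names are illustrative.

```python
import math


def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of the disease rate in exposed to that in unexposed persons."""
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Difference between the disease rates in exposed and unexposed persons."""
    return rate_exposed - rate_unexposed


if __name__ == "__main__":
    exposed, unexposed = 30.0, 10.0  # hypothetical rates per 1,000 person-years
    rr = relative_risk(exposed, unexposed)
    ar = attributable_risk(exposed, unexposed)
    # Attributable risk = rate in unexposed * (relative risk - 1)
    assert math.isclose(ar, unexposed * (rr - 1))
    print(f"RR = {rr}, AR = {ar} per 1,000 person-years")
```

The relative risk says how many times more common the disease is among the exposed; the attributable risk says how many extra cases per unit of person-time are associated with the exposure.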
Confounding
Confusion about causation that arises when the effect
of an exposure is mixed with the effect of another
variable (a confounder) that is associated with both
the exposure and the disease
Confounding may give rise to spurious associations
when in fact there is no causal relation, or, at the other
extreme, it may obscure the effects of a true cause
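A spurious association caused by confounding can be shown with stratified counts. In the hypothetical data below, age is the confounder: older people are both more often exposed and more often diseased. Within each age stratum the relative risk is exactly 1, yet the crude (unstratified) relative risk is close to 3. The counts and names are invented for illustration.

```python
from typing import Dict, Tuple

# (exposed cases, exposed persons, unexposed cases, unexposed persons)
Counts = Tuple[int, int, int, int]

# Hypothetical counts, stratified by the confounder (age group)
STRATA: Dict[str, Counts] = {
    "younger": (1, 100, 4, 400),
    "older": (40, 400, 10, 100),
}


def rr_from_counts(exp_cases: int, exp_n: int, unexp_cases: int, unexp_n: int) -> float:
    """Relative risk: disease rate in exposed / disease rate in unexposed."""
    return (exp_cases / exp_n) / (unexp_cases / unexp_n)


def stratum_rrs(strata: Dict[str, Counts]) -> Dict[str, float]:
    """Relative risk computed separately within each stratum."""
    return {name: rr_from_counts(*counts) for name, counts in strata.items()}


def crude_rr(strata: Dict[str, Counts]) -> float:
    """Relative risk after pooling all strata, ignoring the confounder."""
    totals = [sum(counts[i] for counts in strata.values()) for i in range(4)]
    return rr_from_counts(*totals)


if __name__ == "__main__":
    print("stratum-specific:", stratum_rrs(STRATA))
    print("crude:", round(crude_rr(STRATA), 2))
```

Comparing crude and stratum-specific estimates like this is one common way to detect confounding by a measured variable.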
Bias
Bias is the deviation of inferences from the truth
Selection bias is the biased selection of individuals
into the study
Information bias is the biased collection or biased
analysis of the data
The motto of the epidemiologist could well be “dirty
hands but a clean mind” (manus sordidae, mens pura)
Chance
A measure of how likely it is that some event will occur
Random, unpredictable influences on events
The association between the exposure and disease is
considered to be “statistically significant” if the
p-value (the probability of obtaining a test statistic at
least as extreme as the one observed, if there were no
true association) is < 0.05
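As one illustration, a p-value for a 2x2 exposure-by-disease table can be obtained from Pearson's chi-square test; the source does not name a specific test, so this choice is an assumption. With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)), which Python's standard `math` module provides. The chi-square approximation is unreliable when expected cell counts are small (a common rule of thumb is below 5), where Fisher's exact test is usually preferred. The counts are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table

                   disease   no disease
        exposed       a          b
        unexposed     c          d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = chi_square_2x2(41, 459, 14, 486)  # hypothetical counts
    verdict = "statistically significant" if p < 0.05 else "not significant"
    print(f"chi2 = {chi2:.2f}, p = {p:.5f}: {verdict} at the 0.05 level")
```

A small p-value only says that chance alone is an unlikely explanation; it does not rule out bias or confounding.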
Sensitivity
The proportion of persons with the disease who are
correctly identified by defined criteria
The proportion of persons with the disease who are
correctly identified by a screening test
The ability of a system to detect epidemics and other
changes in disease occurrence
A sensitive test detects a high proportion of the true
cases
Specificity
The proportion of persons without a disease who are
correctly identified by a test
The number of true negative results divided by the
total number of all those without the disease
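Sensitivity and specificity both come from the same four counts of a screening test compared with true disease status. This minimal sketch tallies those counts from (test result, disease status) pairs and computes both measures; the function names and data are hypothetical.

```python
from typing import Iterable, Tuple


def confusion_counts(results: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, int, int]:
    """Tally (test_positive, has_disease) pairs into (TP, FP, TN, FN)."""
    tp = fp = tn = fn = 0
    for test_positive, diseased in results:
        if diseased and test_positive:
            tp += 1
        elif diseased:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of people with the disease whom the test correctly identifies."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives divided by everyone without the disease."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical screening results: 100 diseased, 900 disease-free people
    results = ([(True, True)] * 90 + [(False, True)] * 10
               + [(False, False)] * 855 + [(True, False)] * 45)
    tp, fp, tn, fn = confusion_counts(results)
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))
```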
Randomization
Randomization is used to allocate individuals to
groups so that the groups are similar; the groups are
then followed over the same period of time
Purpose of randomization: To obtain unbiased
estimates of differences among treatment responses
(means or effects) and to obtain an unbiased estimate
of the random error variation in the experiment
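A minimal allocation sketch: shuffle the participants with a seeded random number generator, then deal them to the groups in turn. This is one simple randomization scheme that keeps group sizes within one of each other; it is not the only scheme, and the group names and seed are hypothetical. Recording the seed makes the allocation reproducible for auditing.

```python
import random
from typing import Dict, Hashable, Iterable, List, Optional, Sequence


def randomize(
    participants: Iterable[Hashable],
    groups: Sequence[str] = ("treatment", "control"),
    seed: Optional[int] = None,
) -> Dict[str, List[Hashable]]:
    """Randomly allocate participants to groups of (nearly) equal size."""
    rng = random.Random(seed)  # separate generator, so the seed fully fixes the result
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation: Dict[str, List[Hashable]] = {g: [] for g in groups}
    # Deal the shuffled participants to the groups in turn
    for i, person in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(person)
    return allocation


if __name__ == "__main__":
    print(randomize([f"P{i:02d}" for i in range(1, 11)], seed=2024))
```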
Replication and Local control
Replication is the repetition of an experiment in order
to test the validity of its conclusion
Local control is blocking or grouping to eliminate or to
control the various sources of variation (error)
Replication and local control are necessary to achieve a
reduction in the random variation among treatment
effects in the experiment
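Local control and replication are often combined in a randomized complete block design: units are grouped into blocks that are alike (for example, the same clinic or field plot), every treatment appears once in each block, and treatments are randomized within each block. Each block is then one replicate. The sketch below assumes each block has exactly one unit per treatment; block, unit and treatment names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


def randomized_block_design(
    blocks: Dict[str, List[str]],
    treatments: Sequence[str],
    seed: Optional[int] = None,
) -> Dict[str, Dict[str, str]]:
    """Assign every treatment once per block, in random order within each block.

    blocks maps a block name to its experimental units; each block must have
    exactly one unit per treatment. Returns block -> {unit: treatment}.
    """
    rng = random.Random(seed)
    plan: Dict[str, Dict[str, str]] = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs exactly {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)  # randomization happens inside the block only
        plan[block] = dict(zip(units, order))
    return plan


if __name__ == "__main__":
    blocks = {"clinic_1": ["u1", "u2", "u3"], "clinic_2": ["u4", "u5", "u6"]}
    for block, assignment in randomized_block_design(blocks, ["A", "B", "C"], seed=7).items():
        print(block, assignment)
```

Because treatments are compared within blocks, differences between blocks do not inflate the random error of the treatment comparison.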
Observational (non-experimental) studies
Person-level unit of observation
1. Longitudinal measurements
   a. Cohort samples
   b. Case-control samples
2. Cross-sectional measurements
Aggregate-level units of observation (ecological studies)
Reference: Epidemiology Kept Simple: An
Introduction to Traditional and Modern
Epidemiology by B. Burt Gerstman
Person-level vs. aggregate-level
A person-level study on smoking might collect
information on each person’s smoking habits, age and
disease status
An aggregate-level study on smoking might collect
information on each region’s per capita cigarette
consumption, age distribution and disease rate
Longitudinal studies
Longitudinal studies are studies in which the sequence
of events in individuals can be delineated over time
In cohort studies, the incidence of disease in exposed
and non-exposed groups is compared
In case-control studies people with disease (cases) and
people without disease (controls) are sampled from
the source population and exposure histories of cases
and controls are compared
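A common way to compare the exposure histories of cases and controls is the odds ratio: the odds of exposure among cases divided by the odds of exposure among controls. This sketch assumes a simple unmatched case-control study summarized as four counts; the function name and counts are hypothetical.

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Exposure odds among cases divided by exposure odds among controls."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("a zero count makes the odds ratio undefined or infinite")
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


if __name__ == "__main__":
    # Hypothetical: 60 of 100 cases and 30 of 100 controls were exposed
    print(odds_ratio(60, 40, 30, 70))
```

Because cases and controls are sampled by disease status, a case-control study cannot estimate incidence directly, which is why exposure odds are compared instead.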
Longitudinal vs. cross-sectional studies
Longitudinal measurements relate exposures and
diseases in individuals at various time references
Cross-sectional measurements are not definitively
time sequenced in individuals
In cross-sectional studies, data are gathered from
samples at one point in time. Since both the outcome
and the exposure variables are measured at the same
time, these studies are weak at showing cause-effect
relationships.
Experimental studies
In experimental studies, the investigator introduces or
removes an exposure in order to observe its influence
on a health outcome. Such allocations may be based
on a chance mechanism (randomized trials) or on other
deliberate mechanisms built into the study’s protocol
(non-randomized trials)
Other disease informatics lectures:
Supercourse: Epidemiology, the Internet and Global Health
Lecture numbers 31981, 30331, 28921, 25381, 25371, and 34011