Basics of Statistics - University of Delaware
Study Design
A study design is a careful advance plan of the
analytic approach needed to answer the research
question under investigation in a scientific way.
The basics of study design:
A carefully formed research question and a clearly stated
outcome measure
Assessing the feasibility of study objectives and
considering alternative research designs
Defining the study population and key concepts in
operational terms
Selecting methods of sampling, data collection, and
analysis appropriate to the study's objectives
Developing realistic budgets and time schedules for each
stage of the research.
Types of Study
Experimental/Interventional: The investigator controls
the assignment of the exposure or of the treatment, e.g. a
randomized controlled trial.
Non-experimental/Observational: The allocation or
assignment of factors is not under the control of the
investigator. For example, in a study of the effect of
smoking, the investigator cannot assign smoking to the
subjects. Instead, the investigator studies the effect by
comparing smokers with a control group and assessing
the cause-and-effect relationship. Some examples are:
Cross-sectional study
Cohort study
Case-control study
Study Design
Randomized controlled trial: Random allocation of different
interventions (or treatments) to subjects, in which one treatment
group serves as a control for determining the efficacy of the other
treatment(s). E.g. a placebo or a standard medication can be used
as the control against which the efficacy of the other
treatment(s) is compared.
Types of control groups:
Placebo control group: Receives a placebo (an inactive substance)
instead of the treatment under study.
Active control group: Receives a standard medication already on the
market. For example, a cancer patient can’t be given a placebo.
Types of randomized controlled trials:
Open trial: Investigator and subject know the full details of the
treatment.
Single-blind trial: The investigator knows which treatment is given
but the subject does not.
Double-blind trial: Neither the investigator nor the subject knows
which treatment is given.
Study Design
Cross-sectional study: A descriptive study of the relationship
between diseases and other factors at one point of time (usually) in
a defined population. This is also known as prevalence study or
survey study.
Cohort study: Subjects who presently have a certain condition
and/or receive a particular treatment are followed over time and
compared with another group who are not affected by the condition
under investigation. Cohort analysis attempts to identify cohort
effects. E.g. recruit a group of smokers and a group of non-smokers,
follow them for a set period of time, and note differences in the
incidence of lung cancer between the groups at the end of this time.
Case-control study: A study that compares two groups of people:
those with the disease or condition under study (cases) and a very
similar group of people who do not have the disease or condition
(controls), and looks back to see whether they had the exposure of
interest. E.g. two groups of people (a lung cancer group and a
non-lung cancer group) are selected and compared for an exposure
(smoking).
Study Design
A Protocol is a document that describes
the background, objective(s), design,
methodology, data collection and
management, variable assessment,
statistical considerations, and
organization of the study.
Basically a protocol is a manuscript that
describes every step from proposal to
completion of the research study.
Basic concepts of clinical trials
A clinical trial is a research study to answer specific questions about
vaccines or new therapies or new ways of using known treatments.
Institutional Review Board (IRB): A committee of physicians,
statisticians, researchers, and others that ensures that a clinical trial is
ethical and that the rights of study participants are protected.
Efficacy is the maximum ability of a drug or treatment to produce a result.
Baseline measurement is the measurement taken just before a
participant starts to receive the experimental treatment being
tested.
Change from baseline measurement is the difference between the
post-baseline and baseline measurements (post-baseline minus baseline).
Percent change from baseline = (Change from baseline / baseline
measurement) x 100
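The two definitions above can be written out as a short Python sketch. The function names and the blood-pressure values are hypothetical, and the sign convention (post-baseline minus baseline) is an assumption, although it is the usual one.

```python
def change_from_baseline(baseline, post):
    """Post-baseline value minus baseline value (assumed sign convention)."""
    return post - baseline


def percent_change_from_baseline(baseline, post):
    """(Change from baseline / baseline measurement) x 100."""
    if baseline == 0:
        raise ValueError("percent change is undefined when the baseline is 0")
    return change_from_baseline(baseline, post) / baseline * 100


# Hypothetical example: systolic blood pressure of 150 mmHg at baseline
# and 135 mmHg at a post-baseline visit.
change = change_from_baseline(150, 135)       # a drop of 15 mmHg
pct = percent_change_from_baseline(150, 135)  # a drop of about 10 percent
```

A negative value means the measurement went down relative to baseline. The percent change is undefined for a baseline of zero, which is why the sketch raises an error in that case.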
Pharmacokinetics (PK) analysis explores what the body does to the drug.
That is, the processes (in a living organism) of absorption, distribution,
metabolism, and excretion of a drug or vaccine. It helps to decide the
duration of doses.
Pharmacodynamic (PD) analysis explores what the drug does to the body,
or to microorganisms within the body.
Basic concepts of clinical trials
Classification of clinical trials by their purposes
Treatment trials: Test experimental treatments, new
combinations of drugs, or new approaches to surgery or
radiation therapy.
Prevention trials: Look for better ways to prevent disease
in people who have never had the disease or to prevent a
disease from returning. These approaches may include
medicines, vitamins, vaccines, minerals, or lifestyle changes.
Diagnostic trials: Conducted to find better tests or
procedures for diagnosing a particular disease or condition.
Screening trials: Test the best way to detect certain
diseases or health conditions.
Quality of Life trials (or Supportive Care trials): Explore
ways to improve comfort and the quality of life for individuals
with a chronic illness.
Study design of a Clinical Trial
Title: Reflects the main research interest.
Background/Rationale of study: Explains the importance of the
study and previous study results.
Study Objectives:
Primary objective(s): Focus on the core research
question(s)
Secondary objectives: Focus on the secondary/optional
research questions
Investigational Plan:
Variable (parameter) selections to achieve the research
objectives. Avoid selection of unnecessary variables
Overall study design and plan description: Brief
description of design and assessments.
Study design of a Clinical Trial
Selection of Study Population:
Inclusion criteria: A set of conditions a subject must meet to be
included in the study. E.g. an adult study will include only subjects
aged 18 or older.
Exclusion criteria: A set of conditions under which a subject
(who met the inclusion criteria) will be excluded from the study. E.g.
protocol violations, non-compliance with the treatment, etc.
Sample size calculation: Based on the effect size and the statistical
power needed to test the main research question. Here are some
useful websites for power and sample size calculation:
Compare means, compare proportions, population survey
Experimental design, survival analysis
Regression/multiple regression
http://www.danielsoper.com/statcalc/calc01.aspx
http://department.obg.cuhk.edu.hk/researchsupport/Sample_size_EstMean.asp
http://hedwig.mgh.harvard.edu/sample_size/size.html
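To make the link between effect size, power, and sample size concrete, here is a minimal Python sketch for one common case: a two-sided comparison of two group means. It assumes equal group sizes, a standardized effect size (difference in means divided by the common standard deviation), and the normal approximation n = 2 (z_{1-alpha/2} + z_{power})^2 / d^2. The function name is hypothetical, and the calculators listed above may use different or more exact methods.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (difference in means / common SD).
    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


n_medium = n_per_group(0.5)  # 63 per group under these assumptions
n_large = n_per_group(1.0)   # 16 per group under these assumptions
```

Halving the effect size roughly quadruples the required sample size, which is why a realistic effect size estimate matters so much at the design stage.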
Study design of a Clinical Trial
Description of the treatment groups/ treatment
administration/ treatment period
Randomization of the treatment to the subjects
Detailed descriptions of assessment/collection of all
parameters/variables, including signs and symptoms (adverse
events)
Statistical Methods: Detailed descriptions of the statistical
analyses of all variables in the study. A typical clinical trial
may include:
Hypotheses and decision rules
Rules for handling missing values
Interim/final analysis
Subject disposition/summaries, including a summary of
the reasons for early termination
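The randomization step listed above can be carried out in several ways. As an illustration only, the Python sketch below uses permuted-block randomization, one common scheme that keeps the treatment groups balanced as subjects enroll. The slides do not say which scheme to use, and the function name, arm labels, and block size are assumptions.

```python
import random


def block_randomize(n_subjects, arms=("treatment", "control"), block_size=4, seed=None):
    """Assign subjects to arms in shuffled blocks so group sizes stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    assignments = []
    while len(assignments) < n_subjects:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_subjects]


schedule = block_randomize(12, seed=2024)
```

With 12 subjects and blocks of 4, each complete block holds two subjects per arm, so the final schedule has six in each group regardless of the seed.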
Study design of a Clinical Trial
Statistical Methods (continued)
Disease Diagnosis/History: Usually descriptive
statistics are enough
Summary of Medical/Surgical history
Demographics (age, sex, race, BMI, height, weight,
etc.) and baseline characteristics: Usually
descriptive statistics are enough, but these variables
are often used as covariates in efficacy analysis.
Efficacy analysis: Needs some reasonable statistical
analysis to justify the research goal. Researchers
sometimes perform analysis on the change from
baseline and percent change from baseline values
instead of the observed values.
Study design of a Clinical Trial
Statistical Methods (continued)
Safety analysis (if subjects receive medication): Vital
signs (temperature, blood pressure, respiration, pulse,
etc.), ECG/MRI results, laboratory parameters, physical
exams, adverse events, pregnancy tests, etc. Usually a
summary of the observed values and the change from
baseline values is provided.
Prior/concomitant medications: A summary of all
medications taken during the study or immediately
prior to the study (usually not more than one month) is
provided.
Quality of life measurements: Both summary statistics
and reasonable statistical analysis are required.
Pharmacokinetic (PK) & pharmacodynamic (PD)
parameters: Summary statistics are enough in most cases.
Study design of a Clinical Trial
Ethics:
Independent Ethics committee (IEC) or Institutional Review Board
(IRB)
Ethical conduct of the study: Guidelines of Food and Drug
Administration (FDA) and International Conference on
Harmonization (ICH) for good clinical practices and maintaining the
quality of research.
Patient information and consent: A document that describes the
rights and risks of the study participants, and includes details about
the study, such as its purpose, duration, required procedures, and
key contacts. The participant then decides whether or not to sign
the document.
Data collection and management
Storage security
Protection from data loss
Checking inconsistency of the data
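As a rough illustration of checking data for inconsistencies, the Python sketch below applies a few range and logic checks to one subject record. The field names, allowed codes, and limits are all hypothetical. A real study would take them from its protocol and data management plan.

```python
from datetime import date


def find_inconsistencies(record):
    """Return a list of human-readable problems found in one subject record."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("age is missing")
    elif not 18 <= age <= 120:  # hypothetical adult-study range
        problems.append(f"age {age} is outside the allowed range 18-120")
    if record.get("sex") not in {"F", "M"}:  # hypothetical coding scheme
        problems.append(f"unexpected sex code {record.get('sex')!r}")
    consent = record.get("consent_date")
    first_dose = record.get("first_dose_date")
    if consent and first_dose and first_dose < consent:
        problems.append("first dose is dated before informed consent")
    return problems


clean = {"age": 42, "sex": "F",
         "consent_date": date(2010, 3, 1), "first_dose_date": date(2010, 3, 8)}
suspect = {"age": 17, "sex": "X",
           "consent_date": date(2010, 3, 9), "first_dose_date": date(2010, 3, 8)}

clean_problems = find_inconsistencies(clean)      # no problems found
suspect_problems = find_inconsistencies(suspect)  # three problems flagged
```

Returning a list of messages, instead of stopping at the first failure, lets a data manager review every flagged issue for a record in one pass.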
Experimental Design for
Microarray Experiments
Suzanne McCahan, Ph.D.
Molecular Biologist
Microarray Research
Should be Hypothesis Driven
Test a specific statement
Ask a specific question
Involves data mining
Often generates new hypotheses
Microarrays can measure…
Gene Expression
Chromatin Structure
Methylation of Cytosine
Histone Binding
Array Comparative Genomic
Hybridization (aCGH)
Amplification of Chromosomal Regions
Deletion of Chromosomal Regions
General Background
The application and type of array to
be used determine what should be
considered when designing
microarray experiments.
A general understanding of
microarrays is needed.
General Characteristics of Microarrays
Microarrays are small.
This is a picture of an
Affymetrix GeneChip.
Image courtesy of Affymetrix.
Microarrays are composed of DNA probes
Probes are attached to (or
synthesized on) a surface.
Oligos – 25 bp
Oligos – 50-70 bp
Cloned or Amplified DNA
PCR – 500 bp
BAC (Bacterial Artificial
Chromosome) – 300 kb
Hybridization
Strands represent:
A probe on an array
Labeled DNA or RNA from a sample (also referred to as the ‘target’)
[Figure: two complementary strands, CTAAGAGC and GATTCTCG, shown unpaired and then base-paired position by position (C:G, T:A, A:T, A:T, G:C, A:T, G:C, C:G).]
Hybridization
No fluorescence
where labeled DNA
(or RNA) does NOT
hybridize to probe.
Fluorescence
where labeled DNA
(or RNA) hybridizes
to probe.
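The base pairing behind hybridization can be illustrated with a small Python sketch using the 8-base example from the slides (CTAAGAGC paired with GATTCTCG). It is a simplification: the strands are compared position by position as they are drawn, strand direction is ignored, and only perfect matches count, whereas real hybridization depends on conditions such as temperature and can tolerate some mismatches. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def hybridizes(probe, target):
    """True if every probe base pairs with the target base drawn opposite it."""
    return len(probe) == len(target) and complement(probe) == target


probe = "CTAAGAGC"
print(complement(probe))              # GATTCTCG
print(hybridizes(probe, "GATTCTCG"))  # True: fluorescence expected at this probe
print(hybridizes(probe, "GATTCTCA"))  # False: last base does not pair
```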
DNA Microarrays
There are many probes on a single
microarray.
The intensity of the fluorescent signal reflects the
amount of target.
Numbers of Probes on Microarrays
Gene Expression
Affymetrix Rat GeneChip has ~300,000 probes
representing ~15,000 genes
Chromatin Structure
Agilent Mouse CpG Island Array has ~100,000
probes
aCGH (Amplification/Deletion)
NimbleGen Human X Chromosome Tiling Array
~385,000 probes
Keep comparisons simple
Two well-defined groups are best, although more can be
used.
Normal vs control
Untreated vs treated (one drug)
Less complex samples are better than complex ones.
Cell lines are the least complex
Blood
RBCs should be removed; hemoglobin mRNA and protein can
interfere
Mononuclear cells are better than total white cells
Solid tissue
Tumor only, no contaminating normal tissue
Muscle only, no contaminating fat
Decrease Variability
Samples should be as similar as possible
If from patients
Exact same tissue
Strict criteria for diagnosis
Only meds to be studied
Same pubertal stage
Handled in a similar manner (immediately on ice)
Same quality of starting material (RNA or DNA)
Hybridization, washing and scanning should be done
by a single person at a single location.
How many samples should be
included in a study?
Many ‘tests’ are done on a single sample.
Each hybridization is expensive. This usually limits
the number of samples that can be included in an
experiment.
With the usual budget, it is not feasible to use
standard statistical tools to determine the number
of samples to be included in a study and analyze
the data.
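A little arithmetic shows why many tests on few samples strain standard methods. The Python sketch below assumes one test per gene, about 15,000 genes (the figure quoted earlier for the rat expression array), and a conventional significance level of 0.05. The Bonferroni correction is shown only as one familiar adjustment. The slides do not name a specific method, and the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives if no gene is truly different."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_genes = 15_000  # roughly the gene count quoted for the rat array
false_hits = expected_false_positives(n_genes)  # about 750 genes by chance alone
per_gene_alpha = bonferroni_threshold(n_genes)  # about 3.3e-06 per gene
```

Reaching such a small per-gene threshold with only a handful of samples per group is rarely possible, which is consistent with the budget constraint described above.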
How many samples should be
included in a study?
The best way to determine sample size is to
do a pilot study to obtain data from a
particular experimental system and do a
power analysis taking the challenges of
microarrays into consideration.
A few publications assess sample size with
public gene expression microarray data sets.
The results vary with the dataset. One estimate
for sample size was 10-12 per group.
When a pilot study is not feasible, a general
rule of thumb is 5 – 10 samples per group.
1-Color Hybridizations
Common format for gene expression arrays
RNA or DNA from each sample is hybridized
to a single array
If there are two groups (control and
treated) with 10 samples each, then
A total of 20 samples will be used
A total of 20 arrays will be used
2-Color Hybridizations
Format for some gene expression, some
aCGH, and all chromatin structure arrays
RNA or DNA from two samples
simultaneously hybridized to a single array
One sample is experimental
Other sample is control
Each of the two samples is labeled with a
different fluorochrome.
One array is needed for each pair of
samples
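The array counts for the two formats follow directly from the rules above. The Python sketch below just restates them. The function names are hypothetical, and the 2-color count assumes a simple paired design with one experimental and one control sample per array.

```python
def arrays_one_color(samples_per_group, n_groups=2):
    """1-color: every sample is hybridized to its own array."""
    return samples_per_group * n_groups


def arrays_two_color(n_pairs):
    """2-color: each experimental/control pair shares one array."""
    return n_pairs


one_color = arrays_one_color(10)  # 20 arrays for 2 groups of 10 samples
two_color = arrays_two_color(10)  # 10 arrays for 10 experimental/control pairs
three_groups = arrays_one_color(5, n_groups=3)  # 15 arrays for 3 groups of 5
```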
aCGH Arrays –
Detection of Amplification/Deletion
Most platforms for aCGH require 2-color
hybridizations
Tumor Studies
Hybridize labeled DNA from tumor and normal
tissue from a single subject (patient/animal)
together.
Genetic Studies
Hybridize labeled DNA from control subject with
DNA from diseased subject
The same control should be used with all
diseased subjects.
The control could be DNA from a single
individual or from a single pool of individuals
Confirmation of results is necessary
Because there is such a disparity between the
number of samples examined and the
number of tests performed (e.g. the levels of
transcripts measured), the results of microarray
experiments must be confirmed.
Methods for confirmation of results
Additional samples that were not examined on
the microarray should be used.
Quantitative PCR methods are often performed.
Confirmation usually involves a small number
of genes (or chromosomal regions, depending
on the type of array that was used).
The combination of a larger sample size and
small number of tests allows standard
statistical methods to be used.
Example of a microarray experiment
Modeled after:
Insight into Pathogenesis of
Antibiotic-Resistant Lyme Arthritis
through Gene Expression Profiling
AnneMare Brescia, MD
Principal Investigator
Hypothesis
There are differences in gene expression
between synovial fibroblasts from individuals
with acute Lyme synovitis and those from
individuals with chronic Lyme synovitis, and
these differences allow for the perpetuation of
inflammation in chronic Lyme synovitis.
Biological material – Cell lines
Collect synovial fluid
Site of disease activity
Prospectively – don’t know whether the case is
acute or chronic Lyme disease at time of
collection
Culture cells from fluid
Primary cells
Adherent cells are selected
Consistent phenotype - no monocytes
Harvest while cells are dividing
At same passage
Before reaching confluence
Synovial fluid from 3 groups
Control
Injured joints
Acute Lyme Disease
Lyme synovitis resolved within 2 months
of initiation of antibiotic therapy
Chronic Lyme Disease
Lyme synovitis persisted for six or more
months despite antibiotic therapy
Experimental Details
Uniform samples
Cell line probably representing single cell
type
Affymetrix GeneChip
Single color hybridization
Human gene expression arrays – 15 chips
Biological replicates
5 samples per group
Overview of Analysis
Determine differential expression
Confirm results with real-time RT-PCR
Determine functional categories that are
represented by differentially expressed genes
Replication?
Recruitment and/or activation of immune system?
Unexpected results can generate additional
hypotheses
The Future
New technologies may replace microarrays
High-throughput sequencing is on the
horizon
Less expensive
Faster
Short sequences (25 nt – 400 nt)
Presents new computing challenges
Experimental design will need to be
adjusted