Basics of Statistics - University of Delaware

Study Design
• A study design is a careful plan, made in advance, of the analytic approach needed to answer the research question under investigation in a scientific way.
• The basics of study design:
  • A carefully formulated research question and a clearly stated outcome measure
  • Assessing the feasibility of the study objectives and considering alternative research designs
  • Defining the study population and key concepts in operational terms
  • Selecting methods of sampling, data collection, and analysis appropriate to the study's objectives
  • Developing realistic budgets and time schedules for each stage of the research
Types of Study
• Experimental/interventional: The investigator controls the assignment of the exposure or treatment, e.g. a randomized controlled trial.
• Non-experimental/observational: The allocation or assignment of factors is not under the investigator's control. For example, in a study of the effect of smoking, an investigator cannot assign smoking to the subjects. Instead, the investigator can study the effect by choosing a control group and examining the relationship between the exposure and the outcome. Examples include:
  • Cross-sectional study
  • Cohort study
  • Case-control study
• Randomized controlled trial: Different interventions (or treatments) are randomly allocated to subjects, and one treatment group serves as a control for determining the efficacy of the other treatment(s). E.g. a placebo or a standard medication can be used as the control against which the efficacy of the other treatment(s) is compared.
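• Types of control groups:
  • Placebo control group: Receives a placebo (an inactive substance) instead of the experimental treatment.
  • Active control group: Receives a standard medication already on the market. For example, a cancer patient can't be given a placebo, so a standard medication must be used.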
• Types of randomized controlled trials:
  • Open trial: Both the investigator and the subject know the full details of the treatment.
  • Single-blind trial: The investigator knows which treatment is given, but the subject does not.
  • Double-blind trial: Neither the investigator nor the subject knows which treatment is given.
• Cross-sectional study: A descriptive study of the relationship between diseases and other factors in a defined population, usually at one point in time. Also known as a prevalence study or survey study.
• Cohort study: Subjects who currently have a certain condition (an exposure, such as smoking) and/or receive a particular treatment are followed over time and compared with another group who do not have that condition. Cohort analysis attempts to identify cohort effects. E.g. recruit a group of smokers and a group of non-smokers, follow them for a set period of time, and compare the incidence of lung cancer between the groups at the end of this period.
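In the smoking example, the incidence in each group is commonly compared as a relative risk: the incidence among the exposed divided by the incidence among the unexposed. A minimal sketch; the function name and the counts are hypothetical and only illustrate the arithmetic.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence (risk) in the exposed group divided by incidence in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 1,000 smokers and 1,000 non-smokers followed for the same period.
rr = relative_risk(exposed_cases=30, exposed_total=1000,
                   unexposed_cases=5, unexposed_total=1000)
print(f"Relative risk: {rr:.1f}")  # 0.030 / 0.005
```

A relative risk of 1 means no difference in incidence; values above 1 mean higher incidence among the exposed.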
• Case-control study: A study that compares two groups of people, those with the disease or condition under study (cases) and a very similar group of people who do not have the disease or condition (controls), and looks back to see whether they had the exposure of interest. E.g. two groups of people (a lung cancer group and a non-lung-cancer group) are selected and compared for an exposure (smoking).
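Because a case-control study selects subjects by disease status, incidence cannot be estimated directly; the usual measure of association is the odds ratio. A minimal sketch with hypothetical counts; the function name is illustrative.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls,
    computed as the cross-product ratio (a*d)/(b*c) of the 2x2 table."""
    a, b = cases_exposed, cases_unexposed
    c, d = controls_exposed, controls_unexposed
    return (a * d) / (b * c)

# Hypothetical: 100 lung cancer cases and 100 controls, classified by smoking history.
or_smoking = odds_ratio(cases_exposed=80, cases_unexposed=20,
                        controls_exposed=40, controls_unexposed=60)
print(f"Odds ratio: {or_smoking:.1f}")  # (80*60) / (20*40) = 6.0
```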
Study Design
• A protocol is a document that describes the background, objective(s), design, methodology, data collection and management, variable assessment, statistical considerations, and organization of the study.
• Basically, a protocol is a manuscript that describes every step of the research study, from proposal to completion.
Basic concepts of clinical trials
• A clinical trial is a research study to answer specific questions about vaccines, new therapies, or new ways of using known treatments.
• Institutional Review Board (IRB): A committee of physicians, statisticians, researchers, and others that ensures that a clinical trial is ethical and that the rights of study participants are protected.
• Efficacy is the maximum ability of a drug or treatment to produce the desired result.
• Baseline measurement is the measurement taken just before a participant starts to receive the experimental treatment being tested.
• Change from baseline is the difference between the post-baseline and baseline measurements (post-baseline minus baseline).
• Percent change from baseline = (change from baseline / baseline measurement) × 100
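The change-from-baseline definitions translate directly into code. A minimal sketch; the function names and the blood-pressure values are hypothetical. Percent change is undefined when the baseline is 0, so the sketch raises an error in that case.

```python
def change_from_baseline(baseline, post_baseline):
    """Post-baseline measurement minus baseline measurement."""
    return post_baseline - baseline

def percent_change_from_baseline(baseline, post_baseline):
    """(Change from baseline / baseline) x 100; undefined for a baseline of 0."""
    if baseline == 0:
        raise ValueError("percent change is undefined when the baseline is 0")
    return change_from_baseline(baseline, post_baseline) * 100 / baseline

# Hypothetical systolic blood pressure (mmHg): 150 at baseline, 135 at week 12.
print(change_from_baseline(150, 135))          # -15
print(percent_change_from_baseline(150, 135))  # -10.0
```

A negative value means the measurement decreased from baseline.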
• Pharmacokinetic (PK) analysis explores what the body does to the drug, that is, the processes (in a living organism) of absorption, distribution, metabolism, and excretion of a drug or vaccine. It helps to determine the dose and dosing schedule.
• Pharmacodynamic (PD) analysis examines the effect of the drug on the body or on microorganisms in the body.
• Classification of clinical trials by their purposes:
  • Treatment trials: Test experimental treatments, new combinations of drugs, or new approaches to surgery or radiation therapy.
  • Prevention trials: Look for better ways to prevent disease in people who have never had the disease or to prevent a disease from returning. These approaches may include medicines, vitamins, vaccines, minerals, or lifestyle changes.
  • Diagnostic trials: Conducted to find better tests or procedures for diagnosing a particular disease or condition.
  • Screening trials: Test the best way to detect certain diseases or health conditions.
  • Quality of life trials (or supportive care trials): Explore ways to improve comfort and the quality of life for individuals with a chronic illness.
Study design of a Clinical Trial
• Title: Reflects the main research interest.
• Background/rationale of the study: Explains the importance of the study and the results of previous studies.
• Study objectives:
  • Primary objective(s): Focus on the core research question(s)
  • Secondary objectives: Focus on secondary/optional research questions
• Investigational plan:
  • Variable (parameter) selection to achieve the research objectives; avoid selecting unnecessary variables
  • Overall study design and plan description: A brief description of the design and assessments
• Selection of study population:
  • Inclusion criteria: A set of conditions a subject must meet to be included in the study. E.g. an adult study will include only subjects aged 18 or older.
  • Exclusion criteria: A set of conditions under which a subject (who met the inclusion criteria) will be excluded from the study. E.g. protocol violations, non-compliance with the treatment, etc.
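Eligibility screening applies inclusion and exclusion criteria as yes/no rules. A minimal sketch, assuming a hypothetical `Subject` record whose fields mirror the examples above (age 18 or more for inclusion; protocol violation or non-compliance for exclusion); a real protocol lists many more criteria.

```python
from dataclasses import dataclass

@dataclass
class Subject:
    # Hypothetical fields chosen to match the slide's examples.
    subject_id: str
    age: int
    protocol_violation: bool = False
    non_compliant: bool = False

def meets_inclusion(s: Subject) -> bool:
    return s.age >= 18  # adult study: age 18 or more

def meets_exclusion(s: Subject) -> bool:
    return s.protocol_violation or s.non_compliant

def eligible(subjects):
    """IDs of subjects who meet the inclusion criteria and no exclusion criterion."""
    return [s.subject_id for s in subjects if meets_inclusion(s) and not meets_exclusion(s)]

subjects = [
    Subject("S01", 34),
    Subject("S02", 16),                      # fails inclusion (under 18)
    Subject("S03", 52, non_compliant=True),  # met inclusion, then excluded
    Subject("S04", 18),
]
print(eligible(subjects))  # ['S01', 'S04']
```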
• Sample size calculation: Based on the effect size and the statistical power needed to test the main research question. Some useful websites for power and sample size calculation (covering comparison of means, comparison of proportions, population surveys, experimental design, survival analysis, and regression/multiple regression):
  http://www.danielsoper.com/statcalc/calc01.aspx
  http://department.obg.cuhk.edu.hk/researchsupport/Sample_size_EstMean.asp
  http://hedwig.mgh.harvard.edu/sample_size/size.html
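For the common case of comparing two group means, a standard normal-approximation formula gives the sample size per group: n = 2 (z_(1-α/2) + z_(1-β))^2 σ^2 / Δ^2, where Δ is the smallest difference in means worth detecting and σ is the common standard deviation. The sketch below assumes equal group sizes, a two-sided test, and a known σ; the inputs are hypothetical. Calculators based on the t distribution give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation, equal group sizes, common SD sigma)."""
    z = NormalDist()                  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)               # round up to whole subjects

# Hypothetical: detect a 5 mmHg difference in blood pressure, SD 10 mmHg.
print(n_per_group(delta=5, sigma=10))  # 63
```

Raising the power or shrinking Δ increases n quickly, because n grows with (z_(1-α/2) + z_(1-β))^2 and with 1/Δ^2.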
• Description of the treatment groups, treatment administration, and treatment period
• Randomization of the treatment to the subjects
• Detailed descriptions of the assessment/collection of all parameters/variables, including signs and symptoms (adverse events)
• Statistical methods: Detailed descriptions of the statistical analyses of all variables in the study. A typical clinical trial may include:
  • Hypotheses and decision rules
  • Rules for handling missing values
  • Interim/final analyses
  • Subject disposition/summaries, including a summary of the reasons for early termination
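Randomization of the treatment to the subjects is often done with permuted blocks, which keep the group sizes balanced as subjects enroll. Permuted-block randomization is one common scheme, not one these slides specify; the block size, arm labels, and seed below are assumptions for illustration.

```python
import random

def permuted_block_schedule(n_subjects, arms=("A", "B"), block_size=4, seed=2024):
    """Allocation list built from randomly permuted blocks.
    Each block contains every arm equally often, so at any point during
    enrollment two arms differ in size by at most block_size / len(arms)."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed makes the schedule reproducible
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = permuted_block_schedule(12)
print(schedule)
print({arm: schedule.count(arm) for arm in ("A", "B")})  # {'A': 6, 'B': 6}
```

A fixed seed lets the allocation list be regenerated and audited; in a blinded trial the list would be kept away from the investigators.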
• Statistical methods (continued):
  • Disease diagnosis/history: Descriptive statistics are usually enough
  • Summary of medical/surgical history
  • Demographics (age, sex, race, BMI, height, weight, etc.) and baseline characteristics: Descriptive statistics are usually enough, but these variables are often used as covariates in the efficacy analysis.
  • Efficacy analysis: Needs an appropriate statistical analysis to justify the research goal. Researchers sometimes analyze the change from baseline and the percent change from baseline instead of the observed values.
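For demographics and baseline characteristics, "descriptive statistics" for a continuous variable typically means n, mean, standard deviation, median, minimum, and maximum, reported by treatment group. A sketch using only Python's standard library; the group names and ages are hypothetical. The same function can summarize change-from-baseline values.

```python
from statistics import mean, median, stdev

def describe(values):
    """Descriptive summary commonly reported for a continuous variable."""
    return {
        "n": len(values),
        "mean": round(mean(values), 1),
        "sd": round(stdev(values), 1),  # sample standard deviation (n - 1 denominator)
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical ages by treatment group.
ages = {"Placebo": [34, 45, 51, 29, 62], "Active": [38, 47, 55, 41, 60]}
for group, values in ages.items():
    print(group, describe(values))
```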
• Statistical methods (continued):
  • Safety analysis (if subjects receive medication): Vital signs (temperature, blood pressure, respiration, pulse, etc.), ECG/MRI results, laboratory parameters, physical exams, adverse events, pregnancy tests, etc. Usually a summary of the observed values and of the change from baseline values is provided.
  • Prior/concomitant medications: A summary of all medications taken during the study or just prior to it (usually not more than one month before) is provided.
  • Quality of life measurements: Both summary statistics and an appropriate statistical analysis are required.
  • Pharmacokinetic (PK) and pharmacodynamic (PD) parameters: Summary statistics are enough in most cases.
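Adverse events are commonly summarized as the number and percentage of subjects in each group with at least one event. A sketch, assuming hypothetical event records stored as (subject_id, group, event_term) tuples and the number of subjects per group supplied separately; a subject reporting several events is counted once.

```python
from collections import defaultdict

def ae_incidence(n_by_group, ae_records):
    """Map each group to (subjects with >= 1 adverse event, percent of the group)."""
    subjects_with_ae = defaultdict(set)
    for subject_id, group, _term in ae_records:
        subjects_with_ae[group].add(subject_id)  # a set counts each subject once
    return {
        group: (len(subjects_with_ae[group]),
                round(100 * len(subjects_with_ae[group]) / n, 1))
        for group, n in n_by_group.items()
    }

# Hypothetical data: 20 subjects per group.
n_by_group = {"Placebo": 20, "Active": 20}
records = [
    ("P03", "Placebo", "Headache"),
    ("P07", "Placebo", "Nausea"),
    ("A02", "Active", "Headache"),
    ("A02", "Active", "Dizziness"),  # same subject, counted once
    ("A11", "Active", "Nausea"),
    ("A15", "Active", "Rash"),
]
print(ae_incidence(n_by_group, records))
```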
• Ethics:
  • Independent Ethics Committee (IEC) or Institutional Review Board (IRB)
  • Ethical conduct of the study: Follow the Food and Drug Administration (FDA) and International Conference on Harmonization (ICH) guidelines for good clinical practice and for maintaining the quality of research.
  • Patient information and consent: A document that describes the rights and risks of the study participants and includes details about the study, such as its purpose, duration, required procedures, and key contacts. The participant then decides whether or not to sign the document.
• Data collection and management:
  • Storage security
  • Protection from data loss
  • Checking the data for inconsistencies
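Checking the data for inconsistencies is usually done with automated edit checks run on each record. A minimal sketch, assuming a record stored as a Python dict with hypothetical field names (`age`, `consent_date`, `first_dose`) and three illustrative checks: a missing value, an out-of-range value, and dates in an impossible order.

```python
from datetime import date

def find_inconsistencies(record):
    """List the problems found in one subject's record (illustrative checks only)."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("age is missing")
    elif not 0 <= age <= 120:
        problems.append(f"age out of range: {age}")
    if record["first_dose"] < record["consent_date"]:
        problems.append("first dose recorded before informed consent")
    return problems

record = {"subject_id": "S07", "age": 210,
          "consent_date": date(2023, 3, 10), "first_dose": date(2023, 3, 2)}
print(find_inconsistencies(record))
```

Flagged records would then be queried back to the study site rather than corrected silently.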
Experimental Design for Microarray Experiments
Suzanne McCahan, Ph.D.
Molecular Biologist
Microarray Research
• Should be hypothesis driven
  • Test a specific statement
  • Ask a specific question
• Involves data mining
  • Often generates new hypotheses
Microarrays can measure…
• Gene Expression
• Chromatin Structure
  • Methylation of Cytosine
  • Histone Binding
• Array Comparative Genomic Hybridization (aCGH)
  • Amplification of Chromosomal Regions
  • Deletion of Chromosomal Regions
General Background
• The application and type of array to be used determine what should be considered when designing microarray experiments.
• A general understanding of microarrays is needed.
General Characteristics of Microarrays
• Microarrays are small.
  [Figure: an Affymetrix GeneChip.]
Image courtesy of Affymetrix.
Microarrays are composed of DNA probes
• Probes are attached to (or synthesized on) a surface.
  • Oligos – 25 bp
  • Oligos – 50-70 bp
  • Cloned or amplified DNA
    • PCR – 500 bp
    • BAC (bacterial artificial chromosome) – 300 kb
Image courtesy of Affymetrix.
Hybridization
• The strands in the figure represent:
  • A probe on an array
  • Labeled DNA or RNA from a sample (also referred to as the 'target')
Image courtesy of Affymetrix.
[Figure: two complementary strands, CTAAGAGC and GATTCTCG, shown separately and then base-paired (C:G, T:A, A:T, A:T, G:C, A:T, G:C, C:G).]
• No fluorescence where labeled DNA (or RNA) does NOT hybridize to the probe.
• Fluorescence where labeled DNA (or RNA) hybridizes to the probe.
Images courtesy of Affymetrix.
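The base-pairing rule behind hybridization (A with T, G with C) can be checked position by position. A simplified sketch, assuming DNA strands that are already aligned base by base as drawn in the figure; it ignores strand orientation, RNA targets (where U pairs with A), and the partial mismatches that real hybridization can tolerate. The function names are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}  # Watson-Crick pairs in DNA

def complement(seq):
    """Base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq)

def hybridizes(probe, target):
    """True only if every aligned position of probe and target forms a pair."""
    return len(probe) == len(target) and complement(probe) == target

probe = "CTAAGAGC"
print(complement(probe))              # GATTCTCG
print(hybridizes(probe, "GATTCTCG"))  # True: bound target, fluorescence
print(hybridizes(probe, "GATTGTCG"))  # False: one mismatch, treated here as no binding
```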
DNA Microarrays
• There are many probes on a single microarray.
• The intensity of the fluorescent signal increases with the amount of target hybridized to a probe.
Image courtesy of Affymetrix.
Numbers of Probes on Microarrays
• Gene Expression
  • Affymetrix Rat GeneChip has ~300,000 probes representing ~15,000 genes
• Chromatin Structure
  • Agilent Mouse CpG Island Array has ~100,000 probes
• aCGH (Amplification/Deletion)
  • NimbleGen Human X Chromosome Tiling Array has ~385,000 probes
Keep comparisons simple
• Two well-defined groups are best, although more can be used.
  • Normal vs control
  • Untreated vs treated (one drug)
• Less complex samples are better than complex ones.
  • Cell lines are the least complex
  • Blood
    • RBCs should be removed; hemoglobin mRNA and protein can interfere
    • Mononuclear cells are better than total white cells
  • Solid tissue
    • Tumor only, no contaminating normal tissue
    • Muscle only, no contaminating fat
Decrease Variability
• Samples should be as similar as possible
  • If from patients:
    • Exact same tissue
    • Strict criteria for diagnosis
    • Taking only the medications to be studied
    • Same pubertal stage
  • Handled in a similar manner (immediately on ice)
  • Same quality of starting material (RNA or DNA)
• Hybridization, washing, and scanning should be done by a single person at a single location.
How many samples should be included in a study?
• Many 'tests' are done on a single sample.
• Each hybridization is expensive, which usually limits the number of samples that can be included in an experiment.
• With the usual budget, it is not feasible to use standard statistical tools to determine the number of samples to include in a study and to analyze the data.
• The best way to determine sample size is to do a pilot study to obtain data from a particular experimental system and then do a power analysis that takes the challenges of microarrays into consideration.
• A few publications have assessed sample size using public gene expression microarray data sets. The results vary with the data set; one estimate of sample size was 10-12 per group.
• When a pilot study is not feasible, a general rule of thumb is 5-10 samples per group.
1-Color Hybridizations
• Common format for gene expression arrays
• RNA or DNA from each sample is hybridized to a single array
• If there are two groups (control and treated) with 10 samples each, then:
  • A total of 20 samples will be used
  • A total of 20 arrays will be used
2-Color Hybridizations
• Format for some gene expression, some aCGH, and all chromatin structure arrays
• RNA or DNA from two samples is simultaneously hybridized to a single array
  • One sample is experimental
  • The other sample is control
• Each of the two samples is labeled with a different fluorochrome.
• One array is needed for each pair of samples
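Because both samples share one array in a 2-color hybridization, results are commonly expressed per probe as the log2 ratio of the two fluorochrome channels. A sketch, assuming the intensities have already been background-corrected and normalized (steps not covered in these slides); the values are hypothetical.

```python
import math

def log2_ratios(experimental, control):
    """Per-probe log2(experimental / control) for one 2-color array.
    Positive: more signal from the experimental sample; negative: more from
    the control; 0: equal signal."""
    return [math.log2(e / c) for e, c in zip(experimental, control)]

# Hypothetical intensities for four probes on one array.
experimental = [800.0, 400.0, 1000.0, 250.0]  # channel for fluorochrome 1
control = [200.0, 400.0, 2000.0, 250.0]       # channel for fluorochrome 2
print(log2_ratios(experimental, control))     # [2.0, 0.0, -1.0, 0.0]
```

On the log2 scale, a 2-fold increase and a 2-fold decrease are symmetric (+1 and -1), which is one reason the ratio is log-transformed.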
aCGH Arrays – Detection of Amplification/Deletion
• Most platforms for aCGH require 2-color hybridizations
• Tumor studies
  • Hybridize labeled DNA from tumor and normal tissue from a single subject (patient/animal) together.
• Genetic studies
  • Hybridize labeled DNA from a control subject with DNA from a diseased subject
  • The same control should be used with all diseased subjects.
  • The control could be DNA from a single individual or from a single pool of individuals
Confirmation of results is necessary
• Because there is such a disparity between the number of samples examined and the number of tests performed (e.g. the levels of transcripts measured), the results of microarray experiments must be confirmed.
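A short worked example of that disparity: if every gene on an array were tested at the conventional 5% significance level and no gene truly differed between groups, the expected number of false positives would be the number of tests times α. The sketch uses the ~15,000 genes quoted earlier for the rat GeneChip; α = 0.05 is an assumption.

```python
def expected_false_positives(n_tests, alpha):
    """Expected count of 'significant' results by chance alone when no true
    differences exist and each test is run at level alpha."""
    return n_tests * alpha

print(expected_false_positives(15_000, 0.05))  # 750.0
```

Hundreds of chance findings from a single experiment is why confirmation on additional samples, with only a few genes tested, is needed.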
Methods for confirmation of results
• Additional samples that were not examined on the microarray should be used.
• Quantitative PCR methods are often used.
• Confirmation usually involves a small number of genes (or chromosomal regions, depending on the type of array that was used).
• The combination of a larger sample size and a small number of tests allows standard statistical methods to be used.
Example of a microarray experiment
• Modeled after: Insight into Pathogenesis of Antibiotic-Resistant Lyme Arthritis through Gene Expression Profiling
  • AnneMare Brescia, MD
  • Principal Investigator
Hypothesis
There are differences in gene expression between synovial fibroblasts from individuals with acute Lyme synovitis and those from individuals with chronic Lyme synovitis, and these differences allow for the perpetuation of inflammation in chronic Lyme synovitis.
Biological material – Cell lines
• Collect synovial fluid
  • Site of disease activity
  • Prospectively: at the time of collection, it is not known whether the case is acute or chronic Lyme disease
• Culture cells from fluid
  • Primary cells
  • Adherent cells are selected
  • Consistent phenotype: no monocytes
  • Harvest while cells are dividing
    • At the same passage
    • Before reaching confluence
Synovial fluid from 3 groups
• Control
  • Injured joints
• Acute Lyme disease
  • Lyme synovitis resolved within 2 months of initiation of antibiotic therapy
• Chronic Lyme disease
  • Lyme synovitis persisted for six or more months despite antibiotic therapy
Experimental Details
• Uniform samples
  • Cell line probably representing a single cell type
• Affymetrix GeneChip
  • Single-color hybridization
  • Human gene expression arrays – 15 chips
• Biological replicates
  • 5 samples per group
Overview of Analysis
• Determine differential expression
• Confirm results with real-time RT-PCR
• Determine functional categories that are represented by differentially expressed genes
  • Replication?
  • Recruitment and/or activation of the immune system?
• Unexpected results can generate additional hypotheses
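A common first pass at determining differential expression is the log2 fold change of mean expression between two groups for each gene. A minimal sketch, assuming expression values that are already normalized and on a linear scale; the gene names and values are hypothetical, with 5 samples per group as in this design. It computes effect sizes only; deciding which genes are differentially expressed would also need a per-gene test that accounts for the many tests performed.

```python
import math
from statistics import mean

def log2_fold_changes(group_a, group_b):
    """Per-gene log2(mean of group_b / mean of group_a).
    Each group maps gene name -> list of normalized expression values."""
    return {gene: math.log2(mean(group_b[gene]) / mean(group_a[gene])) for gene in group_a}

# Hypothetical normalized expression, 5 samples per group.
acute = {"GENE1": [100, 110, 90, 105, 95], "GENE2": [50, 55, 45, 52, 48]}
chronic = {"GENE1": [400, 420, 380, 410, 390], "GENE2": [50, 52, 49, 51, 48]}

for gene, lfc in log2_fold_changes(acute, chronic).items():
    print(f"{gene}: log2 fold change (chronic vs acute) = {lfc:.2f}")
```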
The Future
New technologies may replace microarrays
• High-throughput sequencing is on the horizon
  • Less expensive
  • Faster
  • Short sequences (25-400 nt)
  • Presents new computing challenges
• Experimental design will need to be adjusted