Summer Internship Presentation - Vanderbilt University Medical


Vanderbilt University Medical
SRC Presentation
Vincent Kokouvi Agboto
Assistant Professor/Director of Biostatistics, Meharry
Medical College
Assistant Professor of Biostatistics, Vanderbilt
University Medical Center
Introduction to Experimental Designs
in Biological and Clinical Settings
1. Introduction
2. Examples of Classical Designs
3. Optimal Experimental Design
4. Other Design Issues
5. Conclusion
1. Introduction
Experiment: Investigation in which the
investigator applies treatments to
experimental units and then observes
the effects of the treatments on the
experimental units by measuring
one or more responses.
1. Introduction
Treatment: Set of conditions applied to
experimental units in an experiment.
Experimental Unit: Physical entity to
which a treatment is randomly assigned
and independently applied.
1. Introduction
Response variable: Characteristic of an
experimental unit that is measured after
treatment and analyzed to assess the
effects of treatments on experimental
units.
Observational Unit: Unit on which a
response variable is measured.
1. Introduction
Experimental design procedure:
Decisions made before data collection.
Basic idea: Appropriate selection of values of
the control variables.
Three fundamental concepts of experimental
design: Randomization, Blocking,
Replication (R. A. Fisher).
1. Introduction
Important stages of experimental research:
- Background of the experiment
- Choice of factors
- Reduction of error
- Choice of model
- Design criterion and size of the design
- Choice of an experimental design
- Conduct of the experiment
- Analysis of the data
1. Introduction
Classical (Standard) Designs
Optimal Experimental Design: Only
alternative when the standard designs
do not provide us with adequate
2. Examples of Classical Designs
Example 1: Soil Moisture and Gene
Expression in maize seedlings.
Example 2: Drug and Feed
Consumption on Gene Expression in
rats.
Example 3: Treatments on Gene
Expression in dairy cattle.
Example 1
Experiment: Effect of three soil moisture levels on
gene expression in maize seedlings.
Total of 36 seedlings were grown in 12 pots with 3
seedlings per pot.
Three soil moisture levels (low, medium, high)
randomly assigned to the 12 pots.
After three weeks, RNA was extracted from the
above-ground tissues of each seedling.
Each of the 36 RNA samples was hybridized to a
microarray slide to measure gene expression.
Example 1 (continued)
Treatment: The three moisture levels.
Experimental unit: Moisture levels were randomly
assigned to the pots, so the pots are the experimental
units. A pot containing 3 seedlings is one experimental
unit.
Observational unit: Gene expression was measured
for each seedling, so the seedlings are the observational units.
Response variable: Each probe on the microarray
slide provides one response variable.
This is a standard experimental design: the completely
randomized design (CRD).
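The randomization step in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the slides do not say how many pots received each moisture level, so the balanced split (4 pots per level), the pot names, and the helper `assign_crd` are all assumptions made for the example.

```python
import random
from collections import Counter

def assign_crd(units, treatments, seed=None):
    """Completely randomized design: shuffle a balanced list of
    treatment labels and pair it with the experimental units."""
    if len(units) % len(treatments) != 0:
        raise ValueError("balanced assignment needs len(units) divisible by len(treatments)")
    reps = len(units) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(units, labels))

# Hypothetical pot labels; the pots are the experimental units.
pots = [f"pot{i}" for i in range(1, 13)]
layout = assign_crd(pots, ["low", "medium", "high"], seed=1)

# Assumed balance: 4 pots per moisture level.
pots_per_level = Counter(layout.values())

# Each seedling inherits the treatment of its pot. Seedlings are
# observational units, not independent replicates of the treatment.
seedlings = {(pot, s): level for pot, level in layout.items() for s in (1, 2, 3)}
```

Because the treatment is applied to whole pots, the replication in this sketch is 4 per level (pots), not 12 (seedlings).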
Example 2
Experiment: Gauge the effects of a drug and feed
consumption on gene expression in rats.
A total of 40 rats were housed in individual cages.
Half of them received a calorie-restricted diet (R); the other
half were provided with access to feeders that were full,
so calorie intake was unrestricted (U).
Within each diet group, four doses of an
experimental drug (1, 2, 3, 4) were assigned to the rats, with 5 rats per
dose within each diet group.
Example 2 (continued)
At the conclusion of the study, gene expression was
measured for each rat using microarrays.
Example 2 (continued)
Treatment (factors): Diet and Drug.
Factor Diet (R, U); Factor Drug (1, 2, 3, 4)
Each combination of diet and drug: Treatment (R1,
R2, R3, R4, U1, U2, U3, U4).
Each rat: Experimental unit/Observational unit.
Response variable: Each probe on the microarray
slide provides one response variable.
This is a full factorial treatment design. It was
used because all possible combinations of diet
and drug were considered.
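The full factorial structure of Example 2 can be written out explicitly. The sketch below enumerates the 2 x 4 treatment combinations and allocates 5 rats to each dose within each diet group. It assumes the doses were allocated at random within a diet group, which the slides do not state, and the rat identifiers and the helper `assign_doses_within_diet` are hypothetical.

```python
import random
from collections import Counter
from itertools import product

diets = ["R", "U"]
doses = [1, 2, 3, 4]

# Full factorial: every diet crossed with every dose.
treatments = [f"{diet}{dose}" for diet, dose in product(diets, doses)]

def assign_doses_within_diet(rats_by_diet, doses, rats_per_dose, seed=None):
    """Within each diet group, shuffle a balanced list of doses over the rats."""
    rng = random.Random(seed)
    assignment = {}
    for diet, rats in rats_by_diet.items():
        labels = [dose for dose in doses for _ in range(rats_per_dose)]
        if len(labels) != len(rats):
            raise ValueError(f"diet group {diet} needs {len(labels)} rats")
        rng.shuffle(labels)
        for rat, dose in zip(rats, labels):
            assignment[rat] = f"{diet}{dose}"
    return assignment

# Hypothetical rat labels: 20 rats per diet group, 40 in total.
rats_by_diet = {
    "R": [f"rat{i}" for i in range(1, 21)],
    "U": [f"rat{i}" for i in range(21, 41)],
}
assignment = assign_doses_within_diet(rats_by_diet, doses, rats_per_dose=5, seed=2)
rats_per_treatment = Counter(assignment.values())
```

Each of the 8 treatment combinations ends up with 5 rats, so every rat is both an experimental unit and an observational unit.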
Example 3
Experiment: Study the effects of 5 treatments
(A, B, C, D, E) on gene expression in dairy
cattle.
A total of 25 GeneChips and a total of 25
cows, located on 5 farms with 5 cows on
each farm, are available for the experiment.
Which of the following designs is better from
a statistical standpoint?
Example 3 (Continued)
Design 1: To reduce variability within treatment
groups, randomly assign the 5 treatments to the 5
farms so all 5 cows on any one farm receive the
same treatment. Measure gene expression using
one GeneChip for each cow.
Design 2: Randomly assign the 5 treatments to the 5
cows within each farm so that all 5 treatments are
represented on each farm. Measure gene
expression using one GeneChip for each cow.
Example 3 (continued)
Design 1
Farm 1: B B B B B
Farm 2: D D D D D
Farm 3: A A A A A
Farm 4: E E E E E
Farm 5: C C C C C
Design 2
Farm 1: A B E D C
Farm 2: E D A C B
Farm 3: C D E A B
Farm 4: A B E C D
Farm 5: C A D B E
Example 3 (continued)
Observational Units: Cows in both designs.
Experimental Units: Farms in Design 1 and Cows in
Design 2.
Design 2: a randomized complete block design
(RCBD) with a group of 5 cows on a farm serving as
a block of experimental units.
Design 1 has no replication because there is only 1
experimental unit for each treatment. Design 2 has 5
replications per treatment.
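The randomization behind a layout like Design 2 can be sketched as follows: treatments are shuffled independently within each farm, so every block contains each treatment exactly once. The cow identifiers and the helper `assign_rcbd` are hypothetical, and the resulting layout will differ from the one shown on the slide because it depends on the random seed.

```python
import random
from collections import Counter

def assign_rcbd(blocks, treatments, seed=None):
    """Randomized complete block design: an independent random
    permutation of all treatments within each block."""
    rng = random.Random(seed)
    layout = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block} must have one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = dict(zip(units, order))
    return layout

# Hypothetical cow labels: 5 farms (blocks) with 5 cows each.
farms = {f"Farm {i}": [f"cow{i}.{j}" for j in range(1, 6)] for i in range(1, 6)}
layout = assign_rcbd(farms, "ABCDE", seed=3)

# Replication: how many cows (experimental units) receive each treatment.
replication = Counter(t for farm in layout.values() for t in farm.values())
```

In this sketch each treatment is replicated 5 times, once per farm, which is what allows treatment differences to be separated from farm differences.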
Example 3 (continued)
Design 2 is by far the better design.
We can compare treatments directly among
cows that share the same environment.
With Design 1, it is impossible to separate
differences in expression due to treatment
effects from differences in expression due to
farm effects.
3. Optimal Experimental Design
3.1. Motivating Example
3.2. Comments on Orthogonal Designs
3.3. Some Examples of Non-Orthogonal Designs
3.4. Optimal Designs
3.1. Motivating Example
Suppose that the yield is linearly
related to temperature, whose range is
[50, 150]: Y = a + bX
If we want to conduct experiments at two
points, which of the following will we
choose: Design 1 at 50 and 150?
Design 2 at 70 and 130? Design 3 at 90
and 110?
3.1. Motivating Example
What is the optimal design in this case?
Which is the better design among the three?
3.1. Motivating Example
It is Design 1, because it gives the
smallest confidence region for the
parameters (D-optimality) and also gives
the smallest maximum variance of the
predicted responses (G-optimality).
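A short worked calculation shows where this ranking comes from. It assumes the straight-line model above with f(x) = (1, x)' and two runs at x_1 and x_2. The half-spacing h of a design centred at 100 and the standardized prediction variance d(x) = f'(x)(X'X)^{-1}f(x) are notation introduced here for illustration.

```latex
\begin{align*}
X'X &= \begin{pmatrix} 2 & x_1 + x_2 \\ x_1 + x_2 & x_1^2 + x_2^2 \end{pmatrix},
\qquad
|X'X| = 2(x_1^2 + x_2^2) - (x_1 + x_2)^2 = (x_2 - x_1)^2, \\
|X'X| &= 100^2 = 10000 \ (\text{Design 1}), \quad
         60^2 = 3600 \ (\text{Design 2}), \quad
         20^2 = 400 \ (\text{Design 3}). \\[4pt]
\text{For } x_{1,2} &= 100 \mp h: \quad
d(x) = \frac{1}{2} + \frac{(x - 100)^2}{2h^2},
\qquad
\max_{50 \le x \le 150} d(x) = \frac{1}{2} + \frac{1250}{h^2}, \\
\max d(x) &= 1 \ (h = 50), \quad \approx 1.89 \ (h = 30), \quad 13 \ (h = 10).
\end{align*}
```

Spreading the two runs to the ends of the range therefore gives both the largest determinant and the smallest worst-case prediction variance.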
3.2. Comments on orthogonal Designs
Pros (Many desirable properties)
- Easy to calculate
- Easy to interpret
- Maximum Precision (in some sense)
- Tabled designs widely available
3.2. Comments on Orthogonal Designs
Cons: Not applicable if
- Irregular design space
- Mixture experiments
- Sample size not a power of 2
- Mixed qualitative and quantitative factors
- Fixed covariates
- Nonlinear models
3.3. Some Examples of Non-Orthogonal Designs
16-run design with 8 two-level factors,
with main effects and 6 interactions
12-run mixed-level design with one 3-level
factor and 9 two-level factors
3.4. Optimal Designs
Optimal Experimental Design (OED):
Standard alternative when classical designs
are not applicable.
Choice of a particular experimental design:
Depends on the experimenter’s design
criterion (optimization problem).
OED: Reduce costs of experimentation by
allowing statistical models to be estimated
with fewer experimental runs; Evaluated
using statistical criteria.
3.4. Optimal Designs
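$Y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$, where $Y$ is $n \times 1$, $X_{n \times p}$ is the design
matrix, $\beta$ is the unknown $p \times 1$ parameter
vector, and $\sigma^2$ is known
$y(x_i) = f'(x_i)\beta + \varepsilon_i$
$X = [f(x_1), \dots, f(x_n)]'$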
3.4. Optimal Designs
Design $\xi$: Probability measure over a
compact region $\mathcal{X}$ with $\xi(x_i) = w_i$
$\xi$ places weight $\xi(x_i)$ on $x_i$
Problem: $n\xi(x_i)$ is not necessarily an
integer
3.4. Optimal Designs
Approximate design: $\xi = \begin{Bmatrix} x_1 & x_2 & \dots & x_n \\ w_1 & w_2 & \dots & w_n \end{Bmatrix}$
with $\int_{\mathcal{X}} \xi(dx) = 1$ and $0 \le w_i \le 1$
Exact design: $n\xi(x_i)$ must be an integer
3.4. Optimal Designs
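$M(\xi) = \int_{\mathcal{X}} m(x)\,\xi(dx) = \int_{\mathcal{X}} f(x) f'(x)\,\xi(dx) = \sum_i w_i f(x_i) f'(x_i)$: Information matrix
of $\xi$
For an exact $n$-run design: $nM(\xi) = X'X$
Optimality criteria: $\xi^* = \arg\max_{\xi} \Psi(M(\xi))$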
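The relation between the information matrix of a design measure and X'X can be checked numerically. The sketch below assumes the straight-line model of the motivating example, f(x) = (1, x)', and uses NumPy. The function names, the two-point design with equal weights, and the choice n = 4 are illustrative assumptions.

```python
import numpy as np

def f(x):
    """Regression vector for the straight-line model: f(x) = (1, x)'."""
    return np.array([1.0, float(x)])

def information_matrix(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)' for a design measure with weights w_i."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))

# Approximate design: half of the weight at each end of [50, 150].
points, weights = [50.0, 150.0], [0.5, 0.5]
M = information_matrix(points, weights)

# The same design realized exactly with n = 4 runs (n * w_i = 2 runs per point).
n = 4
runs = [50.0, 50.0, 150.0, 150.0]
X = np.array([f(x) for x in runs])
same = np.allclose(n * M, X.T @ X)

# With weights (1/3, 2/3) and n = 4, n * w_i is not an integer,
# so that design measure has no exact 4-run realization.
runs_needed = [n * w for w in (1 / 3, 2 / 3)]
```

The last lines illustrate the integrality problem: an approximate design is only realizable as an exact design when every n times w_i is a whole number.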
3.5. Some Useful Criteria
D-Optimality: $\max |X'X|$
A-Optimality: $\min \operatorname{trace}\{(X'X)^{-1}\}$
G-Optimality: $\min \{\max_x d(x)\}$, where $d(x) = f'(x)(X'X)^{-1}f(x)$
V-Optimality: $\min \{\operatorname{average}_x d(x)\}$
3.5. Some Useful Criteria
D- and A-Optimality: Estimation-based criteria.
G- and V-Optimality: Prediction-based criteria.
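As a rough numerical illustration, the four criteria can be evaluated for the three two-point designs of the motivating example. The sketch assumes the straight-line model f(x) = (1, x)' and approximates the maximum and average of d(x) on an evenly spaced grid over [50, 150]. The grid and the function names are assumptions made for this example.

```python
import numpy as np

def model_matrix(points):
    """Rows f(x)' = (1, x) for the straight-line model."""
    x = np.asarray(points, dtype=float)
    return np.column_stack([np.ones_like(x), x])

def criteria(points, region):
    """D, A, G and V criterion values of an exact design.
    G and V are approximated on the grid `region`."""
    X = model_matrix(points)
    XtX = X.T @ X
    inv = np.linalg.inv(XtX)
    F = model_matrix(region)
    # d(x) = f'(x) (X'X)^{-1} f(x) for every grid point x
    d = np.einsum("ij,jk,ik->i", F, inv, F)
    return {
        "D": float(np.linalg.det(XtX)),  # larger is better
        "A": float(np.trace(inv)),       # smaller is better
        "G": float(d.max()),             # smaller is better
        "V": float(d.mean()),            # smaller is better
    }

region = np.linspace(50, 150, 101)
designs = {"Design 1": [50, 150], "Design 2": [70, 130], "Design 3": [90, 110]}
results = {name: criteria(pts, region) for name, pts in designs.items()}
```

Under these assumptions Design 1 has the largest D value and the smallest A, G and V values of the three, so it is preferred under all four criteria.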
3.6. Algorithms for Optimal Designs
The development of efficient computing methods and
high-powered computer systems has generated great interest in
algorithmic approaches.
In general: It is difficult to find exact designs analytically.
Finding an exact design amounts to solving a large nonlinear
mixed-integer programming problem.
In practice: Find designs that are close to the best design
(locally optimal), which motivates the introduction of exact design
algorithms.
3.6. Algorithms for Optimal Designs
Typical Exact Design Algorithm steps:
- Choose an initial feasible design.
- Modify the solution slightly by exchanging a
point in the design for a point in the design
space.
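The two steps above can be sketched as a simple point-exchange search. This is a generic, simplified illustration rather than an implementation of any particular published algorithm. It assumes the straight-line model f(x) = (1, x)', the D-criterion |X'X|, and a finite candidate grid standing in for the design space. The function names and the stopping rule (stop when no single exchange improves the criterion) are choices made for the example.

```python
import numpy as np

def model_matrix(points):
    """Rows f(x)' = (1, x) for the straight-line model Y = a + bX."""
    x = np.asarray(points, dtype=float)
    return np.column_stack([np.ones_like(x), x])

def d_criterion(points):
    """D-criterion value |X'X| of an exact design."""
    X = model_matrix(points)
    return float(np.linalg.det(X.T @ X))

def exchange_search(candidates, n_runs, seed=0, max_passes=100):
    """Start from a random design, then repeatedly apply the single
    point exchange that most improves |X'X| until none helps."""
    rng = np.random.default_rng(seed)
    design = [float(x) for x in rng.choice(candidates, size=n_runs, replace=True)]
    best = d_criterion(design)
    for _ in range(max_passes):
        best_swap = None
        for i in range(n_runs):
            for c in candidates:
                trial = design.copy()
                trial[i] = float(c)
                value = d_criterion(trial)
                if value > best + 1e-9:
                    best, best_swap = value, (i, float(c))
        if best_swap is None:
            break  # local optimum: no single exchange improves the criterion
        i, c = best_swap
        design[i] = c
    return sorted(design), best

# Candidate grid over the temperature range [50, 150] in steps of 10.
candidates = np.arange(50, 151, 10)
design, value = exchange_search(candidates, n_runs=2)
```

For this small problem the search ends at the two endpoints of the range, matching Design 1 of the motivating example. In larger problems a search of this kind only guarantees a local optimum, so it is usually restarted from several initial designs.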
3.6. Algorithms for Optimal Designs
Fedorov algorithm (Fedorov, 1969).
Modified Fedorov algorithm (Johnson
and Nachtsheim, 1983).
K-L exchange algorithm (Donev and
Atkinson, 1988).
Coordinate exchange algorithm (Meyer
and Nachtsheim, 1995).
Columnwise-Pairwise (CP) algorithm
(Li and Wu, 1997).
3.7. Software for the Computation of
Optimal Designs
4. Other Design Issues
Supersaturated Designs
Bayesian Designs
Model Robust Designs
Model Discrimination Designs
5. Conclusion
All problems are different
Statistical knowledge will help improve the design.
Get involved with the statistician (biostatistician)
early in the process.
Collaborate closely with people who know the
background of the study.
Even the most sophisticated statistical analysis cannot
do much to save a study based on a “bad
design”.
References
Agboto, V. (2006). “Bayesian approaches to model robust and model
discrimination designs”. Unpublished Ph.D. dissertation, School of
Statistics, University of Minnesota.
Agboto, V., Nachtsheim, C. & Li, W. (2010). “Screening designs for model
discrimination”. Journal of Statistical Planning and Inference 140(3),
766-780.
Atkinson, A. C. & Donev, A. N. (1992). “Optimum Experimental Designs”.
Oxford Statistical Science Series 8, 1-328.
Chaloner, K. & Verdinelli, I. (1995). “Bayesian experimental design: A review”.
Statistical Science 10, 273-304.
Cook, R. D. & Nachtsheim, C. J. (1980). “A comparison of algorithms
for constructing exact D-optimal designs”. Technometrics 22, 315-324.
Li, W. & Wu, C. F. J. (1997). “Columnwise-pairwise algorithms with
applications to the construction of supersaturated designs”.
Technometrics 39, 171-179.