MAGE and the Biospecimen Research Database
Experiment Design and other issues
Ian Fore, D.Phil
U.S. National Cancer Institute - Center for Bioinformatics
Incorporating material from NCI Office of Biorepositories and Biospecimen Research
December 2006
January 24, 2008
Why?
• View from the outside
• Discussion of experimental factor
• Want not to lose a key feature
  • Identify some issues
  • Expect many are already in hand
  • State the use case
  • So we don’t lose the feature
• Describe how we are using experimental factor in the Biospecimen Research Database
  • Inspired by MAGE model!
  • Vision for how it should interact with MAGE
Pre- and Post-Acquisition Variables Impact Clinical and Research Outcomes
• Effects on Clinical Outcomes
• Potential for incorrect diagnosis
• Morphological/immunostaining artifact
• Skewed clinical chemistry results
• Potential for incorrect treatment
• Therapy linked to a diagnostic test on a biospecimen (e.g., HER2 in breast cancer)
• Effects on Research Outcomes
• Irreproducible results
• Variations in gene expression data
• Variations in post-translational modification data
• Misinterpretation of artifacts as biomarkers
Variables of Biospecimen Acquisition
Pre-acquisition variables:
Antibiotics
Other drugs
Type of anesthesia
Duration of anesthesia
Arterial clamp time
Blood pressure variations
Intra-op blood loss
Intra-op blood administration
Intra-op fluid administration
Pre-existing medical conditions
Patient gender
Post-acquisition variables:
Time at room temperature
Temperature of room
Type of fixative
Time in fixative
Rate of freezing
Size of aliquots
Type of collection container
Biomolecule extraction method
Storage temperature
Storage duration
Storage in vacuum
Use case 1
• Actor - An investigator who does not work in a microarray lab and who does
  not understand the details of how microarray experiments are performed
• Description:
  • The investigator wishes to understand what happens to gene expression
    under particular experimental conditions. They are happy to assume that
    the experiments were done correctly and that appropriate QA and/or peer
    review took place. They want to work with the “high level” output from
    microarray experiments - experimental factors vs gene expression.
Experiment design - example
• Drug
  • Control
  • WY27127
  • WY26382
• Dose
  • 10, 30, 100, 300 mg/kg
• Time
  • -10, -5, 0, 5, 10, 20, 30, 60, 120 mins
• Rat
  • 4 rats per drug treatment
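The design above can also be written down as structured data rather than slide bullets. The following is a minimal sketch, assuming the three factors are fully crossed (the slide does not say whether the control group received every dose at every time point). The names `FACTORS`, `RATS_PER_DRUG` and `conditions` are illustrative only and are not part of MAGE.

```python
from itertools import product

# Factor levels copied from the slide; the dictionary layout is an assumption.
FACTORS = {
    "drug": ["Control", "WY27127", "WY26382"],
    "dose_mg_per_kg": [10, 30, 100, 300],
    "time_min": [-10, -5, 0, 5, 10, 20, 30, 60, 120],
}
RATS_PER_DRUG = 4  # replicate animals per drug treatment, as stated on the slide


def conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


if __name__ == "__main__":
    all_conditions = conditions(FACTORS)
    print(len(all_conditions), "factor-level combinations")
    print(all_conditions[0])
```

A consumer that receives the design in this form can tabulate "experimental factors vs gene expression" without reading any protocol prose.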
Dose in protocol text - 200 mg/kg/day
<Protocol text="Rat treated q.d. by oral gavage with 200 mg/kg/day clofibrate
(formulated in 0.5 % methylcellulose (w/v) plus 0.1 % polysorbate 80 (v/v) in
distilled water). Animal sacrificed after five days, at which time a portion of
liver was removed and flash frozen in liquid N2, then stored at -80 C. All
experimental procedures were approved by the Pharmacia Institutional Animal
Care and Use Committee and were performed in compliance with laws regarding
humane treatment of laboratory animals."
identifier="P-MEXP-357"
name="SAMPLETREATPRTCL357">
Dose in protocol text - 400 mg/kg/day
<Protocol text="Rat treated q.d. by oral gavage with 400 mg/kg/day clofibrate
(formulated in 0.5 % methylcellulose (w/v) plus 0.1 % polysorbate 80 (v/v) in
distilled water). Animal sacrificed after five days, at which time a portion of
liver was removed and flash frozen in liquid N2, then stored at -80 C. All
experimental procedures were approved by the Pharmacia Institutional Animal
Care and Use Committee and were performed in compliance with laws regarding
humane treatment of laboratory animals."
identifier="P-MEXP-357"
name="SAMPLETREATPRTCL357">
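In both excerpts the dose appears only inside the free-text `text` attribute, so a program that wants the dose as an experimental factor has to mine it from prose. The sketch below illustrates how fragile that is. The regular expression is a hypothetical pattern written for these two sentences only, and `extract_dose` is an illustrative helper, not part of any MAGE toolkit.

```python
import re

# Opening sentences of the two protocol texts shown on the slides.
PROTOCOL_TEXTS = [
    "Rat treated q.d. by oral gavage with 200 mg/kg/day clofibrate",
    "Rat treated q.d. by oral gavage with 400 mg/kg/day clofibrate",
]

# Hypothetical pattern: a number, the unit mg/kg/day, then the compound name.
DOSE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mg/kg/day)\s+(\w+)")


def extract_dose(text):
    """Return (value, unit, compound) found in free text, or None if absent."""
    match = DOSE_PATTERN.search(text)
    if match is None:
        return None
    value, unit, compound = match.groups()
    return float(value), unit, compound


if __name__ == "__main__":
    for text in PROTOCOL_TEXTS:
        print(extract_dose(text))
```

Any change in wording or units would defeat the pattern, which is the argument for carrying dose as a structured experimental factor instead.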
Use case 2
• Actor - A scientist who wants to analyze microarray data and who has only a
  moderate understanding of statistical analysis techniques.
  • (a different person from the actor in the previous use case)
• Description - The scientist uses a statistical analysis tool that takes a
  wizard-like approach to help them understand how the data might be analyzed.
  The microarray experiment data contains sufficient description for the
  statistical tool to extract the design of the experiment, so that it can
  suggest appropriate ways to analyze the data.
• Implementation notes:
  • Need the factors
  • And their nature in the design - i.e. whether “intended” or not
Experiment design - example
• Drug - “intended”
  • Control
  • WY27127
  • WY26382
• Dose - “intended”
  • 10, 30, 100, 300 mg/kg
• Time - “intended”
  • -10, -5, 0, 5, 10, 20, 30, 60, 120 mins
• Rat
  • 4 rats per drug treatment
Experiment design - example
• Drug - “fixed effect”
  • Control
  • WY27127
  • WY26382
• Dose - “fixed effect”
  • 10, 30, 100, 300 mg/kg
• Time - “fixed effect”
  • -10, -5, 0, 5, 10, 20, 30, 60, 120 mins
• Rat - “random effect”
  • 4 rats per drug treatment
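Once each factor carries its role in the design, a tool can propose a model without human interpretation. The sketch below is one possible illustration, not a method taken from the slides: it turns the annotated factors into a formula string in the notation used by mixed-model packages such as R's lme4. Crossing all fixed factors is an assumption made here for brevity, and `Factor`, `DESIGN` and `model_formula` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    name: str
    effect: str  # "fixed" or "random", as labelled on the slide


# Roles copied from the slide above.
DESIGN = [
    Factor("drug", "fixed"),
    Factor("dose", "fixed"),
    Factor("time", "fixed"),
    Factor("rat", "random"),
]


def model_formula(response, design):
    """Build an lme4-style formula string from the annotated factors."""
    fixed = [f.name for f in design if f.effect == "fixed"]
    random = [f.name for f in design if f.effect == "random"]
    # Assumption: all fixed factors are fully crossed; "1" means intercept only.
    terms = [" * ".join(fixed)] if fixed else ["1"]
    # Each random factor contributes a random intercept term.
    terms += [f"(1 | {name})" for name in random]
    return f"{response} ~ " + " + ".join(terms)


if __name__ == "__main__":
    print(model_formula("expression", DESIGN))
```

A wizard-style tool could show this suggestion to the scientist in use case 2 and let them adjust it.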
Use case 3
• Actor - A database or system that wishes to index and publish microarray data.
  • This is not the same thing as a microarray data repository - it is a more
    topic-based database or datamart.
• Description: The database wants to be able to automatically extract the
  experiment design from the experimental data file.
• Example:
  • Biospecimen Research Database
  • http://brd.nci.nih.gov/BRN
Use case 4
• Actor - A scientist investigating cancer or another disease
• Description:
  • The scientist understands that certain pre-analytical factors influence gene
    expression and that specimens must be collected in a way that removes those
    factors as variables from the experiment. They know that they sometimes do not
    follow the protocol exactly, and they wish to annotate their experiments with
    those variations from protocol.
  • They visit a site such as the BRD, which provides them with structured tissue
    processing protocols.
  • They are able to download electronic versions of these protocols in MAGE or
    MAGE-like format.
  • The structured information therein can be used by the programs with which they
    run their experiments, and to complete the “protocol execution” data specific
    to each experiment when it is run.
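The comparison between a structured protocol and its execution can be sketched as follows. This is an illustration only: the parameter names echo the post-acquisition variables listed earlier, but every value, and the names `PROTOCOL` and `deviations`, are hypothetical and do not come from the BRD or from MAGE.

```python
# Hypothetical structured protocol: parameter name -> prescribed value.
PROTOCOL = {
    "time_at_room_temperature_min": 30,
    "fixative": "formalin",
    "storage_temperature_c": -80,
}


def deviations(protocol, execution):
    """Return {parameter: (prescribed, actual)} for every value that differs."""
    return {
        name: (prescribed, execution[name])
        for name, prescribed in protocol.items()
        if name in execution and execution[name] != prescribed
    }


if __name__ == "__main__":
    # Hypothetical record of what actually happened for one specimen.
    execution = {
        "time_at_room_temperature_min": 45,
        "fixative": "formalin",
        "storage_temperature_c": -80,
    }
    print(deviations(PROTOCOL, execution))
```

Only the parameters that departed from the protocol are reported, which is the annotation the scientist in this use case wants to attach to the experiment.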
Use case 5
• Actor - A scientist investigating cancer or another disease
• Description: The scientist knows that they were not able to control all factors
  in the experiment - such as those performed by the surgeon or anesthetist
  during surgery. However, the BRD provides information about which genes
  are affected by these uncontrolled factors. They download gene lists from the
  BRD and remove these genes from consideration as factors relevant to cancer.
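The filtering step in this use case amounts to removing one gene list from another. A minimal sketch, assuming both lists are plain sequences of gene symbols; the symbols shown are placeholders rather than real genes, and `remove_confounded` is a hypothetical helper, not a BRD function.

```python
def remove_confounded(candidates, affected):
    """Keep candidate genes that are not on the affected list, preserving order."""
    affected_set = set(affected)  # set lookup keeps the filter linear in list size
    return [gene for gene in candidates if gene not in affected_set]


if __name__ == "__main__":
    # Placeholder symbols; a real run would use a gene list downloaded from the BRD.
    candidates = ["GENE_A", "GENE_B", "GENE_C"]
    affected_by_handling = ["GENE_B"]
    print(remove_confounded(candidates, affected_by_handling))
```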
MAGE-ML overhead