Cum. Survival - Mouse Genome Informatics

Functional Genomics
http://www.ruf.rice.edu/~metabol/images/genotype.jpg
Winter/Spring 2010
Carol Bult, Ph.D.
Course coordinator
The Jackson Laboratory
Keith Hutchison, Ph.D.
Course co-coordinator
University of Maine
What is Functional Genomics?
• A field of molecular genetics that uses
genome-wide, high-throughput
measurement technologies to understand
the relationships between genotype and
phenotype
– Genomics, epigenomics, transcriptomics, proteomics
– Computational genomics (data mining)
– Transgenics, targeted mutations, etc.
http://en.wikipedia.org/wiki/Functional_genomics
What topics will this course cover?
• Primary focus:
– Transcriptional profiling using microarrays
– Microarray data analysis
• Use of the R statistical programming
language/environment
• Other topics:
– Genome structure and sequence variation
– Epigenomics
– Bio-ontologies
– Proteomics
How will this course be structured?
• Lectures and readings assigned by instructors
• Assignments and discussion
• Student project
– Choose a microarray data set to analyze from the Gene
Expression Omnibus (GEO) resource at NCBI
– Do some background research on the data set
– Perform an analysis of the data
– Write up the analysis in the format of a scientific
manuscript as if you were submitting the manuscript to
PLOS Computational Biology
• http://www.ploscompbiol.org/home.action
– Oral presentation on the project
• 15 minutes
• Scheduled for April 13-22
Who are the instructors?
• Carol Bult (JAX), course coordinator
– Microarrays, Using R
• Keith Hutchison (UM), co-coordinator
– Genome structure/variation
• Anne Peaston (JAX)
– epigenomics
• Doug Hinerfeld (JAX)
– next generation sequencing and proteomics
• Judith Blake (JAX)
– bio-ontologies
• Matt Hibbs (JAX)
– mining expression data
• Joel Graber (JAX)
– RNA processing
“In the event of disruption of normal
classroom activities due to an
H1N1 swine flu outbreak, the format for
this course may be modified to enable
completion of the course. In that event,
you will be provided an addendum to
the syllabus that will supersede this
version.”
What resources will be used for this course?
• R Project for Statistical Computing
– http://www.r-project.org/
• Gene Expression Omnibus (GEO) @ NCBI
– http://www.ncbi.nlm.nih.gov/geo/
• Gene Ontology web site
– http://www.geneontology.org/
For next time
• Download and install R on your computer
– http://www.r-project.org/
– You might find the following link to Dr. Karl Broman’s intro to R
useful:
• http://www.biostat.wisc.edu/~kbroman/Rintro/
• Keith Hutchison will lecture on
– Genome Structure/Sequence Variation
Measuring Gene Expression
Idea: measure the amount of mRNA to see which genes are
being expressed in (used by) the cell. Measuring protein
might be more direct, but is currently harder.
Central Assumption of Gene
Expression Microarrays
• The level of a given mRNA is positively
correlated with the expression of the
associated protein.
– Higher mRNA levels mean higher protein
expression, lower mRNA means lower protein
expression
• Other factors:
– Protein degradation, mRNA degradation,
polyadenylation, codon preference, translation
rates, alternative splicing, translation lag…
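The central assumption above is a statement about correlation, so it can be checked directly whenever paired mRNA and protein measurements are available. The following is a minimal sketch in Python; the abundance values are made up for illustration and the function name `pearson` is hypothetical, not part of the course material.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical abundances for five genes (arbitrary units, invented data).
mrna_levels = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_levels = [1.2, 1.9, 3.4, 3.8, 5.1]

r = pearson(mrna_levels, protein_levels)
print(f"Pearson r = {r:.3f}")
```

A value of r near +1 would be consistent with the assumption; the "other factors" listed above are reasons real data often show a weaker correlation.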
Principal Uses of Microarrays
• Genome-scale gene expression analysis
– Differential gene expression between two (or
more) sample types
– Responses to environmental factors
– Disease processes (e.g. cancer)
– Effects of drugs
– Identification of genes associated with clinical
outcomes (e.g. survival)
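Differential expression between two sample types is often summarized per gene as a log2 fold change plus a test statistic. The sketch below assumes positive intensity values for one gene in two groups and uses Welch's t statistic on log2-transformed data; the intensities and function names are hypothetical, and no p-value or multiple-testing correction is computed.

```python
from math import log2, sqrt
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """log2 of (mean of group_b / mean of group_a); values must be positive."""
    return log2(mean(group_b) / mean(group_a))


def welch_t(group_a, group_b):
    """Welch's t statistic computed on log2-transformed intensities."""
    a = [log2(v) for v in group_a]
    b = [log2(v) for v in group_b]
    # Standard error of the difference in means, allowing unequal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# Hypothetical intensities for one gene (invented data).
control = [100.0, 120.0, 90.0, 110.0]
treated = [400.0, 380.0, 450.0, 420.0]

lfc = log2_fold_change(control, treated)
t_stat = welch_t(control, treated)
print(f"log2 fold change = {lfc:.2f}, Welch t = {t_stat:.2f}")
```

In a genome-scale experiment this calculation would be repeated for every gene, which is why correction for multiple testing matters in practice.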
[Figure: microarray analysis workflow]
• Biological question (differentially expressed genes, sample class prediction, etc.)
• Experimental design
• Microarray experiment
• Image analysis
• Normalization
• Estimation, testing, clustering, discrimination
• Biological verification and interpretation
Microarray example: Biomarker
identification - lung cancer
Garber, Troyanskaya et al. Diversity of gene expression in adenocarcinoma of the
lung. PNAS 2001, 98(24):13784-9.
Data partitioning clinically important:
Patient survival for lung cancer subgroups
[Figure: cumulative survival (0 to 1) versus time (0 to 60 months) for Groups 1, 2, and 3; p = 0.002 for Group 1 vs. Group 3]
Garber, Troyanskaya et al. Diversity of gene expression in adenocarcinoma of the
lung. PNAS 2001, 98(24):13784-9.
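Cumulative survival curves like the ones in this figure are commonly estimated with the Kaplan-Meier (product-limit) method. The sketch below assumes that method for illustration and uses invented follow-up data; it is not a reproduction of the analysis in the cited paper.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if death was observed at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up data for six patients (invented).
months = [5, 10, 10, 20, 30, 40]
died = [1, 1, 0, 1, 0, 1]

for t, s in kaplan_meier(months, died):
    print(f"month {t}: cumulative survival = {s:.3f}")
```

Comparing groups (such as the p-value quoted for Group 1 vs. Group 3) requires an additional test, for example a log-rank test, which this sketch does not implement.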
Technology basics
• Microarrays are composed of short, specific
DNA sequences attached to a glass or silicon
slide at high density
• A microarray works by exploiting the ability
of an mRNA molecule to bind specifically to,
or hybridize with, the DNA template from which it
originated
• RNA or DNA from the sample of interest is
fluorescently labeled so that relative or
absolute abundances can be quantitatively
measured
Two color vs single color
Bakel and Holstege. 2007. http://www.cell-press.com/misc/page?page=ETBR
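In a two-color experiment each spot yields two intensities, one per dye, and the quantity of interest is usually their log ratio. A minimal sketch, assuming background-corrected positive intensities named `red` and `green` (hypothetical names), computes the conventional M (log ratio) and A (average log intensity) values:

```python
from math import log2


def ma_values(red, green):
    """Return (M, A) for one spot of a two-color array.

    M = log2(red / green)        -- relative expression between the two samples
    A = 0.5 * log2(red * green)  -- average log intensity of the spot
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    m = log2(red / green)
    a = 0.5 * log2(red * green)
    return m, a


# Hypothetical spot: red channel four times brighter than green.
m, a = ma_values(8.0, 2.0)
print(f"M = {m:.1f}, A = {a:.1f}")
```

A single-color platform has no second channel, so comparisons there are made between separate arrays rather than within one spot.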
Other applications of microarray
technology
(besides measuring gene expression)
• DNA copy number analysis
• SNP analysis
• ChIP-chip (interaction data)
• Competitive growth assays
• …
Major technologies
• cDNA probes (> 200 nt), usually
produced by PCR, attached to either
nylon or glass supports
• Oligonucleotides (25-80 nt) attached to
glass support
• Oligonucleotides (25-30 nt) synthesized
in situ on silica wafers (Affymetrix)
• Probes attached to tagged beads
cDNA Microarray Design
• Probe selection
– Non-redundant set of probes
– Includes genes of interest to project
– Corresponds to physically available clones
• Chip layout
– Grouping of probes by function
– Correspondence between wells in
microtiter plates and spots on the chip
Building the chip
Ngai Lab arrayer, UC Berkeley
Print-tip head
http://transcriptome.ens.fr/sgdb/presentation/principle.php
Example dual channel cDNA array results
Affymetrix GeneChips
• Probes are oligos synthesized in situ using
a photolithographic approach
• Typically there are multiple oligos per
cDNA, plus an equal number of negative
controls
• The apparatus requires a fluidics station
for hybridization and a special scanner
• Only a single fluorochrome is used per
hybridization
Affy
There may be 5,000-100,000 probe sets per chip
A probe set = 11-20 PM, MM pairs
http://www.weizmann.ac.il/home/ligivol/pictures/system.jpg
Interpreting Affymetrix Output
Perfect Match/Mismatch Strategy
• Each probe is designed to be perfectly complementary
to a target sequence; a partner probe is generated
that is identical except for a single base mismatch in
its center.
• These probe pairs, called the Perfect Match probe
(PM) and the Mismatch probe (MM), allow the
quantitation and subtraction of signals caused by
nonspecific cross-hybridization.
• The difference in hybridization signals between the
partners serves as an indicator of specific target
abundance
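The PM/MM idea can be illustrated with a simple average difference over the probe pairs in one probe set. This is a minimal sketch with invented intensities and a hypothetical function name; it is not the algorithm used by Affymetrix software, which applies more robust summarization.

```python
from statistics import mean


def average_difference(probe_pairs):
    """Mean of (PM - MM) over all probe pairs in one probe set.

    probe_pairs -- list of (pm_intensity, mm_intensity) tuples
    """
    if not probe_pairs:
        raise ValueError("probe set must contain at least one PM/MM pair")
    return mean(pm - mm for pm, mm in probe_pairs)


# Hypothetical probe set with four PM/MM pairs (invented intensities).
probe_set = [(100.0, 40.0), (200.0, 50.0), (150.0, 60.0), (120.0, 30.0)]

signal = average_difference(probe_set)
print(f"average PM - MM difference = {signal:.1f}")
```

One known weakness of plain subtraction is that MM can exceed PM for some pairs, giving negative differences, which is one reason later methods treat the mismatch signal differently or ignore it.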