Development and Differentiation Data Set (Ivanova et al.)

Development and Differentiation:
The Princeton Stem Cell Data Set
Chris Stoeckert
Center for Bioinformatics, University of Pennsylvania
GPBA Workshop October 13, 2003
Science 298:601-604, 2002
Relevant Analysis Points
• Affymetrix
– Single channel
– Probe sets & MAS 4.0
• Biology-based classification
– Organize genes into clusters
• Comparison of gene lists
– Mostly looking at overlap
– Significance of overlap
• RT-PCR validation
– Independent corroboration
Affymetrix Probe Sets & MAS 4
• Probe sets
– Targeted region of gene covered with 16 overlapping 25-nt oligos: the perfect match (PM) probes.
– Each is paired with an oligo carrying a mismatch in the middle, the mismatch (MM) probe, used to subtract out non-specific signal.
• MAS 4: P/A calls (Present/Absent/Marginal)
– Proprietary heuristic
– Replaced by a statistical test in MAS 5.
• MAS 4: AvgDiff (expression level estimator)
– Average difference of PM-MM. Can be negative!
– Replaced in MAS 5.
• MAS 4: logFC (log base 2 fold change)
– Ratio of a sample to a reference
– Normalize by global scaling (multiplying by a constant based on the trimmed mean intensity)
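The three MAS 4 quantities above can be illustrated with a minimal Python sketch. It is not the vendor algorithm: AvgDiff is computed here as a plain mean of PM-MM with no outlier handling, and the function names, the trim fraction, the target intensity of 100 and the floor used to guard the logarithm against negative values are all assumptions chosen for illustration.

```python
import math

def avg_diff(pm, mm):
    """Plain mean of PM - MM over the probe pairs of one probe set.
    The result is negative whenever MM signal exceeds PM on average."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction (assumed 2%)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

def global_scale(values, target=100.0, trim=0.02):
    """Multiply every value on a chip by one constant so that the
    trimmed mean intensity equals `target` (an assumed value)."""
    factor = target / trimmed_mean(values, trim)
    return [v * factor for v in values]

def log_fc(sample, reference, floor=1.0):
    """log2 ratio of sample to reference; values below `floor`
    (including negative AvgDiff) are raised to `floor` first (assumption)."""
    return math.log2(max(sample, floor) / max(reference, floor))
```

For example, `avg_diff([10, 20], [30, 40])` is negative, which is why some floor or offset is needed before taking a log ratio.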
How Affymetrix data was used in this study
• P/A calls: use replicates to identify expressed genes
– Eliminate from consideration genes that are not expressed in any sample
– Compare results with EST-based methods
• AvgDiff: means of replicates used for fold change determinations.
• logFC: eliminate from consideration genes that don’t vary. Use for classification.
– logFC relative to bone marrow mature blood cells
– Highest and lowest signals must differ by more than 2-fold
– Fold change must be greater than the noise
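The filtering steps above might be sketched as follows. The slide does not say how many Present calls among the replicates count as "expressed" or how the noise level was estimated, so `min_present=2` is an assumption and the noise criterion is left out entirely. The function names and the floor applied to AvgDiff values are hypothetical.

```python
def is_expressed(calls_by_sample, min_present=2):
    """True if at least one sample has `min_present` or more 'P' calls
    among its replicates. `calls_by_sample` maps a sample name to a
    list of 'P' / 'A' / 'M' calls, one per replicate."""
    return any(calls.count("P") >= min_present
               for calls in calls_by_sample.values())

def varies_enough(mean_signals, min_ratio=2.0, floor=1.0):
    """True if the highest and lowest replicate-mean AvgDiff values
    differ by more than `min_ratio`. Values below `floor` are raised
    to `floor` so that negative AvgDiff cannot flip the ratio."""
    vals = [max(v, floor) for v in mean_signals.values()]
    return max(vals) / min(vals) > min_ratio

def keep_probe_set(calls_by_sample, mean_signals):
    """Combine both filters: expressed somewhere and varying > 2-fold."""
    return is_expressed(calls_by_sample) and varies_enough(mean_signals)
```

A probe set called Present in two of three LT-HSC replicates, with mean signals of 250 in LT-HSC and 100 in mature blood cells, would pass both filters in this sketch.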
Reality Check
From about 36,000 probe sets on the mouse U74 v2 set (A, B, C), ended up with ~4,290 for fetal and adult HSCs.
Biologically-based classification
• Created virtual genes (vectors) for bone marrow (~50) and fetal (14), each holding the ratios relative to mature blood cells expected for different hematopoietic stages.
• Associated real genes with these based on closest correlation (Pearson).
• Combined these to create 7 clusters
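Assigning a real gene to its best-correlated virtual gene can be sketched in a few lines of Python. The stage order (LT-HSC, ST-HSC, LMP, MBC) follows the slides, but the two example vectors and their names are made up for illustration. They are not the vectors used in the study, and the later step of merging vectors into 7 clusters is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Returns 0.0 if either profile is constant (correlation undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def closest_vector(profile, vectors):
    """Name of the virtual gene whose vector correlates best with `profile`."""
    return max(vectors, key=lambda name: pearson(profile, vectors[name]))

# Hypothetical logFC vectors over (LT-HSC, ST-HSC, LMP, MBC), relative to MBC.
example_vectors = {
    "hsc_like": [2.0, 2.0, 0.0, 0.0],
    "progenitor_like": [0.0, 0.0, 2.0, 0.0],
}

# A made-up gene that is high in both HSC populations and low elsewhere.
assigned = closest_vector([1.8, 1.5, 0.2, 0.0], example_vectors)
```

Because Pearson correlation ignores scale, a gene matches a vector on the shape of its profile across stages, not on the magnitude of its fold change.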
Generating Clusters
[Figure: Vector for an HSC gene. Y-axis: logFC (2 to -2); x-axis: LT-HSC, ST-HSC, LMP, MBC]
Vectors from Affy data
Adult and fetal HSC comparisons
Mouse and human HSC common genes
Likelihood of seeing mouse-human overlap
From Mathematica 4.1:
“The hypergeometric distribution HypergeometricDistribution[n, n_succ, n_tot] is used in place of the binomial distribution for experiments in which the n trials correspond to sampling without replacement from a population of size n_tot with n_succ potential successes.”
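The same calculation can be sketched in Python with the standard library alone. The argument names mirror the Mathematica quote, while the function name and the choice of the upper tail P(X >= k) as the measure of overlap significance are assumptions. The numbers in the usage note are toy values, not the counts from the study.

```python
from math import comb

def overlap_p_value(n, n_succ, n_tot, k_observed):
    """Upper-tail hypergeometric probability P(X >= k_observed).

    n          : size of the second gene list (number of draws)
    n_succ     : size of the first gene list (potential successes)
    n_tot      : number of genes that could appear in either list
    k_observed : number of genes found in both lists
    """
    total = comb(n_tot, n)
    upper = min(n, n_succ)
    tail = sum(comb(n_succ, k) * comb(n_tot - n_succ, n - k)
               for k in range(max(k_observed, 0), upper + 1))
    return tail / total
```

With toy numbers, `overlap_p_value(3, 4, 10, 1)` gives the chance that a random 3-gene list drawn from 10 genes shares at least one gene with a fixed 4-gene list: 1 - C(6,3)/C(10,3) = 5/6. A small value for the real lists would indicate that the overlap is unlikely to arise by chance.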
Comparison of different stem cell types
Science 298:597-600, 2002
How many of the stem cell genes were common between the two studies?
• What was in common experimentally?
• What was different:
– Experimentally?
– Computationally?