Gene Expression Deconvolution
with Single-cell Data
JAMES LINDSAY (1)
CAROLINE JAKUBA (2)
ION MANDOIU (1)
CRAIG NELSON (2)
UNIVERSITY OF CONNECTICUT
(1) DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(2) DEPARTMENT OF MOLECULAR AND CELL BIOLOGY
Mouse Embryo
[Figure: mouse embryo from anterior (head) to posterior (tail), labeling the somites, node, primitive streak, and an unknown mesoderm progenitor]
• What is the expression profile of the progenitor cell type?
NSB = node-streak border; PSM = presomitic mesoderm; S = somite; NT = neural tube/neurectoderm; EN = endoderm
Characterizing Cell-types
• Goal: Whole transcriptome
expression profiles of individual
cell-types
• Technically challenging to measure
whole-transcriptome expression
from single cells
• Approach: Computational
Deconvolution of cell mixtures
• Assisted by single-cell qPCR
expression data for a small number
of genes
Modeling Cell Mixtures
Mixtures (X) are a linear combination of signature matrix (S) and
concentration matrix (C)
$X_{m \times n} = S_{m \times k} \cdot C_{k \times n}$
where m = genes, n = mixtures, k = cell types:
• X: genes × mixtures
• S: genes × cell types
• C: cell types × mixtures
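The linear model above can be illustrated with a small NumPy sketch. The sizes (31 genes, 3 cell types, 12 mixtures) are borrowed from the experimental design; the matrix values are random placeholders, not real data, and the names `S`, `C`, `X` simply mirror the slide's symbols.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 31, 3, 12  # genes, cell types, mixtures

# Hypothetical signature matrix: expression of each gene in each cell type.
S = rng.uniform(0.0, 10.0, size=(m, k))

# Hypothetical concentration matrix: each column holds one mixture's
# cell-type proportions, scaled so the column sums to 1.
C = rng.uniform(0.0, 1.0, size=(k, n))
C /= C.sum(axis=0, keepdims=True)

# Each mixture (column of X) is a proportion-weighted sum of signatures.
X = S @ C
```

Deconvolution runs this product in reverse: given X (and some side information), recover S, C, or both.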
Previous Work
1. Coupled Deconvolution
Given: X, Infer: S, C
• NMF
• Minimum polytope
• Repsilber, BMC Bioinformatics, 2010
• Schwartz, BMC Bioinformatics, 2010
2. Estimation of Mixing Proportions
Given: X, S Infer: C
• Quadratic programming
• LDA
• Gong, PLoS One, 2012
• Qiao, PLoS Comp Bio, 2012
3. Estimation of Expression Signatures
Given: X, C Infer: S
• csSAM
• Shen-Orr, Nature Brief Com, 2010
Single-cell Assisted Deconvolution
Given: X and single-cell qPCR data
Infer: S, C
Approach:
1. Identify cell types and estimate a reduced signature matrix S (qPCR genes only) using single-cell qPCR data
• Outlier removal
• K-means clustering followed by averaging
2. Estimate mixing proportions C using the reduced S
• Quadratic programming, one mixture at a time
3. Estimate the full expression signature matrix S using C
• Quadratic programming, one gene at a time
Step 1: Outlier Removal + Clustering
Remove cells whose maximum Pearson correlation to any other cell is below 0.95
[Figure: two panels, unfiltered and filtered]
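A minimal sketch of Step 1, assuming single-cell profiles are stored as a cells × genes array. The 0.95 threshold comes from the slide; the function names, the synthetic data, and the choice of initial centers are hypothetical, and the clustering is a plain Lloyd's k-means written out for illustration rather than the authors' actual implementation.

```python
import numpy as np

def remove_outliers(profiles, threshold=0.95):
    """Keep cells whose best Pearson correlation to another cell is >= threshold."""
    corr = np.corrcoef(profiles)        # cell-by-cell Pearson correlation
    np.fill_diagonal(corr, -np.inf)     # ignore each cell's correlation with itself
    keep = corr.max(axis=1) >= threshold
    return profiles[keep], keep

def kmeans_signatures(profiles, init_centers, n_iter=50):
    """Lloyd's k-means; cluster averages become the columns of the reduced S."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(profiles[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)   # assign each cell to its nearest center
        for j in range(len(centers)):
            if np.any(labels == j):     # average the cells assigned to cluster j
                centers[j] = profiles[labels == j].mean(axis=0)
    return centers.T, labels            # genes x k signature matrix, cell labels

# Synthetic data (assumption): 3 cell types x 10 cells, plus one unrelated profile.
rng = np.random.default_rng(1)
base = rng.uniform(1.0, 10.0, size=(3, 31))
cells = np.repeat(base, 10, axis=0) * rng.normal(1.0, 0.02, size=(30, 31))
outlier = rng.uniform(1.0, 10.0, size=(1, 31))
data = np.vstack([cells, outlier])

filtered, keep = remove_outliers(data)
S_reduced, labels = kmeans_signatures(filtered, init_centers=filtered[[0, 10, 20]])
```

In practice the initial centers would come from a random or k-means++ initialization rather than hand-picked cells.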
Step 2: Estimate Mixture Proportions
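For a given mixture i:
$\min_{c} \lVert Sc - x \rVert^2$, s.t.
$\sum_{l=1}^{k} c_l = 1$
$c_l \ge 0 \;\; \forall l = 1 \ldots k$
where $x_j = X_{j,i} \;\; \forall j = 1 \ldots m$
and $c_l = C_{l,i} \;\; \forall l = 1 \ldots k$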
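The per-mixture problem can be sketched with SciPy's general-purpose SLSQP solver. This is one possible way to solve the quadratic program, not necessarily the solver used in the original work; `estimate_proportions` and the test data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_proportions(S, x):
    """Solve min ||S c - x||^2 subject to sum(c) = 1 and c >= 0 for one mixture."""
    k = S.shape[1]

    def objective(c):
        return np.sum((S @ c - x) ** 2)

    def gradient(c):
        return 2.0 * S.T @ (S @ c - x)

    result = minimize(
        objective,
        x0=np.full(k, 1.0 / k),                      # start from equal proportions
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, None)] * k,                    # c_l >= 0
        constraints=[{"type": "eq", "fun": lambda c: np.sum(c) - 1.0}],
        options={"ftol": 1e-12},
    )
    return result.x

# Noise-free check on made-up data: the true proportions should be recovered.
rng = np.random.default_rng(2)
S = rng.uniform(0.0, 10.0, size=(31, 3))   # hypothetical reduced signature matrix
c_true = np.array([0.2, 0.5, 0.3])
x = S @ c_true
c_hat = estimate_proportions(S, x)
```

Running the function once per column of X fills in the full concentration matrix C.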
Step 3: Estimating Full Expression Signatures
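[Figure: X (genes × mixtures) = S (genes × cell types) · C (cell types × mixtures)]
C: known from step 2
x: observed signals of the new gene across the mixtures
s: signature of the new gene across cell types, to be estimated
Now solve:
$\min_{s} \lVert sC - x \rVert^2$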
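A sketch of the per-gene solve, written as a least-squares problem in the transposed form $C^T s^T \approx x^T$. The slide states only the objective; the non-negativity constraint imposed here through `scipy.optimize.nnls` is an assumption (expression levels cannot be negative), and `np.linalg.lstsq` would give the unconstrained version. The function name and data are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_gene_signature(C, x):
    """Estimate one gene's signature s (length k) from min ||s C - x||^2, s >= 0.

    C: (k, n) concentration matrix from step 2.
    x: (n,) observed signal of this gene in each mixture.
    """
    s, _residual_norm = nnls(C.T, x)   # assumption: s is constrained to be >= 0
    return s

# Noise-free check on made-up data.
rng = np.random.default_rng(3)
C = rng.uniform(0.0, 1.0, size=(3, 12))
C /= C.sum(axis=0, keepdims=True)      # columns sum to 1, as in step 2
s_true = np.array([4.0, 0.0, 7.5])     # hypothetical per-cell-type expression
x = s_true @ C
s_hat = estimate_gene_signature(C, x)
```

Stacking the estimates for all genes row by row yields the full signature matrix S.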
Experimental Design
Single Cell Profiles
• 92 profiles
• 31 genes
Simulated Concentrations
• Sample uniformly at random from [0,1]
• Scale each column to sum to 1
Actual Mixtures
• 12 mixtures
• 31 genes
Dimensions
• k = 3
• m = 31
• n = 92, 12
• # mixtures = {10…300}
Simulated Mixtures
• Choose single cells randomly with replacement from each cluster
• Sum to generate mixture
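The simulation described above might look like the following sketch. The slides do not say how many cells are drawn per cluster, so drawing a number proportional to the cluster's concentration, and the `cells_per_mixture` parameter itself, are assumptions; the profiles and cluster labels are random placeholders.

```python
import numpy as np

def simulate_concentrations(k, n, rng):
    """Sample uniformly from [0, 1], then scale each column to sum to 1."""
    C = rng.uniform(0.0, 1.0, size=(k, n))
    return C / C.sum(axis=0, keepdims=True)

def simulate_mixture(profiles, labels, c, rng, cells_per_mixture=1000):
    """Sum single-cell profiles drawn with replacement from each cluster.

    profiles: (cells, genes); labels: cluster id per cell; c: proportions (k,).
    """
    x = np.zeros(profiles.shape[1])
    for cluster, fraction in enumerate(c):
        members = np.flatnonzero(labels == cluster)
        n_draw = int(round(fraction * cells_per_mixture))  # assumption: draws ~ c_l
        picks = rng.choice(members, size=n_draw, replace=True)
        x += profiles[picks].sum(axis=0)
    return x

rng = np.random.default_rng(4)
profiles = rng.uniform(0.0, 10.0, size=(92, 31))  # placeholder single-cell data
labels = np.arange(92) % 3                         # placeholder cluster labels
C_sim = simulate_concentrations(k=3, n=10, rng=rng)
X_sim = np.column_stack(
    [simulate_mixture(profiles, labels, C_sim[:, i], rng) for i in range(10)]
)
```

Because the simulated concentrations are known, they can be compared directly with the proportions inferred in step 2.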
Data Processing
RT-qPCR
• CT value: the cycle at which a gene was detected
• Relative normalization to housekeeping genes
• Housekeeping genes
• gapdh, bactin1
• geometric mean
• Vandesompele, 2002
• dCT(x) = geometric mean − CT(x)
• expression(x) = 2^dCT(x)
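The two formulas above translate into a few lines of NumPy. This sketch reads "geometric mean" as the geometric mean of the housekeeping genes' CT values, which is an interpretation of the slide; the function name and the CT numbers are made up for illustration.

```python
import numpy as np

def relative_expression(ct, housekeeping_ct):
    """Convert CT values to relative expression via dCT normalization."""
    # Geometric mean of the housekeeping CT values (e.g. gapdh, bactin1).
    reference = np.exp(np.mean(np.log(housekeeping_ct)))
    dct = reference - np.asarray(ct, dtype=float)   # dCT(x) = geometric mean - CT(x)
    return 2.0 ** dct                               # expression(x) = 2^dCT(x)

# Made-up CT values: housekeeping genes at 16 and 25 give a geometric mean of 20.
housekeeping = np.array([16.0, 25.0])
target_ct = np.array([20.0, 22.0, 18.0])
expression = relative_expression(target_ct, housekeeping)
# A gene detected 2 cycles later than the reference gets 2^-2 = 0.25,
# and one detected 2 cycles earlier gets 2^2 = 4.
```

A lower CT means earlier detection and therefore higher expression, which is why CT is subtracted from the reference rather than the other way around.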
Accuracy of Inferred Mixing Proportions
[Figure: Concentration Matrix: Concordance; axis label: predicted]
Leave-one-out Accuracy of Inferred Gene
Expression Signatures
Future Work
• Apply gene signature estimation technique using
more genes in mixed samples
• Identify PSM-Pr Signature
• Confirm the anatomical location of the putative PSM-Pr cell
population through exhaustive ISH
Conclusion
Special Thanks to:
• Ion Mandoiu
• Craig Nelson
• Caroline Jakuba
• Mathew Gajdosik