Mining and Visualization of Flow Cytometry Data

Angela Chin
University of Houston Research Experience for Undergraduates
July 3, 2013
Contents
1. Introduction to Flow Cytometry
2. The Problem
3. Current Approaches & Results
4. Future Work
Flow Cytometry
A medical technique used for cell counting and cell sorting
How it Works
Picture from: Abcam
http://www.abcam.com/index.html?pageconfig=resource&rid=11446
Flow Cytometry Application
Determine whether a person has B-cell lymphoma, based on the number of clusters that result from flow cytometry
• Two clusters: cancer patient
• Three clusters: healthy individual
Example: Flow Cytometry Results
[Figure: flow cytometry results for a healthy patient and for a cancer patient]
Problems with Current Methods
The process for determining whether there are two or three clusters is manual
Doctors’ time could be better spent on other tasks
The Problem
Creating an automated method to determine the number of clusters
Past Approaches
There are many ways to cluster data
• Most need to know the number of clusters ahead of time
The most popular is k-means, but it has some problems
• The algorithm must be given the number of clusters beforehand
• It has difficulty when clusters are close together, differ greatly in size, etc.
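To make the first k-means limitation concrete, here is a minimal sketch of the standard Lloyd's iteration in Python. It is a generic illustration, not code from this project; the function name `kmeans` and its parameters are assumptions. Note that `k` is a required argument: the algorithm partitions the data into however many clusters it is told to find.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on a list of coordinate tuples.

    k must be supplied by the caller; the algorithm never estimates it.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct data points as initial centers
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers
```

Called with k=2 on data that really contains one cluster, the same routine still returns two centers, which is why it cannot answer the one-versus-two question by itself.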
Further Defining the Problem
We want to be able to determine the number of clusters when:
• The distance between clusters is very small
• The ratio of cluster sizes is large (100:1 to 1000:1)
We decided to further constrain the problem to distinguishing:
• 1 cluster vs. 2 clusters when the size ratio is up to 1000:1
Current Approaches & Results
Two Approaches
Approach #1: Transformation
• Find the center of the data
• Take each point and find its angle from the horizontal line through the center (new x-value) and its distance from the center (new y-value)
• Use the transformed data to determine the number of clusters
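A minimal sketch of the transformation steps above, assuming the "center of the data" is the coordinate-wise mean and that angles are measured counterclockwise in the range [0, 2π). Both choices, and the function name `polar_transform`, are assumptions rather than details given in the talk.

```python
import math

def polar_transform(points):
    """Map each (x, y) point to (angle, distance) relative to the data's center.

    Assumption: the center is the mean of the x and y coordinates.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    transformed = []
    for x, y in points:
        # Angle from the horizontal line through the center, wrapped to [0, 2*pi).
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        # Euclidean distance from the center.
        distance = math.hypot(x - cx, y - cy)
        transformed.append((angle, distance))
    return transformed
```

The angle becomes the new x-value and the distance the new y-value, so a plot of the output has an x-axis running from 0 to 2π.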
Approach #2: Testing Normal Fit
• Project the 2D data onto a line to create 1D data
• Apply a normal distribution fit
• Compare the Bayesian Information Criterion (BIC) of the fit to a cut-off limit
• If the BIC is above the limit, there are two clusters; otherwise, there is one
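The normal-fit test above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the projection direction is supplied by the caller (the talk does not say how the line is chosen), the BIC is taken in its common form k·ln(n) − 2·ln(L) where a lower value means a better fit, and the cut-off is a hypothetical parameter. All function names are made up for the example.

```python
import math

def project_onto_line(points, direction):
    """Project 2D points onto the line through the origin along `direction`."""
    dx, dy = direction
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm  # unit vector of the line
    return [x * ux + y * uy for x, y in points]

def normal_fit_bic(values):
    """BIC of a maximum-likelihood normal fit to 1D data.

    BIC = k * ln(n) - 2 * ln(L), with k = 2 parameters (mean and variance).
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    # At the maximum-likelihood estimates the log-likelihood has a closed form.
    log_lik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return 2 * math.log(n) - 2 * log_lik

def looks_like_two_clusters(values, cutoff):
    """Decision rule from the slide; `cutoff` is a hypothetical, untuned limit."""
    return normal_fit_bic(values) > cutoff
```

Under this definition, the BIC of a single normal fit depends only on the number of points and the sample variance, so a cut-off would have to be calibrated for a fixed sample size and data scale.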
Approach #1: Transformation
[Figures: three slides illustrating the transformation process, with the angle axis marked at π/2, π, 3π/2, and 2π]
Approach #2: Testing Normal Fit
Example: clusters 3 standard deviations apart, size ratio 1:99
[Figure: two panels, titled "One cluster best fits" and "Two cluster best fits"]
Approach #2: Testing Normal Fit
Comparing the BIC of the one-cluster fit versus the two-cluster fit
All data was generated using 100,000 points and the same standard deviations
The ratio between cluster sizes and the distance between the two clusters (if applicable) were varied
• Ratios: 199:1 to 63:1
• Distance: 1.5 to 5 standard deviations apart
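The experiment described above can be approximated with the sketch below. It generates 1D synthetic data with a given size ratio and separation, then compares the BIC of a one-cluster normal fit with that of a two-cluster fit. The talk does not say how the two-cluster fit was computed; fitting a two-component normal mixture by expectation-maximization (EM) is an assumption here, as are all function names, the starting values, and the convention that a lower BIC indicates the better model.

```python
import math
import random

def make_mixture(n, ratio, separation, sigma=1.0, seed=0):
    """Return n 1D points: a large cluster at 0 and a small one `separation`
    standard deviations away, with sizes in the proportion ratio:1."""
    rng = random.Random(seed)
    n_small = max(1, round(n / (ratio + 1)))
    big = [rng.gauss(0.0, sigma) for _ in range(n - n_small)]
    small = [rng.gauss(separation * sigma, sigma) for _ in range(n_small)]
    return big + small

def log_pdf(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def bic_one_cluster(values):
    """BIC of a single normal fit (2 parameters: mean and variance)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    log_lik = sum(log_pdf(v, mu, var) for v in values)
    return 2 * math.log(n) - 2 * log_lik

def bic_two_clusters(values, iters=50):
    """BIC of a two-component normal mixture fitted by EM
    (5 parameters: two means, two variances, one mixing weight)."""
    n = len(values)
    lo, hi = min(values), max(values)
    mean = sum(values) / n
    var0 = sum((v - mean) ** 2 for v in values) / n
    w = 0.5                                            # weight of component 0
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    var = [var0, var0]
    log_lik = 0.0
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point. The
        # log-likelihood is accumulated for the parameters used in this step.
        resp, log_lik = [], 0.0
        for v in values:
            a = math.log(w) + log_pdf(v, mu[0], var[0])
            b = math.log(1 - w) + log_pdf(v, mu[1], var[1])
            m = max(a, b)
            total = math.exp(a - m) + math.exp(b - m)
            resp.append(math.exp(a - m) / total)
            log_lik += m + math.log(total)
        # M-step: re-estimate the weight, means, and variances.
        r0 = max(sum(resp), 1e-9)
        r1 = max(n - r0, 1e-9)
        mu = [sum(r * v for r, v in zip(resp, values)) / r0,
              sum((1 - r) * v for r, v in zip(resp, values)) / r1]
        var = [max(sum(r * (v - mu[0]) ** 2 for r, v in zip(resp, values)) / r0, 1e-6),
               max(sum((1 - r) * (v - mu[1]) ** 2 for r, v in zip(resp, values)) / r1, 1e-6)]
        w = min(max(r0 / n, 1e-9), 1 - 1e-9)
    return 5 * math.log(n) - 2 * log_lik

if __name__ == "__main__":
    # Smaller than the 100,000 points used in the talk, to keep the run short.
    data = make_mixture(n=3200, ratio=63, separation=5)
    print("one cluster BIC:", bic_one_cluster(data))
    print("two cluster BIC:", bic_two_clusters(data))
```

Sweeping `ratio` and `separation` over the ranges listed above would reproduce the shape of the experiment, though the pure-Python EM loop is slow at 100,000 points.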
Future Work
Approach #1:
• Determine if there is a way to detect the second cluster in the transformed data
Approach #2:
• Use real data to see if a cut-off can be determined
Overall:
• After figuring out how to distinguish one cluster from two, extend the method to two versus three clusters
Limitations
Assumes the data has a Gaussian distribution
Number of clusters is limited to two or three
Acknowledgements
I would like to thank my research advisor, Dr. Stephen Huang, and
Mitch Shih for their guidance on this project. I would also like to
thank the University of Houston Computer Science Department and
the National Science Foundation for providing me with the
opportunity to participate in the REU.