Data Processing Technologies for DNA Microarray

Nini Rao
School of Life Science And Technology
UESTC
14/11/2004
• Introduction
• The Applications of SVD Technology
• The Applications of NMF Technology
• Summarization
Introduction
• 1. Gene and Genomes
Gene ----The basic unit of genetic function
Gene Expression ----The process by which
genetic information at the DNA level is
converted into functional proteins.
Genome Structure ---- each organism
contains a unique genomic sequence
with a unique structure.
Gene structure
Genome data whose biological meaning is
unknown are increasing exponentially.
These data need to be mined.
Analysis of these new data requires
mathematical tools that are adaptable to
the large quantities of data, while reducing
the complexity of the data to make them
comprehensible.
2. A Microarray
A small analytical device
that allows genomic exploration with
speed and precision unprecedented in
the history of biology.
This technology was introduced in the 1990s.
3. Microarray Analysis
The process of using microarrays for scientific
exploration.
Many technologies for microarray analysis have
been adopted since the early 1990s.
4. Types of Microarray
5. The Roles of Microarray
To monitor gene expression levels on a
genomic scale
To enhance fundamental understanding of
life on the molecular level
regulation of gene expression
gene function
cellular mechanisms
medical diagnosis, treatment,
drug design
The microarray data form a matrix
Applications of SVD
Mathematical definition of the SVD
$X = U S V^T$, where X is the m x n data matrix
U is an m x n matrix
S is an n x n diagonal matrix
$V^T$ is also an n x n matrix
One important result of the SVD
of X
• $X^{(l)}$, the sum of the first l terms of
the SVD, is the closest rank-l matrix to X.
• The term “closest” means that $X^{(l)}$
minimizes the sum of the squares of
the differences between the elements of X
and $X^{(l)}$:
$\sum_{i,j} |x_{ij} - x^{(l)}_{ij}|^2 = \min$
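The definition and the closest rank-l result above can be checked numerically. The following is a minimal sketch using NumPy; the matrix `X` is random placeholder data rather than a real microarray dataset, and the helper name `rank_l_approx` and the sizes (100 genes, 14 arrays) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical expression matrix: m = 100 genes (rows) by n = 14 arrays (columns).
X = rng.normal(size=(100, 14))

# Thin SVD: U is m x n, s holds the n diagonal entries of S, Vt is n x n.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def rank_l_approx(U, s, Vt, l):
    """Sum of the first l terms s_k * u_k * v_k^T (hypothetical helper)."""
    return (U[:, :l] * s[:l]) @ Vt[:l, :]

l = 3
X_l = rank_l_approx(U, s, Vt, l)

# The squared error of the rank-l approximation equals the sum of the
# squares of the discarded singular values.
err = np.sum((X - X_l) ** 2)
tail = np.sum(s[l:] ** 2)
print(err, tail)
```

Keeping all 14 terms reproduces X exactly, and dropping terms removes the patterns with the smallest singular values first.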
SVD analysis of gene expression
data
The results for Elutriation Dataset
Pattern Inference
The result analysis for Pattern
Inference
• (a) Raster display of $V^T$, the expression of
14 eigengenes in 14 arrays.
• (b) Bar chart of the fractions of
eigenexpression
• (c) Line-joined graphs of the expression
levels of r1 (red) and r2 (blue) in the 14
arrays fit dashed graphs of normalized
sine (red) and cosine (blue) of period T
= 390 min and phase 2π/13,
respectively.
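The quantities in panels (b) and (c) can be sketched as follows. This is an illustration on synthetic data, not the elutriation dataset: the gene count, the noise level, and the angular step of 2π/13 per array are assumptions chosen so that the two leading eigengenes behave like a sine and a cosine. The fraction of eigenexpression of eigengene k is taken here as s_k^2 divided by the sum of all s_j^2.

```python
import numpy as np

rng = np.random.default_rng(1)
n_arrays, m_genes = 14, 200
t = np.arange(n_arrays)
step = 2 * np.pi / 13  # assumed angular step per array (illustration only)

# Hypothetical data: each gene mixes a sine and a cosine, plus small noise.
a = rng.normal(size=(m_genes, 1))
b = rng.normal(size=(m_genes, 1))
X = a * np.sin(step * t) + b * np.cos(step * t)
X += 0.05 * rng.normal(size=X.shape)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Fraction of eigenexpression of each eigengene (bar chart in panel b).
fractions = s**2 / np.sum(s**2)

# Rows of Vt are the eigengenes, i.e. their expression across the arrays.
r1, r2 = Vt[0], Vt[1]

# Least-squares fit of r1 by a combination of sine and cosine (panel c).
basis = np.column_stack([np.sin(step * t), np.cos(step * t)])
coef, *_ = np.linalg.lstsq(basis, r1, rcond=None)
residual = np.linalg.norm(basis @ coef - r1)
print(fractions[:3], residual)
```

In this synthetic setting the first two fractions carry almost all of the expression, and the fit residual is small, which is the kind of evidence panels (b) and (c) present.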
Data Sorting
The results analysis for data sorting
Fig. 3. Genes sorted by relative correlation with r1
and r2 of normalized elutriation.
(a) Normalized elutriation expression of the
sorted 5,981 genes in the 14 arrays, showing a
traveling wave of expression.
(b) Eigenarrays expression; the expression of a1
and a2, the eigenarrays corresponding to r1
and r2, displays the sorting.
(c) Expression levels of a1 (red) and a2 (green) fit
normalized sine and cosine functions of period
Z = N - 1 = 5,980 and phase Θ = 2π/13
(blue), respectively.
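One way to picture the sorting is to place each gene in the plane spanned by r1 and r2 and order the genes by their angle in that plane. The sketch below does this on synthetic data; using the projection onto r1 and r2 as a stand-in for "relative correlation" is an assumption, as are the gene count, the noise level, and the angular step.

```python
import numpy as np

rng = np.random.default_rng(2)
n_arrays, m_genes = 14, 300
t = np.arange(n_arrays)
step = 2 * np.pi / 13  # assumed angular step per array (illustration only)

# Hypothetical genes: the same oscillation with a random phase per gene.
true_phase = rng.uniform(-np.pi, np.pi, size=m_genes)
X = np.cos(step * t[None, :] - true_phase[:, None])
X += 0.05 * rng.normal(size=X.shape)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r1, r2 = Vt[0], Vt[1]

# Projection of every gene onto the two leading eigengenes.
c1 = X @ r1
c2 = X @ r2

# Angle of each gene in the (r1, r2) plane; sorting by it orders the
# genes by the phase of their expression.
angle = np.arctan2(c2, c1)
order = np.argsort(angle)
X_sorted = X[order]

# Neighbouring genes in the sorted matrix should look alike.
unit = X_sorted / np.linalg.norm(X_sorted, axis=1, keepdims=True)
adjacent_similarity = np.sum(unit[:-1] * unit[1:], axis=1).mean()
print(adjacent_similarity)
```

Displayed as a raster, `X_sorted` shows the peak of expression moving steadily from array to array, which is the traveling wave described in panel (a).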
Other Applications for SVD
• Missing data
• Comparison between two genomic
sequences
The Applications of NMF
Mathematical definition of the NMF
V (nm) = W (nr) . H (rm)
In general, (n+m)r < nm.
It can be used to extract the
features that are hidden in dataset.
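A factorization of this form can be computed in several ways. The sketch below uses the multiplicative update rules of Lee and Seung for the squared error; this is one common choice and is not necessarily the algorithm behind the results in these slides. The function name `nmf`, the iteration count, and the synthetic matrix `V` are assumptions for illustration.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative n x m matrix V into W (n x r) and H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

rng = np.random.default_rng(3)
n, m, r = 60, 14, 3
# Hypothetical non-negative data built from r hidden features.
V = rng.random((n, r)) @ rng.random((r, m))

W, H, errors = nmf(V, r)
print((n + m) * r, n * m, errors[0], errors[-1])
```

Here (n+m)r = 222 numbers stand in for nm = 840, so W and H form a compressed description of V; the rows of H play the role of the hidden features and W gives the weight of each feature in each row of V.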
Comparison with SVD
The results for Elutriation Dataset
The results for α-factor Dataset
Summarization
1. SVD: Normalization;
no data limitation
NMF: No normalization;
positive data
2. SVD: Missing data, clustering, pattern inference,
weak pattern extraction, comparison
NMF: Pattern inference, clustering, finding
similarity
3. ICA is used to mine DNA microarray data.
Thanks a lot!