Transcript poster-sbbq
Toward the Identification of a Gene Expression Framework in
Different Types of Tissues and Organisms
Saulo Augusto de Paula Pinto1,2 {[email protected]}
J. Miguel Ortega1 {[email protected]}
1Laboratório de Biodados, Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas – UFMG
2Instituto de Informática, PUC MINAS BARREIRO
1 Introduction
To identify a possible common framework of gene expression across samples of
gene expression data, we analyzed 418 samples composing 13 NCBI-GEO series generated
on the Affymetrix GeneChip platform, together with 31 SAGE Genie libraries.
2 Methodology
Some results are shown for two data series: one of 36 normal human tissue samples
and one of 11 A. thaliana tissues (GEO accessions: GSE2361, GSE607).
It was found that expression sorting is conserved to such an extent that the weak
framework rate between a pair of samples (the percentage of sequence pairs whose
relative order is kept in both) can even be used to cluster a set of gene expression
samples.
INPUT: a set of samples sorted by the gene expression such that
the most expressed gene (sequence) is in the position 0 and the
least expressed is in position N-1, where N is the number of
genes (sequences) in each sample.
OUTPUT: a list of pairs of genes (sequences) in which the first
member keeps its expression order relative to the second member
in every sample.
1) Choose a reference sample, used only to build pairs;
2) For each pair of genes [GC, GL] in the reference sample where
GC is more expressed than GL do
2.1) If GC is more expressed than GL in every sample
then include the pair [GC, GL] in the weak framework;
else discard the pair.
An algorithm to find a weak framework: one composed of pairs of genes in which the
first element of the pair is always more expressed than the second one in every
analyzed sample.
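The steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `weak_framework`, the list-of-lists input and the toy gene labels are assumptions, and every sample is assumed to contain the same genes with no tied ranks.

```python
from itertools import combinations


def weak_framework(samples, reference=0):
    """Return gene pairs (gc, gl) where gc outranks gl in every sample.

    `samples` is a list of lists; each inner list holds the same gene
    identifiers sorted from most expressed (position 0) to least expressed.
    """
    # Map each gene to its position in each sample for fast rank lookups.
    ranks = [{gene: pos for pos, gene in enumerate(s)} for s in samples]
    pairs = []
    # combinations() over the sorted reference yields gc before gl,
    # i.e. gc is more expressed than gl in the reference sample.
    for gc, gl in combinations(samples[reference], 2):
        if all(r[gc] < r[gl] for r in ranks):
            pairs.append((gc, gl))
    return pairs


# Toy data, invented for illustration only.
samples = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
framework = weak_framework(samples)
print(framework)
```

The cost grows with the square of the number of genes times the number of samples, because every pair from the reference sample is checked against every sample.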
3 Results
Every sample, whatever the organism, follows an exponential-like decay as the
expression values diminish, regardless of the technology, the number of distinct
sequences in the sample, or the organism or tissue type.
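One way to check for such a decay is to fit a straight line to the logarithm of the sorted expression values against their rank. The poster does not state how the decay was assessed, so the sketch below is only an assumption: the function name `fit_exponential_decay`, the model `value ~ a * exp(-b * rank)` and the synthetic data are all hypothetical.

```python
import math


def fit_exponential_decay(values):
    """Least-squares fit of log(value) = log(a) - b * rank.

    Returns (a, b) for the model value ~ a * exp(-b * rank), where rank 0
    is the most expressed sequence. Non-positive values are skipped,
    since their logarithm is undefined.
    """
    ordered = sorted(values, reverse=True)
    points = [(rank, math.log(v)) for rank, v in enumerate(ordered) if v > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic values, invented for illustration only.
synthetic = [1000.0 * math.exp(-0.5 * r) for r in range(10)]
a, b = fit_exponential_decay(synthetic)
print(a, b)
```

On real samples, a roughly linear log-rank plot would be consistent with the exponential-like behavior described here. Systematic curvature would suggest a different decay law.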
This finding suggests that the sorting of gene expression, and not only which genes
are expressed, plays a determinant role in the character of a tissue or organism.
Highly physiologically related tissue pairs like [amygdala, hippocampus] and
[prostate, bladder], or sample replicates like [leaf_gh1, leaf_gh2], have as many as
94.7%, 89.7%, and 94.12% of their sequence pairs conserved, respectively.
H. sapiens tissue pairs with the highest rates of conserved sequence pairs (%):
Amygdala          Hippocampus       94.7
Caudate Nucleus   Hippocampus       91.2
Amygdala          Thalamus          91.1
Amygdala          Caudate Nucleus   91.0
Corpus            Spinal Cord       90.7
Hippocampus       Thalamus          90.4
Caudate Nucleus   Thalamus          89.8
Prostate          Bladder           89.7
Brain             Thalamus          89.6
Brain             Amygdala          89.5
Brain             Hippocampus       89.1
Ovary             Bladder           88.5
Small Intestine   Colon             88.2
Thalamus          Spinal Cord       88.0
Bladder           Breast            87.9
Caudate Nucleus   Spinal Cord       87.7
Caudate Nucleus   Corpus            87.7
Cerebellum        Brain             87.5
Bladder           Lung              87.3
Bladder           Trachea           87.2

H. sapiens tissue pairs with the lowest rates of conserved sequence pairs (%):
S Muscle          Small Intestine   21.5
Cerebellum        Liver             21.3
Small Intestine   Fetal Brain       20.8
S Muscle          Caudate Nucleus   20.4
Heart             Salivary Gland    20.1
Pancreas          Caudate Nucleus   19.6
Heart             Liver             19.3
Caudate Nucleus   Fetal Liver       19.1
S Muscle          Liver             18.9
Salivary Gland    Caudate Nucleus   18.8
Small Intestine   Caudate Nucleus   18.6
Bone Marrow       Thalamus          18.5
Salivary Gland    Liver             18.1
Fetal Brain       Liver             17.7
Salivary Gland    Bone Marrow       17.4
Caudate Nucleus   Liver             16.4
Bone Marrow       Liver             15.0
S Muscle          Bone Marrow       14.1
Heart             Bone Marrow       12.8
Bone Marrow       Caudate Nucleus   12.1
On the other hand, H. sapiens pairs composed of dissimilar tissues, such as those
involving bone marrow, liver and the central nervous system tissues, conserve
expression sorting poorly (< 22%).
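The pairwise percentages reported in this section, and their use for clustering mentioned in the Methodology, can be illustrated with a short sketch. The poster does not name a clustering method, so the single-linkage grouping with a fixed threshold used here is an assumption. The names `pair_rate` and `cluster_by_rate` and the toy samples are hypothetical.

```python
from itertools import combinations


def pair_rate(sample_a, sample_b):
    """Percentage of gene pairs ordered the same way in both samples."""
    rank_b = {gene: pos for pos, gene in enumerate(sample_b)}
    total = kept = 0
    # sample_a is sorted, so gc is more expressed than gl in sample_a.
    for gc, gl in combinations(sample_a, 2):
        total += 1
        if rank_b[gc] < rank_b[gl]:
            kept += 1
    return 100.0 * kept / total


def cluster_by_rate(samples, threshold):
    """Single-linkage grouping: join samples whose rate >= threshold."""
    names = list(samples)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links up to the representative of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if pair_rate(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Toy samples, invented for illustration only.
toy = {
    "t1": ["A", "B", "C", "D"],
    "t2": ["A", "B", "D", "C"],
    "t3": ["D", "C", "B", "A"],
}
print(pair_rate(toy["t1"], toy["t2"]))
print(cluster_by_rate(toy, threshold=80.0))
```

In this toy case `t1` and `t2` differ in one of six pairs and end up grouped together, while the fully reversed `t3` stays alone. A hierarchical method fed with `100 - pair_rate` as a distance would be another reasonable choice.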
4 Discussion
Considering all 36 H. sapiens tissues together, 28.5% of the 3,064,841 possible pairs
were conserved. For A. thaliana, the stem and flower pair conserved the least (< 47%),
and the 11 samples together conserved 55.45% (22,892,007 of 41,286,376 pairs), as
expected for a less complex organism with a lower diversity of tissues.
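The percentages quoted here can be checked with simple arithmetic. The snippet below only reuses the counts given in the text. The H. sapiens conserved count is not reported, so the value derived from the rounded 28.5% is an approximation introduced for illustration.

```python
# Figures quoted in the text for the 11 A. thaliana samples.
conserved = 22_892_007
possible = 41_286_376
rate = 100 * conserved / possible
print(f"A. thaliana: {rate:.2f}% of pairs conserved")

# For H. sapiens only the rounded rate (28.5%) is reported, so the
# conserved count below is an approximation, not a published figure.
human_possible = 3_064_841
human_conserved_approx = round(0.285 * human_possible)
print(f"H. sapiens: about {human_conserved_approx:,} pairs conserved")
```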
5 Conclusions
The results point to the existence of a gene expression framework: genes that keep
their expression sorting across a vast and diverse set of tissues.
Part of a weak framework found for 36 normal human tissue samples, considering only
the 20 most expressed sequences (MESs) from each sample. A directed edge points from
the more expressed gene (source) to the less expressed gene (target).
Support: FAPEMIG