Arrowsmith extensions to bioinformatics
Vetle I. Torvik
Discovering new gene sequences
Start with a novel DNA sequence
find overlapping sequences within the expressed
sequence tag (EST) database
find others that overlap with that one, until one has
identified an entire new full-length gene
ATGATAGGAGA
GGAGAGCTGAGA
TGAGATGCGCTG
CGCTGATACTAGA
CTAGATGATAGAGATGCC
ATGATAGGAGAGCTGAGATGCGCTGATACTAGATGATAGAGATGCC
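The extension procedure above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular EST assembly tool: the function names `overlap_length` and `extend_right`, the minimum overlap of 4 bases, and the restriction to exact matches that extend the sequence to the right are all assumptions made for the example.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend_right(seed: str, fragments: list[str], min_overlap: int = 4) -> str:
    """Greedily extend `seed` with whichever fragment overlaps its end the most."""
    contig = seed
    pool = list(fragments)
    while pool:
        best = max(pool, key=lambda frag: overlap_length(contig, frag, min_overlap))
        k = overlap_length(contig, best, min_overlap)
        if k == 0:
            break  # nothing left in the pool overlaps the current end
        contig += best[k:]  # append only the part that is new
        pool.remove(best)
    return contig


# Fragments from the slide, deliberately listed out of order.
seed = "ATGATAGGAGA"
est_fragments = [
    "CTAGATGATAGAGATGCC",
    "TGAGATGCGCTG",
    "GGAGAGCTGAGA",
    "CGCTGATACTAGA",
]
print(extend_right(seed, est_fragments))
```

With the fragments shown on the slide, each step finds a 5-base overlap and the result is the full-length sequence given above. Real EST data would also need reverse complements, sequencing errors, and leftward extension, none of which this sketch handles.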
The Arrowsmith approach applied to
nucleotide or protein sequences
begin with two different sets A and C of sequences
that do not overlap
search for sequences B in the database that overlap
with one or more sequences in both A and C
AB1
ATGCTCTCGCGCTACGACTAGCATACTG
CCTGATCGCTACTACTAGCTGA
CTCGATGAGCGATGATCGCTAGCTATGGG
GTGAGGATCGCGATGATGATG
B1
ACTGATCGCTAGCTATGA
BC1
ATCGACAAGCTATGTGCAACTG
TCTCGCTACTAGATCACTAGCTTA
ATCTGATACTAGCTACGACTAGC
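As a rough sketch of the A-B-C search on sequences, the code below treats two sequences as "overlapping" when they share at least one exact substring of length k. That definition, the value of k, the function names, and the toy sequences are all assumptions chosen for illustration; a real search would more likely use an alignment tool.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlaps(s: str, t: str, k: int) -> bool:
    """True when `s` and `t` share at least one exact substring of length k."""
    return not kmers(s, k).isdisjoint(kmers(t, k))


def find_b_sequences(set_a, set_c, database, k=8):
    """Return database sequences that overlap something in A and something in C.

    The result maps each B sequence to (matching A sequences, matching C sequences).
    """
    links = {}
    for b in database:
        a_hits = [a for a in set_a if overlaps(a, b, k)]
        c_hits = [c for c in set_c if overlaps(c, b, k)]
        if a_hits and c_hits:
            links[b] = (a_hits, c_hits)
    return links


# Hypothetical toy sequences, not taken from any real database.
set_a = ["ACGTACGGATTC"]
set_c = ["TTGACCATGGCA"]
database = [
    "GGATTCAAATTGACC",  # shares GGATTC with A and TTGACC with C
    "GGATTCAAACCCCCC",  # overlaps A only
    "CCCCCCTTGACC",     # overlaps C only
]
for b, (a_hits, c_hits) in find_b_sequences(set_a, set_c, database, k=6).items():
    print(b, "links", a_hits, "to", c_hits)
```

Only the first database entry qualifies as a B sequence here, because it is the only one that overlaps both sets.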
Linking to microarray
experimental data
A = set of microarray experiments that measured reelin
C = set of microarray experiments that measured tooth
development
A and C might be in the same or different databases
B-terms = genes whose expression was correlated with
reelin in some system on the one hand, and that were
expressed during tooth development on the other
If reelin regulates certain genes that have roles during tooth
development, one may hypothesize a role for reelin in
tooth development as well, even if none of the tooth
microarray studies had examined reelin explicitly
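The B-list described here amounts to an intersection of two gene sets. The sketch below assumes each experiment has already been reduced to a set of gene names (genes correlated with reelin for the A experiments, genes expressed during tooth development for the C experiments); the function names, experiment identifiers, and gene names are hypothetical placeholders, not real data.

```python
from collections import defaultdict


def genes_to_experiments(experiments):
    """Invert {experiment id: set of genes} into {gene: set of experiment ids}."""
    index = defaultdict(set)
    for exp_id, genes in experiments.items():
        for gene in genes:
            index[gene].add(exp_id)
    return index


def b_gene_list(a_experiments, c_experiments):
    """Genes reported in at least one A experiment and at least one C experiment.

    Each B gene is mapped to the experiments that support it on either side.
    """
    a_index = genes_to_experiments(a_experiments)
    c_index = genes_to_experiments(c_experiments)
    shared = sorted(a_index.keys() & c_index.keys())
    return {g: (sorted(a_index[g]), sorted(c_index[g])) for g in shared}


# Placeholder data: A = reelin experiments, C = tooth development experiments.
a_experiments = {"reelin_exp_1": {"geneX", "geneY"}, "reelin_exp_2": {"geneY", "geneZ"}}
c_experiments = {"tooth_exp_1": {"geneY", "geneW"}, "tooth_exp_2": {"geneZ"}}

for gene, (a_ids, c_ids) in b_gene_list(a_experiments, c_experiments).items():
    print(gene, a_ids, c_ids)
```

Keeping the supporting experiment identifiers alongside each B gene makes it easier to go back to the original studies, which matters because A and C may come from different databases.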
This might stimulate someone to test...
if reelin is expressed at specific times and places
within the developing toothbud
if reelin actively regulates the genes on the B-list
if tooth development is abnormal in the reeler
mouse that genetically lacks reelin
Linking PubMed to bioinformatics databases
[Diagram labels: PubMed A-literature; PubMed C-literature; gene A; gene C; Microarray (shown twice); B-gene list]
Other databases
Genomic
Quantitative trait loci (QTL)
Atlases
Images
etc.
Using the literature to link genes
If gene A strongly co-occurs with gene B in the
literature due to a biologically significant
relationship, and
genes B and C similarly co-occur,
then genes A and C are likely to be biologically
related as well
When A and C do not co-occur above the chance
level, then the relation between A and C may not
be previously known or documented
Special case of the Arrowsmith 1-node
search
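[Figure: Gene A and Gene B linked with strength 0.9; Gene B and Gene C linked with strength 0.9; Gene A and Gene C linked with strength 0.2]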
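The reasoning about literature co-occurrence can be sketched as a search for gene triples in which A-B and B-C are strongly linked while A-C is not. The slides do not say how the link strengths are computed, so the code simply takes pairwise scores as input; the function names and the thresholds `strong=0.8` and `chance=0.3` are assumptions chosen for the example.

```python
from itertools import combinations


def score(scores, x, y):
    """Co-occurrence score for an unordered gene pair (0.0 when unrecorded)."""
    return scores.get(frozenset((x, y)), 0.0)


def implicit_links(scores, strong=0.8, chance=0.3):
    """Find (A, B, C) where A-B and B-C are strong but A-C is at or below chance."""
    genes = sorted({g for pair in scores for g in pair})
    found = []
    for a, c in combinations(genes, 2):
        if score(scores, a, c) > chance:
            continue  # A and C already co-occur, so the link is likely known
        for b in genes:
            if b in (a, c):
                continue
            if score(scores, a, b) >= strong and score(scores, b, c) >= strong:
                found.append((a, b, c))
    return found


# Values from the example figure.
scores = {
    frozenset(("Gene A", "Gene B")): 0.9,
    frozenset(("Gene B", "Gene C")): 0.9,
    frozenset(("Gene A", "Gene C")): 0.2,
}
print(implicit_links(scores))
```

With the three values from the figure, the only triple returned links Gene A to Gene C through Gene B, which is the case the slide describes as possibly undocumented.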