Arrowsmith extensions to bioinformatics

Vetle I. Torvik
Discovering new gene sequences
• Start with a novel DNA sequence
• Find overlapping sequences within the expressed sequence tag (EST) database
• Find others that overlap with those, and so on, until an entire new full-length gene has been identified
ATGATAGGAGA
      GGAGAGCTGAGA
             TGAGATGCGCTG
                    CGCTGATACTAGA
                            CTAGATGATAGAGATGCC
ATGATAGGAGAGCTGAGATGCGCTGATACTAGATGATAGAGATGCC
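The walk shown above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used against a real EST database: it assumes exact matches only, extends the sequence in one direction, and uses a hypothetical minimum overlap of 3 bases. The function names are made up for this sketch. A real search would use an alignment tool that tolerates mismatches and sequencing errors.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def extend_from_seed(seed: str, database: list[str], min_overlap: int = 3) -> str:
    """Repeatedly append the fragment that best overlaps the end of the contig."""
    contig = seed
    remaining = list(database)
    while remaining:
        # Pick the fragment whose prefix matches the longest suffix of the contig.
        best = max(remaining, key=lambda frag: overlap_length(contig, frag, min_overlap))
        size = overlap_length(contig, best, min_overlap)
        if size == 0:
            break  # no remaining fragment overlaps the current end
        contig += best[size:]  # add only the part that is not already present
        remaining.remove(best)
    return contig


# Fragments from the slide, deliberately given out of order.
seed = "ATGATAGGAGA"
est_fragments = ["CGCTGATACTAGA", "GGAGAGCTGAGA", "CTAGATGATAGAGATGCC", "TGAGATGCGCTG"]
full_length = extend_from_seed(seed, est_fragments)
print(full_length)
```

With the slide's five fragments, each step finds a 5-base overlap and the result is the 46-base sequence shown on the last line of the figure.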
The Arrowsmith approach applied to nucleotide or protein sequences
• Begin with two different sets, A and C, of sequences that do not overlap
• Search for sequences B in the database that overlap with one or more sequences in both A and C
AB1
ATGCTCTCGCGCTACGACTAGCATACTG
CCTGATCGCTACTACTAGCTGA
CTCGATGAGCGATGATCGCTAGCTATGGG
GTGAGGATCGCGATGATGATG
B1
ACTGATCGCTAGCTATGA
BC1
ATCGACAAGCTATGTGCAACTG
TCTCGCTACTAGATCACTAGCTTA
ATCTGATACTAGCTACGACTAGC
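A rough sketch of the A-B-C search over sequences follows. It assumes that "overlap" means sharing an exact stretch of at least k bases, which is a simplification chosen for illustration; the slides do not specify the criterion. The sequences, the value k = 6, and all function names are invented for this example and are not the ones in the figure.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_segment(x: str, y: str, k: int) -> bool:
    """True if x and y have at least one exact k-base stretch in common."""
    return not kmers(x, k).isdisjoint(kmers(y, k))


def find_b_sequences(set_a, set_c, database, k=6):
    """Return database sequences that overlap something in A and something in C."""
    links = []
    for b in database:
        a_hits = [a for a in set_a if shares_segment(a, b, k)]
        c_hits = [c for c in set_c if shares_segment(c, b, k)]
        if a_hits and c_hits:
            links.append((b, a_hits, c_hits))
    return links


# Hypothetical toy data: A and C share no 6-base stretch with each other.
set_a = ["ATGCCGTAAGCT"]
set_c = ["GGATTCACGTTA"]
database = [
    "GTAAGCTGGATTC",  # overlaps both A and C, so it is a B sequence
    "CCGTAAGTTTTT",   # overlaps A only
    "CCCCCCCCCCCC",   # overlaps neither
]

for b, a_hits, c_hits in find_b_sequences(set_a, set_c, database):
    print(b, "links", a_hits, "to", c_hits)
```

Only the first database entry is reported, because it is the only one with hits on both sides.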
Linking to microarray experimental data
• A = set of microarray experiments that measured reelin
• C = set of microarray experiments that measured tooth development
• A and C might be in the same or different databases
• B-terms = genes whose expression was correlated with reelin in some system on the one hand, and that were expressed during tooth development on the other
• If reelin regulates certain genes that have roles during tooth development, one may hypothesize a role for reelin in tooth development as well, even if none of the tooth microarray studies had examined reelin explicitly
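As a sketch, the B-list is the intersection of two gene sets. The example below assumes the A experiments have already been summarized as one correlation value per gene against reelin, and the C experiments as a set of genes expressed during tooth development. The gene identifiers, the correlation values, and the 0.7 cutoff are all placeholders invented for illustration, not data from any study.

```python
def b_list(reelin_correlation: dict[str, float],
           tooth_expressed: set[str],
           min_abs_corr: float = 0.7) -> list[str]:
    """Genes correlated with reelin (in A) that are also expressed in tooth development (in C)."""
    # Keep positively and negatively correlated genes alike.
    correlated = {gene for gene, r in reelin_correlation.items() if abs(r) >= min_abs_corr}
    return sorted(correlated & tooth_expressed)


# Placeholder inputs (hypothetical values).
reelin_correlation = {"gene_1": 0.85, "gene_2": -0.78, "gene_3": 0.10, "gene_4": 0.92}
tooth_expressed = {"gene_2", "gene_3", "gene_4", "gene_5"}

candidates = b_list(reelin_correlation, tooth_expressed)
print(candidates)
```

Here gene_1 is dropped because it is absent from the tooth set, and gene_3 because its correlation with reelin is weak.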
This might stimulate someone to test...
• if reelin is expressed at specific times and places within the developing tooth bud
• if reelin actively regulates the genes on the B-list
• if tooth development is abnormal in the reeler mouse, which genetically lacks reelin
Linking PubMed to bioinformatics databases
[Diagram labels: B-gene list; Microarray (gene A); Microarray (gene C); PubMed A-literature; PubMed C-literature]
Other databases
• Genomic
• Quantitative trait loci (QTL)
• Atlases
• Images
• etc.
Using the literature to link genes
• If gene A strongly co-occurs with gene B in the literature due to a biologically significant relationship, and
• gene B and gene C similarly co-occur,
• then genes A and C are likely to be biologically related as well
• When A and C do not co-occur above the chance level, the relation between A and C may not be previously known or documented
• This is a special case of the Arrowsmith 1-node search
[Diagram: gene A to gene B, 0.9; gene B to gene C, 0.9; gene A to gene C, 0.2]
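The rule above can be sketched as a lookup over pairwise co-occurrence strengths. The example uses the values 0.9, 0.9 and 0.2 from the slide, read here as the A-B, B-C and A-C strengths, which is an assumption about the figure. The thresholds for "strong" (0.8) and "chance level" (0.3) and the function names are hypothetical; the slides do not say how the strengths are computed.

```python
def strength(cooccurrence: dict, x: str, y: str) -> float:
    """Co-occurrence strength of an unordered gene pair (0.0 if never seen together)."""
    return cooccurrence.get(frozenset((x, y)), 0.0)


def linking_genes(cooccurrence: dict, gene_a: str, gene_c: str,
                  strong: float = 0.8, chance: float = 0.3) -> list[str]:
    """Genes B that co-occur strongly with both A and C, when A and C rarely co-occur."""
    if strength(cooccurrence, gene_a, gene_c) > chance:
        return []  # A and C already co-occur, so the relation is probably documented
    others = {g for pair in cooccurrence for g in pair} - {gene_a, gene_c}
    return sorted(
        g for g in others
        if strength(cooccurrence, gene_a, g) >= strong
        and strength(cooccurrence, g, gene_c) >= strong
    )


# Pair strengths taken from the diagram (assignment to pairs is assumed).
cooccurrence = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.9,
    frozenset(("A", "C")): 0.2,
}

print(linking_genes(cooccurrence, "A", "C"))
```

Gene B is returned as the link between A and C. If the A-C strength were above the chance threshold, the function would return an empty list, since the relation would no longer count as undocumented.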