A central problem in bioinformatics
Bioinformatics
Ch1. Introduction
阮雪芬
2002, Oct 17
NTUST
www.ntut.edu.tw/~yukijuan/lectures/bioinfo/Oct17.ppt
Outline
A scenario
Life in space and time
Dogmas: central and peripheral
Observable and data archives
Traditional and Current Biology
Traditionally, biology has been an
observational science.
Now, biology has been converted into
a deductive science.
The Data of Bioinformatics
A very large amount of data
Nucleotide sequence databanks
contain 16 × 10^9 bases
The full three-dimensional
coordinates of proteins of average
length ~400 residues: 16000 entries
Not only are the individual databanks
large, but their sizes are increasing at
a very high rate.
GenBank
Goals
“Saw life clearly and saw it whole”
Understand integrative aspects of
the biology of organisms
To interrelate sequence, three-dimensional structure, interactions,
and function of individual proteins,
nucleic acids and protein-nucleic acid
complexes
Goals
To deduce events in evolutionary
history.
To support application to medicine,
agriculture and other scientific fields.
A Scenario
Imagine a Crisis (1)
A new biological virus creates an
epidemic of fatal disease in humans
or animals
Laboratory scientists will
isolate its genetic material, a molecule of nucleic acid,
and determine the
sequence.
Computer programs will then take over
Imagine a Crisis (2)
Screening this new genome against a
data bank of all known genetic messages
Developing antiviral therapies: viruses
contain protein molecules that are
suitable targets for drugs that will
interfere with viral structure or function
Imagine a Crisis (3)
From the viral DNA sequences
Computer program
Protein sequence
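The step from DNA sequence to protein sequence can be sketched in a few lines. This is a minimal illustration, not the program the lecture refers to: it assumes the standard genetic code, a reading frame that starts at the first base, and a coding-strand sequence. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence, starting at position 0.

    Translation stops at the first stop codon; codons containing
    unknown letters are reported as "X".
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

A real pipeline would also have to locate the correct reading frame and, for eukaryotic genes, account for splicing before translating.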
Imagine a Crisis (4)
From amino acid sequences
Computer program
Three-dimensional structure
Homology Modelling
Data bank will be screened for related proteins of known structures
Computer program
Structure will be predicted
Ab initio
No related protein of known structure is found
Ab initio
Predicting the structure
Design Therapeutic Agents
Knowing the viral protein structure
Design therapeutic agents
Life in space and time
In Space
Biosphere
Ecosystem
The generation of variants
Natural mutation
The recombination of genes in sexual reproduction
Direct gene transfer
Darwinian selection or genetic drift
In Space
Ecosystem
Species
Cell
Nuclei, organelles and cytoskeleton
Molecules
In Time
A history of life
3.5 billion years
Dogmas: Central and Peripheral
Central Dogmas
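1957, Crick proposed the central dogma: "DNA makes RNA, RNA makes protein"
The central dogma is broadly correct, but some points need revision
There are many RNA viruses: RNA → DNA → RNA → Protein
Jumping genes
Eukaryotic RNA must undergo splicing
Not only proteins have enzymatic function; some RNAs do as well
Some genes can produce several RNAs, through different transcription start sites or different splicing patterns,
and these are translated into proteins with different functions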
Eukaryotic RNA must undergo splicing
Not only proteins have enzymatic function; some RNAs also have
enzymatic function
First identified in plant virus
Purines and Pyrimidines
The Strands in the Double Helix are
Antiparallel
3’ to 5’ (one strand)
5’ to 3’ (the other strand)
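Because the two strands are antiparallel, the sequence of the partner strand, read in its own 5’ to 3’ direction, is the reverse complement of the given strand. A minimal sketch, assuming plain A, C, G, T letters only (the function name `reverse_complement` is an illustrative choice):

```python
# Watson-Crick pairing: A with T, C with G (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'."""
    # Complement every base, then reverse to restore 5' to 3' reading order.
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

Applying the function twice returns the original strand, which is a quick sanity check on any implementation.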
Paradigm
DNA sequence
determines
Protein sequence
determines
Protein structure
determines
Protein function
Most of the organized activity of bioinformatics has been
focused on the analysis of the data related to these processes
Observable and Data Archives
A Databank
An archive of information
A logical organization
Structure of that information
Tools to gain access to it
A Databank in Molecular Biology
Archival databanks of biological
information
DNA and protein sequence
Nucleic acid and protein structure
Databanks of protein expression
A Databank in Molecular Biology
Derived Databanks
Sequence motifs
Mutations and variants in DNA and protein
sequences
Classification and relationships
Bibliographic Databanks
Databanks of web sites
Databanks of databanks containing biological
information
Links between databanks
The Mechanism of Access to a
Databank is
the Set of Tools for Answering
Questions Such as:
Does the databank contain the
information I require?
How can I assemble information from the
databank in a useful form?
Indices of databanks are useful in
asking "Where can I find some specific
piece of information?"
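The idea of an index can be illustrated with a toy keyword index over entry descriptions. This is only a sketch: the entry identifiers and descriptions are invented for illustration, and real databanks use far richer indexing schemes.

```python
from collections import defaultdict


def build_index(entries):
    """Map each lower-cased word to the set of entry IDs that mention it."""
    index = defaultdict(set)
    for entry_id, description in entries.items():
        for word in description.lower().split():
            index[word].add(entry_id)
    return index


def lookup(index, word):
    """Answer "where can I find entries about this word?"."""
    return sorted(index.get(word.lower(), ()))


# Hypothetical entries, not taken from any real databank.
entries = {
    "E1": "human hemoglobin alpha chain",
    "E2": "horse hemoglobin beta chain",
    "E3": "human insulin",
}
index = build_index(entries)
print(lookup(index, "hemoglobin"))
```

The index is built once and then answers each lookup without scanning every entry.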
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (1)
Given a sequence or fragment of a sequence
Find sequences in the database that are similar to it
A central problem in bioinformatics
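One way to make "similar" concrete is a local alignment score. The sketch below uses the Smith-Waterman recurrence with a simple match/mismatch score and a linear gap penalty; the scoring values, the function names, and the tiny database are all assumptions chosen for illustration. Production search tools use substitution matrices and fast heuristics instead of scoring every entry exhaustively.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Rank database entries by local alignment score, best first."""
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical database of two short sequences.
database = {"s1": "TTACGTTT", "s2": "GGGGGG"}
for score, name in search("ACGT", database):
    print(name, score)
```

The dynamic programming table keeps only two rows at a time, since the score alone does not require a traceback.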
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (2)
Given a protein structure or fragment
Find protein structures in the database that are similar to it
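A simple measure of structural similarity that needs no superposition is the distance RMSD (dRMSD), which compares the internal distance matrices of two sets of coordinates. The sketch below assumes that both structures have the same number of atoms (for example, one C-alpha per residue) and that the residue correspondence is already known; finding that correspondence is the harder part of real structure comparison. The function names are illustrative.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def drmsd(coords_a, coords_b):
    """Root-mean-square difference between corresponding internal distances."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    n = len(coords_a)
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += (da[i][j] - db[i][j]) ** 2
            pairs += 1
    return math.sqrt(total / pairs) if pairs else 0.0


# Made-up coordinates: b is a translated copy of a, c is a stretched copy.
a = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
b = [(5, 5, 5), (6, 5, 5), (5, 6, 5)]
c = [(0, 0, 0), (2, 0, 0), (0, 1, 0)]
print(drmsd(a, b), drmsd(a, c))
```

Because only internal distances are used, the measure is unchanged by rotation and translation. It also cannot tell a structure from its mirror image, which is one reason superposition-based measures are often used alongside it.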
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (3)
Given a sequence of a protein of unknown structure
Find structures in the database that adopt similar three-dimensional structures
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (3)
For if two proteins have sufficiently similar sequences,
they will have similar structures
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (4)
Given a protein structure
Find sequences in the data bank that correspond
to similar structures
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics
(1) and (2) are solved problems
(3) and (4) are active fields of
research
Curation, Annotation and Quality
Control
Older data were limited by older
techniques
Amino acid sequences of proteins used to
be determined by peptide sequencing.
Now, almost all are translated from DNA
sequences.
Curation, Annotation and Quality
Control
Distributed error-correction and
annotation
Dynamic error-correction and
annotation