A central problem in bioinformatics


Bioinformatics
Ch1. Introduction
阮雪芬
2002, Oct 17
NTUST
www.ntut.edu.tw/~yukijuan/lectures/bioinfo/Oct17.ppt
Outline




A scenario
Life in space and time
Dogmas: central and peripheral
Observable and data archives
Traditional and Current Biology
• Traditionally, biology has been an observational science.
• Now, biology has been converted into a deductive science.
The Data of Bioinformatics
• A very large amount of data
• Nucleotide sequence databanks contain 16 × 10^9 bases
• The full three-dimensional coordinates of proteins of average length ~400 residues: 16,000 entries
• Not only are the individual databanks large, but their sizes are increasing at a very high rate.
GenBank
Goals
• “Saw life clearly and saw it whole”: understand integrative aspects of the biology of organisms
• To interrelate sequence, three-dimensional structure, interactions, and function of individual proteins, nucleic acids and protein-nucleic acid complexes
Goals
• To deduce events in evolutionary history.
• To support application to medicine, agriculture and other scientific fields.
A Scenario
Imagine a Crisis (1)
• A new biological virus creates an epidemic of fatal disease in humans or animals
• Laboratory scientists will isolate its genetic material, a molecule of nucleic acid, and determine the sequence.
• Computer programs will then take over
Imagine a Crisis (2)
Screening this new genome against a data bank of all known genetic messages
Developing antiviral therapies: viruses contain protein molecules that are suitable targets for drugs that will interfere with viral structure or function
Imagine a Crisis (3)
From the viral DNA sequences
Computer program
Protein sequence
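The step from DNA sequence to protein sequence can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code, a coding strand read in frame from its first base, and no introns; the function name `translate` is chosen here and is not from the slides.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    Trailing bases that do not fill a codon are ignored. A codon with a
    letter other than A, C, G or T raises KeyError.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

A real pipeline would also have to locate the open reading frames and handle ambiguous bases, which this sketch leaves out.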
Imagine a Crisis (4)
From amino acid sequences
Computer program
Three-dimensional structure
Homology Modelling
A data bank will be screened for related proteins of known structure
Computer program
Structure will be predicted
Ab initio
No related protein of known structure is found
Ab initio
Predicting the structure
Design Therapeutic Agents
Knowing the viral protein structure
Design therapeutic agents
Life in space and time
In Space
• Biosphere
• Ecosystem
• Darwinian selection or genetic drift
• The generation of variants: natural mutation, the recombination of genes in sexual reproduction, direct gene transfer
In Space
Ecosystem
Species
Cell
Nuclei, organelles and cytoskeleton
Molecules
In Time
• A history of life: 3.5 billion years
Dogmas: Central and Peripheral
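Central Dogmas
• In 1957, Crick proposed the central dogma: “DNA makes RNA, and RNA makes protein.”
• The central dogma is largely correct, but it needs some amendments:
  • There are many RNA viruses: RNA → DNA → RNA → protein
  • Jumping genes
  • Eukaryotic RNA must be spliced
  • Proteins are not the only molecules with enzymatic activity; some RNAs are also enzymes
  • Some genes can yield several RNAs through different transcription start sites or different splicing patterns, and these are translated into proteins with different functions
Eukaryotic RNA must be spliced
Proteins are not the only molecules with enzymatic activity; some RNAs are also enzymes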
First identified in plant virus
Purines and Pyrimidines
The Strands in the Double Helix Are Antiparallel
[Figure: the two strands run in opposite directions, one labelled 3’ to 5’ and the other 5’ to 3’]
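Because the strands are antiparallel and the bases pair A with T and C with G, the partner of a strand written 5’ to 3’ is its reverse complement. A minimal Python sketch, with the function name `reverse_complement` chosen here for illustration:

```python
# Map each base to its Watson-Crick partner; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, also written in the 5' to 3' direction."""
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```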
Paradigm
DNA sequence
determines
Protein sequence
determines
Protein structure
determines
Protein function
Most of the organized activity of bioinformatics has been
focused on the analysis of the data related to these processes
Observable and Data Archives
A Databank




An archive of information
A logical organization
Structure of that information
Tools to gain access to it
A Databank in Molecular Biology
• Archival databanks of biological information
• DNA and protein sequence
• Nucleic acid and protein structure
• Databanks of protein expression
A Databank in Molecular Biology
• Derived Databanks
• Sequence motifs
• Mutations and variants in DNA and protein sequences
• Classification and relationships
• Bibliographic Databanks
• Databanks of web sites
• Databanks of databanks containing biological information
• Links between databanks
The Mechanism of Access to a Databank Is the Set of Tools for Answering Questions Such As:
• Does the databank contain the information I require?
• How can I assemble information from the databank in a useful form?
• Indices of databanks are useful in asking “Where can I find some specific piece of information?”
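One way to picture an index is as a lookup table from search terms to the entries that mention them. The sketch below is an assumption for illustration only: the record identifiers, the descriptions, and the function name `build_index` are hypothetical and do not come from any real databank.

```python
from collections import defaultdict


def build_index(records: dict) -> dict:
    """Map each lowercase word of a description to the set of record IDs containing it."""
    index = defaultdict(set)
    for record_id, description in records.items():
        for word in description.lower().split():
            index[word].add(record_id)
    return dict(index)


# Hypothetical entries, invented for this example.
records = {
    "R1": "human hemoglobin alpha",
    "R2": "mouse hemoglobin beta",
}
index = build_index(records)
print(sorted(index["hemoglobin"]))
```

With such an index, the question “where can I find this?” becomes a dictionary lookup instead of a scan over every entry.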
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (1)
Given a sequence or fragment of a sequence
Find sequences in the database that are similar to it
A central problem in bioinformatics
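As a rough illustration of query (1), the sketch below ranks database sequences by how many short subsequences (k-mers) they share with the query. This is a simplification chosen for brevity, not the method the slides describe and not how production search tools work; the function names and the example sequences are assumptions.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_similarity(query: str, database: dict, k: int = 3) -> list:
    """Rank database entries by Jaccard similarity of their k-mer sets to the query."""
    query_kmers = kmers(query, k)
    scores = []
    for name, seq in database.items():
        entry_kmers = kmers(seq, k)
        union = query_kmers | entry_kmers
        score = len(query_kmers & entry_kmers) / len(union) if union else 0.0
        scores.append((name, score))
    # Highest score first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database.
database = {"a": "ACGTACGT", "b": "TTTTTTTT"}
print(rank_by_similarity("ACGTACGA", database))
```

Alignment-based scoring is more sensitive than k-mer overlap, but the shape of the task is the same: score every candidate against the query and report the best matches.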
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (2)
Given a protein structure or fragment
Find protein structures in the database that are similar to it
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (3)
Given the sequence of a protein of unknown structure
Find structures in the database that adopt similar three-dimensional structures
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (3)
For if two proteins have sufficiently similar sequences
They will have similar structures
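“Sufficiently similar” is often quantified as percent identity over an alignment. The sketch below assumes the two sequences have already been aligned to equal length with `-` marking gaps; the function name is chosen here, and what cutoff counts as sufficient is left to the reader rather than taken from the slides.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("MKT-LV", "MKTALI"))
```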
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics (4)
Given a protein structure
Find sequences in the data bank that correspond to similar structures
A Variety of Possible Kinds of Database
Queries Can Arise in Bioinformatics
• (1) and (2) are solved problems
• (3) and (4) are active fields of research
Curation, Annotation and Quality Control
• Older data were limited by older techniques
• Amino acid sequences of proteins used to be determined by peptide sequencing.
• Now, almost all are translated from DNA sequences.
Curation, Annotation and Quality Control
• Distributed error-correction and annotation
• Dynamic error-correction and annotation