Bioinfo primer

Christophe Roos - MediCel ltd
Molecular biology goes in silico
Informatics goes systems biology
Gene networks in signaling
Drosophila as a model
Christophe Roos - Bioinfo primer
Spring 2002
From single cell to organism – a life cycle
The use of a model organism
• Fertilisation followed by cell
division
• Pattern formation – instructions for
– Body plan (Axes: A-P, D-V)
– Germ layers (ecto-, meso-, endoderm)
• Cell movement and change of form – gastrulation
• Cell differentiation
• Cell growth, cell death (apoptosis)
Development of the body plan
• We are much more like flies in our development than you
might think.
• Drosophila is the best understood of all developmental
systems.
• We (like other vertebrates) have evolved through two rounds of genome duplication
• Like all animals with bilateral symmetry, the fly embryo is
patterned along two distinct and largely independent axes:
Anterior-Posterior and Dorsal-Ventral
Pattern formation – germ layers
From lab bench to computer keyboard
• Molecular biology is the mother of biotechnology, an area with huge
potential applications.
• Molecular biology has traditionally dealt with single genes and proteins, but
new methods make it possible to work on large sets simultaneously.
• Information technology is an essential enabling technology (tool) in
molecular biology. We know it as bioinformatics or biocomputing.
• Bioinformatics is to a large extent a predictive science, the results of which
enter public and private electronic databases.
The experimental data hits the hard disk
• Biological data is accumulating at a high rate:
– DNA and protein sequences
– Gene expression profiles
– Protein structures
– Scientific literature
• The genomes of several organisms have been sequenced (many
viruses and bacteria, yeast, C. elegans and the fruit fly
Drosophila). Most of the DNA sequence of the complete
human genome has been determined.
… but what is the data?
genome, gene, protein
• Each cell (with some exceptions) contains a full genome, which consists of DNA
• The size varies:
– Small for viruses and prokaryotes (10 kbp-20 Mbp)
– Medium for lower eukaryotes
• Yeast, unicellular eukaryote 13 Mbp
• Worm (Caenorhabditis elegans) 100 Mbp
• Fly, invertebrate (Drosophila melanogaster) 170 Mbp
– Larger for higher eukaryotes
• Mouse and man 3,000 Mbp
– Very variable for plants (many are polyploid)
• Mouse ear cress (Arabidopsis thaliana) 120 Mbp
• Lilies 60,000 Mbp
… the data
genome, gene, protein
• The genome is partitioned over one or many chromosomes. Their number
is constant within a species but varies between species (range ca. 1-100).
• Each chromosome is a single double-helical DNA molecule
• A gene is the smallest functional unit on a chromosome that codes for a
protein or an effector RNA (e.g. tRNA and rRNA). The gene is directional
(5’ end → 3’ end).
– Regulatory regions (promoter & enhancer)
– Transcribed regions: exons and introns
• Introns are spliced out during maturation and the exons are joined together
• Exons make up the 5’ UTR, the CDS and the 3’ UTR
[Figure: a gene on the chromosome (promoter, 5’UTR, CDS, 3’UTR) and its mRNA]
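The splicing and mRNA anatomy described above can be illustrated with a minimal Python sketch. The toy sequence, the exon and CDS coordinates and the function names (`splice`, `split_mrna`) are hypothetical; coordinates are assumed to be 0-based and half-open on the sense strand, whereas real annotation formats follow their own conventions.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate the exons of a pre-mRNA; everything between them (the introns) is dropped.

    Exon coordinates are assumed to be 0-based, half-open and on the sense strand.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def split_mrna(mrna: str, cds_start: int, cds_end: int) -> tuple[str, str, str]:
    """Split a mature mRNA into 5' UTR, CDS and 3' UTR, given hypothetical CDS coordinates."""
    return mrna[:cds_start], mrna[cds_start:cds_end], mrna[cds_end:]


# Hypothetical toy gene: exon 1 | intron (starts with GT, ends with AG) | exon 2
pre_mrna = "CCATGAAA" + "GTAAGTCCCAG" + "TTTTAAGG"
exons = [(0, 8), (19, 27)]

mature = splice(pre_mrna, exons)             # "CCATGAAATTTTAAGG"
utr5, cds, utr3 = split_mrna(mature, 2, 14)  # CDS runs from ATG to the end of the TAA stop
print(utr5, cds, utr3)                       # CC ATGAAATTTTAA GG
```

Sorting the exons first means the caller may list them in any order; the intron sequence never has to be named explicitly, because it is simply whatever lies between exons.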
… the data
genome, gene, protein
• Proteins are made of amino acids; each of the 20 different amino acids is
coded for by one or several triplets (codons), of which there are 4^3 = 64.
• As the genetic code is read in triplets, it is essential to keep the reading frame correct
(theoretically 3 forward and 3 reverse frames for any DNA segment).
• The polypeptide chain is linear but folds into a 3D structure.
– The 3D structure is pivotal for the function of most proteins
– The 3D structure consists of folds
– Discrete secondary-structure elements make up the folds (α-helix, β-sheet, etc.)
– The 3D structure cannot (yet) be reliably predicted, but can be determined by
NMR spectroscopy or by X-ray diffraction of protein crystals.
– The structure is not static and also depends on binding partners.
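The 64 codons and the six possible reading frames can be made concrete with a small, self-contained Python sketch. The codon table is the standard genetic code (NCBI translation table 1) in its usual compact form; the function names and the test sequence are hypothetical, and each frame is translated only up to its last complete codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """The opposite strand, read 5' -> 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translate the 3 forward (+1..+3) and 3 reverse (-1..-3) reading frames."""
    frames = {}
    for strand_label, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand_label}{offset + 1}"] = translate(strand[offset:])
    return frames


# In frame +1, ATG GCC TAA reads as Met-Ala-stop ("MA*")
for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Only one of the six frames gives the intended protein; shifting the start by a single base changes every downstream codon, which is why keeping the frame correct matters.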
Genes control cell behavior by controlling
which proteins are made by a cell
• Genomic content constant: all cells have the same
instructive set
• Differential gene activity controls development
• Understanding development means, among other things,
understanding gene control at the level of:
– Chromatin structure
– Transcription
– Processing (splicing)
– Nuclear export
– Cytoplasmic location (storage)
– Translation
– Modifications of the polypeptide
• Glycosylation (sugars)
• Proteolytic cleavage
• Complex formation
Development is progressive
• Specification of cell fate:
determination
– All cells still ‘look the same’
– Can be tested by transplantation
experiments
• Interactions can make cells different
from each other: induction
Patterning – interpretation of
positional information
• Positional value
– Morphogen – a substance
– Threshold concentration
• Program for development
– Generative rather than
descriptive
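Reading positional value from a morphogen threshold can be sketched in a few lines of Python. The exponential gradient shape, the decay length and the two threshold values are illustrative assumptions, not measured data, and `morphogen` and `positional_value` are hypothetical names.

```python
import math


def morphogen(x: float, c0: float = 1.0, decay_length: float = 0.2) -> float:
    """Assumed exponential gradient from a source at x = 0 (x in fractions of embryo length)."""
    return c0 * math.exp(-x / decay_length)


def positional_value(concentration: float, thresholds: list[float]) -> int:
    """Number of thresholds exceeded: 0 far from the source, len(thresholds) closest to it."""
    return sum(concentration > t for t in thresholds)


thresholds = [0.1, 0.5]  # hypothetical response thresholds
for x in (0.0, 0.1, 0.3, 0.6):
    c = morphogen(x)
    print(f"x={x:.1f}  c={c:.3f}  positional value={positional_value(c, thresholds)}")
```

Two thresholds divide the axis into three regions. The rule that cells apply is the same everywhere, which is the sense in which such a program is generative rather than descriptive.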
The bicoid gene
provides an A-P
morphogen
gradient
The A-P axis is divided into broad
regions by gap gene expression
• The first zygotic genes
• Respond to maternally-derived
instructions
• Encode short-lived proteins, which gives a bell-shaped protein distribution around the source
Transcription factors in cascade
• Hunchback (hb), a gap gene,
responds to the dose of bicoid
protein
• A bicoid concentration above threshold
activates the expression
of hb
• The more bicoid transcripts, the
further posterior hb expression extends
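Why a higher bicoid dose moves the hb boundary posteriorly can be sketched numerically. Assuming, for illustration only, an exponential bicoid gradient c(x) = c0·exp(-x/λ) and a single hb activation threshold T, the boundary lies where c(x) = T, i.e. x* = λ·ln(c0/T), so every doubling of the dose shifts it by λ·ln 2. The parameter values and the function name `hb_boundary` are hypothetical.

```python
import math

DECAY_LENGTH = 0.2  # hypothetical λ, in fractions of embryo length
THRESHOLD = 0.25    # hypothetical hb activation threshold T


def hb_boundary(c0: float, threshold: float, decay_length: float) -> float:
    """Position where an assumed exponential bicoid gradient c0*exp(-x/λ) falls to the threshold.

    Cells anterior to this point (x < boundary) see bicoid above threshold and express hb.
    """
    if c0 <= threshold:
        return 0.0  # bicoid never exceeds the threshold: no hb domain
    return decay_length * math.log(c0 / threshold)


for dose in (1.0, 2.0, 4.0):  # relative bicoid dose
    print(f"dose={dose:.0f}  hb boundary at x={hb_boundary(dose, THRESHOLD, DECAY_LENGTH):.3f}")
```

Under this assumption the shift per doubling depends only on the gradient's decay length, not on the threshold.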
Krüppel reads two values
• Krüppel (Kr), a gap gene,
responds to the dose of hb protein
• A concentration above minimum
threshold of hb activates the
expression of Kr
• A concentration above maximum
threshold of hb inactivates the
expression of Kr
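Reading two thresholds turns a monotonic hb gradient into a band of Kr expression. A minimal Python sketch of this rule follows; the hb levels and the threshold values are made up for illustration, and `kr_expressed` is a hypothetical name.

```python
def kr_expressed(hb: float, low: float, high: float) -> bool:
    """Kr is on when hb is above the minimum threshold but not above the maximum one."""
    return low < hb <= high


# Hypothetical hb levels sampled from anterior (high hb) to posterior (low hb)
hb_profile = [0.9, 0.7, 0.5, 0.3, 0.1, 0.0]
pattern = ["Kr" if kr_expressed(h, low=0.2, high=0.6) else "-" for h in hb_profile]
print(pattern)
```

The single gradient therefore yields a central stripe of Kr: too much hb anteriorly, too little posteriorly.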
Segmentation:
activation of the pair-rule genes
• Parasegments are delimited
by expression of pair-rule
genes in a periodic pattern
• Each pair-rule gene is expressed in a
series of 7 transverse
stripes
Lewis Wolpert, Rosa Beddington, Jeremy Brockes,
Thomas Jessell, Peter Lawrence, Elliot Meyerowitz.
Principles of Development. Current Biology Ltd /
Oxford University Press, 1998. ISBN 0-19-850263-X
Universals:
The homeotic genes also specify human development
Genes are controlled by activating and repressing
transcription factors that bind the promoter
• The 500 bp promoter region of the even-skipped gene
• Gene expression occurs when the activating factors are present above a
threshold
• Repressors may act by preventing binding of activators
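The two rules above (activators must exceed a threshold; repressors can block activator binding) can be combined in a toy Python model. The competitive-binding term k_rep / (k_rep + repressor), the parameter values and the name `promoter_active` are assumptions made for illustration; real promoters integrate many sites and factors.

```python
def promoter_active(activator: float, repressor: float, threshold: float,
                    k_rep: float = 1.0) -> bool:
    """Toy promoter in which the repressor competes with the activator for its binding site.

    free_fraction is the share of sites not occupied by repressor under a simple
    single-site equilibrium assumption (k_rep plays the role of a dissociation constant).
    The gene is expressed when the activator acting on free sites exceeds the threshold.
    """
    free_fraction = k_rep / (k_rep + repressor)
    return activator * free_fraction > threshold


print(promoter_active(activator=2.0, repressor=0.0, threshold=1.0))  # enough activator, no repressor
print(promoter_active(activator=2.0, repressor=1.0, threshold=1.0))  # half the sites blocked
print(promoter_active(activator=0.8, repressor=0.0, threshold=1.0))  # activator below threshold
```

In this model a repressor does not need to act on transcription directly: by lowering effective activator occupancy it pushes the promoter back below its threshold.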
… understand the promoter
What we might know about the promoter:
...CAGTGCTAATATAAAACTGATATTTAATTGAAATCTTTTCTAATTTAGCGCGCTCAGCTGTTGGGTGACCTTGCTGCCGTTCAAATTCCGGAGGAGGAGCTGCAGCAGTATACTTCC
ATTAGCCAAGTGCAAACCGTGGGATTAAAGCGTCTACCCACCCTTGACGAGTATCTAGCCAAGAAAAAGGAAAGACAGGCCCAAGTTTTAGCTGAAAAAAGCTCGGCGTCGGGTCTCCGC
GTAAATGCTATAAAGGGCTCCAAGCGCAAGCTTCTCGTCGAAGAGGAGGAGGAACTACAGGCCAAGCGAAAGAATCCGAATGTAATTAGCGTGGAGGAAGATGACGAAGATTCTTCATCC
TCTGATGAGGACGATGAGGAGGCACCAGCTCAATCCGCTCCTATTGCCATACCCACTCCAGTGTCTATAGCTCCACCGCAAATCGCTGTTAAACCACCCATTAAAAAGTTGAAGCCAGAG
CCTAACCCACCTGCCTGTATCCACCAGACTGTCTATGTGCCCGTACATCGGACAACAGAAGTTCAGAATGCCCGTCTTCGACTGCCTATCCTCGCGGAGGAGCAGCAGGTGATGGAGACA
ATCAACGAAAACCCCATTGTGATCGTGGCTGGTGAGACTGGCTCTGGAAAGACTACCCAGCTACCGCAGTTCCTGTACGAAGCGGGGTATGCCCAGCACAAGATGATTGGAGTGACGGAG
CCGCGGCGAGTGGCTGCTATTGCCATGTCCAAGCGGGTGGCCCACGAGATGAACCTGCCGGAGAGCGAGGTGTCATACCTCATTCGCTTCGAGGGAAACGTAACACCAGCGACGCGCATT
AAATTCATGACAGATGGTGTGTTGCTTAAGGAGATCGAAACTGACTTTCTGCTTAGTAAGTACTCAGTGATCATCCTGGACGAGGCGCACGAGCGCAGTGTTTACACAGACATCCTAGTG
GGTCTCCTGTCAAGGATCGTGCCCTTGCGTCACAAACGCGGGCAGCCGCTGAAGCTGATCATTATGTCTGCCACTTTGCGGGTATCCGATTTTACAGAGAATACTCGCTTGTTTAAGATT
CCGCCACCGTTGCTTAAAGTGGAGGCTCGACAATTTCCGGTGACTATTCACTTCCAGAAGCGCACACCTGATGACTATGTGGCGGAGGCTTACCGCAAGACCTTAAAAATCCATAATAAG
CTTCCGGAAGGCGGCATACTAATTTTTGTGACGGGACAGCAGGAGGTCAACCAACTGGTGCGCAAGCTGCGACGTACGTTTCCGTATCATCATGCGCCAACCAAGGATGTCGCTAAAAAT
GGAAAGGTATCGGAGGAAGAAAAAGAGGAAACAATAGATGATGCGGCATCGACTGTGGAGGATCCCAAGGAGCTGGAGTTTGATATGAAACGAGTTATACGTAATATTCGTAAATCTAAG
AAAAAGTTCTTGGCGCAAATGGCGTTACCCAAAATCAATTTGGACGACTACAAGCTCCCTGGTGATGATACGGAAGCAGACATGCACGAGCAGCCGGATGAGGATGATGAGCAGGAGGGA
CTAGAAGAGGATAACGACGATGAACTAGGCTTGGAGGATGAGTCGGGAATGGGATCTGGTCAAAGGCAACCTCTGTGGGTCCTGCCGCTCTACTCGCTCCTCTCCTCGGAGAAGCAAAAC
CGCATCTTCCTGCCCGTTCCCGATGGCTGCCGGCTATGCGTGGTTAGCACCAATGTGGCAGAGACATCTCTCACCATCCCGCACATCAAGTATGTTGTTGACTGTGGTCGCCAGAAGACG
CGTCTTTACGACAAACTGACGGGTGTGAGTGCTTTTGTGGTAACCTACACGTCTAAGGCCTCGGCGGATCAGCGTGCTGGACGAGCGGGTCGCATCAGCGCCGGACATTGCTATCGCCTC
TACTCGAGTGCCGTGTACAACGACTGCTTCGAGGACTTTTCCCAGCCGGATATCCAGAAAAAGCCCGTCGAGGACCTTATGCTGCAAATGCGCTGCATGGGCATCGATCGCGTGGTGCAC
TTTCCCTTTCCCTCACCACCGGATCAAGTGCAGCTGCAAGCCGCCGAGCGGCGATTGATCGTGCTAGGTGCCCTGGAGGTCGCCAAGACAGAGAATACAGATTTGCCACCAGCCGTTACT
CGTTTGGGTCACGTTATCTCCCGCTTTCCCGTGGCGCCGCGCTTTGGAAAAATGCTGGCTCTGTCCCACCAGCAGAACCTACTGCCCTACACCGTCTGCCTGGTGGCCGCACTTTCAGTC
CAGGAGGTGCTAATCGAAACGGGCGTTCAAAGGGATGAGGATGTGGCACCTGGCGCGAATCGGTTCCACCGCAAACGCCAAAGTTGGGCGGCCAGCGGCAACTATCAGTTGCTTGGAGAT
CCTATGGTCTTATTACGTGCCGTAGGAGCTGCAGAGTACGCCGGATCGCAGGGCCGCTTGCCAGAGTTTTGTGCTGCGAATGGATTGCGCCAGAAAGCGATGAGCGAGGTGCGAAAATTG
CGCGTCCAGCTGACTAACGAGATTAACCTGAATGTTAGTGACGTTGAGCTGGGTGTGGACCCCGAACTGAAGCCTCCCACCGATGCCCAGGCGCGTTTCCTTCGCCAAATTCTATTGGCC
GGCATGGGCGACCGGGTGGCTAGAAAGGTACCTCTGGCAGACATCGCCGACAAGGAAGAGCGGCGGCGATTAAAGTACGCATACAATTGTGCTGACATGGAGGAACCAGCGTTCCTGCAC
GTCTCATCCGTGTTGCGTCAAAAAGCACCCGAATGGGTAATCTATCAGGAGGCATACGAGCTGCAAAACGGCGACTCTACCAAGATGTTCATCCGCGGC...
This page shows about 3,000 characters.
Thus, the 3,000 Mbp human genome would fill about 10^6 pages.
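The page estimate is plain arithmetic: 3,000 Mbp divided by 3,000 characters per page gives 10^6 pages. The short Python sketch below repeats the calculation for the approximate genome sizes listed earlier in this primer; `pages_needed` is a hypothetical helper.

```python
CHARS_PER_PAGE = 3_000  # characters per page, as on the sequence slide

# Approximate genome sizes (Mbp) from the genome-size slide of this primer
GENOME_SIZES_MBP = {
    "yeast": 13, "C. elegans": 100, "Drosophila": 170,
    "Arabidopsis": 120, "human": 3_000, "lily": 60_000,
}


def pages_needed(size_mbp: float, chars_per_page: int = CHARS_PER_PAGE) -> float:
    """Pages needed to print a genome at one character per base pair."""
    return size_mbp * 1_000_000 / chars_per_page


for organism, size in GENOME_SIZES_MBP.items():
    print(f"{organism:12s} {pages_needed(size):>12,.0f} pages")
```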
… however, the reality is certainly different
What the protein regulators might know about the promoter:
… and when regulation concerns many genes
simultaneously…
Lytic cycle decision of λ phage: 11
genes
Human genome:
~31,000-40,000 genes
There is more to regulation than
promoters
McAdams and Shapiro, Science 1995, 269, pp. 650-656
Data types will diversify
• While the promoter is a challenge to bioinformatics, it is only one
tiny facet of biological data
• Other data types include, among others:
– Gene transcripts or proteins present
• At various time points
• In different tissues
• In diseases
– Interactions between components
– Pathways
• Metabolic
• Regulatory
How can it be organised?
Pathway database, interaction database, ...
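At its simplest, an interaction database stores records of the form (regulator, target, effect) and indexes them in both directions. The Python sketch below is a hypothetical illustration that uses only the gap-gene interactions described earlier in this primer; it is not the schema of any real pathway database.

```python
from collections import defaultdict

# (regulator, target, effect) records taken from the Drosophila slides of this primer
INTERACTIONS = [
    ("bicoid", "hb", "activates (above threshold)"),
    ("hb", "Kr", "activates (above minimum threshold)"),
    ("hb", "Kr", "inactivates (above maximum threshold)"),
]

targets_of: dict[str, list[tuple[str, str]]] = defaultdict(list)
regulators_of: dict[str, list[tuple[str, str]]] = defaultdict(list)
for regulator, target, effect in INTERACTIONS:
    targets_of[regulator].append((target, effect))
    regulators_of[target].append((regulator, effect))


def downstream(gene: str) -> set[str]:
    """All genes reachable from `gene` by following regulatory links (iterative depth-first search)."""
    seen: set[str] = set()
    stack = [gene]
    while stack:
        for target, _effect in targets_of[stack.pop()]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


print(regulators_of["Kr"])
print(sorted(downstream("bicoid")))
```

Keeping both indexes makes "what regulates X?" and "what does X affect?" equally cheap to ask; a real database would add identifiers, evidence and confidence for each record.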
A wealth of databases
• Primary and derived
databases
• Each one accessible via
separate tools
– sometimes cross-indexed
– with separate syntax
– with different levels of
confidence
– with errors
We have a problem
Biology in the computing age
• What does informatics mean to biologists?
– Representing the data to the user
– Organising the data in databases
– Disseminating the data over the Internet
– Manipulating and interlinking the data
– Analysing the data
• What challenges does biology offer computer scientists?
– Cracking the genome code
– Presenting data in an intelligible form
– Biological data is complex and interlinked
– Multiple entities interact to form pathways, networks
– Model, simulate and understand how living things function
Systems biology needs more
• Mathematics, systematics, semantics
• Reverse engineering
• Modelling
• Data mining
• However, it has to be done starting from biological
premises