Introduction to bioinformatics


What is bioinformatics?
BIO + INFORMATICS
Bioinformatics is the use of computers to solve
biological problems
2
How large is your DNA?
3 billion (3 x 10^9) base pairs
How many phone books would you
need, if each book had:
•1000 pages
•10000 letters per page
How high would your stack of books be
if each book was 5 cm thick?
600 books → stack is 30 m tall!
(This counts both bases of every pair: 6 x 10^9 letters.)
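The phone book estimate can be checked with a few lines of Python. This is only a sketch: the slide does not say how it gets from 3 billion base pairs to 600 books, so the factor LETTERS_PER_BASE_PAIR = 2 (one letter for each base of a pair) is an assumption chosen to reproduce the slide's answer. All names are made up for this illustration.

```python
# Figures taken from the slide
BASE_PAIRS = 3 * 10**9
PAGES_PER_BOOK = 1000
LETTERS_PER_PAGE = 10_000
BOOK_THICKNESS_CM = 5

# Assumption (not stated on the slide): write down both bases of every pair
LETTERS_PER_BASE_PAIR = 2


def phone_books_needed(letters):
    """Number of whole books needed to print the given number of letters."""
    letters_per_book = PAGES_PER_BOOK * LETTERS_PER_PAGE
    return -(-letters // letters_per_book)  # ceiling division on integers


books = phone_books_needed(BASE_PAIRS * LETTERS_PER_BASE_PAIR)
stack_height_m = books * BOOK_THICKNESS_CM / 100

print(f"{books} books, stack of {stack_height_m} m")
```

With one letter per base pair instead, the same calculation gives 300 books and a 15 m stack.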
3
How large is your DNA?
3 billion (3 x 10^9) base pairs
How many DVD-ROMs would you need if:
•1 base = 1 byte
•1 DVD = 4.7 gigabytes = 4.7 x 10^9 bytes
2 DVDs
Computers are essential for the storage of large
amounts of data
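The DVD estimate follows the same pattern. In this sketch, BASES_PER_BASE_PAIR = 2 is again an assumption (it is what makes the answer come out at 2 DVDs); the other figures come from the slide.

```python
# Figures taken from the slide
BASE_PAIRS = 3 * 10**9
BYTES_PER_BASE = 1
DVD_CAPACITY_BYTES = 4_700_000_000  # 4.7 x 10^9 bytes

# Assumption (not stated on the slide): store both bases of every pair
BASES_PER_BASE_PAIR = 2


def dvds_needed(total_bytes):
    """Number of whole DVDs needed to hold total_bytes."""
    return -(-total_bytes // DVD_CAPACITY_BYTES)  # ceiling division


genome_bytes = BASE_PAIRS * BASES_PER_BASE_PAIR * BYTES_PER_BASE
dvds = dvds_needed(genome_bytes)

print(f"{genome_bytes} bytes fit on {dvds} DVDs")
```

Storing only one base per pair (3 x 10^9 bytes) would fit on a single DVD.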
4
Dealing with data
3,000,000,000 base pairs…
600 phone books!
How to find information?
Imagine that you have to find phone number 471-1435 …
Computers are essential to find information among a
large amount of data
5
Databanks
Databanks are “storage places” for information.
By sorting information in a clever way, it can be retrieved more easily later on.
Three important databanks in bioinformatics:
•EMBL: nucleotide sequences (DNA, RNA)
•Swiss-Prot: protein sequences and their function
•PDB: protein structures
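Why sorting helps retrieval can be shown with the phone number example. The sketch below keeps a small, invented phone book sorted by number and finds an entry by binary search (repeatedly halving the search range) using Python's standard bisect module. The names and all numbers except 471-1435 are made up, and this is not how EMBL, Swiss-Prot or PDB are actually implemented.

```python
import bisect
import math

# Hypothetical records (phone number, name), invented for this illustration
records = [
    ("471-1435", "Person C"),
    ("203-8841", "Person A"),
    ("998-0012", "Person D"),
    ("310-5577", "Person B"),
]

# Sort once, then keep the numbers and names in matching order
records.sort()
numbers = [number for number, _ in records]
names = [name for _, name in records]


def find_name(number):
    """Look up a phone number by binary search; return None if it is absent."""
    i = bisect.bisect_left(numbers, number)
    if i < len(numbers) and numbers[i] == number:
        return names[i]
    return None


print(find_name("471-1435"))

# Halving steps needed to search 3 billion sorted entries
steps = math.ceil(math.log2(3_000_000_000))
print(steps)
```

Checking an unsorted list one entry at a time could take up to 3 billion steps; in a sorted list, about 32 halvings are enough.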
6
Development of biotechnology and bioinformatics
3 elementary components:
•Experimental development: the development of new methods and techniques in the lab
•Discoveries and other advances in knowledge
•Computer improvement
7
Important molecules
8
Protein structure
9
What makes a protein fold?
Hydrophobic interaction
Disulphide bond
Hydrogen bond
Ionic interaction
10
The Human Genome Project
The idea for this project was born in 1988. At that time, scientists predicted that it would take around 20 years to complete.
Sequencing of the 3,000,000,000 base pairs was completed in 2003.
Only 2% of the genome contains information about proteins. It is still unknown what the other 98% does: is this “junk” DNA?
We have around 20,000 genes in our genome. That is not many, considering that a worm with 350 brain cells has hardly fewer. The hot question is therefore: how many proteins are really encoded in those 20,000 genes?
Half of the genes code for proteins of (currently) unknown function.
11
SNPs and mutations
12
Homology: genomic analysis
13
Homology
Amino acid sequence
Protein X: ALLELAMKLAIGNSGP
Protein Y: ALLLIAMKLAIGNSGP
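How similar the two sequences are can be counted position by position. The sketch below does only that, for two sequences of equal length. Real homology searches use alignment methods that also handle gaps and score similar amino acids, so treat this as the simplest possible illustration. The function names are made up for this example.

```python
protein_x = "ALLELAMKLAIGNSGP"
protein_y = "ALLLIAMKLAIGNSGP"


def percent_identity(seq_a, seq_b):
    """Percentage of positions with the same amino acid (equal lengths only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


def differing_positions(seq_a, seq_b):
    """1-based positions at which the two sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


print(percent_identity(protein_x, protein_y))     # 87.5
print(differing_positions(protein_x, protein_y))  # [4, 5]
```

The two proteins are identical at 14 of their 16 positions; only positions 4 and 5 differ (EL versus LI).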
14
So now we know…
… That there is a lot of information available, so it must be organized
… That computers are essential to search the information
… That there is still a lot left to research: we now have a rich “bio-vocabulary”, but we still can’t really speak the language of life
15
www.bioinformatics-at-school.eu
16