Bioinformatics Tools

Introduction to Bioinformatics
236523/234525
Lecturer: Prof. Yael Mandel-Gutfreund
Teaching Assistants: Shai Ben-Elazar, Idit Kosti
Course web site: http://webcourse.cs.technion.ac.il/236523
What is Bioinformatics?
Course Objectives
• To introduce the bioinformatics discipline
• To familiarize students with the major biological questions that can be addressed by bioinformatics tools
• To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations, etc.)
Course Structure and Requirements
1. Class structure
   1. 2 hours lecture
   2. 1 hour tutorial
2. Homework
   • Homework assignments will be given every second week
   • The homework will be done in pairs
   • 5/5 homework assignments must be submitted
3. A final project will be conducted in pairs
   • The project will be presented as a poster on poster day, 14.3
Grading
• 20% homework assignments
• 80% final project
Literature list
• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced Reading
• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.
What is Bioinformatics?
“The field of science in which biology, computer
science, and information technology merge to
form a single discipline”
Ultimate goal: to enable the discovery of new
biological insights as well as to create a global
perspective from which unifying principles in
biology can be discerned.
Central Paradigm in Molecular Biology
Gene (DNA) → mRNA → Protein
21st century: Genome → Transcriptome → Proteome
From DNA to Genome
Timeline (axis marked 1955 to 2000):
• Watson and Crick DNA model
• First genome: Haemophilus influenzae (1995)
• Yeast genome
• First human genome draft (2000)
Complete Genomes
             2010   2005
Total        1379    294
Eukaryotes    133     39
Bacteria     1152    235
Archaea        94     23
1,000 Genomes Project: Expanding the Map of Human Genetics
Researchers hope the effort will speed up the discovery of the genetic roots of many diseases
25,000 genomes… What’s Next?
The “post-genomics” era
• Annotation
• Comparative genomics
• Functional genomics
• Systems biology
Main goal: to understand the living cell
From… 25,000 genomes
To… understanding living cells
Annotation
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA
TAT GGA CAA TTG GTT TCT TCT CTG AAT ......
.............. TGAAAAACGTA
Annotation
• Identify the genes within a given sequence of DNA
• Identify the sites that regulate the gene
• Predict the function
How do we identify a gene in a genome?
A gene is characterized by several features (promoter, ORF, …); some are easier and some are harder to detect.
TF binding site
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA
TAT GGA CAA TTG GTT TCT TCT CTG AAT .................................
.............. TGAAAAACGTA
Figure labels: promoter, transcription start site, ribosome binding site
ORF = Open Reading Frame
CDS = Coding Sequence
Using Bioinformatics Approaches for Gene Hunting
Relatively easy in simple organisms (e.g. bacteria)
VERY HARD for higher organisms (e.g. humans)
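In a simple organism without introns, a first pass at gene hunting can be a scan for open reading frames. The following is a minimal Python sketch, not a real gene finder: it looks only at the three forward reading frames, and the function name `find_orfs`, the `min_codons` threshold, and the toy sequence are assumptions made for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start and end are 0-based, end is exclusive, and the stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs

toy = "CCATGAAATTTGGGTGACC"   # hypothetical sequence, not from the slides
print(find_orfs(toy))
```

In higher organisms this simple scan is not enough, because coding regions are interrupted by introns and surrounded by long non-coding stretches.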
Comparative genomics
How human are chimps?
Perhaps not surprising!
Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.
So where are we different?
Unaligned sequences:
Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGGGGATGCGGGCCCTATACCC
Mouse  ATAGCGGGATGCGGCGCTATACCA
Aligned sequences:
Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACC-A
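Once two sequences are aligned, the differences can be counted column by column. The sketch below does this for the human and chimp rows of the alignment above, assuming both rows have the same length and that `-` marks a gap; the function name `compare_aligned` is hypothetical. The toy alignment is far too short to say anything about the genome-wide figure.

```python
def compare_aligned(a, b, gap="-"):
    """Count matching, mismatching and gapped columns of two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            gaps += 1          # insertion or deletion in one of the rows
        elif x == y:
            matches += 1
        else:
            mismatches += 1    # substitution
    return matches, mismatches, gaps

human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"

m, mm, g = compare_aligned(human, chimp)
identity = 100.0 * m / (m + mm)   # percent identity over ungapped columns
print(m, mm, g, round(identity, 2))
```

Whether gapped columns count toward the difference is a design choice; here they are reported separately and left out of the identity percentage.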
And where are we similar?
VERY DIFFERENT
VERY SIMILAR
Conserved between many organisms
Functional genomics
TO BE IS NOT ENOUGH
At any time point, a gene can be functional or not
From the gene expression pattern we can learn:
What does the gene do?
When is it needed?
What other genes or proteins interact with it?
…
What's wrong?
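One common way to ask which genes may act together is to compare their expression profiles across time points or conditions. The sketch below computes a Pearson correlation between two profiles. The gene names and expression values are invented placeholders, not measurements, and correlation is only one of several possible similarity measures.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(vx * vy)

# Hypothetical expression levels at four time points (placeholder values).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises together with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}

print(pearson(profiles["geneA"], profiles["geneB"]))
print(pearson(profiles["geneA"], profiles["geneC"]))
```

A correlation close to 1 suggests the two genes are switched on and off together; a value close to -1 suggests opposite behaviour.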
Systems Biology
Biological networks
Jeong et al. Nature 411, 41 - 42 (2001)
What can we learn from a network?
What can we learn from biological networks?
What can we learn about this protein?
• Is the protein essential for the organism?
• Is it a good drug target?
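One simple quantity that can be read off an interaction network is the degree of each protein, that is, its number of interaction partners. The Jeong et al. paper cited above relates highly connected proteins in the yeast network to lethality. Below is a minimal sketch, assuming an undirected network given as an edge list; the protein names P1 to P5 and the edges are hypothetical.

```python
from collections import defaultdict

def degrees(edges):
    """Map each protein to its number of distinct interaction partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue               # ignore self-interactions in this sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}

# Hypothetical interaction list (placeholder names, not real proteins).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]

deg = degrees(edges)
# Rank proteins from most to least connected, ties broken by name.
hubs = sorted(deg, key=lambda n: (-deg[n], n))
print(hubs[0], deg[hubs[0]])
```

Degree is only a first indicator; whether a highly connected protein is essential or a good drug target still has to be tested experimentally.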
What of all this will we learn in the course?
The course will concentrate on the bioinformatics tools and databases that are used to:
• Annotate genes
• Compare genes and genomes
• Infer the function of genes and proteins
• Analyze the interactions between genes and proteins
• Etc.
Biological Databases
The different types of data are collected in databases:
– Sequence databases
– Structural databases
– Databases of Experimental Results
All databases are connected
Sequence databases
• Gene database
• Genome database
• Disease related mutation database
• …
Genome Browsers
Easy “walk” through the genome
UCSC Genome Browser
http://genome.ucsc.edu/
Disease related database
Sickle Cell Anemia
• Caused by a single nucleotide substitution (an A replaced by a T), so that valine is incorporated instead of glutamic acid in the beta chain of hemoglobin
Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/
Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
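The records above are in FASTA format: a header line starting with `>` followed by one or more sequence lines. Below is a minimal sketch of reading such text and splitting the `|`-separated header. The helper names `read_fasta` and `split_header` are hypothetical, and the field layout is assumed from the headers shown here.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)        # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)

def split_header(header):
    """Split 'gi|<id>|ref|<accession>| <description>' into its fields."""
    parts = header.split("|", 4)
    if len(parts) != 5:
        raise ValueError("unexpected header layout")
    return {"gi": parts[1], "accession": parts[3],
            "description": parts[4].strip()}

# Only the first 38 residues of the protein record, to keep the example short.
example = """>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKV
NVDEVGGEALGRLLVVYPW
"""

for header, seq in read_fasta(example):
    fields = split_header(header)
    print(fields["accession"], fields["description"], len(seq))
```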
Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
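The effect of a single-base change can be checked by translating the start of the coding sequence. This sketch uses only the first nine codons of the HBB coding sequence, starting at the ATG start codon, and a codon table restricted to the codons that occur in that fragment. The sickle-cell variant is written here with the GAG to GTG change in the seventh codon (counting the initiator ATG); the fragment and the helper names are assumptions chosen for illustration.

```python
# Partial codon table: only the codons present in this fragment.
CODONS = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
    "CCT": "P", "GAG": "E", "AAG": "K",
}

def translate(cds):
    """Translate a coding sequence codon by codon using CODONS."""
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

def first_difference(a, b):
    """Return the 0-based index of the first differing character, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None

healthy = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle = "ATGGTGCATCTGACTCCTGTGGAGAAG"

pos = first_difference(healthy, sickle)   # 0-based position in the fragment
codon_index = pos // 3                    # 0-based codon containing the change
print(pos, healthy[pos], sickle[pos], codon_index)
print(translate(healthy))
print(translate(sickle))
```

The two translations differ at a single residue, glutamic acid (E) in the healthy fragment and valine (V) in the sickle-cell fragment, which matches the difference between the two protein records.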
Structure Databases
• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
• 3D data is available thanks to techniques such as NMR and X-ray crystallography
Databases of Experimental Results
• Data such as experimental microarray images (gene expression data)
• Proteomic data (protein expression data)
• Metabolic pathways, protein-protein interaction data, regulatory networks
• Etc.
Literature Databases
PubMed
http://www.ncbi.nlm.nih.gov/pubmed/
Service of the National Library of Medicine
Putting it all Together
• Each database contains specific information
• Like other biological systems, these databases are also interrelated
[Figure: map of interrelated databases, grouped by category]
Categories: protein, disease, assembled genomes, motifs, genomic data, ESTs, genes, SNPs, gene expression, structure, pathway, literature
Databases shown: PIR, LocusLink, SWISS-PROT, OMIM, GoldenPath, OMIA, WormBase, TIGR, BLOCKS, Pfam, Prosite, GenBank, dbEST, DDBJ, EMBL, RefSeq, UniGene, AllGenes, dbSNP, PDB, MMDB, SCOP, Stanford MGDB, KEGG, NetAffx, COG, ArrayExpress, GDB, PubMed