Bioinformatics Centre


Plan for day 1
1. The course
   1. Registration
   2. Layout
   3. Expectations
   4. Evaluation and exam
2. What is bioinformatics?
3. Setup and connect computers
LUNCH
1. 13:00 Setup and connect computers
2. Software overview
3. CLC Combined Workbench (presentation, installation, demo)
4. Install and play with general computer tools
What is bioinformatics?
Anders Krogh & Morten Lindow
The Bioinformatics Centre
Dept of Biology
University of Copenhagen
A big change in biology has taken place
Before: Measure the expression of a single gene in a single sample
Now: Measure the expression of all genes in many samples
Mutations
Before: You mapped mutations in bacteria (lots of work, I think)
Now: You sequence the whole genome with "next-generation sequencing"
Protein interactions
Before: Find interaction partners for one protein
After: Find interaction partners for all proteins
Biology has become an information science
Genome sequencing is just the beginning
• Where are the genes?
• How are they regulated?
• What do they do?
• How do they interact?
• How did they evolve?
• What about the rest of the genome?
[Figure: Growth of GenBank, base pairs (bps, log scale, 1E+05 to 1E+11) by year, 1980 to 2010]
Experimental labs need informatics
In many labs bioinformatics is the bottleneck.
Example:
• You want to study differences in miRNA expression in cancer vs. normal tissue.
• Short RNAs are extracted and mailed to a company for sequencing.
• You get a hard disk in return, full of short sequences.
• The rest is bioinformatics.
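What "the rest" involves depends on the study, but a common first step is to collapse identical reads into counts and compare the two samples. Below is a minimal sketch under stated assumptions: the reads are already adapter-trimmed and held in Python lists, counts are normalized to reads per million (RPM), and a pseudocount of 1 RPM avoids division by zero. The function names are hypothetical, and a real analysis would add replicates and a proper statistical test.

```python
from collections import Counter
import math


def count_reads(reads):
    """Collapse identical short reads into a sequence -> count table."""
    return Counter(r.strip().upper() for r in reads if r.strip())


def reads_per_million(counts):
    """Normalize raw counts by library size (reads per million)."""
    total = sum(counts.values())
    return {seq: 1e6 * c / total for seq, c in counts.items()}


def log2_fold_changes(tumor_reads, normal_reads, pseudocount=1.0):
    """log2(tumor RPM / normal RPM) for every sequence seen in either sample."""
    tumor = reads_per_million(count_reads(tumor_reads))
    normal = reads_per_million(count_reads(normal_reads))
    return {
        seq: math.log2((tumor.get(seq, 0.0) + pseudocount)
                       / (normal.get(seq, 0.0) + pseudocount))
        for seq in set(tumor) | set(normal)
    }
```

Positive values mark sequences that are relatively more abundant in the tumor sample, negative values those more abundant in normal tissue.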
Definition
The book: "bioinformatics involves the technology that uses computers for analysis, storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins."
Definition
Wikipedia: "… The terms bioinformatics and computational biology are often used interchangeably. However bioinformatics more properly refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems posed by or inspired from the management and analysis of biological data. Computational biology, on the other hand, refers to hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental and simulated data, with the primary goal of discovery and the advancement of biological knowledge."
Bioinformatics?
Search for homologs to a protein sequence
Retrieve information about a genome segment
Analyze experimental data in a spreadsheet
Predict the structure of an RNA molecule
Build a phylogenetic tree connecting a set of proteins
Make an equation describing a neuronal action potential
Construct a cardiac blood-flow model
Find differentially expressed genes using microarrays
Make a model of protein-protein interactions
Differential equations to describe prey/predator dynamics
Some challenges in bioinformatics
• How to fully decipher the digital content of the genome
• How to predict protein structure and function ab initio
• How to analyze expression data
• How to identify signatures for cellular states (healthy vs. diseased)
• How to extract regulatory networks from the above
• How to integrate multiple high-throughput data types
• How to build hierarchical models across multiple scales of time and space
• How to visualize and explore large-scale multidimensional data
• How to reduce complex multidimensional models to underlying principles
Inspired by Leroy Hood
Example
In which you will learn a bit about:
• Accessing and searching for information in biodatabases
• What microRNAs are
• Prediction of RNA structure
Imagine
You are studying the oncogene c-Myc (a transcription factor)
You have isolated a complex containing the mRNA for c-Myc
In this complex you find a small RNA
You get excited!
You manage to clone and sequence it
• caaagugcuuacagugcagguagu
Now what?
Finding it in the genome
Is this a known molecule?
Since the human genome has been fully sequenced:
• We must be able to find out where it is encoded
• Go to a genome browser
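At its core, finding where the small RNA is encoded is a string search against the genome. A minimal sketch, assuming the genome (or one chromosome) is available as a plain Python string: the RNA sequence must first be written in the DNA alphabet (U to T), and both strands must be searched. Real tools such as BLAT use an index and tolerate mismatches; this exact-match scan does neither, and the function names are hypothetical.

```python
def rna_to_dna(seq):
    """Write an RNA sequence in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return dna.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_exact_matches(query, genome):
    """Return sorted (0-based start, strand) pairs for every exact match of query."""
    q = rna_to_dna(query)
    g = genome.upper()
    hits = []
    # A hit on the minus strand shows up as the reverse complement on the plus strand.
    for strand, pattern in (("+", q), ("-", reverse_complement(q))):
        start = g.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = g.find(pattern, start + 1)
    return sorted(hits)
```

For example, `find_exact_matches("caaagugcuuacagugcagguagu", chromosome_sequence)` would list every exact occurrence of the cloned RNA, where `chromosome_sequence` is a hypothetical string you have loaded yourself.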
Wow! It is a microRNA
Sidestep: What are microRNAs?
What are miRNAs?
The RNA revolution
Biology’s Big Bang
• 10 years ago: RNA was considered uninteresting messengers for the proteins
• The non-coding part of the genome (98%) was considered junk
The Economist, June 2007
Beware of the RNA!
• It is your RNA that separates you from a worm – not your proteins!
• It is the RNA that regulates your genes – as much as proteins!
• New types of RNA are discovered every month
• Most of a genome is transcribed
• 98% of the genome is probably important (my guess)
The RNA operating system
[Figure: Genome → Transcriptome → Proteome, with regulation by proteins and massive regulation by RNA (imprinting/methylation, splicing, siRNA/miRNA, ribozymes)]
MicroRNA
• Small (20-22 nt) RNAs
• The pre-miRNA forms a hairpin structure
• Involved in post-transcriptional regulation and
gene silencing (methylation)
• Important in development, brain, cancer, etc.
• Evolutionarily conserved (?)
miRNA logic
[Figure: miRNA gene → pri-miRNA → (Drosha) → pre-miRNA → (export) → (Dicer) → a microRNA, which inhibits mRNA translation]
Animal & Plant miRNA
Some miRNAs occur in clusters
miRNA targets
• Very few experimentally validated targets, mostly in fly and worm
• We have to rely on bioinformatic target predictions, which are probably very noisy:
– 10% of all genes regulated by miRNAs (Enright et al., 2003)
– 30% of all genes regulated by miRNAs (Lewis et al., 2004)
– ~all genes regulated to some degree (others)
Three main types of target sites
(a) Canonical sites: good or perfect complementarity, with a characteristic bulge in the middle.
(b) Dominant seed sites: perfect seed (bases 2-8) match, but poor 3’ end complementarity.
(c) Compensatory sites: mismatch or wobble in the seed region, compensated by pairing at the 3’ end.
From Mazière P. and Enright A., Drug Discovery Today, 2007
Principal criteria to predict miRNA targets
• Seed complementarity: the seed region (bases 2-8) of the miRNA sequence is complementary to the 3’ UTR.
• Target sites are conserved in other genomes (may miss targets of recently evolved miRNAs).
• Target multiplicity: multiple binding sites for the miRNA
• Thermodynamics of the RNA-RNA duplex
• Target structure: lack of strong secondary structure
at miRNA-target binding site may be an important
feature
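The first and third criteria (seed complementarity and target multiplicity) can be illustrated with a plain string scan. A minimal sketch, assuming the 3’ UTR is given as a string: it looks only for perfect 7-nt matches to the reverse complement of miRNA bases 2-8 and ignores conservation, duplex thermodynamics, target structure and G:U wobble pairs. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Write a sequence in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def seed_site(mirna, start=2, end=8):
    """Target-site sequence (5'->3') pairing perfectly with miRNA bases start..end (1-based)."""
    seed = to_rna(mirna)[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, utr):
    """0-based positions in the 3' UTR with a perfect seed (bases 2-8) match."""
    site = seed_site(mirna)
    utr = to_rna(utr)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

For the RNA cloned in the example above, `seed_site("caaagugcuuacagugcagguagu")` gives `"GCACUUU"`; the length of the list returned by `find_seed_sites` is a crude measure of target multiplicity.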
Overlap between methods
Hammell et al., Nature Methods 5:813-819
Let’s assume this was NOT known already
RNA folding
Can it fold as a hairpin?
• Get the sequence with flanks (from the genome browser)
• Fold it with Vienna RNA
More details: RNA-lecture
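Vienna RNA (RNAfold) predicts the minimum free energy structure using nearest-neighbour energy parameters. As a much simpler stand-in that shows the dynamic-programming idea, the sketch below implements the classic Nussinov algorithm: it maximizes the number of base pairs (A-U, G-C, G-U) with a minimum hairpin loop of 3 nt. This is not what Vienna RNA computes, and pair counting alone will not reliably distinguish a real pre-miRNA hairpin from a random sequence.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Nussinov base-pair maximization; returns (pair count, dot-bracket string)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # dp[i][j] = max number of pairs in seq[i..j]; spans too short for a hairpin stay 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure as dot-bracket notation
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + 1 + dp[k + 1][j - 1] == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)
```

For instance, `nussinov("GGGGAAAACCCC")` folds the toy sequence into a single hairpin, `(4, "((((....))))")`.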
Summary so far
[Figure: miRNA gene → RNA → a microRNA → protein; steps covered: identify transcript, prediction of precursor structure]
What controls the controller?
[Figure: ? → miR-17 → ?, drawn on its genomic sequence]
Find the transcription start site
• Use and integrate existing data
  – Genome browser: known transcripts, genome annotation (from cDNA data)
  – Auxiliary information (not yet in the genome browser): known 5’ ends (from RIKEN CAGE tags) and known RNA polymerase II binding sites (from ChIP)
• Use or construct predictive models
  – Machine learning / inference (HMM, neural nets, SVM, GLM)
• You need a bioinformatician for this!!
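Integrating CAGE data can start very simply: look for the position upstream of the miRNA, on the same strand, that collects the most tag 5’ ends. A minimal sketch, assuming the CAGE tags have been downloaded as hypothetical `(position, strand)` pairs on the miRNA's chromosome; the 50 kb window and the "most tags, then closest" rule are arbitrary illustrative choices, not an established method. A real analysis would also cluster neighbouring tags, check RNA polymerase II ChIP signal and use the predictive models listed above.

```python
from collections import Counter


def candidate_tss(cage_5prime_ends, mirna_start, strand, max_distance=50000):
    """Return (position, tag count) of the most-tagged CAGE 5' end upstream of a miRNA.

    cage_5prime_ends: list of (position, strand) tuples, one per tag (assumed format).
    mirna_start: position of the miRNA's 5'-most base on the given strand.
    Returns None if no same-strand tag lies within max_distance upstream.
    """
    counts = Counter()
    for pos, tag_strand in cage_5prime_ends:
        if tag_strand != strand:
            continue
        # Upstream means lower coordinates on "+" and higher coordinates on "-".
        upstream = mirna_start - pos if strand == "+" else pos - mirna_start
        if 0 <= upstream <= max_distance:
            counts[pos] += 1
    if not counts:
        return None
    # Prefer the most tags; break ties by proximity to the miRNA.
    return max(counts.items(), key=lambda kv: (kv[1], -abs(kv[0] - mirna_start)))
```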
Summary so far
[Figure: miRNA gene → RNA → a microRNA → protein; steps covered: identify transcript, prediction of precursor structure, prediction of binding sites]
Prediction of transcription factor binding sites
Do certain combinations of TFs occur together?
In certain groups of genes?
Is this significant?
What biological sense does it make?
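One standard way to ask whether a TF-site combination is over-represented in a group of genes is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2×2 table). A minimal sketch using only the standard library; the gene counts in any real use would come from your own binding-site predictions, and when many TFs or combinations are tested, a multiple-testing correction is needed.

```python
from math import comb


def enrichment_p_value(total_genes, genes_with_motif, group_size, group_with_motif):
    """One-sided hypergeometric p-value P(X >= group_with_motif).

    X is the number of genes carrying the TF-site combination when group_size
    genes are drawn at random, without replacement, from total_genes genes,
    of which genes_with_motif carry the combination.
    """
    N, K, n, k = total_genes, genes_with_motif, group_size, group_with_motif
    # Sum the probabilities of seeing k or more motif-carrying genes in the group.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)
```

For example, `enrichment_p_value(20000, 1000, 50, 10)` asks how surprising it is to find 10 carriers among 50 genes when 1,000 of 20,000 genes carry the combination.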
UCSC transcription factor track
In another lecture: Motif Search
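Tracks of predicted binding sites, such as the UCSC transcription factor track, are typically derived from scans with position weight matrices (PWMs). A minimal sketch of such a scan, assuming a handful of aligned example sites (the TATA-like sites below are made up for illustration), a uniform background, a pseudocount of 1 and an arbitrary score threshold. Real matrices come from curated databases, and thresholds are usually calibrated against background sequence.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos].upper() for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, pwm, threshold):
    """Return (forward-strand start, strand, site, score) for windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(c not in BASES for c in window):
                continue  # skip windows containing N or other symbols
            score = sum(pwm[p][c] for p, c in enumerate(window))
            if score >= threshold:
                # Convert minus-strand window positions back to forward coordinates.
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, window, round(score, 2)))
    return hits
```

With `pwm = build_pwm(["TATAAA", "TATAAA", "TATATA", "TATAAG"])`, `scan("GCGCGCTATAAAGCGCGC", pwm, threshold=5.0)` reports a single hit, `TATAAA` on the plus strand at position 6.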
Prediction of microRNA targets
[Figure: transcriptional unit → RNA → ? (unknown targets)]
RNAs interact by forming base pairs (A-U, C-G, G-U)
Align microRNA and target (more details in Alignment-lecture)
Build in the biology:
• Some parts of the miRNA are more important than others
• Binding sites conserved in evolution tend to be more functional
MiRanda predictions
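"Building in the biology" can be illustrated with a weighted alignment score. A minimal sketch that is not miRanda's actual scoring: it assumes an ungapped duplex of equal length (miRanda allows gaps and bulges), uses made-up scores (Watson-Crick +2, G:U wobble +1, mismatch -1) and doubles the weight of the seed (bases 2-8). Conservation is not modelled, and the function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq):
    return seq.upper().replace("T", "U")


def duplex_score(mirna, site, seed=(2, 8), seed_weight=2):
    """Score an ungapped miRNA:target duplex with hypothetical weights.

    Both sequences are 5'->3' and of equal length; miRNA base i pairs with
    site base len-1-i (antiparallel). Seed positions (1-based) get seed_weight.
    """
    m, s = to_rna(mirna), to_rna(site)
    if len(m) != len(s):
        raise ValueError("ungapped scoring needs sequences of equal length")
    total = 0
    for i, base in enumerate(m):
        partner = s[len(s) - 1 - i]
        if (base, partner) in WATSON_CRICK:
            score = 2
        elif (base, partner) in WOBBLE:
            score = 1
        else:
            score = -1
        if seed[0] <= i + 1 <= seed[1]:
            score *= seed_weight  # the seed region counts more
        total += score
    return total
```

Because of the seed weighting, a single mismatch in the seed lowers the score more than the same mismatch near the miRNA's 3’ end, mirroring the first point above.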
Regulatory systems
[Figure: transcriptional unit → RNA → a microRNA, which regulates other RNAs (prevents them from being translated into proteins); maybe feedback regulation?]
A feedback loop? miR-155 and Bach
[Figure: the BIC/miR-155 gene with Bach2-binding sites (Bach2 is a repressor), and Bach2 proteins]
Bach2: “These results indicate that BACH2 plays important roles in regulation of B cell development.” Oncogene. 2000 Aug 3;19(33):3739-49
miR-155: “Lack of BIC and microRNA miR-155 expression in primary cases of Burkitt lymphoma.” Genes Chromosomes Cancer. 2006 Feb;45(2):147-53
Bioinformatics is like LEGO®
Build using different bricks to get biological knowledge
• Databases of experimental data (sequence, genome annotation, molecule interactions, etc.)
• Scan for transcription factor binding sites
• RNA folding and classification
• miRNA target prediction
Or design your own LEGO bricks!
• Enter the master’s program
Masters of Bioinformatics
What you have seen
Database
• UCSC human genome browser
• Using known information to find the likely transcription start site
• The horror of ids/names
Alignment and sequence search
• Sequence search with BLAT against the human genome
• miRanda: alignment to find miRNA targets
RNA
• RNA folding of the miRNA precursor
Promoter analysis
• Predicted transcription factor binding sites