Topics
• Basis of Bioinformatics
• Goals of Bioinformatics
• Bioinformatics Jargon 101
Lecture 1 CS566
Basis of Bioinformatics
• What makes Bioinformatics possible?
– Advances in Biotechnology
• PCR, Sequencing, Shotgun sequencing, Large
scale data generation
– Advances in Computer hardware
– Advent of the WWW
– Representation of problems amenable to
Statistics and Computer Science
– Evolutionary underpinnings of life
Biotechnology: Polymerase Chain
Reaction
• The antithesis, happily, of NP-completeness
– Used to make exact copies of a section of DNA
– Doubling of template per cycle, i.e., after n
cycles, 2^n copies of DNA
– Advantages:
• Precise subsequence can be selected using
appropriate primers
• Can create large amounts from small sample
• Sine qua non for DNA sequencing projects, and a
lot of experimental biology
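The doubling per cycle can be checked with a few lines of Python. This is a minimal sketch that assumes ideal, 100% efficient amplification (real reactions fall short of this); the function name pcr_copies is illustrative and not from the lecture.

```python
def pcr_copies(initial_templates: int, cycles: int) -> int:
    """Ideal copy count after `cycles` rounds, assuming every template doubles each cycle."""
    if initial_templates < 0 or cycles < 0:
        raise ValueError("initial_templates and cycles must be non-negative")
    # Each cycle doubles the template count, so n cycles multiply it by 2**n.
    return initial_templates * 2 ** cycles


if __name__ == "__main__":
    for n in (1, 10, 20, 30):
        print(f"{n:2d} cycles -> {pcr_copies(1, n)} copies")
```

Under this idealized model, 30 cycles turn one template into roughly a billion copies, which is why a small sample is enough.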
Biotechnology: Sequencing
• Analogy: Reading a phrase
– Assumption: Can read only one letter at a time
– Start with copies of the phrase to be read
– Allow several cycles of PCR to proceed
– At any moment in time, entire set of partial
phrases is present (all having the same start
point)
– Freeze
– Arrange phrases by size and just read
terminal letter
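The phrase-reading analogy above can be sketched in Python. This is a toy model of the analogy only, not of the chemistry: it assumes every possible partial phrase is present exactly once in the frozen mixture, and the names frozen_mixture and read_phrase are illustrative.

```python
import random


def frozen_mixture(phrase: str, seed: int = 0) -> list[str]:
    """All partial phrases sharing the same start point, in scrambled order."""
    partials = [phrase[:k] for k in range(1, len(phrase) + 1)]
    random.Random(seed).shuffle(partials)  # the mixture has no inherent order
    return partials


def read_phrase(partials: list[str]) -> str:
    """Arrange partial phrases by size and read only the terminal letter of each."""
    ordered = sorted(partials, key=len)
    return "".join(p[-1] for p in ordered)


if __name__ == "__main__":
    mixture = frozen_mixture("This is the best course")
    print(read_phrase(mixture))
```

Sorting by length plays the role of the size-separation step: once the partial phrases are ordered, the terminal letters spell out the original phrase one letter at a time.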
Biotechnology: Sequencing
“This is the best course I’ve ever taken”
• Sequencing: partial phrases arranged by size
T
Th
Thi
This
This
This i
This is
• Shotgun sequencing: overlapping fragments and the assembled sentence
This is the best cou
the best course I’ve
I’ve ever taken
This is the best course I’ve ever taken
Shotgun Sequencing
• Analogy: Reading a long sentence,
indirectly
– Fragment a few copies of a sentence into
phrases, randomly
– Find the order of characters in each phrase
– Find overlaps between phrases
– Assemble phrases into original sentence
– ‘Shotgun’ refers to parallel sequencing of
multiple ‘phrases’
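The overlap and assembly steps above can be sketched as a greedy merge in Python. This is a toy illustration on error-free text, assuming a minimum overlap of three characters; it is not how production assemblers work, and the names overlap and greedy_assemble are illustrative.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the pieces unjoined
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    phrases = ["I've ever taken", "This is the best cou", "the best course I've"]
    print(greedy_assemble(phrases))
```

Greedy merging is the simplest strategy to show the idea; it can be misled by repeated phrases, which is one reason real assembly is harder than the analogy suggests.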
Large Scale Data Generation
• Sequencing robots permit complete
sequences to be obtained in a short time
• Expression arrays allow for simultaneous
measurement of the activity of thousands
of genes
• Mass spectrometric pipelines allow for the
simultaneous identification of several
proteins
• Autoanalyzers allow the automation of
measurement of numerous chemicals
Advances in Computer Hardware
• Exponential increase in biological data has been
matched by Moore’s law: periodic doubling of transistor
counts, and with it CPU performance
• Memory and disk sizes have kept pace with the increase
in data volumes (from 1.44 MB to petabytes)
• Clustering allows for handling of many of the parallel
problems in biology (IBM’s many shades of blue..)
Role of the WWW
• Wide range of data and analysis tools just a few clicks
away (oversimplification)
• Results and Ideas within and between disciplines
disseminated very fast
• Web offers potential for mining across several databases
Meat for Statistics and Computer
Science
• A lot can be learnt from the string representation of
biological molecules
• Now have data volumes for reliable statistical inference
• Now have computer hardware to support implementation
of algorithms
• Challenges:
– Stimulus for creating and refining statistical and computational
approaches
– Emulating Biology, as well as learning strategies from it
• “Computer Science was invented for Bioinformatics” (Ewan Birney, GRC 2003)
Evolutionary Stochasticity
• “The chimpanzee is our cousin, but so is
yeast, albeit billions of years removed”
• Building evolutionary trees has a lot of
academic interest
• But the simple fact of evolutionary
relationships is useful in many ways
– Comparison across species useful in
understanding biology of individual species
Goals in Bioinformatics
• Understand Biology
– Catalogue biomolecules
– Understand what they do, in isolation
– Understand how things work together, at different
levels of abstraction
• Cure disease
– Drug target approach – Classical
– Integrated approach – Futuristic
• Multiple drugs for non-linear effects
• Address source of problem, not effect
Bioinformatics Jargon 101
• Nucleotide/Base/Phosphate
• DNA/cDNA
• RNA/mRNA/tRNA/rRNA
• Protein/Amino Acid
• Sequence/Sequencing
• Homology/Orthology/
Paralogy/Analogy
• Exon/Intron/Intergenic region
• Genetic code/Codon
• Splicing/Alternative splicing
• Species/System/Tissue/
Organ/Cell/Organelle
• Genome/Chromosome/
Chromatin/Histones/Gene/
Allele/Diploid/Haploid
• Recombination/Mutation
• Replication/Transcription/
Expression/Translation
• Eubacteria/Archaea/
Eukaryotes/Viruses
• Maternal Inheritance