Some Biology That Computer Scientists Need for Bioinformatics
Lenwood S. Heath
Virginia Tech
Blacksburg, VA 24061
University of Maryland
December 14, 2001
Overview
I. Some Molecular Biology and Genomics
II. Language of the New Biology
III. Existing bioinformatics tools
IV. Bioinformatics challenges
V. Bioinformatics at Virginia Tech
I. Some Molecular Biology
• The instruction set for a cell is contained in its chromosomes.
• Each chromosome is a single long DNA molecule.
• Each DNA molecule contains hundreds or thousands of genes.
• Most genes encode a protein (some encode functional RNAs instead).
• In eukaryotic cells, a gene is transcribed to mRNA in the nucleus.
• An mRNA is translated to a protein on ribosomes.
Transcription and Translation
DNA → (transcription) → mRNA → (translation) → Protein
Elaborating Cellular Function
DNA → (transcription) → mRNA → (translation, via the genetic code) → Protein
Other processes: regulation, degradation, reverse transcription (mRNA → DNA)
Thousands of genes!
Functions:
• Structure
• Catalyze chemical reactions
• Respond to environment
Chromosomes
• Long molecules of DNA: 10^4 to 10^8 base pairs
• 23 matched pairs in humans
• A gene is a subsequence of a chromosome that encodes a protein.
• Proteins are associated with cell function, structure, and regulation.
• Only a fraction of the genes are in use at any time.
• Nearly every cell contains the full set of genes.
DNA Strand
Sugar: 2’-deoxyribose
5’-CACTTTAGAGCG-3’
Bases:
A (adenine) complements T (thymine)
C (cytosine) complements G (guanine)
Complementary DNA Strands
5’-CACTTTAGAGCG-3’
3’-GTGAAATCTCGC-5’
Double-Stranded DNA
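As a sketch of the base-pairing rule, here is a minimal Python version (the names `complement` and `reverse_complement` are our own, not from any library). It reproduces the pairing drawn above and also shows the partner strand read in the conventional 5’ to 3’ direction, since the two strands run antiparallel.

```python
# Watson-Crick pairing: A <-> T, C <-> G (other characters pass through unchanged).
PAIR = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, aligned position by position (reads 3'->5')."""
    return strand.upper().translate(PAIR)


def reverse_complement(strand: str) -> str:
    """The complementary strand written in the usual 5'->3' direction."""
    return complement(strand)[::-1]


coding = "CACTTTAGAGCG"             # the 5'->3' strand from the slide
print(complement(coding))          # GTGAAATCTCGC (3'->5', as drawn)
print(reverse_complement(coding))  # CGCTCTAAAGTG (same strand, 5'->3')
```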
RNA Strand
Sugar: ribose
5’-CACUUUAGAGCG-3’
Bases: U (uracil) replaces T (thymine)
Transcription of DNA to mRNA
Coding DNA strand:   5’-CACTTTAGAGCG-3’
Template DNA strand: 3’-GTGAAATCTCGC-5’
mRNA strand:         5’-CACUUUAGAGCG-3’
The mRNA is built complementary to the template strand, so it matches the coding strand with U in place of T.
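A minimal sketch of the transcription step in Python, assuming sequences are written as above (template strand 3’ to 5’, mRNA 5’ to 3’, aligned base by base). The helper names are hypothetical, and the model ignores promoters, RNA polymerase and RNA processing; it only captures the base-pairing bookkeeping.

```python
# Base pairing during transcription: template A->U, C->G, G->C, T->A.
TEMPLATE_TO_MRNA = str.maketrans("ACGT", "UGCA")


def transcribe_template(template_3to5: str) -> str:
    """mRNA (5'->3') built base by base against a template written 3'->5'."""
    return template_3to5.upper().translate(TEMPLATE_TO_MRNA)


def transcribe_coding(coding_5to3: str) -> str:
    """Equivalent shortcut: the coding strand with every T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


print(transcribe_template("GTGAAATCTCGC"))  # CACUUUAGAGCG
print(transcribe_coding("CACTTTAGAGCG"))    # CACUUUAGAGCG
```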
Proteins and Amino Acids
• A protein is a large molecule consisting of a chain of amino acids (typically 100 to 5000).
• There are 20 common amino acids (Alanine, Cysteine, …, Tyrosine).
• Three bases (a codon) suffice to encode an amino acid: there are 4^3 = 64 possible codons, whereas two bases would give only 4^2 = 16 < 20.
• There are also START and STOP codons.
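The counting argument can be checked in a few lines of Python: with 4 possible bases per position, a word of k bases can take 4^k values, and k = 3 is the shortest length that covers 20 amino acids. `all_words` is a hypothetical helper for illustration.

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases


def all_words(length: int) -> list[str]:
    """Every base sequence of the given length (4 ** length of them)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


for k in (1, 2, 3):
    n = len(all_words(k))
    # Enough distinct words to give each of the 20 amino acids its own code?
    print(k, n, n >= 20)
# 1 4 False
# 2 16 False
# 3 64 True
```

Because 64 exceeds 20, the code is redundant: most amino acids have several codons, and three codons serve as STOP signals.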
Genetic Code
Translation to a Protein
mRNA strand: CAC UUU AGA GCG
• CAC → Histidine
• UUU → Phenylalanine
• AGA → Arginine
• GCG → Alanine
Nascent polypeptide: amino acids bound together by peptide bonds.
Unlike DNA, proteins have three-dimensional structure essential to protein function. A protein folds into a three-dimensional shape that cannot yet be predicted from the primary sequence.
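The codon-by-codon reading can be sketched in Python with the standard genetic code packed into a 64-character string (one-letter amino-acid codes, `*` for STOP), a common compact encoding. This is a simplified model, assuming translation starts at the first base (the slide's fragment has no START codon) and ends at the first STOP codon; `translate` and `NAMES` are illustrative names.

```python
from itertools import product

# Standard genetic code as a 64-character string: codons are enumerated with
# U, C, A, G at each position (first base varies slowest); '*' marks STOP.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Full names for the one-letter codes in the slide's example (display only).
NAMES = {"H": "Histidine", "F": "Phenylalanine", "R": "Arginine", "A": "Alanine"}


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from position 0, stopping at a STOP codon."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):  # an incomplete trailing codon is ignored
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = translate("CACUUUAGAGCG")
print(peptide)                        # HFRA
print([NAMES[aa] for aa in peptide])  # ['Histidine', 'Phenylalanine', 'Arginine', 'Alanine']
```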
Cell’s Fetch-Execute Cycle
• Stored Program: DNA, chromosomes, genes
• Fetch/Decode: RNA, ribosomes
• Execute Functions: Proteins (oxygen transport, cell structures, enzymes)
• Inputs: Nutrients, environmental signals, external proteins
• Outputs: Waste, response proteins, enzymes
II. The Language of the New Biology
A new language has been created. Here are some of its words that will be useful in today’s talks:
Genomics
Functional Genomics
Proteomics
cDNA Microarrays
Global Gene Expression Patterns
Genomics
• Discovery of genetic sequences and the ordering of those sequences into
  – individual genes;
  – gene families;
  – chromosomes.
• Identification of
  – sequences that code for gene products/proteins;
  – sequences that act as regulatory elements.
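One simple way to flag candidate protein-coding sequences is to scan for open reading frames (ORFs): an ATG start codon followed, in the same reading frame, by a stop codon (TAA, TAG or TGA). The sketch below is a deliberately naive stand-in for real gene finders, assuming a single DNA strand as input; it ignores the reverse strand, introns and regulatory signals, and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """(start, end) pairs of ORFs in all three frames of the given strand.

    Each ORF runs from an ATG to the first in-frame stop codon, inclusive;
    `end` is exclusive, so dna[start:end] is the ORF. `min_codons` counts
    codons including the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```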
Genome Sequencing Projects
• Drosophila
• Yeast
• Mouse
• Rat
• Arabidopsis
• Human
• Microbes
• …
Drosophila Genome
Functional Genomics
• The biological role of individual genes.
• Mechanisms underlying the regulation of their expression.
• Regulatory interactions among them.
Glycolysis, Citric Acid Cycle, and Related Metabolic Processes
Gene Expression
• Only certain genes are “turned on” at any particular time.
• When a gene is transcribed (copied to mRNA), it is said to be expressed.
• The mRNA in a cell can be isolated. Its contents give a snapshot of the genes currently being expressed.
• Correlating gene expression levels with experimental conditions gives hints about the dynamic functioning of the cell.
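A minimal sketch of correlating expression with a condition, using the Pearson correlation coefficient. All numbers below are made up for illustration (a hypothetical condition measured at five time points and two hypothetical genes); a value near +1 or -1 suggests the gene tracks the condition, though correlation alone does not establish a regulatory link.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles (in [-1, 1])."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)  # sum of squared deviations
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: a condition at five time points and two genes' expression.
condition = [0.0, 1.0, 2.0, 3.0, 4.0]
gene_a = [1.0, 2.1, 2.9, 4.2, 5.0]  # rises with the condition
gene_b = [5.0, 4.0, 3.1, 2.0, 0.9]  # falls as the condition increases

for name, profile in (("gene_a", gene_a), ("gene_b", gene_b)):
    print(name, round(pearson(condition, profile), 3))
```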
Responses to Environmental Signals
Intracellular Decision Making
Microarray Technology
• In the past, gene expression and gene interactions were examined one known gene at a time, one process at a time.
• With microarray technology:
  – Simultaneous examination of large groups of genes and associated interactions
  – Possible discovery of new cellular mechanisms involving gene expression
Flow of a Microarray Experiment
• Select cDNAs; PCR; Replication and Randomization; Robotic Printing
• Extract RNA; Reverse Transcription and Fluorescent Labeling
• Hybridization
• Identify Spots; Intensities; Statistics; Test of Hypotheses
• Clustering; Data Mining, ILP; Hypotheses
Relative Abundance
Figure: Treatment and Control samples (each with sequences 1, 2, 3) → Mix → Hybridization to spots 1, 2, 3 (sequences affixed to slide) → Detection
Gene Expression Varies
Cy5 to Cy3 ratios
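Two-color arrays are usually summarized per spot by the ratio of the two dye intensities, often on a log2 scale so that a 2-fold increase and a 2-fold decrease are symmetric (+1 and -1). The sketch assumes positive, background-corrected intensities; which dye labels treatment and which labels control is an experimental choice, and the 2-fold cutoff in the hypothetical `changed_spots` helper is an illustrative threshold, not a recommendation.

```python
from math import log2


def log_ratios(cy5: list[float], cy3: list[float]) -> list[float]:
    """Per-spot log2(Cy5 / Cy3); assumes positive, background-corrected intensities."""
    return [log2(c5 / c3) for c5, c3 in zip(cy5, cy3)]


def changed_spots(ratios: list[float], min_fold: float = 2.0) -> list[int]:
    """Indices of spots whose fold change (either direction) is at least min_fold."""
    cutoff = log2(min_fold)
    return [i for i, r in enumerate(ratios) if abs(r) >= cutoff]


# Hypothetical intensities for three spots.
cy5 = [800.0, 200.0, 500.0]
cy3 = [200.0, 800.0, 450.0]
ratios = log_ratios(cy5, cy3)
print(ratios[:2])             # [2.0, -2.0]
print(changed_spots(ratios))  # [0, 1]
```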
III. Existing Computational Tools in Bioinformatics
• Sequence similarity
• Multiple sequence alignments
• Database searching
• Evolutionary (phylogenetic) tree construction
• Sequence assemblers
• Gene finders
Existing Biological Databases
• Molecular Sequences: Genomic DNA, mRNA, ESTs, proteins
• Protein domains, motifs, or blocks
• Protein families
• Genomes
• Nomenclature and ontologies
• Biological literature
IV. Challenges for Bioinformatics
• Analyzing and synthesizing complex experimental data
• Representing and accessing vast quantities of information
• Pattern matching
• Data mining (whole-genome analysis)
• Gene discovery
• Function discovery
• Modeling the dynamics of cell function
V. Bioinformatics at Virginia Tech
Computer science interacts with the life sciences.
• Computer Science in Bioinformatics:
  – Joint research with plant biologists, microbial biologists, biochemists, cell-cycle biologists, animal scientists, crop scientists, statisticians.
  – Projects: Expresso; Nupotato; MURI; Arabidopsis Genome; Barista; Cell-Cycle Modeling
• Graduate option in bioinformatics
• Virginia Bioinformatics Institute (VBI)
Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis
• Integration of design and procedures
• Integration of image analysis tools and statistical analysis
• Data mining using inductive logic programming (ILP)
• Closing the loop
• Integrating models
Getting Into Bioinformatics
• Learn some biology: genetics, cell biology
• Study computational (molecular) biology
• Get involved with bioinformatics research in interdisciplinary teams
• Work with biologists to solve their problems