Opportunities in Bioinformatics
for Computer Science
Lenwood S. Heath
Virginia Tech
Blacksburg, VA 24061
[email protected]
University of Iowa
November 16, 2001
Slide 1
Overview
• The New Biology
• Existing bioinformatics tools
• Bioinformatics challenges
• Bioinformatics at Virginia Tech
Slide 2
Some Molecular Biology
• The instruction set for a cell is contained in its
chromosomes.
• Each chromosome is a long molecule called DNA.
• Each DNA molecule contains 100s or 1000s of genes.
• Each gene encodes a protein.
• A gene is transcribed to mRNA in the nucleus.
• An mRNA is translated to a protein in a ribosome.
Slide 3
Transcription and Translation
DNA → (Transcription) → mRNA → (Translation) → Protein
Slide 4
Elaborating Cellular Function
[Diagram: DNA → (Transcription) → mRNA → (Translation, via the Genetic Code) → Protein, with further arrows labeled Reverse Transcription, Regulation, and Degradation]
Functions:
• Structure
• Catalyze chemical reactions
• Respond to environment
Slide 5
Chromosomes
• Long molecules of DNA: 10^4 to 10^8 base pairs
• 23 matched pairs in humans
• A gene is a subsequence of a chromosome that
encodes a protein.
• Proteins associated with cell function, structure,
and regulation.
• Only a fraction of the genes are in use at any
time.
• Every gene is present in every cell.
Slide 6
DNA Strand
A = adenine complements T = thymine
C = cytosine complements G = guanine
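The pairing rule above can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the talk; the names COMPLEMENT, complement, and reverse_complement are assumptions introduced here.

```python
# Watson-Crick pairing as stated on the slide: A<->T and C<->G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in the opposite direction,
    which is how the partner strand of double-stranded DNA is usually written."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying reverse_complement twice returns the original strand, which is a quick sanity check on the pairing table.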
Slide 7
Complementary DNA Strands
Double-Stranded DNA
Slide 8
RNA Strand
U = uracil replaces T = thymine
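As a small sketch of this substitution, the function below writes the mRNA corresponding to a DNA sequence. It assumes the input is the coding strand (the strand whose sequence matches the mRNA), so the only change is T to U; the name transcribe is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding strand.

    Assumption: the input is the coding strand, so the mRNA has the same
    sequence with uracil (U) in place of thymine (T).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTGT"))  # AUGGCCUGU
```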
Slide 9
Amino Acids
• Protein is a large molecule that is a chain of
amino acids (100 to 5000).
• There are 20 common amino acids
(Alanine, Cysteine, …, Tyrosine)
• Three bases --- a codon --- suffice to encode
an amino acid.
• There are also START and STOP codons.
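The codon idea above can be sketched as a lookup table from three-base codons to one-letter amino acid codes. This is a minimal sketch under stated assumptions: it uses the standard genetic code, treats AUG as the only START codon, and reads a single frame from the first AUG; the names CODON_TABLE and translate are introduced here for illustration.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G.
# "*" marks the three STOP codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string, starting at the first AUG (START) and
    stopping at the first STOP codon or at the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no START codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# AUG GCC UGU UAU UAA -> Met, Ala, Cys, Tyr, STOP
print(translate("GGAUGGCCUGUUAUUAAGG"))  # MACY
```

Because 4 x 4 x 4 = 64 codons cover only 20 amino acids plus STOP, several codons map to the same amino acid, which is why the table has repeated letters.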
Slide 10
Genetic Code
Slide 11
Translation to a Protein
Unlike DNA, proteins have three-dimensional structure
Protein folds to a three-dimensional shape that
minimizes energy
Slide 12
Cell’s Fetch-Execute Cycle
• Stored Program: DNA, chromosomes, genes
• Fetch/Decode: RNA, ribosomes
• Execute Functions: Proteins --- oxygen
transport, cell structures, enzymes
• Inputs: Nutrients, environmental signals,
external proteins
• Outputs: Waste, response proteins, enzymes
Slide 13
The Language of the New
Biology
A new language has been created. Some of its words
are useful for today’s talk:
Genomics
Functional Genomics
Proteomics
cDNA microarrays
Global Gene Expression Patterns
Slide 14
Genomics
• Discovery of genetic sequences and the ordering of
those sequences into
  • individual genes;
  • gene families;
  • chromosomes.
• Identification of
  • sequences that code for gene products/proteins;
  • sequences that act as regulatory elements.
Slide 15
Genome Sequencing Projects
• Drosophila
• Yeast
• Mouse
• Rat
• Arabidopsis
• Human
• Microbes
• …
Slide 16
Drosophila Genome
Slide 17
Functional Genomics
• The biological role of individual genes;
• mechanisms underlying the regulation of their
expression;
• regulatory interactions among them.
Slide 18
Glycolysis, Citric Acid Cycle, and
Related Metabolic Processes
Slide 19
Gene Expression
• Only certain genes are “turned on” at any particular
time.
• When a gene is transcribed (copied to mRNA), it is
said to be expressed.
• The mRNA in a cell can be isolated. Its contents give
a snapshot of the genes currently being expressed.
• Correlating gene expressions with conditions gives
hints into the dynamic functioning of the cell.
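One simple way to correlate expression with conditions is a Pearson correlation between a gene's expression levels and a numeric condition variable. The sketch below is only an illustration: the data are made up, the gene names are hypothetical, and the talk does not say which statistic was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # Correlation is undefined for a constant profile; this sketch
        # reports 0.0 in that case (an assumption made here).
        return 0.0
    return cov / (spread_x * spread_y)


# Hypothetical data: one condition measured at five levels, and the
# expression level of three invented genes at each level.
stress_level = [0, 1, 2, 3, 4]
expression = {
    "gene_up": [1.0, 2.1, 2.9, 4.2, 5.0],
    "gene_down": [5.0, 4.1, 3.0, 1.8, 1.1],
    "gene_flat": [3.0, 3.1, 2.9, 3.0, 3.0],
}

for gene, levels in expression.items():
    print(gene, round(pearson(stress_level, levels), 2))
```

A coefficient near +1 or -1 flags a gene whose expression tracks the condition; it is a hint for further study, not evidence of a causal role.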
Slide 20
Gene Expression:
Control Points
Slide 21
Free Radicals
Slide 22
Responses to Environmental Signals
Slide 23
Effects of Drought Stress
Virginia Tech:
Plant Biologists: Ruth Alscher, Boris Chevone.
CS: Lenny Heath, Naren Ramakrishnan, and colleagues.
Statistics: Ina Hoeschele, Shun-Hwa Li.
NC State (Forest Biotechnology):
Ying-Hsuan Sun, Ron Sederoff, Ross Whetten
Slide 24
Intracellular Decision Making
Slide 25
Relative Abundance Detection
[Figure: Treatment and Control samples, each containing sequences 1, 2, and 3 in different amounts, are mixed and hybridized to Spots 1, 2, and 3 (sequences affixed to slide); detection reads the relative abundance at each spot.]
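Relative abundance at each spot is commonly summarized as a log ratio of the treatment signal to the control signal. The sketch below assumes per-spot intensities are already available as numbers; the intensity values are invented and the function name log_ratios is hypothetical.

```python
from math import log2


def log_ratios(treatment, control):
    """Return log2(treatment / control) for every spot.

    Positive values mean the sequence is more abundant in the treatment
    sample, negative values mean it is more abundant in the control.
    """
    return {spot: log2(treatment[spot] / control[spot]) for spot in treatment}


# Invented intensities for spots 1, 2, and 3.
treatment = {1: 800.0, 2: 100.0, 3: 400.0}
control = {1: 200.0, 2: 400.0, 3: 400.0}

print(log_ratios(treatment, control))  # {1: 2.0, 2: -2.0, 3: 0.0}
```

The logarithm makes a fourfold increase (+2) and a fourfold decrease (-2) symmetric around zero, which a raw ratio does not.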
Slide 26
Gene Expression Varies
Slide 27
Existing Computational Tools
in Bioinformatics
• Sequence similarity
• Multiple sequence alignments
• Database searching
• Evolutionary (phylogenetic) tree construction
• Sequence assemblers
• Gene finders
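As one concrete instance of the first item, sequence similarity is often scored by global alignment with dynamic programming (the Needleman-Wunsch recurrence). The sketch below computes only the score, with a linear gap penalty; the scoring values (+1 match, -1 mismatch, -1 gap) are assumptions chosen for illustration, not values from the talk.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only one row of the dynamic-programming table at a time.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "ACT"))         # 2
```

Running time is proportional to the product of the two lengths, which is one reason database searching relies on faster heuristics for large collections.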
Slide 28
Challenges for Bioinformatics
• Analyzing and synthesizing complex
experimental data
• Representing and accessing vast quantities
of information
• Pattern matching
• Data mining
• Gene discovery
• Function discovery
• Modeling the dynamics of cell function
Slide 29
Bioinformatics at Virginia Tech
Computer science interacts with the life sciences.
• Computer Science in Bioinformatics:
• Joint research with: plant biologists, microbial
biologists, biochemists, cell-cycle biologists, animal
scientists, crop scientists, statisticians.
• Projects: Expresso; Nupotato; MURI; Arabidopsis
Genome; Barista; Cell-Cycle Modeling
• Graduate option in bioinformatics
• Virginia Bioinformatics Institute (VBI)
Slide 30
Expresso: A Problem Solving
Environment (PSE) for Microarray
Experiment Design and Analysis
• Integration of design and procedures
• Integration of image analysis tools and statistical
analysis
• Data mining using inductive logic programming
(ILP)
• Closing the loop
• Integrating models
Slide 31
Flow of a Microarray Experiment
[Flowchart; stage labels: Hypotheses; Select cDNAs; PCR; Replication and Randomization; Robotic Printing; Extract RNA; Reverse Transcription and Fluorescent Labeling; Hybridization; Identify Spots; Intensities; Statistics; Clustering; Data Mining, ILP; Test of Hypotheses]
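The replication and randomization stage can be sketched as follows: each selected cDNA is printed several times, and the spot positions are shuffled so that position effects on the slide are not confounded with particular genes. This is a hypothetical sketch, not Expresso's actual procedure; the function name, the clone identifiers, and the flat list layout are all assumptions.

```python
import random


def randomized_layout(cdna_ids, replicates, seed=None):
    """Return a list of spot assignments in print order.

    Each cDNA appears `replicates` times, and the order is shuffled.
    Passing a seed makes the layout reproducible.
    """
    rng = random.Random(seed)
    spots = [cdna for cdna in cdna_ids for _ in range(replicates)]
    rng.shuffle(spots)
    return spots


# Hypothetical clone identifiers, four replicate spots each.
layout = randomized_layout(["cDNA_A", "cDNA_B", "cDNA_C"], replicates=4, seed=1)
print(layout)
```

Recording the seed alongside the layout lets the spot-identification step later map each intensity back to its cDNA.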
Slide 32
Expresso: A Microarray Experiment
Management System
Slide 33
Nupotato
• Potatoes originated in the Andes, where
there are many varieties.
• Many varieties survive at high altitude in
cold, dry conditions.
• Microarray technology can be used to
investigate genes that are responsible for
stress resistance and that are responsible for
the production of nutrients.
Slide 34
MURI
• Some microorganisms have the ability to
survive drying out or intense radiation.
• Their genomes are just being sequenced.
• Using microarrays and proteomics, we will
try to correlate computationally the genes in
the genomes with the special traits of the
microorganisms.
• We are currently using multiple genome
analysis.
Slide 35
Arabidopsis Genome Project
• Arabidopsis is a model higher plant.
• It is the first higher plant whose genome has
been fully sequenced.
• Gene finder software has been used to
identify putative genes.
• We are computationally mining the
regulatory regions of these genes for
promoter patterns.
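A very simple form of promoter-pattern mining is to look for short subsequences (k-mers) shared by the upstream regions of many genes. The sketch below is an assumption-laden illustration, not the project's method: the sequences are invented, and shared_kmers is a hypothetical helper that counts in how many regions each k-mer occurs.

```python
from collections import Counter


def shared_kmers(upstream_regions, k, min_regions):
    """Return the k-mers that occur in at least `min_regions` of the
    given upstream sequences, sorted alphabetically."""
    counts = Counter()
    for region in upstream_regions:
        # Use a set so a k-mer is counted once per region.
        kmers = {region[i:i + k] for i in range(len(region) - k + 1)}
        counts.update(kmers)
    return sorted(kmer for kmer, n in counts.items() if n >= min_regions)


# Invented upstream regions for three genes.
regions = ["ACGTATAAAG", "GGTATAAACC", "TTTATAAAGG"]
print(shared_kmers(regions, k=6, min_regions=3))  # ['TATAAA']
```

Real promoter elements tolerate mismatches and vary in position, so exact k-mer counting is only a first pass before statistical or probabilistic motif models.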
Slide 36
Barista
• Barista serves Expresso!
• Software development team across projects
to minimize duplication of effort.
• Work with Linux, Perl, C, Python, CVS,
Apache, PHP, …
Slide 37
Virginia Bioinformatics
Institute (VBI)
• Research institute based at Virginia Tech
• Established July 1, 2000, with $3 million
• Will occupy 2 buildings and have 100+
employees in 4 years
Slide 38
Getting Into Bioinformatics
• Learn some biology --- genetics, cell
biology
• Study computational (molecular) biology
• Get involved with bioinformatics research
in interdisciplinary teams
• Work with biologists to solve their problems
Slide 39