Bioinformatics: One Minute and One Hour at a Time

Laurie J. Heyer
L.R. King Asst. Professor of Mathematics
Davidson College
What is Bioinformatics?
[Diagram with labels: Computer Science, Mathematics, Bioinformatics, Biology]
Genomics, Proteomics and Systems Biology
• Primary audience
– Junior bio majors
• Prerequisites
– Bioinformatics and intro molecular biology, or
– One of several 300-level biology courses
• Course home page:
– http://www.bio.davidson.edu/genomics
• “Math Minutes”
• Taught by A. Malcolm Campbell (Biology)
Sample Topic: DNA Microarrays
Plotting Expression Data
• One highlighted gene is induced 16-fold
• One highlighted gene is repressed 16-fold
• But on the untransformed plot, induction looks much more dramatic
Log Transformation
• Calculate log2 of each ratio
• Ratio of 16 becomes value of 4
• Ratio of 0.0625 (1/16) becomes value of –4
• Induction and repression look equal, but with opposite sign
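The transformation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `log2_ratios` and the sample ratios are illustrative, not taken from the course materials.

```python
import math


def log2_ratios(ratios):
    """Return the base-2 logarithm of each expression ratio."""
    return [math.log2(r) for r in ratios]


# A 16-fold induction, no change, and a 16-fold repression.
ratios = [16, 1, 1 / 16]
transformed = log2_ratios(ratios)
print(transformed)
```

After the transform, the induced and repressed genes sit the same distance from zero, on opposite sides.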
Hierarchical Clustering
• Join two most similar genes
• Join next two most similar “objects” (genes or clusters of genes)
• Distance from one gene to a set of genes is the minimum of all distances from the gene to the individual members (Single Linkage)
• Repeat until all genes have been joined
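The joining procedure above can be sketched in plain Python. The slide does not name a distance measure, so Euclidean distance between expression profiles is an assumption here, and the gene names and values are made up for illustration.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(profiles):
    """Repeatedly join the two closest objects; return the list of joins.

    profiles maps a gene name to its list of expression values.
    Each join is recorded as (members_a, members_b, distance).
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []

    def distance(c1, c2):
        # Single linkage: minimum distance over all cross-cluster gene pairs.
        return min(euclidean(profiles[g], profiles[h]) for g in c1 for h in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((set(clusters[i]), set(clusters[j]), d))
        joined = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(joined)
    return merges


# Hypothetical profiles: g1 and g2 are close, g3 is far from both.
profiles = {"g1": [0, 0], "g2": [0, 1], "g3": [5, 5]}
merges = single_linkage(profiles)
```

Here `g1` and `g2` join first, and `g3` then joins that pair at its distance to the nearer member, `g2`.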
Genome Consortium for Active Teaching (GCAT)
http://www.bio.davidson.edu/GCAT
High School Chips
See Kathy Gabric’s page:
http://cstaff.hinsdale86.org/~kgabric/honorscalendar.html
Bioinformatics Course
• Prerequisites
– Genomics or experience with modeling and “algorithmic thinking”
• Goals:
– To understand and apply various algorithms and statistical tests for analyzing DNA, RNA and protein sequences, and DNA microarray data.
– To gain practical experience with Perl, a programming language widely used in molecular biology, web design, and text processing.
• Course home page
– http://gcat.davidson.edu/bioinformatics/bioinf.html
Bioinformatics Topics
• Determining sequences
• Comparing sequences
• Finding genes
• Predicting structure
• Comparing genomes
• Inferring phylogenies
• Analyzing images
• Clustering gene expression patterns
• Designing experiments
Bioinformatics Projects
Image Segmentation
• Locate spot (signal) pixels
• Measure intensity of signal and background in each channel
• Compute ratio
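The measurement and ratio steps might look like the sketch below. Several things are assumptions not stated on the slide: the spot mask is taken as already known, intensities are summarized by the mean, background is subtracted before dividing, and the two channels are called red and green.

```python
from statistics import mean


def channel_intensity(pixels, spot_mask):
    """Mean signal and mean background for one channel.

    pixels is a flat list of intensities; spot_mask marks spot pixels True.
    """
    signal = [p for p, on in zip(pixels, spot_mask) if on]
    background = [p for p, on in zip(pixels, spot_mask) if not on]
    return mean(signal), mean(background)


def expression_ratio(red, green, spot_mask):
    """Background-subtracted red/green ratio (subtraction is an assumption)."""
    red_signal, red_background = channel_intensity(red, spot_mask)
    green_signal, green_background = channel_intensity(green, spot_mask)
    return (red_signal - red_background) / (green_signal - green_background)


# Hypothetical four-pixel region: the middle two pixels belong to the spot.
spot_mask = [False, True, True, False]
red = [100, 900, 900, 100]
green = [50, 100, 100, 50]
ratio = expression_ratio(red, green, spot_mask)
```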
Adaptive Circle Algorithm
• Specify threshold % between darkest and lightest pixel
• Pixels above threshold are “on”, others are “off”
• Combine two binary images
– if pixel is “on” in either image, it is “on” in combined image
• Search for radius and center that maximize percent of “on” pixels
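A brute-force version of these steps is sketched below. It is not the students' project code: the function names are hypothetical, the candidate radii are supplied by the caller, and ties in the percent of “on” pixels are broken in favor of the larger radius, which is an assumption the slide does not state.

```python
def threshold_image(image, percent):
    """Mark pixels above a cutoff set percent of the way from darkest to lightest."""
    darkest = min(min(row) for row in image)
    lightest = max(max(row) for row in image)
    cutoff = darkest + (lightest - darkest) * percent / 100.0
    return [[value > cutoff for value in row] for row in image]


def combine(binary_a, binary_b):
    """A pixel is on in the combined image if it is on in either input."""
    return [[a or b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(binary_a, binary_b)]


def fraction_on(binary, center_y, center_x, radius):
    """Fraction of the pixels inside the circle that are on."""
    inside = on = 0
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if (y - center_y) ** 2 + (x - center_x) ** 2 <= radius ** 2:
                inside += 1
                on += value
    return on / inside


def best_circle(binary, radii):
    """Try every center and candidate radius; prefer larger radius on ties."""
    best = None
    for radius in radii:
        for cy in range(len(binary)):
            for cx in range(len(binary[0])):
                key = (fraction_on(binary, cy, cx, radius), radius)
                if best is None or key > best[0]:
                    best = (key, (cy, cx, radius))
    return best[1]


# Hypothetical 5x5 channels: a small bright spot in red, nothing in green.
red = [[0, 0, 0, 0, 0],
       [0, 0, 10, 0, 0],
       [0, 10, 10, 10, 0],
       [0, 0, 10, 0, 0],
       [0, 0, 0, 0, 0]]
green = [[0] * 5 for _ in range(5)]
combined = combine(threshold_image(red, 50), threshold_image(green, 50))
center_y, center_x, radius = best_circle(combined, [1, 2])
```

The exhaustive search is slow on real images but keeps the logic of each bullet visible.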
Adaptive Circle V2 (Dapple)
• Compute 4-neighbor second-difference approximation to the Laplacian
• Find sharply defined “upper” edge by convolving Laplacian with annular filters
From “Dapple: Improved Techniques for Finding Spots on DNA Microarrays,” UW CSE Technical Report UWTR-2000-08-05
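The first Dapple step, the 4-neighbor second difference, can be illustrated as follows. This sketch covers only the Laplacian, not the annular-filter convolution, and it is not code from the Dapple report; leaving border pixels at zero is an assumption made to keep the example short.

```python
def laplacian_4(image):
    """4-neighbor second-difference approximation to the Laplacian.

    Each interior pixel becomes the sum of its four neighbors minus four
    times itself. Border pixels are left at 0 (an assumption).
    """
    rows, cols = len(image), len(image[0])
    result = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            result[y][x] = (image[y - 1][x] + image[y + 1][x]
                            + image[y][x - 1] + image[y][x + 1]
                            - 4 * image[y][x])
    return result


# A single bright pixel gives a strong negative response at its center.
spike = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
spike_response = laplacian_4(spike)

# A steady left-to-right ramp has no second difference in its interior.
ramp = [[0, 1, 2, 3] for _ in range(3)]
ramp_response = laplacian_4(ramp)
```

The response is large where intensity changes abruptly and zero where it changes at a constant rate.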
Quality Clustering: QT Clust
1. Each gene builds a supervised cluster
2. Gene with “best” list, and genes in its list, become next cluster
3. Remove these genes from consideration, and repeat
4. Stop when all genes are clustered, or largest cluster is smaller than user-specified threshold
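The four steps can be sketched as below, under assumptions that go beyond the slide: the “best” list is taken to be the longest one, each gene's list grows only while every member stays within a distance limit (`max_diameter`, a hypothetical parameter name), and Euclidean distance stands in for whatever similarity measure is actually used.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def candidate_cluster(seed, genes, profiles, max_diameter):
    """Grow a list from seed, adding genes while all members stay close."""
    cluster = [seed]
    remaining = [g for g in genes if g != seed]

    def farthest_member_distance(gene):
        return max(euclidean(profiles[gene], profiles[m]) for m in cluster)

    while remaining:
        nearest = min(remaining, key=farthest_member_distance)
        if farthest_member_distance(nearest) > max_diameter:
            break
        cluster.append(nearest)
        remaining.remove(nearest)
    return cluster


def qt_clust(profiles, max_diameter, min_size):
    """Return (clusters, unclustered genes)."""
    genes = list(profiles)
    clusters = []
    while genes:
        # Step 1: every remaining gene builds its own list.
        candidates = [candidate_cluster(g, genes, profiles, max_diameter)
                      for g in genes]
        # Step 2: the longest list becomes the next cluster (assumed "best").
        best = max(candidates, key=len)
        # Step 4: stop when the largest cluster is below the threshold.
        if len(best) < min_size:
            break
        clusters.append(best)
        # Step 3: remove the clustered genes and repeat.
        genes = [g for g in genes if g not in best]
    return clusters, genes


# Hypothetical profiles: two tight groups and one outlier.
profiles = {"a": [0, 0], "b": [0, 1], "c": [1, 0],
            "d": [10, 10], "e": [10, 11], "f": [50, 50]}
clusters, unclustered = qt_clust(profiles, max_diameter=2, min_size=2)
```

Unlike hierarchical clustering, this approach can leave outliers such as `f` unclustered rather than forcing them into a group.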
Why teach Bioinformatics?
• Critical thinking
• Interdisciplinary
• Integrative
– Modeling
– Data analysis
– Computational science
– Discrete math
– Probability and statistics
• Student research opportunities