BioInformatics
Applying principles of computer
science in a biological context
Outline
Biological Background Information
Problem Description
My Project
Previous Work
Senior Thesis
Biological Data Sets
Raw DNA sequences
Macromolecular structures
Genomes
Protein Sequences
What is a Protein Sequence?
A string of amino acids, each represented by a
single letter
There are 20 standard amino acids
Typical proteins are about 300 amino acids long
…ILVKMUTANKVKMU…
Examples of Amino Acids
Importance of Protein Sequences
Compare two or more sequences
Determine similarities in their functions
Multiple Alignment
Serves as input for my analysis
Web-based programs available
Maximizes areas of similarity
Multiple Alignment Example
Shaded areas show
regions of exact
match.
A dash is placed in
the smaller protein
sequence to achieve
the alignment.
Redundancies in
each column are then
removed.
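A minimal sketch of the column step described above, assuming the alignment arrives as equal-length strings with "-" marking gaps. The function name collapse_columns and the sample sequences are hypothetical, not taken from the project.

```python
def collapse_columns(aligned):
    """For each alignment column, keep one copy of every distinct residue.

    Residues are kept in order of first appearance; "-" marks a gap.
    """
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for column in zip(*aligned):
        seen = []
        for residue in column:
            if residue not in seen:
                seen.append(residue)
        columns.append("".join(seen))
    return columns


# Made-up aligned sequences, for illustration only.
aligned = ["MKV-LA", "MKVQLA", "MRV-LG"]
print(collapse_columns(aligned))  # ['M', 'KR', 'V', '-Q', 'L', 'AG']
```

A column that collapses to a single letter is an exact match across all sequences.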
Evaluating Usefulness of a Primer
Similar does not always mean useful
Why? Most amino acids can be encoded by more than one codon
Amino acids are coded by nucleotide triplets
Four nucleotides: A, T, C, G
Triplet = Codon
Degeneracy Example
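A minimal sketch of a degeneracy calculation, assuming the common definition: the degeneracy of a peptide is the product of the number of codons that can encode each of its amino acids under the standard genetic code. The function name degeneracy and the sample peptides are hypothetical; the slides do not state the exact formula the project used.

```python
# Number of codons per amino acid in the standard genetic code,
# grouped by count (61 sense codons in total).
CODONS_PER_GROUP = {1: "MW", 2: "NDCQEHKFY", 3: "I", 4: "AGPTV", 6: "RLS"}
CODON_COUNTS = {aa: n for n, group in CODONS_PER_GROUP.items() for aa in group}


def degeneracy(peptide):
    """Count the distinct nucleotide sequences that encode the peptide."""
    total = 1
    for residue in peptide:
        if residue not in CODON_COUNTS:
            raise ValueError(f"not a standard amino acid: {residue!r}")
        total *= CODON_COUNTS[residue]
    return total


print(degeneracy("MWK"))  # 1 * 1 * 2 = 2
print(degeneracy("LSR"))  # 6 * 6 * 6 = 216
```

Two peptides of the same length can differ widely in degeneracy, which is why a similar region is not automatically a useful primer site.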
Current Methods
Client: Biology Professor Steven Horton
Manual search of primers
Manual calculation of degeneracy
Requirements
Automate the task of finding primers
Automate degeneracy calculation
Record and organize results
Analyze data to make predictions
Pattern Matching
Data mining
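One way the primer search could be automated, sketched under assumptions: candidate sites are taken to be gap-free windows where every aligned sequence agrees exactly. The function name conserved_windows, the window width, and the sample alignment are hypothetical.

```python
def conserved_windows(aligned, width):
    """Yield (start, peptide) for each gap-free window where all sequences agree."""
    if not aligned:
        return
    reference = aligned[0]
    for start in range(len(reference) - width + 1):
        window = reference[start:start + width]
        if "-" in window:
            continue  # skip windows that contain an alignment gap
        if all(seq[start:start + width] == window for seq in aligned):
            yield start, window


# Made-up aligned sequences, for illustration only.
aligned = ["MKVQLAG", "MKVQLAC", "TKVQLAG"]
print(list(conserved_windows(aligned, 4)))  # [(1, 'KVQL'), (2, 'VQLA')]
```

Each window found this way would still need its degeneracy checked before it is recorded as a candidate primer.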
Summer Research
Analyzed solutions made by Software
Engineering class in Spring 2003
Combined the good design features from each
project
Made a prototype in Java
Senior Thesis
Fall Term
Finished the prototype
Multiple window design
Made algorithm more efficient using dynamic
programming
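The slides do not describe the dynamic programming scheme itself. As an illustration of reusing earlier results, the sketch below builds a table of prefix products so that the degeneracy of every fixed-width window is obtained with one division instead of a fresh multiplication loop. The function name window_degeneracies and the sample sequence are hypothetical, and the degeneracy definition (product of codon counts) is an assumption.

```python
# Codons per amino acid in the standard genetic code, grouped by count.
CODONS_PER_GROUP = {1: "MW", 2: "NDCQEHKFY", 3: "I", 4: "AGPTV", 6: "RLS"}
CODON_COUNTS = {aa: n for n, group in CODONS_PER_GROUP.items() for aa in group}


def window_degeneracies(sequence, width):
    """Return the degeneracy of every window of the given width."""
    # prefix[i] holds the product of codon counts for sequence[:i].
    prefix = [1]
    for residue in sequence:
        prefix.append(prefix[-1] * CODON_COUNTS[residue])
    # Each window product is a ratio of two table entries (exact for ints).
    return [prefix[i + width] // prefix[i]
            for i in range(len(sequence) - width + 1)]


print(window_degeneracies("MKWLS", 2))  # [2, 2, 6, 36]
```

Python integers do not overflow, so the table stays exact even for long sequences.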
The Prototype
Primer List Window
Inspection Window
Senior Thesis
Winter Term
Incorporated a system to record and analyze
results produced from finding primers
Utilized data mining tools
Data Mining:
Association Rules
Have the form LHS → RHS
Interpretation: If every item in LHS occurs, then it
is likely that all of the items in RHS will also occur
Example:
LHS = protein sequence A contains primers 1, 2 & 3
RHS = protein sequence A contains primers 4 & 5
Application: Find Association Rules based on
Horton’s data collected about primers
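A minimal sketch of how a candidate rule could be scored, using the standard support and confidence measures for association rules. Each record is assumed to be the set of primer numbers found in one protein sequence; the records below are invented for illustration and are not the collected data.

```python
def support(records, items):
    """Fraction of records that contain every item."""
    items = set(items)
    return sum(1 for record in records if items <= record) / len(records)


def confidence(records, lhs, rhs):
    """Among records containing all of LHS, the fraction that also contain all of RHS."""
    lhs, rhs = set(lhs), set(rhs)
    lhs_count = sum(1 for record in records if lhs <= record)
    if lhs_count == 0:
        return 0.0
    both_count = sum(1 for record in records if (lhs | rhs) <= record)
    return both_count / lhs_count


# Invented data: primers found in each of five protein sequences.
records = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3}, {2, 4}, {1, 2, 3, 4, 5}]
print(support(records, {1, 2, 3, 4, 5}))       # 0.6
print(confidence(records, {1, 2, 3}, {4, 5}))  # 0.75
```

A rule is usually kept only when both values clear chosen thresholds; the thresholds themselves are a tuning decision.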
Protein Database
The End