BioInformatics

Applying principles of computer science in a biological context
Outline
Biological Background Information
Problem Description
My Project
  Previous Work
  Senior Thesis

Biological Data Sets
Raw DNA sequences
Macromolecular structures
Genomes
Protein sequences


What is a Protein Sequence?
A string of amino acids, each represented by a single letter
There are 20 standard amino acids
Typical proteins are about 300 amino acids long
…ILVKMUTANKVKMU…
Examples of Amino Acids
Importance of Protein Sequences
Compare two or more sequences
Determine similarities in their functions
Multiple Alignment
  Serves as input for my analysis
  Web-based programs available
  Maximizes areas of similarity

Multiple Alignment Example
Shaded areas show regions of exact match.
A dash (gap) is inserted into the shorter protein sequence to achieve the alignment.
Redundant amino acids in each column are then removed, leaving only the distinct ones.
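The column-collapsing step above can be sketched in a few lines of Python. This is a minimal sketch, assuming the alignment is given as equal-length strings with '-' marking gaps and that gaps are simply dropped when a column is collapsed; the function name `collapse_columns` and the toy sequences are hypothetical, not taken from the project.

```python
def collapse_columns(aligned: list[str]) -> list[set[str]]:
    """For each alignment column, return the set of distinct residues (gaps dropped)."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    # A set keeps one copy of each residue, removing the redundancies;
    # '-' marks an inserted gap rather than an amino acid, so it is skipped.
    return [{seq[i] for seq in aligned if seq[i] != "-"} for i in range(width)]


# Toy alignment (hypothetical data); the second sequence has one gap.
aligned = ["MKVLA", "MKI-A", "MRVLA"]
for i, column in enumerate(collapse_columns(aligned)):
    print(i, sorted(column))
# 0 ['M']
# 1 ['K', 'R']
# 2 ['I', 'V']
# 3 ['L']
# 4 ['A']
```

Whether gaps should instead count as a possible "residue" depends on how the primers are later used; this sketch assumes they should not.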
Evaluating Usefulness of a Primer
Similar does not always mean useful
Why? The same amino acid can be encoded in several different ways
Amino acids are coded by nucleotide triplets
  Nucleotides: A, T, C, G
  Triplet = Codon
Degeneracy Example
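The slides do not spell out how degeneracy is counted, so the following is a hedged Python sketch of one common definition: the number of distinct DNA sequences (codon combinations) that could encode a stretch of protein. It assumes the standard genetic code and allows each position to contain several amino acids (for example, a collapsed alignment column). `CODONS_PER_AA` and `degeneracy` are illustrative names. Tools that describe primers with per-base IUPAC ambiguity codes can report larger numbers, because they count every combination of the allowed bases.

```python
from math import prod

# Number of codons in the standard genetic code for each of the 20 amino acids
# (61 sense codons in total; stop codons are excluded).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(columns) -> int:
    """Count the distinct DNA sequences that could encode the given positions.

    Each element of `columns` holds the amino acids allowed at that position;
    a plain peptide string therefore gives one amino acid per position.
    Different amino acids never share a codon, so a position admits the sum of
    its amino acids' codon counts, and the positions multiply.
    """
    return prod(sum(CODONS_PER_AA[aa] for aa in set(col)) for col in columns)


print(degeneracy("MKVLA"))                      # 1 * 2 * 4 * 6 * 4 = 192
print(degeneracy(["M", "KR", "IV", "L", "A"]))  # 1 * 8 * 7 * 6 * 4 = 1344
```

Because every position contributes a factor, a single six-codon amino acid such as leucine, serine or arginine quickly inflates the total, which is why a region that looks similar across sequences may still make a poor primer.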
Current Methods
Client: Biology Professor Steven Horton
  Manual search of primers
  Manual calculation of degeneracy

Requirements
Automate the task of finding primers
Automate degeneracy calculation
Record and organize results
Analyze data to make predictions
  Pattern matching
  Data mining
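For the pattern-matching and record-keeping requirements, one possible approach is to turn each primer into a regular expression and record which primers occur in each protein sequence. This Python sketch assumes a primer is stored as the set of amino acids allowed at each position; the names `primer_to_regex` and `primers_present` and the sample data are hypothetical, not the prototype's design.

```python
import re


def primer_to_regex(columns):
    """Compile a primer, given as allowed residues per position, into a regex.

    For example, ["M", "KR", "IV"] becomes the pattern M[KR][IV].
    """
    parts = []
    for column in columns:
        residues = sorted(set(column))
        escaped = "".join(re.escape(r) for r in residues)
        # A single allowed residue is a literal; several become a character class.
        parts.append(escaped if len(residues) == 1 else f"[{escaped}]")
    return re.compile("".join(parts))


def primers_present(sequences, primers):
    """Map each sequence name to the set of primer ids that occur in it."""
    patterns = {pid: primer_to_regex(cols) for pid, cols in primers.items()}
    return {
        name: {pid for pid, pattern in patterns.items() if pattern.search(seq)}
        for name, seq in sequences.items()
    }


# Hypothetical protein sequences and primers.
sequences = {"A": "MKVLAGHW", "B": "MRILSGW", "C": "PPLAGW"}
primers = {1: ["M", "KR", "IV"], 2: ["L", "AS"], 3: ["G", "H", "W"]}
for name, found in primers_present(sequences, primers).items():
    print(name, sorted(found))
# A [1, 2, 3]
# B [1, 2]
# C [2]
```

Each resulting set of primer ids is the kind of per-sequence record that a later data mining step can consume.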

Summer Research
Analyzed solutions developed by the Software Engineering class in Spring 2003
Combined the best design features of each project
Built a prototype in Java
Senior Thesis
Fall Term
Finished the prototype
Multiple-window design
Made the algorithm more efficient using dynamic programming
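The slides do not say which dynamic program the prototype uses. One plausible reading, sketched here in Python as an assumption rather than the thesis algorithm, is to reuse shorter windows when scoring longer ones: if c[i] is the number of codons that could encode column i, the degeneracy of the window starting at i with length L satisfies D(i, L) = D(i, L-1) * c[i+L-1], so each window costs one multiplication instead of a full product. `low_degeneracy_windows`, `min_len`, `max_len` and `max_deg` are hypothetical names.

```python
def low_degeneracy_windows(counts, min_len, max_len, max_deg):
    """Return (start, length, degeneracy) for every window within the limits.

    counts[i] is the number of codons that could encode column i. Only the
    latest table entry D(start, length) is kept for each start position.
    """
    hits = []
    n = len(counts)
    for start in range(n):
        deg = 1  # D(start, 0): the empty window
        for length in range(1, max_len + 1):
            end = start + length
            if end > n:
                break
            # D(start, length) = D(start, length - 1) * counts[end - 1]
            deg *= counts[end - 1]
            if deg > max_deg:
                break  # counts are >= 1, so longer windows only get worse
            if length >= min_len:
                hits.append((start, length, deg))
    return hits


# Per-column codon counts for a toy collapsed alignment (hypothetical data).
print(low_degeneracy_windows([1, 8, 7, 6, 4], min_len=3, max_len=3, max_deg=200))
# [(0, 3, 56), (2, 3, 168)]
```

Because every codon count is at least 1, degeneracy never decreases as a window grows, which is what justifies leaving the inner loop early.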
The Prototype
Primer List Window
Inspection Window
Senior Thesis
Winter Term
Incorporated a system to record and analyze the results of the primer search
Utilized data mining tools
Data Mining: Association Rules
Have the form LHS → RHS
Interpretation: if every item in LHS occurs, then it is likely that all of the items in RHS will also occur
Example:
  LHS = protein sequence A contains primers 1, 2 & 3
  RHS = protein sequence A contains primers 4 & 5
Application: find association rules based on Horton’s data collected about primers
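How "likely" a rule is can be quantified with the usual support and confidence measures: support is the fraction of sequences containing every primer in both LHS and RHS, and confidence is that count divided by the number of sequences containing LHS. The slides do not name the data mining tool used, so the following is only a minimal Python sketch; `rule_stats` is a hypothetical name and the transactions (sets of primer numbers per protein sequence) are made-up toy data.

```python
def rule_stats(transactions, lhs, rhs):
    """Return (support, confidence) of the rule lhs -> rhs over a list of item sets."""
    lhs, rhs = set(lhs), set(rhs)
    # Sequences that contain every item in LHS.
    n_lhs = sum(1 for t in transactions if lhs <= t)
    # Sequences that contain every item in LHS and every item in RHS.
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / len(transactions) if transactions else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Each set lists the primers found in one protein sequence (hypothetical data).
transactions = [
    {1, 2, 3, 4, 5},
    {1, 2, 3, 4},
    {1, 2, 3, 4, 5},
    {2, 5},
]
support, confidence = rule_stats(transactions, lhs={1, 2, 3}, rhs={4, 5})
print(support, confidence)
```

For this toy data, three sequences contain primers 1, 2 and 3, and two of those also contain 4 and 5, giving support 0.5 and confidence 2/3.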
Protein Database
The End