James-Smaldon
Download
Report
Transcript James-Smaldon
Bioinformatics and Data Mining with an Ant Colony Algorithm
The aim of the project is to discover new knowledge about post synaptic protein
activity with a new data mining method
Proteins are made up from a sequence of amino acids.
Predicting the functionality of a protein from its sequence is very difficult.
A synapse is the point where two nerve cells communicate with each other by transmission of a chemical known as neurotransmitter.
Study of post synaptic protein activity is important in understanding the nervous system.
Data mining is performed on a large set of bioinformatics data
Classification in data mining is the process of discovering, from training data, a set of rules predicting the class of a record.
In this project a record consists of data describing a protein, and the class to be predicted is the presence or absence of post-synaptic activity in
a protein. These rules can then be applied to new unclassified data to classify it.
Rules are of the form:
IF <Term1, Term2, ... ,Termn> THEN <class value>
An example of a rule predicting post synaptic activity:
IF (NEUROTR_ION_CHANNEL is present) THEN (post-synaptic activity is present)
The new data mining algorithm is based on the behaviour of
ants
To find the shortest path to food, or around an obstacle, ants release pheromones as they move towards
their goal, and other ants are attracted to this pheromone.
Although there may be more than one path to a goal, the shortest path will accumulate the largest
amount of pheromone within a given time, therefore becoming more attractive to subsequent ants.
An abstract model of this principle has been used in data mining to discover classification rules.
Results
The algorithm discovered a set of classification rules that were able to classify the data with an accuracy of over 98%.More importantly, the rules that were
discovered are very easily interpretable biologists as they have few terms and can be understood independently from one another. Creating comprehensible rules
is a key aim of data mining.
Mean Accuracy
Run (%)
Mean Rule
Count
Term To
Rule Ratio
1
98.26+-0.16
101.5+-0.19
1.21
2
98.26+-0.12
101.3+-0.17
1.21
This project was conducted by James Smaldon and Dr. Alex A.
Freitas, for more details please contact James Smaldon
([email protected])