A Tutorial of Sequence Matching and Alignment in Oracle


A Tutorial of Sequence Matching in Oracle
Haifeng Ji* and Gang Qian**
*Oklahoma City Community College
**University of Central Oklahoma
Overview

Bioinformatics requires sequence matching


BLAST is one of the most widely used tools for sequence matching
Oracle provides a Life Science Platform that supports the BLAST functions



Oracle can manage large amounts of biological data
SQL can be used to build complex queries for sequence matching
Oracle's BLAST interface can speed up the development of bioinformatics applications
Introduction to BLAST

BLAST provides algorithms for finding similarities between nucleotide or amino acid sequences (sequence homology)



BLAST is a heuristic algorithm for identifying local alignments between sequences



Helps establish the evolutionary origin of genes
Helps predict protein structures and functions
A fast algorithm
Measures the statistical significance of alignment scores with respect to a random sequence model
BLAST searches both nucleotide and amino acid query sequences
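BLAST is fast because it does not align every pair of positions. In the classic BLAST approach, it first looks for short exact word matches ("seeds") shared by the query and a database sequence, then extends each seed along its diagonal and keeps the best-scoring ungapped segment (a high-scoring segment pair, HSP). The sketch below illustrates that seed-and-extend idea for nucleotides only. It is a simplified, ungapped toy, not Oracle's implementation; the word size 4, match +1, mismatch -3 and X-drop 5 are illustrative assumptions, and the helper names are hypothetical.

```python
from __future__ import annotations

# Illustrative parameters only; these are assumptions, not the defaults that
# Oracle's BLASTN_MATCH or NCBI BLAST actually use.
WORD, MATCH, MISMATCH, XDROP = 4, 1, -3, 5


def find_seeds(query: str, target: str, word: int = WORD) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs where a length-`word` word matches exactly."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - word + 1):
        index.setdefault(query[i:i + word], []).append(i)
    return [
        (i, j)
        for j in range(len(target) - word + 1)
        for i in index.get(target[j:j + word], [])
    ]


def _extend_one_way(query: str, target: str, i: int, j: int, step: int) -> int:
    """Walk along the diagonal in direction `step` (+1 or -1) and return the best
    cumulative score; stop once the score drops more than XDROP below that best."""
    best = run = 0
    while 0 <= i < len(query) and 0 <= j < len(target):
        run += MATCH if query[i] == target[j] else MISMATCH
        if run > best:
            best = run
        elif best - run > XDROP:
            break
        i += step
        j += step
    return best


def hsp_score(query: str, target: str, qi: int, tj: int, word: int = WORD) -> int:
    """Score of the ungapped segment pair grown from one seed in both directions."""
    seed = word * MATCH
    right = _extend_one_way(query, target, qi + word, tj + word, +1)
    left = _extend_one_way(query, target, qi - 1, tj - 1, -1)
    return seed + right + left


def best_hsp_score(query: str, target: str, word: int = WORD) -> int:
    """Best HSP score over all seeds, or 0 if the sequences share no word."""
    return max(
        (hsp_score(query, target, i, j, word) for i, j in find_seeds(query, target, word)),
        default=0,
    )
```

For example, `best_hsp_score("ACGTACGTTTGA", "GGACGTACGTTTGACC")` returns 12, because the whole 12-base query occurs in the target. Real BLAST adds refinements this sketch omits, such as avoiding repeated extension of seeds on the same diagonal and gapped extension.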
Oracle BLAST Functions


BLAST is implemented in Oracle Database 10g
Oracle provides three MATCH functions:



BLASTN_MATCH: nucleotide sequences
BLASTP_MATCH: amino acid sequences
TBLAST_MATCH: searches that involve translating nucleotide sequences into amino acid sequences
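"Involving translations" means that nucleotide sequences are converted into amino acid sequences before comparison. In NCBI's BLAST suite, for instance, tblastx compares the six-frame translations of both the query and the database sequences. The sketch below shows only that translation step, using the standard genetic code (NCBI translation table 1); it does not claim to reproduce how TBLAST_MATCH works internally, and the function names are hypothetical.

```python
from __future__ import annotations

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order,
# first base outermost; '*' marks a stop codon.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, to get the opposite strand 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2).
    Codons containing non-ACGT characters become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> list[str]:
    """Three forward frames followed by three reverse-strand frames."""
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]
```

For example, `translate("ATGGCC")` gives `"MA"` (ATG codes for methionine, GCC for alanine), and `six_frames("ATGGCC")` gives `["MA", "W", "G", "GH", "A", "P"]`.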
The MATCH Functions

The MATCH functions return the following attributes for each high-scoring match between a query sequence and a sequence database:
Attribute   Description
t_seq_id    The identifier of the matched (target) sequence
score       The score of the match
expect      The expected value (E-value) of the match
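The expected value ties back to the "statistical significance with respect to a random sequence model" mentioned earlier. In the Karlin-Altschul statistics used by BLAST, the number of chance alignments with raw score at least S between random sequences of lengths m and n is estimated as E = K·m·n·e^(−λS), where λ and K depend on the scoring system. The sketch below computes E both directly and through the normalized (bit) score. The values chosen for λ, K, m and n are placeholders for illustration, not the statistics Oracle uses, and the function names are hypothetical.

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul estimate E = K*m*n*exp(-lambda*S): the number of
    alignments scoring >= S expected by chance for lengths m (query) and n (database)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form E = m*n*2^(-S'), independent of lambda and K once S' is known."""
    return m * n * 2.0 ** (-bits)


# Placeholder statistics (assumptions); real lambda and K depend on the scoring system.
LAM, K = 1.0, 0.1
M, N = 1_000, 1_000_000

for s in (20, 30, 40):
    e = evalue(s, M, N, LAM, K)
    # Both forms agree, since m*n*2^(-S') = K*m*n*exp(-lambda*S).
    assert math.isclose(e, evalue_from_bits(bit_score(s, LAM, K), M, N))
    print(s, f"{e:.3g}")
```

Because E falls exponentially with the score, a lower expect value means a match is less likely to have arisen by chance, and a very high score drives E toward zero.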
Sample BLAST Query

BLASTN_MATCH()


This table function performs a BLASTN search of the ecoli_query sequence against the selected portion of the ecoli10 nucleotide database
The database portion is selected with a standard SQL SELECT statement and passed into the function as a reference cursor (REF CURSOR)
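Conceptually, a MATCH table function takes one query sequence plus a stream of (seq_id, seq_data) rows and returns one row per sufficiently good hit. The Python sketch below mimics that shape with a generator, where an iterable of tuples stands in for the cursor. It is a hypothetical analogue (`match_rows` is not an Oracle API), and it deliberately swaps in exact Smith-Waterman local alignment in place of BLAST's heuristic; the scoring values (match +1, mismatch -3, linear gap -5) are illustrative assumptions.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -3, gap: int = -5) -> int:
    """Best local alignment score by exact dynamic programming (linear gap penalty).
    Only two rows of the DP matrix are kept, so memory is O(len(b))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0 (start a fresh alignment).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def match_rows(
    query: str, rows: Iterable[tuple[str, str]], min_score: int = 1
) -> Iterator[tuple[str, int]]:
    """Hypothetical analogue of a MATCH table function: `rows` plays the role of the
    (seq_id, seq_data) cursor; yields (t_seq_id, score) rows, best score first."""
    scored = ((seq_id, smith_waterman(query, seq)) for seq_id, seq in rows)
    hits = [row for row in scored if row[1] >= min_score]
    yield from sorted(hits, key=lambda row: row[1], reverse=True)
```

For example, `list(match_rows("ACGT", [("s1", "TTACGTAA"), ("s2", "GGGG"), ("s3", "ACG")], min_score=2))` returns `[("s1", 4), ("s3", 3)]`. Exact dynamic programming costs time proportional to the product of the sequence lengths for every database row, which is the cost BLAST's heuristics are designed to avoid.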
Sample Query
SELECT *
  FROM TABLE(BLASTN_MATCH(
         (SELECT sequence FROM ecoli_query),
         CURSOR(SELECT seq_id, seq_data FROM ecoli10)));
Query Result
T_SEQ_ID        SCORE     EXPECT
---------- ---------- ----------
1786181           560          0
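Because a table function's output is an ordinary row source, its results can be filtered, sorted and joined with plain SQL. In Oracle, a WHERE or ORDER BY clause could presumably be applied directly to the TABLE(BLASTN_MATCH(...)) result. The sketch below demonstrates the same idea outside Oracle, loading the single result row shown in the query result into an in-memory SQLite table (a stand-in, not Oracle) and keeping only significant hits. The cutoff of 1e-5 and the table name `blast_hits` are arbitrary illustrative assumptions.

```python
import sqlite3

# Stand-in for the sample query's result set: the one row reported in the query result.
rows = [("1786181", 560, 0.0)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blast_hits (t_seq_id TEXT, score INTEGER, expect REAL)")
con.executemany("INSERT INTO blast_hits VALUES (?, ?, ?)", rows)

# Keep only significant hits (illustrative cutoff) and rank them by score.
significant = con.execute(
    "SELECT t_seq_id, score FROM blast_hits WHERE expect < ? ORDER BY score DESC",
    (1e-5,),
).fetchall()
print(significant)  # [('1786181', 560)]
con.close()
```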