Transcript Document

Lecture 6
Some useful bits&pieces that
every Perl programmer should
know
1
Strictness
Use strict;
This is a directive that, among other things, tells perl to
insist on all veriables being declared before they are
used, as well as requiring that all subrutines be declared
(or defined) before they are invoked.
# bestrict - demonstrating the effect of strictness.
use strict;
$message = "This is the message.\n";
print $message;
# bestrict - demonstrating the effect of strictness.
use strict;
my $message = "This is the message.\n";
print $message;
2
Results from bestrict
Global symbol "$message" requires explicit
package name at bestrict line 7.
Global symbol "$message" requires explicit
package name at bestrict line 9.
Execution of bestrict aborted due to compilation
errors.
Use “my” To Fix bestrict :
# bestrict - demonstrating the effect of
strictness.
use strict;
my $message = "This is the message.\n";
print $message;
3
use subs
• Perl provides the use subs directive that
can be used in combination with use
strict to declare a list of subroutines at
the top of the program
use strict;
use sub qw( drawline biod2mysql);
4
Unless you have a really good reason not
to, always switch on strictness at the top of
your program
5
Perl One-Liners
Perl usually starts with the following line:
#! /usr/bin/perl –w
w: warning
-e switch checks whether a module installed
correctly or not:
$ perl -e 'use ExampleModule‘
e: execute
$ perl -e 'print "Hello from a Perl one-liner.\n";'
$ perl -e 'printf "%0.2f\n", 30000 * .12;'
The ability to use the –e switch on the command-line
in this way creates what is known in the perl
world as a one-liner.
6
Perl One-Liners: Equivalents
Another useful switch is –n, which, when used with –e, treats
the one-liner as if it is enclosed with a loop.
$ perl -ne 'print if /ctgaatagcc/;' embl.data
Which is equivalent to the following program statement:
while ( <> )
{
print if /ctgaatagcc/;
}
Same function can also be imlemented using grep:
$ grep 'ctgaatagcc' embl.data
7
Perl One-Liners: More Options
When –n is combined with –p, loop has a print statement added to the
end. Here is a one-liner that prints only those lines from the
embl.data disk-file that do not end in four digits:
$ perl -npe 'last if /\d{4}$/;' embl.data
equivalent perl program:
while ( <> )
{
last if /\d{4}$/;
}
continue {
print $_;
}
$ grep -v '[0123456789][0123456789][0123456789][0123456789]$' embl.data
8
Running Other Programs From perl
#! /usr/bin/perl -w
# pinvoke - demonstrating the invocation of other programs
# from Perl.
use strict;
my $result = system( “dir *.*" );
print "The result of the system call was as follows:\n$result\n";
$result = `dir *.*`;
print "The result of the backticks call was as follows:\n$result\n";
$result = qx/dir *.*/;
9
Results from pinvoke ...
10
Recovering From Errors
Consider the following code:
my $first_filename = "itdoesnotexist.txt";
open FIRSTFILE, "$first_filename"
or die "Could not open $first_filename. Aborting.\n";
Exeption handling by using eval:
eval {
my $first_filename = "itdoesnotexist.txt";
open FIRSTFILE, "$first_filename"
or die "Could not open $first_filename. Aborting.\n";
};
if ( $@ )
{
print "Calling eval produced this message: $@";
}
11
Use eval to protect potentially erroneous
code
12
Sorting
#! /usr/bin/perl -w
# sortexamples - how Perl's in-built sort subroutine works.
use strict;
my @sequences = qw( gctacataat attgttttta aattatattc cgatgcttgg );
print "Before sorting:\n\t-> @sequences\n";
my @sorted = sort @sequences;
my @reversed = sort { $b cmp $a } @sequences;
my @also_reversed = reverse sort @sequences;
print "Sorted order (default):\n\t-> @sorted\n";
print "Reversed order (using sort { \$b cmp \$a }):\n\t->
@reversed\n";
print "Reversed order (using reverse sort):\n\t->
@also_reversed\n";
13
Results from sortexamples ...
Before sorting:
-> gctacataat attgttttta aattatattc
cgatgcttgg
Sorted order (default):
-> aattatattc attgttttta cgatgcttgg
gctacataat
Reversed order (using sort { $b cmp $a }):
-> gctacataat cgatgcttgg attgttttta
aattatattc
Reversed order (using reverse sort):
-> gctacataat cgatgcttgg attgttttta
aattatattc
14
Another Sorting Example
my @chromosomes = qw( 17 5 13 21 1 2 22 15 );
print "Before sorting:\n\t-> @chromosomes\n";
@sorted = sort { $a <=> $b } @chromosomes;
@reversed = sort { $b <=> $a } @chromosomes;
print "Sorted order (using sort { \$a <=> \$b
}):\n\t-> @sorted\n";
print "Reversed order (using sort { \$b <=> \$a
}):\n\t-> @reversed\n";
15
And its results ...
Before sorting:
-> 17 5 13 21 1 2 22 15
Sorted order (using sort { $a
<=> $b }):
-> 1 2 5 13 15 17 21 22
Reversed order (using sort { $b
<=> $a }):
-> 22 21 17 15 13 5 2 1
16
The sortfile Program
#! /usr/bin/perl -w
# sortfile - sort the lines in any file.
use strict;
my @the_file;
while ( <> )
{
chomp;
push @the_file, $_;
}
my @sorted_file = sort @the_file;
foreach my $line ( @sorted_file )
{
print "$line\n";
}
17
Results from sortfile ...
Zap! Zoom! Bang! Bam!
Batman, look out!
Robin, behind you!
Aaaaah, it's the Riddler!
$ perl sortfile sort.data
Aaaaah, it's the Riddler!
Batman, look out!
Robin, behind you!
Zap! Zoom! Bang! Bam!
$ sort sort.data
18
Learning More About Sorting
$ perldoc -f sort
$ man sort
19
Take the time to become familiar with the
utilities included in the operating system
20
HERE Documents
Shotgun Sequencing
This is a relatively simple method of reading
a genome sequence. It is ''simple'' because
it does away with the need to locate
individual DNA fragments on a map before
they are sequenced.
The Shotgun Sequencing method relies on
powerful computers to assemble the finished
sequence.
21
Without HERE Documents
print
print
print
print
print
print
print
print
print
"Shotgun Sequencing\n\n";
"This is a relatively simple method of reading\n";
"a genome sequence. It is ''simple'' because\n";
"it does away with the need to locate\n";
"individual DNA fragments on a map before\n";
"they are sequenced.\n\n";
"The Shotgun Sequencing method relies on\n";
"powerful computers to assemble the finished\n";
"sequence.\n";
22
With HERE Documents
my $shotgun_message = <<ENDSHOTMSG;
Shotgun Sequencing
This is a relatively simple method of reading
a genome sequence. It is ''simple'' because
it does away with the need to locate
individual DNA fragments on a map before
they are sequenced.
The Shotgun Sequencing method relies on
powerful computers to assemble the finished
sequence.
ENDSHOTMSG
print $shotgun_message;
23
Even Better HERE Documents
print <<ENDSHOTMSG;
Shotgun Sequencing
This is a relatively simple method of reading
a genome sequence. It is ''simple'' because
it does away with the need to locate
individual DNA fragments on a map before
they are sequenced.
The Shotgun Sequencing method relies on
powerful computers to assemble the finished
sequence.
ENDSHOTMSG
24
Where To From Here
25
Databases
Learning to talk database
26
Introducing Databases
• A database is a collection of one or more
related tables
• A table is a collection of one or more rows
of data
• A row is a collection of one or more data
items, arranged in columns
27
Structured Data
1960-12-21
1954-6-14
P. Barry
M. Moorhouse
-------------Discovery_Date
-------------1960-12-21
1954-6-14
1970-3-4
2001-12-27
--------Scientist
--------P. Barry
M. Moorhouse
J. Blow
J. Doe
----------Column name
----------Discovery_Date
Scientist
---------------Type restriction
---------------a valid Date
a String no longer than 64 characters
28
Relating tables
-------------Discovery_Date
-------------1960-12-21
1954-6-14
sunglasses
1970-3-4
2001-12-27
--------Scientist
--------P. Barry
M. Moorhouse
--------Discovery
--------Flying car
Telepathic
J. Blow
J. Doe
Self cleaning child
Time travel
----------Column name
----------Discovery_Date
Scientist
Discovery
---------------Type restriction
---------------a valid Date
a String no longer than 64 characters
a String no longer than 128 characters
29
Relating tables, cont.
----------Column name
----------Discovery_Date
Scientist
Discovery
Date_of_birth
Telephone_number
---------------Type restriction
---------------a valid Date
a String no longer than 64 characters
a String no longer than 128 characters
a valid Date
a String no longer than 16 characters
-------------Discovery_Date
-------------1960-12-21
1954-6-14
1970-3-4
2001-12-27
1974-3-17
1999-12-31
--------Discovery
--------Flying car
Telepathic sunglasses
Self cleaning child
Time travel
Memory swapping toupee
Twenty six hour clock
--------Scientist
--------P. Barry
M. Moorhouse
J. Blow
J. Doe
M. Moorhouse
M. Moorhouse
------------Date_of_birth
------------1966-11-18
1970-3-24
1955-8-17
1962-12-1
1970-3-24
1958-7-12
---------------Telephone_number
---------------353-503-555-91910
00-44-81-555-3232
555-2837
00-44-81-555-3232
416-555-2000
30
Solving the one table problem
-------------------------Column name
Type restriction
-------------------------Discovery_Date a valid Date
Scientist_ID
a String no longer than 8 characters
Discovery
a String no longer than 128 characters
----------Column name
----------Scientist_ID
Scientist
Date_of_birth
Address
Telephone_number
---------------Type restriction
---------------a String no longer
a String no longer
a valid Date
a String no longer
a String no longer
than 8 characters
than 64 characters
than 256 characters
than 16 characters
31
Solving the one table problem, cont.
-------------Discovery_Date
-------------1954-6-14
1960-12-21
1969-8-1
1970-3-4
1974-3-17
1999-12-31
2001-12-27
-----------Scientist_ID
-----------JB
JD
MM
MM2
PB
--------Scientist
--------J. Blow
J. Doe
M. Moorhouse
M. Moorhouse
P. Barry
-----------Scientist_ID
-----------MM
PB
PB
JB
MM
MM2
JD
------------Date_of_birth
------------1955-8-17
1962-12-1
1970-3-24
1958-7-12
1966-11-18
--------Discovery
--------Telepathic sunglasses
Flying car
A cure for bad jokes
Self cleaning child
Memory swapping toupee
Twenty six hour clock
Time travel
------Address
------Belfast, NI
Syndey, AUS
England, UK
Toronto, CA
Carlow, IRL
---------------Telephone_number
---------------555-2837
00-44-81-555-3232
416-555-2000
353-503-555-91910
32
A little database design goes a long way.
33
Database system: a definition
• A database system is a computer
program (or group of programs) that
provides a mechanism to define and
manipulate one or more databases
34
Available Database Systems
• Personal database systems: Designed to run
on PCs
Access, Paradox, FileMaker, dBase
• Enterprise database systems: Designed to
support efficient storage and retrieval of vast
amount of data
Interbase, Ingres, SQL Server, Informix, DB2,
Oracle
• Open source database systems: Free!!! (Linux
of course!!!)
PostgreSQL, MySQL
35
SQL: The Language of Databases
•
Defining data with SQL (structured query
language)
• SQL provides two facilities:
1. A database definition Language (DDL)
provides a mechanism whereby databases
can be created
2. A Data Manipulation Language (DML)
provides a mechanism to work with data in
tables
36
A Database Case Study: MER
The Swiss Institute of Bioinformatics maintains SWISS-PROT, an
annotated protein sequence database. SWISS-PROT data format
is described in detail in the manual, which is awailable at:
http://www.expasy.org/sprot/userman.html
EMBL Nucleotide Sequence Database, maintained by the EMBL
Outstation at the EUROPİAN Bioinformatics Institute:
http://www.ebi.ac.uk/embl/Documentation/User_manual/home.html
37
Extracted Sample Data
ID
AC
DT
DT
DT
DE
MERT_ACICA STANDARD; PRT; 116 AA.
Q52106;
01-NOV-1997 (Rel. 35, Created)
01-NOV-1997 (Rel. 35, Last sequence update)
15-JUN-2002 (Rel. 41, Last annotation update)
Mercuric transport protein (Mercury ion transport
protein).
GN MERT.
OS Acinetobacter calcoaceticus.
OG Plasmid pKLH2.
OC Bacteria; Proteobacteria; Gammaproteobacteria;
Pseudomonadales;
OC Moraxellaceae; Acinetobacter.
.
38
Installing a database system
http://www.mysql.com
$ chkconfig --add mysqld
$ chkconfig mysqld on
$ mysqladmin -u root password 'passwordhere'
$ mysql -u root -p
39
Creating the MER database
mysql> create database MER;
Query OK, 1 row affected (0.36 sec)
mysql> show databases;
+------------+
| Databases |
+------------+
| MER
|
| test
|
| mysql
|
+------------+
3 rows in set (0.00 sec)
mysql> use mysql;
Database changed
mysql> grant all on MER.* to bbp identified by 'passwordhere';
Query OK. 0 rows affected (0.00 sec)
40
Adding tables to the MER
database
create table proteins
(
accession_number
code
species
last_date
description
sequence_header
sequence_length
sequence_data
)
varchar (6) not null,
varchar (4) not null,
varchar (5) not null,
date not null,
text not null,
varchar (75) not null,
int not null,
text not null
$ mysql -u bbp -p MER < create_proteins.sql
41
Understand the data before designing the
tables
42
Example SWISS-PROT data-files
acica_ADPT.swp.txt
serma_abdprt.swp.txt
shilf_seq_ACDP.swp.txt
43
Preparing SWISS-PROT data for
importation
$ ./get_proteins *swp* > proteins.input
44
Importing tab-delimited data
into proteins
$ mysql -u bbp -p MER
mysql> load data local infile "proteins.input" into table proteins;
Query OK, 14 rows affected (0.07sec)
Records: 14 Deleted: 0, Skipped: 0, Warnings: 0
45
Working with the data in proteins
mysql> select accession_number, sequence_length
-> from proteins;
mysql> select accession_number, sequence_length
-> from proteins
-> order by accession_number;
mysql> select accession_number, sequence_length
-> from proteins
-> where sequence_length > 200
-> order by sequence_length;
46
Adding another table to the
MER database
create table dnas
(
accession_number
entry_name
sequence_version
last_date
description
sequence_header
sequence_length
sequence_data
)
varchar (8) not null,
varchar (9) not null,
varchar (16) not null,
date not null,
text not null,
varchar (75) not null,
int not null,
text not null
$ mysql -u bbp -p MER < create_dnas.sql
47
Preparing EMBL data for importation
48
Example EMBL data-files
AF213017.EMBL.txt
J01730.embl.txt
M15049.embl.txt
M24940.embl.txt
$ ./get_dnas *EMBL* *embl* > dnas.input
49
Importing tab-delimited data
into dnas
mysql> load data local infile "dnas.input" into table
dnas;
Query OK, 4 rows affected (0.01sec)
Records: 4 Deleted: 0, Skipped: 0, Warnings: 0
50
Working with the data in dnas
mysql> select accession_number, sequence_length
-> from dnas
-> where sequence_length > 4000
-> order by sequence_length;
mysql> select accession_number, sequence_length
-> from dnas
-> where sequence_length > 4000
-> order by sequence_length desc;
mysql> select accession_number, entry_name,
51
Adding the crossrefs table to
the MER database
create table crossrefs (
ac_protein
varchar (6) not null,
ac_dna
varchar (8) not null
)
$ mysql -u bbp -p MER < create_crossrefs.sql
52
Preparing cross-references for
importation
$ ./get_protein_crossrefs *swp* > protein_crossrefs
$ ./get_dna_crossrefs *embl* *EMBL* > dna_crossrefs
$ ./unique_crossrefs protein_crossrefs dna_crossrefs >
unique.input
53
Importing tab-delimited data
into crossrefs
mysql> load data local infile "unique.input" into table
crossrefs;
Query OK, 22 rows affected (0.04 sec)
Records: 22 Deleted: 0 Skipped: 0 Warnings: 0
54
Working with the data in
crossrefs
mysql> select * from crossrefs;
mysql> select proteins.sequence_header, dnas.sequence_header
-> from proteins, dnas, crossrefs
-> where proteins.accession_number = crossrefs.ac_protein
-> and dnas.accession_number = crossrefs.ac_dna
-> order by proteins.sequence_header;
mysql> select proteins.code, proteins.species,
dnas.entry_name
-> from proteins, dnas, crossrefs
55
Working with the data in
crossrefs, cont.
mysql> select
-> proteins.code as 'Protein Code',
-> proteins.species as 'Protein Species',
-> dnas.entry_name as 'DNA Entry Name'
-> from proteins, dnas, crossrefs
-> where proteins.accession_number = crossrefs.ac_protein
-> and dnas.accession_number = crossrefs.ac_dna
-> order by proteins.code;
56
Adding the citations table to the
MER database
create table citations (
accession_number
number
author
title
location
annotation
)
varchar (8) not null,
int not null,
text not null,
text not null,
text not null,
text
$ mysql -u bbp -p MER < create_citations.sql
57
Preparing citation information for
importation
$ ./get_citations * > citations.input
58
Importing tab-delimited data
into citations
mysql> load data local infile "citations.input" into table
citations;
Query OK, 34 rows affected (0.08 sec)
Records: 34 Deleted: 0 Skipped: 0 Warnings: 0
59
Working with the data in
citations
mysql> select
-> proteins.code as 'Protein Code',
-> proteins.species as 'Protein Species',
-> dnas.entry_name as 'DNA Entry Name',
-> citations.location as 'Citation Location'
-> from proteins, dnas, crossrefs, citations
-> where proteins.accession_number =
crossrefs.ac_protein
-> and dnas.accession_number = crossrefs.ac_dna
-> and dnas.accession_number =
citations.accession_number
-> order by proteins.code;
60
The SELECT query can do no harm.
61
Where To From Here
62
Databases and Perl
Using Perl to talk to databases
63
Why Program Databases?
• Customised output handling
• Customized input handling
• Extending SQL
• Integrating MySQL into custom
applications
64
If at all possible, avoid the use of database
driver ``enhancements''
65
Preparing Perl
DBI and DBD::mysql modules need to be installed
$ man DBI
$ man DBD::mysql
$ find `perl -Te 'print "@INC"' ` -name '*.pm' -print | grep 'DBI.pm'
$ find `perl -Te 'print "@INC"' ` -name '*.pm' -print | grep
'mysql.pm'
$ locate DBI.pm
$ locate mysql.pm
66
Checking the DBI installation
#! /usr/bin/perl -w
# check_drivers - check which drivers are installed with DBI.
use strict;
use DBI;
my @drivers = DBI->available_drivers;
foreach my $driver ( @drivers )
{
print "Driver: $driver installed.\n";
}
67
Programming Databases With DBI
#! /usr/bin/perl -w
# show_tables - list the tables within the MER database.
# Uses "DBI::dump_results" to display results.
use strict;
use DBI qw( :utils );
use constant DATABASE => "DBI:mysql:MER";
use constant DB_USER => "bbp";
use constant DB_PASS => "passwordhere";
my $dbh = DBI->connect( DATABASE, DB_USER, DB_PASS )
or die "Connect failed: ", $DBI::errstr, ".\n";
my $sql = "show tables";
my $sth = $dbh->prepare( $sql );
$sth->execute;
68
• DATABASE – Identifies the data source
to use
• DB_USER – Identifies the username to
use when connecting to the data source
• DB_PASS – Identifies the password to
use when authenticating to the data
source
69
Be sure to adhere to any established
naming conventions within a programming
community
70
Results from show_tables ...
$ chmod +x show_tables
$ ./show_tables
'citations'
'crossrefs'
'dnas'
'proteins'
4 rows
4
71
Avoid littering programs with
username/password combinations
72
Developing A Database Utility
Module
package DbUtilsMER;
# DbUtilsMER.pm - the database utilities module from
"Bioinformatics,
# Biocomputing and Perl".
#
# Version 0.01: module created to hold MERconnectDB.
require Exporter;
our @ISA = qw( Exporter );
our @EXPORT = qw( MERconnectDB );
our @EXPORT_OK = qw();
our %EXPORT_TAGS = ();
our $VERSION = 0.01;
73
Developing A Database Utility
Module, cont.
use constant DATABASE => "DBI:mysql:MER";
use constant DB_USER => "bbp";
use constant DB_PASS => "passwordhere";
sub MERconnectDB {
#
# Given: nothing.
# Return: a "connected" database handle to the MER database or
# if no connection is possible, return "undef".
#
return DBI->connect( DATABASE, DB_USER, DB_PASS );
}
1;
74
Using the Database Utility
Module
use lib "$ENV{'HOME'}/bbp/";
use DbUtilsMER;
my $dbh = MERconnectDB
or die "Connect failed: ", $DBI::errstr, ".\n";
use constant DATABASE => "DBI:mysql:MER";
use constant DB_USER => "bbp";
use constant DB_PASS => "passwordhere";
my $dbh = DBI->connect( DATABASE, DB_USER, DB_PASS )
or die "Connect failed: ", $DBI::errstr, ".\n";
75
Improving upon dump_results
#! /usr/bin/perl -w
# show_tables2 - list the tables within the MER database.
# Uses "fetchrow_array" to display results.
use strict;
use DBI;
use lib "$ENV{'HOME'}/bbp/";
use DbUtilsMER;
my $dbh = MERconnectDB
or die "Connect failed: ", $DBI::errstr, ".\n";
76
Improving upon dump_results,
cont.
$sth->execute;
print "The MER database contains the following tables:\n\n";
while ( my @row = $sth->fetchrow_array )
{
foreach my $column_value ( @row )
{
print "\t$column_value\n";
}
}
$sth->finish;
$dbh->disconnect;
77
Results from show_tables2 ...
The MER database contains the following tables:
citations
crossrefs
dnas
proteins
78
Customizing Output
There are 22 cross references in the database.
The
The
The
The
The
The
The
The
The
The
The
protein
protein
protein
protein
protein
.
.
protein
protein
protein
protein
protein
protein
P04336
P08332
P08654
P20102
Q52107
is
is
is
is
is
cross
cross
cross
cross
cross
referenced
referenced
referenced
referenced
referenced
with
with
with
with
with
J01730.
J01730.
M15049.
X03405.
AF213017.
P03830
Q52109
P20102
Q52106
P04337
P08664
is
is
is
is
is
is
cross
cross
cross
cross
cross
cross
referenced
referenced
referenced
referenced
referenced
referenced
with
with
with
with
with
with
J01730.
AF213017.
J01730.
AF213017.
K03089.
M15049.
79
Customizing Output, cont.
#! /usr/bin/perl -w
# which_crossrefs - nicely displayed list of protein->dna
# cross references.
use strict;
use DBI;
use lib "$ENV{'HOME'}/bbp/";
use DbUtilsMER;
my $dbh = MERconnectDB
or die "Connect failed: ", $DBI::errstr, ".\n";
my $sql = "select * from crossrefs";
my $sth = $dbh->prepare( $sql );
80
Customizing Output, cont.
$sth->execute;
print "There are ", $sth->rows, " cross references in the
database.\n\n";
while ( my @row = $sth->fetchrow_array )
{
print "The protein $row[0] is cross referenced with $row[1].\n";
}
$sth->finish;
$dbh->disconnect;
81
Alternatives to fetchrow_array
while ( my ( $protein, $dna ) = $sth->fetchrow_array )
{
print "The protein $protein is cross referenced with $dna.\n";
}
while ( my $row = $sth->fetchrow_hashref )
{
print "The protein $row->{ ac_protein } is cross referenced ";
print "with $row->{ ac_dna }.\n";
}
82
Use fetchrow_hashref to guard against
changes to the structure of a database
table
83
Customizing Input
Provide a protein accession number to cross reference ('quit' to end): p03377
Not found: there is no cross reference for that protein in the database.
Provide a protein accession number to cross reference ('quit' to end): p04337
Found: P04337 is cross referenced with J01730.
Provide a protein accession number to cross reference ('quit' to end): q52109
Found: Q52109 is cross referenced with AF213017.
Provide a protein accession number to cross reference ('quit' to end): x6587
Not found: there is no cross reference for that protein in the database.
Provide a protein accession number to cross reference ('quit' to end): quit
84
Customizing Input
#! /usr/bin/perl -w
# specific_crossref - allow for the "interactive" checking
# of crossrefs from the command-line.
# Keep going until the user enters "quit".
use strict;
use DBI;
use lib "$ENV{'HOME'}/bbp/";
use DbUtilsMER;
use constant TRUE => 1;
my $dbh = MERconnectDB
or die "Connect failed: ", $DBI::errstr, ".\n";
my $sql = qq/ select ac_dna from crossrefs where ac_protein = ? /;
my $sth = $dbh->prepare( $sql );
85
Customizing Input, cont.
while ( TRUE )
{
print "\nProvide a protein accession number to cross ";
print "reference ('quit' to end): ";
my $protein2find = <>;
chomp $protein2find;
$protein2find = uc $protein2find;
if ( $protein2find eq 'QUIT' )
{
last;
}
86
Customizing Input, cont.
$sth->execute( $protein2find );
my $dna = $sth->fetchrow_array;
$sth->finish;
if ( !$dna )
{
print "Not found: there is no cross reference for that
protein ";
print "in the database.\n";
}
else
{
print "Found: $protein2find is cross referenced with
$dna.\n";
}
}
87
Extending SQL
#! /usr/bin/perl -w
# The 'db_match_embl' program - check a sequence against each EMBL
# database entry stored in the dnas
# table within the MER database.
use
use
use
use
use
use
strict;
DBI;
lib "$ENV{'HOME'}/bbp/";
DbUtilsMER;
constant TRUE => 1;
constant FALSE => 0;
my $dbh = MERconnectDB
or die "Connect failed: ", $DBI::errstr, ".\n";
my $sql = qq/ select accession_number, sequence_data,
sequence_length from dnas /;
my $sth = $dbh->prepare( $sql );
88
Extending SQL, cont.
while ( TRUE )
{
my $sequence_found = FALSE;
print "Please enter a sequence to check ('quit' to end): ";
my $to_check = <>;
chomp( $to_check );
$to_check = lc $to_check;
if ( $to_check =~ /^quit$/ )
{
last;
}
$sth->execute;
while ( my ( $ac, $sequence, $sequence_length ) = $sth>fetchrow_array )
{
$sequence =~ s/\s*//g;
89
Extending SQL, cont.
if ( $sequence =~ /$to_check/ )
{
$sequence_found = TRUE;
print "The EMBL entry in the database: ",
$ac, " contains: $to_check.\n";
print "[Lengths: ", length $sequence,
"/$sequence_length]\n\n";
}
}
if ( !$sequence_found )
{
print "No match found in database for: $to_check.\n\n";
}
$sth->finish;
}
$dbh->disconnect;
90
Results from db_match_embl ...
Please enter a sequence to check ('quit' to end): aattgc
The EMBL entry in the database: AF213017 contains: aattgc.
[Lengths: 6838/6838]
Please enter a sequence to check ('quit' to end): aatttc
The EMBL entry in the database: AF213017 contains: aatttc.
[Lengths: 6838/6838]
The EMBL entry in the database: J01730 contains: aatttc.
[Lengths: 5747/5747]
Please enter a sequence to check ('quit' to end):
accttaaatttgtacgtg
No match found in database for: accttaaatttgtacgtg.
Please enter a sequence to check ('quit' to end): quit
91
Where To From Here
92