Transcript Arrays

96-Summer
Bioinformatics Programming Practicum (II) (生物資訊程式設計實習(二))
Bioinformatics with Perl
8/13~8/22 蘇中才
8/24~8/29 張天豪
8/31 曾宇鳯
Schedule

Date      Time         Subject                                            Speaker
8/13 Mon  13:30~17:30  Perl Basics                                        蘇中才
8/15 Wed  13:30~17:30  Programming Basics                                 蘇中才
8/17 Fri  13:30~17:30  Regular expression                                 蘇中才
8/20 Mon  13:30~17:30  Retrieving Data from Protein Sequence Database     蘇中才
8/22 Wed  13:30~17:30  Perl combines with Genbank, BLAST                  蘇中才
8/24 Fri  13:30~17:30  PDB database and structure files                   張天豪
8/27 Mon  8:30~12:30   Extracting ATOM information                        張天豪
8/27 Mon  13:30~17:30  Mapping of Protein Sequence IDs and Structure IDs  張天豪
8/31 Fri  13:30~17:30  Final and Examination                              曾宇鳳
Reference Books

- Learning Perl (Chinese edition: Perl 學習手冊)
- Beginning Perl for Bioinformatics
- Bioinformatics, Biocomputing and Perl: An Introduction to Bioinformatics Computing Skills and Practice
Learning Perl
Perl
Practical Extraction and Report Language
- Created by Larry Wall in the mid-1980s (first released in 1987).
- Suitable for "quick-and-dirty" scripts
- Suitable for string handling
- Powerful regular expressions
Preparation
- Download putty.exe / pietty.exe
- Get the materials for this course:
    http://gene.csie.ntu.edu.tw/~sbb/summer-course/
- Server:
    ssh 140.112.28.186
    Id: course1 ~ course20
    Password:
Installing Perl on Windows

- Download the package from
    http://www.activestate.com/
    http://downloads.activestate.com/ActivePerl/Windows/5.8/ActivePerl-5.8.8.820-MSWin32-x86-274739.msi
- Versions of Perl
    Unix, Linux, Windows (ActivePerl), Mac (MacPerl)
    http://www.perl.com/
Text Editors

- A convenient (text) editor for programming
- Ultraedit: good for me
- Notepad: just an editor
- Vim: for UNIX/Linux lovers
    http://lpi.indicator-online.net/vim.html
    http://homepage.ttu.edu.tw/u9106240/page_main/vim_menu.html
- Joe: easy to use for Unix beginners
Finding Help

- Best resource-finding tool – use [image not preserved]
- On-line resources:
    http://www.perl.com/
    http://www.perl.org/
    http://www.cpan.org/
- HTML Help in ActivePerl
- Command line (highly recommended):
    perldoc perldoc
    perldoc -f <function>      # search function
    perldoc -q <faqkeyword>    # search FAQ
    perldoc <module>           # search module
Perl Basic
Starting
Program: run thyself!
$ vi welcome
#! /usr/bin/perl -w
print "Hello, world\n";
$ chmod +x welcome
$ ./welcome
Hello, world
$ perl welcome
Hello, world
[sbb@gene perl]$ ls -al
-rw-rw-r-- 1 sbb sbb 20 Jul 2 15:27 welcome
[sbb@gene perl]$ chmod +x welcome
[sbb@gene perl]$ ls -al
-rwxrwxr-x 1 sbb sbb 20 Jul 2 15:27 welcome
Using the Perl while construct
#! /usr/bin/perl -w
# The 'forever' program - a (Perl) program,
# which does not stop until someone presses Ctrl-C.
use constant TRUE => 1;
use constant FALSE => 0;
while ( TRUE )
{
print "Welcome to the Wonderful World of Bioinformatics!\n";
sleep 1;
}
Running forever ...
$ chmod +x forever
$ ./forever
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
.
.
Perl Basic
Variables
Variables

- Scalar ($)
  - Number: 1; 1.23; 12e34
  - String: "abc"; 'ABC'; "Hello, world!";
- Array / List (@)
- Hash (%)
Introducing variable containers
The simplest type of variable container is the scalar (純量, "scalar quantity").
- In Perl, a scalar can hold, for example, a number, a word, a sentence or the contents of a disk file.

$name
$_address
$programming_101
$z
$abc
$swissprot_to_interpro_mapping
$SwissProt2InterProMapping
Variable naming is an ART!
scalar
#!/usr/bin/perl -w
# lower case for user-defined names; upper case for Perl's built-in variables
my $ARGV = "example.pl";
my $number = 1.2;
my $string = "Hello, world!";
# my $123 = 123;    # error: a variable name cannot start with a digit
my $abc = "123";
my $_123 = '123';
my $O000OoO00 = 1;
my $OO00Oo000 = 2;
my $OO00OoOOO = 3;
$abc = $O000OoO00 * $OO00Oo000 - $OO00OoOOO;
print $abc x 4 . "\n";
print 5 x 4 . "\n";
print 5 * 4 . "\n";
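The three print statements above depend on operator precedence: the repetition operator x and the multiplication operator * both bind more tightly than the concatenation operator ., so the newline is appended last. A minimal sketch of the same expressions, with illustrative variable names that are not part of the slide:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative names; only the operators come from the slide.
my $value = 1 * 2 - 3;                  # -1, the same arithmetic as the slide

my $repeated_var = $value x 4 . "\n";   # 'x' repeats the string form of -1
my $repeated_num = 5 x 4 . "\n";        # 'x' treats 5 as the string "5"
my $product      = 5 * 4 . "\n";        # '*' multiplies, then '.' appends "\n"

print $repeated_var;   # -1-1-1-1
print $repeated_num;   # 5555
print $product;        # 20
```

Adding parentheses, for example ( $value x 4 ) . "\n", makes the intended grouping explicit.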
Number

- Format (the range is platform-dependent; roughly 1e-308 ~ 1e308 with the usual 64-bit doubles)
    2000
    1.25
    -6.5e45        (-6.5*10^45)
    123456789
    123_456_789
- Other formats
    0377           # octal (decimal 255)
    0xFF           # hexadecimal
    0b11111111     # binary
number
$integer = 12;
$real = 12.34;
$oct = 0377;
$bin = 0b11111111;
$hex = 0xff;
$long = 123456789;
$long_ = 123_456_789;
$large = 1E100;     # also try 1E200
$small = 1E-100;    # also try 1E-200
print "integer : $integer\n";
print "real : $real\n";
print "oct=$oct bin=$bin hex=$hex\n";
#printf("oct=0%o bin=0b%b hex=0x%x\n",$oct,$bin,$hex);
parameters of printf (ref : number)

specifier  Output                                                      Example
c          Character                                                   a
d or i     Signed decimal integer                                      392
e          Scientific notation (mantissa/exponent) using e character   3.9265e+2
E          Scientific notation (mantissa/exponent) using E character   3.9265E+2
f          Decimal floating point                                      392.65
g          Use the shorter of %e or %f                                 392.65
G          Use the shorter of %E or %f                                 392.65
o          Unsigned octal                                              610
s          String of characters                                        sample
u          Unsigned decimal integer                                    7235
x          Unsigned hexadecimal integer                                7fa
X          Unsigned hexadecimal integer (capital letters)              7FA
p          Pointer address                                             B800:0000
n          Nothing printed. The argument must be a pointer to a
           signed int, where the number of characters written so
           far is stored.
%          A % followed by another % character will write % to stdout.
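As a small illustration of the specifiers in the table, the sketch below formats the values from the number slide with sprintf, which takes the same format strings as printf but returns the string. Note that %b (binary) is a Perl addition that is not listed in the table; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ( $oct, $bin, $hex ) = ( 0377, 0b11111111, 0xff );   # all three are 255

# Plain interpolation always shows the decimal value.
my $plain = "oct=$oct bin=$bin hex=$hex";

# sprintf converts each number back to the requested base.
my $formatted = sprintf( "oct=0%o bin=0b%b hex=0x%x", $oct, $bin, $hex );

my $upper_hex = sprintf( "%X",   2042 );     # 7FA, as in the table
my $octal     = sprintf( "%o",   392 );      # 610, as in the table
my $fixed     = sprintf( "%.2f", 392.65 );   # 392.65

print "$plain\n";       # oct=255 bin=255 hex=255
print "$formatted\n";   # oct=0377 bin=0b11111111 hex=0xff
print "$upper_hex $octal $fixed\n";
```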
operator
2 + 3         # 5
5.1 - 2.4     # 2.7
3 * 12        # 36
14 / 2        # 7
10.2 / 0.3    # 34
10 / 3        # 3.333…
10 % 3        # 1
Operator

Operator  Function                         Operator  Function
=         Normal Assignment
+         Addition                         +=        Add and Assign
-         Subtraction, Negative Numbers,   -=        Subtract and Assign
          Unary Negation
*         Multiplication                   *=        Multiply and Assign
/         Division                         /=        Divide and Assign
%         Modulus                          %=        Modulus and Assign
**        Exponent                         **=       Exponent and Assign

$number = $number + 100;
$number += 100;    # same effect as the line above
Take a break …

- modulus
    10.5 % 3.2 = ?
- exponentiation
    2^3 = ?
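One way to check both questions is to run them. The sketch below assumes the second question is about exponentiation, which Perl writes as **; in Perl the ^ operator is bitwise XOR. The % operator works on integers, so fmod from the core POSIX module is used here for the floating-point remainder. Variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(fmod);

my $int_mod   = 10.5 % 3.2;          # operands are truncated to 10 and 3
my $float_mod = fmod( 10.5, 3.2 );   # floating-point remainder, about 0.9
my $power     = 2 ** 3;              # exponentiation
my $xor       = 2 ^ 3;               # bitwise XOR, not a power

print "10.5 % 3.2      = $int_mod\n";   # 1
printf "fmod(10.5, 3.2) = %.1f\n", $float_mod;
print "2 ** 3          = $power\n";     # 8
print "2 ^ 3           = $xor\n";       # 1
```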
string

Format
- Single quotes
    'hello'
    'hello\nhello'
    'hello,$name'
- Double quotes
    "hello"
    "hello\nhello"
    "hello,$name"

Exceptions
    '\'\\'
    "\"\\"

#!/usr/bin/perl -w
print 'hello';
print "hello";
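The difference between the two quoting styles on the string slide can be checked directly. In this sketch (variable names are illustrative), single quotes keep \n and $name as literal characters, double quotes process them, and the "exceptions" are the only escapes single quotes understand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'Perl';

my $single = 'hello,$name';   # no interpolation: the text $name stays as typed
my $double = "hello,$name";   # interpolation: hello,Perl

my $single_len = length 'hello\nhello';   # 12: backslash and n are two characters
my $double_len = length "hello\nhello";   # 11: \n is one newline character

my $single_exc = '\'\\';   # two characters: ' and \
my $double_exc = "\"\\";   # two characters: " and \

print "$single\n$double\n";
print "$single_len $double_len\n";   # 12 11
print "$single_exc $double_exc\n";
```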
Backslash escapes

Escape Sequence   Description or Character
\b                Backspace
\e                Escape
\f                Form Feed
\n                New line
\r                Carriage Return
\t                Tab
\v                Vertical Tab (C-style; Perl string literals do not recognize \v)
\\                Backslash
\$                Dollar Sign
\@                At sign
\0nnn             Any Octal byte
\xnn              Any Hexadecimal byte
\cn               Any Control character
\l                Change the next character to lowercase
\u                Change the next character to uppercase
conversion between String and number

$answer = "Hello " . " " . " world\n";
$answer = "12" . "3";
$answer = "12" * "3";
$answer = "12Hello34" * "3";    # warning !!!
$answer = "A" . 3*5;
$answer = "A" x (3*5);
$answer = "12"x"3";
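The results of the conversions on this slide can be confirmed with a short script. This is a sketch with illustrative variable names; the numeric warning for "12Hello34" is silenced locally so the script runs cleanly, which is an addition not shown on the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $joined  = "12" . "3";   # concatenation: the string 123
my $product = "12" * "3";   # numeric context: 36

my $prefix;
{
    # Only the leading digits "12" are used; Perl warns unless told not to.
    no warnings 'numeric';
    $prefix = "12Hello34" * "3";   # 36
}

my $concat       = "A" . 3 * 5;      # * runs first: A15
my $repeat       = "A" x ( 3 * 5 );  # fifteen copies of A
my $repeat_twice = "12" x "3";       # 121212

print "$joined $product $prefix\n";
print "$concat $repeat $repeat_twice\n";
```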
Variable containers and loops
#! /usr/bin/perl -w
# The 'tentimes' program - a (Perl) program,
# which stops after ten iterations.
use constant HOWMANY => 10;
$count = 0;
while ( $count < HOWMANY )
{
print "Welcome to the Wonderful World of Bioinformatics!\n";
$count++;
}
Running tentimes ...
$ chmod +x tentimes
$ ./tentimes
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Welcome to the Wonderful World of Bioinformatics!
Using the Perl if construct
#! /usr/bin/perl -w
# The 'fivetimes' program - a (Perl) program,
# which stops after five iterations.
use constant TRUE => 1;
use constant FALSE => 0;
use constant HOWMANY => 5;
$count = 0;
while ( TRUE )
{
$count++;
print "Welcome to the Wonderful World of Bioinformatics!\n";
if ( $count == HOWMANY )
{
last;
}
}
The oddeven program
#! /usr/bin/perl -w
# The 'oddeven' program.
use constant HOWMANY => 4;
$count = 0;
while ( $count < HOWMANY )
{
$count++;
if ( $count % 2 == 0 )
{
print "$count : even\n";
}
else # $count % 2 is not zero.
{
print "$count : odd\n";
}
}
Comparison operator

Comparison                         Number   String
Equal                              ==       eq
Not equal                          !=       ne
Less than                          <        lt
Greater than                       >        gt
Less than or equal                 <=       le
Greater than or equal              >=       ge
Comparison (returns -1, 0, or 1)   <=>      cmp
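Choosing the wrong column of this table gives surprising answers, because the numeric operators compare values while the string operators compare character by character. A minimal sketch with illustrative variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same two operands, compared as numbers and as strings.
my $num_equal = ( "10" == "10.0" ) ? 1 : 0;   # 1: both are the number 10
my $str_equal = ( "10" eq "10.0" ) ? 1 : 0;   # 0: the texts differ

my $num_less = ( 9 < 10 )      ? 1 : 0;       # 1
my $str_less = ( "9" lt "10" ) ? 1 : 0;       # 0: "9" sorts after "1"

my $num_order = 9 <=> 10;                     # -1
my $str_order = "9" cmp "10";                 #  1

print "$num_equal $str_equal $num_less $str_less $num_order $str_order\n";
```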
Variable Interpolation
#! /usr/bin/perl -w
# The 'interpolation' program - interpolating variables into strings.
$language = "Perl";
$string = "I love $language"; print $string."\n";
$string = 'I love $language'; print $string."\n";
$string = 'I love '.$language; print $string."\n";
$string = "I love \$language"; print $string."\n";
$string = "I love $languages"; print $string."\n";    # $languages is undefined; write ${language}s instead
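The last case is the one that usually causes trouble: Perl reads $languages as a different, undefined variable. Wrapping the name in braces tells Perl where the variable name ends. A short sketch (variable names other than $language are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $language = 'Perl';

my $double  = "I love $language";       # interpolated
my $single  = 'I love $language';       # kept literally
my $concat  = 'I love ' . $language;    # built by concatenation
my $escaped = "I love \$language";      # \$ suppresses interpolation
my $plural  = "I love ${language}s";    # braces delimit the variable name

print "$double\n";    # I love Perl
print "$single\n";
print "$concat\n";    # I love Perl
print "$escaped\n";
print "$plural\n";    # I love Perls
```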
Arrays: Associating Data With Numbers
@list_of_sequences
@totals
@protein_structures
( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' )
@list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
The @list_of_sequences Array
Working with array elements
print "$list_of_sequences[1]\n";
GCTCAGTTCT
$list_of_sequences[1] = 'CTATGCGGTA';
$list_of_sequences[3] = 'GGTCCATGAA';
The Grown @list_of_sequences Array
How big is the array?
print "The array size is: ", $#list_of_sequences+1, ".\n";
print "The array size is: ", scalar @list_of_sequences, ".\n";
The array size is: 4.
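Both expressions give the same size because $#list_of_sequences is the index of the last element, while an array evaluated in scalar context yields its element count. A small sketch, assuming the four-element array built on the previous slides (the extra variable names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list_of_sequences = qw( TTATTATGTT CTATGCGGTA GACCTCTTAA GGTCCATGAA );

my $last_index = $#list_of_sequences;        # 3: index of the last element
my $size       = scalar @list_of_sequences;  # 4: explicit scalar context
my $size_too   = @list_of_sequences;         # 4: assignment to a scalar
my $last       = $list_of_sequences[-1];     # negative index counts from the end

my @empty = ();
my $empty_last_index = $#empty;              # -1 for an empty array

print "last index $last_index, size $size, last element $last\n";
```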
Adding elements to an array
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
@sequences = ( @sequences, 'CTATGCGGTA' );
print "@sequences\n";
TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
@sequences = ( 'CTATGCGGTA' );
print "@sequences\n";
CTATGCGGTA
Adding more elements to an array
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
@sequences = ( @sequences, ( 'CTATGCGGTA', 'CTATTATGTC' ) );
print "@sequences\n";
TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA CTATTATGTC
@sequence_1 = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
@sequence_2 = ( 'GCTCAGTTCT', 'GACCTCTTAA' );
@combined_sequences = ( @sequence_1, @sequence_2 );
print "@combined_sequences\n";
TTATTATGTT GCTCAGTTCT GACCTCTTAA GCTCAGTTCT GACCTCTTAA
Removing elements from an array
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
'TTATTATGTT' );
@removed_elements = splice @sequences, 1, 2;
print "@removed_elements\n";
print "@sequences\n";
GCTCAGTTCT GACCTCTTAA
TTATTATGTT TTATTATGTT
# remove all elements of an array
@sequences = ();
The slices program
#! /usr/bin/perl -w
# The 'slices' program - slicing arrays.
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
'CTATGCGGTA', 'ATCTGACCTC' );
print "@sequences\n\n";
@seq_slice = @sequences[ 1 .. 3 ];
print "@seq_slice\n";
print "@sequences\n\n";
@removed = splice @sequences, 1, 3;
print "@sequences\n";
print "@removed\n";
Results from slices ...
TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC
GCTCAGTTCT GACCTCTTAA CTATGCGGTA
TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC
TTATTATGTT ATCTGACCTC
GCTCAGTTCT GACCTCTTAA CTATGCGGTA
Processing every element in an array
#! /usr/bin/perl -w
# The 'iterateW' program - iterate over an entire array
# with 'while'.
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
'CTATGCGGTA', 'ATCTGACCTC' );
$index = 0;
$last_index = $#sequences;
while ( $index <= $last_index )
{
print "$sequences[ $index ]\n";
++$index;
}
Results from iterateW ...
TTATTATGTT
GCTCAGTTCT
GACCTCTTAA
CTATGCGGTA
ATCTGACCTC
The iterateF program
#! /usr/bin/perl -w
# The 'iterateF' program - iterate over an entire array
# with 'foreach'.
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
'CTATGCGGTA', 'ATCTGACCTC' );
foreach $value ( @sequences )
{
print "$value\n";
}
Making lists easier to work with
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
'CTATGCGGTA', 'ATCTGACCTC' );
@sequences = ( TTATTATGTT, GCTCAGTTCT, GACCTCTTAA,
CTATGCGGTA, ATCTGACCTC );
@sequences = qw( TTATTATGTT GCTCAGTTCT GACCTCTTAA
CTATGCGGTA ATCTGACCTC );
Quoted words
#!/usr/bin/perl -w
# The 'quoted_words' program
@list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
@list_of_sequences = qw/TTATTATGTT GCTCAGTTCT GACCTCTTAA/;
@list_of_sequences = qw{TTATTATGTT GCTCAGTTCT GACCTCTTAA};
@list_of_sequences = qw!TTATTATGTT GCTCAGTTCT GACCTCTTAA!;
@list_of_sequences = qw[TTATTATGTT GCTCAGTTCT GACCTCTTAA];
@list_of_sequences = qw<TTATTATGTT GCTCAGTTCT GACCTCTTAA>;
@list_of_sequences = qw#TTATTATGTT GCTCAGTTCT GACCTCTTAA#;
print "@list_of_sequences\n";
print "The array size is: ", $#list_of_sequences+1, ".\n";
pop/push/shift/unshift
#!/usr/bin/perl -w
# The "array_operator" program
@array = 5..9;
print "array = [@array]\n";
$item = pop @array;
print "item = [$item]\n";
print "array = [@array]\n";
push @array, 9;
print "array = [@array]\n";
$item = shift @array;
print "item = [$item]\n";
print "array = [@array]\n";
unshift @array, 1..5;
print "array = [@array]\n";
pop/push/shift/unshift
array = [5 6 7 8 9]
==========pop==========
item = [9]
array = [5 6 7 8]
==========push 9==========
array = [5 6 7 8 9]
==========shift==========
item = [5]
array = [6 7 8 9]
==========unshift 1..5==========
array = [1 2 3 4 5 6 7 8 9]
reverse / sort
#!/usr/bin/perl -w
# The "array_operator1" program
@array = qw /5 4 9 8 1 3 6 2 7 10/;
print "array = [@array]\n";
@array_reverse = reverse @array;
print "reverse array = [@array_reverse]\n";
@array_sorted = sort @array;
print "sort array = [@array_sorted]\n";
@array_reversesorted = reverse sort @array;
print "reverse sort array = [@array_reversesorted]\n";
@array_sortedreverse = sort reverse @array;
print "sort reverse array = [@array_sortedreverse]\n";
reverse / sort
array = [5 4 9 8 1 3 6 2 7 10]
========================================
reverse array = [10 7 2 6 3 1 8 9 4 5]
========================================
sort array = [1 10 2 3 4 5 6 7 8 9]
========================================
reverse sort array = [9 8 7 6 5 4 3 2 10 1]
========================================
sort reverse array = [1 10 2 3 4 5 6 7 8 9]
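The sorted results above place 10 right after 1 because sort compares its elements as strings by default. To sort numerically, pass a comparison block that uses <=>. A minimal sketch with illustrative variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = qw( 5 4 9 8 1 3 6 2 7 10 );

my @lexical    = sort @array;                 # string order: 1 10 2 3 ...
my @ascending  = sort { $a <=> $b } @array;   # numeric order, smallest first
my @descending = sort { $b <=> $a } @array;   # numeric order, largest first

print "lexical    = [@lexical]\n";      # [1 10 2 3 4 5 6 7 8 9]
print "ascending  = [@ascending]\n";    # [1 2 3 4 5 6 7 8 9 10]
print "descending = [@descending]\n";   # [10 9 8 7 6 5 4 3 2 1]
```

$a and $b are the special variables sort uses for the two elements being compared, so they need no declaration.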
split/join
#!/usr/bin/perl -w
# The "array_operator2" program - join / split
$string = "5 4 9 8 1 3 6 2 7 10";
@array = split/ /, $string;
print "array = [@array]\n";
$string = join ",", @array;
print "array = [$string]\n";
array = [5 4 9 8 1 3 6 2 7 10]
array = [5,4,9,8,1,3,6,2,7,10]
How to map between IP and domain name?

IP               Domain name
140.112.28.186   gene.csie.ntu.edu.tw
140.112.28.191   biominer.csie.ntu.edu.tw
140.112.28.190   knn.csie.ntu.edu.tw
Use two arrays to map between IP and domain name?

Index   @IP              @Domain_name
[0]     140.112.28.186   gene.csie.ntu.edu.tw
[1]     140.112.28.191   biominer.csie.ntu.edu.tw
[2]     140.112.28.190   knn.csie.ntu.edu.tw
How to search for a certain IP or domain name?

Index   @IP              @Domain_name
[0]     140.112.28.186   gene.csie.ntu.edu.tw
[1]     140.112.28.191   biominer.csie.ntu.edu.tw
[2]     140.112.28.190   knn.csie.ntu.edu.tw
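With two parallel arrays, a lookup has to walk through @IP until it finds a match and then use the same index in @Domain_name. The sketch below shows one way to do that; find_domain is a hypothetical helper name introduced only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @IP          = qw( 140.112.28.186 140.112.28.191 140.112.28.190 );
my @Domain_name = qw( gene.csie.ntu.edu.tw
                      biominer.csie.ntu.edu.tw
                      knn.csie.ntu.edu.tw );

# Hypothetical helper: linear search over the parallel arrays.
sub find_domain {
    my ($ip) = @_;
    for my $i ( 0 .. $#IP ) {
        return $Domain_name[$i] if $IP[$i] eq $ip;   # same index in both arrays
    }
    return;   # nothing found
}

my $found = find_domain('140.112.28.191');
print "140.112.28.191 is $found\n";   # biominer.csie.ntu.edu.tw
```

Every lookup scans the array from the start, and the two arrays must be kept in step by hand, which motivates the hash on the following slides.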
Why Hash?

%Domain_name
Key                Value
[140.112.28.186]   gene.csie.ntu.edu.tw
[140.112.28.191]   biominer.csie.ntu.edu.tw
[140.112.28.190]   knn.csie.ntu.edu.tw
How to get a certain domain name?

%Domain_name
Key                Value
[140.112.28.186]   gene.csie.ntu.edu.tw
[140.112.28.191]   biominer.csie.ntu.edu.tw
[140.112.28.190]   knn.csie.ntu.edu.tw

$Domain_name{"140.112.28.186"}
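The lookup above can be written out as a complete script. This sketch also builds the opposite mapping with reverse, which swaps keys and values and therefore assumes every domain name appears only once; %IP_of and the other scalar names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %Domain_name = (
    '140.112.28.186' => 'gene.csie.ntu.edu.tw',
    '140.112.28.191' => 'biominer.csie.ntu.edu.tw',
    '140.112.28.190' => 'knn.csie.ntu.edu.tw',
);

# Direct lookup by key, with no loop.
my $domain = $Domain_name{'140.112.28.186'};

# exists tests for a key without creating it.
my $known = exists $Domain_name{'10.0.0.1'} ? 1 : 0;

# Illustrative reverse mapping (domain name => IP); values must be unique.
my %IP_of = reverse %Domain_name;
my $ip    = $IP_of{'knn.csie.ntu.edu.tw'};

print "$domain\n";   # gene.csie.ntu.edu.tw
print "$ip\n";       # 140.112.28.190
```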
Examples of Hash
Hashes: Associating Data With Words
%nucleotide_bases
%nucleotide_bases = ( A, Adenine, T, Thymine );
%nucleotide_bases = ( A => Adenine,     # key => value
T => Thymine );
Working with hash entries
print "The expanded name for 'A' is $nucleotide_bases{ 'A' }\n";
The expanded name for 'A' is Adenine
How big is the hash?
%nucleotide_bases = ( A, Adenine, T, Thymine );
@hash_names = keys %nucleotide_bases;
print "The names in the %nucleotide_bases hash are: @hash_names\n";
The names in the %nucleotide_bases hash are: A T
%nucleotide_bases = ( A, Adenine, T, Thymine );
$hash_size = keys %nucleotide_bases;
print "The size of the %nucleotide_bases hash is: $hash_size\n";
The size of the %nucleotide_bases hash is: 2
Adding entries to a hash
$nucleotide_bases{ 'G' } = 'Guanine';
$nucleotide_bases{ 'C' } = 'Cytosine';
%nucleotide_bases = ( A => Adenine, T => Thymine,
G => Guanine, C => Cytosine );
The Grown %nucleotide_bases Hash
Removing entries from a hash
delete $nucleotide_bases{ 'C' };      # removes the entry (key and value)
$nucleotide_bases{ 'C' } = undef;     # keeps the key; only the value becomes undef
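The two statements are not equivalent, which exists and the hash size make visible. A minimal sketch (the scalar names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %nucleotide_bases = ( A => 'Adenine', T => 'Thymine',
                         G => 'Guanine', C => 'Cytosine' );

# Assigning undef leaves the key in the hash.
$nucleotide_bases{C} = undef;
my $exists_after_undef = exists $nucleotide_bases{C} ? 1 : 0;   # 1
my $size_after_undef   = keys %nucleotide_bases;                # 4

# delete removes the key together with its value.
delete $nucleotide_bases{C};
my $exists_after_delete = exists $nucleotide_bases{C} ? 1 : 0;  # 0
my $size_after_delete   = keys %nucleotide_bases;               # 3

print "after undef:  exists=$exists_after_undef size=$size_after_undef\n";
print "after delete: exists=$exists_after_delete size=$size_after_delete\n";
```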
Slicing hashes
#! /usr/bin/perl -w
# The 'slicing_hashes' program - extract a subset of a hash
%gene_counts = ( Human => 31000,
'Thale cress' => 26000,
'Nematode worm' => 18000,
'Fruit fly' => 13000,
Yeast => 6000,
'Tuberculosis microbe' => 4000 );
@counts = @gene_counts{ Human, "Fruit fly", 'Tuberculosis microbe' };
print "@counts\n";
31000 13000 4000
Working with hash entries: a complete example
#! /usr/bin/perl -w
# The 'bases' program - a hash of the nucleotide bases.
%nucleotide_bases = ( A => Adenine, T => Thymine,
G => Guanine, C => Cytosine );
$sequence = 'CTATGCGGTA';
print "\nThe sequence is $sequence, which expands to:\n\n";
while ( $sequence =~ /(.)/g )
{
print "\t$nucleotide_bases{ $1 }\n";
}
Results from bases ...
The sequence is CTATGCGGTA, which expands to:
Cytosine
Thymine
Adenine
Thymine
Guanine
Cytosine
Guanine
Guanine
Thymine
Adenine
Processing every entry in a hash
#! /usr/bin/perl -w
# The 'genes' program - a hash of gene counts.
use constant LINE_LENGTH => 60;
%gene_counts = ( Human => 31000,
'Thale cress' => 26000,
'Nematode worm' => 18000,
'Fruit fly' => 13000,
Yeast => 6000,
'Tuberculosis microbe' => 4000 );
The genes program, cont.
print '-' x LINE_LENGTH, "\n";
while ( ( $genome, $count ) = each %gene_counts )
{
print "`$genome' has a gene count of $count\n";
}
print '-' x LINE_LENGTH, "\n";
foreach $genome ( sort keys %gene_counts )
{
print "`$genome' has a gene count of $gene_counts{ $genome }\n";
}
print '-' x LINE_LENGTH, "\n";
Results from genes ...
------------------------------------------------------------
'Human' has a gene count of 31000
'Tuberculosis microbe' has a gene count of 4000
'Fruit fly' has a gene count of 13000
'Nematode worm' has a gene count of 18000
'Yeast' has a gene count of 6000
'Thale cress' has a gene count of 26000
------------------------------------------------------------
'Fruit fly' has a gene count of 13000
'Human' has a gene count of 31000
'Nematode worm' has a gene count of 18000
'Thale cress' has a gene count of 26000
'Tuberculosis microbe' has a gene count of 4000
'Yeast' has a gene count of 6000
------------------------------------------------------------
How to sort by the values?
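One possible answer, offered as a sketch rather than the course's official solution: sort the keys, but compare the values they point to. The hash is the %gene_counts example from the genes program; @by_count is an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %gene_counts = ( Human                  => 31000,
                    'Thale cress'          => 26000,
                    'Nematode worm'        => 18000,
                    'Fruit fly'            => 13000,
                    Yeast                  => 6000,
                    'Tuberculosis microbe' => 4000 );

# Sort the keys by their values, largest count first.
my @by_count = sort { $gene_counts{$b} <=> $gene_counts{$a} } keys %gene_counts;

foreach my $genome (@by_count) {
    print "'$genome' has a gene count of $gene_counts{$genome}\n";
}
```

Swapping $a and $b in the block gives ascending order instead.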
Exercise
Protein sequences
FASTA format
>P53_HUMAN (P04637) Cellular tumor antigen p53 (Tumor
suppressor p53) (Phosphoprotein p53) (Antigen NY-CO-13) Homo sapiens (Human).
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
Read a FASTA file
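#!/usr/bin/perl -w
# Read one FASTA record from the file(s) named on the command line (or from STDIN).
my ( $line, $queryname );
my $queryseq = '';              # start empty so that appending does not warn under -w
while ( $line = <> )
{
if ( $line =~ /^>(\S+)/ )       # header line: the name is the first word after '>'
{
$queryname = $1;
}
else
{
chomp $line;
$queryseq .= $line;             # sequence line: append to the sequence so far
}
}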
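The same loop extends to files with several records if each header line starts a new entry. The sketch below is one possible approach, not the reference solution to the exercises; it parses two made-up records held in a string so that it runs without any input file, and the names %sequence_of, @names and $current are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two made-up records used only for illustration (not taken from disorder.fa).
my $fasta = <<'END_FASTA';
>seq1 first example protein
MEEPQSDPSV
EPPLSQ
>seq2 second example protein
MKT
END_FASTA

my ( %sequence_of, @names, $current );
for my $line ( split /\n/, $fasta ) {
    if ( $line =~ /^>(\S+)/ ) {          # header: start a new record
        $current = $1;
        push @names, $current;           # remember the input order
        $sequence_of{$current} = '';
    }
    elsif ( defined $current ) {         # sequence line of the current record
        $line =~ s/\s+//g;               # drop stray whitespace
        $sequence_of{$current} .= $line;
    }
}

for my $name (@names) {
    printf "%s\t%d residues\n", $name, length $sequence_of{$name};
}
```

To read a real file, the for loop over the string could be replaced by the while ( $line = <> ) loop from the slide, with chomp applied to each line.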
Exercise
- Read more than one sequence
- Store the protein names and sequences from disorder.fa in two arrays
- Show all of the protein names and sequences.
- Show the number of proteins and residues.
  ($len = length $seq;)
Exercise
- Read more than one sequence
- Store the protein names and sequences from disorder.fa in a hash
- Show the protein names and sequences sorted by protein name
- Find the longest sequence