powerpoint slides (perl) - UCLA Chemistry and Biochemistry


Welcome to lecture 3:
An introduction to programming in PERL
IGERT – Sponsored Bioinformatics Workshop Series
Michael Janis and Max Kopelevich, Ph.D.
Dept. of Chemistry & Biochemistry, UCLA
Last time…
• We covered a bit of material…
• Try to keep up with the reading – it’s all in there!
• How’s it coming along?
– regex examples? (TATA box, palindrome)…
• > grep -E --color 'TA(TAAA|TAAT|TATT|ATAA|ATAT)' *.fsa
• > grep -E --color '(.)(.).\2\1'
– Using emacs?
– Let’s ignore the long version of the prosite match for
now… we’ll deal with that soon…
Shell scripting is useful, but…
It does not port or scale well; complex data structures may be
somewhat challenging. Having said that,
Shell scripting skills have many applications, including:
– Ability to automate tasks, such as
• Backups
• Administration tasks
• Periodic operations on a database via cron
• Any repetitive operations on files
– Increase your general knowledge of UNIX
• Use of environment
• Use of UNIX utilities
• Use of features such as pipes and I/O redirection
For bioinformatics, we need a fully
featured programming language
There’s a problem with our search of fasta files – can you guess what?
We’ll be dealing with this using a programming language with arbitrarily
complex data structures
Perl is a scriptable, portable, interpreted and compiled language:
– Scriptable and portable and networks well
• The code remains in text format
• The code is compiled and then interpreted at runtime
• The interpreter has been written for use on every (?) platform
• Can control a vast number of other devices (files, programs, either local or remote)
– Drawbacks of the language
• Since it’s compiled to an internal bytecode and interpreted at runtime, rather than compiled to native machine code, it will generally run slower than C code
• There’s a double edged sword called TMTOWTDI (“There’s more than one way to do it”)
• Not truly OO; not the most elegant language for algorithm implementation (arguable!)
PERL: starting point for
bioinformatics
• Easy to learn (a bit forgiving)
• Easy to process text files; good language for pattern
searching
– Most biological file formats are text files
– Most sequence analysis tasks deal with pattern finding at some
point
• Easy to run other programs and process their results
– Similar to shell programming in this regard!
Extending the shell: Creating Our
Own Commands
• Use programming language to create the new command
• We will use perl
• TASK: write a PERL program that
– A.) reads a fasta sequence file
– B.) reverse complements the sequence
– C.) prints the output to STDOUT
– D.) Then modify program to write to a file
• 1. Using command line REDIRECTION
• 2. Using PERL to open and write to OUTPUT FILE
PERL vocabulary – similar to bash functionality
• print
• chomp
• while
• open
• close
• $ARGV[0], $ARGV[1]
• $_
• if. . .else
• =~
• /^>/
PERL vocabulary. . .EXPLAINED
• print: works like the echo command
• chomp: removes the ‘newline character’
• while: repetitive loop until breaking condition met
• open: used to open a file
• close: used to close a file
• $ARGV[0], $ARGV[1]: command line arguments
• $_: default variable that holds the current line from the in-file
• if. . .else: [if true perform a, else perform b]
• =~: binding operator (compare text w/ reg. exp)
• /^>/: match “>” at very beginning of line ONLY
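The vocabulary above can be seen working together in one short script. This is only a sketch: the FASTA text is a made-up example held in a string (opened as an in-memory file so no input file is needed), and the names $fasta, $headers and $bases are illustrative.

```perl
use strict;
use warnings;

# Hypothetical FASTA text held in a string so the sketch needs no input file.
my $fasta = ">seq1\nATGC\nGGTA\n";

my $headers = 0;   # number of lines that start with ">"
my $bases   = 0;   # total number of sequence characters

# Opening a reference to a string gives an in-memory filehandle.
open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!";
while (<$in>) {            # each line read is placed in $_
    chomp;                 # remove the trailing newline from $_
    if ($_ =~ /^>/) {      # header line: ">" at the very beginning
        $headers++;
        print "header: $_\n";
    } else {               # anything else is treated as sequence
        $bases += length($_);
        print "sequence: $_\n";
    }
}
close($in);
print "$headers header line(s), $bases bases\n";
```

With a real file, the open line would name the file (for example $ARGV[0]) instead of the string reference.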
Running a perl script
1. Create a file
– Specify location of perl
– Write program
2. Make it executable
3. Run it!
Example: “Hello world!”
• Write the program:
#!/usr/bin/perl
# (the first line gives the location of PERL)
print("Hello, world!\n");   # a PERL command
• Make it executable:
>chmod 744 hello.pl
>
chmod 744 allows the user to read, write AND execute it. Others can only read it.
• Run it:
>hello.pl
Hello, world!
>
(The first line runs the program; the second line is the output. If the current directory is not in your PATH, type ./hello.pl instead.)
Data
• Data is stored in variables.
• A variable is like a box.
• We put values in it.
• There are three ways of storing data:
– Scalar variables
– Arrays
– Hashes
• A single variable (a ‘scalar variable’) can be called
anything, but must start with a ‘$’
Scalar variables: example
#!/usr/bin/perl
$dna = "TGACT";        # defining a variable
print("$dna\n");       # using it
>printVariable.pl
TGACT
>
Scalar variables (cont.)
• PERL doesn’t differentiate between strings (e.g.
“Fred”), integers (e.g. “13”) or floating point
numbers (e.g. “16.9”).
• If there’s one piece of information, it’s a scalar
variable.
• PERL understands the context you’re working
in.
Scalar variables (cont.)
#!/usr/bin/perl
$dna = "TGACT";               # defining a variable (here it’s a string)
print("$dna\n");              # using it
$dna = 11;                    # redefine variable
print(($dna + 2) . "\n");     # use it in an integer context
>printVariable.pl
TGACT
13
>
Perl worked out what to do.
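A minimal sketch of how the operator, not the variable, decides the context: the same scalar is treated as a number by + and as a string by the . (concatenation) operator. The variable names are illustrative.

```perl
use strict;
use warnings;

my $value = "11";            # stored as a string

my $sum    = $value + 2;     # numeric context: 11 + 2
my $joined = $value . 2;     # string context: "11" followed by "2"

print "$sum\n";              # 13
print "$joined\n";           # 112
```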
Limitations of scalar variables
Imagine we want to find the average of a list of numbers
• we could do it like this:
program 1
$number1 = 5.4;
$number2 = 7.3;
$number3 = 4.1;
$average = ( $number1 + $number2 + $number3 ) / 3;
but this is obviously extremely limited
Lists
Of course there is a way to make lists in Perl. You
can always enclose a list of items in parentheses...
( 5.6, 8.22, 14.9 );                # list of floating point numbers
( "hello", "Canada" );              # list of strings
( "hello", $country );              # mixed list
( "blah", 18, 22, 'x', 3.14 );      # mixed list
( 0 .. 5 );                         # list of integers from 0 to 5
( 'a' .. 'z' );                     # list of strings a,b,c,d......
Array variables
There is a special type of variable in perl which can hold
lists - The array
• Perl knows a variable is an array when we use a special
character @
– Remember, scalars (single valued variables) start with a dollar ($)
sign, arrays start with an @ sign.
• Arrays can have as many elements as you need (up to the
limits of your available memory, anyway)
@numbers = (5.6, 8.22, 14.9); # list of floating point numbers
Printing arrays
@words = ("Hello", "Canada!");
print "@words";   # prints Hello Canada!
print @words;     # prints HelloCanada!
• Double quoted strings will print array elements with spaces in between them.
– No quotes will print array elements all smashed together!
Accessing array elements
An array wouldn't be very useful if we couldn't look at the
individual members of the list.
print "Enter an index number between 0 and 25\n";
$index = <STDIN>;
chomp $index;
@letters = ('A'..'Z');
print "letter index $index = $letters[$index] \n";
What does it mean?
Accessing array elements
• Arrays are stored in perl's memory in order.
– Each position (element) in the array has a number
– This number is called the index
• Each element in an array is a single (scalar) value
• There is magic syntax for addressing individual array elements.
– This syntax can be a bit bewildering.
• To access an element we type:
– $array_name[element_number]
• Elements are numbered starting at zero, not one!!
Setting the values in an array
Remember ‘ls -1’? We’ll use that here…
@files = `ls -1 *.CEL`;   # BACKQUOTES here
- the output of ls is an \n separated list, so each line becomes one array element
- any delimiter is ok
Any element can be accessed as a scalar, and any function that acts upon a scalar can be applied to it ($file=$files[2];)
Indexing arrays with
negative numbers
You can index from the end of an array backwards
by using negative numbers:
@letters = ('A'..'Z');
print "last letter = $letters[-1] \n";
print "penultimate letter = $letters[-2] \n";
Getting the length of an array
• You can use the function scalar to evaluate an array in scalar context;
– the resulting value is the number of elements in the array.
@numbers = (0..100);
print scalar(@numbers);   # prints 101
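With an array and scalar(), the averaging problem from the “Limitations of scalar variables” slide no longer depends on how many numbers there are. A minimal sketch, with arbitrary example values and illustrative variable names:

```perl
use strict;
use warnings;

my @numbers = (5.4, 7.3, 4.1);   # example values; any number of elements works

my $total = 0;
foreach my $number (@numbers) {  # visit every element in turn
    $total += $number;
}

# scalar(@numbers) is the element count, so nothing is hard-coded.
my $average = $total / scalar(@numbers);
printf("average of %d numbers = %.2f\n", scalar(@numbers), $average);
```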
Functions that act on arrays
push
Adds a value (or values) to the end of an array
@numbers = (1, 2, 3);
push(@numbers, 4, 5);
print "@numbers \n"; # prints 1 2 3 4 5
Functions that act on arrays
pop
Removes a single value from the end of an array
@words = ('the', 'quick', 'brown', 'fox');
print pop(@words); # fox
print pop(@words); # brown
print pop(@words); # quick
Functions that act on arrays
shift
Removes a single value from the beginning of an array
@words = ('the', 'quick', 'brown', 'fox');
print shift(@words); # the
print shift(@words); # quick
Functions that act on arrays
unshift
Pushes a value (or values) onto the front of an array
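A short sketch of unshift in the same style as the push example; the array contents are arbitrary.

```perl
use strict;
use warnings;

my @numbers = (3, 4, 5);
unshift(@numbers, 1, 2);   # the new values keep their order at the front
print "@numbers\n";        # prints 1 2 3 4 5
```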
Functions that act on arrays
reverse
@words = ('the', 'quick', 'brown', 'fox');
print reverse(@words), "\n";
# foxbrownquickthe
Functions that act on arrays
sort
sort does what you think it does. You give it a list (or array),
and it returns a list that is sorted in some way.
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
@sorted = sort(@words);
print "sorted words = @sorted\n";
# The brown fox jumped quick
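By default sort compares elements as strings, which is why the capitalised 'The' comes first above. A sketch of how a comparison block changes the order; $a and $b are the two elements Perl hands to the block, and the example arrays are made up.

```perl
use strict;
use warnings;

my @words = ('The', 'quick', 'brown', 'fox', 'jumped');

# Compare lower-cased copies so that case does not affect the order.
my @ignore_case = sort { lc($a) cmp lc($b) } @words;
print "@ignore_case\n";    # brown fox jumped quick The

my @numbers    = (10, 9, 100, 1);
my @as_strings = sort @numbers;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric comparison
print "@as_strings\n";     # 1 10 100 9
print "@as_numbers\n";     # 1 9 10 100
```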
Functions that act on arrays
join
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
print join("+", @words), "\n";
# The+quick+brown+fox+jumped
You specify what string you want to join with as the first
argument. You can use anything.
Array summary
• An array is a variable that has multiple values
simultaneously.
• We refer to the different values using a number
called the index.
Array example
#!/usr/bin/perl
# Defining different entries of an array
# (note square brackets enclose index)
$dna[0] = "TATA";
$dna[1] = "ATG";
# Print them both
print("$dna[0]\n");
print("$dna[1]\n");
>arrayExample.pl
TATA
ATG
>
What is a hash?
Hashes are similar to arrays in many respects.
Remember, arrays are simple lists stored as a series of
elements, and each element has a number (index). The
elements are stored in numeric order. It is a bit like a
shopping list.
Arrays are limited, in that you need to know which index
position contains your value of interest. It might be nice
if we could give these index positions names of our
choice.
What is a hash?
Perl has a way to do this, it is called a hash. Perl denotes a
hash with a % (percent) sign.
If arrays are shopping lists, hashes are telephone directories.
You look up phone numbers by a person's name, not a
unique number. They look something like this
%astronomy
key    | value    | to get the value:
------------------------------------------
'word' | 'string' | $astronomy{'word'}
Making a hash
%re_lookup = (
'Eco47III'=> 'AGCGCT',
'EcoNI' => 'CCTNNNNNAGG',
'EcoRI' => 'GAATTC',
'EcoRII' => 'CCWGG',
'HincII' => 'GTYRAC',
'HindII' => 'GTYRAC',
'HindIII' => 'AAGCTT',
'HinfI' => 'GANTC' );
Accessing a hash
print "Enter restriction enzyme name\n";
$re = <STDIN>;
chomp $re;
$seq = $re_lookup{$re};
if (defined($seq))
{ print "RE sequence for $re is: $seq\n"; }
else
{ print "Sorry, I don't know about \"$re\"\n"; }
Changing values in a hash
Just like we can change individual elements in an
array by referring to them by number, we can
change values in a hash by referring to them by
their key.
$space{'moon'} = 'Titan';   # change "Luna" to "Titan"
Useful Hash Functions
The keys function takes a hash as argument and
returns a list of keys in that hash
The values function takes a hash as argument and
returns a list of values in that hash
Useful Hash Functions
KEYS
%accession_hash = (
"BACR01A01" => "AC005555",
"BACR48E02" => "AC005577",
"BACR24K17" => "AC005101", );
# get all the keys in the hash
@clones = keys %accession_hash;
print "Clone IDs: @clones\n";
# prints Clone IDs: BACR01A01 BACR48E02 BACR24K17
# (the order in which keys are returned is not guaranteed)
Useful Hash Functions
VALUES
# get all the values in the hash (hash is a lookup for accessions):
@accs = values %accession_hash;
print "GenBank Accessions: @accs\n";
# prints GenBank Accessions: AC005555 AC005577 AC005101
# (the order in which values are returned is not guaranteed)
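Because keys and values come back in no particular order, a common approach is to sort the keys before printing. A minimal sketch using the same clone-to-accession hash; the @report array is only there to collect the lines.

```perl
use strict;
use warnings;

my %accession_hash = (
    "BACR01A01" => "AC005555",
    "BACR48E02" => "AC005577",
    "BACR24K17" => "AC005101",
);

# Sort the keys so the output order is the same on every run.
my @report;
foreach my $clone (sort keys %accession_hash) {
    push(@report, "$clone => $accession_hash{$clone}");
}
print join("\n", @report), "\n";
```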
Removing elements from a
hash
To remove a key value pair from a hash, you can use the
delete function
delete $re_lookup{"EcoRI"};
If you just want to delete a value, but keep the key, you could do this:
$re_lookup{"EcoRI"} = "";   # set value to the empty string
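The difference between the two approaches can be checked with the built-in exists function, which tests whether a key is present regardless of its value. A sketch with a two-entry hash; the variable names are illustrative.

```perl
use strict;
use warnings;

my %re_lookup = ('EcoRI' => 'GAATTC', 'HindIII' => 'AAGCTT');

$re_lookup{'EcoRI'} = "";        # key kept, value is now the empty string
delete $re_lookup{'HindIII'};    # key and value both removed

my $ecori_exists   = exists $re_lookup{'EcoRI'}   ? 1 : 0;
my $hindiii_exists = exists $re_lookup{'HindIII'} ? 1 : 0;
my $remaining      = scalar(keys %re_lookup);

print "EcoRI present: $ecori_exists\n";       # 1
print "HindIII present: $hindiii_exists\n";   # 0
print "keys remaining: $remaining\n";         # 1
```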
Counting things with a hash
One of the most popular things to do with a hash is
to count the number of times something has
been seen.
Counting things with a hash
@things = qw(YOR382W YML383W YML280W);   # a list of accession numbers
%counting = ();   # initialize a hash
foreach $item (@things) {
$counting{$item}++;   # increment the value associated with the key
}
foreach $key (keys %counting) {
print "$key is found $counting{$key} times \n";
}
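Counting becomes more interesting when the list contains repeats. The sketch below uses a hypothetical list in which one accession occurs three times; the keys are sorted only to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical list in which one accession appears more than once.
my @things = qw(YOR382W YML383W YOR382W YML280W YOR382W);

my %counting;
foreach my $item (@things) {
    $counting{$item}++;          # a missing key starts at 0, then becomes 1
}

foreach my $key (sort keys %counting) {
    print "$key is found $counting{$key} times\n";
}
```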
Hashes summary
• Hashes are like arrays except instead of a
numerical index, we use keys.
• A key can be any scalar value: a string, an integer, and so on (Perl stores every key as a string).
• Until you learn to use hashes, you aren’t really
using Perl!
Hashes: example
#!/usr/bin/perl
# Defining different entries of the hash
# (note curly braces enclose key)
$wife{"Fred"} = "Hannah";
$wife{"Bill"} = "Josephine";
print($wife{"Bill"}."\n");
print($wife{"Fred"}."\n");
>testHash.pl
Josephine
Hannah
>
More stuff on variables
• We’ve used the ‘$’ to talk about individual entries for
hashes or arrays.
• But referring to the whole array, we use ‘@’.
• Referring to the whole hash, we use ‘%’.
More stuff on variables
• This becomes useful when looking at properties of
an entire array or hash
• For example, the length of an array:
#!/usr/bin/perl
$names[0] = "Bill";
$names[1] = "Fred";
$names[2] = "Bartholomew";
print(scalar(@names)."\n");   # ‘@’ means we’re referring to the whole array
>testScalar.pl
3
>
Control structures
• All our programs so far have run from start to
finish. Each line has been executed in turn.
• What if we only want to run some lines some of
the time?
• This is where control structures come in.
Control structures
• PERL has a number of control structures.
• I’ll talk about four:
– if
– while
– for & foreach
• There are others (e.g. unless)
‘if’ control structure
#!/usr/bin/perl
$name = "Bill";
if ($name eq "Bill")
{
print("The name is Bill!\n");
}
else
{
print("The name isn't Bill!\n");
}
>testIf.pl
The name is Bill!
>
‘if’ control structure
#!/usr/bin/perl
$name = "Fred";
if ($name eq "Bill")
{
print("The name is Bill!\n");
}
else
{
print("The name isn't Bill!\n");
}
>testIf.pl
The name isn’t Bill!
>
Perl has great regular expression
support
• Usually, we compare two strings of characters using an equality test:
#!/usr/bin/perl
if ($name eq "Bill")
{
print("The name is Bill!\n");
}
The real world is fuzzier…
• Maybe we want to see if the name is ‘Bill’ OR
‘bill’.
• The if statement would need to be more complex:
#!/usr/bin/perl
if (($name eq "Bill") || ($name eq "bill"))
{
print("The name is Bill!\n");
}
This is where regular expressions
come in.
• Regular expressions describe generalised patterns of strings instead of
exact strings.
• For example, the first problem was:
if (($name eq "Bill") || ($name eq "bill"))
{
print("The name is Bill!\n");
}
• But can be re-written:
if ($name =~ /[Bb]ill/)
{
print("The name is Bill!\n");
}
Another example…
• The phone number pattern problem from before (using GREP) can also easily be tackled in perl:
if ($number =~ /([0-9]{3} ){0,1}[0-9]{3} [0-9]{4}/)
{
print("The number is a valid phone number!\n");
}
• (clearly the pattern syntax is very similar… we only need to specify to perl that the syntactical expression should be a regular expression)
– We do this by prepending and appending ‘/’ (forward slashes) to the expression
First principles of regex in perl
if ($name =~ /red/)   # $name is the variable; /red/ is the regular expression
{
print("Name contains the text 'red'!\n");
}
Special characters (metachars)
(the following is a review of what we learned for grep!)
‘.’ is a wildcard and matches any character
$input = $ARGV[0];
if ($input =~ /.ed/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl red
Yes!
>testRegExp.pl head
>testRegExp.pl edward
>
(‘head’ does not contain ‘ed’; in ‘edward’ there is no character before ‘ed’ for the ‘.’ to match)
Special characters
(‘metacharacters’)
‘*’ means ‘zero or more of the previous character’.
$input = $ARGV[0];
if ($input =~ /be*d/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl red
>testRegExp.pl beeeed
Yes!
>testRegExp.pl bd
Yes!
>
Special characters
(‘metacharacters’)
‘+’ means ‘one or more of the previous character’.
$input = $ARGV[0];
if ($input =~ /be+d/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl red
>testRegExp.pl beeeed
Yes!
>testRegExp.pl bd
>
Start and end of line
‘^’ designates the start of the line, ‘$’ the end.
$input = $ARGV[0];
if ($input =~ /bed/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl bedbed
Yes!
>testRegExp.pl xxxbedxxx
Yes!
>
$input = $ARGV[0];
if ($input =~ /^bed$/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl bedbed
>testRegExp.pl xxxbedxxx
>
Grouping with parentheses
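Parentheses group characters, so that ‘+’ applies to the whole group. (The pattern is anchored with ‘^’ and ‘$’ here; without the anchors, ‘beddd’ would also match because it contains ‘bed’.)
$input = $ARGV[0];
if ($input =~ /^(bed)+$/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl bedbed
Yes!
>testRegExp.pl beddd
>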
Character classes
• The square brackets are used to denote whole groups of characters
$input = $ARGV[0];
if ($input =~ /[brf]ed/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl red
Yes!
>testRegExp.pl led
>
Character classes (cont)
• A hyphen designates a range:
$input = $ARGV[0];
if ($input =~ /[a-z]ed/)
{
print("Yes!\n");
}
>testRegExp.pl bed
Yes!
>testRegExp.pl fed
Yes!
>testRegExp.pl Bed
>
Character class shortcuts
• Some character classes are so common there are in-built shortcuts:
– [0-9] = \d
– [A-Za-z0-9_] = \w
– [\f\t\n\r ] = \s
Negating a character
• Inside square brackets, ‘^’ negates a character class. Note the context determines whether ‘^’ is negation or start-of-line!
$input = $ARGV[0];
if ($input =~ /[^b]ed/)
{
print("Yes!\n");
}
>testRegExp.pl red
Yes!
>testRegExp.pl bed
>
$input = $ARGV[0];
if ($input =~ /^bed/)
{
print("Yes!\n");
}
>testRegExp.pl red
>testRegExp.pl bed
Yes!
>
Quantifying
• Curly brackets quantify repeats more precisely than ‘*’ (0+) or ‘+’ (1+)
a{3,5} = three, four or five ‘a’s.
$input = $ARGV[0];
if ($input =~ /la{3,5}d/)
{
print("Yes!\n");
}
>testRegExp.pl laaaad
Yes!
>testRegExp.pl laaaaaaad
>
Using parentheses as memory
• Remember that parentheses group things? What they match is stored in variables $1, $2, $3…
$input = $ARGV[0];
if ($input =~ /^(.*)e(.)$/)
{
print("$1\n$2\n");
}
>testRegExp.pl fred
fr
d
>testRegExp.pl bad
>
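Captures are handy for pulling fields out of a FASTA header line. The sketch below assumes a made-up header of the form “>identifier description”; real headers vary by database, so the pattern is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical header line; real FASTA headers vary by database.
my $header = ">seq1 Homo sapiens example";

my ($id, $description);
if ($header =~ /^>(\w+)\s+(.*)$/) {
    # Copy $1 and $2 straight away: the next successful match overwrites them.
    ($id, $description) = ($1, $2);
    print "id: $id\n";                     # id: seq1
    print "description: $description\n";   # description: Homo sapiens example
}
```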
Interpolating variables
• We can place variables inside regular expressions
$input = $ARGV[0];
$name = "fred";
if ($input =~ /$name/)
{
print("Contains $name!\n");
}
>testRegExp.pl fred
Contains fred!
>testRegExp.pl bill
>
Using regular expressions to
substitute parts of strings.
• Another useful thing with regular expressions is to use them to substitute parts of a string for other parts.
• My favourite use: strip a trailing slash from a path:
$input = $ARGV[0];
$input =~ s/\/$//;
print("$input\n");
>testRegExp.pl /usr/bin/tmp/
/usr/bin/tmp
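By default s/// replaces only the first match; adding the g modifier after the final slash replaces every match. A sketch that writes a DNA string as RNA (T becomes U); the sequence is arbitrary.

```perl
use strict;
use warnings;

my $dna = "TGACTT";

my $first_only = $dna;
$first_only =~ s/T/U/;      # without g: only the first T is replaced

my $rna = $dna;
$rna =~ s/T/U/g;            # with g: every T is replaced

print "$first_only\n";      # UGACTT
print "$rna\n";             # UGACUU
```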
The ‘for’ control structure
• The ‘for’ control structure is ideal for looping
through arrays
For Loops
Consider the standard while loop in pseudocode:
initialization code
while ( Test code ) {
Code to execute in body
} continue {
Update code
}
For Loops
This can be generalized into the concise for loop:
for ( initialization code; test code;
update code ) {
body code
}
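The two forms can be run side by side to confirm they visit the same values. A minimal sketch with illustrative variable names:

```perl
use strict;
use warnings;

my @from_while;
my $i = 0;                       # initialization code
while ($i < 3) {                 # test code
    push(@from_while, $i);       # body
} continue {
    $i++;                        # update code
}

my @from_for;
for (my $j = 0; $j < 3; $j++) {  # initialization; test; update
    push(@from_for, $j);         # body
}

print "while: @from_while\n";    # while: 0 1 2
print "for:   @from_for\n";      # for:   0 1 2
```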
‘for’ example
#!/usr/bin/perl
$name[0] = "Bill";
$name[1] = "Fred";
$name[2] = "Bartholomew";
for ($nameIndex = 0; $nameIndex < scalar(@name); $nameIndex++)
{
print("$name[$nameIndex]\n");
}
>testFor.pl
Bill
Fred
Bartholomew
>
Foreach Loop has similar
application
foreach will process each element of an array or
list:
foreach $loop_variable ('item1','item2','item3') {
print $loop_variable,"\n";
}
‘foreach’ example
#!/usr/bin/perl
$name[0] = "Bill";
$name[1] = "Fred";
$name[2] = "Bartholomew";
# $currentName is assigned each value in the array @name in turn.
foreach $currentName (@name)
{
print("$currentName\n");
}
>testForeach.pl
Bill
Fred
Bartholomew
>
Opening files
• We can open other files with our PERL script.
• This is the real strength of PERL: processing text
files.
• It’s easy!
Opening files (cont.)
• To open a file, we need to assign it a ‘file handle’ – this is the unique identifier we use to refer to the file:
open(INPUTFILE, "names.txt");
– INPUTFILE is the filehandle
– "names.txt" is the name of the file we want to open and assign to the filehandle
• When we’re finished, we should close the file:
close(INPUTFILE);
While Loops
A while loop has a condition at the top. The code within the
body will execute until the condition becomes false.
while ( TEST ) {
Code to execute
} continue {
Optional code to execute at the end of each loop
}
The ‘while’ control structure
• The ‘while’ control structure keeps looping while a given condition is satisfied
#!/usr/bin/perl
while (1 == 1)
{
print("This is a really annoying infinite loop\n");
}
>whileTest.pl
This is a really annoying infinite loop
This is a really annoying infinite loop
This is a really annoying infinite loop
This is a really annoying infinite loop
This is a really annoying infinite loop
Ad nauseam…
Combining while loops with
opening files
• ‘while’ and open files go together very well:
#!/usr/bin/perl
open(INPUTFILE, "names.txt");
while ($inputLine = <INPUTFILE>)
{
chomp($inputLine);   # remove the newline that was read with the line
print("$inputLine\n");
}
close(INPUTFILE);
>whileTest.pl
Fred
Bill
Bartholomew
>
(names.txt looks like this)
Fred
Bill
Bartholomew
split
• A good use for regular expressions is to use them to define delimiting character(s).
• My favorite use: separating tab-delimited lines into an array:
while ($input = <STDIN>)
{
chomp($input);
@lineContents = split(/\t/, $input);
print($lineContents[0]."\n");
}
>testRegExp.pl < data.txt
X
Y
Z
>
(data.txt holds two tab-separated columns:)
X    1
Y    3
Z    6
Until Loops
Sometimes you want to loop until some condition becomes
true, rather than until some condition becomes false. The
until loop is easier to read than the equivalent while
(!TEST).
my $counter = 5;
until ( $counter < 0 ) {
print $counter--,"\n";
}
Executing external programs
• Another strength of PERL is that it can be used to run external programs.
• For example, say we have a C++ program that takes a PDB file and calculates inter-Cα distances, outputting them like this:
1    10    9.23    (tab separated)
– first column: one Cα
– second column: the other Cα
– third column: distance between them in angstroms
Example
• We could write a PERL script to calculate the average inter-Cα distances:
#!/usr/bin/perl
$PDBFile = "1a8l.pdb";
# These little reverse quotes tell PERL to execute the program
# and collect the results in the array ‘@results’
@results = `getDistances $PDBFile`;
$total = 0;
$count = 0;
foreach $line (@results)
{
chomp($line);
# The ‘split’ command splits the line at every tab.
($carbon1, $carbon2, $distance) = split(/\t/, $line);
$total = $total + $distance;
$count++;
}
print("Average distance: " . ($total / $count) . "\n");
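The averaging logic can be tried without the external program by substituting a few lines in the same tab-separated format. In this sketch the three result lines are invented stand-ins for the output of getDistances, and the guard against an empty result is an addition not shown on the slide.

```perl
use strict;
use warnings;

# Stand-in for the lines that `getDistances $PDBFile` would return;
# the numbers are invented for illustration.
my @results = ("1\t10\t9.23\n", "1\t11\t10.77\n", "2\t10\t7.00\n");

my $total = 0;
my $count = 0;
foreach my $line (@results) {
    chomp($line);                                   # strip the newline
    my ($carbon1, $carbon2, $distance) = split(/\t/, $line);
    $total += $distance;
    $count++;
}

# Guard against dividing by zero when the program returned nothing.
my $average = $count > 0 ? $total / $count : 0;
printf("Average distance: %.2f\n", $average);
```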
Our FASTA pattern problem
• Our problem with pattern matching across
FASTA files is the lack of cohesive sequence (it
runs across many lines)
• Furthermore, our DNA sequence download only
has one strand direction (why? Think
programmatically!)
• We need to solve that
– To do so, we need to read in the file and choose a data
structure appropriate for our needs
– Which one should we use?
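One possible answer, sketched under the assumption that a hash keyed by the header line suits the task: each record's sequence lines are joined into a single string, so a pattern that straddles a line break can be found. The two-record FASTA text and the name %sequence_for are made up for illustration, and the single-strand issue is not addressed here.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text; the TATAAA motif in seq1 is split
# across two lines, so a line-by-line search would miss it.
my $fasta = ">seq1\nGGCTAT\nAAAGGC\n>seq2\nCCCGGG\n";

my %sequence_for;   # header => full sequence as one string
my $current;        # header of the record being read

open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!";
while (my $line = <$in>) {
    chomp($line);
    if ($line =~ /^>(.*)/) {
        $current = $1;                      # start a new record
        $sequence_for{$current} = "";
    } elsif (defined $current) {
        $sequence_for{$current} .= $line;   # append to the growing sequence
    }
}
close($in);

foreach my $name (sort keys %sequence_for) {
    my $hit = $sequence_for{$name} =~ /TATAAA/ ? "yes" : "no";
    print "$name: $sequence_for{$name} (TATA box: $hit)\n";
}
```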
PERL data structures we can use
• $stringName
– scalars – strings, perl handles datatype conversions
• @arrayName
– arrays – indexed by position, starting at 0
• Function(@arrayName)
– manipulation of arrays
• $arrayName[$index]
– scalar access to a single array element
• % hashes
– index non-sequentially (aka “associative arrays”) – we’ll talk
more about these in coming lectures
Basic concept for our task
Read Command Line Arguments
Open Fasta File
While open {
– Read each line of Fasta File
– If line starts with “>”,
• print to out file
– Else,
• reverse complement the line
}
Close Fasta File
Use Control Structures
To Impose Logic
Emacs commands
(in your reading material -> copious emacs cmds)
• http://sip.clarku.edu/tutorials/intro_emacs.html
• http://www.badgertronics.com/writings/cvs/emacs.html
Emacs text editor
• Use either term or GUI
– (‘> emacs -nw’)
– (‘> emacs’)
• Able to load ASCII and binary files and show
metadata (windows conversions)
• Spell check, search, replace (see readings)
• Markup language handling for all file types,
formatting (LaTeX, etc.)
Write Seq File & Program
• date & version
• program description
• explanation of major steps
• place holder for the remaining steps
/^>/ vs. /^>?
WATCH OUT FOR TYPOS!!
Homework problem 3
• Finish writing the perl program for reverse complementing a fasta
sequence
• Use cat “file_of_fields” | awk . . .
– To reorder the first and last field on each line
– To select just the 1st and 5th fields of each line
– To select 1st and 5th field and add “human” as a field between the 1st and
5th fields
• Use cat “file_of_fields” | awk . . . | grep . . .
– To select only lines containing ‘trans_factors’
– Use redirection operator to write the output to a file called “human disease genes”
Estimated time
– perl: 15 – 90 mins
– cat, awk, grep: 5 to 15 mins
Homework Set 4
• Use STDIN instead of command line argument to read
file, make the program work using STDIN. (Hint. cat
seq.fa | revcomp.pl)
while(<STDIN>) {
.
.
.
}
(Estimated time: 15 – 60 minutes)
Homework Set #5
• Modify the output portion of the program to
make a 2nd command line argument ($ARGV[1])
provide the name of an output file for the reverse
complemented sequence.
• open (OUTPUT, ">$out_put_file_name");
• print OUTPUT "$_\n";
• close (OUTPUT);
(Estimated time: 15 – 60 mins)
Important Advice!!!
• Save your program frequently!!
• cp revcomp.pl revcomp_BKUP.pl
• Save intermediate versions
– cp revcomp.pl revcomp_STDIN.pl
– cp revcomp.pl revcomp_FILEOUT.pl
– Etc……