Bioinformatics

Lecture 7: Introduction to Perl
Introduction
• Basic concepts in Perl syntax:
– Variables, strings, input and output
– Conditionals and iteration
– File handling and error handling
– Arrays, lists and hashes
First program
• A basic strings program: hello.pl
– #!/usr/bin/perl
– print "Hello boys and girls!\nThis is an introduction to Perl\n";
• Open Notepad and type the above
• Save the file as hello.pl
• Ensure that the "Hide file extensions" option is unchecked, so the file is not saved as hello.pl.txt
• Run via the command line
Variables declarations
• $variable_name: scalars (integers, floats, strings)
• @array_name: arrays
• Arithmetic operators:
– +, -, *, /, ** (exponentiation), % (modulus)
• Double vs single quotation marks
– $x = 'I am from Cork';
– print "the value of $x is $x\n";    # double quotes: $x and \n are interpolated
– print 'the value of $x is $x\n';    # single quotes: everything is printed literally
– print "the value of \$x is $x\n";   # note the \$x: the backslash prints a literal $x
• Evaluating expressions in print (# is the comment symbol)
– $x = 15;
– print "the value of x is ", $x + 3, "\n";    (ArithmeticExample.pl)
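The quoting rules above can be checked with a short script. This is a minimal sketch rather than one of the lecture's programs; `use strict`, `use warnings` and the `my` declarations are additions that the slides do not use, and the variable names `$double`, `$single` and `$n` are chosen here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 'I am from Cork';

# Double quotes interpolate variables and escapes such as \n;
# \$x keeps the first $x literal.
my $double = "the value of \$x is $x\n";

# Single quotes keep everything literal, including $x and \n.
my $single = 'the value of $x is $x\n';

print $double;
print $single, "\n";

# Arithmetic is evaluated before print receives its list of values.
my $n = 15;
print "the value of n plus 3 is ", $n + 3, "\n";
```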
Input , output and files handling
Input
– $var = <>; reads a line of text and assigns it to $var (the trailing newline is included)
– chomp $var; removes the trailing newline from $var (the older chop removes the last character, whatever it is)
– Alternatively: chomp($var = <>);
– $line = <DATA>; reads "hardcoded" data placed after the __DATA__ marker at the end of the script
Output
– print (already covered)
File Handling
– open MYFILE, 'data.txt';      # open file for reading
– open MYFILE, '>data.txt';     # open file for writing (existing content is overwritten)
– open MYFILE, '>>data.txt';    # open file for appending
– $line = <MYFILE>;             # read one line from the file
– @entire_file = <MYFILE>;      # "slurping": read the whole file into an array
– print MYFILE "Do you like computers….", $number/3, "\n";    # write out to the file
– close MYFILE;
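The three open modes can be tried end to end on a scratch file. The sketch below is an illustration under assumptions, not the lecture's code: it uses the three-argument form of open with lexical filehandles (`$out`, `$app`, `$in`) instead of the bareword MYFILE shown above, and a temporary file from the core File::Temp module so that no real data.txt is overwritten.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; it is deleted when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

my $number = 12;

# '>' truncates the file and opens it for writing.
open(my $out, '>', $path) or die "could not open $path for writing: $!\n";
print $out "Do you like computers? ", $number / 3, "\n";
close $out;

# '>>' keeps the existing content and appends to it.
open(my $app, '>>', $path) or die "could not open $path for appending: $!\n";
print $app "second line\n";
close $app;

# '<' opens the file for reading.
open(my $in, '<', $path) or die "could not open $path for reading: $!\n";
my $first_line = <$in>;    # scalar context: one line
my @rest       = <$in>;    # list context: all remaining lines
close $in;

print "first: $first_line";
print "remaining lines: ", scalar(@rest), "\n";
```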
Conditional operators (numeric)
• ==   Equality                    $a == $b
• !=   Not equal                   $a != $b
• <    Less than                   $a < $b
• >    Greater than                $a > $b
• <=   Less than or equal to       $a <= $b
• >=   Greater than or equal to    $a >= $b
• !    Logical not                 $a = !$b
String conditional operators
• eq   Equality                    $a eq $b
• ne   Not equal                   $a ne $b
• lt   Less than                   $a lt $b
• gt   Greater than                $a gt $b
• le   Less than or equal to       $a le $b
• ge   Greater than or equal to    $a ge $b
• .    Concatenation               $a.$c
• =~   Pattern match               $a =~ /gatc/
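Numeric and string operators can give different answers for the same values, which is why Perl keeps two sets. A minimal sketch follows; the variables `$p`, `$q` and `$dna` are illustrative names (used instead of `$a` and `$b`, which Perl also uses internally for sort).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($p, $q) = ('10', '10.0');

# == compares as numbers: both are 10.
my $numeric_equal = ($p == $q) ? 1 : 0;
# eq compares as strings: '10' and '10.0' differ.
my $string_equal  = ($p eq $q) ? 1 : 0;

# 9 is numerically less than 10, but the string '9' sorts after '10'.
my $num_less = (9 < 10)      ? 1 : 0;
my $str_less = ('9' lt '10') ? 1 : 0;

my $dna      = 'ttgatcaa';
my $joined   = $dna . 'GG';               # concatenation
my $has_site = ($dna =~ /gatc/) ? 1 : 0;  # pattern match

print "numeric equal: $numeric_equal, string equal: $string_equal\n";
print "9 < 10: $num_less, '9' lt '10': $str_less\n";
print "joined: $joined, contains gatc: $has_site\n";
```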
Conditional statements
• if, elsif and else (if_else.pl)
#!/usr/bin/perl
print "Enter your age: ";
$age = <>;
chomp $age;
if ($age <= 0) {
    print "You are way too young to be using a computer.\n";
}
elsif ($age >= 100) {    # note the spelling: elsif, not elseif
    print "Not in a dog's life!\n";
}
else {
    print "Your age in dog years is ", $age/7, "\n";
}
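To try the three branches without typing at the keyboard, the same test can be wrapped in a subroutine. This is a sketch only; the subroutine name `age_message` is hypothetical and subroutines are not otherwise covered in this lecture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the message that if_else.pl would print for a given age.
sub age_message {
    my ($age) = @_;
    if ($age <= 0) {
        return "You are way too young to be using a computer.\n";
    }
    elsif ($age >= 100) {
        return "Not in a dog's life!\n";
    }
    else {
        return "Your age in dog years is " . $age / 7 . "\n";
    }
}

# One call per branch.
print age_message(0);
print age_message(100);
print age_message(21);
```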
Iteration: loops
• While-loops: repeat while the condition is true
#!/usr/bin/perl
$count = 1;
while ($count <= 5) {
    print "$count potato\n";
    $count = $count + 1;
}
• Until-loops: repeat until the condition becomes true
#!/usr/bin/perl
$count = 1;
until ($count > 5) {
    print "$count potato\n";
    $count = $count + 1;
}
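The two loops above do the same work, because until (CONDITION) behaves like while (not CONDITION). A small sketch that collects the lines in arrays instead of printing them, so the two results can be compared; the array names `@while_out` and `@until_out` are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @while_out;
my $count = 1;
while ($count <= 5) {           # runs while the test is true
    push @while_out, "$count potato";
    $count = $count + 1;
}

my @until_out;
$count = 1;
until ($count > 5) {            # runs until the test becomes true
    push @until_out, "$count potato";
    $count = $count + 1;
}

print "while: @while_out\n";
print "until: @until_out\n";
```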
Loops with defined
#!/usr/bin/perl
# loops_defined.pl
# defined() is true if $line was assigned a value (it is false at end of input)
print "Type something. 'quit' to finish\n";
while ( defined($line = <>) ) {
    chomp $line;
    last if $line eq 'quit';    # breaks out of the loop at quit
    print "You typed '$line'\n\n";
    print "Type something> ";
}
print "goodbye!\n";
Shorthand input notation
#!/usr/bin/perl
print "Type something. 'quit' to finish\n";
while (<>) {        # each line is read into the default variable $_
    chomp;          # with no argument, chomp works on $_
    last if $_ eq 'quit';
    print "You typed '$_'\n\n";
    print "Type something> ";
}
print "goodbye!\n";
Change Standard input/ output
• Redirect stdout to a file
– U:\test test.pl > stdout.txt [produces a text file]
• Output from print goes to the file and not to the screen
• Run loops_defined.pl with its output redirected to a file
• The <> input operator has one useful feature: if a file name is given on the command line it reads from that file; otherwise it reads from the keyboard
– U:\test commandline.pl stdin.txt
Finding length of file
#!/usr/bin/perl
# File_size_1.pl
$length = 0;    # set length counter to zero
$lines = 0;     # set number of lines to zero
print "Enter text one line at a time and press Ctrl-Z to quit\n";
while (<>) {    # read input one line at a time
    chomp;      # remove terminal newline
    $length = $length + length $_;
    $lines = $lines + 1;
}
print "LENGTH = $length\n";
print "LINES = $lines\n";
• Try it with the keyboard as stdin (Ctrl-Z to finish) and with a file name on the command line
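The counting logic can also be exercised on a fixed list of lines, which makes the expected totals easy to verify by hand. A sketch under assumptions: the subroutine `count_text` and the three sample lines are invented for illustration and are not part of File_size_1.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns (total characters excluding newlines, number of lines).
sub count_text {
    my @lines = @_;                 # work on a copy of the input
    my ($length, $count) = (0, 0);
    foreach my $line (@lines) {
        chomp $line;                # remove terminal newline
        $length += length $line;
        $count++;
    }
    return ($length, $count);
}

# Three lines: 7 characters, 4 characters, and an empty line.
my ($length, $lines) = count_text("GATTACA\n", "CCGG\n", "\n");
print "LENGTH = $length\n";
print "LINES = $lines\n";
```

Note that the empty line still counts as a line but adds nothing to the length.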
Dynamic Arrays
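• Declaration of an array in Perl
– @sequences = ('123a', '23ed4', '2334d');
– The array contains 3 strings
• Array operations:
– $one_seq = $sequences[2];        # a single element (arrays are zero-based, so this is the third)
– @seq = @sequences;               # copies one array into another
– @seq = (@seq, '125f');           # adds a value to the end
– @combined = (@seq, @seq2);       # joins two arrays
– @removed = splice @seq, 1, 2;    # splicing: removes 2 elements starting at index 1 and returns them
– @slice = @seq[1,2];              # slicing: copies elements 1 and 2 and leaves @seq unchanged
• Splice_slice_array.pl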
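The difference between slicing and splicing is easiest to see side by side. This is a minimal sketch with made-up contents, not the lecture's Splice_slice_array.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seq = ('123a', '23ed4', '2334d', '125f');

my $one_seq = $seq[2];              # single element: $ sigil, zero-based index

# A slice copies the chosen elements; @seq itself is not changed.
my @slice = @seq[1, 2];
my $size_after_slice = @seq;        # an array in scalar context gives its size

# splice removes 2 elements starting at index 1 and returns them.
my @removed = splice @seq, 1, 2;

print "element 2: $one_seq\n";
print "slice:     @slice\n";
print "removed:   @removed\n";
print "left over: @seq\n";
```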
Dynamic Arrays
– push @sequences, '2345d'; adds an element to the end of the array
– pop @sequences; removes and returns the last element of the array
– shift @sequences; removes and returns the first element of the array
– unshift @sequences, '2345d'; adds an element (or a list of elements) to the beginning of the array
Shift Pop push unshift example
#!/usr/bin/perl
# The 'pushpop' program - pushing, popping, shifting and unshifting.
@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
               'CTATGCGGTA', 'ATCTGACCTC' );
print "@sequences\n";
$last = pop @sequences;
print "@sequences\n";
$first = shift @sequences;
print "@sequences\n";
unshift @sequences, $last;
print "@sequences\n";
push @sequences, ( $first, $last );
print "@sequences\n";
• What is the expected output? (Run the code to confirm.)
Arrays: two more functions
• substr (extracting a substring from a string)
– $sub = substr($string, $offset, $length);    # $offset: zero-based position at which extraction begins; $length: size of the substring
• index (finding the position of a substring within a string)
– $pos = index($string, $substring);    # zero-based position of the first match, or -1 if there is none
• To obtain the reverse complement of each DNA sequence stored in an array @dna (GGGGTTTT becomes AAAACCCC), iterate through the array:
foreach $dna (@dna) {
    $dna = reverse $dna;               # reverse the contents of the scalar $dna
    $dna =~ tr/gatcGATC/ctagCTAG/;     # tr translates each character of the first set into the second (e.g. g becomes c), giving the complement
}
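The reverse-then-translate steps can be packaged and checked against the GGGGTTTT example. A sketch under assumptions: the subroutine name `reverse_complement` and the sample string GATTACA are illustrative, and `scalar` is written out to make explicit that reverse is acting on a string rather than a list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;       # reverse the string
    $rc =~ tr/gatcGATC/ctagCTAG/;       # complement each base
    return $rc;
}

my $rc = reverse_complement('GGGGTTTT');
print "GGGGTTTT -> $rc\n";

# substr and index on a sample string (positions are zero-based).
my $string = 'GATTACA';
my $sub = substr($string, 2, 3);        # 3 characters starting at position 2
my $pos = index($string, 'TAC');        # position of the first 'TAC'
my $none = index($string, 'GGG');       # -1 when there is no match
print "substr: $sub, index: $pos, missing: $none\n";
```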
Questions
• How would you read a file of DNA sequences into an array and print both the original and the reverse-complement copy of each sequence?
• What use could this program have? (Give a biology-related answer.)
Arrays and lists
• A list is an ordered sequence of constants or variables
– The values of a list can be assigned to an array:
– @clones = ('192a8', '18c10', '327h1', '201e4');
– The values in an array can be assigned to a list of variables:
– ($first, $second, $third) = @clones;
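A few consequences of list assignment are worth seeing in a runnable form. This is a minimal sketch; the names `$count`, `$head` and `@rest` are illustrative additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @clones = ('192a8', '18c10', '327h1', '201e4');

# Array to list: extra values (here the fourth) are simply dropped.
my ($first, $second, $third) = @clones;

# An array assigned to a plain scalar gives the number of elements.
my $count = @clones;

# An array on the left-hand side takes everything that is left.
my ($head, @rest) = @clones;

# List assignment also swaps two variables without a temporary.
($first, $second) = ($second, $first);

print "first: $first, second: $second, third: $third\n";
print "count: $count, head: $head, rest: @rest\n";
```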
Hashes: associative arrays
• Similar to arrays, but the elements are unordered
– Each element has two parts: the identifier (name) and a scalar value (string)
– Elements are added and referred to by strings:
%oligos = ();
$oligos{'192a8'} = 'GGGTTCCGATTTCCAA';
$oligos{'18c10'} = 'CTCTCTCTAGAGAGAGCCCC';
$oligos{'327h1'} = 'GGACCTAACCTATTGGC';
– Note that the name part is written in single quotes ' '
– Removing elements:
delete $oligos{'192a8'};
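Adding and removing entries can be followed by checking the hash size before and after. A sketch only; it uses the built-in exists function, which the slides do not mention, to test whether a key is present.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %oligos = ();
$oligos{'192a8'} = 'GGGTTCCGATTTCCAA';
$oligos{'18c10'} = 'CTCTCTCTAGAGAGAGCCCC';
$oligos{'327h1'} = 'GGACCTAACCTATTGGC';

my $before  = keys %oligos;                      # number of entries
my $had_key = exists $oligos{'192a8'} ? 1 : 0;   # is the key present?

# delete removes the entry and returns the value it held.
my $removed = delete $oligos{'192a8'};

my $after       = keys %oligos;
my $still_there = exists $oligos{'192a8'} ? 1 : 0;

print "entries before: $before, after: $after\n";
print "removed value: $removed\n";
```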
Hashes
• Outputting hash results (input_output_hash.pl)
$s = $oligos{'192a8'};
print "oligo 192a8 is $s\n";
print "oligo 192a8 is ", length $oligos{'192a8'}, " base pairs long\n";
print "oligo 18c10 is $oligos{'18c10'}\n";
• Expected output:
oligo 192a8 is GGGTTCCGATTTCCAA
oligo 192a8 is 16 base pairs long
oligo 18c10 is CTCTCTCTAGAGAGAGCCCC
Hashes
• Example of the use of a hash table
– hash_bases.pl program
• For loops and hash tables
foreach $clone ('327h1', '192a8', '18c10') {
    print "$clone: $oligos{$clone}\n";
}
– %oligos refers to the whole hash table
– $oligos{...} is used to refer to individual elements
• $size = keys %oligos; returns the number of entries
Displaying all entries in a hash table
• Using each (entries are returned in no particular order):
while ( ( $genome, $count ) = each %gene_counts ) {
    print "`$genome' has a gene count of $count\n";
}
• Using sorted keys (entries are printed in alphabetical order of key):
foreach $genome ( sort keys %gene_counts ) {
    print "`$genome' has a gene count of $gene_counts{$genome}\n";
}
• Refer to genes.pl
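A common use of a hash with sorted keys is tallying how often each base occurs in a sequence. The sketch below is an illustration only and should not be taken as the contents of hash_bases.pl or genes.pl, which are not shown in the slides; the subroutine `count_bases` and the sample sequence are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a hash mapping each base (upper-cased) to its count.
sub count_bases {
    my ($dna) = @_;
    my %count;
    foreach my $base (split //, uc $dna) {   # one character at a time
        $count{$base}++;
    }
    return %count;
}

my %bases = count_bases('GGATccTA');

# Sorting the keys gives a stable, alphabetical listing.
foreach my $base (sort keys %bases) {
    print "$base: $bases{$base}\n";
}

my $size = keys %bases;                      # number of distinct bases
print "distinct bases: $size\n";
```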
Error Handling
• die function:
open MYFILE, 'stdin.txt' or
    die "could not open file aborting…\n";
– If the file does not exist, the program terminates with the above message
• Write a program to read data from a file into an array and, when all the data has been read, output it in reverse order
• Create a hash table that performs the codon to amino acid conversion and use it to convert codons (entered from the keyboard) into their corresponding amino acids
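For the die pattern above, it often helps to include the system's reason for the failure, which Perl stores in the special variable $!. A sketch under assumptions: the subroutine `read_lines` and the file name are hypothetical, and eval is used here only so that the example can show the message without ending the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reads a whole file into a list, or dies with an explanatory message.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "could not open '$path': $!\n";   # $! holds the system error
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Hypothetical file name that is assumed not to exist.
my $missing = 'no_such_file_for_demo.txt';

# eval traps the die; the message is then available in $@.
my @lines = eval { read_lines($missing) };
my $error = $@;

print "caught: $error" if $error;
print "lines read: ", scalar(@lines), "\n";
```

Without the eval, the program would stop at the die and print the same message to standard error.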