Hash Tables lecture
Lecture 5
What is a hash table
• An array is an ordered list of “elements” where each “element” is
referred to by a numeric subscript (index), counting from 0:
• Consider an array: @array
• You refer to each element by:
– $array[$index], e.g. $array[2]
• A hash table, or associative array, however, is more
like an index, e.g. in a book, where each
entry has a “value” and a corresponding “identifier”
[key].
• Unlike array subscripts, which are integers, the identifiers
in a hash table are “strings”.
• Moreover, the identifiers are stored in no guaranteed order
[apply the sort function to the keys if an order is required].
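The contrast between the two can be sketched as follows. This is a minimal illustration, not taken from the lecture scripts; the variable names @bases, %base_name, $third and $name are assumptions made for the example.

```perl
use strict;
use warnings;

# An array: elements are reached by integer position (subscript).
my @bases = ('A', 'T', 'G', 'C');
my $third = $bases[2];          # subscripts count from 0, so this is 'G'

# A hash: values are reached by a string key instead of a position.
my %base_name = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Guanine',
    C => 'Cytosine',
);
my $name = $base_name{'G'};     # look up by key, not by position

print "Array element 2 is $third\n";
print "Hash entry G is $name\n";

# keys returns the identifiers in no guaranteed order;
# sort them when a predictable order matters.
my @sorted_keys = sort keys %base_name;
print "Sorted keys: @sorted_keys\n";
```

Note that the array uses square brackets and the hash uses curly braces; that is how Perl tells the two kinds of lookup apart.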
Declare and define Hash Table
• Declare an empty hash table:
– %nucleotides = ();
• To declare and give values you do the following:
– %nucleotides =
    ( A => 'Adenine',
      T => 'Thymine',
      G => 'Guanine',
      C => 'Cytosine' );
– A is the identifier (key) and Adenine is the value [entry]
Add / delete entries
– To add an “entry” you need to specify the identifier
and the value.
• %oligos = ();
• $oligos{'192a8'} = 'GGGTTCCGATTTCCAA';
• $oligos{'18c10'} = 'CTCTCTCTAGAGAGAGCCCC';
• $oligos{'327h1'} = 'GGACCTAACCTATTGGC';
– Note: quote the identifier, e.g. '192a8'. The quotes may be left out only
when the key is a simple identifier (starting with a letter or underscore).
– Removing elements:
• delete $oligos{'192a8'};
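Adding, overwriting and deleting entries can be sketched as below. The oligo names and sequences are the ones from the slide; the replacement value 'ACGT' and the variable names $removed and $remaining are placeholders introduced only for this illustration.

```perl
use strict;
use warnings;

my %oligos = ();

# Add entries: key in braces on the left, value on the right.
$oligos{'192a8'} = 'GGGTTCCGATTTCCAA';
$oligos{'18c10'} = 'CTCTCTCTAGAGAGAGCCCC';
$oligos{'327h1'} = 'GGACCTAACCTATTGGC';

# Assigning to a key that already exists replaces its value.
# 'ACGT' is a made-up placeholder sequence.
$oligos{'327h1'} = 'ACGT';

# delete removes the entry and returns the value that was stored.
my $removed = delete $oligos{'192a8'};

my $remaining = keys %oligos;   # number of entries left
print "removed: $removed\n";
print "entries left: $remaining\n";
```

Because delete hands back the removed value, it can be used to move an entry elsewhere in a single step.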
Output and Assign “entry” values
• hash results
– Print the value of the entry:
• print "oligo 18c10 is $oligos{'18c10'}\n";
– Assign the entry to a variable
• $s = $oligos{'192a8'};
• print "oligo 192a8 is $s\n";
– Determining the size (number of entries in the
table)
• $size = keys %oligos;
• print "The number of entries is: $size\n";
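One detail worth seeing in code is that keys behaves differently depending on context. The sketch below assumes the three oligos from the earlier slide; the names @ids and $line are introduced only for the example.

```perl
use strict;
use warnings;

my %oligos = (
    '192a8' => 'GGGTTCCGATTTCCAA',
    '18c10' => 'CTCTCTCTAGAGAGAGCCCC',
    '327h1' => 'GGACCTAACCTATTGGC',
);

# Copy one entry's value into an ordinary scalar variable.
my $s = $oligos{'192a8'};
print "oligo 192a8 is $s\n";

# Assigned to a scalar, keys gives the number of entries.
my $size = keys %oligos;

# Assigned to an array, keys gives the identifiers themselves.
my @ids = keys %oligos;

# print supplies list context, so force a count with scalar().
my $line = "The number of entries is: " . scalar(keys %oligos);
print "$line\n";
```

Writing print keys %oligos without scalar() would print the identifiers run together rather than the count.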
Displaying all entries in a hash table
• Determine the length of the “value” part of an entry: can prove useful in
bioinformatics
– print "oligo 192a8 is ", length($oligos{'192a8'}), " base pairs long\n";
• Input_output_hashtable.pl
• Sort and print all the entries (identifier and value) in the hash table
– foreach $genome ( sort keys %gene_counts )
  {
      print "`$genome' has a gene count of $gene_counts{$genome}\n";
  }
• Printing all entries can be done via a while loop [no sorting]
– while ( ( my $genome, my $count ) = each %gene_counts )
  {
      print "`$genome' has a gene count of $count\n";
  }
• Refer to genes.pl
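The two ideas on this slide, taking the length of a value and walking the keys in sorted order, can be combined. The sketch below is not genes.pl or Input_output_hashtable.pl; it reuses the %oligos entries from the earlier slide, and the @report array is a name made up for the example.

```perl
use strict;
use warnings;

my %oligos = (
    '192a8' => 'GGGTTCCGATTTCCAA',
    '18c10' => 'CTCTCTCTAGAGAGAGCCCC',
    '327h1' => 'GGACCTAACCTATTGGC',
);

# Visit the identifiers in sorted (string) order and
# record the length of each stored sequence.
my @report;
foreach my $id ( sort keys %oligos ) {
    my $len = length $oligos{$id};
    push @report, "$id: $len bases";
}

print "$_\n" for @report;
```

The sort here is a plain string sort, so '18c10' comes before '192a8' because the character '8' sorts before '9'.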
Hashes:
• A hash table can often be used like a
reference index, e.g. a “code of life” translation
table:
– hash_base.pl shows what each nucleotide base
letter stands for.
– Moreover, hash tables could be used, as in the
exercise, to create a DNA codon conversion table
so that when a codon is encountered as input it
is converted to an amino acid, e.g. its name, letter …
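A lookup table of this kind might be sketched as below. This is not hash_base.pl and not a full solution to the exercise: the table holds only a handful of codons from the standard genetic code, '*' is used here as an assumed marker for a stop codon, '?' for a codon missing from the partial table, and the input string in $dna is a made-up example.

```perl
use strict;
use warnings;

# Partial codon table (standard genetic code, DNA letters).
# '*' marks a stop codon in this sketch.
my %codon_to_aa = (
    ATG => 'M',   # methionine
    TTT => 'F',   # phenylalanine
    GGC => 'G',   # glycine
    AAA => 'K',   # lysine
    TGG => 'W',   # tryptophan
    TAA => '*',
    TAG => '*',
    TGA => '*',
);

my $dna     = 'ATGTTTGGCAAATGGTAA';   # made-up example input
my $protein = '';

# Step through the sequence three bases at a time.
for ( my $i = 0; $i + 3 <= length($dna); $i += 3 ) {
    my $codon = substr( $dna, $i, 3 );
    # Codons missing from this partial table are shown as '?'.
    my $aa = exists $codon_to_aa{$codon} ? $codon_to_aa{$codon} : '?';
    last if $aa eq '*';               # stop codon ends translation
    $protein .= $aa;
}

print "protein: $protein\n";
```

The hash does all the translation work; the loop only cuts the input into codons and looks each one up.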
Error prevention: exists
• It is important to try to prevent errors occurring, such as looking up
a key that is not in the table.
• If this happens you need to deal with it effectively by using the
exists function.
• The function returns true if the entry is there or false if it is
not.
– if (exists $nucleotide_bases{$_})
  {
      print "exists\n";
  }
  else
  {
      print "does not exist\n";
  }
– hash_base_ErrorPrevention.pl
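A slightly fuller version of the check might look like the sketch below. It is not hash_base_ErrorPrevention.pl; the list of test letters, the @results array and the use of uc to accept lower-case input are assumptions added for illustration.

```perl
use strict;
use warnings;

my %nucleotide_bases = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Guanine',
    C => 'Cytosine',
);

my @results;
foreach my $letter ( 'A', 'X', 'g' ) {
    # Optional design choice: upper-case the input so 'g' matches 'G'.
    my $key = uc $letter;

    if ( exists $nucleotide_bases{$key} ) {
        push @results, "$letter: $nucleotide_bases{$key}";
    }
    else {
        # Handle the missing key instead of printing an undefined value.
        push @results, "$letter: not a known base";
    }
}

print "$_\n" for @results;
```

Without the exists test, looking up 'X' would return undef, and printing it under use warnings would produce an "uninitialized value" warning.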
Sample question: translator
• Create a hash table for converting DNA codons
(3 bases) into amino acids
• Display all the entries to the user
• Continually ask the user to enter three bases
and display the corresponding Amino Acid (AA);
e.g. aug → met … Repeat the process until a
“stop” codon is entered. Display the AA
sequence, as single characters, that was
[entered as bases by the user]; note: you must
use another hash table for this part.