Perl Programming for Biologists

A bold experiment into the unknown…
PART 1: Tue Aug 21st 2007
update 8/22/2007
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
Class Requirements

You must:
- be registered for this workshop
- have a PC (sort of)
- have a power supply
- have wireless access
- have the admin password to your machine
Please put your cell phone/pager on vibrate
- No cell calls in class, please
To Dos



Close all programs other than IE on your laptop
Log into virtual room
YP: log into Safari
To Do - 2

Please download all class materials from
http://lane.stanford.edu/howto/index.html?id=_2593
Class Focus



- Creating, writing and reading Excel files
- Reformatting data files for input to an analysis program
- Writing to and reading from a database such as MS Access or another locally installed relational database, as well as databases available on the Internet
And remember: Ask LOTS OF QUESTIONS
Cautions

All examples pertain to MS Office 2003
- Unclear what is to be expected for MS Office 2007
All contents pertain to Perl 5.x, not 6.x
- V.5 and 6 are NOT compatible
- V.5 is far, far more common, so not much of an issue
So Why Perl?



Perl = Practical Extraction and Report Language
Free
Very widely used
- Especially in biological community
Very flexible and portable
Not the only language of this type
- E.g., Python
Not the absolute easiest
- … but pretty easy
Not suited for everything
- E.g., for ultra-fast mathematically-oriented code, C is still best
Today’s session:
- Installing and understanding what is
required to run Perl
- Understanding the basics of a Perl
program
Part 1: Installation
Components to Install & Configure
1. Perl itself
   - More accurately, the Perl interpreter
   - We’ll use ActiveState Perl 5.8x (ActivePerl)
   - www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a91508d58c2648ca
2. Additional Perl modules
   - Module = extra functions not part of the interpreter
   - Described at Comprehensive Perl Archive Network (CPAN)
3. Open Perl IDE
   - IDE = integrated development environment:
     - Editor → to write/edit your program
     - Debugger → to find bugs
     - A compiler/interpreter → to run your program from within the IDE
   - sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
4. Configuring the ODBC manager (next week)
   - Part of Windows
   - Allows different programs to interact with databases on your machine or anywhere on the Web via a single “doorway”
What is an Interpreter?

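= A program that translates an instruction into the computer’s language and executes it before proceeding to the next instruction
- I.e., translated and executed one instruction at a time
Perl is usually used in interpreted mode
- Strictly speaking, perl first checks and compiles the whole script into an internal form each time you run it, then executes that form
- Can also be compiled once (= faster)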
Installing Perl from ActiveState
1. Go to www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a91508d58c2648ca
2. Select Windows MSI package for Perl 5.8x
3. Run the installer
   - Install under c:\Perl
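
Once the installer has finished, a quick way to confirm that the interpreter works is to run a tiny script. The sketch below is not part of the class files; it only relies on two built-in Perl variables, `$]` (the interpreter version) and `$^X` (the path of the perl executable in use).

```perl
use strict;
use warnings;

# $] holds the interpreter version as a number (e.g. 5.008008 for Perl 5.8.8).
my $version = $];

# $^X is the path of the perl executable that is running this script.
my $interpreter = $^X;

print "Perl version: $version\n";
print "Interpreter:  $interpreter\n";

# The class material targets Perl 5.x, so flag anything else.
my $is_perl5 = ($version >= 5 && $version < 6) ? 1 : 0;
print $is_perl5 ? "OK: this is Perl 5.x\n" : "Warning: this is not Perl 5.x\n";
```

If the installation went under c:\Perl as suggested, the interpreter path printed should point somewhere inside that folder.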
Installing Additional Perl Modules
The fountain of all things Perl: CPAN
- = Comprehensive Perl Archive Network
- http://www.cpan.org/
What does a module look like?
Why modules?
PPM for downloading & installing modules
What modules are in MY Perl?
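
One way to answer "What modules are in MY Perl?" for a handful of specific modules is to try loading each one and report whether that worked. This is a minimal sketch, not the method used in class; the helper name `module_installed` is made up for illustration, and the module names are taken from the class list.

```perl
use strict;
use warnings;

# Modules to look for; edit this list as needed.
my @wanted = ('File::Copy', 'IO::File', 'Spreadsheet::WriteExcel', 'DBI');

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub module_installed {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;    # File::Copy becomes File/Copy.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (@wanted) {
    my $status = module_installed($module) ? 'installed' : 'MISSING';
    printf "%-28s %s\n", $module, $status;
}
```

Any module reported as MISSING is a candidate for installation through PPM.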
Perl Modules We’ll Be Using

When to install   Name                      Function
8/21/07           File::Copy                manipulating files
8/21/07           File::Find                manipulating files
8/21/07           File::Path                manipulating files
8/21/07           IO::File                  accessing the insides of files
8/21/07           Spreadsheet::WriteExcel   writing into an MS Excel spreadsheet
8/21/07           Spreadsheet::ParseExcel   parsing an MS Excel spreadsheet
8/21/07           Spreadsheet::BasicRead    reading the contents of an MS Excel spreadsheet
8/21/07           Win32::OLE                provides easy access to Windows (e.g., launching Excel)
you do it         DBI                       provides access to relational databases
you do it         DBD::ODBC                 provides access to relational databases
                  URI                       accessing URLs
you do it         LWP::Simple               interacting with a Web site via http
you do it         Array::Unique             returns unique elements of an array
you do it         List::Unique              returns unique elements of a list
you do it         Data::Dumper              dumping data out of a data structure
you do it         Switch                    switch function ("multiple if-else-then")
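
To give a feel for how these modules are used, here is a small sketch built on three of the file-handling modules from the list (File::Path, File::Copy, IO::File). It is not one of the class scripts: the file names and gene symbols are placeholders, and File::Temp is pulled in only to get a scratch folder that cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);    # not on the class list; used only for a scratch folder
use File::Path qw(mkpath);
use File::Copy qw(copy);
use IO::File;

# Work inside a temporary folder that is deleted when the script ends.
my $dir = tempdir(CLEANUP => 1);
mkpath("$dir/data");           # File::Path: create the folder (and any parents)

# IO::File (object style): open a file for writing and add two lines.
my $out = IO::File->new("$dir/data/genes.txt", 'w') or die "Cannot write: $!";
$out->print("BRCA1\nTP53\n");
$out->close;

# File::Copy (procedural style): make a backup copy.
copy("$dir/data/genes.txt", "$dir/data/genes_backup.txt") or die "Copy failed: $!";

# IO::File again: read the copy back, one array element per line.
my $in = IO::File->new("$dir/data/genes_backup.txt", 'r') or die "Cannot read: $!";
my @lines = $in->getlines;
$in->close;
chomp @lines;

print "Read back: @lines\n";
```

Note the two calling styles side by side: `copy(...)` is an ordinary function call, while `$out->print(...)` calls a method on an object.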
Why an IDE?

IDE = integrated development environment:
- Editor → to write/edit your program
- Debugger → to find bugs
- A “runner” (compiler/interpreter) → to run your program from within the IDE
IDEs provide tools that make writing & debugging easier
- E.g., automatic code highlighting
We’ll use Open Perl IDE
- Free, open source, portable
- sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
IDE: Definition, description
Installing Open Perl IDE
1. Go to sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440 and download the code
2. Create folder Program Files/OpenPerlIDE
3. Unzip into Program Files/OpenPerlIDE
4. Update Path (under System Properties, Advanced, Environment Variables, System Variables)
→ this makes it possible to run Open Perl IDE from anywhere on your machine…
BREAK
Part 2: What does it all do?
Example Short Program
1. Start Open Perl IDE
2. Load Simple1.pl
3. Run Simple1.pl
Learning by Looking

Simple2.pl
Exploring Perl’s Major Language Elements

http://en.wikipedia.org/wiki/Perl#Data_types
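
As a companion to the data types page linked above, here is a short sketch of Perl's three basic variable types: scalars (`$`), arrays (`@`) and hashes (`%`). The variable names and the toy sequence are invented for illustration and do not come from Simple1.pl or Simple2.pl.

```perl
use strict;
use warnings;

# Scalar: holds a single value (a string or a number).
my $seq = 'ATGC';

# Array: an ordered list of values, indexed from 0.
my @bases = ('A', 'C', 'G', 'T');

# Hash: a lookup table of key => value pairs.
my %complement = (A => 'T', C => 'G', G => 'C', T => 'A');

print "First base in the list: $bases[0]\n";
print "Number of bases: ", scalar(@bases), "\n";
print "Complement of A: $complement{A}\n";

# Putting them together: reverse-complement the sequence.
# split -> list of letters, reverse -> flip the order, map -> look each one up.
my $revcomp = join '', map { $complement{$_} } reverse split //, $seq;
print "$seq -> $revcomp\n";
```

The sigil changes with what you are asking for: `@bases` is the whole array, but `$bases[0]` is a single element, so it takes the scalar sigil.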
Going Further: Programming Tips

Plan your program
- Write down how you intend to process the data in more-or-less plain language
- Goal: making sure that it really does make sense
- Hacking doesn’t really pay…
Have documentation handy
- eBooks
- ActivePerl documentation (searchable)
- Perl language reference
- Lane FAQ
→ eBooks: help served on a silver platter
When you’re stuck: Search the Web
- Google can answer almost any programming question
- … though quality documentation is still best
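
The "plan your program" tip can be practiced directly in the script: write the plain-language plan as comments first, then fill in the code under it. The sketch below does this for a made-up reformatting task (tab-delimited lines to comma-separated lines); the input lines are placeholders, not real data.

```perl
use strict;
use warnings;

# Plan, in plain language:
#   1. Take each line of tab-delimited input
#   2. Skip blank lines and comment lines that start with '#'
#   3. Split the line into fields at each tab
#   4. Join the fields with commas and keep the result

# Placeholder input; a real script would read these lines from a file.
my @input = (
    "# id\tsymbol",
    "1\tgeneA",
    "",
    "2\tgeneB",
);

my @output;
for my $line (@input) {
    next if $line =~ /^\s*$/;              # step 2: blank line
    next if $line =~ /^#/;                 # step 2: comment line
    my @fields = split /\t/, $line;        # step 3
    push @output, join(',', @fields);      # step 4
}

print "$_\n" for @output;
```

If a step in the plan is hard to write down in plain language, that is usually a sign the step is not yet understood well enough to code.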
Excel3.pl: Introducing Object Programming

Purpose: From an Excel worksheet that lists public identifiers for DNA sequences associated with genes, the program retrieves:
- UniGene cluster ID
- Gene symbol
- NCBI Gene ID
… and writes the result into another Excel worksheet
Mix of procedural and object programming
Relevant links:
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
- Entrez Utilities
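
To make the Entrez Utilities step more concrete, the sketch below shows how ESearch and ESummary request URLs might be put together, and how record IDs could be pulled out of an ESearch reply. It is not taken from Excel3.pl and makes no network calls. Assumptions: the helper names are invented, the accession `XX000000` and the sample XML reply are made-up placeholders, and the database names `unigene` and `gene` reflect the 2007 class material (what NCBI offers may have changed since).

```perl
use strict;
use warnings;

# Base URL of NCBI's Entrez Utilities.
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: build an ESearch URL for a database and a search term.
sub esearch_url {
    my ($db, $term) = @_;
    # Percent-encode every character that is not a letter, digit, '-', '_', '.' or '~'.
    (my $escaped = $term) =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return "$base/esearch.fcgi?db=$db&term=$escaped";
}

# Hypothetical helper: build an ESummary URL for one record ID.
sub esummary_url {
    my ($db, $id) = @_;
    return "$base/esummary.fcgi?db=$db&id=$id";
}

# Hypothetical helper: return all <Id> values found in an ESearch XML reply.
sub extract_ids {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>}g;
}

# Placeholder inputs, for illustration only.
my $accession    = 'XX000000';
my $sample_reply = '<eSearchResult><IdList><Id>12345</Id></IdList></eSearchResult>';

print esearch_url('unigene', $accession), "\n";
for my $id (extract_ids($sample_reply)) {
    print esummary_url('unigene', $id), "\n";
}
```

In a full script, a module such as LWP::Simple from the class list would fetch each URL; a regular expression is used here only to keep the sketch short, and a proper XML parser would be the sturdier choice for real replies.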
What Excel3.pl Does
[Flow diagram; the steps below are reconstructed from its labels]
1. Sequence identifier → Search UniGene for cluster ID (UniGene ESearch) → Result ID
2. Result ID → Retrieve UniGene description for that cluster (UniGene ESummary) → Cluster ID, gene symbols & descriptions → write to Excel report
3. Cluster ID → Search Gene (Gene ESearch) → Result ID
4. Result ID → Retrieve Gene description for that gene (Gene ESummary) → write to Excel report
Toying with Excel3.pl
Some Key Books/Resources




Perl Programming for Biologists
Perl Cookbook
Perl Quick Reference Guide
My favorite: Perl Quick Reference
Assignments


Install remainder of Perl modules from list
Look at code for Example3.pl
- Modify it, break it
- Write down at least one question → so we can talk about it next week
eBooks Rule
What Does A Module Look Like?
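
In its simplest form, a module is a named package containing subroutines that other scripts can call. The sketch below is a made-up example rather than a real CPAN module: the package name `SeqTools` and the function `gc_content` are hypothetical. A real module would normally live in its own file (here, SeqTools.pm, ending with the line `1;`); it is written inline so the example runs as a single script.

```perl
use strict;
use warnings;

# --- What would normally be the contents of SeqTools.pm ---
{
    package SeqTools;    # hypothetical module name

    # Fraction of G and C bases in a DNA sequence (0 for an empty sequence).
    sub gc_content {
        my ($seq) = @_;
        return 0 unless length($seq);
        my $gc = ($seq =~ tr/GCgc//);    # tr with an empty replacement only counts matches
        return $gc / length($seq);
    }
}

# --- The script that uses the module ---
# With SeqTools.pm saved as a separate file, this part would start with: use SeqTools;
my $fraction = SeqTools::gc_content('ATGCGC');
printf "GC content: %.2f\n", $fraction;
```

The modules on the class list follow the same idea on a larger scale: someone else wrote and tested the subroutines, and your script simply loads the package and calls them.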