Perl Programming for Biologists
A bold experiment into the unknown…
PART 1: Tue Aug 21st 2007
update 8/22/2007
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
Class Requirements
You must
be registered for this workshop
have a PC (sort of)
have a power supply
have wireless access
have the admin password to your machine
Please put your cell phone/pager on vibrate
No cell calls in class, please
To Dos
Close all programs other than IE on your laptop
Log into virtual room
YP: log into Safari
To Do - 2
Please download all class materials from
http://lane.stanford.edu/howto/index.html?id=_2593
Class Focus
Creating, writing and reading Excel files
Reformatting data files for input to an analysis program
Writing to and reading from a database such as MS Access or another locally installed relational database, as well as from databases available on the Internet
And remember: Ask LOTS OF QUESTIONS
Cautions
All examples pertain to MS Office 2003
Unclear what to expect with MS Office 2007
All contents pertain to Perl 5.x, not 6.x
V.5 and V.6 are NOT compatible
V.5 is far more common, so this is not much of an issue
So Why Perl?
Perl = Practical Extraction and Report Language
Free
Very widely used
Especially in the biological community
Very flexible and portable
Not the only language of this type
E.g., Python
Not the absolute easiest
… but pretty easy
Not suited for everything
E.g., for ultra-fast, mathematically oriented code, C is still best
Today’s session:
- Installing and understanding what is required to run Perl
- Understanding the basics of a Perl program
Part 1: Installation
Components to Install & Configure
1. Perl itself
  More accurately, the Perl interpreter
  We’ll use ActiveState Perl 5.8x (ActivePerl)
  www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a91508d58c2648ca
2. Additional Perl modules
  Module = extra functions not part of the interpreter
  Described at Comprehensive Perl Archive Network (CPAN)
3. Open Perl IDE
  IDE = integrated development environment:
  Editor to write/edit your program
  Debugger to find bugs
  A compiler/interpreter to run your program from within the IDE
  sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
4. Configuring the ODBC manager (next week)
  Part of Windows
  Allows different programs to interact with databases on your machine or anywhere on the Web via a single “doorway”
What is an Interpreter?
= A program that translates an instruction into the computer’s language and executes it before proceeding to the next instruction
= compiled and executed one instruction at a time
Perl is usually used in interpreted mode
Can also be compiled once (= faster)
Installing Perl from ActiveState
1. Go to www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a91508d58c2648ca
2. Select Windows MSI package for Perl 5.8x
3. Run the installer
  Install under c:\Perl
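Once the installer has finished, a quick way to confirm that the interpreter works is to run a tiny script (or type perl -v at a command prompt). The sketch below is only an illustration and is not part of the class materials; the helper name is_perl5 is made up for this example.

```perl
use strict;
use warnings;

# $] holds the interpreter version as a number, e.g. 5.008008 for Perl 5.8.8.
# is_perl5 is a hypothetical helper written for this sketch.
sub is_perl5 {
    my ($version) = @_;
    return ($version >= 5 && $version < 6) ? 1 : 0;
}

print "Running Perl $] on $^O\n";    # $^O is the operating system name
print is_perl5($])
    ? "This is a Perl 5 interpreter.\n"
    : "Unexpected version: the class materials assume Perl 5.\n";
```

If the script prints a version line, the interpreter is installed and reachable.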
Installing Additional Perl Modules
The fountain of all things Perl: CPAN
= Comprehensive Perl Archive Network
http://www.cpan.org/
What does a module look like?
Why modules?
PPM for downloading & installing modules
What modules are in MY Perl?
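One rough way to answer "What modules are in MY Perl?" is to walk every folder that Perl searches for modules (the @INC list) and collect the .pm files found there. This is a sketch under that assumption, not the method shown in class; the function name installed_modules is invented here, and nested library folders can produce a few oddly prefixed names.

```perl
use strict;
use warnings;
use File::Find;

# Collect module names by walking every directory Perl searches (@INC).
# installed_modules is a hypothetical helper written for this sketch.
sub installed_modules {
    my %seen;
    for my $dir (grep { -d $_ && $_ ne '.' } @INC) {
        find(
            {
                no_chdir => 1,
                wanted   => sub {
                    return unless /\.pm$/;
                    my $name = $File::Find::name;
                    $name =~ s/^\Q$dir\E[\\\/]?//;    # strip the @INC prefix
                    $name =~ s/\.pm$//;               # drop the extension
                    $name =~ s{[\\/]}{::}g;           # path separators -> ::
                    $seen{$name} = 1;
                },
            },
            $dir
        );
    }
    return sort keys %seen;
}

my @modules = installed_modules();
print scalar(@modules), " modules found\n";
print "$_\n" for grep { /^File::/ } @modules;    # show only the File:: family
```

The last line filters the list to one family of modules; change the pattern to look for others.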
Perl Modules We’ll Be Using
When to install   Name                      Function
8/21/07           File::Copy                manipulating files
8/21/07           File::Find                manipulating files
8/21/07           File::Path                manipulating files
8/21/07           IO::File                  accessing the insides of files
8/21/07           Spreadsheet::WriteExcel   writing into an MS Excel spreadsheet
8/21/07           Spreadsheet::ParseExcel   parsing an MS Excel spreadsheet
8/21/07           Spreadsheet::BasicRead    reading the contents of an MS Excel spreadsheet
8/21/07           Win32::OLE                provides easy access to Windows (e.g., launching Excel)
you do it         DBI                       provides access to relational databases
you do it         DBD::ODBC                 provides access to relational databases
                  URI                       accessing URLs
you do it         LWP::Simple               interacting with a Web site via http
you do it         Array::Unique             returns unique elements of an array
you do it         List::Unique              returns unique elements of a list
you do it         Data::Dumper              dumping data out of a data structure
you do it         Switch                    switch function ("multiple if-else-then")
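To see which of the modules in this list are already present, one option is to try loading each one and report the result. The sketch below assumes nothing beyond the module names in the table; is_installed is a made-up helper name, and a module reported as MISSING may simply not be available on your operating system (e.g., Win32::OLE outside Windows).

```perl
use strict;
use warnings;

# Module names copied from the table above.
my @wanted = qw(
    File::Copy File::Find File::Path IO::File
    Spreadsheet::WriteExcel Spreadsheet::ParseExcel Spreadsheet::BasicRead
    Win32::OLE DBI DBD::ODBC URI LWP::Simple
    Array::Unique List::Unique Data::Dumper Switch
);

# Return 1 if the named module can be loaded by this Perl, 0 otherwise.
# is_installed is a hypothetical helper written for this sketch.
sub is_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # File::Copy -> File/Copy.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (@wanted) {
    printf "%-26s %s\n", $module,
        is_installed($module) ? 'installed' : 'MISSING';
}
```

Anything reported as MISSING is a candidate for installation with PPM.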
Why an IDE?
IDE = integrated development environment
IDEs provide tools that make writing & debugging easier
E.g., automatic code highlighting
Editor to write/edit your program
Debugger to find bugs
A “runner” (compiler/interpreter) to run your program from within the IDE
We’ll use Open Perl IDE
Free, open source, portable
sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
IDE: Definition, description
Installing Open Perl IDE
1. Go to sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440 and download the code
2. Create folder Program Files/OpenPerlIDE
3. Unzip into Program Files/OpenPerlIDE
4. Update Path (under System Properties, Advanced, Environment Variables, System Variables)
→ this makes it possible to run Open Perl IDE from anywhere on your machine…
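After updating the Path, it can be checked from Perl itself, since the Path is available to every program as the PATH environment variable. This is a minimal sketch; the drive letter in the folder name is an assumption (the steps above only say Program Files/OpenPerlIDE), and path_entries and on_path are helper names invented for this example.

```perl
use strict;
use warnings;

# Split a PATH string into its folders. Windows separates entries with ';',
# most other systems with ':'.
sub path_entries {
    my ($path, $separator) = @_;
    return grep { length } split /\Q$separator\E/, $path;
}

# Return how many PATH entries match $folder (case-insensitive, as on Windows).
sub on_path {
    my ($folder, $path, $separator) = @_;
    my $count = grep { lc($_) eq lc($folder) } path_entries($path, $separator);
    return $count;
}

my $separator = $^O eq 'MSWin32' ? ';' : ':';
my $folder    = 'C:\Program Files\OpenPerlIDE';    # assumed install folder
print on_path($folder, $ENV{PATH} || '', $separator)
    ? "$folder is on the Path\n"
    : "$folder is NOT on the Path\n";
```

Open a new command prompt before testing, because already-open windows keep the old Path.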
BREAK
Part 2: What does it all do?
Example Short Program
1. Start Open Perl IDE
2. Load Simple1.pl
3. Run Simple1.pl
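For readers without the class files, here is a short stand-in program of about the same size. It is not Simple1.pl (whose contents are not reproduced in these slides), just a sketch showing the usual parts of a small Perl program: pragmas, a subroutine, a variable, and output.

```perl
#!/usr/bin/perl
use strict;      # require variables to be declared
use warnings;    # report common mistakes

# Return the reverse complement of a DNA sequence.
sub reverse_complement {
    my ($dna) = @_;
    my $revcomp = reverse $dna;           # reverse the string
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its complement
    return $revcomp;
}

my $sequence = 'ATGCGT';
print "Sequence:           $sequence\n";
print "Reverse complement: ", reverse_complement($sequence), "\n";
```

Try changing the value of $sequence and running it again from the IDE.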
Learning by Looking
Simple2.pl
Exploring Perl’s Major Language Elements
http://en.wikipedia.org/wiki/Perl#Data_types
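The three data types used most often are scalars ($), arrays (@) and hashes (%). The sketch below illustrates each with made-up biological examples; the variable names and the translate subroutine are invented for this illustration and do not come from the class files.

```perl
use strict;
use warnings;

# Scalar: a single value (number or string).
my $gene = 'TP53';

# Array: an ordered list of scalars, indexed from 0.
my @bases = qw(A C G T);

# Hash: unordered key/value pairs, looked up by key.
my %codon_to_aa = (ATG => 'Met', TGG => 'Trp', TAA => 'Stop');

# Translate a list of codons with the hash; unknown codons become '?'.
sub translate {
    my (@codons) = @_;
    return map { exists $codon_to_aa{$_} ? $codon_to_aa{$_} : '?' } @codons;
}

print "Gene: $gene\n";
print "Number of bases: ", scalar(@bases), "\n";
print "First base: $bases[0]\n";
print join('-', translate(qw(ATG TGG CCC TAA))), "\n";
```

Note how the leading symbol changes to $ when a single element is taken out of an array or hash.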
Going Further: Programming Tips
Plan your program
Write down how you intend to process the data in more-or-less plain language
Goal: making sure that it really does make sense
Hacking doesn’t really pay…
Have documentation handy
eBooks
ActivePerl documentation (searchable)
Perl language reference
→ eBooks: help served on a silver platter
Lane FAQ
When you’re stuck: Search the Web
Google can answer almost any programming question
… though quality documentation is still best
Excel3.pl: Introducing Object Programming
Purpose: From an Excel worksheet that lists public identifiers for DNA sequences associated with genes, the program retrieves:
UniGene cluster ID
Gene symbol
NCBI Gene ID
… and writes the result into another Excel worksheet
Mix of procedural and object programming
Relevant links:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
Entrez Utilities
What Excel3.pl Does
[Flow diagram]
1. Sequence identifier → UniGene ESearch: search UniGene for cluster ID → result ID
2. Result ID → UniGene ESummary: retrieve UniGene description for that cluster → cluster ID, gene symbols & descriptions → write to Excel report
3. Cluster ID → Gene ESearch: search Gene → result ID
4. Result ID → Gene ESummary: retrieve Gene description for that gene → write to Excel report
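The ESearch-then-ESummary pattern in this flow can be sketched without any network access by building the request URLs and pulling result IDs out of a response. This is not the code of Excel3.pl; the function names, the search term and the sample XML fragment are all made up for illustration, and the sketch queries the gene database only.

```perl
use strict;
use warnings;

# Base address of NCBI's Entrez Utilities.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode a search term so it is safe inside a URL.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build an ESearch URL for a database and a search term.
sub esearch_url {
    my ($db, $term) = @_;
    return "$EUTILS/esearch.fcgi?db=$db&term=" . url_escape($term);
}

# Build an ESummary URL for one result ID.
sub esummary_url {
    my ($db, $id) = @_;
    return "$EUTILS/esummary.fcgi?db=$db&id=$id";
}

# Pull the result IDs out of the XML that ESearch returns.
sub parse_ids {
    my ($xml) = @_;
    my @ids = $xml =~ m{<Id>(\d+)</Id>}g;
    return @ids;
}

# Made-up response fragment, for illustration only.
my $sample_xml = '<IdList><Id>111</Id><Id>222</Id></IdList>';

print esearch_url('gene', 'BRCA1 human'), "\n";
print esummary_url('gene', $_), "\n" for parse_ids($sample_xml);

# To fetch for real (needs LWP::Simple and a network connection):
#   use LWP::Simple;
#   my $xml = get(esearch_url('gene', 'BRCA1 human'));
```

In a real program each ID returned by ESearch would be passed to ESummary, and the fields of interest written to the Excel report.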
Toying with Excel3.pl
Some Key Books/Resources
Perl Programming for Biologists
Perl Cookbook
Perl Quick Reference Guide
My favorite: Perl Quick Reference
Assignments
Install remainder of Perl modules from list
Look at code for Excel3.pl
Modify it, break it
Write down at least one question so we can talk about it next week
eBooks Rule
What Does A Module Look Like?
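In its simplest form, a module is a named package of subroutines, normally saved in its own .pm file and ending with a true value. The sketch below keeps the package and the program in one file so it can be run as is; SeqTools and gc_content are hypothetical names, not a module from CPAN or from the class materials.

```perl
use strict;
use warnings;

# --- What would normally live in its own file, e.g. SeqTools.pm ---
package SeqTools;

# Return the GC content of a DNA sequence as a fraction between 0 and 1.
sub gc_content {
    my ($dna) = @_;
    return 0 unless length $dna;
    my $gc = ($dna =~ tr/GCgc//);    # tr returns the number of matches
    return $gc / length($dna);
}

1;    # a module file must end with a true value

# --- The program that uses the module ---
package main;

printf "GC content: %.2f\n", SeqTools::gc_content('ATGCGC');
```

With the package saved as SeqTools.pm in a folder Perl searches, the program would start with use SeqTools; instead of containing the package itself.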