Perl Programming for Biologists - Part 1


Introductory Perl Programming for Biologists
Part 1: 2/3/2009
PRELIMINARY VERSION
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
© 2008 The Board of Trustees of The Leland Stanford Junior University
Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
The Bioresearch Informationist: At Your Service


Yannick Pouliot, PhD, Lane Medical Library & Knowledge Management Center
Bioresearch Informationist ≈ computational biologist in residence



Lane Library service
Closely coordinated with CMGM
Role: Support laboratory researchers in the use of biocomputational resources

…especially postdocs
Contact: [email protected]
Class Requirements

You must…
…have wireless access
…have the admin password to your machine (or the ability to install software on it)
Please Log Into WebEx


Go to the workshop description (under Resources) to log into WebEx
Password = ‘lanelib’
To Do

Please download all class materials from
http://lane.stanford.edu/howto/index.html?id=_3098
into C:\course
Class Goals
Understanding enough Perl for:
- Creating, writing and reading Excel files
- Reformatting data files for input to an analysis program
- Writing to and reading from a database such as MS Access or another locally installed relational database, as well as from databases available on the Internet
… and on a procedural note, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …
Remember: Ask LOTS OF QUESTIONS
Contents
Session 1
- Installing what you need to write and run Perl programs
- Understanding simple Perl programs
- Intro to programming concepts
- Where to get help with Perl
Session 2
- Delving into Perl language elements
- More programs; understanding a “real” program
- Regular expressions
- Interacting with MS Excel and MS Access databases
Session 3
- Understanding “object-oriented” programming: enough to be dangerous…
- An example of OO programming: BioPerl
Some Cautions

All examples pertain to MS Office 2003
- Examples still work in MS Office 2007 when the files are imported
- However, the Perl modules used here do not work with documents saved in the MS Office 2007 format (e.g., .xlsx)
All examples pertain to Perl 5.x, not 6.x
- Versions 5 and 6 are NOT compatible
- Version 5 is far more common, so this is not much of an issue
Your mileage may vary if you are using Windows Vista
- My recommendation: Switch back to XP
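A quick way to confirm that your perl is a 5.x release (these notes assume ActivePerl 5.10) is to print the built-in version variables. A minimal sketch: `$]` and `$^V` are standard Perl special variables, and the 5.010 threshold simply mirrors the version used in class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $] holds the running perl's version as a number, e.g. 5.010000 for 5.10.0.
my $version_number = $];

# $^V holds the same version as a version string; "%vd" prints it as 5.10.0.
my $version_string = sprintf("%vd", $^V);

print "Running Perl $version_string\n";

if ($version_number >= 5.010) {
    print "OK: this perl is 5.10 or newer\n";
}
else {
    print "Warning: the class examples assume Perl 5.10 or newer\n";
}
```

Putting `use 5.010;` near the top of a script goes one step further: perl then refuses to run the script at all on an older version.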
So Why Perl?



Perl = Practical Extraction and Report Language
Free
Very widely used
- Especially in the biology community
Very flexible and portable
Not the only language of this type…
- E.g., Python
Not the absolute easiest
- … but pretty easy
Not suited for everything
- E.g., for ultra-fast, mathematically oriented code, C is still the best choice
Today’s session:
- Installing and understanding what is required to run Perl
- Understanding the basics of a Perl program
Part 1: Installation
Components to Install & Configure
1. Perl itself
- More accurately, the Perl interpreter
- We’ll use ActiveState Perl 5.10x (ActivePerl):
  http://downloads.activestate.com/ActivePerl/Windows/5.10/ActivePerl-5.10.0.1004-MSWin32-x86-287188.msi
- Additional Perl modules
  - Module = extra functions not part of the interpreter
  - Described at the Comprehensive Perl Archive Network (CPAN)
2. Open Perl IDE
- IDE = integrated development environment:
  - Editor → to write/edit your program
  - Debugger → to find bugs
  - A compiler/interpreter → to run your program from within the IDE
- sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
3. Configuring the ODBC manager (next week)
- Part of Windows
- Allows different programs to interact with databases on your machine or anywhere on the Web via a single “doorway”
So, what is an Interpreter?
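An interpreter is a program that…
1. Translates a human-understandable instruction into the computer’s language
2. Executes it
3. Repeats the cycle until no instructions remain
→ “compiled” and executed one instruction at a time
Perl is usually used in interpreted mode
- You run the source file directly, with no separate compile step
- Under the hood, perl first compiles the whole file into an internal form and then executes it; this is why a syntax error anywhere stops the program before any of it runs
- Can also be compiled once (= faster)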
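Perl’s compile phase and run phase can be seen directly with a BEGIN block, which runs the moment perl has finished compiling it, before any ordinary statement in the file. A minimal sketch (the `@events` array exists only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collects a record of what happened, in the order it happened.
our @events;

# An ordinary statement: runs during the run phase, top to bottom.
push @events, "run phase: first statement";

# A BEGIN block runs as soon as perl has compiled it,
# i.e. while the rest of the file is still being read.
BEGIN { push @events, "compile phase: BEGIN block" }

# Another ordinary statement.
push @events, "run phase: second statement";

print "$_\n" for @events;
```

Running it prints the BEGIN line first, even though it sits between the two ordinary statements in the file.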
Installing Perl from ActiveState
Installation for Windows (if you use a Mac, you already have Perl!)
We’ll be installing Perl 5.10x for Windows x86:
- Go to http://downloads.activestate.com/ActivePerl/Windows/5.10/ActivePerl-5.10.0.1004-MSWin32-x86-287188.msi
- Run the installer
- Install under C:\Perl
Installing Additional Perl Modules
The fountain of all things Perl: CPAN
- = Comprehensive Perl Archive Network
- http://www.cpan.org/
What does using a module inside a Perl program look like?
Why modules?
- If you find yourself struggling with a problem, chances are someone has already dealt with it, and you can use their code for free!
Downloading & installing modules: the Perl Package Manager (PPM)
Perl is in constant evolution
- Over time, different modules become part of the standard Perl distribution
- What modules are in MY Perl?
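Here is a minimal sketch of what using a module looks like, plus one way to ask whether a given module is installed in your Perl. It relies only on modules that ship with standard Perl (File::Copy, File::Temp); `module_installed` is a hypothetical helper written for this example, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "use" loads a module at compile time and imports the functions named.
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# A scratch directory that is deleted automatically when the program ends.
my $dir = tempdir(CLEANUP => 1);

# Create a small file, then copy it with File::Copy's copy().
my $original = "$dir/genes.txt";
open(my $out, '>', $original) or die "Cannot write $original: $!";
print $out "BRCA1\nTP53\n";
close($out);

my $backup = "$dir/genes_backup.txt";
copy($original, $backup) or die "Copy failed: $!";
print "Copied $original to $backup\n";

# Is a module installed in MY Perl?  Try to load it at run time.
sub module_installed {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;    # File::Copy -> File/Copy.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(File::Copy Spreadsheet::WriteExcel)) {
    printf "%-25s %s\n", $module,
        module_installed($module) ? "installed" : "not installed";
}
```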
PPM: Installing Perl Modules the Easy Way
Two ways to install Perl modules:
1. Hard way: Perl modules can be downloaded and installed manually from, e.g., CPAN
2. Easy way: They can also be installed via the Perl Package Manager (PPM)
What’s the difference? Installing by hand means that…
1. Bits of code need to be moved into various directories
2. Modules often depend on other modules → more installing
Perl Modules We’ll Be Using
Name                     Function                                                 Status
File::Copy               manipulating files                                       Included
File::Find               manipulating files                                       Included
File::Path               manipulating files                                       Included
File::Rename             manipulating files                                       You do it!
IO::File                 accessing the contents of files                          Included
Spreadsheet::WriteExcel  writing into an MS Excel spreadsheet                     Included
Spreadsheet::ParseExcel  parsing an MS Excel spreadsheet                          Included
Spreadsheet::BasicRead   reading the contents of an MS Excel spreadsheet          Included
Win32::OLE               provides easy access to Windows (e.g., launching Excel)  Included
URI                      accessing URLs                                           Included
LWP::Simple              interacting with a Web site via HTTP                     Included
Array::Unique            returns unique elements of an array                      Included
List::Uniq               returns unique elements of a list                        Included
Switch                   switch statement (a "multiple if-elsif-else")            Included
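As a sketch using only modules that ship with Perl, the program below writes and reads a file with IO::File and then removes duplicate entries with a common hash idiom. The hash idiom is shown here in place of List::Uniq or Array::Unique, whose interfaces are not covered in this session; the gene symbols are made-up sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;                 # object-style file handles
use File::Temp qw(tempdir);   # scratch directory, removed at exit

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/symbols.txt";

# Write a small list of gene symbols, with duplicates, one per line.
my $out = IO::File->new($path, 'w') or die "Cannot write $path: $!";
$out->print("$_\n") for qw(TP53 BRCA1 TP53 EGFR BRCA1);
$out->close;

# Read the file back one line at a time.
my $in = IO::File->new($path, 'r') or die "Cannot read $path: $!";
my @symbols;
while (defined(my $line = $in->getline)) {
    chomp $line;
    push @symbols, $line;
}
$in->close;

# Keep only the first occurrence of each symbol, preserving order.
my %seen;
my @unique = grep { !$seen{$_}++ } @symbols;

print join(", ", @unique), "\n";    # prints: TP53, BRCA1, EGFR
```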
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
Installing an environment to run and edit Perl: Integrated Development Environment (IDE)
Why an IDE?
IDEs make writing code much easier and faster because you can…
- Edit → write/edit your program
- Debug → find bugs
- Run your program from within the IDE
IDEs provide special tools that speed up writing & debugging
- E.g., automatic syntax highlighting, easily seeing the values of variables
We’ll use Open Perl IDE
- Free, open source, but Windows only (sorry)
- http://open-perl-ide.sourceforge.net/
For our Mac friends: Affrus
- Not free, but reasonably inexpensive
- Evaluation version available
Installing Open Perl IDE
1. Go to http://open-perl-ide.sourceforge.net/ and download both the main file and the patch.
2. Create the folder Program Files\OpenPerlIDE
3. Unzip into Program Files\OpenPerlIDE
4. Add that folder to the Path variable under System Properties → Advanced → Environment Variables → System Variables
→ this makes it possible to run the Open Perl IDE program from anywhere on your machine…
BREAK
Part 2: What does it all do?
Simple1.pl: Your First Perl Program
1. Start Open Perl IDE
2. Load Simple1.pl (File → Open…)
3. Run Simple1.pl
Simple1.pl demonstrates:
1. OS directive
2. Modules
3. Main section
4. Variable declaration
5. Reserved variables
6. Variable types: arrays
7. Subroutines
8. Running from the command line using input parameters
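Simple1.pl itself is in the class materials. As a stand-in, here is a separate, hypothetical program (the file name skeleton.pl and the `greet` subroutine are invented for illustration) that touches each of the eight elements listed above.

```perl
#!/usr/bin/perl
# 1. OS directive: the "#!" line tells Unix-like systems which program runs this file.

# 2. Modules: pragmas and libraries are loaded before the main section.
use strict;      # require variables to be declared
use warnings;    # report suspicious code

# 3. Main section: statements outside any subroutine run top to bottom.

# 4. Variable declaration with "my".
my $default_name = "world";

# 5. Reserved variables: @ARGV holds the command-line arguments, $0 the script name.
# 6. Variable types: arrays start with @; copy @ARGV, or fall back to a default.
my @names = @ARGV ? @ARGV : ($default_name);

print "Running $0\n";
print greet($_), "\n" for @names;

# 7. Subroutines: named blocks of reusable code; arguments arrive in @_.
sub greet {
    my ($name) = @_;
    return "Hello, $name!";
}

# 8. Running from the command line with input parameters, e.g.:
#    perl skeleton.pl Alice Bob      (greets Alice and then Bob)
```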
A Second Example Program: Simple2.pl
Understanding data (= variable) types:
http://en.wikipedia.org/wiki/Perl#Data_types
… and more generally, understanding the lingo
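Simple2.pl is in the class materials; the sketch below is not that file, but a separate example of Perl’s three main data types (scalars, arrays and hashes). The exon lengths are made-up values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar ($): a single value, either a number or a string.
my $gene = "TP53";

# Array (@): an ordered list indexed from 0; one element is written $exons[0].
my @exons = (102, 85, 113);    # made-up lengths

# Hash (%): key => value pairs; one value is written $chromosome_of{TP53}.
my %chromosome_of = (TP53 => "17", BRCA1 => "17", CFTR => "7");

print "First exon length: $exons[0]\n";
print "Number of exons: ", scalar(@exons), "\n";
print "$gene is on chromosome $chromosome_of{$gene}\n";

# Arithmetic on scalars (try changing * to /).
my $product = $exons[0] * 2;
print "Twice the first exon length: $product\n";
```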
Exploring Perl’s Major Language Elements
- *** Norman Matloff’s introduction to Perl: http://heather.cs.ucdavis.edu/~matloff/Perl/PerlIntro.pdf
- Perl language reference
- ActivePerl documentation
- Stuck? Google is incredible for programming problems…
- Also handy: LaneConnex search engine → search with “Perl”
Key Books & Resources




*** Learning by example: Perl Cookbook
Learning Perl
Perl Quick Reference Guide
My favorite: Perl Quick Reference
The Next Step: Programming Tips

PLAN your program
- Write down how you intend to process the data in more-or-less plain language (“pseudo-code”)
  - Goal: make sure the plan really does make sense
  - Hacking doesn’t really pay…
Have documentation handy
- ActivePerl documentation (searchable)
- Perl language reference
- → eBooks: help served on a silver platter
- Lane FAQs
When you’re stuck: search the Web
- Google can answer almost any programming question
- … though quality documentation is still best
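One way to follow the “plan first” advice is to write the pseudo-code as comments and then fill in Perl beneath each step. A hypothetical example (the `average_of_valid` subroutine and its input values are invented):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pseudo-code, written first:
#   1. Start with a list of expression values.
#   2. Skip any value that is missing (empty) or not a number.
#   3. Add up the remaining values and count them.
#   4. Report the average, or say there was nothing to average.

sub average_of_valid {
    my @values = @_;                                   # step 1
    my ($sum, $count) = (0, 0);
    for my $v (@values) {
        next unless defined $v && $v =~ /^-?\d+(?:\.\d+)?$/;    # step 2
        $sum   += $v;                                  # step 3
        $count += 1;
    }
    return $count ? $sum / $count : undef;             # step 4
}

my $avg = average_of_valid(2.5, "", 3.5, "NA", 6);
print defined $avg ? "Average: $avg\n" : "Nothing to average\n";   # Average: 4
```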
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
Toying with Excel3.pl, a “real” program
Excel3.pl: A “Real” Program
What it does:
- Reads input from an Excel worksheet containing public identifiers for DNA sequences associated with genes
- Uses the Entrez Utilities provided by NCBI to retrieve:
  1. UniGene cluster ID
  2. Gene symbol
  3. NCBI Gene ID
- Writes the result into another Excel worksheet
Features a mix of procedural and object-oriented programming → Session 3 of workshop
Relevant links:
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
- Entrez Utilities
What Excel3.pl Does (flowchart):
Sequence identifier → search UniGene for cluster ID (UniGene ESearch) → result ID → retrieve UniGene description for that cluster (UniGene ESummary) → write to Excel report
Cluster ID → search Gene (Gene ESearch) → result ID → retrieve Gene description for that gene (Gene ESummary) → write gene symbols & descriptions to Excel report
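The search step in the flowchart (query an Entrez database, then pick the result IDs out of the reply) can be sketched in core Perl. This is not code from Excel3.pl. The E-utilities base URL, the `db`/`term` parameters and the `<Id>` elements are assumptions based on NCBI’s ESearch interface and should be checked against its documentation; the sample reply is made up, and `url_encode`, `esearch_url` and `parse_ids` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL for NCBI's ESearch utility (assumption: check NCBI's documentation).
my $ESEARCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

# Percent-encode a string so it is safe to put in a URL query.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Build an ESearch URL for database $db and search term $term.
sub esearch_url {
    my ($db, $term) = @_;
    return "$ESEARCH?db=" . url_encode($db) . "&term=" . url_encode($term);
}

# Pull every <Id>...</Id> value out of an ESearch XML reply.
sub parse_ids {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>}g;
}

# Made-up reply shaped like ESearch XML, for demonstration only.
my $sample_reply = <<'XML';
<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>1001</Id>
    <Id>1002</Id>
  </IdList>
</eSearchResult>
XML

print esearch_url('gene', 'TP53 AND human[orgn]'), "\n";
print join(", ", parse_ids($sample_reply)), "\n";    # 1001, 1002
```

In a live program, LWP::Simple’s `get($url)` would fetch the reply (it returns undef on failure); the ESummary step follows the same pattern with a different utility.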
Assignments

Look at Simple2.pl
- Modify it, break it
- Come up with a modification, e.g., divide instead of multiply
- Write down at least one question, so we can talk about it next week
eBooks Rule
What Does A Module Look Like?
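The picture on this slide is not preserved, so here is a minimal sketch of a module: a package that exports one function through the standard Exporter module. `SeqUtils` and `gc_content` are invented names. Normally the package would live in its own file, SeqUtils.pm; here it shares one file with the main program so the example runs as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ---- Normally this part would be its own file, SeqUtils.pm ----
# (A real .pm file would also begin with "use strict; use warnings;".)
package SeqUtils;
use Exporter 'import';             # gives SeqUtils an import() method
our @EXPORT_OK = qw(gc_content);   # functions callers may ask for

# Fraction of G and C bases in a DNA sequence (0 for an empty string).
sub gc_content {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ($seq =~ tr/GCgc//);  # tr/// with an empty replacement just counts
    return $gc / length($seq);
}

1;    # a .pm file must end with a true value

# ---- Back in the main program ----
package main;

# With SeqUtils.pm saved on disk you would simply write:
#     use SeqUtils qw(gc_content);
# Because the package lives in this same file, call its import() directly.
SeqUtils->import('gc_content');

printf "GC content: %.2f\n", gc_content("ATGCGC");    # GC content: 0.67
```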