Slides - Edwards Lab
Download
Report
Transcript Slides - Edwards Lab
Advanced Python
Concepts: Modules
BCHB524
2015
Lecture 15
10/28/2015
BCHB524 - 2015 - Edwards
Modules
We've already used these from the python
standard library and from BioPython
import sys
filename = sys.argv[1]
file_handle = open(filename)
import Bio.SeqIO
seq_records = Bio.SeqIO.parse(file_handle, "swiss")
import gzip
file_handle = gzip.open(filename)
import urllib
file_handle = urllib.urlopen(url)
import math
x = math.sqrt(y)
10/28/2015
BCHB524 - 2015 - Edwards
2
Modules
Modules contain previously defined functions,
variables, and (soon!) classes.
Store useful code for re-use in many
programs
import sys
import MyNucStuff
Structure
seqfilename = sys.argv[1]
related code seq = MyNucStuff.read_seq_from_filename(seqfilename)
cseq = MyNucStuff.complement(seq)
together
rcseq = MyNucStuff.reverseComplement(seq)
print "
Sequence:",seq
print " C sequence:",cseq
print "RC sequence:",rcseq
10/28/2015
BCHB524 - 2015 - Edwards
3
Modules
In MyNucStuff.py
def read_seq_from_filename(seq_filename):
seq_file = open(seq_filename)
dna_seq = ''.join(seq_file.read().split())
dna_seq = dna_seq.upper()
seq_file.close()
return dna_seq
compl_dict = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
def complement(seq):
return ''.join(map(compl_dict.get,seq))
def revseq(seq):
return ''.join(reversed(seq))
def reverseComplement(seq):
return complement(revseq(seq))
10/28/2015
# plus anything elseBCHB524
you like...
- 2015 - Edwards
4
Modules
We can import specific functions directly…
And reference them without the module name
import sys
from MyNucStuff import read_seq_from_filename
from MyNucStuff import complement
from MyNucStuff import reverseComplement
seqfilename = sys.argv[1]
seq
= read_seq_from_filename(seqfilename)
cseq = complement(seq)
rcseq = reverseComplement(seq)
print "
Sequence:",seq
print " C sequence:",cseq
print "RC sequence:",rcseq
10/28/2015
BCHB524 - 2015 - Edwards
5
Modules
We can even import all the functions from a
module…
import sys
from MyNucStuff import *
seqfilename = sys.argv[1]
seq
= read_seq_from_filename(seqfilename)
cseq = complement(seq)
rcseq = reverseComplement(seq)
print "
Sequence:",seq
print " C sequence:",cseq
print "RC sequence:",rcseq
10/28/2015
BCHB524 - 2015 - Edwards
6
Packages
Packages are collections of modules, grouped
together. All equivalent:
import Bio.SeqIO
Bio.SeqIO.parse(handle, "swiss")
from Bio import SeqIO
SeqIO.parse(handle, "swiss")
from Bio.SeqIO import parse
parse(handle, "swiss")
Implemented using files and folders/directories.
10/28/2015
BCHB524 - 2015 - Edwards
7
What can go wrong?
Sometimes our own .py files can "collide"
with Python's packages.
Test what happens with an "empty" module in
files:
10/28/2015
xml.py (and then try to import ElementTree)
Bio.py (and then try to import SeqIO)
etc…
BCHB524 - 2015 - Edwards
8
A module for codon tables
Module is called: codon_table
Functions:
read_codons_from_filename(filename)
amino_acid(codon_table,codon)
Returns the single amino-acid consistent with ambiguous codon
(containing N's), or X.
translate(codon_table,seq,frame)
10/28/2015
returns true if codon is an initiation codon, false, otherwise
get_ambig_aa (codon_table,codon)
returns amino-acid symbol for codon
is_init(codon_table,codon)
returns dictionary of codons –
value is pair: (amino-acid symbol, initiation codon true/false)
returns amino-acid sequence for DNA sequence seq
BCHB524 - 2015 - Edwards
9
A module for codon tables
from MyNucStuff import *
from codon_table import *
import sys
if len(sys.argv) < 3:
print "Require codon table and DNA sequence on command-line."
sys.exit(1)
table = read_codons_from_filename(sys.argv[1])
seq = read_seq_from_filename(sys.argv[2])
if is_init(table,seq[:3]):
print "Initial codon is an initiation codon"
for frame in (1,2,3):
print "Frame",frame,"(forward):",translate(table,seq,frame)
10/28/2015
BCHB524 - 2015 - Edwards
10
A module for codons
In codon_table.py:
def read_codons_from_filename(codonfile):
# magic
return codon_table
def amino_acid(table,codon):
# magic
return aa
def is_init(table,codon):
# magic
return init
def get_ambig_aa(table,codon):
# magic
return aa
def translate(table,seq,frame):
# magic
return aaseq
10/28/2015
BCHB524 - 2015 - Edwards
11
A module for codons
In codon_table.py:
def read_codons_from_filename(codonfile):
f = open(codonfile)
data = {}
for l in f:
sl = l.split()
key = sl[0]
value = sl[2]
data[key] = value
f.close()
b1
b2
b3
aa
st
=
=
=
=
=
data['Base1']
data['Base2']
data['Base3']
data['AAs']
data['Starts']
codon_table = {}
n = len(aa)
for i in range(n):
codon = b1[i] + b2[i] + b3[i]
isInit = (st[i] == 'M')
codon_table[codon] = (aa[i],isInit)
return codon_table
10/28/2015
BCHB524 - 2015 - Edwards
12
Exercise
Rework the lecture, and your solutions (or mine) from the
homework exercises #1 through #3, to make a MyDNAStuff
module.
Put as many useful nucleotide functions as possible into
the module...
Rework the lecture, and your solutions (or mine) from
homework exercises #4 and #5, to make the codon_table
module with functions specified in this lecture.
Demonstrate the use of these modules to translate an aminoacid sequence in all six-frames with just a few lines of code.
The final result should look similar to Slide 10.
Your program should handle DNA sequence with N’s in it.
10/28/2015
BCHB524 - 2015 - Edwards
13
Class Project: Expectations
40% of your grade!
Project Report
Long version of your homework write-up
Project Presentation
10/28/2015
Demo your program
Describe your project solution
BCHB524 - 2015 - Edwards
14
Class Project: Blast Database
1.
2.
3.
Write a program that computes all pairwise
blast alignments for two species' proteomes
and stores the alignments in a relational
database.
Write a program that retrieves the blast
alignment for two proteins (specified by their
accessions) from the relational database.
Write a program that finds pairs of
orthologous proteins that are mutually best
hits in the species' proteomes.
10/28/2015
BCHB524 - 2015 - Edwards
15
Class Project: MS/MS Viewer
Write a program to display peptide
fragmentation spectra from an mzXML file.
10/28/2015
The program will take an mzXML file, a scan
number, and a peptide sequence as input.
The peptide's b-ion and y-ion m/z values should
be computed, and peaks matching these m/z
values annotated with appropriate labels.
The output figure/plot should aid the user in
determining whether or not the peptide is a good
match to the spectrum.
BCHB524 - 2015 - Edwards
16
Class Project: Protein Digest
Write a simple web-server application using
TurboGears to carry out an in silico
enzymatic digest of a user-provided protein
sequence.
10/28/2015
Users should be able to specify min and max
length, min and max molecular weight, # of
missed cleavages, and specific enzyme.
Output should be a table of peptides, with their
length, molecular weight, # of missed cleavages,
and amino-acids to left and right of each peptide
in the protein sequence.
BCHB524 - 2015 - Edwards
17