Wyeth Wasserman Jonathan Lim



ACCESSING AND EXTRACTING CHIP-SEQ AND
TF-GENE INTERACTIONS FROM PAZAR
Jonathan Lim
With Introduction by Wyeth Wasserman
www.pazar.info/
Welcome
• If you encounter any technical difficulties during the webinar, please report them using the chat option
• Slide presentation: ~25 min
• Questions will be compiled as they are submitted and answered during the final Q&A/discussion period
• During the discussion session, we'll allow the audience to speak
Topics
• PAZAR Overview
• Data Retrieval Through Web Interface
• Data Files and Formats
• PAZAR Application Programming Interface (API)
• Q&A



• Topics will increase in complexity as the webinar progresses
• Data file formats will be presented in order of complexity, beginning with the simplest
• The PAZAR API will be the most technical topic presented today and is geared toward those with programming knowledge
www.pazar.info
• Software framework for the construction and maintenance of regulatory sequence data annotation
• Allows multiple boutique databases to function independently within a larger system
• Public repository for regulatory data
• Each group manages its own deposit and distribution of data
• Envisioned as a tool for capturing deep experimental annotation
  - Species, cell line, treatment
Browsing Data on PAZAR
[Screenshot: link to project details]
Project Information
Gene View
[Screenshot: link to sequence details]
Sequence Information
TF View
Data File Formats
Data Files available for Download
TF – Target Gene Format
• Provides listing of TFs and the genes that they putatively regulate
• In some cases, the target gene is assigned as the gene closest to the TF binding site; this is especially true for ChIP-Seq regulatory sequences.
• PubMed ID and analysis method are provided as interaction evidence when available
• Files automatically exported for all public projects
• Updated weekly
TF – Target Gene File Example
Field                       Example value
PAZAR TF ID                 TF0001078
TF Name                     E2F4_HUMAN
PAZAR Gene ID               GS00121862
Ensembl Gene Accession      ENSG00000187634
Chromosome                  1
Gene Start Coordinate       860260
Gene End Coordinate         879955
Species                     Homo sapiens
Project Name                E2F4_Lee
Evidence PMID               21247883
Analysis Method             PROTEIN BINDING ASSAY::CHROMATIN IMMUNOPRECIPITATION (CHIP)
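The field list above maps directly onto a small Perl parser. This is a minimal sketch that assumes each record is one tab-delimited line with the columns in the order shown. Tabs matter because fields such as the species and analysis method contain spaces. The key names and the helper subs parse_tf_target and targets_by_tf are hypothetical and are not part of PAZAR.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order follows the example record; the short key names are our own labels.
my @COLS = qw(tf_id tf_name gene_id ensembl_gene chrom gene_start gene_end
              species project pmid method);

# Parse one TF-target line (assumed to be tab-delimited) into a hash reference.
sub parse_tf_target {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    die 'expected ' . scalar(@COLS) . ' columns, got ' . scalar(@f) . "\n"
        unless @f == @COLS;
    my %rec;
    @rec{@COLS} = @f;                 # hash slice: column name => value
    return \%rec;
}

# Group Ensembl gene accessions by TF name, skipping blank and '#' lines.
sub targets_by_tf {
    my %targets;
    for my $line (@_) {
        next unless $line =~ /\S/;
        next if $line =~ /^#/;
        my $r = parse_tf_target($line);
        push @{ $targets{ $r->{tf_name} } }, $r->{ensembl_gene};
    }
    return \%targets;
}

# The record from the example above, joined with tabs.
my $example = join "\t", 'TF0001078', 'E2F4_HUMAN', 'GS00121862',
    'ENSG00000187634', '1', '860260', '879955', 'Homo sapiens', 'E2F4_Lee',
    '21247883', 'PROTEIN BINDING ASSAY::CHROMATIN IMMUNOPRECIPITATION (CHIP)';

my $rec = parse_tf_target($example);
print "$rec->{tf_name} -> $rec->{ensembl_gene} ",
      "(chr$rec->{chrom}:$rec->{gene_start}-$rec->{gene_end})\n";
```

The parser refuses lines with the wrong column count rather than guessing. This makes a change in the exported format fail loudly instead of silently shifting fields.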
ChIP-Seq Peak Format
• For users who are only interested in ChIP-Seq peak data
• Provides peak information in a simple delimited format that is easy to work with
• Files will be exported for public projects containing ChIP-Seq data and updated weekly
ChIP-Seq Peak File Example
Field                       Example value
Chromosome                  chr1
Peak start coordinate       915920
Peak end coordinate         916350
Peak max coordinate         916127
Score                       195.45
Score type                  MAXHEIGHT
TF ID                       ENSG00000187961
TF Name                     E2F4
Species                     Homo sapiens
PMID                        21258399
Cell or Tissue              Human Lymphoblastoid cells
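Because the peak start, end and maximum are given separately, simple peak geometry can be computed directly from a record. The sketch below makes several assumptions. It expects tab-delimited lines whose first six fields are chromosome, peak start, peak end, peak max, score and score type, as in the example. The sub names and the 50 bp half-width are hypothetical choices. It also does not convert between 0- and 1-based coordinates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the first six fields of a peak line (assumed tab-delimited):
# chromosome, peak start, peak end, peak max, score, score type.
sub parse_peak {
    my ($line) = @_;
    chomp $line;
    my ($chrom, $start, $end, $max, $score, $score_type) = split /\t/, $line;
    return {
        chrom => $chrom, start => $start, end   => $end,
        max   => $max,   score => $score, score_type => $score_type,
    };
}

# Window extending $half bp either side of the peak maximum (summit).
sub summit_window {
    my ($peak, $half) = @_;
    return ($peak->{chrom}, $peak->{max} - $half, $peak->{max} + $half);
}

# Keep only peaks whose score is at least $min_score.
sub peaks_above {
    my ($min_score, @peaks) = @_;
    return grep { $_->{score} >= $min_score } @peaks;
}

# Leading fields of the example record above.
my $line = join "\t", 'chr1', 915920, 916350, 916127, 195.45, 'MAXHEIGHT',
    'ENSG00000187961';
my $peak   = parse_peak($line);
my $width  = $peak->{end} - $peak->{start};   # peak width in bp
my $offset = $peak->{max} - $peak->{start};   # summit position within the peak
my ($chr, $lo, $hi) = summit_window($peak, 50);
print "width=$width offset=$offset window=$chr:$lo-$hi\n";
```

A fixed-width window around the summit is a common input for motif scanning. Only the first six fields are read, so the sketch does not depend on the order of the trailing TF, species and cell-type columns.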
PAZAR GFF Format
• GFF format describes genes and other features associated with DNA, RNA and protein sequences
• The PAZAR GFF format is intended to represent simple annotations
• One annotation record per line, one annotation for one sequence
• Not as comprehensive as the XML files; represents a subset of the total data, but may be easier for some people to work with
• Projects containing only artificial sequences (e.g. jaspar_core) follow a slightly different format. Refer to the GFF format documentation for details.
• Files automatically exported for all public projects
• Updated weekly
PAZAR GFF File Example
Field                         Example value
Chromosome                    chr12
Project Name                  E2F4_Lee
PAZAR Feature ID              RS0293021
Sequence start coordinate     82752225
Sequence end coordinate       82752317
Score                         .
Strand                        +
Frame                         .
Attributes (mandatory and optional):
sequence="CA…AT";db_seqinfo="ENSEMBL:60_37E";db_geneinfo="ENSEMBL:ENSG00000127720:C12ORF26 ";species="HOMO SAPIENS";db_tfinfo="EnsEMBL_transcript:ENST00000394351:E2F4_HUMAN";analysis_name="ANALYSIS 1";analysis_comment="0";cell_type="HUMAN LYMPHOBLASTOID (GM06990) CELLS :HOMO SAPIENS";pmid="21258399";method="PROTEIN BINDING ASSAY::CHROMATIN IMMUNOPRECIPITATION (CHIP)";evidence="CURATED"
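The example record has eight fixed columns followed by a column of key="value" attributes, which can be pulled apart in a few lines. The sketch assumes the standard GFF tab delimiter between columns and attribute values that contain no double quotes. The sub name parse_pazar_gff and the hash keys are hypothetical, and the test record below keeps only a subset of the attributes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one PAZAR GFF line: eight fixed tab-delimited columns, then a ninth
# column of key="value" attributes separated by semicolons.
sub parse_pazar_gff {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line, 9;    # limit 9: attributes stay in one field
    die 'expected 9 columns, got ' . scalar(@f) . "\n" unless @f == 9;
    my %rec;
    @rec{qw(chrom project feature_id start end score strand frame)} = @f[0 .. 7];
    my %attr;
    while ($f[8] =~ /(\w+)="([^"]*)"/g) {
        $attr{$1} = $2;
    }
    $rec{attr} = \%attr;
    return \%rec;
}

# A shortened version of the example record (attribute subset only).
my $line = join "\t", 'chr12', 'E2F4_Lee', 'RS0293021', 82752225, 82752317,
    '.', '+', '.',
    'species="HOMO SAPIENS";db_tfinfo="EnsEMBL_transcript:ENST00000394351:E2F4_HUMAN";'
  . 'pmid="21258399";evidence="CURATED"';

my $gff = parse_pazar_gff($line);
# db_tfinfo packs source, transcript ID and TF name, separated by colons.
my (undef, $transcript, $tf_name) = split /:/, $gff->{attr}{db_tfinfo};
print "$gff->{feature_id} $gff->{chrom}:$gff->{start}-$gff->{end} ",
      "TF=$tf_name ($transcript) PMID=$gff->{attr}{pmid}\n";
```

Matching on key="value" pairs, rather than splitting on ";", avoids problems with values such as the analysis method that contain "::".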
PAZAR XML Format
• Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents
• A PAZAR XML schema is defined for capturing data and the relationships among them
• Comprehensive and flexible enough to capture many types of data
• Files automatically exported for all public projects
• Updated weekly
Sample PAZAR XML File
<reg_seq
  pazar_id="rs_0022"
  quality="TESTED"
  sequence="CGGGCTCTCCGACCCACGGGTCACTTTTGACAGCTGGCCTGAGTCCTGCCTGGTGGAAACCCCTCCTGGGAGGCTGGAGCCAGCACCAGGGCCCACGTGTGCTTCACCTTGAAGCCTGAGGACACAGACTCTCCGGCAATCACATAGCCCATGTTGAGGACGCTGCCTTCAATGGAGCACGTGATCATGGACGCCACGCCAGTGCCCATGAGGGTGAGGGTGAGCGTGCCTCTCTTGGTGATGATGTCCAG"
  tfbs_name="">
  ….
  <peak
    maxcoord="1857289"/>
</reg_seq>
<funct_tf
  funct_tf_name="E2F4_HUMAN"
  pazar_id="fu_001">
  <tf_unit
    pazar_id="tu_0001"
    tf_id="tf_001"/>
</funct_tf>
<interaction
  pazar_id="in_00063"
  quantitative="299.77"
  scale="MAXHEIGHT"/>
Understanding of the schema and some parsing work are required to extract ChIP-Seq data.
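For a quick look at individual values, such as a peak maximum or an interaction score, the attributes of simple elements can be pulled out with a regular expression. This is only a sketch for small, well-formed excerpts. It ignores entities, single-quoted values and ">" inside attribute values. It also does not reconstruct how peaks, TFs and interactions are linked, which needs the schema. For complete PAZAR XML files, a real parser such as the CPAN modules XML::LibXML or XML::Twig is the safer choice. The sub name element_attrs is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the attributes of every <$tag ...> or <$tag .../> start tag.
# Regex-based: fine for simple, well-formed excerpts, but not a full XML parser.
sub element_attrs {
    my ($xml, $tag) = @_;
    my @found;
    while ($xml =~ /<\Q$tag\E\b([^>]*)>/g) {
        my $body = $1;
        my %attr;
        while ($body =~ /(\w+)\s*=\s*"([^"]*)"/g) {
            $attr{$1} = $2;
        }
        push @found, \%attr;
    }
    return @found;
}

# Elements copied from the sample file above.
my $xml = <<'XML';
<peak
  maxcoord="1857289"/>
<interaction
  pazar_id="in_00063"
  quantitative="299.77"
  scale="MAXHEIGHT"/>
XML

my ($peak)        = element_attrs($xml, 'peak');
my ($interaction) = element_attrs($xml, 'interaction');
print "peak max: $peak->{maxcoord}\n";
print "score: $interaction->{quantitative} ($interaction->{scale})\n";
```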
PAZAR API Overview
• The Application Programming Interface (API) facilitates programmatic retrieval of data contained in 'Published' or 'Open' projects, as well as the user's own restricted projects.
• Provides a mechanism for automating bulk data retrieval in a customized fashion
• Provides methods that make it easier to work with data once it has been retrieved
• Object-oriented: data types within the system are mirrored as objects in code
• Uses the Perl programming language
• Uses the SOAP communication protocol for transferring data
SOAP
• Simple Object Access Protocol (SOAP) is a protocol for exchanging structured information between networked computers. It relies on Extensible Markup Language (XML) for its message format.
• Communication is done using HTTP as the transport layer, so SOAP can be used on any network that permits web browsing
• The client computer sends requests to the server, which performs functions to retrieve data from the database and returns it to the client
• Code to perform functions resides on the server, so the client only needs to send requests in order to receive data
Benefits of using SOAP
• Users do not have to worry about installing the API code on their computer
• Updates to newer API releases involve minimal effort
• Can be used across firewalls where only web browsing is permitted
• Transparent: users don't have to learn new syntax or change the way they code
• Language independent in principle, but yet to be developed and tested with programming languages other than Perl
• Data privacy can be managed by the PAZAR team, since authentication is done on the server side
Data Privacy
• Access to data through the API is the same as through the website: a PAZAR username and password must be supplied to retrieve data from personal restricted projects.
• Authentication is performed on the PAZAR server
Request parameters and the results returned:
• Correct username/password, and the user is a member of the specified project
  - Restricted data: results from the specific restricted project being queried
  - Public data: results from all public projects
• Incorrect username/password combination, invalid project name, or user not a member of the specified project
  - Public data: results from all public projects only
• Project status is open or published (authentication not required)
  - Public data: results from the specific public project being queried and all other public projects
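The access rules above reduce to a simple decision. Public projects are always returned, and a restricted project is added only for a valid login by a member of that project. The sketch below models that logic locally to make the table concrete. It is not PAZAR server code. The sub visible_projects, its arguments and the project names and statuses are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the access rules above (not PAZAR server code).
# $args{projects} maps project name => status ('open', 'published' or 'restricted').
my %IS_PUBLIC = (open => 1, published => 1);

sub visible_projects {
    my (%args) = @_;
    my $projects = $args{projects};
    # Public (open or published) projects are always returned, no login needed.
    my %visible = map { ($_ => 1) }
                  grep { $IS_PUBLIC{ $projects->{$_} } } keys %$projects;
    # A restricted project is added only for a valid login by a project member.
    my $req = $args{requested};
    if (defined $req
        && exists $projects->{$req}
        && $projects->{$req} eq 'restricted'
        && $args{auth_ok}
        && $args{members}{$req}) {
        $visible{$req} = 1;
    }
    return sort keys %visible;
}

# Made-up project names and statuses, for illustration only.
my %projects = (MyLab_private => 'restricted', Public_A => 'published',
                Open_B => 'open');
my @member = visible_projects(projects => \%projects,
    requested => 'MyLab_private', auth_ok => 1, members => { MyLab_private => 1 });
my @bad    = visible_projects(projects => \%projects,
    requested => 'MyLab_private', auth_ok => 0);
print "member sees: @member\n";
print "bad login sees: @bad\n";
```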
PAZAR API Classes
pazar class - handles authentication and contains general methods for retrieving data and creating instance objects of other classes
- a PAZAR object must always be created first; it is supplied to all methods in other classes
pazar::project - handles project information
pazar::dbsource - handles information source data
pazar::gene - handles gene information
pazar::reg_seq - handles regulatory sequence information
pazar::tf - Transcription Factor meta information and general methods for retrieving TF-related information
pazar::tf::tfcomplex - handles Transcription Factor complex information
pazar::tf::subunit - handles Transcription Factor subunit information
pazar::tf::target - handles Transcription Factor target (regulatory sequence, artificial sequence or binding site matrix) information
pazar::transcript - handles transcript information
pazar::tsr - handles transcription start region information
PAZAR API Documentation
www.pazar.info/apidocs
API Setup
1. Install the Perl library SOAP::Lite v0.60a by Paul Kulchenko.
• Later versions of SOAP::Lite are maintained by a different author and are not compatible with the PAZAR API.
• Can be downloaded from the link in the PAZAR API user guide at http://www.pazar.info/apidocs/userguide.html
• Also available for download from CPAN at http://search.cpan.org/~byrne/SOAP-Lite-0.60a
• SOAP::Lite installation follows the standard procedure for Perl modules
2. Include the following at the top of your script, before any code that makes use of the API:
use SOAP::Lite +autodispatch =>
    uri   => 'http://www.pazar.info/pazar',
    proxy => 'http://www.pazar.info/cgi-bin/API0.01/pazarserv.cgi';
• Any code that follows will automatically make use of the PAZAR API modules via SOAP; no additional modules need to be installed on the client side.
• To use a newer API release when one is available, replace API0.01 with the new release number (e.g. API0.02)
• Older API releases will remain in service after newer releases become available
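Because only SOAP::Lite v0.60a is described as compatible, it can help to check which version is installed before running an API script. This is a minimal sketch. It assumes that the 0.60a release reports a $SOAP::Lite::VERSION beginning with "0.60", which is an assumption and not something stated in the PAZAR documentation. The sub name soap_lite_status is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report which SOAP::Lite (if any) is installed. Assumption: the 0.60a
# release reports a version string starting with "0.60".
sub soap_lite_status {
    my $loaded = eval { require SOAP::Lite; 1 };
    return 'SOAP::Lite is not installed' unless $loaded;
    my $version = defined $SOAP::Lite::VERSION ? $SOAP::Lite::VERSION : 'unknown';
    return $version =~ /^0\.60/
        ? "SOAP::Lite $version looks like the 0.60 series required by PAZAR"
        : "SOAP::Lite $version found; the PAZAR API expects v0.60a";
}

print soap_lite_status(), "\n";
```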
Sample Perl Code Using PAZAR API
#!/usr/bin/perl
# Setup: route all pazar calls to the PAZAR server via SOAP
use SOAP::Lite +autodispatch =>
    uri   => 'http://www.pazar.info/pazar',
    proxy => 'http://www.pazar.info/cgi-bin/API0.01/pazarserv.cgi';

# change [email protected] and yourpass to values for your own PAZAR account
my $pazar = new pazar(-pazar_user => '[email protected]', -pazar_pass => 'yourpass');

# retrieve the 'Demo' project and print its details
my $proj = pazar::project::get_by_name('Demo', $pazar);
print $proj->status . "\n";
print $proj->id . "\n";
print $proj->project_name . "\n";
print $proj->description . "\n";
my $project_name = $proj->project_name;
my $project_num  = $proj->id;

# retrieve the IDs of all TF complexes in the project and count them
my @funct_tfs = $pazar->get_all_complex_ids($project_num);
print "num tf complexes: " . scalar(@funct_tfs) . "\n";
Future PAZAR API Development
• API testing with other programming languages such as Java and Python
• Expansion of the variety of classes and methods offered
• Further support for ChIP-Seq data handling
• Data update and import
Recap
Browsing through current data online
• Web interface
Data files available for download
• TF-target gene list
• ChIP-Seq peak files
• GFF
• XML (all data)
Bulk retrieval of the most current data in a customized way through a programmatic approach
• PAZAR API
Q&A


Please take a moment to type PAZAR-related questions/comments into the chat box.
The questions will be answered shortly.