An Introduction to ORegAnno
Stephen Montgomery, WTSI
Open Regulatory Annotation Database
http://www.oreganno.org/

Community-driven annotation of essential
information regarding regulatory sequences
reported in scientific literature.

Open access, open source,
programmatically-amenable
Community components





User Management
Literature Management
Record Annotation
Quality Control
Data Access
User Management



>600 different IP addresses / month
154 user accounts, 7% contribute annotation
Users are authenticated by email and autogenerated image recognition; passwords
are encrypted immediately.
Literature Management

Publication Queue

Any user can add PubMed IDs, in any
quantity.
Literature Management

Searching

Web Services



getQueueSearchFields()
searchQueue(String search_field_restriction,
    String search_query_restriction,
    String user_restriction,
    String state_restriction,
    String evidence_restriction,
    String tf_type_restriction,
    String score_restriction)
fetchQueuedPublication(String pubmed_id)
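The searchQueue call takes seven string restrictions in a fixed order, which is easy to get wrong by hand. The following is a minimal Python sketch of a helper that assembles those arguments by name. It assumes that an empty string means "no restriction" and it does not contact the ORegAnno server; the slides do not state the transport or the accepted values, so `build_search_queue_args`, "alice" and "PENDING" are hypothetical.

```python
# Parameter order copied from the searchQueue signature on the slide.
SEARCH_QUEUE_PARAMS = (
    "search_field_restriction",
    "search_query_restriction",
    "user_restriction",
    "state_restriction",
    "evidence_restriction",
    "tf_type_restriction",
    "score_restriction",
)


def build_search_queue_args(**restrictions):
    """Return the seven positional string arguments for searchQueue.

    Assumption: an unset restriction is sent as an empty string.
    """
    unknown = set(restrictions) - set(SEARCH_QUEUE_PARAMS)
    if unknown:
        raise TypeError(f"unknown restriction(s): {sorted(unknown)}")
    return tuple(str(restrictions.get(name, "")) for name in SEARCH_QUEUE_PARAMS)


# Hypothetical values: "alice" and "PENDING" are placeholders,
# not documented ORegAnno users or queue states.
args = build_search_queue_args(user_restriction="alice", state_restriction="PENDING")
print(args)
```

Keeping the parameter order in one tuple means a caller names only the restrictions it cares about and a misspelled name fails immediately.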
Record Annotation
Database contents (ORegAnno, Nov 2006)
735 unique publications annotated (+15, Friday 24th November)
Record types: REGULATORY REGION, TRANSCRIPTION FACTOR BINDING SITE, REGULATORY POLYMORPHISM, REGULATORY HAPLOTYPE
Homo sapiens: 817 (+5), 251 (+21), 174, 6
Caenorhabditis elegans: 13, 191
Mus musculus: 20 (+1), 138 (+4)
Rattus norvegicus: 14 (+5), 80 (+6)
Xenopus laevis: 1, 1
Xenopus tropicalis: 2
Gallus gallus: 8, 39
Drosophila melanogaster: 677, 1350 (+4)
Danio rerio: 2
Caenorhabditis briggsae
Saccharomyces cerevisiae
24
1
Oryctolagus cuniculus
Bos taurus
7
1
1
1
Record Annotation
Annotating records 1 of 3 (TFBS)
Annotating records 2 of 3
Annotating records 3 of 3
Record Annotation

Searching

Web Services



getSearchFields()
searchRecords(String search_field, String query)
fetchRecord(String record_stable_id)
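The three record calls suggest a simple flow: list the searchable fields, search one field for matching records, then fetch a record by its stable ID. The sketch below imitates that flow with a local in-memory stand-in. It is not the ORegAnno service or a client for it; the class name, the field names, the stable IDs "REC1" to "REC3" and the case-insensitive substring matching are all assumptions made for illustration.

```python
class InMemoryRecordService:
    """Local stand-in that mirrors the three record calls on the slide."""

    def __init__(self, records):
        # Index records by stable ID for direct lookup.
        self._records = {rec["stable_id"]: rec for rec in records}

    def get_search_fields(self):
        # Mirrors getSearchFields(): every field name seen in any record.
        fields = set()
        for rec in self._records.values():
            fields.update(rec)
        return sorted(fields)

    def search_records(self, search_field, query):
        # Mirrors searchRecords(search_field, query).
        # Assumption: case-insensitive substring match on one field.
        needle = query.lower()
        return [
            stable_id
            for stable_id, rec in self._records.items()
            if needle in str(rec.get(search_field, "")).lower()
        ]

    def fetch_record(self, record_stable_id):
        # Mirrors fetchRecord(record_stable_id).
        return self._records[record_stable_id]


# Hypothetical records; the IDs and field names are placeholders.
service = InMemoryRecordService([
    {"stable_id": "REC1", "species": "Homo sapiens", "type": "REGULATORY REGION"},
    {"stable_id": "REC2", "species": "Mus musculus", "type": "TRANSCRIPTION FACTOR BINDING SITE"},
    {"stable_id": "REC3", "species": "Homo sapiens", "type": "TRANSCRIPTION FACTOR BINDING SITE"},
])

hits = service.search_records("species", "homo sapiens")
for stable_id in hits:
    print(stable_id, service.fetch_record(stable_id)["type"])
```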
Quality Control

User roles: USER, VALIDATOR, ADMIN

USER



VALIDATOR



Add new records
Comment on all records
Score records
Deprecate records
ADMIN



Add new evidence and meta terms
Add new datasets
Batch upload data
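The role scheme can be sketched as a small permission table. Two assumptions go beyond the slide and are marked in the code: that roles are cumulative (a VALIDATOR can do everything a USER can, an ADMIN everything a VALIDATOR can), and that adding and commenting belong to USER while scoring and deprecating belong to VALIDATOR. The slide lists those four actions without an unambiguous split, and the action names are hypothetical identifiers.

```python
from enum import IntEnum


class Role(IntEnum):
    # Ordered so that a higher value includes the lower roles' rights
    # (assumption: roles are cumulative).
    USER = 1
    VALIDATOR = 2
    ADMIN = 3


# Minimum role needed for each action.
# Assumption: the USER/VALIDATOR split of the first four actions.
# The three ADMIN actions are taken from the slide.
MIN_ROLE = {
    "add_record": Role.USER,
    "comment_on_record": Role.USER,
    "score_record": Role.VALIDATOR,
    "deprecate_record": Role.VALIDATOR,
    "add_evidence_or_meta_term": Role.ADMIN,
    "add_dataset": Role.ADMIN,
    "batch_upload": Role.ADMIN,
}


def can(role, action):
    """Return True if the role is allowed to perform the action."""
    return role >= MIN_ROLE[action]


print(can(Role.VALIDATOR, "score_record"))
print(can(Role.USER, "batch_upload"))
```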
Data Access





Open MySQL access
XML file dumps
Web Services (Queue and Records)
Search Engine (Lucene search)
Entire website code available as open
source on CVS

https://oreganno.dev.java.net/
Extra assistance

Tools



Contains basic tools for fetching sequence data
from NCBI and EnsEMBL.
Example: a TFBS is reported at -543 to -538 with sequence
CCGCCC; use NCBIFETCH or ENSFETCH to fetch the sequence.
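Once an upstream sequence has been fetched, the reported coordinates can be checked against the reported site. The sketch below is one way to do that in Python, assuming the positions are relative to the transcription start site, both ends are inclusive, and position -1 is the base immediately upstream. It does not call NCBIFETCH or ENSFETCH; the function names are hypothetical and the 600-base sequence is synthetic filler built only for the demonstration.

```python
def site_length(start, end):
    """Length of a site given inclusive start/end positions."""
    return end - start + 1


def upstream_slice(upstream_seq, start, end):
    """Return the bases at positions start..end (negative, inclusive).

    upstream_seq is read 5' to 3' and ends at position -1,
    so position p sits at index len(upstream_seq) + p.
    """
    if not (start <= end < 0):
        raise ValueError("expected start <= end < 0")
    n = len(upstream_seq)
    if -start > n:
        raise ValueError("sequence does not reach the start position")
    return upstream_seq[n + start : n + end + 1]


# Synthetic 600-base upstream sequence with the site placed at -543..-538.
upstream = "A" * 57 + "CCGCCC" + "A" * 537

print(site_length(-543, -538))
print(upstream_slice(upstream, -543, -538))
```

A mismatch between the extracted bases and the published sequence is a cue to recheck the reference point or strand before annotating.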
Help

Contains walkthroughs and descriptions of
various components of ORegAnno.
ORegAnno Technology


291 classes, 42 resources, 80 web pages
Uses state-of-the-art open source web application
technology to support rapid deployment and
development:








AJAX (seamless web interaction),
Struts (MVC architecture),
Hibernate (minimal SQL, database management,
connection pooling),
Lucene (well-established, sophisticated search engine),
JAAS (secure user authentication system),
Tomcat (well-used and scalable web application engine),
Quartz (event scheduling).
Integration with DAS for EnsEMBL and embedded in
UCSC
Ongoing Challenges


Encouraging annotation
Annotator effort versus outcome





~1-2 hours per publication
Principles of MIAME
Building consensus
Capturing new publications, larger input sets
Availability of standardized data
Acknowledgements





Obi Griffith (CMSGSC)
Bryan Chu (UBC)
Casey Bergman (U. of Manchester)
Dominique Vlieghe (Pronota NV)
Steven Jones (CMSGSC)










Steven Gallo (Univ. at Buffalo)
Marc Halfon (Univ. at Buffalo)
Manolis Dermitzakis (WTSI)
Monica Sleumer (CMSGSC)
Misha Bilenky (CMSGSC)
Erin Pleasance (WTSI)
Yuliya Prychyna (UBC)
Xin Zhang (UBC)
Belinda M. Giardine (PSU)
Ross Hardison (PSU)
Open source data providers, annotators and user community
FUNDING SUPPORT
CIHR, NSERC, MSFHR, EMBO, Genome BC,
Genome Canada, and the BC Cancer Foundation.