An Introduction to ORegAnno
Stephen Montgomery, WTSI
Open Regulatory Annotation Database
http://www.oreganno.org/
Community-driven annotation of essential information regarding regulatory sequences reported in scientific literature.
Open access, open source, programmatically amenable
Community components
User Management
Literature Management
Record Annotation
Quality Control
Data Access
User Management
>600 different IP addresses / month
154 user accounts, 7% contribute annotation
Users are authenticated by email and autogenerated image recognition; passwords are encrypted immediately.
Literature Management
Publication Queue
PubMed IDs can be added by any user, in any quantity.
Literature Management
Searching
Web Services
getQueueSearchFields()
searchQueue(String search_field_restriction, String
search_query_restriction, String user_restriction, String
state_restriction, String evidence_restriction, String
tf_type_restriction, String score_restriction)
fetchQueuedPublication(String pubmed_id)
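The call pattern for the three queue operations above can be sketched as follows. This is only an illustration: the method names and parameter lists come from the slide, but the return types, the field and state names, the PubMed IDs, and the convention that an empty string means "no restriction" are all assumptions. The in-memory class is a hypothetical stand-in, not the ORegAnno server or its client library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side view of the queue web service.
// Method names and parameters follow the slide; return types are assumed.
interface PublicationQueueService {
    String[] getQueueSearchFields();

    String[] searchQueue(String searchFieldRestriction, String searchQueryRestriction,
                         String userRestriction, String stateRestriction,
                         String evidenceRestriction, String tfTypeRestriction,
                         String scoreRestriction);

    String fetchQueuedPublication(String pubmedId);
}

// In-memory stand-in so the call pattern can be exercised without a server.
class InMemoryQueueService implements PublicationQueueService {
    // PubMed ID -> queue state (placeholder values, not real queue data).
    private final Map<String, String> stateByPubmedId = new LinkedHashMap<>();

    void enqueue(String pubmedId, String state) {
        stateByPubmedId.put(pubmedId, state);
    }

    @Override
    public String[] getQueueSearchFields() {
        return new String[] {"pubmed_id", "state"}; // assumed field names
    }

    @Override
    public String[] searchQueue(String searchFieldRestriction, String searchQueryRestriction,
                                String userRestriction, String stateRestriction,
                                String evidenceRestriction, String tfTypeRestriction,
                                String scoreRestriction) {
        // Assumption: an empty string means "no restriction".
        // This stub only honours the state restriction and ignores the rest.
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : stateByPubmedId.entrySet()) {
            if (stateRestriction.isEmpty() || stateRestriction.equals(entry.getValue())) {
                hits.add(entry.getKey());
            }
        }
        return hits.toArray(new String[0]);
    }

    @Override
    public String fetchQueuedPublication(String pubmedId) {
        String state = stateByPubmedId.get(pubmedId);
        return state == null ? null : pubmedId + ":" + state;
    }
}

class QueueServiceDemo {
    public static void main(String[] args) {
        InMemoryQueueService queue = new InMemoryQueueService();
        queue.enqueue("10000001", "OPEN");   // placeholder PubMed IDs and states
        queue.enqueue("10000002", "CLOSED");

        String[] open = queue.searchQueue("", "", "", "OPEN", "", "", "");
        System.out.println(String.join(", ", open));
        System.out.println(queue.fetchQueuedPublication("10000002"));
    }
}
```

A real client would replace the in-memory class with whatever binding the deployed service exposes; the seven positional restrictions are the part worth noting, since every call must supply all of them.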
Record Annotation
Database contents (ORegAnno, Nov 2006)
735 unique publications annotated (+15, Friday 24th November)
REGULATORY REGION
TRANSCRIPTION FACTOR BINDING SITE
REGULATORY POLYMORPHISM
REGULATORY HAPLOTYPE
Homo sapiens: 817 (+5), 251 (+21), 174, 6
Caenorhabditis elegans: 13, 191
Mus musculus: 20 (+1), 138 (+4)
Rattus norvegicus: 14 (+5), 80 (+6)
Xenopus laevis: 1, 1
Xenopus tropicalis: 2
Gallus gallus: 8, 39
Drosophila melanogaster: 677, 1350 (+4)
Danio rerio: 2
Caenorhabditis briggsae
Saccharomyces cerevisiae: 24, 1
Oryctolagus cuniculus
Bos taurus: 7, 1, 1, 1
Record Annotation
Annotating records 1 of 3 (TFBS)
Annotating records 2 of 3
Annotating records 3 of 3
Record Annotation
Searching
Web Services
getSearchFields()
searchRecords(String search_field, String query)
fetchRecord(String record_stable_id)
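A minimal sketch of how the three record operations might be used together: list the searchable fields, search on one of them, then fetch a hit by its stable ID. The method names and parameters are from the slide; the return types, the field names ("species", "type"), the substring matching, and the stable IDs and record contents are assumptions made for illustration, and the in-memory class is a hypothetical stand-in for the real service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical client-side view of the record web service.
// Method names and parameters follow the slide; return types are assumed.
interface RecordService {
    String[] getSearchFields();

    String[] searchRecords(String searchField, String query);

    Map<String, String> fetchRecord(String recordStableId);
}

// In-memory stand-in holding a few made-up records.
class InMemoryRecordService implements RecordService {
    private static final String[] FIELDS = {"species", "type"}; // assumed field names
    private final Map<String, Map<String, String>> records = new LinkedHashMap<>();

    void add(String stableId, String species, String type) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("species", species);
        record.put("type", type);
        records.put(stableId, record);
    }

    @Override
    public String[] getSearchFields() {
        return FIELDS.clone();
    }

    @Override
    public String[] searchRecords(String searchField, String query) {
        // Reject fields that getSearchFields() does not advertise.
        if (!Arrays.asList(FIELDS).contains(searchField)) {
            throw new IllegalArgumentException("Unknown search field: " + searchField);
        }
        // Assumption: case-insensitive substring match on the chosen field.
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : records.entrySet()) {
            String value = entry.getValue().get(searchField).toLowerCase(Locale.ROOT);
            if (value.contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits.toArray(new String[0]);
    }

    @Override
    public Map<String, String> fetchRecord(String recordStableId) {
        return records.get(recordStableId); // null when the ID is unknown
    }
}

class RecordServiceDemo {
    public static void main(String[] args) {
        InMemoryRecordService service = new InMemoryRecordService();
        service.add("OREG0000001", "Homo sapiens", "REGULATORY REGION"); // placeholder records
        service.add("OREG0000002", "Mus musculus", "TRANSCRIPTION FACTOR BINDING SITE");

        for (String id : service.searchRecords("species", "homo")) {
            System.out.println(id + " -> " + service.fetchRecord(id));
        }
    }
}
```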
Quality Control
User roles: USER, VALIDATOR, ADMIN
USER
Add new records
Comment on all records
VALIDATOR
Score records
Deprecate records
ADMIN
Add new evidence and meta terms
Add new datasets
Batch upload data
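One way to model the three roles is an enum with a permission check. The sketch below rests on two assumptions that the slide does not state: that adding and commenting belong to USER while scoring and deprecating belong to VALIDATOR, and that roles are cumulative (a VALIDATOR can do everything a USER can, and an ADMIN everything a VALIDATOR can). The type and constant names are hypothetical and do not come from the ORegAnno code base.

```java
import java.util.EnumSet;
import java.util.Set;

// Actions named on the slide, as hypothetical constants.
enum Action {
    ADD_RECORD, COMMENT_ON_RECORD,
    SCORE_RECORD, DEPRECATE_RECORD,
    ADD_EVIDENCE_OR_META_TERM, ADD_DATASET, BATCH_UPLOAD
}

// Roles in ascending order of privilege.
enum Role {
    USER(EnumSet.of(Action.ADD_RECORD, Action.COMMENT_ON_RECORD)),
    VALIDATOR(EnumSet.of(Action.SCORE_RECORD, Action.DEPRECATE_RECORD)),
    ADMIN(EnumSet.of(Action.ADD_EVIDENCE_OR_META_TERM, Action.ADD_DATASET, Action.BATCH_UPLOAD));

    private final Set<Action> ownActions;

    Role(Set<Action> ownActions) {
        this.ownActions = ownActions;
    }

    // Assumption: roles are cumulative, so a role may perform its own actions
    // plus those of every role declared before it.
    boolean may(Action action) {
        for (Role role : values()) {
            if (role.ordinal() <= ordinal() && role.ownActions.contains(action)) {
                return true;
            }
        }
        return false;
    }
}

class RoleDemo {
    public static void main(String[] args) {
        System.out.println(Role.USER.may(Action.SCORE_RECORD));      // false
        System.out.println(Role.VALIDATOR.may(Action.SCORE_RECORD)); // true
        System.out.println(Role.ADMIN.may(Action.ADD_RECORD));       // true
    }
}
```

If the roles are in fact disjoint rather than cumulative, the loop in `may` reduces to a single `ownActions.contains(action)` check.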
Data Access
Open MySQL access
XML file dumps
Web Services (Queue and Records)
Search Engine (Lucene search)
Entire website code available as open source on CVS
https://oreganno.dev.java.net/
Extra assistance
Tools
Contains basic tools for fetching sequence data from NCBI and EnsEMBL.
Example: a TFBS exists at -543 to -538 with sequence CCGCCC; use NCBIFETCH or ENSFETCH to retrieve it.
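Before fetching, it can help to sanity-check an annotation like the one in the example: the interval -543 to -538 is inclusive, so it spans 6 bases, which matches the 6-base sequence CCGCCC. The sketch below shows that arithmetic, plus a reverse complement for the case where the site is reported on the opposite strand. The class and method names are hypothetical; this is not how NCBIFETCH or ENSFETCH are implemented.

```java
// Hypothetical helper for checking a site annotation before fetching sequence.
class SiteCheck {
    // Number of bases in an inclusive interval [start, end].
    static int inclusiveLength(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("end must not be less than start");
        }
        return end - start + 1;
    }

    // True when the sequence has exactly as many bases as the interval spans.
    static boolean matchesInterval(int start, int end, String sequence) {
        return inclusiveLength(start, end) == sequence.length();
    }

    // Reverse complement of an upper-case DNA string (A, C, G, T only).
    static String reverseComplement(String dna) {
        StringBuilder result = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            char base = dna.charAt(i);
            switch (base) {
                case 'A': result.append('T'); break;
                case 'C': result.append('G'); break;
                case 'G': result.append('C'); break;
                case 'T': result.append('A'); break;
                default:
                    throw new IllegalArgumentException("Not a DNA base: " + base);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(inclusiveLength(-543, -538));           // 6
        System.out.println(matchesInterval(-543, -538, "CCGCCC")); // true
        System.out.println(reverseComplement("CCGCCC"));           // GGGCGG
    }
}
```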
Help
Contains walkthroughs and descriptions of various components of ORegAnno.
ORegAnno Technology
291 classes, 42 resources, 80 web pages
Uses state-of-the-art open source web application technology to support rapid deployment and development:
AJAX (seamless web interaction),
Struts (MVC architecture),
Hibernate (minimal SQL, database management, connection pooling),
Lucene (well-established, sophisticated search engine),
JAAS (secure user authentication system),
Tomcat (well-used and scalable web application engine),
Quartz (event scheduling).
Integrated with EnsEMBL through DAS and embedded in UCSC
Ongoing Challenges
Encouraging annotation
Annotator effort versus outcome
~1-2 hours per publication
Principles of MIAME
Building consensus
Capturing new publications, larger input sets
Availability of standardized data
Acknowledgements
Obi Griffith (CMSGSC)
Bryan Chu (UBC)
Casey Bergman (U. of Manchester)
Dominique Vlieghe (Pronota NV)
Steven Jones (CMSGSC)
Steven Gallo (Univ. at Buffalo)
Marc Halfon (Univ. at Buffalo)
Manolis Dermitzakis (WTSI)
Monica Sleumer (CMSGSC)
Misha Bilenky (CMSGSC)
Erin Pleasance (WTSI)
Yuliya Prychyna (UBC)
Xin Zhang (UBC)
Belinda M. Giardine (PSU)
Ross Hardison (PSU)
Open source data providers, annotators and user community
FUNDING SUPPORT
CIHR, NSERC, MSFHR, EMBO, Genome BC,
Genome Canada, and the BC Cancer Foundation.