Two Examples of Open Source Software Developed


RMLL visits at CERN – July 2012
What is it used for?
• Depositing
• Archiving
• Organizing
• Disseminating
• Any type of document
~350GB of PDFs at CERN
~20TB of images and videos
1M records
What is Invenio?
‣ Integrated Digital Library / Repository software
‣ A platform of choice for managing documents in HEP
‣ also adopted in other fields (medium to big repositories)
‣ Web application
‣ Open-source GPL-2 project
‣ LAMP stack: Python (mostly), MySQL and Apache
‣ Based on open standards: MARCXML, OAI-PMH, OpenURL, OpenSearch, etc.
‣ Flexible, scriptable
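As an illustration of the open standards listed above, here is a minimal sketch of how a client might build an OAI-PMH ListRecords request and read a title out of a MARCXML record. The repository URL, the "marcxml" metadata prefix and the sample record are assumptions made for the example; they are not taken from an actual Invenio installation, and a real repository advertises its own supported prefixes.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def oai_list_records_url(base_url, metadata_prefix="marcxml", set_spec=None):
    """Build an OAI-PMH ListRecords URL (the prefix is an assumption)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def marc_subfields(record_xml, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    root = ET.fromstring(record_xml)
    values = []
    for field in root.iter("{%s}datafield" % MARC_NS):
        if field.get("tag") != tag:
            continue
        for sub in field.findall("{%s}subfield" % MARC_NS):
            if sub.get("code") == code:
                values.append(sub.text)
    return values


# Hypothetical record, written by hand for this example.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">An example preprint title</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    # Hypothetical endpoint; no network request is made here.
    print(oai_list_records_url("http://repository.example.org/oai"))
    # MARC field 245, subfield a holds the title.
    print(marc_subfields(SAMPLE_RECORD, "245", "a"))
```

The sketch only builds the URL and parses a local string, so it runs without network access.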
Invenio’s gears
• Lots of Python, with a sprinkle of C and Lisp(!)
• 630K lines of Python code
• MySQL MyISAM for storing data
• Native indexing engine
• Apache + mod_wsgi + mod_xsendfile
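To show how the Apache + mod_wsgi + mod_xsendfile combination can fit together, here is a minimal sketch of a WSGI application that answers ordinary requests itself and hands file downloads back to the web server through the X-Sendfile response header. This is not Invenio's code: the storage directory, the /files/ URL prefix and the helper names are hypothetical, and Apache must be configured separately to honour the header.

```python
import posixpath

# Hypothetical directory from which the web server may serve files.
FILES_ROOT = "/srv/example-files"


def application(environ, start_response):
    """WSGI entry point (the callable a WSGI server such as mod_wsgi invokes)."""
    path = environ.get("PATH_INFO", "")
    if path.startswith("/files/"):
        # Keep only the final path component so requests cannot escape FILES_ROOT.
        name = posixpath.basename(path)
        if not name:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"No such file\n"]
        headers = [
            ("Content-Type", "application/octet-stream"),
            # The web server module replaces the empty body with this file.
            ("X-Sendfile", posixpath.join(FILES_ROOT, name)),
        ]
        start_response("200 OK", headers)
        return [b""]
    body = b"Hello from a WSGI application\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(path):
    """Tiny test harness: call the app without a web server."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(application({"PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_app("/"))
    print(call_app("/files/report.pdf"))
```

The design point is that the Python process never reads the file: it only decides whether the download is allowed, and the web server streams the bytes.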
Invenio’s History
1954 CERN library starts paper dissemination of preprints (early Open Access)
1965 First computers at CERN library to help with cataloging
1990 Electronic distribution of preprints via FTP
1993 CERN Preprint Server, web front-end of electronic preprint catalogue. Institutional repository
1996 CERN Library Server (weblib): added books, periodicals and "other material".
2000 CERN Document Server: multimedia material, internal notes
2002 First public release of the software under GNU-GPL. Worldwide installations and collaborations
Open Access at CERN
• “Consistent with the stated position of the Collaborations and the General Conditions applicable to Experiments at CERN, every effort will be made to publish papers under Open Access conditions, as defined by the SCOAP3 initiative. As at the date of this document, the Creative Commons Attribution ("cc by") license meets these conditions.”
• OA at CERN has a long history. The CERN Convention of 1953 states: "...the results of its experimental and theoretical work shall be published or otherwise made generally available".
Our development Environment
• Git distributed version control system
• Trac for ticket tracking
• VirtualBox + Vagrant for testing deployment
• We develop on SLC5/6 (based on RHEL5/6), on Ubuntu, on Debian…
Quality Assurance
• Coding standards
  • E.g. PEP8 (Style Guide for Python), etc.
• Documentation
  • "If the code and the comments disagree, then both are probably wrong." – attributed to Norm Schryer
• Test suite
  • ~1,000 unit/regression/web tests
• Security
  • XSS, CSRF, SQL injection, etc.
• Code review
• Kwalitee check: "measuring" quality
  • "It looks like quality, it sounds like quality, but it’s not quite quality." – CPAN Testing Service (quoting Michael Schwern)
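Two of the security concerns named above, SQL injection and XSS, can be illustrated with a short sketch. It uses the standard library's sqlite3 and html modules purely so that it runs anywhere; the table, the column names and the helper functions are hypothetical and do not come from Invenio, which runs on MySQL (where the driver's placeholder syntax may differ from the "?" used here).

```python
import html
import sqlite3


def find_records(conn, author):
    """Look up records by author with a parameterized query.

    The value is passed separately from the SQL text, so quotes inside
    `author` cannot change the structure of the statement.
    """
    cur = conn.execute(
        "SELECT id, title FROM records WHERE author = ?", (author,)
    )
    return cur.fetchall()


def render_title(title):
    """Escape user-supplied text before embedding it in HTML."""
    return "<h1>" + html.escape(title) + "</h1>"


def make_demo_db():
    """Create an in-memory database with one hypothetical record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    conn.execute(
        "INSERT INTO records (title, author) VALUES (?, ?)",
        ("Example preprint", "Smith"),
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_records(conn, "Smith"))
    # A classic injection attempt is treated as a plain (non-matching) string.
    print(find_records(conn, "x' OR '1'='1"))
    print(render_title("<script>alert(1)</script>"))
```

Checks of this kind are also natural candidates for the regression part of a test suite, since a single unescaped string can reopen the hole.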
Our community
• 30 institutions worldwide
• CERN + DESY + Fermilab + SLAC
• EPFL …
• ADS and arXiv joining forces
• Translated so far into 26 languages
• 45 committers (in the last year)
• Free + Paid support
An example installation
• 1 Load balancer (HAProxy + Apache mod_proxy + mod_evasive)
• 5 Worker nodes:
  • 2 VMs for static files
  • 3 Real machines for Python handled requests
• 2 DB nodes (MySQL master + MySQL replica)
• AFS distributed FS for backups and file storage
• Sustained recent Higgs announcement load (230 requests per second with peaks of 800 req/s)
What’s next?
• Werkzeug/Flask + Jinja2 + WTForms for the web framework
• SQLAlchemy for DB abstraction
• Twitter Bootstrap + jQuery for the style
• Optional Solr indexing
• History and Features
• Technologies
• Development
What is Indico?
• Web-based event organization
• Archive of events metadata and related documents (minutes, slides, etc)
• Booking service and collaboration hub
  • Rooms
  • Videoconference
  • Webcast
What is Indico?
• Started as a European Project - 2002
• First time used in 2004
• In production at CERN: http://indico.cern.ch
• And in >100 institutions around the world
  • GSI, DESY, Fermilab,…
  • http://indico-software.org/wiki/IndicoWorldWide
• Free and Open Source
Indico @ CERN
• > 170,000 events
• > 700,000 presentations
• > 900,000 files
Event Management with Indico
• All kinds of events
Managing Simple Events
Managing Meetings
Managing Conferences
• Full Lifecycle
Collaboration Hub
• Room Booking
• Collaboration service requests: Videoconference, webcast, recording
Technology
• Python >2.6 + WSGI
• babel, webassets, pytz, zope.index, zope.interface, simplejson, suds, lxml, zc.queue, python-dateutil, pypdf, pyatom, reportlab, etc
• Mako 0.4.1+ as template engine
• ZODB as underlying database (http://www.zodb.org/)
• Web frameworks:
  • jQuery
  • Backbone.js
Infrastructure
Compatibility
• Compatible with many browsers: IE8+, Firefox 3.6+, Google Chrome, Safari, etc
• Working on mobile version
Development Tools
• Git as version control system
• ~ Eclipse + PyDev
• Unit and Selenium tests + Jenkins (Continuous Integration Server)
• Sphinx for Documentation
• Trac as Project Site
• Github: http://github.com/indico
• Transifex for i18n: https://www.transifex.com/projects/p/indico/
What’s Next?
• Enhance the software: v1.0 end of 2012
• Enlarge the community: more advertising
Questions?