Transcript Slide 1

If you have the Content,
then Apache has the
Technology!
A whistle-stop tour of the
Apache content related projects
Nick Burch
Software Engineer
Alfresco
Apache Projects
• 79 Top Level Projects
• 40 Incubating Projects
• 30 “Content Related” Main Projects
• 7 “Content Related” Incubating
Projects
37 Projects in 50 minutes
With time for questions...
This is not a comprehensive guide!
Different Technologies
• Serving
• Storing
• Transforming
• Generating
• Hosting
• Web Framework Rendering / Templating / etc
What can we get in 50 mins?
• A quick overview of each project
• When talks on the project are
happening
• When meetups on the project are
happening
• Anything new/exciting about the
project?
• What interests me in the project!
Serving up
your Content
Apache HTTPD Server
http://httpd.apache.org/
• Talks – All day Wednesday
• Meetup – Thursday evening
• Very wide range of features
• (Fairly) easy to extend
• Can host most programming languages
• Can front most content systems
• Can proxy your content applications
• Can host code and content
Apache TrafficServer
http://trafficserver.apache.org/
• High performance web proxy
• Forward and reverse proxy
• Ideally suited to sitting between your content application and the internet
• For proxy-only use cases, will probably be better than httpd
• Fewer other features though
• Often used as a cloud-edge http router
Apache Tomcat
http://tomcat.apache.org/
• Talks – All day Friday!
• Java based, as many of the Apache Content Technologies are
• Java Servlet Container
• And you probably all know the rest!
Tomcat – What's New
http://tomcat.apache.org/
• Memory leak detection – for your applications, and for the JVM!
• Easier to embed – no need for large numbers of config files!
• Asynchronous request processing for things like Comet / Bayeux
• Servlet 3.0
• Improved JMX configurability
Storing all
that Content
Apache Cassandra
http://cassandra.apache.org/
• Talk - 11am Wednesday
• Meetup - Wednesday evening
• One of our many NoSQL Databases
• Column-Family store
• Eventually consistent
• Distributed, replicating, no single point of failure
• Can elastically add machines
Apache CouchDB
http://couchdb.apache.org/
• 12pm Wednesday
• Relax!
• Erlang
• NoSQL
• Document orientated distributed store
• Eventually consistent if replicating
• Map-Reduce queries
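To make "Map-Reduce queries" concrete, here is a minimal sketch of the idea in plain Java. It is not CouchDB's API (CouchDB views are normally written as JavaScript functions over JSON documents); the `Doc` type, its fields and the "words per author" query are hypothetical and only illustrate the map, group and reduce steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapReduceSketch {
    // Hypothetical stand-in for a stored JSON document.
    static final class Doc {
        final String id;
        final String author;
        final int words;

        Doc(String id, String author, int words) {
            this.id = id;
            this.author = author;
            this.words = words;
        }
    }

    // Map step: emit zero or more (key, value) pairs for one document.
    static List<Map.Entry<String, Integer>> map(Doc doc) {
        return List.of(Map.entry(doc.author, doc.words));
    }

    // Reduce step: collapse all values emitted for one key into a single value.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every document, groups emitted values by key, then reduces each group.
    static Map<String, Integer> query(List<Doc> docs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Doc doc : docs) {
            for (Map.Entry<String, Integer> emitted : map(doc)) {
                grouped.computeIfAbsent(emitted.getKey(), k -> new ArrayList<>())
                       .add(emitted.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("a", "nick", 100),
            new Doc("b", "nick", 50),
            new Doc("c", "sam", 7));
        System.out.println(query(docs));
    }
}
```

A real store keeps the grouped results indexed and updates them incrementally as documents change; the sketch recomputes everything on each call.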
Apache HBase
http://hbase.apache.org/
• 2pm Wednesday
• Recently graduated from Hadoop
• Another NoSQL Database
• Column-Family store, modelled on Google's Big Table paper
• Some transactions and locking
• Fast range queries and sorting
• Built on HDFS
Which Apache NoSQL?
• Do you have tuples, documents,
variable key/values or complex objects?
• Must data always be consistent?
• If you lose a chunk of machines
(partition), should read/write still work?
• Query by id, range, arbitrary key/value
or map-reduce function?
• How much human interaction is
required to add or remove nodes?
Apache DB: Derby
http://db.apache.org/derby/
• Small, easy to embed SQL database
• Can be embedded and accessed via an embedded JDBC driver
• Can be accessed over the network
• Can be run entirely in-memory
• Efficient on-disk format
• Has a JavaME version – run it on basic cell phones!
Apache Directory
http://directory.apache.org/
• LDAP Directory
• Optimised for many reads per write
• Hierarchical, class/attribute based storage
• Triggers, stored procedures, queries and views
• Multi-master replication
• Rich permissions model built in
Apache Jackrabbit
http://jackrabbit.apache.org/
• 1.30pm Thursday
• JCR (Java Content Repository)
• Hierarchical content store
• Supports structured and unstructured data
• Transactional
• Supports versions
• Full text search built in
Apache Lucene
http://lucene.apache.org/
• All day Friday + Meetup Tuesday night
• Inverted index store (each term lists its documents, rather than each document listing terms)
• Searching is faster than adding
• Normally stores text, but additional data can be associated with it
• Can hold indexed and un-indexed data
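The "each term lists its documents" idea can be sketched in a few lines of plain Java. This is an illustration of the data structure only, not Lucene's API or on-disk format; the class name, the integer document ids and the naive lower-case tokenizer are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain it (the "postings")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Adding is the expensive side: every term in the text must be recorded.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Searching is a single map lookup, which is why it is faster than adding.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(
            term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Apache Lucene search");
        index.add(2, "Apache Tika");
        System.out.println(index.search("apache"));
        System.out.println(index.search("lucene"));
    }
}
```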
Lucene – What's New?
http://lucene.apache.org/
• Lucene and SOLR have merged
• Near real-time support when indexing
• Better storing of attributes and other data in the token stream
• Numeric fields improved – no need to externally process numbers into range buckets yourself
• Fast vector highlighter for large docs
Apache Subversion
http://subversion.apache.org/
• Meetup Thursday evening
• Versioning content store
• Efficient at storing changes
• Normally stores code, text and the odd binary blob
• If you have textual data and you want a versioning store, it's a good fit!
• Used by the new Apache CMS
Apache Xindice
http://xml.apache.org/xindice/
• Native XML Database
• No need to map your complex XML files to a different data structure
• Ideally suited to problems where you have large numbers of XML files, and little / no other content
• Schema independent model
• XPath queries
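For anyone who has not written an XPath query, here is a minimal sketch using the standard `javax.xml.xpath` API that ships with the JDK. It runs against an in-memory XML string, not a Xindice collection, and does not use Xindice's own API; the sample document and element names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathQuerySketch {
    // Hypothetical sample document, used only for this illustration.
    static final String SAMPLE =
        "<projects>"
        + "<project name='Tika'><lang>Java</lang></project>"
        + "<project name='CouchDB'><lang>Erlang</lang></project>"
        + "</projects>";

    // Evaluates an XPath expression against an XML string and returns the result as text.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select by attribute value, then read the child element's text.
        System.out.println(evaluate(SAMPLE, "/projects/project[@name='CouchDB']/lang"));
        // XPath functions work too, for example counting matching nodes.
        System.out.println(evaluate(SAMPLE, "count(/projects/project)"));
    }
}
```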
Transforming and
Reading Content
Apache PDFBox
http://pdfbox.apache.org/
• 4pm Wednesday
• Read, Write, Create and Edit PDFs
• Create PDFs from text
• Fill in PDF forms
• Extract text and formatting (Lucene, Tika etc)
• Edit existing files, add images, add text etc
Apache POI
http://poi.apache.org/
• 3pm Wednesday + Fast Feather Track
• File format reader and writer for Microsoft Office file formats
• Supports binary & OOXML formats
• Strong read edit write for .xls & .xlsx
• Read and basic edit for .doc & .docx
• Read and basic edit for .ppt & .pptx
• Read for Visio, Publisher, Outlook
Apache Tika
http://tika.apache.org/
• 9am Friday + Fast Feather Track
• Java (+ command line) toolkit for detecting and extracting content
• Identifies what a blob of content is
• Gives you consistent metadata back for it
• Parses the contents into plain text, HTML, XHTML or SAX events
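One simple way to "identify what a blob of content is" is to look at its leading magic bytes. The sketch below shows only that idea in plain Java; it is not Tika's API, and a real detector checks far more signatures and also looks inside containers (a .docx, for instance, starts with the same bytes as any zip file). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class MagicSniffSketch {
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    // Returns a media type guessed from the first few bytes of the content.
    static String detect(byte[] data) {
        if (startsWith(data, PDF)) {
            return "application/pdf";
        }
        if (startsWith(data, PNG)) {
            return "image/png";
        }
        if (startsWith(data, ZIP)) {
            return "application/zip";
        }
        // Unknown content falls back to the generic binary type.
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfHeader = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(pdfHeader));
        System.out.println(detect(new byte[] {1, 2, 3}));
    }
}
```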
Tika – What's New?
http://tika.apache.org/
• Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc
• Long standing parsers improved – better HTML from Word for example
• Embedded resources and containers
• Use expanding – used by many SOLR users, Alfresco, lots of people crunching masses of data on Hadoop
Apache Cocoon
http://cocoon.apache.org/
• Component Pipeline framework
• Plug together “Lego-Like” generators, transformers and serialisers
• Generate your content once in your application, serve to different formats
• Read in formats, translate and publish
• Can power your own “Yahoo Pipes”
• Modular, powerful and easy
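The generator / transformer / serialiser idea can be sketched with plain Java functions. This is not Cocoon's API or sitemap configuration; every name below is hypothetical. It only shows how content generated once can be pushed through the same transformer and then serialised to different output formats.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class PipelineSketch {
    // Transformer: upper-cases every item.
    static final Function<List<String>, List<String>> UPPER =
        items -> items.stream()
                      .map(s -> s.toUpperCase(Locale.ROOT))
                      .collect(Collectors.toList());

    // Serialiser 1: render the items as an HTML list.
    static final Function<List<String>, String> AS_HTML =
        items -> items.stream()
                      .map(s -> "<li>" + s + "</li>")
                      .collect(Collectors.joining("", "<ul>", "</ul>"));

    // Serialiser 2: render the same items as plain text, one per line.
    static final Function<List<String>, String> AS_TEXT =
        items -> String.join("\n", items);

    // A pipeline is generator -> transformer -> serialiser, plugged together.
    static <A, B> String run(Supplier<A> generator,
                             Function<A, B> transformer,
                             Function<B, String> serialiser) {
        return serialiser.apply(transformer.apply(generator.get()));
    }

    public static void main(String[] args) {
        Supplier<List<String>> generator = () -> List.of("httpd", "tomcat");
        System.out.println(run(generator, UPPER, AS_HTML));
        System.out.println(run(generator, UPPER, AS_TEXT));
    }
}
```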
Apache Xalan
http://xalan.apache.org/
• XSLT processor
• XPath engine
• Java and C++ flavours
• Cross platform
• Library and command line executables
• Transform your XML
• Fast and reliable XSLT transformation engine
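A minimal sketch of "transform your XML" from Java, using the standard JAXP `javax.xml.transform` API. Xalan-Java implements this API, but which XSLT processor actually runs depends on what is on your classpath; with nothing extra installed it will be the JDK's built-in one. The stylesheet and the sample input are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {
    // Hypothetical stylesheet: prints each project's name attribute followed by ';'.
    static final String NAMES_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/projects'>"
        + "<xsl:for-each select='project'>"
        + "<xsl:value-of select='@name'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML string and returns the output as text.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<projects><project name='Tika'/><project name='POI'/></projects>";
        System.out.println(transform(xml, NAMES_XSLT));
    }
}
```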
Apache XML Graphics: Batik
http://xmlgraphics.apache.org/#batik
• Java SVG toolkit + library
• SVG Parser – read and process existing SVG files
• SVG Generator – Graphics2D implementation that outputs SVG
• SVG DOM – easy way to manipulate your SVG files
• SVG viewer program (Squiggle)
• Command line SVG rasteriser
Apache XML Graphics: FOP
http://xmlgraphics.apache.org/#fop
• XSL-FO processor in Java
• Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it
• Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc
• Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it
Apache Commons: Codec
http://commons.apache.org/codec/
• Commons Track – Thursday Morning
• Encode and decode a variety of encoding formats
• Base64, Hex, Phonetic and URLs
• Handy when interchanging content with external systems
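As a reminder of what two of these encodings do, here is a small round-trip sketch. It deliberately uses only JDK classes (`java.util.Base64` plus a hand-written hex loop), not the Commons Codec API, so the class and method names below are illustrative assumptions and not part of the library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class EncodingSketch {
    // Text -> Base64, safe to embed in XML, JSON, headers and so on.
    static String toBase64(String text) {
        return Base64.getEncoder()
                     .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 -> original text.
    static String fromBase64(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // Bytes -> lower-case hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes do not sign-extend.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String encoded = toBase64("Apache");
        System.out.println(encoded);
        System.out.println(fromBase64(encoded));
        System.out.println(toHex("Apache".getBytes(StandardCharsets.UTF_8)));
    }
}
```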
Apache Commons: Compress
http://commons.apache.org/compress/
• Commons Track – Thursday Morning
• Standard way to deal with archive
formats
• Read and write support
• zip, tar, gzip, bzip2, cpio and ar
• Wider range of capabilities than
java.util.zip
• Common API across all formats
Apache Commons: Sanselan
http://commons.apache.org/sanselan/
• Commons Track – Thursday Morning
• Pure Java image reader and writer
• Fast parsing of image metadata and information (size, color space, ICC etc)
• Much easier to use than ImageIO
• Slower though, as pure Java
• Wider range of formats supported
• PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP
Generating
Content
Apache Forrest
http://forrest.apache.org/
• Document rendering solution built on top of Cocoon
• Reads in content in a variety of formats (xml, wiki etc), applies the appropriate formatting rules, then outputs to different formats
• Heavily used for documentation and websites
• eg read in a file, format as changelog and readme, output as html + pdf
Apache Abdera
http://abdera.apache.org/
• Atom – syndication and publishing
• High performance Java implementation of RFC 4287 + 5023
• Generate Atom feeds from Java or by converting
• Parse and process Atom feeds
• Atompub server and clients
• Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch
Apache Droids (Incubating)
http://incubator.apache.org/droids/
• Intelligent Robots!
• Generic standalone crawler framework
• Easy to extend existing common crawlers
• Easy to write custom ones
• Queue requests for content, protocol handler gets it, multi threaded
• Uses Apache Tika for core of handling fetched resources
Apache JSPWiki (Incubating)
http://incubator.apache.org/jspwiki/
• Feature-rich extensible wiki
• Written in Java (Servlets + JSP)
• Fairly easy to extend
• Can be used as a wiki out of the box
• Provides a good platform for new wiki based applications
• Rich wiki markup and syntax
• Attachments, security, templates etc
Apache ManifoldCF (Incubating)
http://incubator.apache.org/connectors/
• Name has changed a few times...
(Lucene/Apache Connectors)
• Provides a standard way to get content
out of other systems, ready for sending
to Lucene etc
• Different goals to CMIS (Chemistry)
• Uses many parsers and libraries to talk
to the different repositories / systems
• Analogous to Tika but for repos
Apache PhotArk (Incubating)
http://incubator.apache.org/photark/
• 5pm Thursday
• Open Source Photo Gallery application
• Standalone or servlet modes
• Can host photos locally
• Can aggregate external photo albums (Flickr, Picasa) for a unified view
• SCA programming model – uses Apache Tuscany to power it
Hosting
Content
Apache Chemistry (Incubating)
http://incubator.apache.org/chemistry/
• 2pm Wednesday
• Java, Python and PHP, Atom and WS*
• OASIS CMIS (Content Management Interoperability Services)
• Client and Server bindings
• “SQL for Content”
• Consistent view on content across different repositories
• Read / Write / Manipulate content
Chemistry vs ManifoldCF
incubator /chemistry/ /connectors/
• ManifoldCF treats repo as nasty black box, and handles talking to the parsers
• Chemistry talks / exposes repo's contents through CMIS
• ManifoldCF supports a wider range of repositories
• Chemistry supports read and write
• Chemistry delivers a richer model
• ManifoldCF great for getting text out
Apache Lenya
http://lenya.apache.org/
• 9am Thursday
• XML Content Management system
• Powered by Apache Cocoon
• WYSIWYG editors onto Relax-NG XML
• Rich workflow engine + staging
• Clean URLs, CSS for styling
• Sensible handling of metadata, assets, internal links, users, permissions etc
Apache Roller
http://roller.apache.org/
• Multi-user blog server
• Used by the ASF internally
• Scales to thousands of users & blogs
• Should work with any JavaEE servlet container and SQL database
• Comment moderation and spam filters
• Each author has full layout control
• Indexes, feeds and Metaweblog API support for 3rd party clients
Apache Shindig
http://shindig.apache.org/
• OpenSocial Application Container
• Hosts your OpenSocial widgets
• Renders OpenSocial applications into HTML + JavaScript
• Stores the data for your application
• Full client-side JavaScript libraries to deliver gadget functionality
• Reference implementation
Apache Wookie (Incubating)
http://incubator.apache.org/wookie/
• 5.30pm Wednesday
• W3C Widgets server
• Upload, Deploy and Host Widgets
• Widgets can range from a badge, through a small app to a full-blown collaborative system like chat
• Connector framework to make it easy to write widgets in many languages
Web Frameworks
(those with a strong
Content focus to them)
Apache Sling
http://sling.apache.org/
• 12pm Wednesday
• “Fun” and easy web framework
• REST based
• Backed by Jackrabbit content repo
• Powered by OSGi
• Easy to script, supports multiple output languages (JSP, server side JavaScript, Scala etc)
• Stores both templates and content
Apache Tapestry
http://tapestry.apache.org/
• Object Orientated web applications
• Build your application in terms of objects, methods and properties
• Tapestry handles URLs, query parameters and state for you
• Pages built with simple HTML
• Concentrate on the content that backs each part, and the business logic for it
• Tapestry glues it together for you
Apache Tiles
http://tiles.apache.org/
• Templating framework for Java
• Works well with Struts and Shale
• Lets you build your page from lots of tiles (components), which can nest
• Build tiles together to make templates
• Clean separation between your content, the business logic to select it, and the rendering rules
Apache Velocity
http://velocity.apache.org/
• Templating engine
• MVC webapp or standalone
• Can generate HTML, SQL, PostScript, XML, Java Code or email from templates
• Anakia lets you make an xdoc file available to a velocity template, handy when generating HTML from xdoc
• Fairly rich templating language
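At its core a templating engine merges a template with a context of named values. The sketch below shows only that merge step in plain Java, for `${name}` style references. It is not the Velocity API and ignores everything that makes the real language rich (directives, loops, macros, property access); the class name and the choice to leave unknown references untouched are assumptions of this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateSketch {
    // Matches references of the form ${name}.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces each ${name} with its value from the context.
    static String render(String template, Map<String, String> context) {
        Matcher matcher = REFERENCE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Unknown references are left in the output exactly as written.
            String value = context.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement stops '$' and '\' in values being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Hello ${name}, welcome to ${event}!";
        System.out.println(render(template, Map.of("name", "Nick", "event", "ApacheCon")));
    }
}
```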
Apache Wicket
http://wicket.apache.org/
• Build your web applications in Java
• Uses Java in preference to JavaScript, CSS etc
• Handy if you have a strong Java team and you need to do some web stuff
• Fits well with your Java components
• But JS / CSS front end devs tend to be cheaper than Java ones...
Apache Clerezza (Incubating)
http://incubator.apache.org/clerezza/
• OSGi based modular semantic web application framework
• Lets you build applications that fit into the Semantic Web
• Stores and easily manipulates RDF
• Full control over REST and URIs
• Build applications that both consume semantic data (eg RDF files), and that expose content to others
Any Questions?
Any cool projects that
I happened to miss?