NGA Public Web Site Redesign – CQ5 Lessons Learned
November, 2012 (David Beaudet, NGA Lead Developer / Architect)
Topics
1. Technical Architecture
2. Art Data and Image Integration
3. Other Useful Nuggets
NGA Web Site Redesign – March 2013
NGA CQ5 Technical Architecture
Installation / Upgrade Lessons Learned
• CQ 5.4 to 5.5 is a major change – everything is OSGi – some configs buried and some
features no longer supported, e.g. RMI no longer seems like an option.
• Java 1.7 not supported by CQ 5.5 – unsure about 5.6 – Adobe claims Oracle has a
problem in 1.7 that needs fixing
• Always disable virus scanner prior to CQ5 install, particularly McAfee, as CQ5 initial
install will not complete if McAfee on-access protection is enabled. Maybe run DEV, QA,
and Prod on Linux to avoid having to install a virus scanner?
• WebDAV doesn’t seem to be completely supported; compounded by Windows 7
shipping with a broken WebDAV client.
• Lucene index occasionally misses nodes – limited to DEV / QA systems? Reindexing
may be an option – we have successfully reindexed.
• Start / stop scripts need modification to work properly on Linux
• 5.6 seems to have some user interface screens missing, e.g. under Tools. We’re sticking
with 5.5 SP 2.1 for now – any opinions about 5.6?
• After fresh install, one finds quickly that default memory is insufficient
NGA CQ5 Technical Architecture
CQ5 Server Configuration and Operational Concerns
• one RHEL 6 VM guest for each of DEV and QA (16 GB RAM / 4 CPUs)
• one RHEL 6 VM guest each for prod author and prod publish (16 GB / 4 CPUs)
• Lessons Learned
• allocate plenty of memory to the JVM for each instance (we use between 4 and 8 GB
depending) – use a 64-bit operating system; otherwise each process is limited to at most 4GB
• allocate plenty of storage up front – CQ doesn’t perform JCR garbage collection by
default – CQ5 halts itself when insufficient storage space is detected – 100GB min
• change the default temp directory of CQ5 in start script configuration to a volume with
plenty of space
• tar persistence is highly I/O intensive - use fastest reliable storage – anyone using
SSD? We are using iSCSI on 1 Gbps network
• Java patches need to be managed – system library links are severed as part of RHN
updates which can lead to CQ5 errors if CQ5 is left running during the patch.
• Schedule datastore garbage collection and use incremental backups to minimize
impact of backups
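Because CQ5 halts when it runs short of storage, a pre-flight check on the repository and temp volumes can be run before start-up or from a scheduled job. The following is a minimal sketch, not part of CQ5: the function name and the use of the 100 GB figure as a free-space threshold are assumptions made for illustration.

```python
import shutil

GIB = 1024 ** 3
# Assumption: reuse the "100GB min" figure above as a free-space threshold.
MIN_FREE_GIB = 100


def has_enough_space(path, min_free_gib=MIN_FREE_GIB, usage_fn=shutil.disk_usage):
    """Return True when the volume holding `path` has at least min_free_gib free.

    usage_fn is injectable so the check can be exercised without a real volume.
    """
    usage = usage_fn(path)  # named tuple with total, used, free (in bytes)
    return usage.free >= min_free_gib * GIB
```

Calling `has_enough_space` once for the repository volume and once for the relocated temp directory covers both of the storage lessons above; the paths to pass in depend on your installation.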
NGA CQ5 Technical Architecture
CQ5 Clustering
• Set up is easy, but result is risky (CQ5 admin class) and expensive
• When clustering, understand the types of clustering available and make the right
choice based on your requirements (e.g. shared JCR files or synchronized
copies).
• Conclusions
• Frequently, the complexities inherent in software clusters create more downtime than
clusters protect against. For small shops especially, that’s almost always the case.
• Active / active requires two CQ5 licenses; active / passive might not (ask Adobe).
NGA CQ5 Technical Architecture
Web Server Configurations
• NGA uses CQ5 behind Apache – HTTPS for author, both schemes for publish
– dispatcher only used for publish.
• Lessons Learned
• Configure mod_proxy to prevent Apache from becoming an open proxy server
• SSL configuration in CQ5 requires installing a server certificate into the Java keystore.
Unfortunately, Oracle doesn’t include the full set of CA certs that browsers do, so your
CA’s cert might have to be added to the Java configuration. Moreover, Java updates
can replace the keystore which means you have to reinstall the CA cert every time
Java is updated. When getting your server’s CSR signed, select a vendor whose CA
cert is included out of the box in Java.
• Consider creating Apache and dispatcher configuration file templates and automating
the creation and distribution of the configs to avoid differences between environments.
• Use Apache rather than CQ5 for index page redirection; generally speaking, minimize
the number of requests that have to be sent to CQ5 publish instance
• Urge developers to test through the DEV dispatcher before requesting code promotion,
as filter rejects are easy to identify; create a CGI to expose a long tail of DEV author,
DEV publish, and DEV dispatcher logs so developers can see errors.
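The template idea above can be sketched with nothing more than string substitution. This is an illustrative sketch only: the host names, ports, and the `ENVIRONMENTS` table are hypothetical, and the virtual host shown is a plain mod_proxy reverse proxy rather than NGA's actual configuration. `ProxyRequests Off` is the Apache directive that disables forward proxying, which is what keeps the server from acting as an open proxy.

```python
from string import Template

# Hypothetical reverse-proxy virtual host; adapt the directives to your setup.
VHOST_TEMPLATE = Template("""\
<VirtualHost *:$listen_port>
    ServerName $server_name
    # Reverse proxy only: never act as a forward (open) proxy.
    ProxyRequests Off
    ProxyPass / http://$cq_host:$cq_port/
    ProxyPassReverse / http://$cq_host:$cq_port/
</VirtualHost>
""")

# Hypothetical per-environment values; only these differ between environments.
ENVIRONMENTS = {
    "dev": {"listen_port": 80, "server_name": "dev.example.org",
            "cq_host": "127.0.0.1", "cq_port": 4502},
    "qa": {"listen_port": 80, "server_name": "qa.example.org",
           "cq_host": "127.0.0.1", "cq_port": 4502},
}


def render_vhost(env_name):
    """Render the virtual host for one environment; raises KeyError if a value is missing."""
    return VHOST_TEMPLATE.substitute(ENVIRONMENTS[env_name])
```

Keeping every environment on one template means a difference between DEV and production can only come from the values table, which is easy to review.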
NGA CQ5 Technical Architecture
Dispatcher Configuration
• NGA using ver 4.1.2, but 4.1.3 available for download
• Lessons Learned
• Dispatcher is an application firewall, so it’s probably best to filter out by default
and include only the URL patterns that are needed
• Check for the text “reject” in dispatcher.log to identify requests that were rejected
by the dispatcher filters
• Differentiating between the scheme (http vs. https) of the original browser request is
impossible to do when proxying the connection to CQ (either with mod_proxy or
dispatcher) because the scheme reported by JSP is the scheme of the proxy, not
the browser.
• Latest dispatcher (4.1.3?) claims to support native HTTPS without requiring stunnel.
Even so, the latest dispatcher still might not support multiple configurations within a
single Apache instance, so you might have to install two separate Apache
instances on the same machine to detect the scheme properly.
• The dispatcher options for flushing the cache are quite limited – you might have
to write your own CGI to intercept dispatcher flush requests and handle them
yourself – or go with a very small number of stat file levels – we’re still figuring this
out.
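The "check for reject" tip above can be automated with a small filter. This is a minimal sketch that assumes only what the slide states, namely that rejected requests produce log lines containing the text "reject"; it makes no assumption about the rest of the dispatcher log format, and the function names are hypothetical.

```python
def rejected_requests(lines, marker="reject"):
    """Yield the lines that mention the marker, ignoring case."""
    needle = marker.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")


def print_rejects(log_path):
    """Print every matching line of the given log file."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in rejected_requests(handle):
            print(line)
```

Pointing `print_rejects` at the DEV dispatcher log after a test pass gives developers a quick list of URL patterns that still need an allow rule.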
NGA CQ5 Technical Architecture
Author Instance Authentication
• NGA looked briefly at SSO with Apache / mod_auth* modules - we use it for
other web-based systems; the security model (trusted credential) has been
deprecated due to potentially serious security problems
• NGA opted for LDAP “out of the box” / we will be transitioning to LDAPS now
that it’s working over unsecured LDAP.
• Lessons Learned
• CQ5 trusted credentials attribute is being phased out as indicated in
JackRabbit log files – don’t depend on it - unclear what, if any, mechanism will
replace it.
• That said, Kerberos (or most any other kind of Apache supported authentication)
can be integrated with Apache simply by setting a certain HTTP header in the
request after Apache authentication is performed. However, there are security
issues with this since headers can be spoofed easily – under current SSO, you
must not expose CQ5 instances outside of the local machine.
• LDAP (prior to CQ 5.6 I believe) has a bug related to case sensitivity of user
names – it’s fixed by a patch – OR – users need to kill the session cookies that
are created after their initial login using LDAP.
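The spoofing concern above comes down to one rule: a user name carried in a request header is only believable when the request can only have come from the local Apache. The sketch below illustrates that reasoning and is not CQ5's mechanism; the header name `X-Remote-User` and the function name are hypothetical.

```python
import ipaddress

# Hypothetical header set by Apache after it has authenticated the user.
SSO_HEADER = "X-Remote-User"


def authenticated_user(remote_addr, headers):
    """Return the header's user name only for requests arriving from loopback.

    Any other peer could have forged the header, so it is ignored.
    """
    try:
        peer = ipaddress.ip_address(remote_addr)
    except ValueError:
        return None  # unparseable peer address: do not trust the header
    if not peer.is_loopback:
        return None
    user = headers.get(SSO_HEADER, "").strip()
    return user or None
```

This is why the instance must listen only on the local machine under such a scheme: once any other host can reach it directly, the header check offers no protection.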
NGA CQ5 Technical Architecture
Publish Instance Authentication
• We support native CQ5 users and social users for public authentication
• Native CQ5 users are the same users used by the WCM tools, so be sure the
default group assignments do not include WCM permissions
• oAuth uses the deprecated trusted credentials attribute – so error log fills
with warnings about deprecation of trusted credentials attribute
• Lessons Learned
• oAuth is not a trivial endeavor – there are lots of details and configurations to
keep up with plus you’ll be writing some custom code to secure it anyway.
• Create specific logging configuration for the oAuth module to send repetitive
deprecation warnings to a separate log file or to /dev/null
• CQ5.5 SP 2.1 breaks oAuth due to ACL on /etc/cloudservices – you have to adjust
it to permit read to /etc/cloudservices/facebook and twitter directories.
• HTTPS vs. HTTP and cookie security – might have to get Apache involved to
logout users if unsecured cookie transmission is detected – we’re still working on
this.
NGA CQ5 Technical Architecture
Secure URLs
• Need HTTPS for certain form submissions
• Avoid abuse of forms that generate e-mail
• Lessons Learned
• Force HTTPS using Apache and / or with JSPs (if possible) – still figuring out
• Use captcha to avoid getting on an e-mail blacklist
User Form Submissions
• Need removal of data from publish instances due to government privacy requirements
• Lessons Learned
• Relatively straightforward workflows and workflow launchers can achieve this
(auto-reverse-replication and content deactivation); form authors need training
• Potential security issue with out of the box form handler (path specification for
form element name) – need to lock down permissions for anonymous user and
public authenticated users
NGA CQ5 Art Data and Image Integration
NGA Art Data – key content for the web site
• 110k+ art objects, 20k artists, other entities – highly complex relationships
stored in external relational database tables
• Redesigned web site has complex art data searches, sort orders, and
faceting of art data search results
• Data must be synchronized – Art object data changes regularly and new
objects are added nearly every day
Images of Art Objects – key to a visually rich web site
• Images of the art – zoom files are very large, multiple sizes are required,
color profile management, specific algorithms required for resizing, must
support addition of manually cropped renditions
• Image associations with art objects are managed in a different system
• Images must be synchronized – images change regularly and new images
are added frequently
NGA CQ5 Art Data and Image Integration
For art object data, we attempted to build a prototype that would feature:
• automatic synchronization of data between relational database and JCR
• a JCR content hierarchy able to accommodate 1 million+ art data entities and
relationships between them without sacrificing data richness or overloading the JCR
• Efficient JCR SQL2 queries that return distinct result sets in a highly customized sort
order along with custom facets
Findings
• Relational database to JCR mapping isn’t straightforward. We realized that in order for
JCR queries to perform well and in order to avoid duplicate search results, we would
have to store multiple copies of the art object data in multiple JCR hierarchies,
possibly many depending on the specific queries.
• Custom facet extractors in CQ5 must use the QueryBuilder API – none of the other
search APIs support it – and the QueryBuilder implementation requires that all nodes of
the result set be visited to accumulate facets. This might be a requirement of all CQ5
searches actually since CQ5 permissions do not seem to be indexed in Lucene – so the
larger the result set size, the slower the query performance – other customers have
reported similar findings.
• A solution involving duplication of data into an unknown number of content hierarchies was deemed
to be impractical.
NGA CQ5 Art Data and Image Integration
What we did
• As the complexity of the art object requirements grew, we studied the feasibility of
loading all art data and image meta-data into memory. We optimized the memory
footprint of those structures and found that it was feasible. The art data bundle loads
and caches in RAM approximately 1GB of data. Searching, faceting, and sorting this
data is lightning fast.
• What about free text search? We realized that re-implementing the features of Lucene
to support free-text search was not something we wanted to do, so we also built a small
module to replicate a significantly flattened set of art data in the JCR for Lucene
to perform free-text searches on. This data set supports the site search as well as
some of the searches on our “advanced collection search” page
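The in-memory approach described above can be illustrated in a few lines. This sketch is in Python rather than the Java of the actual OSGi bundle, and the `ArtObject` fields and catalog rows are placeholders rather than NGA's data model; it shows only why filtering, faceting, and sorting become cheap once everything is held in RAM.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtObject:
    # Placeholder fields; the real model has far richer relationships.
    object_id: int
    title: str
    artist: str
    classification: str
    year: int


def search(objects, **criteria):
    """Return the objects whose fields equal every given criterion."""
    return [o for o in objects
            if all(getattr(o, field) == value for field, value in criteria.items())]


def facet_counts(results, field):
    """Count the distinct values of one field across a result set."""
    return Counter(getattr(o, field) for o in results)


def sort_results(results, *fields):
    """Sort by several fields at once, in the order given."""
    return sorted(results, key=lambda o: tuple(getattr(o, f) for f in fields))


# Placeholder rows for illustration only.
CATALOG = [
    ArtObject(1, "Title 1", "Artist A", "painting", 1650),
    ArtObject(2, "Title 2", "Artist B", "print", 1890),
    ArtObject(3, "Title 3", "Artist A", "painting", 1620),
]
```

For example, `facet_counts(search(CATALOG, classification="painting"), "artist")` produces the per-artist counts for a result set in a single pass over memory, with no repository query involved.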
NGA CQ5 Art Data and Image Integration
For art object imagery, we attempted to:
• Use CQ5 DAM to create renditions, including zoom images
• Synchronize imagery with the CQ5 DAM using independent Java program invoking
JCR APIs over RMI.
Findings
• Using DAM for renditions ballooned the JCR to over 250GB very quickly – datastore
GC might have helped with that, but we didn’t know about it
• Insufficient control options over renditions - color profiles are not retained during
resizing operations and no out of the box control over resizing algorithms exists
• rsync over WebDAV didn’t work, so a separate program or OSGi bundle would have to
be developed to synchronize imagery
• Custom workflow processes for images would have to be developed for creating
image renditions
What we did
• Enhance existing image processing jobs to produce additional sizes; rsync images and
serve from a dedicated image and image zoom server
• Enhance Art Object JCR bundle to include image data
NGA CQ5 Art Data and Image Integration
Lessons Learned
• Let CQ do what CQ is good at – authoring, storing, and rendering web
content. Resist the temptation to feed a lot of other data sources into the
JCR.
• Spend the money to consult with an Adobe Architect – it’s worth it. Just be
sure they understand how many hours you’ve allocated to each topic so you
don’t blow the budget.
• Consider using a search engine such as SOLR to power your search
features if you have requirements bordering on the complex or if you find
yourself struggling with CQ5 JCR queries to get what you need.
• The CQ5 DAM is useful for assisting content authors, but until it evolves into
something larger, limit its use to web content authoring.
• Perform data store garbage collection on a regular basis if you find your
JCR is growing too rapidly. Remember that CQ backups double your disk
space requirement.
• Adobe still has a lot of work to do with respect to JCR search. Hopefully the
next major version of CQ5 will expose a much richer and better performing
search engine to developers.
NGA CQ5 Other Useful Nuggets
NGA Developer Workstation Configuration
• Java 1.6, latest patch
• CRXDE rather than Maven + Eclipse – recommendation from Adobe since small team
• CRXDE can be slow and SVN integration is barely sufficient
• Missing CRXDE libraries
• Disable virus scanner, particularly during install
• 4 fast cores + 16GB RAM + fast disk (SSD?) so developer can easily run two
instances + tools efficiently – some laptops have only 8GB, which becomes problematic
• local Apache + dispatcher would have been helpful
Builds
• Scripted build process on shared DEV / build server
• Uses curl for all operations
• Delete and recreate approach taken
• SVN export of apps, design, and environment specific sling:osgiConfig nodes
• Bundle build to create JAR files
• Package created of all apps, JARs, and OSGI configs; package deployed to targets
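The final deployment step of such a curl-driven build could look roughly like the sketch below. The slides do not show the actual commands, so everything here is an assumption: the host, port, credentials, and package names are hypothetical, and the `/crx/packmgr/service.jsp` endpoint with its `file`, `name`, `force`, and `install` form fields reflects the CRX Package Manager HTTP service as commonly documented and should be verified against Adobe's documentation for your version.

```python
import subprocess


def package_install_command(host, port, user, password, package_zip, package_name):
    """Build the curl argument list that uploads and installs one package."""
    # Assumed endpoint of the CRX Package Manager HTTP service.
    url = f"http://{host}:{port}/crx/packmgr/service.jsp"
    return [
        "curl", "-u", f"{user}:{password}",
        "-F", f"file=@{package_zip}",
        "-F", f"name={package_name}",
        "-F", "force=true",    # replace an existing package of the same name
        "-F", "install=true",  # install immediately after upload
        url,
    ]


def deploy(targets, user, password, package_zip, package_name):
    """Run the install command against each (host, port) target in turn."""
    for host, port in targets:
        command = package_install_command(host, port, user, password,
                                          package_zip, package_name)
        subprocess.run(command, check=True)  # stop the build on the first failure
```

Building the argument list separately from running it keeps the command easy to log and to test without touching a live instance.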
NGA CQ5 Other Useful Nuggets
In hindsight and assuming availability of resources and time, I would have:
1. Specified SOLR as our search solution from the start
2. Sent a few people to CQ5 advanced developer training earlier in the process
3. Sent a few people to CQ5 admin training earlier in the process
4. Committed to Maven + Eclipse rather than using CRXDE for development
5. Looked more seriously at a custom oAuth effort rather than using CQ5’s social collaboration
module for authentication
NGA CQ5 Other Useful Nuggets
Questions?
d-beaudet at nga.gov
202.312.2755
NGA CQ5 Other Useful Nuggets
• Multiple run modes: Configure your CQ5 instances with multiple run modes (e.g. dev,author)
and use sling:osgiConfig nodes placed in run-mode-specific directories (e.g.
config.dev.author, config.dev, config.author)
• Avoid Felix console: don’t apply configurations outside of your DEV activities using the Felix
console – it’s better to define those in osgi configs and push them as part of your build. Also,
don’t mix modes of applying these configurations – it gets confusing otherwise as changes are
persisted differently by each.
• Avoid memory caching of JCR content if possible (for performance reasons, it’s not always
possible) and focus instead on the dispatcher to cache your slowest running pages
• Use AJAX to avoid cache flushing: For content shared across the site via base page
templates (e.g. site-wide alerts), consider using a separate AJAX call in order to avoid having to
flush all of your pages from cache. Instead, just cache the individual component instance.
• Use reference components and train content authors how to create and use reference
component instances to avoid having to duplicate a lot of identical content across your site. If
you’re programmatically including fixed path reference components in a template, consider
keeping that content isolated to a specific directory that requires a developer to change that
content; e.g. we use reference components outside of the context of a cq:page for drawers on
our event pages.
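To make the run-mode tip above concrete, the sketch below works out which configuration directories apply to an instance. It is a simplified model of the selection rule as I understand it (a `config.<mode>...` directory applies when all of its run modes are active, and more specific directories take precedence), not Sling's implementation; the function names are hypothetical.

```python
def run_modes_of(dir_name):
    """Return the run modes encoded in a config directory name, or None if it is not one."""
    parts = dir_name.split(".")
    if parts[0] != "config":
        return None
    return frozenset(parts[1:])


def applicable_config_dirs(active_modes, dir_names):
    """List the config directories that apply, least specific first.

    A directory applies when every run mode in its name is active on the instance.
    """
    active = set(active_modes)
    matches = []
    for name in dir_names:
        modes = run_modes_of(name)
        if modes is not None and modes <= active:
            matches.append((len(modes), name))
    return [name for _, name in sorted(matches)]
```

For an instance started with the run modes dev and author, the directories config, config.author, config.dev, and config.dev.author all apply in this model, while a directory such as config.prod.publish is ignored.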
NGA CQ5 Other Useful Nuggets
• Use client libraries to consolidate and minify your CSS and JavaScript. Separate your JS
(under /apps/appname/component path) and your CSS (under /etc/designs/appname) and use
categories and embeds (and possibly dependencies) to roll up all of your client libraries into a
few consolidated files at publish time. Also, separate your publish components from your author
components by checking the WCM mode before including client libs. Otherwise, you might end
up (like we did) finding a 7.3MB widgets.js file included for no reason on your home page.
Remember that /etc/designs/yourapp/jcr:content contains the list of components permitted for a
given template – you can either store it in SVN or let it live on your production authoring instance
– but you should choose one approach over the other and adapt build scripts accordingly.
• Use web browser tools and free on-line web site performance analysis tools to suggest
ways of improving the load times of your pages and minimizing the number of requests.
• Consolidated Search: we had a requirement to spider and show results from non-CQ5
managed web sites within our general site search so we wrote a spider that stores the content of
select pages from those sites into the JCR. The web site source is used as a facet so users can
opt to select / deselect based on the source web site. SOLR (or another external search engine)
could be used for this purpose as well and CQ5 templates would simply wrap the SOLR restful
APIs to render results.
• Use latest dispatcher
• If at first Adobe support isn’t helpful, be persistent, request a phone call – the quality of
response is highly variable – I’ve also heard that specifying “Sling” as the component leads to the
best customer support personnel, but I haven’t confirmed that.
NGA CQ5 Other Useful Nuggets
• Avoid using custom namespaces if possible – they cannot be removed from the JCR once
created.
• Avoid creating custom node types for the purpose of simplifying JCR queries – our
performance tests show that using attribute / value pairs is usually even faster than a query
specifying a particular node type.