Large Instance Points

16th Eurofiling Workshop
Wednesday 12 December
Herm Fischer
Mark V Systems Limited and
Arelle open source XBRL processor
Study results
•RAM consumption vs. instance size
•Instance XML to DOM (XML only) 1 : 3 to 1 : 10
•Instance XBRL to formula processor 1 : 60
•Instance XBRL to SAX object model 1 : 15
•Conclusion
•Constant-memory streaming approach suggested
•Non-XML technologies eventually required
US SEC form “SD”
•Mining and oil exploration “payout” details
•Sample size:
•Instance: 21.7 MB
•Every filer uses extension taxonomy
•150,006 facts
•21,001 contexts
•5 units
•0 footnotes
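As a rough worked example, the RAM ratios from the study results can be applied to the 21.7 MB sample instance. This is only arithmetic on the two quoted figures. Whether the ratios were measured on this particular sample is an assumption, and the function and label names are illustrative.

```python
# Ratios quoted on the "Study results" slide, read as instance size : RAM.
# Assumption: they scale linearly and apply to the 21.7 MB sample.
RATIOS = {
    "XML DOM (low)": 3,
    "XML DOM (high)": 10,
    "SAX object model": 15,
    "Formula processor": 60,
}


def estimate_ram_mb(instance_mb):
    """Multiply the instance size by each quoted ratio."""
    return {name: round(instance_mb * ratio, 1) for name, ratio in RATIOS.items()}


estimates = estimate_ram_mb(21.7)
for name, mb in estimates.items():
    print(f"{name}: ~{mb} MB")
```

Under these assumptions the formula processor would need roughly 1.3 GB for an instance of about 22 MB, which is the motivation for the constant-memory approach.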
Streaming XBRL in XML syntax
•Base spec 2.1 working group note (WGN) & task force
•Compatible organization of instance for streaming
•SAX vs DOM improves speed, no XML persistence
•Constant memory usage
•Use for 2.1 & XDT validation
•Challenges for
•Financial validation (~GFM)
•Formula processing
•XPath (XML node access)
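The SAX idea above can be sketched with Python's standard `xml.sax` module: the handler sees one element at a time and keeps only counters, so no XML tree is retained. This is a minimal illustration, not how Arelle implements streaming. The sample instance, the `ex:` namespace, and the rule "a top-level element with a `contextRef` attribute is a fact" are simplifying assumptions (facts nested in tuples are ignored).

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

XBRLI = "http://www.xbrl.org/2003/instance"


class CountingHandler(xml.sax.ContentHandler):
    """Counts contexts, units and top-level facts without building a tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.contexts = 0
        self.units = 0
        self.facts = 0

    def startElementNS(self, name, qname, attrs):
        self.depth += 1
        if self.depth != 2:          # only direct children of <xbrli:xbrl>
            return
        ns, local = name
        if ns == XBRLI and local == "context":
            self.contexts += 1
        elif ns == XBRLI and local == "unit":
            self.units += 1
        elif (None, "contextRef") in attrs:   # assumption: this marks a fact
            self.facts += 1

    def endElementNS(self, name, qname):
        self.depth -= 1


def count_items(stream):
    handler = CountingHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler


# Hypothetical miniature instance, for illustration only.
SAMPLE = (
    b'<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"'
    b' xmlns:ex="http://example.com/ext">'
    b'<xbrli:context id="c1"><xbrli:period>'
    b'<xbrli:instant>2012-12-31</xbrli:instant>'
    b'</xbrli:period></xbrli:context>'
    b'<xbrli:unit id="u1"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>'
    b'<ex:Payment contextRef="c1" unitRef="u1">100</ex:Payment>'
    b'<ex:Payment contextRef="c1" unitRef="u1">250</ex:Payment>'
    b'</xbrli:xbrl>'
)

result = count_items(io.BytesIO(SAMPLE))
print(result.contexts, result.units, result.facts)
```

Memory use here depends on the largest single element, not on the size of the instance. The challenges listed above remain: nothing in this handler supports XPath-style node access.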
XBRL streamability issues
•Order
•Freedom to order facts, contexts, units, footnotes
•XML syntax detail
•Formula & Table XPath to nodes and XML structure
•Validation and formula strategies
•Designed for complete instance in memory
•Complex fallback and existence strategies
XBRL streaming approach
•Constant memory
•Backwards compatibility
•Order constrained within instances
•Contexts/units located as needed in instance
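One way to picture the ordering constraint is a single-pass check that every fact refers only to contexts and units that have already appeared. The sketch below uses `xml.etree.ElementTree.iterparse` and discards each top-level element after inspecting it. It is an assumption about what "streamable order" could mean, not the rule from the working group note. It also keeps the set of seen IDs, so memory is small but not strictly constant.

```python
import io
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"


def check_streamable_order(stream):
    """Return (fact tag, kind, id) for each reference to a not-yet-seen item."""
    seen = {"context": set(), "unit": set()}
    problems = []
    root = None
    depth = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem
            continue
        depth -= 1
        if depth != 1:               # only completed children of the root
            continue
        if elem.tag == XBRLI + "context":
            seen["context"].add(elem.get("id"))
        elif elem.tag == XBRLI + "unit":
            seen["unit"].add(elem.get("id"))
        else:
            for kind, attr in (("context", "contextRef"), ("unit", "unitRef")):
                ref = elem.get(attr)
                if ref is not None and ref not in seen[kind]:
                    problems.append((elem.tag, kind, ref))
        root.clear()                 # drop processed children to bound memory
    return problems


# Hypothetical instance in which context c2 appears after the fact using it.
SAMPLE = (
    b'<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"'
    b' xmlns:ex="http://example.com/ext">'
    b'<xbrli:context id="c1"/>'
    b'<xbrli:unit id="u1"/>'
    b'<ex:Payment contextRef="c1" unitRef="u1">100</ex:Payment>'
    b'<ex:Payment contextRef="c2" unitRef="u1">250</ex:Payment>'
    b'<xbrli:context id="c2"/>'
    b'</xbrli:xbrl>'
)

problems = check_streamable_order(io.BytesIO(SAMPLE))
print(problems)
```

An instance that passes this check is still an ordinary XBRL 2.1 instance, which is the sense in which the ordering constraint stays backwards compatible.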
Financial Validation EFM/GFM
•Full object model analysis in memory for
•Context, unit, fact duplications (could use hashes)
•Fact cross-dimension analysis (only some concepts)
•History of concepts used
•Roll ups, roll forwards, aggregations
•Full DTS model in memory for
•Concept issues, label/definition issues
•Missing / improper calculations, roll-ups, roll-forwards
•Can be re-architected for streaming environment
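The "could use hashes" remark for duplicate detection can be illustrated as follows: instead of holding full fact objects, keep a fixed-size digest per fact key. This is a simplified sketch. It keys on concept name, context ID and unit ID, whereas a real EFM/GFM check would compare context and unit contents rather than IDs. The function names and sample facts are hypothetical.

```python
import hashlib


def fact_key(concept, context_ref, unit_ref):
    """16-byte digest standing in for the full (concept, context, unit) key."""
    raw = "\x1f".join((concept, context_ref, unit_ref or "")).encode("utf-8")
    return hashlib.blake2b(raw, digest_size=16).digest()


def find_duplicates(facts):
    """Yield-free helper: return keys of facts seen more than once.

    `facts` is any iterable of (concept, context_ref, unit_ref, value),
    so it can be fed directly from a streaming parser.
    """
    seen = set()
    duplicates = []
    for concept, context_ref, unit_ref, _value in facts:
        key = fact_key(concept, context_ref, unit_ref)
        if key in seen:
            duplicates.append((concept, context_ref, unit_ref))
        else:
            seen.add(key)
    return duplicates


# Hypothetical facts; the third repeats the key of the first.
facts = [
    ("ex:Payment", "c1", "u1", "100"),
    ("ex:Payment", "c2", "u1", "250"),
    ("ex:Payment", "c1", "u1", "100"),
]
duplicates = find_duplicates(facts)
print(duplicates)
```

The set still grows with the number of distinct facts, but at 16 bytes of digest per fact rather than a full object per fact.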
Global Ledger Architecture
•Multiple content models required in parallel
•Transactions
•Company data
•Account data
•Reformulate for independent streams or persistent company/account models
Formula Response
•Define subset of XPath for streamed processor
•No node-axis features (“/” or “[” operators)
•More functions for context, typed dimensions, etc
•Should allow use of non-XML implementations
•Define subset of formula processing
•Consider SQL infrastructures
•Consider OLAP features
•Reconsider use of features like
•Fallbacks
•Multiple instances of large data sets
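To make the "consider SQL infrastructures" point concrete, here is a small sketch of a roll-up check expressed as a SQL query rather than as XPath over XML nodes. It uses Python's built-in `sqlite3` with an in-memory database. The table layout, concept names, values, and tolerance are all hypothetical, and this is not a proposal for how a formula subset would actually be specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (concept TEXT, context TEXT, value REAL)")
conn.execute("CREATE TABLE rollup (parent TEXT, child TEXT)")

# Hypothetical facts: ctx1 adds up, ctx2 does not (90 reported vs 80 summed).
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("ex:TotalPayments", "ctx1", 100.0),
        ("ex:Royalties", "ctx1", 60.0),
        ("ex:Taxes", "ctx1", 40.0),
        ("ex:TotalPayments", "ctx2", 90.0),
        ("ex:Royalties", "ctx2", 50.0),
        ("ex:Taxes", "ctx2", 30.0),
    ],
)
conn.executemany(
    "INSERT INTO rollup VALUES (?, ?)",
    [("ex:TotalPayments", "ex:Royalties"), ("ex:TotalPayments", "ex:Taxes")],
)

# For each reported total, sum its children in the same context and
# report the rows where the two differ by more than a small tolerance.
MISMATCH_SQL = """
SELECT p.context, p.value AS reported, SUM(c.value) AS computed
FROM facts AS p
JOIN rollup AS r ON r.parent = p.concept
JOIN facts AS c ON c.concept = r.child AND c.context = p.context
GROUP BY p.concept, p.context, p.value
HAVING ABS(p.value - SUM(c.value)) > 0.005
"""

mismatches = conn.execute(MISMATCH_SQL).fetchall()
print(mismatches)
```

The query never touches XML structure, which is the kind of non-XML implementation the XPath subset is meant to allow.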
Abstract Model Response
•Abstract model is based on OMG MOF & CWM
•Abstracted XBRL semantics from syntax
•Implementation will layer on XML as well as
•CWM (OLAP supportive technologies)
•Next step is a pilot project
•Implement abstract model demonstration
•Evolve and tweak specs
•Provide prototype implementation