Java for High Performance Computing - Grid-SAFE

GridSafe Overview
Stephen Booth
EPCC
Grid-SAFE
• JISC funded project to build general purpose
accounting/monitoring solution.
– http://gridsafe.forge.nesc.ac.uk/
• Builds on accounting subsystem from SAFE user
administration system used by UK national facilities
HPCx/HECToR
Challenges
• Need to work with different HPC technologies
– Different batch systems
– Different middleware
• Need to work with wide variety of different local policies.
• Need to work with both grids and local HPC resources.
• One solution won’t fit all potential users
– Build kit of parts
– Pre-built solutions for common deployment scenarios.
• Key aims
– Modular design, individual functions can be deployed independently
– Behaviour can be customised using plug-ins to implement different
service policies.
Overview
Data Formats
• System can consume accounting data in a variety of formats.
• Each format has a plug-in parser module
• New formats can be supported by writing additional parser
plug-ins.
• Data is stored in an SQL database.
• Additional policy plug-ins can augment the parser to
customise behaviour.
[Diagram: Raw Data → Parser → Policy → Policy → Policy → DB]
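The flow in the diagram (raw data, then a parser, then a chain of policies, then the database) can be sketched roughly as follows. This is only an illustration: the names `RecordParser`, `RecordPolicy`, `RecordStore` and `AccountingPipeline` are hypothetical and are not taken from the Grid-SAFE code base, the "user,cpus,seconds" input format is a toy, and an in-memory list stands in for the SQL database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in interfaces; the real Grid-SAFE API may look different.
interface RecordParser {
    /** Turns one raw line into a property map, or returns null to skip the line. */
    Map<String, Object> parse(String rawLine);
}

interface RecordPolicy {
    /** May add or modify properties before the record is stored. */
    void apply(Map<String, Object> record);
}

/** Stand-in for the SQL database: simply collects the processed records. */
class RecordStore {
    final List<Map<String, Object>> rows = new ArrayList<>();

    void insert(Map<String, Object> record) {
        rows.add(record);
    }
}

class AccountingPipeline {
    private final RecordParser parser;
    private final List<RecordPolicy> policies;
    private final RecordStore store;

    AccountingPipeline(RecordParser parser, List<RecordPolicy> policies, RecordStore store) {
        this.parser = parser;
        this.policies = policies;
        this.store = store;
    }

    /** Parses each line, runs every policy in order, then stores the record. */
    void load(Iterable<String> rawLines) {
        for (String line : rawLines) {
            Map<String, Object> record = parser.parse(line);
            if (record == null) {
                continue; // line not recognised by this parser
            }
            for (RecordPolicy policy : policies) {
                policy.apply(record);
            }
            store.insert(record);
        }
    }
}

class PipelineDemo {
    /** Loads three lines of a toy "user,cpus,seconds" format; one is malformed. */
    static RecordStore run() {
        RecordParser toyParser = line -> {
            String[] f = line.split(",");
            if (f.length != 3) {
                return null;
            }
            Map<String, Object> r = new HashMap<>();
            r.put("user", f[0]);
            r.put("cpus", Integer.parseInt(f[1]));
            r.put("seconds", Long.parseLong(f[2]));
            return r;
        };
        // A policy that adds a new property computed from the parsed ones.
        RecordPolicy cpuSeconds =
            r -> r.put("cpuSeconds", (Integer) r.get("cpus") * (Long) r.get("seconds"));

        RecordStore store = new RecordStore();
        new AccountingPipeline(toyParser, Arrays.asList(cpuSeconds), store)
            .load(Arrays.asList("alice,2,3600", "malformed", "bob,4,100"));
        return store;
    }
}
```

Supporting a new input format in this sketch means supplying another `RecordParser`; the policies and the store are unchanged.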
Parser
• System can support multiple input formats at the same time.
• Current supported parsers
– OGF-UR XML
– SGE accounting logfile
– PBS accounting logfile
– EGEE JobManager logfile
– Etc.
• New parsers are easy to add
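As an illustration of how small a format parser can be, here is a sketch for a single PBS accounting log line. It assumes the commonly documented layout `timestamp;record_type;job_id;key=value key=value ...` and ignores values containing spaces. The class name `PbsLineParser` and the property names `timestamp`, `recordType` and `jobId` are hypothetical, not Grid-SAFE identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PbsLineParser {
    /**
     * Parses one accounting line into a property map.
     * Returns null if the line does not have the four ';'-separated parts.
     */
    static Map<String, String> parse(String line) {
        // Limit of 4 keeps any ';' inside the message text intact.
        String[] parts = line.split(";", 4);
        if (parts.length < 4) {
            return null;
        }
        Map<String, String> record = new LinkedHashMap<>();
        record.put("timestamp", parts[0]);
        record.put("recordType", parts[1]);
        record.put("jobId", parts[2]);
        // The message text is a whitespace-separated list of key=value pairs.
        for (String token : parts[3].trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                record.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return record;
    }
}
```

For example, `PbsLineParser.parse("03/15/2009 10:12:01;E;1234.server;user=alice queue=batch")` yields a map in which `jobId` is `1234.server` and `user` is `alice`.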
OGF-UR support
• OGF-UR XML is supported as an interchange format
– Parser plug-in to parse OGF-UR
– Export module to format internal data as OGF-UR
• Grids may want to use only this format for central accounting.
– Local instances could use raw data and generate UR for central processing.
• Various grid communities seem to interpret OGF-UR differently and/or
impose additional requirements beyond those in the schema
– Required fields
– Different charging models
– Different global username models
– OGF-UR spec allows extensions.
– Specification will also evolve over time.
• Parser/exporter highly configurable to support variations/extensions.
Use in the grid
[Diagram: Site accounting and an Independent UR Generator each pass XML to Grid accounting]
Report generation module
• Reports can be generated on demand from web interface
• Grid-SAFE uses XML templates to define reports
– Can generate unified reports over multiple data tables containing
different types of data
– Tables/charts
– Parameterised reports (e.g. to select user or project).
• Supports reports in multiple output formats
– PDF, HTML, CSV, XML
Report generation speed
• Performance of report generation is a particular issue
• The number of database records is key to this.
– Need to utilise database effectively. Not acceptable to read all records
into memory.
• ~1,000,000 record database table not a problem.
– Current National HPC systems within this range.
– Throughput clusters often have significantly larger record counts due
to large numbers of small short jobs.
• Old data can be moved to separate tables.
• Support for daily aggregates via policy plug-in
– Builds secondary accounting table combining similar records.
– For ECDF 51 million records -> 35 thousand aggregates
Policy plug-ins
• Allow behaviour to be customised to local requirements
• Generate new properties
– E.g. charge values
• Trigger additional processing
– Decrement charging allocations
– Generate aggregate records
– Etc.
• New policies can be written for specific requirements
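A policy that both generates a new property and triggers additional processing might look roughly like this. It is a sketch under assumptions: the class `ChargingPolicy`, the property names `cpus`, `wallSeconds`, `project` and `charge`, and the flat rate per CPU-hour are all hypothetical. Real charging models vary from site to site.

```java
import java.util.HashMap;
import java.util.Map;

class ChargingPolicy {
    private final double ratePerCpuHour;
    // Remaining budget per project (hypothetical in-memory stand-in for a DB table).
    private final Map<String, Double> allocations;

    ChargingPolicy(double ratePerCpuHour, Map<String, Double> allocations) {
        this.ratePerCpuHour = ratePerCpuHour;
        this.allocations = allocations;
    }

    /** Adds a "charge" property and decrements the project's allocation. */
    void apply(Map<String, Object> record) {
        double cpus = ((Number) record.get("cpus")).doubleValue();
        double wallSeconds = ((Number) record.get("wallSeconds")).doubleValue();
        double charge = cpus * wallSeconds / 3600.0 * ratePerCpuHour;

        // Generate a new property on the record.
        record.put("charge", charge);

        // Trigger additional processing: decrement the charging allocation.
        // A project with no allocation entry ends up with a negative balance.
        String project = (String) record.get("project");
        allocations.merge(project, -charge, Double::sum);
    }
}
```

With a rate of 2.0 per CPU-hour, a 4-CPU job that runs for 1800 seconds uses 2 CPU-hours, so the charge is 4.0 and an allocation of 100.0 drops to 96.0.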
Aggregation Policy
• Generates aggregated records
– Each time a new record is loaded
– Corresponding aggregate is located/created
– Aggregate values updated
• The raw data is also kept and can be used in reports if
required.
• Aggregate data can be regenerated if required.
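The locate-or-create-then-update step can be sketched as below. This assumes aggregates are keyed by UTC day and user and hold a job count and a CPU-seconds total. The key choice, the class name `DailyAggregator` and the in-memory map (in place of the secondary database table) are illustrative assumptions only.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

class DailyAggregator {
    /** One aggregate row: combines all similar records for a day. */
    static final class Aggregate {
        long jobs;
        long cpuSeconds;
    }

    private final Map<String, Aggregate> aggregates = new HashMap<>();

    private static String key(String user, LocalDate day) {
        return day + "|" + user;
    }

    /** Called each time a new raw record is loaded. */
    void add(String user, long endEpochSeconds, long cpuSeconds) {
        LocalDate day = Instant.ofEpochSecond(endEpochSeconds)
                               .atZone(ZoneOffset.UTC)
                               .toLocalDate();
        // Locate the corresponding aggregate, creating it if necessary.
        Aggregate a = aggregates.computeIfAbsent(key(user, day), k -> new Aggregate());
        // Update the aggregate values.
        a.jobs++;
        a.cpuSeconds += cpuSeconds;
    }

    Aggregate get(String user, LocalDate day) {
        return aggregates.get(key(user, day));
    }

    int size() {
        return aggregates.size();
    }
}
```

Because the aggregates are derived purely from the raw records, replaying the raw records through `add` on an empty aggregator regenerates them.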
ClassificationPolicy
• Converts selected fields from raw accounting data into
references to separate database table.
– Reduces data footprint.
– Augmenting information can be added to these tables.
• Example tables (from diagram): URRecord, User, Institution,
UnixGroup, Site, DailyAggregate
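The idea of replacing a repeated string field with a reference into a separate table can be sketched as follows. The names `Classifier` and `ClassificationDemo` are hypothetical, and an in-memory list stands in for the referenced database table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for one classification table, e.g. User or Institution. */
class Classifier {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    /** Returns the id for a name, creating a new table entry on first use. */
    int idFor(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = names.size();
            names.add(name);
            ids.put(name, id);
        }
        return id;
    }

    String nameOf(int id) {
        return names.get(id);
    }

    int size() {
        return names.size();
    }
}

class ClassificationDemo {
    /** Replaces the string value of one field by an integer reference. */
    static void classify(Map<String, Object> record, String field, Classifier table) {
        Object raw = record.get(field);
        if (raw instanceof String) {
            record.put(field, table.idFor((String) raw));
        }
    }
}
```

Each distinct value is stored once however many records refer to it, and extra columns (for example a user's institution) can be attached to the table entry rather than to every record.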
DerivedPolicy
• Defines new properties as expressions over existing
properties
• E.g. (EndTime-StartTime)*CPUs
• These expressions can then be used in reports.
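A derived property such as (EndTime-StartTime)*CPUs can be sketched as a named expression evaluated over a record's existing properties. Here the expressions are written as Java lambdas for brevity, which is a swapped-in technique: the slides do not say how Grid-SAFE defines or parses expressions. The class name `DerivedProperties` and the property name `CPUSeconds` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

class DerivedProperties {
    // Insertion order is kept so later definitions can use earlier ones.
    private final Map<String, ToDoubleFunction<Map<String, Double>>> definitions =
        new LinkedHashMap<>();

    /** Registers a new property defined as an expression over existing ones. */
    void define(String name, ToDoubleFunction<Map<String, Double>> expression) {
        definitions.put(name, expression);
    }

    /** Evaluates every definition and stores the results on the record. */
    void apply(Map<String, Double> record) {
        for (Map.Entry<String, ToDoubleFunction<Map<String, Double>>> e
                : definitions.entrySet()) {
            record.put(e.getKey(), e.getValue().applyAsDouble(record));
        }
    }
}
```

Usage: after `define("CPUSeconds", r -> (r.get("EndTime") - r.get("StartTime")) * r.get("CPUs"))`, a record with StartTime 100, EndTime 160 and CPUs 8 gains CPUSeconds = 60 × 8 = 480.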
LinkPolicy
• Merge data from different sources
– E.g. Batch system logs and middleware logs.
• Each data source is parsed to its own table.
– Primary table parsed first.
– LinkPolicy added to secondary data source.
– Locates the corresponding primary record.
– Adds a cross reference or copies additional properties to the primary record.
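The linking step can be sketched as a lookup of the primary record followed by a copy of selected properties. This assumes both data sources share a `jobId` property to match on. That key, the class name `LinkSketch` and the in-memory index (in place of a database query on the primary table) are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class LinkSketch {
    // Index over the primary table, e.g. batch system records.
    private final Map<String, Map<String, Object>> primaryByJobId = new HashMap<>();

    /** The primary data source is loaded first. */
    void addPrimary(Map<String, Object> record) {
        primaryByJobId.put((String) record.get("jobId"), record);
    }

    /**
     * Locates the primary record for a secondary record (e.g. a middleware
     * log entry) and copies the named properties across.
     * Returns false if no corresponding primary record exists.
     */
    boolean link(Map<String, Object> secondary, String... propertiesToCopy) {
        Map<String, Object> primary = primaryByJobId.get((String) secondary.get("jobId"));
        if (primary == null) {
            return false;
        }
        for (String name : propertiesToCopy) {
            if (secondary.containsKey(name)) {
                primary.put(name, secondary.get(name));
            }
        }
        return true;
    }
}
```

Storing the primary record's identifier on the secondary record instead of copying properties would give the cross-reference variant.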
Web Services
• RUPI
– Current proposal from OGF RUS-WG
– Web service for the upload of XML usage records.
– Grid-SAFE has an implementation of the current upload service
(RUPI).
• RUQI
– Currently working on a proposal for a Query specification
– Aims:
  – Easy to implement in different code bases.
  – Provide sufficient functionality for efficient report generation.
– Long term aim to provide reporting portal that can query any
system that implements this interface.