Leveraging Database Technologies in Condor
Jeff Naughton
April 25, 2006
Overview
› Introducing ourselves
› What we have done since last year
• Obtained funding (Yay! thank you NSF!)
• Quill: deployed DB-centric data tool
• Quill++: more comprehensive, deployed in test cluster, running (guinea pig) user jobs
• Condor J2EE: radical departure experimental system, deployed last week in test cluster
• Published some research papers…
› BOF 1:30 on Thursday
Who we are
› Faculty: David DeWitt, Jeff Naughton
› Students: Jiansheng Huang, Ameet Kini, Christine Reilly, Eric Robinson, Srinath Shankar, Lakshmikant Shrinivas
How do we fit in?
› Advanced Development/Research group focused on data management.
› Goal:
• Interact frequently with Condor Dev. Team and users
• Design and prototype new technology
• Transfer to Condor team for deployment
› What we don’t do:
• Determine roadmap and schedules for
deployment within Condor.
Why Condor and DBMS?
› Premise: A running Condor system is awash in data:
• Operational data, Historical data, User data
› DBMS technology can help capture, organize, manage, archive, and query this data.
› This can make Condor even more powerful, usable, and useful.
Quill
› Non-invasive approach to capturing job related information
› Works by sniffing updates to the job queue log
› Serves condor_q and condor_history queries
› Independent, reliable, and efficient querying of job related information, with underlying SQL interface
So how does it work?
Quill Architecture
[Diagram. Labels: Master, Startd, …, Schedd; Job Queue log; Quill; RDBMS; Queue + History Tables]
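The log-sniffing idea described for Quill can be sketched roughly as follows. This is a minimal illustration in Python, not Quill's implementation: the three-record log format (NEW, SET, DONE), the table layouts, and the function names make_db, apply_log_lines, and sniff are all assumptions made for the example, and SQLite stands in for the RDBMS so the sketch is self-contained.

```python
import sqlite3


def make_db():
    """Create in-memory queue and history tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queue (job_id TEXT PRIMARY KEY, owner TEXT, status TEXT)")
    conn.execute("CREATE TABLE history (job_id TEXT PRIMARY KEY, owner TEXT)")
    return conn


def apply_log_lines(conn, lines):
    """Apply log records to the tables; return how many records were applied.

    Assumed, simplified record format (NOT Condor's real job queue log format):
        NEW <job_id> <owner>
        SET <job_id> status <value>
        DONE <job_id>
    """
    applied = 0
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "NEW":
            conn.execute(
                "INSERT INTO queue (job_id, owner, status) VALUES (?, ?, 'idle')",
                (parts[1], parts[2]),
            )
        elif len(parts) == 4 and parts[0] == "SET" and parts[2] == "status":
            conn.execute(
                "UPDATE queue SET status = ? WHERE job_id = ?", (parts[3], parts[1])
            )
        elif len(parts) == 2 and parts[0] == "DONE":
            # A finished job moves from the queue table to the history table.
            conn.execute(
                "INSERT INTO history (job_id, owner) "
                "SELECT job_id, owner FROM queue WHERE job_id = ?",
                (parts[1],),
            )
            conn.execute("DELETE FROM queue WHERE job_id = ?", (parts[1],))
        else:
            continue  # ignore records this sketch does not model
        applied += 1
    conn.commit()
    return applied


def sniff(conn, log_path, offset=0):
    """Read records appended to the log since `offset`; return the new offset."""
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # consume complete lines only
    apply_log_lines(conn, data[:end].decode().splitlines())
    return offset + end
```

A poller would call sniff periodically, passing back the offset it returned last time. Once the data is in tables, a condor_q-style question becomes an ordinary query, for example SELECT job_id, status FROM queue WHERE owner = 'alice'.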
Quill++
› More comprehensive than Quill (data from all daemons, not just SchedD)
› Built on Quill code base
› Condor daemons write to SQL logs, Quill daemon reads and inserts in DBMS
› Central database serves entire pool
› Web-based query GUI
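The write-then-load split between daemons and the Quill daemon might look something like the sketch below. It assumes, purely for illustration, that an SQL log is a text file holding one SQL statement per line; the function names log_event and load_sql_log and the events table are hypothetical, and SQLite is used in place of the real DBMS.

```python
import sqlite3


def log_event(sql_log_path, statement):
    """Daemon side: append one SQL statement per line. No database connection is needed."""
    with open(sql_log_path, "a") as f:
        f.write(statement.replace("\n", " ") + "\n")


def load_sql_log(conn, sql_log_path):
    """Loader side: replay the logged statements in one transaction, then truncate the log.

    Returns the number of statements replayed. A real loader would also have to
    coordinate with daemons that append while it reads; this sketch ignores that.
    """
    with open(sql_log_path) as f:
        statements = [line.strip() for line in f if line.strip()]
    with conn:  # commits on success, rolls back if any statement fails
        for stmt in statements:
            conn.execute(stmt)
    open(sql_log_path, "w").close()  # truncate only after the commit succeeded
    return len(statements)
```

One consequence of this design is that a daemon only ever touches a local file, so it keeps working when the database is unreachable; the log simply grows until a loader drains it.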
Data Capture in Quill++
› Condor daemons augmented to record important events in a database
› Database is in addition to standard daemon logs
› Pool will run unaffected even in the absence of a database
[Diagram. Labels: Schedd, Shadow, Startd, Starter, Negotiator; A Machine]
Quill++ Architecture
[Diagram. Labels: Master, Startd, …, Schedd; Event logs; Job Queue log; Quill++; RDBMS; Queue, History, Machine, Match etc.]
Implementation Details
› Quill++: First-class Condor daemon
• Managed by Condor Master
• Native PostgreSQL API
• Can be ported to any platform for which PostgreSQL drivers are available (AIX, BSD, IRIX, HP-UX, Linux, Solaris, Windows etc.)
• Porting Quill++ to other databases involves implementing a database virtual class
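The "database virtual class" approach to portability can be illustrated with an abstract base class. The sketch below is in Python rather than the daemon's own language, and the interface (connect, execute, commit), the SQLiteDatabase backend, and the record_job helper are all hypothetical names chosen for the example; they are not Quill++'s actual class or methods.

```python
import sqlite3
from abc import ABC, abstractmethod


class Database(ABC):
    """Hypothetical backend interface; each supported DBMS supplies one subclass."""

    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def execute(self, sql, params=()): ...

    @abstractmethod
    def commit(self): ...


class SQLiteDatabase(Database):
    """Example backend using SQLite; a PostgreSQL backend would implement the same methods."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None

    def connect(self):
        self.conn = sqlite3.connect(self.path)

    def execute(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()

    def commit(self):
        self.conn.commit()


def record_job(db, job_id, owner):
    """Caller code depends only on the Database interface, not on a specific DBMS."""
    db.execute("INSERT INTO jobs (job_id, owner) VALUES (?, ?)", (job_id, owner))
    db.commit()
```

Note that parameter placeholder syntax differs between drivers, so a real abstraction layer would need to hide that as well.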
Web Interface
› Useful for:
• User job monitoring
• Administrative monitoring over jobs and
resources
• Debugging
Condordb Admin Screen
Jobs in queue
History jobs
Machine Status
Recency summary
Job history by owner
Machine Report
Status about a job
Classad Info
Run Info
Event Info
Match Info
Rejects Info
Recency info for exceptional data sources
Present Status
› Deployed in testbed
• dbc cluster (93 machines)
• Has successfully run almost 100,000 jobs.
• Working with the Condor team to plan future distribution with Condor.
Caveats
› Web interface to DB
• Basic prototype implemented
• Needs to be made more robust and user-friendly (!)
› Gathers incomplete information in multiple pool scenarios (flocking, glide-in, Condor-C)
CondorJ2
› To boldly go where no one has gone before
• Quill/Quill++: Database reflects state of Condor pool
• CondorJ2: Database is the state of Condor pool
› Overview of CondorJ2
• Use database to maintain operational data (workflow
state, machine state, config policies, etc.)
• Implement workflow management, resource management
and resource allocation in Application Server environment
• Modify master, startd and starter to be thin web service
clients
• Provide web interface for all system services (workflow
submission, machine reconfiguration etc.)
Motivation
› Scalability
› Flexibility
› Administrability
Java Application Servers
› Industrial strength middleware for high performance & scalable web applications
› Widely deployed systems
• Oracle AS 10g, IBM WebSphere, BEA WebLogic, JBoss (open source)
› Key features
• Support for transactions
• Web service interfaces
• Support for clustering (for scalability)
• Configurable security
• Backend database independence
[Diagram. Labels: Condor Database; JDBC; Application Server (Machine Modules, Matchmaking Modules, Workflow Modules); Condor Pool Web Site; HTTP; User’s Web Browser; Condor Web Services; SOAP over HTTP; User’s Custom Tools; Web Service Clients (master, startd, starter); Execute Machines]
What can you do in CondorJ2 via browsers and web services?
› Add and configure new machines
› Reconfigure machines on the fly
› Specify, submit, monitor and manage
workflows
› Monitor global system state
Virtuous Cycle
› As we learn where Condor can use DBMS
technology, we learn where DBMS
technology can be (must be?) improved.
• Support for sparse data sets [ICDE 2006].
• Pushing match-making style operations into
DBMS [SIGMOD 2006].
• Data provenance as byproduct of Quill++ data
capture. [IPAW 2006]
› Improving DBMS technology will lead to
more places that it can be installed.
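To give a flavor of what a match-making style operation inside a DBMS could look like, here is a deliberately simple sketch that expresses job-to-machine matching as a SQL join. It is not the technique from the cited papers; the jobs and machines schemas, the column names, and the functions make_pool_db and find_matches are all assumptions for illustration, with SQLite as a stand-in DBMS.

```python
import sqlite3


def make_pool_db():
    """Create hypothetical jobs and machines tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE machines (name TEXT, arch TEXT, memory_mb INTEGER, state TEXT)"
    )
    conn.execute(
        "CREATE TABLE jobs (job_id INTEGER, req_arch TEXT, req_memory_mb INTEGER)"
    )
    return conn


def find_matches(conn):
    """Return (job_id, machine name) pairs where an unclaimed machine meets the job's needs."""
    return conn.execute(
        """
        SELECT j.job_id, m.name
        FROM jobs AS j
        JOIN machines AS m
          ON m.arch = j.req_arch
         AND m.memory_mb >= j.req_memory_mb
        WHERE m.state = 'unclaimed'
        ORDER BY j.job_id, m.memory_mb DESC
        """
    ).fetchall()
```

Here the job's requirements are fixed columns compared in the join condition. Requirements written as arbitrary expressions are harder to evaluate this way, which is one reason such operations are a research topic rather than a plain query.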
Other ongoing work…
› File caching in Condor pools
› Techniques for explaining data
consistency rather than dictating
consistency
› Automatic monitoring of system
“health” by mining captured data
Visit us and see demos!
› Come see demos of Quill, Quill++, and
CondorJ2 in Rm. 216/218 Fluno
Center on Thurs. afternoon 1:30 –
2:30pm.