Group Detection and Link Analysis

The Condor DB Group Report
Jiansheng Huang, Ameet Kini, Shrinivas Lakshmikant, Erik Paulson, Christine Reilly, Eric Robinson, Srinath Shankar, David DeWitt, Jeff Naughton

Overview
General overview of group projects (Naughton).
Quill (Paulson).

Condor DB Group
Overall task: focus on data management aspects of Condor.
Deliver prototypes of useful technology.
Explore, develop, and evaluate technology that may be useful to Condor down the road.

Projects other than Quill
Provenance in a Condor system.
Statistical mining of log data to evaluate system health.
Interaction of user data placement, caching, and workflow job scheduling.
Job-machine matching in a DB context.
Condor functionality based on app-server technology.
Recency and consistency in captured data.

Provenance and Condor
Christine Reilly ([email protected]).
Provenance: information on how data was produced.
Observation: for each user job, Condor can record:
which version of the program(s) was used;
which version of the data was used;
when it was produced;
what system it ran on (hardware, software).
Questions:
How much information should we gather?
How much burden should we place on the system designer, the application programmer, or both?
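To make the list above concrete, here is one way the per-job record could look as a small immutable type. This is a minimal sketch only: the class name `ProvenanceRecord`, its fields, and the helper `record_job` are hypothetical and are not part of Condor.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical per-job provenance record; field names are illustrative."""
    job_id: str               # e.g. "23.2" (cluster.proc)
    program_versions: tuple   # which version(s) of the program(s) were used
    data_versions: tuple      # which version(s) of the input data were used
    produced_at: datetime     # when the output was produced
    system: dict = field(default_factory=dict)  # hardware/software it ran on


def record_job(job_id, programs, inputs, system):
    """Build a record, stamping it with the current UTC time."""
    return ProvenanceRecord(
        job_id=job_id,
        program_versions=tuple(programs),
        data_versions=tuple(inputs),
        produced_at=datetime.now(timezone.utc),
        system=dict(system),
    )


rec = record_job("23.2", ["sim-1.4"], ["input.dat@v7"],
                 {"arch": "X86_64", "opsys": "LINUX"})
print(rec.program_versions, rec.system["opsys"])  # ('sim-1.4',) LINUX
```

How many of these fields to fill in, and whether Condor or the application supplies them, is exactly the open question on the slide.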

Debugging through log mining
Srinivas Lakshmikant ([email protected])
Idea: record “events,” each logically associated with an entity (e.g., job entities start, get scheduled, run, and terminate).
Find which entities have infrequent events.
Find which entities lack frequent events.
Can you use this to detect problems?
Early results suggest yes: the approach finds and pinpoints problems that might not be found otherwise.
How can you increase accuracy and efficiency over naïve approaches?
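A naïve version of the idea fits in a few lines. Count how many entities exhibit each event type, then flag entities that show a rare event type or lack a common one. This is a sketch under stated assumptions: the `(entity, event)` input format, the function name `flag_entities`, and the thresholds `rare=0.1` and `common=0.9` are illustrative and are not the project's method.

```python
from collections import Counter, defaultdict


def flag_entities(events, rare=0.1, common=0.9):
    """Naive rare/missing-event detector (illustrative thresholds).

    events: iterable of (entity_id, event_type) pairs.
    An event type is "rare" if it appears in fewer than `rare` of all
    entities, and "common" if it appears in at least `common` of them.
    Returns {entity: (rare_events_seen, common_events_missing)} for
    entities with at least one anomaly.
    """
    by_entity = defaultdict(set)
    for entity, event in events:
        by_entity[entity].add(event)

    n = len(by_entity)
    # Number of entities in which each event type occurs at least once.
    support = Counter(e for evs in by_entity.values() for e in evs)
    rare_types = {e for e, c in support.items() if c / n < rare}
    common_types = {e for e, c in support.items() if c / n >= common}

    flagged = {}
    for entity, evs in by_entity.items():
        seen_rare = sorted(evs & rare_types)
        missing_common = sorted(common_types - evs)
        if seen_rare or missing_common:
            flagged[entity] = (seen_rare, missing_common)
    return flagged


# 20 jobs with a normal lifecycle; job3 never terminates, and job7 logs an
# unusual event type.
events = []
for i in range(20):
    job = f"job{i}"
    for ev in ("start", "scheduled", "run", "terminate"):
        if not (job == "job3" and ev == "terminate"):
            events.append((job, ev))
events.append(("job7", "shadow_exception"))

result = flag_entities(events)
print(result)
# {'job3': ([], ['terminate']), 'job7': (['shadow_exception'], [])}
```

Fixed global thresholds like these are the kind of naïve approach the slide's last question refers to. With many event types and noisy logs, they tend to raise false alarms.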

Caching, Scheduling, Workflow
Srinath Shankar ([email protected])
Idea:
Cache input files and intermediate files on the disks of pool machines;
Record where these files are cached;
Schedule the tasks in a workflow to minimize data fetches and moves.
Result: potentially much greater throughput.
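A greedy placement rule illustrates the scheduling idea. Send each task to the machine that already caches the largest share of its inputs, so the fewest bytes must move. This is only a sketch: the greedy rule, the `catalog` structure that records where files are cached, the hostnames, and the file sizes are all hypothetical, and the project's actual scheduling policy may differ.

```python
def bytes_to_fetch(task_inputs, cached, sizes):
    """Bytes a machine must fetch: the inputs it does not already cache."""
    return sum(sizes[f] for f in task_inputs if f not in cached)


def place(task_inputs, catalog, sizes):
    """Pick the machine that needs the fewest bytes moved.

    catalog: {machine: set of cached file names} (where files are cached);
    sizes: {file name: size in bytes}.
    Ties go to the alphabetically first machine, so results are deterministic.
    """
    return min(
        sorted(catalog),
        key=lambda m: bytes_to_fetch(task_inputs, catalog[m], sizes),
    )


sizes = {"genome.fa": 3_000_000_000, "reads_1.fq": 500_000_000, "params.txt": 1_000}
catalog = {
    "vm1.example.edu": {"params.txt"},
    "vm2.example.edu": {"genome.fa"},
    "vm3.example.edu": set(),
}
task = ["genome.fa", "reads_1.fq", "params.txt"]
chosen = place(task, catalog, sizes)
print(chosen)  # vm2.example.edu
```

In a workflow, the task's output would then be recorded in `catalog[chosen]`. Downstream tasks are pulled toward the same machine, which is where the throughput gain would come from.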

Job Matching in a DBMS
Ameet Kini ([email protected])
Idea: matching looks a lot like a DBMS join.
If machine and job data are already stored in a DBMS, can (or should) we use the DBMS to do the matching?
Answer: early results are promising, but this is a non-trivial problem.
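The “looks like a join” intuition is easiest to see in a drastically simplified model. Here a job's requirements are plain columns instead of arbitrary ClassAd expressions. The tables, columns, and data below are hypothetical. Real ClassAd matching involves two-way Requirements, Rank, and arbitrary expressions, which do not reduce this neatly to SQL. That gap is part of why the problem is non-trivial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, req_opsys TEXT, req_memory_mb INTEGER);
CREATE TABLE machines (name TEXT PRIMARY KEY, opsys TEXT, memory_mb INTEGER,
                       state TEXT);
INSERT INTO jobs VALUES ('23.2', 'LINUX', 2048), ('23.3', 'WINDOWS', 512);
INSERT INTO machines VALUES
  ('m1', 'LINUX', 4096, 'Unclaimed'),
  ('m2', 'LINUX', 1024, 'Unclaimed'),
  ('m3', 'WINDOWS', 8192, 'Claimed');
""")

# Candidate matches: join jobs to idle machines that satisfy the job's
# (simplified) requirements.
candidates = conn.execute("""
SELECT j.job_id, m.name
FROM jobs AS j
JOIN machines AS m
  ON m.opsys = j.req_opsys
 AND m.memory_mb >= j.req_memory_mb
WHERE m.state = 'Unclaimed'
ORDER BY j.job_id, m.name
""").fetchall()
print(candidates)  # [('23.2', 'm1')]
```

The join only produces candidate pairs. A matchmaker still has to pick at most one machine per job and respect priorities, and that step is not expressed by the join itself.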

Recency of Quill Data
Jiansheng Huang ([email protected])
Problem: daemons report in at uncontrollable and unpredictable times.
Result: an out-of-date and inconsistent data set.
Can we provide the user with a concise characterization of the recency of the sources relevant to a user query?
Note: it is surprisingly non-trivial to define what we mean by “relevant” in this setting.
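As a starting point, recency could be summarized over the sources assumed relevant to a query. The oldest last-report time gives the worst-case age of the answer. The gap between the oldest and newest reports gives a rough hint of inconsistency. The sketch below treats “relevant” simply as “the schedds whose rows appear in the result.” That is exactly the kind of simplification the slide warns may not be right. The function name and all data are hypothetical.

```python
from datetime import datetime, timedelta


def recency_summary(result_rows, last_report, now):
    """Summarize how fresh a query result is.

    result_rows: rows (dicts) with a 'schedd' key; last_report maps each
    schedd to the time it last reported. Simplifying assumption: the
    relevant sources are exactly the schedds that contributed rows.
    """
    relevant = sorted({row["schedd"] for row in result_rows})
    if not relevant:
        return {"sources": [], "max_staleness": None, "skew": None}
    times = [last_report[s] for s in relevant]
    return {
        "sources": relevant,
        "max_staleness": now - min(times),  # age of the oldest source
        "skew": max(times) - min(times),    # gap between the snapshots
    }


now = datetime(2000, 1, 1, 12, 0)
last_report = {
    "north.cs.wisc.edu": now - timedelta(minutes=2),
    "south.cs.wisc.edu": now - timedelta(minutes=15),
    "east.cs.wisc.edu": now - timedelta(hours=3),  # contributes no rows
}
rows = [{"schedd": "north.cs.wisc.edu"}, {"schedd": "south.cs.wisc.edu"}]
summary = recency_summary(rows, last_report, now)
print(summary["max_staleness"], summary["skew"])  # 0:15:00 0:13:00
```

Under this definition, the three-hour-old east schedd does not affect the answer. A query such as “which of my jobs are not queued anywhere?” arguably depends on every schedd, though. This shows why the definition of “relevant” is hard.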

App. Servers and Condor
Eric Robinson ([email protected])
Idea: application servers provide a lot of technology that appears useful in a Condor setting.
Approach: build a prototype of some Condor functionality using these tools, and evaluate the approach.

Moving on…
Further questions on these projects? Your best bet is to contact the student listed on each slide.
On to the Quill portion of the talk.

The Condor Quill
The Quill Developers
“Give me a condor's quill! Give me Vesuvius' crater for an ink stand. Friends, hold my arms! For in the mere act of penning my thoughts of this Leviathan, they weary me. . . To produce a mighty book you must choose a mighty theme.”
-Melville, Moby Dick

What is Quill?
A non-invasive method of storing a read-only version of the Condor operational data in a relational database.

Quill: In pictures
[Figure: two panels, “Without Quill” and “With Quill.” In both, the SchedD writes the job queue transaction log (job_queue.log) to disk; with Quill, QuillD also reads that log and loads its contents into a DBMS.]
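The “non-invasive” part of the picture is that QuillD only reads a log the SchedD already writes. A minimal sketch of that idea: parse a transaction log, apply only committed transactions, and load the result into a relational table. The three-command format used here (`BEGIN`, `SET <job> <attr> <value>`, `COMMIT`) is a simplified, hypothetical stand-in for the real job_queue.log format. The `job_attrs` schema is also illustrative.

```python
import sqlite3

# Hypothetical, simplified log format (NOT the real job_queue.log syntax).
LOG = """\
BEGIN
SET 23.2 Owner epaulson
SET 23.2 JobStatus IDLE
COMMIT
BEGIN
SET 23.3 Owner epaulson
"""  # the second transaction has no COMMIT yet (writer is mid-transaction)


def committed_ops(lines):
    """Yield (job, attr, value) from committed transactions only."""
    pending = None
    for line in lines:
        parts = line.split(maxsplit=3)
        if not parts:
            continue
        if parts[0] == "BEGIN":
            pending = []
        elif parts[0] == "SET" and pending is not None:
            pending.append(tuple(parts[1:4]))
        elif parts[0] == "COMMIT" and pending is not None:
            yield from pending
            pending = None
    # Operations still pending at end of input are incomplete; skip them.


def load(conn, lines):
    """Copy committed log state into a (job, attr, value) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_attrs ("
        "job TEXT, attr TEXT, value TEXT, PRIMARY KEY (job, attr))"
    )
    with conn:  # commit the whole load as one DB transaction
        conn.executemany(
            "INSERT OR REPLACE INTO job_attrs VALUES (?, ?, ?)",
            committed_ops(lines),
        )


conn = sqlite3.connect(":memory:")
load(conn, LOG.splitlines())
rows = conn.execute("SELECT * FROM job_attrs ORDER BY job, attr").fetchall()
print(rows)  # [('23.2', 'JobStatus', 'IDLE'), ('23.2', 'Owner', 'epaulson')]
```

Honoring transaction boundaries is what keeps the database copy consistent even when the reader catches the writer mid-update.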

Quill: Where we’ve been
First shipped in 6.7.11 (September 2005)
Now “over the fence”: the Condor Team is driving the 6.8 version
Response from users has been very helpful!
Lessons learned:
Passive collection is good
DBMSes are full of surprises

Quill: Where we’d like to be
Shared databases
Better job data
Data from non-job sources
More than just the PostgreSQL DBMS
Examples of usage

Quill in Condor 6.9.3
Development effort mostly complete
Previous bullet points addressed
Migration path for historical job data
Out-of-the-box changes for Quill users:
Horizontal and vertical schema for active jobs
Jobs from multiple schedds in one database
By default, no new historical data stored

Example tables

Horizontal Job Table

ScheddName          Cluster  Proc  Owner     JobStatus  JobPrio  Universe
north.cs.wisc.edu   23       2     epaulson  IDLE       10       Vanilla
north.cs.wisc.edu   23       3     epaulson  IDLE       10       Vanilla
south.cs.wisc.edu   13       2     jhuang    RUN        5        Grid
north.cs.wisc.edu   13       2     miron     HELD       30       Standard

Vertical Job Table

ScheddName          Cluster  Proc  Attr    Value
north.cs.wisc.edu   23       2     WantIO  TRUE
north.cs.wisc.edu   23       2     Group   Database
north.cs.wisc.edu   23       3     Group   Condor
south.cs.wisc.edu   13       2     Group   Condor
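The two layouts complement each other. The horizontal table has one column per commonly used attribute. The vertical table stores any other attribute as a (job, attribute, value) row. The sketch below loads the example rows into SQLite and rebuilds a per-job view that adds the vertical `Group` attribute as a column. Column names come from the tables above, but the table names, types, and SQL are illustrative and not Quill's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HorizontalJobs (
  ScheddName TEXT, Cluster INTEGER, Proc INTEGER,
  Owner TEXT, JobStatus TEXT, JobPrio INTEGER, Universe TEXT,
  PRIMARY KEY (ScheddName, Cluster, Proc));
CREATE TABLE VerticalJobs (
  ScheddName TEXT, Cluster INTEGER, Proc INTEGER, Attr TEXT, Value TEXT,
  PRIMARY KEY (ScheddName, Cluster, Proc, Attr));
INSERT INTO HorizontalJobs VALUES
  ('north.cs.wisc.edu', 23, 2, 'epaulson', 'IDLE', 10, 'Vanilla'),
  ('north.cs.wisc.edu', 23, 3, 'epaulson', 'IDLE', 10, 'Vanilla'),
  ('south.cs.wisc.edu', 13, 2, 'jhuang', 'RUN', 5, 'Grid'),
  ('north.cs.wisc.edu', 13, 2, 'miron', 'HELD', 30, 'Standard');
INSERT INTO VerticalJobs VALUES
  ('north.cs.wisc.edu', 23, 2, 'WantIO', 'TRUE'),
  ('north.cs.wisc.edu', 23, 2, 'Group', 'Database'),
  ('north.cs.wisc.edu', 23, 3, 'Group', 'Condor'),
  ('south.cs.wisc.edu', 13, 2, 'Group', 'Condor');
""")

# Pivot one vertical attribute into a column. ScheddName must be part of the
# join key: job 13.2 exists on both north and south, and they are different jobs.
rows = conn.execute("""
SELECT h.ScheddName, h.Cluster, h.Proc, h.Owner, v.Value AS job_group
FROM HorizontalJobs AS h
LEFT JOIN VerticalJobs AS v
  ON v.ScheddName = h.ScheddName AND v.Cluster = h.Cluster
 AND v.Proc = h.Proc AND v.Attr = 'Group'
ORDER BY h.ScheddName, h.Cluster, h.Proc
""").fetchall()
for r in rows:
    print(r)
```

The LEFT JOIN keeps jobs that lack the attribute (miron's job comes back with `None`). This is the usual trade-off of a vertical layout: any attribute can be stored, but each one you want as a column costs a join.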

More job information
The lifecycle of the job would be nice to have:
Events like those in the “user log”
But we need more information than what’s in the job queue
Passive data collection works

Quill 6.9.3 diagram
[Figure: on disk, the SchedD writes the job queue log (job_queue.log) and a new event log; QuillD reads them and loads the DBMS.]
The schedd writes events to the new “event” log; the Quill daemon passively picks up the events and inserts them into the database.
For the schedd, the event log contains userlog events and job history events.
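“Passively picks up the events” suggests a reader that remembers how far into the event log it has read. On each poll, it consumes only the complete new lines. The sketch below does this with a saved byte offset. The one-event-per-line format, the event text, and the function name `read_new_events` are assumptions for illustration. They are not the actual Condor event log format or QuillD's code.

```python
import os
import tempfile


def read_new_events(path, offset):
    """Return (complete new lines, new offset), reading from byte `offset`.

    A trailing partial line (writer mid-append) is not consumed; the offset
    stops before it, so the next poll re-reads it once it is complete.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 if there is no complete line yet
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Simulate a daemon appending to an event log between two polls.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "ab") as f:
    f.write(b"SUBMIT 23.2\nEXECUTE 23")  # second line still being written
events, pos = read_new_events(path, 0)
print(events, pos)  # ['SUBMIT 23.2'] 12

with open(path, "ab") as f:
    f.write(b".3\n")
more, pos = read_new_events(path, pos)
print(more, pos)  # ['EXECUTE 23.3'] 25
os.remove(path)
```

The writer never waits on the reader, which is what makes the collection passive. A real reader would also need to handle log rotation or truncation, which this sketch ignores.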

Examples
“Show me all the jobs that exited with a segfault that at some point ran on this machine.”
“When my jobs get preempted, how long until they get matched again?”
“What is the average runtime for jobs for each different type of input file?”
SQL “GROUP BY”
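The first and third questions translate directly into SQL. The sketch below runs both against a hypothetical two-table layout: `jobs` holds one row per job, and `runs` holds one row for each machine a job ran on. The tables, columns, and rows are invented for illustration, and Quill's real schema differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, input_type TEXT,
                   runtime_s INTEGER, exit_signal INTEGER);
-- One row per machine a job ran on, including attempts that were preempted.
CREATE TABLE runs (job_id TEXT, machine TEXT);
INSERT INTO jobs VALUES
  ('1.0', 'fasta', 100, NULL), ('1.1', 'fasta', 300, 11),
  ('2.0', 'csv', 50, NULL), ('2.1', 'csv', 70, 11);
INSERT INTO runs VALUES
  ('1.0', 'm1'), ('1.1', 'm2'), ('1.1', 'm1'), ('2.0', 'm1'), ('2.1', 'm3');
""")

# Jobs that died with a segfault (signal 11 = SIGSEGV on Linux) and at
# some point ran on machine m1.
segfaults = conn.execute("""
SELECT job_id FROM jobs
WHERE exit_signal = 11
  AND job_id IN (SELECT job_id FROM runs WHERE machine = 'm1')
ORDER BY job_id
""").fetchall()
print(segfaults)  # [('1.1',)]

# Average runtime for each type of input file: a plain GROUP BY.
averages = conn.execute("""
SELECT input_type, AVG(runtime_s) FROM jobs
GROUP BY input_type ORDER BY input_type
""").fetchall()
print(averages)  # [('csv', 60.0), ('fasta', 200.0)]
```

Job 1.1 qualifies only because of an earlier run on m1. This is why the first question needs the job's history of runs, not just its current state.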

Collecting non-job information
[Figure: the StartD, SchedD, and Negotiator write to the new event log on disk; QuillD loads it into the DBMS.]

New information stored
StartD: machine status
Negotiator: matches made
Starter/Shadow: files transferred
Collector: “Submitter” ads
All daemons: generic events, daemon ads

The DBMSD
New daemon responsible for database housekeeping
Only one needed per DBMS
Purges old data
Three classes, with independent thresholds:
Resource: machine classads
Run: matches, job log events
Job: condor_history information
Estimates the size of the database
“Soft quota”: warns when exceeded
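The housekeeping rules on this slide can be sketched as two steps. First, a purge pass applies one retention threshold per data class. Second, a size estimate is checked against a soft quota that only warns. The `*_history` table names, the `day` column, the retention values, and the use of SQLite's `page_count` and `page_size` pragmas for the size estimate are all assumptions for illustration. They do not describe the real DBMSD's schema or configuration.

```python
import sqlite3
import warnings

# Hypothetical retention thresholds (days), one per data class, set independently.
RETENTION_DAYS = {"resource": 7, "run": 30, "job": 365}


def purge(conn, today):
    """Delete rows older than each class's threshold; return counts removed."""
    removed = {}
    with conn:
        for cls, days in RETENTION_DAYS.items():
            # Table names come from the fixed dict above, never from user input.
            cur = conn.execute(
                f"DELETE FROM {cls}_history WHERE day < ?", (today - days,)
            )
            removed[cls] = cur.rowcount
    return removed


def check_soft_quota(conn, quota_bytes):
    """Estimate the database size; only warn if it exceeds the soft quota."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    size = pages * page_size
    if size > quota_bytes:
        warnings.warn(f"database is ~{size} bytes, over soft quota of {quota_bytes}")
    return size


conn = sqlite3.connect(":memory:")
for cls in RETENTION_DAYS:
    conn.execute(f"CREATE TABLE {cls}_history (day INTEGER, payload TEXT)")
    conn.executemany(
        f"INSERT INTO {cls}_history VALUES (?, 'x')",
        [(d,) for d in (0, 300, 390, 399)],
    )
removed = purge(conn, today=400)
print(removed)  # {'resource': 3, 'run': 2, 'job': 1}
size = check_soft_quota(conn, quota_bytes=10**9)
print(size > 0)  # True
```

Keeping the thresholds independent means short-lived, high-volume data such as machine ads can be dropped quickly, while job history is kept much longer.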

Multiple DBMS systems
Oracle supported
Appears to need less maintenance
A nearly unified schema
Main difference is large text fields
Same binaries; DBMS type selectable via the configuration file
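“Same binaries, DBMS type selectable via configuration” amounts to choosing a SQL dialect at run time. The sketch below isolates the one schema difference the slide names, large text fields. Column types are mapped per DBMS: PostgreSQL's unbounded TEXT versus Oracle's CLOB, since Oracle's VARCHAR2 is length-limited. The config key `DB_TYPE`, the type map, and both helper functions are hypothetical. Condor's actual configuration knob and schema may differ.

```python
# Hypothetical mapping of logical column types to each DBMS's DDL type.
TYPE_MAP = {
    "pgsql": {"short_text": "VARCHAR(4000)", "long_text": "TEXT"},
    "oracle": {"short_text": "VARCHAR2(4000)", "long_text": "CLOB"},
}


def parse_config(text):
    """Parse simple 'KEY = value' lines, ignoring '#' comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            cfg[key] = value
    return cfg


def create_table_ddl(db_type, table, columns):
    """columns: list of (name, logical_type); same schema, dialect-specific types."""
    types = TYPE_MAP[db_type.lower()]
    cols = ",\n  ".join(f"{name} {types[kind]}" for name, kind in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n)"


cfg = parse_config("DB_TYPE = Oracle  # hypothetical key\n")
ddl = create_table_ddl(cfg["DB_TYPE"], "jobs_vertical",
                       [("attr", "short_text"), ("val", "long_text")])
print(ddl)
```

Only the type map differs between backends, which matches the slide's “nearly unified schema”.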

Example Usage
PHP web front end
Good enough for some people
Or, use it as the basis for your own system
BoF on Thursday at 11:00am
We’ll use the web front end to explain the information Quill now stores