Developer APIs to Condor + A Tutorial on Condor’s Web Service Interface
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
Interfacing Applications w/ Condor
› Suppose you have an application which needs a lot of compute cycles
› You want this application to utilize a pool of machines
› How can this be done?
Some Condor APIs
› Command Line tools
  - condor_submit, condor_q, etc
› DRMAA
› Condor GAHP
› JSDL
› RDBMS
› Condor Perl Module
› SOAP
Command Line Tools
› Don’t underestimate them!
› Your program can create a submit file on disk and simply invoke condor_submit:
system("echo universe=VANILLA > /tmp/condor.sub");
system("echo executable=myprog >> /tmp/condor.sub");
. . .
system("echo queue >> /tmp/condor.sub");
system("condor_submit /tmp/condor.sub");
Command Line Tools
› Your program can create a submit file and give it to condor_submit through stdin:
PERL:
open(SUBMIT, "|condor_submit");
print SUBMIT "universe=VANILLA\n";
. . .
C/C++:
FILE *s = popen("condor_submit", "w");  /* pipe connected to condor_submit's stdin */
fputs("universe=VANILLA\n", s);
. . .
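The same stdin approach can be sketched in Java. This is a minimal illustration, not code from the tutorial: the class and method names are hypothetical, and it assumes condor_submit is on the PATH and reads the submit description from stdin as described above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SubmitViaStdin {
    /** Render attribute/value pairs as a submit description that ends with "queue". */
    public static String buildSubmitDescription(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.append("queue\n").toString();
    }

    /** Pipe the description to condor_submit's stdin and return its exit code. */
    public static int submit(String description) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("condor_submit")
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        // Closing the stream signals end-of-input to condor_submit.
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(description.getBytes(StandardCharsets.UTF_8));
        }
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> attrs = new LinkedHashMap<>(); // keeps insertion order
        attrs.put("universe", "vanilla");
        attrs.put("executable", "/bin/hostname");
        attrs.put("output", "job.out");
        attrs.put("log", "job.log");
        System.exit(submit(buildSubmitDescription(attrs)));
    }
}
```

Keeping the text generation separate from the process launch means the submit description can be inspected or logged before anything is sent to Condor.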
Command Line Tools
› Using the +Attribute with condor_submit:
universe = VANILLA
executable = /bin/hostname
output = job.out
log = job.log
+webuser = "zmiller"
queue
Command Line Tools
› Use -constraint and -format with condor_q:
% condor_q -constraint 'webuser=="zmiller"'
-- Submitter: bio.cs.wisc.edu : <128.105.147.96:37866> : bio.cs.wisc.edu
 ID        OWNER    SUBMITTED     RUN_TIME   ST PRI SIZE CMD
213503.0   zmiller  10/11 06:00   0+00:00:00 I  0   0.0  hostname
% condor_q -constraint 'webuser=="zmiller"' -format "%i\t" ClusterId -format "%s\n" Cmd
213503    /bin/hostname
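The -format output above is easy to consume from a program because each job becomes one tab-separated line. A minimal Java sketch of parsing it follows; the class and its Row type are hypothetical, and it assumes exactly the format string shown above (ClusterId, a tab, then Cmd, then a newline).

```java
import java.util.ArrayList;
import java.util.List;

public class CondorQFormatParser {
    /** One job as printed by: -format "%i\t" ClusterId -format "%s\n" Cmd */
    public static final class Row {
        public final int clusterId;
        public final String cmd;

        Row(int clusterId, String cmd) {
            this.clusterId = clusterId;
            this.cmd = cmd;
        }
    }

    /** Parse the captured stdout of condor_q into rows, skipping lines without a tab. */
    public static List<Row> parse(String output) {
        List<Row> rows = new ArrayList<>();
        for (String line : output.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // blank line or something our format string did not produce
            }
            int clusterId = Integer.parseInt(line.substring(0, tab).trim());
            rows.add(new Row(clusterId, line.substring(tab + 1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row r : parse("213503\t/bin/hostname\n")) {
            System.out.println(r.clusterId + " runs " + r.cmd);
        }
    }
}
```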
Command Line Tools
› condor_wait will watch a job log file and wait for a certain job (or all jobs) to complete:
system("condor_wait job.log");
› Can specify a timeout
Command Line Tools
› condor_q and condor_status -xml option
› So it is relatively simple to build on top of Condor’s command line tools alone, and they can be called from many different languages (C, Perl, Python, PHP, etc).
› However…
DRMAA
› DRMAA is a GGF standardized job submission API
› Has C (and now Java) bindings
› Is not Condor-specific -- your app could submit to any job scheduler with minimal changes (probably just linking in a different library)
› SourceForge Project
  - http://sourceforge.net/projects/condor-ext
DRMAA
› Easy to use, but
› Unfortunately, the DRMAA API does not support some very important features, such as:
  - Two-phase commit
  - Fault tolerance
  - Transactions
Condor GAHP
› The Condor GAHP is a relatively low-level protocol based on simple ASCII messages through stdin and stdout
› Supports a rich feature set including two-phase commits, transactions, and optional asynchronous notification of events
› Is available in Condor 6.7.X
GAHP, cont
Example:
R: $GahpVersion: 1.0.0 Nov 26 2001 NCSA\ CoG\ Gahpd $
S: GRAM_PING 100 vulture.cs.wisc.edu/fork
R: E
S: RESULTS
R: E
S: COMMANDS
R: S COMMANDS GRAM_JOB_CANCEL GRAM_JOB_REQUEST GRAM_JOB_SIGNAL GRAM_JOB_STATUS GRAM_PING INITIALIZE_FROM_FILE QUIT RESULTS VERSION
S: VERSION
R: S $GahpVersion: 1.0.0 Nov 26 2001 NCSA\ CoG\ Gahpd $
S: INITIALIZE_FROM_FILE /tmp/grid_proxy_554523.txt
R: S
S: GRAM_PING 100 vulture.cs.wisc.edu/fork
R: S
S: RESULTS
R: S 0
S: RESULTS
R: S 1
R: 100 0
S: QUIT
R: S
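In the transcript above, fields are separated by single spaces, and a backslash appears to escape a space inside a field (as in NCSA\ CoG\ Gahpd). A minimal Java sketch of splitting one protocol line under that assumption follows. The class and method names are hypothetical, this is not part of any Condor library, and it assumes the "S:" and "R:" prefixes are slide annotations rather than part of the wire format.

```java
import java.util.ArrayList;
import java.util.List;

public class GahpLineSplitter {
    /**
     * Split a line on unescaped spaces. A backslash makes the next
     * character literal, so "NCSA\ CoG\ Gahpd" stays one field.
     */
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character as-is
            } else if (c == ' ') {
                fields.add(current.toString());   // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String reply = "S $GahpVersion: 1.0.0 Nov 26 2001 NCSA\\ CoG\\ Gahpd $";
        for (String field : split(reply)) {
            System.out.println("[" + field + "]");
        }
    }
}
```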
JSDL and Condor
› GridSAM: open source web service for job submission and monitoring
› Condor plugin for GridSAM enables JSDL submissions to Condor.
RDBMS: Quill
› Job ClassAds information mirrored into an RDBMS
› Both active jobs and historical jobs
› Benefits BOTH scalability and accessibility
[Diagram: Master, Startd, …, Schedd; job queue log; Quill; RDBMS with queue + history tables]
Condor Perl Module
› Perl module to parse the “job log file”
› Recommended instead of polling w/ condor_q
› Call-back event model
› (Note: job log can be written in XML)
Web Service Interface
› Simple Object Access Protocol (SOAP)
  - Mechanism for doing RPC using XML (typically over HTTP or HTTPS)
  - A World Wide Web Consortium (W3C) standard
› SOAP Toolkit: Transform a WSDL to a client library
Benefits of a Condor SOAP API
› Condor becomes a service
  - Can be accessed with standard web service tools
› Condor accessible from platforms where its command-line tools are not supported
› Talk to Condor with your favorite language and SOAP toolkit
Condor SOAP API functionality
› Submit jobs
› Retrieve job output
› Remove/hold/release jobs
› Query machine status
› Query job status
Getting machine status via SOAP
[Diagram: Your program calls queryStartdAds() through its SOAP library; the request travels as SOAP over HTTP to the condor_collector, which returns the machine list]
Let’s get some details…
The API
› Core API, described with WSDL, is designed to be as flexible as possible
  - File transfer is done in chunks
  - Transactions are explicit
› Wrapper libraries aim to make common tasks as simple as possible
  - Currently in Java and C#
  - Expose an object-oriented interface
Things we will cover
› Condor setup
› Necessary tools
› Job Submission
› Job Querying
› Job Retrieval
› Authentication with SSL and X.509
  - An important addition in late 6.7
Condor setup
› Start with a working condor_config
› The SOAP interface is off by default
  - Turn it on by adding ENABLE_SOAP=TRUE
› Access to the SOAP interface is denied by default
  - Set ALLOW_SOAP and DENY_SOAP; they work like ALLOW_READ/WRITE/…
  - See section 3.7.4 of the v6.7 manual for a description
  - Example: ALLOW_SOAP=*/*.cs.wisc.edu
Necessary tools
› You need a SOAP toolkit
  - Apache Axis (Java) - http://ws.apache.org/axis/
  - Microsoft .Net - http://microsoft.com/net/
  - gSOAP (C/C++) - http://gsoap2.sf.net/
  - ZSI (Python) - http://pywebsvcs.sf.net/
  - SOAP::Lite (Perl) - http://soaplite.com/
  - (All our examples are in Java using Apache Axis)
› You need Condor’s WSDL files
  - Find them in lib/webservice/ in your Condor release
› Put the two together to generate a client library
  - $ java org.apache.axis.wsdl.WSDL2Java condorSchedd.wsdl
› Compile that client library
  - $ javac condor/*.java
Helpful tools
› The core API has some complex spots
› A wrapper library is available in Java and C#
  - Makes the API a bit easier to use (e.g. simpler file transfer & job ad submission)
  - Makes the API more OO, no need to remember and pass around transaction ids
› We are going to use the Java wrapper library for our examples
  - You can download it from http://www.cs.wisc.edu/condor/birdbath/birdbath.jar
  - Will be included in Condor release
Submitting a job
› The CLI way…
cp.sub (explicit bits):
universe = vanilla
executable = /bin/cp
arguments = cp.sub cp.worked
should_transfer_files = yes
transfer_input_files = cp.sub
when_to_transfer_output = on_exit
queue 1
Implicit bits:
clusterid = X
procid = Y
owner = matt
requirements = Z
$ condor_submit cp.sub
Submitting a job
› The SOAP way…
1. Begin transaction
2. Create cluster
3. Create job
4. Send files
5. Describe job
6. Commit transaction
› Repeat steps 2-5 to submit multiple clusters
› Repeat steps 3-5 to submit multiple jobs in a single cluster
Submission from Java
Schedd schedd = new Schedd("http://…");
Transaction xact = schedd.createTransaction();
xact.begin(30);                                  // 1. Begin transaction
int cluster = xact.createCluster();              // 2. Create cluster
int job = xact.createJob(cluster);               // 3. Create job
File[] files = { new File("cp.sub") };
xact.submit(cluster, job, "owner",               // 4 & 5. Send files & describe job
            UniverseType.VANILLA, "/bin/cp",
            "cp.sub cp.worked", "requirements",
            null, files);
xact.commit();                                   // 6. Commit transaction
Submission from Java
Schedd schedd = new Schedd("http://…");          // Schedd's location
Transaction xact = schedd.createTransaction();
xact.begin(30);                                  // Max time between calls (seconds)
int cluster = xact.createCluster();
int job = xact.createJob(cluster);
File[] files = { new File("cp.sub") };
xact.submit(cluster, job,
            "owner",                             // Job owner, e.g. "matt"
            UniverseType.VANILLA, "/bin/cp",
            "cp.sub cp.worked",
            "requirements",                      // Requirements, e.g. "OpSys==\"Linux\""
            null,                                // Extra attributes, e.g. Out="stdout.txt" or Err="stderr.txt"
            files);
xact.commit();
Querying jobs
› The CLI way…
$ condor_q
-- Submitter: localhost : <127.0.0.1:1234> : localhost
 ID      OWNER   SUBMITTED     RUN_TIME   ST PRI SIZE CMD
 1.0     matt    10/27 14:45   0+02:46:42 C  0   1.8  sleep 10000
…
42 jobs; 1 idle, 1 running, 1 held, 1 unexpanded
Querying jobs
› The SOAP way from Java…
String[] statusName = { "", "Idle", "Running", "Removed",
                        "Completed", "Held" };
int cluster = 1;
int job = 0;
Schedd schedd = new Schedd("http://…");
ClassAd ad = new ClassAd(schedd.getJobAd(cluster, job));
int status = Integer.valueOf(ad.get("JobStatus"));
System.out.println("Job is " + statusName[status]);
› Also, getJobAds given a constraint, e.g. "Owner==\"matt\""
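The statusName array above maps the numeric JobStatus attribute to a label by index. One possible alternative is an enum that rejects unknown codes instead of indexing past the array. This is only a sketch: the class and enum are hypothetical and not part of the wrapper library, and the codes 1 to 5 are taken from the array positions shown above.

```java
public class JobStatusLookup {
    /** Status codes and labels, in the order of the statusName array on the slide. */
    public enum Status {
        IDLE(1, "Idle"),
        RUNNING(2, "Running"),
        REMOVED(3, "Removed"),
        COMPLETED(4, "Completed"),
        HELD(5, "Held");

        private final int code;
        private final String label;

        Status(int code, String label) {
            this.code = code;
            this.label = label;
        }

        public String label() {
            return label;
        }

        /** Look up a status by numeric code; unknown codes raise an exception. */
        public static Status fromCode(int code) {
            for (Status s : values()) {
                if (s.code == code) {
                    return s;
                }
            }
            throw new IllegalArgumentException("Unknown JobStatus code: " + code);
        }

        /** Convert the string value of a JobStatus attribute, e.g. "2". */
        public static Status fromAdValue(String value) {
            return fromCode(Integer.parseInt(value.trim()));
        }
    }

    public static void main(String[] args) {
        System.out.println("Job is " + Status.fromAdValue("4").label());
    }
}
```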
Retrieving a job
› The CLI way…
› Well, if you are submitting to a local Schedd, the Schedd will have all of a job’s output written back for you
› If you are doing remote submission you need condor_transfer_data, which takes a constraint and transfers all files in spool directories of matching jobs
Retrieving a job
› The SOAP way in Java…
int cluster = 1;
int job = 0;
Schedd schedd = new Schedd("http://…");
Transaction xact = schedd.createTransaction();
xact.begin(30);
FileInfo[] files = xact.listSpool(cluster, job);   // Discover available files
for (FileInfo file : files) {
    xact.getFile(cluster, job,
                 file.getName(), file.getSize(),   // Remote file
                 new File(file.getName()));        // Local file
}
xact.commit();
Authentication for SOAP
› Authentication is done via mutual SSL authentication
  - Both the client and server have certificates and identify themselves
› Possible in late-late 6.7 (available by 6.8)
› It is not always necessary, e.g. in some controlled environments (a portal) where the submitting component is trusted
› A necessity in an open environment -- remember that the submit call takes the job’s owner as a parameter
  - Imagine what happens if anyone can submit to a Schedd running as root…
Authentication setup
› Create and sign some certificates
› Use OpenSSL to create a CA
  - CA.sh -newca
› Create a server cert and password-less key
  - CA.sh -newreq && CA.sh -sign
  - mv newcert.pem server-cert.pem
  - openssl rsa -in newreq.pem -out server-key.pem
› Create a client cert and key
  - CA.sh -newreq && CA.sh -sign && mv newcert.pem client-cert.pem && mv newreq.pem client-key.pem
Authentication config
› Config options…
  - ENABLE_SOAP_SSL is FALSE by default
  - <SUBSYS>_SOAP_SSL_PORT
    • Set this to a different port for each SUBSYS you want to talk to over SSL; the default is a random port
    • Example: SCHEDD_SOAP_SSL_PORT=1980
  - SOAP_SSL_SERVER_KEYFILE is required and has no default
    • The file containing the server’s certificate AND private key, i.e. “keyfile” after
      cat server-cert.pem server-key.pem > keyfile
Authentication config
› Config options continue…
  - SOAP_SSL_CA_FILE is required
    • The file containing public CA certificates used in signing client certificates, e.g. demoCA/cacert.pem
› All options except SOAP_SSL_PORT have an optional SUBSYS_* version
  - For instance, turn on SSL for everyone except the Collector with
    • ENABLE_SOAP_SSL=TRUE
    • COLLECTOR_ENABLE_SOAP_SSL=FALSE
One last bit of config
› The certificates we generated have a principal name, which is not standard across many authentication mechanisms
› Condor maps authenticated names (here, principal names) to canonical names that are authentication method independent
› This is done through mapfiles, given by SEC_CANONICAL_MAPFILE and SEC_USER_MAPFILE
› Canonical map: SSL .*emailAddress=(.*)@cs.wisc.edu.* \1
› User map: (.*) \1
› “SSL” is the authentication method, “.*emailAddress….*” is a pattern to match against authenticated names, and “\1” is the canonical name, in this case the username on the email in the principal
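To see what the canonical map pattern above extracts, it can be tried with java.util.regex. This is only an illustration of the pattern itself: it assumes Condor's mapfile regex behaves like Java's for this expression, and the principal string in main is a made-up example, not one from the tutorial.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CanonicalMapDemo {
    // Pattern copied from the canonical map entry on the slide.
    private static final Pattern SSL_RULE =
            Pattern.compile(".*emailAddress=(.*)@cs.wisc.edu.*");

    /** Return the first capture group (the "\1" of the mapfile) if the principal matches. */
    public static Optional<String> canonicalName(String principal) {
        Matcher m = SSL_RULE.matcher(principal);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical certificate subject, for illustration only.
        String principal = "/C=US/ST=Wisconsin/O=Example/CN=Matt/emailAddress=matt@cs.wisc.edu";
        System.out.println(canonicalName(principal).orElse("(no match)"));
    }
}
```

Note that the dots in cs.wisc.edu are unescaped in the slide's pattern, so they match any character; the sketch keeps the pattern exactly as given.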
HTTPS with Java
› Set up keys…
  - keytool -import -keystore truststore -trustcacerts -file demoCA/cacert.pem
  - openssl pkcs12 -export -inkey client-key.pem -in client-cert.pem -out keystore
› All the previous code stays the same, just set some properties
  - javax.net.ssl.trustStore, javax.net.ssl.keyStore, javax.net.ssl.keyStoreType, javax.net.ssl.keyStorePassword
  - Example: java -Djavax.net.ssl.trustStore=truststore -Djavax.net.ssl.keyStore=keystore -Djavax.net.ssl.keyStoreType=PKCS12 -Djavax.net.ssl.keyStorePassword=pass Example https://…
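The four javax.net.ssl properties from the HTTPS slide can also be set from code instead of with -D flags. A minimal sketch, assuming the truststore and keystore files created with keytool and openssl above; the class name is hypothetical, and the properties should be set before the first HTTPS connection is opened so the default SSL context picks them up.

```java
public class SslProperties {
    /** Set the standard JSSE system properties for a PKCS12 client keystore. */
    public static void configure(String trustStore, String keyStore, String keyStorePassword) {
        System.setProperty("javax.net.ssl.trustStore", trustStore);
        System.setProperty("javax.net.ssl.keyStore", keyStore);
        System.setProperty("javax.net.ssl.keyStoreType", "PKCS12");
        System.setProperty("javax.net.ssl.keyStorePassword", keyStorePassword);
    }

    public static void main(String[] args) {
        // File names and password match the command-line example on the slide.
        configure("truststore", "keystore", "pass");
        System.out.println("keyStoreType = " + System.getProperty("javax.net.ssl.keyStoreType"));
    }
}
```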
Questions?