The Roadmap to New Releases
Todd Tannenbaum
Department of Computer Sciences
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
www.cs.wisc.edu/condor
Stable vs. Development Series
› Much like the Linux kernel, Condor
provides two different releases at any
time:
Stable series
Development series
› Allows Condor to be both a research
project and a production-ready system
Stable series
› Series number (the second number in the
version) is even (e.g. 6.2.0)
› Releases are heavily tested
› Only bug fixes and ports to new
platforms are added to a stable
series
Stable series (cont.)
› A given stable release is always
compatible with other releases from
the same series
› Recommended for production pools
Development Series
› Series number (the second number in the
version) is odd (e.g. 6.1.17, 6.3.1)
› New features and new technology are
added frequently
› Versions from the same development
series are not always compatible with
each other
Development Series (cont.)
› Releases are not as heavily tested
› Generally not recommended for
production pools
… unless new features are required
… unless we recommend otherwise :^)
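The even/odd numbering rule for the two series can be sketched as a small check. This is a minimal sketch, assuming version strings always have the three-part form shown on the slides (e.g. 6.2.0); the function names release_series and same_series are hypothetical and are not part of Condor.

```python
def release_series(version: str) -> str:
    """Classify a Condor-style version string as 'stable' or 'development'.

    Assumes the 'major.minor.patch' form used on the slides (e.g. 6.2.0),
    where the parity of the second number identifies the series.
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected major.minor.patch, got {version!r}")
    minor = int(parts[1])
    return "stable" if minor % 2 == 0 else "development"


def same_series(a: str, b: str) -> bool:
    """True when two versions share major.minor, i.e. the same series."""
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    for v in ("6.2.0", "6.1.17", "6.3.1", "6.4.0"):
        print(v, release_series(v))
```

A pool administrator could use a check like same_series before mixing releases, since only stable releases from the same series are stated to be always compatible with each other.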
Where is Condor Today?
› Version 6.3.2 is being released ASAP;
it is the v6.4.0 release candidate.
› We expect version 6.4.0 to be released
by the end of March.
What’s new for Condor v6.4.0?
New Ports in 6.4.0
› Full support (with checkpointing and
remote system calls):
RedHat 7.x
(Linux 2.4.x kernel + glibc 2.2.x)
New Ports in 6.4.0 (cont.)
› "Clipped" support (no checkpointing,
PVM, or remote system calls, but all
other functionality is available)
Windows 2000
Mac OS X
Secure Communication
› Secure network communication
Strong user authentication
• Multiple methods supported: Kerberos, X509,
NT LanMan, …
Encryption
Integrity
› Authorization based on host or user
New Job Universes
› MPI Universe
Launch MPI jobs linked with MPICH
library
› Globus Universe
Faster, more reliable, better integrated
› Java Universe
Java Universe Job
condor_submit
universe = java
executable = Main.class
jar_files = MyLibrary.jar
input = infile
output = outfile
arguments = Main 1 2 3
queue
Why not use Vanilla Universe for Java jobs?
› Java Universe provides more than just
inserting “java” at the start of the execute
line
▪ Knows which machines have a JVM installed
▪ Knows the location, version, and performance of the JVM on each machine
▪ Provides more information about Java job completion than just the JVM exit code
• Program runs in a Java wrapper, allowing Condor to report Java exceptions, etc.
Java support, cont.
condor_status -java
Name           JavaVendor   Ver    State    Activity  LoadAv  Mem
aish.cs.wisc.  Sun Microsy  1.2.2  Owner    Idle      0.000   249
anfrom.cs.wis  Sun Microsy  1.2.2  Owner    Idle      0.030   249
babe.cs.wisc.  Sun Microsy  1.2.2  Claimed  Busy      1.120   123
...
Condor File Transfer
› Condor will transfer job files from the submit machine to the execute machine
› Files to send and/or receive are specified at submit time
› Transfer is atomic
▪ All files are transferred, or transfer fails
› In v6.2, file transfer appeared only in Condor for Windows
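The "all files are transferred, or transfer fails" behaviour can be illustrated with a stage-then-rename pattern. This is a minimal local sketch of that general technique, not Condor's implementation: it assumes source and destination are on the same filesystem and that the destination directory does not exist yet. The name transfer_all_or_nothing is hypothetical.

```python
import os
import shutil
import tempfile


def transfer_all_or_nothing(files, dest_dir):
    """Copy every file into dest_dir, or leave dest_dir absent on failure.

    Files are first copied into a temporary staging directory next to
    dest_dir. Only when every copy has succeeded is the staging directory
    renamed into place. Assumes dest_dir does not exist beforehand.
    """
    parent = os.path.dirname(os.path.abspath(dest_dir))
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for path in files:
            shutil.copy2(path, staging)  # raises if a source file is missing
        os.rename(staging, dest_dir)     # single step that publishes the files
    except Exception:
        # Any failure discards the partial copy, so no half-transferred
        # set of files is ever visible under dest_dir.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

The design choice is that a reader of dest_dir sees either the complete set of files or nothing, which is the property the slide describes.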
File Transfer, cont.
› Example:
transfer_input_files = x, y, z …
transfer_output_files = a, b, c …
transfer_files = [ ALWAYS | ONEXIT ]
› Note: Condor can automatically figure
out output files
Default: Send back any new/changed files
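One way to picture the default of sending back any new or changed files is to compare a snapshot of the job's working directory taken before the run with one taken after it. This is a hedged sketch of that idea only; how Condor actually detects changed files is not stated on the slide, and the names snapshot and new_or_changed are hypothetical.

```python
import os
import tempfile


def snapshot(directory):
    """Map each regular file name in directory to (size, mtime in ns)."""
    state = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            state[entry.name] = (info.st_size, info.st_mtime_ns)
    return state


def new_or_changed(before, after):
    """Names present in 'after' that are new or differ from 'before'."""
    return sorted(name for name, sig in after.items() if before.get(name) != sig)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        with open(os.path.join(sandbox, "infile"), "w") as f:
            f.write("input")
        before = snapshot(sandbox)
        # Stand-in for the job writing its output.
        with open(os.path.join(sandbox, "outfile"), "w") as f:
            f.write("result")
        print(new_or_changed(before, snapshot(sandbox)))
```

Listing transfer_output_files explicitly avoids relying on this kind of detection when only some of the outputs are wanted.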
Remote I/O Socket
› Job can request that the condor_starter process on the execute machine create a Remote I/O Socket
› Used for online access to files on the submit machine, without the Standard Universe
▪ Use in Vanilla, Java, …
› Libraries provided for Java and for C, e.g.:
Java: FileInputStream -> ChirpInputStream
C: open() -> chirp_open()
[Diagram: Secure Remote I/O. Submission Site: shadow, I/O Server, Local System Calls, Home File System. Execution Site: starter, I/O Proxy, Fork, Job, I/O Library, Local I/O (Chirp).]
Job Policy Expressions
› User can supply job policy
expressions in the submit file.
› Can be used to describe a successful
run.
on_exit_remove = <expression>
on_exit_hold = <expression>
periodic_remove = <expression>
periodic_hold = <expression>
Job Policy Examples
› Do not remove if exits with a signal:
on_exit_remove = ExitBySignal == False
› Place on hold if exits with nonzero status or ran for less than an hour:
on_exit_hold = ((ExitBySignal == False) && (ExitCode != 0)) || ((ServerStartTime - JobStartDate) < 3600)
› Place on hold if job has spent more than 50% of its time suspended:
periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)
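To make the logic of these three policies concrete, the sketch below mirrors them as plain Python predicates over a dictionary of job attributes. It is an illustration under stated assumptions, not ClassAd evaluation: the job is a hypothetical dict keyed by the attribute names used on the slide, and the exit status is assumed to be read from an ExitCode attribute.

```python
def on_exit_remove(job):
    """Mirror of: on_exit_remove = ExitBySignal == False"""
    return job["ExitBySignal"] is False


def on_exit_hold(job):
    """Hold on a nonzero exit code or a run shorter than one hour."""
    # ExitCode is only consulted when the job did not die from a signal.
    failed = (job["ExitBySignal"] is False) and (job["ExitCode"] != 0)
    too_short = (job["ServerStartTime"] - job["JobStartDate"]) < 3600
    return failed or too_short


def periodic_hold(job):
    """Hold when more than half of the wall-clock time was spent suspended."""
    return job["CumulativeSuspensionTime"] > job["RemoteWallClockTime"] / 2.0


if __name__ == "__main__":
    # Hypothetical attribute values, chosen only for illustration.
    job = {
        "ExitBySignal": False,
        "ExitCode": 1,
        "ServerStartTime": 10_000,
        "JobStartDate": 1_000,
        "CumulativeSuspensionTime": 60,
        "RemoteWallClockTime": 100,
    }
    print(on_exit_remove(job), on_exit_hold(job), periodic_hold(job))
```

Note that the on_exit_hold condition is a disjunction: a job that exits cleanly but ran for under an hour is still held.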
Firewall Support
› Port Restrictions
▪ In condor_config file can specify:
LOWPORT = x
HIGHPORT = y
▪ All dynamic ports will be between x and y inclusive
› Condor + Firewalls/Private Networks:
▪ Who: Se-Chang Son
▪ Time: 9am-12pm Weds
▪ Where: rm 3387
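The effect of the LOWPORT/HIGHPORT restriction above can be sketched as a program that only ever binds its dynamic sockets inside a configured inclusive range. This is a minimal illustration of the idea, not Condor's own port-selection code; the function name bind_in_range and the loopback default host are assumptions made for the example.

```python
import socket


def bind_in_range(low, high, host="127.0.0.1"):
    """Return a TCP socket bound to the first free port in [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    for port in range(low, high + 1):  # inclusive upper bound
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            sock.close()  # port busy or not permitted; try the next one
    raise OSError(f"no free port between {low} and {high}")


if __name__ == "__main__":
    s = bind_in_range(50000, 50100)
    print("bound to port", s.getsockname()[1])
    s.close()
```

With such a restriction in place, a firewall administrator only needs to open the range x to y rather than all ephemeral ports.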
Condor on Windows
› On both NT and Win2k
› New universes added: MPI, Java,
Scheduler (and Globus in the works!)
› DAGMan ported
› CondorView ported
› Run shadow + DAGMan as the user
Allows submission from directories on
shared filesystems
And more…
› Unix man pages
› Fetch/consolidate log files remotely
› ClassAd chaining
› Many DAGMan improvements
› Bug fixes, etc…
What’s Next?
Future Directions
› Increased focus on standalone tools built
with Condor Technology
▪ DAGMan
▪ NeST
▪ PFS
▪ HawkEye
▪ Condor-G
…
What’s Next?
› Big Item:
More focus on being a service
provider than just an end-user tool:
Developer APIs / libraries
SOAP access to services
XML representations of user logs,
ClassAds, accounting info, etc.
More what’s next…
› Condor on Windows
Increased support from Microsoft
Research
Remote I/O
Complete Shared Filesystem support
Condor-G
› MPI Scheduling Improvements
More what’s next…
› New version of ClassAds going into Condor
Conditionals !!
• if/then/else
Aggregates (lists, nested classads)
Built-in functions
• String operations, pattern matching, time
operators, unit conversions
Clean implementations in C++ and Java
ClassAd collections
More what’s next…
› Re-write of the condor_schedd
Performance enhancements and lowered
resource requirements (particularly RAM)
› Re-write of the checkpoint server
Add secure communication
NeST technology infusion
Enhanced support for multiple servers
Store meta-data along with checkpoint
files
Thank you for coming to Paradyn/Condor Week!