Transcript presented
Python@Work
2007-Jul-12
PkManager.py
Howard Kapustein
Director of Technology and Architecture
Manhattan Associates
Background
Director of Technology and Architecture
Manhattan Associates, 7 years
EPCglobal Reader Protocol 1.0, Co-Chairman [RFID]
SMS, Platform Services (Architecture/Subsystems), 12 years
Open Source submitter (Jython, among others)
20 years experience
Acronym Soup: C++, Java, Visual Basic, Windows, Unix, concurrency,
i18n, security, RDBMS, GUI, web server, XML, TCP, web services,
REST, AJAX, JSON(!), oodles more
No COBOL, No Perl
Python since 1997
Switched from AWK
Thompson AWK compiler created EXEs
Tried to learn TCL – not so much
Stumbled over Eric Raymond's “Why Python?” essay
Getting 'long in the tooth'
Made perfect sense. Google for “why python eric raymond”
Python is Beautiful (even back when it was 1.5)
Rich language, Richer library
Thank god for py2exe
Pywin32 is pretty handy too
Application
Warehouse Management Open Systems (WMOS)
Large C++, CORBA, portable 'enterprise' application
>8 million lines of code
Borland's Visibroker for C++
AIX, HP-UX, Linux, Solaris, Windows
24x7x365 – Near-realtime 'Execution' system
i.e. 1 hour outage = millions of dollars
Heavy RF+MHE interaction
99% of activity is high volume, low latency
Heavy customization element
Routinely modified for every customer
Each customer = Forked codebase
IOW more variables + post-release
Performance, Scalability, Latency, Reliability, Resiliency
The 'not negotiable' family
Problem
CORBA process:
Server = EXE: initializes, registers available factories with ORB, responds to requests
Client:
Factory*f=bind("factory"); Object*o=f->newInstance();
o->DoStuff(); release o; release f; //aka delete
ORB knows what factories are supposed-to-be and actually-available
Borland: If requested factory not running, ORB asks the Object Activation Daemon (OAD) to start it [Just-In-Time Activation]
Problem: OAD stability is abominable
Runs for hours/days, then randomly hangs or crashes for no apparent reason
But JIT support made it popular for non-production (test, dev, …)
So we still regularly saw support issues due to folks using the OAD
Homegrown replacements:
PkPad: Unix shell script, pre-starts a list of processes, polls via ps to detect premature death and restart
Cons: 30 second sleep between sweeps (or huge perf hit), no JIT, no management
PkManager.exe: NT Service, multithreaded, interrupt-driven (no polling)
Cons: Windows only (<20% customers), no JIT
Solution: PkManager.py
Superset: JIT + PreStart, interrupt-based (no polling), administration interface
And by-god-rock-solid-reliable!
Basic Architecture
Global Variable: timeToExit = threading.Event()
Thread 1: Main
Initialize (parse command line etc)
Start worker threads
Main loop
    while not timeToExit.isSet():
        time.sleep(0.1)
Thread 2: Monitor (Process Manager)
    while not timeToDie.isSet():
        ProcessRequests(); StartChildren(); WaitForDeath()
    timeToExit.set()
Thread 3: API (Web Server)
JIT requests
Administration Console
Web Services
Thread 4: Uptime (Reporter)
    while not timeToDie.isSet():
        print 'Uptime: %s since %s' % (now-startup, startup)
        timeToDie.wait(n)
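The thread layout above can be sketched roughly as follows. This is a minimal sketch in present-day Python 3 (hence `is_set()` rather than `isSet()`), not the PkManager source: the `Worker` class, its `interval` parameter and the `run_for` deadline are assumptions made for illustration.

```python
import threading
import time

time_to_exit = threading.Event()  # global: set when the whole process should stop


class Worker(threading.Thread):
    """Hypothetical worker: loops until told to die, then flags process exit."""

    def __init__(self, name, interval=0.01):
        super().__init__(name=name, daemon=True)
        self.time_to_die = threading.Event()  # per-thread stop request
        self.interval = interval

    def run(self):
        try:
            while not self.time_to_die.is_set():
                # Stand-in for real work; wait() doubles as an interruptible sleep.
                self.time_to_die.wait(self.interval)
        finally:
            time_to_exit.set()  # a worker that ends takes the process down with it

    def stop(self, timeout=None):
        self.time_to_die.set()
        self.join(timeout)


def main(run_for=0.05):
    workers = [Worker('monitor'), Worker('uptime')]
    for worker in workers:
        worker.start()
    deadline = time.monotonic() + run_for
    # Main loop: idle until a thread (or the demo deadline) asks for shutdown.
    while not time_to_exit.is_set() and time.monotonic() < deadline:
        time.sleep(0.01)
    for worker in workers:
        worker.stop(timeout=1.0)
    return workers
```

Waiting on the per-thread Event instead of calling `time.sleep()` means a stop request is noticed immediately rather than after the sleep expires.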
Configuration (DSL)
Configuration file = Domain Specific Language (DSL)
[[wmosprod.dat]]
Python dictionaries are sweet!
Look ma, it's JSON
symbols={'N':'order', 'OnStart':'#prestart', 'JIT':'#ondemand', …}
config = []
errors = 0
lineno = 0
for line in open('wmosprod.dat'):
    lineno += 1
    try:
        entry = eval(line.strip(), {}, symbols)
        config.append(entry)
    except Exception:
        print 'Error line %d' % (lineno)
        errors += 1
if errors > 0:
    raise UserWarning('Uh-oh…')
Users see simple and obvious configuration
Code is maintainable and simple
Mostly to 'nicely' handle and report errors
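To make the eval-as-parser idea concrete, here is a small Python 3 sketch. The real wmosprod.dat format is not shown in the slides, so the sample lines, the `'exe'` and `'mode'` keys, and the `parse_config` helper are all assumptions; only the `symbols` table comes from the slide.

```python
def parse_config(lines, symbols):
    """Evaluate each non-blank line as a Python expression using only `symbols`."""
    entries, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        try:
            # An empty __builtins__ keeps stray names from resolving by accident.
            # It is NOT a security sandbox; only evaluate files you trust.
            entries.append(eval(line, {'__builtins__': {}}, symbols))
        except Exception as exc:
            errors.append((lineno, '%s: %s' % (type(exc).__name__, exc)))
    return entries, errors


symbols = {'N': 'order', 'OnStart': '#prestart', 'JIT': '#ondemand'}
sample = [
    "{N: 1, 'exe': 'alpha', 'mode': OnStart}",  # hypothetical entry
    "{N: 2, 'exe': 'beta', 'mode': JIT}",       # hypothetical entry
    "{N: 3, 'exe': 'gamma', 'mode': Typo}",     # unknown symbol, reported as an error
]
entries, errors = parse_config(sample, symbols)
```

Collecting every error with its line number before giving up lets a user fix the whole file in one pass instead of one typo per run.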
Signals – Ouch!
TIP: Do this very early
import signal
signals = dir(signal)
if 'SIGBREAK' in signals:
    signal.signal(signal.SIGBREAK, signal.default_int_handler)
if 'SIGTERM' in signals:
    signal.signal(signal.SIGTERM, signal.default_int_handler)
Surprises
#1: SIGBREAK+SIGTERM not always available
#2: Default action is usually terminate
Now except KeyboardInterrupt will trip
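The same tip in a slightly more defensive form, as a Python 3 sketch. The helper name `install_interrupt_handlers` is made up for illustration; `signal.default_int_handler` is the standard handler that raises KeyboardInterrupt. Handlers can only be installed from the main thread, which is one more reason to do this very early.

```python
import signal


def install_interrupt_handlers(names=('SIGBREAK', 'SIGTERM')):
    """Route each available signal to KeyboardInterrupt; return the names installed."""
    installed = []
    for name in names:
        signum = getattr(signal, name, None)  # SIGBREAK exists only on Windows
        if signum is not None:
            signal.signal(signum, signal.default_int_handler)
            installed.append(name)
    return installed


installed = install_interrupt_handlers()
```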
Threading
All threads use same basic pattern e.g.
process_timeToExit = threading.Event() #Global
class Thread_Monitor(threading.Thread):
    def __init__(self, other, parms, …):
        threading.Thread.__init__(self)
        self.timeToDie = threading.Event()
        …initialize…
    def run(self):
        try:
            …setup…
            while not self.timeToDie.isSet():
                …do stuff…
        except KeyboardInterrupt:
            print 'Ctrl-Break detected; terminating…'
        except Exception, e:
            print FormatException()
        process_timeToExit.set()
    def stop(self, timeout=None):
        self.timeToDie.set()
        threading.Thread.join(self, timeout)
threading.Event is your friend
Global Event to coordinate process termination/cleanup
Per-thread communication
“Thread, Kill Thyself” = Event.set(); “Time to die?” = Event.isSet()
“Thread, Art Thou Dead?” = Thread.join()
Alternative, pair of events:
timeToDie = threading.Event()
iAmDead = threading.Event()
def KillThyself(): timeToDie.set()
def TimeToDie(): return timeToDie.isSet()
def IAmDead(): iAmDead.set()
def AreYouDeadYet(): return iAmDead.isSet()
FormatException()
Simplify exception reporting
import sys, traceback
def __function__(nFramesUp=1):
    """Create a string naming the function n frames up on the stack."""
    co = sys._getframe(nFramesUp+1).f_code
    return "%s (%s @ %d)" % (co.co_name, co.co_filename, co.co_firstlineno)
def FormatException(ei=None):
    if ei is None:
        ei = sys.exc_info()
    info = traceback.format_exception(ei[0], ei[1], ei[2])
    return ''.join(info)
Typical usage:
try:
    DoSomething()
except SomeException:
    print FormatException()
No need to catch the exception object, though you can
try:
    DoSomething()
except SomeException, e:
    print FormatException(sys.exc_info()) #pass the (type, value, traceback) tuple, not e itself
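For readers on current Python, the helper translates almost line for line. This is a Python 3 sketch; `format_exception` and `risky` are illustrative names, and `traceback.format_exception` is the same standard-library call the slide uses.

```python
import sys
import traceback


def format_exception(exc_info=None):
    """Return the formatted traceback for exc_info (default: the exception being handled)."""
    if exc_info is None:
        exc_info = sys.exc_info()
    return ''.join(traceback.format_exception(*exc_info))


def risky():
    return 1 / 0


try:
    risky()
except ZeroDivisionError:
    report = format_exception()
```

`report` holds the full multi-line traceback as one string, ready for a log file or an HTTP error body.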
KeyboardInterrupt
try block necessary per thread
Raised on the active thread when detected
Worse, KeyboardInterrupt derives from StandardError (through Python 2.4)
except Exception eats everything
Including KeyboardInterrupt and SystemExit!
Probably not what you wanted…
This coupled with SIGBREAK fun was a bear to figure out
Python 3000 is supposed to 'fix' this
Changing the exception hierarchy!
Should make porting…fun…
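For comparison, from Python 2.5 onward (and in Python 3) KeyboardInterrupt and SystemExit derive from BaseException rather than Exception, so `except Exception` no longer swallows them. A small Python 3 sketch to check this; the `swallow` helper is made up for the demonstration.

```python
def swallow(exc):
    """Raise exc inside `except Exception`; report whether that clause caught it."""
    try:
        try:
            raise exc
        except Exception:
            return True
    except BaseException:
        # Reached only when `except Exception` let the exception through.
        return False


caught = {
    type(exc).__name__: swallow(exc)
    for exc in (ValueError('x'), KeyboardInterrupt(), SystemExit(0))
}
```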
Web Server
PkManager predates WSGI's emergence
class PkManagerWebServer(SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer):
    pass
class PkManagerRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.0'
    server_version = 'PkManagerHTTP/' + __version__
    def do_HEAD(self):
        self.do_GET()
    def do_POST(self):
        self.ProcessRequest(self.rfile)
    def do_GET(self):
        requestbody = StringIO() #GET has no body; hand the parser an empty one
        requestbody.seek(0)
        self.ProcessRequest(requestbody)
        requestbody.close()
    def ProcessRequest(self, requestbody):
        …parse url…
        name = 'Handler_' + path.replace('/', '_')
        handler = self.__class__.__dict__.get(name)
        if handler is None:
            if not self.ServeStaticFile():
                self.ProcessResponse(400)
            return
        else:
            result = handler(self)
        if 'Cache-Control' not in headers:
            headers['Cache-Control'] = 'private, max-age=0'
        self.ProcessResponse(statuscode, body, headers)
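The URL-to-method dispatch is the interesting part of the handler, so here is a Python 3 sketch of just that idea using the standard `http.server` module. It is not the PkManager code: the `/ping` endpoint, the `SketchRequestHandler` class, the 404 for unknown paths and the `make_server` helper are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


def handler_name(url):
    """Map a request URL to a method name, e.g. /process/start -> Handler__process_start."""
    return 'Handler_' + urlsplit(url).path.replace('/', '_')


class SketchRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only methods carrying the Handler_ prefix are reachable from a URL.
        handler = getattr(self, handler_name(self.path), None)
        if handler is None:
            status, body = 404, 'No handler for %s' % self.path
        else:
            status, body = handler()
        payload = (body or '').encode('utf-8')
        self.send_response(status)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(payload)))
        self.send_header('Cache-Control', 'private, max-age=0')
        self.end_headers()
        self.wfile.write(payload)

    def Handler__ping(self):  # hypothetical endpoint: GET /ping
        return 200, 'pong'

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for a free port."""
    return ThreadingHTTPServer(('127.0.0.1', port), SketchRequestHandler)
```

Calling `make_server().serve_forever()` from a worker thread would give roughly the "Thread 3: API" role from the architecture slide.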
Request Handlers
PkManagerRequestHandler methods, e.g.
monitor_requests = Queue.Queue() #Global variable
def Handler__process_start(self): #1
    parms = SplitToKVPairs(self.rfile) #2
    processname = parms.get('exe')
    if processname is None:
        return (400, 'Missing parameter (exe=<name>)')
    timeout = int(parms.get('wait', TimeoutDefault))
    iamdone = g_EventCache.get(timeout)
    request = (self.effective_path, iamdone, processname) #3
    monitor_requests.put(request) #4
    realtimeout = self.TimeoutMSecToRealValue(timeout) #5
    if iamdone is not None:
        iamdone.wait(realtimeout) #6
        if not iamdone.isSet():
            return (408, None) #7
        g_EventCache.put(iamdone) #8
    return (200, None) #9
#1: Method name = 'Handler_' + URL's path component
#2: Parameters are fundamentally URL query parameters
#5: Timeout = N or Infinite or NoWait
#6: Wait up to the timeout
#7: If timeout, HTTP status = 408 Request Timeout
#9: Success! HTTP status = 200 OK
EventCache
New Event() per request = Huge Perf Pig
Took 3 hours to identify bottleneck
Only 20 minutes to solve! I ♥ Python
class EventCache:
    def __init__(self):
        self.cache = Queue.Queue()
    def get(self, timeout):
        if timeout == Timeout_NoWait: return None
        try:
            event = self.cache.get_nowait()
            event.clear()
            return event
        except Queue.Empty:
            return threading.Event()
    def put(self, event):
        self.cache.put(event)
    def __len__(self):
        return self.cache.qsize()
g_EventCache = EventCache()
Call get(timeout) for a new Event
Call put(event) to return Event to cache when done
Only if done with the Event
If errors occurred (e.g. timeout), don't put()
Python will clean up the Event object once no longer referenced
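The get/put contract described above, as a Python 3 sketch. `EventPool` is an illustrative name and the NoWait special case is left out; whether pooling still pays off on a current interpreter is something to measure rather than assume.

```python
import queue
import threading


class EventPool:
    """Hands out cleared threading.Event objects, reusing returned ones."""

    def __init__(self):
        self._idle = queue.Queue()

    def get(self):
        try:
            event = self._idle.get_nowait()
        except queue.Empty:
            return threading.Event()  # pool is empty: make a fresh one
        event.clear()                 # recycled events must start unset
        return event

    def put(self, event):
        self._idle.put(event)

    def __len__(self):
        return self._idle.qsize()


pool = EventPool()
first = pool.get()
first.set()
pool.put(first)      # only hand back Events nobody else can still set
second = pool.get()  # the same object comes back, cleared
```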
Queue.Queue
All inter-thread-communication via Event and Queue
Handler creates a tuple to queue
(resource, event, …parameters…)
Output parameters passed as empty list
iamdone = threading.Event(); name = []; age = []; shoesize = []
request = (self.effective_path, iamdone, name, age, shoesize)
queue.put(request)
iamdone.wait()
print name[0], age[0], shoesize[0]
Monitor thread pulls requests from queue
def HandleRequests():
    try:
        while 1:
            request = queue.get_nowait()
            path = request[0]
            name = 'HandleAPIRequest_' + path.replace('/', '_')
            handler = globals().get(name)
            assert handler is not None
            handler(request)
    except Queue.Empty:
        pass
def HandleAPIRequest__some_service_entrypoint(request):
    name = request[2]; age = request[3]; shoesize = request[4]
    …do stuff…
    name.append(…); age.append(…); shoesize.append(…)
    iamdone = request[1]
    if iamdone is not None: iamdone.set()
So effective I ported Queue to C++
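A compact, runnable version of the request/reply pattern, as a Python 3 sketch. It differs from the slide in two labeled ways: handlers are found through an explicit `HANDLERS` dictionary rather than `globals()`, and the consumer blocks on the queue with a short timeout instead of calling `get_nowait()`. The `/lookup` path and the reply values are made up.

```python
import queue
import threading

requests = queue.Queue()


def handle_lookup(request):
    """Hypothetical service entry point: fills the output lists, then signals."""
    _path, done, name, age = request
    name.append('Ada')  # made-up result values
    age.append(36)
    if done is not None:
        done.set()


HANDLERS = {'/lookup': handle_lookup}


def monitor(stop):
    """Consumer thread: block briefly on the queue so shutdown stays responsive."""
    while not stop.is_set():
        try:
            request = requests.get(timeout=0.05)
        except queue.Empty:
            continue
        HANDLERS[request[0]](request)


def call(path, timeout=1.0):
    """Producer side: queue a request and wait for the reply; None means timed out."""
    done, name, age = threading.Event(), [], []
    requests.put((path, done, name, age))
    if not done.wait(timeout):
        return None
    return name[0], age[0]
```

The empty lists act as out-parameters: the consumer appends results, and the producer reads them only after the Event fires, so no extra locking is needed.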
Internationalization (i18n)
Initially tried module gettext
Standard. Capable. Simple API. Very similar to GNU gettext API
But…needed simple deployment
“Zero-Install” – anything else is just a support call (or many…)
How to find the message catalogs?
localedir/language/LC_MESSAGES/domain.mo
Create a 3-level tree, with very fixed names, to drop a bunch of localized text resources?
And what about customization?
Bah. Python to the rescue!
[[PkManagerI18N-*.py]]
i18n = {}; i18nMeta = {}
def i18nLoad(path):
    sys.path.insert(0, path)
    for root, paths, filenames in os.walk(path):
        for filename in filenames:
            if fnmatch.fnmatch(filename, 'PkManagerI18N-*.py'):
                name = os.path.splitext(filename)[0]
                pathname = os.path.join(root, filename)
                try:
                    module = __import__(name)
                    text = getattr(module, 'Text', None)
                    if text is not None:
                        meta = getattr(module, 'Meta', None) or {}
                        for locale in text.iterkeys():
                            i18n[locale] = text[locale]
                            i18nMeta[locale] = meta.get(locale)
                except (ImportError, SyntaxError), e:
                    Abort(5, 'Error loading i18n resource %s' % (filename))
    del sys.path[0]
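One way to get the same drop-in catalog loading on Python 3, sketched under assumptions: it swaps `__import__` for the standard `runpy.run_path`, which executes a file by path and returns its globals, so `sys.path` is never touched. `load_catalogs` is an illustrative name, not part of PkManager.

```python
import fnmatch
import os
import runpy


def load_catalogs(path, pattern='PkManagerI18N-*.py'):
    """Execute each matching file and merge its Text/Meta dictionaries by locale."""
    text, meta = {}, {}
    for root, _dirs, filenames in os.walk(path):
        for filename in sorted(fnmatch.filter(filenames, pattern)):
            # run_path runs the file and hands back its module-level names.
            namespace = runpy.run_path(os.path.join(root, filename))
            text.update(namespace.get('Text', {}))
            meta.update(namespace.get('Meta', {}))
    return text, meta
```

As with the original, catalogs are executable Python, so this only suits directories the administrator controls.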
Internationalization (i18n) – Part Deux
Simple format
Text = { 'es': { 'About':'Sobre', 'English' : u'Engl\u00e9s', … } }
Meta = { 'es': { 'Name':'Spanish', 'Display' :u'Espa\u00f1ol' } }
But what about complex languages? Python source files can use arbitrary encodings!
# -*- coding: utf8 -*-
Text = { 'zh': { 'About':u'亸乾些亖亃', … },
         'jp': { 'About':u'ノキアについて', … },
         'ar': { 'About':u'عن' } }
Meta = { 'zh': { 'Name':'Chinese', 'Display':u'中国' },
         'jp': { 'Name':'Japanese', 'Display':u'日本語' },
         'ar': { 'Name':'Arabic', 'Display':u'العربية' } }
One neat trick in module gettext: _() is defined as 'lookup-text'. Nifty idea
print _('About')
def _(s, locale=None, language=None):
    if locale is None: locale = options.locale
    textlist = i18n.get(locale)
    if textlist is not None:
        text = textlist.get(s)
        if text is not None:
            return text
    if language is not None:
        textlist = i18n.get(language)
        if textlist is not None:
            text = textlist.get(s)
            if text is not None:
                return text
    if isint(s): return s
    else: return '[%s]' % (s)
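The locale-then-language fallback can be condensed into a loop. This is a Python 3 sketch: the function is named `translate` here to keep it separate from `_()`, the `es_MX` override is a made-up example, and the `isint` shortcut from the slide is left out.

```python
i18n = {
    'es': {'About': 'Sobre'},
    'es_MX': {'About': 'Acerca de'},  # hypothetical regional override
}


def translate(key, locale, language=None):
    """Look up key under locale, then language; fall back to a visible '[key]' marker."""
    for code in (locale, language):
        if code is None:
            continue
        text = i18n.get(code, {}).get(key)
        if text is not None:
            return text
    return '[%s]' % key
```

The bracketed fallback makes untranslated strings easy to spot on screen without breaking the page.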
py2exe
Running PkManager.py is natural on Unix
Not so much on Windows
py2exe binds source + runtime into .exe
# setup.py
import os
from distutils.core import setup
import py2exe
setup(name='PkManager',
      version=GetVersion(),
      description="WMOS process manager, overseer, care and feederer",
      author='Manhattan Associates',
      url='http://www.manh.com',
      console=[{'script':"PkManager.py",
                'icon_resources':[(1, 'PkManager.ico')]}],
      zipfile=None, #Append to .exe / no separate .zip
      data_files=[('.', [os.path.abspath(r'wmosprod.dat')])],
      options={"py2exe":{"compressed":1,
                         "optimize":2,
                         "xref":0,
                         "includes":[],
                         "dll_excludes":[]}})
Create the executable
python -OO setup.py py2exe
Replace console parameter to compile an NT Service
      service=[{'modules':'PkManager',
                'script':"PkManager.py",
                'icon_resources':[(1, 'PkManager.ico')]}],
Demo
Questions?
Blog: http://blog.kapustein.com
Email: [email protected]