Simple Tutorial - reflectometry.org


Tutorial for PARK data fitting
Paul KIENZLE, Wenwu CHEN and Ziwen FU
Reflectometry Group
Objective: Distributed Computing Environment
[Diagram: users/clients connect to a service server (management), which acts as the master node of a cluster of working servers.]
Prerequisites
• Python: version >= 2.4
• Windows: Cygwin
• Client: wxPython version >= 2.6 and matplotlib
• Most services may also need numpy
Setup of park
• Download Source code:
– Source code: svn co svn://[email protected]/park
– Package for unix/linux: park-0.2.0.tar.gz park-0.2.0.tar.bz2
– Package for windows: park-0.2.0.zip
• Edit cluster config file:
– park/config/hosts
• Start service server
– park/servers/mapServer.py
• Start client
– park/client/AppJob.py
• Provide services
– park/services
Setup of park in Unix/Linux
• Download park-0.2.0.tar.gz or park-0.2.0.tar.bz2 from
http://danse.us
• Unzip the file:
tar -xvzf park-0.2.0.tar.gz
(or tar -xvjf park-0.2.0.tar.bz2)
• Make the installation:
cd park-0.2.0
make install
or
python setup.py install --install-purelib=home_directory_of_park
The command make install is equivalent to python setup.py install --install-purelib=~, which installs park in the directory ~/park.
Setup of park in Windows
• Download park-0.2.0.zip or park-0.2.0.tar.bz2 from
http://danse.us
• Unzip the file:
unzip park-0.2.0.zip
• Make the installation in an MS-DOS (command prompt) window:
cd park-0.2.0
python setup.py install
This installs park in the Lib/site-packages/park directory of the Python installation.
Edit the config file
The server makes use of park/config/hosts to configure the working nodes.
Example of park/config/hosts:
#
# hosts configure file for park
# example for compufans.ncnr.nist.gov cluster:
# 4 nodes, each node with 2 cpus
#
# the format is similar to that of /etc/hosts:
# ip_address full_name alias_name[:port:number_of_cpus]
#
127.0.0.1 localhost.localdomain localhost:5300:2
#172.16.255.251 n4.ncnr.nist.gov n4:6500:2
#172.16.255.252 n3.ncnr.nist.gov n3:6300:2
#172.16.255.253 n2.ncnr.nist.gov n2:6200:2
#172.16.255.254 n1.ncnr.nist.gov n1:6100:2
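The hosts format above can be read with a few lines of Python. This is a minimal sketch, not PARK's own parser: the names parse_hosts and Host are hypothetical, and when the optional :port:number_of_cpus suffix is missing the sketch assumes None for both values.

```python
from collections import namedtuple

# One working node described by a line of park/config/hosts (hypothetical record type).
Host = namedtuple("Host", "ip full_name alias port cpus")


def parse_hosts(text):
    """Parse hosts-file text into a list of Host records.

    Blank lines and lines starting with '#' are skipped. port and cpus are
    None when the optional ':port:number_of_cpus' suffix is absent.
    A line with fewer than three fields raises ValueError.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError("malformed hosts line: %r" % raw)
        ip, full_name, spec = fields[:3]
        alias, _, rest = spec.partition(":")
        port = cpus = None
        if rest:
            port_text, _, cpus_text = rest.partition(":")
            port = int(port_text)
            cpus = int(cpus_text) if cpus_text else None
        hosts.append(Host(ip, full_name, alias, port, cpus))
    return hosts


example = """
# ip_address full_name alias_name[:port:number_of_cpus]
127.0.0.1 localhost.localdomain localhost:5300:2
#172.16.255.251 n4.ncnr.nist.gov n4:6500:2
"""
print(parse_hosts(example))
```

Only the uncommented localhost line produces a record; the commented cluster nodes are skipped until their leading # is removed.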
Start the server
The server is park/servers/mapServer.py:
cd park/servers
python mapServer.py
Or, under Cygwin on Windows:
cd Lib/site-packages/park/servers
python mapServer.py
The full command is:
python mapServer.py --port port --host host_name --log log_file_name
Start the server
• Make sure that Python and its environment are set up correctly.
• For a cluster with multiple working nodes, make sure that RSH, defined in park/servers/environ.py, is set to the remote shell command (for example rsh or ssh).
• Make sure that this remote shell command can start the remote command without a password.
• Make sure that the services are executable files.
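The last two checks can be scripted before starting the server. A minimal sketch, assuming RSH is given as a command list such as ["ssh", "-o", "BatchMode=yes"] (BatchMode makes ssh fail instead of prompting for a password); the function names are hypothetical and not part of park/servers/environ.py.

```python
import os
import subprocess


def remote_shell_ok(rsh, host, timeout=10):
    """Return True if `<rsh> <host> true` succeeds without asking for input.

    stdin is closed so a password prompt cannot block; a missing rsh
    program or a timeout counts as failure.
    """
    try:
        result = subprocess.run(
            list(rsh) + [host, "true"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def non_executable(paths):
    """Return the service paths that are missing or lack execute permission."""
    return [p for p in paths if not os.access(p, os.X_OK)]
```

For example, remote_shell_ok(["ssh", "-o", "BatchMode=yes"], "n1") should return True on every node listed in park/config/hosts before the server is started.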
Common errors:
• [Errno 2] No such file or directory: '~/park/config/hosts': the configuration file hosts is missing.
• ERROR (111, 'Connection refused')
– the working server did not start;
– make sure that its port is not already in use.
• ERROR (xxx, 'port is used')
– wait a while before restarting the server (the operating system may keep a recently closed port reserved for a short time);
– make sure that the port is not already in use.
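Whether a port is free can be checked by trying to bind it. This is a minimal sketch with a hypothetical helper name; the answer is only a snapshot, since another process can take the port between this check and the server's own bind.

```python
import socket


def port_is_available(port, host="127.0.0.1"):
    """Return True if this process could bind a TCP socket to (host, port) now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        # Typically "address already in use"; also covers permission errors.
        return False
    finally:
        sock.close()
    return True
```

Running port_is_available(5300) on each node (the port taken from park/config/hosts) helps tell the two errors above apart.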
Stop the server
Shut down the service server with Ctrl-C or the kill command. Use kill without -9: the default signal lets the service server also stop the working server program. With kill -9 the service server dies immediately and the working server keeps running.
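Why kill -9 leaves workers behind can be seen in a small sketch: a plain kill sends SIGTERM, which a process can catch to stop its children, while SIGKILL (kill -9) cannot be caught. The children here are hypothetical subprocess.Popen objects standing in for working servers; this is not PARK's shutdown code.

```python
import signal
import sys


def make_shutdown_handler(children):
    """Build a signal handler that stops child processes, then exits."""
    def handler(signum, frame):
        for child in children:
            if child.poll() is None:      # still running
                child.terminate()         # send SIGTERM to the child
        for child in children:
            child.wait()                  # reap it so no zombie remains
        sys.exit(0)                       # raises SystemExit in the main thread

    return handler


def install_shutdown_handler(children):
    """Run the handler on SIGTERM (plain `kill`) and SIGINT (Ctrl-C)."""
    handler = make_shutdown_handler(children)
    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)
    return handler
```

signal.signal must be called from the main thread. No handler runs for SIGKILL, so the children are not stopped in that case.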
Start the client
• Enter ~/park/client
• Run the client application:
$ python AppJob.py
• Connect to the server:
– server > server | port (default port is 5400)
– click the connect button to connect to the server.
• Prepare and submit the service request:
– shell > load: load an XML service request, which is shown in the upper text field
– click the submit button to submit the service request
– messages about the service request are shown in the lower text field.
• View the service results:
– view: view the results.
There are 3 types of data to view: experimental data (with error bars), simulation data, and chi-square. The experimental and simulation plots show only the best result, and the chi-square plot shows how chi-square improves during the fit. Below the panel is a toolbar for zooming in/out, saving the figure, and changing the figure properties (property button).
• Shut down the client:
– server > disconnect, then close the window,
– or close the window directly.
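The chi-square plot measures how well the simulation matches the experimental data with error bars. A common definition is the sum of squared, error-weighted residuals; the sketch below assumes that definition (and optional normalization by the degrees of freedom) and is not taken from PARK's source.

```python
def chi_square(y_exp, y_sim, sigma, n_params=0):
    """Error-weighted chi-square between experimental and simulated data.

    With n_params > 0 the result is the reduced chi-square, divided by
    the degrees of freedom len(y_exp) - n_params.
    """
    if not (len(y_exp) == len(y_sim) == len(sigma)):
        raise ValueError("inputs must have equal length")
    total = sum(((ye - ys) / s) ** 2 for ye, ys, s in zip(y_exp, y_sim, sigma))
    if n_params:
        dof = len(y_exp) - n_params
        if dof <= 0:
            raise ValueError("need more data points than fitted parameters")
        return total / dof
    return total


print(chi_square([1.0, 2.0, 3.0], [1.0, 2.5, 2.0], [0.5, 0.5, 1.0]))  # 2.0
```

Here the residuals weighted by their error bars are 0, -1 and 1, so chi-square is 2.0; a fit is improving when this number goes down.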
Map-reduce parallel pattern
• Map: the master node assigns working unit [i] to working node [j]:
– map(fn, input[i]) = output[i], computed on working node j
• Reduce: the master node collects the messages from each working node, applies the reduce function, and sends the result to the user:
– reduce(gn, output[0], …, output[n]) => sent to the user client
[Diagram: the service server maps working units to the working nodes, then reduces their outputs.]
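The pattern above can be sketched in a single process, with threads standing in for working nodes. The round-robin rule (unit i goes to node i % n_nodes) and the name map_reduce are assumptions for illustration; PARK's actual scheduling is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_reduce(fn, gn, inputs, n_nodes):
    """Map fn over inputs on n_nodes simulated nodes, then fold with gn.

    Requires at least one input and n_nodes >= 1.
    """
    # Master node: assign working unit i to node j = i % n_nodes.
    per_node = [[] for _ in range(n_nodes)]
    for i, item in enumerate(inputs):
        per_node[i % n_nodes].append((i, item))

    outputs = [None] * len(inputs)

    def run_node(units):
        # Working node j: output[i] = fn(input[i]) for each assigned unit.
        for i, item in units:
            outputs[i] = fn(item)

    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        list(pool.map(run_node, per_node))  # wait; re-raise worker errors

    # Master node: reduce(gn, output[0], ..., output[n]) for the user.
    return reduce(gn, outputs)


print(map_reduce(lambda x: x * x, lambda a, b: a + b, [1, 2, 3, 4], 2))  # 30
```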
Service request
<?xml version='1.0' encoding='UTF-8'?>
<session version='2.0.1' type='7' user='wwchen'
         email='[email protected]' priority='0'>
  <group name='group1'>
    <dataSet>
    </dataSet>
    <!-- reduce function -->
    <reduce classname='Chisq'/>
    <!-- map function -->
    <task cmd='longwinstr.py'>
      <bufsize value='3000'/>
      <home value='/home/wwchen/dansesrc/park/services/tester'/>
      <cwd value='/home/wwchen/dansesrc/park/servers/tester'/>
    </task>
    <!-- inputs -->
    <joblist name='job1' priority='4' cnt='4'>
      <input count='24'>
      </input>
    </joblist>
  </group>
</session>
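A request like this can be checked before submitting it by pulling out the reduce class, map command and inputs with the standard library. This is a sketch for inspection only, not PARK's parser; the sample omits the email, home and cwd attributes for brevity, and summarize is a hypothetical name.

```python
import xml.etree.ElementTree as ET

REQUEST = """<?xml version='1.0' encoding='UTF-8'?>
<session version='2.0.1' type='7' user='wwchen' priority='0'>
  <group name='group1'>
    <dataSet></dataSet>
    <reduce classname='Chisq'/>
    <task cmd='longwinstr.py'>
      <bufsize value='3000'/>
    </task>
    <joblist name='job1' priority='4' cnt='4'>
      <input count='24'></input>
    </joblist>
  </group>
</session>"""


def summarize(request_xml):
    """Return one dict per <group>: reduce class, map command, input counts."""
    session = ET.fromstring(request_xml.encode("utf-8"))
    summary = []
    for group in session.findall("group"):
        reduce_el = group.find("reduce")
        task_el = group.find("task")
        summary.append({
            "group": group.get("name"),
            "reduce": reduce_el.get("classname") if reduce_el is not None else None,
            "map_cmd": task_el.get("cmd") if task_el is not None else None,
            "inputs": [inp.get("count") for inp in group.iter("input")],
        })
    return summary


print(summarize(REQUEST))
```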
Software Infrastructure of PARK for data fitting
[Diagram: components and roles: user interface and data view (data presentation; view developer), service server, reduce service (data reduction; reduce service developer), services on the working nodes (data simulation; model developer), and the scientist as user.]
Reduce function
A reduce function is a class that inherits from park/services/reduce/reduce.Reduce:

class Reduce:
    """ Base class for reduce functions. """
    def __init__(self):
        """ constructor. """
        self.archive = None     # where results are stored
        self.msgqueue = None    # where messages for the client are queued

    def setArchive(self, archive):
        """ Set the archive used to store data. """
        self.archive = archive

    def setMsgQueue(self, msgqueue):
        """ Set the message queue. """
        self.msgqueue = msgqueue

    def __call__(self, msg):
        """ Called by PARK to process a reply from a working node. """
        pass
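An example of a reduce function
park/services/reduce/Chisq.Chisq:

# The import path and XML_HEADER value are assumptions; the slide does not show them.
from park.services.reduce.reduce import Reduce
XML_HEADER = "<?xml version='1.0' encoding='UTF-8'?>"

class Chisq(Reduce):
    """ A class to handle the chisq for data fitting. """
    def __init__(self):
        """ constructor. """
        Reduce.__init__(self)
        self.chisq = None    # lowest chi-square seen so far

    def __call__(self, reply):
        # Archive every reply, keyed by group id and job id.
        keys = {}
        keys['gid'] = reply.gid
        keys['jid'] = reply.id
        self.archive.put(keys, str(reply))
        if hasattr(reply, 'chisq'):
            chisqval = reply.chisq
            # Keep the best (lowest) chi-square.
            if self.chisq is None or chisqval < self.chisq:
                self.chisq = chisqval
            # Tell the client about this reply's chi-square.
            self.msgqueue.putMsg(reply.gid, '%s<reply gid="%s" update="%s" chisq="%s"/>'
                                 % (XML_HEADER, str(reply.gid), str(reply.id), str(chisqval)))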
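A reduce function can be exercised outside PARK by giving it stub archive and message-queue objects that provide only the two calls used above, put(keys, value) and putMsg(gid, msg). Everything below is a hypothetical test harness: ReplyCounter is an invented reducer, the replies are SimpleNamespace stand-ins, and XML_HEADER is an assumed value.

```python
from types import SimpleNamespace

XML_HEADER = "<?xml version='1.0' encoding='UTF-8'?>"  # assumed header


class Reduce:
    """Minimal copy of the base class shown above."""
    def __init__(self):
        self.archive = None
        self.msgqueue = None

    def setArchive(self, archive):
        self.archive = archive

    def setMsgQueue(self, msgqueue):
        self.msgqueue = msgqueue

    def __call__(self, msg):
        pass


class ReplyCounter(Reduce):
    """Hypothetical reduce function: counts replies per group."""
    def __init__(self):
        Reduce.__init__(self)
        self.counts = {}

    def __call__(self, reply):
        n = self.counts.get(reply.gid, 0) + 1
        self.counts[reply.gid] = n
        self.archive.put({'gid': reply.gid, 'jid': reply.id}, str(reply))
        self.msgqueue.putMsg(reply.gid, '%s<reply gid="%s" count="%d"/>'
                             % (XML_HEADER, reply.gid, n))


class FakeArchive:
    """Stub archive: records (keys, value) pairs."""
    def __init__(self):
        self.items = []

    def put(self, keys, value):
        self.items.append((dict(keys), value))


class FakeMsgQueue:
    """Stub message queue: records (gid, message) pairs."""
    def __init__(self):
        self.messages = []

    def putMsg(self, gid, msg):
        self.messages.append((gid, msg))


reducer = ReplyCounter()
reducer.setArchive(FakeArchive())
reducer.setMsgQueue(FakeMsgQueue())
for jid in range(3):
    reducer(SimpleNamespace(gid='g1', id=jid))
print(reducer.counts)  # {'g1': 3}
```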
Map function
A map function can be provided in two forms:
1. A pure Python function:
– runs as a thread in PARK;
– scales poorly on SMP machines (because of Python's global interpreter lock);
– works only for pure Python functions;
– format: output_string function_name(input_string)
2. An executable program:
– runs as a separate process in PARK;
– scales very well on SMP machines;
– works for any executable program;
– needs more memory and has a longer start-up time;
– reads its input from standard input and writes its results to standard output.
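The two contracts differ only in how the input string reaches the code and how the output string comes back. A minimal sketch of a caller for each form; the function names are hypothetical and this is not how PARK dispatches work internally.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_pure(fn, input_string):
    """Form 1: call output_string = fn(input_string) in a thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, input_string).result()


def run_executable(cmd, input_string, timeout=None):
    """Form 2: run cmd as a separate process, input on stdin, output from stdout."""
    completed = subprocess.run(
        cmd, input=input_string, capture_output=True, text=True,
        timeout=timeout, check=True,  # raise if the program exits non-zero
    )
    return completed.stdout


# A tiny "service" that upper-cases whatever arrives on standard input.
UPPER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_executable(UPPER, "<input count='3'/>"))
```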
An example of a map function
park/services/tester/longwinstr.py:

import sys
import math
from xml.dom import minidom

def longwin():
    """ Busy-work map function: reads <input count='t'> from stdin. """
    print('call longwin')
    s0 = sys.stdin.read()
    node = minidom.parseString(s0).childNodes[0]
    t = int(node.getAttribute('count'))
    # Counts above 25 are used as-is; smaller ones are used as an exponent.
    if t > 25:
        count = t
    else:
        count = 2**t
    print(' Start work with iteration number: %d' % t)
    cnt = 0
    while cnt < count:
        a = math.sqrt(2.0)    # dummy computation
        cnt += 1
    print(' finish work: cnt=%d' % cnt)

if __name__ == '__main__':
    try:
        longwin()
    except Exception:
        sys.stderr.write('Exception:%s\n' % (sys.exc_info()[1]))
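The iteration rule in longwin is easy to misread: small counts are exponents, large ones are literal. This sketch mirrors that rule (the helper name is hypothetical) and shows that the sample request above, with count='24', asks each job for 2**24 iterations.

```python
from xml.dom import minidom


def longwin_iterations(input_xml):
    """Mirror longwin(): a count > 25 is used as-is, otherwise 2**count."""
    t = int(minidom.parseString(input_xml).childNodes[0].getAttribute('count'))
    return t if t > 25 else 2 ** t


print(longwin_iterations("<input count='24'></input>"))  # 16777216
```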
Fully Distributed Services?
[Diagram: user, client, services, service register, message queue, job queue, cluster management, task management, service management, archive, data fetching, logging, shared files.]
Pull or push?
[Diagram: job server, message server, working server.]
1. The job server sends jobs to the working server, and the working server sends results to the message server.
2. The job server sends jobs to the working server, and the message server retrieves results from the working server.
3. The working server retrieves jobs from the job server and sends results to the message server.
4. The working server retrieves jobs from the job server, and the message server retrieves results from the working server.
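Option 3 can be sketched with in-process queues: idle workers pull the next job themselves, so faster workers simply take more jobs, and results are pushed to a message queue. The queues stand in for the job and message servers; the function name is hypothetical, and jobs must be distinct hashable values.

```python
import queue
import threading


def pull_workers(jobs, work, n_workers):
    """Workers pull jobs from a job queue and push results to a message queue."""
    job_queue = queue.Queue()        # stands in for the job server
    message_queue = queue.Queue()    # stands in for the message server
    for job in jobs:
        job_queue.put(job)

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()    # pull: the worker asks for work
            except queue.Empty:
                return                           # no jobs left
            message_queue.put((job, work(job)))  # send the result

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so the message queue is complete.
    results = {}
    while not message_queue.empty():
        job, result = message_queue.get()
        results[job] = result
    return results
```

In this sketch a job whose work function raises is lost with its thread; a real server would need to report or requeue it.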
Security: authentication and authorization
[Diagram: job server, security server, message server, working server.]
Data Transfer
1. Provide a data center server for the cluster, which retrieves data from the remote data server and stores it for access by the local working nodes. This is necessary for diskless nodes in the cluster.
2. Provide a reference to the remote data (similar to a URL), and let each working node access the data individually.
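The two options differ in where the remote fetch happens. A sketch, with hypothetical names: fetch_directly is option 2 (each node dereferences the URL), and DataCenter is option 1 (fetch once, then serve the local copy). The copy is held in memory for brevity; a real data center server would presumably keep it on its own disk, and this class is not thread-safe.

```python
import urllib.request


def fetch_directly(url):
    """Option 2: a working node dereferences the data URL itself."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class DataCenter:
    """Option 1: fetch each remote data set once and serve the stored copy."""

    def __init__(self, fetch=fetch_directly):
        self._fetch = fetch
        self._store = {}

    def get(self, url):
        if url not in self._store:
            self._store[url] = self._fetch(url)  # only the first request goes remote
        return self._store[url]
```

With N working nodes reading the same data set, option 2 makes N remote requests while option 1 makes one.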
UI/Visualization
• MVC model
• Traits-UI
• 2D/3D
Multi-tier of PARK
[Diagram: client, service server, reduce server, working server, and data server, linked by explicit direct, implicit direct, and possible connections.]
All of them work as both server and client.