Transcript RAL Tier 1a
Tier1A Status
Martin Bly
28 April 2003
CPU Farm
• Older hardware:
– 108 dual-processor systems (450MHz, 600MHz and 1GHz)
– 156 dual-processor 1400MHz PIII systems
• Recent delivery:
– 80 dual-processor 2.66GHz P4 Xeon systems
– 533MHz FSB, 2GB memory
• Next delivery expected in the summer
Operating Systems
• Operating Systems:
– Red Hat 6.2 service will close in May
– Red Hat 7.2 service has been in production for BaBar for 6 months
– New Red Hat 7.3 service now available for LHC/other experiments
• Increasing demand for security updates is becoming problematic.
Disk Farm (last year)
• Last year: 26 servers, each with 2 external RAID arrays, 1.7TB disk per server:
– Excellent performance, well balanced system
– Problems with a bad batch of Maxtor drives (many failures and high error rate); all 620 drives now replaced by Maxtor.
– Still outstanding problems with Accusys controller failing to eject bad drives from RAID set.
Disk Farm (this year)
• Recent upgrade to disk farm.
– 11 dual P4 servers (with PCI-X), each with 2 Infortrend IFT-6300 arrays
– 12 Maxtor 200GB DiamondMax Plus 9 drives per array.
• Not yet in production, and there are a few snags:
– The originally tendered drive, the Maxtor Maxline Plus II, was found not to exist.
– Infortrend array has a 2TB limit per RAID set, so some space (about 10%) is wasted!
• Contact Nick White ([email protected]) for more info.
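The wasted-space figure can be checked with a little arithmetic. The sketch below is an illustration only: it assumes each array is a single-parity RAID set spanning all 12 drives, and that capacities are decimal (200GB drives, 2TB = 2000GB). The slides state neither the RAID level nor the units; `raid_set_waste` is a hypothetical helper.

```python
def raid_set_waste(drives, drive_gb, parity_drives=1, limit_gb=2000):
    """Return (usable_gb, addressable_gb, wasted_fraction) for one RAID set.

    parity_drives=1 assumes a RAID 5 style layout (an assumption, not
    something stated in the slides). limit_gb is the controller's
    per-RAID-set ceiling, taken here as 2TB = 2000GB.
    """
    usable = (drives - parity_drives) * drive_gb   # capacity after parity
    addressable = min(usable, limit_gb)            # what the controller can expose
    wasted = (usable - addressable) / usable       # fraction lost to the limit
    return usable, addressable, wasted


usable, addressable, wasted = raid_set_waste(drives=12, drive_gb=200)
print(f"usable {usable} GB, addressable {addressable} GB, wasted {wasted:.1%}")
# prints: usable 2200 GB, addressable 2000 GB, wasted 9.1%
```

Under these assumptions the loss is about 9%, consistent with the roughly 10% quoted above.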
New Projects
• Basic fabric performance monitoring (Ganglia)
• Resource CPU accounting (based on PBS accounts/MySQL)
• New CA in production
• New batch scheduler (MAUI)
• Deploy new helpdesk (May)
Ganglia Monitoring
• Urgently needed live performance and
utilisation monitoring
– RAL Ganglia Monitoring (live)
– RAL Ganglia Monitoring (Static)
• Scalable solution based on multicast
• Very quick to deploy, with reasonable support on all Tier1A hardware
• See: http://ganglia.sourceforge.net/
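As an illustration of how Ganglia data can be consumed: Ganglia's gmond daemons share metrics over multicast and can report the cluster state as XML. The sketch below parses such a dump and reports one-minute load per CPU for each host. The sample document, host names and version string are invented for the example, and the element layout (HOST elements containing METRIC elements with NAME and VAL attributes) is assumed to follow gmond's usual output; `load_per_cpu` is a hypothetical helper, not part of Ganglia.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of a gmond XML dump (all values hypothetical).
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.3" SOURCE="gmond">
  <CLUSTER NAME="tier1a-example" LOCALTIME="0">
    <HOST NAME="node001" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="1.95" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS=""/>
    </HOST>
    <HOST NAME="node002" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="0.10" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def load_per_cpu(xml_text):
    """Map host name -> one-minute load divided by CPU count."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        # Skip hosts that do not report both metrics.
        if "load_one" in metrics and "cpu_num" in metrics:
            cpus = int(metrics["cpu_num"])
            result[host.get("NAME")] = float(metrics["load_one"]) / cpus
    return result


for name, load in sorted(load_per_cpu(SAMPLE_XML).items()):
    print(f"{name}: {load:.2f} load per CPU")
```

In practice the XML would be read from a gmond daemon rather than from a string; that step is left out here to keep the sketch self-contained.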
PBS Accounting Software
• Need to keep track of system CPU and disk usage.
• Home-grown PBS accounting package (Derek Ross):
– Upload PBS and disk stats into MySQL
– Process with Perl DBI script
– Serve via Apache
• http://www.gridpp.rl.ac.uk/stats
• Contact Derek ([email protected]) for more info.
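The upload-and-process pipeline above can be sketched in a few lines. This is not the RAL package: it swaps Python for Perl DBI and an in-memory SQLite database for MySQL so that it runs stand-alone. It assumes the usual PBS accounting log layout (`date time;record type;job id;key=value ...`, with `E` marking job-end records). The sample records, user names and table schema are all invented for illustration.

```python
import sqlite3

# Invented records in the usual PBS accounting log layout.
SAMPLE_LOG = """\
04/27/2003 10:15:02;E;1001.pbs.example;user=alice group=babar queue=prod resources_used.cput=01:30:00 resources_used.walltime=02:00:00
04/27/2003 11:40:10;E;1002.pbs.example;user=bob group=lhcb queue=prod resources_used.cput=00:45:00 resources_used.walltime=00:50:00
04/27/2003 12:05:33;Q;1003.pbs.example;queue=prod
04/27/2003 13:00:00;E;1004.pbs.example;user=alice group=babar queue=prod resources_used.cput=00:30:00 resources_used.walltime=00:31:00
"""


def hms_to_seconds(text):
    """Convert 'HH:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_records(log_text):
    """Yield (job_id, owner, cpu_seconds, wall_seconds) for each 'E' record."""
    for line in log_text.splitlines():
        parts = line.split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # ignore queue/start/other record types
        fields = dict(item.split("=", 1) for item in parts[3].split() if "=" in item)
        yield (
            parts[2],
            fields.get("user", "unknown"),
            hms_to_seconds(fields.get("resources_used.cput", "0:0:0")),
            hms_to_seconds(fields.get("resources_used.walltime", "0:0:0")),
        )


def cpu_seconds_by_user(log_text):
    """Load job-end records into SQLite and total CPU seconds per user."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, owner TEXT, "
        "cput INTEGER, walltime INTEGER)"
    )
    db.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", parse_end_records(log_text))
    rows = db.execute(
        "SELECT owner, SUM(cput) FROM jobs GROUP BY owner ORDER BY owner"
    ).fetchall()
    db.close()
    return dict(rows)


print(cpu_seconds_by_user(SAMPLE_LOG))
# prints: {'alice': 7200, 'bob': 2700}
```

A web front end would then render the query results; that part is omitted here.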
MAUI/PBS
• Maui scheduler has been in production for the last 3 months.
• Allows extremely flexible scheduling with many features. But ...
– Not all of it works: we have done much work with the developers on fixes.
– Major problem: MAUI schedules on wall clock time, not CPU time. Had to bodge it!!
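To see why the choice of clock matters, the sketch below compares how two jobs would be charged under each measure. It does not reproduce the workaround used at RAL, which these slides do not describe. The `Job` class, the `usage_share` helper and the job figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu_seconds: int    # CPU time actually consumed
    wall_seconds: int   # elapsed time the job held its slot


def usage_share(jobs, by):
    """Return each job's fraction of total usage, measured by attribute `by`."""
    total = sum(getattr(job, by) for job in jobs)
    return {job.name: getattr(job, by) / total for job in jobs}


# Two one-hour jobs: one keeps the CPU busy, one mostly waits on I/O.
jobs = [
    Job("cpu_bound", cpu_seconds=3500, wall_seconds=3600),
    Job("io_bound", cpu_seconds=500, wall_seconds=3600),
]

print(usage_share(jobs, "wall_seconds"))
# prints: {'cpu_bound': 0.5, 'io_bound': 0.5}
print(usage_share(jobs, "cpu_seconds"))
# prints: {'cpu_bound': 0.875, 'io_bound': 0.125}
```

Measured by wall clock the two jobs look identical, whereas by CPU time the I/O-bound job has used only an eighth of the total, so a fair-share policy would treat the two users very differently depending on the measure.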
New Helpdesk Software
• Old helpdesk is mail-based and unfriendly.
• With additional staff, we urgently need to deploy a new solution.
• Expect the new system to be based on free software, probably Request Tracker.
• Hope that the deployed system will also meet the needs of the Testbed and may also satisfy Tier 2 sites.
• Expect deployment by end of May.
• http://requestracker.gridpp.rl.ac.uk/ (Static)
Outstanding Issues/Worries
• We have to run many distinct services. For example, Fermi Linux, RH 6.2/7.2/7.3, EDG testbeds, LCG ...
• Farm management is getting very complex.
We need better tools and automation.
• Security is becoming a big concern again.