Monitoring - Instituto de Física Corpuscular

VALENCIA TESTBED SITE
IFIC
(Instituto de Física Corpuscular)
Universitat de València-CSIC
GoG Farm
192 Athlon PCs (134 IFIC + 58 ICMOL)
VIA KT133A & KT266A based motherboards
CPU: AMD Athlon K7 @ 1.2 & 1.4 GHz
RAM: 2x512 Mbytes (SDRAM & DDR SDRAM)
HD: 40 Gbytes
NIC: 3COM 905CX & RealTek RTL8139 (Fast Ethernet + PXE)
2U chassis
IFIC (CSIC-València)
GoG Farm
9 racks (800x600 mm)
22 PCs + 1 network switch per rack
2U chassis with 3 fans
pros:
less space required
cons:
adds about 240 euros per PC
heat concentration
GoG Farm
2U chassis Athlon PC
Local Network
All Worker Nodes have private IP addresses in the network 192.168.4.0/22
secure environment
administrative independence from the University network management staff
Communication equipment has public IP addresses in the network 147.156.149.64/26
can be monitored and upgraded from the University's centralized service
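
As a quick check of what these two address ranges provide, the following minimal sketch uses Python's standard `ipaddress` module. Only the two network prefixes come from the slide; the helper name `describe` is introduced here for illustration.

```python
import ipaddress

# Address ranges quoted on the slide.
WORKER_NET = ipaddress.ip_network("192.168.4.0/22")
EQUIPMENT_NET = ipaddress.ip_network("147.156.149.64/26")


def describe(net):
    """Return (first address, last address, usable host count) for a network."""
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    return str(net[0]), str(net[-1]), usable


for name, net in (("worker nodes", WORKER_NET), ("equipment", EQUIPMENT_NET)):
    first, last, usable = describe(net)
    print(f"{name}: {first} to {last}, {usable} usable hosts, private={net.is_private}")
```

The /22 leaves room for roughly a thousand hosts, well above the 192 PCs of the farm, while the /26 offers 62 public addresses for the communication equipment.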
Local Network
External communications go through gog01, where NAT rules are applied
Connections from Worker Nodes to public nodes are allowed through masquerading, with some restrictions, e.g.:
FTP requires passive mode
Connections from external nodes to Worker Nodes require a NAT rule to be configured beforehand
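
The policy above can be summarised as a small decision table. The sketch below is a simplified model, not the actual gog01 rule set: the function `nat_action`, its return labels, and the `forwarded` set of nodes with a pre-configured inbound rule are all hypothetical, and the destination is taken to be the final node after any address translation.

```python
import ipaddress

WORKER_NET = ipaddress.ip_network("192.168.4.0/22")  # private range from the slides


def nat_action(src, dst, forwarded=frozenset()):
    """Classify a connection as seen by the gateway.

    `forwarded` is a hypothetical set of worker-node addresses for which an
    inbound NAT rule has already been configured.
    """
    src_inside = ipaddress.ip_address(src) in WORKER_NET
    dst_inside = ipaddress.ip_address(dst) in WORKER_NET
    if src_inside and dst_inside:
        return "local"        # stays on the private network, no NAT involved
    if src_inside:
        return "masquerade"   # outbound: source address rewritten by the gateway
    if dst_inside:
        # inbound: only possible if a NAT rule was set up beforehand
        return "forward" if dst in forwarded else "blocked"
    return "not-ours"         # neither end is a worker node
```

This also shows why FTP needs passive mode: in active mode the server opens the data connection back to the client, which is an inbound connection with no NAT rule behind it.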
Monitoring
GoG Heartbeat
Senders (worker nodes) periodically send multicast packets with node info and status in XML format.
Receivers collect the info to build statistics or to present a graphical view of the farm to an operator.
pros:
connectionless
several receivers can run at the same time
cons:
anyone on the network can read the packets
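
The sender/receiver scheme described above can be sketched with plain UDP multicast sockets. This is a minimal illustration, not the GoG Heartbeat code: the multicast group, the port, and the XML layout are assumptions, since the slides do not give them.

```python
import socket
import struct
import xml.etree.ElementTree as ET

# Assumed values, not taken from the slides.
GROUP, PORT = "239.255.42.42", 5007


def build_heartbeat(node, status, load):
    """Serialize node info into a small XML document (hypothetical schema)."""
    root = ET.Element("heartbeat", node=node, status=status)
    ET.SubElement(root, "load").text = f"{load:.2f}"
    return ET.tostring(root, encoding="utf-8")


def parse_heartbeat(payload):
    """Turn a received XML payload back into a dictionary."""
    root = ET.fromstring(payload)
    return {"node": root.get("node"), "status": root.get("status"),
            "load": float(root.findtext("load"))}


def send_heartbeat(payload, group=GROUP, port=PORT, ttl=1):
    """Send one datagram to the multicast group; no connection is set up."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as s:
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        s.sendto(payload, (group, port))


def open_receiver(group=GROUP, port=PORT):
    """Join the multicast group; several receivers can do this at once."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return s
```

A receiver would loop on `parse_heartbeat(sock.recvfrom(4096)[0])`. Because any host can join the group in the same way, the payload is readable by everyone on the network, which is the drawback listed above.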
Monitoring
Main window:
dead/alive status
CPU usage
memory
disk
(picture refers to the IFIC network, not the farm)
Monitoring
Client info window
reflects the info sent by the client, as obtained mostly from /proc
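
As an illustration of how a client could gather such info from /proc, here is a minimal sketch. The function names and the choice of fields are assumptions; the slides do not say which /proc files the real client reads.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])  # reported in kB on current kernels
    return info


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    return tuple(float(x) for x in text.split()[:3])


def read_node_info():
    """Collect a few values from /proc (Linux only)."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    return {"mem_total": mem.get("MemTotal"),
            "mem_free": mem.get("MemFree"),
            "load": load}
```

Keeping the parsers separate from the file reads makes them easy to check against sample text on machines without /proc.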
LCFG configuration
LCFG configuration is being deployed
Worker nodes always boot using PXE
DHCP server sends:
A PXE_MENU with a remote Linux image and parameters the first time
A PXE_MENU with local boot on subsequent boots
LCFG configuration
We use a slightly modified LCFG package to:
use PXE booting
allow extended partitions on the HD
allow a domain different from the ypdomain
notify the DHCP server at the end of the first-time installation
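
The interplay between PXE booting and the end-of-installation notification can be sketched as a tiny state holder on the DHCP server side. This is only an illustration of the logic: the class `PxeMenuChooser`, its method names, and the menu labels are hypothetical and do not correspond to LCFG or to any DHCP server API.

```python
class PxeMenuChooser:
    """Remembers which nodes have finished installation (hypothetical helper)."""

    def __init__(self):
        self._installed = set()

    def menu_for(self, mac):
        """Pick the PXE menu to offer to the node with this MAC address."""
        if mac.lower() in self._installed:
            return "local-boot"      # already installed: boot from the local disk
        return "remote-install"      # first boot: remote Linux image and parameters

    def notify_installed(self, mac):
        """Called when a node reports that its first installation has finished."""
        self._installed.add(mac.lower())


chooser = PxeMenuChooser()
print(chooser.menu_for("00:0A:11:22:33:44"))   # node not yet installed
chooser.notify_installed("00:0A:11:22:33:44")
print(chooser.menu_for("00:0a:11:22:33:44"))   # same node after the notification
```

Without the notification step, a node that always boots through PXE would be reinstalled on every boot, which is presumably why the package was modified to report back.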
LCFG configuration
We plan to switch to LCFG to configure the Worker Nodes, but some missing LCFG objects prevent us from doing so now (this is expected to be solved soon).
In any case, we will use LCFG to configure our Crossgrid TestBed.
Current Grid Activities
This year we are participating in the Data Challenge (DC1) of the ATLAS experiment with two goals:
Event production in our farm, to provide samples for physics studies and to prepare the Technical Design Report for the end of 2002.
Testing the new software in our farm, including Geant4, the new event data model, and the evaluation of database technologies (e.g. Root-I/O).