transparencies - Indico
Running Computers in CC
FIO Tools for all Service Managers
Overview
Goal of this presentation
FIO goals
Existing tools (selection)
Enabling all Service Managers to use them
Goal of this presentation
FIO has tools for managing computers
Many of these are of general interest
Non-FIO Service Managers (SMs) could use them to set up their systems easily for administration/operation
Goal 1: list/select the interesting tools
Goal 2: identify what is missing for non-FIO Service Managers to be able to use them easily
FIO goals
Homogeneous way of managing nodes
Efficient, and usable by everybody (?)
Common configuration repository
Promote our tools and methods
Usage of SMS (monitoring, vendor calls,…)
Keep CDB up-to-date
Delegation of tasks (software updates, SysAdmins,…)
etc…
Existing FIO tools (selection)
CDB
ELFms
Console service
OPMs (procedures)
Remedy (interventions)
These are the tools, but…
What prevents non-FIO Service Managers from using those tools?
Using FIO tools
Missing documentation
Easy to find
Explaining concepts
Giving solutions to Service Managers
Offering good setup
Tools easy to use
What can be improved
Documentation (finding)
Create a single entry point
Improve publishing :
Better structure of web sites
Some (sub-)service web sites are not user-oriented
Some information is difficult to find, e.g. because it sits in a CERN-specific area
Locations
(web page/site)
TWiki : currently on a (private) server !
Some documents in private web spaces
Look and feel (IT schema)
Documentation (explaining)
Brief explanations about
Management model
Recommend ELFms or suggest alternatives
(i.e. use AIMS + kickstart file)
Benefits for SM
Expectations from Service Managers
Not all SMs are equal (standard, advanced, power user)
Each SM to identify their own level of involvement
Indicate next steps
Depending on “category of user” (SM)
Propose tutorials
Documentation (solutions)
Provide check lists
Publish workflows and use cases
Information to provide according to category
Whom to contact, interface to use
e.g. using SMS, hook into it
e.g. running components
Create HOW-TO’s
from the Service Manager point of view
e.g. adding extra software, having data backed up
e.g. upgrading the kernel
Documentation (solutions)
For the “advanced users”
Describe CDB structure and fields
Which template holds which information
profile_X, pro_type_Y_Z
Purpose of fields and expected values
List / explain components and features
They certainly offer improvements, but are not understood
They raise some worry (loss of control vs. automation dilemma)
No exhaustive list or description of such components is known
Offering good setup
CDB registration
Avoid putting nodes in the lxnoq cluster
Not good for later maintenance, e.g. by FIO colleagues
Instead, create a pro_type_cluster_noq or pro_type_cluster_os
Proactively grant permissions to update templates
Provide a “ready-to-use” template/setup
Based on the current certified OS
With fixed software base(s)
With minimal monitoring
Plan for a few base components
Tools easy to use
Limit the number of logins/accounts
CDB, Remedy, OPMs, etc…
Simple (web) interface for usual tasks :
(improvements)
To be determined from the check lists
Mainly for non-expert or administrative tasks
Configuration interface (CDB) for high-level functions
A more sophisticated CDB interface (advanced users)
cdbop not suitable for every SM
Remember Panguin ?
Missing tools
Configurable alarm system
for thresholds and views (filtering)
allowing recovery actions (and notifications?) to be set
making visible what is monitored (alarms)
CDB interface(s)
CDB web tool (suite?) being developed
All requirements collected ?
Deserves a better visibility
User guide or self explanatory pages
Panguin replacement
Also showing the impacted machines/clusters when editing a template
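The configurable alarm system wished for on this slide could look roughly like the sketch below. It is only an illustration of the requested features (per-metric thresholds, filtered views, optional recovery actions, a visible list of what is monitored), not an existing FIO or Lemon interface. All names (AlarmRule, evaluate, view, monitored, restart_daemon, the metric names and "mycluster") are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlarmRule:
    """One configurable alarm: a metric, a threshold, an optional recovery action."""
    metric: str        # hypothetical metric name, e.g. "load_avg"
    threshold: float   # the alarm is raised when the sampled value exceeds this
    cluster: str       # cluster the rule applies to
    recovery: Optional[Callable[[str], str]] = None  # called with the node name


def restart_daemon(node: str) -> str:
    # Placeholder recovery action; a real one would call a site-specific tool.
    return f"restart requested on {node}"


def evaluate(rules, samples):
    """Return (node, metric, recovery_result) for each sample that breaks a rule.

    samples is a list of (node, cluster, metric, value) tuples.
    """
    raised = []
    for node, cluster, metric, value in samples:
        for rule in rules:
            if (rule.metric == metric and rule.cluster == cluster
                    and value > rule.threshold):
                result = rule.recovery(node) if rule.recovery else None
                raised.append((node, metric, result))
    return raised


def view(raised, metric):
    """Filtered view: only the alarms raised for one metric."""
    return [alarm for alarm in raised if alarm[1] == metric]


def monitored(rules):
    """Make visible what is monitored: the sorted set of metrics with a rule."""
    return sorted({rule.metric for rule in rules})


rules = [
    AlarmRule("load_avg", 10.0, "mycluster", recovery=restart_daemon),
    AlarmRule("disk_used_pct", 90.0, "mycluster"),
]
samples = [
    ("node01", "mycluster", "load_avg", 12.5),
    ("node02", "mycluster", "load_avg", 3.0),
    ("node02", "mycluster", "disk_used_pct", 95.0),
]
raised = evaluate(rules, samples)
```

Keeping the rules as plain data is what would let a Service Manager change thresholds or attach a recovery action without touching the evaluation code.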
Even more tools
There is no (kind of) "control centre" that would make it possible:
to group access to other tools (entry point)
to offer access to the console with a single click
to change the configuration and trigger the actions on the relevant target nodes
making use of the ELFms tools whenever possible
Lemon-status has started to integrate access to other parts of the system (templates, Remedy tickets…)
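Both showing the impacted machines when a template is edited and finding the relevant target nodes for a triggered action come down to a reverse lookup over template inclusions. The sketch below assumes the inclusion data is available as a simple mapping. It does not use any real CDB interface, and the names pro_software_base, pro_monitoring_minimal, profile_node01 to profile_node03 and impacted_profiles are hypothetical.

```python
from collections import deque

# Hypothetical inclusion data: each template maps to the templates it includes.
includes = {
    "profile_node01": ["pro_type_cluster_os"],
    "profile_node02": ["pro_type_cluster_os"],
    "profile_node03": ["pro_type_cluster_noq"],
    "pro_type_cluster_os": ["pro_software_base"],
    "pro_type_cluster_noq": ["pro_software_base", "pro_monitoring_minimal"],
}


def impacted_profiles(includes, edited):
    """Return the node profiles that directly or indirectly include `edited`."""
    # Invert the mapping: template -> set of templates that include it.
    included_by = {}
    for parent, children in includes.items():
        for child in children:
            included_by.setdefault(child, set()).add(parent)

    # Breadth-first walk upwards from the edited template.
    seen = {edited}
    queue = deque([edited])
    while queue:
        current = queue.popleft()
        for parent in included_by.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    # Assumption: node profiles are the templates named profile_X.
    return sorted(t for t in seen if t.startswith("profile_"))


affected_by_monitoring = impacted_profiles(includes, "pro_monitoring_minimal")
affected_by_base = impacted_profiles(includes, "pro_software_base")
```

An editor could show such a list before a change is committed, which speaks to the loss-of-control worry: the Service Manager sees which nodes a change will reach.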