transparencies - Indico



Running Computers in CC
FIO Tools for all Service Managers
Overview




Goal of this presentation
FIO goals
Existing tools (selection)
Enabling all Service Managers to use them
Goal of this presentation





FIO has tools for managing computers
Many of these are of general interest
non-FIO SMs could use them to easily set up
their systems for administration/operation
Goal 1: list/select the tools of interest
Goal 2: identify what is missing for non-FIO
Service Managers to be able to use them easily
FIO goals

Homogeneous way of managing nodes



Efficient and by everybody (?)
Common configuration repository
Promote our tools and methods




Use of SMS (monitoring, vendor calls,…)
Keep CDB up-to-date
Delegation of tasks (software updates, SysAdmins,…)
etc…
Existing FIO tools (selection)





CDB
ELFms
Console service
OPMs (procedures)
Remedy (interventions)
These are the tools, but…
What prevents non-FIO Service
Managers from using those tools?
Using FIO tools

Missing documentation





Easy to find
Explaining concepts
Giving solutions to Service Managers
Offering good setup
Tools easy to use

What can be improved
Documentation (finding)


Create a single entry point
Improve publishing:

Better structure of web sites


Some (sub-)service web sites are not user-oriented
Some information is difficult to find


e.g. it is in a CERN-specific area
Locations (web page/site)
TWiki: currently on a (private) server!
Some documents in private web spaces
Look and feel (IT schema)
Documentation (explaining)

Brief explanations of

Management model

Recommend ELFms or suggest alternatives
(e.g. use AIMS + a kickstart file)


Benefits for SMs
Expectations from Service Managers

Not all SMs are equal (standard, advanced, power users)
Each SM to identify his/her level of involvement
Indicate next steps


Depending on “category of user” (SM)
Propose tutorials
Documentation (solutions)

Provide checklists



Publish workflows and use cases



Information to provide according to category
Whom to contact, interface to use
e.g. using SMS: how to hook into it
e.g. running components
Create HOW-TOs

from the Service Manager’s point of view
e.g. adding extra software, having data backed up
e.g. upgrading the kernel
Documentation (solutions)

For the “advanced users”

Describe CDB structure and fields




Which templates hold which information
profile_X, pro_type_Y_Z
Purpose of fields and expected values
List/explain components and features



They certainly offer improvements, but are not understood,
raising some worries (loss of control vs. automation dilemma)
No exhaustive list or description of such components is known
Offering good setup

CDB registration

Avoid putting nodes in the lxnoq cluster




Not good for further maintenance, e.g. by FIO colleagues
Instead, create a pro_type_cluster_noq or pro_type_cluster_os
Proactively grant permissions to update templates
Provide a “ready-to-use” template/setup




Based on the current certified OS
With fixed software base(s)
With minimal monitoring
Plan for a few base components
Tools easy to use (improvements)

Limit the number of logins/accounts
CDB, Remedy, OPMs, etc…
Simple (web) interface for common tasks:
To be determined from the checklists
Mainly for non-expert or administrative tasks
Configuration interface (CDB) for high-level functions
A more sophisticated CDB interface (advanced users)

cdbop is not suitable for every SM
Remember Panguin?
Missing tools

Configurable alarm system




for thresholds and views (filtering)
allowing recovery actions (and notifications?) to be set
making visible what is monitored (alarms)
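The configurable alarm system wished for above can be sketched as a set of per-metric threshold rules, a filtered view over the resulting alarms, and an optional recovery hook. This is a minimal illustration only, not the actual configuration model of Lemon or SMS: the rule structure, the node names, metric names, thresholds and the recovery hook are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class AlarmRule:
    """A configurable alarm: metric name, threshold, direction, optional recovery action."""
    metric: str
    threshold: float
    above: bool = True  # True: fire when value > threshold; False: when value < threshold
    recovery: Optional[Callable[[str], str]] = None  # hypothetical hook, returns a description

    def fires(self, value: float) -> bool:
        return value > self.threshold if self.above else value < self.threshold

@dataclass
class Alarm:
    node: str
    metric: str
    value: float
    action_result: Optional[str] = None

def evaluate(samples: Dict[str, Dict[str, float]], rules: List[AlarmRule]) -> List[Alarm]:
    """Check every node's latest metric values against every rule."""
    alarms: List[Alarm] = []
    for node in sorted(samples):
        metrics = samples[node]
        for rule in rules:
            value = metrics.get(rule.metric)
            if value is not None and rule.fires(value):
                # Run the (hypothetical) recovery action only when the alarm fires.
                result = rule.recovery(node) if rule.recovery else None
                alarms.append(Alarm(node, rule.metric, value, result))
    return alarms

def view(alarms: List[Alarm], node_prefix: str = "") -> List[Alarm]:
    """A filtered view: only alarms for nodes whose names start with node_prefix."""
    return [a for a in alarms if a.node.startswith(node_prefix)]

# Hypothetical node names, metrics and thresholds, for illustration only.
samples = {
    "batch01": {"load": 12.0, "free_disk_gb": 40.0},
    "batch02": {"load": 1.5, "free_disk_gb": 2.0},
    "web01": {"load": 20.0, "free_disk_gb": 100.0},
}
rules = [
    AlarmRule("load", 10.0),
    AlarmRule("free_disk_gb", 5.0, above=False,
              recovery=lambda node: f"clean scratch area on {node}"),
]
alarms = evaluate(samples, rules)
for a in view(alarms, "batch"):
    print(a.node, a.metric, a.value, a.action_result)
```

Keeping the rules as plain data is what would let a Service Manager see, in one place, exactly what is monitored on his/her nodes and what happens when a threshold is crossed.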
CDB interface(s)

CDB web tool (suite?) being developed




All requirements collected?
Deserves better visibility
User guide or self-explanatory pages
Panguin replacement

Also showing impacted machines/clusters when editing a template
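Showing the impacted machines when a template is edited amounts to a reverse-dependency walk over the template include graph. A minimal sketch follows, assuming the include relations have already been extracted into a dictionary (parsing real Pan templates is not shown) and assuming that node profiles are exactly the templates named profile_*, following the profile_X convention above. Apart from pro_type_cluster_noq and pro_type_cluster_os, all template names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

def impacted_profiles(includes: Dict[str, List[str]], edited: str) -> Set[str]:
    """Return the node profiles that directly or indirectly include `edited`."""
    # Invert the graph: for each template, record which templates include it.
    included_by: Dict[str, Set[str]] = defaultdict(set)
    for tpl, children in includes.items():
        for child in children:
            included_by[child].add(tpl)
    # Breadth-first walk "upwards" from the edited template.
    seen: Set[str] = {edited}
    queue = deque([edited])
    while queue:
        current = queue.popleft()
        for parent in included_by[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Only profiles correspond to machines (naming convention assumed).
    return {t for t in seen if t.startswith("profile_")}

# Hypothetical include graph: template -> templates it includes.
includes = {
    "profile_node01": ["pro_type_cluster_noq", "pro_os_base"],
    "profile_node02": ["pro_type_cluster_noq"],
    "profile_node03": ["pro_type_cluster_os"],
    "pro_type_cluster_noq": ["pro_os_base", "pro_monitoring_minimal"],
    "pro_type_cluster_os": ["pro_os_base"],
}
print(sorted(impacted_profiles(includes, "pro_monitoring_minimal")))
```

The `seen` set guarantees termination even if the include graph contains cycles, and each template is visited once, so the cost is linear in the size of the graph.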
Even more tools


No "control centre" (of some kind) that would allow:
• grouping access to other tools (entry point)
• offering access to the console with a simple click
• changing the configuration and triggering the actions on relevant target nodes
• making use of the ELFms tools whenever possible
Lemon-status has started to integrate access to other
parts of the system (templates, Remedy tickets…)