Network Administration - Pravin Shetty > Resume
Network Administration
Research and Analysis
Week-7
Theory of Network Admin
Burgess – Ch.11
Science vs. Technology
Studying Complex Systems
Purpose of Observation
Evaluation Methods and Problems
Evaluating a Hierarchical System
Deterministic and Stochastic Behaviour
Observational Errors
Strategic Analyses
A Scientific Basis
for System Administration
System admin has always involved
experimentation
Development of Networks has led to an
exponential increase in system complexity
and a corresponding increase in the difficulty of
Management
A purely mechanical approach may no longer
be adequate: time for a theoretical basis….
World-wide interest, encouraged by
professional organisations (SAGE, USENIX,
ACM, IEEE, ACS)
Network Admin Research
Science vs Technology
System Admin studies are mostly
“Applied Research”, which results in
the development of a specialised
toolset that solves a local/specific
problem
Some workers have attempted to
collate results to form a more
general technology of more
permanent or global value.
But this is not Science!
What is “Science”?
The Scientific Method
Knowledge is advanced by a series of studies
that either verify or falsify a hypothesis
Study may be theoretical or practical but
all contribute to a larger on-going
discussion that leads to progress
A single study is rarely the end of the
discussion
Each study is usually repeated and verified
or challenged by other researchers
Reproducibility is very important
Scientific Method
Motivation – statement of context and
objectives
Appraisal of problems
Theoretical Model - used to understand or
solve problems and provide a framework for
comparison and measurement
Design an experiment – the Approach
Perform an Experiment – obtain Results
Evaluation or Verification of Approach and
Results
Scientific Method
Science is a dialog of Theories
Science proceeds by Experiment
Need Theory to interpret
observations
Need observations to disprove
Theory
Network Admin Research:
Studying Complex Systems
Areas of study in System Admin have
been Technical and/or Behavioural and
include:
– Reliability studies
– Finding and evaluating methods for system
integrity
– Observations which apply to non-linear
behaviour
– Issues related to strategy and planning
Mostly Empirical or Qualitative case study
Purpose of Observation
Gather Info about a Problem to
enable development of a Technology
which solves it
To evaluate the Technology for
effectiveness
(i.e. whether it fulfils its design goals)
But evaluation of SysAdmin
experiments is difficult due to Vested
Interests and lack of clearly defined
metrics
Evaluation Methods
and some Problems
Ideally there should be a repeatable
test yielding measurements
The trouble is that while a good
system administrator could do this
heuristically, such judgements are hard to
turn into repeatable tests:
– Very difficult to quantify
– Different SysAdmins work in different
ways
– Extreme variability in systems and
users
Some Research Topics
Efficiency & Automation
Network Administration methods/models
Reliability Studies
– Fault management
– Metrics
– Patterns of events
prediction & performance
E.g. a Common Research Topic
and the Problems
Ways to relieve Administrators of tedious
work, so they can use their talents
better in other ways. What sort of
experiment is needed?
Measure time spent working on a system
but the time required usually expands to occupy the
time available!
Record actions of an automatic system
and compare with those of a human
administrator
but depends on the person - different people do things in
different ways
Network Admin Research:
effect of Vested Interests….
SysAdmins require tools….
Such tools often acquire a dedicated
following of users who grow to like them
regardless of what the tools allow them to
achieve
Marketing skills of one software vendor
might be better than others and create a
bias in the marketplace that affects the
perceived usefulness of a particular tool
So one cannot estimate the effectiveness
of a tool based just on the number of those
who use it
Evaluating a Hierarchical System
What level of detailed decomposition of
levels within the hierarchy is appropriate?
Building a model of the hierarchy is often
the best way to address complexity – focus
on what’s important or practical
Experiments based on this model might
then involve
– Measurements
– Simulations
– Case studies
– User surveys
Faults
IEEE classify software anomalies as:
O/S crash
Program hang
Program crash
Input problem
Output problem
Failed required performance
Perceived total failure
System error message
Service Degraded
Wrong output
No output
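A classification like this is mainly useful for counting. The following is a minimal sketch, assuming a hypothetical incident log whose entries are already labelled with one of the classes listed above; the `tally` helper and the sample incidents are illustrative only.

```python
from collections import Counter

# Anomaly classes as listed on the slide.
ANOMALY_CLASSES = [
    "O/S crash", "Program hang", "Program crash", "Input problem",
    "Output problem", "Failed required performance",
    "Perceived total failure", "System error message",
    "Service degraded", "Wrong output", "No output",
]


def tally(incidents):
    """Count incidents per class, rejecting labels outside the classification."""
    unknown = set(incidents) - set(ANOMALY_CLASSES)
    if unknown:
        raise ValueError(f"unclassified anomalies: {sorted(unknown)}")
    return Counter(incidents)


# Hypothetical incident log (invented for illustration).
incidents = [
    "Input problem",
    "Failed required performance",
    "Input problem",
    "Program crash",
]
counts = tally(incidents)
print(counts.most_common(1))
```

Rejecting unknown labels keeps the counts comparable between administrators, which matters when different people describe the same fault in different words.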
Most common faults for
SysAdmin are:
Input Problem
– Missing or inappropriate configuration
Failed performance
– Usually through loss of resources
Software problems can be eliminated
by re-evaluation of individual software
components
Reliability and Redundancy
R = MeanUptime / TotalElapsedTime
Average (Mean) time before failure (MTBF)
With parallel or redundant components
1/Rparallel = 1/R1 + 1/R2 + ... + 1/Rn
With serial or dependent components
Rseries = R1 + R2 + R3 + ...
Probability of Failure
P(t) = exp(-t / MTBF)
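The uptime ratio R (mean uptime over total elapsed time) can be computed directly from an outage log. This is a minimal sketch, assuming a hypothetical list of non-overlapping outage intervals measured in hours; the function names and sample data are illustrative, and MTBF is taken here as total uptime divided by the number of failures, which is a common convention rather than something the slide states.

```python
def reliability(outages, total_elapsed):
    """Return R = uptime / total elapsed time for non-overlapping outages."""
    downtime = sum(end - start for start, end in outages)
    return (total_elapsed - downtime) / total_elapsed


def mtbf(outages, total_elapsed):
    """Assumed convention: total uptime divided by the number of failures."""
    if not outages:
        return float("inf")
    downtime = sum(end - start for start, end in outages)
    return (total_elapsed - downtime) / len(outages)


# Hypothetical 1000-hour observation window with three outages (start, end).
outages = [(100.0, 102.0), (400.0, 401.0), (750.0, 755.0)]
r = reliability(outages, 1000.0)
m = mtbf(outages, 1000.0)
print(r, m)
```

Both numbers depend on the observation window chosen, so a figure quoted without its window is hard to compare between sites.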
MTBF and Computers
Computer system MTBF doesn’t account
for:
– Dependency – Not all systems have the same
attachments
– Fail-over and Latency of service
Systems may fail, then recover after a single
delay; this may occur repeatedly!
– Patterns of usage
User behaviour may bias the outcome
Some Metrics
Net
– Total number of packets
– Amount of IP fragmentation
– Density of Broadcast messages
– Number of Collisions
– Number of Sockets (TCP) in and out
– Number of malformed packets
Some Metrics
Storage
– Disk Usage in Bytes
– Disk Operations per Second
– Paging rate (free memory and
thrashing)
Fig 11.2 Daily paging data
Error bars exceed variation of data!
Fig 11.3 Weekly paging data
Also showing extreme variation
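The remark that error bars exceed the variation of the data can be checked numerically. This is a sketch under stated assumptions: the paging-rate samples below are invented for illustration (they are not the data behind Fig 11.2 or Fig 11.3), grouped by hour of day, and the standard deviation is used as the error bar.

```python
from statistics import mean, stdev

# Hypothetical paging-rate samples (pages/s): one value per day for each hour slot.
samples_by_hour = {
    9: [12.0, 40.0, 3.0, 25.0],
    12: [15.0, 2.0, 38.0, 29.0],
    15: [10.0, 33.0, 5.0, 36.0],
}

hourly_mean = {h: mean(v) for h, v in samples_by_hour.items()}
hourly_err = {h: stdev(v) for h, v in samples_by_hour.items()}

# Variation of the averaged curve: range of the hourly means.
signal_range = max(hourly_mean.values()) - min(hourly_mean.values())
typical_error = mean(hourly_err.values())

# If the error bars are larger than the variation they are drawn on,
# the apparent daily pattern cannot be distinguished from noise.
error_dominates = typical_error > signal_range
print(signal_range, typical_error, error_dominates)
```

When `error_dominates` is true, the averaged curve says little about the system, and more samples or a different metric are needed.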
Some Metrics
Processes
– Number of privileged processes
– Number of non-privileged processes
– Maximum percentage CPU used in
processes
Some Metrics
Users
– Number logged on
– Total Number
– Average time spent logged on per user
– Load Average
– Disk Usage rise per session per user per
hour
– Latency of Services
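Several of the user metrics above can be derived from a plain session log. The following is a minimal sketch, assuming hypothetical records of the form (user, login time, logout time) with times in hours; the record layout and helper names are illustrative and not tied to any particular accounting tool.

```python
from collections import defaultdict

# Hypothetical session log: (user, login time, logout time) in hours.
sessions = [
    ("alice", 9.0, 12.0),
    ("bob", 10.0, 11.0),
    ("alice", 13.0, 17.0),
    ("carol", 9.5, 10.5),
]


def logged_on_at(sessions, t):
    """Number of distinct users with a session covering time t."""
    return len({user for user, start, end in sessions if start <= t < end})


def total_users(sessions):
    """Total number of distinct users seen in the log."""
    return len({user for user, _, _ in sessions})


def average_time_per_user(sessions):
    """Mean of each user's total logged-on time."""
    totals = defaultdict(float)
    for user, start, end in sessions:
        totals[user] += end - start
    return sum(totals.values()) / len(totals)


print(logged_on_at(sessions, 10.25))
print(total_users(sessions))
print(average_time_per_user(sessions))
```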
Distributions
Delta – constant X
Uniform – constant Y
Gaussian or Random Normal – “bell curve”
Black-Body or Planck – approx exponential
Poisson – random arrival with mean rate
Pareto – Power Law
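Three of these distributions can be sampled with Python's standard `random` module, which helps when building a simulation of system behaviour. This is a sketch only: the seed, the sample size, the arrival rate `lam` and the Pareto shape `alpha` are assumed values chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# Gaussian ("bell curve") with mean 0 and standard deviation 1.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Poisson process: random arrivals at mean rate lam have exponential gaps.
lam = 2.0  # assumed arrivals per unit time
gaps = [rng.expovariate(lam) for _ in range(n)]

# Pareto (power law) with assumed shape alpha; samples are never below 1.
alpha = 3.0
pareto = [rng.paretovariate(alpha) for _ in range(n)]

print(mean(gaussian))  # close to 0
print(mean(gaps))      # close to 1 / lam
print(mean(pareto))    # close to alpha / (alpha - 1)
```

The Pareto sample mean converges much more slowly than the other two because of its heavy tail, which is one reason power-law behaviour makes system measurements hard to summarise.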
Theory of System Admin
(end)