How to build a NOC - UW Information Technology Wiki
How to Build a NOC
Pacific NorthWest GigaPop
November 29, 2007
4/2/2016
Key Elements to Building a NOC
• Identify Customers
– Who are your customers?
• Understand Customer Expectations
– What are your user expectations?
– SLAs?
• Support Service Offerings
– Besides networking, what other services are
being offered?
9/11/2007
• Monitor and Troubleshoot Service Issues
– How large and complex is your network?
– What level of troubleshooting and/or monitoring
will your NOC do?
– How will you communicate outages and
planned work to customers?
• Determine Appropriate Staffing Levels
– What will be the service hours?
    • Not all NOCs need to be 7x24x365.
    • What about holidays? Weekends? On-call?
– Do your SLAs require in-person staffing?
– Do you have after-hours service response
requirements?
– What level of staff needs to be present, and
when?
– What will be the means of responding to
issues when the NOC is not staffed 24x7?
• Organizational Structure
– What staffing tiers/hierarchy will you have
for support? Techs? Leads? NEs?
– What will be your escalation practices and
policies?
– To what group does the NOC report?
– What other groups report there, and what is
your organizational relationship to other key
groups?
– Who will write and update procedures,
training manuals, etc.?
• Location and Design of NOC Facility
– How much space do you need?
– What is your facility like?
– How do you want to arrange your
staff? Separate offices? "War room"?
• NOC Funding
– How will your organization be funded?
• NOC Tools
– How will you track customer
information? (Database needs, CRM?)
– How will you monitor and
troubleshoot? Tools, specifically.
– Are you writing any of your own tools?
– Who will maintain your applications?
– How will you track trouble tickets?
• Reporting
– What reports will you issue?
– How will you measure the data?
NOC Evolution
• What factors may determine operational
changes for your organization? For example:
new services, expanded hours, increased number of
customers, new equipment types, deeper skill
level.
Building a NOC
Pacific NorthWest GigaPop
Case Study
October 3, 2007
Customers and Expectations
• Our customers are in the Pacific
Northwest and Pacific Rim.
• We track customer information in a
database designed and maintained locally.
• Time zones have driven our need for 7x24.
• We cover 15 time zones, which also makes
scheduling planned maintenance difficult.
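To illustrate why a wide spread of time zones complicates planned maintenance, here is a minimal sketch that converts one maintenance start time into local time at several sites. The site names and fixed UTC offsets are hypothetical and are not taken from the slides; a real scheduler would need full time zone rules, including daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical customer sites and fixed UTC offsets (illustrative only;
# real scheduling would need proper time zone rules, including DST).
SITE_UTC_OFFSETS = {
    "seattle": -8,
    "honolulu": -10,
    "tokyo": 9,
    "sydney": 10,
}

def local_start_times(start_utc, offsets=SITE_UTC_OFFSETS):
    """Return the local wall-clock start time of a window for each site."""
    return {
        site: start_utc.astimezone(timezone(timedelta(hours=hours)))
        for site, hours in offsets.items()
    }

def sites_in_business_hours(start_utc, offsets=SITE_UTC_OFFSETS):
    """List sites where the window starts between 08:00 and 18:00 local time."""
    return sorted(
        site
        for site, local in local_start_times(start_utc, offsets).items()
        if 8 <= local.hour < 18
    )
```

A start time that falls at midnight for one site can land in the working day for another, which is the scheduling difficulty the slide describes.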
• Escalation policies also drive our need for
on-call schedules and on-site personnel.
• Our customers expect 48 hours' notice
prior to work, unless it is an emergency.
• Outages are communicated via an
application we have built locally.
• We also want to know when our
customers have planned work.
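The 48-hour notice expectation can be checked mechanically before an announcement goes out. The following is a minimal sketch under that assumption; the function and parameter names are hypothetical and do not describe the locally built application.

```python
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(hours=48)  # notice period stated in the slide

def notice_is_sufficient(announced_at, work_starts_at, emergency=False):
    """Return True if planned work meets the 48-hour notice expectation.

    Emergency work is exempt from the notice period.
    """
    if emergency:
        return True
    return work_starts_at - announced_at >= MIN_NOTICE
```

For example, work announced at 09:00 on October 1 and starting at 09:00 on October 3 passes the check, while the same work starting an hour earlier does not unless it is flagged as an emergency.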
• Expectations are included in the contract
and located on the PNWGP web page.
• We prefer to have the NOC as the primary
customer contact point for our organization
in order to maintain quality of service when
a ticket moves between groups.
Supported Services
• In addition to network monitoring for
PNWGP, our NOC monitors:
– UW campus connectivity, including two
hospitals (layer 2 and layer 3, wired and
wireless, and many UPSs),
– approximately 500 sites throughout WA
state for the K-20 network,
– Pacific Wave, and Transit Rail.
• We do not do any host, server, or
customer application monitoring.
• We do not do desktop support.
• We do not reset passwords or arrange
services such as email and web.
• We do not monitor data center security
alarms such as for fire or flood.
Monitoring and Troubleshooting
PNWGP: approx. 20 customers, 5 node
sites, MPLS, BGP, VPN
Campus: 115,000 connected devices,
~100 remote locations, ~7,000 wireless
APs, ~4,000 layer 2 switches, ~150
routers.
Over 450 Washington state K-20 sites
monitored.
Transit Rail: approx. 5 node sites, BGP and
IS-IS status.
PNWGP troubleshoots with the customer
on routing problems, latency, and loss of
connectivity.
IS-IS and IGP status is monitored.
We manage DWDM, SONET, and MPLS
circuits.
Complexity increases because escalation
paths differ depending on what
isn’t working.
Outages and planned events are sent via
email announcement in a standard format.
We include the date/time of the work or
when the outage began.
If the customer’s connectivity is entirely
down, we also call or page them.
Updates are sent at predefined intervals
for large events, or when we have a
change in status.
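A standard announcement format is easy to enforce with a small template function. This is a minimal sketch only: the field names and layout are assumptions for illustration, since the slides do not show the actual email format.

```python
from datetime import datetime

def format_announcement(kind, service, start, status, details=""):
    """Build a plain-text announcement in one fixed layout.

    kind is "OUTAGE" or "PLANNED WORK"; start is when the outage began
    or when the planned work is scheduled to begin.
    """
    if kind not in ("OUTAGE", "PLANNED WORK"):
        raise ValueError(f"unknown announcement kind: {kind!r}")
    start_label = "Began" if kind == "OUTAGE" else "Scheduled start"
    lines = [
        f"Subject: [{kind}] {service}",
        "",
        f"Type: {kind}",
        f"Service: {service}",
        f"{start_label}: {start:%Y-%m-%d %H:%M}",
        f"Status: {status}",
    ]
    if details:
        lines.append(f"Details: {details}")
    return "\n".join(lines)
```

Status updates can reuse the same function with a new `status` value, so every message about an event keeps the same layout.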
Staffing
• We are 7x24 with full-time staff.
• On weekends we have only one person
covering each day, so vacations and sick
time are problems.
• Holidays are covered by on-site and on-call staff.
• On-call consists of a 7-day period and
rotates among all NOC staff on a regular
schedule.
• Tier 1 are student staff, also called
Network Analysts.
• Tier 2 are full-time staff, most are titled
Network Specialists.
• Tier 3 are full-time staff, titled Network
Engineers.
• We advertise to all customers an on-site
7x24 staff for immediate response to
outages.
• Our SLAs indicate when a problem should
be escalated beyond the on-call staff to a
manager, director, and higher at any time
of the day or week.
• Day shift M-F requires that we have
engineering (Tier 3) staff in the building.
• Off-hours we have Tier 2 staff always
present in the building. They will escalate
as necessary, either due to outage
severity or complexity, or at a customer’s
request.
• To maximize staffing efficiencies, 2nd and
3rd shift personnel report to Computer
Operations managers, rather than the
NOC manager.
• These staff provide all of the same support
and have the same training/access as the
daytime Tier 2 staff in the NOC.
• We have on-call lists with a primary and
secondary person for backup when NOC
staff is not on-site.
• A separate call list exists for escalation to
managers.
• Engineering and other service groups are
also available on-call 7x24x365.
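A 7-day on-call rotation with a primary and a secondary person can be computed from a start date and a staff list. The sketch below is one possible approach, not the NOC's actual scheduling tool; the staff names are hypothetical, and choosing the next person in the list as secondary is an assumption.

```python
from datetime import date

def on_call_for(day, staff, rotation_start):
    """Return (primary, secondary) on-call names for the given date.

    The rotation advances one position every 7 days from rotation_start.
    The secondary is simply the next person in the list, which is an
    assumption made for this sketch.
    """
    if len(staff) < 2:
        raise ValueError("need at least two people for primary and secondary")
    if day < rotation_start:
        raise ValueError("day is before the rotation start")
    week_index = (day - rotation_start).days // 7
    primary = staff[week_index % len(staff)]
    secondary = staff[(week_index + 1) % len(staff)]
    return primary, secondary
```

With this layout each person serves as secondary the week before becoming primary. A separate list would still be needed for escalation to managers.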
Organizational Structure
• PNWGP has recently adopted a tiered
approach for staffing.
• After 6pm M-F and on weekends, NOC Network
Engineers are available only via on-call.
• The NOC has one manager, who reports to the
Director of Operations.
• Our escalation practices and policies are
based on length or severity of outage.
• At predetermined intervals additional
management levels are notified of severe
outages in order to help with escalation at
other organizations (telcos), or to keep
peers updated at the affected sites.
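Escalation at predetermined intervals can be expressed as a table of time thresholds. The sketch below assumes hypothetical thresholds of one and four hours; the slides do not state the actual intervals, and the rule that only severe outages escalate past the first step is also an assumption.

```python
from datetime import timedelta

# Hypothetical thresholds: (elapsed time, role notified). The actual
# intervals and roles are not given in the slides.
ESCALATION_STEPS = [
    (timedelta(0), "on-call staff"),
    (timedelta(hours=1), "manager"),
    (timedelta(hours=4), "director"),
]

def roles_to_notify(elapsed, severe, steps=ESCALATION_STEPS):
    """Return roles that should have been notified after `elapsed` time.

    Only severe outages escalate past the first step in this sketch.
    """
    notified = []
    for threshold, role in steps:
        if elapsed >= threshold and (severe or threshold == timedelta(0)):
            notified.append(role)
    return notified
```

Keeping the thresholds in a single table makes the policy easy to review and change without touching the logic.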
• Many people in the NOC work to provide
and update training materials; SMEs write
procedures.
• We use a wiki to maintain our
documentation.
• We were able to have a tech writing expert
give us training on writing effective
procedural documents and re-organize our
wiki.
NOC Location
• Currently, our NOC has some single-occupant
offices and some shared offices (2-3 people).
We are all on the same floor.
• We find that being located near the
Network Engineering team is quite helpful
for urgent escalations.
• Next year we will be moving into a new
location and we do not yet have details on
what that will look like.
• Consider lighting and noise control with
shared offices.
• How many monitors will each person
need? Will you use a large central monitor
for some things?
• Provide an impromptu meeting space for
collaboration on big events.
• Conference bridges greatly enhance
collaboration across geographic distances
whether working on outages or events.
NOC Funding
• UW funding comes primarily from Washington
state taxpayers and UW Medical
Centers.
• Funding also comes from the PNWGP and
customers and WA state K-20 network.
• If you recharge, you will need a business
and billing model.
• Or will you use time and materials?
Tools
• A very useful tool is live chat or IM for
coordinating efforts no matter where your
office is.
• Our customer information is tracked in a
home-grown database which has grown
and morphed over a dozen years.
• New needs such as SLAs and layer 1 info
now require significant investment in
upgrades.
• Our monitoring system – surprise – is also
“homegrown”.
• We monitor interface state and IP
reachability; performance and protocol
state connectivity will soon be integrated
into our “event system” (NMS).
• Automated tools can page the appropriate
group to notify them of outages or
threshold conditions.
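The paging step can be sketched as a routing table from check type to responsible group. This is an illustration only and does not describe the homegrown monitoring system; the check kinds, group names, and record layout are all hypothetical.

```python
def groups_to_page(checks, routing, default_group="noc"):
    """Return the sorted set of groups to page for failed checks.

    checks: list of dicts with "device", "kind" and "ok" keys, e.g. the
            result of an interface-state or IP-reachability poll.
    routing: maps a check kind to the group responsible for it.
    """
    groups = set()
    for check in checks:
        if not check["ok"]:
            groups.add(routing.get(check["kind"], default_group))
    return sorted(groups)
```

Collecting groups into a set means a group is paged once per polling cycle even when several of its devices fail together.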
• We have a separate Tools team (with 10
staff members) who design, write,
implement, and maintain tools.
• This allows us to have full-featured and
robust tools.
• One trade-off is fewer “one-off” tools for
specific or isolated issues.
• PNWGP uses Request Tracker (RT), an
open-source application, to track trouble
tickets.
• Weekly reports are generated for our
Directors by sector, severity, and type.
• Monthly reports are generated by sector
for billing purposes.
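Reports by sector, severity, and type amount to counting tickets per field value. The sketch below works on plain dictionaries and does not use the RT API; the ticket records and field values are hypothetical.

```python
from collections import Counter

def ticket_counts(tickets, fields=("sector", "severity", "type")):
    """Count tickets for each value of each reporting field."""
    return {
        field: Counter(ticket[field] for ticket in tickets)
        for field in fields
    }
```

Passing `fields=("sector",)` gives the per-sector counts that a monthly billing report would need.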
Reporting
• Key metrics we track include:
1. Ticket numbers by sector for billing
2. Phone call volumes
3. Duration of outages
4. Root Cause Analysis for high-impact events
• Outage time is measured by duration of
the customer impact.
• After Action Review and Follow-up is
conducted for serious events.
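Measuring outage time by customer impact means the clock runs from when the customer was affected to when service was restored, regardless of when the ticket was opened or closed. A minimal sketch, assuming hypothetical record keys `impact_start` and `impact_end`:

```python
from datetime import datetime, timedelta

def outage_time(outage):
    """Duration of customer impact, ignoring ticket open/close times."""
    duration = outage["impact_end"] - outage["impact_start"]
    if duration < timedelta(0):
        raise ValueError("impact_end is before impact_start")
    return duration

def total_outage_time(outages):
    """Sum customer-impact durations across a list of outage records."""
    return sum((outage_time(o) for o in outages), timedelta(0))
```

A ticket that stays open for follow-up after service is restored therefore does not inflate the reported outage duration.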
• A monthly report is emailed to each customer
covering traffic sent to/from their site.
• Our internal reporting includes “operational
impacts” to groups under our main
organization.
• How do you measure your NOC’s
success? Response times? Reduced
calls?
NOC Evolution
• Factors that have determined operational
changes for our organization have been
increased size, complexity and number of
networks monitored;
• Need to respond to outages 24 hours/day
with on-site personnel rather than paging;
• Skill and responsibility levels have
increased significantly, and continue to do
so.