How to build a NOC - UW Information Technology Wiki

How to Build a NOC
Pacific NorthWest GigaPop
November 29, 2007
4/2/2016
Key Elements to Building a NOC
• Identify Customers
– Who are your customers?
• Understand Customer Expectations
– What are your user expectations?
– SLAs?
• Support Service Offerings
– Besides networking, what other services are being offered?
9/11/2007
• Monitor and Troubleshoot Service Issues
– How large and complex is your network?
– What level of troubleshooting and/or monitoring
will your NOC do?
– How will you communicate outages and
planned work to customers?
• Determine Appropriate Staffing Levels
– What will be the service hours?
  • Not all NOCs need to be 7x24x365
  • What about holidays? Weekends? On-call?
– Do your SLAs require in-person staffing?
– Do you have after-hours service response requirements?
– What level of staff needs to be present, and when?
– What will be the means of responding to issues when the NOC is not staffed 24x7?
• Organizational Structure
– What staffing tiers/hierarchy will you have
for support? Techs? Leads? NEs?
– What will be your escalation practices and
policies?
– To what group does the NOC report?
– What other groups report there, and what is
your organizational relationship to other key
groups?
– Who will write and update procedures, training manuals, etc.?
• Location and Design of NOC Facility
– How much space do you need?
– What is your facility like?
– How do you want to arrange your staff? Separate offices? "War room"?
• NOC Funding
– How will your organization be funded?
• NOC Tools
– How will you track customer information? (Database needs, CRM?)
– How will you monitor and troubleshoot? Which tools, specifically?
– Are you writing any of your own tools?
– Who will maintain your applications?
– How will you track trouble tickets?
• Reporting
– What reports will you issue?
– How will you measure the data?
NOC Evolution
• What factors may drive operational changes for your organization?
– For example: new services, expanded hours, a growing number of customers, new equipment types, or deeper skill levels.
Building a NOC
Pacific NorthWest GigaPop
Case Study
October 3, 2007
Customers and Expectations
• Our customers are in the Pacific
Northwest and Pacific Rim.
• We track customer information in a
database designed and maintained locally.
• Time zones have driven our need for 7x24.
• We cover 15 time zones which also makes
scheduling planned maintenance difficult.
• Escalation policies also drive our need for
on-call schedules and on-site personnel.
• Our customers expect 48 hours' notice
prior to work, unless it is an emergency.
• Outages are communicated via an
application we have built locally.
• We also want to know when our
customers have planned work.
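The 48-hour notice rule is easy to misjudge when the announcement and the work window are written in different time zones. A minimal sketch of the check, assuming timezone-aware timestamps; the function name, the fixed UTC offsets, and the example dates are illustrative and not taken from the PNWGP tooling:

```python
from datetime import datetime, timedelta, timezone

# From the slide: customers expect 48 hours' notice unless it is an emergency.
MIN_NOTICE = timedelta(hours=48)


def notice_is_sufficient(announced_at, work_starts_at, emergency=False):
    """Return True if the announcement gives customers enough lead time."""
    if announced_at.tzinfo is None or work_starts_at.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if emergency:
        return True
    # Subtracting aware datetimes compares absolute instants, not wall clocks.
    return work_starts_at - announced_at >= MIN_NOTICE


# Hypothetical example: announced in Seattle (UTC-8), work scheduled in Tokyo time (UTC+9).
seattle = timezone(timedelta(hours=-8))
tokyo = timezone(timedelta(hours=9))
announced = datetime(2007, 12, 3, 9, 0, tzinfo=seattle)
work = datetime(2007, 12, 6, 1, 0, tzinfo=tokyo)
```

On the wall clock this looks like almost three days of notice, but the two instants are only 47 hours apart, so the check fails.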
• Expectations are included in the contract
and located on the PNWGP web page.
• We prefer to have the NOC as the primary
customer contact point for our organization
in order to maintain quality of service when
a ticket moves between groups.
Supported Services
• In addition to network monitoring for PNWGP, our NOC monitors:
– UW campus connectivity, including two hospitals (both layer 2 and layer 3, wired as well as wireless, and many UPSs),
– Approximately 500 sites throughout WA state for the K-20 network,
– Pacific Wave and Transit Rail.
• We do not do any host, server, or
customer application monitoring.
• We do not do desktop support.
• We do not reset passwords or arrange
services such as email and web.
• We do not monitor data center security
alarms such as for fire or flood.
Monitoring and Troubleshooting
• PNWGP: approx. 20 customers, 5 node
sites, MPLS, BGP, VPN
• Campus: 115,000 connected devices,
~100 remote locations, ~7,000 wireless
APs, ~4,000 layer 2 switches, ~150
routers.
• Over 450 Washington state K-20 sites
monitored.
• Transit Rail: approx. 5 node sites, BGP and
IS-IS status.
• PNWGP troubleshoots with the customer
on routing problems, latency, and loss of
connectivity.
• IS-IS and IGP status is monitored.
• We manage DWDM, SONET, and MPLS
circuits.
• Escalation paths differ depending on what
is failing, which adds complexity.
• Outages and planned events are sent via
email announcement in a standard format.
• We include the date/time of the work or
when the outage began.
• If the customer’s connectivity is entirely
down, we also call or page them.
• Updates are sent at predefined intervals
for large events, or when we have a
change in status.
Staffing
• We are 7x24 with full-time staff.
• On weekends we have only one person
covering each day, so vacations and sick
time are problems.
• Holidays are covered by on-site and on-call staff.
• On-call consists of a 7-day period and
rotates among all NOC staff on a regular
schedule.
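A 7-day rotation among all NOC staff can be computed from a fixed start date rather than maintained by hand. A minimal sketch, assuming a simple round-robin order; the staff names and the rotation start date are hypothetical:

```python
from datetime import date


def on_call_for(day, staff, rotation_start):
    """Return who is on call on `day`, rotating through `staff` every 7 days."""
    if not staff:
        raise ValueError("staff list is empty")
    # Whole weeks elapsed since the rotation began (floor division also
    # handles dates before the start consistently).
    weeks_elapsed = (day - rotation_start).days // 7
    return staff[weeks_elapsed % len(staff)]


staff = ["analyst_a", "analyst_b", "analyst_c"]  # hypothetical names
start = date(2007, 9, 3)  # assumed first day of the first rotation (a Monday)
```

With this start date, `on_call_for(date(2007, 9, 10), staff, start)` falls in the second week of the rotation. Swaps for vacations and sick time would need an override table on top of this.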
• Tier 1 are student staff, also called
Network Analysts.
• Tier 2 are full-time staff, most are titled
Network Specialists.
• Tier 3 are full-time staff, titled Network
Engineers.
• We advertise to all customers an on-site
7x24 staff for immediate response to
outages.
• Our SLAs indicate when a problem should
be escalated beyond the on-call staff to a
manager, director, and higher at any time
of the day or week.
• Day shift M-F requires that we have
engineering (Tier 3) staff in the building.
• Off-hours we have Tier 2 staff always
present in the building. They will escalate
as necessary, either due to outage
severity or complexity, or at a customer’s
request.
• To maximize staffing efficiencies, 2nd and
3rd shift personnel report to Computer
Operations managers, rather than the
NOC manager.
• These staff provide all of the same support
and have the same training/access as the
daytime Tier 2 staff in the NOC.
• We have on-call lists with a primary and
secondary person for backup when NOC
staff is not on-site.
• A separate call list exists for escalation to
managers.
• Engineering and other service groups are
also available on-call 7x24x365.
Organizational Structure
• PNWGP has recently adopted a tiered
approach for staffing.
• NOC Network Engineers are available
after 6pm M-F and on weekends only via on-call.
• One manager, reporting to the Director of
Operations.
• Our escalation practices and policies are
based on length or severity of outage.
• At predetermined intervals additional
management levels are notified of severe
outages in order to help with escalation at
other organizations (telcos), or to keep
peers updated at the affected sites.
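Escalation by outage length can be expressed as a table of thresholds. A minimal sketch, assuming that only severe outages climb the management chain; the interval values and level names are placeholders, since the actual intervals are defined in the SLAs and are not given here:

```python
from datetime import timedelta

# Hypothetical thresholds: (time since outage began, level to notify).
ESCALATION_STEPS = [
    (timedelta(0), "on-call staff"),
    (timedelta(hours=1), "manager"),
    (timedelta(hours=4), "director"),
]


def levels_to_notify(outage_duration, severe=True):
    """Return every level that should have been notified by now."""
    if not severe:
        # Assumption: non-severe outages stay with the first responder.
        return [ESCALATION_STEPS[0][1]]
    return [level for threshold, level in ESCALATION_STEPS
            if outage_duration >= threshold]
```

Keeping the thresholds in one table makes it straightforward to review them against the contract when an SLA changes.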
• Many people in the NOC work to provide
and update training materials; SMEs write
procedures.
• We use a wiki to maintain our
documentation.
• We were able to have a tech writing expert
give us training on writing effective
procedural documents and re-organize our
wiki.
NOC Location
• Currently, our NOC has some single-occupant
offices and some shared offices (2-3 people).
We are all on the same floor.
• We find that being located near the
Network Engineering team is quite helpful
for urgent escalations.
• Next year we will be moving into a new
location and we do not yet have details on
what that will look like.
• Consider lighting and noise control with
shared offices.
• How many monitors will each person
need? Will you use a large central monitor
for some things?
• Provide an impromptu meeting space for
collaboration on big events.
• Conference bridges greatly enhance
collaboration across geographic distances
whether working on outages or events.
NOC Funding
• UW funding comes primarily from Washington
state taxpayers and UW Medical Centers.
• Funding also comes from the PNWGP,
customers, and the WA state K-20 network.
• If you recharge, you will need a business
and billing model.
• Or will you use time and materials?
Tools
• A very useful tool is live chat or IM for
coordinating efforts no matter where your
office is.
• Our customer information is tracked in a
home-grown database which has grown
and morphed over a dozen years.
• New needs such as SLAs and layer 1 info
now require significant investment in
upgrades.
• Our monitoring system – surprise – is also
“homegrown”.
• We monitor interface state and IP
reachability; performance and protocol
state connectivity will soon be integrated
into our “event system” (NMS).
• Automated tools can page the appropriate
group to notify them of outages or
threshold conditions.
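A threshold condition for paging can be as simple as requiring several consecutive failed reachability probes before anyone is woken up. A minimal sketch of that decision only; the function name and the default of three failures are assumptions, and this is not how the homegrown event system is known to work:

```python
def should_page(probe_results, threshold=3):
    """Return True if the most recent `threshold` probes all failed.

    `probe_results` is a list of booleans, oldest first, where True
    means the device answered the reachability probe.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    recent = probe_results[-threshold:]
    # Page only when there is enough history and every recent probe failed.
    return len(recent) == threshold and not any(recent)
```

Requiring consecutive failures trades a slightly slower page for fewer false alarms from a single dropped probe.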
• We have a separate Tools team (with 10
staff members) who design, write,
implement, and maintain tools.
• This allows us to have full-featured and
robust tools.
• One trade-off is fewer “one-off” tools for
specific or isolated issues.
• PNWGP uses Request Tracker (RT), an
open-source application, to track trouble
tickets.
• Weekly reports are generated for our
Directors by sector, severity, and type.
• Monthly reports are generated by sector
for billing purposes.
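The weekly breakdown by sector, severity, and type amounts to a grouped count over the week's tickets. A minimal sketch, assuming tickets have already been exported as plain dictionaries; the field names and sample values are hypothetical and this does not use the RT API:

```python
from collections import Counter


def weekly_counts(tickets):
    """Count tickets per (sector, severity, type) combination."""
    return Counter((t["sector"], t["severity"], t["type"]) for t in tickets)


# Hypothetical exported tickets for one week.
tickets = [
    {"sector": "K-20", "severity": "high", "type": "outage"},
    {"sector": "K-20", "severity": "high", "type": "outage"},
    {"sector": "campus", "severity": "low", "type": "planned work"},
]
counts = weekly_counts(tickets)
```

The same counter, keyed by sector alone, would cover the monthly billing view.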
Reporting
• Key metrics we track include:
1. Ticket numbers by sector for billing
2. Phone call volumes
3. Duration of outages
4. Root Cause Analysis for high-impact events
• Outage time is measured by duration of
the customer impact.
• After Action Review and Follow-up is
conducted for serious events.
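Measuring outage time by customer impact means adding up the periods when customers were actually affected. A minimal sketch, assuming that overlapping impact windows should be counted once; that merging rule and the function name are assumptions, not something the slides specify:

```python
from datetime import datetime, timedelta


def customer_impact_duration(impact_windows):
    """Total impacted time for (start, end) pairs, counting overlaps once."""
    total = timedelta(0)
    current_start = current_end = None
    for start, end in sorted(impact_windows):
        if end < start:
            raise ValueError("window ends before it starts")
        if current_end is None or start > current_end:
            # Gap before this window: close out the previous merged window.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent window: extend the merged window.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

For example, windows of 10:00 to 10:30 and 10:20 to 11:00 on the same day count as one hour of impact, not 70 minutes.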
• A monthly report on traffic sent to and from
their site is emailed to each customer.
• Our internal reporting includes “operational
impacts” to groups under our main
organization.
• How do you measure your NOC’s
success? Response times? Reduced
calls?
NOC Evolution
• Factors that have determined operational
changes for our organization have been
increased size, complexity and number of
networks monitored;
• Need to respond to outages 24 hours/day
with on-site personnel rather than paging;
• Skill and responsibility levels have
increased significantly, and continue to do
so.