Transcript Document

Performance Enhancement
and Response Team
TNC 2004,
Rhodes (GR), 09/06/04
Nicolas Simar, Network Engineer
Toby Rodwell, Network Engineer
DANTE
Performance and Enhancement Response Team – TNC 2004 – Rhodes (GR) -- Nicolas Simar ([email protected])
PERT Origins
• Where is it coming from?
– Historically it is long distance circuits (the “wide-area”)
that have been the bottleneck in a network.
– Over the last few years, the capacity of long distance
circuits has significantly increased.
– Now, end-to-end performance bottlenecks may occur at
any point: in a system, at application, hardware or LAN
level, in addition to wide-area networks.
– As such, it is becoming more and more difficult for a
non-expert end-user to diagnose their network
performance issues
PERT Origins (2)
• A group of NRENs met in December 2002 at
the TERENA offices in Amsterdam…
– Mauro Campanella (GARR), Valentino Cavalli
(TERENA), Larry Dunn (Cisco), Marian Garcia
(DANTE), Simon Leinen (SWITCH), Victor Reijs
(HEAnet), Nicolas Simar (DANTE), Sven Ubik
(CESnet) and Steve R. Williams (Uni. of
Swansea/UKERNA)
PERT Origins (3)
• …and they came up with the concept of PERT
– To provide a support structure to investigate and
resolve problems in the performance of applications
over computer networks
– Comparable to CERT structure.
• http://www.dante.net/pert
PERT? What is it?
• Performance Enhancement and Response Team
• A virtual team consisting of
– Cross-discipline experts who are capable of identifying
the locations of performance bottle-necks.
– Subject specialists who can precisely diagnose the cause
of a given problem and help the end-users resolve it.
• Monitoring tools
– Deployment of a monitoring infrastructure to ease the
troubleshooting.
PERT? What is it? (2)
• Information storage, tracking and retrieval
– Tracking system.
• Knowledge base to document
– Known performance issues, with possible ways to
address them.
– Successful diagnostic strategies.
PERT current status
• PERT is in a Pilot phase
– Informal, unregulated access to PERT; anybody can
request help from PERT.
– Primary purpose of investigation is to improve PERT’s
knowledge and experience.
– As this is a Pilot phase, the problems are addressed on
a best effort basis.
– No dedicated Monitoring tools.
– RoundUp tracking system (off-the-shelf) used.
PERT current status (2)
• Please DO e-mail … with any performance
issues you or your customers are experiencing
and would like investigated
• Please DON’T assume the issue will be quickly
resolved
– However, very few issues to date have been passed to
the PERT.
Case 1 - Strasbourg Astronomy
Laboratory - Fermilab
• From Fermilab SDSS.
– SDSS – Sloan Digital Sky Survey
– Transfer rate is 5Mbps (rsync)
• PERT contacted
• Potential causes
– Ethernet interfaces not full duplex mode
– TCP buffer size, occasional losses on large RTT path
– Application
– Hardware
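
The TCP buffer hypothesis can be sanity-checked with a bandwidth-delay product estimate. The sketch below is illustrative only: the 100 ms round-trip time and the 64 KiB window are assumed values, not measurements from this case, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on TCP throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


# Assumed values for illustration (not taken from the case):
RTT_S = 0.100             # hypothetical long-distance round-trip time
WINDOW_BYTES = 64 * 1024  # hypothetical default TCP window

if __name__ == "__main__":
    needed = bdp_bytes(100e6, RTT_S)
    ceiling = window_limited_bps(WINDOW_BYTES, RTT_S)
    print(f"Buffer needed to fill 100 Mbps: {needed / 1e6:.2f} MB")
    print(f"Ceiling with a 64 KiB window: {ceiling / 1e6:.2f} Mbps")
```

Under these assumptions a small window caps throughput at a few Mbps regardless of link capacity, which is why buffer size is worth ruling out early on a large RTT path.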
Case 1 Strasbourg-Fermilab (2)
• Use of other machines and same software
– US to FR, memory to memory with large buffer using
NTTCP/iperf: 90Mbps
– But when using rsync with large buffer: lower than
20Mbps.
– Next steps: investigate the hardware and the application,
and keep an eye on network losses.
Case 2 – JIVE - GEANT
• Joint Institute for VLBI in Europe (JIVE)
– VLBI: Very Long Baseline Interferometry
– JIVE is located in Dwingeloo (The Netherlands) and
collects and correlates data from the European VLBI
network (radio telescopes).
• Test the download of a 430MB file from the JIVE
website in Dwingeloo to the University of
Oxford.
– Problems with the systems in Oxford, therefore test
done between JIVE and a GÉANT workstation.
Case 2 – JIVE (2)
• Initial transfer test:
– Via http, using wget
– Took 5 minutes to complete the 430MB transfer
(approximately 10Mbps throughput)
• PERT case opened
• Potential causes
– Ethernet interfaces not full duplex mode
– Insufficiently large TCP buffers
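
The throughput figure for the initial test follows from the file size and the transfer time. A minimal sketch of the arithmetic, assuming 1 MB = 10^6 bytes and a transfer time of exactly 5 minutes; `throughput_mbps` is a hypothetical helper name.

```python
def throughput_mbps(size_megabytes: float, duration_s: float) -> float:
    """Average throughput in megabits per second (1 MB = 10**6 bytes)."""
    return size_megabytes * 8 / duration_s


if __name__ == "__main__":
    # 430 MB transferred in 5 minutes
    print(f"{throughput_mbps(430, 5 * 60):.1f} Mbps")  # prints "11.5 Mbps"
```

This is consistent with the roughly 10Mbps quoted for the initial test.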
Case 2 – JIVE (3)
– The maximum TCP receive buffer size on the GÉANT
workstations was already reasonable.
– wget uses the default TCP buffer size. The default TCP
buffer size was increased on two receivers (ws4.uk: Linux ->
8MB, ws1.de: Unix -> 196kB)
Improvement: 40Mbps
– Could not access the JIVE webserver to increase the Tx
buffer (critical production machine)
– Access was granted to the JIVE FTP server, where the
Tx buffer was increased to 2MB
Improvement: 90Mbps
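
In this case the default buffer sizes were raised on the hosts themselves. Where that is not possible, an application can request larger buffers for its own sockets instead. The sketch below shows that alternative in Python, assuming the application code can be changed; `make_tuned_socket` is a hypothetical helper, and the operating system may clamp or adjust the requested sizes.

```python
import socket


def make_tuned_socket(rcvbuf: int, sndbuf: int) -> socket.socket:
    """Create a TCP socket and request buffer sizes before connecting."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the sizes; the kernel decides what is actually granted.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return sock


if __name__ == "__main__":
    two_mb = 2 * 1024 * 1024
    s = make_tuned_socket(two_mb, two_mb)
    # Read back the values to see what the operating system granted.
    print("receive buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    print("send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    s.close()
```

Reading the values back matters because system-wide maximums can silently limit a per-socket request.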
Lessons Learned
• Identify technical contact at each end
• Determine the scope of testing possible
– Where production machines are involved, some configuration
changes may not be acceptable for testing purposes.
– Strong test machines needed (close to the user, easy
access to them)
• Wherever possible, use methods to minimise
the number of variables
– e.g. sink data to /dev/null, memory to memory transfer
not to disk
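
A memory-to-memory test of this kind can be sketched in a few lines of Python. This is an illustration, not the tooling used by the PERT (the cases above used NTTCP/iperf): the receiver discards everything it reads, much like sinking to /dev/null, and the sender transmits from a buffer held in memory. The loopback address is an assumption made only to keep the example self-contained; a real test would run the receiver on the remote host.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def _sink(server: socket.socket, result: dict) -> None:
    """Accept one connection and discard everything received."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def memory_to_memory_test(total_bytes: int, host: str = "127.0.0.1") -> tuple:
    """Send total_bytes from memory to a discarding receiver.

    Returns (bytes received, elapsed seconds). No disk I/O on either side.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result: dict = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = bytes(CHUNK)  # zero-filled buffer held in memory
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained the connection
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], elapsed


if __name__ == "__main__":
    received, elapsed = memory_to_memory_test(50 * 1000 * 1000)
    print(f"{received * 8 / elapsed / 1e6:.0f} Mbps, memory to memory")
```

Because neither side touches a disk, a poor result here points at the network path or the TCP settings rather than at storage or the application.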
GN2
• SA3: enable End-to-End QoS across European
academic networks on a routine basis.
– Development of a distributed provisioning system
(SA3) and a performance monitoring system (JRA1).
– Need to deploy both systems to be “SA3-compliant”
– GN2 will only ensure QoS edge-to-edge; the PERT will
be crucial to help users to get the best performance
from their end-systems in order that they may make full
use of any Premium IP service they might reserve.
• PERT structure and operation will be designed
by SA3.
GN2 SA3 milestones
• First 12 months
– 31 Oct 04 - Establish PERT
– 31 Jan 05 - PERT Ticket system deployed
– 28 Feb 05 - PERT troubleshooting procedures published
– 28 Feb 05 - PERT knowledgebase operational
– 31 May 05 - Consultancy service established
– 31 May 05 - First issue of 'User Guide' and 'Best
Practice Guide'
GN2 SA3 – Work Items
• WI-2 Define and establish the PERT (Sep 04 - Dec 04)
• WI-3 Deliver PERT Documentation
– The heart of the PERT documentation will be a
knowledgebase. The knowledgebase will be the source
for a Best Practice Guide (information for network
administrators) and an End User's Guide
GN2 SA3 – Work Items (2)
• WI-4 Deliver PERT Ticket System (Sep 04 - May 05)
– Enable the PERT (and its customers) to track issues
from when they are raised to when they are resolved.
– An end user will never raise an issue directly with the
PERT. Each PERT ticket will be created from an NREN
ticket that has been escalated to the PERT
– It is expected there will be an interface between the
Ticket System and the Knowledgebase, so that all
PERT cases can be accessed through the
knowledgebase.
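
The escalation rule (each PERT ticket is created from an NREN ticket) and the tracking requirement (from raised to resolved) can be pictured as a small data model. This is a sketch only: the class names, fields and the `escalate` helper are hypothetical and do not describe the ticket system that SA3 will deliver.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class NrenTicket:
    """A ticket opened by an end user with their NREN (hypothetical fields)."""
    ticket_id: str
    nren: str
    summary: str


@dataclass
class PertTicket:
    """A PERT ticket; it always references the NREN ticket it came from."""
    pert_id: str
    source: NrenTicket
    raised_at: datetime
    resolved_at: Optional[datetime] = None
    notes: List[str] = field(default_factory=list)

    @property
    def is_open(self) -> bool:
        return self.resolved_at is None

    def resolve(self, when: datetime, note: str) -> None:
        """Record the resolution so the case can later feed the knowledgebase."""
        self.notes.append(note)
        self.resolved_at = when


def escalate(nren_ticket: NrenTicket, pert_id: str, when: datetime) -> PertTicket:
    """The only way to create a PERT ticket in this sketch: from an NREN ticket."""
    return PertTicket(pert_id=pert_id, source=nren_ticket, raised_at=when)


if __name__ == "__main__":
    origin = NrenTicket("NREN-1", "ExampleNREN", "Slow transfer to remote site")
    case = escalate(origin, "PERT-1", datetime(2005, 2, 1))
    print(case.is_open)  # True
    case.resolve(datetime(2005, 2, 8), "Increased TCP buffer sizes")
    print(case.is_open)  # False
```

Keeping the originating ticket inside the PERT ticket preserves the link back to the NREN for the whole life of the case.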
GN2 SA3 – Work Items (3)
• WI-5 Deliver PERT Troubleshooting
Procedures (Sep 04 - Dec 04)
• WI-6 PERT Day to Day Operations (Feb 05 –
end of the project)
– Case Managers (CM) receive the PERT requests. Every
issue is opened by them. Note: CMs funded by GN2.
– When a case manager cannot solve a problem
themselves, they will localise the problem area
and then contact a Subject Matter Expert (SME). Note:
the SMEs will probably work on a voluntary
(unfunded) basis.
Any questions?
Thank you.
http://www.dante.net/pert
[email protected]
[email protected]