Network Move & Upgrade 2008

Network Move & Upgrade 2008/2009
October 2008
Les Cottrell, SLAC
for the SCCS core services network group (Antonio Ceseracciu, Jared Greeno, Yee Ting Li, Gary Buhrmaster)
Presented at the OU Admin Group Meeting, October 16, 2008
www.slac.stanford.edu/grp/scs/net/racks/netmove-oct08.ppt
Why move
• ~70 building switches are connected to the old core switch, which has to “move” to the seismically retrofitted area
• While at it, replace old, end-of-life switches with limited capabilities to provide better service
Move Types
• Already done: Kavli, MCC, LCLS, SSRL (70 switches; 17 need replacing and will probably need to be re-addressed later; SSRL decision)
• Migrate: switch beyond end of life, features missing (auto-negotiation, higher speeds) = replace switch, connect to new core, re-address hosts
– (CGB1), TL1, (WHS), CLA1, CLA2, 280, CL1..2, B267, CGB3
• Move-1: switch OK = use the same switch but connect it to a temporary core switch, re-address later (after April 15, 2009)
– B214, B031, B210, B005, B275, B279, CLR113, CLR224, CLR343, HFB1, HFB2, MCC-CORE1..2, MCC-WAPCORE1..2, ROB, Research Yard: SWH-RY, B062, B104A, B113, B121, B124, B128, B211, B225, B231, B420
• Move-2: switch beyond end of life etc., but upgrading it is not a central responsibility = connect to a temporary core switch
– Guest House has 2; PEP ring has 4, but the ring is decommissioned at the moment
• Move-3: switch shares a trunk cable, so it requires a long (2-day) outage during working hours, or overtime (cost depends on which cables have to move etc.; estimated at probably $5K for 2 technicians for 2 days)
– Guest House 1 & 2, ESA, CRYO, IR12, CGB2 (1 day)
– We will send a heads-up email to OU Admins so they can contact and warn users, get an account if work outside normal hours is needed, and schedule the outage.
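The move types above amount to a small triage rule per building switch. A minimal sketch, assuming three yes/no facts are known for each switch; the `BuildingSwitch` fields, the `move_type` helper, and especially the precedence order (trunk sharing checked first) are illustrative assumptions, not the group's actual procedure.

```python
from dataclasses import dataclass

# Action for each category, paraphrased from the "Move Types" slide.
ACTIONS = {
    "Migrate": "replace switch, connect to new core, re-address hosts",
    "Move-1": "keep switch, connect to temporary core, re-address after April 15, 2009",
    "Move-2": "keep switch as-is, connect to temporary core",
    "Move-3": "schedule a long (about 2-day) outage or overtime",
}

@dataclass
class BuildingSwitch:
    name: str
    end_of_life: bool       # beyond end of life or missing features (auto-negotiation, speed)
    central_upgrade: bool   # upgrading it is a central responsibility
    shares_trunk: bool      # shares a trunk cable with other switches

def move_type(sw: BuildingSwitch) -> str:
    """Hypothetical triage rule; the order of the checks is an assumption."""
    if sw.shares_trunk:
        return "Move-3"
    if not sw.end_of_life:
        return "Move-1"
    return "Migrate" if sw.central_upgrade else "Move-2"

# Illustrative flags, chosen so the result matches the slide's lists.
for sw in (
    BuildingSwitch("TL1", end_of_life=True, central_upgrade=True, shares_trunk=False),
    BuildingSwitch("B214", end_of_life=False, central_upgrade=True, shares_trunk=False),
):
    kind = move_type(sw)
    print(f"{sw.name}: {kind} ({ACTIONS[kind]})")
```

Checking trunk sharing first reflects that a shared trunk forces a long outage regardless of the switch's age; a real triage may weigh these differently.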
Long Outage Switches
• Contact users and group leaders to see whether they can take the outage during normal work hours or provide an account for overtime (could be $5K); schedule the outage
– ESA (21 hosts): Tyler Adams (11), Nicholas Arias (2), Rafael Gomez (5), Zen Szalata (3)
– Cryo (7 hosts): Agustin Burgos (5), Tom Galeto (2)
– IR12 (4 hosts): Tala Cadorna (1), Raymond Lo (3)
• www.slac.stanford.edu/grp/scs/net/racks/slaconly/switches/ gives details of the hosts on each switch
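The per-admin host counts above can be turned into a contact list and cross-checked against the per-switch totals. A minimal sketch, assuming the counts are typed in by hand from the switch pages; the `long_outage_hosts` dictionary and `contact_list` helper are hypothetical, and ordering admins by host count is just one reasonable way to decide whom to contact first.

```python
# Host counts per switch and admin, copied from the slide.
long_outage_hosts = {
    "ESA": {"Tyler Adams": 11, "Nicholas Arias": 2, "Rafael Gomez": 5, "Zen Szalata": 3},
    "Cryo": {"Agustin Burgos": 5, "Tom Galeto": 2},
    "IR12": {"Tala Cadorna": 1, "Raymond Lo": 3},
}

def contact_list(hosts_by_switch):
    """Return (switch, total hosts, admins ordered by how many hosts they own)."""
    rows = []
    for switch, per_admin in hosts_by_switch.items():
        total = sum(per_admin.values())
        # Admins with the most affected hosts come first.
        admins = sorted(per_admin, key=per_admin.get, reverse=True)
        rows.append((switch, total, admins))
    return rows

for switch, total, admins in contact_list(long_outage_hosts):
    print(f"{switch} ({total} hosts): contact {', '.join(admins)}")
```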
Experience with Moves
• Moves are easy:
– Each building switch has two fibre pairs (for redundancy) to the two old core routers on B050 floor 2
– Prepare ports on the 2 temporary switches (in place for probably ~1 year) in the seismically retrofitted area
– Identify the pairs and prepare jumpers
– Move the backup pair to the backup temporary switch
– Move the primary pair to the primary temporary switch
– Two ~5-second outages; users are unlikely to notice
• No need for detailed coordination with OU admins or users; we can do it whenever we get to it
• Could publish a schedule to all OU admins in future, but that will require more effort and scheduling; it is easier to notify when done, or 5 minutes before doing it
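The ordering in the move steps (backup pair first, then primary) is what keeps each building switch connected throughout. A minimal sketch that models one switch's two fibre pairs and checks that at least one is plugged in at every step; the link names are hypothetical, and the ~5-second outages the slide reports are not modelled.

```python
from typing import Dict, List

def cutover_plan() -> List[Dict[str, str]]:
    """Return the state of both fibre pairs after each step of one switch move."""
    state = {"primary": "old-core-primary", "backup": "old-core-backup"}
    plan = [dict(state)]
    # Move the backup pair first so the primary keeps carrying traffic,
    # then move the primary pair once the backup is on the temporary core.
    for pair, target in (("backup", "temp-core-backup"), ("primary", "temp-core-primary")):
        state[pair] = "unplugged"      # jumper being moved to the temporary switch
        plan.append(dict(state))
        state[pair] = target
        plan.append(dict(state))
    return plan

def always_connected(plan: List[Dict[str, str]]) -> bool:
    """True if at least one fibre pair is plugged in at every step."""
    return all(any(link != "unplugged" for link in step.values()) for step in plan)

for step in cutover_plan():
    print(step)
```

Moving both pairs at once would fail `always_connected`, which is why the steps are done one pair at a time.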
Migrations
• Require re-addressing and close coordination
• Identify admins (there can be many) and switch ports; create a web page documenting what has to be done and the addresses; set up tracking tickets, etc.
• Email admins asking them to validate their CANDO information and read the web page:
– Three types of hosts: printers, SLAC only, open access to the world
• Meet with admins, explain, and schedule a time
• Install and configure the replacement switch when appropriate
• With each admin, a network tech and a network engineer move cables one by one from the old switch to the replacement, re-address each host, check that things work, etc.
• During or shortly after the migration, the network engineer updates CANDO with the new IP address
• To date, we have been migrating all of one OU Admin’s machines at a time
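The migration steps above need per-host tracking: host type, old and new address, whether the cable has moved, and whether CANDO has been updated. A minimal sketch of such a record, grouped per OU admin since machines are migrated one admin at a time; the class, field, and host names are hypothetical, the addresses come from documentation ranges, and the 15-minute figure is the worst case quoted in the lessons learned.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple

MINUTES_PER_DEVICE = 15  # worst case per device, from the lessons learned

class HostType(Enum):
    PRINTER = "printer"
    SLAC_ONLY = "SLAC only (IFZ)"
    PUBLIC = "open access to world"

@dataclass
class HostMigration:
    host: str
    admin: str
    host_type: HostType
    old_ip: str
    new_ip: Optional[str] = None   # should be assigned before the session
    moved: bool = False            # cable moved and host re-addressed
    cando_updated: bool = False    # CANDO now records new_ip

def sessions_by_admin(hosts: List[HostMigration]) -> Dict[str, Tuple[List[str], int]]:
    """Pending hosts per OU admin, with a worst-case session length in minutes."""
    pending: Dict[str, List[str]] = {}
    for h in hosts:
        if not h.moved:
            pending.setdefault(h.admin, []).append(h.host)
    return {admin: (names, len(names) * MINUTES_PER_DEVICE) for admin, names in pending.items()}

def missing_new_ip(hosts: List[HostMigration]) -> List[str]:
    """Hosts with no new IP assigned yet; resolve these before the session."""
    return [h.host for h in hosts if h.new_ip is None]

def cando_backlog(hosts: List[HostMigration]) -> List[str]:
    """Hosts already moved whose CANDO record still needs the new IP."""
    return [h.host for h in hosts if h.moved and not h.cando_updated]

hosts = [
    HostMigration("pc-01", "admin-a", HostType.PUBLIC, "192.0.2.10", "198.51.100.10"),
    HostMigration("prn-02", "admin-a", HostType.PRINTER, "192.0.2.11", "198.51.100.11", moved=True),
    HostMigration("pc-03", "admin-b", HostType.SLAC_ONLY, "192.0.2.12"),
]
print(sessions_by_admin(hosts))  # {'admin-a': (['pc-01'], 15), 'admin-b': (['pc-03'], 15)}
```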
Migration Experience
• Two switches almost done (CGB1, TL1); elapsed time > 1 week
• Difficult, labor-intensive, requires lots of coordination and availability, impacts users
• Problems with devices not being in the documented place, patch-panel labeling being wrong, and patch cables not being long enough
• Be wary of old, non-standard devices
• Devices that have been turned off do not show up on our spreadsheets
• It takes time to get print queues changed on Windows, but the change can be requested in advance
• We will set a hard deadline depending on the number of devices, etc.
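Two of the problems above (devices not in the documented place, and powered-off devices missing from spreadsheets built from what the switches see) come down to comparing the records with what each switch has actually learned. A minimal sketch, assuming hosts are keyed by MAC address and that both the documented and the observed lists can be exported; the function names are hypothetical and the MACs are from the documentation range.

```python
from typing import Dict, List, Set, Tuple

def reconcile(documented: Set[str], seen: Set[str]) -> Dict[str, List[str]]:
    """Compare hosts recorded for a switch with hosts the switch has learned."""
    return {
        # Recorded but not seen: possibly powered off, or plugged in elsewhere.
        "not_seen": sorted(documented - seen),
        # Seen but not recorded: needs identifying before the cutover.
        "undocumented": sorted(seen - documented),
    }

def misplaced(documented_at: Dict[str, str], seen_at: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Hosts seen on a different switch from the one recorded for them."""
    return {
        mac: (documented_at[mac], seen_at[mac])
        for mac in documented_at
        if mac in seen_at and seen_at[mac] != documented_at[mac]
    }

print(reconcile({"00:00:5e:00:53:01", "00:00:5e:00:53:02"},
                {"00:00:5e:00:53:02", "00:00:5e:00:53:03"}))
```

Anything in `not_seen` is worth chasing before the migration session, since a powered-off device would otherwise be missed.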
Lessons learned
• New networks require a different subnet mask and default gateway; make sure this is clear.
• Make sure all devices have an IP address assigned in advance to reduce confusion.
• Confirm in advance which devices should be SLAC Only (IFZ) vs Public.
• When replacing the switch, each device can take up to 15 minutes (walk to the machine, log in, change the IP, request the cable change, test), so be prepared and patient.
• Use ipconfig /registerdns on Windows computers to make sure Windows DNS gets updated, then test, and inform windows-admin if the IP is still wrong.
• We are still developing automation to change Windows system IPs.
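One possible starting point for the Windows re-addressing automation mentioned above: check that the new address, mask, and default gateway are consistent (the first lesson), then emit the commands to run on the host. A minimal sketch, not the group's tool; the helper name and the interface name "Local Area Connection" are assumptions, and the `netsh` form used is the positional `static <ip> <mask> <gateway> <gateway-metric>` syntax with a metric of 1.

```python
import ipaddress
from typing import List

def readdress_commands(interface: str, new_ip: str, prefix_len: int, gateway: str) -> List[str]:
    """Hypothetical helper: validate the new settings, then build Windows commands."""
    iface = ipaddress.ip_interface(f"{new_ip}/{prefix_len}")
    net = iface.network
    gw = ipaddress.ip_address(gateway)
    # The default gateway must sit on the host's new subnet.
    if gw not in net:
        raise ValueError(f"gateway {gw} is not on {net}")
    if iface.ip == gw:
        raise ValueError("host address must differ from the gateway")
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        raise ValueError(f"{iface.ip} is the network or broadcast address of {net}")
    return [
        # Static address, dotted subnet mask, default gateway, gateway metric 1.
        f'netsh interface ip set address name="{interface}" static {iface.ip} {iface.netmask} {gw} 1',
        # Re-register the host in Windows DNS, as the lessons learned recommend.
        "ipconfig /registerdns",
    ]

for cmd in readdress_commands("Local Area Connection", "198.51.100.25", 24, "198.51.100.1"):
    print(cmd)
```

Refusing to emit anything when the gateway is off-subnet catches the most common mistake (old gateway, new address) before anyone walks to the machine.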
Progress: Temporary switches
CORE3OLD in seismically retrofitted area, Sep 08: needs reconfiguring and connecting up, along with CORE4OLD
CORE4OLD in seismically retrofitted area, Oct 08: CORE4OLD in place too
Documentation
• See the “Seismic Retrofitting Rack Move 2008” site
– https://confluence.slac.stanford.edu/display/NetMan/Seismic+Retrofitting+Rack+Move+2008
– Contains background information, an overview of procedures, milestones, and drill-downs to many more details (tickets, spreadsheets, subnet allocations, hosts on individual switches, etc.)
– This is where to go for detailed information. It is very dynamic.
– If you need more, let us know and we will add it as appropriate
• Email to core-neteng
• There is an FAQ at
https://confluence.slac.stanford.edu/display/NetMan/Frequently+Asked+Questions
New Area
• New area circa Aug 21 ’08
• New area circa Oct 15 ’08
• New area circa Sep 18 ’08
Central Routers
• SWH-CORE1&2-OLD in old racks