Schoolcraft College Data Center

Tier 3+
1
Title:
Design of a Shared Tier III+ Data Center: A Case Study with Design Alternatives and Selection
Criteria
Abstract:
Schoolcraft College is constructing a High Availability (HA) Data Center that is targeted at an
Uptime Institute Tier Rating of III+. The new center will provide colocation hosting services to the
education, municipal, and commercial communities. The design criteria of this facility include
substantial redundancies in the power, cooling, network, and security areas. For example, the power
infrastructure redundancy includes a continuous-duty natural gas generator that will supplement the
DTE public power grid, such that campus power capacity should remain flat even when this 150+
rack facility is at capacity. The data center is also being designed to give students first-hand exposure
to the skills needed to design and operate such a high-performance facility, without compromising
security or uptime for municipal, commercial, and educational institution customers. The design
criteria will be presented, alternative designs discussed, and final selections presented and
rationalized.
2





Space is money
Reputation is
success or failure
Customer service
must be paramount
PUE drives
profitability/cost
containment
Design flexibility
3






Concurrently maintainable
Redundant Capacity components
Multiple independent distribution paths
One distribution path required to serve computer
equipment at any time
Dual powered IT equipment
Twelve hours of on-site engine generator fuel
storage for N capacity
Additional Tier 4 Features
• Fault tolerant
• N capacity power/cooling available after any infrastructure failure
• Multiple independent active distribution paths
4







100% Uptime
Tier 3+ rated data center
PUE target: 1.5
Minimum of N+1 redundancy
in all critical systems
Private power generation
2N back-up power
Carrier neutral with multiple
carriers
System Design
Basic Co-Location
◦ Space
◦ Power/Cooling
◦ Access to Carriers
◦ Security
And Beyond…
◦ Lease Equipment
◦ Remote Hands
◦ DBMS
◦ Disaster Recovery
◦ Maintenance
◦ OS Mgmt.
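
The "minimum of N+1 redundancy" and "2N back-up power" criteria above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the 300 kW / 100 kW figures are assumptions, not values from the SCDC design.

```python
import math


def units_for_load(load_kw: float, unit_capacity_kw: float) -> int:
    """Return N: the fewest units whose combined capacity covers the load."""
    if load_kw <= 0 or unit_capacity_kw <= 0:
        raise ValueError("load and unit capacity must be positive")
    return math.ceil(load_kw / unit_capacity_kw)


def n_plus_spares(load_kw: float, unit_capacity_kw: float, spares: int = 1) -> int:
    """Units to install for N+1 (or N+spares) redundancy."""
    return units_for_load(load_kw, unit_capacity_kw) + spares


def two_n(load_kw: float, unit_capacity_kw: float) -> int:
    """Units to install for 2N redundancy (a full duplicate set)."""
    return 2 * units_for_load(load_kw, unit_capacity_kw)


def survives(load_kw: float, unit_capacity_kw: float,
             installed: int, failed: int = 1) -> bool:
    """True if the remaining units still carry the load after `failed` units drop out."""
    return (installed - failed) * unit_capacity_kw >= load_kw


# Hypothetical example: a 300 kW load served by 100 kW units.
print(units_for_load(300, 100))   # N
print(n_plus_spares(300, 100))    # N+1
print(two_n(300, 100))            # 2N
```

With N+1, any single unit can be out for maintenance or failure; with 2N, an entire side can be lost.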
5






High-density power – average 5 kW/rack
24/7 monitoring
24/7 client access
Latest high efficiency HVAC system
Inert gas primary fire suppression system
Dual authentication
physical security
with biometrics
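
As a rough capacity check, the rack count from the abstract (150+), the 5 kW/rack average, and the 1.5 PUE design figure can be combined. This is a back-of-the-envelope sketch, not the project's actual load calculation; the function names are made up for illustration, and 150 is used as the lower bound of "150+".

```python
def it_load_kw(racks: int, avg_kw_per_rack: float) -> float:
    """Estimated IT load when every rack draws the average power."""
    return racks * avg_kw_per_rack


def facility_load_kw(it_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE value."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_kw * pue


it_kw = it_load_kw(150, 5.0)                 # 750.0 kW of IT load
total_kw = facility_load_kw(it_kw, 1.5)      # 1125.0 kW including cooling and losses
print(it_kw, total_kw)
```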
6








CHP Generator vs. Fuel cell
Single vs. dual utility feeds
Number of failover protections
Single vs. dual backup
power
UPS Battery vs. Flywheel
UPS Tier III vs. IV (A+B)
Busway vs. conduit/wire
Branch circuit monitoring
◦ Panel vs. Busway vs. PDU


Transfer switch vs. PLC
failover management
On site load bank vs.
Maintenance service
7
Data Center Power Plant
8
1.) From the Row…
[Diagram: A and B power feeds alternating along the rows]
2.) To the Rack…
[Diagram: Power Busway A and Power Busway B]
3.) To inside the Rack…
[Diagram labels, back of rack: BusPlug (A and B); Dual Corded Server (Two Power Supplies); Solid State Transfer Switch; Single Corded Device; Power Strip (a.k.a. PDU or Power Distribution Unit)]
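
One consequence of A/B distribution is that each side must be able to carry the whole rack if the other side fails. A minimal sketch of that check follows; the function name, the 50 kW busway rating, and the 80% continuous-load derating are illustrative assumptions, not figures from the slides.

```python
def failover_ok(load_a_kw: float, load_b_kw: float,
                feed_capacity_kw: float, derate: float = 0.8) -> bool:
    """True if either feed alone could carry the combined A + B load.

    `derate` is the assumed fraction of rated capacity usable for
    continuous load (0.8 here is an assumption for illustration).
    """
    usable_kw = feed_capacity_kw * derate
    return load_a_kw + load_b_kw <= usable_kw


# Hypothetical busway rated at 50 kW per side.
print(failover_ok(20, 18, 50))  # combined 38 kW fits on one side
print(failover_ok(25, 20, 50))  # combined 45 kW would overload the surviving side
```

In practice this means each side normally runs at well under half of its usable capacity.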
9









Single vs. multiple carriers
Single vs. multiple entry points (Diverse Entry)
Single vs. multiple carrier paths (Diverse Path)
Single vs. dual lateral connection per carrier
Entry pathways owned
vs. Carrier owned
Single vs. Dual core
carrier routers to MDF
Single vs. Dual edge
switches (HSRP)
Cisco 6500 series vs.
Nexus with SDN
Carrier Battery plant vs.
Operator A-B UPS
10
11
SCDC and MERIT Network
Relationship
Merit will re-sell SCDC
colocation services
Merit delivers a
150 Mbps connection into the
Applied Sciences via an
AT&T EVC
Merit Network
services will come into
the SCDC via an AT&T or
Level 3 “last mile,” carriers
with whom Merit already
has a relationship.
Merit has
approximately 3000
miles of fiber network
in Michigan.
Merit has relationships
and fiber access to area
public school districts
and universities as
potential SCDC clients
12




Cooling 101 – Get air to
front of device to allow
device fans to pull air to
the back
Cooling is the largest “non-IT”
power usage
Leakage – Open spaces
create air mixing,
turbulence, loss of
efficiency
Design layout CFD failure
mode verified
13







Area/perimeter (blowing up the
balloon)
Ducted supply and/or ducted
return
Raised floor, In row,
Economizers, etc.
Hot aisle vs. Cold aisle
containment
DX (Air), Glycol, Chilled Water
DX pumped refrigerant with
free cooling below 55 °F
Heat exchangers
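
The "free cooling below 55 °F" option can be illustrated with a simple mode selector and an estimate of how many hours qualify. This is a simplified sketch: real pumped-refrigerant units can blend modes, and the function names and sample temperatures are assumptions.

```python
FREE_COOLING_MAX_F = 55.0  # threshold taken from the slide


def cooling_mode(outdoor_temp_f: float) -> str:
    """Pick a cooling mode from the outdoor temperature (simplified, no partial mode)."""
    if outdoor_temp_f < FREE_COOLING_MAX_F:
        return "free_cooling"
    return "dx_compressor"


def free_cooling_fraction(hourly_temps_f: list[float]) -> float:
    """Fraction of the sampled hours that fall below the free-cooling threshold."""
    if not hourly_temps_f:
        raise ValueError("need at least one temperature sample")
    hours = sum(1 for t in hourly_temps_f if t < FREE_COOLING_MAX_F)
    return hours / len(hourly_temps_f)


# Hypothetical hourly readings in °F.
print(cooling_mode(40.0))
print(free_cooling_fraction([30.0, 50.0, 60.0, 70.0]))
```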
14





N+1 cooling capacity
Liebert DSe pumped
refrigerant with free
cooling
CFD Validation
Highest efficiency with
hot aisle containment
Typical PUE = 2.0
Cold Supply Air
Hot exhaust air
PUE = Total D.C. Power / Power to Racks
15







24/7 Alarm active
An MCOLES certified PA330 Police
Authority, co-located in the Data
Center building
Dual authentication with Prox card
and Biometrics
24/7 monitored (…by third party)
CCTV megapixel security cameras
with remote viewing
Motion-activated video recording with
90-day minimum retention
Non-Clients/Vendors 100% escorted
16







Building vs. Room
UPS vs. Feeder
General vs. First responder
activation
Fire Suppression Activation
Code
requirements…Equipment
servicing room
CRACs and IT equipment
vs. CRACs only
EPO
NEC
Article 645 - B
Disconnection Means (Emergency Power Off)
Section 645.10 of the 2008 NEC requires
a disconnecting means for each
zone in the IT room. Section 645.10 of the
2011 NEC offers two alternatives for the
disconnecting means: (A) covers remote
disconnect controls, with the same
requirements as the 2008 NEC, and (B) covers critical
operations data systems. Critical operations
data systems (defined in 645.2) are
permitted to have alternate disconnecting
means provided that five additional
conditions are met:
(1) An approved shutdown procedure has
been established.
(2) Qualified personnel are continuously
available 24/7.
(3) Smoke sensors are in place.
(4) A fire suppression system is in place.
(5) Plenum cables are used for signaling.
◦ First responder only
◦ Equipment servicing room –
CRACs – Agent effectiveness
◦ IT Equipment power – optional
◦ Power for lighting & utility
outlets
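
The five conditions quoted above for alternate disconnecting means lend themselves to a simple checklist. The sketch below only mirrors the wording on this slide; it is not a substitute for the code text or for the authority having jurisdiction, and the names are illustrative.

```python
# Condition labels paraphrase the five items listed on the slide.
CONDITIONS = (
    "approved shutdown procedure established",
    "qualified personnel continuously available 24/7",
    "smoke sensors in place",
    "fire suppression system in place",
    "plenum cables used for signaling",
)


def unmet_conditions(status: dict[str, bool]) -> list[str]:
    """Return the conditions not marked True (a missing entry counts as unmet)."""
    return [c for c in CONDITIONS if not status.get(c, False)]


def alternate_disconnect_permitted(status: dict[str, bool]) -> bool:
    """True only when all five conditions are satisfied."""
    return not unmet_conditions(status)


# Hypothetical facility status: everything in place except the shutdown procedure.
status = {c: True for c in CONDITIONS}
status["approved shutdown procedure established"] = False
print(unmet_conditions(status))
print(alternate_disconnect_permitted(status))
```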
17



New evaporative particulate
Inert gas FM200/ECARO –
dual detector 165
Dry pipe – Dual action 185
◦ 2 detector active to charge
lines
◦ Pellet melt water zone

First Responder Training
18






Preventative vs. Reactive
How much – Granular view vs. Sensory Overload
Methods & Protocols
◦ SNMP
◦ BACnet
◦ Modbus
◦ Dry contact
Alerting
◦ email
◦ text
◦ phone call
◦ audible alarms
Response Policy
Infrastructure HW vs. Network
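
One way to balance a "granular view" against "sensory overload" is to grade each reading by severity and escalate the alerting channel only as severity rises. The sketch below assumes made-up thresholds and a made-up channel mapping; it does not poll SNMP, BACnet, or Modbus, and simply takes a reading that some collector has already gathered.

```python
from dataclasses import dataclass

# Hypothetical escalation policy: which channels fire at each severity.
CHANNELS = {
    "info": [],
    "warning": ["email"],
    "critical": ["email", "text", "phone call", "audible alarm"],
}


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


def severity(value: float, threshold: Threshold) -> str:
    """Grade a reading against its warning and critical thresholds."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "info"


def channels_for(value: float, threshold: Threshold) -> list[str]:
    """Alerting channels to use for a reading under the assumed policy."""
    return CHANNELS[severity(value, threshold)]


# Hypothetical rack inlet temperature thresholds in °F.
inlet = Threshold(warning=80.0, critical=90.0)
for reading in (75.0, 85.0, 95.0):
    print(reading, severity(reading, inlet), channels_for(reading, inlet))
```

Keeping routine readings at "info" with no notification is what limits alert fatigue; the response policy then decides who acts on each channel.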
19



Critical to Uptime and PUE
Hand in Hand with Redundancy




SNMP
BACnet
Modules
BMS
DCIM
Transfer Switch
Facility Power Meter
PUE = Total D.C. Power / Power to Racks
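
PUE is conventionally the total facility power divided by the power delivered to IT equipment, so it can be computed directly from the facility power meter and the rack-level (branch circuit or PDU) readings. A minimal sketch with hypothetical meter values:

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT (rack) power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    if total_facility_kw < it_kw:
        raise ValueError("total facility power cannot be less than IT power")
    return total_facility_kw / it_kw


def overhead_kw(total_facility_kw: float, it_kw: float) -> float:
    """Power spent on cooling, conversion losses, lighting, and so on."""
    return total_facility_kw - it_kw


# Hypothetical readings: facility meter vs. summed rack power.
print(pue(1500.0, 750.0))          # the "typical" 2.0 case
print(pue(1125.0, 750.0))          # a 1.5 case
print(overhead_kw(1125.0, 750.0))  # non-IT power in kW
```

A PUE of 1.0 would mean every watt reaches the racks; the closer the measured value gets to 1.0, the lower the overhead cost.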
20








Policy Compliance – SSAE16 SOC2,
HIPAA
~100 Control policies with Quality
Control Repository
Operations guide
Risk Analysis & Mitigation Plan xx
points
Disaster Recovery Plan – First
Responder Guide
Employee handbook
DCIM & Asset management
Incident management & Ticketing
System times
21











Preventative
Service-affecting or non-service-affecting
Notification of Clients (2-3 weeks in advance)
Network and compute redundancy and DR
CRACs & Condensers
Primary Transformer
Generator – Switchgear
UPS – Wrap around maintenance bypass
Breakers (arc flash) & Coordination
Fire Suppression & EPO
Transfer switches & Control logic
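
The client notification rule above (2-3 weeks in advance) is easy to turn into dates. A small sketch, assuming the window is measured back from the maintenance day; the function names and the example date are hypothetical.

```python
from datetime import date, timedelta


def notification_window(maintenance_day: date,
                        earliest_weeks: int = 3,
                        latest_weeks: int = 2) -> tuple[date, date]:
    """Return (earliest, latest) dates on which client notice should go out."""
    earliest = maintenance_day - timedelta(weeks=earliest_weeks)
    latest = maintenance_day - timedelta(weeks=latest_weeks)
    return earliest, latest


def notice_is_timely(sent: date, maintenance_day: date) -> bool:
    """True if the notice went out at least two weeks before the work."""
    _, latest = notification_window(maintenance_day)
    return sent <= latest


# Hypothetical maintenance date.
work_day = date(2025, 6, 30)
print(notification_window(work_day))
print(notice_is_timely(date(2025, 6, 20), work_day))
```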
22
Data Center Footprint
23

Academic Program focus:
◦ Data Center Design
◦ Operation Management




Continuing Education Seminars
Teaching lab in data center for hands-on
learning
Lab sponsorships being sought from EMC,
Cisco, HP, Dell, etc.
Focus on latest offerings/technologies
24






An SSAE16 SOC2, HIPAA, PCI compliant facility
Superior Infrastructure
Superior Redundancy
Superior Power
Security
Expertise in Commercial Data Center Design
and Management
25
Add Vblock schematic here
MDF = Main Distribution Facility
LGX = Light Guide Cross-connect – Carriers
BGP
HSRP
26
27
DTE Substation Supplying Data
Center
28
29

Purpose vs. Necessity
◦ Access control – individually locked racks
◦ Layers of security

Styles
◦ None
◦ Traditional
◦ Wrap
◦ Individually locked racks
◦ Impact on floor plan
30

AUP – Acceptable Use Policy
◦ Velcro vs Zip tie
◦ Cable labeling
◦ Blanking plates
◦ Fan direction
31
A virtualized MDF
Infrastructure
Carrier Neutral
BGP Carrier Redundancy
Edge Router Redundancy
Multi-tenant Cloud
Services
◦ Multiple Carriers Services
Delivery Modes
◦ Ready for BUaaS, BCaaS,
IaaS, etc.
[Diagram labels: Carrier Fiber Plant (entries 1-4); LGX; Carrier Racks; Data Center MDF; ASR/ISR Edge #1 and Edge #2; Nexus 5K/7K (two); Fiber Intercon. (two); SAN; UCS Blade Chassis, CLOUD (two); Converged Network and Compute; Client Racks grouped as Edge and Carrier Protected, Carrier Protected, and Carrier Direct]
32





Colocation services
focused on power,
space, bandwidth, and
physical security
Flexibility –
Independently variable
levels of space, power,
bandwidth, cooling, etc.
Environmental controls
Security
Carrier neutral/multiple
carriers
33