Lecture One
What’s a Data Centre
and what are its components
Data Centre: what is it?
• Wikipedia: “A Data Centre is a facility used to house computer systems and associated components” …
• … better, for me: “A Data Centre is a facility used to house … data, stored in specific electronic devices and accessible by computer systems suitable for their update, processing, and transmission”
Data Centre: the components
• Site
• Hardware
• Network Connections
• Logical Security Systems
• System Software
• Application Software
• Data
• People
• Organisation
Data Centre components – Site (1/3)
• “Site” = Building + “Building Plant”
• Building Plant (or Physical Plant) is the machinery used within the
building. Generally:
   • Power Supply from external providers
   • Stand-alone Power Generators and UPS Systems
   • HVAC (Heating, Ventilation and Air Conditioning) Systems
   • Hardware Cooling Systems (in case the HVAC is unsuitable)
   • Fire Protection Systems
   • Anti-intrusion Systems
Data Centre components – Site (2/3)
• Some good practices about the Building:
• Location
   • Out of the City Centre (traffic, security, …)
   • Easy access for transportation (big hardware devices)
   • Preferably not a hot climate
   • Good availability of commodities (power, water, …), possibly from different providers
• Design (possibly ad hoc)
   • Wide and flat structure (no more than 2-3 floors – possibly only one above ground)
   • Three concentric layers topology: external for transit spaces (product deliveries, visitors, …), medium for personnel, internal for hardware
   • Wide rooms inside with few dividing walls; movable panels – according to security requirements – for the first two layers. Regular design (e.g. rectangular), possibly with no divisions at all, for the inner layer
Data Centre components – Site (3/3)
• Some good practices about the Building Plant:
   • Not only one provider for Power Supply. Keep Margins for Future Development (KMFD).
   • Size Power Generators and UPS Systems as a function of the “stand-alone time” you need (enough for a regular shut-down of all systems); a back-of-the-envelope sketch follows this list. Consider your vital applications and the possibility of a “minimal sub-system continuity”. Consider the refuelling time for the power generators. KMFD.
   • Possibly separate the HVAC serving personnel rooms from the one serving hardware rooms (different requirements). For huge Data Centres a specialised Cooling System for hardware may be required (e.g. a closed-circuit water flow); the same may be required for special devices. KMFD.
   • Consider different Fire Protection Systems, depending on the presence of personnel in a room and on the evacuation time (gas systems are toxic).
   • Consider different levels of sophistication for the Anti-intrusion Systems across the three layers (e.g. security guards for the external one, badge-protected doors for the medium, biometric systems for the inner rooms).
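A back-of-the-envelope sketch of the sizing reasoning above. All loads, factors, and times here are assumed for illustration, not figures from the lecture:

```python
# Hypothetical UPS sizing as a function of "stand-alone time".
it_load_kw = 400          # assumed IT load that must stay up
overhead_factor = 1.6     # assumed PUE-like factor: cooling etc. draw power too
shutdown_minutes = 20     # assumed time for a regular shut-down of all systems

total_load_kw = it_load_kw * overhead_factor
ups_energy_kwh = total_load_kw * shutdown_minutes / 60
print(f"UPS must bridge ~{ups_energy_kwh:.0f} kWh at {total_load_kw:.0f} kW")
# -> UPS must bridge ~213 kWh at 640 kW. The generators then take over:
#    their tanks must cover the expected refuelling time at the same load,
#    plus a margin for future development (KMFD).
```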
Data Centre components – Hardware (1/4)
• In this section we consider, basically, only three “families” of
components:
• Devices for data recording (STORAGE)
• Devices for data elaboration (COMPUTERS)
• Devices for data flow to/from Storage, Computers and Network (SAN/LAN
SWITCHES)
• The Data Centre houses many other hardware components (the Network and Security Systems we’ll see in the next sections, as well as systems for Building Plant automation, workstations for the personnel, …)
Data Centre components – Hardware (2/4)
• The age, the history, and the “maturity level” of a Data Centre determine the “homogeneity level” of its HW components
• Generally the big Data Centres are more than 20 years old and have experienced, in their past, numerous application merges, substitutions, and re-engineerings. This history – not driven by a global initial design, but built through day-by-day needs – generally leads to very poor homogeneity of HW components.
• The lower the homogeneity level, the greater the effort (human, technical, and economical as well) needed to manage the Data Centre.
Data Centre components – Hardware (3/4)
• Generally, in a medium-high size Data Centre, it’s possible to find (in different
ratios) three “families” of computers:
• Mainframes
• Middle-range systems
   • Intel-based systems (e.g. rack-mounted x86 servers)
• Similarly, among the storage systems you can find:
• Disk subsystems (with different, but similar architecture and functions)
• Tape subsystems (generally equipped with robotic libraries)
• Solid state systems
• From the size point of view, one of the main figures characterising a Data Centre is the pair “storage capacity” & “computing power”. However, it is generally impossible to express this figure as a simple pair of numbers, precisely because a low homogeneity level requires many numbers and many descriptions, related to devices of different hardware architectures.
Data Centre components – Hardware (4/4)
• In the last 10-15 years, hardware (and software) architectures have developed more and more sophisticated “Virtualisation” solutions, which allow “mapping” virtual computer and storage systems onto physical computer and storage devices. With virtualisation, a computer or a storage device need not be dedicated to one specific application, but can be shared by different applications (a minimal sketch follows)
• From the quality (maturity) point of view, one of the main characteristics of a Data Centre is the level of virtualisation achieved. Try to decouple your applications as much as possible from any specific computer and/or storage: it will give you more degrees of freedom in technical terms and will relieve you from provider lock-in
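A minimal sketch of the mapping idea, assuming a toy first-fit placement of virtual machines onto physical hosts. All names and capacities are invented; real virtualisation platforms do far more:

```python
# Toy illustration: applications run in VMs, and VMs land on whichever
# physical host has room, instead of each application owning a dedicated box.
hosts = {"host-a": 64, "host-b": 64}   # free RAM (GB) per physical host
placement = {}

def place(vm, ram_gb):
    """First-fit placement: any host with enough free RAM will do."""
    for host, free in hosts.items():
        if free >= ram_gb:
            hosts[host] = free - ram_gb
            placement[vm] = host
            return
    raise RuntimeError(f"no capacity for {vm}")

for vm, ram in [("billing-app", 32), ("web-front", 16), ("dwh", 48)]:
    place(vm, ram)
print(placement)
# -> {'billing-app': 'host-a', 'web-front': 'host-a', 'dwh': 'host-b'}
```

Moving an application then means only updating the mapping: exactly the decoupling that frees you from dedicating hardware to applications.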
Data Centre components – Network (1/2)
• The Network – by its nature – is outside the Data Centre, as it is the infrastructure that connects the Data Centre with “the rest of the world”, both internal and external to the organisation the Data Centre operates for (so we can distinguish between an internal and an external network)
• However, Data Centres contain a set of Routers and Switches that transport data traffic between the servers and to/from the outside world. The physical connection with the outside world is often provided by two or more upstream service providers (to maximise availability).
Data Centre components – Network (2/2)
• The network, today, is most often based on the IP protocol suite. So the Data Centre often contains servers used to run the basic Internet (external network) and Intranet (internal network) services needed by external and internal users (e.g. DNS servers). In other cases these services are outside the Data Centre, outsourced to the aforementioned service providers.
• Controlling and monitoring systems for the network are also common. This set of systems is generally known as the NOC (Network Operations Centre).
• Sometimes the Data Centre contains the NOC. However, for Disaster Recovery purposes (see later), it is good practice to keep it far from the Data Centre, together with other network servers not duplicated elsewhere.
Data Centre components – Security Systems
• Logical Security Systems are systems dedicated to controlling access to data and applications, through a complex set of authentication and authorisation rules. As the great majority of unauthorised accesses come from the network (both internal and external), the Security Systems are deployed at the network input/output points and are mostly considered “network systems”
• The most common systems for security are:
   • Firewall: a system used to control the incoming IP traffic at two levels: the IP packet level (protocol correctness) and the application level (content suitability)
   • IDS (Intrusion Detection System): a system used to detect any unauthorised access
   • IPS (Intrusion Prevention System): a system used to prevent any unauthorised access
• Similarly to the NOC, a SOC (Security Operations Centre) is generally present, to control and monitor the security systems
Data Centre components – Software & Data
• Up to this point we talked about physical components (building, machinery, hardware). Now we deal with “less physical” items: Software and Data.
• Two considerations:
   • Software is a special kind of Data. It is a set of “instructions” that can be interpreted by the computers and used to “process” other data. Software & Data, indeed, are recorded on storage and read into computer memory in very similar ways
   • Software & Data are “less” physical than the previous components (as they have no weight, colour, or shape …), but they are not merely “logical” components, as they exist as a physical alteration of electronic circuits
Data Centre components – System Software
(1/2)
• We can classify as “System Software” all the programs executing the “base”
functions of the systems (computers, switches, storage systems, network
and security systems), independently of the applications running on those
systems
• Therefore it’s common to find identical System Software in Data Centres
offering completely different application services (banks, airline companies,
phone service providers, public administration, …)
• System Software is generally delivered by specialised software houses:
IBM, Microsoft, Oracle, …
• In recent years “Open System Software” (Linux-based) has been gaining a wider and wider share of the market. Within this area, even though it is “open”, some companies have nevertheless built a big business on it (e.g. Red Hat).
Data Centre components – System Software
(2/2)
• As an example, some of the most common System Software are:
   • The Operating Systems (Windows, IBM OS, Linux, IBM AIX, iOS, …)
   • The Virtualisation Systems (VMware, Linux V-Server, …)
   • The Database Management Systems (Oracle, DB2, SQL Server, …)
   • The so-called “Middleware” and all the Software used to develop application servers, web servers, etc. (WebSphere, Apache, SAP, JBoss, …)
   • The “OLTP” and “Message Queuing” Systems (IMS, CICS, WebSphere MQ, …)
   • … and many more
Data Centre components – Application
Software (1/2)
• Unlike the System Software, the Application Software is developed to
deliver a specific Service to a specific set of users.
• Therefore it is possible to find externally similar Application Software in Data Centres offering similar services (two different banks, two different public administrations, …), but they are almost certainly completely different:
   • In many particular functions
   • In the internal technical architecture
   • In their performance
   • … etc.
• … and, generally, we will find completely different Application Software in Data Centres operating in different business areas (e.g. a phone service provider and an airline company), with the exception of limited functions (e.g. e-mail, personnel management, …)
Data Centre components – Application
Software (2/2)
• Unlike the System Software, the Application Software is not delivered in the same form to different customers, because of the high degree of personalisation it requires
• Therefore the Application Software is either:
   • Developed by a Software House for the single customer, with the personalised characteristics the customer requires, or …
   • … “home-made” by the customer itself (possibly with the external help of a Software House)
Data Centre components – Data (1/3)
• Data are the most important component of the Data Centre. They are absolutely unique to the organisation: everything else we saw above (building, HW, SW) can be replaced, re-bought, rebuilt … but that is not true for the data. Or, at least, for the great majority of the data. That is the reason why (as we will see later) data play a lead role in the Disaster Recovery project.
• Furthermore, data must be protected not only from the risk of loss, but also from the risk of unauthorised access, which is even more frequent and insidious.
Data Centre components – Data (2/3)
• The data in a Data Centre may be categorised from different points of view:
   • By the technical form of their internal structure (sequential, relational database, unstructured, …)
   • By the type of device they are stored on (disk, tape, …)
   • By how the applications access them (read, read-write, online, batch, …)
   • By the performance they must guarantee (how many R/W operations per second, …)
   • By the level of relevance they have for the organisation (vital, critical, important, less important, …)
   • By their life-cycle characteristics (how long they must be kept, whether backup copies must exist, …)
   • … and by many other criteria
Data Centre components – Data (3/3)
• Generally the amount of data in a Data Centre grows year by year at a very impressive speed. A very common figure is 30% per year (see the sketch after this list).
• The reasons why we can generally measure such a dramatic growth are four:
   • Applications become more and more sophisticated and they require more and more data
   • The nature of the data itself is becoming more and more space-consuming (voice, images, videos, …)
   • Generally the “cleaning” operations to remove obsolete data are not a high priority in the Data Centre policy
   • The cost of storage is not high and is continuously decreasing
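A compound-growth sketch of the 30% figure. The starting capacity is an assumed example, not a number from the lecture:

```python
# At ~30%/year, storage needs roughly triple in four to five years.
storage_tb = 500            # assumed current capacity in TB
growth_rate = 0.30          # the "very common figure" quoted above

for year in range(1, 6):
    storage_tb *= 1 + growth_rate
    print(f"Year {year}: {storage_tb:,.0f} TB")
# Year 5: ~1,856 TB, i.e. about 3.7x the starting capacity
```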
Data Centre components – People
• People, for the Data Centre, are more important than data. And that is true not only for “ethical” reasons. It can even be rationally justified, because of the “Knowledge” the people have of the Data Centre itself.
• Knowledge of the Data Centre means knowing:
   • What to do (services to users) and how (service levels)
   • The mission (why the services must be delivered)
   • The strategy (why the services must, and how they can, be improved)
   • The means to operate by (Data Centre components)
• The knowledge cannot be bought. It comes from years and years of experience. It comes from people and enriches people, in a never-ending virtuous circle.
Data Centre components – Organisation (1/2)
• We said that People hold the Knowledge of the Data Centre: that is a strong but, at the same time, a weak point.
• It is a weak point because the Data Centre People change. They change as someone leaves and someone else arrives. And even the People who stay often change “inside” (their skills, their wills, their vigour change).
• Therefore the Knowledge must be preserved regardless of People changes. This can be achieved through the Organisation, in other words through Procedures & Documentation.
Data Centre components – Organisation (2/2)
• Procedures must be defined to identify exactly “who does what, when and how”. Some “improvisation” is always unavoidable, but unexpected situations must be strongly limited
• All the defined Procedures must be well described in an appropriate
Documentation.
• People must cooperate to design procedures and to write documentation
• Documentation must be accessible to people, who must be trained on the
procedures
• Procedures and documentation must be continuously adapted to the Data
Centre changes
Data Centres in the time of Web, Apps, Cloud
… do they still make sense?
• More and more “smart informatics” on consumer devices (PCs, tablets, smartphones, …), accessible through simple Apps and using data stored “who-knows-where” (the Cloud), leads us to consider Data Centres as “proto-industry products” …
• … but “smart informatics” needs data too … and more and more data indeed (Big Data)! So we will plausibly need bigger and steadier Data Centres.
• However, some decrease in the number of medium/small private Data Centres will be unsurprising: their activity will be merged into fewer, bigger public Data Centres (the Cloud)
Data Centres today …
… a few numbers …
• Possible classifications of Data Centres
• The Data Centres costs
… and a few cases …
• 1st case: the Italian Public Administration Data Centres (survey)
• 2nd case: two merging national banks Data Centres (2007)
• 3rd case: Google’s Container Data Centre
Possible classifications of Data Centres (1/4)
• Many different classifications of Data Centres may be defined:
• Dimensional, by:
   • computing or storage capacity
   • site area extension
   • number of involved people
   • number of served users
   • number of executed transactions per second
   • costs
   • … etc.
• Qualitative, on the basis of:
   • energy efficiency
   • technological currency
   • reliability level
   • … etc.
Possible classifications of Data Centres (2/4)
• Most of these classifications are context-dependent (for example, a dimensional classification may differ significantly between the set of the 50 main US public administrations and the set of the 50 main automotive industries). So their interest is restricted to single specialised surveys
• However some classifications have a generalized interest and may be
used as a standard. Some examples are:
• The Telecommunications Industry Association's TIA-942 Standard
• The metrics established by The Green Grid Consortium
Possible classifications of Data Centres (3/4)
• The TIA-942 Standard defines the minimum requirements for Data Centre availability, setting four “Tiers”:
• Tier 1 = Non-redundant capacity components (single uplink and servers). A general availability of not less than 99.671% must be guaranteed for the Data Centre
• Tier 2 = Tier 1 + Redundant capacity components. The general availability
guaranteed must be equal to 99.741% or more
• Tier 3 = Tier 2 + Dual-powered equipment and multiple uplinks. The general
availability guaranteed must be equal to 99.982% or more
• Tier 4 = Tier 3 + all components are fully fault-tolerant including uplinks,
storage, chillers, HVAC systems, servers etc. Everything is dual-powered. The
general availability guaranteed must be equal to 99.995% or more
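These availability percentages are easier to grasp once converted into maximum downtime per year. The percentages below are the ones above; the rest is plain arithmetic:

```python
# Convert TIA-942 availability figures into allowed downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8,760 (ignoring leap years)

for tier, availability in [(1, 0.99671), (2, 0.99741),
                           (3, 0.99982), (4, 0.99995)]:
    downtime_h = HOURS_PER_YEAR * (1 - availability)
    print(f"Tier {tier}: {availability:.3%} -> {downtime_h:.1f} h/year")
# Tier 1: ~28.8 h, Tier 2: ~22.7 h, Tier 3: ~1.6 h, Tier 4: ~0.4 h
```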
Possible classifications of Data Centres (4/4)
• The Green Grid Consortium is a non-profit industry consortium of end-users, technology providers, and utility companies, collaborating to improve the resource efficiency of Data Centres
• The main metric defined by the GGC is Power Usage Effectiveness (PUE). It is a measure of how efficiently a Data Centre uses energy; specifically, of how much of the energy goes to the computing equipment (in contrast to cooling and other overhead). PUE is the ratio of the total amount of energy used by the Data Centre as a whole to the energy delivered to the computing equipment.
• An ideal PUE is 1.0 (while some surveys have measured, worldwide, an average PUE of 1.8). Anything that is not considered a computing device in a Data Centre (e.g. lighting, cooling, etc.) falls into the category of facility energy consumption.
• PUE is the inverse of Data Centre Infrastructure Efficiency (DCIE).
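A worked example of the two metrics, with assumed (purely illustrative) yearly figures:

```python
# PUE  = total facility energy / energy delivered to IT equipment
# DCIE = 1 / PUE
total_facility_kwh = 1_800_000   # assumed: IT + cooling + lighting + ...
it_equipment_kwh   = 1_000_000   # assumed: energy to computing equipment

pue = total_facility_kwh / it_equipment_kwh
dcie = 1 / pue
print(f"PUE  = {pue:.2f}")    # 1.80, the surveyed worldwide average
print(f"DCIE = {dcie:.1%}")   # ~55.6%
```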
The Data Centres costs
• A Data Centre is a heap of high technology and highly skilled people, so it is a generator of significant costs
• The main costs come (in descending order) from:
   1. People
   2. Software
   3. Hardware
   4. Energy
• Within hardware the order is: servers, network, storage. The energy cost is usually higher than the server cost.
1st Case: the Italian Public Administration Data
Centres (survey)
• The survey was carried out in 2013 and involved about 1,000 Data Centres
• A dimensional classification based on site area showed that:
   • Only 1% of the Data Centre sites occupy more than 1,000 m²
   • 10% of them are between 100 and 1,000 m²
   • The rest are smaller than 100 m²
• Only 7% of the Data Centres are less than 3 years old; 57% were built before 2000.
• From an architectural point of view the servers are mainly “rack-type” with Windows OS. Linux systems follow. Other OSs are a minority.
• From the TIA-942 point of view, 65% of the Data Centres are Tier 1.
2nd case: two merging national banks Data
Centres (2007) – (1/x)
• First Bank – Two main Data Centres:
Monte Bianco DC
Basson DC
2nd case: two merging national banks Data
Centres (2007) – (2/x)
FIRST BANK – MAIN DATA CENTRE:
• Features:
   • net surface: 3,000 sqm
   • 24 rooms
   • just a Data Centre: no offices on the premises (only the Control Room)
   • campus design: two distinct “Half Campuses”, completely independent, with separate equipment (power, cooling, connections, …)
2nd case: two merging national banks Data
Centres (2007) – (3/x)
FIRST BANK – a few figures:
• Computers: 6 mainframes (4+2)
   • 324 central processors
   • more than 40,000 MIPS
   • 1.9 Tbyte of central storage
• 560 Tbyte of disk storage
• 1,500 Tbyte of tape storage
• Avg online transactions per day: 23 mln
• Database accesses per day: about 960 mln
• Transactions completed within 0.6 seconds: 97%
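A quick sanity check on these volumes, assuming a uniform 24-hour distribution (which understates the real daytime peak):

```python
# Average transaction rate implied by the First Bank figures.
daily_transactions = 23_000_000
avg_tps = daily_transactions / (24 * 3600)
print(f"Average rate: {avg_tps:.0f} transactions/second")  # ~266 tps
```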
2nd case: two merging national banks Data
Centres (2007) – (4/x)
• Second Bank – Three main Data Centres + minor others:
A, B, C
2nd case: two merging national banks Data
Centres (2007) – (5/x)
• Second Bank – Three Data Centres (after consolidation and
DR projects):
A, B, C
2nd case: two merging national banks Data
Centres (2007) – (6/x)
SECOND BANK – a few figures:
• Computers: 5 mainframes (3 active + 2 stand-by)
   • more than 13,000 MIPS (active)
• 150 Tbyte of disk storage (active) + DR capacity in sites B & C
• 600 Tbyte of tape storage
• Avg online transactions per day: 16 mln
• Transactions completed within 1.0 seconds: 95%
3rd Case – Google’s Container Data Centre
An example of CMDF (Containerized and Modular Data Centre Facility)
platform
• A repeatable, pre-engineered, prefabricated, and quality-assured set of building blocks (containers) that together bring online the necessary amount of IT capacity (computing, storage, network capacity + power supply, cooling facility, fire control)
▶ Google container data center tour.mp4