Monitoring in the Data Center
Monitoring and Managing the Data Center
Section 5 - Introduction
© 2006 EMC Corporation. All rights reserved.
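Chapter Objectives and Contents
This chapter covers data monitoring and management based on
storage management tools. By monitoring many aspects of storage,
including hardware, software, information capacity, format, and
content, information can be managed and used optimally. The
chapter also introduces the basics of using some of the major
information management software products.
This chapter covers two topics:
5.1 Monitoring in the Data Center
5.2 Managing in the Data Center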
Storage Systems Architecture - Introduction - 2
Section Objectives
Upon completion of this section, you will be able to:
Describe areas of the data center to monitor
Discuss considerations for monitoring the data center
Describe techniques for managing the data center
Monitoring in the Data Center
Module 5.1
Monitoring in the Data Center
After completing this module, you will be able to:
Discuss data center areas to monitor
List metrics to monitor for different data center
components
Describe the benefits of continuous monitoring
Describe the challenges in implementing a unified and
centralized monitoring solution in heterogeneous
environments
Describe industry standards for data center monitoring
Monitoring Data Center Components
[Diagram: Client, IP Network, Cluster of Hosts/Servers with Applications (HBAs, Keep Alive link), SAN (Ports), and Storage Arrays. Aspects monitored: Health, Capacity, Performance, Security]
Why Monitor Data Centers?
Availability
– Continuous monitoring ensures availability
– Warnings and errors are fixed proactively
Scalability
– Monitoring allows for capacity planning/trend analysis which in turn
helps to scale the data center as the business grows
Alerting
– Administrators can be informed of failures and potential failures
– Corrective action can be taken to ensure availability and scalability
Monitoring Health
Why monitor health of different components?
– Failure of any hardware/software component can lead to outage of a
number of different components
Example: A failed HBA could cause degraded access to a number of
data devices in a multi-path environment, or loss of data access in a
single-path environment
Monitoring health is fundamental and is easily understood
and interpreted
– At the very least health metrics should be monitored
– Health issues typically need to be addressed with high priority
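The HBA example above can be sketched in a few lines of Python. This is only an illustration: the function name `access_state` and the state labels are assumptions made for this sketch, not part of any monitoring product.

```python
def access_state(total_paths: int, failed_paths: int) -> str:
    """Classify a host's access to its data devices after path failures.

    A "path" here stands for one HBA-to-array route (hypothetical model).
    """
    if total_paths <= 0 or not 0 <= failed_paths <= total_paths:
        raise ValueError("invalid path counts")
    healthy_paths = total_paths - failed_paths
    if healthy_paths == 0:
        return "unavailable"   # single-path host that lost its only HBA
    if failed_paths > 0:
        return "degraded"      # multi-path host still has a working path
    return "healthy"


print(access_state(total_paths=2, failed_paths=1))
print(access_state(total_paths=1, failed_paths=1))
```

The same failure (one HBA) produces a different health state depending on how many paths the host has, which is why health alerts usually need topology context.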
Monitoring Capacity
Why monitor capacity?
– Lack of proper capacity planning can lead to data unavailability and
the inability to scale
– Trend reports can be created from all the capacity data
Enterprise is well informed of how IT resources are utilized
Capacity monitoring prevents outages before they can
occur
– More preventive and predictive in nature than health metrics
Example: reports show that 90% of a file system is full and that the
file system is filling up at a particular rate
Example: 95% of all the ports have been utilized in a particular SAN fabric; a new
switch should be added if more arrays/servers are to be added to the same
fabric
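A minimal sketch of the two capacity examples above, assuming a constant fill rate. The function names and all figures (a 1000 GB file system, 5 GB/day growth, a 64-port fabric) are hypothetical values chosen for illustration.

```python
def days_until_full(capacity_gb: float, used_gb: float,
                    growth_gb_per_day: float) -> float:
    """Project how many days remain before a file system is full."""
    if growth_gb_per_day <= 0:
        return float("inf")  # not growing, so it never fills
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_day


def fabric_needs_switch(ports_total: int, ports_used: int,
                        ports_needed: int) -> bool:
    """True if the planned additions do not fit in the free fabric ports."""
    return ports_used + ports_needed > ports_total


# A file system that is 90% full and grows by 5 GB per day.
print(days_until_full(capacity_gb=1000, used_gb=900, growth_gb_per_day=5))

# A fabric with 61 of 64 ports in use (about 95%) and 4 more ports requested.
print(fabric_needs_switch(ports_total=64, ports_used=61, ports_needed=4))
```

Real trend reports would fit the rate from historical samples rather than take it as a constant.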
Monitoring Performance
Why monitor Performance metrics?
– Want all data center components to work efficiently/optimally
– See if components are pushing performance limits or if they are
being underutilized
– Can be used to identify performance bottlenecks
Performance Monitoring/Analysis can be extremely
complicated
– Dozens of inter-related metrics depending on the component in
question
– Most complicated of the various aspects of monitoring
Monitoring Security
Why monitor security?
– Prevent and track unauthorized access
Accidental or malicious
Enforcing security and monitoring for security breaches is
a top priority for all businesses
Monitoring Servers
Health
– Hardware components
HBA, NIC, graphic card, internal disk …
– Status of various processes/applications
Capacity
– File system utilization
– Database
Table space/log space utilization
– User quota
Monitoring Servers
Performance
– CPU utilization
– Memory utilization
– Transaction response times
Security
– Login
– Authorization
– Physical security
Data center access
Monitoring the SAN
Health
– Fabrics
Fabric errors, zoning errors
– Ports
Failed GBIC, status/attribute change
– Devices
Status/attribute Change
– Hardware Components
Processor cards, fans, power supplies
Capacity
– ISL utilization
– Aggregate switch utilization
– Port utilization
Monitoring the SAN
Performance
– Connectivity ports
Link failures
Loss of signal
Loss of synchronization
Link utilization
Bandwidth MB/s or frames/s
– Connectivity devices
Statistics are usually a cumulative value of all the port statistics
Monitoring the SAN
Security
– Zoning
Restrict communication to dedicated sets of ports (HBA and
storage ports)
– LUN Masking
Ensure that only certain hosts have access to certain storage array
volumes
– Administrative Tasks
Restrict administrative tasks to a select set of users
Enforce strict passwords
– Physical Security
Access to Data Center should be monitored
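Zoning and LUN masking act as two separate gates, which the sketch below models as plain Python data. The zone names, port names, and volume names are hypothetical; in a real fabric these would be WWNs and array device IDs, and the checks are enforced by the switches and the array rather than by a script.

```python
def is_zoned(zones: dict, hba: str, array_port: str) -> bool:
    """True if some zone contains both the HBA and the array port."""
    return any(hba in members and array_port in members
               for members in zones.values())


def can_access(zones: dict, lun_masks: dict,
               hba: str, array_port: str, volume: str) -> bool:
    """A host reaches a volume only if zoning AND LUN masking both allow it."""
    visible = lun_masks.get((array_port, hba), set())
    return is_zoned(zones, hba, array_port) and volume in visible


# Hypothetical configuration: two hosts zoned to the same array port.
zones = {
    "zone_h1": {"hba_h1", "array_port_a"},
    "zone_h2": {"hba_h2", "array_port_a"},
}
lun_masks = {
    ("array_port_a", "hba_h1"): {"vol_1", "vol_2"},
    ("array_port_a", "hba_h2"): {"vol_3"},
}

print(can_access(zones, lun_masks, "hba_h1", "array_port_a", "vol_1"))
print(can_access(zones, lun_masks, "hba_h2", "array_port_a", "vol_1"))
```

A security monitor could run this kind of check against the intended configuration and flag any access path that the policy does not expect.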
Monitoring Storage Arrays
Health
– All hardware components
Front End
Back End
Memory
Disks
Power Supplies
…
– Array Operating Environment
RAID processes
Environmental Sensors
Replication processes
Monitoring Storage Arrays
Capacity
– Configured/unconfigured capacity
– Allocated/unallocated storage
– Fan-in/fan-out ratios
Performance
– Front End utilization/throughput
– Back End utilization/throughput
– I/O profile
– Response time
– Cache metrics
Monitoring Storage Arrays
Security
– LUN Access
Ensure that only certain hosts have access to certain storage array
volumes
Disallow WWN spoofing
– Administrative tasks
Most arrays allow the restriction of various array configuration tasks
Device configuration
LUN masking
Replication operations
Port configuration
– Physical Security
Monitor access to data center
Monitoring IP Networks
Health
– Hardware Components
Processor cards, fans, Power Supplies, ...
– Cables
Performance
– Bandwidth
– Latency
– Packet Loss
– Errors
– Collisions
Security
Monitoring the Data Center as a Whole
Monitor data center environment
– Temperature, humidity, airflow, hazards (water, smoke, etc.)
– Voltage – power supply
Physical security
– Facility access (Monitoring cameras, access cards, etc.)
End-to-End Monitoring
[Diagram: the end-to-end path from Client through the IP Network, the Cluster of Hosts/Servers with Applications (HBAs, Keep Alive link), and the SAN (Ports) to the Storage Arrays. Annotations: Single Failure, Multiple Symptoms, Root Cause Analysis, Business Impact]
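One simple way to think about tracing multiple symptoms back to a single failure is as a set operation over I/O paths: a likely culprit lies on every failed path and on no healthy path. The sketch below assumes that model; the function name and component names are hypothetical, and real root cause analysis tools use far richer correlation.

```python
def root_cause_candidates(failed_paths, healthy_paths):
    """Return components found on every failed path and on no healthy path."""
    if not failed_paths:
        return set()
    common = set.intersection(*(set(path) for path in failed_paths))
    known_good = set().union(*(set(path) for path in healthy_paths))
    return common - known_good


# Only H1's first path fails, while H2 still uses the same switch and port.
failed = [("H1.hba0", "SW1", "array.portA")]
healthy = [
    ("H1.hba1", "SW2", "array.portB"),
    ("H2.hba0", "SW1", "array.portA"),
    ("H2.hba1", "SW2", "array.portB"),
]
print(root_cause_candidates(failed, healthy))

# Every path through SW1 fails: path data alone cannot separate the
# switch from the array port, so both remain candidates.
failed_all = [("H1.hba0", "SW1", "array.portA"),
              ("H2.hba0", "SW1", "array.portA")]
healthy_rest = [("H1.hba1", "SW2", "array.portB"),
                ("H2.hba1", "SW2", "array.portB")]
print(sorted(root_cause_candidates(failed_all, healthy_rest)))
```

The second case shows why end-to-end monitoring combines path symptoms with component health data (for example, switch status) to narrow down the cause.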
Monitoring Health: Array Port Failure
[Diagram: hosts H1, H2, and H3 (Hosts/Servers with Applications, two HBAs each) connect through switches SW1 and SW2 in the SAN to ports on the Storage Arrays. H1, H2, and H3 are each marked Degraded]
Monitoring Health: HBA failure
[Diagram: hosts H1, H2, and H3 (Hosts/Servers with Applications, two HBAs each) connect through switches SW1 and SW2 in the SAN to ports on the Storage Arrays. Only H1 is marked Degraded]
Monitoring Health: Switch Failure
[Diagram: Hosts/Servers with Applications connect through switches SW1 and SW2 in the SAN to ports on the Storage Arrays. Label: All Hosts Degraded]
Monitoring Capacity: Array
[Diagram: a New Server joins the existing Hosts/Servers with Applications, which connect through switches SW1 and SW2 in the SAN to ports on the Storage Array]
Can the array provide the required storage to the new server?
Monitoring Capacity: Servers File System Space
[Diagram: a file system with No Monitoring compared with a file system under FS Monitoring, where alerts lead to the file system being extended (Extend FS)]
Warning: FS is 66% Full
Critical: FS is 80% Full
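The two thresholds on this slide can be sketched as a small check using Python's standard `shutil.disk_usage`. The 66% and 80% levels come from the slide; the function names and the choice of the current directory as the monitored path are assumptions for illustration.

```python
import shutil

WARNING_PCT = 66.0   # "Warning: FS is 66% Full"
CRITICAL_PCT = 80.0  # "Critical: FS is 80% Full"


def classify_usage(used_pct: float) -> str:
    """Map a file system utilization percentage to an alert level."""
    if used_pct >= CRITICAL_PCT:
        return "critical"
    if used_pct >= WARNING_PCT:
        return "warning"
    return "ok"


def filesystem_used_pct(path: str) -> float:
    """Return the percentage of the file system holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


pct = filesystem_used_pct(".")
print(f"{pct:.1f}% used: {classify_usage(pct)}")
```

Raising the warning well before the critical level leaves time to extend the file system before applications are affected.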
Monitoring Performance: Array Port Utilization
[Diagram: hosts H1, H2, and H3 (Hosts/Servers with Applications, two HBAs each) connect through switches SW1 and SW2 in the SAN to ports on the Storage Arrays. A chart of port utilization (Port Util. %) shows the combined load H1 + H2 + H3 against the 100% mark. A New Server, H4, is to be added]
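The question this slide raises, whether a shared array port has room for one more host, can be sketched as simple arithmetic. The per-host percentages below are invented for illustration; the slide only shows the combined H1 + H2 + H3 load.

```python
def port_headroom_pct(host_loads_pct: dict) -> float:
    """Remaining utilization on a shared array port, in percent."""
    return 100.0 - sum(host_loads_pct.values())


def can_add_host(host_loads_pct: dict, new_host_pct: float) -> bool:
    """True if the new host's expected load fits in the remaining headroom."""
    return new_host_pct <= port_headroom_pct(host_loads_pct)


# Hypothetical loads that together saturate the port.
loads = {"H1": 40.0, "H2": 35.0, "H3": 25.0}
print(port_headroom_pct(loads))
print(can_add_host(loads, new_host_pct=10.0))
```

With no headroom left, the new server H4 would need a different array port to avoid degrading the existing hosts.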
Monitoring Performance: Servers
Critical: CPU Usage above 90% for
the last 90 minutes
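An alert like this one fires on a sustained condition rather than a single spike. A minimal sketch, assuming one CPU sample per minute, keeps a sliding window of recent samples; the class name `SustainedThreshold` is hypothetical.

```python
from collections import deque


class SustainedThreshold:
    """Fire only when every sample in a full window exceeds the threshold."""

    def __init__(self, threshold_pct: float = 90.0, window_samples: int = 90):
        self.threshold_pct = threshold_pct
        # With one sample per minute, 90 samples cover 90 minutes.
        self.samples = deque(maxlen=window_samples)

    def add(self, cpu_pct: float) -> bool:
        """Record a sample and report whether the alert condition holds."""
        self.samples.append(cpu_pct)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s > self.threshold_pct for s in self.samples)


# Short window for demonstration: three consecutive samples above 90%.
monitor = SustainedThreshold(threshold_pct=90.0, window_samples=3)
for sample in (95.0, 96.0, 97.0, 50.0):
    print(sample, monitor.add(sample))
```

Requiring the whole window to exceed the threshold avoids alerting on brief, harmless bursts of CPU activity.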
Monitoring Security: Servers
[Diagram: three login attempts on a server (Login 1, Login 2, Login 3)]
Critical: Three successive login failures for username “Bandit” on server “H4”,
possible security threat
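The rule behind this alert can be sketched as a counter of consecutive failures per server and username that resets on a successful login. The class name `LoginMonitor` and the message format are assumptions modeled on the slide text.

```python
from collections import defaultdict


class LoginMonitor:
    """Track successive login failures per (server, username) pair."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = defaultdict(int)

    def record(self, server: str, username: str, success: bool):
        """Record one login attempt; return an alert string or None."""
        key = (server, username)
        if success:
            self.failures[key] = 0  # a good login breaks the streak
            return None
        self.failures[key] += 1
        if self.failures[key] >= self.max_failures:
            return (f"Critical: {self.failures[key]} successive login failures "
                    f"for username {username!r} on server {server!r}, "
                    "possible security threat")
        return None


monitor = LoginMonitor()
for attempt in range(3):
    alert = monitor.record("H4", "Bandit", success=False)
print(alert)
```

Counting only successive failures keeps an occasional mistyped password from triggering a security alert.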
Monitoring Security: Array – Local Replication
[Diagram: Workgroup 1 (WG1) and Workgroup 2 (WG2) hosts connect through switches SW1 and SW2 in the SAN to ports on the Storage Array, which holds WG1 and WG2 devices. A replication command (Replication CMD) is issued from WG1]
Warning: Attempted replication of WG2 devices by WG1 user –
Access denied
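The check behind this warning can be sketched as an ownership lookup before a replication command is accepted. The device names, the ownership table, and the function `authorize_replication` are hypothetical; real arrays enforce this through their own access control features.

```python
def authorize_replication(user_group: str, device: str, device_owner: dict):
    """Return (allowed, message) for a replication request on one device."""
    owner = device_owner[device]  # raises KeyError for an unknown device
    if owner == user_group:
        return True, "Replication allowed"
    return False, (f"Warning: Attempted replication of {owner} devices "
                   f"by {user_group} user, access denied")


# Hypothetical ownership of array devices by workgroup.
device_owner = {"dev_01": "WG1", "dev_02": "WG2"}

print(authorize_replication("WG1", "dev_01", device_owner))
print(authorize_replication("WG1", "dev_02", device_owner))
```

Logging the denied attempt, not just blocking it, is what turns access control into security monitoring.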
Monitoring: Alerting of Events
Warnings require administrative attention
– File systems becoming full
– Soft media errors
Errors require immediate administrative attention
– Power failures
– Disk failures
– Memory failures
– Switch failures
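The warning/error split above can be sketched as a lookup table that drives triage order. The event keys are hypothetical names for the examples listed on the slide.

```python
from enum import IntEnum


class Severity(IntEnum):
    WARNING = 1  # requires administrative attention
    ERROR = 2    # requires immediate administrative attention


# Hypothetical event names mapped to the slide's two categories.
EVENT_SEVERITY = {
    "file_system_filling": Severity.WARNING,
    "soft_media_error": Severity.WARNING,
    "power_failure": Severity.ERROR,
    "disk_failure": Severity.ERROR,
    "memory_failure": Severity.ERROR,
    "switch_failure": Severity.ERROR,
}


def triage(events):
    """Order events most urgent first, keeping arrival order within a level."""
    return sorted(events, key=lambda e: EVENT_SEVERITY[e], reverse=True)


print(triage(["soft_media_error", "disk_failure", "file_system_filling"]))
```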
Monitoring: Challenges
[Diagram: a heterogeneous environment of Storage Arrays (CAS, NAS, DAS, SAN, TLU), Network (SAN, IP), Servers (MF, UNIX, WIN), Databases, and Applications, from many vendors and products: EMC, Hitachi, NetApp, HP, IBM, SUN, Cisco, McData, Brocade, Oracle, Informix, MS SQL]
Monitoring: Ideal Solution
[Diagram: One UI on top of a single Monitoring/Management Engine that covers Storage Arrays (CAS, NAS, DAS, SAN, TLU), Network (SAN, IP), Servers (MF, UNIX, WIN), Databases, and Applications]
Without Standards…
No common access layer between
managed objects and applications –
vendor specific
No common data model
No interconnect independence
Multi-layer management difficulty
Legacy systems cannot be
accommodated
No multi-vendor automated discovery
Policy-based management is not
possible across entire classes of
devices
[Diagram: Network Management, Applications Management, Host Management, Storage Management, and Database Management, with the label "Interoperability!"]
Simple Network Management Protocol (SNMP)
SNMP
– Meant for network management
– Inadequate for complete SAN Management
Limitations of SNMP
– No Common Object Model
– Security - only newer SAN devices support v3
– Positive response mechanism
– Inflexible - No auto discovery functions
– No ACID (Atomicity, Consistency, Isolation, and Durability) properties
– Richness of canonical intrinsic methods
– Weak modeling constructs
Storage Management Initiative (SMI)
Created by the Storage Networking
Industry Association (SNIA)
Integration of diverse multi-vendor
storage networks
Development of more powerful
management applications
Common interface for vendors to
develop products that incorporate the
management interface technology
Key components
– Inter-operability testing
– Education and collaboration
– Industry and customer promotion
– Promotions and demonstrations
– Technology center
– SMI specification
– Storage industry architects and
developers
[Diagram (SNIA’s SMI-S): a Management Application sits on an Integration Infrastructure (Object Model Mapping, Vendor Unique Features) and uses the SMI-S Interface, built on CIM/WBEM Technology, to reach devices (Tape Library, Switch, Array, Many Other). Each device has a MOF with a Standard Object Model per Device and Vendor Unique Function. Interface properties: Platform Independent, Distributed, Automated Discovery, Security, Locking, Object Oriented]
Storage Management Initiative Specification
(SMI-S)
Based on:
– Web Based Enterprise
Management (WBEM) architecture
– Common Information Model (CIM)
Features:
– A common interoperable and
extensible management transport
– A complete, unified and rigidly
specified object model that
provides for the control of a SAN
– An automated discovery system
– New approaches to the application
of the CIM/WBEM technology
[Diagram: layered view. Top: Users and Management Tools, with the labels Management, Graphical User, Storage Resource Management (Performance, Capacity Planning, Removable Media), Container Management (Volume Management, Media Management, Other), and Data Management (File System, Database Manager, Backup and HSM). Middle: Storage Management Interface Specification. Bottom: Managed Objects, split into Physical Components (Removable Media, Tape Drive, Disk Drive, Robot, Enclosure, Host Bus Adapter, Switch) and Logical Components (Volume, Clone, Snapshot, Media Set, Zone, Other)]
Common Information Model (CIM)
Describes the management of data
Details requirements within a domain
Information model with required syntax
Web Based Enterprise Management (WBEM)
Enterprise Management Platforms (EMPs)
Graphical applications
Monitoring of many (if not all) data center components
Alerting of errors reported by those components
Management of many (if not all) data center components
Can often launch proprietary management applications
May include other functionality
– Automatic provisioning
– Scheduling of maintenance activities
Proprietary architecture
Monitoring in the Data Center – Summary
Key concepts covered in this module are:
It is important to continuously monitor data center
components to support the availability and scalability
initiatives of any business
– Components include the server, SAN, network, and storage arrays
The four areas of monitoring:
– Health
– Capacity
– Performance
– Security
There are attempts to define a common monitoring and
management model
Apply Your Knowledge
Upon completion of this topic, you will be able to:
Describe how EMC ControlCenter can be used to monitor
the Data Center
EMC ControlCenter Architecture
User Interface Tier
• Console (many)
• Optional applications
Agent Tier
• Master Agent (1)
• Application Agents (many)
Infrastructure Tier
• Server (one)
• Repository (one)
• Store (many)
EMC ControlCenter Console
Primary interface through which the storage environment is
viewed and managed
Java-based application supported on Windows and Solaris
platforms
Objects managed by various agents are organized into groups
such as Storage, Hosts, and Connectivity
Information about an object can be retrieved by the Console
from the Repository or in real-time directly from the agent
Any command issued for the object is passed from the
Console to the ControlCenter Server and handled
appropriately
There can be several Consoles spread across the network
EMC ControlCenter Server
ControlCenter Server is the primary interface between the Console and the
ControlCenter infrastructure
ControlCenter Server provides a diverse collection of services including:
– Web Applications Server – used for installing the Java Console
– Security and access management, such as licensing, login, authentication, and
authorization
– Communication with the Console
– Alert and event management
– Real-time statistics
– Object management to maintain a list of managed objects
– Agent management to maintain a list of available agents
ControlCenter Server retrieves data from the Repository for display by the
Java and Web Console
User-initiated requests for real-time data from some agents are also handled
by the ControlCenter Server
Balances Agent to Store communication based on workload
EMC ControlCenter Repository
Licensed, embedded Oracle 9i database that holds
current and historical information about the managed
environment
ControlCenter Server executes transactions on the
Repository to retrieve information requested by the
Console
Store(s) populate the Repository with persistent data from
the agents
Repository requires minimal user interaction or
maintenance. The database has restricted access and
can be updated only by ControlCenter applications
EMC ControlCenter Store
Store receives the data sent by the agents, processes the
data and updates the Repository
There can be multiple Stores in the environment,
providing load balancing, scaling, and failover
EMC ControlCenter Agents
Master agent:
– One per host
– Manages other agents on the host – start/stop,
monitor agent status and health
ControlCenter Agents:
– Run on hosts to collect data and monitor object
health
– Generate alerts
– Multiple agents can exist on a host
– Pass information to the ControlCenter Store and
the ControlCenter Server
EMC ControlCenter Support for Storage Arrays
The following Storage Arrays are supported by EMC ControlCenter
EMC Symmetrix
EMC CLARiiON
EMC Centera
EMC Celerra and Network Appliance NAS servers
EMC Invista
Hitachi Data Systems (including the HP and Sun resold versions)
HP StorageWorks
IBM ESS
SMI-S (Storage Management Initiative Specification) compliant
arrays
EMC ControlCenter support for SAN Devices
The following SAN devices are supported by ControlCenter
EMC Connectrix
Brocade
McData
Cisco
Inrange (CNT)
IBM Blade Server (IBM-branded Brocade models only)
Dell Blade Server (Dell-branded Brocade models only)
EMC ControlCenter Support for Hosts
The following hosts are supported by ControlCenter
Dedicated Host agents
– Microsoft Windows
– Hewlett-Packard HP-UX
– IBM AIX
– IBM mainframe
– Linux
– Novell NetWare
– Sun Solaris
Proxy management via Common Mapping Agent (CMA)
– Compaq Tru64
– Fujitsu-Siemens BS2000
– Windows, Solaris, AIX, Linux, and HP-UX hosts can also be monitored by
Common Mapping Agent proxy
EMC ControlCenter Support for Database and Backup
The following databases are supported by ControlCenter
Dedicated database agent
– Oracle
– DB2 on mainframe
Proxy management via Common Mapping Agent (CMA)
– SQL Server
– Sybase
– Informix
– DB2
Dedicated backup agent
– EMC EDM
– IBM Tivoli
– EMC NetWorker
– Veritas NetBackup
Discovery of Managed Objects by Agents
Automatic Discovery: Many agents discover data objects
automatically
Assisted Discovery: The following agents require administrator
action to discover their objects
– Common Mapping Agent
– Database Agent for Oracle
– Fibre Channel Connectivity Agent
– Storage Agents for CLARiiON, Centera, Invista, NAS, SMI, HP
StorageWorks, HDS and ESS
Data Collection Policies (DCP)
Formal set of statements used to manage the data
collected by ControlCenter agents
Policies specify the data to collect and the frequency of
collection
ControlCenter agents have predefined collection policy
definitions and templates
– Default definitions can be easily modified, or new definitions can
be created from the templates provided
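A data collection policy boils down to "what to collect" plus "how often". The sketch below models that idea with a Python dataclass and derives a new definition from a template. It does not reflect ControlCenter's actual policy format; the class, field names, metric names, and intervals are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CollectionPolicy:
    """Hypothetical policy: which metrics to collect and how often."""
    name: str
    metrics: tuple
    interval_minutes: int


def is_due(policy: CollectionPolicy, minutes_since_last: int) -> bool:
    """True if enough time has passed to run the collection again."""
    return minutes_since_last >= policy.interval_minutes


# A template with default settings, and a definition derived from it.
TEMPLATE = CollectionPolicy("host-default", ("cpu_pct", "fs_used_pct"), 15)
db_hosts = replace(TEMPLATE, name="db-hosts", interval_minutes=5)

print(db_hosts)
print(is_due(db_hosts, minutes_since_last=7))
print(is_due(TEMPLATE, minutes_since_last=7))
```

Deriving definitions from a template keeps the set of collected metrics consistent while letting the collection frequency vary per group of objects.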
Console View of the Storage Environment
[Screenshot: Console view of the storage environment, with callouts for SAN Switch, Server, Dual HBAs, WWN of HBAs, Storage Array, and Storage Array Front-end Directors and Ports]
Alerts - Overview
Why Alert? - Data availability
– Monitor and report on events that could lead to application
outages
– Every ControlCenter agent can monitor a number of metrics
30 agents and 700+ alerts
Alert categories
– Health
Examples - Database instance up/down, Symmetrix service
processor down, Connectivity device port status
– Capacity
Examples - File System Space, File/Directory Size Change
– Performance
Examples – Symmetrix Total Hit %, Host CPU Usage
Alert Notification
Notification capabilities
Messages are directed to the ControlCenter console by
default
Messages can be directed to a Management Framework
via Integration Gateway (SNMP) – governed by
Management Policy associated with the Alert
E-mail notification as specified in the Management Policy
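The routing described above (console by default, with optional forwarding and e-mail governed by the policy attached to the alert) can be sketched as follows. The `ManagementPolicy` class, its fields, and the destination labels are hypothetical and only illustrate the decision logic, not how ControlCenter is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementPolicy:
    """Hypothetical policy attached to an alert."""
    forward_to_framework: bool = False
    email_recipients: list = field(default_factory=list)


def destinations(policy: ManagementPolicy) -> list:
    """List where an alert message should be delivered."""
    targets = ["console"]  # the console always receives the message
    if policy.forward_to_framework:
        targets.append("integration-gateway-snmp")
    targets.extend(f"mailto:{addr}" for addr in policy.email_recipients)
    return targets


policy = ManagementPolicy(forward_to_framework=True,
                          email_recipients=["storage-admin@example.com"])
print(destinations(policy))
print(destinations(ManagementPolicy()))
```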
EMC ControlCenter Console View of Alerts
[Screenshot: Console view of alerts, with callouts for Message, Object Name, Alert state, Severity, and Alert severity]