Monitoring in the Data Center

Monitoring and Managing the Data Center
Section 5 - Introduction
© 2006 EMC Corporation. All rights reserved.
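Chapter Objectives and Contents
This chapter focuses on monitoring and managing data with storage management tools. By monitoring the hardware, software, information capacity, format, content, and other aspects of storage, information can be managed and used optimally. The chapter also introduces the basics of using some of the major information management software.
This chapter covers two areas:
5.1 Monitoring in the Data Center
5.2 Managing in the Data Center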
Storage Systems Architecture - Introduction
Section Objectives
Upon completion of this section, you will be able to:
• Describe areas of the data center to monitor
• Discuss considerations for monitoring the data center
• Describe techniques for managing the data center
Monitoring in the Data Center
Module 5.1
Monitoring in the Data Center
After completing this module, you will be able to:
• Discuss data center areas to monitor
• List metrics to monitor for different data center components
• Describe the benefits of continuous monitoring
• Describe the challenges in implementing a unified and centralized monitoring solution in heterogeneous environments
• Describe industry standards for data center monitoring
Monitoring Data Center Components
[Figure: clients, an IP network, clustered hosts/servers with applications (HBAs, keep-alive link), the SAN, and storage arrays; the areas to monitor are health, capacity, performance, and security.]
Why Monitor Data Centers?
• Availability
– Continuous monitoring helps ensure availability
– Warnings and errors can be addressed proactively, before they cause outages
• Scalability
– Monitoring enables capacity planning and trend analysis, which in turn help scale the data center as the business grows
• Alerting
– Administrators are informed of failures and potential failures
– Corrective action can be taken to ensure availability and scalability
Monitoring Health
• Why monitor the health of different components?
– Failure of any hardware or software component can lead to an outage of a number of other components. For example, a failed HBA can cause degraded access to a number of data devices in a multi-path environment, or loss of data access in a single-path environment
• Monitoring health is fundamental and is easily understood and interpreted
– At the very least, health metrics should be monitored
– Health issues typically need to be addressed as a high priority
Monitoring Capacity
• Why monitor capacity?
– Lack of proper capacity planning can lead to data unavailability and limit the ability to scale
– Trend reports can be created from the collected capacity data, keeping the enterprise well informed of how IT resources are utilized
• Capacity monitoring prevents outages before they occur
– It is more preventive and predictive in nature than health monitoring
– Example: reports show that a file system is 90% full and is filling up at a particular rate
– Example: 95% of the ports in a particular SAN fabric are in use, so a new switch should be added before more arrays or servers are added to the same fabric
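The file system example above implies a simple projection: if utilization grows at a steady rate, the time left is the remaining headroom divided by that rate. The sketch below is a minimal illustration under that assumption, fitting a least-squares line to evenly spaced utilization samples; the function name `days_until_full` and the sample history are hypothetical and not taken from any monitoring product.

```python
from typing import Optional, Sequence


def days_until_full(samples: Sequence[float], interval_days: float = 1.0) -> Optional[float]:
    """Project how many days remain until utilization reaches 100%.

    `samples` are utilization percentages taken every `interval_days`
    (oldest first). Growth is assumed linear, estimated with an ordinary
    least-squares fit. The remaining headroom is measured from the latest
    sample. Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    xs = [i * interval_days for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var  # percentage points per day
    if slope <= 0:
        return None
    return (100.0 - samples[-1]) / slope


# Hypothetical history: a file system that reached 90% full, growing 1 point per day.
history = [86.0, 87.0, 88.0, 89.0, 90.0]
print(days_until_full(history))  # 10.0
```

A linear fit is the simplest choice; growth that is seasonal or bursty would call for a different model, so a projection like this is best read as an early warning rather than an exact date.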
Monitoring Performance
• Why monitor performance metrics?
– To ensure that all data center components work efficiently and optimally
– To see whether components are pushing performance limits or are underutilized
– To identify performance bottlenecks
• Performance monitoring and analysis can be extremely complicated
– Dozens of interrelated metrics, depending on the component in question
– The most complicated of the various aspects of monitoring
Monitoring Security
• Why monitor security?
– To prevent and track unauthorized access, whether accidental or malicious
• Enforcing security and monitoring for security breaches is a top priority for all businesses
Monitoring Servers
• Health
– Hardware components: HBA, NIC, graphics card, internal disk, …
– Status of various processes/applications
• Capacity
– File system utilization
– Database: table space/log space utilization
– User quota
Monitoring Servers
• Performance
– CPU utilization
– Memory utilization
– Transaction response times
• Security
– Login
– Authorization
– Physical security: data center access
Monitoring the SAN
• Health
– Fabrics: fabric errors, zoning errors
– Ports: failed GBIC, status/attribute change
– Devices: status/attribute change
– Hardware components: processor cards, fans, power supplies
• Capacity
– ISL utilization
– Aggregate switch utilization
– Port utilization
Monitoring the SAN
• Performance
– Connectivity ports: link failures, loss of signal, loss of synchronization, and link utilization (bandwidth in MB/s or frames/s)
– Connectivity devices: statistics are usually the cumulative value of all port statistics
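The device-level statistics described above can be illustrated by summing per-port counters, with link utilization taken as observed throughput divided by the port's nominal bandwidth. In this sketch the `PortStats` fields, the port names, and the bandwidth figures are assumptions for illustration; switches expose their real counters through their own management interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PortStats:
    name: str
    link_failures: int
    loss_of_signal: int
    loss_of_sync: int
    throughput_mb_s: float  # observed traffic
    capacity_mb_s: float    # nominal port bandwidth (assumed known)

    def utilization_pct(self) -> float:
        """Link utilization as a percentage of nominal bandwidth."""
        return 100.0 * self.throughput_mb_s / self.capacity_mb_s


def device_totals(ports: List[PortStats]) -> Dict[str, float]:
    """Device-level statistics as the cumulative value of all port statistics."""
    return {
        "link_failures": sum(p.link_failures for p in ports),
        "loss_of_signal": sum(p.loss_of_signal for p in ports),
        "loss_of_sync": sum(p.loss_of_sync for p in ports),
        "throughput_mb_s": sum(p.throughput_mb_s for p in ports),
    }


ports = [
    PortStats("port0", 1, 0, 2, 150.0, 200.0),
    PortStats("port1", 0, 1, 0, 50.0, 200.0),
]
print(device_totals(ports))
print(ports[0].utilization_pct())  # 75.0
```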
Monitoring the SAN
• Security
– Zoning: ensure communication only between dedicated sets of ports (HBA and storage ports)
– LUN masking: ensure that only certain hosts have access to certain storage array volumes
– Administrative tasks: restrict administrative tasks to a select set of users and enforce strict passwords
– Physical security: access to the data center should be monitored
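Zoning and LUN masking both act as allow-lists, and their combined effect can be sketched as two lookups: a host reaches a volume only if its HBA shares a zone with the array port and the masking table grants that HBA the volume. The zone names, port identifiers (placeholders rather than real WWNs), and table layout below are hypothetical; real masking is usually also scoped to the array port, which this simplified sketch ignores.

```python
from typing import Dict, Set

# Hypothetical zone set: zone name -> member port identifiers.
ZONES: Dict[str, Set[str]] = {
    "zone_h1": {"hba_h1", "array_fa1"},
    "zone_h2": {"hba_h2", "array_fa2"},
}

# Hypothetical LUN masking table: host HBA -> volumes it may access.
LUN_MASKING: Dict[str, Set[str]] = {
    "hba_h1": {"vol_01", "vol_02"},
    "hba_h2": {"vol_03"},
}


def zoned_together(port_a: str, port_b: str) -> bool:
    """Two ports can communicate only if at least one zone contains both."""
    return any(port_a in members and port_b in members for members in ZONES.values())


def can_access(hba: str, array_port: str, volume: str) -> bool:
    """Access requires both a zoning path and a LUN masking entry."""
    return zoned_together(hba, array_port) and volume in LUN_MASKING.get(hba, set())


print(can_access("hba_h1", "array_fa1", "vol_01"))  # True
print(can_access("hba_h1", "array_fa2", "vol_01"))  # False: not zoned together
print(can_access("hba_h2", "array_fa2", "vol_01"))  # False: masked
```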
Monitoring Storage Arrays
• Health
– All hardware components: front end, back end, memory, disks, power supplies, …
– Array operating environment: RAID processes, environmental sensors, replication processes
Monitoring Storage Arrays
• Capacity
– Configured/unconfigured capacity
– Allocated/unallocated storage
– Fan-in/fan-out ratios
• Performance
– Front-end utilization/throughput
– Back-end utilization/throughput
– I/O profile
– Response time
– Cache metrics
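Two of the capacity and performance metrics listed above reduce to simple ratios. This sketch assumes fan-in is counted as the number of HBAs sharing a storage port, fan-out as the number of storage ports a single HBA accesses, and cache hit percentage as cache hits over total I/Os; the path list and counter values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical zoned paths as (host HBA, storage array port) pairs.
PATHS: List[Tuple[str, str]] = [
    ("h1_hba0", "fa1"), ("h2_hba0", "fa1"), ("h3_hba0", "fa1"),
    ("h1_hba0", "fa2"),
]


def fan_in(paths: List[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct HBAs sharing each storage port."""
    hbas: Dict[str, Set[str]] = defaultdict(set)
    for hba, port in paths:
        hbas[port].add(hba)
    return {port: len(s) for port, s in hbas.items()}


def fan_out(paths: List[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct storage ports each HBA accesses."""
    ports: Dict[str, Set[str]] = defaultdict(set)
    for hba, port in paths:
        ports[hba].add(port)
    return {hba: len(s) for hba, s in ports.items()}


def cache_hit_pct(hits: int, total_ios: int) -> float:
    """Share of I/Os served from cache, as a percentage."""
    return 100.0 * hits / total_ios if total_ios else 0.0


print(fan_in(PATHS))   # {'fa1': 3, 'fa2': 1}
print(fan_out(PATHS))  # {'h1_hba0': 2, 'h2_hba0': 1, 'h3_hba0': 1}
print(cache_hit_pct(900, 1000))  # 90.0
```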
Monitoring Storage Arrays
• Security
– LUN access: ensure that only certain hosts have access to certain storage array volumes, and disallow WWN spoofing
– Administrative tasks: most arrays allow restricting various array configuration tasks, such as device configuration, LUN masking, replication operations, and port configuration
– Physical security: monitor access to the data center
Monitoring IP Networks
• Health
– Hardware components: processor cards, fans, power supplies, …
– Cables
• Performance
– Bandwidth
– Latency
– Packet loss
– Errors
– Collisions
• Security
Monitoring the Data Center as a Whole
• Monitor the data center environment
– Temperature, humidity, airflow, hazards (water, smoke, etc.)
– Voltage (power supply)
• Physical security
– Facility access (monitoring cameras, access cards, etc.)
End-to-End Monitoring
[Figure: clients, an IP network, clustered hosts/servers with applications (HBAs, keep-alive link), the SAN, and storage arrays. A single failure produces multiple symptoms; root cause analysis traces them back to the failure and determines its business impact.]
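One common heuristic for root cause analysis walks the dependency chain: among the components raising alerts, the likely root causes are those with no alerting component upstream of them, and the remaining alerts are treated as symptoms. The sketch below assumes a hand-written dependency map with hypothetical component names; a real monitoring tool would build the topology through discovery and use richer correlation rules.

```python
from typing import Dict, List, Set

# Hypothetical topology: component -> components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "app_orders": ["host_h1"],
    "app_billing": ["host_h2"],
    "host_h1": ["switch_sw1", "array_port_fa1"],
    "host_h2": ["switch_sw1", "array_port_fa1"],
    "switch_sw1": [],
    "array_port_fa1": [],
}


def upstream(component: str, graph: Dict[str, List[str]]) -> Set[str]:
    """All components that `component` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(graph.get(component, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def root_causes(alerting: Set[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Alerting components with no alerting component upstream of them."""
    return {c for c in alerting if not (upstream(c, graph) & alerting)}


# A single switch failure produces alerts on the switch, both hosts, and both applications.
symptoms = {"switch_sw1", "host_h1", "host_h2", "app_orders", "app_billing"}
print(root_causes(symptoms, DEPENDS_ON))  # {'switch_sw1'}
```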
Monitoring Health: Array Port Failure
[Figure: hosts H1, H2, and H3, each with two HBAs, connect through switches SW1 and SW2 to the storage array ports. When an array port fails, H1, H2, and H3 are all reported as degraded.]
Monitoring Health: HBA failure
[Figure: the same topology; when one of H1's HBAs fails, only H1 is reported as degraded.]
Monitoring Health: Switch Failure
[Figure: the same topology; when switch SW1 fails, all hosts are reported as degraded.]
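The three health scenarios above, and the failed-HBA example from the Monitoring Health slide, follow from counting surviving paths: a host that keeps some but not all of its paths is degraded, and a host that keeps none has lost access. The sketch models each path as an (HBA, switch, array port) triple; the topology loosely mirrors the figures, and the single-path host H4 and every component name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, str, str]  # (HBA, switch, array port)

# Hypothetical dual-path topology for H1-H3 plus a single-path host H4.
TOPOLOGY: Dict[str, List[Path]] = {
    "H1": [("H1-hba0", "SW1", "port1"), ("H1-hba1", "SW2", "port2")],
    "H2": [("H2-hba0", "SW1", "port1"), ("H2-hba1", "SW2", "port2")],
    "H3": [("H3-hba0", "SW1", "port1"), ("H3-hba1", "SW2", "port2")],
    "H4": [("H4-hba0", "SW1", "port1")],
}


def host_status(failed: Set[str]) -> Dict[str, str]:
    """Classify each host by how many of its paths avoid every failed component."""
    status = {}
    for host, paths in TOPOLOGY.items():
        alive = [p for p in paths if not failed.intersection(p)]
        if len(alive) == len(paths):
            status[host] = "ok"
        elif alive:
            status[host] = "degraded"
        else:
            status[host] = "no access"
    return status


print(host_status({"port1"}))    # H1-H3 degraded, H4 no access
print(host_status({"H1-hba0"}))  # only H1 degraded
print(host_status({"SW1"}))      # H1-H3 degraded, H4 no access
```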
Monitoring Capacity: Array
[Figure: a new server is connected through switches SW1 and SW2 to a storage array that already serves existing hosts/servers with applications.]
Can the array provide the required storage to the new server?
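Answering the question in the figure comes down to comparing the request with what the array can still hand out. This sketch assumes, for illustration, that configured but unallocated capacity is used first and that unconfigured raw capacity can be configured to cover the rest, scaled by an assumed `usable_fraction` for protection overhead; the class, function, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArrayCapacity:
    configured_gb: float    # capacity already configured into devices
    allocated_gb: float     # configured capacity already assigned to hosts
    unconfigured_gb: float  # raw capacity not yet configured

    def unallocated_gb(self) -> float:
        return self.configured_gb - self.allocated_gb


def can_provision(array: ArrayCapacity, request_gb: float, usable_fraction: float = 1.0) -> str:
    """Decide how a capacity request could be met.

    `usable_fraction` is an assumed factor for protection overhead when
    unconfigured raw capacity has to be configured (e.g. 0.5 for mirroring).
    """
    free = array.unallocated_gb()
    if request_gb <= free:
        return "allocate from configured, unallocated capacity"
    if request_gb <= free + array.unconfigured_gb * usable_fraction:
        return "configure additional devices, then allocate"
    return "insufficient capacity"


array = ArrayCapacity(configured_gb=2000, allocated_gb=1800, unconfigured_gb=1000)
print(can_provision(array, 150))                       # fits in unallocated space
print(can_provision(array, 500, usable_fraction=0.5))  # 200 free + 500 usable
print(can_provision(array, 800, usable_fraction=0.5))  # insufficient
```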
Monitoring Capacity: Servers File System Space
[Figure: a file system with no monitoring compared with file system monitoring. With monitoring, a warning is raised at 66% full ("Warning: FS is 66% Full") and a critical alert at 80% full ("Critical: FS is 80% Full"), prompting the administrator to extend the FS.]
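The warning and critical levels in the figure map directly onto a threshold check. A minimal sketch, using 66% and 80% as defaults because those are the values shown; the function name `fs_alert` is hypothetical, and in practice the levels would be tuned per file system.

```python
from typing import Optional


def fs_alert(used_gb: float, size_gb: float,
             warning_pct: float = 66.0, critical_pct: float = 80.0) -> Optional[str]:
    """Return an alert message for a file system, or None if below both thresholds."""
    pct = 100.0 * used_gb / size_gb
    if pct >= critical_pct:
        return f"Critical: FS is {pct:.0f}% Full"
    if pct >= warning_pct:
        return f"Warning: FS is {pct:.0f}% Full"
    return None


print(fs_alert(50, 100))  # None
print(fs_alert(70, 100))  # Warning: FS is 70% Full
print(fs_alert(85, 100))  # Critical: FS is 85% Full
```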
Monitoring Performance: Array Port Utilization
[Figure: hosts H1, H2, and H3 share a storage array port whose utilization (H1 + H2 + H3) reaches 100%; a new server, H4, is to be connected through the same switches.]
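The figure's reasoning is additive: the port's utilization is the sum of the load each host drives through it, so a new server's expected load has to fit in whatever headroom remains. The sketch assumes each host's share is already known as a percentage of port bandwidth; the function names and per-host values are illustrative, chosen only so that H1 + H2 + H3 = 100%.

```python
from typing import Dict


def port_utilization(per_host_pct: Dict[str, float]) -> float:
    """Total utilization of a storage port as the sum of each host's share."""
    return sum(per_host_pct.values())


def can_add_host(per_host_pct: Dict[str, float], new_host_pct: float,
                 limit_pct: float = 100.0) -> bool:
    """True if the port can absorb the new host's expected load."""
    return port_utilization(per_host_pct) + new_host_pct <= limit_pct


current = {"H1": 40.0, "H2": 35.0, "H3": 25.0}  # hypothetical: H1 + H2 + H3 = 100%
print(port_utilization(current))    # 100.0
print(can_add_host(current, 10.0))  # False: the port is already saturated
```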
Monitoring Performance: Servers
Critical: CPU usage above 90% for the last 90 minutes
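The alert above fires only when high usage persists, not on a momentary spike. A minimal sketch, assuming one CPU sample per minute so that a 90-sample sliding window covers 90 minutes: an alert is raised only when every sample in the window exceeds 90%. The `SustainedThreshold` class and the sampling interval are assumptions.

```python
from collections import deque
from typing import Deque


class SustainedThreshold:
    """Raise an alert when every sample in a sliding window exceeds a threshold."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        self.window = window
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, value: float) -> bool:
        """Record a sample; True if the window is full and all samples exceed the threshold."""
        self.samples.append(value)
        return len(self.samples) == self.window and all(
            s > self.threshold for s in self.samples
        )


# One sample per minute, 90-minute window, 90% threshold.
cpu = SustainedThreshold(threshold=90.0, window=90)
readings = [95.0] * 89 + [50.0] + [95.0] * 90
alerts = [minute for minute, value in enumerate(readings) if cpu.add(value)]
print(alerts[0])  # 179: first minute at which the last 90 samples all exceed 90%
```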
Monitoring Security: Servers
[Figure: three successive login attempts (Login 1, Login 2, Login 3) fail.]
Critical: Three successive login failures for username “Bandit” on server “H4”; possible security threat
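This alert can be produced by keeping a per-user, per-server count of consecutive failed logins that resets on success and raises a critical alert when it reaches three. The sketch assumes login events arrive in order; the `LoginMonitor` class and the event tuples are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


class LoginMonitor:
    """Track successive login failures per (user, server) pair."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def record(self, user: str, server: str, success: bool) -> Optional[str]:
        key = (user, server)
        if success:
            self.failures[key] = 0  # a successful login resets the streak
            return None
        self.failures[key] += 1
        # Alert once, when the streak first reaches the limit.
        if self.failures[key] == self.max_failures:
            return (f'Critical: {self.max_failures} successive login failures for '
                    f'username "{user}" on server "{server}", possible security threat')
        return None


monitor = LoginMonitor()
events: List[Tuple[str, str, bool]] = [("Bandit", "H4", False)] * 3
for user, server, ok in events:
    alert = monitor.record(user, server, ok)
print(alert)
```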
Monitoring Security: Array – Local Replication
[Figure: Workgroup 1 (WG1) and Workgroup 2 (WG2) hosts share a storage array through switches SW1 and SW2; a WG1 user issues a replication command against WG2 devices.]
Warning: Attempted replication of WG2 devices by WG1 user – Access denied
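The denial in the figure amounts to an ownership check performed before a replication command runs. The sketch assumes hypothetical tables mapping devices and users to workgroups; the user and device names are made up, and real arrays enforce this through their own access-control features.

```python
from typing import Dict, Optional

# Hypothetical ownership tables.
DEVICE_OWNER: Dict[str, str] = {"dev_010": "WG1", "dev_020": "WG2"}
USER_GROUP: Dict[str, str] = {"alice": "WG1", "bob": "WG2"}


def authorize_replication(user: str, device: str) -> Optional[str]:
    """Return None if allowed, otherwise a warning describing the denied attempt."""
    user_group = USER_GROUP.get(user)
    device_group = DEVICE_OWNER.get(device)
    if user_group is not None and user_group == device_group:
        return None
    return (f"Warning: Attempted replication of {device_group} devices "
            f"by {user_group} user - Access denied")


print(authorize_replication("alice", "dev_010"))  # None
print(authorize_replication("alice", "dev_020"))
```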
Monitoring: Alerting of Events
• Warnings require administrative attention
– File systems becoming full
– Soft media errors
• Errors require immediate administrative attention
– Power failures
– Disk failures
– Memory failures
– Switch failures
Monitoring: Challenges
[Figure: a heterogeneous data center with components from many vendors (EMC, Hitachi, NetApp, HP, IBM, Sun, Cisco, McData, Brocade): storage arrays (DAS, NAS, SAN, CAS, TLU), SAN and IP networks, mainframe (MF), UNIX, and Windows servers, databases (Oracle, Informix, MS SQL), and applications.]
Monitoring: Ideal Solution
[Figure: a single monitoring/management engine with one UI covering storage arrays (DAS, NAS, SAN, CAS, TLU), SAN and IP networks, servers (MF, UNIX, WIN), databases, and applications.]
Without Standards…
• No common access layer between managed objects and applications; access is vendor specific
• No common data model
• No interconnect independence
• Multi-layer management difficulty
• Legacy systems cannot be accommodated
• No multi-vendor automated discovery
• Policy-based management is not possible across entire classes of devices
[Figure: separate network, host, storage, database, and application management tools, with the caption "Interoperability!"]
Simple Network Management Protocol (SNMP)
• SNMP
– Meant for network management
– Inadequate for complete SAN management
• Limitations of SNMP
– No common object model
– Security: only newer SAN devices support SNMPv3
– Positive response mechanism
– Inflexible: no auto-discovery functions
– No ACID (atomicity, consistency, isolation, and durability) properties
– Limited richness of canonical intrinsic methods
– Weak modeling constructs
Storage Management Initiative (SMI)
• Created by the Storage Networking Industry Association (SNIA)
• Integration of diverse multi-vendor storage networks
• Development of more powerful management applications
• Common interface for vendors to develop products that incorporate the management interface technology
• Key components
– Interoperability testing
– Education and collaboration
– Industry and customer promotion
– Promotions and demonstrations
– Technology center
– SMI specification
– Storage industry architects and developers
[Figure: SNIA's SMI-S. A management application runs on an integration infrastructure that maps the object model, including vendor-unique features, onto the SMI-S interface (platform independent, distributed, automated discovery, security, locking, object oriented). The interface is built on CIM/WBEM technology, with a standard object model per device, defined in MOF files for tape libraries, switches, arrays, and many other devices, plus vendor-unique functions.]
Storage Management Initiative Specification (SMI-S)
• Based on:
– Web Based Enterprise Management (WBEM) architecture
– Common Information Model (CIM)
• Features:
– A common interoperable and extensible management transport
– A complete, unified and rigidly specified object model that provides for the control of a SAN
– An automated discovery system
– New approaches to the application of the CIM/WBEM technology
[Figure: SMI-S layers. Users and graphical management tools (storage resource management, performance, capacity planning, removable media; container management, volume management, media management; data management, file system, database manager, backup and HSM; other) reach the managed objects through the Storage Management Interface Specification. Managed objects include physical components (removable media, tape drive, disk drive, robot, enclosure, host bus adapter, switch) and logical components (volume, clone, snapshot, media set, zone, other).]
Common Information Model (CIM)
• Describes the management of data
• Details requirements within a domain
• Information model with required syntax
Web Based Enterprise Management (WBEM)
Enterprise Management Platforms (EMPs)
• Graphical applications
• Monitoring of many (if not all) data center components
• Alerting of errors reported by those components
• Management of many (if not all) data center components
• Can often launch proprietary management applications
• May include other functionality
– Automatic provisioning
– Scheduling of maintenance activities
• Proprietary architecture
Monitoring in the Data Center – Summary
Key concepts covered in this module are:
• It is important to continuously monitor data center components to support the availability and scalability initiatives of any business
– Components include the server, SAN, network, and storage arrays
• The four areas of monitoring:
– Health
– Capacity
– Performance
– Security
• There are attempts to define a common monitoring and management model
Apply Your Knowledge
Upon completion of this topic, you will be able to:
• Describe how EMC ControlCenter can be used to monitor the data center
EMC ControlCenter Architecture
User Interface Tier
• Console (many)
• Optional applications
Agent Tier
• Master Agent (1)
• Application Agents (many)
Infrastructure Tier
• Server (one)
• Repository (one)
• Store (many)
EMC ControlCenter Console
• Primary interface through which the storage environment is viewed and managed
• Java-based application supported on Windows and Solaris platforms
• Objects managed by the various agents are organized into groups such as Storage, Hosts, and Connectivity
• Information about an object can be retrieved by the Console from the Repository, or in real time directly from the agent
• Any command issued for an object is passed from the Console to the ControlCenter Server, which handles it
• There can be several Consoles spread across the network
EMC ControlCenter Server
• The ControlCenter Server is the primary interface between the Console and the ControlCenter infrastructure
• The ControlCenter Server provides a diverse collection of services, including:
– Web application server, used for installing the Java Console
– Security and access management, such as licensing, login, authentication, and authorization
– Communication with the Console
– Alert and event management
– Real-time statistics
– Object management, to maintain a list of managed objects
– Agent management, to maintain a list of available agents
• The ControlCenter Server retrieves data from the Repository for display by the Java and Web Consoles
• User-initiated requests for real-time data from some agents are also handled by the ControlCenter Server
• The ControlCenter Server balances agent-to-Store communication based on workload
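The slide does not say how the ControlCenter Server balances agents across Stores. Purely to illustrate workload-based balancing in general, this sketch greedily assigns each agent, heaviest first, to the currently least-loaded Store using a min-heap; the Store names, agent names, and load units are hypothetical, and this is not ControlCenter's actual algorithm.

```python
import heapq
from typing import Dict, List, Tuple


def assign_agents(stores: List[str], agent_loads: Dict[str, int]) -> Dict[str, str]:
    """Greedily map each agent to the Store with the smallest current load.

    Agents are placed heaviest first, which tends to give a more even spread.
    """
    heap: List[Tuple[int, str]] = [(0, s) for s in stores]
    heapq.heapify(heap)
    assignment: Dict[str, str] = {}
    for agent, load in sorted(agent_loads.items(), key=lambda kv: -kv[1]):
        current, store = heapq.heappop(heap)  # least-loaded Store
        assignment[agent] = store
        heapq.heappush(heap, (current + load, store))
    return assignment


loads = {"agent_a": 50, "agent_b": 30, "agent_c": 20, "agent_d": 10}
print(assign_agents(["store1", "store2"], loads))
```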
EMC ControlCenter Repository
• Licensed, embedded Oracle 9i database that holds current and historical information about the managed environment
• The ControlCenter Server executes transactions on the Repository to retrieve information requested by the Console
• Stores populate the Repository with persistent data from the agents
• The Repository requires minimal user interaction or maintenance; the database has restricted access and can be updated only by ControlCenter applications
EMC ControlCenter Store
• The Store receives the data sent by the agents, processes it, and updates the Repository
• There can be multiple Stores in the environment, providing load balancing, scaling, and failover
EMC ControlCenter Agents
• Master Agent:
– One per host
– Manages the other agents on the host: start/stop, and monitoring of agent status and health
• ControlCenter agents:
– Run on hosts to collect data and monitor object health
– Generate alerts
– Multiple agents can exist on a host
– Pass information to the ControlCenter Store and the ControlCenter Server
EMC ControlCenter Support for Storage Arrays
The following storage arrays are supported by EMC ControlCenter:
• EMC Symmetrix
• EMC CLARiiON
• EMC Centera
• EMC Celerra and Network Appliance NAS servers
• EMC Invista
• Hitachi Data Systems (including the HP and Sun resold versions)
• HP StorageWorks
• IBM ESS
• SMI-S (Storage Management Initiative Specification) compliant arrays
EMC ControlCenter support for SAN Devices
The following SAN devices are supported by ControlCenter:
• EMC Connectrix
• Brocade
• McData
• Cisco
• Inrange (CNT)
• IBM Blade Server (IBM-branded Brocade models only)
• Dell Blade Server (Dell-branded Brocade models only)
EMC ControlCenter Support for Hosts
The following hosts are supported by ControlCenter:
• Dedicated host agents
– Microsoft Windows
– Hewlett-Packard HP-UX
– IBM AIX
– IBM mainframe
– Linux
– Novell NetWare
– Sun Solaris
• Proxy management via the Common Mapping Agent (CMA)
– Compaq Tru64
– Fujitsu-Siemens BS2000
– Windows, Solaris, AIX, Linux, and HP-UX hosts can also be monitored through the Common Mapping Agent proxy
EMC ControlCenter Support for Database and Backup
The following databases and backup applications are supported by ControlCenter:
• Dedicated database agent
– Oracle
– DB2 on mainframe
• Proxy management via the Common Mapping Agent (CMA)
– SQL Server
– Sybase
– Informix
– DB2
• Dedicated backup agent
– EMC EDM
– IBM Tivoli
– EMC NetWorker
– Veritas NetBackup
Discovery of Managed Objects by Agents
• Automatic discovery: many agents discover their data objects automatically
• Assisted discovery: the following agents discover their objects only through administrator action
– Common Mapping Agent
– Database Agent for Oracle
– Fibre Channel Connectivity Agent
– Storage Agents for CLARiiON, Centera, Invista, NAS, SMI, HP StorageWorks, HDS, and ESS
Data Collection Policies (DCP)
• Formal set of statements used to manage the data collected by ControlCenter agents
• Policies specify the data to collect and the frequency of collection
• ControlCenter agents have predefined collection policy definitions and templates
– Default definitions can be easily modified, or new definitions can be created from the templates provided
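A data collection policy pairs what to collect with how often. The sketch below represents a policy as a frozen dataclass, derives a modified copy from a default template, and computes when the next collection is due; the `CollectionPolicy` fields, the template, and the intervals are hypothetical and do not reproduce ControlCenter's policy definitions.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class CollectionPolicy:
    name: str
    data_items: Tuple[str, ...]  # what to collect
    interval: timedelta          # how often to collect
    enabled: bool = True

    def next_run(self, last_run: datetime) -> datetime:
        """When the next collection is due after `last_run`."""
        return last_run + self.interval


# Hypothetical default template shipped with an agent.
FS_TEMPLATE = CollectionPolicy(
    name="fs-capacity",
    data_items=("file_system_size", "file_system_used"),
    interval=timedelta(hours=24),
)

# Derive a modified copy instead of editing the default in place.
hourly_fs = replace(FS_TEMPLATE, name="fs-capacity-hourly", interval=timedelta(hours=1))

last = datetime(2006, 1, 1, 0, 0)
print(FS_TEMPLATE.next_run(last))  # 2006-01-02 00:00:00
print(hourly_fs.next_run(last))    # 2006-01-01 01:00:00
```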
Console View of the Storage Environment
[Figure: Console view showing a SAN switch, a server with dual HBAs and the WWNs of its HBAs, and a storage array with its front-end directors and ports.]
Alerts - Overview
• Why alert? Data availability
– Monitor and report on events that could lead to application outages
– Every ControlCenter agent can monitor a number of metrics: 30 agents and 700+ alerts in all
• Alert categories
– Health. Examples: Database instance up/down, Symmetrix service processor down, Connectivity device port status
– Capacity. Examples: File System Space, File/Directory Size Change
– Performance. Examples: Symmetrix Total Hit %, Host CPU Usage
Alert Notification
Notification capabilities:
• Messages are directed to the ControlCenter Console by default
• Messages can be directed to a management framework via the Integration Gateway (SNMP), as governed by the Management Policy associated with the alert
• E-mail notification, as specified in the Management Policy
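The notification rules above can be sketched as a small dispatcher: every alert goes to the Console, and the Management Policy attached to the alert decides whether it is also forwarded via SNMP and who receives e-mail. The `ManagementPolicy` and `Alert` fields and the channel names are hypothetical, and the sketch only reports which channels would be used; it does not send SNMP traps or e-mail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ManagementPolicy:
    forward_snmp: bool = False  # forward to a management framework
    email_to: List[str] = field(default_factory=list)


@dataclass
class Alert:
    name: str
    severity: str
    policy: Optional[ManagementPolicy] = None


def route(alert: Alert) -> List[str]:
    """Return the notification channels an alert would use."""
    channels = ["console"]  # alerts always reach the Console by default
    if alert.policy is not None:
        if alert.policy.forward_snmp:
            channels.append("snmp:integration-gateway")
        channels.extend(f"email:{addr}" for addr in alert.policy.email_to)
    return channels


plain = Alert("File System Space", "warning")
paged = Alert("Host CPU Usage", "critical",
              ManagementPolicy(forward_snmp=True, email_to=["storage-admin@example.com"]))
print(route(plain))  # ['console']
print(route(paged))  # ['console', 'snmp:integration-gateway', 'email:storage-admin@example.com']
```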
EMC ControlCenter Console View of Alerts
[Figure: Console view of alerts, showing each alert's severity, alert state, object name, and message.]