Application Performance Monitoring
One Approach
John Slobodnik
April 18, 2006
1:30 p.m.
CMG Canada
Introduction of Product Suite
• ServerVantage
• ApplicationVantage
• ClientVantage
• VantageAnalyzer
• VantageView
ServerVantage (SV)
• Collects “server” level data.
– Multiplatform: Windows, LINUX, UNIX, etc.
– CPU, memory, disk, network out of the box.
• Collects “application” level data.
– Oracle, SQL Server, WebLogic, iPlanet, LDAP, etc.
• One SV agent installed on each client server.
– Runs most of the time.
• Customized counters (metrics) can be written.
ApplicationVantage (AV)
• A sniffer.
– Agent-based application analysis of packet-level communications.
• Gathers all traffic that passes through the Network Interface Cards (NICs).
• Can merge the data together from multiple servers.
– Can trace, for example, SQL Server traffic.
• One AV agent is installed per client server.
– Turned on when required.
– Most often in firefighting mode.
ClientVantage (CV)
• Gathers data on the performance of your
application.
– Done through timings of synthetic business
transactions on CV workstations (robots).
• Scripting of business transactions done with a
tool called QARun.
– We are doing active monitoring.
– There are two other options available here now:
• Passive monitoring using CV
• A hardware-based solution
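As a rough illustration of active monitoring, the sketch below times each step of a scripted transaction, the way a robot workstation might. It is not QARun or ClientVantage code; `time_transaction` and the step names are hypothetical and exist only for this example.

```python
import time
from typing import Callable, Dict


def time_transaction(steps: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each scripted step in order and record its elapsed time in seconds."""
    timings: Dict[str, float] = {}
    for step_name, step in steps.items():
        start = time.perf_counter()
        step()  # a real robot would drive the application here
        timings[step_name] = time.perf_counter() - start
    # Total response time for the whole business transaction.
    timings["total"] = sum(timings.values())
    return timings
```

A caller would pass one callable per business step, for example `time_transaction({"login": do_login, "search": do_search})`, and store the returned timings as data points.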
Vantage Analyzer for J2EE (VA)
• J2EE (Java) based tool to help pinpoint exact locations of
code-level performance problems.
– Locates slow methods, SQL statements and transactions.
• The VA agent runs inside your application server's JVM, obtaining performance metrics using bytecode instrumentation.
– Data is sent in real time to the nucleus server, where it is stored and distributed to VA performance consoles.
• Supports WebLogic, WebSphere, etc.
• Customized component
– Allows a transaction to be followed in VantageView.
VantageView (VV)
• Web-based portal for viewing SV, CV, AV and VA data for monitoring and reporting.
• Drawing on information from the Vantage suite of tools, VantageView users can check the status of clients, servers and networks from their intranet and get a near-time service-level perspective on application availability and performance.
• The flexibility of VantageView enables different levels of users to view pertinent information for easy problem determination and resolution.
• Customized counters (metrics) can be created in the VV database.
VantageView
[Architecture diagram. Control servers: ServerVantage, ClientVantage 1, ClientVantage 2, ApplicationVantage and the Vantage Analyzer Nucleus, shown with their databases (including a Vantage Analyzer SQL DB). Monitored servers: SV agents (Applications #1 and #2), CV agents (Applications #1, #2 and #3), AV agents (Applications #1 and #2) and VA agents (Application #1).]
A Few Easy Setup Steps
• A summary of the steps to implement the solution:
– Install the agents.
– Complete Administration.
– Set Preferences.
– Management: create tasks and apply blackout schedules.
– Create monitoring views.
– Create reports.
• Optional steps taken
– Create dashboards.
– Create custom counters (metrics).
Install the Agents
• This is a procedural task that is quick to complete.
– A script is run to do the install, followed by any applicable patches.
• The product keeps track of the level of agent installed on each server in a central repository.
• SV and AV agents are installed on each server (Windows, UNIX, LINUX).
Administration – Configure Databases
Set up the online database(s).
Configure Historical Database
Define the historical database.
We keep 3 months of data online.
All else goes to historical database.
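The online/historical split can be pictured as a simple cutoff on sample age. The sketch below is only an illustration, not how the product moves data; it approximates "3 months" as 90 days, and `split_by_retention` is a hypothetical name.

```python
from datetime import datetime, timedelta

ONLINE_RETENTION_DAYS = 90  # assumption: "3 months" approximated as 90 days


def split_by_retention(samples, now):
    """Partition (timestamp, value) samples into online and historical sets."""
    cutoff = now - timedelta(days=ONLINE_RETENTION_DAYS)
    online = [s for s in samples if s[0] >= cutoff]
    historical = [s for s in samples if s[0] < cutoff]
    return online, historical
```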
Control Server Configuration
Set up the control servers.
Define Users
Define VV user profiles.
Preferences
• Business applications
– 4 applications.
• Business locations
– Various Canadian cities.
• Business transactions
– An application (29 transactions) is broken down into 3 transaction groupings (14, 7, 8).
• Server groupings
– Production, preproduction, support, third-party, etc.
Management - Create Tasks
Create a new task.
Create Tasks
Select the type of server: Windows, UNIX, etc.
Create Tasks
Select the counters you wish to see.
Create Tasks
Add a rule for alerting.
Create Tasks
Set up alerts if you want them. For example: System Thrashing, TCP
Connectivity lost from WL to WL layer, CPU > 90%, etc.
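An alert rule such as "CPU > 90%" boils down to comparing a sampled counter against a threshold. The following is a minimal sketch of that idea, not the product's rule engine; `Rule`, `breached` and the counter name `cpu_pct` are hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison operators an illustrative rule may use.
_OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass
class Rule:
    counter: str      # name of the sampled counter, e.g. "cpu_pct" (hypothetical)
    op: str           # one of the keys in _OPS
    threshold: float


def breached(rule, sample):
    """Return True when the sampled counter value violates the rule."""
    value = sample.get(rule.counter)
    if value is None:
        return False  # counter missing from this sample: nothing to evaluate
    return _OPS[rule.op](value, rule.threshold)
```

For example, `breached(Rule("cpu_pct", ">", 90.0), {"cpu_pct": 95.0})` evaluates to `True`, while a sample of exactly 90.0 does not trigger the rule.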
Create Tasks
Alert notification via pager, email, SNMP, etc.
Different audiences for different tasks: DBAs, application support, etc.
If you can do it from a command line, it can be automated here: shell scripts, bat files.
Perform an action when a threshold is breached.
(1) Kick off a WL thread dump when a WL counter falls below a certain level.
(2) Send an alert based on an ASCII pattern match (we examine WebLogic logs).
(3) Previous problems can be proactively addressed with this type of instrumentation.
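The pattern-match idea can be sketched as a scan over log lines that calls an action for every match. This is an illustration only; `scan_log` is a hypothetical helper, and the action could be anything runnable from a command line (for instance a script started with `subprocess.run`).

```python
import re


def scan_log(lines, pattern, action):
    """Call action(line) for every log line matching pattern; return the match count."""
    regex = re.compile(pattern)
    hits = 0
    for line in lines:
        if regex.search(line):
            action(line.rstrip("\n"))  # e.g. send an alert or start a script
            hits += 1
    return hits
```

Passing an open log file as `lines` and an alerting function as `action` gives the basic "alert on ASCII pattern" behaviour; the pattern itself depends on the log format in use.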
Create Tasks
Select the appropriate data sampling interval.
Key to the size of your database.
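The effect of the sampling interval on database size is simple arithmetic: halving the interval doubles the number of stored data points. The sketch below assumes one row per counter per sample, which may not match the actual schema; the figures in the usage note are made up.

```python
SECONDS_PER_DAY = 86_400


def rows_per_day(servers, counters_per_server, interval_seconds):
    """Estimate data points stored per day, assuming one row per counter per sample."""
    samples_per_day = SECONDS_PER_DAY // interval_seconds
    return servers * counters_per_server * samples_per_day
```

With hypothetical values of 10 servers and 20 counters each, a 60-second interval gives 288,000 rows per day, while a 300-second interval gives 57,600.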
Create Tasks
Start the task.
Management – Blackout Schedule
• Apply a blackout schedule, if applicable.
– ServerVantage agents do not run during the application's daily downtime.
• ClientVantage robots are also set up to run on a blackout schedule.
– Implemented through CV, which uses the Windows scheduler.
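A daily blackout window is essentially a time-of-day check that must also handle windows crossing midnight. The sketch below shows that check in isolation; `in_blackout` is a hypothetical helper and is unrelated to how the product or the Windows scheduler implements it.

```python
from datetime import time


def in_blackout(now, start, end):
    """True if now falls in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 23:00 to 01:00.
    return now >= start or now < end
```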
Create Monitoring Views
• Monitoring views contain all data points.
• Flexible: you can plot many different metrics on the same chart.
• Business metric vs. server performance.
• Application metrics vs. server metrics.
• TeeChart Editor gives you Excel chart type functionality to modify the look of the chart.
Monitoring View
Saved as a permanent monitoring view report.
Monitoring View
Monitoring (ad hoc)
Can drill into data point.
Drill into IDP
Intelligent Data Point (IDP)
Create Reports
• Reports contain different levels of data summarization.
• From all data points to daily average.
• We have created 12-hour, 2-day, 1-week and 1-month views of all reports.
• Flexible: you can plot many different metrics on the same chart.
• TeeChart Editor gives you Excel chart type functionality to modify the look of the chart.
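The coarsest summarization level mentioned here, the daily average, can be sketched as a group-by on calendar day. This is an illustration of the idea rather than the product's reporting logic; `daily_average` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_average(samples):
    """Collapse (timestamp, value) samples into one average per calendar day."""
    by_day = defaultdict(list)
    for ts, value in samples:
        by_day[ts.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}
```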
Create Reports
Select the metric source.
Create Reports
Select the metric(s) desired.
Create Reports
Select the time range.
Create Reports
Select the display format.
Create Reports
Schedule the report.
Create Reports
Save the report.
Reports
Then the Business asked…
How can we prove that the API calls are performing better?
• Custom program installed on WL servers.
• Gathers API call response time data, converts it to a local CSV file, and sends the file by FTP to the VV database.
• API Response Time report created; it queries the VV DB.
• APIs split between internal and outsourced (for reporting purposes).
• There are a number of activities within each bean conversation.
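The CSV step of such a custom program might look like the sketch below. The column names, the `internal`/`outsourced` label and `write_api_timings` are assumptions made for illustration; the real file layout is not described here, and the FTP transfer is left out.

```python
import csv
import io


def write_api_timings(rows, out):
    """Write (api_name, provider, elapsed_ms) rows as CSV with a header line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["api_name", "provider", "elapsed_ms"])
    for api_name, provider, elapsed_ms in rows:
        # provider is "internal" or "outsourced" (hypothetical labels)
        writer.writerow([api_name, provider, f"{elapsed_ms:.1f}"])
```

Writing to an `io.StringIO` buffer is convenient for testing; in practice `out` would be a file opened on the WL server before being transferred.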
API Response Time Report
Sample bean conversation report.
Then Management said…
• We need some different dashboard views.
• Each level of dashboard gets more detailed.
• Special dashboard for outsourced infrastructure.
• Dashboards were created using the integrated VISIO (Vantage Visualizer) piece of the product.
Management Dashboard
Drill down to Application Availability
Application Availability (bottom)
Drill down to Heat Chart
Drill down to CNS report
Drill down to Application Scorecard
Application Scorecard (bottom)
Drill down to Transaction Scorecard
Drill down to Performance Summary
Drill down to Orders Report
Drill down to Session Current Count report
Drill down to WL Serviced Requests report
Geographic Dashboard
We asked ourselves…
• How can we make this easier to support?
• Customized metrics can be created in VV or SV.
• Make non-standard types of metrics available.
• Samples of some of the customization created:
– Disk usage of SV log files directory.
– Automated removal of SV log files.
– Automate push of patches to all agents.
– Send a command to run on a server and return the result.
– Count the number of SV datafiles.
– Agent restart.
– Gather SV log files.
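Two of these housekeeping metrics, log directory disk usage and automated log removal, can be sketched with the standard library alone. This is not the custom code described above; the function names and the age-based removal policy are assumptions.

```python
import os
import time


def directory_size_bytes(path):
    """Sum the sizes of all files under path, including subdirectories."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def remove_old_files(path, max_age_days, now=None):
    """Delete files directly under path older than max_age_days; return names removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400
    with os.scandir(path) as it:
        entries = list(it)  # snapshot first, then delete
    removed = []
    for entry in entries:
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

The size value could be reported as a custom counter and alerted on, while the removal function would run on a schedule.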
More Customization
• TCP Connection test from WL layer to WL layer.
• Number of Orders.
• ASCII file pattern match in WL logs (3).
• Customer purchase at store experience.
– Individual transaction timings are from CV; adding them up is custom.
Network Test / TCP Connection Test
• Automatic WebLogic thread dumps.
Average Elapsed Time
• SQL query to XML to CSV to VV DB.
• Traceroute response time for up to 10 hops, with alerting.
• API response time monitoring.
– Average, max, min, std dev.
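The four summary statistics can be computed directly from a list of response times. The sketch below uses the population standard deviation, which is an assumption; it is not stated which variant the report uses, and `summarize` is a hypothetical name.

```python
from statistics import mean, pstdev


def summarize(elapsed_ms):
    """Return average, max, min and population standard deviation of the timings."""
    if not elapsed_ms:
        raise ValueError("no timings to summarize")
    return {
        "avg": mean(elapsed_ms),
        "max": max(elapsed_ms),
        "min": min(elapsed_ms),
        "std_dev": pstdev(elapsed_ms),  # assumption: population, not sample, std dev
    }
```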
Network Connection Test
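A TCP connection test of this kind can be approximated by timing how long a connect takes and treating a failure as "connectivity lost". The sketch below is a generic illustration, not the custom counter itself; `tcp_connect_ms` and its default timeout are assumptions.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Refused, unreachable or timed out: report as a failed test.
        return None
```

A monitoring task could call this at each sampling interval, store the elapsed time as a data point and raise an alert when the result is `None`.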
Vantage Analyzer
• Installed on production WebLogic servers
during the peak annual sales period.
• Now in the pre-prod environment.
– So bugs can be found before promoting new
code to production.
J2EE JavaScape
Paints a landscape view of your J2EE environment. This view displays component
interactions between JSPs, Servlets and Web services, Session, Entity and
Message-driven Beans, as well as database usage.
Transaction Explorer
The tree is organized by the largest consumers, from top to bottom. The
tree can be sorted by the CPU or Transaction time period.
Transaction Scope
Gives a detailed view on each individual transaction which runs through
your application.
Stalled Threads
Shows thread-level detail of a transaction.
Method HotSpots
Identifies the biggest consumers in your application. The view can be
sorted by Transaction or CPU time.
SQLyzer HotSpots
Lets you pinpoint the largest SQL consumers.
SLA Monitoring
This view displays pre-configured SLA rules and when they were last
violated.
Memory HotSpots
Locates memory leaks as well as memory allocation hot spots to help with server availability and performance.
Summary
• Management extremely pleased.
– Customized dashboards, peak period success, want more applications instrumented.
• Business application ran at almost 99.9% availability during the peak processing period of the year, in large part due to this solution.
– Now instrumented to be more proactive than in the past.
– Being used as a model for the rest of the enterprise.
• Support teams have embraced the solution because it makes their lives easier.
– DBAs, application support, system administrators, performance and capacity planners, etc.
• Significantly less time wasted determining whose problem it is (you know, 6 teams in a room…) during fire-fighting.
(905) 282-3342