Modeling Retail Applications
@ a Major Telecom Company
Predictive Analysis in a
Multi-Tier Infrastructure
John Slobodnik
October 21, 2008
CMG Canada
Preparation for Modeling
• Get an application infrastructure diagram.
• Turn on Solaris process accounting.
• Install TeamQuest Manager.
• Install TeamQuest View.
• Gather the key performance indicator.
• Perform workload characterization.
• Perform predictive analysis using TeamQuest Model.
Infrastructure Diagram
It is important to get this diagram to understand the infrastructure that this multi-tier application runs on. Typically, an application support team is responsible for keeping these diagrams up to date.
Infrastructure Diagram
[Figure: Architecture diagram showing Internet and internal-user access, load balancers, Web Portal web servers in the DMZ, switches, a WebLogic application cluster (X6 servers; WL Admin and WL Admin Standby; 3x and 2x WebLogic NodeManager), an Oracle 10g/RAC database cluster with interconnect, replication LAN and SAN, an LDAP cluster, an outsourced vendor, a billing system, and HTTP/HTTPS traffic.]
Turn on Solaris Process Accounting

• Turn on Solaris process accounting.
  – Minimal additional CPU overhead, since the data is already collected.
  – Allows short-running tasks to be captured for workload characterization. Normally, tasks shorter than 0.5 seconds get grouped together.
  – Certain applications with thousands of short tasks are prime candidates for this extra level of accuracy.

Install TeamQuest Manager

• Install TeamQuest Manager on at least one server from each tier of the application architecture.
  – At least one agent was installed in each of the 4 tiers.
• Customize the TQ database on each server.
  – Changed the retention of 10-minute data to 2 weeks.
  – Changed the retention of 1-minute data to 1 week.
  – Deactivated reductions.
  – Keep process information for 7 days (requires process accounting to be turned on).
• Created a silent install script to install the agent and customize the database.
  – Create a script to customize the database (using tqdbu) with the settings specified in the previous bullet.
  – Record the silent install script. Syntax:
    install.sh -r silentinstallscriptnamehere tqmgr
Install TeamQuest Manager

• Create a specifications file backup of each TQ database daily.
  – This makes rebuilding the database easier in case of disaster.
  – The command to create a specifications file called “productionDBspec” is:
    teamquesthomedirectory/bin/tqdbu -o productionDBspec
  – The command to use the specifications file to recreate a new database is:
    teamquesthomedirectory/bin/tqdbu -c productionDBspec
• Put disk free space monitoring in place.
  – With process accounting on, a lot of data was gathered on our Oracle server.
  – There was barely enough space to keep a week’s worth of data in the existing filesystem.
  – The monitoring alerts us when there is less than 20% free space in the filesystem used by the TQ DB.
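The free-space check described above can be sketched in a few lines of Python. This is a minimal sketch and not the monitoring tool used in the project. The 20% threshold comes from the slide. The function names and the example path are hypothetical.

```python
import shutil

# Alert threshold from the slide: warn when free space drops below 20%.
FREE_SPACE_THRESHOLD = 0.20


def free_fraction(free_bytes: int, total_bytes: int) -> float:
    """Return free space as a fraction of the filesystem's total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def needs_alert(free_bytes: int, total_bytes: int,
                threshold: float = FREE_SPACE_THRESHOLD) -> bool:
    """True when the free fraction is strictly below the threshold."""
    return free_fraction(free_bytes, total_bytes) < threshold


def check_filesystem(path: str) -> bool:
    """Check the filesystem holding `path` (e.g. the TQ database directory)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if needs_alert(usage.free, usage.total):
        pct = 100 * free_fraction(usage.free, usage.total)
        print(f"ALERT: only {pct:.1f}% free on filesystem holding {path}")
        return True
    return False


if __name__ == "__main__":
    # Hypothetical path; point this at the filesystem used by the TQ DB.
    check_filesystem("/")
```

The threshold check is kept separate from the filesystem call. This lets the alerting rule be tested without touching a real disk.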
[Figure: Customize TQ Database]
Install TeamQuest View
• TQ View was used to ensure that performance was consistent on each server.
  – This tells us that the workload is consistent and reliable to use for modeling.
  – Data for the whole week was analyzed to find the best time frame to use for modeling.
[Figure: TeamQuest View Data Analysis]
Gather Key Performance Indicator


• We asked the business what their key performance indicator (or main business driver metric) was.
• They were tracking these sales numbers by hour in an Oracle database.
  – Using a customized SQL query.
  – You can turn this query into a “custom metric” and create historical reports against it.
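To illustrate tracking sales by hour, here is a minimal sketch that aggregates order counts per hour with SQL. The business's actual customized query was not shown. The sketch uses Python's built-in sqlite3 as a stand-in for the Oracle database. The `orders` table, its `order_time` column, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per order with an ISO-8601 timestamp.
# sqlite3 stands in for the Oracle database used by the business.
HOURLY_ORDERS_SQL = """
    SELECT strftime('%Y-%m-%d %H:00', order_time) AS hour,
           COUNT(*) AS orders
    FROM orders
    GROUP BY hour
    ORDER BY hour
"""


def hourly_order_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (hour, order count) pairs, one per hour that has orders."""
    return conn.execute(HOURLY_ORDERS_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_time TEXT)")
    # Hypothetical sample orders.
    sample = ["2000-01-01 12:05:00", "2000-01-01 12:40:00", "2000-01-01 13:15:00"]
    conn.executemany("INSERT INTO orders (order_time) VALUES (?)",
                     [(t,) for t in sample])
    for hour, count in hourly_order_counts(conn):
        print(hour, count)
```

In Oracle, the equivalent hour bucket would typically come from `TRUNC(order_time, 'HH24')` rather than SQLite's `strftime`.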
Workload Characterization
• Purpose: to uniquely identify the application-related work that runs on each server. This is a prerequisite for modeling.
• Used TeamQuest View to list all processes that run on each server.
• Grouped the processes into unique workloads.
  – This is the most labor-intensive part of the whole exercise (it can take days or weeks, depending on the level of co-operation).
• Requires the co-operation of the application experts to help identify the processes that belong to their application.
  – Try to keep the number of workloads as small as possible.
• Our goal was to create 2 workloads per server: one for the application-related work and OTHER for everything else.
• Define the workloads using TeamQuest Manager.
• On each server, we created a new “Workload Set” containing a new “Workload” definition that uniquely identifies application-related activity.
  – Left the default “Example” workload set alone.
• “Login =” uniquely identified application-related work on our Web Services, authentication, WebLogic, and Oracle servers.
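The two-workload scheme (application work versus OTHER) keyed on login can be sketched as a simple classifier. This is a minimal sketch of the idea and does not use TeamQuest Manager's workload definition syntax. The login names, record fields, and sample processes are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    """One process-accounting record (fields simplified for illustration)."""
    command: str
    login: str
    cpu_seconds: float


# Hypothetical logins that own application-related processes on each server.
APP_LOGINS = {"weblogic", "oracle", "iplanet"}


def workload_for(record: ProcessRecord) -> str:
    """Assign a process to the application workload or to OTHER by login."""
    return "APPLICATION" if record.login in APP_LOGINS else "OTHER"


def cpu_by_workload(records: list[ProcessRecord]) -> dict[str, float]:
    """Total CPU seconds per workload."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[workload_for(rec)] += rec.cpu_seconds
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ProcessRecord("java", "weblogic", 120.0),
        ProcessRecord("oracle", "oracle", 300.0),
        ProcessRecord("sshd", "root", 2.5),
        ProcessRecord("cron", "root", 0.5),
    ]
    print(cpu_by_workload(sample))
```

Everything that does not match falls into OTHER. This keeps the number of workloads at two per server, as the slide recommends.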
Using TeamQuest Model


The most important decision to make for modeling is “What timeframe do I base my model on?”
• The answer depends on the peak usage time of the application, from both a system resource and a business sales perspective.
• I use a combination of the busiest CPU, I/O, and sales periods to choose the timeframe.
• Basing my model on a 1-hour timeframe has worked successfully for me (a 5-hour timeframe has as well).
• Stay away from “problem” times.
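One way to combine the busiest CPU, I/O, and sales periods into a single choice is to normalize each series and pick the window with the highest combined score. This is a minimal sketch of that heuristic under stated assumptions: equal weights, min-max normalization, and hypothetical hourly numbers. It is not a TeamQuest feature. “Problem” hours still have to be identified by hand and passed in as exclusions.

```python
def normalize(values: list[float]) -> list[float]:
    """Min-max scale a series to [0, 1]; a flat series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_window(cpu: list[float], io: list[float], sales: list[float],
                width: int = 1, exclude: frozenset[int] = frozenset()) -> int:
    """Return the start index of the `width`-hour window with the highest
    combined (equally weighted) normalized CPU, I/O and sales score.
    Windows containing any hour in `exclude` ("problem" times) are skipped."""
    if not (len(cpu) == len(io) == len(sales)):
        raise ValueError("series must have the same length")
    score = [c + i + s for c, i, s in
             zip(normalize(cpu), normalize(io), normalize(sales))]
    best_start, best_total = -1, float("-inf")
    for start in range(len(score) - width + 1):
        hours = range(start, start + width)
        if any(h in exclude for h in hours):
            continue
        total = sum(score[h] for h in hours)
        if total > best_total:
            best_start, best_total = start, total
    if best_start < 0:
        raise ValueError("no window available after exclusions")
    return best_start


if __name__ == "__main__":
    # Hypothetical hourly samples, index 0 = 09:00.
    cpu = [30, 45, 60, 72, 70, 65, 50, 40]
    io = [20, 35, 50, 66, 60, 58, 40, 30]
    sales = [100, 180, 260, 320, 300, 280, 200, 150]
    print(best_window(cpu, io, sales))           # 1-hour window
    print(best_window(cpu, io, sales, width=5))  # 5-hour window
```

Equal weighting is the simplest choice. If sales matter more than raw resource peaks, the sales term could be weighted up.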
Then we apply a growth percentage to that timeframe that equates to the peak volume the business estimated for its busiest time of year.
• We frame the growth % (less than and greater than 50%).
• If the model did not show any weakness in the infrastructure at 50% growth, we created another model with enough growth applied to find one.
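The growth-framing idea can be illustrated with a back-of-the-envelope calculation. Assume a tier's utilization scales linearly with business volume, U(g) = U0 × (1 + g). Then the growth at which the tier reaches a ceiling is ceiling / U0 - 1. This is a minimal sketch under that linear assumption. TeamQuest Model solves a queueing model and also reports response time, which this sketch ignores. The tier names and baseline utilizations are hypothetical.

```python
def growth_to_reach(baseline_util: float, ceiling: float = 1.0) -> float:
    """Growth fraction g at which baseline_util * (1 + g) reaches ceiling,
    assuming utilization scales linearly with business volume."""
    if not 0 < baseline_util <= ceiling:
        raise ValueError("baseline utilization must be in (0, ceiling]")
    return ceiling / baseline_util - 1


def first_bottleneck(tiers: dict[str, float],
                     ceiling: float = 1.0) -> tuple[str, float]:
    """Return the tier that saturates first and the growth at which it does."""
    name = max(tiers, key=tiers.get)  # highest baseline utilization saturates first
    return name, growth_to_reach(tiers[name], ceiling)


if __name__ == "__main__":
    # Hypothetical peak-hour CPU utilizations per tier (fractions of capacity).
    tiers = {"web": 0.40, "weblogic": 0.30, "oracle": 0.20}
    tier, g = first_bottleneck(tiers)
    print(f"{tier} saturates at {g:.0%} growth")
    # Frame the model below and above the business's 50% estimate.
    for growth in (0.25, 0.50, 0.75):
        worst = max(u * (1 + growth) for u in tiers.values())
        print(f"{growth:.0%} growth -> busiest tier at {worst:.0%}")
```

A ceiling below 1.0 (for example 0.8) gives a more conservative answer, because response time climbs steeply well before a CPU reaches 100%.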
Using TeamQuest Model

Outcome:
• We have successfully identified the need for an additional Oracle node in the infrastructure.
• Other outcomes have been:
  – Your infrastructure is sufficient to make it through this year's peak period. However, once growth from the current state reaches 300%, the Web Services tier will be the bottleneck; adding 2 more servers of the same build is recommended before that time.
Select data to build the Model
• Select “Generate Input File”.
• Fill out the time and date, then click “Next”.
• Confirm the Workload Set, then click “Next”.
• Click “Create Model Input File”.
• “Save” the file.
• Choose a filename, then save.
• Confirmation.
TQ Model - Assumptions

TeamQuest was not installed on all the systems in the environment, so in the absence of that data we assume:
• External web servers – the 4 Sun servers are load balanced.
• WebLogic tier – the 3 Sun servers are load balanced. The 2 Sun WebLogic instances perform twice the work of the single WebLogic instance on the larger Sun server.
• Applications such as iPlanet, WebLogic, and Oracle are well instrumented.
• The orders come from the external web servers.
TQ Model - Findings, Recommendations & Results
Findings for the multi-tier application environment:
• The number of orders on mm/dd/yyyy from noon until 5 pm was n.
• At 300% growth (nn orders from noon until 5 pm), the CPU in the UNIX web services (iPlanet) tier is maxed out, and the response time is 382.4% higher than for n orders.
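The “382.4% higher” figure is a relative change: (R_new - R_base) / R_base × 100. The sketch below computes that change. It also uses the textbook single-server (M/M/1) formula R = S / (1 - U) to illustrate why response time climbs so steeply as a tier's CPU approaches saturation. The M/M/1 formula is a simplifying assumption made here; it is not how TeamQuest Model computed the 382.4%. The service time and utilizations are hypothetical.

```python
def percent_increase(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    if base <= 0:
        raise ValueError("base must be positive")
    return (new - base) / base * 100


def mm1_response_time(service_time: float, utilization: float) -> float:
    """M/M/1 response time R = S / (1 - U); undefined at or above saturation."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


if __name__ == "__main__":
    s = 0.050  # hypothetical 50 ms CPU service time per request
    base = mm1_response_time(s, 0.2)
    for u in (0.2, 0.5, 0.8, 0.9, 0.95):
        r = mm1_response_time(s, u)
        print(f"U={u:.0%}: R={r * 1000:.0f} ms, "
              f"+{percent_increase(base, r):.0f}% vs U=20%")
```

The 1 / (1 - U) term is why a tier that looks comfortable at moderate utilization can show a multi-fold response time increase once growth pushes it near 100%.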
Recommendations:
• Add 2 nodes to the external web tier.
  – Plan to add the servers in 2009.
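The assumptions slide states that the external web servers are load balanced. Under that assumption, one rough way to sanity-check the recommendation is to spread the tier's total CPU demand evenly across its servers. Going from 4 to 6 web servers of the same build then cuts per-server utilization to 4/6 of its previous value. This is a minimal sketch assuming perfect balancing and identical servers. The demand figure and the 70% target are hypothetical. TeamQuest Model's what-if results remain the authoritative answer.

```python
def per_server_utilization(tier_demand: float, servers: int) -> float:
    """Per-server utilization when a tier's total CPU demand (in units of one
    server's capacity) is spread evenly across identical servers."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return tier_demand / servers


def servers_needed(tier_demand: float, max_util: float) -> int:
    """Smallest number of identical servers keeping each at or below max_util."""
    if not 0 < max_util <= 1:
        raise ValueError("max_util must be in (0, 1]")
    n = 1
    while per_server_utilization(tier_demand, n) > max_util:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical: at 300% growth the web tier needs 4.0 servers' worth of CPU,
    # i.e. 4 servers at 100% busy (maxed out).
    demand = 4.0
    for n in (4, 6):
        print(f"{n} servers -> {per_server_utilization(demand, n):.0%} each")
    print("servers needed at <= 70% each:", servers_needed(demand, 0.70))
```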
Results:
• Time spent in TeamQuest Model: less than 2 hours.
TQ View – CPU Utilization
CPU utilization of all the systems: no single day stands out as looking different from any other day for CPU and I/O, so we chose the afternoon of mm/dd/yyyy, 12:00-17:00. We divided the work into application and non-application workloads.
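The “no day stands out” check can be made explicit by comparing each day's average CPU with the week as a whole. This is a minimal sketch of one such rule, not the method used in the presentation, which made this comparison in TQ View. The rule flags a day whose mean differs from the median of the daily means by more than a chosen tolerance. The tolerance and the daily figures are hypothetical.

```python
from statistics import mean, median


def outlier_days(daily_samples: dict[str, list[float]],
                 tolerance: float = 10.0) -> list[str]:
    """Return days whose mean CPU utilization (percent) differs from the
    median of all daily means by more than `tolerance` percentage points."""
    daily_means = {day: mean(samples) for day, samples in daily_samples.items()}
    typical = median(daily_means.values())
    return [day for day, m in daily_means.items() if abs(m - typical) > tolerance]


if __name__ == "__main__":
    # Hypothetical afternoon CPU samples (percent busy) for one server.
    week = {
        "Mon": [52, 58, 61, 57, 55],
        "Tue": [50, 60, 62, 59, 54],
        "Wed": [53, 57, 60, 58, 56],
        "Thu": [51, 59, 63, 56, 55],
        "Fri": [80, 85, 88, 84, 83],
    }
    print(outlier_days(week))
```

The median is used as the reference so that one unusual day does not drag the baseline toward itself. The same check can be repeated for I/O.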
[Figure: TeamQuest Model – Systems/Tier]
[Figure: TeamQuest Model – Response Time with 300% growth applied]
[Figure: TQ Model – CPU Utilization by Workload]
[Figure: Active Resource Utilization on web tier]
[Figure: Components of Response – 3 DB nodes]
[Figure: What if we add 2 servers to the external web server tier?]
[Figure: What if we model the external web server on its own?]
Frequency of Modeling



• During the peak time of year for the application, and 6 months later (at a minimum).
• Before and after any major hardware changes to the infrastructure.
• After any major software changes to the infrastructure.
  – These can be changes to the application code.
  – They can also be vendor software version changes:
    – A new version of WebLogic.
    – A new OS level.
    – The latest version of Oracle.
  – These happen more frequently; it is not realistic (in my experience) to redo the exercise monthly.
John Slobodnik
Performance & Capacity Planner
Infrastructure & Technology
CGI
(905) 858-7100 ext. 7355
Mobile: (416) 729-8356