Download.com
Download
Report
Transcript Download.com
By Shanna Epstein
IS 257
September 16, 2008
Cnet.com
Provides information, tools, and advice to help
customers decide what to buy and how to get the most
out of technology:
News
Reviews
Downloads
Cnet.com
Other Sites
Cnet.com
Bnet.com
Zdnet.com
Gamespot.com
TV.com
Chow
Job Responsibilities
Software Engineer for Trax
Lead team of 2 developers
Responsible for development and maintenance of
product
Support Windows servers
Support php websites and SQL Server 2005 databases
What is Trax?
Industry’s most advanced tool for
Tracking
Analyzing Entertainment Data
Consumer Awareness
Interest and Purchase Intent
Competitive Mindshare
Ad campaign effectiveness
Audience profiles
Editorial coverage
Data Sources
Gamespot (Gamespot Trax)
Gamespot.com
GameRankings.com
Download.com
TV (TV Trax)
tv.com
MovieTome (Trax coming soon)
movietome.com
MP3 – Juke (Trax coming soon)
juke.com (coming out soon)
Metrics
About 20 total
Most important:
Users/Avg. Daily Users (by Sessions)
Page Views (hits of pages)
Searches (execute search for a game)
New Trackers/Total Trackers (track games in Favorites)
Downloads
Videos
Data Processing
Stage servers
process heavy jobs at night and early morning
data comes from various sources
direct database queries
text file feeds
xml feeds
replicate to production early in the morning
replicate to development in off hours
Architecture
Development
Code: developer PCs (php)
Database server: SQL Server2005
Staging (data processing, replication, testing)
Windows server, SQL Server 2005, Apache
2 servers: Gamespot Trax, TV Trax
Production
Windows server, SQL Server 2005, Apache
2 servers: Gamespot Trax, TV Trax
Database Maintenance
Optimization
Jobs scheduled in SQL Server Job Agent consists of
several steps involving:
Job status can be retrieved through queries against the
Master db
DBA can be notified of success/failure through user
interface or emails
1st step checks whether job already completed
successfully today
2nd step checks whether dependent jobs completed
Later steps consist of php scripts (read and process data,
send e-mails, etc) and SQL queries
Controlling Database Processes
Task Manager on Windows Server
monitors CPU usage
SQL Server Activity Monitor - ad hoc monitoring tool
Allows to determine volume and general types of activities
lists user connections , locks, open, and blocked transactions
SQL Query Profiler - query optimizer
allows to trace queries for analysis and catch query
bottlenecks
Logging
Automatic system error logging
Database consists of data and log file
Optimizing Database
Performance
Caching
Saves frequently used information into an easy to get to area (usually
memory)
Reduces disk access, computation (CPU), and speeds up query results
Indexes
Data structure that improves speed of operations on db table
Sorted by key values
Query Syntax
tips and tricks
Good database design
normalization
Data cleaning
Old and unused records
Optimizing query performance
Types of Database Caching
Ad hoc - not cached
Easy to debug, but same query executed every time
Parameterized queries - cached queries
Replace constant literal values by variables, and compiles query plans. If
subsequent query differs in only values of constants, it will match against
auto-parameterized query.
Prepare statements – cached handles
batch text is sent once at "prepare" time. SQL Server responds by returning
handle that can be used to invoke the batch at execute time. At execute
time, a handle and parameter values are sent to server.
Stored procedures - cached parameter sniffing
When stored procedure is compiled for first time, values of parameters supplied with
execution call are used to optimize statements within that stored procedure. This
process is known as "parameter sniffing." If these values are typical, then most calls
to that stored procedure will benefit from stored procedures.
Optimizing Query Performance
Indexes
Clustered
Defines physical storing of rows (reorders how records are physically
stored)
Each table can have one clustered index
If clustered index not defined, usually Primary Key becomes clustered
index automatically
Non-Clustered
logical order of the index does not match the physical stored order of the
rows on disk
Contains references to sorting
Slows down modification and insertion process, so keep to a minimum
Good candidates for non-clustered indexes are those columns frequently
used in WHERE, GROUP BY, and HAVING
SQL Server’s Tuning Wizard
recommends indexes after running SQL Server Profiler (query optimization tool that
traces queries)
Optimizing Query Performance
Syntax tricks
Avoid joining tables on non-indexed columns
Most expensive operation SQL Server can do
Specify columns in select statements instead of
SELECT *
Try to avoid DISTINCT when possible
Restrict with WHERE clauses
Select TOP number of rows
Experiment with switching ORDER BY
column positions
Optimizing Query Performance
(Syntax tricks - cont.)
Union ALL faster than Union
UNION ALL does not look for duplicate rows, and
UNION statement does look for duplicate rows, whether
or not they exist
SET NOCOUNT ON – stored procedures
Stops message indicating number of rows and reduces
network traffic because message with number of rows
retrieved is not displayed to client
Backup and Restore Procedures
IT backups
Stored on tape
Ship to offsite
Backup requested when needed
SQL Server runs nightly backups
Backups stored on server and replaced by new versions
Types of files backed up:
Master file stores system information, user logons, user
permissions, etc.
Msdb file stores job content and scheduling information
Other database files
Conclusion
Questions?