Download.com

Download Report

Transcript Download.com

By Shanna Epstein
IS 257
September 16, 2008
Cnet.com
 Provides information, tools, and advice to help
customers decide what to buy and how to get the most
out of technology:
 News
 Reviews
 Downloads
Cnet.com
Other Sites
 Cnet.com
 Bnet.com
 Zdnet.com
 Gamespot.com
 TV.com
 Chow
Job Responsibilities
 Software Engineer for Trax
 Lead team of 2 developers
 Responsible for development and maintenance of
product
 Support Windows servers
 Support php websites and SQL Server 2005 databases
What is Trax?
 Industry’s most advanced tool for
 Tracking
 Analyzing Entertainment Data






Consumer Awareness
Interest and Purchase Intent
Competitive Mindshare
Ad campaign effectiveness
Audience profiles
Editorial coverage
Data Sources
 Gamespot (Gamespot Trax)
 Gamespot.com
 GameRankings.com
 Download.com
 TV (TV Trax)
 tv.com
 MovieTome (Trax coming soon)
 movietome.com
 MP3 – Juke (Trax coming soon)
 juke.com (coming out soon)
Metrics
 About 20 total
 Most important:
 Users/Avg. Daily Users (by Sessions)
 Page Views (hits of pages)
 Searches (execute search for a game)
 New Trackers/Total Trackers (track games in Favorites)
 Downloads
 Videos
Data Processing
 Stage servers
 process heavy jobs at night and early morning

data comes from various sources
 direct database queries
 text file feeds
 xml feeds
 replicate to production early in the morning
 replicate to development in off hours
Architecture
 Development
 Code: developer PCs (php)
 Database server: SQL Server2005
 Staging (data processing, replication, testing)
 Windows server, SQL Server 2005, Apache
 2 servers: Gamespot Trax, TV Trax
 Production
 Windows server, SQL Server 2005, Apache
 2 servers: Gamespot Trax, TV Trax
Database Maintenance
Optimization
 Jobs scheduled in SQL Server Job Agent consists of
several steps involving:
 Job status can be retrieved through queries against the




Master db
DBA can be notified of success/failure through user
interface or emails
1st step checks whether job already completed
successfully today
2nd step checks whether dependent jobs completed
Later steps consist of php scripts (read and process data,
send e-mails, etc) and SQL queries
Controlling Database Processes
 Task Manager on Windows Server
 monitors CPU usage
 SQL Server Activity Monitor - ad hoc monitoring tool
 Allows to determine volume and general types of activities
 lists user connections , locks, open, and blocked transactions
 SQL Query Profiler - query optimizer
 allows to trace queries for analysis and catch query
bottlenecks
 Logging
 Automatic system error logging
 Database consists of data and log file
Optimizing Database
Performance
 Caching
 Saves frequently used information into an easy to get to area (usually
memory)
 Reduces disk access, computation (CPU), and speeds up query results
 Indexes
 Data structure that improves speed of operations on db table
 Sorted by key values
 Query Syntax
 tips and tricks
 Good database design
 normalization
 Data cleaning
 Old and unused records
Optimizing query performance
Types of Database Caching
 Ad hoc - not cached
 Easy to debug, but same query executed every time
 Parameterized queries - cached queries
 Replace constant literal values by variables, and compiles query plans. If
subsequent query differs in only values of constants, it will match against
auto-parameterized query.
 Prepare statements – cached handles
 batch text is sent once at "prepare" time. SQL Server responds by returning
handle that can be used to invoke the batch at execute time. At execute
time, a handle and parameter values are sent to server.
 Stored procedures - cached parameter sniffing
 When stored procedure is compiled for first time, values of parameters supplied with
execution call are used to optimize statements within that stored procedure. This
process is known as "parameter sniffing." If these values are typical, then most calls
to that stored procedure will benefit from stored procedures.
Optimizing Query Performance
Indexes
 Clustered
 Defines physical storing of rows (reorders how records are physically
stored)
 Each table can have one clustered index
 If clustered index not defined, usually Primary Key becomes clustered
index automatically
 Non-Clustered
 logical order of the index does not match the physical stored order of the
rows on disk
 Contains references to sorting
 Slows down modification and insertion process, so keep to a minimum
 Good candidates for non-clustered indexes are those columns frequently
used in WHERE, GROUP BY, and HAVING
 SQL Server’s Tuning Wizard

recommends indexes after running SQL Server Profiler (query optimization tool that
traces queries)
Optimizing Query Performance
Syntax tricks
 Avoid joining tables on non-indexed columns
 Most expensive operation SQL Server can do
 Specify columns in select statements instead of
SELECT *
 Try to avoid DISTINCT when possible
 Restrict with WHERE clauses
 Select TOP number of rows
 Experiment with switching ORDER BY
column positions
Optimizing Query Performance
(Syntax tricks - cont.)
 Union ALL faster than Union
 UNION ALL does not look for duplicate rows, and
UNION statement does look for duplicate rows, whether
or not they exist
 SET NOCOUNT ON – stored procedures
 Stops message indicating number of rows and reduces
network traffic because message with number of rows
retrieved is not displayed to client
Backup and Restore Procedures
 IT backups
 Stored on tape
 Ship to offsite
 Backup requested when needed
 SQL Server runs nightly backups
 Backups stored on server and replaced by new versions
 Types of files backed up:
 Master file stores system information, user logons, user
permissions, etc.
 Msdb file stores job content and scheduling information
 Other database files
Conclusion
 Questions?