we know all the incoming query templates beforehand

Download Report

Transcript we know all the incoming query templates beforehand

Performance Driven Database Design for
Scalable Web Applications
Jozsef Patvarczki, Murali Mani, and Neil Heffernan
Abstract
Scaling up web applications requires distribution of load
across multiple application servers and across multiple database
servers.
Distributing load across multiple application servers is fairly
straightforward; however distributing load (select and UDI
queries) across multiple database servers is more complex
because of the synchronization requirements for multiple copies
of the data.
Different techniques have been investigated for data
placement across multiple database servers, such as replication,
partitioning and de-normalization.
In this poster, we describe our architecture that utilizes these
data placement techniques for determining an optimum layout
of data. Our solution is general, and other data placement
techniques can be integrated within our system. Once the data
is laid out on the different database servers, our efficient query
router routes the queries to the appropriate database server/(s).
Our query router maintains multiple connections for a
database server so that many queries are executed
simultaneously on a database server, thus increasing the
utilization of each database server. Our query router also
implements a locking mechanism to ensure that the queries on a
database server are executed in order. We have implemented
our solutions in our system, that we call SIPD (System for
Intelligent Placement of Data). Preliminary experimental results
illustrate the significant performance benefits achievable by our
system.
Proposed Solution
• We propose a generic architecture for balancing load
across multiple database servers for web applications;
• There are two core parts of our system:
(a) the intelligent data placement algorithm produces a
layout structure, describing how the data needs to be
laid out across multiple database servers to achieve
better performance and performs the actual layout;
(b) the query router that utilizes the layout structure
produced by the intelligent data placement
algorithm for routing queries and ensuring that the
database servers are being utilized effectively;
• To achieve a better performance, the data placement
algorithm uses the given query workload, the time it takes
to execute a select/UDI query type, and the time it takes to
execute a select/UDI query if the table/(s) are partitioned,
replicated or de-normalized;
Problem Statement
• A characteristic of web applications such as our ASSISTment
system, is that we know all the incoming query templates
beforehand as the users typically interact with the system
through a web interface [1];
•There are thousands of web applications, and these systems
need to figure out how to scale up their performance;
• Web applications typically have a 3-tier architectures
consisting of clients, application, and database server that are
working together [2];
• The increasing load of the database layer can lead to slow
response time, application error, and in the worst case, to
different type of system crashes;
• We have integrated our data placement algorithm and our
query router into a prototype system that we call System
for Intelligent Placement of Data (SIPD).
Future Work
•
•
•
•
Data modeling;
Effective locking mechanisms;
Distributed query processing;
Considering fault tolerance as an application
constrain and handling inconsistencies.
References
1. Tobias Groothuyse, Swaminathan Sivasubramanian, Guillaume Pierre, “Globetp: template-based database
replication for scalable web applications”, WWW07, Banff, Canada, pp. 301-310
2. Jozsef Patvarczki, Shane F. Almeida, Joseph E. Beck, and Neil T. Heffernan,, “Lessons Learned from
Scaling Up a Web-based Intelligent Tutoring System”, ITS2008, Montreal, Canada, pp. 766-770
Contact: Jozsef Patvarczki, [email protected]