Database Farming
Download
Report
Transcript Database Farming
Database Farming
For Improved Performance
Presented By: Russell Yong
Supervisor: Prof Wentworth
Problem at Hand
Database solution for a large corporation
Expensive software (Oracle Database
Enterprise Edition +- US $40k)
Top-end hardware
Microsoft’s SQL Server 2000
Not same level of confidence
Solution
Adapt the popular technique of backend
server farming
Apply it to databases – to create a high
performance database web service
Backend setup being invisible to the user
Hypothesis
Technique will create a more cost effective
database farm
Eradicate some problems associated with
dealing with large databases
Our Plan
Standard 3-Tier Model
Our Plan
Adapted 3-Tier Model
Conceptually
Pool of Connections
http request
Web-server
DataSet
DataSet
http request
DataSet
DataSet
DataSet
http request
Clients
Multiple Threads
of Execution
Web Service
Farm of Databases
DataSet Object
In-memory cache of data
Comparable to a mini-database
Multiple tables
Relationships
Constraints
DataSet Object
Easily serialized into and back out of XML
Structure (tables, columns, etc) described in
an XML schema
View and manipulate using either relational or
XML methods (unified programming model)
Compatible with other XML speaking
applications
DataSet Object
Disconnected Model
Sub-queries fill individual datasets
Collector Object
Collect and merge individual sub-queries
Returned to the client
Typed DataSet
Has an implicit schema
Allows for more efficient filling
Faster access
Created via Form Designer,
programmatically, or at run time via XSD
XSD File
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="PingDataSet" xmlns=""
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemasmicrosoft-com:xml-msdata">
<xs:element name="PingDataSet" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="PingInfo">
<xs:complexType>
<xs:sequence>
<xs:element name="ip" type="xs:string" />
<xs:element name="seen" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:unique name="PK_PingPrimaryKey" msdata:PrimaryKey="true">
<xs:selector xpath=".//PingInfo" />
<xs:field xpath="ip" />
<xs:field xpath="seen" />
</xs:unique>
</xs:element>
</xs:schema>
Implications for Web-Applications
Resource sensitive approach
“Bulk” approach to communication
Access local cache
Ideal for non-volatile data
Implications for Web-Applications
Optimistic concurrency model
Most applications ?
Improved performance (no locking)
No persistent connection required (resources)
Minimize required server resources
Connections used more effectively
Exceptions are dealt with accordingly
Our Database
Excess of “10 Million Records”
Network traffic information
Partitioned in 10 segments
Initial difficulty
Distributed over 3 machines (SQL Server 2000)
Simulating a completely distributed environment
Data Providers
SQL Server .NET Data Providers
SqlConnection
SqlDataAdapter
SqlCommand
OLE DB .NET Data Providers
ODBC .NET Data Providers (separate
download)
Data Providers
Our Framework
MyQueryHandler
Farming Layer
An instance for each individual user query
Distributor (spawns threads)
Collector, merging DataSets as they return
All encompassing DataSet
Pluggable
MyThreadHandler
Represents individual threads
Fills separate DataSets for each of the
partitions in the farm
Returns DataSet to QueryHandler
Pluggable
Specifying Queries
Couple queries hard-coded
Defined according to a parameter
Future Extensions…
Tests and Results
Ran queries 100 times
Gauge mean
Filter out any possible influencing factors
Influencing factors
Network traffic
Active machines
Testing and Results
Simple query
“SELECT * FROM ping WHERE (ip = 2464643887) OR (ip =
2464643464) OR (ip = '2464639301') OR (ip = '2464625293')”
Returning 11 853 rows
Farming Method
Averaged 35 seconds
Normal Method
Averaged 94 seconds
Testing and Results
01:17.760
01:09.120
01:00.480
00:51.840
00:43.200
00:34.560
00:25.920
00:17.280
00:08.640
00:00.000
Normal Method
Time to Query
1
12 23 34 45
56 67 78 89 100
Query Number
Query Time
Query Time
Farming Method
02:18.240
02:00.960
01:43.680
01:26.400
01:09.120
00:51.840
00:34.560
00:17.280
00:00.000
Time to Query
1
12 23 34 45
56 67 78 89 100
Query Number
“SELECT * FROM ping WHERE (ip = 2464643887) OR (ip =
2464643464) OR (ip = '2464639301') OR (ip = '2464625293')”
Hypothesis
Technique will create a more cost effective
database farm
Eradicate some problems associated with dealing
with large databases
Possible Extensions
Full access to DB via HTTPS
Front-end
Query construction wizard
Investigate partitioning techniques
“Intelligent” querying
Questions ?