Database Farming

Download Report

Transcript Database Farming

Database Farming
For Improved Performance
Presented By: Russell Yong
Supervisor: Prof Wentworth
Problem at Hand
 Database solution for a large corporation
 Expensive software (Oracle Database
Enterprise Edition +- US $40k)
 Top-end hardware
 Microsoft’s SQL Server 2000
 Not same level of confidence
Solution
 Adapt the popular technique of backend
server farming
 Apply it to databases – to create a high
performance database web service
 Backend setup being invisible to the user
Hypothesis
 Technique will create a more cost effective
database farm
 Eradicate some problems associated with
dealing with large databases
Our Plan
Standard 3-Tier Model
Our Plan
Adapted 3-Tier Model
Conceptually
Pool of Connections
http request
Web-server
DataSet
DataSet
http request
DataSet
DataSet
DataSet
http request
Clients
Multiple Threads
of Execution
Web Service
Farm of Databases
DataSet Object
 In-memory cache of data
 Comparable to a mini-database
 Multiple tables
 Relationships
 Constraints
DataSet Object
 Easily serialized into and back out of XML
 Structure (tables, columns, etc) described in
an XML schema
 View and manipulate using either relational or
XML methods (unified programming model)
 Compatible with other XML speaking
applications
DataSet Object
 Disconnected Model
 Sub-queries fill individual datasets
 Collector Object
 Collect and merge individual sub-queries
 Returned to the client
Typed DataSet




Has an implicit schema
Allows for more efficient filling
Faster access
Created via Form Designer,
programmatically, or at run time via XSD
XSD File
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="PingDataSet" xmlns=""
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemasmicrosoft-com:xml-msdata">
<xs:element name="PingDataSet" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="PingInfo">
<xs:complexType>
<xs:sequence>
<xs:element name="ip" type="xs:string" />
<xs:element name="seen" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:unique name="PK_PingPrimaryKey" msdata:PrimaryKey="true">
<xs:selector xpath=".//PingInfo" />
<xs:field xpath="ip" />
<xs:field xpath="seen" />
</xs:unique>
</xs:element>
</xs:schema>
Implications for Web-Applications




Resource sensitive approach
“Bulk” approach to communication
Access local cache
Ideal for non-volatile data
Implications for Web-Applications
 Optimistic concurrency model






Most applications ?
Improved performance (no locking)
No persistent connection required (resources)
Minimize required server resources
Connections used more effectively
Exceptions are dealt with accordingly
Our Database
 Excess of “10 Million Records”
 Network traffic information
 Partitioned in 10 segments
 Initial difficulty
 Distributed over 3 machines (SQL Server 2000)
 Simulating a completely distributed environment
Data Providers
 SQL Server .NET Data Providers
 SqlConnection
 SqlDataAdapter
 SqlCommand
 OLE DB .NET Data Providers
 ODBC .NET Data Providers (separate
download)
Data Providers
Our Framework
MyQueryHandler






Farming Layer
An instance for each individual user query
Distributor (spawns threads)
Collector, merging DataSets as they return
All encompassing DataSet
Pluggable
MyThreadHandler
 Represents individual threads
 Fills separate DataSets for each of the
partitions in the farm
 Returns DataSet to QueryHandler
 Pluggable
Specifying Queries
 Couple queries hard-coded
 Defined according to a parameter
 Future Extensions…
Tests and Results
 Ran queries 100 times
 Gauge mean
 Filter out any possible influencing factors
 Influencing factors
 Network traffic
 Active machines
Testing and Results
 Simple query

“SELECT * FROM ping WHERE (ip = 2464643887) OR (ip =
2464643464) OR (ip = '2464639301') OR (ip = '2464625293')”
 Returning 11 853 rows
 Farming Method
 Averaged 35 seconds
 Normal Method
 Averaged 94 seconds
Testing and Results
01:17.760
01:09.120
01:00.480
00:51.840
00:43.200
00:34.560
00:25.920
00:17.280
00:08.640
00:00.000
Normal Method
Time to Query
1
12 23 34 45
56 67 78 89 100
Query Number
Query Time
Query Time
Farming Method
02:18.240
02:00.960
01:43.680
01:26.400
01:09.120
00:51.840
00:34.560
00:17.280
00:00.000
Time to Query
1
12 23 34 45
56 67 78 89 100
Query Number
“SELECT * FROM ping WHERE (ip = 2464643887) OR (ip =
2464643464) OR (ip = '2464639301') OR (ip = '2464625293')”
Hypothesis
 Technique will create a more cost effective
database farm 
 Eradicate some problems associated with dealing
with large databases 
Possible Extensions
 Full access to DB via HTTPS
 Front-end
 Query construction wizard
 Investigate partitioning techniques
 “Intelligent” querying
Questions ?