EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS
Download
Report
Transcript EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS
EXTERNAL SORTING ALGORITHMS
AND IMPLEMENTATIONS
05011004 Ayhan KARGIN
05011027 Ahmet MERAL
Contents
External Sorting Needs and Usage Areas
External Sorting Algorithms
Environments for Implementing External Sorting Algorithms
Used Technologies
Phases of External Sorting
K-Way Merge Sort
Multi-Step K-Way Merge Sort
Replacement Selection Sort
SimpleDB
Layered Components of SimpleDB
Classes of SimpleDB
Query Layer of SimpleDB
Relational Algebra, that SimpleDB Supports
Relational Algebra
Preparatory Work
External Sorting on SimpleDB
External Sorting Needs
and Usage Areas
DBMS
Group By, Join, Order By
Data Warehouse (ETL)
Data Mining
Data Processing
External Sorting Algorithms
K-Way Merge Sort
Multi-Step K-Way Merge Sort
Replacement Selection Sort
Environments for Implementing
External Sorting Algorithms
MinSQL
PosgreSQL
SimpleDB
Used Technologies
Java VM
Java SE
JDBC
Java RMI
Eclipse
Phases of External Sorting
Run Generation Phase
Merge Phase
K-Way Merge Sort
Multi-Step K-Way Merge Sort
Multi-Step K-Way Merge Sort
Replacement Selection
Replacement Selection
Stage Area
Disc Access Count
Run Time
K (Run Count)
1x
3817
0.690
17
2x
420
0.742
8
3x
252
1.049
6
4x
258
1.420
4
5x
266
1.952
4
6x
279
2.333
3
Test results depending on stage area size
Note: Main memory is 8x.
SimpleDB
The Client Side
The Basic Server
That contains the JDBC interfaces and
implements the JDBC driver
Which provides complete funcionality of
DB but ignores efficiency issues
Extensions
To the basic server that support
efficient query processing.
Layered Components of SimpleDB
Remote
Parse
Create an execution strategy for an SQL statement, and translate it to a
relational algebra plan.
Query
Extract the tables, fields, and predicate mentioned in an SQL statement.
Planner
Perform JDBC requests received from clients.
Implement queries expressed in relational algebra.
Metadata
Maintain metadata about the tables in the database, so that its records and
fields are accessible.
Layered Components of SimpleDB
Record
Transaction
Maintain a cache of pages in memory to hold recently-accessed user
data.
Log
Support concurrency by restricting page access. Enable recovery by
logging changes to pages.
Buffer
Provide methods for storing data records in pages
Append log records to the log file, and scan the records in the log file.
File
Read and write between file blocks and memory pages.
Classes of SimpleDB
Classes of SimpleDB
Classes of SimpleDB
Query Layer of SimpleDB
Relational Algebra
Select
Project
Product
Sort
Relational Algebra,
that SimpleDB Supports
Relational Algebra
SQL
Select
Project
Product
Sort
Where
Select
Join
Order By
Relational Algebra
Select SId, SName, DName from
STUDENT, DEPT where
MajorId=DId and DName='math'
Q1: Product (STUDENT, DEPT)
Q2: Select (Q2, MajorId=DId )
Q3: Select (Q2, DName='math')
Q4: Project (Q3, {SId, SName,
DName })
Relational Algebra
Preparatory Work
We have to introduce order by operator (in SQL) to parser layer
of SimpleDB.
The columns, which will be sort, must be presenced in QueryData
class in structural form.
If sorting will be happened, sorting plans must be called by
BasicQueryPlanner class.
Calculating number of random accesses per file on FileMgr layer.
Calculating duration of transaction with adding start time and end
time to Transaction class.
Calculating number of unsorted records on each transaction.
We may have to compare, copy and exchange congeneric records
on different scans. So, RecordExchange class is written for these
purposes.
StepCalculator class is written. It splits any number to two close
integer multipliers. For example; from 29, 6 and 5; from 49, 7
and 7; from 50, 8 and 7; etc.
External Sorting on SimpleDB
Materialize Plan
Edward Sciore’s Merge Sort Plan&Scan
K-Way Merge Sort Plan&Scan
Multi-Step K-Way Merge Sort
Plan&Scan
Replacement Selection Sort Plan&Scan
Multi-Step Replacement Selection Sort
Plan&Scan
Algorithms Choice Dependency