EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS

Download Report

Transcript EXTERNAL SORTING ALGORITHMS AND IMPLEMENTATIONS

EXTERNAL SORTING ALGORITHMS
AND IMPLEMENTATIONS
05011004 Ayhan KARGIN
05011027 Ahmet MERAL
Contents
















External Sorting Needs and Usage Areas
External Sorting Algorithms
Environments for Implementing External Sorting Algorithms
Used Technologies
Phases of External Sorting
K-Way Merge Sort
Multi-Step K-Way Merge Sort
Replacement Selection Sort
SimpleDB
Layered Components of SimpleDB
Classes of SimpleDB
Query Layer of SimpleDB
Relational Algebra, that SimpleDB Supports
Relational Algebra
Preparatory Work
External Sorting on SimpleDB
External Sorting Needs
and Usage Areas

DBMS




Group By, Join, Order By
Data Warehouse (ETL)
Data Mining
Data Processing
External Sorting Algorithms



K-Way Merge Sort
Multi-Step K-Way Merge Sort
Replacement Selection Sort
Environments for Implementing
External Sorting Algorithms



MinSQL
PosgreSQL
SimpleDB
Used Technologies





Java VM
Java SE
JDBC
Java RMI
Eclipse
Phases of External Sorting


Run Generation Phase
Merge Phase
K-Way Merge Sort
Multi-Step K-Way Merge Sort
Multi-Step K-Way Merge Sort
Replacement Selection
Replacement Selection
Stage Area
Disc Access Count
Run Time
K (Run Count)
1x
3817
0.690
17
2x
420
0.742
8
3x
252
1.049
6
4x
258
1.420
4
5x
266
1.952
4
6x
279
2.333
3
Test results depending on stage area size
Note: Main memory is 8x.
SimpleDB

The Client Side


The Basic Server


That contains the JDBC interfaces and
implements the JDBC driver
Which provides complete funcionality of
DB but ignores efficiency issues
Extensions

To the basic server that support
efficient query processing.
Layered Components of SimpleDB

Remote


Parse


Create an execution strategy for an SQL statement, and translate it to a
relational algebra plan.
Query


Extract the tables, fields, and predicate mentioned in an SQL statement.
Planner


Perform JDBC requests received from clients.
Implement queries expressed in relational algebra.
Metadata

Maintain metadata about the tables in the database, so that its records and
fields are accessible.
Layered Components of SimpleDB

Record


Transaction


Maintain a cache of pages in memory to hold recently-accessed user
data.
Log


Support concurrency by restricting page access. Enable recovery by
logging changes to pages.
Buffer


Provide methods for storing data records in pages
Append log records to the log file, and scan the records in the log file.
File

Read and write between file blocks and memory pages.
Classes of SimpleDB
Classes of SimpleDB
Classes of SimpleDB
Query Layer of SimpleDB

Relational Algebra




Select
Project
Product
Sort
Relational Algebra,
that SimpleDB Supports
Relational Algebra
SQL
Select
Project
Product
Sort
Where
Select
Join
Order By
Relational Algebra


Select SId, SName, DName from
STUDENT, DEPT where
MajorId=DId and DName='math'
Q1: Product (STUDENT, DEPT)
Q2: Select (Q2, MajorId=DId )
Q3: Select (Q2, DName='math')
Q4: Project (Q3, {SId, SName,
DName })
Relational Algebra
Preparatory Work








We have to introduce order by operator (in SQL) to parser layer
of SimpleDB.
The columns, which will be sort, must be presenced in QueryData
class in structural form.
If sorting will be happened, sorting plans must be called by
BasicQueryPlanner class.
Calculating number of random accesses per file on FileMgr layer.
Calculating duration of transaction with adding start time and end
time to Transaction class.
Calculating number of unsorted records on each transaction.
We may have to compare, copy and exchange congeneric records
on different scans. So, RecordExchange class is written for these
purposes.
StepCalculator class is written. It splits any number to two close
integer multipliers. For example; from 29, 6 and 5; from 49, 7
and 7; from 50, 8 and 7; etc.
External Sorting on SimpleDB






Materialize Plan
Edward Sciore’s Merge Sort Plan&Scan
K-Way Merge Sort Plan&Scan
Multi-Step K-Way Merge Sort
Plan&Scan
Replacement Selection Sort Plan&Scan
Multi-Step Replacement Selection Sort
Plan&Scan
Algorithms Choice Dependency