Transcript NEUStore

NEUStore: A Simple Java Package
for the Construction of Disk-based,
Paginated, and Buffered Indices
Donghui Zhang
CCIS, Northeastern University
http://www.ccs.neu.edu/home/donghui/research/neustore/
A DB index should be stored as a disk file.
Let’s check what
Java.io.RandomAccessFile
provides…
java.io.RandomAccessFile
• Views a file as a sequential list of bytes
• Has operators seek, read, write
// In the file “./mydata”, copy bytes 10-19 to 0-9.
RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”);
byte[] buf = new byte[10];
file.seek(10); file.read(buf);
file.seek(0); file.write(buf);
java.io.RandomAccessFile
• Views a file as a sequential list of bytes
buf: 10 bytes in memory
// In the file “./mydata”, copy bytes 10-19 to 0-9.
RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”);
byte[] buf = new byte[10];
file.seek(10); file.read(buf);
file.seek(0); file.write(buf);
java.io.RandomAccessFile
• Views a file as a sequential list of bytes
buf: 10 bytes in memory
// In the file “./mydata”, copy bytes 10-19 to 0-9.
RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”);
byte[] buf = new byte[10];
file.seek(10); file.read(buf);
file.seek(0); file.write(buf);
java.io.RandomAccessFile
• Views a file as a sequential list of bytes
buf: 10 bytes in memory
// In the file “./mydata”, copy bytes 10-19 to 0-9.
RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”);
byte[] buf = new byte[10];
file.seek(10); file.read(buf);
file.seek(0); file.write(buf);
java.io.RandomAccessFile
• Views a file as a sequential list of bytes
buf: 10 bytes in memory
// In the file “./mydata”, copy bytes 10-19 to 0-9.
RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”);
byte[] buf = new byte[10];
file.seek(10); file.read(buf);
file.seek(0); file.write(buf);
java.io.RandomAccessFile
• Views a file as a sequential list of bytes
buf: 10 bytes in memory
// In the file “./mydata”, copy bytes 10-19 to 0-9.
RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”);
byte[] buf = new byte[10];
file.seek(10); file.read(buf);
file.seek(0); file.write(buf);
There is a gap between
what Java.io.RandomAccessFile provides
and
what a DB index implementation needs.
Consider implementing a B+-tree
class BPlusTree {
private RandomAccessFile file;
private DBBuffer buffer;
private int rootPageID;
public BPlusTree(String filename, DBBuffer buffer);
public void Insert(Key, Data);
public void Delete(Key);
public Data Search(Key);
}
Consider implementing a B+-tree
class BPlusTree {
……
public BPlusTree(String filename, DBBuffer buffer) {
file = new RandomAccessFile(filename, “rw”);
this.buffer = buffer;
rootPageID = ……
}
……
}
public void Insert(Key, Data) {
// Find the leaf page where the record should be inserted,
// pinning all pages along the path.
int pageID = rootPageID;
DBPage page = buffer.readPage(file, rootPageID);
while ( page.nodeType != LEAF ) {
buffer.pinPage(file, pageID);
pageID = ((BPlusTreeIndexPage)page).Search(Key);
page = buffer.readPage(file, pageID);
}
buffer.pinPage(file, pageID);
// Insert the record into the leaf page
((BPlusTreeLeafPage)page).Insert(Key, Data);
// Handle Overflow…
}
public void Insert(Key, Data) {
// Continued
……………………
if (page.isOverflow()) {
int newPageID = allocate(); // in Delete, freePage()
DBPage newPage = new DBPage(newPageID);
Move half of the records from page to newPage.
Read the right sibling and fix the sibling pointers.
buffer.writePage(file, pageID, page);
buffer.unpin(file, pageID);
buffer.writePage(file, newPageID, newPage);
buffer.writePage for the right sibling page.
Recursive split if needed.
}
Unpin all pages.
}
A DB index…
• Uses a random access disk file to store data.
• Views a file as a sequential list of pages.
0
1
2
3
4
5
6
7
• Needs to support allocate and freePage.
• Needs to manage the empty pages in the
middle.
• Needs to utilize a buffer…
A set of DB indices
readPage
writePage
file1, page0
file2, page3
A DB buffer
file1
0
1
2
3
4
5
6
7
file2
0
1
2
3
4
5
6
7
file3
0
1
2
3
4
5
6
7
Summary of the gap
• Java.io.RandomAccessFile provides:
seek(), read(), write()
• Implementing a DB index needs:
allocate(), freePage(),
buffer.readPage(), buffer.writePage()
NEUStore fills the gap!
• class DBIndex {
allocate()
freePage()
}
• class DBBuffer{
readPage()
writePage()
}
Download and setup
• See the PDF file!
• Download the .tgz file and unzip. A
neustoreroot directory will be created.
• Use Eclipse to create a new project.
• Use javadoc.
• Smartly apply certain changes due to newer
version of Java and Eclipse.
neustoreroot has 3 directories
• doc: the document generated using Javadoc.
• neustore: the storage package. It contains:
– base: the base of the storage package. It contains
definitions of core classes DBIndex, DBBuffer, DBPage,
and so on.
– heapfile: a generic implementation of the heapfile.
Here the class HeapFile was derived from DBIndex.
• test: sample test programs. It contains:
– naiveheapfile: implementation of a naive heapfile, and
the test program for it.
– heapile: test program for using neustore.heapfile.
Packages and classes
class DBPage
• Abstract base class for a memory-version page.
• E.g. derive BPlusTreeIndexPage from it.
• Constructor:
DBPage(int nodeType, int pageSize);
creates a DBPage.
nodeType -1 and 0 are preserved.
1 and above can be user-defined. E.g. nodeType 1
means BPlusTreeIndexPage and nodeType 2
means BPlusTreeLeafPage.
class DBPage
• Convert to/from disk-version page (byte[]):
– abstract void read( byte[] page );
Reads the content of this DBPage from a byte array.
– abstract void write( byte[] page );
Writes the content of this DBPage to a byte array.
• Example usage of neustore.base.ByteArray:
byte[] b = ……
ByteArray ba = new ByteArray( b, ByteArray.READ );
int firstInt = ba.readInt( );
int secondInt = ba.readInt( );
class DBBuffer
• Constructor:
DBBuffer(int bufferSize, int pageSize);
bufferSize: # pages
pageSize: # bytes/page
• Some useful functions:
–
–
–
–
pin()
unpin()
clearIOs()
getIOs()
DBBuffer.readPage(file, pageID)
• What should the return type be?
• DBPage -- problematic to implement:
DBPage readPage(RandomAccessFile file, int pageID) {
if (file, pageID) exists in the buffer
return the corresponding DBPage object;
else {
choose a buffered page to throw away;
read from the file pageSize bytes to a byte[];
convert to a DBPage and return; ?? but how??
}
}
DBBuffer.readPage(file, pageID)
• A DBBuffer is general. It buffers pages
from different indices.
• So whenever a page is physically read from
disk, it should be kept as a byte[], period.
• What if store byte[] in DBBuffer?
– Works for C++: cast (char*) to (MyDBPage*)
– Extremely inefficient for Java!! Every time to
access a page, needs to convert from byte[].
Solution: allow storing in both types!
• When physically read, store as byte[]. Later when the
application calls writePage(file, pageID, DBPage dbpage),
store as DBPage.
• return type of readPage should be:
public class DBBufferReturnElement {
public boolean parsed;
public Object object;
public int nodeType;
}
parsed means the object is a DBPage; and non-parsed
means the object is a byte[].
DBBuffer’s abstract functions
• add(): to add a new page into buffer;
• find(): to find a buffered page;
• flush(): to empty the buffer while writing dirty
pages to disk.
• A derived class should implement them.
• neustore.base includes a derived class: LRUBuffer.
But the existing implementation did not use
hashing.
DBIndex
• Constructor:
DBIndex(DBBuffer buffer, String filename,
boolean isCreate)
Create or open a DBIndex.
• int allocate()
• void freePage(int pageID)
• void close(): // should be called before terminating
the application to write head page and flush buffer.
DBIndex
• The head page (having pageID=0) uses the first
OVERHEAD=16 bytes to store four integers:
– -1 : in every disk page, the first four bytes is nodeType.
Type -1 means head page. Type 0 means empty page.
– pagesize
– numOfPages: number of non-empty pages
– free: pageID of the first empty page. All empty pages
are linked together.
• The rest (pageSize – OVERHEAD) bytes store
user-specified meta data, e.g. rootPageID.
Meta data in DBIndex head page
• To manage the meta data, a derived class
needs to implement the three abstract
functions:
– initIndexHead(): called when the index is built;
e.g. set rootPageID = allocate();
– readIndexHead(): called when the index is
opened; e.g. read existing rootPageID;
– writeIndexHead(): called right before the index
is closed; e.g. write the updated rootPageID.
Case study 1: naïve heap file
• Simplified requirement:
– Insert = add an integer at the end of the file.
– Search = tell whether an integer exists.
• Three files in neustoreroot/test/naiveheapfile:
– NaiveHeapFilePage.java: derive from DBPage
– NaiveHeapFile.java: derive from DBIndex
– TestNaiveHeapFile.java: contains main()
– NaiveHeapFilePage.java: derive from DBPage
• Needs to implement abstract functions:
read() and write()
– NaiveHeapFile.java: derive from DBIndex
– TestNaiveHeapFile.java: contains main()
public class NaiveHeapFilePage extends DBPage {
private Vector<Integer> records;
public NaiveHeapFilePage( int _pageSize ) {
super(1, _pageSize);
records = new Vector<Integer>();
}
public int numRecs() { return records.size(); }
public void insert( int key ) {
records.add( new Integer(key) );
}
public boolean search( int key ) {……}
protected void read( byte[] b ) {……}
protected void write( byte[] b ) {……}
}
public class NaiveHeapFilePage extends DBPage {
……
public boolean search( int key ) {
for ( Integer e: records ) {
if (e.intValue() == key) return true;
}
return false;
}
protected void read( byte[] b ) {……}
protected void write( byte[] b ) {……}
}
public class NaiveHeapFilePage extends DBPage {
……
protected void read( byte[] b ) {
ByteArray ba = new ByteArray(b, ByteArray.READ);
ba.readInt(); // nodeType, ignore
int numRecs = ba.readInt();
records.removeAllElements();
for ( int i=0; i<numRecs; i++ ) {
int key = ba.readInt();
records.add(new Integer(key));
}
}
protected void write( byte[] b ) {……}
}
public class NaiveHeapFilePage extends DBPage {
……
protected void write( byte[] b ) {
ByteArray ba=new ByteArray(b, ByteArray.WRITE);
int numRecs = records.size();
ba.writeInt( nodeType );
ba.writeInt( numRecs );
for ( Integer key: records ) {
ba.writeInt( key.intValue() );
}
}
}
– NaiveHeapFilePage.java: derive from DBPage
– NaiveHeapFile.java: derive from DBIndex
• Needs to implement abstract functions
initIndexHead(), readIndexHead() and
writeIndexHead().
– TestNaiveHeapFile.java: contains main()
public class NaiveHeapFile extends DBIndex {
private int pageCapacity; // max # recs in a page
private int lastPageID; // ID for last page
private NaiveHeapFilePage lastPage;
public NaiveHeapFile(DBBuffer buffer, String filename,
boolean isCreate) {……}
protected void initIndexHead() {……}
protected void readIndexHead(byte[] head) {……}
protected void writeIndexHead(byte[] head) {……}
public void insert( int key ) {……}
public boolean search( int key ) {……}
protected NaiveHeapFilePage myReadPage( int pageID ) {……}
}
public class NaiveHeapFile extends DBIndex {
……
public NaiveHeapFile(DBBuffer _buffer, String filename,
boolean isCreate) {
super(_buffer, filename, isCreate);
pageCapacity = (pageSize-8)/4; // nodeType & numRecs
if(isCreate)
lastPage = new NaiveHeapFilePage(pageSize);
else
lastPage = myReadPage(lastPageID);
}
……
}
public class NaiveHeapFile extends DBIndex {
……
protected void initIndexHead() {
lastPageID = allocate();
}
protected void readIndexHead(byte[] head) {
ByteArray ba = new ByteArray(head, ByteArray.READ);
lastPageID = ba.readInt();
}
protected void writeIndexHead(byte[] head) {
ByteArray ba = new ByteArray(head, ByteArray.WRITE);
ba.writeInt(lastPageID);
}
……
}
public class NaiveHeapFile extends DBIndex {
……
public void insert( int key ) {
if ( lastPage.numRecs() >= pageCapacity ) {
lastPageID = allocate();
lastPage = new NaiveHeapFilePage(pageSize);
}
lastPage.insert( key );
buffer.writePage(file, lastPageID, lastPage);
}
……
}
public class NaiveHeapFile extends DBIndex {
……
public boolean search( int key ) {
for ( int currentPageID=1;
currentPageID<=lastPageID;
currentPageID++ )
{
NaiveHeapFilePage currentPage =
myReadPage(currentPageID);
if ( currentPage.search(key) ) return true;
}
return false;
}
……
}
public class NaiveHeapFile extends DBIndex {
……
protected NaiveHeapFilePage myReadPage( int pageID ) {
NaiveHeapFilePage page;
DBBufferReturnElement ret = buffer.readPage(file, pageID);
if ( ret.parsed ) {
page = (NaiveHeapFilePage)ret.object;
}
else {
page = new NaiveHeapFilePage(pageSize);
page.read((byte[])ret.object);
}
return page;
}
}
– NaiveHeapFilePage.java: derive from DBPage
– NaiveHeapFile.java: derive from DBIndex
– TestNaiveHeapFile.java: contains main()
public class TestNaiveHeapFile {
public static void main(String[]args) {
System.out.println( "***** Creating index *****" );
LRUBuffer buffer = null;
buffer = new LRUBuffer( 5, 20 );
NaiveHeapFile hp = new NaiveHeapFile(buffer, filename,
DBIndex.CREATE);
System.out.println( "***** Inserting keys *****");
hp.insert(10);
System.out.println( "***** Searching keys *****" );
hp.search(10);
System.out.println( “***** Closing index ******” );
hp.close();
}
}
Limitations of the naïve heap file
1. A record has no data (only key).
2. No deletion is allowed.
3. Key type is fixed (as integer).
Naïve heap file  neustore.heapfile
1.
2.
3.
A record has no data (only key).
Solution: define a private class HeapFileRecord, inside
class HeapFilePage, which contains a key and a data.
No deletion is allowed.
Solution: allow deletion. To enable leaving space in the
middle, maintain a double linked list of full pages and a
separate double linked list of non-full pages.
Key type is fixed (as integer).
?? How to implement a generic heapfile, which can be
used for any type of key (and data)?
Generic key (data)
• Define two interfaces Key and Data. Use them as key and
data types to HeapFile (and B+-tree you are about to
implement).
• Two important member functions of both Key and Data
are:
– public abstract void read(DataInputStream in):
Reads the content of the Key object from an input stream;
Example use: to convert a byte[] to a HeapFilePage, read the
number of records in the page, and then repeatedly generate a Key
object (and a Data object) and call its read function.
– public abstract void write(DataOutputStream out):
Writes the content of the object to an output stream.
• neustore.base implements IntKey and StringData.
A challenge in generic key (data)
• In the implementation of the (generic) heap file, we need to
create a Key object and call its read() member function.
However, Key is an interface and therefore
Key key = new Key()
does not work!!
• Solution: require the constructor of HeapFile (and also
HeapFilePage) to take as input a sample key and a sample
data.
When a new Key object is needed: sampleKey.clone();
Summary