Representing Data Elements

Download Report

Transcript Representing Data Elements

Representing Data Elements
Fields, Records, Blocks
Variable-length Data
Modifying Records
Source: our textbook
1
Overview
Attributes are represented by sequences of
bytes, called fields
Tuples are represented by collections of
fields, called records
Relations are represented by collections of
records, called files
Files are stored in blocks, using specialized
data structures to support efficient
modification and querying
2
Representing SQL Data Types
integers and reals: built-in
CHAR(n): array of n bytes
VARCHAR(n): array of n+1 bytes (extra
byte is either string length or null char)
dates and times: fixed length strings
etc.
3
Representing Tuples
For now, assume all attributes (fields)
are fixed length.
Concatenate the fields
Store the offset of each field in schema
0
name
CHAR(30)
30 bytes
30
286
address
VARCHAR(255)
256 bytes
287
birthdate
gender
CHAR(1) DATE
10 bytes
1 byte
297
4
More on Tuples
Due to hardware considerations, certain
types of data need to start at addresses
that are multiples of 4 or 8
Previous example becomes:
0
name
CHAR(30)
30 bytes
+2
32
288
address
VARCHAR(255)
256 bytes
292
gender
CHAR(1)
1 byte
+3
birthdate
DATE
10 bytes
+2
304
5
Record Headers
Often it is convenient to keep some
"header" information in each record:
 a pointer to schema information
(attributes/fields, types, their order in the
tuple, constraints)
 length of the record/tuple
 timestamp of last modification
6
Packing Records into Blocks
Start with block header:
 timestamp of last modification/access
 offset of each record in the block, etc.
Follow with sequence of records
May end with some unused space
header record 1 record 2
…
record n-1record n
7
Representing Addresses
Often addresses (pointers) are part of
records:
 the application data in object-oriented databases
 as part of indexes and other data structures
supporting the DBMS
Every data item (block, record, etc.) has two
addresses:
 database address: address on the disk
(typically 8-16 bytes)
 memory address, if the item is in memory
(typically 4 bytes)
8
Translation Table
Provides mapping from database
addresses to memory addresses for all
blocks currently in memory
Later we'll discuss how to implement it
9
Pointer Swizzling
When a block is moved from disk into main
memory, change all the disk addresses that
point to items in this block into main memory
addresses.
Need a bit for each address to indicate if it is
a disk address or a memory address.
Why? Faster to follow memory pointers (only
uses a single machine instruction).
10
Example of Swizzling
Disk
Main Memory
read into
main memory
swizzled
Block 1
unswizzled
11
Block 2
Swizzling Policies
Automatic swizzling: as soon as block is
brought into memory, swizzle all relevant
pointers
Swizzling on demand: only swizzle a
pointer if and when it is actually followed
No swizzling: always refer to translation
table
Programmer control
12
Automatic Swizzling
Locating all pointers within a block:
 refer to the schema, which will indicate where
addresses are in the records
 for index structures, pointers are at known
locations
Update translation table with memory
addresses of items in the block
Update pointers in the block (in memory)
with memory addresses, when possible, as
obtained from translation table
13
Unswizzling
When a block is moved from memory
back to disk, all pointers must go back
to database (disk) addresses
Use translation table again
Important to have an efficient data
structure for the translation table
14
Pinned Records and Blocks
A block in memory is pinned if it cannot
be safely written back to disk
Indicate with a bit in the block header
Reasons for pinning:
 related to failure recovery (more later)
 because of pointer swizzling
If block B1 has swizzled pointer to an
item in block B2, then B2 is pinned.
15
Unpinning a Block
Consider each item in the block to be
unpinned
Keep in the translation table the places
in memory holding swizzled pointers to
that item (e.g., with a linked list)
Unswizzle those pointers: use translation
table to replace the memory addresses
with database (disk) addresses
16
Variable Length Data
Data items with varying size (e.g., if
maximum size of a field is large but
most of the time the values are small)
Variable-format records (e.g., NULLs
method for representing a hierarchy of
entity sets as relations)
Records that do not fit in a block (e.g.,
an MPEG of a movie)
17
Variable-Length Fields
Store the fixed-length fields before the
variable-length fields in each record
Keep in the record header
 record length
 pointers to the beginnings of all the
variable-length fields
Book discusses variations on this idea
18
Variable Length Fields
other
header
info
to var len
field 2 to var len
field 3
fixed len fixed len var len
field 1
field 2 field 1
var len
field 2
var len
field 3
record
length
19
Variable-Format Records
Represent by a sequence of tagged
fields
Each tagged field contains
 name
 type
 length, if not deducible from the type
 value
20
Splitting Records Across Blocks
Called spanned records
Useful when
 record size exceeds block size
 putting an integral number of records in a block
wastes a lot of the block (e.g., record size is 51%
of block size)
Each record or fragment header contains
 bit indicating if it is a fragment
 if fragment then pointers to previous and next
fragments of the record (i.e., a linked list)
21
Record Modification
Modifications to records:
 insert
 delete
 update
issues even with fixed-length records
and fields
even more involved with variable-length
data
22
Inserting New Records
If records need not be any particular
order, then just find a block with
enough empty space
Later we'll see how to keep track of all
the tuples of a given relation
But what if blocks should be kept in a
certain order, such as sorted on primary
key?
23
Insertion in Order
header
unused
record
4
record
3
record
2
record
1
If there is space in the block, then add the record
(going right to left), add a pointer to it (going left
to right) and rearrange the pointers as needed.
24
What if Block is Full?
Records are stored in several blocks, in
sorted order
One approach: keep a linked list of
"overflow" blocks for each block in the
main sequence
Another approach is described in the
book
25
Deleting Records
Try to reclaim space made available
after a record is deleted
If using an offset table, then rearrange
the records to fill in any hole that is left
behind and adjust the pointers
Additional mechanisms are based on
keeping a linked list of available space
and compacting when possible
26
Tombstones
What about pointers to deleted
records?
We place a tombstone in place of each
deleted record
Tombstone is permanent
Issue of where to place the tombstone
Keep a tombstone bit in each record
header: if this is a tombstone, then no
need to store additional data
27
Updating Records
For fixed-length records, there is no
effect on the storage system
For variable-length records:
 if length increases, like insertion
 if length decreases, like deletion except
tombstones are not necessary
28