Representing Data Elements
Download
Report
Transcript Representing Data Elements
Representing Data Elements
Fields, Records, Blocks
Variable-length Data
Modifying Records
Source: our textbook
1
Overview
Attributes are represented by sequences of
bytes, called fields
Tuples are represented by collections of
fields, called records
Relations are represented by collections of
records, called files
Files are stored in blocks, using specialized
data structures to support efficient
modification and querying
2
Representing SQL Data Types
integers and reals: built-in
CHAR(n): array of n bytes
VARCHAR(n): array of n+1 bytes (extra
byte is either string length or null char)
dates and times: fixed length strings
etc.
3
Representing Tuples
For now, assume all attributes (fields)
are fixed length.
Concatenate the fields
Store the offset of each field in schema
0
name
CHAR(30)
30 bytes
30
286
address
VARCHAR(255)
256 bytes
287
birthdate
gender
CHAR(1) DATE
10 bytes
1 byte
297
4
More on Tuples
Due to hardware considerations, certain
types of data need to start at addresses
that are multiples of 4 or 8
Previous example becomes:
0
name
CHAR(30)
30 bytes
+2
32
288
address
VARCHAR(255)
256 bytes
292
gender
CHAR(1)
1 byte
+3
birthdate
DATE
10 bytes
+2
304
5
Record Headers
Often it is convenient to keep some
"header" information in each record:
a pointer to schema information
(attributes/fields, types, their order in the
tuple, constraints)
length of the record/tuple
timestamp of last modification
6
Packing Records into Blocks
Start with block header:
timestamp of last modification/access
offset of each record in the block, etc.
Follow with sequence of records
May end with some unused space
header record 1 record 2
…
record n-1record n
7
Representing Addresses
Often addresses (pointers) are part of
records:
the application data in object-oriented databases
as part of indexes and other data structures
supporting the DBMS
Every data item (block, record, etc.) has two
addresses:
database address: address on the disk
(typically 8-16 bytes)
memory address, if the item is in memory
(typically 4 bytes)
8
Translation Table
Provides mapping from database
addresses to memory addresses for all
blocks currently in memory
Later we'll discuss how to implement it
9
Pointer Swizzling
When a block is moved from disk into main
memory, change all the disk addresses that
point to items in this block into main memory
addresses.
Need a bit for each address to indicate if it is
a disk address or a memory address.
Why? Faster to follow memory pointers (only
uses a single machine instruction).
10
Example of Swizzling
Disk
Main Memory
read into
main memory
swizzled
Block 1
unswizzled
11
Block 2
Swizzling Policies
Automatic swizzling: as soon as block is
brought into memory, swizzle all relevant
pointers
Swizzling on demand: only swizzle a
pointer if and when it is actually followed
No swizzling: always refer to translation
table
Programmer control
12
Automatic Swizzling
Locating all pointers within a block:
refer to the schema, which will indicate where
addresses are in the records
for index structures, pointers are at known
locations
Update translation table with memory
addresses of items in the block
Update pointers in the block (in memory)
with memory addresses, when possible, as
obtained from translation table
13
Unswizzling
When a block is moved from memory
back to disk, all pointers must go back
to database (disk) addresses
Use translation table again
Important to have an efficient data
structure for the translation table
14
Pinned Records and Blocks
A block in memory is pinned if it cannot
be safely written back to disk
Indicate with a bit in the block header
Reasons for pinning:
related to failure recovery (more later)
because of pointer swizzling
If block B1 has swizzled pointer to an
item in block B2, then B2 is pinned.
15
Unpinning a Block
Consider each item in the block to be
unpinned
Keep in the translation table the places
in memory holding swizzled pointers to
that item (e.g., with a linked list)
Unswizzle those pointers: use translation
table to replace the memory addresses
with database (disk) addresses
16
Variable Length Data
Data items with varying size (e.g., if
maximum size of a field is large but
most of the time the values are small)
Variable-format records (e.g., NULLs
method for representing a hierarchy of
entity sets as relations)
Records that do not fit in a block (e.g.,
an MPEG of a movie)
17
Variable-Length Fields
Store the fixed-length fields before the
variable-length fields in each record
Keep in the record header
record length
pointers to the beginnings of all the
variable-length fields
Book discusses variations on this idea
18
Variable Length Fields
other
header
info
to var len
field 2 to var len
field 3
fixed len fixed len var len
field 1
field 2 field 1
var len
field 2
var len
field 3
record
length
19
Variable-Format Records
Represent by a sequence of tagged
fields
Each tagged field contains
name
type
length, if not deducible from the type
value
20
Splitting Records Across Blocks
Called spanned records
Useful when
record size exceeds block size
putting an integral number of records in a block
wastes a lot of the block (e.g., record size is 51%
of block size)
Each record or fragment header contains
bit indicating if it is a fragment
if fragment then pointers to previous and next
fragments of the record (i.e., a linked list)
21
Record Modification
Modifications to records:
insert
delete
update
issues even with fixed-length records
and fields
even more involved with variable-length
data
22
Inserting New Records
If records need not be any particular
order, then just find a block with
enough empty space
Later we'll see how to keep track of all
the tuples of a given relation
But what if blocks should be kept in a
certain order, such as sorted on primary
key?
23
Insertion in Order
header
unused
record
4
record
3
record
2
record
1
If there is space in the block, then add the record
(going right to left), add a pointer to it (going left
to right) and rearrange the pointers as needed.
24
What if Block is Full?
Records are stored in several blocks, in
sorted order
One approach: keep a linked list of
"overflow" blocks for each block in the
main sequence
Another approach is described in the
book
25
Deleting Records
Try to reclaim space made available
after a record is deleted
If using an offset table, then rearrange
the records to fill in any hole that is left
behind and adjust the pointers
Additional mechanisms are based on
keeping a linked list of available space
and compacting when possible
26
Tombstones
What about pointers to deleted
records?
We place a tombstone in place of each
deleted record
Tombstone is permanent
Issue of where to place the tombstone
Keep a tombstone bit in each record
header: if this is a tombstone, then no
need to store additional data
27
Updating Records
For fixed-length records, there is no
effect on the storage system
For variable-length records:
if length increases, like insertion
if length decreases, like deletion except
tombstones are not necessary
28