Creating and maintaining a Database(powerpoint)

Download Report

Transcript Creating and maintaining a Database(powerpoint)

Creating and Maintaining a
Database
The DBA’S Job
New York Conference 2005
Tasks

Design

– Logical Design
– Physical Design
– Documentation

Implementation
–
–
–
–
Test
Performance
Security
Concurrent Updates
Maintenance
– Backup
– Recovery
– Data Integrity

New Releases
– SIR
– Application
Designing a
Relational Database


Normalization
– Eliminate redundant data
– Identify data dependencies – keys
1st Normal Form
– One value per column
– Unique primary key

2nd Normal Form
– No subsets of data in multiple rows of a table

3rd Normal Form
– All columns fully dependant on primary key
Example
Order #
Customer #
Address
Part #
Description
Unit Price
Quantity
Total
1234
409
xxxxxx
10
xxxxxx
$10.00
5
50.00
1234
409
xxxxxx
20
xxxxxx
$15.00
3
45.00
Possible Tables

Order - Order #
– Customer #

Order Item - Order # Line #
– Product Code
– Qty
– Unit Price ?

Customer – Customer #
– Address

Product – Product #
– Description
– Unit Price
Keys
Must be unique
 Good if real world

– Employee Id/Product Code etc.
May not be the only access required
 Should be short
 Avoid unformatted alphabetic
 If subordinate repeating group,
consider sequence number

Normalized Implementation

Know the rules
– Know the application

Alternatives
– How many repeats of a column/group?
– Dependent data volatility/convenience

Document
– Variables – labels, descriptions
– Records – keys, variables, foreign keys
SIR Schema

Case Definition
– Case Id
– Max Counts

Record Definition
–
–
–
–

Key Fields
Max Counts
Default Security
Variables within records
Documentation
command for case and
record

Variable Definition
–
–
–
–
–
–
–
–
Format & Position
Missing Values
Valid Values
Value Labels
Categorical Vars
Variable Ranges
Variable Label
Extended label for
variable documentation
– Variable Security
Schema functions in PQL



60+ database functions
30 tabfile functions
Examples:
–
–
–
–
–
–
–
–
–
NRECS
RECNAME
NKEY
KEYNAME
NVARS
VARNAME
VARLABSC
VFORMAT
VTYPE

Sec Index Functions
–
–
–
–
DBINDS
DBINDR
DBINDV
DBINDT
Quick Data Dictionary

Four Record Types
– Variables
– Records
– Record keys
– Record data
Populate from any database
 Check consistency

Example Data Dictionary


Create
Populate from MNYR
– 55 record types
– 2216 variables in records

Check consistent use of variables
– Labels
– Formats
– Types


Identify foreign keys
Look at secondary indexes
SIR Structures

Multiple Database
– Until SIR2000 exactly one database in SIR session
– Design suggests separate databases for separate
hierarchies
– Had to use ‘dummy’ cases in single database

Inverted Lists
– Until SIR2002 no secondary index
– Had to use ‘dummy’ cases for inverted list

Auto Increment Keys
Physical Structure
Single Data File
 Two types of blocks

– Index




Contain keys plus pointers to other blocks
Single top level block
From one to six further levels
Bottom level points to data block
– Data

Contain keys and data
SIR Data File
Top Level Index
Index Level 1
Bottom Level Index
Data Blocks
Index Level 1
Index Level 1
Data Record

Header
– Size
– Update level
– Lock status

Separate Key in front of record
– All keys same size in single database
– Case id, record number,record key fields
– Special so can be searched

Record organized by data format
– Real8, real4, I4, I2, I1, Character
CIR
One per case
 Count for each record type

– e.g. max rec types – 100
100 integers
– I1 – up to 123
– I2 – up to 32,000
– I4 – over this

Common variables
Size of Blocks

Index
– Calculated from key size and number of
records
– Minimum 2K (253 dwords) Maximum 32K

Data
– Calculated from maximum record size and
number of records
– Minimum 2K (254 dwords) Maximum 32K
– Minimum 4 records per block
Index Calculation

Example: Key Length 16
Number of records 1 million
– At 4 per block need 250,000 blocks
– Each index entry takes 3 dwrds

–
–
–
–
–
Key in dwords + 1 dwrd for pointer & count)
Minimum block holds 253/3 – 84 entries
Top level Single block 84 entries
Second level 84 blocks each with 84 entries
Two level index points to 7,056 data blocks
Three level points to 592,704 data blocks
Data Block Growth


Add first record
Start with 3 blocks

– Find data block
– Put record in block in
correct key sequence
– If new record first in
block, update higher
level that pointed
– Top Level Index



1 entry
key of record
Points to second level
– Second level index



1 entry
Key of record
Points to data block
– Data Block - 1 entry
Add records

Block Too Big
– Split into two blocks
– Add new key to higher
level
Loading Factor


How to split block
Records being added randomly
– Split block in middle
– Any block added to likely to have space
– .5 loading factor

Records added in ascending key
– Leave original block as full as possible
– All adding to new block
– .99 loading factor
Suggested Loading Factors

Standard Updates - .5
– If set very high and activity all on original block,
lots of empty new blocks

IMPORT - .99
– Let standard updates split blocks when needed

RELOAD - .99
– no block splitting
– Block filled to loading factor
– Space for largest record
List Stats Info
Number of Index Levels
2
 Max Entries Per Index Block 42
 Index/Data Block Size
253/3314
 Active/Inactive Data Blocks
92/0
 Active/Inactive Index Blocks
4/0
 Keysize In Bytes
34
 Min/Max Record Size
0/808

Inactive Blocks
New Blocks added at end
 List maintained of empty blocks
 Re-used when new block needed
 All records deleted in stand alone
 Block update strategy in Master

Master
Allow multiple users to update a
database concurrently
 Intended primarily for multiple
interactive users
 Communicates via TCP/IP
 Machine dependent database access
 Provides a consistent database view for
independent retrievals from database

Master Operation

Start Master
– Starts with an address
– Waits for client message
– Does nothing else, NO database access
Master is NOT permanently connected
to any specific database
 Client tells master which database to
connect to

Database Access

Database is opened and closed during
SIR session as needed
– During PQL retrievals
– During utilities (No master)


Batch Data Input
Export, Unload, Spreadsheet, …
– During schema updates (No master)
– ‘Old’ Forms while form is running
Database Access

Database open for write
– Single User
– Master

- Exclusive Use
- Shared Read
SIR database files
– sr1/sr2 - meta data – needed by both client (read
only) and master
– sr3 - data
– controlled by master
– sr4 - procedures
– controlled by client
– sr5 - journal
– controlled by master
– sr6 - sec. index
– controlled by master
Single user SIR allows
Multiple Readers OR
a Single Writer to a database
Master:
How it Works
Part 1
SIR
(Copy 1)
User A
Reads & Writes
Exclusively
SIR
(Copy 2)
OR
SIR
(Copy 3)
Users B & C
Share
Read Only
Master:
How it Works
Part 2
SIR
(Copy 1)
Master allows multiple writers,
readers plus independent
readers
SIR
(Copy 2)
SIR
(Copy 3)
Users A & B
Reads & Writes
User C
Independent
Read Only
Master
SIR Data File
SIR Data File
How does Master work?
Client changes access to use Master
 Lock Manager for clients accessing
through Master
 Delayed view of updates ‘Difference File
Copy’ for independent readers

Client

SIR session switches from single user to use
a specific master
– Master must be available at this point
– All subsequent retrievals then automatically use
Master
– Various utilities not available

Sends Master a request for single data record
at a time
– Master selects on key values
– Client does any selection on data values

Data requests preceded by lock requests
Master

Gets initial logon from client
– Allocates identifier

Gets database open from client
– Checks if already known (open by another client)
– Allocates identifier

Database Identification
– Full pathname is passed by client
– Path is ‘as seen’ by client

Client needs to find database
– Master needs to find database
– Master needs to know that database referenced
by multiple clients is same database
Master Resource Control

Gets request for lock on resource (e.g.
case/record key)
– Checks lock table
– Creates entry if resource available
Gets request for record retrieval
 Gets request for record update

Lock Types

Transmitted by client from PQL. Checks
existing lock on resource
1 = Null - becomes exclusive in Update, concurrent
read in retrieval
2 = Concurrent read - Fails if exclusive
3 = Concurrent write - Fails if protected or exclusive
4 = Protected read - Fails if concurrent write,
protected write or exclusive
5 = Protected write - Fails if concurrent write,
protected or exclusive
6 = Exclusive - Fails if any prior lock
Locks in PQL


Ignored in single user mode
RETRIEVAL
– LOCK = CR, CW,PR,PW,EX (2,3,4,5,6)
– CIRLOCK, RECLOCK
– Default: Update –Ex Retrieval – CR

CASE/RECORD commands
– LOCK = numeric_expression
– Nested case/records inherit outer lock
– Lock held until NEXT or EXIT at this level
Locked Case/Record
Block is entered
 Variables set to undefined
 Test status with functions:

– SYSTEM(36) = 1 Record available
– SYSTEM(37) = 1 Case available
– SYSTEM(38) = 1 Master mode

Wait and retry, tell user with option,…
– RETRY CASE|RECORD
‘Delayed’ Updates





Enables independent retrieval to have
consistent view of data i.e. no updates seen
while retrieval running
Master creates local copy of master index
block
Whenever index or data block rewritten for
first time, Master allocates new block
Keeps list of redundant blocks (index & data)
Identical process on secondary indexes
Difference File Copy
Makes updates available to independent
retrieval
 Increments update level
 Creates journal header
 Writes master index
 Makes redundant blocks available if no
other users (can get exclusive access)

Managing Master

Start Parameters
– MST =
– PW =
– DFC=
MST = parameter



Master finds machine name, port 3000
MST = change port number to even_number
CLIENT MST = machine_name[:port]
– The machine name consists of a host and a
domain. It makes the start up for clients faster to
quote both the host and domain name (DNS)
– Start Master - Master started SirNT:3000
– Start Forms


MST=SirNT
MST=SirNT.sir.com.au
Other parameters

PW = password
– Any remote user who wants to administer
master must specify a matching password

DFC = minutes since a difference file
copy which would force an automatic
copy
Administering Master

Interrupt
–
–
–
–

No users being served
Commands
No password
Usage Statistics
Remote
– Other users still active
– Menu driven
– Password Protected
Administering Master
– List logged on users
– List attached databases
– Stop


Immediately
After users logoff
– Difference File copy

Set interval
Backup and Recovery

Unload
–
–
–
–
Header
Internal copy of database
Machine specific/SIR version specific
Brings all unloaded records up to current schema
definition
– Can have multiple unloads on same file
– Accessed sequentially

Journal
– Header
– Copy of database record after update
Update Level & Restructure





Update level incremented when database
open/closed for update
When record written, update level held in
record header
If schema changed old and new version kept
with update level
When record read, if record update level
earlier than schema change record is
transformed
When record written, in new format
Immediate Unload

Length of key changes
– Record type in key so increase in length
forces (Number of record types > 123)
Key definition changes for existing
record
 Record is locked until reloaded

Reload
Reload takes specific unload
 Defined by update level
 Database is recreated
 If complete set of journals, can be
applied to bring reloaded database up
to date
 ITEMIZE lists unloads or journals

EXPORT
Text version of database
 Machine Independent
 IMPORT rebuilds completely
 Machine Independent
 SIR version independent
 Choose for long term archive

VERIFY

Walks index
– Retrieves each data block
– Checks counts and pointers
– Reports structural problems
– Patch puts calculated value in counts
– Clear corruption flag

Walks secondary indexes
Creating and Maintaining
Databases
New York Conference 2005