Transcript LES05

Creating the Physical Model
Designing the Physical Model
Phase IV: Defining the physical model
Database Object Naming Conventions
• Keep the logical and physical names similar and
descriptive.
• Capitalize table and attribute names.
• Use underscores instead of spaces to delineate
separate words in an object’s name.
• Use a suffix of _PK to indicate primary keys.
• Use a suffix of _ID to indicate production keys.
• Find a good balance between using very specific
and very vague names.
Database Object Naming Conventions
• Develop a reasonable list of abbreviations.
• List all the objects’ names, and work with the user
community to define them.
• Resolve name disputes.
• Document your naming standards in the metadata
document.
• Plan for the naming standards to be a living
document.
Translating the Dimensional
Model into a Physical Model
• Apply the naming standards to the tables and
attributes of the dimensional model.
• List table columns with primary keys listed first.
• Label primary keys consistently.
• Identify the format and length of columns.
• Label unique keys with a (#).
• Label column optionality with NULL (o) or NOT
NULL (*) constraints.
• Label foreign keys with _FK.
• Use synonyms for user tables.
Physical Model
Product
# *Product _PK
# *Channel_PK
# *Promotion_PK
n
n
n
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
v(11)
v(125)
v(35)
v(20)
v(25)
v(20)
v(10)
n
v(20)
v(20)
v(10)
v(10)
v(10)
d
d
PRODUCT_ID
PRODUCT_DESC
PRODUCT_NAME
CATEGORY_ID
CATEGORY_DESC
SUPPLIER_ID
PRODUCT_STATUS
LIST_PRICE
CATALOG_ID
PRODCUT_TYPE
PRODUCT_CODE
PROMOTION_CODE
WHSE_LOCATION
VALID_FROM_DATE
VALID_TO_DATE
Defining the Hardware
Transforming the base dimensional data model into
the physical model includes some of the following:
• Defining naming and database standards
• Performing an initial sizing
• Designing tablespaces
• Defining an initial indexing strategy
• Using partitioning to split table and index data into
smaller, more manageable chunks
• Determining where to place database objects on
disk (RAID, striping, disk mapping)
• Using parallel processing
Architectural Requirements
Scalability
Manageability
Availability
Extensibility
Flexibility
Integrated
Accessibility
Reliability
User
Business
Budget
Technology
Architecture Characteristics
•
Robust
•
Available
• VLM (very large
memory)
•
Reliable
• 64-bit
•
Extensible
• Connective
•
Scalable
• Open
•
Supportable
•
Recoverable
•
Parallel
Hardware Requirements
• SMP (Symmetric multiprocessing)
• Cluster and MPP (massively parallel processing)
• Hybrids using SMP and MPP
Evaluation Criteria
Determine the platform for your needs:
SMP
Clusters
MPP
High
Low
Scalability
High
Maturity
Low
Parallel Processing
• Parallel daily operations
• Shared resources
– Memory
– Disk
– Nothing
• Loosely or tightly coupled
Operating system
Hardware
Application
Database
Making the Right Choice
• Requirements differ from operational systems
• Benchmark
– Available from vendors
– Develop your own
– Use realistic queries
• Scalability important
Symmetric Multiprocessing (SMP)
• Communication by shared memory
• Disk controllers accessible to all CPUs
• Proven technology
CPU
CPU
CPU
CPU
Common bus
Shared memory
Shared disks
SMP
CPU CPU CPU CPU
Benefits:
• High concurrency
• Workload balancing
Shared memory
• Moderate scalability
• Easy administration
Limitations:
• Memory (cluster for improvements)
• Bandwidth
Clusters
Node 1
Node 2
Node 3
CPU CPU CPU
CPU CPU CPU
CPU CPU CPU
Shared
memory
Shared
memory
Common high-speed bus
Shared
memory
Common high-speed bus
Shared disks
Clusters
•
•
•
•
•
Shared disk, loosely coupled
Dedicated memory
High-speed bus
Shared resources
SMP node
Massively Parallel Processing (MPP)
CPU
CPU
CPU
CPU
Memory
Memory
Memory
Memory
Disk
Disk
Disk
Disk
MPP nCube Arrangements
•
•
•
•
•
•
•
A shared nothing architecture
Many nodes
Fast access
Exclusive memory on a node
Low cost per node
Scalable
nCUBE configuration
MPP Benefits
•
•
•
•
•
Unlimited incremental growth
Very scalable
Fast access
Low cost per node
Good for DSS
MPP Limitations
•
•
•
•
•
•
Rigid partitioning
Cache consistency
Restricted disk access
High memory cost per node
High management burden
Careful data placement
Architectural Tiers
Tiered structures:
• Modular
• Logical separation
DB server
Distributed structures:
• Two-tier
• Three-tier
• Four-tier (and more)
Apps server Workstations Web server
Internet
Sample System Architecture
Middleware
Technologies for integration
Gateway
Database Server Requirements
•
•
•
•
•
•
•
•
Robust
Available
Reliable
Extensible
Scalable
Supportable
Recoverable
Parallel
Parallelism
•
•
•
•
•
•
•
Database
Query
Load
Index
Sort
Backup
Recovery
Further Considerations
•
•
•
•
•
•
•
Optimization strategy
Partitioning strategy
Summarization strategy
Indexing techniques
Hardware and software scalability
Availability
Administration
Parallel Processing
A large task broken into smaller tasks:
• Concurrent execution
• One or more processors
Elapsed time
Not parallel
Processor 1
Parallel
Processor 1
Processor 2
Processor 3
Processor 4
Parallel Database
• Increased speed
• Improved scalability
• Performance gains
– Availability
– Flexibility
– More users
Parallel
Processor 1
Processor 2
Processor 3
Processor 4
Parallel Query
SQL code split among server processes
Subquery
Subquery
Query
Subquery
Parallel Load
Bypass SQL processing to speed throughput
Jan 98
Feb 98
Order table
Mar 98
Parallel Processing
• Index: reduces the time to create
• Sort: allocates memory in cache efficiently
Parallel Processing
• Backup: runs simultaneously from any node
(online and offline)
• Recovery: runs simultaneously from redo logs
• Summaries: uses the CREATE TABLE AS SELECT
statement