Title Slide No more than 2 lines
Download
Report
Transcript Title Slide No more than 2 lines
Database Design:
Solving Problems
Before they Start!
Ed Pollack
Database Administrator
CommerceHub
Agenda
Preventing future headaches, right now!
•
•
•
•
•
•
•
2
Why does database design matter to us?
Discussion of tables.
Examples of good design practices.
Important decisions when designing a database.
Notes on important database design topics.
Normalization
Conclusion
Why Design Matters
•
•
•
•
•
•
3
Match business needs with implementation.
Ensure acceptable performance.
Scalability. Needs of tomorrow != Needs of today.
Good design = easy to understand.
Decrease maintenance costs.
Avoid scorn of future DBAs & developers.
Understand the Application
•
•
•
•
•
4
Good design = knowing the business need.
Talk to consumers of data.
Ask lots of questions!
May be technical or non-technical.
What is the database used for?
Scalability
•
•
•
•
•
5
Design for today AND tomorrow.
Understand cardinality.
Determine contention and concurrency.
Build overhead to support the near future.
Always re-evaluate and ensure scalability for the future.
Hardware Considerations
•
•
•
•
•
•
6
CPU (speed, cores, MDOP, etc…)
Memory
Storage (attached, SAN, SSD, cloud, etc…)
Network bandwidth
On-premises vs. cloud
File & backup locations
Designing Good Tables
• One table = a distinct entity.
• A column represents a unique element.
• Table can be one of three basic types:
• Entity
• Lookup
• Relationship (linking)
• Schemas can classify name spaces/security/classification.
• Avoid “double-duty” tables!!!
7
Data Types
• Choose the correct data type.
• Choose the correct data length/precision/size.
• Consider standards when unsure of representation
• Ie: ISO5218 for gender, ISO4217 for currency, ISO3166 for country
• Bit = 1 or 0. Do not use INT, string, etc…
• Consider mathematical operations for numeric types.
8
Dates & Times
• Be consistent across all tables!
• Do (or will) time zones matter?
• If so, use DATETIMEOFFSET or UTC time.
• Store in correct sized data type.
• Smalldatetime vs. Datetime2 vs. Date vs. Time
• Do not use strings for dates, times, timestamps, etc...
• Differentiate between a date/time and duration.
9
NULL vs. NOT NULL
• NULL = non-existence.
• Do not make up “De-nullifiers”.
• -1, 1/1/1900, ‘N/A’, ‘ ‘, ‘00:00’
•
•
•
•
10
NOT NULL = required by the application
NULL has meaning. Define this during design phase.
Beware NULL behavior. ORDER BY, GROUP BY, =, <>…\
Always explicitly name ALL objects!
Naming Objects
• What is it? Don’t name for location, time, usage, etc…
• A data element name should be unique!
• Column names should be positive, not negative
• Ie: Is_Not_Active vs. Is_Active
• Avoid abbreviations/shorthand. Don’t fear long names.
• Avoid spaces and special characters.
• DOCUMENTATION!!!
11
Uniqueness
All rows in a table should be unique.
PRIMARY KEY determines row identity/uniqueness.
Natural key vs. surrogate key.
Clustered key doesn’t have to be the primary key, but
often will be.
• Unique constraints can enforce additional important
business rules.
•
•
•
•
12
Other Schema Elements
• Stored Procedures: Use for efficient/controlled data
access.
• Indexes: Speed up important read operations.
• Functions*: Encapsulate often-used logic.
• Triggers*: Enforce auditing, ETL, or business rules.
• Constraints*: Enforce important relationships/rules.
• Views: Provide a convenient/common path to data.
• ***Always ensure elements are needed before creation!
13
Iteration vs. Set-Based Operations
• SQL Server is built/optimized set-based transactions.
• Use of loops, such as CURSOR, WHILE, GOTO are often
indicative of poor/suboptimal design.
• Consider alternatives to avoid row-by-row work:
• OUTPUT
• Window functions
• CTE
14
OLAP vs. OLTP
• The needs of transactional processing are different than
the needs of analytics processing.
• Separate reporting from transactional processing, or
build infrastructure to support a shared environment.
• Denormalize data for OLAP, not for OLTP.
• Consider different tools for each, if needed.
15
Old Data
•
•
•
•
Do we delete, inactivate, partition, archive, etc…?
Is archiving allowed/desired/required?
How to handle data associated with archived data?
Data retention policy
• Compliance needs.
• Contractual obligations.
• Prevent data from getting too large
16
Normalization
• Normalization reduces data redundancy by separation
into tables.
• Avoids inconsistency via duplicated/missing/bad data.
• Consider 1:1 vs. 1:many vs. many:many vs. optional.
• DEMO: Normal forms.
17
Formatting
Databases store and return data
Formatting/presentation is best left to the application.
Formatting can complicate scripts.
Making output pretty can lead to poor performing
queries.
• Be consistent across all objects!
•
•
•
•
18
Conclusion
•
•
•
•
•
19
Good database design saves immense time & effort.
Generates database schema that is easy to understand.
Allows for easier upgrades, releases, migrations.
Poor design = technical debt!
Future DBAs and developers will see your initials and
love you!
Questions???
Contact Info & Links
Ed Pollack
[email protected]
@EdwardPollack
SQL Shack
SQL Server Central
SQL Saturday Albany (2016)
Thank you!!!