MIT Data Warehouse
Download
Report
Transcript MIT Data Warehouse
Best Practices in Higher Education
Student Data Warehousing Forum
Northwestern University
October 21, 2003
Mary Weisse
Team Leader
MIT Data Warehouse
[email protected]
Warehouse
Overview
Design
Architecture and implementation
Integrity checking and controls
Warehouse Overview
Read only
Integrated reporting
Institute wide
Multiple subject areas
Varied modes of access
Hub for data extraction by other systems
Warehouse Design
Transaction vs reporting design
Star schema
Fact table
Dimensions
No user interface
Star Schema Benefits
Intuitive joins
Limit on dimensions
Reuse of dimension tables in multiple star schemas
Star Schema Example
Dimensions
Fact
Dimensions
Star Schema Example
Dimensions
Fact
Dimensions
Star Schema Example
Dimensions
Fact
Dimensions
No User Interface
All security at the database level
Naming of fields, and tables critical
No place to code around problems, give messages etc.
Design Assumptions
Minimal support & operational costs
Standard (open) interfaces & components
Scaleable / able to evolve over time
Secure
Risks
Run away queries
Poor data quality
Misunderstanding of the data by users may lead to
erroneous reporting results
Security
Machine security
Data encryption
Oracle roles
Access control
Dynamic views
Roles
Roles Web
Architectural Components
DBMS
– Store, Manage, and Control Access to the Data
Metadata
Extract
– Data definitions, load control, data conversion rules
– Data taken from source systems
Transfer
– Data copied to the warehouse server securely
Convert
– Data translated into reporting format & structures
Load
– Data loaded into database & indexes created
Transport
– Data is securely transferred from the db to desktop
Query Tool
– Retrieve data & create export
Data Load Processing
Assumption: Information is better stale than incorrect
Grouping data loads
Error tolerances may vary
Checking status at each stage
Process Files
Cron
1
2
3
4
5
6
7
8
9
10
11
Check file existence
Move to secure directory
Decrypt
Optional pre-conversion processing
Convert
Remove data
Remove indexes
Load data
Optional post-load processing
Restore indexes
Optional post-batch processing
Compute Statistics
Calculate & add fields
Create aggregate tables
12 Archive files
External
Systems
(SAP)
Meta Data
Data
1
6, 7, 8, 9
5
2
10
3
4
Transfers
Encrypted
Decrypted
Converted
Archive
Extraction
Minimize impact on production systems
Minimal data transformation done on source system
Performance
Data transformed in only one place
Incremental control
Extracted by date from last date run successfully
Control files to ensure that extracted data is complete
Integrity Checks
Correct files on hand before job runs
Record & byte counts
Comparisons of control file to data file
Extract file structure is checked against metadata
DBMS constraints enforced
Control of jobs
Cron–scheduling
Error checking system
What jobs should have run?
Did they run successfully?
Data scanned for discrepancies
Mail sent to appropriate staff and users