Data Warehouse System - Center
Download
Report
Transcript Data Warehouse System - Center
SQL 2005 Customer Evidence
Contact: Ram Ramanathan
SQL Server 2005 TB+
Customers 5 TB Retail Data Warehouse on HP Superdome
Data Warehousing w/Uses
Relational
Query
SQL Server OLAP, SSIS, SSRS
5.3 TB Credit Card DW, 5 Mil card holders
300 users, OLAP, SSIS, SSRS, Office BSM
4 TB Consumer Packaged Goods (CPG) DW
350 users, complex relational query, SSIS & SSRS
2 TB DW, originally migrated from Informix
300 users, complex query, OLAP & Data Mining
2 TB of Clinical DW, 50 TB storage, 90 OLAP Marts
US Dept of Veterans Affairs, 1200 facilities
2 TB Healthcare DW, HMC is part of Wellpoint Inc.
Complex ETL, 800 SSIS packages, OLAP, RS
SQL Server 2005 TB+
Customers 3 TB document management system, Sybase move
General VLDB & DW/BI Workloads
State of Alaska Dept of Revenue, Permanent Fund
1 TB custom document management system (VS.Net)
State of Washington Archive Department, 600 users
1 TB Real-time store reporting system (SSRS)
Data loaded continually (MSMQ) and daily (SSIS)
2.8 TB SAP Business Warehouse system
In the process of migrating SAP R/3 OLTP to SS2005
1.5 TB BASEL II Compliance DW, complex SSIS
Nationwide Building Society, UK
1 TB Staging Data Warehouse at Telecom Denmark
CUBUS is massive BI system with up to 10,000 users
SQL Server 2005 TB+
Customers 1 TB gaming data warehouse, 24 casino
Data Warehousing w/Heavy
Relational
Query
daily loads, relational query
1 TB data warehouse for CRM support
Runs core systems SQL Server, BASELL II on SS2005
6 TB retail data warehouse from 700 stores, OLAP, RS
15 TB total across multiple instances
2 TB normalized DW, 250 concurrent BI users
Nightly transaction loads: 900 TPS sustained
5 TB retail DW, 10 TB storage, data from 1000 stores
Largest retailer in African continent, OLAP, RS, DTS
3 TB relational DW, DB2 migration
Heavy Complex Relational Query
SQL Server 2000 TB+
Customers 20 TB European digital payment archive system
General VLDB & DW/BI Workloads
Unisys Payments Services Archive
13 TB Geospatial DW, to reach 25 TB
4 TB data mart, 2500 field offices, 55 TB storage
Multiple 1 TB instances, credit scoring analysis
Uses complex OLAP & SQL Server ETL
Used by 13,000 agents in 220 offices
Built portal with Visual Studio .NET, OLAP
700 restaurants, 30,000 users, 700+ concurrent
OLTP/LOB reporting on inventory & sales data
Reporting on 1.6 TB of data, 35K+ Tx/Min
325 concurrent connections
SQL Server 2000 TB+
Customers 20 TB European digital payment archive system
General VLDB & DW/BI Workloads
Unisys Payments Services Archive
13 TB Geospatial DW, to reach 25 TB
4 TB data mart, 2500 field offices, 55 TB storage
1.5 TB DW, plus multiple OLAP data marts
Data feeds from JD Edwards ERP system
1 TB DW, Informix migration
Heavy concurrent usage on OLAP
1.7 TB DW, 2900 DW users ramping up to 5000
Runs on HP Superdome
6 TB Call Detail Record (CDR) DW, OLAP usage
20-way HP Superdome
SQL Server 2000 TB+
Customers 15 TB credit card DW, 11 mil card holders
Private NDA References
30 TB total storage, several TB+ instances
1 TB Retail DW, relational query + complex OLAP
Acorn profitability analysis Package
1 TB relational instance with Complex OLAP
Loss Prevention DW for store management
Major RS operation on entire reporting tier
Runs on 8 dedicated reporting servers
Real-estate services, up to 10K report runs/day
Mortgage closing: 3000 checks (Reports) /day
Monitoring strategic ventures
Visual Studio .NET & .NET Framework
How Do You Fit 22 Million
Books Into 1 Data Warehouse?
“SQL Server 2005 gives us the performance we need at a price
that is just far superior to anything else we’ve seen.”
Chris Troia, Chief Information Officer, Barnes & Noble
World’s largest
bookseller
821 bookstores, 7.3
million retail items
Needed improved
business intelligence
for merchandising and
inventory planning
4.5 TB end-to-end data
warehouse running on
SQL Server 2005 64-bit
Storing 3 years of
transaction data; will
grow to 5 years
Insight enables better
out-of-stock predictions
Faster access to
information
Deeper view into key
performance indicators
and trends
Better decisions for
greater profitability
Improved customer
experience
Barnes & Noble Retail DW
Customer & Project Profile
Scenario/Business
Largest bookseller in the USA, 850 stores, plus online
Live on SS2005!
40K employees, 1.5 mil titles across all stores, .75 mil at www.bn.com
Retail Data Warehouse
Broad use of SQL Server BI, DW & cubes started with SQL Server 2000
Moved relational DW to SQL Server 2005 with June CTP now
First went live with SQL Server Integration Services in 2004 on Oct CTP
Loading Data from mostly Oracle sources
Workload/Architecture
2 TB sales & inventory data warehouse on SQL Server 2000
Currently holds 3 years of transactions, will build up to 5 years
SSIS using Slowly Changing Dimension transform, lookups, and configurations
Will test Data Mining on IDW13 for a DM project
Out-of-stock prediction using Decision Trees, potential for enhanced sales of 150K/week
OLAP is planned to move pre-RTM:
OLAP Subjects: Store sales, Store inventory, Distribution Center inventory
Key dimensions: Store, Item, Time, Demographic, Buyer, Vendor, Customer
DB Platform
Windows Server 2003 64-bit
HP Superdome w/20 CPU partition for DW, 20 TB EMC SAN
Barnes & Noble Retail DW
DW Architecture
Live on SS2005!
Oracle
SRS
database
POS transactions
Dimension
Updates
Inserts / Updates
Staging
database
IS
IS
Relational
data
warehouse
Inventory deltas
Oracle
IMM
database
Sales
Store
inventory
Proclarity
interactive
reports
Distr.
Center
inventory
Reporting
Services
reports
Strategy
First Premier Bankcard DW
Technical Profile
Scenario/Business
Specialized high-risk credit card issuer, 10th largest in the USA
5 million card holder portfolio, 25K average new bookings per month
Runs 100% of business on SQL Server
Great cost savings and business agility
Looking at SS2005 for entire BI stack: SSIS, Partitioning, SSRS & SSAS
Workload/Architecture
4.2 TB single instance for customer data
Growth of approximately 300 GB per month, using DTS for data loading
Planning to upgrade this to SQL Server 2005
Ad hoc relational query workload
30-100 concurrent in-flight queries, 40 concurrent users at a time
Analysis Services OLAP usage through Proclarity & Outlooksoft
25-30 cubes, back-end DW is star-schema/snow-flake
Broad adoption of Reporting Services across the company
500+ unique reports, 1500+ report users
Core 1.5 TB OLTP LOB system runs the business
2500 customer reps connected all day, 200+ TPS frequently
Planning to upgrade this to SQL Server 2005
DB Platform
Windows Server 2003, SQL Server 2000 64-bit
HP 64-bit 12-way, 8-way relational, 4-way OLAP, 27 TB EMC SAN
First Premier Bankcard DW
Data Flow
Data is imported on a
nightly basis from the
different data sources and
then standardized and
validated through the
Extract, Transform and
Load (ETL) process. After
the data has been
validated, it is loaded into
the data warehouse. It is
then available to the
Reporting and Analytic
Departments for data
analysis and development
of reports.
First Premier Bankcard DW
Data Flows
At the end of
every month,
FDR transfers a
cardholder
master database
containing
months of
transactions
conducted by
customers of PBI
to the PBI
network, where
the information
is downloaded to
the data
warehouse. PBI
uses SQL Server
2000 Reporting
Services to
create
standardized
company
reports.
First Premier Bankcard DW
Change Control
The PBI data
warehouse
uses a 64-bit
application
with 16 GB of
memory. The
diagram
layout
emphasizes
the testing
and
development
environment
more than
the 64-bit
technology.
CROSSMARK Data Warehouse
System Architecture
Live on SS2005!
Presentation
Layer
Microsoft
Excel
Adobe
Acrobat
HTML
HTTP
SQL Server 2005 Reporting Services
Report Portal Framework
Data
Sources
Transformation
Layer
Data
Processing
Layer
OLE DB
Staging
Weekly
Transactions
Volumetrics
ODS
52-week History
Reports
Report & Portal
Information
SQL Server 2005
Integration Services
AC Nielson
Volumetric Data
(CSV)
TDLinx
SalesTrak
ACNielsen
ICE
ACNielsen
Volumetrics
CROSSMARK Data Warehouse
SS2005 Solution Profile
Scenario/Business
Live on SS2005!
Consumer Packaged Goods (CPG) services provider and data processor
16,000 employees, one of the largest CPG services companies in the USA
Employs in-store teams across the country
Over 4 million store visits per year providing various inventory, stock and efficiency services
CROSSMARK retains top CPG manufacturers as customers
Processes data on 60K UPC codes from sales in over 30K stores
Workload/Architecture
4 TB of data in a single SQL Server instance
25 TB of raw storage
Uses Table Partitioning feature to increase manageability and reduce complexity
Ad hoc complex query workload through
350 users (85 full-time power users)
SQL Server Integration Services for Data loads
Data imported from 4 sources including, biweekly AC Nielsen data loads
Permitted expansion of DW from 13 weeks to 1 yr of history
SQL Server Reporting Services
DB Platform
Windows Server 2003, separate servers for ETL, Rel DW and Reporting
HP ProLiant DL585 w/4 dual-core 2.4 GHz AMD 64-bit Opteron & 16 GB of RAM
HP Storageworks
Talbots Retail Data
Warehouse
Technical Profile
Scenario/Business
Leading retailer & E-tailer with global catalog operation
1000+ stores in USA, Canada & UK
WW catalog & online operation reaching 46 mil customers in 140 countries
Uses SQL Server as strategic database along with IBM mainframe
Key e-commerce and catalog order systems on SQL Server
SQL Server 2005: Retail Data Warehouse
Established for business agility in retail operation
Assess store operation efficiency
Workload/Architecture
1 TB of data, using SQL Server 2005 Table Partitioning feature
36-month of transaction history, partitioned using SQL Server by month
Dual trickle plus batch data loading with SSIS
Continuous real-time trickle data, approx 1 million unaudited sale transactions/day
SSIS batch loading of audited sales data, 1 million rows in 3 hour batch-window
2 million rows total added to the database every day
Extensive usage of SS2005 Reporting Services
Parametrized reports from 50+ users hit the database all day
DB Platform
Windows Server 2003, HP 64-bit Itanium
Clalit Health Patient DW
SS2005 Solution Profile
Scenario/Business
Largest HMO in Israel, 2nd largest WW, covers 60% of pop (3.7 mil)
14 hospitals, 1400 clinics, 6,000 doctors, 500 pharmacies, 32K employees
IBM Informix/Sun 10000 Solaris migration in 2002
67% TCO savings, 25% perf improvements: 15x in some OLAP queries
Also runs main clinic OLTP system on SQL Server
Migrating 1000 local SQL Servers in clinics to central DB, 2 TB expected
Workload/Architecture
5 TB of total data, multiple servers & instances
2 TB in DW SQL Server instance
Ad hoc query workload (Business Objects), 20 concurrent in-flight often
300 users (100 full-time) users, 10K queries/month
OLAP cubes using Microsoft Analysis Services
3000 OLAP users, 45K queries/month ISV front-end app: Procalrity
ETL: Ascential Software’s Datastage ETL from S/390 mainframes
Data Mining: Patient Treatment Cost Prediction
Working with local DM specialist partner G-Stat
DB Platform
Windows Server 2003, Unisys 12-way, 64-bit, EMC SAN
Clalit Health Patient DW
Before & After Informix Migration
Before:
Legacy
Systems
SUN
6000
Informix
Source
Files
Staging
DWH
SUN
10000
Informix
Production
DWH
Unisys
SQL
Server
Files
for
OLAP
OLAP
Replicated
DWH
for OLAP
After:
Proclarity
Partitioned Unisys 64 bit
Legacy
Systems
Unisys
32-bit
SQL
Server
Unisys
64-bit
SQL
Server
Unisys
64-bit
SQL
Server
Business Objects
Staging
DWH
Production OLAP
DWH
8 CPU
32 GB
1.5 TB
4 CPU
16 GB
Proclarity
Clalit Health DW
Subject Areas
Health Services
& Expenses:
Human
Resources:
• Manpower
• Payroll
• Outside contractors
Customer Data:
•
•
•
•
•
Demographic data
Call Center
Mailing
Churn analysis
Complaints
Logistics:
• Supplies
• Purchases
• Specialized
medical
equipment
•
•
•
•
•
•
•
•
Medical services
Laboratory
Hospital services
Medication
Specialists
Medical imaging
Chronic illnesses
and more…
Integrative
Universes:
• Clinic budget
• Medical KPI’s
• Medical services
US Veteran’s Health Admin CDW
Technical Profile
Scenario/Business
United States Department of Veterans Affairs, Veteran’s Health
Administration
172 Hospitals, 1260+ total points of care (including clinics & nursing homes)
5 million patients, $26 Billion budget, 200K employees
VHA Corporate Data Warehouse
Uses end-to-end Microsoft SQL Server and BI technologies
Currently in the midst of 50 TB project to add new relational ODS and EDW back-ends
Workload/Architecture
4 TB largest instance, 17 TB storage
Heavy SQL Server Analysis, Reporting and Transformation Services
100’s of cubes and subject areas
Main Data Marts: Lab (53 tests), Radiology, Pharmacy, Prosthetics
Outpatient: (Appointments, Encounters, Primary Care Panels)
Inpatient: (Movement, Discharges)
Other: Non VA Care, Human Resources, Financial Accounting
Planned: Nursing, Dental, Purchasing, Health Data Repository
Presentation Tier: ProClarity Analytics & Dashboard Servers, MS MapPoint
DB Platform
MS Windows Server 2003 Enterprise Edition
4 x HP RX5670 64-bit 4-way Itianium-2 Servers, 32 GB RAM, HP EVA SAN
VHA Data Warehousing Framework
Closed Loop Information System
Source
Systems
VistA
HDR
Metadata
Repository
Conformed
Dimensions
OP
Data
Consultants
VHAc
ADR
ABC
G/L
VHAaf
DoD
CDC
CMS
Extract, Transform, Load
Diabetes
Common
Query, Reporting,
Analysis, and
Data Mining
Tools
Wait
Times
Data
Warehouse
Other
Research
Data Marts
1
Acquire
Data
VISN
Warehouses
2
Prog Office
Data Marts
PBM
Value Added Data
Populate
Warehouse
3
Create
Marts
Program Offices
•Pharmacy Benefits
•Prosthetics
•Dental
4
Access
Information
VHAc – VHA clinical systems
VHAaf – VHA administrative & financial systems
VHA Dimensional Map
Clinical
Demographic
Financial
Geographic
DimALBProductionUnit
DimICD9
DimPatient
ICD9ID
int
ICD9Txt
ICD9Prefix
ICD9Name
ICD9Category
ICD9Class
ICD9SubClass
varchar(9)
varchar(5)
varchar(50)
varchar(50)
varchar(100)
varchar(100)
DimDRG
DRGID
smallint
DRGName
DRGCategory
DRGClass
DRGType
ValidYear
varchar(70)
varchar(70)
varchar(7)
char(3)
int
DimCPT
CPTID
varchar(5)
CPTName
CPTCategory
CPTClass
Inactive
WorkRVU
PrExpRVU
MalPrRVU
TotalRVU
varchar(30)
varchar(75)
varchar(40)
varchar(8)
numeric(10,2)
numeric(10,2)
numeric(10,2)
numeric(10,2)
DimARCTrtType
TrtCodeID
varchar(5)
TrtCodeName
varchar(50)
DimARCDxClass
DXClassID
smallint
DXClassName
DXClassARC
varchar(50)
nvarchar(2)
ProdUnitID
ScrNumID
char(2)
int
ScrSSN
PtSSN
ICN
DoB
DoD
DoDSource
Gender
Race
EnrZip
HmFIPS
PreferredVISN
PreferedSta3n
PreferredSta6a
CFYVERAClass
PFYVERAClass
Eligibility
EnrPriority
EnrStatus
PCProvider
Diabetes
Hypertensive
varchar(9)
varchar(9)
char(10)
smalldatetime
smalldatetime
varchar(10)
varchar(1)
varchar(25)
int
int
int
int
varchar(6)
char(3)
char(3)
char(3)
varchar(3)
varchar(50)
char(10)
char(1)
char(1)
DimAge
AgeID
smallint
AgeGrp2
AgeGrp4
AgeGrp10
AgeGrp14
AgeGrpARC
varchar(15)
varchar(15)
varchar(15)
varchar(15)
varchar(15)
DimARCEligibility
EligibilityID
char(2)
EligibilityName
EligibilityCategory
varchar(100)
varchar(100)
PUName
PUCategory
PUClass
varchar(125)
varchar(50)
varchar(50)
DimCostCtr
CostCtrID
smallint
CCName
CCCategory
CCClass
varchar(100)
varchar(50)
varchar(50)
Hospital Location
DimVISN
DimStop
VISNID
StopID
VISNTxt
VISNName
City
State
Zip
FIPS
FTE
Budget
smallint
StopName
Category
Class
Type
char(3)
varchar(50)
varchar(75)
char(2)
int
int
smallint
money
smallint
Sta3nName
VISN
City
State
Zip
FIPS
MCG
FTE
varchar(100)
smallint
varchar(50)
varchar(2)
int
int
smallint
smallint
DimALBAcct
AccountID
smallint
AcctDesc
varchar(50)
DimBOC
BOCID
smallint
BOCName
BOCCategory
BOCClass
varchar(50)
varchar(50)
varchar(50)
DimARCIncome
IncomeID
smallint
IncomeGroup
varchar(50)
DimDate
smallint
DateID
int
varchar(50)
varchar(50)
varchar(50)
char(10)
MDate
FYr
FQtr
CMth
CDay
FP
DoW
DoM
smalldatetime
char(4)
char(6)
char(5)
varchar(30)
smallint
char(3)
smallint
DimBedsection
DimFacility3n
Sta3nID
Time
BedsectionID
smallint
BedsectionName
Category
Class
Type
varchar(50)
varchar(50)
varchar(50)
char(10)
DimARCTrtLoc
TrtLocID
char(3)
TrtLocName
TrtLocCategory
varchar(100)
varchar(100)
Standardized
DimFacility6a
Sta6aID
varchar(6)
Sta6aName
Sta6aType
Sta3n
VISN
VASTID
Sta6aCity
Sta6aState
Sta6aZip
Sta6aFIPS
AdminParentCode
AdminParentCity
AdminParentState
AdminMCG
varchar(100)
varchar(50)
smallint
smallint
smallint
varchar(100)
char(2)
int
int
int
varchar(100)
char(2)
smallint
Type/Size
conventions
Naming conventions
Verified
“Gold” standard
Business rules
Optimized
DimZip
DimFIPS
FIPSID
int
ZipID
int
State
County
VISN
VISNCARES
CARESMkt
VetPop
char(2)
varchar(30)
smallint
smallint
varchar(100)
int
State
City
char(2)
varchar(100)
Primary keys
Indexed
Refreshed
VHA CDW Hardware Architecture
Gigabit Ethernet
Yukon R&D
Cube R&D
Power User
Cube Dev
Delimited
&
Fixed Length
Text Files
Cube Dev
FTP, Import,
Staging
Cube Prod
Master
Database
Web Server
General User
Report Services
VISN
Data
Warehouse
(2x) 32-Bit Server
(4x) 32-Bit Server
Executive User
(4x) 64-Bit Server
(4x) 32-Bit R&D Server
VHA Corporate Data Warehouse System
(Primary Function: ETL Import and Export, Data Warehouse Storage)
HP ProCurve 3400cl Gb Switch
Database
Subsystem
ETL
Subsystem
Streams
ETL2 - DL385 - x64DC - 4x
ETL3 - DL385 - x64DC - 4x
DB1 - RX7620 - IA64 - 8x
Batches
Clustered
ETL4 - DL585 - x64DC - 8x
M5314A Enclosure - 14x 142GB - 15k RPM
M5314A Enclosure - 14x 142GB - 15k RPM
ETL4 - DL385 - x64DC - 4x
Large
Batches
(HP EVA 8000 - 24TB)
M5314A Enclosure - 14x 142GB - 15k RPM
ETL1 - DL385 - x64DC - 4x
Nightly
Updates
Storage
Subsystem
M5314A Enclosure - 14x 142GB - 15k RPM
M5314A Enclosure - 14x 142GB - 15k RPM
ETL5 - DL585 - x64DC - 8x
AAC Firewall
DB2 - RX7620 - IA64 - 8x
M5314A Enclosure - 14x 142GB - 15k RPM
HSV210 FC Controller
Dual Core
SAN Switch
(2Gb FC)
HSV210 FC Controller
M5314A Enclosure - 14x 142GB - 15k RPM
Future Expansion
Future Expansion
M5314A Enclosure - 14x 142GB - 15k RPM
M5314A Enclosure - 14x 142GB - 15k RPM
•Discussed many scenarios
and obtained many
configurations.
•64-bit end-to-end
M5314A Enclosure - 14x 142GB - 15k RPM
MSL6060 - 2
M5314A Enclosure - 14x 142GB - 15k RPM
M5314A Enclosure - 14x 142GB - 15k RPM
42U Cabinet
42U Cabinet
42U Cabinet
Server Legend:
ETL Server Type 1: HP DL385, AMD Opteron Dual Core, 2 cores/4 processors, 16GB, (6) 72GB Drives, (2) Gb NICs, no SAN HBA
ETL Server Type 2: HP DL585, AMD Opteron Dual Core, 4 cores/8 processors, 16GB, (4) 72GB Drives, (2) Gb NICs, (2) SAN HBA’s
DB Server Type 1: HP RX7620, Intel Itanium-2, 8 processors, 32GB, (4) 72GB Drives, (2) Gb NICs, (2) SAN HBA’s, Clustered
64-bit: Windows, SSIS, SSDE
64-bit: Windows, SSIS, SSDE
64-bit: Windows, SSDE
SSIS - SQL Server Integration Services
SSDE - SQL Server Database Engine