7-8. Bank and Subscription Businesses

Download Report

Transcript 7-8. Bank and Subscription Businesses

Adv. DBMS & DW
Chapter 7: DW for a large Bank
Hachim Haddouti
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
General Motivation for projects
•
•
•
•
•
•
•
•
•
Forecasting
Comparing performance of units
Monitoring, detecting fraud
Visualization
Conducted survey to see what customers were interested in
new model car
Want to select customers for advertising campaign
Selecting according to demographic parameters
Mining of mobile phone calls with schema ( 7 dimensions,
geographic location can be determined with a precision of 100
m, etc.)
Call Center
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
Bank
•
•
•
•
•
Datawarehouse for a large Bank with objective to build a household DW
where we can track all the accounts owned, to see all the individual account
holders and residential ad commercial groupings
contrast to tangible product businesses (value chain)
portfolio of services: checking/savings accts, loans, cards, etc.
goal: market effectively to households with existing accounts
requirements:
1.
2.
3.
4.
5.
6.
7.
5 years data on each account, by end-of-month snapshot
valid snapshot as of yesterday for the current month
primary balance/account type. group kinds of accounts to compare primary
balances
custom dimension attr, numeric facts/account type
account -> household.
Accounts, owners come & go several
x/year/household
differing records of individual names & addresses over years
demographic information, by individuals and households; behavior
scores/activity
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
Bank
core fact table
• dimensions: account, household, branch, product, status, time (p
110)
• Grain by fact/month; include primary balance, transaction counts,
etc.
• Why separating Account and Household dimensions, although
correlated? Product and Account?
Dirty dimensions
“..a typical bank is doing well if it can find more than 80% of the
actual instances where the same individual has multiple accounts.”
if individual account holder were extracted as a separate
dimension, it would have many duplicates and extraneous
entries.(actually, household is usually equally dirty)
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
Bank
• Semiadditive account balances (over Time dim)
• PD: «
Average period balances in financial data warehouses and in
inventory data warehouses can be calculated by generalizing the SQL AVG
function to instead compute Average Period Sum. Until the DBMS vendors
provide the functionality, Average Period Sum must be computed in the end
user’s application. ».
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
Heterogeneous products
• highly varied nature of financial services -- would makes dimension with
many attributes (p 112), because of heterogeneity of products
• DP: “In data warehouses where a dimension must describe a large number of
heterogeneous items, the recommended technique is to create a core fact table
and a core dimension table in order to allow queries to cross the disparate types,
and to create a custom fact table and a custom dimension table for querying
each individual type in depth.” (Fig 7.3 p 114)
• every core fact table entry is expanded in just one custom fact table entry
(custom fact table as tail of respective records in a core table)
• Why duplicating Primary Balance and Transaction Account in each custom fact
table?
•DP: »The primary core facts should be duplicated in the custom fact tables.
This virtually eliminates the need to access two fact tables in a single query in a
heterogeneous product schema.” This makes sense only if the num o f the core
facts is small.
•Using big-dimension techniques for Household and Account dim
(minidimensions)
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
Chapter 8: Subscription Businesses
• Subscription Businsess: Cable TV supplier
• Issue:
relationship between receipt of money and counting it as income is
complicated ( all pay-in-advance biz such as insurance, publisher, )
• Subscription transactions: Modeling a large metropolitan cable TV supplier
with more than 1M customers.
• Transactions for cable television include:
• pen/change account;
• purchase/upgrade/renew package,
• purchase PPV
• cancel package (with reason)
• cancel ppw (with reason)
• downgrade package (with reason),
• refund purchase (with reason)
• close account (with reason)
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
Chapter 8: Subscription Businesses
•
Marketing e.g. want to fetch no of new subscribers monthly, no of renewers of
packages, canceler or downgrader and why, if the promotion was profitable or
not. Operations wants to see what call load is, call traffic in order to plan
staffing..CEO wants to know revenue each month, slicing and dicing revenue
numbers by customer, package, by promotion, by boradcast time,
average minute of PayPerView
•
Dimensions: transaction entry date/time, effective data/time, customer, sales
rep, product, promotion, transaction (Fig 8.1, p 119)
•
Grain: Every sales transaction
•
only one fact, amount (meaning variable by transaction)
•
date granularity = day
•
customer dimension cleaner than in Bank (household and account issue).
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6
Chapter 8: Subscription Businesses
•
•
Payments in advance
booked as liability, not asset -- must be paid back if service not delivered
(ie, income earned)
•
earned income calculations -- may depend on days in specific month; may
overlap year end, or beginning of database, complicated
upgrade/downgrade transactions
Transaction-grained fact table not practical for calculating earned
income.
PD: “Pay-in-advance business scenarios typically require the combination of a
•
•
transaction-grained fact table as well as a monthly-snapshot-grained f act table in order
to answer questions of transaction frequency and timing as well as questions of earned
income in a given month.. »
•
 monthly snapshot table must be built to store the earned income (batch job
to calculate carefully the month’s earned money for each account)
Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6