Phoenix DAMA Chapter

Strategy for Data Governance
Replace with your name & organization
Las Vegas, February 18, 2008
© Copyright 2012 Your organization
1
Outline

Benefits of a data governance strategy

Components of a data governance strategy

Organization, roles and responsibilities

Impact of a data governance strategy on BI and IT

How to implement a data governance strategy program
2
Why you need a data governance strategy
CEO: I would like an accounting of the company’s financial assets.
CFO: Uhh … let me see. I think we still have enough money in our bank accounts to cover payroll this month, and uhh … I’m not sure if there are any outstanding accounts receivable … Uhh and – hmm … let me think …
3
Why you need a data governance strategy
CEO: I would like an accounting of the company’s information assets.
CIO: Uhh … let me see. I don’t really have an inventory of all the data, and I’m not sure what data is in which database, or how much of that data is redundant and inconsistent. I also can’t vouch for the quality of the data … Uhh and – hmm … let me think …
4
Do these problems exist in your organization?

Replace with your problems
5
Do these problems exist in your organization?

Room for more problems and issues
6
Motivations for Data Governance





SEC audits and risk of losing investors
Risk of fines and incarceration due to inaccurate regulatory reporting
Risk of losing customers due to poor data quality
Loss of productivity due to excessive and uncontrolled redundancy
Suboptimal business performance
7
Technology Solutions





Enterprise Resource Planning (ERP)
Data Warehousing (DW & BI)
Customer Relationship Management (CRM)
Supply Chain Management (SCM)
…
8
Data Warehousing
DW Promises | DW Reality
Data integration | Stove-pipe data marts and departmental data warehouses
No more uncontrolled data redundancy | Continued redundancy, sometimes even increased data redundancy
Consistency of data content | Data is still inconsistent among data marts and data warehouses (no central staging area, no reconciliation totals)
Improved data quality | Little improvement to data quality
Historical enterprise data | Historical data is limited to departmental views
Unlimited ad-hoc reporting | Limited ad-hoc reporting (too complicated, missing relationships, poor performance)
Reliable trend analysis reporting | Inconsistent trend analysis reports among data marts
Business intelligence capabilities | BI capabilities compromised by inconsistent and unreliable key performance indicators (KPIs)
9
Customer Relationship Management
CRM Promises | CRM Reality
Data integration | More stove-pipe systems
Non-redundant customer data | Continued redundancy, more departmental views, purchased packages not integrated
Data quality | Dirty customer data continues
Increased customer satisfaction | Decreased customer satisfaction because of poor-quality customer data
Product pricing customization | Wrong pricing because of departmental views, still not cross-organizational
Knowledge of customer wallet share | Privacy issues and dirty data led to government regulations
10
The Lesson?
You cannot keep doing what you have always done and expect the results to be different. Not even with new technology.
“That wouldn’t be logical” (Spock, Star Trek)
11
Data Governance Defined …
Consultants

“The execution and enforcement of authority over the
management of data assets and the performance of data
functions”
(Robert Seiner)

“The process by which you manage the quality, consistency,
usability, security, and availability of your organization’s data”
(Jane Griffin)

“A process and structure for formally managing information as a
resource. Ensures the appropriate people representing business
processes, data, and technology are involved in the decisions that
affect them; includes an escalation and decision path for identifying
and resolving issues, implementing changes, and communicating
resulting actions”
(Danette McGilvray)
12
Data Governance Defined …

Clients
“A framework of accountabilities and processes for making
decisions and monitoring the execution of data management.”
(BMO)

“Resolving data issues using a horizontal perspective of the
organization and focusing on the major ‘pain points’ for our
business areas.”
(Sallie Mae)

“Unites people, process, and technology to change the way data
assets are acquired, managed, maintained, transformed into
information, shared across the company as common knowledge,
and consistently leveraged by the business to improve profitability.”
(Wachovia)
13
Data Governance Defined …

Vendors
“The orchestration of people, process, and technology to enable
the leveraging of data as an enterprise asset. It includes policies,
procedures, organization, roles, and responsibilities, with
associated communication and training required to design,
develop, and provide ongoing support for the effort.”
(SAP)

“An organization-wide commitment to data quality, with data
stewardship recognized as an essential business role.”
(DataFlux)
14
Data Governance Defined …
Other

The execution of authority over the management of data

Data quality – including conformance to valid values, uniqueness,
non-redundancy, completeness, accuracy, understandability, timeliness,
and referential integrity

Metadata creation and maintenance – information about data, both
technical and business

Master data management (MDM)

Data integration

Data categorization for performance, availability, and security
15
Outline

Benefits of a data governance strategy

Components of a data governance strategy

Organization, roles and responsibilities

Impact of a data governance strategy on BI and IT

How to implement a data governance strategy program
16
Components of a DG strategy









Data standardization
Data integration
Data modeling
Data quality
Metadata management
Security and privacy
Performance and measurement
DBMS and product selection
Business intelligence
17
Data standardization






Formal data definitions
Business data naming standards
Class words lexicon
Technical data naming standards
Common words lexicon
Data domain standards
18
Our Situation with Standardization

Insert your standardization status
19
Formal Data Definitions

A data definition must reflect the real-world meaning

A data definition explains the content and meaning of the
unique data element

A data definition must be complete enough to ensure a
thorough understanding of the data element
Example:
Well Depth Feet
Bad definition:
“The depth of the well in feet”
Good definition:
“The total depth of the well in feet from the surface
of the surrounding ground to the deepest point dug
or drilled regardless of the depth of the well casing.”

Data definitions are short and precise (one paragraph)
and (optionally) may contain examples

Data definitions should never contain information about
the source or use of the data elements
Source: The DW Challenge by Michael Brackett
20
Data Naming Standards – Business

The name of an attribute should be derived from its
definition

Attribute names are always fully spelled out

Attribute names should have 3 components:
– Prime word
– Qualifiers (modifiers)
– Class word
Example: “Checking Account Monthly Average Balance”

Attribute names should be fully qualified

Attribute names should always end with an approved class
word

Use only class words from an approved class words lexicon

Attribute name components should be business terms, not
technical terms
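The business naming rules above lend themselves to an automated check. Below is a minimal Python sketch, not an established tool: `check_business_name` is a hypothetical helper, the class words are copied from the Class Words Lexicon slide, and the caller supplies the approved abbreviations so that abbreviated words can be flagged. It checks only what can be tested from the name itself; whether a name is derived from its definition still needs human review.

```python
from __future__ import annotations

# Approved class words, copied from the Class Words Lexicon slide.
# In practice this set would be loaded from the published lexicon.
APPROVED_CLASS_WORDS = {
    "Amount", "Balance", "Code", "Count", "Indicator", "Name", "Number",
    "Percent", "Date", "Description", "Identifier", "Quantity", "Rate", "Text",
}


def check_business_name(name: str, abbreviations: set[str]) -> list[str]:
    """Return naming-standard violations for a business attribute name.

    `abbreviations` is the set of approved abbreviations (e.g. "ACCT");
    a word that matches one of them is treated as not spelled out.
    """
    problems = []
    words = name.split()
    # Prime word + at least one qualifier + class word.
    if len(words) < 3:
        problems.append("expected prime word, qualifier(s), and class word")
    # The name must end with an approved class word.
    if words and words[-1] not in APPROVED_CLASS_WORDS:
        problems.append(f"'{words[-1]}' is not an approved class word")
    # Business names are always fully spelled out.
    for word in words:
        if word.upper() in abbreviations:
            problems.append(f"'{word}' is abbreviated; spell it out")
    return problems


print(check_business_name("Checking Account Monthly Average Balance", {"ACCT", "AVG"}))
print(check_business_name("Chkg Acct Avg Bal", {"CHKG", "ACCT", "AVG", "BAL"}))
```

The first call returns an empty list; the second reports a missing class word plus four abbreviated components.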
21
Class Words Lexicon
Approved and Published
Class Word | Data Type
Amount | Dec 9,2
Balance | Dec 13,2
Code | Char 1-5
Count | Small Int
Indicator | Char 1
Name | Char 15-40
Number | Integer
Percent | Dec 5,2
Date | Date
Description | Varchar
Identifier | Integer
Quantity | Small Int
Rate | Dec 6,4
Text | Varchar 250
Business Data Domains
22
Data Naming Standards – Technical

The name of a column is composed of abbreviated
attribute name components

Use only abbreviations from an approved common words
lexicon (abbreviations list)

Column name components should always be abbreviated
if an approved abbreviation exists, whether or not the
column name is too long
Example:
“CHKG_ACCT_MTHLY_AVG_BAL”

When column names are too long, qualifiers should be
eliminated one at a time, starting with the least
significant qualifier, then the next least significant, and so on
23
Common Words Lexicon
Approved and Published
Business Word | Abbreviation
Account | ACCT
Amount | AMT
Average | AVG
Balance | BAL
Checking | CHKG
Certificate of Deposit | CD
Code | CDE
Count | CNT
Date | DTE
Description | DESC
Identifier | ID
Indicator | IND
Monthly | MTHLY
Name | NM
Number | NBR
Percent | PCT
Quantity | QTY
Rate | RTE
Savings | SVG
Text | TXT
Abbreviations List
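The technical naming rules can be sketched the same way. In this hypothetical Python helper, every component with an approved abbreviation is abbreviated, and qualifiers are dropped only while the name exceeds a length limit. The 30-character default is an assumed limit, not part of the standard, and because significance is a business judgment, the caller lists the qualifiers from least to most significant. Multi-word entries such as “Certificate of Deposit” are left out of the dictionary for simplicity.

```python
from __future__ import annotations

# Business word -> approved abbreviation, copied from the Common Words Lexicon slide.
COMMON_WORDS = {
    "Account": "ACCT", "Amount": "AMT", "Average": "AVG", "Balance": "BAL",
    "Checking": "CHKG", "Code": "CDE", "Count": "CNT", "Date": "DTE",
    "Description": "DESC", "Identifier": "ID", "Indicator": "IND",
    "Monthly": "MTHLY", "Name": "NM", "Number": "NBR", "Percent": "PCT",
    "Quantity": "QTY", "Rate": "RTE", "Savings": "SVG", "Text": "TXT",
}


def _abbreviate(words: list[str]) -> str:
    # Always abbreviate when an approved abbreviation exists.
    return "_".join(COMMON_WORDS.get(w, w.upper()) for w in words)


def column_name(words: list[str], max_len: int = 30,
                drop_order: list[str] | None = None) -> str:
    """Build a technical column name from business name components.

    If the abbreviated name is longer than max_len (an assumed limit),
    qualifiers listed in drop_order, least significant first, are removed
    one at a time until the name fits.
    """
    remaining = list(words)
    for qualifier in drop_order or []:
        if len(_abbreviate(remaining)) <= max_len:
            break
        remaining.remove(qualifier)
    name = _abbreviate(remaining)
    if len(name) > max_len:
        raise ValueError(f"{name} is still longer than {max_len} characters")
    return name


words = ["Checking", "Account", "Monthly", "Average", "Balance"]
print(column_name(words))  # CHKG_ACCT_MTHLY_AVG_BAL
print(column_name(words, max_len=18, drop_order=["Monthly", "Average"]))  # CHKG_ACCT_AVG_BAL
```

With an 18-character limit, dropping “Monthly” is enough, so “Average” is kept.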
24
Data Domain Standards

Every attribute (data element) must be atomic

Every attribute must be unique (no synonyms, no
homonyms)

Every attribute identifies or describes only one business
object (entity) in the real world

Every attribute must have business metadata (name,
definition, business rules, owner, source, etc.)

Every attribute must have a predefined data domain

Data domains must be based on EDM data quality rules

Business metadata and data domains are defined and
maintained by business people
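One way to make these domain standards concrete is to record each attribute’s business metadata together with its domain, so that values can be validated against it. The classes below are a hypothetical Python sketch, assuming a domain is either an enumerated set of values or a numeric range; real domains would come from the business-maintained standards, and the sample attribute, owner, and source are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataDomain:
    """Predefined set of valid values for an attribute."""
    name: str
    allowed: frozenset | None = None   # enumerated values, if any
    min_value: float | None = None     # numeric lower bound, if any
    max_value: float | None = None     # numeric upper bound, if any

    def is_valid(self, value) -> bool:
        if self.allowed is not None:
            return value in self.allowed
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


@dataclass
class Attribute:
    """Business metadata that every attribute must carry."""
    name: str
    definition: str
    owner: str
    source: str
    domain: DataDomain
    business_rules: list[str] = field(default_factory=list)

    def missing_metadata(self) -> list[str]:
        """Names of required metadata fields that are still blank."""
        required = ("name", "definition", "owner", "source")
        return [f for f in required if not getattr(self, f).strip()]


percent = DataDomain("Percent", min_value=0, max_value=100)
rate_attr = Attribute("Savings Account Interest Percent", "", "Deposit Products",
                      "Core banking system", percent)
print(percent.is_valid(4.5), percent.is_valid(140))  # True False
print(rate_attr.missing_metadata())                   # ['definition']
```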
25
Data Standardization – Best Practices






Provide training in data administration principles
Create formal data definitions
Create fully qualified business data names
Apply the data domain standards
Create and use class words and common words
lexicons
Publish the data standards
26
Standardization – What we need to do

Enter your proposed actions
27
Data Integration

Look for potential duplicate entities by examining:
– Entity definitions
– Semantic intent
– Entity content

Ensure that each entity has one unique business
identifier

Put one fact (attribute) in one place (entity) using the
normalization rules

Look for potential duplicate attributes by examining:
– Attribute definitions
– Semantic intent
– Domains

Capture real world business actions between entities as
data relationships (not reporting patterns)
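Comparing definitions by hand does not scale to a large inventory. As a rough first pass, assuming each attribute’s definition and domain are already documented, a script can flag pairs that share a domain and have textually similar definitions. This Python sketch uses the standard library’s `difflib.SequenceMatcher`; the 0.6 threshold and the sample attributes are hypothetical, and text similarity is only a proxy for semantic intent, so every flagged pair still needs review.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations


def duplicate_candidates(attributes: dict[str, tuple[str, str]],
                         threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Flag attribute pairs that share a domain and have similar definitions.

    `attributes` maps attribute name -> (definition, domain name).
    The result is a review list, not a verdict.
    """
    flagged = []
    for (a, (def_a, dom_a)), (b, (def_b, dom_b)) in combinations(attributes.items(), 2):
        if dom_a != dom_b:
            continue  # different domains: unlikely to be synonyms
        ratio = SequenceMatcher(None, def_a.lower(), def_b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


inventory = {
    "Account Open Date": ("The date the customer first opened an account", "Date"),
    "Customer Since Date": ("The date on which the customer first opened an account", "Date"),
    "Order Total Amount": ("The total amount charged for an order", "Amount"),
}
print(duplicate_candidates(inventory))
```

Here only the two date attributes are flagged as possible synonyms.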
28
Single Version of The Truth
Diagram: entity model based on normalization rules, with entities including Customer, Existing Customer, Potential Customer, Account, Payment, Payment Method, Order, Product, Product Category, Part, Salesperson, Salaried Salesperson, Commissioned Salesperson, Org Unit, Org Structure, Supplier, Shipment, and Warehouse
29
Unstructured data




Storage and administration
– Enterprise content management systems
(ECMS)
– Check-in and check-out functionality
– Retention and archiving
– Backup and recovery
– Secure objects
Content reusability
Search and delivery
Combining structured and unstructured data
30
Data Integration – Best Practices





Determine data integration benefits and costs
Create an inventory of all your data
Use logical data modeling and normalization rules to
find and remove synonyms and homonyms
Use a metadata repository to document the names and
definitions of your business data
Don’t forget to integrate unstructured data with
structured data
31
Data Integration – Our Status

Focus on the important data such as customer,
supplier, agents, inventory, parts, loans, or whatever it
is that runs your business. Include examples of where
you are integrated and where not.
32
Data Integration – This is what we need to do

Enter your integration actions
33
Data modeling
Logical Data Model (business model)
– Business view of data
– Process independent
– Project-specific model
Enterprise Data Model (enterprise information architecture)
– Business view of data
– Process independent
– Enterprise-wide model
Physical Data Model (database model)
– Database view of data
– Process dependent
– Database-specific model
34
Data Modeling – Our Situation
35
Logical Data Model

Captures what an organization is and
what it does in terms of:
– Business objects (entities)
– Business data (attributes)
– Business activities (relationships)
– Business rules (metadata)
– Business policies (metadata)
Not tailored for:
– Query or reporting pattern or tool
– Access or storage requirements
– Performance
36
Process Independence

Access path independent

Program independent

Query / report independent

Database independent

Tool independent (OLAP)

Language independent

Platform independent
37
Purpose of Logical Data Modeling

Facilitate data integration

Facilitate business analysis

Facilitate communication among business people

Improve productivity through reusability

Focus on data ownership as opposed to system ownership

Bring data quality problems to the surface

Separate process logic from data

Serve as the baseline data architecture for database design
38
Enterprise Data Model
“Single Version of the Truth”
Integrated 360° business view!
Diagram: enterprise data model with entities including Customer, Existing Customer, Potential Customer, Account, Payment, Payment Method, Order, Product, Product Category, Part, Salesperson, Salaried Salesperson, Commissioned Salesperson, Org Unit, Org Structure, Supplier, Shipment, and Warehouse, supported by common data definitions, domains, and business rules
39
Physical Data Model

Database design based on physical attributes:
– Access patterns
– Size of tables
– Number of business users
– Location of business users
– Platform (Processor, DBMS)
– OLAP tools
Tailored for:
– Query or reporting pattern or tool
– Access and storage requirements
– Performance
40
Process Dependent

Access path dependent

Program dependent

Query / report dependent

Database dependent

Tool dependent (OLAP)

Language dependent

Platform dependent
41
Purpose of Physical Data Modeling

Facilitate database design

Focus on performance

Architect database structures:
– Tables
– Columns
– Primary keys
– Foreign keys
– Referential integrity rules
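As an illustration of these structures, the sketch below declares two tables in SQLite through Python’s built-in `sqlite3` module. SQLite stands in for whatever DBMS the physical model actually targets, and the table names, column names (abbreviated per the common words lexicon), and sample rows are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Tables, columns, primary keys, a foreign key, and a referential integrity rule.
conn.executescript("""
CREATE TABLE branch (
    branch_nbr  INTEGER PRIMARY KEY,
    branch_nm   TEXT    NOT NULL
);
CREATE TABLE employee (
    empl_nbr    INTEGER PRIMARY KEY,
    empl_nm     TEXT    NOT NULL,
    branch_nbr  INTEGER NOT NULL
                REFERENCES branch (branch_nbr) ON DELETE RESTRICT
);
""")

conn.execute("INSERT INTO branch VALUES (501, 'San Luis Obispo')")
conn.execute("INSERT INTO employee VALUES (20684762, 'Bob Smith', 501)")

try:
    # Branch 765 does not exist, so the database rejects the row.
    conn.execute("INSERT INTO employee VALUES (43218221, 'Jackie Schmidt', 765)")
except sqlite3.IntegrityError as exc:
    print("Rejected:", exc)

try:
    # RESTRICT: a branch that still has employees cannot be deleted.
    conn.execute("DELETE FROM branch WHERE branch_nbr = 501")
except sqlite3.IntegrityError as exc:
    print("Rejected:", exc)
```

Declaring the rule in the physical model lets the DBMS, rather than each program, guarantee that every employee row points at an existing branch.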
42
Data Modeling – Best Practices




Always create a logical business data model – do not
just focus on database modeling
Sell the importance of creating an enterprise information
architecture (enterprise data model) to management
Assign data modeling responsibilities (the enterprise
data model should not be created by database
designers)
Create a process to link the physical data models
to the enterprise data model
43
Data Modeling – This is what we need to do

Enter your proposed data modeling actions
44
Data quality
At what level of DQ maturity is your organization?
Maturity levels (based on CMM): 1 Uncertainty, 2 Awakening, 3 Enlightenment, 4 Wisdom, 5 Certainty
Diagram: a maturity staircase from short term to long term, with step labels including program “abends”, data profiling, data cleansing, discovery by accident, correcting source data and programs, limited data analysis, enterprise-wide DQ methods & techniques, addressing root causes, proactive prevention, continuous process improvements, and optimization
45
Data quality costs
Direct Costs of Non-Quality Information: Marketing Campaign
Item | Per Instance | Number of Instances | Number Per Year | Total Cost Per Year
Time ($60/hour loaded rate):
Creating redundant occurrence | 2.4 min | 167,141 | 1 | $401,138
Researching correct address | 10 min | 5,000/mo | 12 | $600,000
Correcting address errors | 0.3 min | 6,000/mo | 12 | $21,600
Handling complaints from customers | 5.5 min | 974/yr | 1 | $5,357
Mail preparation | 0.1 min | 393,273 | 4 | $157,309
Materials, Facilities, Equipment:
Marketing brochure | $1.96 | 393,273 | 4 | $3,083,260
Postage | $0.52 | 393,273 | 4 | $818,008
Warehouse storage | $0.01 | 393,273 | 4 | $15,731
Shipping equipment and maintenance | $5,000/yr | 36% | 1 | $1,800
Computing resources:
CPU transactions | $0.02/trans | 393,273 | 4 | $31,462
Data storage | $0.001/mo | 393,273 | 12 | $4,719
Data backup | $0.005/mo | 393,273 | 12 | $23,596
Total Annual Costs | | | | $5,163,980
© Larry English, Improving DW and BI Quality
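The table’s arithmetic can be reproduced to check the totals. This Python sketch assumes each row’s annual cost is cost per instance × number of instances × number per year, with time converted at the $60/hour loaded rate ($1 per minute). It also assumes the 36% in the shipping equipment row is the share of the $5,000 yearly cost charged to the campaign; that reading is an interpretation, but it reproduces the $1,800 shown.

```python
# Rows from the "Direct Costs of Non-Quality Information" table:
# (item, cost per instance in dollars, number of instances, number per year)
DOLLARS_PER_MINUTE = 60 / 60  # $60/hour loaded labor rate

ROWS = [
    ("Creating redundant occurrence", 2.4 * DOLLARS_PER_MINUTE, 167_141, 1),
    ("Researching correct address", 10 * DOLLARS_PER_MINUTE, 5_000, 12),
    ("Correcting address errors", 0.3 * DOLLARS_PER_MINUTE, 6_000, 12),
    ("Handling complaints from customers", 5.5 * DOLLARS_PER_MINUTE, 974, 1),
    ("Mail preparation", 0.1 * DOLLARS_PER_MINUTE, 393_273, 4),
    ("Marketing brochure", 1.96, 393_273, 4),
    ("Postage", 0.52, 393_273, 4),
    ("Warehouse storage", 0.01, 393_273, 4),
    # Assumption: 36% is the share of the $5,000/yr equipment cost charged here.
    ("Shipping equipment and maintenance", 5_000, 0.36, 1),
    ("CPU transactions", 0.02, 393_273, 4),
    ("Data storage", 0.001, 393_273, 12),
    ("Data backup", 0.005, 393_273, 12),
]


def annual_costs(rows):
    """Annual cost per item, rounded to whole dollars as in the table."""
    return {item: round(unit * count * per_year)
            for item, unit, count, per_year in rows}


costs = annual_costs(ROWS)
total = sum(costs.values())
print(f"Total annual cost: ${total:,}")  # Total annual cost: $5,163,980
```

Every row total and the grand total match the table, which supports the formula assumed above.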
46
Data quality costs
Information Development Cost Analysis
Category | Portfolio Total Number | Relative Weight Factor* | Average Unit Dev/Maint Costs | Expenses | % of Budget
Infrastructure Basis:
Enterprise architected DBs | 200 | 0.75 | $15,000 | $3,000,000 |
Enterprise reusable create/update programs+ | 300 | 1.50 | $30,000 | $9,000,000 |
Total Infrastructure expenses | | | | $12,000,000 | 24%
Value Basis:
Total retrieve equivalent pgms+ | 300 | 1.00 | $20,000 | $6,000,000 |
Total value-adding expenses | | | | $6,000,000 | 12%
Cost-adding Basis:
Redundant create/update pgms | 500 | 1.50 | $30,000 | $15,000,000 |
Interface/extract programs | 400 | 1.00 | $20,000 | $8,000,000 |
Redundant database files | 600 | 0.75 | $15,000 | $9,000,000 |
Total cost-adding expenses | 1,500 | | | $32,000,000 | 64%
Lifetime Total** | 3,800 | | | $50,000,000 | 100%
© Larry English, Improving DW and BI Quality
* Determine relative effort to develop average unit of each category using effort to develop a retrieve program as “1.00”
+ For programs that retrieve some data and create/update other data, determine the percent of retrieve-only attributes and percent of create/update attributes (e.g., to retrieve customer data to create an order)
**Based on 3,800 application programs and database files in portfolio and $50 Million in development
47
Dummy (default) values
Defaults for mandatory fields
SSN: 999-99-9999
Age: 999
Zip: 99999
Income: 9,999,999.99
Inability to determine customer profiles
Inability to determine customer demographics
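A simple profiling pass can measure how often these dummy defaults occur before any cleansing is planned. The Python sketch below assumes the records are available as dictionaries; the field names and sample records are hypothetical, and the dummy values are the ones listed on this slide.

```python
from __future__ import annotations

from collections import Counter

# Dummy defaults from the slide; the field names are hypothetical.
DUMMY_VALUES = {
    "ssn": {"999-99-9999"},
    "age": {999},
    "zip": {"99999"},
    "income": {9_999_999.99},
}


def dummy_value_counts(records: list[dict]) -> Counter:
    """Count, per field, how many records hold a known dummy default."""
    counts = Counter()
    for record in records:
        for field_name, dummies in DUMMY_VALUES.items():
            if record.get(field_name) in dummies:
                counts[field_name] += 1
    return counts


customers = [
    {"ssn": "999-99-9999", "age": 42, "zip": "85004", "income": 9_999_999.99},
    {"ssn": "123-45-6789", "age": 999, "zip": "99999", "income": 55_000.00},
]
print(dummy_value_counts(customers))
```

Dividing each count by the number of records gives the share of customers whose profile or demographics cannot be trusted.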
48
“Intelligent” dummy values
Defaults with meaning
SSN: 888-88-8888 = Non-resident alien
Income: 999,999.99 = Employee
Age: 000 = Corporate customer
Source Code: ‘FF’ = Account closed prior to 1991
Inability to write straightforward queries without
knowing how to filter data
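One remedy, sketched here under the assumption that the meanings on this slide are complete and reliable, is to move each hidden meaning into an explicit indicator and blank out the dummy value, so queries no longer need to know the codes. Field and indicator names are hypothetical; age is assumed to be stored as a number, so 000 appears as 0.

```python
# Meanings hidden in "intelligent" dummy values, per the slide.
# Field and indicator names are hypothetical.
INTELLIGENT_DUMMIES = [
    ("ssn", "888-88-8888", "non_resident_alien_ind"),
    ("income", 999_999.99, "employee_ind"),
    ("age", 0, "corporate_customer_ind"),
    ("source_cd", "FF", "closed_before_1991_ind"),
]


def make_meaning_explicit(record: dict) -> dict:
    """Return a copy with explicit indicators; dummy values become None."""
    clean = dict(record)
    for field_name, dummy, indicator in INTELLIGENT_DUMMIES:
        is_dummy = record.get(field_name) == dummy
        clean[indicator] = is_dummy
        if is_dummy:
            clean[field_name] = None  # the dummy is not a real value
    return clean


row = {"ssn": "888-88-8888", "income": 52_000.00, "age": 0, "source_cd": "AB"}
print(make_meaning_explicit(row))
```

After the conversion, a query for corporate customers filters on `corporate_customer_ind` instead of on a magic age value.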
49
Missing Values
Operational systems do not always require
informational or demographic data
Gender
Ethnicity
Age
Income
Referring Source
Inability to analyze marketing channels
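Completeness of such optional fields can be measured before deciding whether marketing-channel analysis is feasible. This Python sketch treats a missing key, None, or an empty string as missing; the field names and sample records are hypothetical.

```python
from __future__ import annotations


def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Percentage of records that have a non-empty value in each field."""
    total = len(records)
    result = {}
    for name in fields:
        filled = sum(1 for r in records if r.get(name) not in (None, ""))
        result[name] = round(100 * filled / total, 1) if total else 0.0
    return result


customers = [
    {"gender": "F", "income": None, "referring_source": ""},
    {"gender": "", "income": 40_000},
    {"income": 55_000},
]
print(completeness(customers, ["gender", "income", "referring_source"]))
# {'gender': 33.3, 'income': 66.7, 'referring_source': 0.0}
```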
50
Multi-purpose fields
ONE field explicitly has MANY meanings, depending on:
» Which business unit enters the data
» At what time in history it was entered
» A value in one or more other fields
Example: Appraisal Amount, redefined as Advertised Amount, redefined as Sold Date, redefined as ...
Related field: Loan Type Code
» 25 redefines = 25 attributes!
» Not mutually exclusive!
» Only the value of one is known for each record!
Inability to judge product profitability
51
Cryptic values (1)
Often found in “Kitchen Sink” fields
» Usually one byte (if not one bit)
» Highly cryptic (A, B, C, 1, 2, 3, ...)
» Non-intelligent, non-intuitive codes
» Often not mutually exclusive
Inability to empower end users to write
their own queries
52
Cryptic values (2)
ONE field implicitly has MANY meanings
Master_Cd: {A, B, C, D, E, F, G, H, I}
{A, B, C} = Type of customer
{D, E, F} = Type of supplier
{G, H, I} = Regional constraints
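A multi-meaning code like Master_Cd can be split into one attribute per meaning. The Python sketch below uses the value groups shown on this slide; the target column names are hypothetical, and an unknown value raises an error rather than being silently assigned.

```python
from __future__ import annotations

# Value groups of Master_Cd, per the slide; target column names are hypothetical.
MASTER_CD_TARGET = {
    **dict.fromkeys("ABC", "customer_type_cd"),
    **dict.fromkeys("DEF", "supplier_type_cd"),
    **dict.fromkeys("GHI", "regional_constraint_cd"),
}


def split_master_cd(value: str) -> dict[str, str | None]:
    """Move a Master_Cd value into the single attribute it actually describes."""
    if value not in MASTER_CD_TARGET:
        raise ValueError(f"unknown Master_Cd value: {value!r}")
    columns = dict.fromkeys(sorted(set(MASTER_CD_TARGET.values())))  # all None
    columns[MASTER_CD_TARGET[value]] = value
    return columns


print(split_master_cd("E"))
```

Each resulting attribute then has one meaning and can get its own definition and domain.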
53
Free-form address lines
Unstructured text
» no discernible pattern
» cannot be parsed
address-line-1: ROSENTHAL, LEVITZ, A
address-line-2: TTORNEYS
address-line-3: 10 MARKET, SAN FRANC
address-line-4: ISCO, CA 95111
Inability to perform market analysis
54
Contradicting values
Values in one field are inconsistent with
values in another related field
1488 Flatbush Avenue
New York, NY 75261 (Texas Zip)
Type of real property: Single Family Residence
Number of rental units: four (income property)
Inability to make reliable business decisions
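Contradictions like these can be caught with cross-field rules. The Python sketch below is illustrative only: the ZIP prefix ranges are a deliberately partial table (New York 100–149, Texas 750–799), and the rule that a single family residence has at most one rental unit is an assumption about the business rules. Field names are hypothetical.

```python
from __future__ import annotations

# Partial, illustrative ZIP prefix ranges (first three digits). A real check
# would use a complete postal reference table.
ZIP_PREFIXES = {"NY": range(100, 150), "TX": range(750, 800)}


def contradictions(record: dict) -> list[str]:
    """Cross-field consistency checks; the rules are illustrative."""
    issues = []
    state, zip_code = record.get("state"), record.get("zip")
    if state in ZIP_PREFIXES and zip_code:
        if int(zip_code[:3]) not in ZIP_PREFIXES[state]:
            issues.append(f"ZIP {zip_code} is not a {state} ZIP code")
    if (record.get("property_type") == "Single Family Residence"
            and record.get("rental_units", 0) > 1):
        issues.append("single family residence with more than one rental unit")
    return issues


loan = {"state": "NY", "zip": "75261",
        "property_type": "Single Family Residence", "rental_units": 4}
print(contradictions(loan))
```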
55
Violation of business rules
Business Rule: Adjustable Rate Mortgages must have
» Maximum Interest Rate (Ceiling)
» Minimum Interest Rate (Floor)
Business Rule: A Ceiling is always higher than a Floor
ceiling-interest-rate: 8.25
floor-interest-rate: 14.75
switched?
Inability to calculate product profitability
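Business rules like these can be expressed as executable checks. This is a minimal Python sketch with hypothetical field names that mirror the slide’s ceiling-interest-rate and floor-interest-rate: it flags a missing ceiling or floor and any ARM whose ceiling is not higher than its floor.

```python
from __future__ import annotations


def check_arm(loan: dict) -> list[str]:
    """Check the adjustable-rate mortgage rules from the slide."""
    if loan.get("loan_type") != "ARM":
        return []
    issues = []
    ceiling = loan.get("ceiling_interest_rate")
    floor = loan.get("floor_interest_rate")
    if ceiling is None:
        issues.append("missing ceiling interest rate")
    if floor is None:
        issues.append("missing floor interest rate")
    if ceiling is not None and floor is not None and ceiling <= floor:
        issues.append(f"ceiling {ceiling} is not higher than floor {floor} (switched?)")
    return issues


print(check_arm({"loan_type": "ARM",
                 "ceiling_interest_rate": 8.25, "floor_interest_rate": 14.75}))
```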
56
Reused primary key
Little history, if any, stored in operational files
» primary keys are customarily re-used
» may have a different rollup structure
January ’94: branch 501 = San Francisco Main, region 1, area SW
August ’97: branch 501 = San Luis Obispo, region 2, area SW
Inability to evaluate organizational performance
57
Non-unique primary key
Duplicate identification numbers
» Multiple customer numbers
Customer Name | Phone Number | Cust. Number
Philip K. Sherman | 818.357.5166 | 960601
Philip K. Sherman | 818.357.7711 | 960105
Philip K. Sherman | 818.357.8911 | 960003
» Multiple employee numbers
Date | Employee Name | Department | Empl. Number
July 1995 | Bob Smith | 213 (HR) | 21304762
January 1996 | Bob Smith | 432 (SRV) | 43218221
August 1999 | Bob Smith | 206 (MKT) | 20684762
Inability to determine customer relationships
Inability to analyze employee benefits trends
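Candidates for duplicate identification numbers can be surfaced by grouping on a normalized name. This Python sketch is a deliberately simple heuristic (lowercase, periods removed, whitespace collapsed); production matching would also compare attributes such as phone number and address, and a shared name alone does not prove a shared identity.

```python
from collections import defaultdict


def same_name_different_numbers(customers):
    """Group customer numbers by normalized name.

    Returns only names that appear under more than one number,
    as a list for review rather than an automatic merge.
    """
    by_name = defaultdict(set)
    for name, cust_nbr in customers:
        key = " ".join(name.lower().replace(".", "").split())
        by_name[key].add(cust_nbr)
    return {name: sorted(nbrs) for name, nbrs in by_name.items() if len(nbrs) > 1}


customers = [
    ("Philip K. Sherman", "960601"),
    ("Philip K. Sherman", "960105"),
    ("Philip K. Sherman", "960003"),
]
print(same_name_different_numbers(customers))
# {'philip k sherman': ['960003', '960105', '960601']}
```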
58
Missing data relationships
Data that should be related to other data in a
dependent (parent-child) relationship
Example entities: Branch, Employee, Benefit
» Branch number 0765 does not exist in the BRANCH table
Inability to produce accurate rollups
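Where the relationship is not enforced by the DBMS, orphaned child rows like the one above can be found with an outer join. The sketch below runs the check in SQLite through Python’s `sqlite3` module; the tables, columns, and rows are hypothetical, and no foreign key is declared, to mimic a file without referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch   (branch_nbr TEXT PRIMARY KEY, branch_nm TEXT);
CREATE TABLE employee (empl_nbr TEXT PRIMARY KEY, branch_nbr TEXT);  -- no FK declared
INSERT INTO branch   VALUES ('0501', 'San Luis Obispo');
INSERT INTO employee VALUES ('21304762', '0501'), ('43218221', '0765');
""")

# Child rows whose parent key has no match in BRANCH.
orphans = conn.execute("""
    SELECT e.empl_nbr, e.branch_nbr
    FROM employee AS e
    LEFT JOIN branch AS b ON b.branch_nbr = e.branch_nbr
    WHERE b.branch_nbr IS NULL
""").fetchall()
print(orphans)  # [('43218221', '0765')]
```

Running such a query before building rollups shows which rows would silently drop out of branch totals.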
59
Inappropriate data relationships
Data that is inadvertently related, but should not be
» two entity types with the same key values
Purchaser: Jackie Schmidt, 837221
Seller: Robert Black, 837221
Inability to determine customer or vendor
relationships
60
Management Support




Management awareness of importance of data quality
Cost justification of data quality initiative
Ongoing commitment
Finding a business management sponsor
61
Triage - Prioritization





Which data to cleanse
Justification for cleansing
Ease of cleansing
Possibility of cleansing
Political support for cleansing
62
Cost of Cleansing



Automatic versus manual
– Tools to perform automatic cleansing
– Effort to support use of tools
Use of defaults
Knowledge/experience of those performing manual
cleansing
63
Responsibility for Data Quality





“It’s not enough to say that data quality is everyone’s
responsibility.”
Data Quality Administrator
Ongoing commitment
Data ownership responsibility
Operational versus data warehouse responsibility
64
Data Quality – Best Practices




Inventory the quality of your data
Sell the importance of data quality to management
Assign data quality responsibility
Triage the cleansing process
65
Data Quality – Our Status

Enter all the major problems you have or anticipate with
data quality and don’t limit yourself to one slide.
66
Data Quality – What Steps We Should Take to Improve

Enter all the practical steps you should take and
prioritize them. Don’t limit yourself to one slide.
67
Metadata Management
Master Metadata
Technical Metadata (Developer’s View): Tables, Columns, Keys (primary/foreign), Ref. Integrity Rules, Indexes, ETL rules, Process logic
Business Metadata (User’s View): Business Names, Data Definitions, Data Domains, Data Relationships, Business Rules, DQ Rules, Data Integrity Rules
Usage Metadata (Administrator’s View): Data Lineage, Data Location, Data Usage, Data Volumes, Load Statistics, Error Statistics
68
Metadata is everywhere
Diagram: technicians and business people (business analysts, data administrator, database administrator, ETL developer, application developer, data mining expert) keep metadata in word processing files, spreadsheets, CASE tools, DBMS dictionaries, ETL tools, OLAP tools, and data mining tools. A metadata migration process moves it into a metadata repository, which serves a business person’s view (business metadata) and a technician’s view (technical metadata).
69
Metadata as the Keystone




Single version of the truth
It’s the inventory of information
Tears down dysfunctional information fiefdoms
Opportunities for data standardization
70
Management Support for Metadata





IT and the Business
Management understanding of the importance of
metadata
Impact on project schedules
Long term benefit of metadata
Importance for operational and data warehouse systems
71
Which Metadata to Capture




Don’t boil the ocean
What metadata is valuable
Ease and cost of capture
Political issues relating to capture
72
Responsibility for Capturing Metadata



Incentive for capturing
Management direction
Automatic and manual
73
Responsibility for Maintaining Metadata



Where does Metadata Repository Administration
report?
Why are administration and maintenance important?
Long-term commitment
74
How Metadata Is Used

Business
– Understanding the data
– Understanding the meaning of results
– Avoiding incorrect conclusions

IT
– Research
– Impact analysis
– Tool interchange
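Impact analysis is essentially a walk over lineage metadata. Assuming the repository can export lineage as “source element → derived elements” pairs, this Python sketch finds everything downstream of a changed element with a breadth-first search; the element names are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage metadata: element -> elements derived directly from it.
LINEAGE = {
    "src.customer.income": ["ods.customer.income_amt"],
    "ods.customer.income_amt": ["dw.customer_dim.income_band_cde",
                                "dw.kpi.avg_income_amt"],
    "dw.customer_dim.income_band_cde": ["report.segment_summary"],
}


def downstream_impact(element: str) -> list[str]:
    """All elements that directly or indirectly depend on `element`."""
    seen: set[str] = set()
    queue = deque([element])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(downstream_impact("src.customer.income"))
```

The `seen` set keeps the walk from looping if the exported lineage ever contains a cycle.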
75
Metadata – Best Practices




Determine which metadata to capture and use
Determine how the tools will capture and use metadata
Sell management on the importance of metadata
Assign metadata responsibility
76
Metadata – Where are we?

Include anything you have done, such as a glossary of
business and IT definitions.
77
Metadata – What Should We be Doing

As you enter these actions, consider including
responsibility but make sure you have talked to those
people or departments before presenting to
management.
78
Security and privacy
Diagram: connection paths A through H link workstations, terminals, a communication server, remote access, Internet access, a LAN file server, a database server, and a mainframe. A legend marks each path as having security or no security, and a matrix shows which mechanisms cover paths A through H: mainframe security package, LAN security package, PC security package, password security, encryption function, DBMS security, and generic security package.
79
Categorization for Security/Privacy




Does all data have the same security/privacy
requirements?
Who determines security/privacy requirements of data?
What are the regulatory requirements for security and
privacy?
Does your organization have a Security Office? What
authority do they have?
80
Responsibility For Data Security





Security Office
Internal auditors
Data Owners
Responsibility for administering
Testing security and privacy
81
Mechanism For Establishing Security Procedures



Security requirements
– Internal
– Regulatory
Tools that implement security
Communicating security requirements to those who
implement
82
Security Audit





Validating procedures
Validating training
Testing and probing
Recommending mitigation
Frequency of audits
83
Regulatory Issues





Health Care – HIPAA
Finance
Brokerage - SEC
Insurance
Media – FCC
84
Security & Privacy – Best Practices





Raise the consciousness of security and privacy
requirements
Connect with your Security Office
Determine security capabilities of tools
Assign responsibilities
Test and validate
85
Security & Privacy – What exposures do we have?

Hopefully you have talked to your Security Officer and
anyone else who is responsible for the security of data.
86
Security & Privacy – What Steps Do We Need to Take?

Be sure to clear these actions with those responsible for
security and privacy.
87
Performance






Benchmarking
Capacity planning
Designing (optimal schemas)
Coding (efficient SQL calls)
Monitoring and measuring
Tuning
– Database structures
– DBMS parameters and OS
– Communication links
– Hardware
88
Categorization for Performance




How good does response time need to be?
How does it differ from application to application?
What is the cost-benefit of excellent response time?
Were performance considerations included in the
architecture?
89
Categorization for Availability





Scheduled hours (24 X 7, 18 X 6,…)
Availability during scheduled hours
How does it differ from system to system?
Is excellent availability cost justified?
Was availability included in the architecture?
90
Capacity Planning







Database size
Number of users
Number of transactions
Number of queries/reports
Time and day of usage
Complexity of transactions/queries/reports
Proactive response to capacity increase
91
Monitoring/Measuring





Response time
Resource utilization (CPU, disk access, network)
Who is using the system
When is the system being used
Chargebacks
92
Service Level Agreements








Response time
Availability
– Scheduled hours (hours/day, days/week)
– Availability during scheduled hours
Timeliness of data
Response to problems
Response to new requests
Who establishes agreements?
What’s realistic?
Incentives to meet SLAs
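One common way to measure the availability term of an SLA is (scheduled hours − outage hours during scheduled hours) ÷ scheduled hours. This Python sketch assumes outages outside the schedule have already been excluded; the four-week 18 × 6 example and the 99.5% target are hypothetical.

```python
def availability_pct(scheduled_hours: float, outage_hours: float) -> float:
    """Availability during scheduled hours, as a percentage."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return round(100 * (scheduled_hours - outage_hours) / scheduled_hours, 2)


# Example: an 18 x 6 schedule over four weeks, with 3 hours of outages
# during scheduled hours, measured against a hypothetical 99.5% target.
scheduled = 18 * 6 * 4         # 432 scheduled hours
actual = availability_pct(scheduled, 3)
print(actual, actual >= 99.5)  # 99.31 False
```

Agreeing on this formula up front, including which outages count, avoids disputes when SLA results are reported.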
93
Reporting performance

IT
– Who needs to take action
– Who needs to see reports/alerts

Business
– Matching project agreements
– Expectations
94
Tuning



Awareness of problems – measurement tools and
responsibilities
Tuning capability of platform, RDBMS, tools
Responsibility for tuning
95
Measurement Tools




Performance
Usage
Resource utilization
Network
96
Performance & Measurement – Best Practices




Determine what is advantageous to measure
Assign responsibilities
Designate tools for measurement
Report metrics to management
97
DBMS/Product Selection
Industrial-strength
Enterprise Server
Mid-range
Workgroup Server
Desktop Remote Client
98
Relational DBMS



Which RDBMS is the standard
Relation to platform
What applications is it being used for
99
Why standardize the RDBMS?






Minimize the number of RDBMSs
Less training required
More leverage on RDBMS vendor
Flexible assignments
Fewer interface problems
Fewer interface programs
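The point about fewer interface programs can be made concrete with a simplifying worst-case assumption: every DBMS must exchange data directly with every other one. In that case n DBMSs need n(n - 1)/2 point-to-point interfaces, so the count grows roughly with the square of the number of products. The helper name is hypothetical.

```python
def point_to_point_interfaces(n_dbms: int) -> int:
    """Pairwise interfaces needed if every DBMS talks directly to every other.

    Illustrative worst case; real environments may route through hubs or ETL.
    """
    if n_dbms < 0:
        raise ValueError("n_dbms must be non-negative")
    return n_dbms * (n_dbms - 1) // 2


for n in (2, 4, 6, 8):
    print(n, "DBMSs ->", point_to_point_interfaces(n), "interfaces")
# 2 -> 1, 4 -> 6, 6 -> 15, 8 -> 28
```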

Relation to platform
RDBMS performance impacted by platform
Platform may dictate (or strongly recommend)
RDBMS choice
Which decision comes first?
(Diagram: platform tiers from Desktop Remote Client through Mid-range Workgroup Server to Industrial-strength Enterprise Server)

How DBMS is being used
Operational/OLTP
Data Warehouse/Business Intelligence
(Diagram: Operational Systems and DW Databases, with the labels OM, ODS, EDW and DM)

Tools/Utilities
Platform dependent
DBMS dependent
Expensive
33% end up on the shelf (unused)
Lots of product duplication
Necessary?

Standards for Products
Who sets standards?
Are the standards known?
Are they standards or guidelines?
Who can give dispensation?

Criteria for Selection
Need
Cost
Vendor
– Support
– Reputation
– Financial stability
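A weighted scoring matrix is one common way to apply selection criteria like these. The sketch assumes each candidate is rated from 1 to 5 on each criterion. The weights, product names and ratings are hypothetical placeholders; an evaluation team would set its own.

```python
# Hypothetical weights for the criteria on this slide; they sum to 1.0.
WEIGHTS = {"need": 0.35, "cost": 0.25, "support": 0.15,
           "reputation": 0.10, "financial_stability": 0.15}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine 1-5 ratings per criterion into a single weighted score."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


# Hypothetical ratings from the technical evaluators.
candidates = {
    "Product A": {"need": 5, "cost": 2, "support": 4, "reputation": 4, "financial_stability": 5},
    "Product B": {"need": 4, "cost": 4, "support": 3, "reputation": 3, "financial_stability": 3},
}
ranked = sorted(candidates, key=lambda p: weighted_score(candidates[p]), reverse=True)
for product in ranked:
    print(product, round(weighted_score(candidates[product]), 2))
```

In this made-up example, Product A's strong fit to need outweighs its higher cost (4.0 against 3.6).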

Responsibility for Selection
Technical evaluators
Strategic architect
Management

Single Vendor vs Best of Breed
Single vendor
– Possibly a better relationship
– Leverage
– Not always the best products
– Products should all work together

Best-of-breed
– Need to integrate yourself
– Finger-pointing when problems arise
– Potential incompatibilities

Deals/Negotiations
Have someone else negotiate
Don’t let the vendor know you have chosen them before
you negotiate
www.dobetterdeals.com (Joe Auer – ComputerWorld)

Relationship with Vendors
Partnerships
Money Issues
Support
Conferences
Being a reference

Databases Required by the Application
Packages
Packages do not support all DBMSs
Packages do not support all DBMSs equally well
Does the preferred DBMS violate the database standard?
Are support personnel (DBAs) available?

Impact of Package
Machine Requirements
Performance
Availability

DBMS/Product Selection – Best
Practices
Determine real requirements
Establish software standards
Make use of existing software whenever possible
Talk to organizations that are using the products

Business intelligence (BI) … provides decision makers a 360° view of their business
(Figure, source: TDWI. A BI dashboard with Financial Performance Meters showing actual, target, variance and trend for metrics such as same store sales, customer retention, new customers, charge cards issued, 30/60/90 day past-due accounts, merchandise return rate and inventory turnover rate (e.g., $108.0m actual vs. $120.0m target, -10%), plus panels for Daily Sales, Market Growth, Alerts (regulatory warning, market opportunity, compliance violation), Trends and Forecasts.)
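The dashboard's performance meters compare each metric's actual value with its target. The first meter row shows $108.0m actual against a $120.0m target with a -10% variance, and that figure is consistent with variance = (actual - target) / target. The alert rule below is a hypothetical illustration of how a BI tool might raise a dashboard alert. It assumes that higher is better, which does not hold for metrics such as past-due accounts.

```python
def variance_pct(actual: float, target: float) -> float:
    """Variance of actual against target as a percentage of target.

    Negative values mean the metric is below target.
    """
    if target == 0:
        raise ValueError("target must be non-zero")
    return (actual - target) / target * 100.0


def needs_alert(actual: float, target: float, tolerance_pct: float = 5.0) -> bool:
    """Hypothetical rule: flag a higher-is-better metric that is more than
    tolerance_pct below target."""
    return variance_pct(actual, target) < -tolerance_pct


# First meter row in the figure: $108.0m actual vs. $120.0m target.
print(f"{variance_pct(108.0, 120.0):+.1f}%", needs_alert(108.0, 120.0))  # -10.0% True
```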

Goals and Objectives
Why have a data warehouse?
Have goals and objectives been identified?
Have they been communicated?
Are they measured post-implementation?

Architecture
Platform
Tools/products
How the data flows

DW and BI Tools
RDBMS
Data Modeling
ETL
Access and Analysis
Data quality (Cleansing)
Measurement

Data Mining
Data farming
– Verification of assumptions
– Results based on known data relationships
– Deductive method
– Yields information that can be proven to be factual
Data mining
– Discovery of the unknown
– Inferred results from data found in the database
– Inductive method
– Yields information that is assumed to be true with some probability
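The sketch below contrasts the two columns. First it checks a stated assumption against every record, which is the deductive, data-farming side. Then it discovers frequently co-occurring item pairs and attaches a confidence value to each, which is the inductive, data-mining side. Association-rule support and confidence are one technique chosen here for illustration; the slide does not prescribe them. The order baskets and the support threshold are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets; item names are illustrative only.
BASKETS = [
    {"printer", "paper", "ink"},
    {"printer", "ink"},
    {"paper", "pens"},
    {"printer", "paper", "ink"},
    {"laptop", "mouse"},
]


def verify(baskets, rule) -> bool:
    """Deductive check: does a stated rule hold for every record?"""
    return all(rule(b) for b in baskets)


def mine_pairs(baskets, min_support: float = 0.4):
    """Inductive discovery: item pairs that co-occur in at least min_support
    of baskets, each reported as (a, b, confidence of a -> b)."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(p for b in baskets for p in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n >= min_support:
            rules.append((a, b, count / item_counts[a]))  # P(b | a)
            rules.append((b, a, count / item_counts[b]))  # P(a | b)
    return sorted(rules)


# Verification: "every order containing a printer also contains ink".
print(verify(BASKETS, lambda b: "printer" not in b or "ink" in b))
# Discovery: patterns held only with some probability (confidence).
for a, b, conf in mine_pairs(BASKETS):
    print(f"{a} -> {b}: confidence {conf:.2f}")
```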

Data Sources for Data Mining
(Diagram: operational databases (Orders, Shipments, Account Master, Billing) and DW databases (Enterprise Data Warehouse, Customer DM, Sales DM, loaded via ETL) as sources for the data mining databases used by data mining applications.)

Spiral BI/DW Methodologies
(Diagram: a spiral methodology with the stages Business Goals, Assessment & Strategy, Project Plan, Business Opportunity, Post-Impl. Review, Data Requirement, BI/DW Applications, Business Analysis, Data Inventory, Application Design, Implementation, Testing and Development.)

Software Release Concept
(Diagram: a BI application, "reusable & expanding", is built by projects delivered as a first, second, third, fourth, fifth and final release. "Extreme scoping" (Larissa Moss): the early releases "feel like prototyping". "Refactoring" (Kent Beck). Project ≠ Application.)

Using the Software Release Approach
Unstable requirements can be tested and enhanced in
small increments
Scope is very small and manageable
Technology infrastructure can be tested and proven
Data volumes (per release) are relatively small
Project schedules are easier to estimate because the
scope is very small
Development activities can be iteratively refined, honed,
and adapted
Mistakes are less expensive to fix
early in the development process!
And the quality of the release deliverables (and ultimately
the quality of the BI applications) will be higher!
And the development process will get faster and faster!

Software Release Guidelines
Deliver every three to six months
(first release will take longer)
Strictly control the scope and
keep it very small
Keep expectations realistic
The enterprise infrastructure must be robust
(technical and non-technical)
Metadata must be an integral part of each release;
otherwise, the releases will not be manageable
Designs, programs, and tools must be flexible

Iterative BI Application Development
(Diagram: Releases 1 through 6 of a BI application, each cycling through steps such as Business Case Assessment, Planning, Requirements & Data Analysis, Data Analysis, Requirements & Application Prototyping, Application Prototyping, Meta Data Repository Analysis, Meta Data Repository Design, ETL Design, Data Mining, ETL Development, Application Development, Meta Data Repository Development, ETL Testing, Application Testing, Meta Data Repository Testing, Release Implementation and Post-Impl. Review.)

Business Intelligence – Best Practices
Set goals and objectives
Set expectations early and often
Establish cost justification
Find a terrific sponsor
Use a spiral methodology
Deliver often with software releases

BI & DW – How well are we doing?
Include applications, departments, number of users,
usage, user satisfaction, ROI, management
perception,…

DW & BI – What are we going to do to
make our DW and BI Sing?
This might include training, selling to management and
end users, new BI tools, new organizational
responsibilities,…

Outline
Benefits of a data governance strategy

Components of a data governance strategy

Organization, roles and responsibilities

Impact of a data governance strategy on BI and IT

How to implement a data governance strategy program

Organization, roles and responsibilities
Data owner
Data steward
Data strategist
Strategic architect
Database administrator/designer
Data administrator (EIM)
Metadata administrator (EIM)
Data quality analyst (EIM)
Security officer

Data owner
Assigned to business people
(often data originators)
Typically hold a senior position
(directors or managers)
Have authority to set policies and dictate
business rules and security for the data
Are accountable to the information consumers
in the organization

Data steward
Should be assigned to business people,
but could be performed by senior business
analysts from IT
Must know the industry and the organization very well
(often people with seniority)
Requires an enterprise-wide understanding of the data
and the business rules
Have authority to communicate and enforce policies,
business rules, and security for the data
Mediate data disputes among business people and
facilitate resolutions

Data strategist
Understands the strategic business goals

Knows the government regulations and
governmental reporting requirements

Understands the DBMS platforms and operating
systems

Knows the internal application databases
(operational and BI)

Is aware of future data demands and data volumes

Creates and maintains the data governance
strategy

Strategic architect
Develops the overall architecture for both
operational and BI environments to include:
– Software
– Utilities
– Tools
– Interfaces
Determines if the BI/DW environment will be one-tier or
multi-tier and what the platform components should be
Participates in architecting databases and data flows

Database administrator/designer
Understands user requirements and how
databases are accessed and updated
Knows different database design techniques
(relational, multi-dimensional) and when to apply them
Is responsible for the physical aspects of application
databases:
– Logical and physical database design
– Partitioning and indexing
– Dataset placement
– Performance and tuning (databases and SQL)
– Backup and recovery
Maintains the application databases

Data administrator
Knows the industry and the business processes

Understands the data and the business rules that
are used by those processes

Has expertise in E/R modeling and knows the
normalization rules

Standardizes and integrates the data (logically)
through the enterprise information architecture

Creates and enforces data naming standards

Collects and maintains business metadata:
– Data names (fully spelled out business names)
– Data definitions and metrics definitions
– Business rules (data rules and process rules)
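Enforcing naming standards can be partly automated once the standard is written down. The checker below assumes a hypothetical standard with three rules. Business names are Title Case words separated by single spaces. They are fully spelled out, so none of the listed abbreviations may appear. They end in an approved class word. The class words, the abbreviation list and the function name are all illustrative assumptions, not a published standard.

```python
import re

# Hypothetical naming standard for business data names.
CLASS_WORDS = {"Name", "Date", "Amount", "Code", "Identifier", "Count", "Text"}
ABBREVIATIONS = {"Cust": "Customer", "Amt": "Amount", "Dt": "Date",
                 "Nbr": "Number", "Acct": "Account"}
WORD = re.compile(r"[A-Z][a-z]+")  # one Title Case word


def naming_violations(name: str) -> list[str]:
    """Return the ways a proposed data name breaks the hypothetical standard."""
    problems = []
    words = name.split(" ")  # double spaces yield empty words, which are flagged
    for w in words:
        if w in ABBREVIATIONS:
            problems.append(f"'{w}' is an abbreviation; spell out '{ABBREVIATIONS[w]}'")
        elif not WORD.fullmatch(w):
            problems.append(f"'{w}' is not a Title Case word")
    if words[-1] not in CLASS_WORDS:
        problems.append(f"last word '{words[-1]}' is not an approved class word")
    return problems


for proposed in ("Customer Last Name", "Cust Birth Dt", "customer_id"):
    print(proposed, "->", naming_violations(proposed) or "OK")
```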

Metadata administrator
Knows industry metadata standards
Understands DW databases and ETL architectures
Builds and maintains a metadata repository or
administers a purchased MDR product
Selects and installs metadata integration and access
tools
Integrates and loads metadata from various BI and
developer tools (Data Modeling, Data Profiling, DBMS,
ETL, OLAP)

Data quality analyst
Knows the internal application databases and
how to extract data from them
Is familiar with data profiling and data cleansing tools
Understands the user requirements, the business
processes, and the business rules
Audits operational source data to find and report
violations of business rules and other DQ problems
Participates in writing data cleansing specs
Identifies root causes for dirty data
Facilitates negotiations between data originators and
information consumers about DQ improvements
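The audit described above can be expressed as a set of business rules, each a predicate that a valid record must satisfy. Running every rule against the source records yields violation counts for the report, plus the offending keys for root-cause follow-up. This is a minimal sketch. The rule names, field names and order records are hypothetical, and a data profiling tool would normally do this at scale.

```python
from collections import Counter
from datetime import date
from typing import Callable

Rule = Callable[[dict], bool]

# Hypothetical business rules: name -> predicate a valid record satisfies.
RULES: dict[str, Rule] = {
    "customer id present": lambda r: bool(r.get("customer_id")),
    "ship date not before order date": lambda r: r["ship_date"] >= r["order_date"],
    "quantity is positive": lambda r: r["quantity"] > 0,
}


def audit(records: list[dict], rules: dict[str, Rule] = RULES):
    """Count violations per rule and collect offending order IDs."""
    counts: Counter = Counter()
    offenders: dict[str, list] = {name: [] for name in rules}
    for rec in records:
        for name, is_valid in rules.items():
            if not is_valid(rec):
                counts[name] += 1
                offenders[name].append(rec["order_id"])
    return counts, offenders


# Made-up operational source records.
orders = [
    {"order_id": 1, "customer_id": "C17", "order_date": date(2012, 3, 1),
     "ship_date": date(2012, 3, 3), "quantity": 2},
    {"order_id": 2, "customer_id": "", "order_date": date(2012, 3, 2),
     "ship_date": date(2012, 3, 1), "quantity": 1},
    {"order_id": 3, "customer_id": "C09", "order_date": date(2012, 3, 4),
     "ship_date": date(2012, 3, 5), "quantity": 0},
]
counts, offenders = audit(orders)
for rule_name, ids in offenders.items():
    print(f"{rule_name}: {counts[rule_name]} violation(s) {ids}")
```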

Security officer
Knows the governmental security and privacy
regulations (e.g., HIPAA)
Understands the business requirements for securing
the data
Understands security features and capabilities of the
application components (DBMS, BI tools, Web portals)
Ensures that appropriate security settings are placed on:
– Databases
– BI tools
– Developer tools
– Web portals

Organization – Do we have the right
roles and responsibilities?
Include any responsibilities that overlap, and identify any
gaps where some roles are not being filled.

Organization – What should we be
considering?
Be careful here. You are likely to step on toes. Be sure
to vet any proposed changes with the appropriate
management.

Outline
Benefits of a data governance strategy

Components of a data governance strategy

Organization, roles and responsibilities

Impact of a data governance strategy on BI and IT

How to implement a data governance strategy program

Impact of a data governance strategy
on BI and IT
RELIABLE INFORMATION
Better and faster decisions
Increased analyst productivity
Employee empowerment
Cost containment
Cash flow acceleration
Revenue enhancement
Fraud reduction
Demand chain management
Better customer service
Lower customer attrition
Better relationships with suppliers and customers
Public relations and reputation

Gain Control
Consistent security implementation
Understand, define and assign ownership
Understand, define and assign stewardship
Minimize redundancy
Inventory data
Develop consistent terminology
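Inventorying data and minimizing redundancy go together. Once each database's data elements are recorded under consistent business names, you can list mechanically the elements stored in more than one place. This is a minimal sketch that assumes the inventory is a mapping from database to element names. The databases and elements shown are hypothetical. The comparison only works if the names follow consistent terminology, which is why that item sits on the same slide.

```python
from collections import defaultdict

# Hypothetical data inventory: database -> business names of the elements it stores.
INVENTORY = {
    "Orders DB": {"Customer Name", "Customer Address", "Order Date", "Order Amount"},
    "Billing DB": {"Customer Name", "Customer Address", "Invoice Amount"},
    "CRM DB": {"Customer Name", "Customer Phone Number"},
}


def redundant_elements(inventory: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each data element found in more than one database to those databases."""
    locations = defaultdict(list)
    for database, elements in inventory.items():
        for element in elements:
            locations[element].append(database)
    # Keep only elements stored redundantly; sort for a stable report.
    return {element: sorted(dbs)
            for element, dbs in sorted(locations.items())
            if len(dbs) > 1}


for element, dbs in redundant_elements(INVENTORY).items():
    print(f"{element}: {', '.join(dbs)}")
# Customer Address: Billing DB, Orders DB
# Customer Name: Billing DB, CRM DB, Orders DB
```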

Support the IT Strategy
Provide departments, projects and personnel with
guidelines for storing and accessing data
Minimize the number of RDBMSs
Establish, disseminate and maintain standards for
shared data resources
Deliver a high level of service
– Performance
– Availability
– Response time
– Responsiveness to user requests

Outline
Benefits of a data governance strategy

Components of a data governance strategy

Organization, roles and responsibilities

Impact of a data governance strategy on BI and IT

How to implement a data governance strategy

Incremental Data Governance Strategy
Implementation
Don’t get into the details too soon
Don’t be seen as a theorist -- your actions must be
pragmatic
Don’t lead with long-term deliverables
Don’t commit more than you can deliver
Avoid unproven technology

Steps to Implement a Data Governance
Strategy
Conduct a data environment assessment
Establish a target data environment
Develop an implementation plan
Sell the data governance strategy within the organization
Evaluate progress and justify your existence
Revisit the plan

Summary
Pitch the importance of a data governance strategy to
your CIO or CTO
Ask to either lead the effort or to be a permanent
member of the team

Thank you
ISBN 0-201-61635-1
ISBN 0-321-24099-5
ISBN 0-201-78420-3
ISBN 0-201-76033-9
Larissa Moss, Method Focus, Inc.
Sid Adelman, Sid Adelman & Associates