Data Administration

Data Warehouse Environment (DWE)
Implementation
8/19/04
Original Plan for DWE (diagram, 8/11/2004)
Components:
• Source Data: IDMS, Oracle, flat files
• External Data: census data, benchmark, salary surveys, economic data
• Data Staging Area: extract data, transform data, quality assurance, create metadata
• Metadata
• Ad Hoc Query Repository: copy of source data; operational; daily updates; all elements; minimum number of years of data
• WebFOCUS (Reporting): operational, enterprise
• Ad Hoc and Operational Reports
• Data Warehouse: cleansed; subset of detail data; subset of summary data; multiple years of data; periodic updates; strategic
• OLAP Server
• SAS Data Mining Server
• Data Mart #1, Course Management: subset of DW; summarized in specific manner; tactical
• Data Mart #2, Resource Management
• EIS
User questions shown on the diagram:
• “Tell me what happened?”
• “Tell me what happened and why?”
• “Tell me what may happen, or what is interesting?”
• “Give me information to help me achieve specific goals!”
• “Tell me everything I need to know and what is important, but do it quickly and easily!”
Legend:
1) Wide black border indicates physical servers.
2) Narrow black border indicates no decision on whether it will be a separate physical server.
3) Gray background indicates BI or analytical software servers.
DWE Terms
• Source Data: Operational data from internal systems, such as IDMS (FES, FRS, HRS, SIS), Oracle, etc.
• External Data: Data from systems external to the University, such as economic and census data collected by the government.
• Data Staging Area: Storage and processing area for data extracted from the internal and external systems prior to loading into the Warehouse, Data Marts or Ad Hoc Query Repository. Some of the data will remain uncleansed, as an exact replica of the data in the online systems, for subsequent loading into the Ad Hoc Query Repository. Other data will be cleansed and transformed before being moved to the Data Warehouse and Data Marts for analysis. Some data will be located in multiple places and in multiple forms and aggregations. (Also known as an ETL or Extract, Transform and Load server.)
• Metadata: A term used for data that describes or specifies other data. It is used to define all of the characteristics of data required to build databases and applications, and to support knowledge workers and information producers. This includes data element name, meaning, format, domain values, business integrity rules, relationships, owner, etc.
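As a minimal sketch of what one metadata record might hold, the Python below models the characteristics listed in the Metadata definition above (element name, meaning, format, domain values, business integrity rules, relationships, owner). The class name, field names and sample values are illustrative assumptions, not the University's actual metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMetadata:
    """Hypothetical metadata record for a single data element."""
    name: str                  # data element name
    meaning: str               # business definition
    data_format: str           # storage format, e.g. "CHAR(1)"
    owner: str                 # responsible office or data steward
    domain_values: dict = field(default_factory=dict)    # code -> description
    integrity_rules: list = field(default_factory=list)  # business rules, as text
    relationships: list = field(default_factory=list)    # related element names

    def is_valid_value(self, value: str) -> bool:
        """Accept any value when no domain is defined; otherwise require a listed code."""
        return not self.domain_values or value in self.domain_values


# Sample values are invented for illustration only.
gender = ElementMetadata(
    name="STUDENT_GENDER_CD",
    meaning="Gender reported by the student",
    data_format="CHAR(1)",
    owner="Registrar",
    domain_values={"F": "Female", "M": "Male", "U": "Unknown"},
    integrity_rules=["Value is required for enrolled students"],
)
```

A record like this can back a simple data quality check: `gender.is_valid_value("F")` passes, while a code outside the domain, such as `"Z"`, does not.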
DWE Terms (Continued)
• Ad Hoc Query Repository: A collection of enterprise data from multiple sources, used for ad hoc and operational reporting where the most current and unstandardized source data is required. The Repository will typically contain only one or two years of the most recent data, unless regulatory or statutory requirements dictate otherwise. (Also known as an Operational Data Store or ODS.)
• Data Warehouse: An enterprise-wide, cross-functional, cross-organizational database typically composed of data extracted, cleansed and/or summarized from multiple online transaction processing systems, and other stores of data (Purdue University; Stanford University). It is designed for query and analysis, typically contains historical data, and is used to present information to support decision-making and tactical and strategic business processes. A data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. In general, a data warehouse tends to be a strategic, but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)
DWE Terms (Continued)
• Data Mart: A subset of enterprise data from the Data
Warehouse that is summarized and stored in an optimal fashion
for analysis and presentation of information to support trend
analysis and tactical decisions and processes. Data Marts are
typically designed based on an analysis of user needs to
answer specific questions in the pursuit of specific goals.
The scope can be that of a complete data subject such as
Student, or of a particular business area or line of business,
such as Enrollment. (Improving Data Warehouse and Business
Information Quality, Larry P. English, 1999.)
• Enterprise Reporting: A category of software technology that
enables the development, organization, sharing, execution,
delivery and scheduling of reports via a web platform.
DWE Terms (Continued)
• On-Line Analytical Processing (OLAP): A category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various “what-if” data model scenarios. This is achieved through use of an OLAP Server. Functionality includes multidimensional analysis, slicing, drill-down and rotation. (http://www.moulton.com/olap/olap.glossary.html)
• Data Mining: A class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data. (http://www.webopedia.com/TERM/d/data_mining.html)
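The OLAP operations named above (slicing, drill-down and rotation) can be illustrated with a small sketch in plain Python. This is not how an OLAP Server is implemented; the fact rows, dimension names and the helper functions `roll_up` and `slice_facts` are all invented for illustration.

```python
from collections import defaultdict

# Invented sample facts: (college, term, course, enrollment).
FACTS = [
    ("Arts", "Fall", "ENG101", 120),
    ("Arts", "Fall", "HIS105", 80),
    ("Arts", "Spring", "ENG101", 100),
    ("Engineering", "Fall", "CS115", 150),
    ("Engineering", "Spring", "CS115", 130),
]
DIMS = ("college", "term", "course")


def roll_up(facts, dims):
    """Sum enrollment, grouped by the named dimensions (in the order given)."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_facts(facts, dim, value):
    """Keep only the facts where one dimension equals a fixed value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


# Summary view: total enrollment per college.
by_college = roll_up(FACTS, ["college"])
# Slice on one college, then drill down to course level.
arts_by_course = roll_up(slice_facts(FACTS, "college", "Arts"), ["course"])
# Rotation (pivot): view the same data by term first, then college.
by_term_college = roll_up(FACTS, ["term", "college"])
```

Each view is derived from the same detail rows; only the grouping changes, which is the sense in which OLAP offers "a wide variety of possible views" of one data set.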
DWE Terms (Continued)
• Executive Information System (EIS): An application
developed to provide senior management direct access to
information relevant to an organization’s goals and
performance, such as a dashboard. These applications are
developed to gather, analyze and integrate internal and
external data to provide management with insight into key
performance indicators, potential problems, and changes in
the environment. Typical features include extensive use of
graphics, simple navigational controls, automatic replacement
of report contents, drill-down analysis, trend analysis
capabilities, exception reporting or alerts, graphical charts
with links to underlying reports, provision of data from multiple
sources, and the highlighting of information an executive feels
is critical. (The Data Warehouse Lifecycle Toolkit, Ralph
Kimball, et al.)
Components of a Decision Support System
What is a Decision Support System?
• EIS: High-level summarized data for top executives (“pre-programmed dashboard”)
• Data Mart: Addresses a specific subject area
• Data Warehouse: Collection of integrated, subject-oriented databases (historical)
• Operational Data Store: Time-current, integrated databases (tactical, power users)
(Covansys)
Current DWE (diagram, 8/17/2004)
Components:
• Source Data: IDMS, Oracle, flat files
• External Data: census data, benchmark, salary surveys, economic data
• Data Staging Area: extract data, transform data, quality assurance, create metadata
• Metadata
• Ad Hoc Query Repository: copy of source data; operational; daily updates; all elements; minimum number of years of data
• WebFOCUS (Reporting): operational, enterprise
• Ad Hoc and Operational Reports
• Data Warehouse: cleansed; subset of detail data; subset of summary data; multiple years of data; periodic updates; strategic
• Data Mart #1, Course Management: subset of DW; summarized in specific manner; tactical
• SAS Data Mining Server
User questions shown on the diagram:
• “Tell me what happened?”
• “Tell me what happened and why?”
• “Tell me what may happen, or what is interesting?”
• “Give me information to help me achieve specific goals!”
Legend:
1) Wide black border indicates physical servers.
2) Narrow border indicates no decision on whether it will be a separate physical server.
3) Red border indicates under development.
4) Gray background indicates BI or analytical software servers.
DWE Current Resources
– Query Repository
   Production: PowerEdge 6650, 4 x 2.8GHz CPU, 4GB RAM, 1.2TB storage, Windows Server 2003
   Development: PowerEdge 2650, 1 x 3.0GHz CPU, 2GB RAM, 252GB storage, Windows Server 2003
   Software: Oracle Enterprise
– ETL
   Production: Dell PowerEdge 6650, 4 x 2.0GHz CPU, 2TB storage, Windows 2000 Advanced Server
   Development: Dell PowerEdge 6650, 2 x 2.0GHz CPU, 1TB storage, Windows 2000 Advanced Server
   Software: Informatica PowerCenter
– Enterprise Reporting
   Production: PowerEdge 2650, 2 x 2.8GHz CPU, 4GB RAM, 291GB storage, Windows 2003 Server Standard
   Development: PowerEdge 2550, 2 x 1.27GHz CPU, 1GB RAM, 220GB storage, Windows 2000 Server
   Software: WebFOCUS
– Statistical Analysis: Dell PowerEdge 2550, 2 x 1.4GHz CPU, 4GB RAM, 144GB storage, Windows 2000
   Software: SAS Enterprise Miner, Enterprise Guide, etc.
DWE Tasks
– DBA (1-2 FTE) – Design Oracle DB, write/run ETL jobs and provide production support (e.g. monitor system and DB performance, enforce security, schedule backups)
– Data Administration (2-3 FTE) – User interface, develop requirements documents for all DW projects and new views, evaluate data quality, develop specialized reports, test, train users and coordinate projects
– Reporting (1-2 FTE) – Develop enterprise reports
– All – Infrastructure design (with Systems staff), and tool evaluation (ETL, OLAP and desktop reporting) with help from the C/S group
Implementation Strategy - Educate Users
• Basics – “What is a Data Warehouse?” Create a
“single-source-of-truth.” “What it’s not!” (It is not all the
data, with daily updates and online storage.)
• Change in culture – “Let’s make better decisions based
on objective analysis of data.”
• Set realistic expectations - No silver bullet. It can help
you make better decisions, but you still have to be
responsible for implementing those decisions.
• Focus on institutional goals – “What is it we need to
achieve? What metrics do we need to evaluate our
progress in attaining goals?”
• Importance of business sponsors – Make timely
business decisions and support requests for necessary
resources.
Implementation Strategy - Requirements
• Develop DWE in a phased approach.
• Develop detailed requirements
documents with users and institutional
administrators for applications within the
DWE (DW/DM and reports).
Course Management (I.V.C.)
Business Functions and Goals
• Optimize course offerings to meet student need.
Improvement Opportunities
• Increase number of high demand courses/sections
• Increase maximum enrollment in sections
• Eliminate or reduce frequency of low demand courses
• Improve course meeting patterns and delivery mode
Performance Measures
• # and % decrease of students who do not get any section of the course requested
• # and % decrease of low demand courses
• # and % increase in enrollment
• % usage of classroom capacity
• % decrease in length of time to graduate
• # and % increase in courses taught through preferred mode
Business Questions
• What are the characteristics of high/low demand courses?
• What characteristics of the student are related to demand?
• What courses can be eliminated?
• Which courses should/can be moved to smaller/larger facilities?
• What impact does the meeting time and location have on demand?
• What improvements can be made with/without additional money?
Data Model (diagram)
• Labels shown: College Budgets, Degree Reqs., Student, Defines, Facilities, Course Demand, Courses, Available Faculty, Enrollment, Economic Data, Data Mart/Warehouse
(American Management Systems, Inc.)
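The “# and %” performance measures listed for Course Management could be computed along the following lines. This is only a sketch: the function names and the sample figures are invented, and the slides do not say how the measures are actually calculated.

```python
from typing import Tuple


def change(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute change, percent change) from a baseline to a new value."""
    if before == 0:
        raise ValueError("percent change is undefined when the baseline is zero")
    diff = after - before
    return diff, 100.0 * diff / before


def capacity_usage(enrolled: int, seats: int) -> float:
    """Return the percentage of classroom seats that are filled."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return 100.0 * enrolled / seats


# Invented figures: low demand courses drop from 40 to 30 between two terms.
low_demand_change = change(40, 30)
# Invented figures: 45 students enrolled in a 60-seat classroom.
usage = capacity_usage(45, 60)
```

A negative result from `change` represents a decrease, so the same helper covers both the “decrease” and “increase” measures.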
Implementation Strategy – Data Quality
• Focus on improving data quality, and establishing standards for data view and element names and data content.
Implementation Strategy – Enterprise Reports
• Gather user input on most important reports required by many users, and develop these reports with an enterprise reporting tool that allows us to deliver pre-defined parameter-driven reports via the web.
2001-2002: Infrastructure and Planning
1. Create IDMS data dump to Oracle
2. Implement WebFOCUS
3. Purchase data mining tools and server for IR
4. Create views for Query Repository (ad hoc reporting repository)
5. Establish enterprise standards for key data – Analysis and recommendations are ongoing
6. Identify and prioritize data mart development – Course Management Data Mart top priority for Data Stewards
2001-2002: Infrastructure and Planning (Continued)
7. Initiate GASB – Phase I
8. Initiate data quality projects
9. Review Desktop Reporting Tools – Ongoing review and testing of:
   • Brio
   • Crystal Reports
   • SAS
   • WebFOCUS
2002-2003: Data Mart Development, etc.
1. Complete GASB – Phase I
2. Implement SAS data mining server
3. Conduct data quality projects – vendor, facilities, FRS, TA data
4. Select and Purchase ETL Tool
5. Begin requirements on Course Management DM
6. Define standards for data view and element names
2003-2004: DWE Upgrades and User Support
1) Implement ETL tool
2) Upgrade database servers
3) Create Metadata application – “Data about data”
4) Conduct SAS data mining project on freshmen data
5) Provide user and technical training on reporting tools, support listservs and web page
6) Purchase enterprise reporting tool and develop reports
2003-2004: DWE Upgrades and User Support (Continued)
7) Create new data views with standardized names
8) Complete GASB - Phase II
9) Continue development of the Course Management DM requirements
10) Initiate development of the requirements for the Resource Management DM
2004-2005: SAP, etc.
1) Complete standardization of remaining data
views
2) Create additional enterprise reports
3) Evaluate SAP Business Warehouse (BW)
4) Conduct extensive data quality analysis for
SAP
Reporting Web Site and Metadata
1) Reporting URL: https://reporting.uky.edu/
2) Metadata URL: http://iweb.uky.edu/RptDataDesc/
3) Metadata directions URL: http://www.uky.edu/IS/DataAdmin/DOCS/metadata/MetadataDirections.pdf
4) Data element standards URL: http://www.uky.edu/IS/DataAdmin/DOCS/ware/IUUN0020-QRVE/QRVENamingStds/DataElementNamingStds.pdf
5) Data Administration URL: http://www.uky.edu/IT/DataAdmin/
Naming Standards
1) All data view names start with “V_”.
2) All standard element names are composed of words:
   – Prime (required) – describes the subject area of the data (e.g. account, student, department, course),
   – Qualifier (optional) – further defines and distinguishes the “prime” and “class” words (e.g. gender, ethnic, first),
   – Class (required) – describes the major classifications or types of data (e.g. name, date, code, amount).
3) Standard Name: “Prime”_“Qualifier”_“Class”, with standard abbreviations
4) Example: View - V_POSTN; Element - POSTN_BEG_DT
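The naming rules above can be sketched as two small helper functions. The abbreviation table is hypothetical: only the abbreviations POSTN, BEG and DT appear in the slide's example, and the words they are mapped from here (position, begin, date) are assumptions, as are the function names.

```python
from typing import Optional

# Hypothetical abbreviation table; the expansions are guesses based on
# the example names V_POSTN and POSTN_BEG_DT.
ABBREVIATIONS = {"position": "POSTN", "begin": "BEG", "date": "DT"}


def view_name(prime: str) -> str:
    """Build a view name: the 'V_' prefix plus the abbreviated prime word."""
    return "V_" + ABBREVIATIONS[prime]


def element_name(prime: str, class_word: str, qualifier: Optional[str] = None) -> str:
    """Build an element name as Prime_Qualifier_Class; the qualifier is optional."""
    words = [prime]
    if qualifier is not None:
        words.append(qualifier)
    words.append(class_word)
    return "_".join(ABBREVIATIONS[w] for w in words)


view = view_name("position")
element = element_name("position", "date", qualifier="begin")
```

With the assumed table, `view` is `V_POSTN` and `element` is `POSTN_BEG_DT`, matching the example names on the slide. A word missing from the table raises `KeyError`, which is one simple way to force every word through a standard abbreviation.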
Current Query Repository Data
1) UKFRS_FOC and UKHRS_FOC: to be used by WebFOCUS.
2) UKFRS_SYB: will be removed within 3-4 months.
3) GASB: non-standard views used by OC in producing institutional financial statements.
4) UKFRS_RPT, UKHRS_RPT, UKSIS_RPT and UKSIS_FAMSBR: standardized views will be created over the next couple of months, and old views will be removed 90 days after new views are available. Purchasing views in UKFRS_RPT are in development. UKHRS_RPT also contains standard Labor Distribution views.
5) UKHRS_STAT_RPT: HRS Stat File standard views are currently in development and being tested.
DWE/SAP Issues
1. How does the SAP Business Warehouse
functionality compare to what we originally
planned for the DWE?
2. Will the SAP BW replace our Data
Warehouse/Marts?
3. Should we continue our plans for the historical
legacy data in the DWE, and use the SAP BW for
data “from this point forward”?
4. Can/how do we “merge/join” historical data with
the new data in SAP?
5. What are our options to “interface” the SAP BW
with our DWE (API, etc.)?
6. Should the SAP BW feed our DWE or vice versa?
DWE/SAP Issues
(Continued)
7. How much (years of data) should we load into the
SAP OLTP system?
8. How much (years of data) should we load directly
to the SAP BW?
9. What level of detail data should be loaded into the
SAP BW, if the corresponding data is not available
in the OLTP system?
10. Should we continue with the “data mart” concept
within the SAP environment?
11. How easy is it to add new functionality to the SAP
BW (data, reports, “cubes”, etc.)?
Data Administration
QUESTIONS?