Data Administration
Download
Report
Transcript Data Administration
Data Administration
Data Warehouse Implementation
5/26/04
DW Current Resources
•
Servers –
•
IBM RS6000 S7A with 4 engines, 4GB memory, 200GB disk
space, running AIX 4.3.0
Dell PowerEdge 2550 with two 1.4 GHZ processors, 4GB of
RAM, four 36GB disk drives and running Windows 2000
server operating system.
Data – 100GB disk space used to house FRS, FES, HRS and
SIS Oracle extracts. This is essentially a “data dump” of the
IDMS data, and additional “views” still need to be created.
•
Staff –
DBA - Henry 80%, Song 20%, Chen 10%
DA - Kisil 70%, Cheesman 50%, Claunch 50%, Carter
100%
DW Tasks
DBA – Extract data, design Oracle DB, load data, and
production support (i.e. monitor system and DB performance,
enforce security, schedule backups, etc.)
Data Administration – User interface, develop requirements
document for all DW projects, evaluate data quality, create
DB views, develop specialized reports, test, train users, and
coordinate projects.
Both – Infrastructure design (with Systems staff), and tool
evaluation (ETL, OLAP and desktop reporting) with help from
the C/S group.
EIS
Current
DWE
Ad Hoc and
Operational
Reports
Tell me what
happened?
Tell me everything I need to
know and what is important,
but do it quickly and easily!
Ad Hoc Query Repository
Copy of Source Data
Operational
Daily Updates
All Elements
Minimum Number of Years of
Data
WebFOCUS
(Reporting)
Operational
Enterprise
Source Data
IDMS
Oracle
Flat files
Metadata
Tell me what
happened
and why?
Data Staging Area
Extract Data
Transform Data
Quality Assurance
Create Metadata
External Data
Census Data, Benchmark,
Salary Surveys, Economic
Data
Legend:
1) Wide black border indicates physical servers.
2) Narrow border indicates no decision on if it
will be a separate physical server.
3) Red border indicates under development.
4) Gray background indicates BI or analytical
software servers.
Data Warehouse
Cleansed
Subset of Detail Data
Subset of Summary
Data
Multiple Years of Data
Periodic Updates
Strategic
Data Mart #1
Course Management
Subset of DW
Summarized in specific
manner
Tactical
SAS Data Mining
Server
Tell me what may
happen, or what is
interesting?
Give me information
to help me achieve
specific goals!
5/26/2004
EIS
Original
Plan for
DWE
Ad Hoc and
Operational
Reports
Tell me what
happened?
Tell me everything I need to
know and what is important,
but do it quickly and easily!
Ad Hoc Query Repository
Copy of Source Data
Operational
Daily Updates
All Elements
Minimum Number of Years of
Data
WebFOCUS
(Reporting)
Operational
Enterprise
Source Data
IDMS
Oracle
Flat files
Metadata
OLAP Server
Tell me what
happened
and why?
Data Staging Area
Extract Data
Transform Data
Quality Assurance
Create Metadata
External Data
Census Data, Benchmark,
Salary Surveys, Economic
Data
Data Warehouse
SAS Data Mining
Server
Cleansed
Subset of Detail Data
Subset of Summary
Data
Multiple Years of Data
Periodic Updates
Strategic
Tell me what may
happen, or what is
interesting?
Data Mart #2
Resource Management
Legend:
1) Wide black border indicates physical servers.
2) Narrow black border indicates no decision on
if it will be a separate physical server.
3) Gray background indicates BI or analytical
software servers.
Data Mart #1
Course Management
Subset of DW
Summarized in specific
manner
Tactical
Give me information
to help me achieve
specific goals!
5/26/2004
DW Terms
•
Source Data: Operational data from internal systems, such as IDMS (FES,
FRS, HRS, SIS), Oracle, etc.
•
External Data: Data from systems external to the University, such as economic
and census data collected by the government.
•
Data Staging Area: Storage and processing area for data extracted from the
internal and external systems prior to loading into the Warehouse, Data Marts or
Ad Hoc Query Repository. Some of the data will remain un-cleansed and an
exact replica of the data in the online systems, for subsequent loading into the
Ad Hoc Query Repository. Other data will be cleansed and transformed before
being moved to the Data Warehouse and Data Marts for analysis. Some data
will be located in multiple places and in multiple forms and aggregations.
•
Metadata: A term used for data that describes or specifies other data. It is used
to define all of the characteristics of data required to build databases and
applications, and to support knowledge workers and information producers.
This includes information currently in the Data Inventory (the element name,
meaning, format, domain values), and additional information such as business
integrity rules, relationships, owner, etc.
DW Terms
•
Ad Hoc Query Repository: A collection of enterprise data from multiple sources,
used to do ad hoc and operational reporting where the need to use the most current
and un-standardized source data is a requirement. The Repository will typically
contain only one or two years of the most recent data, unless regulatory or statutory
requirements dictate otherwise. (Also known as an Operational Data Store or
ODS.)
•
Data Warehouse: An enterprise-wide, cross-functional, cross-organizational
database typically comprised of data extracted, cleansed and/or summarized from
multiple online transaction processing systems, and other stores of data (Purdue
University; Stanford University). It is designed for query and analysis, typically
contains historical data, and is used to present information to support decisionmaking, tactical and strategic business processes. A data warehouse tends to start
from an analysis of what data already exists and how it can be collected in such a
way that the data can later be used. In general, a data warehouse tends to be a
strategic, but somewhat unfinished concept; a data mart tends to be tactical
and aimed at meeting an immediate need. (Improving Data Warehouse and
Business Information Quality, Larry P. English, 1999.)
DW Terms
•
Data Mart: A subset of enterprise data from the Data Warehouse that is
summarized and stored in an optimal fashion for analysis and presentation of
information to support trend analysis and tactical decisions and processes. Data
Marts are typically designed based on an analysis of user needs to answer
specific questions in the pursuit of specific goals. The scope can be that of
a complete data subject such as Student, or of a particular business area or line
of business, such as Enrollment. (Improving Data Warehouse and Business
Information Quality, Larry P. English, 1999.)
•
On-Line Analytical Processing (OLAP): A category of software technology
that enables analysts, managers and executives to gain insight into data through
fast, consistent, interactive access to a wide variety of possible views of
information that has been transformed from raw data to reflect the real
dimensionality of the enterprise as understood by the user. OLAP helps the
user synthesize enterprise information through comparative, personalized
viewing, as well as through analysis of historical and projected data in various
"what-if" data model scenarios. This is achieved through use of an OLAP Server.
(http://www.moulton.com/olap/olap.glossary.html) Functionality includes multidimensional analysis, slicing, drill-down and rotation.
DW Terms (Continued)
• Data Mining:
A class of database applications that look for hidden
patterns in a group of data. For example, data mining software can help retail
companies find customers with common interests. The term is commonly misused to
describe software that presents data in new ways. True data mining software doesn't
just change the presentation, but actually discovers previously unknown relationships
among the data. (http://www.webopedia.com/TERM/d/data_mining.html)
DW Terms (Continued)
•
Executive Information System (EIS): An application developed to
provide senior management direct access to information relevant to an
organization’s goals and performance. These applications are
developed to gather, analyze and integrate internal and external data to
provide management with insight into key performance indicators,
potential problems, and changes in the environment. Typical features
include extensive use of graphics, simple navigational controls,
automatic replacement of report contents, drill-down analysis, trend
analysis capabilities, exception reporting or alerts, graphical charts with
links to underlying reports, provision of data from multiple sources, and
the highlighting of information an executive feels is critical. (The Data
Warehouse Lifecycle Toolkit, Ralph Kimball, et al.)
Components of a Decision Support System
What is a
Decision Support System
EIS
High Level Summarized Data
For Top Executives
(“Pre-programmed DASHBOARD”)
Data Mart
Data Warehouse
Operational Data Store
Addresses Specific
Subject Area
Collection Of Integrated Subject
Oriented Databases
(Historical)
Time-Current, Integrated
Databases
(Tactical-Power Users)
Covansys
How do we get there?
• Educate users.
• Develop detailed requirements documents,
including “Information Value Chains” for all
goals which the DW/DM is expected to
address.
• Data Mart approach.
• Phased implementation.
• Additional resources.
Educate Users
• Basics – “What is a Data Warehouse?” Create a
“single-source-of-truth.” “What it’s not!” (It is not all the
data, with daily updates and online storage.)
• Change in culture – “Let’s make better decisions
based on objective analysis of data.”
• Set realistic expectations - No silver bullet. It can help
you make better decisions, but you still have to be
responsible for implementing those decisions.
• Focus on institutional goals – “What is it we need to
achieve? What metrics do we need to evaluate our
progress in attaining goals?”
• Importance of business sponsors – Make timely
business decisions and support IT requests for additional
funds.
Course Management (I.V.C.)
Business Functions and Goals
Optimize course offerings to meet student need.
Improvement Opportunities
Increase number of high demand courses/sections
Increase maximum enrollment in sections
Eliminate or reduce frequency of low demand courses
Improve course meeting patterns and delivery mode
Performance Measures
# and % decrease of students who do not get
any section of the course requested
# and % decrease of low demand courses
# and % increase in enrollment
% usage of classroom capacity
% decrease in length of time to graduate
# and % increase in courses taught through
preferred mode
Business Questions
What are the characteristics of high/low demand courses?
What characteristics of the student are related to demand?
What courses can be eliminated?
Which courses should/can be moved to smaller/larger facilities?
What impact does the meeting time and location have on demand?
What improvements can be made with/without additional money?
Data Model
College Budgets
Degree Reqs.
Student
Defines
Facilities
Course Demand
Courses
Available Faculty
Enrollment
Economic Data
(American Management Systems, Inc.)
Data Mart/
Warehouse
Enrollment Management (I.V.C.)
Business Functions and Goals
Increase student enrollment
Improvement Opportunities
Increase student retention
Increase number of new students
Increase number of transfers
Performance Measures
# and % increase of returning, new and
transfer students by term
# and % increase of applicants to college
eligible by KY county
# and % increase of admits to applicants
# and % increase of enrollments to admits
Ratios and % increase for FT/PT, Res/Non-Resident
Business Questions
What are the characteristics of students who do/do not return?
What are the characteristics of students who apply, but do not enroll?
What are the characteristics of students who do/do not transfer?
What policies or practices could affect these student outcomes?
What are the external factors affecting our enrollment?
Data Model
KY Population Data
State Aid Programs Data
Financial Aid
Applicant
Student
Defines
Economic Data
Course Demand
(American Management Systems, Inc.)
Courses
Enrollment
Data Mart/
Warehouse
Course Management DM – Subject Areas
•
•
•
•
•
•
•
•
•
•
•
Accounts (dollars for faculty and supplies)
Assets (facilities and equipment)
Benchmark Data (what and when courses
offered, degree requirements)
Courses (meeting pattern, teaching mode,
requirements)
Department/College (who owns the course,
degree offered and requirements)
Economic Data (salary and demand by
profession)
Faculty (distribution of effort, availability to teach)
Faculty Applicants (who and how many)
Population Data (who, where and what volume)
Staff (instructional support)
Students (classification, course demand)
Enrollment Management DM – Subject Areas
•
•
•
•
•
•
•
•
•
•
•
•
Accounts (dollars for services, financial aid)
Assets (facilities and equipment, housing)
Benchmark Data (enrollment patterns)
Courses (meeting pattern, teaching mode,
requirements, professor)
Department/College (who owns the course,
degree requirements, degrees offered)
Economic Data (general condition of economy,
hot professions)
Faculty (quality, student evaluations)
Population Data (who, where and what volume)
Staff (instructional support, student service area)
Students (demographics, course demand, GPA)
Student Prospects (demographics, contacts)
Student Applicants (demographics, contacts)
Project Schedule
Phase I, 2001-02
Infrastructure & Planning
Phase II, 2002-03
Creation of Data Mart
Phase III, 2003-04
Creation of Additional Data Marts
Phase I – Infrastructure and Planning (2001-
2002)
1)
2)
3)
IDMS Data Dump to Oracle – 100% complete.
WebFOCUS Implementation – Completion 12/02.
Data Mining Tools for IR staff – Purchased in 2002.
Implementation in progress. Target installation
completion 3/03.
4)
Create Views for “Data Dump” (Ad Hoc
Reporting Repository) - Target completion 4Q 0203.
5)
Establish Enterprise Standards for Key Data –
Analysis and recommendations are ongoing.
6)
Identify and Prioritize Data Mart Development
– Course Management Data Mart top priority for Data
Stewards.
Phase I – Infrastructure and Planning (2001-
2002) (Continued)
7)
8)
9)
GASB – Phase I completed 8/02.
CPE – Six years of data loaded in
Oracle.
Review Desktop Reporting Tools –
Ongoing review and testing of:
•
Brio
•
Crystal Reports
•
SAS
•
WebFOCUS
Phase II – Creation of Data Mart (2002-2003)
1)
Select and Purchase ETL Tools – Choices:
•
•
•
•
•
Ascential
IBM
Informatica
Oracle
SAS
2)
Course Management DM – Requirements
phase. Target completion 4/03.
3)
Phase II of GASB. Target completion 7/03.
Phase III – Creation of Additional Data Marts
(2003-2004)
1)
Create Metadata – Dependent on ETL implementation.
2)
Data Marts – Complete Course Management DM by 7/1/04,
and complete requirements for Resource Management DM.
(Others to consider: Enrollment Mgt., Department Mgt.,
Diversity & Equity Initiatives, Grant Management, and
Research Management.).
3)
Evaluate External Data Needs - Identify external data
needed for the Data Marts, locate source(s), estimate cost
and request funds for 04-05.
Phase ??
1) Create Remaining Data Marts.
2) Develop OLAP applications.
3) Develop EIS.
4) Purchase and Load External Data.
New DWE Resources
• The existing server will need to be upgraded, and
additional servers will need to be purchased for the
Ad Hoc Reporting Repository, the Data Staging Area,
the Data Mining and/or OLAP tools, and for one or
more of the Data Marts. Size, type and cost will vary
depending on function.
• Storage – Additional disk space for DW server, and
for other servers using Enterprise Storage System.
Exact amounts for online storage and for archiving will
be identified during the project requirements phase.
New Resources (Cont.)
• Training –
Infrastructure – “How do you design a DW environment?
Staff training on DW
Oracle (or other) DB training
Tools – Analysis and reporting tools.
Onsite visits to other universities.
• Consulting on Infrastructure Design – Covansys,
Gavroshe, Oracle, IBM, etc.
• Staff – Additional staff required for database
administration, institutional reporting, EIS development,
ETL use, data modeling, etc.
DWE Critical Issues
• Personnel Resources – Need to keep people focused on DW in light
of competing projects. (Given the current budget situation, this will be
difficult.)
• Training – IT and users.
• Consulting – Assistance for infrastructure planning.
• Culture Change – Focus on goals rather than data elements. Tell us
what, not how! Value in making decision based on data. Proof of
concept.
• Requirements Gathering – Needs to be thorough, and heavily relies
on timely user decisions. Defining who the “users” are on any given
project is critical.
• Budget/Funding – For additional hardware, software and people.
• Business Sponsor – The Data Warehouse is not another IT
project. It is an enterprise initiative!
Data Administration
QUESTIONS?