Data Warehouses - IIS Windows Server
Download
Report
Transcript Data Warehouses - IIS Windows Server
Business Intelligence in Detail
What is a Data Warehouse?
Overview of Business Intelligence
Analytical
tools
Data
Sources
Data
Warehouse
Data
visualisation
OLAP
Data
Mining
Analysis
Results
Data
visualisation
Example BI Questions
(Microsoft)
• How does our profitability to date compare with the same
time period during the past five years?
• How much money did customers over the age of 35 spend
last year, and how has that behavior changed over time?
• How many products were sold in two specific
country/regions this month as opposed to the same month
last year?
• For each customer age group, what is the breakdown of
profitability (both margin percentage and total) by product
category?
• Find top and bottom salespeople, distributors, vendors,
clients, partners, or customers.
To answer these...Data is
1. Gathered from relevant sources
2. Filtered,standardised and stored
3. Analysed and arranged into meaningful
patterns using different tools .
4. Business intelligence is the knowledge
gained from that data analysis.
We need to Understand
•
•
•
•
•
•
•
Data issues – data quality
Where data comes from
How data is stored: data warehouses
How data is analysed
Example Business contexts
Limitations of the computer
Our own blind spots (if this is possible)!
Where does the data come
from?
• Data can be collected manually or
automatically.
– Transaction data
– Time studies, questionnaire, observation notes
– Physical sensors e.g. temperature of a rooms in
a house
– Sensors, scanners, bar codes
• It may be stored in different systems e.g.
ERP database, operational databases etc.
and in different formats
We need to decide what data is
important...
• Depends what our goals are, the functional
area(e.g. Sales, HR, marketing..) and what
processes we are looking at..
e.g. Balanced scorecard uses
Critical success factors
Key performance indicators
These are derived from e.g. transaction data
collected
Human
resources
•Employee
data e.g. Pay
Sales and marketing
• products
Example
• customers
data in
• demographics
Functional
• promotions
Areas
• sales force
Finance
• order type
• currency standards
• account information
• industry trends
Operations
management
•assembly speed
•warehouse stock
•manufacturer and
supplier cost
•shift productivity
Data Quality is also important
• Contextual – relevance, value, timeliness
completeness, amount
• Intrinsic – accuracy, objectivity, believability,
reputation
• Accessibility DQ – ease of access,security
• Representation DQ – interpretability, ease of
understanding, concise, consistent
representation.
What is a Data Warehouse?
A data repository that makes operational and
other data accessible in a form that is
readily acceptable for decision support and
other user applications.
Note: A data warehouse is not another word
for a database. The specific purpose of a
data warehouse is to support decisions not
operations.
10
Data 3
....
From Turban, Aronson and Liang
Data warehouse vs operational
database
Operational database – optimised for speed of update
• Data is normalised. Each data item is only held once.
• very fast insert/update performance
• Older data periodically purged to improve performance.
Data warehouse - optimized for speed of data retrieval.
– Data may be stored using a dimension-based model
e.g. like OLAP
– data warehouse data are often stored multiple times.
Historic data are held to enable comparison
E.g. Microsoft: How can data from different
sources be joined together (consolidated and
integrated) securely?
• SQL Server provides a comprehensive and
scalable data warehouse platform
• organizations build large-scale enterprise
data warehouses that can consolidate data
from multiple disparate systems into a
single, secure, manageable solution.
Parts of a Data Warehouse System
• Data Warehouse itself
• Data acquisition (back end) software
• Client (front-end) software
14
Data 3
Data Warehouse Itself
• A large physical database which contains
the data in the data warehouse.
• A logical data warehouse which contains all
the meta data, business rules and
processing logic used to organise and
preprocess the data.
15
Data 3
Data acquisition (back end) software
• Extracts data from legacy systems and
external sources, consolidates and
summarizes the data and loads it into the
data warehouse.
• The size of the update window
– is how long it takes for the operational data to
be transformed and loaded into the
warehouse.
16
Data 3
Data Extraction
•
•
•
•
Importing files
Summarising data
standardising data
Filtering and condensing data.
17
Data 3
What do we mean by
transforming data?
• Comparing data from different systems to
improve data quality
– Data may be missing from one data base but present in
another.
• Standardising data and codes
E.g. Gender may be classified as Male/Female, M/F, 0/1
• Integrating data from different systems
e.g. if one system keeps orders and another stores
customers, these data elements need to be linked.
• Performing other system housekeeping functions.
– Change files to reduce data load times.
– Finding keys for data. 18
Data 3
Meta Data
• A data warehouse must allows for the
storage of metadata
– information about the content of the
warehouse,
– Sources of the data
– guides for moving data,
– summarisation rules,
– business and technical terms,
– rules for data extraction.
19
Data 3
Characteristics of Data in the Warehouse
• Subject oriented
– data are organised by detailed subject, containing only information relevant
for decision support.
• Integrated
– all data is standardised and consistent
• Time-variant
– data kept for 5-10 years & used for trends, forecasting and comparison
• Non-volatile
– once entered into the data warehouse data can’t be changed.
– Obsolete data are discarded.
• Summarised
– into different levels of detail
• Not normalised
20
Data 3
Front End Tools
• Business intelligence (BI) tools and analytic
applications may be used to access the
warehouse/marts to support querying, reporting,
and analysis of the data.
– General-purpose report writers and managed query
tools
– OLAP
– Data mining tools
– Performance management systems
– Data visualization tools
21
Data 3
Related Technologies : Data Mart
• A subset of a data warehouse, typically consisting of a single
subject area.
• Lower cost, scaled down version of a data warehouse
designed for a strategic business unit or a department.
• Advantages:
–
–
–
–
–
–
lower cost
shorter lead time for implementation
local control
more rapid response
easier to understand and navigate
allows a business unit to build its own systems without relying on a
centralised IS department.
22
Data 3
What is OLAP?
• OLAP enables you to look at and access
data in different ways (3-d data cubes) ,
drill down, view summarised data, make
calculations on the fly etc.
• http://www.census.gov
How can data be analysed?
• Microsoft Online Analytical Processing
(OLAP) makes it quick and easy to perform
ad-hoc queries and analysis of large
amounts of complex data across all aspects
of your business.
Microsoft OLAP is used to report
on...
•
•
•
•
•
•
sales
marketing
management issues
business process management
budgeting and forecasting,
financial issues etc..
Why use a Data Warehouse?
– Data is stored in different systems.
– Management use information to make
decisions.
– The customer base is large and diverse.
– The data in the different systems is
represented differently.
– Data is stored in highly technical, difficult to
decipher formats.
26
Data 3