Lecture 5 - Faculty Web Pages

Download Report

Transcript Lecture 5 - Faculty Web Pages

Opening Case:
The Case for Business
Intelligence at Netflix
McGraw-Hill-Ryerson
©2015 The McGraw-Hill Companies, All Rights Reserved
Chapter Four Overview
• SECTION 4.1 – DATABASES
–
–
–
–
–
Storing Transactional Data
Relational Database Fundamentals
Relational Database Advantages
Database Management Systems
Data-Driven Web Sites
• SECTION 4.2 – DATA WAREHOUSING
–
–
–
–
Accessing Organizational Information
History of Data Warehousing
Data Warehouse Fundamentals
Data Mining
Copyright © 2015 McGraw-Hill Ryerson Limited
4-2
Learning Outcomes
1.
Describe the structure of a relational database.
2.
Describe the advantages to storing data in a relational
database.
3.
Explain how users interact with a database management
system, the advantage of data-driven Web sites, and the
primary methods of integrating data and information
across multiple databases in organizations.
4.
5.
Describe data warehouse fundamentals and advantages.
Describe data mining and explain the relationship
between data-mining and data warehousing.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-3
DATABASES
McGraw-Hill-Ryerson
©2015 The McGraw-Hill Companies, All Rights Reserved
Storing Transactional Data
Learning
Outcome
4.1
Transactional Data is stored in databases.
• Database
– Collection of records
– Schema describes data it holds, the objects (data items)
represented & relationships among them
– Database (or Data) Models
• How the Schema is organized.
• Most common is relational using multiple tables set up in
rows and columns
Example of a Relational
Database Table
Copyright © 2015 McGraw-Hill Ryerson Limited
4-5
Database Fundamentals
Learning
Outcome
4.1
• Database models include:
– Hierarchical database model
• Information is organized into a tree-like structure that
allow for repeating data.
• One parent record has many subordinate (or child)
records
– Network database model
• Flexible way of representing objects and their
relationships.
• Subordinate (or child) records can have many parent
records forming a complex, multi-dimensional lattice
structure
– Relational database model –
• Stores data in the form of logically related twodimensional tables.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-6
Relational Database Fundamentals
Learning
Outcome
4.1
• Entity class
– A category of person, place, thing or event about which
information is stored.
• Entity
– An individual person, place, thing or an individual occurrence of
an event about which information is stored.
• Table
– Collects the data for an entity class. For example: One table is
for Customers, another for Orders, another for Products.
• Record
– Rows containing the data for each entity belonging to that class.
• Field (Attribute)
– Columns indicating the characteristics stored for each entity
Copyright © 2015 McGraw-Hill Ryerson Limited
4-7
Relationship Fundamentals
Learning
Outcome
4.1
This is a fictitious sample order by Dave’s Sub Shop for Barq’s
Root Beer from Coca Cola with transactional data to be
stored in a database.
Potential Relational
Database for the
Coca-Cola Bottling
Company
From Figure 4.1
Copyright © 2015 McGraw-Hill Ryerson Limited
4-8
Relationship
Fundamentals
Relational
Database
Fundamentals
Learning
Outcome
4.1
Example of
an Entity
Class (Table)
Examples of
Entities
(Records)
Potential Relational
Database for the
Coca-Cola Bottling
Company
From Figure 4.1
Copyright © 2015 McGraw-Hill Ryerson Limited
4-9
Storing Data in
a Relational
Database
Learning
Outcome
4.1
Data is stored in
tables according to
its particular
category.
Potential Relational
Database for the
Coca-Cola Bottling
Company
Figure 4.1
Copyright © 2015 McGraw-Hill Ryerson Limited
4-10
Relating Data through Keys
Learning
Outcome
4.1
Primary key
• A field (or group of fields) contain values that uniquely
identify a given record in a table.
Foreign key
• A primary key of one table that appears a field in another
table. A value in the foreign key of one table corresponds to
the value in the primary key of another table.
Relationships
• The data from one table is linked to another when the
computer finds a match between the values in a primary key
to the values in the foreign key of another table.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-11
Relational Database Fundamentals
Learning
Outcome
4.1
1. The Customer
table is linked
to the Order
table by
means of the
Customer ID
field.
2. Customer ID is a
Primary Key in
the Customer
table. The value
23 is a unique
identifier for
Dave’s Sub
Shop.
Copyright © 2015 McGraw-Hill Ryerson Limited
3. Customer ID is
the Foreign Key
in the Order table.
The value 23 is a
unique identifier
for Dave’s Sub
Shop and links
the order to the
customer.
Potential Relational
Database for the
Coca-Cola Bottling
Company
See Figure 4.1
4-12
Relational Database Advantages
Learning
Outcome
4.2
Increased Flexibility
– Handle changes quickly and easily
– Provide users with different views
– Have only one physical view
• Physical view – deals with the physical
storage of information on a storage
device
– Have multiple logical views
• Logical view – focuses on how users
logically access information
Copyright © 2015 McGraw-Hill Ryerson Limited
4-13
Relational Database Advantages
Learning
Outcome
4.2
Increased Scalability and Performance
• A database must increase or decrease
in size to meet increased demand,
while maintaining acceptable
performance levels.
– Scalability refers to how well a system
can adapt its capacity to changing
demands.
– Performance measures how quickly a
system performs a certain process or
transaction.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-14
Relational Database Advantages
Learning
Outcome
4.2
Reduced Data Redundancy
• Data (Information) Redundancy is the
duplication of information or storing the
same information in multiple places
• Problems include:
– Inconsistency of data describing the
same thing.
– Waste of space, waste of time to enter
and update.
– Difficulty securing data in many places.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-15
Relational Database Advantages
Learning
Outcome
4.2
Increased Information Integrity (Quality)
• Information integrity measures the quality
of information
• Integrity constraints are rules to ensure the
quality of information:
– Relational integrity constraints are rules
enforcing data structures and accurate storage,
analysis & display of information
– Business-critical integrity constraints are rules
supporting operational requirements such return
policies and credit terms.
– Support error reduction & increase in the use of
organizational data.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-16
Relational Database Advantages
Learning
Outcome
4.2
Increased Security
• Information is an organizational asset and
must be protected.
• Databases offer several security features
including:
– Password – provides authentication of the user
– Access level – determines who has access to
the different types of information
– Access control – determines types of user
access, such as read-only read-write, readwrite-copy
Copyright © 2015 McGraw-Hill Ryerson Limited
4-17
Database Management Systems (DBMS)
Learning
Outcome
4.3
Software through which users and application programs
interact with a database
Interacting Directly and Indirectly with a Database Through a DBMS
Figure 4.2
Copyright © 2015 McGraw-Hill Ryerson Limited
4-18
Data-Driven Web Sites
Learning
Outcome
4.3
An interactive Web site which uses a database to keep it updated
and relevant to the needs of its customers.
A Data-driven Website
1) Search Engine
3) Search Query Results
Figure 4.3
2) Database
Copyright © 2015 McGraw-Hill Ryerson Limited
4-19
Data Driven Web Site Advantages
Learning
Outcome
4.3
Development
Capability
Allows website owner to make changes anytime
with little or no training
Content Management
Capability
Faster turnaround time and more accurate
updates.
Future Expandability
Easier layout, displays and functionality changes.
Minimization of human Has “error-trapping” mechanisms to ensure
content & formats are correct.
error
Less production &
Update Costs
Data entry personnel are trained more quickly
and are less expensive than programmers.
More efficiency
System cascades changes through the site.
Better stability
System tracks templates and source files.
See Figure 4.4
Copyright © 2015 McGraw-Hill Ryerson Limited
4-20
Data Integration
Learning
Outcome
4.3
Allows separate systems to communicate directly with
each other.
– Forward integration takes information entered into a
given system and sends it automatically to all
downstream systems and processes.
– Backward integration takes information entered into
a given system and sends it automatically to all
upstream systems and processes.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-21
Forward and Backward Integration
Learning
Outcome
4.3
Forward and Backward Customer Data Integration
Figure 4.5
Copyright © 2015 McGraw-Hill Ryerson Limited
4-22
Integrated Customer Data
Learning
Outcome
Integrated Customer Data
4.3
Figure 4.6
Copyright © 2015 McGraw-Hill Ryerson Limited
4-23
OPENING CASE QUESTIONS
The Case for Business Intelligence at NetFlix
1.
2.
3.
4.
5.
What is the impact to NetFlix if the information contained in
its database is of low quality?
Review the five common characteristics of high quality
information and rank them in order of importance to
NetFlix?
How might NetFlix resolve issues of poor information in
their customer movie reviews?
Identify the different types of entities that might be stored
in NetFlix's database.
Why is database technology so important to NetFlix and its
business model?
Copyright © 2015 McGraw-Hill Ryerson Limited
4-24
Data Warehousing
McGraw-Hill-Ryerson
©2015 The McGraw-Hill Companies, All Rights Reserved
History of Data Warehousing
Learning
Learning
Outcome
4.4
• In the 1990’s, Functional systems were too
cumbersome & inefficient
– Operations systems and data were not
integrated.
– Little historic data, little trend
information
– Quality issues
– Good for transactions processing, not
analysis
• Turn of the millennium
– Data scattered over too many platforms
– Complex analysis was not timely
Copyright © 2015 McGraw-Hill Ryerson Limited
4-26
Data Warehouse Fundamentals
Learning
Outcome
4.4
• Data warehouse
– A logical collection of information
– Gathered from many different operational databases
– Supports strategic business analysis activities and
decision-making tasks.
• Primary Purpose
– To aggregate information throughout an organization
– Not a location for ALL data, only data of interest.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-27
Characteristics of Data Warehouses
Learning
Outcome
4.4
• Subject oriented
– Information is organized around a major
organizational subject area, e.g.. Customers
• Integrated
– Sourced from a variety of internal operational systems
and external databases into a coherent whole
• Time-variant
– Time-stamped according to its cycle (daily, yearly etc.)
• Non-volatile
– Once loaded, data does not change
Copyright © 2015 McGraw-Hill Ryerson Limited
4-28
Data Warehouse Fundamentals
Learning
Outcome
4.4
• Extraction, transformation, and loading (ETL)
– A process that extracts information from
internal and external databases,
– Transforms the information using a common
set of enterprise definitions
– Loads the information into a data warehouse.
• Data mart
– Contains a subset of data warehouse
information
– Extracted to be analyzed for specific objectives.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-29
Model of a Typical Data Warehouse
Learning
Outcome
4.4
Figure 4.7
Copyright © 2015 McGraw-Hill Ryerson Limited
4-30
Multi-dimensional Analysis
Learning
Outcome
4.4
• Databases contain information in twodimensional tables…rows and columns
• Data warehouse information is threedimensional…layers of rows and columns
– Each Dimension is a particular characteristic
of the information; an attribute.
– Cube is acommon term for the
representation of multi-dimensional
information.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-31
Multi-dimensional Analysis
Learning
Outcome
4.4
A Cube of Information for Performing Multi-Dimensional Analysis on
Three Stores for Five Products and Four Promotions.
Figure 4.8
Copyright © 2015 McGraw-Hill Ryerson Limited
4-32
Information Cleansing or Scrubbing
Learning
Outcome
4.4
A process that weeds out and fixes or
discards inconsistent, incorrect, or
incomplete information.
• Software tools use sophisticated
algorithms to parse, standardize, correct,
match and consolidate warehouse
information.
• Process is done during the ETL process
and once it is in the warehouse.
• Critical when data exits in several
operational systems.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-33
Information Cleansing or Scrubbing
Learning
Outcome
Customer Contact Data in Operational Systems
4.4
Figure 4.9
Copyright © 2015 McGraw-Hill Ryerson Limited
4-34
Standardizing Customer Name from
Operational Systems
Learning
Outcome
4.4
Figure 4.10
Copyright © 2015 McGraw-Hill Ryerson Limited
4-35
Information Cleansing or Scrubbing
Learning
Outcome
4.4
Figure 4.11
Copyright © 2015 McGraw-Hill Ryerson Limited
4-36
Accurate and Complete Information
Learning
Outcome
4.4
Figure 4.12
Copyright © 2015 McGraw-Hill Ryerson Limited
4-37
Data Mining
Learning
Outcome
4.5
• The process of analyzing data to extract
information.
– Drilling Down progresses through increasing
levels of detail.
– Drilling Up works through increasing levels of
summarization.
• Data Mining Tools
‒ Variety of techniques that find patterns and
relationships in large volumes of information.
‒ Specialized technologies and functionalities
including Query tools, reporting tools, statistical
tools and intelligence agents.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-38
Data Mining Activities
Learning
Outcome
4.5
Apply algorithms to information sets to uncover inherent trends
and patterns which are used to develop new business strategies.
• Classification
– Assigning records to one of a pre-defined set of classes
• Estimation
– Determining the values for an unknown continuous
variable behavior
• Affinity grouping
– Which things go together
• Clustering
– Breaks up a heterogeneous population of records into a
number of more homogenous subgroups.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-39
Data Mining Output
Learning
Outcome
4.5
SC Johnson:
Changes in
Consumer
Environmental
Behaviour
Figure 4.13
Copyright © 2015 McGraw-Hill Ryerson Limited
4-40
Data Mining Techniques
Learning
Outcome
4.5
Cluster analysis
•
A statistical technique used to divide an information set into
mutually exclusive groups such that the members of each
group are as close together as possible to one another and
the different groups are as far apart as possible.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-41
Association Detection
Learning
Outcome
4.5
Association detection
Reveals the relationship between
variables along with the nature and
frequency of the relationships
• Rule Generators
Data Collection
for Market Basket
Analysis
– Form business rules from the data
mining applications
– Predict business events and their
probability of occurrence
•
Market basket analysis
–
–
Analyzes websites & checkout
scanners
Predict future buyer behaviour
Copyright © 2015 McGraw-Hill Ryerson Limited
Figure 4.14
4-42
Statistical Analysis
Learning
Outcome
4.5
Performs such functions as information correlations,
distributions, calculations, and variance analysis
–
–
–
–
SENECA defines qualitative variables and assigns them
numerical scales. Then, builds models, forecasts and
trends based on consumer testing.
Forecast – Predictions made on the basis of time-series
information
Time-series information – Data collected at regular,
equal-spaced, periods. Used for trend analysis.
Many large vendors provide end-to-end data mining
decision tools with predictive analytical capabilities.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-43
OPENING CASE QUESTIONS
The Case for Business Intelligence at NetFlix
6. Why must NetFlix cleanse or scrub the
information in its database?
7. Choose one of the three common forms of data
mining analysis and explain how NetFlix is using it
to gain BI?
8. How might NetFlix be using tactical, operational
and strategic BI?
Copyright © 2015 McGraw-Hill Ryerson Limited
4-44
CLOSING CASE ONE
Scouting for Quality
1. Explain the importance of high-quality information
for Scouts Canada.
2. Review the five common characteristics of high
quality information and rank them in order of
importance for Scouts Canada.
3. How could data warehouses and data marts be
used to help Scouts Canada improve the efficiency
and effectiveness of its operations? Its decision
making?
Copyright © 2015 McGraw-Hill Ryerson Limited
4-45
CLOSING CASE ONE
Scouting for Quality
4. What kinds of data marts might Scouting Canada
want to build to help it analyze its operational
performance?
5. Do the managers at Scouts Canada actually have
all of the information they require to make an
accurate decision? Explain the statement “it is
never possible to have all of the information
required to make the best decision possible.”
Copyright © 2015 McGraw-Hill Ryerson Limited
4-46
CLOSING CASE TWO
Searching for Revenue: Google
1.
Review the five common characteristics of highquality information and rank them in order of
importance to Google’s business.
2.
What would be the ramifications of Google’s
business if the search information it presented to its
customers was of low quality?
3.
Describe the different types of databases. Why
should Google use a relational database?
4.
Identify the different types of entities, entity classes,
attributes, keys, and relationships that might be
stored in Google’s AdWords relational database.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-47
CLOSING CASE TWO
Searching for Revenue - Google
5. How might Google use a data warehouse to
improve its business operations?
6. Why would Google need to scrub and cleanse
the information in its data warehouse?
7. Identify a data mart that Google’s marketing and
sales department might use to track and analyze
its AdWords revenue.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-48
CLOSING CASE THREE
Caesars - Gambling Big on Technology
1. Identify the effects poor information might have
on Caesar’s service-oriented business strategy.
2. How does Caesar’s use database technologies to
implement its service-oriented strategy?
3. Caesar’s was one of the first casino companies to
find value in offering rewards to customers who
visit multiple Caesar’s locations. Describe the
effects on the company if it did not build any
integration among the databases located at each
of its casinos. How could Caesar’s use distributed
databases or a data warehouses to synchronize
customer information?
Copyright © 2015 McGraw-Hill Ryerson Limited
4-49
CLOSING CASE THREE
Caesars - Gambling Big on Technology
4.
5.
6.
7.
Estimate the potential impact to Caesar’s business if
there is a security breach in its customer
information.
Identify three different types of data marts Caesar’s
might want to build to help it analyze its operational
performance.
What might occur if Caesar’s fails to clean or scrub
its information before loading it into its data
warehouse?
Describe cluster analysis, association detection, and
statistical analysis and explain how Caesar’s could
use each one to gain insights into its business.
Copyright © 2015 McGraw-Hill Ryerson Limited
4-50