Transcript Data mgmt
CHAPTER 3
Data and Knowledge Management
1
1. Managing Data
2. The Database Approach
3. Big Data
4. Data Warehouses and Data Marts
5. Knowledge Management
2
1. Discuss ways that common challenges in
managing data can be addressed using data
governance.
2. Discuss the advantages and disadvantages of
relational databases.
3. Define Big Data, and discuss its basic
characteristics.
3
4. Recognize the necessary environment to
successfully implement and maintain data
warehouses.
5. Describe the benefits and challenges of
implementing knowledge management
systems in organizations.
4
OPENING
• Flurry Gathers Data from Smartphone Users
Do you feel that Flurry should be installed on your
smartphone by various app makers without your
consent? Why or why not? Support your answer.
What problems would Flurry encounter if someone
other than the smartphone’s owner uses the
device? (Hint: Note how Flurry gathers data.)
Can Flurry survive the privacy concerns that are
being raised about its business model?
5
3.1
Managing Data
• Data Quality
• Difficulties of Managing Data
• Data Governance
6
Data quality
•Accurate
•Complete
•Timely
•Consistent
•Accessible
•Relevant
•Concise
1. Relevant
2. Complete
3. Accessible
4. Timely
5. Accurate
6. Consistent
7. Concise
Groups: Purpose, Usage, “Quality details”
7
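Some of these dimensions can be turned into simple automated checks. The sketch below is a toy Python example under assumed rules: completeness means no required field is empty, timeliness means the record was verified within the last 365 days, and consistency means it agrees with a copy held in another system. The record, the billing copy, and check_quality are hypothetical; accuracy, relevance, accessibility, and conciseness usually need human judgment or reference data and are not checked here.

```python
from datetime import date

# Hypothetical customer record from a sales system; field names are illustrative.
record = {
    "customer_id": 1017,
    "name": "A. Rivera",
    "email": None,  # missing value
    "state": "TX",
    "last_verified": date(2015, 3, 1),
}
# The same customer as stored in a (hypothetical) billing system.
billing_copy = {"customer_id": 1017, "state": "TX"}

REQUIRED = ["customer_id", "name", "email", "state"]

def check_quality(rec, other, required, today, max_age_days=365):
    """Score three of the seven dimensions with simple, assumed rules."""
    complete = all(rec.get(f) is not None for f in required)
    timely = (today - rec["last_verified"]).days <= max_age_days
    consistent = all(rec.get(k) == v for k, v in other.items())
    return {"complete": complete, "timely": timely, "consistent": consistent}

report = check_quality(record, billing_copy, REQUIRED, today=date(2016, 6, 1))
print(report)  # {'complete': False, 'timely': False, 'consistent': True}
```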
The Difficulties of Managing Data
• The amount of data increases
exponentially over time
• Data are scattered throughout
organizations
• Data are generated from multiple
sources (internal, personal, external)
• New sources of data
15-20 years ago:
9-11 years ago:
Now:
8
The Difficulties of Managing Data
(continued)
• “Natural” loss of data:
Data degradation
Moved; changed; inactivated
Data rot
Rocketdyne example: Moon landing
• Data security, quality, and integrity are critical
• Legal requirements change frequently and differ among countries & industries
9
IT’S ABOUT BUSINESS 3.1
• New York City Opens Its Data to All
What are some other creative applications
addressing city problems that could be
developed using NYC’s open data policy?
List some disadvantages of providing all city
data in an open, accessible format.
10
Data Governance
• Master Data Management
Synchronizes a single version of the truth for master data (p. 71)
• Master Data vs. Transaction Data
11
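The master data vs. transaction data distinction can be illustrated with a small Python sketch. The customers, orders, and the orders_with_customer helper are hypothetical; a real master data management system would also synchronize the master copy across many applications, which this toy does not attempt.

```python
# Master data: long-lived business entities (customers), kept as ONE copy.
master_customers = {
    "C001": {"name": "Ana Lopez", "city": "Austin"},
    "C002": {"name": "Ben Cho", "city": "Boston"},
}

# Transaction data: events (orders) that only refer to master records by key.
transactions = [
    {"order_id": 1, "customer_id": "C001", "amount": 40.0},
    {"order_id": 2, "customer_id": "C002", "amount": 15.5},
    {"order_id": 3, "customer_id": "C001", "amount": 22.0},
]

def orders_with_customer(txns, master):
    """Join each transaction to the single master copy of its customer."""
    return [(t["order_id"], master[t["customer_id"]]["city"], t["amount"]) for t in txns]

# A change of address is made once, in the master record...
master_customers["C001"]["city"] = "Dallas"
# ...and every transaction that references C001 now sees the same value.
joined = orders_with_customer(transactions, master_customers)
print(joined)  # [(1, 'Dallas', 40.0), (2, 'Boston', 15.5), (3, 'Dallas', 22.0)]
```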
3.2
The Database Approach
• Data File
• Database Systems Minimize & Maximize Three Things
• The Data Hierarchy
• The Relational Database Model
• Slides 13-15: Why databases; benefits
12
Figure 3.1: Database Management System
13
Database Management Systems (DBMS) Minimize:
• Data Redundancy
• Data Isolation
• Data Inconsistency
14
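Redundancy and inconsistency are easiest to see side by side. This toy Python sketch uses hypothetical registrar and housing files (plain dicts) for the file approach and a single shared dict standing in for a database; it does not model data isolation.

```python
# File approach: two departments each keep their own copy of the same
# student's address (data redundancy).
registrar_file = {"S100": {"name": "Kim", "address": "12 Oak St"}}
housing_file = {"S100": {"name": "Kim", "address": "12 Oak St"}}

# Only one copy is updated when the student moves, so the copies now
# disagree (data inconsistency).
registrar_file["S100"]["address"] = "48 Elm Ave"
inconsistent = registrar_file["S100"]["address"] != housing_file["S100"]["address"]

# Database approach: both applications read the same shared store,
# so a single update is visible everywhere.
shared_db = {"S100": {"name": "Kim", "address": "12 Oak St"}}
registrar_view = shared_db
housing_view = shared_db
shared_db["S100"]["address"] = "48 Elm Ave"
consistent = registrar_view["S100"]["address"] == housing_view["S100"]["address"]

print(inconsistent, consistent)  # True True
```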
Database Management Systems (DBMS) Maximize:
• Data Security
• Data Integrity
• Data Independence
In the old days, data files and programs
were …
15
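Data integrity is one of the things a DBMS maximizes by enforcing rules centrally instead of in every program. A minimal sketch with Python's built-in sqlite3 module, assuming a hypothetical student table whose GPA must fall between 0.0 and 4.0: the DBMS itself rejects the invalid row.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        gpa        REAL CHECK (gpa BETWEEN 0.0 AND 4.0)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Kim', 3.4)")

# The integrity rule lives in the database, not in each application program.
try:
    conn.execute("INSERT INTO student VALUES (2, 'Lee', 7.9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, count)  # True 1
```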
IT’S ABOUT BUSINESS 3.2
• Google’s Knowledge Graph
Refer to the definition of a relational
database. In what way can the Knowledge
Graph be considered a database? Provide
specific examples to support your answer.
Refer to the definition of an expert system
in Plug IT In 5. Could the Knowledge Graph
be considered an expert system? If so,
provide a specific example to support your
answer.
What are the advantages of the Knowledge
Graph over traditional Google searches?
16
Data Hierarchy
• Database
• Data File (Table)
• Record
• Field
• Byte
• Bit
High to low: database, data file, record, and field are the business/logical levels; byte and bit are the physical levels.
17
Data Hierarchy
(Figure from Laudon & Laudon; high to low)
Bad example - A principle of DB design: smallest element…
Data hierarchy explained:
What they are | Term in DB | As component in DB
Smallest meaningful unit | |
Related fields | |
Same type of records | |
Related files | |
19
Figure 3.2: Hierarchy of Data for a Computer-Based File
20
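The hierarchy from bit up to database can be traced in plain Python. The student and course values are hypothetical, and Python tuples, lists, and dicts stand in for the records, files, and database that a DBMS would actually manage; the point is only how each level groups the one below it.

```python
# Field: smallest meaningful unit (a hypothetical student name).
field = "Kim"
# Bytes: here each ASCII character is stored as one byte.
as_bytes = field.encode("ascii")
# Bits: a byte is 8 bits; show the bits of the first byte ('K').
first_byte_bits = format(as_bytes[0], "08b")

# Record: related fields describing one student.
record = ("S100", "Kim", "Finance")
# File (table): records of the same type.
student_file = [record, ("S101", "Lee", "MIS")]
course_file = [("MIS301", "Databases")]
# Database: related files.
database = {"students": student_file, "courses": course_file}

print(len(as_bytes), first_byte_bits)  # 3 01001011
print(len(database["students"]))       # 2
```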
The Relational Database Model
• Database Management System (DBMS)
• Relational Database Model
• Data Model
Entity (type)
(Entity) Instance
Attribute
Attribute value
21
Relational Database Model Example
Entity
Attribute1 | Attribute2
Value of Att1 | Value of Att2
Every row in the table is a…
22
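The entity, instance, attribute, and attribute value vocabulary maps directly onto a relational table. A minimal sketch using Python's built-in sqlite3 module with a hypothetical student table: the table is the entity (type), each column an attribute, each row an entity instance, and each cell an attribute value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity (type) -> table; attributes -> columns. Names are hypothetical.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
# Each inserted row is one entity instance; each cell is an attribute value.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(100, "Kim", "Finance"), (101, "Lee", "MIS")],
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
instances = conn.execute("SELECT * FROM student ORDER BY student_id").fetchall()
print(attributes)    # ['student_id', 'name', 'major']
print(instances[1])  # (101, 'Lee', 'MIS')
```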
DISCUSSION
A database maintains information about various types of objects, places/organizations, people, and events.
Entity type | Example entity | Attributes
Object | Bottled water |
Place/org | College |
Place/org | Department |
People | Student |
People | Patient |
Event | Student-organization-sponsored campus activities |
23
The Relational Database Model
(continued)
• Primary Key
Uniquely:
• Secondary Key (p. 76)
Helps to identify, but not uniquely:
• Foreign Key
The key through which a relationship is maintained and implemented
PK-FK pair, between “_______” and “______” tables (1-M relationships covered later)
PK on the 1-side (parent table), FK on the M-side (child table)
24
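The PK-FK pair and the difference between primary and secondary keys can be sketched with Python's built-in sqlite3 module. The department and student tables, their columns, and the sample rows are hypothetical. In SQLite, foreign keys are checked only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Parent table (1-side): the primary key uniquely identifies each department.
conn.execute("CREATE TABLE department (dept_id TEXT PRIMARY KEY, dept_name TEXT)")
# Child table (M-side): dept_id is a foreign key pointing to the parent's PK.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        last_name  TEXT,
        dept_id    TEXT REFERENCES department(dept_id)
    )
""")
# Secondary key: an index on a non-unique attribute used to look rows up.
conn.execute("CREATE INDEX idx_student_last_name ON student(last_name)")

conn.execute("INSERT INTO department VALUES ('MIS', 'Management Information Systems')")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Smith", "MIS"), (2, "Smith", "MIS"), (3, "Lopez", "MIS")])

# The secondary key narrows the search but can match several rows.
smiths = conn.execute(
    "SELECT student_id FROM student WHERE last_name = 'Smith' ORDER BY student_id"
).fetchall()

# A child row pointing at a nonexistent parent violates the PK-FK pair.
try:
    conn.execute("INSERT INTO student VALUES (4, 'Ng', 'XYZ')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(smiths, fk_rejected)  # [(1,), (2,)] True
```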
Figure 3.3: Student Database Example
Problem of this design?
25
3.3
Big Data
• Defining Big Data
• Characteristics of Big Data
• Issues with Big Data
• Managing Big Data
• Putting Big Data to Use
26
Defining Big Data
• Gartner (www.gartner.com)
• The Big Data Institute (TBDI)
27
Defining Big Data: Gartner
• Diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.
28
Defining Big Data: The Big Data Institute (TBDI)
• Vast datasets that:
Exhibit variety
Include structured, unstructured, and semi-structured data
Are generated at high velocity with an uncertain pattern
Do not fit neatly into traditional, structured, relational databases
Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
29
Characteristics of Big Data
• Volume
• Velocity
• Variety
30
Issues with Big Data (P. 79)
• Untrusted data sources
• Big Data is “dirty”
• Big Data changes, especially in data
streams
31
Managing Big Data
• Big Data can reveal valuable
patterns, trends, and information
that were previously hidden:
tracking the spread of disease
tracking crime
detecting fraud
32
Managing Big Data (continued)
• First Step:
Integrate information silos into a
database environment and develop data
warehouses for decision making.
• Second Step:
Making sense of their proliferating data.
33
Managing Big Data (continued)
• Many organizations are turning to
NoSQL databases to process Big
Data
34
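NoSQL covers many kinds of systems, and the sketch below is not any particular product's API. It only imitates the document model that some NoSQL databases use, with Python's json module: records of different shapes (the variety of Big Data) sit in one collection without a fixed schema, which a single relational table would not accept as-is. The sample documents and the find helper are hypothetical.

```python
import json

# Toy stand-in for a document collection: each document is a JSON object,
# and documents need not share the same fields (no fixed schema).
raw = [
    '{"id": 1, "type": "tweet", "text": "Loving the new phone", "likes": 12}',
    '{"id": 2, "type": "click", "page": "/checkout", "ms_on_page": 5400}',
    '{"id": 3, "type": "tweet", "text": "Battery died again", "likes": 3, "geo": [40.7, -74.0]}',
]
collection = [json.loads(doc) for doc in raw]

def find(coll, **criteria):
    """Return documents that contain every criterion field with a matching value."""
    return [d for d in coll if all(k in d and d[k] == v for k, v in criteria.items())]

tweets = find(collection, type="tweet")
with_geo = [d["id"] for d in collection if "geo" in d]
print(len(tweets), with_geo)  # 2 [3]
```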
Putting Big Data to Use
• Making Big Data Available
• Enabling Organizations to Conduct
Experiments
• Micro-Segmentation of Customers
• Creating New Business Models
• Organizations Can Analyze Far More
Data
35
Putting Big Data to Use
• Sentiment Analysis
• 360-Degree Customer View
• Multi-Channel Marketing
• Customer Micro-Segmentation
• Clickstream Analysis
• Fraud Detection
• Ad-hoc Analysis
“Big Data” is stored on and organized in…
36
3.4
Data Warehouses and Data Marts
• Describing Data Warehouses and Data Marts
• A Generic Data Warehouse Environment
37
Figure 3.4: Data Warehouse Framework
38
Describing Data Warehouses and Data Marts
• Organized by business dimension or subject
• Use online analytical processing (OLAP)
• Integrated
• Time variant
• Nonvolatile
• Multidimensional
A data warehouse supports data mining.
39
A Generic Data Warehouse Environment
• Source Systems
• Data Integration
• Storing the Data
• Metadata
Data about data: type, structure, constraints, owner, source, revision, …
• Data Quality
• Governance
• Users
40
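The first stages of the generic environment (source systems, data integration, storing the data, metadata) can be sketched in a few lines. This is a toy extract-transform-load (ETL) pass under assumed inputs: two hypothetical sources with different date formats and product spellings are mapped to one format, loaded into an in-memory SQLite table named sales_fact (a hypothetical name), and the table's metadata is then read back from SQLite's catalog with PRAGMA table_info.

```python
import sqlite3

# Two hypothetical source systems that describe sales differently.
store_pos = [("2016-01-04", "Widget", 3)]                         # (ISO date, product, units)
web_orders = [{"date": "01/05/2016", "sku": "widget", "qty": 2}]  # US date, lowercase SKU

def to_iso(us_date):
    """Transform MM/DD/YYYY into YYYY-MM-DD."""
    month, day, year = us_date.split("/")
    return f"{year}-{month}-{day}"

# Extract + transform: map both sources onto one common row format.
rows = [(d, p.title(), u, "store") for d, p, u in store_pos]
rows += [(to_iso(o["date"]), o["sku"].title(), o["qty"], "web") for o in web_orders]

# Load: store the integrated rows in the warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_fact (sale_date TEXT, product TEXT, units INTEGER, channel TEXT)")
dw.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)

total = dw.execute("SELECT SUM(units) FROM sales_fact WHERE product = 'Widget'").fetchone()[0]
# Metadata ("data about data"): column names and declared types from the catalog.
metadata = [(r[1], r[2]) for r in dw.execute("PRAGMA table_info(sales_fact)")]
print(total)     # 5
print(metadata)  # [('sale_date', 'TEXT'), ('product', 'TEXT'), ('units', 'INTEGER'), ('channel', 'TEXT')]
```

Because both sources are cleaned into the same product spelling and date format before loading, a single query can total units across channels.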
Figure 3.5: Relational Databases
41
Figure 3.6: Data Cube
42
Figure 3.7: Equivalence Between Relational and Multidimensional Databases
43
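The equivalence between relational rows and a data cube can be sketched in Python. The products, regions, years, and unit counts are made up for illustration and are not the figures' numbers; slice_year and roll_up are hypothetical helpers imitating two common OLAP operations on a cube keyed by (product, region, year).

```python
from collections import defaultdict

# Relational form: one row per (product, region, year) with a measure (units).
rows = [
    ("Nuts", "East", 2014, 50), ("Nuts", "West", 2014, 60),
    ("Screws", "East", 2014, 40), ("Screws", "West", 2014, 70),
    ("Nuts", "East", 2015, 55), ("Nuts", "West", 2015, 65),
    ("Screws", "East", 2015, 45), ("Screws", "West", 2015, 75),
]

# Multidimensional form: the same facts addressed by three dimensions.
cube = {(product, region, year): units for product, region, year, units in rows}

def slice_year(cube, year):
    """Fix one dimension (year) to get a 2-D product x region table."""
    return {(p, r): u for (p, r, y), u in cube.items() if y == year}

def roll_up(cube, dim):
    """Total the measure for each value of one dimension (0=product, 1=region, 2=year)."""
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[dim]] += units
    return dict(totals)

print(cube[("Screws", "West", 2015)])            # 75
print(slice_year(cube, 2014)[("Nuts", "West")])  # 60
print(roll_up(cube, 1))                          # {'East': 190, 'West': 270}
```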
IT’S ABOUT BUSINESS 3.4
• Data Warehouse Gives Nordea Bank a Single Version of the Truth
What are other advantages (not mentioned
in the case) that Nordea Bank might realize
from its data warehouse?
What recommendations would you give to
Nordea Bank about incorporating Big Data
into their bank’s data management? Provide
specific examples of what types of Big Data
you think Nordea should consider.
44