Data and Knowledge Management

Download Report

Transcript Data and Knowledge Management

Chapter 10
Data and Knowledge Management
Agenda
•
•
•
•
•
•
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining
Data
• Set of discrete, objective facts about
events
• Business - structured records of
transactions
• Little relevance or purpose
Information
• Message with sender and receiver
• Meant to change way receiver perceives
something
• Have an impact on his judgment /
behavior
Data Processing
• Contextualize - why was data gathered?
• Categorize - what are its key
components?
• Calculate - analyze mathematically
• Condense - summarize in more concise
form
Information Processing
• Compare - in kind and in time
• Consequences - how used in decisions /
actions
• Connections - relation to other
information
• Conversation - what other people think
about this information
Agenda
•
•
•
•
•
•
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining
Database
•
•
•
•
•
•
Element
Types
Structure
Models
Creation
Topology
Element
•
•
•
•
Bit, byte, field, record, file, database
Entity, attribute, key field
Relation
Class, object
Database Types
•
•
•
•
•
•
•
•
Business database
Geographical information database
Knowledge database / deductive database
Multimedia database
Data warehouse
Data marts
Multimedia and hypermedia database
Object-oriented database
Database Structure
• Data definition language
– Schema & subschema
• Data Manipulation language
– Structured Query Language (SQL)
– Query By Example (QBE)
• Data dictionary
Database Models
• Hierarchical
– One to many
– TPS or routine MIS
• Network
– Many to many
– TPS or routine MIS
• Relational
– Normalization
– Ad hoc reports or DSS
• Object-oriented
– E-commerce
Database Creation
• Conceptual design
– Logical view
– Entity-relationship (ER) diagram
– Normalization
Entity Relationship Diagram
• Entity: object or concept
• Relationship: meaning association between
objects
• Attribute: property of an object
– Simple & Composite
– Single-valued & multi-valued
– Derived
• Key
– Primary key
– Foreign key
Normalization
• A technique for identifying a true primary
key for a relation
• Types
– First normal form: not repeating group
– Second normal form: every non-primary-key
attribute is fully functionally dependent on the
entire primary key
– Third normal form: no transit dependency
Structured Query Language
• Select
• Join
SQL DML - SELECT
•
SELECT [DISTINCT|ALL] {* | [colexpr [AS newname]][,...]
FROM table-name [alias] [,...]
[WHERE condition]
[GROUP by colm [, colm]
[HAVING condition]]
ORDER BY colm [, colm]
SQL DML - SELECT
•
SELECT attributes (or calculations: +,
-, /, *)
FROM relation
•
SELECT DISTINCT attributes
FROM relation
Examples
• SELECT stuname
FROM student;
• SELECT stuid, stuname, credit
FROM student;
• SELECT stuid, stuname, credit+10
FROM student;
• SELECT DISTINCT major
FROM student;
SQL DML - SELECT
•
SELECT attributes (or * wild card)
FROM relation
WHERE condition
Examples
• SELECT *
FROM student;
• SELECT stuname, major, credit
FROM student
WHERE stuid = ‘S114’;
• SELECT *
FROM faculty
WHERE dept = ‘MIS’;
SELECT - WHERE condition
•
•
•
•
•
•
AND
OR
NOT
IN
NOT IN
BETWEEN
IS NULL
IS NOT NULL
LIKE '%' multiple characters
LIKE ‘_’ single characters
Examples
• SELECT *
FROM faculty
WHERE dept = ‘MIS’ AND rank =
‘full professor’;
• SELECT *
FROM faculty
WHERE dept = ‘MIS’ OR rank =
‘full professor’;
• SELECT *
FROM faculty
WHERE dept = ‘MIS’ NOT rank =
‘full professor’;
• SELECT *
FROM class
WHERE room LIKE ‘B_S%’;
• SELECT *
FROM class
WHERE room NOT LIKE ‘BUS%’;
• SELECT productid, productname
FROM inventory
WHERE onhand BETWEEN 50 and
100;
• SELECT companyid, companyname
FROM company
WHERE companyname BETWEEN
‘G’ AND ‘K’;
• SELECT productid, productname
FROM inventory
WHERE onhand NOT BETWEEN
50 and 100;
• SELECT companyid, companyname
FROM company
WHERE companyname NOT
BETWEEN ‘G’ AND ‘K’;
• SELECT facname
FROM faculty
WHERE dept IN (‘MIS’, ‘ACT’);
• SELECT facname
FROM faculty
WHERE rank NOT IN (‘assistant’,
‘lecture’);
• SELECT customername
FROM customer
WHERE emailadd IS NOT NULL;
• SELECT customername
FROM customer
WHERE creditlimit IS NULL;
SELECT - aggregate functions
•
•
•
•
•
•
COUNT (*)
COUNT
SUM
AVG
MIN
MAX
Examples
• SELECT COUNT(*)
FROM student;
• SELECT COUNT(major)
FROM student;
• SELECT COUNT(DISTINCT major)
FROM student;
• SELECT COUNT(stuid), SUM(credit),
AVG(credit), MAX(credit),
MIN(credit)
FROM student;
SELECT - GROUP
•
•
GROUP BY
HAVING
Examples
•
•
SELECT major, AVG(credit)
FROM student
GROUP BY major
HAVING COUNT(*) > 2;
SELECT course#, COUNT(stuid)
FROM enrollment
GROUP BY course#
HAVING COUNT(*) > 2;
•
SELECT major, AVG(credit)
FROM student
WHERE major IN (‘MIS’, ‘ACT’)
GROUP BY major
HAVING COUNT(*) > 2;
SELECT - ORDER BY
•
•
ORDER BY
ORDER BY ... DESC
Examples
•
•
SELECT facname, rank
FROM faculty
ORDER BY facname;
SELECT facname, rank
FROM faculty
ORDER BY rank DESC,
facname;
SELECT - JOIN Tables
•
•
Multiple tables in FROM clause
MUST have join conditions!!!
Examples
•
SELECT stuname, grade
FROM student, enrollment
WHERE student.stuid =
enrollment.stuid;
•
SELECT enrollment.course#,
stuname, major
FROM class, enrollment, student
WHERE class.course# =
enrollment.course#
AND enrollment.stuid =
student.stuid
AND facid = ‘F114’
ORDER BY enrollment.course#;
SUBQUERY, EXIST, NOT EXIST
• SELECT s.stuname, major
FROM student s
WHERE EXIST
(SELECT *
FROM enrollment e
WHERE
s.stuid = e.stuid);
• SELECT s.stuname, major
FROM student s
WHERE NOT EXIST
(SELECT *
FROM enrollment e
WHERE
s.stuid = e.stuid);
Database Creation
• Physical design
– Physical view
– Data topology (organization)
• Centralized
• Distributed database
– Replicated database
– Partitioned
• Organization & access method
– Sequential file
– Indexed sequential file
– Direct or random file
• Security
– Logical, physical, and transmitting
Selection Criteria
•
•
•
•
•
•
•
•
•
User’ needs (type of application)
Compatibility
Portability
Reliability
Cost
Feature
Performance
Vendor’s support
Others?
Agenda
•
•
•
•
•
•
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining
Data Administrator
•
•
•
•
Clean up data definitions
Control shared data
Manage distributed data
Maintain data quality
Clean Up Definitions
• Synonyms / aliases
• Standard data definitions
– Names and formats
• Data Dictionary
– Active
– Integrated
Control Shared Data
• Local - used by one unit
• Shared - used by two or more activities
• Impact of proposed program changes on
shared data
• Program-to-data element matrix
Manage Distributed Data
• Geographically dispersed
– Whether shared data or not
• Different levels of detail
– Different management levels
Maintain Data Quality
• Put owners in charge of data
– Verify data accuracy and quality
• Purge old data
Agenda
•
•
•
•
•
•
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining
The DBMS
Data Base Management System: software
that permits a firm to:
– Centralize data
– Manage them efficiently
– Provide access to applications
• Such as payroll, inventory
DBMS Components
•
•
•
•
Data Definition Language (DDL)
Data Manipulation Language (DML)
Inquiry Language (IQL)
Teleprocessing Interface (TP)
Definitions
• Views:
– Physical - how stored
– Logical - how viewed and used by users
• Schema - Overall logical layout of records
and fields in a database
• Subschema: Individual user’s logical
portion of database (view)
Agenda
•
•
•
•
•
•
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining
Distributing Data
• Centralized files
• Fragemented files
– Distribute data without duplication
– Users unaware of where data located
Distributing Data
• Replicated files
– Data duplicated
– One site has master file
– Problem with data synchronization
• Decentralized files
– Local data autonomy
Distributing Data
• Distributed files
–
–
–
–
–
Client / server systems
Stored centrally
Portion downloaded to workstation
Workstation can change data
Changes uploaded to central computer
Agenda
•
•
•
•
•
•
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining
Data Warehousing
• Collect large amounts of data from
multiple sources over several years
• Classify each record into multiple
categories
– Age
– Location
– Gender
Data Warehousing
• Rapidly select and retrieve by multiple
dimensions
– All females in Chicago under 25 years of
age
• Provide tailored, on-demand reports
• Data mart: a replicated subset of the data
warehouse
– A functional or regional area
Data Mining
• Fitting models to, or determining
patterns from, warehoused data
• Purposes:
– Analyze large amount of data
– Find critical points of knowledge
– Perform automatic analyses
Data Mining Terms
• Data Visualization
• Drill-down Analysis
– Hierarchical structure
– Leads to increasing level of detail
• Expert System (ES) methodology
– e.g., neural networks
Applications
•
•
•
•
•
Finance - fraud detection
Stock Market - forecasting
Real estate - property evaluation
Airlines - customer retention
Retail - customer targeting
Data Mining Example
• What type customers are buying specific
products?
• When are the times customers will most
likely shop?
• What types of products can be sold
together?
Points to Remember
•
•
•
•
•
•
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining
Discussion Questions
• How can a database help an organization?
• Why normalization is very important for
building a database?
• Do you see any problem of the database in
your organization?
Discussion Questions
• What kind of database model is most suitable for
– School?
– Department store?
– Police?
• Some organizations are hesitant to distribute data.
These organizations feel that they may lose control.
– Do they lose control? Why?
– Could you suggest a “good” tactic?
• Could Data Mining pose a threat to individual
privacy?
– Why or why not?
– If so, how can we mitigate that threat?
– Do the advantages outweigh the disadvantages?
Assignment
•
•
•
•
Review chapters 10
Read chapter 8, 9, and 11
Group assignment
Research paper