Chapter 15 - Emunix Emich

Download Report

Transcript Chapter 15 - Emunix Emich

Chapter 15
Sharing Enterprise Data
Fundamentals, Design,
and Implementation, 9/e
Database Processing
Architectures
 System architectures for enterprise
database processing:
– Teleprocessing system
– Client-server system
– File-sharing system
– Distributed system
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/2
Teleprocessing Systems
 Classic architecture for multi-user database
processing
 Users operate dumb terminals or PC that
emulate dumb terminals
– User interface is usually simple and primitive
 A single centralized computer processes
communications control program,
application programs, DBMS, and
operating system
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/3
Teleprocessing Systems
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/4
Client-Server Systems
 A client-server system consists of a
network of computers connected via a LAN
 Clients are personal computers used to
process application programs
 Servers are PCs or mainframes that stores
DBMS and the data-management portion
of the operating system
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/5
Client-Server Systems
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/6
File-Sharing Systems
 This architecture was developed before the clientserver architecture
 File server and user computers are connected
through LAN
– File server provides access to files and other resources
– User computers must contain a copy of DBMS and
application programs
 DBMS on user’s computer sends requests to the
data management portion of the operating system
on the file server for file-level processing
– This cause more traffic across LAN than the client-server
system
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/7
File-Sharing Systems
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/8
Distributed Database Systems
 Distributed database systems use multiple
computers to process the same database
 Distributed processing: use multiple
computers for applications or DBMS
processing
– E.g., file-sharing, client-server, and distributed
database system
 Distributed database processing: distribute
database to multiple computers
– E.g., distributed database system
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/9
Distributed Database Systems
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/10
Database Partitioning
 A vertical partition, or vertical fragment,
refers to a table that is broken into two or
more sets of columns
 A horizontal partition, or horizontal
fragment, refers to a table that is broken
into two or more sets of rows
 Mixed partition refers to a database broken
into both horizontal and vertical partitions
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/11
Types of Distributed Databases
 Types of distributed database:
–
–
–
–
Nonpartitioned, nonreplicate
Partitioned, nonreplicated
Nonpartitioned, replicated
Partitioned, replicated
 The greater the degree of partitioning and
replication
– The greater the flexibility, independence, and reliability
– The greater the expense, control difficulty, and security
problems
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/12
Types of Distributed Databases
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/13
Comparing DB Distribution
Alternatives
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/14
Distributed Processing
Techniques
 Three types of distributed database processing
 Downloading of read-only data: only one computer
updates data, but multiple computers are sent
copies to process
 Updating by a designated computer: allows data
update requests to originate on multiple
computers, but to transmit those update requests
to a designated computer for processing
– Database at the designated computers must be
periodically synchronized
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/15
Distributed Processing
Techniques (cont.)
 Updating by multiple computers: allows multiple
updates on the same data at multiple sites
– Three types of distributed update conflict can occur:
• Loss of uniqueness
• Lost updates due to concurrent transactions
• Updates of deleted data
 Coordinating distributed atomic transactions is
difficult and requires a two-phase commit
 The OLE Distributed Transaction Server and Java
Enterprise Beans are two technologies for dealing
with these problems
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/16
Downloading Data
 Powerful personal computers enable user
to download enterprise data for local
processing
 Users can query and report on downloaded
data using DBMS products on their
machines
 Normally, users are not allowed to update
and return data to prevent data integrity
problems
 A Web server can be used to publish
downloaded data
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/17
Potential Problems of
Downloaded Databases
 Coordination
– Downloaded data must conform to database
constraints
– Local updates must be coordinated with
downloads
 Consistency
– In general, downloaded data should not be
updated
– Applications need features to prevent updating
– Users should be made aware of possible
problems
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/18
Potential Problems of
Downloaded Databases (cont.)
 Access Control
– Data may be replicated on many computers
– Procedures to control data access are more
complicated
 Potential for Computer Crime
– Illegal copying is difficult to prevent
– Diskettes and illegal online access are easy to
conceal
– Risk may prevent the development of
downloaded data applications
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/19
Processing Downloaded Data
with a Web Server
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/20
OLAP
 On Line Analytical Processing (OLAP) is a
new way of presenting information
 With it, data is viewed in cubes that have
axes, dimensions, measures, slices, and
levels
 Cube refers to
– Underlying semantic structure that is used to
interpret data
– A particular materialization of data in such a
semantic structure
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/21
Example: Relational Source Data
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/22
Example: OLAP Cube
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/23
OLAP Terminology
 OLAP
hypercube:
means a
data display
with an
unlimited
number of
axes
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/24
Example: Cube Data Definition
CREATE CUBE HousingSalesCube (
DIMENSION Time TYPE TIME,
LEVEL Year TYPE YEAR,
LEVEL Quarter TYPE QUARTER,
LEVEL Month TYPE MONTH,
DIMENSION Location,
LEVEL USA TYPE ALL,
LEVEL State,
LEVEL City,
DIMENSION HousingCategory,
DIMENSION HousingType,
MEASURE SalesPrice,
FUNCTION AVG
MEASURE AskingPrice,
FUNCTION AVG
)
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/25
Example: Multidimensional
SELECT Statement
SELECT CROSSJOIN
({Existing Structure, New Construction},
{California.Children, Nevada})
ON COLUMNS,
{1998.Q1.Children, 1998.Q2, 1998.Q3, 1998.Q4,
1999.Q1.Children, 1999.Q2, 1999.Q3, 1999.Q4}
ON ROWS
FROM HousingSalesCube
WHERE (SalesPrice, HousingType =
‘SingleFamily’)
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/26
OLAP Schema Structures
 Star schema: every dimension table is adjacent to
the table storing the measure values
– These tables may or may not be normalized
 Snowflake schema: there can be multilevel,
normalized tables
 In general, the star schema requires greater
storage, but it is faster to process than the
snowflake schema
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/27
Example: Star Schema
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/28
Example: Snowflake Schema
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/29
OLAP Storage Alternatives
 Three different means for storing OLAP data
 ROLAP (relational OLAP): relational DBMS with
extensions is sufficient to meet OLAP
requirements
 MOLAP (multidimensional OLAP): a specialized
multidimensional processor is necessary to
produce acceptable OLAP performance
 HOLAP (hybrid OLAP): both DBMS products and
specialized OLAP engines have a role and can be
used to advantage
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/30
Microsoft OLAP Architecture
 Microsoft has extended OLE DB and ADO for OLAP
 OLE DB for OLAP includes
– A dataset object
– ADO MD: new objects for processing dataset objects in
ways similar to recordset objects
– Pivot Table Service: an OLAP processor available as Office
2000/XP and Windows 2000/XP service
 Microsoft’s architecture moves much OLAP
processing to client computers
 Whether this will be acceptable for the processing of
data on enterprise servers is as yet unknown
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/31
Microsoft OLAP Architecture
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/32
Data Warehouse
 A data warehouse is a store of enterprise
data that is designed to facilitate
management decision-making
 Goal: to increase the value of the
organization’s data asset
 Role: to store extracts from operational
data and make those extracts available to
users in a useful format
– Data can be extracts from databases, files,
images, recordings, photos, external data, etc.
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/33
Data Warehouse
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/34
Data Warehouse Components









Data extraction tools
Extracted data
Metadata of warehouse contents
Warehouse DBMS(s) and OLAP servers
Warehouse data management tools
Data delivery programs
End-user analysis tools
User training courses and materials
Warehouse consultants
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/35
Data Warehouse Requirements





Queries and reports with variable structure
User-specified data aggregation
User-specified drill down
Graphical outputs
Integration with domain-specific programs
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/36
Challenges for Data Warehouses
 Inconsistent data
– When data are integrated, inconsistencies can
develop due to timing and domain differences
– Solution: create metadata to describe both
timing and domains of source data
 Tool Integration
– Because of the many tools required in a data
warehouse, tools will have different user
interfaces and inconsistent means of importing
and exporting data, and it may be difficult to
obtain technical support
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/37
Challenges for Data Warehouses
 Lack of tools for managing the data
warehouse
– The organization may have to develop its own
tools for managing non-relational data and for
maintaining appropriate metadata. Such
development is difficult and expensive
 Ad hoc nature of requirements
– Such requests are difficult to satisfy
– Solution: create datamart, i.e,, limited-scope
data warehouses
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/38
Data Marts
 A data mart is a limited-scope data
warehouse
 A data mart is easier to manage than the
enterprise data warehouse because
– It has a much smaller domain
– It can be restricted
• To a particular type of input data
• To a particular business function
• To a particular business unit or geographic area
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/39
Enterprise Data Sharing
Continuum
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/40
Data Administration
 Data are an important organizational asset
that can support both operations and
management decisionmaking
 The purpose of offices of data
administration is to guard and protect the
data and to ensure that they are used
effectively
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/41
Data Administration Challenges




Many types of data exist
Basic categories of data are not obvious
The same data can have many names
The same data can have many
descriptions and formats
 Data are changed often concurrently
 Political-organizational issues complicate
operational issues
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/42
Functions of Data Administration
 Marketing
– Communicate existence of data administration to
organization
– Explain reason for existence of standards, policies, and
guidelines
– Describe in a positive light the services provided
 Data Standards
– Establish standard means for describing data items.
Standards include name, definition, description,
processing restrictions, etc.
– Establish data proponents
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/43
Functions of Data Administration
 Data Policies
– Establish organization-wide data policy, e.g., security,
data proponency, and distribution
 Forum for Data Conflict Resolution
– Establish procedures for reporting conflicts
– Provide means for hearing all perspectives and views
– Have authority to make decision to resolve conflict
 Return on Organization's Data Investment
– Focus attention on value of data investment
– Investigate new methodologies and technologies
– Take proactive attitude toward information management
Copyright © 2004 Database Processing: Fundamentals, Design, and Implementation, 9/e
by David M. Kroenke
Chapter 15/44
Chapter 15
Sharing Enterprise Data
Fundamentals, Design,
and Implementation, 9/e