Distributed DBMSs
• A distributed database is a single logical
database that is physically distributed to
computers on a network.
• A homogeneous DDBMS has the same local
DBMS at each site.
• A heterogeneous DDBMS has at least two
sites where the local DBMSs are different.
Characteristics of
Distributed DBMSs
• Location transparency means the user can work as
though the entire database is at their location.
• Replication transparency means the user is
unaware of the behind-the-scenes replication of the
data.
• Fragmentation transparency means the user is
unaware that a logical object can be divided among
the various locations on the network.
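The transparency ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory SQLite databases stand in for network sites, and the names `make_site`, `sites`, and `find_customers` are hypothetical, not part of any DDBMS product.

```python
import sqlite3

def make_site(rows):
    """Create one in-memory 'site' holding a fragment of the customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

# Hypothetical horizontal fragments: each site stores only its own region's rows.
sites = {
    "east": make_site([(1, "Adams", "east"), (2, "Baker", "east")]),
    "west": make_site([(3, "Chen", "west")]),
}

def find_customers(name_prefix=""):
    """Query every fragment and merge the results.

    The caller never names a site, so the table looks like one local table.
    """
    results = []
    for conn in sites.values():
        cur = conn.execute(
            "SELECT id, name, region FROM customer WHERE name LIKE ?",
            (name_prefix + "%",),
        )
        results.extend(cur.fetchall())
    return sorted(results)

all_customers = find_customers()
```

The caller sees a single customer table. Which site holds which rows is hidden inside `find_customers`, which is roughly what location and fragmentation transparency promise.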
Advantages of Distributed Databases
• Local control of data
• Increasing database capacity
• System availability
• Added efficiency
Disadvantages of Distributed Databases
• Update of replicated data
• More complex query processing
• More complex treatment of shared updates
• More complex recovery measures
• More difficult management of data
dictionary
• More complex data design
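To see why updating replicated data is harder than updating a single copy, consider the following minimal sketch. It assumes a simple all-or-nothing rule (undo the write everywhere if any site is unreachable). The `Replica` class and `replicated_update` function are hypothetical, and a real DDBMS would use a full commit protocol rather than this simplified undo.

```python
class Replica:
    """One site's copy of the data (hypothetical stand-in for a remote DBMS)."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def write(self, key, value):
        if not self.available:
            raise ConnectionError(f"site {self.name} is unreachable")
        self.data[key] = value

_MISSING = object()  # marks "key did not exist before the update"

def replicated_update(replicas, key, value):
    """Apply the update to every replica; undo it everywhere if any site fails."""
    done = []  # (replica, previous value) pairs, kept so the write can be undone
    try:
        for replica in replicas:
            old = replica.data.get(key, _MISSING)
            replica.write(key, value)
            done.append((replica, old))
    except ConnectionError:
        for replica, old in done:
            if old is _MISSING:
                del replica.data[key]
            else:
                replica.data[key] = old
        return False
    return True

a, b = Replica("A"), Replica("B")
ok_first = replicated_update([a, b], "price", 10)   # both sites reachable
b.available = False
ok_second = replicated_update([a, b], "price", 12)  # B is down, so A is rolled back
```

After the second call both copies still hold the old value, so the replicas stay consistent, but the update itself had to be refused. That trade-off is the cost the first bullet refers to.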
File Servers
• A file server contains the files required by the
individual workstations on the network.
Client/Server Systems
• In a client/server system, the DBMS runs on the
server, and the user sends requests for specific
data, not files.
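The difference between the two approaches can be illustrated by counting how many rows cross the network. This is a sketch with made-up data: the `ORDERS` list and both fetch functions are hypothetical and only model where the filtering happens.

```python
# Hypothetical table of 1,000 orders stored on the server.
ORDERS = [{"id": i, "customer": i % 50, "total": i * 1.5} for i in range(1000)]

def file_server_fetch():
    """File server: ship the whole file; the workstation does the filtering."""
    return list(ORDERS)

def client_server_fetch(customer):
    """Client/server: the server runs the request and ships only matching rows."""
    return [row for row in ORDERS if row["customer"] == customer]

# File server approach: everything is transferred, then filtered locally.
shipped_file = file_server_fetch()
wanted_via_file = [row for row in shipped_file if row["customer"] == 7]

# Client/server approach: only the requested rows are transferred.
shipped_rows = client_server_fetch(7)
```

Both approaches end with the same rows for customer 7, but the file server version moves the full table to the workstation first, while the client/server version moves only the matching rows.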
Advantages of
Client/Server Systems
• More efficient than file server systems.
• Possibility of distributing work among
several processors.
• Workstations need not be as powerful.
• The user doesn’t need to learn any special
commands or techniques.
Advantages of
Client/Server Systems
• Easier for users to access data from a
variety of sources.
• Provides greater level of security than file
server systems.
• Powerful enough to replace expensive
mainframe applications.
Data Warehouses
• A subject-oriented, integrated, time-variant,
nonvolatile collection of data in support of
management’s decision-making process.
Data Warehouse Architecture
Data Warehouse Structure
Why build a Data Warehouse?
• To speed up the writing and maintaining of queries
and reports by technical personnel
• To more easily query and report data, on a regular
basis, from multiple transaction processing
systems and/or from external data sources
• To provide a repository of transaction processing
system data that contains data over a span of time
Why build a Data Warehouse?
• To address security concerns
• To provide a repository of "cleaned up"
transaction processing systems data that can
be reported against and that does not
necessarily require fixing the transaction
processing systems
Data Errors
• Incomplete
– Missing records/fields
• Incorrect
– Wrong codes (or incorrect pairing of codes)
• Incomprehensible
– Multiple fields in one field
– Many to many relationships
– Spreadsheet and word-processing files
Data Errors
• Inconsistent
– Use / meaning of codes
– Business rules
– Timing
– Use of attributes
– Use of nulls/spaces
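A few of the error categories above can be detected mechanically while data is being cleaned for the warehouse. The following is a minimal sketch. The field names, the set of valid status codes, and the `check_record` function are assumptions made up for the illustration.

```python
VALID_STATUS_CODES = {"A", "I"}             # hypothetical codes: Active / Inactive
REQUIRED_FIELDS = ("id", "name", "status")  # hypothetical required fields

def check_record(record):
    """Return a list of data-error descriptions found in one source record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None:
            # Incomplete: the field is missing altogether.
            problems.append(f"incomplete: missing {field}")
        elif isinstance(value, str) and value.strip() == "":
            # Inconsistent: spaces used where another system would store a null.
            problems.append(f"inconsistent: blank used instead of null in {field}")
    status = record.get("status")
    if isinstance(status, str) and status.strip() and status not in VALID_STATUS_CODES:
        # Incorrect: the code is not one the business defines.
        problems.append(f"incorrect: unknown status code {status!r}")
    return problems

clean = check_record({"id": 1, "name": "Adams", "status": "A"})
dirty = check_record({"id": 2, "name": "   ", "status": "X"})
partial = check_record({"id": 3, "name": "Chen"})
```

Errors such as inconsistent business rules or timing cannot be caught by a per-record check like this one. They usually require comparing sources against each other.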
Data Mining
• Identify the goal
• Assemble the relevant data
• Choose your analysis methods
• Decide which software tool is best for
implementing the method
• Run the analysis
• Decide how to implement the results
Organizational Databases
• Operational Database
– organized about a
transaction
– supports OLTP (record
keeping)
– thousands of users
– accesses few records at
a time
– response time in
seconds
• Data Warehouses
– organized about a
subject
– supports OLAP
(decision support)
– few hundred users
– accesses many records
at a time
– response times in
minutes
Organizational Databases
• Operational Database
– primitive & detailed
– smaller (current)
– highly normalized
(many tables with few
columns)
– dynamic (continuous
updates online)
• Data Warehouses
– derived & summarized
– larger (historical)
– de-normalized (few
tables with many
columns)
– periodic (batch update)
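The contrast between the two kinds of database can be sketched with SQLite. This is an illustration under stated assumptions: the table names, columns, and sample rows are invented, and a single in-memory database stands in for what would normally be separate operational and warehouse systems.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Operational side: normalized, detailed, current rows.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sale (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        sale_date TEXT,
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'east'), (2, 'west');
    INSERT INTO sale VALUES
        (1, 1, '2023-01-15', 100.0),
        (2, 1, '2023-02-03', 50.0),
        (3, 2, '2023-01-20', 75.0);

    -- Warehouse side: one wide table of derived, summarized data.
    CREATE TABLE sales_summary (
        region TEXT,
        month TEXT,
        sale_count INTEGER,
        total_amount REAL
    );
""")

def batch_load(conn):
    """Periodic batch update: rebuild the summary from the operational tables."""
    conn.execute("DELETE FROM sales_summary")
    conn.execute("""
        INSERT INTO sales_summary
        SELECT c.region, substr(s.sale_date, 1, 7), COUNT(*), SUM(s.amount)
        FROM sale AS s
        JOIN customer AS c ON c.id = s.customer_id
        GROUP BY c.region, substr(s.sale_date, 1, 7)
    """)
    conn.commit()

batch_load(db)
summary = db.execute(
    "SELECT region, month, sale_count, total_amount "
    "FROM sales_summary ORDER BY region, month"
).fetchall()
```

Decision-support queries then read `sales_summary` directly and need no joins, while the operational tables continue to take individual inserts and updates between batch loads.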