IELM 511: Information System design
Introduction
Part 1. ISD for well structured data – relational and other DBMS
Info storage (modeling, normalization)
Info retrieval (Relational algebra, Calculus, SQL)
DB-integrated APIs
Part 2. ISD for systems with non-uniformly structured data
Basics of web-based IS (www, web2.0, …)
Markup languages, HTML, XML
Design tools for Info Sys: UML
Part 3. (one out of)
APIs for mobile apps
Security, Cryptography
IS product lifecycles
Algorithm analysis, P, NP, NPC
Agenda
Brief introduction to DBMS architecture
Some other DB types
DB architecture components
Any DB has three (or four) essential components:
1. The process (DBMS application)
2. System memory (RAM on the computer where the process is running)
3. Permanent data store (usually on a hard disk), and (for most DBs),
4. A computer network.
Typical client-server DB model (e.g. the one we use in our labs)
[Figure: three clients connected to one server with its own cpu, RAM, and disk]
Why do we need any other model ?
DB architectures: need for other models
Very large number of queries per minute -> parallel systems
[Figure: three parallel architectures built from cpu, RAM, and disk units: shared memory, shared disk, shared nothing]
Note: a parallel DB may or may not store data
in different locations, or across a network.
DB architectures: need for other models..
Network is not reliable, but services are critical -> Distributed DB
(e.g. Park-n-Shop)
Distributed DB - a single logical database that is split into fragments, each
controlled by a separate DBMS
[Figure: n networked sites, each with its own cpu and RAM, holding fragments frag 1, frag 2, ..., frag n]
Each site can process queries that access local data as well as data stored on
other computers in the network.
Data is distributed, but the distribution is transparent to the user!
Distributed DBs, examples
A distributed system appears to the user as a “centralized system”
- Users do not need to worry about network details
- Users are not concerned with where the data is stored, or with
redundantly stored copies of it
- Tables may be stored in multiple fragments, but this is
transparent to the user
The DBMS must have special functions to provide this
transparency!
Distributed DBs, advantages
- Increased reliability and availability
- Improved performance for queries on local data
- Easier expansion
Distributed DBs, examples
Extra functionality embedded in distributed DBMS:
Keeping track of distributed data
Distributed query processing
Distributed transaction management
Maintaining consistent copies of replicated data
Distributed database recovery
Security - user access authorization
Distributed catalog management
Examples of fragmented data:
- Some tables are only stored at some sites
- Vertical fragmenting of tables
e.g. some columns of a table at one site, others at another site.
- Horizontal fragmenting of tables
e.g. different rows stored at different sites.
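The two fragmenting schemes can be sketched in a few lines of Python. This is only an illustration, not how any particular distributed DBMS does it: the 'accounts' rows, the column names, and the helper functions are all hypothetical. The point it shows is that a horizontal fragment holds whole rows, a vertical fragment holds some columns plus the primary key, and the original table can be rebuilt by a union or a join respectively.

```python
# Hypothetical 'accounts' table; names and values are illustrative only.
accounts = [
    {"acc_no": 1, "branch": "Kowloon", "owner": "Ann", "balance": 500},
    {"acc_no": 2, "branch": "Central", "owner": "Bob", "balance": 900},
    {"acc_no": 3, "branch": "Kowloon", "owner": "Cy", "balance": 120},
]


def horizontal_fragments(rows, key):
    """Split by the value of `key`: each site stores whole rows."""
    sites = {}
    for row in rows:
        sites.setdefault(row[key], []).append(row)
    return sites


def vertical_fragments(rows, pk, column_groups):
    """Split by columns: every fragment keeps the primary key `pk`."""
    return [
        [{c: row[c] for c in (pk, *cols)} for row in rows]
        for cols in column_groups
    ]


def rebuild_horizontal(sites, pk):
    """Union of the row fragments, ordered by primary key."""
    rows = [row for frag in sites.values() for row in frag]
    return sorted(rows, key=lambda r: r[pk])


def rebuild_vertical(fragments, pk):
    """Join the column fragments on the primary key."""
    merged = {}
    for frag in fragments:
        for row in frag:
            merged.setdefault(row[pk], {}).update(row)
    return [merged[k] for k in sorted(merged)]


by_branch = horizontal_fragments(accounts, "branch")
by_columns = vertical_fragments(
    accounts, "acc_no", [("branch", "owner"), ("balance",)]
)
```

Keeping the primary key in every vertical fragment is what makes the join in rebuild_vertical possible; without it the column groups could not be matched up again.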
Distributed DB examples
Multiple stores belonging to same retail chain (e.g. Park-n-Shop)
Multiple branches of same bank
Domain name service (DNS) for internet
Need for even more (other) models: Object Oriented DBs
Why have Object Oriented Databases ?
- Need for more complex applications
- Need for additional data modeling features
- Increased use of object-oriented programming languages since 1990
Commercial OO Database products:
Ontos, Gemstone, O2 (Ardent), Objectivity, Objectstore (Excelon),
Versant, Poet, Jasmine (Fujitsu-GM)
Object Oriented DBs
Main idea in OODB:
DB objects should have a direct correspondence to real-world objects
Advantage:
Objects maintain their integrity and identity -> ease of modeling and maintenance
Object is composed of:
Data (values of attributes) and
Behavior (methods or operations)
Relational DB: simple program objects (tables), data about single object may
be spread over multiple tables (e.g. account data in our Bank DB)
OODB: program objects can be arbitrarily complex; however, all data and
functions related to one object are stored together.
Some Key concepts of OODBs
Encapsulation
At the time when an object is defined, the user must define
- All data and its type
- All operations a user can apply to the object.
Contrast this with Relational DBs: Data is defined, but operators
are system functions, not specific to objects.
Operator Polymorphism
- Each object encapsulates its own methods
- Different objects may have some similar actions (e.g. subtract some amount
from a ‘loan’ object, or from an ‘account’ object.)
- Polymorphism allows the same operator name to be used by different objects
(Note: the actual functions are different, although they do similar things).
Constructors, Destructors
Object instances are created (equivalent of inserting row(s) in RDB) by constructors
and deleted (equivalent of deleting row(s) in RDB) by using destructors
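Encapsulation, operator polymorphism, and constructors can be illustrated with ordinary Python classes. This is a sketch in a programming language, not in any OODB product; the class names, attributes, and the rules inside each subtract method are assumptions made for the example. Destructors are not shown, since Python reclaims objects automatically.

```python
class Account:
    """Data (acc_no, balance) and behavior (subtract) defined together."""

    def __init__(self, acc_no, balance):  # constructor: creates an instance
        self.acc_no = acc_no
        self.balance = balance

    def subtract(self, amount):
        """Withdrawal: refuse to overdraw the account."""
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class Loan:
    """A different object type with its own version of subtract."""

    def __init__(self, loan_no, outstanding):
        self.loan_no = loan_no
        self.outstanding = outstanding

    def subtract(self, amount):
        """Repayment: reduce the outstanding amount, never below zero."""
        self.outstanding = max(0, self.outstanding - amount)


account = Account(1, 500)
loan = Loan(7, 300)

# Same operator name, different functions: each object runs its own method.
for obj in (account, loan):
    obj.subtract(100)
```

After the loop, the account balance is 400 and the outstanding loan amount is 200, although the caller used the same operation name for both.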
Some Key concepts of OODBs…
Object hierarchy and inheritance
Objects can be organized in a hierarchical structure (e.g. the ‘account’ object
is a super-class of the ‘savings_account’ and ‘checking_account’ objects).
Objects of a sub-class inherit attributes (and methods) from parent classes in the
hierarchy.
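The account hierarchy mentioned above might look as follows in Python. The attribute names (rate_percent, overdraft_limit) and the methods are hypothetical, chosen only to show that sub-classes reuse what the super-class defines and add their own features.

```python
class Account:
    """Super-class: attributes and operations common to all accounts."""

    def __init__(self, acc_no, balance=0):
        self.acc_no = acc_no
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


class SavingsAccount(Account):
    """Inherits acc_no, balance, deposit; adds an interest rate."""

    def __init__(self, acc_no, balance=0, rate_percent=2):
        super().__init__(acc_no, balance)
        self.rate_percent = rate_percent

    def add_interest(self):
        # Integer arithmetic keeps the example free of rounding issues.
        self.balance += self.balance * self.rate_percent // 100


class CheckingAccount(Account):
    """Inherits acc_no, balance, deposit; adds an overdraft limit."""

    def __init__(self, acc_no, balance=0, overdraft_limit=0):
        super().__init__(acc_no, balance)
        self.overdraft_limit = overdraft_limit

    def withdraw(self, amount):
        if self.balance - amount < -self.overdraft_limit:
            raise ValueError("overdraft limit exceeded")
        self.balance -= amount


savings = SavingsAccount(2, 1000, rate_percent=5)
savings.add_interest()   # defined in the sub-class
savings.deposit(50)      # inherited from Account

checking = CheckingAccount(3, 100, overdraft_limit=200)
checking.withdraw(250)   # allowed: stays within the overdraft limit
```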
Practical situation of OODBs
Commercial OO Database products:
Ontos, Gemstone, O2 (Ardent), Objectivity,
Objectstore (Excelon),
Versant, Poet, Jasmine (Fujitsu-GM)
Commercial success and penetration: < 1% of total market.
Possible Reasons:
OODBs were introduced in 1990s, by which time RDBs dominated most
markets. Switching costs too high. Operator efficiency cannot match RDB.
OODBs lack the simplicity and universality of SQL.
Oracle provides support for Object-Relational DB for special applications.
- Try to capture the best of RDB and OODB
Object-Relational DBs
Main features:
- User-Defined Types, Object IDs, Nested Tables
No standard implementation among different DB vendors.
Most common interface standard: SQL-99
User Defined Types (UDT):
CREATE TYPE <typename> AS ( attribute_1 data-type_1, … );
Subsequently, a table may be defined in terms of UDTs:
CREATE TABLE <table name> OF <typename>;
UDTs and nested tables allow the design of the DB to appear more like real-world
objects (internally, the DB, e.g. Oracle, may convert these into regular tables.)
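The remark that a typed table may be converted internally into a regular table can be mimicked with Python's sqlite3 module. SQLite has no CREATE TYPE, so a dataclass stands in for the UDT here; CustomerType, its attributes, and the helper functions are hypothetical, and this says nothing about how Oracle actually implements the mapping. It only shows the idea: one column per attribute of the type.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class CustomerType:
    """Stands in for: CREATE TYPE customer_type AS (cust_id ..., name ..., city ...)"""
    cust_id: int
    name: str
    city: str


SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


def create_table_of(conn, table, udt):
    """Stands in for CREATE TABLE <table> OF <type>: one column per attribute."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(udt))
    conn.execute(f"CREATE TABLE {table} ({cols})")


def insert_object(conn, table, obj):
    """Flatten the object into a plain row."""
    marks = ", ".join("?" for _ in fields(obj))
    conn.execute(f"INSERT INTO {table} VALUES ({marks})", astuple(obj))


def load_objects(conn, table, udt):
    """Turn plain rows back into typed objects."""
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    return [udt(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table_of(conn, "customers", CustomerType)
insert_object(conn, "customers", CustomerType(1, "Anton", "Kowloon"))
insert_object(conn, "customers", CustomerType(2, "Dave", "Central"))
customers = load_objects(conn, "customers", CustomerType)
```

The table and type names are interpolated directly into the SQL text, which is acceptable only because they are fixed in the example; the values themselves go through "?" placeholders.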
Spatial and Temporal Databases
The most recent advances in the data storage field are in the areas of
- Spatial Databases
- Temporal Databases
Spatial Databases
Motivating example 1: Google maps or GPS programs
- Storage: a ‘map’, possibly with different models, e.g. terrain, road,…
- Queries:
Find object of type x ‘near’ point p;
Find shortest route from point p to point q;
Is point p in zone (e.g. district, or country) z ?
Motivating example 2: DB models for medical applications: CT scans
- Storage: CT scan of a human brain
- Queries:
Find a path from point p to point q along artery A;
Find cell-cluster of type tumor_x;
Spatial databases..
Provide Spatial Data Types (SDT) in model and in query language
- Point, Line, Region
- Relationship(s) between them: point p is on line L
DBMS provides support for SDTs:
- Spatial indexing (for quickly locating, e.g., point in region)
- Spatial joins
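The relationships between spatial data types are ordinary geometric predicates. The sketch below assumes points are (x, y) tuples, a line is a single segment, and a region is a simple polygon given as a list of vertices; the function names are hypothetical. It uses the standard ray-casting test for "point in region" and a cross-product test for "point on line". A spatial index would be used to avoid running such tests against every stored object, which is not shown here.

```python
def point_on_segment(p, a, b):
    """True if point p lies on the segment from a to b (exact for integers)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if cross != 0:
        return False  # p is not on the infinite line through a and b
    return min(ax, bx) <= px <= max(ax, bx) and min(ay, by) <= py <= max(ay, by)


def point_in_region(p, polygon):
    """Ray casting: count polygon edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


# Hypothetical region: a 4 x 4 square.
zone = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

An odd number of crossings means the point is inside. Points exactly on the boundary are not handled specially in this sketch.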
Spatial databases..
Example 1. (fast response with spatial indexing):
Find all electronics factories in PRD area
SELECT fname FROM factories f WHERE f.location inside PRD.area
Example 2. Spatial join: a join that compares any two joined objects based on
a predicate on their spatial attribute values
For each highway passing through PRD, find all factories within 2 km of it.
SELECT h.highway, f.fname
FROM highways h, factories f
WHERE h.route intersects PRD.area
and distance( h.route, f.location) < 2 Km
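The distance predicate of the spatial join can be sketched as a nested loop over highways and factories. The assumptions are loud ones: coordinates are planar and measured in km, a route is a list of points joined by straight segments, the sample highways and factories are made up, and the "intersects PRD.area" filter is left out. A real spatial DBMS would use a spatial index instead of comparing every pair.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_route(p, route):
    """Distance from p to the nearest segment of a polyline route."""
    return min(point_segment_distance(p, a, b) for a, b in zip(route, route[1:]))


def spatial_join(highways, factories, max_km):
    """Nested-loop join on the predicate distance(route, location) < max_km."""
    return [
        (hname, fname)
        for hname, route in highways.items()
        for fname, location in factories.items()
        if distance_to_route(location, route) < max_km
    ]


# Hypothetical data, coordinates in km.
highways = {"H1": [(0, 0), (10, 0)], "H2": [(0, 10), (10, 10)]}
factories = {"F1": (5, 1), "F2": (5, 5), "F3": (2, 11.5)}

pairs = spatial_join(highways, factories, max_km=2)
```

With this data, F1 lies 1 km from H1 and F3 lies 1.5 km from H2, so those two pairs are returned; F2 is 5 km from both highways and is dropped.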
Temporal databases
Most DBs record data; if the data is available in the DB, then it is ‘true’;
if not, then it is ‘not true’.
Temporal DBs record not only data, but also store a validity
time window for all data. Thus, each data record has two time stamps:
- Transaction time (when the record was entered in the DB)
- Valid time (when the recorded fact holds in the real world)
Motivating example 1: Internet games, e.g. Second Life
- Storage: similar to RDB, but with additional valid time for each cell.
- Queries:
Was Anton in coffee_shop at same time as Dave?
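A query like this reduces to checking whether two valid-time intervals overlap. The sketch below assumes a hypothetical table of (avatar, place, valid_from, valid_to) rows with half-open intervals, so a stay that ends at 11:00 does not overlap one that starts at 11:00. The date, the sample rows, and the function names are all illustrative.

```python
from datetime import datetime


def at(hhmm):
    """Helper for the example: a time on one arbitrary, fixed day."""
    return datetime.strptime("2024-01-01 " + hhmm, "%Y-%m-%d %H:%M")


# Hypothetical valid-time table: (avatar, place, valid_from, valid_to)
visits = [
    ("Anton", "coffee_shop", at("10:00"), at("11:00")),
    ("Dave", "library", at("09:00"), at("10:00")),
    ("Dave", "coffee_shop", at("10:30"), at("12:00")),
    ("Eve", "coffee_shop", at("11:00"), at("11:30")),
]


def overlaps(from1, to1, from2, to2):
    """Half-open intervals [from, to) overlap if each starts before the other ends."""
    return from1 < to2 and from2 < to1


def were_together(rows, who1, who2, place):
    """Was who1 at `place` at the same time as who2?"""
    stays1 = [(f, t) for name, p, f, t in rows if name == who1 and p == place]
    stays2 = [(f, t) for name, p, f, t in rows if name == who2 and p == place]
    return any(overlaps(f1, t1, f2, t2) for f1, t1 in stays1 for f2, t2 in stays2)


anton_met_dave = were_together(visits, "Anton", "Dave", "coffee_shop")
anton_met_eve = were_together(visits, "Anton", "Eve", "coffee_shop")
```

Anton and Dave share the interval from 10:30 to 11:00, so the first answer is True; Eve arrives exactly when Anton leaves, so the second is False.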
Concluding remarks
Other than RDBMS, several other DB types have been used successfully.
Advantages of these types depend on the usage:
When data is handled by many geographically separated, localized operations,
it may be better to use Distributed DBs
When the application is space/geography related, instead of building
special APIs, Spatial DBs may be used.
When data validity and the time of events are important, Temporal DBs may
be useful (e.g. internet games, cyber-crime detection, …)
References and Further Reading
Silberschatz, Korth, Sudarshan, Database System Concepts, McGraw-Hill, Chaps. 16, 18, 21
Elmasri and Navathe, Fundamentals of Database Systems, Addison-Wesley, Chaps. 20, 24, 25, 27
Next: IS for non-structured data