Slides from Lecture 24 - Courses - University of California, Berkeley

Download Report

Transcript Slides from Lecture 24 - Courses - University of California, Berkeley

Object-Oriented, Intelligent and
Object-Relational Database Models
University of California, Berkeley
School of Information Management
and Systems
SIMS 257: Database Management
IS 257 – Spring 2004
2004.04.20 - SLIDE 1
Lecture Outline
• Review
– Applications for Data Warehouses
– Data Mining
• Thanks again to lecture notes from Joachim
Hammer of the University of Florida
•
•
•
•
Object Oriented DBMS
Inverted File and Flat File DBMS
Object-Relational DBMS (revisited)
Intelligent DBMS
IS 257 – Spring 2004
2004.04.20 - SLIDE 2
Lecture Outline
• Review
– Applications for Data Warehouses
– Data Mining
• Thanks again to lecture notes from Joachim
Hammer of the University of Florida
•
•
•
•
Object Oriented DBMS
Inverted File and Flat File DBMS
Object-Relational DBMS (revisited)
Intelligent DBMS
IS 257 – Spring 2004
2004.04.20 - SLIDE 3
What is Decision Support?
• Technology that will help managers and
planners make decisions regarding the
organization and its operations based on
data in the Data Warehouse.
– What was the last two years of sales volume
for each product by state and city?
– What effects will a 5% price discount have on
our future income for product X?
• Increasing common term is KDD
– Knowledge Discovery in Databases
IS 257 – Spring 2004
2004.04.20 - SLIDE 4
Conventional Query Tools
• Ad-hoc queries and reports using
conventional database tools
– E.g. Access queries.
• Typical database designs include fixed
sets of reports and queries to support
them
– The end-user is often not given the ability to
do ad-hoc queries
IS 257 – Spring 2004
2004.04.20 - SLIDE 5
OLAP
• Online Line Analytical Processing
– Intended to provide multidimensional views of
the data
– I.e., the “Data Cube”
– The PivotTables in MS Excel are examples of
OLAP tools
IS 257 – Spring 2004
2004.04.20 - SLIDE 6
Data Cube
IS 257 – Spring 2004
2004.04.20 - SLIDE 7
Operations on Data Cubes
• Slicing the cube
– Extracts a 2d table from the multidimensional
data cube
– Example…
• Drill-Down
– Analyzing a given set of data at a finer level of
detail
IS 257 – Spring 2004
2004.04.20 - SLIDE 8
Star Schema
• Typical design for the derived layer of a
Data Warehouse or Mart for Decision
Support
– Particularly suited to ad-hoc queries
– Dimensional data separate from fact or event
data
• Fact tables contain factual or quantitative
data about the business
• Dimension tables hold data about the
subjects of the business
• Typically there is one Fact table with
multiple dimension tables
IS 257 – Spring 2004
2004.04.20 - SLIDE 9
Star Schema for multidimensional data
Order
OrderNo
OrderDate
…
Customer
CustomerName
CustomerAddress
City
…
Salesperson
SalespersonID
SalespersonName
City
Quota
IS 257 – Spring 2004
Fact Table
OrderNo
Salespersonid
Customerno
ProdNo
Datekey
Cityname
Quantity
TotalPrice
Product
ProdNo
ProdName
Category
Description
…
City
CityName
State
Country
…
Date
DateKey
Day
Month
Year
…
2004.04.20 - SLIDE 10
Data Mining
• Data mining is knowledge discovery rather
than question answering
– May have no pre-formulated questions
– Derived from
• Traditional Statistics
• Artificial intelligence
• Computer graphics (visualization)
IS 257 – Spring 2004
2004.04.20 - SLIDE 11
Goals of Data Mining
• Explanatory
– Explain some observed event or situation
• Why have the sales of SUVs increased in California but not
in Oregon?
• Confirmatory
– To confirm a hypothesis
• Whether 2-income families are more likely to buy family
medical coverage
• Exploratory
– To analyze data for new or unexpected relationships
• What spending patterns seem to indicate credit card fraud?
IS 257 – Spring 2004
2004.04.20 - SLIDE 12
Data Mining Applications
•
•
•
•
•
•
Profiling Populations
Analysis of business trends
Target marketing
Usage Analysis
Campaign effectiveness
Product affinity
IS 257 – Spring 2004
2004.04.20 - SLIDE 13
Data Mining Algorithms
•
•
•
•
•
Market Basket Analysis
Memory-based reasoning
Cluster detection
Link analysis
Decision trees and rule induction
algorithms
• Neural Networks
• Genetic algorithms
IS 257 – Spring 2004
2004.04.20 - SLIDE 14
Market Basket Analysis
• A type of clustering used to predict
purchase patterns.
• Identify the products likely to be purchased
in conjunction with other products
– E.g., the famous (and apocryphal) story that
men who buy diapers on Friday nights also
buy beer.
IS 257 – Spring 2004
2004.04.20 - SLIDE 15
Memory-based reasoning
• Use known instances of a model to make
predictions about unknown instances.
• Could be used for sales forcasting or fraud
detection by working from known cases to
predict new cases
IS 257 – Spring 2004
2004.04.20 - SLIDE 16
Cluster detection
• Finds data records that are similar to each
other.
• K-nearest neighbors (where K represents
the mathematical distance to the nearest
similar record) is an example of one
clustering algorithm
IS 257 – Spring 2004
2004.04.20 - SLIDE 17
Link analysis
• Follows relationships between records to
discover patterns
• Link analysis can provide the basis for
various affinity marketing programs
• Similar to Markov transition analysis
methods where probabilities are calculated
for each observed transition.
IS 257 – Spring 2004
2004.04.20 - SLIDE 18
Decision trees and rule induction algorithms
• Pulls rules out of a mass of data using
classification and regression trees (CART)
or Chi-Square automatic interaction
detectors (CHAID)
• These algorithms produce explicit rules,
which make understanding the results
simpler
IS 257 – Spring 2004
2004.04.20 - SLIDE 19
Neural Networks
• Attempt to model neurons in the brain
• Learn from a training set and then can be
used to detect patterns inherent in that
training set
• Neural nets are effective when the data is
shapeless and lacking any apparent
patterns
• May be hard to understand results
IS 257 – Spring 2004
2004.04.20 - SLIDE 20
Genetic algorithms
• Imitate natural selection processes to
evolve models using
– Selection
– Crossover
– Mutation
• Each new generation inherits traits from
the previous ones until only the most
predictive survive.
IS 257 – Spring 2004
2004.04.20 - SLIDE 21
Lecture Outline
• Review
– Applications for Data Warehouses
– Data Mining
• Thanks again to lecture notes from Joachim
Hammer of the University of Florida
•
•
•
•
Object Oriented DBMS
Inverted File and Flat File DBMS
Object-Relational DBMS (revisited)
Intelligent DBMS
IS 257 – Spring 2004
2004.04.20 - SLIDE 22
Object-Oriented DBMS Basic Concepts
• Each real-world entity is modeled by an
object. Each object is associated with a
unique identifier (sometimes call the object
ID or OID)
IS 257 – Spring 2004
2004.04.20 - SLIDE 23
Object-Oriented DBMS Basic Concepts
• Each object has a set of instance
attributes (or instance variables) and
methods.
– The value of an attribute can be an object or
set of objects. Thus complex object can be
constructed from aggregations of other
objects.
– The set of attributes of the object and the set
of methods represent the object structure and
behavior, respectively
IS 257 – Spring 2004
2004.04.20 - SLIDE 24
Object-Oriented DBMS Basic Concepts
• The attribute values of an object represent
the object’s status.
– Status is accessed or modified by sending
messages to the object to invoke the
corresponding methods
IS 257 – Spring 2004
2004.04.20 - SLIDE 25
Object-Oriented DBMS Basic Concepts
• Objects sharing the same structure and
behavior are grouped into classes.
– A class represents a template for a set of
similar objects.
– Each object is an instance of some class.
IS 257 – Spring 2004
2004.04.20 - SLIDE 26
Object-Oriented DBMS Basic Concepts
• A class can be defined as a specialization
of of one or more classes.
– A class defined as a specialization is called a
subclass and inherits attributes and methods
from its superclass(es).
IS 257 – Spring 2004
2004.04.20 - SLIDE 27
Object-Oriented DBMS Basic Concepts
• An OODBMS is a DBMS that directly
supports a model based on the objectoriented paradigm.
– Like any DBMS it must provide persistent
storage for objects and their descriptions
(schema).
– The system must also provide a language for
schema definition and and for manipulation of
objects and their schema
– It will usually include a query language,
indexing capabilities, etc.
IS 257 – Spring 2004
2004.04.20 - SLIDE 28
Generalization Hierarchy
employee
Employee No
Name
Address
Date hired
Date of Birth
calculateAge
Hourly
Salaried
consultant
Hourly Rate
Annual Salary
Stock Option
Contract No.
Date Hired
calculateWage
calculateStockBenefit
AllocateToContract
IS 257 – Spring 2004
2004.04.20 - SLIDE 29
OODBMS
• Many available commercially:
– Gemstone, Polyhedra, Objectivity/DB,
MetaKit, ObjectDB, etc.
• Many Open Source:
– SHORE, GOODS (Generic Object Oriented
Database System), The Zope Object
DataBase (ZODB), Ozone, etc.
• If interested in finding more about oodbms
– See http://cbbrowne.com/info/oodbms.html
IS 257 – Spring 2004
2004.04.20 - SLIDE 30
Example: Ozone
• Version 1 of the MMM datastore used for
the phone project in 202 this year was
based on Ozone.
• “The Ozone Database Project is a open
initiative for the creation of an open
source, Java based, object-oriented
database management system.”
• Definitely a “work in progress”
IS 257 – Spring 2004
2004.04.20 - SLIDE 31
Example: Ozone
• “ozone is a fully featured, object-oriented database management
system completely implemented in Java and distributed under an
open source license. The ozone project aims to evolve a database
system that allows developers to build pure object-oriented, pure
Java database applications. Just program your Java objects and let
them run in a transactional database environment.”
• “ozone includes a fully W3C compliant DOM implementation that
allows you to store XML data. You can use any XML tool to provide
and access these data. Support classes for Apache Xerces-J and
Xalan-J are included.”
• “Besides the native API, ozone provides a ODMG 3.0 interface.
Although not fully ODMG compliant it helps you to port applications
to/from ozone.”
• “ozone does not depend on any back-end database or mapping
technology to actually save objects. It contains its own clustered
storage and cache system to handle persistent Java objects. “
• From http://www.ozone-db.org/frames/home/what.html
IS 257 – Spring 2004
2004.04.20 - SLIDE 32
Example: Ozone
•
•
•
•
Database objects are the persistent objects designed by developers to fullfill their
application logic needs. Database objects implement a given interface (in more
concrete terms, a Java interface that extends org.ozoneDB.OzoneRemote), and this
interface is the "visible" side of database objects. There is only one instance of a
database object, which lives inside the database server. This database object is
controlled via proxy objects.
A given proxy object represents its corresponding database object - inside the client
applications and inside other database objects. A proxy object can be seen as a
persistent reference. Proxy classes are automatically generated out of the database
classes by the Ozone post-processor and implement the same public interface as
their respective database object counterpart - which means that they also implement
the OzoneRemote interface that their corresponding database object implements.
All ozone API methods return proxies for the actual database object inside the
database. Therefore, the client deals with proxies only. However, this is transparent to
the client: proxies can be used as if they were the actual database objects, since they
implement the same interface.
Database objects are different from ordinary Java objects (other systems and specs,
like JDO, respectively call them "primary" and "secondary", or "first-class" and
"second-class"). Only one instance of a given database object reference exists in the
database, as opposed to standard Java objects, which are treated in a "by-copy"
fashion each time they are serialized. By analogy, database objects are a bit like rows
in a relational database table, and members of these database objects that are
standard Java objects correspond to the columns in the row - database object
members would correspond to links to other tables, if we push the analogy.
•
From: http://sourceforge.net/docman/display_doc.php?docid=10743&group_id=39695
IS 257 – Spring 2004
2004.04.20 - SLIDE 33
Example: Ozone
Ozone Architecture
From: http://sourceforge.net/docman/display_doc.php?docid=10743&group_id=39695
IS 257 – Spring 2004
2004.04.20 - SLIDE 34
Example: Ozone
•
•
The Ozone architecture, very generally represented by the preceding
diagram, has four main layers:
Client
– This is the client application area; the client obtains a connection to an Ozone
server, connection that can be shared by many threads. The client application
interacts with the database API to create, delete, update and search persistent
objects in the underlying Ozone storage
•
Network
– The network layer is where the Ozone protocol plays a role similar to RMI. It
carries method invocation information targeted at persistent objects, in addition
to all other commands relayed to the Ozone server.
•
Server
– The server manages client connections, security, transactions, and incoming
method invocations from the clients. If required, it is in charge of invoking
methods on persistent objects, therefore tightly interacting with the underlying
object storage facility. The server maintains a transactionally safe environment
for multiple clients that access persistent objects through a remote proxy.
•
Storage
– The storage system is always accessed through an Ozone server. The storage is
responsible for object persistence, clustering, object identification, and other task
pertaining to low-level database-like operations.
• From: http://sourceforge.net/docman/display_doc.php?docid=10743&group_id=39695
IS 257 – Spring 2004
2004.04.20 - SLIDE 35
Lecture Outline
• Review
– Applications for Data Warehouses
– Data Mining
• Thanks again to lecture notes from Joachim
Hammer of the University of Florida
•
•
•
•
Object Oriented DBMS
Inverted File and Flat File DBMS
Object-Relational DBMS (revisited)
Intelligent DBMS
IS 257 – Spring 2004
2004.04.20 - SLIDE 36
Inverted File DBMS
• Usually similar to Hierarchic DBMS in
record structure
– Support for repeating groups of fields and
multiple value fields
• All access is via inverted file indexes to
DBS specified fields.
• Examples: ADABAS DBMS from Software
AG -- used in the MELVYL system
IS 257 – Spring 2004
2004.04.20 - SLIDE 37
Flat File DBMS
• Data is stored as a simple file of records.
– Records usually have a simple structure
• May support indexing of fields in the
records.
– May also support scanning of the data
• No mechanisms for relating data between
files.
• Usually easy to use and simple to set up
IS 257 – Spring 2004
2004.04.20 - SLIDE 38
Lecture Outline
• Review
– Applications for Data Warehouses
– Data Mining
• Thanks again to lecture notes from Joachim
Hammer of the University of Florida
•
•
•
•
Object Oriented DBMS
Inverted File and Flat File DBMS
Object-Relational DBMS (revisited)
Intelligent DBMS
IS 257 – Spring 2004
2004.04.20 - SLIDE 39
Object Relational Databases
• Began with UniSQL/X unified objectoriented and relational system
• Some systems (like OpenODB from HP)
were Object systems built on top of
Relational databases.
• Miro/Montage/Illustra built on Postgres.
• Informix Buys Illustra. (DataBlades)
• Oracle Hires away Informix Programmers.
(Cartridges)
IS 257 – Spring 2004
2004.04.20 - SLIDE 40
PostgreSQL
• Derived from POSTGRES
– Developed at Berkeley by Mike Stonebraker
and his students (EECS) starting in 1986
• Postgres95
– Andrew Yu and Jolly Chen adapted
POSTGRES to SQL and greatly improved the
code base
• PostgreSQL
– Name changed in 1996, and since that time
the system has been expanded to support
most SQL92 and many SQL99 features
IS 257 – Spring 2004
2004.04.20 - SLIDE 41
Object Relational Data Model
• Class, instance, attribute, method, and
integrity constraints
• OID per instance
• Encapsulation
• Multiple inheritance hierarchy of classes
• Class references via OID object
references
• Set-Valued attributes
• Abstract Data Types
IS 257 – Spring 2004
2004.04.20 - SLIDE 42
PostgreSQL Classes
• The fundamental notion in Postgres is that of a
class, which is a named collection of object
instances. Each instance has the same
collection of named attributes, and each attribute
is of a specific type. Furthermore, each instance
has a permanent object identifier (OID) that is
unique throughout the installation. Because SQL
syntax refers to tables, we will use the terms
table and class interchangeably. Likewise, an
SQL row is an instance and SQL columns are
attributes.
IS 257 – Spring 2004
2004.04.20 - SLIDE 43
Creating a Class
• You can create a new class by specifying the
class name, along with all attribute names and
their types:
CREATE TABLE weather (
city
varchar(80),
temp_lo
int,
-- low temperature
temp_hi
int,
-- high temperature
prcp
real,
-- precipitation
date
date
);
IS 257 – Spring 2004
2004.04.20 - SLIDE 44
PostgreSQL
• Postgres can be customized with an arbitrary
number of user-defined data types.
Consequently, type names are not syntactical
keywords, except where required to support
special cases in the SQL92 standard.
• So far, the Postgres CREATE command looks
exactly like the command used to create a table
in a traditional relational system. However, we
will presently see that classes have properties
that are extensions of the relational model.
IS 257 – Spring 2004
2004.04.20 - SLIDE 45
PostgreSQL
• All of the usual SQL commands for
creation, searching and modifying classes
(tables) are available. With some
additions…
• Inheritance
• Non-Atomic Values
• User defined functions and operators
IS 257 – Spring 2004
2004.04.20 - SLIDE 46
Inheritance
CREATE TABLE cities (
name
text,
population
float,
altitude
int -- (in ft)
);
CREATE TABLE capitals (
state
char(2)
) INHERITS (cities);
IS 257 – Spring 2004
2004.04.20 - SLIDE 47
Inheritance
• In Postgres, a class can inherit from zero
or more other classes.
• A query can reference either
– all instances of a class
– or all instances of a class plus all of its
descendants
IS 257 – Spring 2004
2004.04.20 - SLIDE 48
Inheritance
• For example, the following query finds all the
cities that are situated at an attitude of 500ft or
higher:
SELECT name, altitude
FROM cities
WHERE altitude > 500;
+----------+----------+
|name
| altitude |
+----------+----------+
|Las Vegas | 2174 |
+----------+----------+
|Mariposa | 1953 |
+----------+----------+
IS 257 – Spring 2004
2004.04.20 - SLIDE 49
Inheritance
• On the other hand, to find the names of all cities,
including state capitals, that are located at an
altitude over 500ft, the query is:
SELECT c.name, c.altitude
FROM cities* c
WHERE c.altitude > 500;
which returns:
+----------+----------+
|name
| altitude |
+----------+----------+
|Las Vegas | 2174 |
+----------+----------+
|Mariposa | 1953 |
+----------+----------+
|Madison | 845
|
+----------+----------+
IS 257 – Spring 2004
2004.04.20 - SLIDE 50
Inheritance
• The "*" after cities in the preceding query
indicates that the query should be run over
cities and all classes below cities in the
inheritance hierarchy
• Many of the PostgreSQL commands
(SELECT, UPDATE and DELETE, etc.)
support this inheritance notation using "*"
IS 257 – Spring 2004
2004.04.20 - SLIDE 51
Non-Atomic Values
• One of the tenets of the relational model is
that the attributes of a relation are atomic
– I.e. only a single value for a given row and
column
• Postgres does not have this restriction:
attributes can themselves contain subvalues that can be accessed from the
query language
– Examples include arrays and other complex
data types.
IS 257 – Spring 2004
2004.04.20 - SLIDE 52
Non-Atomic Values - Arrays
• Postgres allows attributes of an instance to be
defined as fixed-length or variable-length multidimensional arrays. Arrays of any base type or
user-defined type can be created. To illustrate
their use, we first create a class with arrays of
base types.
CREATE TABLE SAL_EMP (
name
text,
pay_by_quarter int4[],
schedule
text[][]
);
IS 257 – Spring 2004
2004.04.20 - SLIDE 53
Non-Atomic Values - Arrays
• The preceding SQL command will create a class
named SAL_EMP with a text string (name), a
one-dimensional array of int4 (pay_by_quarter),
which represents the employee's salary by
quarter and a two-dimensional array of text
(schedule), which represents the employee's
weekly schedule
• Now we do some INSERTSs; note that when
appending to an array, we enclose the values
within braces and separate them by commas.
IS 257 – Spring 2004
2004.04.20 - SLIDE 54
Inserting into Arrays
INSERT INTO SAL_EMP
VALUES ('Bill',
'{10000, 10000, 10000, 10000}',
'{{"meeting", "lunch"}, {}}');
INSERT INTO SAL_EMP
VALUES ('Carol',
'{20000, 25000, 25000, 25000}',
'{{"talk", "consult"}, {"meeting"}}');
IS 257 – Spring 2004
2004.04.20 - SLIDE 55
Querying Arrays
• This query retrieves the names of the employees
whose pay changed in the second quarter:
SELECT name
FROM SAL_EMP
WHERE SAL_EMP.pay_by_quarter[1] <>
SAL_EMP.pay_by_quarter[2];
+------+
|name |
+------+
|Carol |
+------+
IS 257 – Spring 2004
2004.04.20 - SLIDE 56
Querying Arrays
• This query retrieves the third quarter pay of all
employees:
SELECT SAL_EMP.pay_by_quarter[3] FROM
SAL_EMP;
+---------------+
|pay_by_quarter |
+---------------+
|10000
|
+---------------+
|25000
|
+---------------+
IS 257 – Spring 2004
2004.04.20 - SLIDE 57
Querying Arrays
• We can also access arbitrary slices of an array,
or subarrays. This query retrieves the first item
on Bill's schedule for the first two days of the
week.
SELECT SAL_EMP.schedule[1:2][1:1]
FROM SAL_EMP
WHERE SAL_EMP.name = 'Bill';
+-------------------+
|schedule
|
+-------------------+
|{{"meeting"},{""}} |
+-------------------+
IS 257 – Spring 2004
2004.04.20 - SLIDE 58
Lecture Outline
• Review
– Applications for Data Warehouses
– Data Mining
• Thanks again to lecture notes from Joachim
Hammer of the University of Florida
•
•
•
•
Object Oriented DBMS
Inverted File and Flat File DBMS
Object-Relational DBMS (revisited)
Intelligent DBMS
IS 257 – Spring 2004
2004.04.20 - SLIDE 59
Intelligent Database Systems
• Intelligent DBS are intended to handle
more than just data, and may be used in
tasks involving large amounts of
information where analysis and “discovery”
are needed.
The following is based on “Intelligent Databases” by Kamran Parsaye,
Mark Chignell, Setrag Khoshafian and Harry Wong
AI Expert, March 1990, v. 5 no. 3. Pp 38-47
IS 257 – Spring 2004
2004.04.20 - SLIDE 60
Intelligent Database Systems
• They represent the evolution and merging
of several technologies:
– Automatic Information Discovery
– Hypermedia
– Object Orientation
– Expert Systems
– Conventional DBMS
IS 257 – Spring 2004
2004.04.20 - SLIDE 61
Intelligent Database Systems
Automatic
discovery
Expert
Systems
Intelligent
Databases
Hypermedia
Traditional
Databases
IS 257 – Spring 2004
Object
Orientation
2004.04.20 - SLIDE 62
Intelligent Database Architecture
High-Level
Tools
High-Level
User Interface
Intelligent Database
Engine
IS 257 – Spring 2004
2004.04.20 - SLIDE 63
Environment Components
Flexible queries
Data
Dictionary
Error detection
Automatic Discovery
Concept
Dictionary
IS 257 – Spring 2004
2004.04.20 - SLIDE 64
Intelligent Databases
• Data Dictionary contains the system
metadata
• Concept Dictionary defines ‘virtual fields’
based on approximate definitions
• Data Analysis and discovery
– Find patterns
– detect errors
– Process queries
IS 257 – Spring 2004
2004.04.20 - SLIDE 65
Intelligent Databases
• Automatic Discovery
– Data comprehension
– Form Hypotheses
– Make queries
– View results and perhaps modify hypotheses
– Repeat
IS 257 – Spring 2004
2004.04.20 - SLIDE 66
Intelligent Databases
• Automatic Error Detection
– Integrity Constraints
– Rule systems
– Analysis of data for anomalies
IS 257 – Spring 2004
2004.04.20 - SLIDE 67
Intelligent Databases
• Flexible Query Processing
– Approximate and “fuzzy” queries
• SELECT NAME, AGE, TELEPHONE FROM
PERSONEL WHERE NAME = ‘Dovid Smith’ and
AGE IS-CLOSE-TO 19;
• confidence factors
– Ranked query results
IS 257 – Spring 2004
2004.04.20 - SLIDE 68
Intelligent Databases
• Intelligent User Interfaces
– Hyperlinked data in the data/knowledge base
– Multimedia presentations
– Dynamic linking of related information
IS 257 – Spring 2004
2004.04.20 - SLIDE 69
Intelligent Databases
• Intelligent Database Engine
–
–
–
–
–
–
–
–
–
OO support
Inference features
Global optimization
Rule manager
Explanation manager
Transaction manager
Metadata manager
Access module
Multimedia manager
IS 257 – Spring 2004
2004.04.20 - SLIDE 70