46-942 - Andrew.cmu.edu
Information Resources
Management
April 24, 2001
Agenda
Administrivia
Object-Oriented & Databases
Data Warehousing
Data Mining
SQL Extensions
XML
Administrivia
Homework #8
Homework #9
Current Scores
Final Review Session?
OODBMS vs. ORDBMS
OODBMS - Object-Oriented
ORDBMS - Object-Relational
Appendix A
OODBMS
Persistent Objects
By class
By creation
By marking
By reference
Storage/Retrieval Methods
OODBMS - Benefits
Match Programming Methodology
Data types & structures
Ease of programming
Inheritance
OODBMS - Challenges
Standards
ODMG - Object Database Management Group
Performance
Database vs. persistent language
Loss of integrity, queries
Storage Space
Maturity
ORDBMS
Extensions to relational model
Complex data types
Inheritance
References
Migration path
Use existing applications and
knowledge base
ORDBMS - Benefits
SQL
Existing Systems
Vendors
ORDBMS - Challenges
Standards
“Fit” with the development language
Programming Complexity
Using a relational database to store data from an object-oriented system has been likened to parking your car in your garage. With an OODBMS you park the car in the garage. If an (O)RDBMS is used, to park your car in the garage, you must first completely disassemble it and put each part in its specific location on a shelf. This process must then be reversed the next time you want to go for a drive.
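The "disassemble and reassemble" step can be sketched in a few lines. This is only an illustration: the Car and Wheel classes, the table layout, and the function names are assumptions made for the example, and SQLite stands in for any relational DBMS.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Wheel:
    position: str


@dataclass
class Car:
    vin: str
    model: str
    wheels: list = field(default_factory=list)


def park_relational(conn, car):
    """Disassemble the car: one row for the car, one row per wheel."""
    conn.execute("INSERT INTO car (vin, model) VALUES (?, ?)", (car.vin, car.model))
    conn.executemany(
        "INSERT INTO wheel (vin, position) VALUES (?, ?)",
        [(car.vin, wheel.position) for wheel in car.wheels],
    )


def drive_relational(conn, vin):
    """Reassemble the car from its rows before it can be used again."""
    model = conn.execute("SELECT model FROM car WHERE vin = ?", (vin,)).fetchone()[0]
    rows = conn.execute(
        "SELECT position FROM wheel WHERE vin = ? ORDER BY rowid", (vin,)
    ).fetchall()
    return Car(vin, model, [Wheel(position) for (position,) in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (vin TEXT PRIMARY KEY, model TEXT)")
conn.execute("CREATE TABLE wheel (vin TEXT REFERENCES car(vin), position TEXT)")

car = Car("VIN123", "Sedan", [Wheel("front-left"), Wheel("front-right")])
park_relational(conn, car)
rebuilt = drive_relational(conn, "VIN123")
```

The two mapping functions are the code an OODBMS aims to make unnecessary, since it would store and return the Car object as a unit.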
OODBMS/ORDBMS Products
Vendor - Product
Computer Associates (www.cai.com/products/jasmine) - Jasmine
Franz (www.franz.com) - AllegroStore
Fujitsu Software (www.fsc.fujitsu.com) - Jasmine
Gemstone Systems (www.gemstone.com) - GemStone/S
Matisse Software (www.matisse.com) - ADB
O2 Technology (www.o2tech.com) - O2
Object Design (www.odi.com) - ObjectStore
OODBMS/ORDBMS Products
Vendor - Product
Objectivity (www.objectivity.com) - Objectivity/DB
Object Systems (www.iprolink.ch/ibex.com) - ITASCA
Ontos (www.ontos.com) - Ontos Integrator
Persistence (www.persistence.com) - Persistence Live Object Server
Poet Software (www.poet.com) - Poet Object Server
Unisys (www.osmos.com) - Osmos
Versant (www.versant.com) - Versant ODBMS
Other Links
Object Database Management Group
www.odmg.org
Object Database Newsgroup
comp.databases.object
Data Mining
Corporations have colossal amounts of data
Usually only used for very specific purposes
(operations)
Automated attempt to learn from the data
Find statistical rules and patterns in the data
Example: Giant Eagle Advantage Card
Goals of Data Mining
Explanatory - Why?
Confirmatory - Is it?
Exploratory - ???
Approaches to Data Mining
Classification - identify rules that create groups
Association - find related conditions or events
Correlation - relationships between values
User Guided - hypothesis driven
Automatic - data driven, AI based
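As an illustration of the association approach, the sketch below computes support and confidence for a rule of the form "customers who buy X also buy Y". The baskets are made-up data of the kind a loyalty card might record, and the function names are assumptions for this example.

```python
# Hypothetical shopping baskets, one set of items per checkout.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
```

A rule is usually reported only when both numbers exceed thresholds chosen by the analyst, which is one way to limit spurious results.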
Data Warehouse
A subject-oriented, integrated, time-variant, nonvolatile collection of data
Usually all data for a corporation
Multidimensional database
Data Warehousing
Single location
Long-term storage
Greater availability
Separate “data” processing from day-to-day operations (performance)
All data is historical
Support data mining, et al.
Data Warehousing Questions
What data needs to be kept?
Where is it from?
How good is it?
How long should it be kept?
Can it be summarized? When?
Will it make sense? What is the
schema?
When is it updated?
Data Warehousing - Benefits
Support for decision making tools
DSS, EIS, Data Mining
Separation of information and day-to-day processing
Unification - Centralization
Improved quality and consistency
Data Warehousing Challenges
Costs: Storage, Setup, Maintenance
Historical data issues
Defining the warehouse schema
Doing the conversion
Implementation & every time
Keeping up with operational system
changes
Answering the questions
Multidimensional Databases
Two views
Multidimensional tables
Star schema
Multidimensional table
each cell is attribute
dimensions are “interesting”
categories
Multidimensional Table
Cell - sales
Dimensions
day
person
store
item
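One way to picture the multidimensional table is as a mapping from a (day, person, store, item) coordinate to a sales value. The sketch below uses a plain dictionary; the data and the helper name total are assumptions for illustration only.

```python
DIMENSIONS = ("day", "person", "store", "item")

# Each key is one cell's coordinates; each value is the sales figure in that cell.
cells = {
    ("2001-04-23", "Pat", "Store 1", "milk"): 2.5,
    ("2001-04-23", "Lee", "Store 1", "bread"): 1.5,
    ("2001-04-24", "Pat", "Store 2", "milk"): 3.0,
}


def total(cells, **fixed):
    """Sum the sales cells whose coordinates match every fixed dimension."""
    result = 0.0
    for coords, sales in cells.items():
        point = dict(zip(DIMENSIONS, coords))
        if all(point[dim] == value for dim, value in fixed.items()):
            result += sales
    return result


store_1_sales = total(cells, store="Store 1")
pat_milk_sales = total(cells, person="Pat", item="milk")
```

Fixing one or more dimensions and summing over the rest is the basic "slice" operation on such a table.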
Star Schema
Multiple tables
Central table - data item (cell)
Surrounding tables - information
about each category (dimensions)
Star Schema
[Diagram: central Sales table linked to the Person, Day, Item, and Store tables]
Star Schema
Sales (Day, Person, Store, Item, sales)
Day (Day, day info)
Person (Person, person info)
Store (Store, store info)
Item (Item, item info)
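The schema above might be declared and queried as follows. This is a sketch using SQLite through Python; the column names (day_id, city, and so on) and the sample rows are assumptions, since the slides list only "info" for each dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day    (day_id INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store  (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE item   (item_id INTEGER PRIMARY KEY, description TEXT);
-- Central table: one row per cell, keyed by the four dimensions.
CREATE TABLE sales (
    day_id    INTEGER REFERENCES day(day_id),
    person_id INTEGER REFERENCES person(person_id),
    store_id  INTEGER REFERENCES store(store_id),
    item_id   INTEGER REFERENCES item(item_id),
    sales     REAL,
    PRIMARY KEY (day_id, person_id, store_id, item_id)
);
""")

conn.executemany("INSERT INTO day VALUES (?, ?)", [(1, "2001-04-23"), (2, "2001-04-24")])
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Pat"), (2, "Lee")])
conn.executemany("INSERT INTO store VALUES (?, ?)", [(1, "Pittsburgh"), (2, "Cleveland")])
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "milk"), (2, "bread")])
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 10.0), (1, 1, 1, 2, 5.0), (2, 2, 2, 1, 7.5)],
)

# Join the central table to one dimension table and summarize by its attribute.
totals_by_city = conn.execute("""
    SELECT s.city, SUM(f.sales)
    FROM sales AS f
    JOIN store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```

Most warehouse queries follow this shape: join the central table to whichever dimension tables supply the grouping attributes, then aggregate the cell values.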
Building/Maintaining a Data Warehouse
1. Capture
2. Scrub
3. Transform
4. Load and Index
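The four steps can be sketched as a small pipeline. Everything here is a hypothetical illustration: the sample rows, the cleaning rules, and the sales_fact table are assumptions, and a real warehouse load would read from the operational systems rather than a hard-coded list.

```python
import sqlite3


def capture():
    """Capture: pull raw rows from an operational source (hard-coded here)."""
    return [
        {"store": " pittsburgh ", "item": "Milk", "amount": "2.50"},
        {"store": "Pittsburgh", "item": "Bread", "amount": "n/a"},
        {"store": "CLEVELAND", "item": "Milk", "amount": "3.00"},
    ]


def scrub(rows):
    """Scrub: drop rows with unusable amounts and normalize names."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # discard rows whose amount cannot be parsed
        clean.append({
            "store": row["store"].strip().title(),
            "item": row["item"].strip().title(),
            "amount": amount,
        })
    return clean


def transform(rows):
    """Transform: summarize to one total per (store, item)."""
    totals = {}
    for row in rows:
        key = (row["store"], row["item"])
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return [(store, item, total) for (store, item), total in sorted(totals.items())]


def load_and_index(conn, facts):
    """Load and Index: insert the summarized rows and index the lookup column."""
    conn.execute("CREATE TABLE sales_fact (store TEXT, item TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", facts)
    conn.execute("CREATE INDEX idx_sales_fact_store ON sales_fact (store)")


conn = sqlite3.connect(":memory:")
facts = transform(scrub(capture()))
load_and_index(conn, facts)
```

Keeping each step as its own function makes it easier to rerun the conversion whenever the operational systems change.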
Data Marts
Making specific data available
Different ones for different needs
[Diagram: Operational Systems feed the DW, which feeds data marts DM1 and DM2]
Data Mining - Benefits
Use data
Learn new things
Improve decision making
Data Mining - Challenges
Time (human and/or computer)
Spurious results
Separating the wheat from the chaff
Availability of data
Amount of data
Changes in tools and technologies
Validity over time
Enhanced Data Analysis
Beyond SUM, COUNT, and AVG
SQL extensions (suggested)
GROUP BY … AS PERCENTILE
Specific percentiles
GROUP BY … WITH CUBE
Cross-tabulations
Statistical package interface
SAS, S-PLUS, others
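To show what a cube cross-tabulation produces, the sketch below computes in Python the totals for every combination of grouped and "ALL" dimensions. The sample rows and the function name cube are assumptions for illustration; this is not the syntax of any particular DBMS.

```python
from itertools import combinations

# Hypothetical sales rows.
rows = [
    {"store": "A", "item": "milk", "sales": 10},
    {"store": "A", "item": "bread", "sales": 5},
    {"store": "B", "item": "milk", "sales": 7},
]


def cube(rows, dims, measure):
    """Total the measure for every subset of dims; dropped dims show as 'ALL'."""
    totals = {}
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else "ALL" for d in dims)
                totals[key] = totals.get(key, 0) + row[measure]
    return totals


crosstab = cube(rows, ("store", "item"), "sales")
```

With two dimensions the result holds the detail cells, the row totals, the column totals, and the grand total, which is the content of a cross-tabulation.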
Enhanced Data Analysis Benefits
Greater functionality
Improved decision making
Enhanced Data Analysis Challenges
Lack of standards
Understandability
Processing requirements
Cost of poorly written queries
“ad hoc” queries aren’t reviewed
Extending Relational DBs
Spatial and Geographic Databases
Multimedia Databases
Changing the data stored while
retaining the benefits of relational
databases
Spatial & Geographic DBs
Spatial - CAD
Geographic - GIS
Similar issue
How to store and retrieve such data
Spatial Databases
Geometric objects (2 or 3 dimensions)
Locations
Connections
Nonspatial information about each
object
Substructures
Spatial integrity constraints
Two things can’t occupy the same
space
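A spatial integrity constraint of this kind can be sketched for the simple case of axis-aligned rectangles in two dimensions. The Rect and Layout classes are hypothetical, and a real spatial database would use an index rather than checking every stored object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a, b):
    """True when the interiors of two rectangles intersect (shared edges are allowed)."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


class Layout:
    """A collection of rectangles that rejects any insert violating the constraint."""

    def __init__(self):
        self.objects = []

    def insert(self, rect):
        for existing in self.objects:
            if overlaps(existing, rect):
                raise ValueError(f"{rect.name} overlaps {existing.name}")
        self.objects.append(rect)


layout = Layout()
layout.insert(Rect("desk", 0, 0, 2, 1))
layout.insert(Rect("chair", 2, 0, 3, 1))  # touches the desk along one edge: accepted

try:
    layout.insert(Rect("shelf", 1, 0.5, 4, 2))  # overlaps the desk: rejected
    shelf_rejected = False
except ValueError:
    shelf_rejected = True
```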
GIS Databases
Raster Data (bitmap or pixel data)
Pictures - possibly over time
Maps
Vector Data
Locations
Connections
Nongeographic information
Spatial & Geographic DB Benefits
DBMS
Specialized queries
Spatial & Geographic Data
“Standard” Data
Mix of the two
Integrity constraints
Spatial & Geographic DB Challenges
Space requirements
Level of detail
Understandability - Complexity
Processing requirements
Compatibility between systems
Lack of standards
Multimedia Databases
Images, Audio, Video
Nonmultimedia data (text) about each
Database Enhancements
BLOBs (Binary Large Objects)
Similarity-based queries
Guaranteed steady rate
Synchronization of audio and video
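Storing a BLOB next to descriptive text can be sketched with SQLite. The media table, its columns, and the byte payload are assumptions for illustration; the payload is a stand-in for a real image or audio file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media ("
    "id INTEGER PRIMARY KEY, title TEXT, mime_type TEXT, content BLOB)"
)

# Stand-in bytes; a real application would read these from a file.
payload = bytes(range(256))

conn.execute(
    "INSERT INTO media (title, mime_type, content) VALUES (?, ?, ?)",
    ("Scanned invoice", "image/png", payload),
)

# Query on the text attributes, then retrieve the binary content.
title, content = conn.execute(
    "SELECT title, content FROM media WHERE mime_type = ?", ("image/png",)
).fetchone()
```

Note that the query selects on the nonmultimedia columns; similarity-based queries over the content itself need support beyond plain BLOB storage.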
Multimedia Databases Benefits
DBMS
Greater compression may be possible
“Paperless” office - document imaging
Workflow redesign - improvements
Greater availability
Multimedia Databases Challenges
STORAGE
Specialized DBMS
Unity of database and network
Usually requires ATM (Asynchronous Transfer Mode) networking
Specialized hardware
“juke boxes”
optical disks
XML
What is it?
What isn’t it?
What are the goals?
Who controls it?
Who’s using it?
Beyond XML
What is XML?
eXtensible Markup Language
Markup language for “structured
information”
“structured” - content & role of that
content
markup - identify structures
“meta language for describing markup
languages”
Huh?
Storing structured data in a text file
spreadsheet, address book, transactions
(think EDI)
Looks like HTML, <tags>, but isn’t
Text is universal, but not efficient
Does disk space matter?
What about network capacity?
XML is license-free & platform-independent
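A small example of structured data stored as text, read with Python's standard xml.etree.ElementTree module. The address book format, tag names, and contacts are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical address book: the tags identify the role of each piece of content.
document = """<addressbook>
  <contact id="1">
    <name>Ada Example</name>
    <email>ada@example.com</email>
  </contact>
  <contact id="2">
    <name>Bob Sample</name>
    <email>bob@example.com</email>
  </contact>
</addressbook>"""

root = ET.fromstring(document)

# Turn each <contact> element into a plain record.
contacts = [
    {
        "id": contact.get("id"),
        "name": contact.findtext("name"),
        "email": contact.findtext("email"),
    }
    for contact in root.findall("contact")
]
```

The tags here are not predefined the way HTML tags are; the document's author chose them to fit the data.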
What XML isn’t
HTML
SGML - Standard Generalized Markup
Language - printing
Limited to current definitions (tags)
XML is the way to add new definitions
A relational database management
system
A database, or is it?
Goals of XML
Easy to use over Internet
Wide variety of applications
Compatible with SGML (subset)
Easy to write programs that use XML
documents
No (or few) optional features
Human-legible if necessary
Goals of XML (2)
Standards developed quickly
Formal and concise
Easy to create documents
No need for “shortcuts”
Who Controls XML?
World Wide Web Consortium (W3C)
www.w3.org/XML
XML 1.0 specification
Who’s Using XML?
Financial Products Markup Language
FpML
FpML.org
“A standard for financial derivatives
business-to-business e-Commerce”
Others?
Beyond XML
XLink - hyperlinks in XML
XPointer & XFragments - point to parts
of an XML document
CSS - style sheet language for
XML and HTML
XSL - advanced language for style
sheets
XSLT - XSL transformation language
Beyond XML (2)
DOM - standard function calls for
manipulating XML (and HTML) from
programs
XML Namespaces - link a URL with
every tag and attribute
XML Schemas 1 & 2 - help in precisely
developing own XML-based formats
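The DOM and namespace ideas can be seen together in a short sketch using Python's standard xml.dom.minidom module. The trade document and the namespace URL http://example.com/trade are hypothetical and do not come from FpML or any real format.

```python
from xml.dom import minidom

NS = "http://example.com/trade"  # hypothetical namespace URL

doc = minidom.parseString(
    '<t:trade xmlns:t="http://example.com/trade">'
    "<t:amount>100</t:amount>"
    "</t:trade>"
)

# Look up an element by namespace URL and local name, not by prefix.
amount = doc.getElementsByTagNameNS(NS, "amount")[0]
amount.firstChild.data = "250"  # change the text node in place

# Create a new element in the same namespace and attach it to the root.
note = doc.createElementNS(NS, "t:note")
note.appendChild(doc.createTextNode("amended"))
doc.documentElement.appendChild(note)

result = doc.documentElement.toxml()
```

The same function names (getElementsByTagNameNS, createElementNS, appendChild) are defined by the DOM standard, so the pattern carries over to other languages that implement it.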
Homework #10
Last One! (No HW #11)
Research and evaluate products
100 points
Final
Next Tuesday, 5/1
Approximately 1/3 from 4/3 - 4/24
Remainder - comprehensive
Thank You