46-942 - Andrew.cmu.edu

Information Resources
Management
April 24, 2001
Agenda
Administrivia
 Object-Oriented & Databases
 Data Warehousing
 Data Mining
 SQL Extensions
 XML

Administrivia
Homework #8
 Homework #9
 Current Scores
 Final Review Session?

OODBMS vs. ORDBMS
OODBMS - Object-Oriented
 ORDBMS - Object-Relational
 Appendix A

OODBMS
Persistent Objects
 By class
 By creation
 By marking
 By reference
 Storage/Retrieval Methods

OODBMS - Benefits
Match
 Programming
 Methodology
 Data types & structures
 Ease of programming
 Inheritance

OODBMS - Challenges
Standards
 ODMG - Object Database
Management Group
 Performance
 Database vs. persistent language
 Loss of integrity, queries
 Storage Space
 Maturity

ORDBMS
Extensions to relational model
 Complex data types
 Inheritance
 References
 Migration path
 Use existing applications and
knowledge base

ORDBMS - Benefits
SQL
 Existing Systems
 Vendors

ORDBMS - Challenges
Standards
 “Fit” with the development language
 Programming Complexity

Using a relational database to store data
from an object-oriented system has been
likened to parking your car in your garage.
With an OODBMS you park the car in the
garage. If a (O)RDBMS is used, to park your
car in the garage, you must first completely
disassemble it and put each part in its
specific location on a shelf. This process
must then be reversed the next time you
want to go for a drive.
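
The relational half of the analogy can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as the relational store; the Car class, the table names, and the park/drive helpers are hypothetical and were chosen to match the analogy.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Car:
    car_id: int
    model: str
    parts: list = field(default_factory=list)  # part names, in order

def park(conn, car):
    """Disassemble the object: one row for the car, one row per part."""
    conn.execute("INSERT INTO car VALUES (?, ?)", (car.car_id, car.model))
    conn.executemany(
        "INSERT INTO part VALUES (?, ?, ?)",
        [(car.car_id, pos, name) for pos, name in enumerate(car.parts)],
    )

def drive(conn, car_id):
    """Reassemble the object from its rows."""
    (model,) = conn.execute(
        "SELECT model FROM car WHERE car_id = ?", (car_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT name FROM part WHERE car_id = ? ORDER BY position", (car_id,)
    ).fetchall()
    return Car(car_id, model, [name for (name,) in rows])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("CREATE TABLE part (car_id INTEGER, position INTEGER, name TEXT)")

original = Car(1, "sedan", ["engine", "wheel", "door"])
park(conn, original)
rebuilt = drive(conn, 1)
print(rebuilt == original)
```

With an OODBMS, the park and drive mapping code would not be written by hand; the object would be stored and retrieved as a unit.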
OODBMS/ORDBMS Products
Vendor               Product       URL
Computer Associates  Jasmine       www.cai.com/products/jasmine
Franz                AllegroStore  www.franz.com
Fujitsu Software     Jasmine       www.fsc.fujitsu.com
Gemstone Systems     GemStone/S    www.gemstone.com
Matisse Software     ADB           www.matisse.com
O2 Technology        O2            www.o2tech.com
Object Design        ObjectStore   www.odi.com
OODBMS/ORDBMS Products
Vendor          Product                         URL
Objectivity     Objectivity/DB                  www.objectivity.com
Object Systems  ITASCA                          www.iprolink.ch/ibex.com
Ontos           Ontos Integrator                www.ontos.com
Persistence     Persistence Live Object Server  www.persistence.com
Poet Software   Poet Object Server              www.poet.com
Unisys          Osmos                           www.osmos.com
Versant         Versant ODBMS                   www.versant.com
Other Links
Object Database Management Group
www.odmg.org
 Object Database Newsgroup
comp.databases.object

Data Mining




Corporations have colossal amounts of data
Usually only used for very specific purposes
(operations)
Automated attempt to learn from the data
Find statistical rules and patterns in the data
Example: Giant Eagle Advantage Card
Goals of Data Mining
Explanatory - Why?
 Confirmatory - Is it?
 Exploratory - ???

Approaches to Data Mining
Classification
 identify rules that create groups
Association
 find related conditions or events
Correlation
 relationships between values

User Guided
 hypothesis driven
Automatic
 data driven - AI based
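
As one concrete reading of the association approach, the sketch below counts how often pairs of items appear together in shopping baskets and derives two common measures, support and confidence. The basket data and the function names are made up for illustration; this is not a production mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))

def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Of the baskets containing a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("diapers", "beer"), confidence("diapers", "beer"))
```

A pair with high support and high confidence is a candidate rule; whether it is meaningful or spurious still has to be judged by a person.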
Data Warehouse
A subject-oriented, integrated, time-variant, nonvolatile collection of data
 Usually all data for a corporation
 Multidimensional database

Data Warehousing
Single location
 Long-term storage
 Greater availability
 Separate “data” processing from day-to-day operations (performance)
 All data is historical
 Support data mining, et al.

Data Warehousing Questions
What data needs to be kept?
 Where is it from?
 How good is it?
 How long should it be kept?
 Can it be summarized? When?
 Will it make sense? What is the
schema?
 When is it updated?

Data Warehousing - Benefits
Support for decision making tools
 DSS, EIS, Data Mining
 Separation of information and day-to-day processing
 Unification - Centralization
 Improved quality and consistency

Data Warehousing Challenges
Costs: Storage, Setup, Maintenance
 Historical data issues
 Defining the warehouse schema
 Doing the conversion
 Implementation & every time
 Keeping up with operational system
changes
 Answering the questions

Multidimensional Databases
Two views
 Multidimensional tables
 Star schema
 Multidimensional table
 each cell is attribute
 dimensions are “interesting”
categories

Multidimensional Table
Cell - sales
 Dimensions
 day
 person
 store
 item
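
A multidimensional table can be pictured as a mapping from one coordinate per dimension to a cell value. The sketch below assumes a plain Python dictionary as the storage; the sample values and the roll_up helper are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("day", "person", "store", "item")

# Each cell is addressed by (day, person, store, item) and holds a sales amount.
cube = {
    ("Mon", "Ann", "Store1", "pen"): 10.0,
    ("Mon", "Bob", "Store1", "pen"): 5.0,
    ("Tue", "Ann", "Store2", "ink"): 7.5,
}

def roll_up(cube, dimension):
    """Total sales along one dimension, summing over all the others."""
    index = DIMENSIONS.index(dimension)
    totals = defaultdict(float)
    for key, sales in cube.items():
        totals[key[index]] += sales
    return dict(totals)

print(roll_up(cube, "person"))
print(roll_up(cube, "day"))
```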

Star Schema

Multiple tables
 Central table - data item (cell)
 Surrounding tables - information
about each category (dimensions)
Star Schema
[Diagram: central Sales table surrounded by the Person, Day, Item, and Store tables]
Star Schema
Sales (Day, Person, Store, Item, sales)
Day (Day, day info)
Person (Person, person info)
Store (Store, store info)
Item (Item, item info)
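
The schema above could be set up roughly as follows, assuming SQLite through Python's sqlite3 module. The key columns follow the slide; the descriptive columns (weekday, name, city, category) and the sample rows are assumptions standing in for "day info", "person info", and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per category value.
    CREATE TABLE Day    (Day    INTEGER PRIMARY KEY, weekday  TEXT);
    CREATE TABLE Person (Person INTEGER PRIMARY KEY, name     TEXT);
    CREATE TABLE Store  (Store  INTEGER PRIMARY KEY, city     TEXT);
    CREATE TABLE Item   (Item   INTEGER PRIMARY KEY, category TEXT);
    -- Central table: one key per dimension plus the data item (cell).
    CREATE TABLE Sales (
        Day    INTEGER REFERENCES Day(Day),
        Person INTEGER REFERENCES Person(Person),
        Store  INTEGER REFERENCES Store(Store),
        Item   INTEGER REFERENCES Item(Item),
        sales  REAL
    );
    INSERT INTO Day    VALUES (1, 'Mon'), (2, 'Tue');
    INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Store  VALUES (1, 'Pittsburgh'), (2, 'Erie');
    INSERT INTO Item   VALUES (1, 'produce'), (2, 'dairy');
    INSERT INTO Sales  VALUES (1, 1, 1, 1, 10.0),
                              (1, 2, 1, 2, 5.0),
                              (2, 1, 2, 2, 7.5);
""")

# Join the central table to two dimensions and total sales per group.
rows = conn.execute("""
    SELECT Store.city, Item.category, SUM(Sales.sales)
    FROM Sales
    JOIN Store ON Sales.Store = Store.Store
    JOIN Item  ON Sales.Item  = Item.Item
    GROUP BY Store.city, Item.category
    ORDER BY Store.city, Item.category
""").fetchall()
print(rows)
```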
Building/Maintaining a Data Warehouse
1. Capture
2. Scrub
3. Transform
4. Load and Index
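
The four steps can be sketched as a small pipeline. This is a toy example under stated assumptions: the source records are hard-coded rather than read from an operational system, SQLite stands in for the warehouse, and every function and table name is hypothetical.

```python
import sqlite3

def capture():
    """Capture: pull raw records from an operational source (hard-coded here)."""
    return [
        {"store": " Pittsburgh ", "amount": "10.50"},
        {"store": "erie", "amount": "n/a"},  # unusable amount
        {"store": "Erie", "amount": "4.25"},
    ]

def scrub(records):
    """Scrub: drop records whose amount is not a valid number."""
    clean = []
    for record in records:
        try:
            float(record["amount"])
        except ValueError:
            continue
        clean.append(record)
    return clean

def transform(records):
    """Transform: normalize store names and convert amounts to numbers."""
    return [(r["store"].strip().title(), float(r["amount"])) for r in records]

def load_and_index(conn, rows):
    """Load and Index: insert the rows, then index the lookup column."""
    conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX sales_by_store ON sales (store)")

conn = sqlite3.connect(":memory:")
rows = transform(scrub(capture()))
load_and_index(conn, rows)
loaded = conn.execute("SELECT store, amount FROM sales ORDER BY store").fetchall()
print(loaded)
```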
Data Marts
Making specific data available
 Different ones for different needs

[Diagram: Operational Systems, the data warehouse (DW), and data marts DM1 and DM2]
Data Mining - Benefits
Use data
 Learn new things
 Improve decision making

Data Mining - Challenges
Time (human and/or computer)
 Spurious results
 Separating the wheat from the chaff
 Availability of data
 Amount of data
 Changes in tools and technologies
 Validity over time

Enhanced Data Analysis
Beyond SUM, COUNT, and AVG
 SQL extensions (suggested)
 GROUP BY … AS PERCENTILE
 Specific percentiles
 GROUP BY … WITH CUBE
 Cross-tabulations
 Statistical package interface
 SAS, S-PLUS, others
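
To show what a cross-tabulation in the style of GROUP BY … WITH CUBE produces, the sketch below computes the same totals in plain Python rather than in SQL. The sample rows and the cube function are hypothetical; None marks a column that has been rolled up.

```python
from collections import defaultdict
from itertools import product

# Hypothetical rows: (store, item, sales).
rows = [
    ("Store1", "pen", 10.0),
    ("Store1", "ink", 5.0),
    ("Store2", "pen", 7.5),
]

def cube(rows):
    """Sum sales for every combination of grouped and rolled-up columns."""
    totals = defaultdict(float)
    for store, item, sales in rows:
        # Each row contributes to four groups: (store, item), (store, ALL),
        # (ALL, item), and (ALL, ALL), where ALL is represented by None.
        for key in product((store, None), (item, None)):
            totals[key] += sales
    return dict(totals)

totals = cube(rows)
print(totals[("Store1", None)], totals[(None, "pen")], totals[(None, None)])
```

The number of groups grows quickly with the number of columns, which is one source of the processing requirements such extensions bring.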

Enhanced Data Analysis Benefits
Greater functionality
 Improved decision making

Enhanced Data Analysis Challenges
Lack of standards
 Understandability
 Processing requirements
 Cost of poorly written queries
 “ad hoc” queries aren’t reviewed

Extending Relational DBs
Spatial and Geographic Databases
 Multimedia Databases


Changing the data stored while
retaining the benefits of relational
databases
Spatial & Geographic DBs
Spatial - CAD
 Geographic - GIS


Similar issue
 How to store and retrieve such data
Spatial Databases
Geometric objects (2 or 3 dimensions)
 Locations
 Connections
 Nonspatial information about each
object
 Substructures
 Spatial integrity constraints
 Two things can’t occupy the same
space
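
A spatial integrity constraint of this kind can be sketched as a check made on every insert. The example assumes two-dimensional, axis-aligned rectangles and a linear scan; the Rect and Layout classes are hypothetical, and a real spatial database would use a spatial index instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float

    def overlaps(self, other):
        # Interiors intersect only if the rectangles overlap on both axes;
        # rectangles that merely share an edge are allowed.
        return (self.x1 < other.x2 and other.x1 < self.x2 and
                self.y1 < other.y2 and other.y1 < self.y2)

class Layout:
    def __init__(self):
        self.objects = []

    def add(self, rect):
        """Insert rect, rejecting it if it would violate the constraint."""
        if any(rect.overlaps(existing) for existing in self.objects):
            raise ValueError("spatial integrity violation: space is occupied")
        self.objects.append(rect)

layout = Layout()
layout.add(Rect(0, 0, 2, 2))
layout.add(Rect(2, 0, 4, 2))      # shares an edge only, so it is accepted
try:
    layout.add(Rect(1, 1, 3, 3))  # overlaps both existing rectangles
except ValueError as error:
    print(error)
```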

GIS Databases
Raster Data (fractal data)
 Pictures - possibly over time
 Maps
 Vector Data
 Locations
 Connections
 Nongeographic information

Spatial & Geographic DB Benefits
DBMS
 Specialized queries
 Spatial & Geographic Data
 “Standard” Data
 Mix of the two
 Integrity constraints

Spatial & Geographic DB Challenges
Space requirements
 Level of detail
 Understandability - Complexity
 Processing requirements
 Compatibility between systems
 Lack of standards

Multimedia Databases
Images, Audio, Video
 Nonmultimedia data (text) about each


Database Enhancements
 BLOBs (Binary Large Objects)
 Similarity-based queries
 Guaranteed steady rate
 Synchronization of audio and video
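
Storing a BLOB next to its descriptive text might look like the sketch below, which assumes SQLite through Python's sqlite3 module. The media table and the stand-in bytes are hypothetical; similarity queries and steady-rate delivery are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (title TEXT, kind TEXT, content BLOB)")

# Stand-in bytes; a real application would read an image, audio, or video file.
fake_image = bytes(range(256))
conn.execute("INSERT INTO media VALUES (?, ?, ?)", ("logo", "image", fake_image))

# The text columns are queried as usual; the BLOB comes back as raw bytes.
title, kind, content = conn.execute(
    "SELECT title, kind, content FROM media WHERE title = ?", ("logo",)
).fetchone()
print(title, kind, len(content), content == fake_image)
```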
Multimedia Databases Benefits
DBMS
 Greater compression may be possible
 “Paperless” office - document imaging
 Workflow redesign - improvements
 Greater availability

Multimedia Databases Challenges
STORAGE
 Specialized DBMS
 Unity of database and network
 Usually requires ATM
 Specialized hardware
 “juke boxes”
 optical disks

XML
What is it?
 What isn’t it?
 What are the goals?
 Who controls it?
 Who’s using it?
 Beyond XML

What is XML?
eXtensible Markup Language
 Markup language for “structured
information”
 “structured” - content & role of that
content
 markup - identify structures
 “meta language for describing markup
languages”

Huh?
Storing structured data in a text file
 spreadsheet, address book, transactions
(think EDI)
 Looks like HTML, <tags>, but isn’t
 Text is universal, but not efficient
 Does disk space matter?
 What about network capacity?
 XML is license-free & platform-independent
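
For a concrete picture of structured data in a text file, here is a small address book read with Python's standard xml.etree.ElementTree module. The tag names and the contact data are made up for this example; XML lets each application define its own tags.

```python
import xml.etree.ElementTree as ET

# Tags mark both the content and the role of that content.
document = """<addressbook>
  <contact id="1">
    <name>Ann Lee</name>
    <phone type="work">412-555-0100</phone>
  </contact>
  <contact id="2">
    <name>Bob Ray</name>
    <phone type="home">412-555-0101</phone>
  </contact>
</addressbook>"""

root = ET.fromstring(document)
contacts = [
    (c.get("id"), c.findtext("name"), c.find("phone").get("type"))
    for c in root.findall("contact")
]
print(contacts)
```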

What XML isn’t
HTML
 SGML - Standard Generalized Markup
Language - printing
 Limited to current definitions (tags)
 XML is the way to add new definitions
 A relational database management
system
 A database, or is it?

Goals of XML
Easy to use over Internet
 Wide variety of applications
 Compatible with SGML (subset)
 Easy to write programs that use XML
documents
 No (or few) optional features
 Human-legible if necessary

Goals of XML (2)
Standards developed quickly
 Formal and concise
 Easy to create documents
 No need for “shortcuts”

Who Controls XML?

W3 Consortium
 www.w3.org/XML
 XML 1.0 specification
Who’s Using XML?
Financial Products Markup Language
 FpML
 FpML.org
 “A standard for financial derivatives
business-to-business e-Commerce”
 Others?

Beyond XML
XLink - hyperlinks in XML
 XPointer & XFragments - point to parts
of an XML document
 CSS - style sheet language
 XML and HTML
 XSL - advanced language for style
sheets
 XSLT - XSL transformation language

Beyond XML (2)
DOM - standard function calls for
manipulating XML (and HTML) from
programs
 XML Namespaces - link a URL with
every tag and attribute
 XML Schemas 1 & 2 - help in precisely
developing own XML-based formats
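
The DOM and namespace items can be seen together in a short sketch using Python's standard xml.dom.minidom module. The namespace URL and the tag names are hypothetical and are not taken from FpML or any other real vocabulary.

```python
from xml.dom import minidom

# The xmlns:t declaration links the "t" prefix to a namespace URL.
source = (
    '<t:trade xmlns:t="http://example.com/trade">'
    '<t:amount currency="USD">100</t:amount>'
    '</t:trade>'
)

dom = minidom.parseString(source)

# Look the element up by namespace URL and local name, not by prefix.
amount = dom.getElementsByTagNameNS("http://example.com/trade", "amount")[0]

# Read a value through the DOM, then modify the tree in place.
original_value = amount.firstChild.data
amount.setAttribute("currency", "EUR")
amount.firstChild.data = "92"

print(original_value, amount.namespaceURI, amount.toxml())
```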

Homework #10
Last One! (No HW #11)
 Research and evaluate products
 100 points

Final
Next Tuesday, 5/1
 Approximately 1/3 from 4/3 - 4/24
 Remainder - comprehensive

Thank You