Data Warehouses and the Web - Computer Information Systems


Warehousing on the Web
Webhouse
Why Utilize the Web?
What is the data Webhouse?
• Managing clickstreams
• WWW today
• ROI
• DSS

Data Webhouse
Defined by Ralph Kimball
Two distinct focuses:

• Bringing the web to the warehouse
– Clickstream data as a source of information
• Bringing existing data warehouses to web
– Fully distributed environment
Required Capabilities
• Capture clickstream logs and convert them to tables for analysis
• Merge customer demographic and account information with the above
• Interpret customer paths through the website
• Identify abandoned sessions
• Use the DW to drive customer responses appearing on your website
• DW querying and reporting available through web browsers
• Attach multimedia to the DW
• DW security
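"Identify abandoned sessions" can be made concrete with a small check. The following is a minimal sketch, assuming each session has already been reconstructed as an ordered list of page URLs, and assuming a hypothetical funnel in which "/cart" signals purchase intent and "/checkout/complete" signals success; the slides do not define either page.

```python
from typing import Dict, List

# Hypothetical URLs: the slides do not define what counts as "abandoned".
CART_PAGE = "/cart"
COMPLETION_PAGE = "/checkout/complete"


def abandoned_sessions(sessions: Dict[str, List[str]]) -> List[str]:
    """Return IDs of sessions that reached the cart but never completed checkout."""
    abandoned = []
    for session_id, pages in sessions.items():
        if CART_PAGE in pages and COMPLETION_PAGE not in pages:
            abandoned.append(session_id)
    return sorted(abandoned)
```

For example, a session visiting "/home", "/product/7" and "/cart" with no completion page is reported as abandoned, while one ending at "/checkout/complete" is not.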
Architecture – Web to Warehouse
Beyond a comprehensive, real-time snapshot of the business, we also want knowledge of customer behavior
Extended design factors:
• Timeliness – real-time
• Data volume – no upper limit
• Response time – less than 10 seconds
Hot Response Cache
• A file server holding complex file objects
• As a file server, it is an I/O engine (bandwidth)
• Must hold the objects that will be requested
• Security is the responsibility of the requesting server
• Extension of the original operational data store (ODS)
• Does not physically speed up the database; creates the illusion of speed by storing answers to predictable requests
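The hot response cache idea can be sketched in a few lines: predictable answers are computed ahead of time and served as-is, while anything unanticipated still goes to the warehouse. This is a minimal illustration, not a product design; `run_query` is a hypothetical stand-in for the slow path to the DW, and access control is deliberately left to the caller, matching the slide's point that security belongs to the requesting server.

```python
from typing import Callable, Dict, Iterable


class HotResponseCache:
    """Serve precomputed answers to predictable requests.

    The database itself is not made faster; anticipated requests simply
    never reach it. Unanticipated requests are passed through and are not
    stored. Access control is the caller's job.
    """

    def __init__(self, run_query: Callable[[str], str]) -> None:
        self._run_query = run_query  # hypothetical slow path to the warehouse
        self._answers: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def preload(self, predictable_requests: Iterable[str]) -> None:
        """Precompute answers, e.g. right after the nightly load."""
        for request in predictable_requests:
            self._answers[request] = self._run_query(request)

    def get(self, request: str) -> str:
        if request in self._answers:
            self.hits += 1
            return self._answers[request]
        self.misses += 1
        return self._run_query(request)  # unanticipated: go to the warehouse
```

Counting hits and misses gives a direct measure of how "predictable" the workload really is, which is what decides whether the illusion of speed holds up.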
Who are our users?

Traditional
• Power users
– need database connectivity
• Analysts
– want to manipulate existing data
• Report viewers
– view standardized reports

Web
• Our customers
• Our business partners
• Our employees
Clickstreams

Clickstream is not just another data source
• Distributed nature leads to multiple data sources
which require synchronization
• Multiple parties
• More than a dozen log file formats for capturing
clickstream data
• Search specification

Basic form of clickstream data is stateless
• Log shows an isolated page retrieval event

Clickstream data is anonymous
Today's Promotions
• Clickthroughs and referrals as a revenue source
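Because the basic clickstream log is stateless, each line is an isolated page hit and sessions have to be inferred. A common heuristic, assumed here rather than taken from the slides, groups hits by a visitor key (a cookie ID or IP address) and starts a new session after 30 minutes of inactivity; both the key and the 30-minute threshold are illustrative choices.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Hit = Tuple[str, datetime, str]  # (visitor_key, timestamp, url)

# Assumed inactivity threshold; the slides do not specify one.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits: List[Hit], timeout: timedelta = SESSION_TIMEOUT) -> List[List[Hit]]:
    """Group isolated page hits into sessions.

    Hits are grouped per visitor key, ordered by time, and a new session
    starts whenever the gap since that visitor's previous hit exceeds
    the timeout.
    """
    by_visitor: Dict[str, List[Hit]] = {}
    for hit in sorted(hits, key=lambda h: (h[0], h[1])):
        by_visitor.setdefault(hit[0], []).append(hit)

    sessions: List[List[Hit]] = []
    for visitor_hits in by_visitor.values():
        current = [visitor_hits[0]]
        for prev, hit in zip(visitor_hits, visitor_hits[1:]):
            if hit[1] - prev[1] > timeout:
                sessions.append(current)
                current = []
            current.append(hit)
        sessions.append(current)
    return sessions
```

Since the data is also anonymous, the visitor key is only a proxy for a person; shared IP addresses or cleared cookies will merge or split real visits.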
Clickstreams
Clickstream post-processor – receives raw log data from the web server and normalizes it into a format that can be combined with application-derived data for insertion into the DW
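One step of such a post-processor can be sketched for a single input format. The NCSA Common Log Format (`host ident authuser [date] "request" status bytes`) is one of the many log formats mentioned earlier; the output field names below are illustrative assumptions, and merging with application-derived data is left out.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def normalize_clf_line(line: str) -> Optional[Dict[str, object]]:
    """Turn one raw CLF line into a flat record ready for loading.

    Timestamps are converted to UTC so hits from servers in different
    time zones line up. Returns None for lines that do not match, so the
    caller can count rejects.
    """
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    timestamp = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": fields["host"],
        "user": None if fields["user"] == "-" else fields["user"],
        "timestamp_utc": timestamp.astimezone(timezone.utc).isoformat(),
        "method": fields["method"],
        "url": fields["url"],
        "status": int(fields["status"]),
        "bytes": 0 if fields["size"] == "-" else int(fields["size"]),
    }
```

Supporting another format would mean adding another parser that emits the same record shape, which is what lets the downstream load stay format-independent.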
Why Bring DW to Web?
• The primary function of a DW is to publish information; the web is a good partner
• Need a distributed DW; the web provides universal connectivity
• Universal front end: the web browser

Web Pushes Data Warehouse

User interface effectiveness measurable
Queries and updates mixed
Speed expected – 10 second rule
Global
• 24 × 7 expected
• International characters, dates, addresses

Expanded multimedia
• Animation, zoomable images, maps, video clips
• Need material in digital form
• Enterprise information portal will require items to
be searchable
Web Pushes Data Warehouse

Mass customization
• Dynamically created web pages – XML
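Mass customization through dynamically created pages usually means generating per-user content as data and letting a template render it. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming hypothetical `offers`/`offer` element names rather than any standard schema:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def customer_offers_xml(customer_id: str, offers: List[Dict[str, str]]) -> str:
    """Build a per-customer XML document that a page template could render.

    Element and attribute names are illustrative, not a standard schema.
    ElementTree escapes special characters such as '&' automatically.
    """
    root = ET.Element("offers", attrib={"customer": customer_id})
    for offer in offers:
        item = ET.SubElement(root, "offer", attrib={"sku": offer["sku"]})
        item.text = offer["title"]
    return ET.tostring(root, encoding="unicode")
```

Calling `customer_offers_xml("C42", [{"sku": "SKU-1", "title": "Spring sale"}])` yields one `offer` element inside an `offers` root tagged with the customer ID; the same data could feed several page layouts.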

Fully distributed
• Linking together all the data marts

Security and Privacy
• Publish only to those who need to know
• User profiles and access profiles defined in one
place
• Full-time expert security person
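"Publish only to those who need to know" with profiles "defined in one place" can be illustrated as a single table of roles and a default-deny check. The role names, users, and subject areas below are hypothetical.

```python
from typing import Dict, Set

# Hypothetical central definitions: one place for both user and access profiles.
ROLE_ACCESS: Dict[str, Set[str]] = {
    "employee": {"sales_reports", "inventory_reports"},
    "partner": {"inventory_reports"},
    "customer": {"own_orders"},
}

USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"employee"},
    "acme_corp": {"partner"},
}


def can_view(user: str, subject_area: str) -> bool:
    """Deny by default; allow only if one of the user's roles grants the area."""
    return any(
        subject_area in ROLE_ACCESS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Keeping both tables in one place means revoking a partner's access is a single change, rather than an edit to every data mart.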
Second Generation User Interface Guidelines

Near-instantaneous performance
Website Design
• Design for the lowest common denominator
• Measure page performance on a continuous basis
• Paint navigation buttons immediately
• Disclose content progressively
• Implement page caching
• Cache data, reports
• Improve web server bandwidth
• Improve server throughput
Second Generation User Interface Guidelines

Data Webhouse design
• Adopt all the website design responses
• Select appropriate DBMS software –
dimensional models, OLAP
• Use indexes, aggregations
• Partition files
• Increase RAM
• Use parallel processing
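Aggregations speed up queries by answering them from a pre-summarized table instead of the detailed facts. A minimal sketch, assuming a simplified daily page-view fact row of (date, page, views) that is rolled up to (month, page):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (date as "YYYY-MM-DD", page, page_views): an assumed, simplified fact row.
FactRow = Tuple[str, str, int]


def build_monthly_aggregate(facts: Iterable[FactRow]) -> Dict[Tuple[str, str], int]:
    """Pre-summarize daily facts to (month, page) so queries scan fewer rows."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for day, page, views in facts:
        month = day[:7]  # "YYYY-MM"
        totals[(month, page)] += views
    return dict(totals)
```

A monthly report then reads one row per page per month rather than one per day; the trade-off is that the aggregate must be rebuilt or incrementally updated on every load.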
Meet User Expectations

Website design
• Site navigation choices
• Help choices
• Communication with various groups –
response must be assured
• Headlines should be serious and define the content
• Indicate off-screen material
• Survey customer needs and wants
Meet User Expectations

Data Webhouse design
• Report library
• Folder of previous queries, reports …
• Dimension browser – viewing dimension
can assist report creation
• Business metadata interface – understand the organization's data assets
Streamline Process
Business processes designed from the ground up to work seamlessly on the web
Website design

• Reengineer to streamline process and
make navigation easier, uniform interfaces
• Remove barriers to reaching page
• Minimize clicks and new windows
• Allow interruption and return
Streamline Process

Data Webhouse design
• Build an explicit value chain for reporting and
analysis around the application suite using
conformed dimensions and facts
• Drill across functions
• Single user interface for reporting against all parts
of business
• Master report library and FAQs
• Single login and single console access to
webhouse
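"Drill across functions" works because the dimensions are conformed: each business process is queried separately, summarized to the shared dimension, and the results are merged on that dimension's values. A minimal sketch with two hypothetical fact tables (sales and returns) keyed by a conformed product name:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate one fact table to the grain of the conformed dimension."""
    totals: Dict[str, float] = defaultdict(float)
    for product, amount in rows:
        totals[product] += amount
    return dict(totals)


def drill_across(sales: Iterable[Tuple[str, float]],
                 returns: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Query each process separately, then merge on the conformed key.

    This only gives meaningful rows because both fact tables use the
    same product values; non-conformed keys would not line up.
    """
    sales_totals = summarize(sales)
    return_totals = summarize(returns)
    products = sorted(set(sales_totals) | set(return_totals))
    return {p: (sales_totals.get(p, 0.0), return_totals.get(p, 0.0)) for p in products}
```

The design choice is to never join the two fact tables directly; summarizing each one first avoids double counting when their grains differ.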
Reassure Users

Website Design
• Map of processes

Data Webhouse design
• Provide status and lineage of current data
• Provide status of running reports
• Active notification
• Allow for entry of NA if data not available
• Time stamped dimensions
• Time stamped reports
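Time stamped dimensions let users see which version of a dimension row applied when a fact occurred. One common way to do this is Kimball's "Type 2" slowly changing dimension, used here as an assumption since the slides do not name a technique: a change closes the current row and adds a new row with its own effective date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

OPEN_END = date(9999, 12, 31)  # conventional "still current" end date


@dataclass
class CustomerRow:
    customer_id: str
    city: str
    effective: date
    expires: date = OPEN_END  # exclusive end of validity


def apply_change(rows: List[CustomerRow], customer_id: str, city: str, on: date) -> None:
    """Close the current row and append a new time-stamped version."""
    for row in rows:
        if row.customer_id == customer_id and row.expires == OPEN_END:
            if row.city == city:
                return  # no actual change, keep the current row
            row.expires = on
    rows.append(CustomerRow(customer_id, city, effective=on))


def as_of(rows: List[CustomerRow], customer_id: str, when: date) -> Optional[CustomerRow]:
    """Return the row that was in effect on a given date."""
    for row in rows:
        if row.customer_id == customer_id and row.effective <= when < row.expires:
            return row
    return None
```

Treating `expires` as exclusive means the old and new rows never overlap on the change date.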
Allow Problem Resolution

Website design
• Allow backtracking, rollback, play forward
• Keep old transactions
• Easy error reporting
• Acknowledge, track and follow up all user inputs; show wait time
• Assist searching

Data Webhouse design
• Provide adequate end user support
• Show aggregates in use and available
• Show system load and percent completed
Build Trust
Clearly state and observe the website's policies for using a customer's identity
Website design
• Do not abuse privacy
• Link to privacy statement
• Use friendly pictures of people
• Distinguish between ad content and editorial content
Build Trust

Data Webhouse design
• Two-factor security
– What you know – password
– What you possess – token
• Track changes in employee and contractor
status
• Create and enforce roles for employees,
contractors and customers
• Manage webhouse security directly
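The "what you possess" factor is often a token that displays a time-based one-time password. The slides do not name a token scheme; the sketch below assumes the TOTP standard (RFC 6238, HMAC-SHA1 variant) and shows a login check that requires both the password and the current code.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238 (HMAC-SHA1)."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation from RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_login(password_ok: bool, secret: bytes, submitted_code: str,
                 now: Optional[float] = None) -> bool:
    """Both factors must pass: something you know and something you possess."""
    expected = totp(secret, now)
    return password_ok and hmac.compare_digest(expected, submitted_code)
```

A production check would usually also accept the neighbouring time step to tolerate clock drift; that is omitted here for brevity.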
Provide Communication Hooks

Website design
• Provide useful links to others – internal and
external
• Remove links that invalidate the “back”
button
• Use copyable URLs
• Use URL as medium of distribution
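Copyable URLs work as a distribution medium when every report setting lives in the query string, so the URL alone reproduces the report. A minimal sketch using the standard `urllib.parse` module; the endpoint `webhouse.example.com/reports/sales` is hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical webhouse report endpoint, for illustration only.
REPORT_BASE = "https://webhouse.example.com/reports/sales"


def report_url(params: Dict[str, str]) -> str:
    """Encode all report settings so the URL can be bookmarked or emailed."""
    return REPORT_BASE + "?" + urlencode(sorted(params.items()))


def params_from_url(url: str) -> Dict[str, str]:
    """Recover the report settings from a shared URL."""
    return {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
```

Sorting the parameters makes the same report always produce the same URL, which also helps page caching.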
Advantages of Web Today (1998 vs. 2000)
• Immediate worldwide access
• Centralized management (2000: decentralized)
• Thin client
• Multi-platform, client and server (2000: distributed)
• Little or no software distribution (2000: downloads)

A+
Disadvantages of Web Today (1998 vs. 2000)
• Immature technology (2000: teenager)
• Security (2000: solutions)
• Speed restricted by bandwidth: data and logic must both travel across the internet
• Design limited to the least common denominator, or access restricted to a specific browser

Vulnerabilities
• Physical assets
• Information assets
– theft
– modification
• Software assets
• Ability to conduct business

Web Architecture (diagram: tiers from thin client to data warehouse)
• Thin client: browser, applets/ActiveX, email, spreadsheet, word processing
• Communication layer (network/internet)
• Application and Internet servers
• Analysis tools: analysis/statistics, graphics, report writer, SQL query
• OLAP server
• Database servers: multidimensional database; summary/alternative relational tables
• Data warehouse – relational database
Business Management through
Information

Analysis of historical records
• order processing, inventory levels,
shipments, receivables, customer history,
etc.

Goals include:
• Measures of efficiency
• Anticipate changes (planning and
forecasting)
• Make adjustments
• Integration of model and control function
Rule-Based Management

Create strategic rules
• IF market demand increases
THEN implement marketing campaign A3
• IF profit margin drops below value X
THEN adjust overhead by …

Must not forget alert rules
• If unanticipated condition, then notify CFO

Must not be too reactive
• would cause thrashing
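Strategic and alert rules can be sketched as condition/action pairs, with a cooldown so a rule cannot fire again in every reporting period; that damping is one simple way, assumed here, to keep the system from being too reactive and thrashing. The threshold `0.10` standing in for "value X" and the metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]
    action: str
    cooldown_periods: int = 0  # damping: minimum gap before firing again
    last_fired: Optional[int] = None


def evaluate(rules: List[Rule], metrics: Dict[str, float], period: int) -> List[str]:
    """Return actions triggered this period, skipping rules still cooling down."""
    actions = []
    for rule in rules:
        if not rule.condition(metrics):
            continue
        if rule.last_fired is not None and period - rule.last_fired <= rule.cooldown_periods:
            continue  # fired recently: do not react again yet (avoids thrashing)
        rule.last_fired = period
        actions.append(rule.action)
    return actions


# Hypothetical rules mirroring the slide: a strategic rule and an alert rule.
EXAMPLE_RULES = [
    Rule("margin", lambda m: m.get("profit_margin", 1.0) < 0.10,
         "adjust overhead", cooldown_periods=2),
    Rule("alert", lambda m: m.get("unanticipated", 0) > 0, "notify CFO"),
]
```

With a cooldown of two periods, a margin that stays low triggers "adjust overhead" in period 1, stays quiet in periods 2 and 3, and fires again in period 4, giving the first adjustment time to take effect.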
OLDM Decision Process

Simultaneous capture of:
• Decision support information
– e.g., customer surveyed on-line in exchange for an additional discount
• together with business function inputs
Immediate computation or estimation of secondary information
• based on planning and forecasting rules
Decision support information is:
• management defined!
• available on-line
• ready to use "as is"
OLDM Decision Process
Derived data becomes control information
Automation of analysis and decision support
• immediately available to management
Problems documented on-line
Classes of problem and corrective action codified
• problem recognition
• decision rules
OLDM Decision Process

Requires four types of information
• Characteristics which identify a class of
problem
• Corrective action (management responses by problem class)
• Rules to implement actions
• Record of result
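The four types of information can be held together in a small structure: each problem class carries its recognizing characteristics and its codified corrective action, a first-match rule decides which action applies, and every decision is logged with a result slot to fill in once the outcome is known. This is a hypothetical sketch; the class names and the "stockout" example are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ProblemClass:
    name: str
    recognize: Callable[[Dict[str, float]], bool]  # characteristics of the class
    corrective_action: str                         # codified management response


@dataclass
class DecisionLog:
    records: List[Dict[str, object]] = field(default_factory=list)

    def handle(self, classes: List[ProblemClass],
               situation: Dict[str, float]) -> Optional[str]:
        """Apply the first matching class's action and record the decision."""
        for problem in classes:
            if problem.recognize(situation):
                self.records.append({
                    "problem": problem.name,
                    "action": problem.corrective_action,
                    "situation": dict(situation),
                    "result": None,  # filled in later, once the outcome is known
                })
                return problem.corrective_action
        return None
```

Keeping the log alongside the rules is what turns individual decisions into a retained knowledge asset.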
Potential of OLDM

Better managed business
• knowledge asset capture and retention
• consistency across enterprise
• flexible, highly responsive

Closed loop with customer
• event and market driven but controlled

Direct customer interaction
• via web, telephone, remote connection


Improved systems capacity planning and system
management
Re-alignment of business and IT