Data Warehouses and the Web - Computer Information Systems
Warehousing on the Web
Webhouse
Why Utilize the Web?
What is the data Webhouse
Managing clickstreams
WWW today
ROI
DSS
Data Webhouse
Defined by Ralph Kimball
Two distinct focuses
• Bringing the web to the warehouse
– Clickstream data as a source of information
• Bringing existing data warehouses to web
– Fully distributed environment
Required Capabilities
Capture clickstream logs and convert to
tables for analysis
Merge customer demographic and account
info with above
Interpret customer paths in website
Identify abandoned sessions
Use DW to drive customer responses
appearing on your website
DW querying and reporting available through
web browsers
Attach multimedia to DW
DW security
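The capabilities above (merging customer data with clickstream tables, interpreting paths, identifying abandoned sessions) can be sketched roughly as follows. This is a minimal illustration, not the deck's method: the row layout, the customer table, the page names, and the rule "reached the cart but never completed checkout" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical clickstream rows, already converted from logs to a table.
clicks = [
    {"session_id": "s1", "customer_id": 101, "page": "/product/42"},
    {"session_id": "s1", "customer_id": 101, "page": "/cart"},
    {"session_id": "s1", "customer_id": 101, "page": "/checkout/complete"},
    {"session_id": "s2", "customer_id": 102, "page": "/product/7"},
    {"session_id": "s2", "customer_id": 102, "page": "/cart"},
]

# Hypothetical customer demographic table keyed by customer_id.
customers = {
    101: {"region": "East", "segment": "retail"},
    102: {"region": "West", "segment": "wholesale"},
}


def customer_paths(rows):
    """Group page views by session, preserving click order."""
    paths = defaultdict(list)
    for row in rows:
        paths[row["session_id"]].append(row["page"])
    return dict(paths)


def abandoned_sessions(rows, cart="/cart", done="/checkout/complete"):
    """Sessions that reached the cart page but never the completion page."""
    return sorted(
        sid
        for sid, pages in customer_paths(rows).items()
        if cart in pages and done not in pages
    )


def merge_demographics(rows, customer_table):
    """Attach demographic columns to each click row (a simple lookup join)."""
    return [{**row, **customer_table.get(row["customer_id"], {})} for row in rows]
```

With the sample rows, `abandoned_sessions(clicks)` flags only session `s2`, and each merged row carries the customer's region and segment alongside the page view.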
Architecture – Web to Warehouse
Beyond a comprehensive snapshot of the
business on a real-time basis, also want
knowledge of customer behavior
Extended design factors
• Timeliness – real-time
• Data volume – no upper limit
• Response time – less than 10 seconds
Hot Response Cache
A file server holding complex file objects
As a file server it is an I/O engine (bandwidth)
Must hold objects which will be requested
Security responsibility of requesting server
Extension of original operational data store
(ODS)
Does not physically speed up the database;
creates the illusion of speed by storing
predictable answers
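The idea of storing predictable answers can be sketched as a small keyed store that is filled ahead of time and only ever looked up at request time. This is an illustrative sketch only: the class name, the expiry (`ttl_seconds`), and the injectable clock are assumptions for the example, not part of the deck.

```python
import time


class HotResponseCache:
    """Holds precomputed answers keyed by request; never queries the database."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # assumed freshness window
        self._clock = clock          # injectable so the sketch is testable
        self._store = {}             # key -> (expires_at, payload)

    def put(self, key, payload):
        """Store an answer that is expected to be requested."""
        self._store[key] = (self._clock() + self._ttl, payload)

    def get(self, key):
        """Return the stored answer, or None if absent or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return payload
```

A miss returns `None`, which leaves the decision to fall back to the warehouse (and the security check) with the requesting server, as the slide describes.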
Who are our users?
Traditional
• Power users
– need database connectivity
• Analysts
– want to manipulate existing data
• Report viewers
– view standardized reports
Web
• Our customers
• Our business partners
• Our employees
Clickstreams
Clickstream is not just another data source
• Distributed nature leads to multiple data sources
which require synchronization
• Multiple parties
• More than a dozen log file formats for capturing
clickstream data
• Search specification
Basic form of clickstream data stateless
• Log shows isolated page retrieval event
Clickstream data anonymous
Today's Promotions
• Clickthroughs and referrals as a revenue source
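Because each log entry is an isolated page retrieval, sessions have to be reconstructed afterwards. One common approach, shown here only as a sketch, is to group hits by some visitor identifier and start a new session after a period of inactivity. The visitor id (for example a cookie value) and the 30-minute timeout are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical hits: (visitor_id, timestamp, url), in arbitrary order.
hits = [
    ("c1", datetime(2000, 1, 1, 10, 0), "/home"),
    ("c1", datetime(2000, 1, 1, 11, 0), "/cart"),
    ("c1", datetime(2000, 1, 1, 10, 5), "/product"),
    ("c2", datetime(2000, 1, 1, 10, 1), "/home"),
]


def sessionize(rows, timeout=timedelta(minutes=30)):
    """Assign a per-visitor session number to each hit.

    A new session starts when the same visitor has been idle longer
    than `timeout`.
    """
    last_seen = {}   # visitor_id -> timestamp of the previous hit
    session_no = {}  # visitor_id -> current session counter
    out = []
    for visitor_id, ts, url in sorted(rows, key=lambda h: (h[0], h[1])):
        prev = last_seen.get(visitor_id)
        if prev is None or ts - prev > timeout:
            session_no[visitor_id] = session_no.get(visitor_id, 0) + 1
        last_seen[visitor_id] = ts
        out.append((visitor_id, session_no[visitor_id], ts, url))
    return out
```

The visitor id stays anonymous here; tying it to a known customer requires a separate step such as a login.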
Clickstreams
Clickstream post-processor – receives
raw log data from web server and
normalizes it into a format which can be
combined with application-derived data
for insertion into DW
Today's Promotions
• Clickthroughs and referrals as a revenue
source
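A post-processor of the kind described above might normalize one raw log line into a row like this. The sketch assumes the NCSA Common Log Format, which is only one of the many log formats the deck mentions; the output column names are illustrative.

```python
import re
from datetime import datetime

# Pattern for the NCSA Common Log Format (assumed input format).
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def normalize(line):
    """Turn one raw log line into a dict row, or None if it does not parse."""
    m = CLF.match(line)
    if m is None:
        return None
    return {
        "host": m["host"],
        "user": None if m["user"] == "-" else m["user"],
        "timestamp": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "url": m["url"],
        "status": int(m["status"]),
        "bytes": 0 if m["size"] == "-" else int(m["size"]),
    }


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
row = normalize(sample)
```

Rows in this shape can then be combined with application-derived data before loading; lines that do not match are returned as `None` so they can be counted or set aside rather than silently dropped.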
Why Bring DW to Web?
Primary function of DW is to publish
information – web is a good partner
Need distributed DW – web provides
universal connectivity
Universal front-end – web browser
Web Pushes Data Warehouse
User interface effectiveness measurable
Queries and updates mixed
Speed expected – 10 second rule
Global
• 24 x 7 expected
• International characters, dates, addresses
Expanded multimedia
• Animation, zoomable images, maps, video clips
• Need material in digital form
• Enterprise information portal will require items to
be searchable
Web Pushes Data Warehouse
Mass customization
• Dynamically created web pages – XML
Fully distributed
• Linking together all the data marts
Security and Privacy
• Publish only to those who need to know
• User profiles and access profiles defined in one
place
• Full-time expert security person
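As a rough illustration of mass customization with dynamically created XML, the sketch below builds a per-user page listing only the reports that user is entitled to see. The element names (`page`, `report`) and the function are hypothetical, chosen only for the example.

```python
import xml.etree.ElementTree as ET


def build_page(user, reports):
    """Build a small per-user XML document listing that user's reports."""
    page = ET.Element("page", {"user": user})
    for name in reports:
        ET.SubElement(page, "report").text = name
    return ET.tostring(page, encoding="unicode")
```

In practice the `reports` list would come from the user and access profiles that the slide says should be defined in one place.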
Second Generation User
Interface Guidelines
Near-instantaneous performance
Website Design
• Design for lowest common denominator
• Measure page performance on a continuous basis
• Paint navigation buttons immediately
• Disclose content progressively
• Implement page caching
• Cache data, reports
• Improve web server bandwidth
• Improve server throughput
Second Generation User
Interface Guidelines
Data Webhouse design
• Adapt all web design responses
• Select appropriate DBMS software –
dimensional models, OLAP
• Use indexes, aggregations
• Partition files
• Increase RAM
• Use parallel processing
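The "indexes, aggregations" item can be illustrated with an in-memory SQLite database: an index on the column queries filter by, plus a precomputed aggregate table that answers daily totals without scanning the detailed fact table. Table and column names are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_fact (day TEXT, page TEXT, views INTEGER);
    INSERT INTO page_fact VALUES
        ('2000-01-01', '/home', 3),
        ('2000-01-01', '/cart', 1),
        ('2000-01-02', '/home', 5);
    -- Index on the column most queries filter by.
    CREATE INDEX idx_page_fact_day ON page_fact(day);
    -- Aggregate table: one precomputed row per day.
    CREATE TABLE page_agg_day AS
        SELECT day, SUM(views) AS views FROM page_fact GROUP BY day;
""")


def views_for_day(day):
    """Answer a daily total from the aggregate, not the detailed facts."""
    row = conn.execute(
        "SELECT views FROM page_agg_day WHERE day = ?", (day,)
    ).fetchone()
    return row[0] if row else 0
```

The aggregate has to be refreshed whenever new facts are loaded; that step is left out of the sketch.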
Meet User Expectations
Website design
• Site navigation choices
• Help choices
• Communication with various groups –
response must be assured
• Headlines serious and define content
• Indicate off-screen material
• Survey customer needs and wants
Meet User Expectations
Data Webhouse design
• Report library
• Folder of previous queries, reports …
• Dimension browser – viewing dimension
can assist report creation
• Business metadata interface – understand
organization's data assets
Streamline Process
Business processes designed from
ground up to work seamlessly on web
Website design
• Reengineer to streamline process and
make navigation easier, uniform interfaces
• Remove barriers to reaching page
• Minimize clicks and new windows
• Allow interruption and return
Streamline Process
Data Webhouse design
• Build an explicit value chain for reporting and
analysis around the application suite using
conformed dimensions and facts
• Drill across functions
• Single user interface for reporting against all parts
of business
• Master report library and FAQs
• Single login and single console access to
webhouse
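Drilling across works because separate fact tables share conformed dimensions: each fact is summarized on the same dimension key and the summaries are then lined up side by side. A minimal sketch, with made-up order and shipment totals keyed by a shared product dimension:

```python
def drill_across(*fact_summaries):
    """Merge per-fact summaries that share the same conformed dimension key.

    Each argument maps a dimension value to a measure; the result maps each
    dimension value to a tuple of measures (None where a fact has no row).
    """
    keys = sorted(set().union(*(s.keys() for s in fact_summaries)))
    return {k: tuple(s.get(k) for s in fact_summaries) for k in keys}


# Hypothetical summaries from two different business processes.
orders_by_product = {"widget": 120, "gadget": 45}
shipments_by_product = {"widget": 110}

combined = drill_across(orders_by_product, shipments_by_product)
```

The merge is only meaningful if "widget" means the same thing in both sources, which is exactly what conformed dimensions guarantee.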
Reassure Users
Website Design
• Map of processes
Data Webhouse design
• Provide status and lineage of current data
• Provide status of running reports
• Active notification
• Allow for entry of NA if data not available
• Time stamped dimensions
• Time stamped reports
Allow Problem Resolution
Website design
• Allow backtracking, rollback, play forward
• Keep old transactions
• Easy error reporting
• Acknowledge, track and follow up all user inputs,
show wait time
• Assist searching
Data Webhouse design
• Provide adequate end user support
• Show aggregates in use and available
• Show system load and percent completed
Build Trust
Clearly state and observe website’s
policies for using customer’s identity
Website design
• Do not abuse privacy
• Link to privacy statement
• Use friendly pictures of people
• Distinguish between ad content and
editorial content
Build Trust
Data Webhouse design
• Two-factor security
– What you know – password
– What you possess – token
• Track changes in employee and contractor
status
• Create and enforce roles for employees,
contractors and customers
• Manage webhouse security directly
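A two-factor check combines something the user knows with something the user possesses. The sketch below pairs a salted password hash with an HMAC-based one-time password in the style of RFC 4226, standing in for a hardware token. The choice of HOTP, the PBKDF2 iteration count, and the function names are assumptions for illustration, not the deck's prescription.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226 style, SHA-1)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted password hash; iteration count is an illustrative choice."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def two_factor_ok(password, salt, stored_hash, token, secret, counter):
    """True only if both the password and the token check out."""
    knows = hmac.compare_digest(hash_password(password, salt), stored_hash)
    possesses = hmac.compare_digest(token, hotp(secret, counter))
    return knows and possesses
```

A real deployment would also advance and resynchronize the counter and limit retries; those parts are omitted here.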
Provide Communication Hooks
Website design
• Provide useful links to others – internal and
external
• Remove links that invalidate the “back”
button
• Use copyable URLs
• Use URL as medium of distribution
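A copyable URL carries everything needed to reproduce a view, so it can be pasted into an email and opened by someone else. The sketch below encodes a report request into a query string and decodes it again; the host `webhouse.example` and the parameter names are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def report_url(base, report, **params):
    """Encode a report request as a self-contained, shareable URL."""
    query = urlencode({"report": report, **dict(sorted(params.items()))})
    return f"{base}?{query}"


def parse_report_url(url):
    """Recover the report parameters from a shared URL."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


url = report_url("https://webhouse.example/reports", "sales", region="East", year=2000)
```

Sorting the parameters keeps the URL stable for the same request, which also helps page and report caching.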
Advantages of Web Today (1998, with 2000 update)
• Immediate worldwide access
• Centralized management (2000: decentralized)
• Thin client
• Multi-platform (client and server) (2000: distributed)
• Little or no software distribution (2000: downloads)
A+
Disadvantages of Web Today (1998, with 2000 update)
• Immature technology (2000: teenager)
• Security (2000: solutions)
• Speed restricted by bandwidth - data
and logic must both travel across
internet
• Design limited to least common
denominator or access restricted to
specific browser
Vulnerabilities
Physical assets
Information assets
• theft
• modification
Software assets
Ability to conduct business
Web Architecture
[Diagram: tiers linked by a communication layer (network/internet)]
• Thin client: browser, applets/ActiveX, email, spreadsheet, word-processing
• Application / Internet server: analysis/statistics, graphics, report writer, SQL query, OLAP server
• Database servers: multidimensional database, summary/alternative relational tables, data warehouse (relational database)
Business Management through
Information
Analysis of historical records
• order processing, inventory levels,
shipments, receivables, customer history,
etc.
Goals include:
• Measures of efficiency
• Anticipate changes (planning and
forecasting)
• Make adjustments
• Integration of model and control function
Rule-Based Management
Create Strategic rules
• IF market demand increases
THEN implement marketing campaign
A3
• IF profit margin drops below value X
THEN adjust overhead by …
Must not forget alert rules
• If unanticipated condition, then notify CFO
Must not be too reactive
• would cause thrashing
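The strategic rules, the alert rule, and the warning about thrashing can be sketched together. In this illustration the margin rule uses a deadband: it switches on below the floor but switches off only once the margin has recovered past floor plus deadband, so small fluctuations do not flip it back and forth. The thresholds, metric names, and the choice to treat a missing metric as the "unanticipated condition" are all assumptions.

```python
def evaluate_rules(metrics, state, margin_floor=0.10, deadband=0.02):
    """Return the actions triggered by the current metrics.

    state["cutting_overhead"] remembers whether the margin rule is active,
    so the rule turns off only after margin exceeds floor + deadband.
    """
    actions = []

    # Strategic rule: IF market demand increases THEN run a campaign.
    if metrics.get("demand_change", 0) > 0:
        actions.append("implement marketing campaign")

    margin = metrics.get("profit_margin")
    if margin is None:
        # Alert rule: an unanticipated condition (here, a missing metric).
        actions.append("notify CFO")
    elif margin < margin_floor:
        state["cutting_overhead"] = True
    elif margin > margin_floor + deadband:
        state["cutting_overhead"] = False

    # Strategic rule with hysteresis: stays active inside the deadband.
    if state.get("cutting_overhead"):
        actions.append("adjust overhead")
    return actions
```

Calling the function repeatedly with the same `state` dictionary is what carries the on/off memory between evaluations.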
OLDM Decision Process
Simultaneous capture of:
• Decision support information
– Surveyed customer on-line in exchange for an
additional discount
• with business function inputs
Immediate computation or estimation of
secondary information
• based on planning and forecasting rules
Decision support information is:
• available on-line
• ready to use “as is”
Management Defined!
OLDM Decision Process
Derived data becomes control
information
Automation of analysis and decision
support
• immediately available to management
Problems documented on-line
Classes of problem and corrective
action codified
• problem recognition
• decision rules
OLDM Decision Process
Requires four types of information
• Characteristics which identify a class of
problem
• Corrective action (management responses
by problem class)
• Rules to implement actions
• Record of result
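The four types of information listed above map naturally onto a small record: a predicate that recognizes the problem class, the corrective action for that class, the rule that applies it, and a log of results. The sketch below is one possible shape; the class, field names, and the sample "late shipment" problem are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProblemClass:
    name: str
    matches: Callable[[dict], bool]   # characteristics identifying the class
    corrective_action: str            # management response for this class
    results: list = field(default_factory=list)  # record of results


def handle(event, problem_classes):
    """Rule: apply the first matching class's action and record the outcome."""
    for pc in problem_classes:
        if pc.matches(event):
            pc.results.append((event, pc.corrective_action))
            return pc.corrective_action
    return None  # no codified class matched; needs human attention


late_shipment = ProblemClass(
    name="late shipment",
    matches=lambda e: e.get("days_late", 0) > 3,
    corrective_action="expedite order",
)
```

Keeping the results alongside the rule is what lets management later review whether a codified response actually worked.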
Potential of OLDM
Better managed business
• knowledge asset capture and retention
• consistency across enterprise
• flexible, highly responsive
Close loop with customer
• event and market driven but controlled
Direct customer interaction
• via web, telephone, remote connection
Improved systems capacity planning and system
management
Re-alignment of business and IT