Slide 1 - Tamkang University
1
Outline
Background
Lessons and challenges presented
Business-level
Technical-level (by data mining lifecycle stages)
Data collection
Data warehouse construction
Business intelligence
Deployment
2
Background
Blue Martini Software
From the beginning, significant consideration was given to data
transformation and analysis needs
Lessons from 1999-2003
More than 20 clients
Durations from a few person-weeks to several person-months
Some are available as case studies
Sources of data
Customer registration and demographic information, web click streams,
responses to direct mail (DM) and email campaigns, orders placed through a website,
call center, or in-store POS systems
A few thousand records to more than 100 million records
Collected from a few months to several years
3
Business lessons
By data mining lifecycle stages
Requirement gathering
Data collection
Data warehouse construction
Business intelligence
Deployment
4
Requirement gathering lessons
Clients are often reluctant to list specific business
questions
Whet the clients’ appetite by presenting preliminary findings
Push clients to ask characterization and strategic
questions
“What is the distribution of males/females among those
spending more than $500?”
“What characterizes people who spend more than $500?”
Challenges: developing methodology and best practices to
help business people define appropriate questions
5
Data collection
The system transparently collects
Every search and the number of results returned
Shopping cart events
Important events such as registration, initiation of
checkout, and order confirmation
Any form field failure
User’s local time zone, data for robot detection, color
depth, screen resolution
6
Data collection lessons
Collect the right data, up front
Integrate external events
7
Data warehouse construction
Lessons
Automatic generation of Decision Support System
database is appreciated
Challenges
Firewalls
Integration
8
Business intelligence lessons
Expect the operational channels to be higher
priority than decision support
Crawl, walk, run
Start from basic reporting
Train data analysts
Tell people the time, not how to build clocks
Define the terminology
Writing a good glossary and sharing the terms across
reports is important
9
Business intelligence challenges
Make it easier to map business questions to data
transformations
Automate feature construction
Build comprehensible models
Experiment because correlation does not imply
causality
Explain counter-intuitive insights
Assess the ROI (return on investment) of
insights
10
Deployment
Lessons
Share insights
Take action
Challenges
Have transformed data available for scoring
11
Technical details (1) - Data definition, collection, and preparation
Data collection and management
Data cleansing
Data processing
12
Data collection and management
Lessons
Collect data at the right abstraction levels
Design forms with data mining in mind
Validate forms to ease data cleansing and analysis
Determine thresholds based on careful data analysis
Example: session timeout
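
The session timeout example can be made concrete with a small sketch. It splits one visitor's requests into sessions wherever the gap between consecutive requests exceeds a threshold. The function name `sessionize`, the 30-minute default, and the sample timestamps are illustrative assumptions, not values taken from the slides; the point is that the chosen threshold changes the session counts, so it should come from analysis of the actual gap distribution.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout=timedelta(minutes=30)):
    """Group one visitor's request timestamps into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout`. The 30-minute default is an assumption.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too large: start a new session
    return sessions


# Hypothetical click times for a single visitor.
clicks = [
    datetime(2003, 1, 1, 9, 0),
    datetime(2003, 1, 1, 9, 10),
    datetime(2003, 1, 1, 9, 50),  # 40 minutes after the previous click
    datetime(2003, 1, 1, 9, 55),
]

sessions_30 = sessionize(clicks)                         # 30-minute timeout
sessions_60 = sessionize(clicks, timedelta(minutes=60))  # 60-minute timeout
print(len(sessions_30), len(sessions_60))
```

With the same clicks, the shorter timeout yields two sessions and the longer one yields a single session, which is why the threshold deserves careful analysis.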
13
Data collection and management
Challenges
Sample at collection
Support slowly changing dimensions
Perform data warehouse updates effectively
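
One common way to support slowly changing dimensions is to keep every version of a row with validity dates instead of overwriting it (often called a "type 2" approach). The slides do not say which approach was used, so the following is only a sketch under that assumption; `apply_change` and the field names are hypothetical.

```python
from datetime import date


def apply_change(history, customer_id, new_attrs, effective):
    """Record a new version of a customer row without losing the old one.

    `history` is a list of dicts with keys customer_id, attrs,
    valid_from, valid_to (None marks the current version).
    """
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["attrs"] == new_attrs:
                return history            # nothing changed, keep as is
            row["valid_to"] = effective   # close the current version
            break
    history.append({
        "customer_id": customer_id,
        "attrs": dict(new_attrs),
        "valid_from": effective,
        "valid_to": None,
    })
    return history


# Hypothetical customer who moves between two cities.
history = []
apply_change(history, 1, {"city": "Taipei"}, date(2001, 1, 1))
apply_change(history, 1, {"city": "Tamsui"}, date(2002, 6, 1))
apply_change(history, 1, {"city": "Tamsui"}, date(2003, 1, 1))  # no-op
print(len(history))
```

Keeping the closed versions lets an analysis join each order to the attribute values that were true when the order was placed.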
14
Data cleansing
Lessons
Audit the data
15
Data cleansing
Challenges
Detect bots
Between 5% and 40% of visits are due to bots
Perform regular de-duping of customers and
accounts
many-to-many relationship
16
Data processing
Lessons
Support hierarchical attributes
Handle cyclical attributes
Support rich data transformations
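
Cyclical attributes such as hour of day or day of week wrap around, so treating them as plain numbers makes 23:00 and 01:00 look far apart. A minimal sketch of two ways to handle this follows; the sine/cosine encoding and the helper names are assumptions chosen for illustration, not a technique named in the slides.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def cyclical_distance(a, b, period):
    """Shortest distance between two values on a cycle of length `period`."""
    d = abs(a - b) % period
    return min(d, period - d)


def euclidean(p, q):
    """Straight-line distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hours 23 and 1 are two hours apart on the clock, not 22.
gap_hours = cyclical_distance(23, 1, 24)

# In the encoded space, 23:00 is closer to 00:00 than 12:00 is.
near = euclidean(encode_cyclical(23, 24), encode_cyclical(0, 24))
far = euclidean(encode_cyclical(12, 24), encode_cyclical(0, 24))
print(gap_hours, near < far)
```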
17
Data processing
Challenges
Support hierarchical attributes
Handle “unknown” and “not applicable” attribute
values
NULL
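
A single NULL cannot distinguish a value that exists but was not collected ("unknown") from a value that has no meaning for the record ("not applicable"). The sketch below keeps the two apart with explicit markers; the `Missing` enum, the `summarize_missing` helper, and the "age of youngest child" example are hypothetical illustrations.

```python
from enum import Enum


class Missing(Enum):
    UNKNOWN = "unknown"                # value exists but was not collected
    NOT_APPLICABLE = "not applicable"  # value has no meaning for this record


def summarize_missing(values):
    """Count known, unknown, and not-applicable entries separately."""
    counts = {"known": 0, "unknown": 0, "not_applicable": 0}
    for v in values:
        if v is Missing.UNKNOWN:
            counts["unknown"] += 1
        elif v is Missing.NOT_APPLICABLE:
            counts["not_applicable"] += 1
        else:
            counts["known"] += 1
    return counts


# Hypothetical attribute: age of the customer's youngest child.
# Customers without children are NOT_APPLICABLE; customers who skipped
# the question are UNKNOWN. A plain NULL would merge the two groups.
youngest_child_age = [4, Missing.NOT_APPLICABLE, Missing.UNKNOWN, 11,
                      Missing.NOT_APPLICABLE]
counts = summarize_missing(youngest_child_age)
print(counts)
```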
18
Technical details (2) - Analysis
Understanding and enriching the data
Building models and identifying insights
Deploying models, acting upon the insights, and
closing the loop
Empowering business users to conduct their
own analysis
19
Understanding and enriching the
data
Lessons
Statistics
Distributions, min, max, mean, number of NULL and
non-NULL
Weighted average
Visualization
Line chart, bar chart, scatter plot, heatmap, filter chart
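
The basic statistics listed above can be sketched in a few lines. The helper names and sample order amounts are assumptions for illustration; `None` stands in for a database NULL.

```python
from statistics import mean


def summarize(values):
    """Return NULL/non-NULL counts plus min, max, and mean of present values."""
    present = [v for v in values if v is not None]
    return {
        "null": len(values) - len(present),
        "non_null": len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def weighted_average(values, weights):
    """Average of `values` where each value counts `weight` times."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical order amounts with two missing entries.
stats = summarize([20, None, 50, 80, None])

# Hypothetical average order value per segment, weighted by segment size.
avg = weighted_average([10, 20], [1, 3])
print(stats, avg)
```

The weighted average matters when statistics are computed over pre-aggregated rows: averaging the segment averages without weights would give 15 here instead of 17.5.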
20
Building models and identifying insights
Lessons
Mine data at the right granularity levels
Handle leaks in predictive models
Leaks are attributes highly correlated with the target but
not useful in practice as good predictors
Improve scalability
Build simple models first
Use data mining suites
Peel the onion and validate results
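
One simple screen for leaks, sketched under stated assumptions, is to flag attributes whose correlation with the target is suspiciously high and then review them by hand. The threshold of 0.95, the helper names, and the sample columns are hypothetical; a high correlation is only a hint that an attribute may be derived from the target, not proof.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def suspect_leaks(columns, target, threshold=0.95):
    """Names of attributes whose |correlation| with the target is too high."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical data: target is 1 for heavy spenders, 0 otherwise.
target = [1, 0, 1, 0, 1, 0]
columns = {
    "tax_paid": [50, 2, 45, 3, 60, 1],  # derived from the amount spent
    "visits": [3, 4, 2, 5, 4, 3],
}
leaks = suspect_leaks(columns, target)
print(leaks)
```

An attribute like the hypothetical `tax_paid` predicts heavy spenders almost perfectly, yet it is only known after the purchase, so it is of no use for prediction in practice.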
21
Sharing insights, deploying models,
and closing the loop
Lessons
Represent models visually for better insights
Understand the importance of the deployment
context
Create actionable models and close the loop
22
Empowering business users to
conduct their own analysis
Lessons
Share the results among business users via simple,
easy-to-understand reports
Provide canned reports that can be run by business
users by simply specifying values for a few
parameters
Technically savvy business users might be
comfortable designing their own investigations
provided a simple user interface
23
Empowering business users to
conduct their own analysis
Challenges
Visualize models
Prune rules and associations
Analyze and measure long-term impact of changes
24
Summary
Top three lessons
Integrate data collection into operations to support analytics
and experimentation
Do not confuse yourself with the target user
Provide simple reports and visualizations before building
more complex models
Top three challenges
The ability to translate business questions to the desired data
transformations
Efficient algorithms whose output is comprehensible for
business insight, and which can handle multiple data types
Integrated workflow
25