Slide 1 - Tamkang University


1
Outline


Background
Lessons and challenges presented
Business-level
 Technical-level (by data mining lifecycle stages)

Data collection
 Data warehouse construction
 Business intelligence
 Deployment

2
Background

Blue Martini Software


From the beginning, significant consideration was given to data
transformation and analysis needs
Lessons from 1999-2003




More than 20 clients
Durations from a few person-weeks to several person-months
Some are available as case studies
Sources of data



Customer registration and demographic information, web click streams,
responses to direct mail (DM) and email campaigns, orders placed through a website,
call center, or in-store POS systems
A few thousand records to more than 100 million records
Collected from a few months to several years
3
Business lessons

By data mining lifecycle stages
Requirement gathering
 Data collection
 Data warehouse construction
 Business intelligence
 Deployment

4
Requirement gathering lessons

Clients are often reluctant to list specific business
questions


Whet the clients’ appetite by presenting preliminary findings
Push clients to ask characterization and strategic
questions



“What is the distribution of males/females among those
spending more than $500?”
“What characterizes people who spend more than $500?”
Challenges: developing methodology and best practices to
help business people define appropriate questions
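The first example question above is a plain reporting query, which is why clients are pushed toward the broader characterization question. A minimal sketch of the reporting version follows; the records and the `gender_distribution` helper are made up for illustration and are not from the slides.

```python
from collections import Counter

# Hypothetical customer records: (customer_id, gender, total_spend_usd).
customers = [
    (1, "F", 720.0),
    (2, "M", 150.0),
    (3, "F", 510.0),
    (4, "M", 980.0),
    (5, "F", 45.0),
]

def gender_distribution(records, min_spend=500.0):
    """Return the share of each gender among customers spending more than min_spend."""
    heavy = [gender for _, gender, spend in records if spend > min_spend]
    counts = Counter(heavy)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}

distribution = gender_distribution(customers)
print(distribution)
```

The characterization question would instead look across all available attributes for those that separate heavy spenders from everyone else, which a single filter like this cannot do.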
5
Data collection

The system transparently collects
Every search and the number of results returned
 Shopping cart events
 Important events such as registration, initiation of
checkout, and order confirmation
 Any form field failure
 User’s local time zone, data for robot detection, color
depth, screen resolution

6
Data collection lessons


Collect the right data, up front
Integrate external events
7
Data warehouse construction

Lessons


Automatic generation of the decision support system (DSS)
database is appreciated
Challenges
Firewalls
 Integration

8
Business intelligence lessons


Expect the operational channels to be higher
priority than decision support
Crawl, walk, run




Start from basic reporting
Train data analysts
Tell people the time, not how to build clocks
Define the terminology

Writing a good glossary and sharing the terms across
reports is important
9
Business intelligence challenges






Make it easier to map business questions to data
transformations
Automate feature construction
Build comprehensible models
Experiment because correlation does not imply
causality
Explain counter-intuitive insights
Assess the ROI (return on investment) of
insights
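The point about experimenting can be made concrete with a controlled experiment: visitors are randomly assigned to a control or a treatment, and the conversion rates are compared. The sketch below uses a standard two-proportion z-test; the counts and the function name are assumptions for illustration, not figures from the slides.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate between two groups."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err

# Made-up counts: visitors randomly assigned to control (A) or treatment (B).
z = two_proportion_z(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
significant = abs(z) > 1.96  # roughly the 5% two-sided level
print(round(z, 2), significant)
```

Because assignment is random, a significant difference can be attributed to the change itself rather than to some hidden correlated factor.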
10
Deployment

Lessons
Share insights
 Take action


Challenges

Have transformed data available for scoring
11
Technical details (1) data definition, collection, and preparation



Data collection and management
Data cleansing
Data processing
12
Data collection and management

Lessons
Collect data at the right abstraction levels
 Design forms with data mining in mind
 Validate forms to ease data cleansing and analysis
 Determine thresholds based on careful data analysis


Example: session timeout
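A minimal sketch of how a session timeout threshold splits a visitor's clicks into sessions. The 30-minute default is only a commonly used starting value and an assumption here; the lesson above is that the threshold should come from analysis of the actual data.

```python
from datetime import datetime, timedelta

def sessionize(timestamps, timeout=timedelta(minutes=30)):
    """Group one visitor's click timestamps into sessions.

    A new session starts whenever the gap since the previous click
    exceeds `timeout`.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

# Made-up clicks with gaps of 10, 55, and 15 minutes.
clicks = [
    datetime(2003, 1, 1, 10, 0),
    datetime(2003, 1, 1, 10, 10),
    datetime(2003, 1, 1, 11, 5),
    datetime(2003, 1, 1, 11, 20),
]
sessions_30 = sessionize(clicks)
sessions_60 = sessionize(clicks, timeout=timedelta(minutes=60))
print(len(sessions_30), len(sessions_60))
```

The same clicks produce two sessions with a 30-minute timeout and one with a 60-minute timeout, so every per-session metric depends on this choice.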
13
Data collection and management

Challenges
Sample at collection
 Support slowly changing dimensions
 Perform data warehouse updates effectively
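One common way to support slowly changing dimensions is the "Type 2" approach from dimensional modeling: keep every version of a row with validity dates instead of overwriting it. The slides do not say which approach was used, so the sketch below is an assumption, with hypothetical field names.

```python
from datetime import date

def apply_scd2(history, customer_id, new_city, change_date):
    """Record an attribute change as a Type 2 slowly changing dimension.

    Instead of overwriting, close the current row and append a new one,
    so past facts still join to the attribute value valid at that time.
    """
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["city"] == new_city:
                return history  # nothing changed
            row["valid_to"] = change_date
    history.append({
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
    })
    return history

history = [{"customer_id": 7, "city": "Taipei",
            "valid_from": date(2001, 1, 1), "valid_to": None}]
apply_scd2(history, 7, "Tamsui", date(2002, 6, 1))
print(history)
```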

14
Data cleansing

Lessons

Audit the data
15
Data cleansing

Challenges

Detect bots


Between 5% and 40% of visits are due to bots
Perform regular de-duping of customers and
accounts

many-to-many relationship
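Bot detection, listed as a challenge above, usually starts with simple heuristics. The sketch below flags a session by user-agent substring or by an implausibly high click rate. The markers and the threshold are hypothetical, and real bots often evade rules this simple, which is part of why this remains a challenge.

```python
# Hypothetical markers and threshold; real values need careful data analysis.
BOT_AGENT_MARKERS = ("bot", "crawler", "spider")
MAX_HUMAN_CLICKS_PER_MINUTE = 60

def looks_like_bot(user_agent, clicks, duration_seconds):
    """Flag a session as a likely bot using two simple heuristics."""
    agent = user_agent.lower()
    if any(marker in agent for marker in BOT_AGENT_MARKERS):
        return True
    minutes = max(duration_seconds, 1) / 60
    return clicks / minutes > MAX_HUMAN_CLICKS_PER_MINUTE

# Made-up sessions: (user agent, number of clicks, duration in seconds).
sessions = [
    ("Mozilla/4.0 (compatible; MSIE 6.0)", 12, 300),
    ("ExampleCrawler/1.0", 40, 600),
    ("Mozilla/4.0 (compatible; MSIE 6.0)", 900, 120),
]
flags = [looks_like_bot(*s) for s in sessions]
print(flags)
```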
16
Data processing

Lessons
Support hierarchical attributes
 Handle cyclical attributes
 Support rich data transformations
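Cyclical attributes such as hour of day or day of week wrap around, so 23:00 and 00:00 are close even though 23 and 0 are far apart numerically. Two common ways to handle this are a circular distance and a sine/cosine encoding. The sketch below shows both; it is an illustration of the general idea and not necessarily the method the authors used.

```python
import math

def encode_cyclical(value, period):
    """Map a cyclical value (e.g. hour 0-23) to a point on the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def circular_distance(a, b, period):
    """Shortest distance between two values on a cycle of length `period`."""
    diff = abs(a - b) % period
    return min(diff, period - diff)

late = encode_cyclical(23, 24)
early = encode_cyclical(0, 24)
noon = encode_cyclical(12, 24)

gap = circular_distance(23, 0, 24)  # one hour apart on the clock
near = math.dist(late, early)       # encoded 23:00 vs 00:00
far = math.dist(noon, early)        # encoded 12:00 vs 00:00
print(gap, near < far)
```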

17
Data processing

Challenges
Support hierarchical attributes
 Handle “unknown” and “not applicable” attribute
values


NULL
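A single NULL cannot distinguish "unknown" (a value exists but was not captured) from "not applicable" (no value can exist). One possible way to keep the two apart is an explicit marker type, sketched below with a hypothetical spouse-age attribute.

```python
from enum import Enum

class Missing(Enum):
    UNKNOWN = "unknown"                # a value exists but was not captured
    NOT_APPLICABLE = "not applicable"  # no value can exist for this record

def describe_spouse_age(marital_status, spouse_age):
    """Return the spouse age, or the reason it is missing (hypothetical example)."""
    if spouse_age is not None:
        return spouse_age
    if marital_status == "single":
        return Missing.NOT_APPLICABLE
    return Missing.UNKNOWN

results = [
    describe_spouse_age("married", 41),
    describe_spouse_age("married", None),
    describe_spouse_age("single", None),
]
print(results)
```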
18
Technical details (2) - Analysis




Understanding and enriching the data
Building models and identifying insights
Deploying models, acting upon the insights, and
closing the loop
Empowering business users to conduct their
own analysis
19
Understanding and enriching the
data

Lessons

Statistics

Distributions, min, max, mean, number of NULL and
non-NULL
Weighted average
 Visualization


Line chart, bar chart, scatter plot, heatmap, filter chart
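The statistics listed above are cheap to compute and are a good first audit of any column. A minimal sketch with made-up data and hypothetical helper names:

```python
def summarize(values):
    """Basic column statistics, counting None as NULL."""
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "non_nulls": len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": sum(present) / len(present) if present else None,
    }

def weighted_average(values, weights):
    """Average of `values` where each value counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

order_totals = [20.0, None, 50.0, 80.0, None]  # made-up column
stats = summarize(order_totals)

# Average order value per segment, weighted by number of orders (made-up data).
overall = weighted_average([30.0, 60.0], [300, 100])
print(stats, overall)
```

The weighted average matters when combining per-segment averages: the unweighted mean of 30 and 60 is 45, but with 300 and 100 orders respectively the overall average is 37.5.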
20
Building models and identifying insights

Lessons
Mine data at the right granularity levels
 Handle leaks in predictive models


Leaks are attributes highly correlated with the target but
not useful in practice as good predictors
Improve scalability
 Build simple models first
 Use data mining suites
 Peel the onion and validate results
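Leaks, as defined above, often show up as attributes that correlate almost perfectly with the target. A simple screening heuristic is to flag such attributes for human review. In the sketch below the 0.95 threshold, the column names, and the data are all assumptions; a flagged attribute is only a suspect, not a confirmed leak.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)

def suspect_leaks(columns, target, threshold=0.95):
    """Names of attributes whose |correlation| with the target is suspiciously high."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Made-up data: target is "customer spent more than $500".
target = [1, 0, 1, 0, 1, 0]
columns = {
    "tax_paid": [60, 5, 72, 8, 55, 3],  # only known after the purchase
    "visits": [3, 4, 2, 5, 4, 3],
}
suspects = suspect_leaks(columns, target)
print(suspects)
```

Here `tax_paid` is flagged because it is a consequence of the purchase, so it would not be available when the model is used to predict.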

21
Sharing insights, deploying models,
and closing the loop

Lessons
Represent models visually for better insights
 Understand the importance of the deployment
context
 Create actionable models and close the loop

22
Empowering business users to
conduct their own analysis

Lessons
Share the results among business users via simple,
easy-to-understand reports
 Provide canned reports that can be run by business
users by simply specifying values for a few
parameters
 Technically savvy business users might be
comfortable designing their own investigations
if given a simple user interface

23
Empowering business users to
conduct their own analysis

Challenges
Visualize models
 Prune rules and associations
 Analyze and measure long-term impact of changes

24
Summary

Top three lessons




Integrate data collection into operations to support analytics
and experimentation
Do not confuse yourself with the target user
Provide simple reports and visualizations before building
more complex models
Top three challenges



The ability to translate business questions to the desired data
transformations
Efficient algorithms whose output is comprehensible for
business insight, and which can handle multiple data types
Integrated workflow
25