Steven F. Ashby Center for Applied Scientific Computing Month DD


Lecture Notes 1
Introduction
ZHANGXI LIN
ISQS 7339
TEXAS TECH UNIVERSITY
ISQS 7342-001, Business Analytics
What is different from the data mining course?
 Problem solving oriented
 More detailed algorithms and methods
 SAS EM 5.x
 Data mining + optimization
Case 1 – Impacts of Disaster on the Virtual Community
 Research question:
 How did the Sichuan earthquake impact the Chinese people? A view into the online community
 Hypotheses
 Themes in virtual communities (VCs) are diversified and dynamic
 A major disaster will have significant impacts on VCs
 VCs will focus on similar themes for a period of time after a disaster
 Methodology
 Text mining
 3.5 GB of text data collected from a popular Chinese chat room
Source: The New York Times
Sichuan - Chengdu
Population: 7 million
One town was completely destroyed
Keywords Used in a Chinese Online Chat Room
 Before 5/12
 China, me, world, country, society, Beijing, children, job, government, Olympic Games, time, company, job
 On 5/12
 Earthquake, China, me, happen, Sichuan, country, job, time, situation, government, today, life
 Within a month after 5/12
 Earthquake, affected region, people, happen, Wenchuan, disaster, hope, donation, government, lives
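The keyword lists above could come from a simple frequency ranking per period. The following is a minimal sketch under stated assumptions, not the method used in the study: the posts are invented English stand-ins, `STOP_WORDS` and `top_keywords` are hypothetical names, and whitespace splitting replaces the word segmentation that real Chinese text would need.

```python
from collections import Counter

# Hypothetical stop-word list; a real study would use a much fuller one.
STOP_WORDS = {"the", "a", "in", "of", "and", "is", "to", "for"}

def top_keywords(posts, k=3):
    """Return the k most frequent non-stop-word tokens across the posts."""
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            token = token.strip(".,!?")
            if token and token not in STOP_WORDS:
                counts[token] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]

# Invented toy posts, grouped by period as on the slide.
posts_by_period = {
    "before": ["China and the world", "My job in Beijing", "China job market"],
    "after": [
        "Earthquake in Sichuan",
        "Donation for earthquake victims",
        "Hope for Sichuan",
    ],
}

for period, posts in posts_by_period.items():
    print(period, top_keywords(posts))
```

Ranking by raw count is the simplest choice; weighting schemes such as TF-IDF would down-weight words like "China" that are frequent in every period.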
The Dynamic of Daily Themes
Using that of 4/24 as the Benchmark
[Figure: line chart of the daily theme measure relative to the 4/24 benchmark, weekly ticks from 04/24/08 to 07/10/08, vertical axis from 50 to 140, with a 4-period moving average of the series]
Theme Distances from Different Benchmark Days
 5/12, when the earthquake happened
 5/19, the national memorial day for the victims of the earthquake
 At 2 p.m. on 5/19, search engine traffic in China was zero.
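The slides do not say how the distance between a day's themes and the benchmark day is measured. As a hedged illustration only, the sketch below assumes cosine distance between daily keyword-count vectors and adds a trailing 4-period moving average like the one in the chart legend. The counts and the names `cosine_distance` and `moving_average` are invented for this example.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """1 - cosine similarity between two keyword-count mappings."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty day as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def moving_average(values, window=4):
    """Trailing moving average; the first window-1 positions are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

# Invented daily keyword counts (assumption: not data from the study).
daily_counts = {
    "04/24": Counter({"china": 5, "job": 3, "olympic": 2}),
    "05/08": Counter({"china": 4, "job": 4, "beijing": 2}),
    "05/12": Counter({"earthquake": 9, "sichuan": 5, "china": 2}),
    "05/19": Counter({"earthquake": 7, "donation": 4, "hope": 3}),
}

benchmark = daily_counts["04/24"]
distances = {day: cosine_distance(benchmark, c) for day, c in daily_counts.items()}
smoothed = moving_average(list(distances.values()))

for day, dist in distances.items():
    print(day, round(dist, 3))
```

Changing `benchmark` to the counts for 5/12 or 5/19 gives the distances from the other benchmark days named above.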
Case 2 – Location Optimization of Multiple Data Backup Centers
 Problem
 M regional financial service branches (RFSBs) are distributed across different cities in a country
 N data backup centers are to be built to serve these branches, at sites selected from N0 possible locations
 There are K kinds of disasters, each with a different level of risk (expressed as a probability) in each of T time periods in a year; the risks vary from location to location
 Objective
 Choose N locations for the data backup centers serving the M RFSBs so as to minimize the risk probability in each time period.
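For small N0, the selection can be explored by brute-force enumeration. The sketch below rests on several assumptions that the slide does not state: disaster kinds and locations are treated as independent, "risk" for a set of centers means the probability that all of them are hit in the same period, the per-period objectives are collapsed into the worst period, and the assignment of the M branches to centers is ignored. All names and probabilities are hypothetical.

```python
from itertools import combinations
from math import prod

def location_risk(p_kinds):
    """Probability that at least one disaster kind hits, assuming independence."""
    return 1.0 - prod(1.0 - p for p in p_kinds)

def worst_period_risk(chosen, risk):
    """Largest per-period probability that every chosen center is hit at once."""
    periods = len(next(iter(risk.values())))
    return max(
        prod(location_risk(risk[loc][t]) for loc in chosen)
        for t in range(periods)
    )

def choose_centers(risk, n):
    """Enumerate all n-subsets of candidate locations and keep the safest one."""
    return min(
        combinations(sorted(risk), n),
        key=lambda chosen: worst_period_risk(chosen, risk),
    )

# risk[location][t][k]: invented probability of disaster kind k in period t.
# Here N0 = 3 candidate locations, T = 2 periods, K = 2 disaster kinds.
risk = {
    "A": [[0.10, 0.05], [0.02, 0.01]],
    "B": [[0.01, 0.02], [0.20, 0.10]],
    "C": [[0.10, 0.10], [0.10, 0.10]],
}

best = choose_centers(risk, 2)
print(best, worst_period_risk(best, risk))
```

Enumeration examines every combination of N sites out of N0, so it only suits small instances; larger ones are typically posed as integer programs and handed to a mathematical programming solver.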
Courseware
SAS Course Notes
Textbooks
Decision Tree
Decision Tree
AAEM (Ch1, 2)
SAS EM 5.x
DMDT
CRM
Clustering
Mathematical Programming
PMADV
OROPT
SAS Programming