Transcript Data Mining
Predicting Blood Donor Attrition
with Data Mining Methods
Xing Wan
Introduction
Blood supply in the U.S. is always
inadequate
Recruitment and retention of blood donors have
become a top priority
RESEARCH QUESTIONS
What kinds of blood donors are more likely to
drop out?
What critical factors may lead to donor
attrition?
Introduction
BRIEF LITERATURE REVIEW
Current situation of blood supply in the U.S.
Prior survey-based research found that some
important factors (such as donation experience
and a convenient donation place) may influence
a donor’s decision to return
A recent study on freshman student attrition
used data mining techniques
Datasets
Donation records
Disaster relief appeals and incentive info.
Possible survey data
Whole dataset for modeling
Methodology and Tools
CRISP-DM
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation (10-fold cross-validation approach)
Deployment
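The 10-fold cross-validation named in the Evaluation step can be sketched as below. This is a minimal illustration in Python rather than the SAS tools listed in the slides. The helper names, the majority-class baseline, and the toy data are all assumptions made for the example, not part of the study.

```python
import random

def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(rows, labels, fit, predict, k=10, seed=0):
    """Return the mean held-out accuracy over k folds."""
    folds = k_fold_indices(len(rows), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Hypothetical baseline model: always predict the majority class seen in training.
def fit_majority(train_rows, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, row):
    return model

# Made-up toy data: 70 dropouts (1) and 30 returning donors (0).
toy_rows = [[i] for i in range(100)]
toy_labels = [1] * 70 + [0] * 30
cv_accuracy = cross_validate(toy_rows, toy_labels, fit_majority, predict_majority)
```

Each donor is held out exactly once, so the averaged accuracy estimates how a model would perform on donors it was not trained on. Any classifier can be passed in through `fit` and `predict`.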
Tools:
SAS DM
SAS EG
Base SAS
Procedure
Data preprocessing
Dependent Variable
Target_dropout: coded as “1” if a donor did not
return after his/her first donation, “0” otherwise
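A minimal sketch of how the dependent variable could be derived from donation records. The record layout `(donor_id, donation_date)`, the function name, and the sample records are assumptions for illustration. The slides do not specify a follow-up window, so this sketch simply treats any donor with exactly one recorded donation as a dropout.

```python
from collections import Counter

def code_target_dropout(donation_records):
    """Map each donor_id to 1 if the donor has exactly one donation, else 0."""
    counts = Counter(donor_id for donor_id, _date in donation_records)
    return {donor_id: int(n == 1) for donor_id, n in counts.items()}

# Made-up records: (donor_id, donation_date)
records = [
    ("D001", "2009-01-15"),
    ("D002", "2009-02-03"),
    ("D001", "2009-06-20"),
    ("D003", "2009-03-11"),
]
target_dropout = code_target_dropout(records)
```

A real analysis would likely also need a cutoff date, so that donors whose first donation is very recent are not counted as dropouts prematurely.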
Modeling
Popular classification methods
Logistic Regression; Decision Tree; ANN; SVM
Ensemble techniques
Bagging, Boosting, and Information fusion
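One simple way to combine the outputs of several classifiers is sketched below: a weighted average of predicted dropout probabilities and a majority vote. This illustrates only a basic form of fusion, not bagging or boosting, and it is not necessarily the scheme used in the study. The function names, weights, and probabilities are made up for the example.

```python
def fuse_probabilities(model_probs, weights=None):
    """Weighted average of per-model dropout probabilities for one donor."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, model_probs)) / total

def majority_vote(model_labels):
    """Return 1 if more than half of the models predict dropout, else 0."""
    return int(sum(model_labels) * 2 > len(model_labels))

# Made-up outputs of four models (e.g. logistic regression, decision tree,
# ANN, SVM) for a single donor.
probs = [0.80, 0.60, 0.55, 0.30]
fused = fuse_probabilities(probs)                            # equal weights
weighted = fuse_probabilities(probs, [0.4, 0.3, 0.2, 0.1])   # assumed weights
vote = majority_vote([int(p >= 0.5) for p in probs])
```

In practice the weights might be chosen from each model's cross-validated performance, so that stronger models contribute more to the fused prediction.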
Procedure
Sensitivity analysis
Measures the importance of an independent
variable based on the change in modeling
performance that occurs if this variable is
not included in the model.
The greater the performance decrease, the
greater the ratio of importance.
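The leave-one-variable-out idea described above can be sketched as follows. A nearest-centroid classifier stands in for the actual models, and importance is reported as the raw accuracy decrease rather than a normalized ratio. The classifier, the variable names, and the toy data are all assumptions for illustration.

```python
def fit_centroids(rows, labels):
    """Hypothetical stand-in model: one mean vector (centroid) per class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in grouped.items()
    }

def predict_centroid(centroids, row):
    """Predict the class with the closest centroid (ties go to the smallest label)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(sorted(centroids), key=sq_dist)

def accuracy(rows, labels):
    """Accuracy of the stand-in model on the data it was fitted to."""
    centroids = fit_centroids(rows, labels)
    hits = sum(predict_centroid(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def sensitivity(rows, labels, names):
    """Accuracy decrease when each variable is left out of the model."""
    baseline = accuracy(rows, labels)
    drops = {}
    for j, name in enumerate(names):
        reduced = [row[:j] + row[j + 1:] for row in rows]
        drops[name] = baseline - accuracy(reduced, labels)
    return drops

# Made-up data: the first variable separates the classes, the second does not.
toy_rows = [[0, 0], [1, 1], [0, 1], [1, 0], [4, 0], [5, 1], [4, 1], [5, 0]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
importance_drop = sensitivity(toy_rows, toy_labels, ["distance_to_center", "noise"])
```

Here removing the informative variable costs the model half of its accuracy, while removing the uninformative one costs nothing. For brevity the sketch scores the model on its own training data; a fuller version would measure the decrease on held-out folds.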
Results
Identify potential dropouts by using
data mining models
Identify important independent
variables or predictors
Develop and deploy retention
strategies
References
1. Cheng, E., Chan, C., & Chau, M. (2010). Data Analysis for Healthcare: A
Case Study in Blood Donation Center Analysis. Americas Conference on
Information Systems.
2. Delen, D. (2010). A comparative analysis of machine learning techniques for
student retention management. Decision Support Systems, 49, 498-506.
3. Masser, B., White, K., Hyde, M., & Terry, D. (2008). The Psychology of Blood
Donation: Current Research and Future Directions. Transfusion Medicine
Reviews, 22(3), 215-233.
4. Saltelli, A. (2002). Making best use of model evaluations to compute
sensitivity indices. Computer Physics Communications, 145, 280-297.
References
5. Schlumpf, K., Glynn, S., Schreiber, G., & Wright, D. (2008). Factors
influencing donor return. Transfusion, 48, 264-272.
6. Schreiber, G., Sanchez, A., Glynn, S., & Wright, D. (2003). Increasing blood
availability by changing donation patterns. Transfusion, 43, 590-597.
7. SPSS. (2010). SPSS PASW Modeler (formerly Clementine) User Manual: A
Comprehensive Data Mining Toolkit.
8. Yu, P., Chung, K., Lin, C., Chan, J., & Lee, C. (2007). Predicting potential
drop-out and future commitment for first-time donors based on first 1.5-year
donation patterns: the case in Hong Kong Chinese donors. Vox Sanguinis, 93,
57–63.