Agogo.Davidgbmx - JMP User Community

A Jump Into the Android App Store: What Makes a Best-Selling App?
David Agogo
University of Massachusetts, Amherst
Abstract
Within the past decade, the explosive rise in smartphone use has led to the establishment of a multi-billion-dollar mobile app ecosystem with the app store at its center. Using data mining approaches, an archive of information on 1.1 million apps is probed for characteristics of top-selling apps. From analyses of subsets of the data archive, paid and free apps of above-average popularity are predicted with 64% accuracy using bootstrap forests. The analysis also yields useful insights into the most successful app categories and how best to interpret review ratings.
Comments on Methods
• Data mining methods consistently outperformed parametric statistics (OLS)
• Results held up well in the hold-out validation sample
[Figure: R-Square of Data Mining Methods, training vs. validation, for OLS, CHAID/CART, Bootstrap Forest and Neural Net]
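The chart compares R-square for each method on the training and hold-out validation data. As a minimal sketch of the metric itself (the `r_square` helper and the toy values are illustrative assumptions, not the author's code or data), R-square is one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r_square(actual, predicted):
    """R-square = 1 - SS_res / SS_tot (hypothetical helper, not from the study)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_y = sum(actual) / len(actual)
    ss_tot = sum((y - mean_y) ** 2 for y in actual)  # spread around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # prediction errors
    if ss_tot == 0:
        raise ValueError("R-square is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Toy values, made up for illustration
actual = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 4.0, 4.5, 6.0]
print(round(r_square(actual, predicted), 3))  # 0.9
```

Computing it separately on the training rows and on the validation rows gives the two bars per method; a validation value far below the training value is the usual sign of overfitting.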
Objectives
• Discover characteristics common to apps with the highest number of installs
• Discover which app categories are most popular after controlling for other variables
• Discover the differences between the worlds of paid and free apps
Data
• 1.1 million rows of app metadata crawled from the Google Play Store
• Analysis performed on two randomly selected subsets:
  • N1 = 27,981 randomly selected rows
  • N2 = 30,000 paid apps only
• 70:30 split between training and hold-out validation data
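One way to reproduce this kind of sampling and 70:30 split outside JMP is sketched below, assuming the rows are held in a Python sequence; `sample_and_split` and its parameters are hypothetical names, not the author's procedure. For the N2 subset, the rows would first be filtered to paid apps before sampling.

```python
import random


def sample_and_split(rows, n, train_frac=0.7, seed=0):
    """Draw n rows at random, then split them into training and hold-out validation sets."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    subset = rng.sample(rows, n)    # simple random sample without replacement
    cut = round(train_frac * n)     # first 70% of the shuffled sample goes to training
    return subset[:cut], subset[cut:]


# Stand-in for the 1.1 million crawled rows: just row indices
train, valid = sample_and_split(range(1_100_000), 27_981)
print(len(train), len(valid))  # 19587 8394
```

Because `random.sample` returns the chosen rows in random order, slicing the result is enough to produce a random training/validation partition.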
Methods
• Ordinary Least Squares, CHAID/CART Decision Trees,
Bootstrap Forest [best performing], Neural Net
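JMP's Bootstrap Forest combines many decision trees, each grown on a bootstrap sample of the training rows with a random subset of predictors considered at each split. The sketch below illustrates only that idea and is a simplified stand-in, not JMP's implementation: it uses depth-one trees (stumps) and a majority vote, and `fit_stump`, `fit_forest`, `predict` and the toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-one tree: the (feature, threshold) split with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        return (features[0], float("inf"), majority(y), majority(y))
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample with a random predictor subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), n_features)    # random subset of predictors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the stumps."""
    return majority(l if row[f] <= t else r for f, t, l, r in forest)


# Toy data: feature 0 separates the classes, feature 1 is noise
X = [[i, (7 * i) % 5] for i in range(20)]
y = [int(i >= 10) for i in range(20)]
forest = fit_forest(X, y, n_trees=25, n_features=2, seed=1)
print(predict(forest, [0, 0]), predict(forest, [19, 3]))
```

Averaging over many bootstrap-trained trees is what lets such a forest reduce the variance of a single tree, which is one plausible reason it outperformed the single CHAID/CART tree here.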
Predictors
• Derived Predictors
  • Average App Rating (5-point scale)
  • Proportion of Reviewers Who Loved It (5/5)
  • Proportion of Reviewers Who Hated It (1/5)
  • Presence in Other App Stores
  • Number of Weeks in App Store
  • Estimated Number of Release Cycles
  • Earliest Compatible Android Version
• Original Predictors
  • App Category
  • Created by Top Developer
  • App Price
  • App Size
  • Presence of In-App Purchases
• Note: the model predictors performed worse in the analysis of paid apps only.
Read Full Paper
www.agogodavid.com/JMP2015
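The rating-based derived predictors can be computed from the distribution of star ratings. A minimal sketch, assuming each app's ratings are available as counts per star level and its release and crawl dates as `date` objects (the function names and example numbers are hypothetical). The example shows one plausible reason to track the loved/hated proportions alongside the average: two apps can share a 3.0 average while having very different reviews.

```python
from datetime import date


def rating_features(star_counts):
    """Derived rating predictors from a {stars: count} mapping (stars 1..5)."""
    total = sum(star_counts.values())
    if total == 0:
        return {"avg_rating": None, "prop_loved": None, "prop_hated": None}
    return {
        "avg_rating": sum(k * n for k, n in star_counts.items()) / total,
        "prop_loved": star_counts.get(5, 0) / total,  # share of 5/5 ratings
        "prop_hated": star_counts.get(1, 0) / total,  # share of 1/5 ratings
    }


def weeks_in_store(release_date, crawl_date):
    """Whole weeks between release and the date the metadata was crawled."""
    return (crawl_date - release_date).days // 7


# Two hypothetical apps with the same 3.0 average but very different reviews
polarized = rating_features({1: 50, 5: 50})
lukewarm = rating_features({3: 100})
print(polarized)
print(lukewarm)
print(weeks_in_store(date(2015, 1, 1), date(2015, 3, 1)))  # 8
```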
Findings
• Data mining methods consistently outperform parametric
statistics (OLS)
• Results generally hold up well in the hold-out validation sample
• Model predictors perform poorly at predicting the number of installs of paid apps
Other Highlights
• Median and Mean Success: both fall in the 1,000 to 5,000 installations range
• Largest Effects: IsTopDeveloper, IsFree, OtherStores
• Best Categories for Top Developers: Weather, Productivity, Transportation and Brain Apps; Worst Categories: Sports, News and Magazines, Travel and Local
• Categories with Highest Installs of Free Apps: Racing, Communication, Sports Games, Music and Audio, Brain Apps; Lowest Installs: News and Magazines, Travel and Local, Sports, Entertainment and Medical Apps
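Assuming the crawled install field holds Play Store range strings such as "1,000 - 5,000" (an assumption about the data format), the sketch below turns them into a binary "above-average popularity" target of the kind mentioned in the Abstract. Because the median and mean fall in the same bracket here, it thresholds at the median bracket; that cut-off and all function names are hypothetical, not the author's documented definition. The `accuracy` helper shows how a figure such as 64% would be computed on validation rows.

```python
from statistics import median_low


def bracket_floor(label):
    """Lower bound of an install range, e.g. '1,000 - 5,000' or '1,000+' -> 1000."""
    return int(label.split("-")[0].strip().replace(",", "").rstrip("+"))


def above_median_labels(labels):
    """1 if an app's install bracket lies above the median bracket, else 0."""
    floors = [bracket_floor(lab) for lab in labels]
    cut = median_low(floors)  # median_low returns an actual bracket, not a midpoint
    return [int(f > cut) for f in floors]


def accuracy(y_true, y_pred):
    """Share of apps whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical install fields for five apps
installs = ["100 - 500", "1,000 - 5,000", "1,000 - 5,000", "10,000 - 50,000", "500 - 1,000"]
labels = above_median_labels(installs)
print(labels)                             # [0, 0, 0, 1, 0]
print(accuracy(labels, [0, 1, 0, 1, 0]))  # 0.8
```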