Yandex Ad Data - Yury Lifshits
Mining for High-CTR Subsets in
the Yandex Ad Data
Peter Sadowski, Yuri Lifshits
Data Set
• 26,000,000 ad events
• Click-through rate (CTR) = .0088
• 250,000 unique user ids
• 430,000 unique ad ids
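As a minimal sketch of how these summary figures could be derived from a raw event log: the `AdEvent` record layout and the `summarize` helper below are hypothetical, since the slides do not describe the actual log schema. CTR is taken as clicks divided by ad events.

```python
from collections import namedtuple

# Hypothetical record layout; the real Yandex log schema is not given in the slides.
AdEvent = namedtuple("AdEvent", ["user_id", "ad_id", "clicked"])


def summarize(events):
    """Return event count, CTR, and unique user/ad counts for a list of events."""
    clicks = sum(1 for e in events if e.clicked)
    return {
        "events": len(events),
        "ctr": clicks / len(events) if events else 0.0,
        "users": len({e.user_id for e in events}),
        "ads": len({e.ad_id for e in events}),
    }


# Toy data for illustration only (not taken from the data set).
toy_events = [
    AdEvent("u1", "a1", True),
    AdEvent("u1", "a2", False),
    AdEvent("u2", "a1", False),
    AdEvent("u3", "a3", False),
]
stats = summarize(toy_events)
print(stats)
```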
#-of-Clicks Distribution
[Figure: histogram of # of Users (y-axis, ×10^5) against # of Clicks (x-axis); users with >20 clicks were grouped together in the last bin]
Average # of clicks per user: 0.9004
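A histogram of this kind could be built roughly as follows. This is only a sketch: the function names and the input format (one id per click event, plus the full id list so that zero-click users are counted) are assumptions. Counts above `cap` are pooled into a single bin, mirroring the ">20 clicks" grouping in the figure.

```python
from collections import Counter


def click_histogram(clicked_ids, all_ids, cap=20):
    """Map number-of-clicks -> number of ids; counts above `cap` share bin cap + 1."""
    per_id = Counter(clicked_ids)
    return Counter(min(per_id.get(i, 0), cap + 1) for i in set(all_ids))


def mean_clicks(clicked_ids, all_ids):
    """Average number of clicks per id, including ids with zero clicks."""
    return len(clicked_ids) / len(set(all_ids))


# Toy data for illustration only: u1 clicked 25 times, u2 twice, u3 once, u4 never.
clicked = ["u1"] * 25 + ["u2"] * 2 + ["u3"]
everyone = ["u1", "u2", "u3", "u4"]
hist = click_histogram(clicked, everyone)
avg = mean_clicks(clicked, everyone)
print(sorted(hist.items()), avg)
```

The same two functions apply unchanged to ad ids in place of user ids.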
#-of-Clicks Distribution
[Figure: histogram of # of ads (y-axis, ×10^5) against # of Clicks (x-axis); ads with >20 clicks were grouped together in the last bin]
Average # of clicks per ad: 0.5267
Largest Connected Component
(edge = click event)
• This gets rid of users and ads that are of little use to us
• 54,000 users; 56,000 ads
• 12,000,000 events (~50% of total)
• 97% of total clicks
• CTR = .0179
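One way to extract such a component, sketched under assumptions: events are `(user_id, ad_id, clicked)` tuples, users and ads are the two node types of a bipartite graph, and only click events create edges. The function name and the breadth-first search are illustrative choices, not necessarily what the authors used.

```python
from collections import defaultdict, deque


def largest_click_component(events):
    """Return (users, ads) of the largest connected component of the click graph."""
    adj = defaultdict(set)
    for user, ad, clicked in events:
        if clicked:  # edge = click event
            adj[("u", user)].add(("a", ad))
            adj[("a", ad)].add(("u", user))

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over one component.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp

    users = {i for kind, i in best if kind == "u"}
    ads = {i for kind, i in best if kind == "a"}
    return users, ads


# Toy data for illustration only.
events = [
    ("u1", "a1", True), ("u2", "a1", True), ("u2", "a2", True),
    ("u3", "a3", True), ("u4", "a1", False), ("u1", "a2", False),
]
users, ads = largest_click_component(events)
# Keep every event (clicked or not) whose user and ad both lie in the component.
kept = [e for e in events if e[0] in users and e[1] in ads]
component_ctr = sum(1 for e in kept if e[2]) / len(kept)
print(users, ads, component_ctr)
```

Users and ads with no clicks never enter the graph, so they drop out automatically, while non-click events between surviving users and ads are retained for the CTR calculation.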
Approach to Presenting Ads
Goal: Match ads to users so that the probability of a click is high.
Approach (find the best matches first): Find subsets of ads and users that have high CTR. Ads within such a subset can then be matched to users who liked the rest of the ads in that subset.
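The matching step could look something like the sketch below. It assumes that "liked" means "clicked", that a high-CTR subset of ads has already been found, and that a user qualifies after clicking at least `min_overlap` ads of the subset. The function name and the threshold are hypothetical and do not come from the slides.

```python
def suggest_from_subset(subset_ads, clicks, min_overlap=2):
    """clicks: set of (user_id, ad_id). Return {user: subset ads not yet clicked}."""
    liked = {}
    for user, ad in clicks:
        if ad in subset_ads:
            liked.setdefault(user, set()).add(ad)
    return {
        user: sorted(subset_ads - ads)
        for user, ads in liked.items()
        # Skip users with too little overlap or with nothing left to suggest.
        if len(ads) >= min_overlap and subset_ads - ads
    }


# Toy data for illustration only.
subset = {"a1", "a2", "a3"}
clicks = {
    ("u1", "a1"), ("u1", "a2"),                 # clicked two of three: suggest a3
    ("u2", "a3"),                               # too little overlap
    ("u3", "a1"), ("u3", "a2"), ("u3", "a3"),   # already clicked everything
}
suggestions = suggest_from_subset(subset, clicks)
print(suggestions)
```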
Mining for High-CTR Subsets
Goal: Find subsets of ads and users that have high CTR.
Method: Find a good subset, remove its ads, and repeat.
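The slides do not say how a "good subset" is found, so the sketch below substitutes a simple greedy peeling heuristic as an assumption: starting from all users and ads, repeatedly drop the member with the fewest clicks inside the current block until `size` users and `size` ads remain. Only the outer loop (find a subset, remove its ads, repeat) follows the stated method. All function names are hypothetical.

```python
from collections import Counter


def block_ctr(events, users, ads):
    """CTR over the events whose user and ad both lie in the block."""
    shown = clicked = 0
    for u, a, c in events:
        if u in users and a in ads:
            shown += 1
            clicked += c
    return clicked / shown if shown else 0.0


def peel(events, users, ads, size):
    """Assumed heuristic: drop the lowest-scoring member until both sides have `size`."""
    users, ads = set(users), set(ads)  # work on copies
    while len(users) > size or len(ads) > size:
        score = Counter()
        for u, a, c in events:
            if c and u in users and a in ads:
                score[("u", u)] += 1
                score[("a", a)] += 1
        candidates = [("u", u) for u in users if len(users) > size]
        candidates += [("a", a) for a in ads if len(ads) > size]
        # Ties are broken by id so that the result is deterministic.
        kind, worst = min(candidates, key=lambda n: (score[n], n))
        (users if kind == "u" else ads).discard(worst)
    return users, ads


def mine_subsets(events, size, rounds):
    """Find a subset, remove its ads, repeat (up to `rounds` times)."""
    users = {u for u, _, _ in events}
    ads = {a for _, a, _ in events}
    found = []
    for _ in range(rounds):
        if len(users) < size or len(ads) < size:
            break
        sub_users, sub_ads = peel(events, users, ads, size)
        found.append((sub_users, sub_ads, block_ctr(events, sub_users, sub_ads)))
        ads -= sub_ads  # remove those ads before the next round
    return found


# Toy data for illustration only: (user_id, ad_id, clicked as 1 or 0).
events = [
    ("u1", "a1", 1), ("u1", "a2", 1), ("u2", "a1", 1), ("u2", "a2", 1),
    ("u3", "a3", 1), ("u3", "a4", 1), ("u4", "a3", 1), ("u4", "a4", 0),
    ("u1", "a3", 0), ("u3", "a1", 0),
]
found = mine_subsets(events, size=2, rounds=2)
for sub_users, sub_ads, ctr in found:
    print(sorted(sub_users), sorted(sub_ads), ctr)
```

Because only the ads are removed between rounds, a user can appear in more than one subset, which matches the wording "remove those ads".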
Results:
• Subsets of 100 with CTR of .5, compared to ~.03 for these users and ~.015 for ads (both measured within the connected component).
• Subsets of 20 with CTR of .5
6 Subsets of 100
30 Subsets of 20
Concerns