RapStar’s Solution to Data Mining Hackathon on Best Buy

RapStar’s Solution to Data Mining
Hackathon on Best Buy Mobile Site
Kingsfield, Dragon
Beat Benchmark
• Naive Bayes
– We want to know the probability $p(i \mid c)$ that a user
clicks SKU $i$ under context $c$.
– We use the query $q$ as the context first.
– So we have:
• $p(i \mid c) \propto p(i) \times \prod_{w_k \in q} p(w_k \mid i)$
– Select the 5 items with the highest predicted probability as
the prediction.
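A minimal sketch of the Naive Bayes ranking above, assuming the training data is a list of (query, sku) click pairs. The function names, whitespace tokenization, and add-alpha (Laplace) smoothing are assumptions for illustration, not details taken from the slides.

```python
import math
from collections import Counter, defaultdict

def train(clicks):
    """clicks: iterable of (query, sku) pairs. Returns the count tables."""
    sku_counts = Counter()              # how often each SKU was clicked -> p(i)
    word_counts = defaultdict(Counter)  # word counts per SKU -> p(w_k | i)
    vocab = set()
    for query, sku in clicks:
        sku_counts[sku] += 1
        for w in query.lower().split():
            word_counts[sku][w] += 1
            vocab.add(w)
    return sku_counts, word_counts, vocab

def predict(query, sku_counts, word_counts, vocab, k=5, alpha=1.0):
    """Return the k SKUs with the highest log p(i) + sum_k log p(w_k | i)."""
    total = sum(sku_counts.values())
    scores = {}
    for sku, n in sku_counts.items():
        # Add-alpha smoothing (an assumption) keeps unseen words from zeroing the score.
        denom = sum(word_counts[sku].values()) + alpha * len(vocab)
        score = math.log(n / total)
        for w in query.lower().split():
            score += math.log((word_counts[sku][w] + alpha) / denom)
        scores[sku] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Working in log space avoids underflow when a query has many words; the ranking is the same as with the raw product.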
Use Time information
• Time is a good feature in data mining.
• Divide the data into 12 time periods based on the
click_time field.
• Use the SKU frequency within the time period that
click_time falls in as the "prior", instead of the
global frequency.
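A rough sketch of a per-period prior, assuming click_time has been parsed into datetime objects and that the 12 periods are equal-width buckets between a start and an end time. The slides do not say how the periods were cut, so the bucketing and the function names here are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

def period_index(ts, start, end, n_periods=12):
    """Map a timestamp to one of n_periods equal-width buckets in [start, end]."""
    span = (end - start).total_seconds()
    frac = (ts - start).total_seconds() / span
    # Clamp so that ts == end (or anything outside the range) still gets a valid bucket.
    return min(n_periods - 1, max(0, int(frac * n_periods)))

def period_priors(clicks, start, end, n_periods=12):
    """clicks: iterable of (click_time, sku). Returns {period: {sku: p(sku | period)}}."""
    counts = defaultdict(Counter)
    for ts, sku in clicks:
        counts[period_index(ts, start, end, n_periods)][sku] += 1
    return {
        period: {sku: n / sum(c.values()) for sku, n in c.items()}
        for period, c in counts.items()
    }
```

At prediction time, the prior for a test click would be looked up by the period of its own click_time and used in place of the global $p(i)$.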
• Smooth data
Unigram to Bigram
• Likelihood of Naive Bayes:
– $p(q \mid i) = \prod_{w_k \in q} p(w_k \mid i)$
• Here $w_k$ is a word.
• Use bigrams instead of unigrams (words).
– Use query "xbox call of duty"
– Rerank the words (alphabetical order in this example): "call duty of xbox"
– Bigram: ["call duty", "call of", "call xbox", … "of xbox"]
– Once we have bigram training data, the rest is the same as
unigram.
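The bigram construction above can be sketched as follows, assuming that "rerank" means sorting the query words and that every pair of words in sorted order counts as one bigram, as the example suggests. Lowercasing and whitespace splitting are additional assumptions.

```python
from itertools import combinations

def query_bigrams(query):
    """Sort the query words, then emit every ordered pair as a bigram token."""
    words = sorted(query.lower().split())
    # combinations() keeps the sorted order, so "call duty" appears but never "duty call".
    return [f"{a} {b}" for a, b in combinations(words, 2)]
```

Sorting first makes the bigrams independent of word order, so "xbox call of duty" and "call of duty xbox" produce the same features. Each bigram string can then be treated exactly like a word in the unigram model.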
• Blending unigram and bigram:
– $p = w_1 \times p_{\text{unigram}} + w_2 \times p_{\text{bigram}}$
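A small sketch of the blend, assuming each model returns a dict of per-SKU scores on a comparable scale. The default weights of 0.5 are placeholders; the slides do not give the values that were used.

```python
def blend(p_unigram, p_bigram, w1=0.5, w2=0.5):
    """Weighted sum of two {sku: score} dicts; a SKU missing from one model scores 0 there."""
    skus = set(p_unigram) | set(p_bigram)
    return {s: w1 * p_unigram.get(s, 0.0) + w2 * p_bigram.get(s, 0.0) for s in skus}

def top_k(scores, k=5):
    """Return the k SKUs with the highest blended score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In practice the weights would be tuned on held-out clicks, for example by trying a few values of w1 with w2 = 1 - w1.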
Data Processing
• The most important part: Query Correction
– Lemmatization
– Split words and numbers
– Query correction (in small version)
• A lot of things could still help to improve:
– "x box", "x men"
– New algorithm for query correction
• Rank items the user has already clicked lower in the predictions.
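Two of the steps above are simple enough to sketch: splitting words from numbers, and pushing down items the user has already clicked. Both functions are hypothetical illustrations. The regular expressions assume lowercase ASCII queries, and the demotion rule (move already-clicked SKUs to the end of the list) is one reading of the last bullet, not a confirmed detail.

```python
import re

def split_words_and_numbers(query):
    """Insert a space at every letter/digit boundary, e.g. 'xbox360' -> 'xbox 360'."""
    q = query.lower()
    q = re.sub(r"([a-z])(\d)", r"\1 \2", q)  # letter followed by digit
    q = re.sub(r"(\d)([a-z])", r"\1 \2", q)  # digit followed by letter
    return " ".join(q.split())               # collapse repeated whitespace

def demote_clicked(ranked_skus, already_clicked):
    """Move SKUs the user has already clicked to the end, keeping relative order."""
    fresh = [s for s in ranked_skus if s not in already_clicked]
    seen = [s for s in ranked_skus if s in already_clicked]
    return fresh + seen
```

Note that the split is aggressive: a token such as "4s" also becomes "4 s", which may or may not be wanted for product names.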
Conclusion
• Data preprocessing and feature engineering
are the most important things.