90 Days at Yahoo!


90 Days as an Intern at Yahoo! Lab
(6/1/2009~8/21/2009)
Chun-Sheng Chen
11/12/2009
Yahoo! Confidential
The Application
When to apply: late January ~ mid-March
How to apply: send your resume to Yahoo! through
• Internal reference
• Website: Yahoo! HotJobs
My case: I sent my resume both ways, and I got the interview opportunity through an internal referral (mid-March).
The Interviews
Two phone interviews for an internship position.
First interview (45 minutes):
• Questions covered all the past experience on my resume related to data mining/machine learning.
• Questions on fundamental data mining/machine learning knowledge: data cleaning, data preprocessing, feature extraction, feature selection, neural network design, decision trees…
Second interview, with the manager of the research group (30 minutes):
• Less technical questions
• Focus on my data mining/machine learning work experience:
–My previous projects
–The size of the largest dataset I had ever dealt with
–My programming skills (languages used, familiar OSes…)
I passed, and I went!
The Places!!
• Place I lived: East San Jose
• Place I worked: Yahoo! Lab
• Commute: 40 minutes each way by light rail (VTA)
The Yahoo!
• Theme color: purple
• My office: Great America campus, Santa Clara, CA (not the headquarters)
• My intern manager: Abraham Bagherjeiran
• Headquarters: Yahoo! Sunnyvale campus
• We sometimes went there for free food, talks (Jerry & David), and intern activities
The First Day
New hire orientation:
• Company Policies
• Corporate Environment, Buildings…
• Get my Badge
• Get my laptop (you can pick an HP Windows laptop or a MacBook)
• Set up accounts (email, intranet, VPN…)
• Yahoo perks
Meet my intern manager and coworkers in the afternoon.
Working Culture and Environment
Office Environment:
• One cubicle for each person
• Linux-based desktop computer for work
• Several computer clusters (thousands of nodes) are available for heavy computing tasks
The projects you can work on:
1. Projects assigned by supervisors
2. Apply to join an existing project; there are lots of data mining/machine learning projects inside Yahoo! Lab:
   1. Ad serving system (similar to a recommender system)
   2. Pricing (learning a good pricing structure for selling ads)
   3. Content filtering / categorization
3. Initiate a research project and work on it if it gets approved
4. Publishing papers and filing patent applications are encouraged
I normally got to the office before 9:00 am and left between 6:30 and 7:00 pm.
My Project: Contextual ads ranking using user profiles
Existing systems:
• Keystone: page → text ads
• Behavior Targeting (BT): user → display ads
Keystone and BT data are joined on the same bcookie and date.
This project:
• Keystone + user modeling: user profile, page, ads → ad click
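The slides say only that Keystone and BT data are joined by bcookie and date; they do not show the record formats. The sketch below assumes hypothetical record layouts (every field name is an assumption) and performs an in-memory inner join on the (bcookie, date) key, producing (user profile, page, ad, clicked) rows for the new model.

```python
from dataclasses import dataclass

# Hypothetical record layouts: the slides only state that Keystone pairs a page
# with text ads, BT pairs a user with display ads, and the two logs are joined
# by bcookie and date. Every field name below is an assumption.
@dataclass
class KeystoneEvent:
    bcookie: str
    date: str
    page: str
    ad: str
    clicked: bool

@dataclass
class BTProfile:
    bcookie: str
    date: str
    profile: dict  # e.g. {"sports": 0.9}; the real profile format is unknown

def join_on_bcookie_date(keystone_events, bt_profiles):
    """Inner-join Keystone events with BT profiles on (bcookie, date).

    Returns (user profile, page, ad, clicked) tuples, i.e. the inputs and
    label of a "user profile, page, ads -> ad click" model.
    """
    # Index BT profiles by the join key; assumes at most one profile per key
    # (a later duplicate would overwrite an earlier one).
    by_key = {(p.bcookie, p.date): p.profile for p in bt_profiles}
    joined = []
    for ev in keystone_events:
        profile = by_key.get((ev.bcookie, ev.date))
        if profile is not None:  # events without a matching profile are dropped
            joined.append((profile, ev.page, ev.ad, ev.clicked))
    return joined
```

Events with no matching profile are simply dropped here; the slides do not say how the real pipeline treated them.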
Experiment Design: Baseline Models (Maxent)
• Baseline features:
–Matching unigrams on the ad side and unigrams on the page side
–Matching bid phrases on the ad side and bid phrases on the page side
• A baseline model is trained on each of the following datasets, and each model is tested on each of the three datasets:
–Ksgrid5: Keystone data
–10%: 10% uniformly sampled Ksgrid5 data
–Bcookie: only the Ksgrid5 data that has a bcookie
Example (figure; callouts mark a unigram match and a bid term match between the page and the ad):
Page content
Yankees' Wang has shoulder surgery
The Associated Press
Posted: 07/29/2009 02:56:55 PM PDT
Updated: 07/29/2009 02:56:55 PM PDT
ST. PETERSBURG, Fla.—Yankees starting pitcher Chien-Ming Wang has had season-ending arthroscopic surgery on his injured right shoulder. Noted orthopedist Dr. James Andrews performed the procedure Wednesday to repair what Yankees manager Joe Girardi said was a tear in the capsule. The team is awaiting reports from the surgery before setting a potential timetable for Wang's return next year.
After winning 19 games in 2006 and 2007, Wang missed the final 3 1/2 months last season after injuring his right foot while running the bases in Houston. He was 1-6 with a 9.64 ERA in 12 games this season, after missing time from April 19 to May 21 with a hip injury.
Also, the Yankees obtained right-hander Jason Hirsh from Colorado for a player to be named, and pitcher Brett Tomko, designated for assignment on July 22, was released.
Ad content
Yankee’s ticket sale
Low price everyday
http://www.somesite.com
Ad bid terms: Baseball, Player, games
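The slides name the model (Maxent) and the two baseline feature families but not how they were implemented. As a minimal sketch, assuming sparse binary features (one per unigram shared by page and ad, one per bid phrase found on both sides), the code below builds those features and trains a binary maxent model, which is equivalent to logistic regression, by stochastic gradient ascent. The tokenizer, feature names, learning rate, epoch count, and L2 strength are all hypothetical choices.

```python
import math
import re

def tokens(text):
    """Lowercased alphanumeric unigrams (a deliberately simple, assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def baseline_features(page_text, page_bids, ad_text, ad_bids):
    """Sparse binary features: one per matching unigram, one per matching bid phrase."""
    feats = {"bias": 1.0}
    for word in tokens(page_text) & tokens(ad_text):
        feats["uni=" + word] = 1.0
    shared_bids = {b.lower() for b in page_bids} & {b.lower() for b in ad_bids}
    for phrase in shared_bids:
        feats["bid=" + phrase] = 1.0
    return feats

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, feats):
    """P(click | features) under a binary maxent (logistic regression) model."""
    return sigmoid(sum(weights.get(k, 0.0) * v for k, v in feats.items()))

def train_maxent(examples, epochs=50, lr=0.5, l2=1e-3):
    """Stochastic gradient ascent on the log-likelihood.

    The L2 penalty is applied lazily, only to features active in each example.
    """
    weights = {}
    for _ in range(epochs):
        for feats, clicked in examples:
            err = (1.0 if clicked else 0.0) - predict(weights, feats)
            for k, v in feats.items():
                w = weights.get(k, 0.0)
                weights[k] = w + lr * (err * v - l2 * w)
    return weights
```

For the experiment described above, the same model would be fit once per training set (Ksgrid5, 10%, Bcookie) and each fitted model scored on all three test sets.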
What I Got from the 90 Days
• New experience:
–The business nature of Yahoo!
–Learned about many different machine learning/data mining research projects inside Yahoo! Science Lab
–Talked to other researchers and got advice from them
–Attended some interesting talks by invited outside speakers
• Knowledge and skills:
–Parallel computing skills using Hadoop, the Pig scripting language, and the MapReduce framework
–Data processing on huge datasets (hundreds of gigabytes to several terabytes)
–Read a number of machine learning papers on internet data
• Free T-shirts, a jacket, cups, toys, and lots of coffee (avg. 2 lattes a day)
• People: interns and full-time researchers at Yahoo!
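The Hadoop/Pig jobs themselves are not shown in the slides. As a rough, single-process illustration of the MapReduce model (map, shuffle by key, reduce), the sketch below computes a click-through rate per ad from (ad, clicked) log records. The record format and function names are hypothetical; on a real cluster the framework would run the map and reduce functions in parallel and perform the shuffle itself (in Pig, the shuffle step corresponds to a GROUP ... BY).

```python
from collections import defaultdict

def map_phase(record):
    """Emit (ad_id, (impressions, clicks)) for one hypothetical log record."""
    ad_id, clicked = record
    yield ad_id, (1, 1 if clicked else 0)

def shuffle(mapped):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(ad_id, values):
    """Sum impressions and clicks for one ad and return its CTR."""
    impressions = sum(v[0] for v in values)
    clicks = sum(v[1] for v in values)
    return ad_id, clicks / impressions

def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Because each reduce call sees only the values for one key, the reducers for different ads are independent, which is what lets the real framework spread them across many nodes.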