Transcript Slide 1

Haas Career Management Center
Database
IEOR 115
Ken Goldberg
Final Presentation
December, 10, 2011
Samuel Lin | AlliZheng | Tao Luo | Yorke Lee | Carney Lu
Justin Pao | Butian Li | AdibKashem
Overview
Background
Haas School of Business Career
Management Group
Provides candidates access to employers
Organizes on campus career events
Handles 1000+ electronic job postings per year
Ranked #4 by BusinessWeek survey
Maintains relationships with hiring managers and
recruiters to track industry firm nights, career panels,
corporate visits, alumni mixers and job fairs.
Plans workshops, panels, networking events and
receptions
Background
Existing Database Support
Excel
Data stored in unrelated Excel spreadsheets
Custom reports formulated manually by
tracking across many spreadsheets
•Extremely tedious and time consuming
•Possibility of Human Errors
Project Objective
Clients Expectation
Ability to build customized reports instantly
Ability to search for records with specific
properties
Ability to come up with intelligent
recommendations for events, job listings etc. for
students.
Reduce time and stress involved in data entry and
management
Reduce human errors
Project Objective
Database Objective
To provide ranking metrics to magazines such as US
News
To provide all kinds of employment reports including,
but not limited to:
Top Full-Time Employers
Class-wise Full-Time Employment Report
Employment By Industry/ Function
Advance functions
Salary Forecast
Class preferences based to career inclination
Search alumni nearby
EER Diagram
EER Diagram
Relational Schema
Relational Schema
1. Person(pid,fname, lname, mi, street_address, apt_no, zip_code4, salutation, gender)
a.
Student(sid,pid1, date_of_admission, date_of_graduation, department7)
b. Professor(faculty_id,pid1)
c.
Supervisor(pid1, cid2)
d. Staff(SSN,pid1)
e. Advisor(SSN, pid1, department7)
f.
Philanthrophist(pid1)
g.
Potential_applicant(pid1, sourced_through)
h.
Subscriber(pid1)
2. Company (cid, name, address, liaison_name, liaison_tel, liaison_email)
2a. Branch (cid2,branch no., address)
3. University (uid, name, university_location4, liaison_name,liaison_tel, liaison_email)
4. Location (zipcode, country, state, city, latitude, longitude)
5. Job_Listing (listingid, title, description, job_location4, uploaded_by_company2,
salary_range)
6. Job_Offer (sid1, company_id2, position, job_location4, salary, bonus,
stock_options, date, accepted_or_declined)
7. Department (did, name, head_of_department1b)
8. Course_instance (ccn, course_no8a, semester, instructor1b)
8a. Course(course_no, title, department7)
Relational Schema
9. Magazine_Publication (mid, name, address, liaison_name, liaison_tel,
liaison_email)
10. Data_Request (drid, date, magazine_or_publication9, data_requested,
handled_by_staff1d, deadline, fulfillment_status)
11. Ranking (rid, magazine_or_publication9, date, ranking_details)
12. Social_Media_Account (acc_id, profile_link, staff_in_charge1d, acc_email,
acc_password)
13. Social_Media_Subscription(pid1, social_media_acc_id12, profile_email)
14. Announcement (aid, date, title, description, posted_by_staff1d,
posted_on_account12)
15. Event (eid, created_by_staff1d, held_at_venue17, caterer_id16, title, date)
a.
Workshop (eid15, advisor_in_charge1e)
b. Recruitment_Event (eid15, advisor_in_charge1e)
c.
Alumni_Mixer (eid15)
d. Outreach_Event (eid15, hosted_by_university3)
16. Caterer (caterer_id, name, address, liaison_name, liaison_tel, liaison_email)
17. Venue (vid, building, room, contact_name, contact_tel, contact_email)
18. Donation (donation_id, donation_date, donation_amount, donated_by1,
received_by1dspecial_notes)
Relational Schema
•These are junction tables to help describe the n to nrelationships
19. Working_Experience(sid1a, company_worked_for2, type_of_employment,
start_date, end_date)
20. Studied_At(sid1a, school_attended3, degree_type, field, start_date, end_date,
GPA)
21. Supervisor_Recommendation(sid1a, supervisor_pid1, cid2, date, letter)
22. Professor_Recommendation(sid1a, professor_pid1, date, letter)
23. Event_Attendance(eid15, attendee_pid1)
24. Professor_In_Department(prof_pid1b, department7 )
25. Event_Invite(eid15, cid2)
26. Advisor_Hosts_Event(pid1,eid15)
27. Venue_Booking(staff_in_charge1d, booking_for_event15, date_reserved,
venue_booked17)
28. Caterer_Order(staff_in_charge1d, caterer_id16, date_of_order)
29. Applies_For(sid, listingid, date, application)
30. Student Takes Course in Semester Course_Taken(sid1, ccn8,grade)
31. Professor Teaches Course in Semester Course_Taught(prof_pid1b, ccn8)
32. Email_Address(pid1, email_address)
33. Contact_Number(pid1, contact_no)
34. Student_In_Department(sid1, department7)
35. Advisor_In_Department(advisor_id1e, department7)
Normalization Analysis
Normalization
Unnormalized Relation
1.
Person (pid,fname, lname, mi, street_address, apt_no, city, state,
country, zip_code, contact_no, email, salutation, gender)
1a. Student(sid,pid1, date_of_admission, date_of_graduation,
department)
1e. Advisor(SSN, pid1, department7)
1NF
Person (pid,fname, lname, mi, street_address, apt_no, zip_code4,
salutation, gender)
1a.
Student(sid,pid1, date_of_admission, date_of_graduation)
1e.Advisor(advisor_id,SSN, pid1)
32. Email_Address(pid1, email_address)
33. Contact_Number(pid1, contact_no)
24. Student_In_Department(sid1, department7)
25. Advisor_In_Department(advisor_id1e, department7)
1.
Normalization
Unnormalized Relation
1. Person(pid,fname, lname, mi, street_address, apt_no,
city, state, country, zip_code, contact_no, email, salutation,
gender)
2NF
1. Person (pid,fname, lname, mi, street_address, apt_no,
zip_code4, salutation, gender)
4. Location (zipcode, country, state, city, latitude, longitude)
3NF (Not Implemented)
4. Location (zipcode, country, state, city)
5. City_Location(Longitude,Latitude, City)
Normalization
Unnormalized Relation
8. Course (course_id, title, department7, semester,
instructor1b)
BCNF
8. Course_instance (ccn, course_no8a, semester,
instructor1b)
8a. Course(course_no, title, department7)
Queries
Queries
Query 1: Salary forecast
Description:
•Given a selected industry, predict the expected average salary for a
new job position within that industry over the next X years
•Useful to help job-seeking graduates easily visualize the salary trends
for each industry
Queries
Java Code:
•Connects to Access database and executes the sql query:
select o.year_offered, avg(o.salary)
from [Job Offer] as o, Company as c, Department as i
where o.status = 'Accepted' and o.cid = c.cid and c.iid = <iid>
group by o.year_offered
order by o.year_offered ASC;
•This extracts the average salary for each year for the given
industry.
• The Java code performs a linear regression in order to determine
the linear relationship between the year and expected salary, then
creates a new table and inserts those predicted values (over the
next X years) into the database
Queries
Demo
Queries
Query 2: Nearby alumni
Description:
•Given a selected city, select all the alumni who are currently working within X
km from that city
•Used for current students to seek out seniors in the region (for advice, meet-ups),
or to invite alumni to attend events that are being held in a certain city
Queries
VBA Code:
• Contains function needed to determine distance between two geographical
coordinates
• Iterates through location database and determines which cities are within x
km of given city using the haversine formula:
• Creates and populates NearbyCities table with list of cities
To retrieve all alumni working in nearby cities:
SELECT s.SID, c.city, p.First_Name, p.Last_Name, o.position
FROM [Job Offer] AS o, nearbyCities AS c, Student AS s, Person AS p
WHERE o.sid=s.SID And o.job_location=c.LID And p.pid=s.pid;
Queries
Demo
Queries
Query 3: Donation Capacity
Description
•Ranks alumni by their donation capacity
•The donation capacity of each alumni is calculated by a weighted
combination of several factors: salary, years out of school, donation history
•The weights of these factors were researched and simplified from research
papers) to result in the following formula:
Donation_Capacity = 0.26 * salary/10000 + 0.17 * years_out_of_school +
0.64 * sum_of_previous_donations/5000
Queries
Retrieve the last recorded salary of each student
SQL:
SELECT s.sid, o.salary
FROM student AS s, [Job Offer] AS o
WHERE o.sid = s.sid and o.status = 'Accepted' and o.[date accepted] in
(
select max(o2.[date accepted])
from [Job Offer] o2
where s.sid = o2.sid
);
Retrieve the sum of all donations made by each student
SQL:
SELECT s.pid, Sum(d.donation_amount) AS total_donations
FROM student AS s, donation AS d
WHERE d.donated_by=s.pid
GROUP BY s.pid;
Queries
SQL:
SELECT DISTINCT s.sid, p.First_name, p.Last_Name,
((0.26*ls.salary)/10000+(0.64*ds.total_donations)/5000+0.17*((Date()-s.[date of
graduation])/365)) AS Donation_Capacity
FROM person AS p, student AS s, donation AS d,
(SELECT s2.pid, Sum(d2.donation_amount) AS total_donations
FROM student AS s2, donation AS d2
WHERE d2.donated_by=s2.pid
GROUP BY s2.pid) AS ds,
(SELECT s3.sid, o3.salary
FROM student AS s3, [Job Offer] AS o3
WHERE o3.sid = s3.sid and o3.status = 'Accepted' and o3.date accepted in
( select max(o2.[date accepted])
from [Job Offer] o2
where s3.sid = o2.sid )) AS ls
WHERE s.sid=ls.sid And s.pid=p.pid And s.pid=ds.pid
ORDER BY ((0.26*ls.salary)/10000+(0.64*ds.total_donations)/5000+0.17*((Date()-s.[date of
graduation])/365)) DESC;
Queries
Query 3: Donation Capacity
Queries
Query 4: Magazine Influence
Description
•Rank each business magazine/publication rankings by their
influence on the number of job offers our graduates get
•Help the career management center prioritize data requests from
each magazine/publication
•Perform a linear regression to obtain the coefficient of
determination for each magazine
Queries
Query 4: Magazine Influence
Java Code
•Connects to Access database and executes the dynamically created SQL
query for each magazine in the table:
SELECT Count(*) AS Offer_Count, r.ranking, o.year_offered
FROM [Job Offer] AS o, ranking AS r, [Magazine Publication] AS mp
WHERE (((r.magazine_or_publication)=mp.mid) And ((o.year_offered)=r.year)
And ((mp.mid)=<selected mid>))
GROUP BY o.year_offered, r.ranking;
•This extracts a pairing of the number of offers received by graduates
together with how highly we were ranked by that magazine that year. With
these data pairs
•Performs a regression to calculate the correlation factor between number
of job offers received and our ranking in the magazine
•Writes these correlation factors back into the database under a table
called Influence
Queries
Query 4: Magazine Influence
SQL Code to retrieve magazines sorted by influence:
SELECT i.mid, m.name, i.influence_factor
FROM [Magazine Publication] AS m, Influence AS i
WHERE m.mid=i.mid
ORDER BY i.influence_factor DESC;
Queries
Query 5: Course Recommendations
Description
•Suggest recommended courses a student should take if he wants to enter
a specific industry
•Ranks each class by the number of times they were enrolled in by
students who end up receiving offers from companies in a particular
industry
SQL
SELECT c.course_id, c.title, Count(*) AS course_count
FROM course AS c, student AS s, [Job Offer] AS o, course_taken AS ct, company
AS com
WHERE o.sid=s.sid And ct.sid=s.sid And ct.course_id=c.course_id And
o.cid=com.cid And com.iid= <selected industry>
GROUP BY c.course_id, c.title
ORDER BY Count(*) DESC;
Queries
Query 5: Course Recommendations
to ask
Q
and