Transcript Slide 1
Haas Career Management Center
Database
IEOR 115
Ken Goldberg
Final Presentation
December, 10, 2011
Samuel Lin | AlliZheng | Tao Luo | Yorke Lee | Carney Lu
Justin Pao | Butian Li | AdibKashem
Overview
Background
Haas School of Business Career
Management Group
Provides candidates access to employers
Organizes on campus career events
Handles 1000+ electronic job postings per year
Ranked #4 by BusinessWeek survey
Maintains relationships with hiring managers and
recruiters to track industry firm nights, career panels,
corporate visits, alumni mixers and job fairs.
Plans workshops, panels, networking events and
receptions
Background
Existing Database Support
Excel
Data stored in unrelated Excel spreadsheets
Custom reports formulated manually by
tracking across many spreadsheets
•Extremely tedious and time consuming
•Possibility of Human Errors
Project Objective
Clients Expectation
Ability to build customized reports instantly
Ability to search for records with specific
properties
Ability to come up with intelligent
recommendations for events, job listings etc. for
students.
Reduce time and stress involved in data entry and
management
Reduce human errors
Project Objective
Database Objective
To provide ranking metrics to magazines such as US
News
To provide all kinds of employment reports including,
but not limited to:
Top Full-Time Employers
Class-wise Full-Time Employment Report
Employment By Industry/ Function
Advance functions
Salary Forecast
Class preferences based to career inclination
Search alumni nearby
EER Diagram
EER Diagram
Relational Schema
Relational Schema
1. Person(pid,fname, lname, mi, street_address, apt_no, zip_code4, salutation, gender)
a.
Student(sid,pid1, date_of_admission, date_of_graduation, department7)
b. Professor(faculty_id,pid1)
c.
Supervisor(pid1, cid2)
d. Staff(SSN,pid1)
e. Advisor(SSN, pid1, department7)
f.
Philanthrophist(pid1)
g.
Potential_applicant(pid1, sourced_through)
h.
Subscriber(pid1)
2. Company (cid, name, address, liaison_name, liaison_tel, liaison_email)
2a. Branch (cid2,branch no., address)
3. University (uid, name, university_location4, liaison_name,liaison_tel, liaison_email)
4. Location (zipcode, country, state, city, latitude, longitude)
5. Job_Listing (listingid, title, description, job_location4, uploaded_by_company2,
salary_range)
6. Job_Offer (sid1, company_id2, position, job_location4, salary, bonus,
stock_options, date, accepted_or_declined)
7. Department (did, name, head_of_department1b)
8. Course_instance (ccn, course_no8a, semester, instructor1b)
8a. Course(course_no, title, department7)
Relational Schema
9. Magazine_Publication (mid, name, address, liaison_name, liaison_tel,
liaison_email)
10. Data_Request (drid, date, magazine_or_publication9, data_requested,
handled_by_staff1d, deadline, fulfillment_status)
11. Ranking (rid, magazine_or_publication9, date, ranking_details)
12. Social_Media_Account (acc_id, profile_link, staff_in_charge1d, acc_email,
acc_password)
13. Social_Media_Subscription(pid1, social_media_acc_id12, profile_email)
14. Announcement (aid, date, title, description, posted_by_staff1d,
posted_on_account12)
15. Event (eid, created_by_staff1d, held_at_venue17, caterer_id16, title, date)
a.
Workshop (eid15, advisor_in_charge1e)
b. Recruitment_Event (eid15, advisor_in_charge1e)
c.
Alumni_Mixer (eid15)
d. Outreach_Event (eid15, hosted_by_university3)
16. Caterer (caterer_id, name, address, liaison_name, liaison_tel, liaison_email)
17. Venue (vid, building, room, contact_name, contact_tel, contact_email)
18. Donation (donation_id, donation_date, donation_amount, donated_by1,
received_by1dspecial_notes)
Relational Schema
•These are junction tables to help describe the n to nrelationships
19. Working_Experience(sid1a, company_worked_for2, type_of_employment,
start_date, end_date)
20. Studied_At(sid1a, school_attended3, degree_type, field, start_date, end_date,
GPA)
21. Supervisor_Recommendation(sid1a, supervisor_pid1, cid2, date, letter)
22. Professor_Recommendation(sid1a, professor_pid1, date, letter)
23. Event_Attendance(eid15, attendee_pid1)
24. Professor_In_Department(prof_pid1b, department7 )
25. Event_Invite(eid15, cid2)
26. Advisor_Hosts_Event(pid1,eid15)
27. Venue_Booking(staff_in_charge1d, booking_for_event15, date_reserved,
venue_booked17)
28. Caterer_Order(staff_in_charge1d, caterer_id16, date_of_order)
29. Applies_For(sid, listingid, date, application)
30. Student Takes Course in Semester Course_Taken(sid1, ccn8,grade)
31. Professor Teaches Course in Semester Course_Taught(prof_pid1b, ccn8)
32. Email_Address(pid1, email_address)
33. Contact_Number(pid1, contact_no)
34. Student_In_Department(sid1, department7)
35. Advisor_In_Department(advisor_id1e, department7)
Normalization Analysis
Normalization
Unnormalized Relation
1.
Person (pid,fname, lname, mi, street_address, apt_no, city, state,
country, zip_code, contact_no, email, salutation, gender)
1a. Student(sid,pid1, date_of_admission, date_of_graduation,
department)
1e. Advisor(SSN, pid1, department7)
1NF
Person (pid,fname, lname, mi, street_address, apt_no, zip_code4,
salutation, gender)
1a.
Student(sid,pid1, date_of_admission, date_of_graduation)
1e.Advisor(advisor_id,SSN, pid1)
32. Email_Address(pid1, email_address)
33. Contact_Number(pid1, contact_no)
24. Student_In_Department(sid1, department7)
25. Advisor_In_Department(advisor_id1e, department7)
1.
Normalization
Unnormalized Relation
1. Person(pid,fname, lname, mi, street_address, apt_no,
city, state, country, zip_code, contact_no, email, salutation,
gender)
2NF
1. Person (pid,fname, lname, mi, street_address, apt_no,
zip_code4, salutation, gender)
4. Location (zipcode, country, state, city, latitude, longitude)
3NF (Not Implemented)
4. Location (zipcode, country, state, city)
5. City_Location(Longitude,Latitude, City)
Normalization
Unnormalized Relation
8. Course (course_id, title, department7, semester,
instructor1b)
BCNF
8. Course_instance (ccn, course_no8a, semester,
instructor1b)
8a. Course(course_no, title, department7)
Queries
Queries
Query 1: Salary forecast
Description:
•Given a selected industry, predict the expected average salary for a
new job position within that industry over the next X years
•Useful to help job-seeking graduates easily visualize the salary trends
for each industry
Queries
Java Code:
•Connects to Access database and executes the sql query:
select o.year_offered, avg(o.salary)
from [Job Offer] as o, Company as c, Department as i
where o.status = 'Accepted' and o.cid = c.cid and c.iid = <iid>
group by o.year_offered
order by o.year_offered ASC;
•This extracts the average salary for each year for the given
industry.
• The Java code performs a linear regression in order to determine
the linear relationship between the year and expected salary, then
creates a new table and inserts those predicted values (over the
next X years) into the database
Queries
Demo
Queries
Query 2: Nearby alumni
Description:
•Given a selected city, select all the alumni who are currently working within X
km from that city
•Used for current students to seek out seniors in the region (for advice, meet-ups),
or to invite alumni to attend events that are being held in a certain city
Queries
VBA Code:
• Contains function needed to determine distance between two geographical
coordinates
• Iterates through location database and determines which cities are within x
km of given city using the haversine formula:
• Creates and populates NearbyCities table with list of cities
To retrieve all alumni working in nearby cities:
SELECT s.SID, c.city, p.First_Name, p.Last_Name, o.position
FROM [Job Offer] AS o, nearbyCities AS c, Student AS s, Person AS p
WHERE o.sid=s.SID And o.job_location=c.LID And p.pid=s.pid;
Queries
Demo
Queries
Query 3: Donation Capacity
Description
•Ranks alumni by their donation capacity
•The donation capacity of each alumni is calculated by a weighted
combination of several factors: salary, years out of school, donation history
•The weights of these factors were researched and simplified from research
papers) to result in the following formula:
Donation_Capacity = 0.26 * salary/10000 + 0.17 * years_out_of_school +
0.64 * sum_of_previous_donations/5000
Queries
Retrieve the last recorded salary of each student
SQL:
SELECT s.sid, o.salary
FROM student AS s, [Job Offer] AS o
WHERE o.sid = s.sid and o.status = 'Accepted' and o.[date accepted] in
(
select max(o2.[date accepted])
from [Job Offer] o2
where s.sid = o2.sid
);
Retrieve the sum of all donations made by each student
SQL:
SELECT s.pid, Sum(d.donation_amount) AS total_donations
FROM student AS s, donation AS d
WHERE d.donated_by=s.pid
GROUP BY s.pid;
Queries
SQL:
SELECT DISTINCT s.sid, p.First_name, p.Last_Name,
((0.26*ls.salary)/10000+(0.64*ds.total_donations)/5000+0.17*((Date()-s.[date of
graduation])/365)) AS Donation_Capacity
FROM person AS p, student AS s, donation AS d,
(SELECT s2.pid, Sum(d2.donation_amount) AS total_donations
FROM student AS s2, donation AS d2
WHERE d2.donated_by=s2.pid
GROUP BY s2.pid) AS ds,
(SELECT s3.sid, o3.salary
FROM student AS s3, [Job Offer] AS o3
WHERE o3.sid = s3.sid and o3.status = 'Accepted' and o3.date accepted in
( select max(o2.[date accepted])
from [Job Offer] o2
where s3.sid = o2.sid )) AS ls
WHERE s.sid=ls.sid And s.pid=p.pid And s.pid=ds.pid
ORDER BY ((0.26*ls.salary)/10000+(0.64*ds.total_donations)/5000+0.17*((Date()-s.[date of
graduation])/365)) DESC;
Queries
Query 3: Donation Capacity
Queries
Query 4: Magazine Influence
Description
•Rank each business magazine/publication rankings by their
influence on the number of job offers our graduates get
•Help the career management center prioritize data requests from
each magazine/publication
•Perform a linear regression to obtain the coefficient of
determination for each magazine
Queries
Query 4: Magazine Influence
Java Code
•Connects to Access database and executes the dynamically created SQL
query for each magazine in the table:
SELECT Count(*) AS Offer_Count, r.ranking, o.year_offered
FROM [Job Offer] AS o, ranking AS r, [Magazine Publication] AS mp
WHERE (((r.magazine_or_publication)=mp.mid) And ((o.year_offered)=r.year)
And ((mp.mid)=<selected mid>))
GROUP BY o.year_offered, r.ranking;
•This extracts a pairing of the number of offers received by graduates
together with how highly we were ranked by that magazine that year. With
these data pairs
•Performs a regression to calculate the correlation factor between number
of job offers received and our ranking in the magazine
•Writes these correlation factors back into the database under a table
called Influence
Queries
Query 4: Magazine Influence
SQL Code to retrieve magazines sorted by influence:
SELECT i.mid, m.name, i.influence_factor
FROM [Magazine Publication] AS m, Influence AS i
WHERE m.mid=i.mid
ORDER BY i.influence_factor DESC;
Queries
Query 5: Course Recommendations
Description
•Suggest recommended courses a student should take if he wants to enter
a specific industry
•Ranks each class by the number of times they were enrolled in by
students who end up receiving offers from companies in a particular
industry
SQL
SELECT c.course_id, c.title, Count(*) AS course_count
FROM course AS c, student AS s, [Job Offer] AS o, course_taken AS ct, company
AS com
WHERE o.sid=s.sid And ct.sid=s.sid And ct.course_id=c.course_id And
o.cid=com.cid And com.iid= <selected industry>
GROUP BY c.course_id, c.title
ORDER BY Count(*) DESC;
Queries
Query 5: Course Recommendations
to ask
Q
and