Czech to English Translation: Session Preview
Machine Translation in Academia
and in the Commercial World:
a Contrastive Perspective
Alon Lavie
Research Professor – LTI, Carnegie Mellon University
Co-founder, President and CTO – Safaba Translation Solutions
WMT-2014
June 26, 2014
4/3/2017
LTI Education Committee
• Standing LTI Faculty committee mandated to review,
discuss, and propose changes to the LTI education
programs and course offerings
• Meets about once a month over lunch
• Primary activities include:
– Reviewing new course proposals from faculty
– Assisting with speaker recruitment for the LTI colloquium
– Special tasks and projects related to our educational programs
• Current members: Bob Frederking, Carolyn Rose, Noah
Smith, Alan Black, Eric Nyberg, Teruko Mitamura, Ralf
Brown, Alon Lavie
December 8, 2011
11-711: Algorithms for NLP
LTI Curriculum Review
• Special project the committee took upon itself in the fall
• Goals:
– Develop a more comprehensive understanding of the current
state of our curriculum and how it has evolved over the years
– Are our current course offerings appropriate and necessary for
our graduate programs?
– Do we have significant gaps that need to be filled?
– Analyze student enrollment in our courses, how it has changed
over the years, and draw conclusions
– Draw conclusions regarding potential changes in our course
offerings, their scheduling, frequency, and/or sequencing
– Look at the LTI teaching requirements and salary compensation
model and whether it should be tweaked or modified
LTI Curriculum Review
• So far mostly a fact and information gathering
exercise with some limited amount of analysis
performed by individual committee members
• Three main sub-tasks:
– A comparison of our LTI course offerings with
similar course offerings at major competing peer
institutions.
– An analysis of student enrollment data in our
courses over the past 15 years.
– A basic-level comparison of the teaching
requirements and teaching compensation model
used across the various departments and units
within SCS
LTI Curriculum Review
• A full report of findings from these three
activities was circulated by email yesterday
• I will present highlights from the findings
• Faculty discussion and guidance:
– What other information should we be gathering?
– What kinds of analyses would you like to see on this
data?
– Goal is to come up with some recommendations
regarding changes to our courses, our programs
and/or our teaching salary compensation model.
– Full faculty will get to discuss any proposed changes
Comparison of LTI Course
Offerings with Peer Institutions
• Compiled by Noah Smith and Ralf Brown
• Looked at course offerings at Edinburgh, JHU
and Stanford and attempted to map these to
equivalent courses at LTI/SCS
• Departmental structures are somewhat different
• Table of LTI courses and their corresponding
equivalents
• Table of SCS courses typically taken by LTI
students and their corresponding equivalents
• Table of courses offered by peers that we don’t
have
Comparison of LTI Course
Offerings with Peer Institutions
• General Findings:
– We are very strong on speech offerings, maybe rivaled by JHU.
– We are stronger than these peers in information retrieval offerings.
– We are relatively weak on linguistics offerings.
• Courses that make the LTI special, compared to this set of peers:
– Grammars and lexicons (721) has a “grammar engineering” analogue at
Stanford, but is unique in being an LT-oriented introduction to the phenomena of
human language.
– Machine translation (731) as a full-on course
– Structured prediction (763), an advanced statistical NLP course (this course
combines two older courses, Language and Statistics 2 (762) and Information
Extraction (748)).
– Social media analysis (772).
– Software engineering courses (791 and 792) that emphasize language
technologies.
– Inventing future services (794).
– Summarization and personal information management (899).
Comparison of LTI Course
Offerings with Peer Institutions
• Obvious ideas for courses offered by peers but not
by LTI:
– Intro to programming for language technologies, for new
MLTs who lack a CS background. This could become a service
course for CS masters and PhD students from other applied SCS
departments who need to catch up on programming skills
quickly.
– Bioinformatics. Should discuss with faculty in the Lane Center
for Computational Biology.
– Cognitive science of language. Should discuss with faculty
in Psychology.
– Data mining (and text mining); likely of interest to some
students in Tepper and Heinz.
– Corpus linguistics. Should discuss with Linguistics faculty in
Modern Languages, English, and Philosophy.
Enrollment Data Analysis
• Compiled by Bob Frederking
• Based on a spreadsheet generated from a database
dump containing every registration for an 11-xxx course
since Fall 1996.
• There is a line in the spreadsheet for each student in
each class each semester, for a total of 7328 raw data
points.
• Note that this total includes 119xx research registrations
and 11700 LTI Colloquium registrations. These have
been filtered out of the following charts, except where
explicitly shown.
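The counting and filtering described above can be sketched in a few lines of Python. This is a minimal illustration only: the row layout `(student_id, course, semester)`, the function names, and the sample rows are assumptions made for the example, not the actual spreadsheet schema or data.

```python
from collections import Counter


def is_excluded(course: str) -> bool:
    """True for 119xx research registrations and the 11700 LTI Colloquium."""
    return course.startswith("119") or course == "11700"


def cumulative_enrollments(rows, include_excluded=False):
    """Count registrations per course; return (course, count) pairs sorted by size.

    `rows` is an iterable of (student_id, course, semester) tuples, one per
    registration. This layout is an assumption, not the real spreadsheet schema.
    """
    counts = Counter(
        course
        for _student, course, _semester in rows
        if include_excluded or not is_excluded(course)
    )
    # Sort ascending by count, breaking ties by course number.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Made-up sample rows for illustration only (not real registration data).
sample = [
    ("s1", "11711", "F10"),
    ("s2", "11711", "F10"),
    ("s1", "11700", "F10"),
    ("s3", "11910", "S11"),
    ("s2", "11731", "S11"),
]
print(cumulative_enrollments(sample))  # [('11731', 1), ('11711', 2)]
```

Passing `include_excluded=True` keeps the research and colloquium registrations, which corresponds to the views where they are explicitly shown.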
Course Enrollments [charts]
Total cumulative course enrollments sorted by size
(each entry: course number, cumulative enrollment; ascending, reading across each row)
11319    1   11521    1   11552    1   11554    1   11561    1   11590    1
11592    1   11724    1   11727    1   11135    2   11390    2   11512    2
11746    2   11513    3   11541    3   11717    3   11747    3   11531    4
11691    4   11735    4   11795    4   11726    5   11783    5   11611    6
11695    6   11773    6   11683    7   11490    8   11511    8   11718    9
11749    9   11782    9   11693   10   11441   11   11728   11   11736   11
11755   11   11793   11   11120   12   11617   12   11725   13   11928   13
11765   15   11767   15   11763   18   11794   18   11929   19   11723   20
11744   20   11796   20   11756   21   11719   24   11733   24   11899   25
11716   27   11780   30   11753   32   11743   35   11734   36   11772   37
11682   38   11344   39   11713   40   11745   40   11754   50   11762   51
11742   54   11748   54   11732   57   11722   68   11752   71   11920   78
11731   94   11925   94   11411   95   11935  118   11751  148   11712  164
11792  213   11721  239   11761  292   11741  294   11711  313   11930  451
11791  484   11700  496   11910 2519
Discussion