GoogleDictionary

Paul Nepywoda
Alla Rozovskaya
Goal

Develop a tool for English that, given a word,
will illustrate its usage
Who Will Benefit
• Learners of English
• Teachers of English
• Native speakers who wish to find common usages of a word
Similar Tools?
Dictionaries
BUT our tool
• focuses on the usage of words and not on
defining their meanings
• ranks expressions based on frequency
• extracts examples straight from context

Similar Tools?

Google
BUT our tool
• focuses on finding high-frequency neighboring words instead of simply
returning the documents that contain the target word
Data Resources

• Corpus of newspaper articles (3.5 million words) [used for demo]
  Advantage: large amount of data
  Disadvantage: limited domain
• Use a search engine to build a corpus of documents containing the target word
  Advantages: various domains, dynamic data source
  Disadvantage: time to download documents
Implementation (1)
Search a corpus to determine the most typical words:
extract the words within a certain window of the
target word and rank them by frequency
- compute the rank of single words and of pairs of
words within the window
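The window extraction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `window_counts`, the whitespace tokenization, the default window size, and the choice to count a "pair" as any two context words that share a window are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def window_counts(sentences, target, window=3):
    """Count single words and word pairs seen near `target`.

    Assumptions (not from the slides): whitespace tokenization,
    lowercase matching, and a "pair" is any two context words that
    fall inside the same window around one occurrence of the target.
    """
    singles = Counter()
    pairs = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Context words around this occurrence, target excluded.
            context = [tokens[j] for j in range(lo, hi) if j != i]
            singles.update(context)
            # Pairs keep left-to-right order within the sentence.
            pairs.update(combinations(context, 2))
    return singles, pairs


singles, pairs = window_counts(
    ["of course he came", "the course of events"], "course", window=2
)
print(singles.most_common(3))
```

Sorting either counter with `most_common()` gives a frequency-ranked list of neighbors for the target word.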

Implementation (2)

Computing rank of expression
• Tf: raw count
• Idf of a word: $\log \frac{\#\,\text{sentences in corpus}}{\#\,\text{sentences containing the word}}$
• Position normalization: reward context words closer to the target
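One way the three ingredients above might be combined is sketched below. The slides do not say how tf, idf, and position normalization are multiplied together, so the details here are assumptions: each occurrence of a context word contributes 1/distance to a position-weighted count, and that count is multiplied by the sentence-level idf. The function name `rank_context_words` and the `use_idf` switch are hypothetical.

```python
import math
from collections import defaultdict


def rank_context_words(sentences, target, window=3, use_idf=True):
    """Rank context words of `target` by weighted tf times idf.

    Assumptions (not from the slides): whitespace tokenization,
    a 1/distance position weight, and score = weighted_tf * idf.
    """
    tokenized = [s.lower().split() for s in sentences]
    n_sentences = len(tokenized)

    # Number of sentences containing each word (for idf).
    sentence_freq = defaultdict(int)
    for tokens in tokenized:
        for word in set(tokens):
            sentence_freq[word] += 1

    # Position-weighted raw count: closer words contribute more.
    weighted_tf = defaultdict(float)
    for tokens in tokenized:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    weighted_tf[tokens[j]] += 1.0 / abs(j - i)

    scores = {}
    for word, tf in weighted_tf.items():
        idf = math.log(n_sentences / sentence_freq[word]) if use_idf else 1.0
        scores[word] = tf * idf
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = [
    "the course of events",
    "the main course",
    "the dog barked",
    "a crash course",
]
print(rank_context_words(corpus, "course", window=2))
print(rank_context_words(corpus, "course", window=2, use_idf=False))
```

In this toy corpus, turning idf off lets the very common word "the" rise to the top of the list, while with idf it drops to the bottom.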
Interface
Output ranked list of expressions with
example sentences via the Web
Examples:

course
information
notorious
come
come (without idf)
Further Improvements
• Use a search engine to build a corpus
• Allow phrase searching
• Provide option to search for highly frequent phrases as opposed to
idiomatic expressions
Conclusion
• We have presented a tool that, given a word, finds typical usages of
the word in natural language
• The tool should be useful for
  • learners of English
  • native speakers