GoogleDictionary
Paul Nepywoda
Alla Rozovskaya
Goal
Develop a tool for English that, given a word,
will illustrate its usage
Who Will Benefit
Learners of English
Teachers of English
Native speakers who wish to find common
usages of a word
Similar Tools?
Dictionaries
BUT our tool
• focuses on the usage of words and not on
defining their meanings
• ranks expressions based on frequency
• extracts examples straight from context
Similar Tools?
Google
BUT our tool
• focuses on finding high-frequency neighboring words instead of simply
the documents that contain the target word
Data Resources
• Corpus of newspaper articles (3.5 million words) [used for demo]
Advantage: large amount of data
Disadvantage: limited domain
• Use a search engine to build a corpus of documents containing the target word
Advantages: various domains, dynamic data source
Disadvantage: time to download documents
Implementation (1)
Search a corpus to determine the most typical words:
extract the words within a certain window of the
target word and rank them by frequency
- compute the rank of single words and of pairs of
words within the window
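
The window extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the regex tokenizer, the default window size, and the choice to count every ordered pair of context words inside one window are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())


def window_counts(sentences, target, window=3):
    """Count single words and word pairs found within `window` tokens of `target`."""
    singles = Counter()
    pairs = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Context words around this occurrence, excluding the target itself.
            context = [tokens[j] for j in range(lo, hi) if j != i]
            singles.update(context)
            # Assumption: a "pair" is any two context words in the same window,
            # kept in their left-to-right order.
            pairs.update(combinations(context, 2))
    return singles, pairs


sentences = ["Of course we took the course", "The course of action was clear"]
singles, pairs = window_counts(sentences, "course", window=2)
print(singles.most_common(3))
```

Sorting the two counters by count gives the frequency ranking of single words and of pairs.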
Implementation (2)
Computing rank of expression
• Tf: raw count
• Idf of a word: log(# sentences in corpus / # sentences containing the word)
• Position normalization: reward context words closer to the target
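
One way the three components could be combined is sketched below. The slides do not say how the components are combined or how position is normalized, so two things here are assumptions: each occurrence of a context word is weighted by 1/distance from the target, and the final score is the product of the position-weighted count and the idf. The function and parameter names are hypothetical.

```python
import math
import re
from collections import defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())


def rank_context_words(sentences, target, window=3, use_idf=True):
    """Rank context words of `target` by position-weighted tf, optionally times idf."""
    tokenized = [tokenize(s) for s in sentences]
    n_sentences = len(tokenized)

    # Number of sentences in the corpus that contain each word.
    sentence_freq = defaultdict(int)
    for tokens in tokenized:
        for word in set(tokens):
            sentence_freq[word] += 1

    # Position-weighted count: closer context words contribute more.
    # Assumption: weight = 1 / distance to the target.
    weighted_tf = defaultdict(float)
    for tokens in tokenized:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    weighted_tf[tokens[j]] += 1.0 / abs(j - i)

    scores = {}
    for word, tf in weighted_tf.items():
        # idf = log(# sentences in corpus / # sentences containing the word)
        idf = math.log(n_sentences / sentence_freq[word]) if use_idf else 1.0
        scores[word] = tf * idf  # assumption: components are multiplied
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = [
    "of course the answer is yes",
    "the course of the river",
    "the river is wide",
    "a course in the park",
]
with_idf = rank_context_words(corpus, "course", window=2)
without_idf = rank_context_words(corpus, "course", window=2, use_idf=False)
```

In this toy corpus "the" appears in every sentence, so its idf is zero and it drops to the bottom of the ranking, while without idf it ranks first.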
Interface
Output ranked list of expressions with
example sentences via the Web
Examples:
course
information
notorious
come
come (without idf)
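
Pairing each ranked expression with an example sentence could look roughly like the sketch below. It is only an illustration under assumptions: the helper names are hypothetical, an example is any corpus sentence containing both the target and the context word (the window is not rechecked), and the output is plain text rather than the Web page used by the actual tool.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())


def attach_examples(ranked_words, sentences, target, max_examples=1):
    """For each ranked context word, collect sentences that contain it and the target."""
    results = []
    for word in ranked_words:
        examples = []
        for sentence in sentences:
            tokens = tokenize(sentence)
            if target in tokens and word in tokens:
                examples.append(sentence)
                if len(examples) == max_examples:
                    break
        results.append((word, examples))
    return results


def format_results(results):
    """Render the ranked list with its example sentences as plain text."""
    lines = []
    for rank, (word, examples) in enumerate(results, start=1):
        lines.append(f"{rank}. {word}")
        for example in examples:
            lines.append(f"   e.g. {example}")
    return "\n".join(lines)


sentences = [
    "Of course we agreed.",
    "The river is wide.",
    "She took a course in logic.",
]
results = attach_examples(["of", "took"], sentences, "course")
print(format_results(results))
```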
Further Improvements
Use a search engine to build a corpus
Allow phrase searching
Provide option to search for highly frequent
phrases as opposed to idiomatic expressions
Conclusion
We have presented a tool that, given a word,
finds typical usages of that word in natural
language
The tool should be useful for
• learners of English
• native speakers