National Yunlin University of Science and Technology
Correct your text with Google
Presenter: You Lin Chen
Authors: Stéphanie Jacquemont, François Jacquenet, Marc Sebban
2007.WI.7
Intelligent Database Systems Lab
Outline
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
N.Y.U.S.T.
I. M.
Motivation
In everyday life, when people want to find the correct spelling of a word, many of them are used to asking Google for help.
At the moment, there are mainly three well-known spell checkers (Ispell, Aspell and Microsoft Word).
Example with 500 misspellings:
the correct word is among the suggestions for 100 of them, so SI = 100/500 × 100 = 20%
the correct word is found first on the list for 30 of them, so SIF = 30/500 × 100 = 6%
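The SI and SIF percentages on this slide can be computed directly from a checker's output. A minimal sketch, assuming each misspelling comes with its known correction and a ranked list of suggestions; the function name `si_sif` and the toy data are hypothetical, chosen only to reproduce the slide's 500/100/30 example.

```python
from collections.abc import Sequence


def si_sif(suggestions: Sequence[Sequence[str]], gold: Sequence[str]) -> tuple[float, float]:
    """Return (SI, SIF) as percentages over all misspellings.

    SI  counts misspellings whose correct word appears anywhere in the suggestion list.
    SIF counts misspellings whose correct word is the first suggestion.
    """
    if len(suggestions) != len(gold):
        raise ValueError("need exactly one suggestion list per misspelling")
    n = len(gold)
    if n == 0:
        return 0.0, 0.0
    in_list = sum(1 for sugg, word in zip(suggestions, gold) if word in sugg)
    first = sum(1 for sugg, word in zip(suggestions, gold) if sugg and sugg[0] == word)
    # Multiply before dividing so round figures such as 20% stay exact.
    return 100 * in_list / n, 100 * first / n


# Slide example: 500 misspellings, correct word listed for 100, ranked first for 30.
lists = ([["microsoft", "micro"]] * 30      # correct word ranked first
         + [["macro", "microsoft"]] * 70    # correct word present but ranked second
         + [["macro"]] * 400)               # correct word missing
gold = ["microsoft"] * 500
print(si_sif(lists, gold))  # (20.0, 6.0)
```

SIF can never exceed SI, since a word ranked first is also in the list; the objective below is to raise both.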
Objectives
Increase SI and SIF by taking into account the whole sentence that contains the misspelling.
We propose to use the Google search engine and some
machine learning techniques.
Methodology
Features that do not use the Web
Size
Distance
Position of error
Punctuation tag
Frontier tag
Keyboard distance
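The slide names keyboard distance without defining it. One plausible reading, sketched here as an assumption rather than the paper's definition: measure how far apart the typed key and the intended key sit on a QWERTY grid, so slips onto neighboring keys score low. The grid layout and the names `key_distance` and `keyboard_distance` are hypothetical.

```python
import math

# Assumed layout: US QWERTY letter rows on a plain grid (row stagger ignored).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def key_distance(a: str, b: str) -> float:
    """Euclidean distance between two letter keys on the grid above."""
    (r1, c1), (r2, c2) = KEY_POS[a.lower()], KEY_POS[b.lower()]
    return math.hypot(r1 - r2, c1 - c2)


def keyboard_distance(typed: str, candidate: str) -> float:
    """Sum of key distances over substituted letters (equal-length words only)."""
    if len(typed) != len(candidate):
        raise ValueError("this sketch only handles substitution errors")
    return sum(key_distance(t, c) for t, c in zip(typed, candidate) if t != c)


# A slip onto a neighboring key scores lower than a distant substitution.
print(keyboard_distance("micrpsoft", "microsoft"))            # 1.0 ('p' is next to 'o')
print(round(keyboard_distance("micrasoft", "microsoft"), 3))  # 8.062
```

Only letter keys are mapped, and insertions or deletions are out of scope; a fuller version would combine this with an alignment from the edit distance.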
Question: how does edit distance(CS, long T) compare with edit distance(CS, short T)? (CS = candidate suggestion, T = misspelled word)
Misspelling T = "Micrasoft": editPosition(T, CS) = 5/9
Frontier words: "Belong" vs. "Be" or "long"
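The distance and position-of-error features can be made concrete with the slide's "Micrasoft" example. A sketch, assuming standard Levenshtein distance for "Distance" and assuming editPosition is the 1-based index of the first differing letter divided by the length of T (this matches 5/9 for "Micrasoft" vs. "Microsoft", but the paper's exact formula may differ). Dividing the edit distance by len(T) is shown as one possible answer to the long-vs-short question; all function names are hypothetical.

```python
from fractions import Fraction


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_position(typed: str, candidate: str) -> Fraction:
    """1-based position of the first differing letter, divided by len(typed)."""
    if typed == candidate:
        return Fraction(0)  # nothing to correct
    for i, (a, b) in enumerate(zip(typed, candidate), start=1):
        if a != b:
            return Fraction(i, len(typed))
    # One word is a prefix of the other: the error sits at the end of the typed word.
    return Fraction(min(len(typed), len(candidate) + 1), len(typed))


def normalized_distance(typed: str, candidate: str) -> Fraction:
    """Edit distance scaled by the typed word's length, so long and short T compare fairly."""
    return Fraction(edit_distance(typed, candidate), len(typed))


print(edit_position("Micrasoft", "Microsoft"))        # 5/9, as on the slide
print(edit_distance("Micrasoft", "Microsoft"))        # 1
print(normalized_distance("Micrasoft", "Microsoft"))  # 1/9
```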
Methodology
Features that use the Web
Frequency of a candidate suggestion
Probability of 2-grams
Probability of left 3-grams
Probability of centered 3-grams
Probability of co-occurrence
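With a search engine, each of these probabilities can be approximated by ratios of page counts for exact-phrase queries. A minimal sketch, assuming a hypothetical `hits(query)` callable that returns a page count; the ratio definitions below (especially the normalization of the centered 3-gram) are illustrative assumptions, not the paper's exact formulas, and the counts in the usage lines are invented toy numbers.

```python
from collections.abc import Callable

# Hypothetical search interface: maps an exact-phrase query to a page count.
HitCount = Callable[[str], int]


def ratio(num: int, den: int) -> float:
    """Safe division: an unseen context yields 0 instead of an error."""
    return num / den if den else 0.0


def web_features(left2: str, left1: str, cand: str, right1: str,
                 hits: HitCount) -> dict[str, float]:
    """Hit-count estimates for one candidate suggestion in its sentence.

    left2, left1 are the two words before the misspelling; right1 is the word after it.
    """
    return {
        # Frequency of the candidate suggestion on its own.
        "frequency": float(hits(cand)),
        # 2-gram: P(cand | left1) ~ hits("left1 cand") / hits("left1")
        "p_2gram": ratio(hits(f"{left1} {cand}"), hits(left1)),
        # Left 3-gram: P(cand | left2 left1)
        "p_left_3gram": ratio(hits(f"{left2} {left1} {cand}"), hits(f"{left2} {left1}")),
        # Centered 3-gram: share of the candidate's pages where it sits between left1 and right1
        "p_centered_3gram": ratio(hits(f"{left1} {cand} {right1}"), hits(cand)),
    }


# Toy counts for illustration only (not real search-engine figures).
toy = {"microsoft": 1000, "the": 10000, "the microsoft": 200, "buy the": 400,
       "buy the microsoft": 10, "the microsoft office": 50}
features = web_features("buy", "the", "microsoft", "office", lambda q: toy.get(q, 0))
print(features["p_2gram"])  # 0.02
```

Computing these values for every candidate of a misspelling yields feature vectors that a learned model could rank; co-occurrence could follow the same pattern with a query pairing the candidate and another word of the sentence.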
Experiments
Conclusion
We have presented the WebSpell system, which spell-checks a document with the help of the Web, more precisely the Google search engine.
Comments
Advantage
…
Drawback
…
Application
…