Slide 1 - Yunlin University of Science and Technology

國立雲林科技大學
National Yunlin University of Science and Technology
Correct your text with Google
Presenter: You Lin Chen
Authors: Stéphanie Jacquemont, François Jacquenet, Marc Sebban
2007.WI.7
Intelligent Database Systems Lab
Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Comments
N.Y.U.S.T.
I. M.
Motivation

In everyday life, many people turn to Google when they want to find
the correct spelling of a word.

At present, there are three main well-known spell checkers: Ispell,
Aspell, and Word.
Example with 500 misspellings:
- For 100 of them, the correct word is among the suggestions: SI = 100/500 * 100 = 20%
- For 30 of them, the correct word is first on the suggestion list: SIF = 30/500 * 100 = 6%
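
The two percentages above can be reproduced with a small sketch. The function name and parameter names are illustrative assumptions, not taken from the slides; only the counts (500, 100, 30) come from the example.

```python
def suggestion_rates(n_misspellings, n_in_list, n_first):
    """Return (SI, SIF) as percentages.

    n_in_list: misspellings whose correct word appears among the suggestions.
    n_first:   misspellings whose correct word is first on the suggestion list.
    """
    if n_misspellings <= 0:
        raise ValueError("n_misspellings must be positive")
    # Multiply before dividing to avoid floating-point noise such as 20.000000000000004.
    si = 100 * n_in_list / n_misspellings
    sif = 100 * n_first / n_misspellings
    return si, sif


si, sif = suggestion_rates(500, 100, 30)
print(f"SI = {si:.0f}%, SIF = {sif:.0f}%")  # SI = 20%, SIF = 6%
```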
Objectives

Increase SI and SIF by taking into account the whole sentence that
contains the misspelling.

We propose to use the Google search engine and some
machine learning techniques.
Methodology

Features that do not use the Web
- Size
- Distance
- Position of error
- Punctuation tag
- Frontier tag
- Keyboard distance
Question: edit distance(CS, long T) versus distance(CS, short T)
Position of error: for the misspelling T = Micrasoft, editPosition(T, CS) = 5/9
Frontier tag (frontier words): "Belong" versus "Be or long"
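
As a rough illustration of the Distance and Position of error features, the sketch below assumes three things the slides do not state: that the candidate suggestion CS for T = Micrasoft is "Microsoft", that "edit distance" means the unit-cost Levenshtein distance, and that editPosition is the 1-based index of the first differing character divided by the length of T. Under those assumptions it reproduces the 5/9 value from the slide.

```python
from fractions import Fraction


def edit_distance(a, b):
    """Unit-cost Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_position(t, cs):
    """Assumed definition: 1-based index of the first mismatch, divided by len(t)."""
    if not t:
        raise ValueError("t must be non-empty")
    if t == cs:
        return Fraction(0)
    for i, (ct, cc) in enumerate(zip(t, cs), start=1):
        if ct != cc:
            return Fraction(i, len(t))
    # One string is a prefix of the other: the first edit falls just past
    # the shared prefix, capped at the end of t.
    return Fraction(min(min(len(t), len(cs)) + 1, len(t)), len(t))


print(edit_distance("Micrasoft", "Microsoft"))  # 1
print(edit_position("Micrasoft", "Microsoft"))  # 5/9
```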
Methodology

Features that use the Web
- Frequency of a candidate suggestion
- Probability of 2-grams
- Probability of left 3-grams
- Probability of centered 3-grams
- Probability of co-occurrence
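
The slides do not say how these probabilities are computed, so the following is only a sketch under stated assumptions: each candidate suggestion is placed in its sentence context to form a 2-gram, a left 3-gram, and a centered 3-gram phrase, and the hit counts returned for those phrases are normalized across candidates. The function names, the normalization scheme, and the toy counts are all hypothetical; no real search-engine API is called.

```python
def ngram_queries(left, cs, right):
    """Build phrase queries for one candidate suggestion cs.

    left / right: lists of words before / after the misspelling.
    """
    queries = {}
    if left:
        queries["2gram"] = f"{left[-1]} {cs}"
    if len(left) >= 2:
        queries["left_3gram"] = f"{left[-2]} {left[-1]} {cs}"
    if left and right:
        queries["centered_3gram"] = f"{left[-1]} {cs} {right[0]}"
    return queries


def relative_frequencies(counts):
    """Normalize hit counts across candidates (assumed probability estimate)."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


# Toy hit counts, invented purely for illustration.
toy_counts = {"Microsoft": 900, "Microsat": 100}
print(ngram_queries(["made", "by"], "Microsoft", ["in"]))
print(relative_frequencies(toy_counts))
```

In a real system the counts would come from querying a search engine with each phrase, and the resulting values would be fed to the learning step alongside the non-Web features.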
Experiments
Conclusion

We have presented the WebSpell system, which spell-checks a document
with the help of the Web, and more precisely the Google search engine.
Comments

Advantage


…
Drawback


…
Application

…