Round 2 - Amro Khasawneh Blog


Distributed knowledge acquisition
Arabic Lexikon Game
Created by:
• Amro Khasawneh
• Khalil Abu-Dahab
Web-based game
Arabic Lexikon
Round 2
Problem
How difficult is the problem being addressed?
How well is it being defined?
• The lack of complete, accurate Arabic lexicons is a major
issue in computer science, especially in natural language
processing, because developing traditional algorithms
requires extensive testing on data which, in the case of
the Arabic language, is scarce and deficient.
• We address the problem of collecting and building a
database of the Arabic lexicon using a computer game.
This consists of classifying words by type, establishing
relationships between words, and identifying word stems,
the most debated subject of the three.
Through online games, people can collectively solve large-scale
computational problems.
Design
Innovation – applications that approach a new problem, or look at an
old problem in a new way
• Several efforts have been devoted to this problem, but they have
not succeeded in gathering enough data because the manual process
of collecting, entering and verifying these facts is tedious.
• We therefore introduce ArabicLexikon, an entertaining web-based
game that collects accurate knowledge as a side effect of play
(a "game with a purpose").
• By playing ArabicLexikon, people help us collect data not because
they feel helpful, but because they have fun.
• ArabicLexikon is an example of a new class of games that provide
entertainment in exchange for human processing power. In
essence, we solve a typical computer problem with Human-Computer
Interaction alone.
Design
Impact - applications that either impact a large number of people
very broadly, or impact a smaller number of people very deeply
• Whereas previous approaches have relied on experts or
volunteers, we put much stronger emphasis on creating a
system that is appealing to a large audience of people,
regardless of whether or not they are interested in
contributing to Artificial Intelligence. We have transformed
the activity of entering facts into an enjoyable interactive
process taking the form of a game.
• The collected data can be applied towards significantly
improving and testing new algorithms.
Design
Effectiveness – to what degree the application actually solves the
problem in question
• ArabicLexikon can be considered a “human algorithm”: given a word
as input, it outputs a set of facts related to the word. Instead of
using a computer processor, we use ordinary humans interacting
with computers throughout the Web. Our system therefore
significantly contributes to HCI in two ways: it collects valuable
data that can improve CS applications, and it addresses a typical
AI problem with novel HCI tools.
• Accuracy and quality of the data
– We employ a set of design strategies to ensure the accuracy of
facts entered and to prevent cheating which would lead to
corrupted data
• Random pairing of the players and IP address checks
• Limited or no communication between players
• Introducing sincerity tests (pre-checked words)
• Spelling check & inappropriate word filters
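The pairing and sincerity-test strategies above can be sketched as follows. This is a minimal illustration, not the game's actual code: the function names, the (player_id, ip_address) tuple layout and the 0.8 pass ratio are all assumptions made for the example.

```python
import random


def pair_players(players, rng=None):
    """Randomly pair players, never matching two who share an IP address.

    `players` is a list of (player_id, ip_address) tuples (assumed layout).
    A player with no eligible partner is simply left unpaired.
    """
    rng = rng or random.Random()
    pool = list(players)
    rng.shuffle(pool)
    pairs = []
    while len(pool) >= 2:
        first = pool.pop()
        # Look for the first remaining player whose IP differs.
        partner_index = next(
            (i for i, other in enumerate(pool) if other[1] != first[1]), None
        )
        if partner_index is None:
            continue  # no eligible partner; this player waits for the next round
        pairs.append((first, pool.pop(partner_index)))
    return pairs


def passes_sincerity_test(answers, prechecked, min_correct_ratio=0.8):
    """Check a player's answers against pre-checked words.

    `answers` and `prechecked` both map word -> label. Only words present
    in `prechecked` are judged; the 0.8 threshold is an assumed value.
    """
    tested = [word for word in answers if word in prechecked]
    if not tested:
        return True  # nothing to judge this player on yet
    correct = sum(1 for word in tested if answers[word] == prechecked[word])
    return correct / len(tested) >= min_correct_ratio
```

The pairing is greedy, so it may leave a player unpaired even when a different shuffle would have matched everyone; for a game lobby that refills continuously this is usually acceptable.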
Final Round
Development
Development prospect:
Were the domain experts consulted?
Is the project open enough for further evolution?
• Several references and papers were consulted, in addition
to talking to some experts in the Arabic language
(professors from YU university).
• Expandability: the system’s backend was designed with a
certain amount of abstraction in mind, which makes it
easy to port to other games such as:
– Relations between words
– Stems and origin of words
– Word translation
• This game can be implemented in two ways:
– Symmetrical - the two players are given the same word
– Asymmetrical - one player’s output is the other’s input
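The two modes could be scored roughly as in the sketch below. The slides do not describe the scoring rules, so everything here is an assumption: the function names, the idea that a symmetric round records a fact only when both answers match, and the idea that an asymmetric round accepts player A's clues only when player B recovers the hidden word from them.

```python
def normalize(answer):
    """Trim whitespace and lowercase so trivial differences do not block a match."""
    return answer.strip().lower()


def symmetric_round(word, answer_a, answer_b):
    """Both players see the same word; record a fact only if they agree."""
    if normalize(answer_a) == normalize(answer_b):
        return (word, normalize(answer_a))
    return None


def asymmetric_round(hidden_word, clues_from_a, guess_from_b):
    """Player A sees the word and outputs clues; player B sees only the clues.

    If B's guess matches the hidden word, A's clues are accepted as
    (word, related_word) facts; otherwise nothing is recorded.
    """
    if normalize(guess_from_b) == normalize(hidden_word):
        return [(hidden_word, normalize(clue)) for clue in clues_from_a]
    return []
```

In both modes neither player can earn credit alone, which is what makes agreement usable as a signal of correctness.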
Development
System intelligence: Does the application leverage the presence of
fresh and updated data?
• The game can be fed with its own output: The
data collected during the game that many players
agreed on can be used as an input to the game.
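One way to decide which collected facts are trusted enough to feed back into the game is a simple agreement count. The sketch below assumes each agreeing round is logged as a (word, label) tuple and that three agreements are enough; both the log format and the threshold are hypothetical.

```python
from collections import Counter


def promote_agreed_facts(observations, min_agreements=3):
    """Return word -> label for facts that enough rounds agreed on.

    `observations` is a list of (word, label) tuples, one per agreeing
    round. For each word, the most frequent label wins, provided it
    reached `min_agreements`; on a tie the label seen first is kept.
    """
    counts = Counter(observations)
    best = {}
    for (word, label), n in counts.items():
        if n >= min_agreements and n > best.get(word, (None, 0))[1]:
            best[word] = (label, n)
    return {word: label for word, (label, _) in best.items()}
```

Facts promoted this way could then serve as input to later rounds, for example as pre-checked words.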
• If the list of words in the database becomes largely
consumed, a simple crawler could be run regularly
to collect new data using popular search engine
APIs.
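Once a crawler has fetched some text, new candidate words still have to be extracted from it. The sketch below covers only that extraction step and leaves the fetching and any search engine API out entirely. The function name is hypothetical, and the choice to strip diacritics and tatweel before matching is an assumption about how the database stores words.

```python
import re

# Short vowel marks (U+064B to U+0652) and the tatweel elongation character (U+0640).
_DIACRITICS_AND_TATWEEL = re.compile("[\u064B-\u0652\u0640]")
# Runs of basic Arabic letters, hamza (U+0621) through yeh (U+064A).
_ARABIC_WORD = re.compile("[\u0621-\u064A]+")


def new_candidate_words(text, known_words):
    """Return Arabic words found in `text` that are not in `known_words`.

    Words are returned in order of first appearance, without duplicates.
    """
    cleaned = _DIACRITICS_AND_TATWEEL.sub("", text)
    seen = set(known_words)
    fresh = []
    for word in _ARABIC_WORD.findall(cleaned):
        if word not in seen:
            seen.add(word)
            fresh.append(word)
    return fresh
```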
ArabicLexikon
Thank You
© 2007 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.