Cross Language Information
Exploitation of Arabic
Dr. Elizabeth D. Liddy
Center for Natural Language Processing
School of Information Studies
Syracuse University
Why Cross-Language Systems Matter
• There are approximately 4,500 living languages
• 32 million Americans switch from English to
another language when they get home from
work (U.S. Census 1990)
• Internationally, some who wish to do harm to
the US communicate in other languages
• There are too few intel analysts who know the
languages of interest
Internet Language Statistics
http://global-reach.biz/globstats/
Internet Language Statistics (2)
http://www.glreach.com/globstats/evol.html
How Cross-Language Retrieval Works
• User who speaks just one language asks their
question of the system in that language
• Cross-language retrieval system:
– Will have indexed documents (e.g. foreign reports,
emails, message traffic) written in other languages
– Translates user query into language of the
documents
– Matches translated query against document index
– Produces a ranked list of relevant documents that
are automatically translated into user’s language
• User then reads documents in their own
language
• User can now make more fully informed
decisions
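The retrieval steps above can be illustrated with a minimal sketch. It is not the Syracuse system: the tiny English-to-Spanish lexicon, the three sample documents, and all function names are hypothetical, and dictionary lookup with a simple TF-IDF score stands in for whatever translation and matching components a real system would use. Translating the retrieved documents back into the user's language is left out.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (English -> Spanish). A real system would rely on
# a far larger dictionary or a machine translation component.
EN_TO_ES = {
    "report": ["informe"],
    "border": ["frontera"],
    "attack": ["ataque", "atentado"],
}

# Hypothetical foreign-language collection, keyed by document id.
DOCS = {
    "d1": "informe sobre el ataque en la frontera",
    "d2": "informe anual de la empresa",
    "d3": "el clima en la frontera",
}


def tokenize(text):
    """Lowercase and split on whitespace (an assumption; real tokenizers do more)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index.setdefault(term, {})[doc_id] = tf
    return index


def translate_query(query, lexicon):
    """Replace each query word with all of its dictionary translations.

    Words with no lexicon entry are dropped.
    """
    translated = []
    for word in tokenize(query):
        translated.extend(lexicon.get(word, []))
    return translated


def rank(query_terms, index, n_docs):
    """Score documents by summed TF-IDF and return (doc_id, score) pairs, best first."""
    scores = Counter()
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(DOCS)
terms = translate_query("border attack report", EN_TO_ES)
results = rank(terms, index, len(DOCS))

for doc_id, score in results:
    print(doc_id, round(score, 3), DOCS[doc_id])
```

The user types the query in English only; the system does the translation before matching, so the document that mentions the border, the attack, and the report together ranks first.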
SU’s Cross-Language Retrieval Research
• Have produced systems for French, Spanish,
Japanese
– DARPA, Intel, & corporate funding of $3.5 million
• Currently working in Dutch and Chinese
– 2nd demo is of cross-language English-Chinese on a
patent database from China
• Today’s funding announcement will enable us
to specialize our current cross-language
retrieval capabilities for Arabic
• Future work on information extraction and
visualization in Arabic is of keen interest
1. LIVIA – English/English IR System
• Accepts users’ natural language expressions of
complex information needs
• Provides precise retrieval against government
compiled documents about terrorist activities
• Core technology funded by DARPA and
Syracuse Research Corporation
• Demo’d by Ozgur Yilmazel
2. English-Chinese Retrieval Demo
• Cross-language retrieval of English queries
against a Chinese patent database
– Development funded by Unilever Corp, a
multinational corporation which owns 140
companies in more than 100 countries
• Jiang Ping Chen
– PhD student in School of Information Studies
Look to the Future
• Bring the next level of sophistication in
Information Exploitation to Arabic
• Here seen in English
– Adds Information Extraction as next step to
Information Retrieval
• Seek your ongoing support for its extension into
Arabic
Thank You!
Questions?
Care to try a query on LIVIA?