Omar LAROUK CERSI - enssib IUT
INTEGRATION OF TEXTUAL DATA FOR
INFORMATION RETRIEVAL: REUSING LINGUISTIC
INFORMATION OF VICINITY
Omar LAROUK
ELICO - ENSSIB
University of Lyon-France
Computational linguistics & Information Science
École Nationale Supérieure des Sciences de l’Information et des Bibliothèques
ENSSIB-Lyon, Université de Lyon, France
FRAMEWORK PRESENTATION
- LARGE-SCALE PRODUCTION OF TEXTS (web)
- LARGE-SCALE RETRIEVAL OF TEXTUAL INFORMATION BY USERS
• AIM OF PROJECT: DESIGN A SYSTEM THAT CAN:
- AUTOMATE THE MANUAL PROCESS OF INDEXING
- STRUCTURE THE INFORMATION
- RETRIEVE 'pertinent' (relevant) INFORMATION
FRAMEWORK PRESENTATION
METHOD:
Building filtering DATABASES from information
extracted from the textual data:
– use of linguistic techniques for
processing documents (textual data);
– search strategies that allow the user to
exploit documents.
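A minimal sketch of what a filtering database built from textual data could look like, assuming it is a plain inverted index (term → set of document ids) fed by a naive lowercase tokenizer. The names `tokenize` and `build_filtering_db` are hypothetical, and no linguistic processing is applied yet; that is precisely what the linguistic techniques would add.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a deliberately naive tokenizer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_filtering_db(documents: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical filtering database: map each term to the ids of the documents containing it."""
    db: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            db[term].add(doc_id)
    return dict(db)


docs = {
    "d1": "Project of law on the black and white flag",
    "d2": "The company's computer products",
}
db = build_filtering_db(docs)
print(sorted(db["flag"]))  # ['d1']
```

Looking up a term is then a single dictionary access; filtering on several terms amounts to intersecting the returned sets.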
Why use linguistic references in
information retrieval?
• We consider that information seeking rests on a
presupposition made by the speaker: when the document
is indexed (at creation time), it is as if its designer
uttered a sentence in a given context.
• If the presupposed words are absent from the
answer returned by the server (search engine),
this can be interpreted as the sentence not being
true in that indexing context.
Information Retrieval on the WEB
Co-operative process for reformulation
Figure: the user(s) submit successive queries $Q_0, Q_1, \dots, Q_k, \dots, Q_n$ to the WEB (e.g. Google, Yahoo), which returns answers $A_0, A_1, \dots, A_k, \dots, A_n$ (title, abstract).
Linguistic problem of search engines:
• A search engine retrieves electronic documents
on the Web, but it does not use linguistic
analysis. When a user types a query:
– the engine returns a ranked list of documents
that contain exactly all the words of the request
(query processing);
– to find documents, the user must write a query
of a few words, and the search engine returns the
URLs of ranked documents containing exactly
those words.
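To illustrate the exact-matching behaviour described above, here is a hedged sketch of a conjunctive (AND) search: a document is returned only if it contains every query word verbatim, and results are ranked by the total number of occurrences of the query words. This scoring, and the function name `exact_match_search`, are simplifying assumptions, not how Google or Yahoo actually rank pages.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def exact_match_search(query, documents):
    """Return ids of documents containing every query word exactly,
    ranked by how often those words occur (a crude stand-in for engine ranking)."""
    terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        if all(counts[t] > 0 for t in terms):  # every word must appear verbatim
            scored.append((sum(counts[t] for t in terms), doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first
    return [doc_id for _, doc_id in scored]


docs = {
    "d1": "white flag, white and black flag",
    "d2": "a white flag",
    "d3": "white flags",
}
print(exact_match_search("white flag", docs))  # ['d1', 'd2']
```

Note that "d3" is missed: "flags" is not the word "flag", which is exactly the kind of gap a linguistic analysis would close.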
Figure: reformulation steps (user queries, system answers). At each step $s_k$, $k = 0, 1, \dots, n$, the user submits a query $Q_k$ and the system returns an answer $A_k$. The final answer $A_n$ is the filtered answer set, oriented toward the user's needs.
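The step sequence can be sketched as a loop, assuming the user's successive queries are known in advance and that "satisfied" means a non-empty answer of at most `max_answers` documents. Both assumptions, and the names `answer` and `reformulation_session`, are hypothetical; in the real co-operative process the user decides each reformulation after reading $A_k$.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def answer(query, documents):
    """A_k: sorted ids of documents containing all words of Q_k (exact matching)."""
    terms = set(tokenize(query))
    return sorted(d for d, text in documents.items() if terms <= set(tokenize(text)))


def reformulation_session(queries, documents, max_answers=1):
    """Run steps s_0..s_n: submit Q_k, collect A_k, and stop once the answer is
    non-empty and small enough (a hypothetical stand-in for user satisfaction)."""
    history = []
    for k, q in enumerate(queries):
        a_k = answer(q, documents)
        history.append((k, q, a_k))
        if 0 < len(a_k) <= max_answers:
            break  # A_k is taken as the final filtered answer A_n
    return history


docs = {
    "d1": "project of law on taxes",
    "d2": "research project",
    "d3": "law school",
}
queries = ["project", "project law", "project of law"]
for k, q, a_k in reformulation_session(queries, docs):
    print(f"s{k}: Q{k} = {q!r} -> A{k} = {a_k}")
```

An empty answer does not stop the loop, since a failed query is precisely what prompts the next reformulation.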
Documentary information system
• Documentary automation is held back by the
problems of indexing and querying.
• We use natural language as textual data for automatic
indexing, because language integrates
contextual and temporal factors through
connectors, verbs, adverbs, etc.
• We use the LINGUISTIC
APPROACH.
Queries: simple or complex predicates
• Simple questions: simple predicates
– /project/, /law/, /station/, /flag/, /black/,
/white/, ...
• Combined predicates:
– /white flag/,
/white and black/, /.../
• Propagation of the predicates to the left and/or right:
– /... project of law .../;
– /... flag white and black .../; ...
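A hedged sketch of left/right propagation: given a core predicate and a text, return each occurrence of the core extended by a fixed number of neighbouring words. The window sizes `left` and `right` and the function name `propagate` are illustrative assumptions; a real vicinity extraction would rely on syntactic analysis rather than fixed word windows.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def propagate(core, text, left=1, right=1):
    """Return every span around `core` extended by up to `left` words on the left
    and `right` words on the right, e.g. /project/ -> /project of law/."""
    tokens = tokenize(text)
    spans = []
    for i, tok in enumerate(tokens):
        if tok == core:
            start = max(0, i - left)  # do not run past the start of the text
            spans.append(" ".join(tokens[start:i + right + 1]))
    return spans


print(propagate("project", "The project of law was voted; a new project started.", left=0, right=2))
print(propagate("flag", "a flag white and black", left=0, right=3))
```

The first call yields `/project of law/` and `/project started/`; the second reproduces the right propagation `/flag white and black/`.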
LINGUISTIC INFORMATION OF VICINITY:
Figure: hierarchical information of an NP or linguistic expression, split into left_predicate | Predicate | predicate_right.
Example: "Les produits informatiques de la société" (in French); "computer products company" (in English).
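The three-part vicinity structure can be sketched as a named tuple, assuming the core predicate is already known (passed in by hand here, rather than found by a parser). `Vicinity` and `split_vicinity` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Vicinity(NamedTuple):
    left_predicate: str
    predicate: str
    predicate_right: str


def split_vicinity(expression: str, core: str) -> Vicinity:
    """Split an NP around the first occurrence of its core predicate
    into left and right vicinity (raises ValueError if the core is absent)."""
    words = expression.split()
    i = words.index(core)
    return Vicinity(" ".join(words[:i]), words[i], " ".join(words[i + 1:]))


print(split_vicinity("Les produits informatiques de la société", "produits"))
print(split_vicinity("computer products company", "products"))
```

The two examples show the vicinity falling mostly to the right of the core in the French expression and on both sides in the English one.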
Reformulation of questions
We propose to process natural language by
drawing on query reformulation in
human-computer dialogue.
Reformulating natural-language questions is a
classical technique for information retrieval in
databases, but the new question/answer systems
must deal with the open content of the Web.
We present a study based on formulating
'key questions' from 'core predicates',
enriched to the left or right of the central term.
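One possible reading of key-question formulation, sketched under assumptions: the core predicate is enriched with a single word observed immediately to its left or right in a corpus, and the candidate enrichments are ranked by frequency. The function `key_questions` and the frequency ranking are hypothetical choices, not the study's method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def key_questions(core, corpus, top=3):
    """Propose enriched queries: the core predicate plus one neighbouring word
    observed on its left or right in the corpus, most frequent first."""
    candidates = Counter()
    for text in corpus:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok != core:
                continue
            if i > 0:
                candidates[f"{tokens[i - 1]} {core}"] += 1  # left enrichment
            if i + 1 < len(tokens):
                candidates[f"{core} {tokens[i + 1]}"] += 1  # right enrichment
    return [q for q, _ in candidates.most_common(top)]


corpus = ["white flag", "a white flag", "flag day"]
print(key_questions("flag", corpus))  # ['white flag', 'flag day']
```

Each proposed key question could then be submitted as the next reformulation $Q_{k+1}$.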
How does a search engine produce responses?
• For example, a search for 'informati*' does not return
computerization (informatisation in French, …).
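The limitation can be reproduced with plain wildcard matching, sketched here with Python's standard `fnmatch.fnmatchcase`: a truncated term like 'informati*' matches every surface form sharing the prefix, but never 'computerization', because nothing relates words by meaning or across languages. The small vocabulary and the name `truncation_search` are made up for illustration.

```python
from fnmatch import fnmatchcase


def truncation_search(pattern, vocabulary):
    """Surface matching only: return the vocabulary words matched by a wildcard
    pattern such as 'informati*'. No morphology, translation or synonymy is used."""
    return sorted(w for w in vocabulary if fnmatchcase(w, pattern))


vocab = ["information", "informatique", "informatisation", "computerization", "computing"]
print(truncation_search("informati*", vocab))
# ['information', 'informatique', 'informatisation']
```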
Lack of language technology in search engines
The search tools and web databases in the following areas are not
equipped with linguistic tools:
- Information retrieval in structured documents,
- Multilingual information retrieval,
- Personalized information retrieval,
- Information retrieval with automatic natural language processing,
- Ontology-based information retrieval,
- Multimedia information retrieval,
- Questions/Answers,
- etc.