Semantic Search Tutorial

Introduction
Miriam Fernandez | KMI, Open University, UK
Thanh Tran | Institute AIFB, KIT, DE
Peter Mika | Yahoo Research, Spain
Search
Document Retrieval vs. Data Retrieval
• Differences of search technologies
– Representation of the user need (query model)
– Representation of the underlying resources (data model) and
– Matching technique
• Information Retrieval (IR) supports the retrieval of documents
(document retrieval)
– Representation based on lightweight syntax-centric models
– Work well for topical search
– Not so well for more complex information needs
• Database (DB) and Knowledge-based Systems (KB) deliver more
precise answers (data retrieval)
– More expressive models
– Allow for complex queries
– Retrieve concrete answers that precisely match queries
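The contrast between the two retrieval styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy documents, the record fields, and the function names document_retrieval and data_retrieval are all made up for this example, and the keyword-count score stands in for a real IR ranking function.

```python
from collections import Counter

# Toy free-text collection for the IR side (contents are assumptions).
documents = {
    "d1": "semantic search combines information retrieval and semantic web data",
    "d2": "database systems answer precise structured queries",
    "d3": "search engines rank documents for keyword queries",
}

# Toy structured records for the DB/KB side (field names are assumptions).
records = [
    {"name": "Peter Mika", "affiliation": "Yahoo Research", "country": "Spain"},
    {"name": "Thanh Tran", "affiliation": "KIT", "country": "DE"},
    {"name": "Miriam Fernandez", "affiliation": "Open University", "country": "UK"},
]


def document_retrieval(query, docs):
    """Rank documents by how often they contain the query keywords."""
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Best match first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def data_retrieval(conditions, rows):
    """Return only the records that satisfy every attribute condition exactly."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


# Document retrieval: a ranked list of more or less relevant documents.
ranked = document_retrieval("semantic search", documents)
# Data retrieval: the concrete records that precisely match the query.
exact = data_retrieval({"country": "Spain"}, records)
print(ranked)
print([r["name"] for r in exact])
```

The keyword query returns a ranked list in which partial matches still appear, while the structured query returns exactly the records that satisfy its conditions and nothing else.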
Semantic Search
• Semantic search can be seen as a retrieval
paradigm that is
– Centered on the use of semantics
– Precisely: incorporates semantics entailed in query &
resources into the matching process
– Broadly: incorporates the semantics into some steps
throughout the process
• Wide range of semantic search systems
– Employ semantic models of varying expressivity at
different steps
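One simple way to bring semantics into the matching step is to expand the query with related terms before matching. The sketch below assumes a tiny hand-written thesaurus and three toy documents; THESAURUS, expand, and match are hypothetical names chosen for this example, not part of any system discussed in the tutorial.

```python
# Hypothetical thesaurus: each term maps to terms treated as equivalent.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "film": {"movie"},
}

# Toy document collection (contents are assumptions).
documents = {
    "d1": "used automobile prices in europe",
    "d2": "new movie releases this week",
    "d3": "train timetable for berlin",
}


def expand(query_terms, thesaurus):
    """Add the thesaurus entries of each query term (the semantic step)."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= thesaurus.get(term, set())
    return expanded


def match(query, docs, thesaurus=None):
    """Return ids of documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    if thesaurus is not None:
        terms = expand(terms, thesaurus)
    return sorted(d for d, text in docs.items() if terms & set(text.split()))


# Purely syntactic matching misses the document that says "automobile".
syntactic_hits = match("car", documents)
# With the thesaurus, the same query also reaches that document.
semantic_hits = match("car", documents, THESAURUS)
print(syntactic_hits, semantic_hits)
```

Here the semantics only affects one step (query construction), which corresponds to the broad reading of semantic search given above.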
Semantic Models
• Semantics is concerned with the meaning of the resources made
available for search
• Meaning is established through semantic models
• Linguistic model
– Model relationships at the word level
– Taxonomies, thesauri
• Conceptual model (“explicit”)
– Model relationships between syntactic elements denoting entities of the
universe of discourse
– Interpretation: mapping of syntactic elements to universe of discourse
• Expressivity
– Number & kinds of modeling constructs
• Formality
– Interpretations are computable
Categories of Semantic Search
• DB and KB systems belong to “heavyweight semantic
search systems”
– Explicit and formal models of semantics, e.g.
Entity Relationship schemas, and
knowledge models available in RDF(S) and OWL
– Mainly semantic data retrieval systems
• Semantics-based IR systems belong to “lightweight semantic
search systems”
– Lightweight semantic models, e.g. taxonomies and thesauri
– Semantic data (RDF) embedded in or associated with
documents
– Semantic document retrieval systems
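To illustrate what an explicit and formal model adds on the heavyweight side, the following sketch stores a toy knowledge base as subject-predicate-object triples and answers an instance query with and without subclass inference. The triples, the class names, and the helper functions are assumptions made for this example; the transitive subClassOf closure only mimics one small part of RDFS reasoning and is not a full RDF(S) or OWL implementation.

```python
# Hypothetical knowledge base as (subject, predicate, object) triples.
TRIPLES = {
    ("Professor", "subClassOf", "Researcher"),
    ("Researcher", "subClassOf", "Person"),
    ("alice", "type", "Professor"),
    ("bob", "type", "Researcher"),
    ("carol", "type", "Student"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable via subClassOf (transitive)."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples, infer=True):
    """Return the subjects typed as cls, optionally via subclass inference."""
    result = set()
    for s, p, o in triples:
        if p != "type":
            continue
        types = superclasses(o, triples) if infer else {o}
        if cls in types:
            result.add(s)
    return sorted(result)


# Precise matching only finds explicitly asserted types.
asserted = instances_of("Researcher", TRIPLES, infer=False)
# Inference also returns alice, because Professor is a subclass of Researcher.
inferred = instances_of("Researcher", TRIPLES, infer=True)
print(asserted, inferred)
```

A lightweight system would typically stop at taxonomy or thesaurus lookups, whereas the formal class hierarchy here lets the system derive answers that are not stated explicitly in the data.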
Semantic Search – a Process View
• Query Construction
– Keywords
– Forms
– NL
– Formal language
• Query Processing
– IR-style matching & ranking
– DB-style precise matching
– KB-style matching & inferences
• Result Presentation
– Query visualization
– Document and data presentation
– Summarization
• Query Refinement
– Implicit feedback
– Explicit feedback
– Incentives
• Figure labels for the underlying layers: Knowledge Representation,
Semantic Models, Resources, Documents, Document Representation
Convergence of DB & KB & IR on the Web
• Trend: increased availability of structured and semantic data
• Data Web Search
– Semantic data retrieval & inferences in larger Web scenario
– Differences: scale, heterogeneity, quality, vague information needs
– Adopt IR technologies for scalability and to deal with the quality problems
of Web data
• Document Web Search
– Database and Semantic Web technologies are applied to the IR
problem to exploit richly structured and highly expressive data
– Examples: Yahoo, Google, Bing, Facebook
• Convergence in terms of the
– Data and techniques
Semantic Search Systems
Semantic search systems might combine a
range of techniques, from statistics-based
IR methods for ranking, through database
methods for efficient indexing and query
processing, up to complex reasoning
techniques for making inferences!
Agenda
• 9:00am: Introduction
• 9:15am: Representation of the Search Space
• 10:00am: Coffee Break
• 10:15am: Data Preprocessing: Indexing & Crawling
• 11:30am: Query Processing
• 12:30pm: Lunch Break
• 2:00pm: Query Processing
• 2:20pm: Demo
• 2:30pm: Ranking
• 3:30pm: Coffee Break
• 3:45pm: Demo
• 4:00pm: Result Presentation
• 4:15pm: Demo
• 4:30pm: Evaluation
• 5:00pm: Discussion and wrap-up
• 5:30pm: End