BIG DATA, 19-21 November 2014, Chasseneuil, FRANCE
Cooperative Answering Systems in Big Data
Géraud FOKOU, Stéphane JEAN, Allel HADJALI
LIAS/ENSMA-University of Poitiers, FRANCE
BIG DATA – 2014, Chasseneuil, France
BIG DATA CONTEXT
Increase in data production
o Sensor data, e-business, social networks
Diversification of data structures
o Unstructured, semi-structured, and structured data
Distribution of data across multiple distinct data sources
BIG DATA RETRIEVAL
From 4 Vs to 5 Vs in Big Data: Visualisation
o Retrieving and querying Big Data
Objectives
o Efficiency: processing speed
o Effectiveness: answer quality
Plethoric answers problem: big data → big answer set
Empty answer problem: big data → empty answer set
CONTEXT AND PROBLEM STATEMENT
Context
Structure: semantic data
• Data formats: RDFS, OWL, N3, …
• Physical storage representation: triple (or vertical), horizontal, and binary
• Query languages: SQL, SPARQL, and hybrid languages
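To make the three storage representations concrete, here is a minimal sketch that lays out the same toy data in each of them. The data values and variable names are made up for illustration, and the layouts follow the usual reading of these terms (a single triple table, one two-column table per property, one row per instance), which is an assumption rather than something the slides spell out.

```python
from collections import defaultdict

# Toy RDF statements as (subject, predicate, object); all values are invented.
triples = [
    ("m1", "rdf:type", "Drama"),
    ("m1", "mo:Title", "Some Title"),
    ("m2", "rdf:type", "Drama"),
    ("m2", "mo:Title", "Other Title"),
]

# Triple (vertical) layout: one three-column table holding every statement.
vertical = list(triples)

# Binary layout: one two-column (subject, object) table per property.
binary = defaultdict(list)
for s, p, o in triples:
    binary[p].append((s, o))

# Horizontal layout: one row per instance, one column per property
# (in practice usually one such table per class).
horizontal = defaultdict(dict)
for s, p, o in triples:
    horizontal[s][p] = o
```

The same query touches these layouts differently: a lookup on one property scans a single small table in the binary layout but the whole table in the triple layout.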
Problem statement
Empty answer set: return alternative answers
L1: Lack of relaxation control → O1: Definition of relaxation operators with control parameters
L2: Instance-independent ranking → O2: A ranking function that depends on both instances and queries
L3: Integration into the query language → O3: A SPARQL extension implemented on top of Jena
CONTRIBUTIONS
Contributions: Relaxation Operators
Relaxation Operators
Based on relations between data
• Order relation (e.g. order on the set of integers)
• Conceptual relation (generalization)
Similarity between queries
• Based on value distance
• Based on conceptual/structural distance
$$Sim(Q, Q') = \frac{2 \cdot depth(LCA(C_1, C_2))}{depth(C_1) + depth(C_2)}$$ (Distance-Based) [Huang08]
$$Sim(Q, Q') = \frac{IC(LCA(C_1, C_2))}{IC(C_1) + IC(C_2) - IC(LCA(C_1, C_2))}$$ (Content-Based) [Jean13]
Proposed operators
• Relaxation clause: APPROX(OP, TopK)
• Predicate relaxation: PRED(Q, Prop, epsilon)
• Generalization: GEN(Q, C, level)
• Substitution: SIB(Q, C, [C1, C2, …, Cn])
• Combination of operators: AND
Example query:
Select ?Title
Where {(?movie rdf:type Drama).
       (?movie mo:Title ?Title).
       (?movie mo:start 4)}
APPROX { GEN(Drama, 1) AND PRED(Start, δ) }
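The following sketch illustrates how the GEN, SIB, and PRED operators might rewrite the example query. It is not the Jena-based implementation: the query is modelled as a plain list of triple patterns, the class hierarchy (`PARENT`) is a made-up toy, and reading PRED as "widen an equality into the interval [v - epsilon, v + epsilon]" is an assumption made for illustration.

```python
# Hypothetical toy hierarchy: child class -> parent class.
PARENT = {"Drama": "Movie", "Comedy": "Movie", "Movie": "Work"}

# The example query as a list of (subject, property, object) patterns.
QUERY = [
    ("?movie", "rdf:type", "Drama"),
    ("?movie", "mo:Title", "?Title"),
    ("?movie", "mo:start", 4),
]


def _swap_class(query, old, new):
    """Replace class `old` by `new` in every rdf:type pattern."""
    return [(s, p, new if (p == "rdf:type" and o == old) else o)
            for s, p, o in query]


def gen(query, cls, level):
    """GEN: replace `cls` by its ancestor `level` steps up the hierarchy."""
    target = cls
    for _ in range(level):
        target = PARENT.get(target, target)  # stop at the root
    return _swap_class(query, cls, target)


def sib(query, cls, siblings):
    """SIB: one relaxed query per sibling class substituted for `cls`."""
    return [_swap_class(query, cls, sibling) for sibling in siblings]


def pred(query, prop, epsilon):
    """PRED (assumed semantics): widen `prop = v` to [v - eps, v + eps]."""
    return [(s, p, (o - epsilon, o + epsilon))
            if p == prop and isinstance(o, (int, float)) else (s, p, o)
            for s, p, o in query]


# Mirrors APPROX { GEN(Drama, 1) AND PRED(Start, delta) } with delta = 1.
relaxed = pred(gen(QUERY, "Drama", 1), "mo:start", 1)
```

Here `relaxed` asks for instances of the parent class `Movie` whose `mo:start` value lies between 3 and 5, while the title pattern is left untouched.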
Contributions: Data Distance
Data Distance
$$Sim(Q, Q') = \frac{2 \cdot depth(LCA(C_1, C_2))}{depth(C_1) + depth(C_2)}$$ (Distance-Based) [Huang08]
$$Sim(Q, Q') = \frac{IC(LCA(C_1, C_2))}{IC(C_1) + IC(C_2) - IC(LCA(C_1, C_2))}$$ (Content-Based) [Jean13]
where $C_2$ in $Q'$ relaxes $C_1$ in $Q$, $IC$ is the information content of a class, and $LCA$ is the nearest common ancestor of classes $C_1$ and $C_2$ (Lowest Common Ancestor)
$$Sat_Q(t_i) = \min(Sim(Q, Q'), Sat_{Q'}(t_i))$$ where $t_i$ is a database instance tuple and $Q'$ is a relaxed variant of $Q$
Levenshtein distance: a mathematical distance for measuring the similarity between two strings
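For reference, the Levenshtein distance mentioned above can be computed with the standard dynamic-programming recurrence. This is a generic textbook sketch, not code from the presented system; the function name is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string `a` into string `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]
```

A smaller distance means more similar strings, so a string-valued criterion could be relaxed by accepting values within a small distance of the requested one.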
Ranking relaxed queries and alternative answers
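The two similarity measures and the satisfaction degree can be sketched on a toy class hierarchy as follows. Several things here are assumptions, not taken from the slides: the hierarchy and instance counts are invented, the root is given depth 1, and $IC$ is computed as the negative log of the fraction of instances belonging to a class (a common definition of information content).

```python
import math

# Hypothetical hierarchy (child -> parent) and instance counts,
# where each count includes the instances of all subclasses.
PARENT = {"Drama": "Movie", "Comedy": "Movie", "Movie": "Work"}
COUNT = {"Work": 100, "Movie": 50, "Drama": 20, "Comedy": 30}
ROOT = "Work"


def ancestors(c):
    """Path from class c up to the root, c included."""
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(c):
    return len(ancestors(c))  # assumption: the root has depth 1


def lca(c1, c2):
    """Nearest common ancestor of c1 and c2."""
    up2 = set(ancestors(c2))
    return next(c for c in ancestors(c1) if c in up2)


def ic(c):
    """Assumed information content: -log of the class's instance share."""
    return -math.log(COUNT[c] / COUNT[ROOT])


def sim_depth(c1, c2):
    """Distance-based similarity (first formula)."""
    return 2 * depth(lca(c1, c2)) / (depth(c1) + depth(c2))


def sim_ic(c1, c2):
    """Content-based similarity (second formula)."""
    common = ic(lca(c1, c2))
    denominator = ic(c1) + ic(c2) - common
    return 1.0 if denominator == 0 else common / denominator


def sat(sim_q_qprime, sat_qprime_t):
    """Sat_Q(t) = min(Sim(Q, Q'), Sat_Q'(t))."""
    return min(sim_q_qprime, sat_qprime_t)


def rank(answers):
    """Sort (tuple, Sim(Q, Q'), Sat_Q'(t)) entries, best answer first."""
    return sorted(answers, key=lambda a: sat(a[1], a[2]), reverse=True)
```

With these toy numbers, generalizing `Drama` to `Movie` scores higher than substituting the sibling `Comedy` under the depth-based measure, so answers obtained through the generalization would be ranked first.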
Contributions: Relaxation Strategies
Relaxation Strategies
Automatic Relaxation
Based on similarity and distance
• Find the relaxed queries most similar to the original query
• Find the answers closest to the ideal answer the user is looking for
Using MFS (Minimal Failing Subqueries)
• Query seen as a conjunction of criteria
• Find all minimal conjunctions of criteria that return an empty answer set
Using XSS (maXimal Succeeding Subqueries)
• Query seen as a conjunction of criteria
• Find all maximal conjunctions of criteria that do not return an empty answer set
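The MFS and XSS definitions can be illustrated with a brute-force sketch that tests every non-empty subset of criteria against a toy dataset. This exhaustive enumeration is exponential in the number of criteria and is only meant to make the definitions concrete; it is not the algorithm used by the authors, and the rows and criteria below are invented.

```python
from itertools import combinations


def mfs_and_xss(criteria, rows):
    """Return (MFS, XSS) for a conjunctive query.

    criteria: dict mapping a criterion name to a predicate over a row.
    rows: the data the query is evaluated on.
    """
    names = sorted(criteria)
    # ok[subset] is True when the conjunction of `subset` returns answers.
    ok = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            ok[frozenset(subset)] = any(
                all(criteria[n](row) for n in subset) for row in rows
            )
    # Minimal failing: fails, and removing any one criterion makes it succeed.
    mfs = [s for s, good in ok.items()
           if not good and all(ok[s - {n}] for n in s if len(s) > 1)]
    # Maximal succeeding: succeeds, and adding any one criterion makes it fail.
    xss = [s for s, good in ok.items()
           if good and not any(ok[s | {n}] for n in names if n not in s)]
    return mfs, xss


rows = [
    {"type": "Drama", "start": 3, "year": 2010},
    {"type": "Comedy", "start": 4, "year": 2012},
]
criteria = {
    "type": lambda r: r["type"] == "Drama",
    "start": lambda r: r["start"] == 4,
    "year": lambda r: r["year"] >= 2000,
}
mfs, xss = mfs_and_xss(criteria, rows)
```

On this data the full query fails because no drama has `start = 4`: the single MFS is {type, start}, and the two XSS, {type, year} and {start, year}, are the largest subqueries that still return answers.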
Interactive Relaxation
• User-driven strategy
• Return advice for refining the query, or the most similar answers
• Submit the refined queries
Perspectives
Performance
Optimization of the relaxation process by using database statistics (selectivity) to find the optimal relaxation step
Multiple-query optimization by exploiting the similarity between the original query and the relaxed queries
Optimization of the relaxation process to quickly find a set of alternative answers
User-aware relaxation process
Leveraging user profiles/preferences to customize the relaxation process
Publications and References
Fokou G., Un Framework pour la relaxation des requêtes dans les bases de données du Web Sémantique, Actes du VIIème Forum Jeunes Chercheurs, XXXIIème Congrès INFORSID 2014 (FJC-INFORSID 2014), 2014.
Fokou G., Jean S., Hadjali A., Endowing Semantic Query Languages with Advanced Relaxation Capabilities, Proceedings of the 21st International Symposium on Methodologies for Intelligent Systems (ISMIS 2014), 2014.
Jean S., Hadjali A., Ammar M., Towards a Cooperative Query Language for Semantic Web Database Queries, On the Move to Meaningful Internet Systems: OTM 2013 Conferences, Springer Berlin Heidelberg, September.
Corby O., Dieng-Kuntz R., Faron-Zucker C., Gandon F. L., Searching the Semantic Web: Approximate Query Processing Based on Ontologies, IEEE Intelligent Systems, 2006.
Godfrey P., Minimization in Cooperative Response to Failing Database Queries, IJCIS, 1997.
Hogan A., Mellotte M., Powell G., Stampouli D., Towards Fuzzy Query-Relaxation for RDF, ESWC’12, 2012.
Huang H., Liu C., Zhou X., Approximating Query Answering on RDF Databases, World Wide Web, January 2012.
Hurtado C. A., Poulovassilis A., Wood P. T., Query Relaxation in RDF, JODS, 2008.
Poulovassilis A., Wood P. T., Combining Approximation and Relaxation in Semantic Web Path Queries, Proceedings of the 9th International Semantic Web Conference (ISWC’10), 2010.
Islam M. S., Liu C., Zhou R., On Modeling Query Refinement by Capturing User Intent Through Feedback, Proceedings of the Twenty-Third Australasian Database Conference, Volume 124, ADC ’12, Australian Computer Society, Inc., Darlinghurst, Australia, 2012.
Jannach D., Finding Preferred Query Relaxations in Content-Based Recommenders, Intelligent Techniques and Tools for Novel System Architectures, vol. 109, Springer Berlin, Heidelberg, p. 81-97, September.
Thank you for your attention …
Web site : http://www.lias-lab.fr