4 BIG DATA 19 -21 November 2014, Chasseneuil, FRANCE


Cooperative Answering Systems in Big Data
Géraud FOKOU, Stéphane JEAN, Allel HADJALI
LIAS/ENSMA-University of Poitiers, FRANCE
BIG DATA – 2014, Chasseneuil, France
BIG DATA CONTEXT
▪ Increase in data production
  o Sensor data, e-business, social networks
▪ Diversification of data structures
  o Unstructured, semi-structured, and structured data
▪ Distribution of data across multiple, distinct data sources
BIG DATA RETRIEVAL
▪ From 4 Vs to 5 Vs in Big Data: Visualisation
  o Retrieving and querying Big Data
Objectives
  • Efficiency: speed of processing
  • Effectiveness: quality of answers
Plethoric Answers Problem: big data → big answer set
Empty Answer Problem: big data → empty answer set
CONTEXT AND PROBLEM STATEMENT
▪ Context
  ▪ Structure: semantic data
    • Data formats: RDFS, OWL, N3, …
    • Physical storage representation: triple (or vertical), horizontal, and binary
    • Query languages: SQL, SPARQL, and hybrid languages
▪ Problem statement
  ▪ Empty answer set: return alternative answers
    L1: Lack of relaxation control → O1: Definition of relaxation operators with control parameters
    L2: Instance-independent ranking → O2: Our ranking function depends on both instances and queries
    L3: Integration into a query language → O3: A SPARQL extension implemented on top of Jena
CONTRIBUTIONS
Contributions: Relaxation Operators
▪ Relaxation Operators
  ▪ Based on relations between data
    • Order relation (order on the integer set)
    • Conceptual relation (generalization)
  ▪ Similarity between queries
    • Based on value distance
    • Based on conceptual/structural distance
  ▪ Proposed Operators
    • Relaxation clause: APPROX(OP, TopK)
    • Predicate relaxation: PRED(Q, Prop, epsilon)
    • Generalization: GEN(Q, C, level)
    • Substitution: SIB(Q, C, [C1, C2, …, Cn])
    • Aggregation of operators: AND
  Example:
    Select ?Title
    Where {(?movie rdf:type Drama).
           (?movie mo:Title ?Title).
           (?movie mo:start 4)}
    APPROX { GEN(Drama, 1) AND PRED(Start, δ) }
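The operators above can be illustrated with a small in-memory sketch. It is not the authors' Jena-based implementation: the toy class hierarchy, the movie records, and the names `Query`, `gen`, `pred`, `sib`, `is_subclass`, and `evaluate` are hypothetical. PRED is assumed to widen a numeric equality `v` into the interval `[v - epsilon, v + epsilon]`, and AND is assumed to compose the operators one after the other.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Hypothetical toy class hierarchy (child -> parent); not taken from the slides.
PARENT: Dict[str, str] = {
    "Drama": "Fiction",
    "Comedy": "Fiction",
    "Fiction": "Movie",
}


@dataclass(frozen=True)
class Query:
    """Simplified query: one rdf:type class plus numeric (property, low, high) ranges."""
    cls: str
    ranges: Tuple[Tuple[str, float, float], ...]


def gen(q: Query, c: str, level: int) -> Query:
    """GEN(Q, C, level): replace class C by its ancestor `level` steps up."""
    if q.cls != c:
        return q
    target = c
    for _ in range(level):
        if target not in PARENT:  # already at the root, stop generalizing
            break
        target = PARENT[target]
    return replace(q, cls=target)


def pred(q: Query, prop: str, epsilon: float) -> Query:
    """PRED(Q, Prop, epsilon): widen the range on `prop` by epsilon on both sides."""
    widened = tuple(
        (p, lo - epsilon, hi + epsilon) if p == prop else (p, lo, hi)
        for p, lo, hi in q.ranges
    )
    return replace(q, ranges=widened)


def sib(q: Query, c: str, siblings: List[str]) -> List[Query]:
    """SIB(Q, C, [C1, ..., Cn]): one relaxed query per substitute class."""
    if q.cls != c:
        return [q]
    return [replace(q, cls=s) for s in siblings]


def is_subclass(c: str, ancestor: str) -> bool:
    """True if `c` equals `ancestor` or lies below it in PARENT."""
    while c != ancestor:
        if c not in PARENT:
            return False
        c = PARENT[c]
    return True


def evaluate(q: Query, movies: List[dict]) -> List[str]:
    """Titles of the movies that satisfy every criterion of q."""
    return [
        m["title"]
        for m in movies
        if is_subclass(m["cls"], q.cls)
        and all(lo <= m[p] <= hi for p, lo, hi in q.ranges)
    ]


if __name__ == "__main__":
    movies = [
        {"title": "A", "cls": "Comedy", "start": 3},
        {"title": "B", "cls": "Drama", "start": 2},
    ]
    q = Query(cls="Drama", ranges=(("start", 4.0, 4.0),))
    print(evaluate(q, movies))  # the original query fails: []
    # APPROX { GEN(Drama, 1) AND PRED(start, 1) }, read as composing both operators
    relaxed = pred(gen(q, "Drama", 1), "start", 1.0)
    print(relaxed.cls, relaxed.ranges, evaluate(relaxed, movies))
```

On this toy data the original query returns nothing, while the relaxed query (class `Fiction`, `start` in `[3, 5]`) returns movie `A`, which is the empty-answer scenario the operators are meant to address.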
Contributions: Data Distance
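▪ Data Distance
  ▪ Distance-based similarity [Huang08]:
    $Sim(Q, Q') = \frac{2 \cdot \mathrm{depth}(\mathrm{LCA}(C_1, C_2))}{\mathrm{depth}(C_1) + \mathrm{depth}(C_2)}$
  ▪ Content-based similarity [Jean13]:
    $Sim(Q, Q') = \frac{\mathrm{IC}(\mathrm{LCA}(C_1, C_2))}{\mathrm{IC}(C_1) + \mathrm{IC}(C_2) - \mathrm{IC}(\mathrm{LCA}(C_1, C_2))}$
  where $C_2$ in $Q'$ relaxes $C_1$ in $Q$, $\mathrm{IC}$ is the information content of a class, and $\mathrm{LCA}(C_1, C_2)$ is the least common ancestor (the nearest common ancestor) of classes $C_1$ and $C_2$.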
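  ▪ $Sat_Q(t_i) = \min(Sim(Q, Q'), Sat_{Q'}(t_i))$, where $t_i$ is a database instance tuple and $Q'$ is a relaxed variant of $Q$
  ▪ Levenshtein distance: a string metric measuring the similarity between two strings (the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other)
Ranking relaxed queries and alternative answers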
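The measures on this slide can be computed on a toy hierarchy. This is a sketch, not the authors' code: the class hierarchy and instance counts are hypothetical; `depth` counts the root as 1; IC is taken to be the common definition -log p(c), with p(c) the fraction of instances belonging to c or one of its subclasses (an assumption, since the slide only names IC); and the hypothetical `rank_answers` helper assumes that a tuple returned by several relaxed queries keeps its best Sat score.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy hierarchy (child -> parent) and direct instance counts.
PARENT: Dict[str, str] = {
    "Drama": "Fiction",
    "Comedy": "Fiction",
    "Fiction": "Movie",
    "Documentary": "Movie",
}
DIRECT_INSTANCES: Dict[str, int] = {"Drama": 2, "Comedy": 2, "Documentary": 4}


def ancestors(c: str) -> List[str]:
    """c itself first, then its ancestors up to the root."""
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(c: str) -> int:
    return len(ancestors(c))  # the root has depth 1


def lca(c1: str, c2: str) -> str:
    """Least common ancestor: first ancestor of c1 that is also an ancestor of c2."""
    up2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in up2:
            return a
    raise ValueError(f"{c1} and {c2} share no ancestor")


def sim_depth(c1: str, c2: str) -> float:
    """Distance-based similarity: 2*depth(LCA) / (depth(C1) + depth(C2))."""
    return 2 * depth(lca(c1, c2)) / (depth(c1) + depth(c2))


def ic(c: str) -> float:
    """Assumed IC: -log p(c), p(c) = share of instances in c or its subclasses."""
    total = sum(DIRECT_INSTANCES.values())
    covered = sum(n for k, n in DIRECT_INSTANCES.items() if c in ancestors(k))
    return -math.log(covered / total)  # assumes every class has at least one instance


def sim_ic(c1: str, c2: str) -> float:
    """Content-based similarity: IC(LCA) / (IC(C1) + IC(C2) - IC(LCA))."""
    shared = ic(lca(c1, c2))
    denom = ic(c1) + ic(c2) - shared
    # denom is 0 only when both classes cover every instance; treat them as identical
    return 1.0 if denom == 0 else shared / denom


def rank_answers(relaxed: Iterable[Tuple[float, Dict[str, float]]]) -> List[Tuple[str, float]]:
    """Sat_Q(t) = min(Sim(Q, Q'), Sat_Q'(t)); keep each tuple's best score over all Q'."""
    best: Dict[str, float] = {}
    for sim, answers in relaxed:
        for t, sat_qprime in answers.items():
            score = min(sim, sat_qprime)
            if t not in best or score > best[t]:
                best[t] = score
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```

With these counts, generalizing `Drama` to `Fiction` scores 0.8 under the depth-based measure and 0.5 under the content-based one, so the two measures can rank the same relaxation quite differently.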
Contributions: Relaxation Strategies
▪ Relaxation Strategies
  ▪ Automatic relaxation
    • Based on similarity and distance
    • Finding all relaxed queries that are most similar to the original query
    • Finding the answers nearest to the abstract model of the desired answer
  ▪ Using MFS (Minimal Failing Subqueries)
    • Query as a conjunction of criteria
    • Finding all minimal conjunctions of criteria that return an empty answer set
  ▪ Using XSS (maXimal Succeeding Subqueries)
    • Query as a conjunction of criteria
    • Finding all maximal conjunctions of criteria that return a non-empty answer set
  ▪ Interactive relaxation
    • User-based strategy
    • Returning advice for refining the query, or the most similar answers
    • Asking the user for refined queries
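The MFS and XSS strategies above can be sketched by brute-force enumeration of sub-conjunctions, following the usual definitions (cf. Godfrey 1997 in the references). This is an illustrative sketch, not the authors' algorithm: it evaluates every non-empty subquery, so it is exponential in the number of criteria, and the `fails` oracle, the toy movie data, and the criterion names are hypothetical stand-ins for running subqueries against a real triple store.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterator, List, Sequence

Subquery = FrozenSet[str]


def subqueries(criteria: Sequence[str]) -> Iterator[Subquery]:
    """All non-empty conjunctions of the query's criteria (exponential in len(criteria))."""
    for k in range(1, len(criteria) + 1):
        for combo in combinations(criteria, k):
            yield frozenset(combo)


def mfs(criteria: Sequence[str], fails: Callable[[Subquery], bool]) -> List[Subquery]:
    """Minimal Failing Subqueries: failing conjunctions with no failing proper subset."""
    failing = [s for s in subqueries(criteria) if fails(s)]
    return [s for s in failing if not any(other < s for other in failing)]


def xss(criteria: Sequence[str], fails: Callable[[Subquery], bool]) -> List[Subquery]:
    """maXimal Succeeding Subqueries: non-failing conjunctions with no non-failing proper superset."""
    succeeding = [s for s in subqueries(criteria) if not fails(s)]
    return [s for s in succeeding if not any(s < other for other in succeeding)]


# Hypothetical toy data and criteria, standing in for evaluation against a real store.
MOVIES = [
    {"genre": "Drama", "start": 2, "year": 2010},
    {"genre": "Comedy", "start": 4, "year": 2012},
]
CRITERIA: Dict[str, Callable[[dict], bool]] = {
    "drama": lambda m: m["genre"] == "Drama",
    "start4": lambda m: m["start"] == 4,
    "y2010": lambda m: m["year"] == 2010,
}


def fails(sub: Subquery) -> bool:
    """True if no movie satisfies every criterion in `sub` (empty answer set)."""
    return not any(all(CRITERIA[c](m) for c in sub) for m in MOVIES)


if __name__ == "__main__":
    names = list(CRITERIA)
    print([sorted(s) for s in mfs(names, fails)])
    print([sorted(s) for s in xss(names, fails)])
```

On this toy data the full query fails; its MFS are {drama, start4} and {start4, y2010}, and its XSS are {start4} and {drama, y2010}. An MFS points at the criteria to relax, while each XSS is a ready-made relaxed query with a non-empty answer set.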
Perspectives
▪ Performance
  ▪ Optimization of the relaxation process using database statistics (selectivity) to find the optimal relaxation step
  ▪ Multiple-query optimization exploiting the similarity between the original query and the relaxed queries
  ▪ Optimization of the relaxation process to quickly find a set of alternative answers
▪ User-aware relaxation process
  ▪ Leveraging user profiles/preferences to customize the relaxation process
Publications and References
▪ Géraud FOKOU, Un Framework pour la relaxation des requêtes dans les bases de données du Web Sémantique, Proceedings of the 7th Forum Jeunes Chercheurs (Young Researchers Forum), 32nd INFORSID Congress (FJC-INFORSID 2014), 2014.
▪ Géraud FOKOU, Stéphane JEAN, Allel HADJALI, Endowing Semantic Query Languages with Advanced Relaxation Capabilities, Proceedings of the 21st International Symposium on Methodologies for Intelligent Systems (ISMIS 2014), 2014.
▪ Stéphane JEAN, Allel HADJALI, Ammar M., Towards a Cooperative Query Language for Semantic Web Database Queries, On the Move to Meaningful Internet Systems: OTM 2013 Conferences, Springer Berlin Heidelberg, September 2013.
▪ Corby O., Dieng-Kuntz R., Faron-Zucker C., Gandon F. L., Searching the Semantic Web: Approximate Query Processing Based on Ontologies, IEEE Intelligent Systems, 2006.
▪ Godfrey P., Minimization in Cooperative Response to Failing Database Queries, IJCIS, 1997.
▪ Hogan A., Mellotte M., Powell G., Stampouli D., Towards Fuzzy Query-Relaxation for RDF, ESWC’12, 2012.
▪ Huang H., Liu C., Zhou X., Approximating Query Answering on RDF Databases, World Wide Web, January 2012.
▪ Hurtado C. A., Poulovassilis A., Wood P. T., Query Relaxation in RDF, JODS, 2008.
▪ Poulovassilis A., Wood P. T., Combining Approximation and Relaxation in Semantic Web Path Queries, Proceedings of the 9th International Semantic Web Conference (ISWC’10), 2010.
▪ Islam M. S., Liu C., Zhou R., On Modeling Query Refinement by Capturing User Intent Through Feedback, Proceedings of the Twenty-Third Australasian Database Conference, Volume 124 (ADC’12), Australian Computer Society, Inc., Darlinghurst, Australia, 2012.
▪ Jannach D., Finding Preferred Query Relaxations in Content-Based Recommenders, Intelligent Techniques and Tools for Novel System Architectures, vol. 109, Springer Berlin Heidelberg, pp. 81-97, September.
Thank you for your attention …
Website: http://www.lias-lab.fr