LDBC: Benchmarking Graph Data Management Systems

www.cwi.nl/~boncz/graphta.ppt
Peter Boncz
Why Benchmarking?
• make competing products comparable
• accelerate progress, make technology viable
© Jim Gray, 2005
What is the LDBC?
Linked Data Benchmark Council = LDBC
• Industry entity similar to TPC (www.tpc.org)
• Focusing on graph and RDF store benchmarking
Kick-started by an EU project
• Runs from September 2012 – March 2015
• 9 project partners:
SNB: Social Network Benchmark
Data Correlations between attributes
SELECT personID FROM person
WHERE firstName = 'Joachim' AND addressCountry = 'Germany'
Anti-Correlation
SELECT personID FROM person
WHERE firstName = 'Cesare' AND addressCountry = 'Italy'
• Query optimizers may underestimate or overestimate the result size of conjunctive predicates
[Figure: example person names; labels shown: Joachim, Cesare Loew, Cesare Prandelli, Joachim]
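The estimation problem on this slide can be illustrated with a small sketch. It builds a synthetic person table in which first names depend on the country, then compares the true result size of a conjunctive predicate with the estimate an optimizer would get by assuming the two attributes are independent. The table contents, the 90% correlation strength, and every name other than Joachim and Cesare are made up for illustration and do not come from the LDBC generator.

```python
import random

def build_people(n, seed=42):
    """Build (firstName, addressCountry) rows with country-dependent names."""
    rng = random.Random(seed)
    # Hypothetical dictionaries; only Joachim and Cesare appear in the slides.
    names_by_country = {
        "Germany": ["Joachim", "Hans", "Karl"],
        "Italy": ["Cesare", "Mario", "Luigi"],
    }
    all_names = [n_ for names in names_by_country.values() for n_ in names]
    countries = list(names_by_country)
    people = []
    for _ in range(n):
        country = rng.choice(countries)
        if rng.random() < 0.9:
            # Most people get a name that is typical for their country.
            name = rng.choice(names_by_country[country])
        else:
            name = rng.choice(all_names)
        people.append((name, country))
    return people

def estimates(people, name, country):
    """Return (independence-based estimate, actual count) for name AND country."""
    n = len(people)
    sel_name = sum(1 for p in people if p[0] == name) / n
    sel_country = sum(1 for p in people if p[1] == country) / n
    independent = sel_name * sel_country * n
    actual = sum(1 for p in people if p == (name, country))
    return independent, actual

if __name__ == "__main__":
    people = build_people(20000)
    for name, country in [("Joachim", "Germany"), ("Joachim", "Italy")]:
        est, act = estimates(people, name, country)
        print(f"{name}/{country}: estimated {est:.0f}, actual {act}")
```

With correlated values the independence estimate is too low; with a combination that rarely occurs together it is too high.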
Compact Correlated Property Value Generation
Using geometric distribution for function F()
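One way to read this slide is sketched below, under stated assumptions: a single shared dictionary of values, a ranking of that dictionary that depends on the correlated attribute, and a function F() that picks a rank from a geometric distribution so that low ranks are chosen most often. The dictionary, the TYPICAL mapping, the rank_names helper, and the parameter p = 0.5 are all hypothetical and are not taken from the LDBC data generator.

```python
import random

# Hypothetical shared dictionary and per-country "typical" names.
FIRST_NAMES = ["Joachim", "Cesare", "Anna", "Laura"]
TYPICAL = {"Germany": ["Joachim", "Anna"], "Italy": ["Cesare", "Laura"]}

def rank_names(country, dictionary):
    """Order the dictionary so values typical for the country come first."""
    typical = TYPICAL.get(country, [])
    head = [v for v in dictionary if v in typical]
    tail = [v for v in dictionary if v not in typical]
    return head + tail

def geometric_rank(rng, p, size):
    """Draw a 0-based rank from a geometric distribution, redrawing if it
    falls outside the dictionary."""
    while True:
        rank = 0
        while rng.random() >= p:
            rank += 1
        if rank < size:
            return rank

def make_generator(dictionary, ranking_for_key, p=0.5, seed=1):
    """Return generate(key): a value drawn with probability skewed toward
    the values ranked first for that key."""
    rng = random.Random(seed)

    def generate(key):
        order = ranking_for_key(key, dictionary)
        return order[geometric_rank(rng, p, len(order))]

    return generate

if __name__ == "__main__":
    gen = make_generator(FIRST_NAMES, rank_names)
    print([gen("Germany") for _ in range(10)])
    print([gen("Italy") for _ in range(10)])
```

The design is compact because only one dictionary is stored; the correlation lives entirely in the ranking, so any value can still occur for any key, just with a different probability.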
Correlated Edge Generation
[Figure: example graph of persons P1 to P5 annotated with property values: “Anna”, “Laura”, “Germany”, “Netherlands”, “1990”, “University of Leipzig”, “University of Amsterdam”, Student, <Britney Spears>]
How to Generate a Correlated Graph?
• Compute similarity of two nodes based on their (correlated) properties.
• Use a probability density function w.r.t. this similarity for connecting nodes
Danger: this is very expensive to compute on a large graph! (quadratic, random access)
[Figure: the example graph of persons P1 to P5, with a plot of connection probability against similarity, from highly similar to less similar]
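The naive approach described on this slide can be sketched as follows. Every pair of nodes is compared, which is what makes it quadratic. The similarity measure (fraction of shared property values), the linear mapping from similarity to connection probability, and the sample nodes are assumptions made for illustration; the slides do not specify them.

```python
import random

def similarity(a, b):
    """Hypothetical measure: fraction of common properties with equal values."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

def connect_naive(nodes, base=0.8, seed=7):
    """Compare every pair of nodes and connect them with a probability
    that grows with their similarity. Cost: O(n^2) comparisons."""
    rng = random.Random(seed)
    ids = sorted(nodes)
    edges = []
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            prob = base * similarity(nodes[u], nodes[v])
            if rng.random() < prob:
                edges.append((u, v))
    return edges

if __name__ == "__main__":
    # Made-up property assignments, loosely inspired by the figure labels.
    nodes = {
        "A": {"university": "University of Leipzig", "year": "1990"},
        "B": {"university": "University of Leipzig", "year": "1990"},
        "C": {"university": "University of Amsterdam", "year": "1990"},
    }
    print(connect_naive(nodes))
```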
Window Optimization
Trick: disregard nodes with too large similarity distance (only connect nodes in a similarity window)
Probability that two nodes are connected is skewed w.r.t. the similarity between the nodes (due to probability distr.)
[Figure: the example graph of persons P1 to P5 with a window drawn around similar nodes, and a plot of connection probability against similarity, from highly similar to less similar]
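A minimal sketch of the window idea, assuming that nodes are first sorted on a key built from their correlated properties so that similar nodes end up close together. Each node is then only compared with the next few nodes in that order, and the connection probability decays with the distance inside the window. The sort key, the window size, and the decay p ** distance are illustrative choices, not the parameters of the LDBC generator.

```python
import random

def connect_windowed(nodes, sort_key, window=3, p=0.5, seed=7):
    """Sort nodes by a similarity key, then only consider pairs that are at
    most `window` positions apart. Cost: O(n log n + n * window)."""
    rng = random.Random(seed)
    ordered = sorted(nodes, key=lambda n: sort_key(nodes[n]))
    edges = []
    for i, u in enumerate(ordered):
        for dist in range(1, window + 1):
            j = i + dist
            if j >= len(ordered):
                break
            # Closer neighbours in the sorted order are more likely to connect.
            if rng.random() < p ** dist:
                edges.append((u, ordered[j]))
    return edges

if __name__ == "__main__":
    # Made-up nodes: 100 persons spread over 10 universities and 10 years.
    nodes = {
        i: {"university": "U%d" % (i // 10), "year": 1980 + i % 10}
        for i in range(100)
    }
    key = lambda props: (props["university"], props["year"])
    edges = connect_windowed(nodes, key, window=3)
    print(len(edges), "edges, at most", 3 * len(nodes), "pairs examined")
```

Because the data is scanned in sorted order, the access pattern is sequential rather than random, and the number of examined pairs grows linearly with the number of nodes.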
Workloads by system
System                        Interactive  Business Intelligence  Graph Analytics
Graph databases               Yes          Yes                    Maybe
Graph programming frameworks  -            Yes                    Yes
RDF databases                 Yes          Yes                    -
Relational databases          Yes          Yes                    Maybe (*)
NoSQL Key-value               Maybe        Maybe                  -
NoSQL MapReduce               -            Maybe                  Yes
(*) by keeping state in temporary tables, and using the functional features of PL-SQL
Plans For 2014
• Finishing Interactive workload
– updates (transactional)
– substitution parameters
• New BI and Graph Analytical Workloads
• Data Generator Improvements
– improve dictionaries and distributions for BI
– Scale factors and dataset (SN graph) validation
• Query Drivers
– Parallel update generator
• Auditing Rules for SNB
Pointers
• Code & Queries: github.com/ldbc
– ldbc_socialnet_bm
• ldbc_socialnet_dbgen
• ldbc_socialnet_qgen
• Wiki: ldbc.eu:8090/display/TUC
– Background & Discussions + Detailed report:
ldbc.eu:8090/download/attachments/4325436/LDBC_SNB_Report_Nov2013.pdf
• LDBC Technical User Community (TUC) meeting:
– Thursday April 3, CWI Amsterdam (see wiki – next week)