Graph Databases (GDB)
Download
Report
Transcript Graph Databases (GDB)
Graph Databases (GDB)
Adrian Silvescu
Doina Caragea
Anna Atramentov
Problems And Motivations
•
The necessity to represent, store and manipulate
complex data make RDBMS somewhat obsolete
•
[P1] Problem 1: Violations of the 1NF
–
–
–
•
Multi-valued attributes
Complex attributes
Complex combination of the previous two
[P2] Problem 2 : Accommodate Changes
–
–
Appears when acquiring data from autonomous
dynamic sources or Web (eg: genexp & restaurants ).
RDBMS may require schema renormalization
Problems and Motivations (contd)
• [P3] Problem 3: Unified representation for:
–
–
–
–
Data
Knowledge (Schemas are a subset of this)
Queries (More generally: Goals) [results+def]
Models (Concepts are a particular example)
• In order to facilitate the application of
learning and reasoning methods on these
structures
The Big Picture
Discovery & Modelli ng
Information
Exploitation
USER
Query = Goal Specification
Information
AVAILABLE
INFORMA TION
IN THE WORLD
INFORMA TION INTEGRATION
Information
Integration
...
KNOWLEDGE
SOURCES
...
LINK
SOURCES
...
DATA
SOURCES
Existing Approaches
• RDBMS – may need schema renormalisation
• Approaches that try to fix the above mentioned
problems:
– OO Databases [P1], [P2] - graphs [but procedural]
– XML Databases [P1] (somewhat [P3]) – trees
– OORDBMS [P1] – graphs with foreign keys
• Others
– Datalog – More Efficient Crippled Prolog
– Network Models - graphs
– Hierarchical Models – trees
•
Therefore the motivation for Graph Databases
Outline
• Graph Databases
–
–
–
–
–
–
Examples
DDL
DML – Queries
DML – UPDATES
Informal Semantics
DB => GDB
• GDB vs. OO, XML, OR, …
• Conclusions and Further Work
Graph Databases
• We propose a new kind of Database: Graph
Databases (GDB) as a solution to Problems
[P1],[P2] and [P3].
• In order to define the GDB we will specify:
– The Data Definition Language (DDL)
– The Query Language (more generally DML)
– Informal Semantics of the above languages
• We will also show how to convert existing DBs (RDBMS)
into the GDB DDL to facilitate the transition to GDBs
Goals and Design Choice
• Goals
– Declarativity
– Change
• Design Choice : Have unique instance
identifiers vs. having foreign keys
– Close in Spirit to OO
– Will allow us to cope easier with Change
– Declarativity is an issue in OO, but not for GDB as we
will show
Database Representation
• Sailors(sid:integer, sname:char(10), rating: integer, age:real)
• Boats(bid:integer, bname:char(10), color:char(10))
• Reserve(sid:integer, bid:integer, day:date)
Sailors
Reserves
sid
sname
rating
age
22
dustin
7
45.0
31
lubber
8
55.5
58
rusty
10
35.0
Boats
sid
bid
day
22
101
10/10/96
58
103
bid
bname
color
101
Interlake
red
102
Clipper
green
103
Marine
red
11/12/96
sid
22
name
dustin
rating
7
Graph Representation
IOF
Sailor
s
IO
F
TB
ID1
age
45.0
L
IOF
31
name
lubbe
r
rating
8
IOF
IOF
Boat
s
Reserves
IOF
ID2
IOF
age
55.5
ID8
IOF
ID6
ID5
ID4
:
sid
:
:
ID3
ID7
:
…
Foreign Keys
ID1
sid
22
name
dustin
rating
7
age
45.0
Sailor
sid
22
day
10/10/96
ID4
bid
101
Boat
bid
ID6
bname
color
101
Interlake
red
Data Representation in the GDB DDL
Name1
Val
1
Name2
ID
Val
2
……
Name
N
Val
N
• ID:(Name1=Val1,…,NameN=ValN)
Examples:
ID1:(sid=22, name=“Dustin”, rating=7, age=45.0)
ID4:(sailor=ID1, day=“10/10/96”, boat=ID6)
ID6:(bid=101, bname=“Interlake”, color=“red”)
Defining New Concepts in GDB DDL– Grandson
_ID1
GrSon
_ID2
:-
Person
Person
Person
IOF
IOF
IOF
_ID1
Son
_ID1:(GrSon=_ID2) :_ID1:(IOF=“Person”,Son=_ID3),
_ID3:(IOF=“Person”, Son=_ID2),
_ID2:(IOF=“Person”).
_ID3
Son
GrSon
_ID2
[DML-QL] Writing simple queries:
•The names of all sailors who have reserved a red boat
_X
Sailors
_X
Boats
Name
IOF
Name
IOF
_ID
Boat
_ID1
_ID
:-
Color
Red
_ID:(Name = _X) :_ID:(IOF = Sailors, Boat = _ID1, Name = _X),
_ID1:(IOF = Boats, Color = Red).
Informal Semantics
• Three kinds of Definitions
– Facts:
• G1. [Extensional definition]
– Definitions:
• G1 :- G2. [Intensional definition]
• G1 :- PROC f(x1,…,xn). [Procedural definition]
• Queries = Graphs to be Matched = QG
– The same as a definition: Query :- QG.
Informal Semantics - Picture
Query
Query match
Facts
Extended Graph
RDBMS => GDB
• Sailors(sid:integer, sname:char(10), rating: integer, age:real)
• Boats(bid:integer, bname:char(10), color:char(10))
• Reserve(sid:integer, bid:integer, day:date)
Sailors
Reserves
sid
sname
rating
age
22
dustin
7
45.0
31
lubber
8
55.5
58
rusty
10
35.0
Boats
sid
bid
day
22
101
10/10/96
58
103
bid
bname
color
101
Interlake
red
102
Clipper
green
103
Marine
red
11/12/96
DML - Updates
• Inline Query
– _X : [ _ID: (IOF = Sailor, sname = lubber, rating = _X)]
• Updates: MODIFY (QryGraph, UpdList)
– Add to all GrandSons the money of the Grandparent as a
potential inheritance.
– MODIFY ( _ID : (GrSon = _ID1),
(=> NEWID:(IOF = POT_INHER, BENFICIARY = _ID1,
AMOUNT = _AMNT: [ _ID:(Money = _AMNT )])
))
– .
Change
Book
IOF
Title
Databases
Author
Ramakrishan
_ID
Author
Gehrke
Change
Gene_ex
IOF
_ID
value
0.7
IOF
x /
Exp
IOF
_ID1
_ID2
value
value
x1
x2
:
IOF
Aggregate
Operation
_IDn
value
xN
High Order Queries
Find all the fields from Tables that contain the name John.
_ID:(Name=_X) :_ID1:(IOF=Tables),
_ID2:(IOF=_ID1, _X=“John”).
GDB vs. OO, XML, OR, CG, …
• GDB are close in spirit to OO but not the same
(GDB : no encapsulation + more IDs).
• Close To Datalog but with IDs(links) vs foreign
keys
• The same for ORDBMs and somewhat XML
• Close to Conceptual Graphs But CG do not have
IDs
• We can also use foreign keys:
_ID:[
_ID(IOF = Sailors, sname = lubber)].
OO vs. GDB
OO
GDB
ID1
Class: Person
age: 42
name: john
car
Person
IOF
ID1
ID2
Class: Car
color: red
age
42
Car
name
John
IOF
car
ID2
color
red
Conclusions and Further work
• Advantages & Disadvantages of GDB
• Conclusions
– Proposed a new database model DBG that copes with
problems existing in previous approaches
– Showed how to link existing DB with GDB
– Showed advantages of GDB over existing database
approaches
• Further Work
– Use GDB for learning