Queries and Data Models for Prediction and Measurement In Remos
LDAP Query Access: Challenges and Opportunities
Beth Plale, Georgia Tech, with Peter Dinda, Northwestern
Part of GIS Task Force on Relational Data Models
Goals of Talk
Pose problem: the query interface could be the limiting factor in directory server usability and performance.
Pose possible solutions:
• Extensions to LDAP query language
• SQL query processing front-end
• Adopt relational model as information service data model
Stimulate discussion with questions
Talk topics
Yes:
• Data models (e.g., hierarchical, relational, object-oriented)
• Query languages
No:
• Schemas
• Communication protocols
• Interchange formats
• Message-passing layers
• Event-based services
Establishing a Common Terminology
• LDAP: protocol or data model?
• Difference between schemas and data models
• Difference between hierarchical, relational, and object-oriented data models
LDAP: Protocol or Directory?
• LDAP v2: “provide access to X.500 directory” (RFC 1777), i.e., LDAP is a gateway to an X.500 directory.
  LDAP client --(TCP/IP)--> LDAP server --(OSI)--> X.500 server --> directory
• LDAP v3: “provide access to directories supporting X.500 model” (RFC 2251), i.e., the LDAP server can implement the directory itself.
  LDAP client --(TCP/IP)--> LDAP server --> directory
Schema versus data model
Data model: describes entities, structure, and relationships (e.g., relations, tuples, attributes, domains).
Schema: description of the structure of data in a particular database (e.g., creates the tables, defines the attributes, and specifies domains for a given application).
Hierarchical, relational, or object-oriented data model?
• Hierarchical: tree structure; child has only one parent; partitions easily; tree often directly reflected in physical storage. Query language low-level and procedural.
• Relational: set of tables; query language (SQL) efficient, well-founded, and declarative. Doesn’t handle complex data types well; flat organization not always natural.
• Object-oriented: enhanced conceptualization; handles complex data types; SQL-like interface; query language inefficient; no standard exists; no formal model.
• Object-relational: adopted OO features into relational.
[Figure labels: alias, foreign key, compositional hierarchy]
Problem
The existing LDAP query access interface is inadequate for the typical types of queries posed by users of a grid information service.
Example Queries
• “Where can I find the load measurement stream for host ‘kanga’?”
  source:tcp:kanga:5000, source:udp:239.99.99.99:5000
• “Need 1 to 4 machines, all same OS and arch, with combined memory of 1 GB”
  (mojave), (sahara), ((poconos,pyramid,foo), (manch1,2,3,4), etc)
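The second example query is essentially a small combinatorial search over the host inventory. The following is a minimal sketch of that reading, not something taken from the talk: the host records, the memory sizes (in MB), and the function name `candidate_groups` are all hypothetical, and "combined memory of 1 GB" is assumed to mean a sum of at least 1024 MB.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical inventory: (name, os, arch, memory in MB). Values are illustrative only.
HOSTS = [
    ("mojave", "LINUX", "I386", 1024),
    ("sahara", "LINUX", "I386", 2048),
    ("poconos", "DUX", "ALPHA", 256),
    ("pyramid", "DUX", "ALPHA", 256),
    ("foo", "DUX", "ALPHA", 512),
]

def candidate_groups(hosts, min_total_mb=1024, max_size=4):
    """Return groups of 1..max_size hosts that share OS and arch and whose
    memory sums to at least min_total_mb (assumed meaning of '1 GB')."""
    by_platform = defaultdict(list)
    for name, os_name, arch, mem in hosts:
        by_platform[(os_name, arch)].append((name, mem))
    groups = []
    for members in by_platform.values():
        for size in range(1, max_size + 1):
            for combo in combinations(members, size):
                if sum(mem for _, mem in combo) >= min_total_mb:
                    groups.append(tuple(name for name, _ in combo))
    return groups

groups = candidate_groups(HOSTS)
```

Neither a plain LDAP filter nor a single flat SELECT expresses this directly; the grouping and the sum over a variable number of hosts are what make the query hard.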
Relational Database Schema (normalized)
hosts: IP, name
hostdata: IP, numproc, mhz, arch, os, osv, mem, vmem, dasd, loc, user, note, UR
modules: MID, mt, dsid, IP, note
moduleexecs: mt, arch, os, minosv, ver, name, note
endpoints: MID, EPID
endpointdata: EPID, IP, protocol, port, datatype
datasources: dsid, dst, UR
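To make the schema concrete, here is a minimal sketch that declares a subset of these tables and columns in an in-memory SQLite database. The column names follow the slide; the SQL types, the primary keys, and the foreign-key references are assumptions added for illustration and are not stated in the talk.

```python
import sqlite3

# Column names follow the slide; SQL types and key constraints are assumptions.
DDL = """
CREATE TABLE hosts (ip TEXT PRIMARY KEY, name TEXT);
CREATE TABLE hostdata (
    ip TEXT REFERENCES hosts(ip),
    numproc INTEGER, mhz REAL, arch TEXT, os TEXT, osv TEXT,
    mem REAL, vmem REAL, dasd REAL, loc TEXT, user TEXT, note TEXT
);
CREATE TABLE modules (
    mid INTEGER PRIMARY KEY, mt TEXT, dsid INTEGER,
    ip TEXT REFERENCES hosts(ip), note TEXT
);
CREATE TABLE endpoints (mid INTEGER REFERENCES modules(mid), epid INTEGER);
CREATE TABLE endpointdata (
    epid INTEGER PRIMARY KEY, ip TEXT, protocol TEXT, port INTEGER, datatype TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that were created, sorted by name.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The shared IP, MID, and EPID columns play the role that aliases play in the hierarchical layout: relationships are expressed by matching key values rather than by position in a tree.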
Hierarchical Schema
[Figure: directory tree rooted at ou=grid1. Node labels: host class, moduleexecs, hostdata, modules, endpoints, datasources, endpointdata; link label: alias]
Relational Query 2: I need 2 machines having total memory between 512 and 1024 bytes
SELECT h1.name, hd1.arch, hd1.os, h2.name, hd2.arch, hd2.os,
       hd1.mem + hd2.mem as TotalMem
FROM hosts as h1, hostdata as hd1, hosts as h2, hostdata as hd2
WHERE
  h1.ip = hd1.ip and h2.ip = hd2.ip and
  h1.ip != h2.ip and
  hd1.mem + hd2.mem > 512 and hd1.mem + hd2.mem < 1024
+-----------+-------+-------+-----------+-------+-------+----------+
| name      | arch  | os    | name      | arch  | os    | TotalMem |
+-----------+-------+-------+-----------+-------+-------+----------+
| poconos.  | ALPHA | DUX   | innuendo. | I386  | LINUX |   640.00 |
| poconos.  | ALPHA | DUX   | pyramid.  | ALPHA | DUX   |   640.00 |
| innuendo. | I386  | LINUX | poconos.  | ALPHA | DUX   |   640.00 |
| pyramid.  | ALPHA | DUX   | poconos.  | ALPHA | DUX   |   640.00 |
| poconos.  | ALPHA | DUX   | firenze.  | I386  | LINUX |   640.00 |
+-----------+-------+-------+-----------+-------+-------+----------+
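The self-join in this query can be tried out on a toy database. The sketch below is an illustration under stated assumptions: the IP addresses and memory sizes are made up, only the columns the query needs are created, and the table aliases are spelled `h1`/`h2` throughout. It also shows why the slide's result lists each pair twice: `!=` keeps both orderings, whereas `<` on the key keeps one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (ip TEXT PRIMARY KEY, name TEXT);
CREATE TABLE hostdata (ip TEXT, arch TEXT, os TEXT, mem REAL);
""")

# Hypothetical hosts; IPs and memory sizes are invented for the example.
rows = [
    ("10.0.0.1", "poconos", "ALPHA", "DUX", 256.0),
    ("10.0.0.2", "pyramid", "ALPHA", "DUX", 384.0),
    ("10.0.0.3", "innuendo", "I386", "LINUX", 128.0),
    ("10.0.0.4", "firenze", "I386", "LINUX", 2048.0),
]
for ip, name, arch, os_name, mem in rows:
    conn.execute("INSERT INTO hosts VALUES (?, ?)", (ip, name))
    conn.execute("INSERT INTO hostdata VALUES (?, ?, ?, ?)", (ip, arch, os_name, mem))

QUERY = """
SELECT h1.name, h2.name, hd1.mem + hd2.mem AS TotalMem
FROM hosts AS h1, hostdata AS hd1, hosts AS h2, hostdata AS hd2
WHERE h1.ip = hd1.ip AND h2.ip = hd2.ip
  AND h1.ip {op} h2.ip
  AND hd1.mem + hd2.mem > 512 AND hd1.mem + hd2.mem < 1024
ORDER BY h1.name, h2.name
"""

# '!=' returns every pair in both orders; '<' returns each pair once.
ordered_pairs = conn.execute(QUERY.format(op="!=")).fetchall()
unique_pairs = conn.execute(QUERY.format(op="<")).fetchall()
```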
Hierarchical Version
• Lacking aliasing to dynamically define logical relationships.
• Lacking aggregate operator to perform functions over data before it is returned.
• Low-level results processing.

#define SEARCHBASE "ad=Grid1"            /* base */
LDAP *ld;
LDAPMessage *res;
/* return attributes */
char *attrs[] = { "hostdata.name", "hostdata.arch", "hostdata.os",
                  "hostdata.mem", NULL };

int main(void) {
    ldap_search_s(ld, SEARCHBASE,
                  LDAP_SCOPE_SUBTREE,    /* scope */
                  "hostdata.name = *",   /* search filter */
                  attrs,
                  0, &res);
    …
    /* results processed using ldap_first_entry(), ldap_next_entry(),
       ldap_first_attribute(), etc. */
}

+-----------+-------+-------+--------+
| name      | arch  | os    | Memory |
+-----------+-------+-------+--------+
| poconos.  | ALPHA | DUX   |    256 |
| innuendo. | I386  | LINUX |   2048 |
| pyramid.  | ALPHA | DUX   |    256 |
| firenze.  | ALPHA | DUX   |    512 |
+-----------+-------+-------+--------+
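Because the search returns only the raw per-host entries, the pairing and the memory sum have to be computed by the client. A minimal sketch of that client-side step follows, using the four entries from the result table above; the dictionary layout and the function name `pairs_in_range` are assumptions made for illustration.

```python
from itertools import combinations

# Entries as returned by the search (values taken from the result table).
entries = [
    {"name": "poconos", "arch": "ALPHA", "os": "DUX", "mem": 256},
    {"name": "innuendo", "arch": "I386", "os": "LINUX", "mem": 2048},
    {"name": "pyramid", "arch": "ALPHA", "os": "DUX", "mem": 256},
    {"name": "firenze", "arch": "ALPHA", "os": "DUX", "mem": 512},
]

def pairs_in_range(entries, low, high):
    """Pair entries on the client and keep pairs whose memory sum lies
    strictly between low and high."""
    result = []
    for a, b in combinations(entries, 2):
        total = a["mem"] + b["mem"]
        if low < total < high:
            result.append((a["name"], b["name"], total))
    return result

pairs = pairs_in_range(entries, 512, 1024)
```

This is the work that the relational version pushes into the WHERE clause: every entry crosses the network first, and the filtering happens afterwards.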
LDAP query access limitations
[Figure: directory tree rooted at dc=att, dc=com. Node labels: dc=research, dc=products, dc=services, objectClass=orgUnit, surName=jagadish (twice)]
A. Use of different base entries
(- (dc=att, dc=com ? Sub ? surName=jagadish)
   (dc=research, dc=att, dc=com ? Sub ? surName=jagadish))
Query: “Locate directory entries whose surname is Jagadish in AT&T except those in research.”
B. Selecting parents and children
(c (dc=att, dc=com ? Sub ? objectClass=orgUnit)
   (dc=att, dc=com ? Sub ? surName=jagadish))
Query returns each entry that satisfies objectClass=orgUnit and has at least one child entry that satisfies surName=jagadish.
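Since standard LDAP search cannot express either query in one request, a client would issue separate subtree searches and combine the results itself. The sketch below imitates that over a toy in-memory directory; the entries (including the `cn=` names), the `search` helper, and the `parent` helper are hypothetical and stand in for real LDAP calls.

```python
# Toy directory: distinguished name -> attributes. All entries are hypothetical.
DIRECTORY = {
    "dc=att,dc=com": {},
    "dc=research,dc=att,dc=com": {"objectClass": "orgUnit"},
    "dc=products,dc=att,dc=com": {"objectClass": "orgUnit"},
    "dc=services,dc=att,dc=com": {"objectClass": "orgUnit"},
    "cn=a,dc=research,dc=att,dc=com": {"surName": "jagadish"},
    "cn=b,dc=services,dc=att,dc=com": {"surName": "jagadish"},
}

def search(base, attr, value):
    """Subtree search: DNs at or below base whose attribute equals value."""
    return {dn for dn, attrs in DIRECTORY.items()
            if (dn == base or dn.endswith("," + base)) and attrs.get(attr) == value}

def parent(dn):
    """Return the DN of the parent entry, or None for a single-component DN."""
    return dn.split(",", 1)[1] if "," in dn else None

# Query A: two searches with different bases, combined by set difference.
outside_research = (search("dc=att,dc=com", "surName", "jagadish")
                    - search("dc=research,dc=att,dc=com", "surName", "jagadish"))

# Query B: orgUnit entries that have at least one matching child.
org_units = search("dc=att,dc=com", "objectClass", "orgUnit")
matching_parents = {parent(dn) for dn in search("dc=att,dc=com", "surName", "jagadish")}
units_with_match = org_units & matching_parents
```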
Relational Version of Query: Where can I find a load measurement stream for host ‘kanga’?
SELECT ed.protocol, h.name, ed.port, m.name
FROM hosts as h, modules as m, endpoints as e, endpointdata as ed
WHERE
  h.name = 'kanga' and
  ed.datatype = LOAD_MEASUREMENT and
  h.IP = m.IP and
  m.MID = e.MID and e.EPID = ed.EPID
Search all endpoints for all running modules on host kanga to find endpoints containing data type LOAD_MEASUREMENT.
Returns -> tcp:kanga:5000:resource_module
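The four-way join can be exercised on a toy database to see how it walks from host to module to endpoint. This is a sketch under assumptions: the rows, the second host `roo`, and the `FLOW_BANDWIDTH` data type are invented, the tables carry only the columns the query touches, and `modules` is given a `name` column because the query selects `m.name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (ip TEXT, name TEXT);
CREATE TABLE modules (mid INTEGER, name TEXT, ip TEXT);
CREATE TABLE endpoints (mid INTEGER, epid INTEGER);
CREATE TABLE endpointdata (epid INTEGER, protocol TEXT, port INTEGER, datatype TEXT);

-- Hypothetical rows for illustration only.
INSERT INTO hosts VALUES ('10.0.0.5', 'kanga'), ('10.0.0.6', 'roo');
INSERT INTO modules VALUES (1, 'resource_module', '10.0.0.5'),
                           (2, 'other_module', '10.0.0.6');
INSERT INTO endpoints VALUES (1, 100), (1, 101), (2, 200);
INSERT INTO endpointdata VALUES
    (100, 'tcp', 5000, 'LOAD_MEASUREMENT'),
    (101, 'tcp', 5001, 'FLOW_BANDWIDTH'),
    (200, 'tcp', 5000, 'LOAD_MEASUREMENT');
""")

# host -> module (via IP) -> endpoint (via MID) -> endpoint data (via EPID)
row = conn.execute("""
    SELECT ed.protocol, h.name, ed.port, m.name
    FROM hosts AS h, modules AS m, endpoints AS e, endpointdata AS ed
    WHERE h.name = ? AND ed.datatype = ?
      AND h.ip = m.ip AND m.mid = e.mid AND e.epid = ed.epid
""", ("kanga", "LOAD_MEASUREMENT")).fetchone()

location = ":".join(str(field) for field in row)
```

The user names only the tables and the matching keys; no knowledge of where the entries sit in a tree is required.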
Hierarchical Version
• Explicit start point in search space: more encompassing queries are obtained by starting higher in the tree, at the expense of costlier queries.
• Explicit path traversal to walk aliases: requires that users know structural detail; difficult to write accurate queries.

#define SEARCHBASE "ad=Grid1"
LDAP *ld;
LDAPMessage *res;
/* return attributes */
char *attrs[] = { "modules.endpoints.endpointdata.protocol",
                  "modules.hostdata.name",
                  "modules.endpoints.endpointdata.port",
                  "modules.name", NULL };

int main(void) {
    ld = ldap_open();
    ldap_simple_bind_s(ld, user, Passwd);
    ldap_search_s(ld, SEARCHBASE,
                  LDAP_SCOPE_SUBTREE,
                  "modules.hostdata.name = \"kanga\" & "
                  "modules.endpoints.endpointdata = LOAD_MEASUREMENT",
                  attrs,
                  0, &res);
    …
}
LDAP query access limitations: summary
LDAP limitation | Impact | Relational data opportunity
No queries selecting parents and children | User generates multiple queries, joins results | Supported implicitly by flat tables
No complex queries using different base addresses | Can’t cross admin domains. User generates multiple queries. | Distributed relational database? Front-end interface?
Need explicit path knowledge to traverse aliases | Low-level for user | Removed by flat tables
No floating point support | | Supported
No aggregate selection | Imposes low-level processing on user | Supported
Solutions
• Query access language extensions: the database community is looking at extensions to the LDAP query language. May be possible to influence or adopt.
• Adopt relational data model: the relational data model enables efficient query access. Expressive language. Prototype exists as part of RPS.
• Embed converter in data stream exported by directory server: dQUOB evaluates SQL-style queries over streaming data; may be part of a solution.
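To give a feel for what an SQL-style query over a data stream means, here is a generic sketch of a lazy select-and-filter over a sequence of events. It is not dQUOB's interface; the function `select_where`, the event fields, and the `FLOW_BANDWIDTH` data type are hypothetical and chosen only for illustration.

```python
def select_where(stream, predicate, fields):
    """Lazily filter an event stream and project the requested fields."""
    for event in stream:
        if predicate(event):
            yield tuple(event[name] for name in fields)

# Hypothetical measurement events; a real stream would arrive over the network.
events = iter([
    {"host": "kanga", "datatype": "LOAD_MEASUREMENT", "value": 0.75},
    {"host": "kanga", "datatype": "FLOW_BANDWIDTH", "value": 12.5},
    {"host": "roo", "datatype": "LOAD_MEASUREMENT", "value": 0.25},
])

# Roughly: SELECT host, value FROM events WHERE datatype = 'LOAD_MEASUREMENT'
loads = list(select_where(events,
                          lambda e: e["datatype"] == "LOAD_MEASUREMENT",
                          ("host", "value")))
```

Each event is examined once as it passes, so the filter runs before the data reaches the consumer rather than after it has been stored and fetched.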
Discussion
• Hierarchical model superior for partitioned data space.
  Queries across partitions likely? If so, LDAP referrals using server chaining or front-end interface.
• What types of queries are likely?
• What’s the metric?
  Minimize number of accesses to server?
  More expressive queries?
  Floating point support