Registry Replication
SF, RB, JW
Registry Replication
• Registry calls are forwarded by a registry service to a single registry
instance (i.e. replica) per VDB.
• If a replica cannot be contacted or a better alternative becomes
available, the registry service will switch (not the
producer/consumer).
• Information in the old replica (whether it’s still working, or fails and
recovers) will go stale, which causes two problems:
– It may send erroneous removeProducer messages to consumers
(because it’s no longer receiving showResourceSignOfLife messages
for the producer)
– It may replicate out-of-date entries to other replicas, reinserting
producer/consumer entries that have actually been deleted elsewhere
• The first can be solved by treating addProducer/removeProducer
messages as notifications to the consumer: the consumer must
decide for itself whether or not to act on them.
• The second can be solved by adding sequence numbers generated
by the producers and consumers to each registry update call.
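The sequence-number idea can be sketched as follows. This is only an illustration under stated assumptions: the names Entry and Replica are hypothetical, and keeping a "deleted" marker for removed entries is an assumption made here so that a stale registration can be recognised; the slides do not say how deletions are recorded.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entry:
    """One registry entry (hypothetical structure, not from the spec)."""
    seq: int                       # sequence number supplied by the producer/consumer
    deleted: bool                  # assumption: deletions are kept as markers
    details: Optional[str] = None  # whatever the registry stores for the resource


class Replica:
    """Toy registry replica that ignores out-of-date updates."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def apply(self, resource_id: str, entry: Entry) -> bool:
        """Apply a direct or replicated update; return True if it was accepted."""
        current = self.entries.get(resource_id)
        if current is not None and entry.seq <= current.seq:
            return False  # stale or duplicate update: keep what we have
        self.entries[resource_id] = entry
        return True

    def is_registered(self, resource_id: str) -> bool:
        entry = self.entries.get(resource_id)
        return entry is not None and not entry.deleted


replica = Replica()
replica.apply("producer-1", Entry(seq=1, deleted=False, details="table 7"))
replica.apply("producer-1", Entry(seq=2, deleted=True))
# A stale replica now replicates the old registration; it is rejected.
stale_accepted = replica.apply("producer-1", Entry(seq=1, deleted=False, details="table 7"))
```

With this rule the deleted producer is not reinserted, because the replicated entry carries a lower sequence number than the deletion.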
JW, SH
Schema Replication
• So table names can be re-used, use table numbers
instead of table names in system calls where mutual
agreement on table definition is required. User only sees
names (except in one or two direct calls to the registry
and schema). Replace SELECT strings with parsed
select statements.
• Numbers assigned using the “majority voting” algorithm
outlined before.
• Producer stores table definitions for each table number it
is publishing (to check inserts) so immune to schema
changes.
• (would like to use registry replication mechanism for
schema too, but still not clear how to handle recovering
schemas and how to ensure deletions are propagated to
all replicas)
SF, AW
External tuple stores
• User will be able to specify a logical name for the tuple
store when they create a producer.
• In secure installations, it will be prefixed with user’s DN,
so only has to be unique to them; in insecure systems
(single namespace?)???
• Stores can only be re-used by the user that created
them, and only if the predicate (and current table
definition) matches; only one producer per store.
• User can query R-GMA for real name of database (up to
sysadmin whether or not they are then granted access)
• Still need some way of cleaning up unwanted stores.
• Not too many new operations please! (JW)
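The naming and re-use rules above could look roughly like this. It is a sketch only: TupleStore, qualified_name and can_reuse are hypothetical names, the "/" separator between DN and logical name is an assumption, and predicates and table definitions are compared as plain strings for simplicity.

```python
from dataclasses import dataclass


@dataclass
class TupleStore:
    """Hypothetical record describing an external tuple store."""
    owner_dn: str          # DN of the user that created the store
    logical_name: str      # name chosen by the user
    predicate: str         # producer predicate the store was created with
    table_definition: str  # table definition at creation time
    in_use: bool = False   # True while a producer is attached


def qualified_name(owner_dn: str, logical_name: str) -> str:
    """Prefix the logical name with the user's DN (separator is an assumption)."""
    return f"{owner_dn}/{logical_name}"


def can_reuse(store: TupleStore, user_dn: str, predicate: str,
              table_definition: str) -> bool:
    """Re-use is allowed only for the creator, with a matching predicate and
    table definition, and only if no other producer is using the store."""
    return (
        store.owner_dn == user_dn
        and store.predicate == predicate
        and store.table_definition == table_definition
        and not store.in_use
    )
```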
JW, SH
Chunking
• Consumer service sends execute() and abort() calls to
producer for all types of query (so these replace
startStreaming() and stopStreaming() too).
• One-time and continuous queries both stream results
back from producer service to consumer service (how it
does this is a design issue – it doesn’t affect the
interface).
• For ODPs I suggest the user code must implement
start(), pop(maxCount) and abort() and the ODP service
will call these to retrieve the tuples in chunks and stream
them to the consumer.
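A minimal sketch of the suggested ODP user-code interface and the service-side loop that drives it. Everything beyond the three method names is an assumption: the names OnDemandSource, ListSource and stream_chunks are hypothetical, and an empty chunk is taken to mean "no more tuples", which the slides do not specify.

```python
from typing import Iterator, List, Protocol, Tuple

Row = Tuple[object, ...]


class OnDemandSource(Protocol):
    """The three operations the user code would implement."""
    def start(self, query: str) -> None: ...
    def pop(self, max_count: int) -> List[Row]: ...
    def abort(self) -> None: ...


class ListSource:
    """Toy user code that serves tuples from an in-memory list."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)
        self._pos = 0
        self.aborted = False

    def start(self, query: str) -> None:
        self._pos = 0  # the query is ignored in this toy example

    def pop(self, max_count: int) -> List[Row]:
        chunk = self._rows[self._pos:self._pos + max_count]
        self._pos += len(chunk)
        return chunk

    def abort(self) -> None:
        self.aborted = True


def stream_chunks(source: OnDemandSource, query: str,
                  max_count: int) -> Iterator[List[Row]]:
    """Service side: pull tuples in chunks and hand each chunk on."""
    source.start(query)
    try:
        while True:
            chunk = source.pop(max_count)
            if not chunk:  # assumption: an empty chunk marks the end
                return
            yield chunk
    except GeneratorExit:
        source.abort()  # the caller stopped early, so tell the user code
        raise
```

Closing the generator before it is exhausted stands in for the consumer service sending abort().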
SH, MC
Security
• See chapter 10 in the current spec.
MC, AW, JW, RB
Mediation 1/2
• Move mediation into a separate module in the Consumer
Service for now (i.e. remove the getPlansForQuery call
from the registry, amend getProducersForQuery to return
whatever registry details the mediator will require, and
change registerContinuousQuery to just register and not
return plans).
• Can we make decisions now on changes to current spec
(ch 9)?
– Changes to definition of producer predicate (see next slide)
– Changes to definition of simple/complex query (just add union?)
– Whether or not Secondary Producers are picked for continuous
queries (no?)
– Any changes to producer-selection rules for query types/query
plans? (no?)
– Any changes to when warnings are applied? (no?)
MC, AW, JW, RB
Mediation 2/2 (producer predicates)
WHERE (col1 op11 value11 AND col2 op21 value21 AND ...) OR
(col1 op12 value12 AND col2 op22 value22 AND ...) OR ...
• 'op' may be any one of =, >, >=, < and <= (as in a simple query);
• OR is allowed provided the query is translated into the above form
(disjunctive normal form)
• String ranges are also allowed (e.g. column > 'A')
• They also have proposals to allow ‘column IN (value1, value2, ... )’,
however they estimate the registry database would be twice as
complex to support this. I don't think it's worth it at this stage. In any
case I don't see why it couldn't be translated into a statement
involving ORs which can be stored in the existing proposed
database (MC).
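MC's point that an IN term can be rewritten using ORs can be illustrated with a small sketch. The representation is an assumption made for this example: a conjunction is a list of (column, op, value) tuples and a predicate in disjunctive normal form is a list of conjunctions; expand_in is a hypothetical helper, not part of any R-GMA API.

```python
from itertools import product
from typing import List, Tuple

Term = Tuple[str, str, object]  # (column, op, value)
ALLOWED_OPS = {"=", ">", ">=", "<", "<="}


def expand_in(conjunction: List[Term]) -> List[List[Term]]:
    """Rewrite one AND-group that may contain IN terms as a list of
    AND-groups that use only the simple operators (i.e. an OR of ANDs)."""
    alternatives: List[List[Term]] = []
    for column, op, value in conjunction:
        if op.upper() == "IN":
            # column IN (v1, v2) becomes column = v1 OR column = v2
            alternatives.append([(column, "=", v) for v in value])
        elif op in ALLOWED_OPS:
            alternatives.append([(column, op, value)])
        else:
            raise ValueError(f"operator not allowed in a predicate: {op}")
    # Take one choice from each term to build every resulting AND-group.
    return [list(combo) for combo in product(*alternatives)]


dnf = expand_in([("site", "IN", ["RAL", "CERN"]), ("load", ">", 5)])
# dnf == [[("site", "=", "RAL"), ("load", ">", 5)],
#         [("site", "=", "CERN"), ("load", ">", 5)]]
```

Note that the number of AND-groups is the product of the IN list lengths, so long IN lists would produce many stored disjuncts.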
SH, MC
Run-time monitoring / config.
• Don’t know yet.
SF, AW
Time-stamps
• R-GMA TIMESTAMP type represented as ISO8601-compatible
string: YYYY-MM-DDTHH:mm:ss.sZ where “.s” means 0-9 decimal
places, on INSERT and in a result set.
• Always UTC and no abbreviations allowed.
• Declared with precision e.g. TIMESTAMP(5) in CREATE TABLE and
stored in schema. Default precision is zero.
• Producer database MUST support at least the requested precision
or fail the declareTable.
• The requested precision must be stored in the registry so that
mediation can ensure that a secondary producer does not lose
precision. If a secondary producer’s precision is less than that of one
of the primary producers from which it would otherwise consume, it will
just ignore this primary producer. (sounds like bunk to me - can’t the
Secondary Producer get the precision from the schema and fail
the declareTable just like a primary producer? JW)
• People should not request more precision than they need.
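A sketch of producing the string form described above for a given declared precision. The function name format_timestamp is hypothetical, and two choices here are assumptions: excess digits are truncated rather than rounded, and naive datetimes are rejected. Python datetimes only carry microseconds, so precisions 7 to 9 are zero-padded.

```python
from datetime import datetime, timezone


def format_timestamp(ts: datetime, precision: int = 0) -> str:
    """Render ts as YYYY-MM-DDTHH:mm:ss[.s...]Z in UTC with `precision`
    decimal places (0-9)."""
    if not 0 <= precision <= 9:
        raise ValueError("precision must be between 0 and 9")
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    base = utc.strftime("%Y-%m-%dT%H:%M:%S")
    if precision == 0:
        return base + "Z"
    # Six microsecond digits padded to nine, then cut to the requested length.
    fraction = f"{utc.microsecond:06d}000"[:precision]
    return f"{base}.{fraction}Z"


example = format_timestamp(
    datetime(2005, 3, 1, 12, 30, 15, 123456, tzinfo=timezone.utc), precision=5
)
# example == "2005-03-01T12:30:15.12345Z"
```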
SF, AW
Data integrity 1/2
• Q: I think R-GMA must make some statement about preservation of
data values through R-GMA for each data type, e.g. string/integer
values are guaranteed not to change, timestamps/real/floats might
degrade by up to ... (or "best endeavours").
• A: best endeavours (hmmm JW)
• Q: But... do we remove quotes from strings (I don't think we quote
strings in the XML ResultSet) and do we resolve embedded quotes
that have been doubled up?
• A: we should do them correctly i.e. escape or double quotes as
needed and then remove them again to ensure that they are
transmitted unchanged
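The "double up, then remove again" round trip can be sketched as below. It assumes SQL-style single-quoted literals with embedded quotes doubled; the function names are hypothetical and XML escaping of the result set is not covered.

```python
def quote_sql_string(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def unquote_sql_string(literal: str) -> str:
    """Reverse quote_sql_string: strip the outer quotes and undouble the rest."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted literal")
    return literal[1:-1].replace("''", "'")


original = "O'Brien's node"
literal = quote_sql_string(original)      # 'O''Brien''s node'
restored = unquote_sql_string(literal)    # identical to original
```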
SF, AW
Data integrity 2/2
• Q: How do we represent floating point numbers in XML result sets
(i.e. do we ever use scientific notation etc...)
• A: Use the normal form, e.g. 1.234E-05: the mantissa, then E, an
optional minus sign and up to 3 exponent digits. No spaces inside the number.
• Q: How do we represent values that don't fit into one of our types - I
would suggest use a VARCHAR.
• A: This can only happen for derived types where the database type
has no mapping onto an R-GMA type. I guess the string
representation is the best we can do.
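A sketch of writing and checking floating point values in the form given in the first answer. The names format_real and is_valid_real are hypothetical. Assumptions: the exponent is written with at least two digits (as in the 1.234E-05 example), a positive exponent has no sign, and infinities and NaN are rejected because the slides do not cover them.

```python
import math
import re

# Optional minus, digits, optional fraction, optional E with up to 3 digits.
REAL_RE = re.compile(r"-?\d+(\.\d+)?(E-?\d{1,3})?")


def format_real(value: float, digits: int = 6) -> str:
    """Format value as mantissa + 'E' + optional '-' + exponent digits."""
    if not math.isfinite(value):
        raise ValueError("only finite values can be represented")
    mantissa, exponent = f"{value:.{digits}E}".split("E")
    exp = int(exponent)  # int() accepts the leading '+' or '-'
    sign = "-" if exp < 0 else ""
    return f"{mantissa}E{sign}{abs(exp):02d}"


def is_valid_real(text: str) -> bool:
    """True if text matches the agreed form with no spaces inside."""
    return REAL_RE.fullmatch(text) is not None


small = format_real(1.234e-05, digits=3)
# small == "1.234E-05"
```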
RB, SH
Popping data from Consumer
• Keep hasAborted() as consumer operation.
• Put end-of-result-set flag on result set.
• Deprecate isExecuting() (and count() and
popAll()??)
• To check if result is complete, call hasAborted()
after the pop() loop.
• Another thought (JW) – should we add an API
method to all APIs to make it easy to glue result
sets together?
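The proposed usage pattern (pop until the end-of-result-set flag, then call hasAborted()) might look like this from the user's side. FakeConsumer and ResultSet are stand-ins invented for the sketch, with Python-style method names; collect also shows one possible form of a "glue result sets together" helper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[object, ...]


@dataclass
class ResultSet:
    rows: List[Row]
    end_of_results: bool  # the proposed end-of-result-set flag


class FakeConsumer:
    """Stand-in for a consumer; returns rows in fixed-size chunks."""

    def __init__(self, rows: List[Row], chunk_size: int, aborted: bool = False) -> None:
        self._rows = list(rows)
        self._chunk_size = chunk_size
        self._aborted = aborted

    def pop(self) -> ResultSet:
        chunk = self._rows[:self._chunk_size]
        self._rows = self._rows[self._chunk_size:]
        return ResultSet(chunk, end_of_results=not self._rows)

    def has_aborted(self) -> bool:
        return self._aborted


def collect(consumer: FakeConsumer) -> Tuple[List[Row], bool]:
    """Pop until the flag is set, gluing chunks together; the second
    value is True only if the query did not abort."""
    rows: List[Row] = []
    while True:
        result = consumer.pop()
        rows.extend(result.rows)
        if result.end_of_results:
            break
    return rows, not consumer.has_aborted()
```

Checking has_aborted() only once, after the loop, is what lets isExecuting() be dropped from the pattern.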
JW, AD
Error Numbers
• No progress.
AD, MC
Naming virtual databases
• Don’t know yet…
AW, AD
A couple more questions
• Don’t know yet…