EMBL-EBI
Pragmatics
Goal of pragmatics group
• Expose existing database practice and
discuss it in the context of database
methods.
Four domain overviews
• Engineering – maintaining lists of the
components, in their different versions,
used in electronic engineering products
• Astronomy – data from sky surveys at
different wavelengths and catalogues of
objects
• Environmental simulation – data
generated by simulations of the marine
environment in a river mouth
• Bioinformatics – shared data collections
of biomolecular information.
Domain          Key driver for provenance                          Key issues
Engineering     Ensuring correctness                               Recording “used in” information to allow updates
Astronomy       Evoking scientific trust                           Data transformation from raw data
Simulation      Ensuring comparability of data, liability issues  Tweaking of simulation details creates incompatibilities
Bioinformatics  Evoking trust, data quality, attribution          Assertions based on whole-database comparisons
Motivations for capturing provenance
information
• Quality/Trust
• Attribution
• Priority
• Interpretation
• Interoperability
• Roll-back
• Emphasis differed from discipline
to discipline.
Cost/Benefit/Risk (1)
• Cheap, reproducible data carry little
risk and can be redetermined, so there
is less pressure to record provenance.
• High-aggregation provenance (e.g.,
information derived from comparison
with large, changing databases) is
expensive to record, so suppliers give
up on provenance.
• Expensive data (e.g., sky surveys,
macromolecular structures) create
pressure to record provenance, which
is cheap by comparison with
recollecting the data.
Cost/Benefit/Risk (2)
• Unique event data (e.g., cosmic
events) cannot be verified after the
fact, so provenance is important.
• Life-critical data (e.g., drug
ingredient pedigree, aircraft
component versions): the risk is high,
so we are prepared to tolerate the cost
of capturing the information.
Reducing the cost of provenance
information
Obvious
• Only create provenance
information when it’s essential
or easy?
• Archive rather than serve
provenance
• Automate provenance collection
…and the non-obvious
• Can the task of maintaining the
provenance of a complex and
ever-changing database be turned
into a database engineering problem
rather than a domain problem?
• Could the very technology used to
maintain the database also
maintain the provenance
information? The domain
specialists look to the computer
scientists for help here.
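One way to read the last question: let the database engine record provenance as a side effect of ordinary updates, so that nobody has to capture it by hand. The sketch below is an illustration, not something the group proposed. It assumes SQLite through Python's standard sqlite3 module, and the component/provenance tables, the trigger names and the history() helper are all hypothetical. It echoes the engineering case, where component versions change over time.

```python
import sqlite3

# Hypothetical schema: a "component" table (echoing the engineering example)
# plus a "provenance" table that the database fills in itself via triggers.
SCHEMA = """
CREATE TABLE component (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT NOT NULL
);

CREATE TABLE provenance (
    seq          INTEGER PRIMARY KEY AUTOINCREMENT,
    component_id INTEGER NOT NULL,
    action       TEXT NOT NULL,   -- 'insert' or 'update'
    old_version  TEXT,            -- NULL for inserts
    new_version  TEXT NOT NULL,
    recorded_at  TEXT NOT NULL DEFAULT (datetime('now'))
);

-- Record every new component together with its initial version.
CREATE TRIGGER component_insert AFTER INSERT ON component
BEGIN
    INSERT INTO provenance (component_id, action, old_version, new_version)
    VALUES (NEW.id, 'insert', NULL, NEW.version);
END;

-- Record every change to the version column (other columns are not tracked).
CREATE TRIGGER component_update AFTER UPDATE OF version ON component
BEGIN
    INSERT INTO provenance (component_id, action, old_version, new_version)
    VALUES (NEW.id, 'update', OLD.version, NEW.version);
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema and triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def history(conn: sqlite3.Connection, component_id: int) -> list:
    """Return (action, old_version, new_version) rows in the order recorded."""
    return conn.execute(
        "SELECT action, old_version, new_version FROM provenance "
        "WHERE component_id = ? ORDER BY seq",
        (component_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_db()
    # Ordinary data maintenance; the provenance rows are written by the triggers.
    conn.execute(
        "INSERT INTO component (id, name, version) VALUES (1, 'resistor-R12', 'A')"
    )
    conn.execute("UPDATE component SET version = 'B' WHERE id = 1")
    conn.commit()
    for row in history(conn, 1):
        print(row)
```

Here the application code only inserts and updates components. The version history comes from the same engine that maintains the data. A real system would also need to record who made a change and why, and that is much harder to automate this way.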