Transcript ProvDay1Sum
Deep Thoughts after < 15 minutes
of reflection
Michael Franklin
Jim Frew
The Big Question: Why are we here?
Two Primary Answers:
There is a sea change in the way that science
will be done.
– Processing on a worldwide, ever-increasing
collection of experimental data
– From primary collecting to “data mining”
We need to reason about, assemble, manage,
and avoid re-doing loosely coupled, long-running, worldwide computations.
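One way to read "avoid re-doing" is as a lookup keyed on a derivation: if the same process has already been applied to the same inputs with the same parameters, reuse the recorded result instead of recomputing it. The following is a minimal sketch in Python, assuming that inputs carry stable identifiers and that processes are deterministic. The names `derivation_key` and `run_once` are hypothetical illustrations, not something proposed at the workshop.

```python
import hashlib
import json

# Hypothetical in-memory record of past derivations: key -> result.
_derivations = {}

def derivation_key(process_name, input_ids, params):
    """Build a stable key from the process name, input identifiers, and parameters.

    Assumption: input order matters, so the identifiers are not sorted.
    """
    payload = json.dumps(
        {"process": process_name, "inputs": list(input_ids), "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def run_once(process_name, func, input_ids, params):
    """Run func only if this derivation has not been recorded before.

    Returns (result, reused), where reused is True when a prior result was found.
    """
    key = derivation_key(process_name, input_ids, params)
    if key in _derivations:
        return _derivations[key], True
    result = func(input_ids, params)
    _derivations[key] = result
    return result, False
```

A real system would need the record to be persistent and shared across sites, which is where the identifier and equivalence questions come in.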
Some Major Issues from Day 1
(Re)creation vs. validation/explanation
Data vs. Process
Talmud vs. (Kosher) Sausage Factory
The “S” Word…
The “Bogus”(?) Distinction
Derivation
Provenance
Lineage
Annotation
Pedigree
(Re)creation vs. validation
– Is the DPLAP Executable?
Subject, Object, Verb?
Which should be our focus?
– Edges
– Boxes
– Both of the above
What flows along the edges?
– Data
– Control
– Events
– All of the above
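To make the edges-versus-boxes question concrete, here is a minimal sketch in Python of a workflow graph whose boxes are processing steps and whose edges are labeled by what flows along them (data, control, or events). The class and method names (`WorkflowGraph`, `add_edge`, `upstream`) are hypothetical and chosen only for illustration; the sketch assumes boxes can be named by simple strings.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three kinds of flow listed on the slide.
EDGE_KINDS = frozenset({"data", "control", "event"})

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str

class WorkflowGraph:
    def __init__(self):
        self.boxes = set()
        # Map each box to the edges that arrive at it.
        self._incoming = defaultdict(list)

    def add_edge(self, src, dst, kind):
        """Record that something of the given kind flows from src to dst."""
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self.boxes.update((src, dst))
        self._incoming[dst].append(Edge(src, dst, kind))

    def upstream(self, box, kinds=EDGE_KINDS):
        """Return every box reachable by walking edges of the given kinds backwards."""
        seen = set()
        stack = [box]
        while stack:
            current = stack.pop()
            for edge in self._incoming.get(current, []):
                if edge.kind in kinds and edge.src not in seen:
                    seen.add(edge.src)
                    stack.append(edge.src)
        return seen
```

Restricting `upstream` to `kinds={"data"}` gives a data-only lineage, while the default follows all three kinds, so the choice of focus changes the answer to "where did this come from?".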
Talmud vs. (Kosher) Sausage Factory
We seem to have two very different (perhaps
irreconcilable?) modes of operation.
1. e.g., Bioinformatics Databases – Continual Accretion of
knowledge.
2. e.g., Physics Experiments – Composable
DAGs/Graphs/Pipelines leading towards
“data products”
Differences:
– Scale: Time, Resources, …
– Human involvement vs. automation
How desirable or necessary is each?
The Great Thing About Standards is…
Core + Extensibility Framework:
Keys for Objects (i.e., Identifiers)
Granularity
Equivalence and Similarity
– Are these application/scientist-dependent?
Common Terminology/Data Integration
Versioning and Schema Evolution
Need to Choose:
– Declarative vs. Procedural
– Strongly Typed?
What else?