Transcript ProvDay1Sum

Deep Thoughts after < 15 minutes of reflection
Michael Franklin
Jim Frew
The Big Question: Why are we here?
Two Primary Answers:
• There is a sea change in the way that science will be done.
– Processing of an ever-increasing, world-wide collection of experimental data
– From primary data collection to “data mining”
• We need to reason about, assemble, manage, and avoid re-doing loosely coupled, long-running, world-wide computations.
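The goal of avoiding re-done computations can be illustrated with a toy sketch, assuming that each derived data product is recorded together with the process (and process version) that produced it and the identifiers of its inputs. A repeated request for the same derivation is then answered from the record instead of being recomputed. The names `ProvenanceStore`, `derive`, and `sort_tokens`, and the use of SHA-256 content hashes as identifiers, are hypothetical choices made for illustration; they are not a design proposed in the talk.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical: a "process" takes the contents of its inputs and returns one output.
Process = Callable[[List[bytes]], bytes]


def content_id(data: bytes) -> str:
    """Hypothetical identifier scheme: the SHA-256 hash of an object's bytes."""
    return hashlib.sha256(data).hexdigest()


class ProvenanceStore:
    """Toy store recording which process produced which output from which inputs."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}    # identifier -> content
        self.derivations: Dict[str, str] = {}  # derivation key -> output identifier
        self.runs = 0                          # times a process actually executed

    def put(self, data: bytes) -> str:
        oid = content_id(data)
        self.objects[oid] = data
        return oid

    def derive(self, name: str, version: str, fn: Process, input_ids: List[str]) -> str:
        # A derivation is keyed by process name, process version, and input identifiers.
        key = json.dumps([name, version, input_ids])
        if key in self.derivations:
            return self.derivations[key]       # already computed: reuse, do not re-run
        self.runs += 1
        output = fn([self.objects[i] for i in input_ids])
        oid = self.put(output)
        self.derivations[key] = oid
        return oid

    def provenance(self, oid: str) -> List[Tuple[str, str, List[str]]]:
        """Return every recorded (process, version, inputs) step whose output is oid."""
        steps = []
        for key, out in self.derivations.items():
            if out == oid:
                name, version, inputs = json.loads(key)
                steps.append((name, version, inputs))
        return steps


def sort_tokens(inputs: List[bytes]) -> bytes:
    """Hypothetical example process: sort the whitespace-separated tokens of one input."""
    return b" ".join(sorted(inputs[0].split()))


if __name__ == "__main__":
    store = ProvenanceStore()
    raw = store.put(b"3 1 2")
    first = store.derive("sort", "1.0", sort_tokens, [raw])
    second = store.derive("sort", "1.0", sort_tokens, [raw])  # same derivation, reused
    print(first == second, store.runs)  # True 1
```

Keying on the process version means a new release of a process forces a re-run even when its output turns out identical; a real system would need an explicit policy for when that is desirable.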
Some Major Issues from Day 1
• (Re)creation vs. validation/explanation
• Data vs. Process
• Talmud vs. (Kosher) Sausage Factory
• The “S” Word…
• The “Bogus”(?) Distinction
Derivation
Provenance
Lineage
Annotation
Pedigree
(Re)creation vs. validation
– Is the DPLAP (Derivation, Provenance, Lineage, Annotation, Pedigree) executable?
Subject, Object, Verb?
Which should be our focus?
– Edges
– Boxes
– Both of the above

What flows along the edges?
– Data
– Control
– Events
– All of the above
Talmud vs. (Kosher) Sausage Factory


We seem to have two very different (perhaps irreconcilable?) modes of operation:
1. e.g., Bioinformatics databases – continual accretion of knowledge
2. e.g., Physics experiments – composable DAGs/graphs/pipelines leading toward “data products”
Differences:
– Scale: Time, Resources, …
– Human involvement vs. automation
How desirable or necessary is each?
The Great Thing About Standards is…
Core + Extensibility Framework:
• Keys for Objects (i.e., Identifiers)
• Granularity
• Equivalence and Similarity
– Are these application/scientist-dependent?
• Common Terminology/Data Integration
• Versioning and Schema Evolution
• Need to Choose:
– Declarative vs. Procedural
– Strongly Typed?
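Three of these core issues (identifiers, equivalence vs. similarity, versioning) can be made concrete in a small sketch. Purely as an assumption for illustration, an object's key here is a content hash, so byte-identical objects are equivalent by construction, while similarity is left to a caller-supplied measure and threshold, reflecting the open question of whether it is application- or scientist-dependent. Versions are kept as an append-only list of keys per logical name. All names (`object_key`, `VersionedCatalog`, `similar`, `token_overlap`) are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def object_key(data: bytes) -> str:
    # Hypothetical: identity is defined by content, not by file name or location.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def equivalent(a: bytes, b: bytes) -> bool:
    # Exact equivalence: same key, i.e. byte-identical content.
    return object_key(a) == object_key(b)


class VersionedCatalog:
    """Maps a logical name to an append-only list of content keys (its versions)."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[str]] = {}

    def register(self, name: str, data: bytes) -> int:
        """Append a new version unless it matches the latest one; return the 1-based version number."""
        key = object_key(data)
        history = self.versions.setdefault(name, [])
        if not history or history[-1] != key:
            history.append(key)
        return len(history)

    def latest(self, name: str) -> str:
        return self.versions[name][-1]


def similar(a: bytes, b: bytes, measure: Callable[[bytes, bytes], float], threshold: float) -> bool:
    # Similarity is application-dependent, so measure and threshold come from the caller.
    return measure(a, b) >= threshold


def token_overlap(a: bytes, b: bytes) -> float:
    """Example measure (an assumption): Jaccard overlap of whitespace-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

The split between an exact, content-based notion of identity and a pluggable notion of similarity is one possible "core + extensibility" division; other granularities or typing choices would lead to different designs.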
What else?