
Uncertainty and Quality
UncertML, UncertWeb and GeoViQua
Dan Cornford, Matthew Williams
Computer Science, Aston University, Birmingham, United Kingdom
OGC meeting, Bonn, Mar 2011
The key aspects of data quality
• Think about why we worry about data quality …
• There is no universal agreement on which aspects of data
quality are important, but we might propose:
– accuracy (uncertainty): the value correctly represents the real world
– completeness: the degree of data coverage for a given region and time
– consistency: are the rules to which the data should conform met?
– usability: how easy is it to access and use the data?
– traceability: can one see how the results have arisen?
– utility: what is the user's view of the data's value to their use case?
• Uncertainty is key – need to define the “real world”
Accuracy and uncertainty
• I view reality as a set of continuous space-time fields
of discrete or continuous valued variables
– the variables represent different properties of the system,
e.g. temperature, land cover
– note that there are, of course, other feature-based examples
• A big challenge is that reality varies over almost all
space and time scales:
– we need to be precise about these when defining reality
• Providing uncertainty (accuracy) allows:
– combination of multiple data sources
– propagation through further processing
– decision making in a principled and understood framework
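A minimal worked example of what this enables (ours, not from the slides): suppose two independent sensors measure the same quantity x, returning values y_1 and y_2 with known error variances \sigma_1^2 and \sigma_2^2. The inverse-variance weighted combination is

  \hat{x} = \frac{y_1/\sigma_1^2 + y_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2},
  \qquad
  \mathrm{Var}(\hat{x}) = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1}

and the result propagates through further linear processing z = a\hat{x} + b as \mathrm{Var}(z) = a^2\,\mathrm{Var}(\hat{x}). Neither step is possible if the data carry only a pass/fail quality flag.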
Uncertainty in geospatial data
• We know very little about reality with certainty
• Everything we know is derived from a sensor
– all sensors measure with some uncertainty: the location /
support of the measurement, the value of the measurement
result (electronics), the sensor model (many sources), ...
• The result of a sensor measurement is a value (or a
series of values) – where is the uncertainty?
– the value is known, but is subject to various sources of
error, i.e. it is uncertain w.r.t. reality (resultQuality)
• Other outputs, e.g. from models, might be
intrinsically uncertain (result)
How to represent uncertainty?
• Bayesian probability is the natural framework
– this leads us to think about everything as a random
quantity, which can be described by a probability
distribution – maybe a Dirac delta if you are certain
– sometimes we might not have a complete distribution and
we might get: samples or summary statistics
• Once we have a distribution we can use the tools of
probability theory to do inference etc
– knowing only that data has passed a QC check is less useful –
it carries less information (see the worked example below)
• Probability separates uncertainty from utility
(QA4EO also seeks this split)
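As a worked illustration (ours, not from the slides): if a temperature result is reported as Gaussian with mean 0.6 °C and standard deviation 0.01 °C, probability theory immediately gives, for example, the chance that the true value exceeds 0.62 °C:

  P(x > 0.62) = 1 - \Phi\!\left(\frac{0.62 - 0.6}{0.01}\right) = 1 - \Phi(2) \approx 0.023

A bare QC flag supports no such inference.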
Uncertainty in current systems
• Typically uncertainty about a (sensor) result is encoded in the
resultQuality, which is of type DQ_Element (xml:anyType)
• This is fine, but offers limited interoperability at the
implementation level
Example using SWE Common:

<om:Observation gml:id="I90579489_12412">
  <om:samplingTime>
    <gml:TimeInstant xsi:type="gml:TimeInstantType">
      <gml:timePosition>2009-12-19T13:41:10</gml:timePosition>
    </gml:TimeInstant>
  </om:samplingTime>
  <om:procedure xlink:href="urn:ogc:object:feature:Sensor:WU:I90579489"/>
  <om:resultQuality>
    <QuantityRange definition="urn:ogc:def:property:OGC:tolerance2std">
      <!-- any XML type -->
      <value>-0.02 0.02</value>
    </QuantityRange>
  </om:resultQuality>
  <om:observedProperty xlink:href="urn:ogc:def:phenomenon:OGC:temperature"/>
  <om:featureOfInterest xlink:href="http://www.mydomain.com/foi"/>
  <om:result uom="degC">0.6</om:result>
</om:Observation>
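For contrast, a sketch of the same resultQuality carried as a hard-typed UncertML distribution rather than an arbitrary XML fragment. This is illustrative only: the un namespace and the NormalDistribution / mean / variance element names reflect our reading of the UncertML encoding, and the variance assumes the ±0.02 tolerance above denotes two standard deviations (so std = 0.01, variance = 0.0001).

<om:resultQuality>
  <!-- illustrative UncertML encoding; element names assumed,
       variance = (0.02 / 2)^2 = 0.0001 -->
  <un:NormalDistribution xmlns:un="http://www.uncertml.org/2.0">
    <un:mean>0.6</un:mean>
    <un:variance>0.0001</un:variance>
  </un:NormalDistribution>
</om:resultQuality>

A receiving client can then dispatch on the distribution type rather than inspecting free-form XML.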
Quality and uncertainty?
• ISO19115 deals with metadata and (thus) quality
– provides a framework in DQ_QuantitativeResult
– the standard does not provide a dictionary for what
errorStatistics can go in here – interoperable?
– not clear that ISO19157 will address that
• What to do if it is my result that is uncertain?
– following processing this will almost always be the case
• UncertML is a vocabulary + conceptual model and
encoding for uncertain information
– provides support for distributions, statistics and samples
– aim to cover a very wide range of uses: SWE, SBML,
Semantic Web, the Web
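To make the "distributions, statistics and samples" point concrete, two hedged sketches; the element names are assumed from the www.uncertml.org vocabulary and are not normative:

<!-- a summary statistic: the standard deviation of a measured value -->
<un:StandardDeviation xmlns:un="http://www.uncertml.org/2.0">
  <un:values>0.01</un:values>
</un:StandardDeviation>

<!-- a sample: realisations, e.g. from an ensemble model run -->
<un:RandomSample xmlns:un="http://www.uncertml.org/2.0">
  <un:realisation>
    <un:values>0.58 0.61 0.63 0.59</un:values>
  </un:realisation>
</un:RandomSample>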
UncertML 2.0
• UncertML 2.0 refines UncertML 1.0:
– removes dependencies, making the schemas simpler and more usable
– clearly identifies the conceptual model, the controlled vocabulary and
the encodings (XML, JSON)
– provides an API for using UncertML (under development)
• Main reasons for further development are that uncertainty is
a cross cutting aspect (remove dependencies), and to
increase interoperability using hard typed design
• Within UncertWeb, UncertML is being used in O&M, NetCDF
and ISO19139 DQ_QuantitativeResult elements (presented in
the SWE WG; a sketch follows below)
• Key aim is to provide simple, complete description of
uncertain values
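A sketch of the ISO19139 usage referenced above: a DQ_QuantitativeResult whose gco:Record value carries an UncertML distribution in place of a bare number. The UncertML element names are assumed as before; gmd/gco/xlink namespace declarations are omitted for brevity.

<gmd:result>
  <gmd:DQ_QuantitativeResult>
    <gmd:valueUnit xlink:href="urn:ogc:def:uom:UCUM::Cel"/>
    <gmd:value>
      <gco:Record>
        <!-- illustrative: a full distribution instead of a single statistic -->
        <un:NormalDistribution xmlns:un="http://www.uncertml.org/2.0">
          <un:mean>0.6</un:mean>
          <un:variance>0.0001</un:variance>
        </un:NormalDistribution>
      </gco:Record>
    </gmd:value>
  </gmd:DQ_QuantitativeResult>
</gmd:result>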
Summary
• UncertML 2.0 is designed to be easy to use
– within UncertWeb it is the primary method for
communicating uncertainty between components in
workflows
– within GeoViQua we are proposing UncertML as a means
of improving the treatment of data quality in GEOSS,
alongside QA4EO
• Addressing uncertainty in a principled manner is
critical to rational decision making and optimal use
of available information
Further information
• Further details from:
– UncertML: www.uncertml.org
– UncertWeb: www.uncertweb.org
– GeoViQua: www.geoviqua.org
– Dan Cornford ([email protected])
– Matthew Williams ([email protected])
The research leading to these results has received funding from the European Union Seventh
Framework Programme (FP7/2007-2013) under grant agreements n° 248488 and 265178.