Transcript Document
July, 2001
Outline
Purpose of GeoSEM
Background
Methods
Examples
Project Status
Future Developments
July 2001
Background
• Currently, the EPC is typically estimated using one of two
formulas1:
•
•
Assume normal distribution
sˆ
ˆ
95th UCL t0.05,n
n
Assume log-normal distribution (Land)
H 0.05,n, sˆ
2
95th UCL exp ˆ 0.5sˆ sˆ
n 1
1Details
are provided on pages 138 and 169 in Gilbert 1987, Statistical Methods for Environmental Pollution
Monitoring.
July 2001
Purpose of GeoSEM
Provide a user-friendly tool to incorporate spatial
statistics in risk assessment.
Three contributions to risk assessment:
1.
Consider sample locations in the estimated EPC (reduce
bias)
–
2.
Maximize information from limited sample sizes (increase
precision)
–
3.
Point estimate of the mean; UCL on the mean
When sampling scale > EU scale
Quantify and display variability and uncertainty in the
spatial distribution of risk (specify PDFs, CIs)
–
Point and ‘block’ (i.e., area) estimates
July 2001
Estimating the EPC – 3 methods
Spatial Weighting of Sample Data
Small sample sizes OK
Thiessen polygons
Weights are proportional to area of polygon
Kriging
Kriging over an irregular block (exposure unit)*
Minimum sample size required
Simulation (soon to be added to GeoSEM)*
Less sensitive to assumptions of normality and constant variance
Estimates of uncertainty are conditional to the samples used
* Unique to GeoSEM
July 2001
Spatially weighting data
EU boundaries
S#
S#
perpendicular bisectors
S#
S#
Thiessen
Thiessenpolygons
polygonsassign
is oneaof several
methods
S#
Small polygons are
representative
area to each
sample
available to calculate
spatial
weights
for
sample
S#
Polygons are formed byindicative
connecting
of samples
data. We elected to use Thiessen
polygons
S#
that
are located close
perpendicular
drawn
S#
method in GeoSEM because it is bisectors
a commonly
to other samples (i.e.,
between
each
pair
of
adjacent
used, straightforward
method
for
accounting
for
S#
spatial
clustering),
S#
spatial
weight
forsamples.
sample
i = that spatial
spatial
clustering
of
Note
sampling
locations.
and therefore receive
clustering (area
may arise
from the
use
of
random
as * n
of polygon
/
total
area
of
EU)
S# i
#S smaller weights than
= exposure
unit
well as non-random EU
sampling
methods.
samples which are
n = number of samples
S#
S#
July 2001
S#
spaced further apart.
Kriging (ordinary kriging, ‘OK’)
• Many types of kriging can be performed with GeoSEM.
• At this time, OK is recommended for estimating the EPC. (Research is
planned to explore the accuracy of EPCs estimated with selected
kriging and geostatistical simulation algorithms.)
• On the following slides, OK1 is used to describe the steps involved in
calculating the kriging estimate of a concentration (or any other
attribute) at an unsampled location (i.e., point kriging). The estimation
of the mean concentration within a square area (i.e., block kriging) is
then described. Finally, the algorithm GeoSEM uses2 to extend the
block kriging algorithm to estimate the EPC for irregular areas is
explained.
1other
forms of kriging consist of modifications of the OK algorithm and thus share
many of the same features with OK.
2At this time, GeoSEM calls the gstat freeware executable to perform all geostatistical
calculations.
July 2001
First Step: Estimating the Variogram
• The variogram is used to model
the spatial dependency (i.e.,
autocorrelation) present in the
data. Spatial autocorrelation is
expressed as the tendency for
samples located close together to
have similar concentration levels.
• The x-axis shows the distance
between sample locations
The spherical semivariogram model shown
above has the following parameters:
sill (sample variance) = 1600 units
nugget (variance at 0 distance) = 0
range (distance at which variance = sill) = 1980 units
July 2001
• The y-axis shows the average
variance between samples
located ‘x-units’ apart.
OK – Point Kriging
Kriging Estimate, zˆ :
S2
= weighted average of measured concentrations,
d2
d1
S1
d5
n
d3
0
d4
S3
S4
j 1
2
ok ):
n
ˆ
ˆ
ˆ
wi Ci 0
i 1
2
j
1 , which ensures the point estimate is unbiased,
n
using the following equation:
= point or location where estimate is desired
= sample locations
d1 = distance between sample and point to be estimated
2
ok
The weights are determined by minimizing the error
variance, under the constraint that the weights sum to 1
w
search radius
n = number of samples located within search radius
j 1
n
S5
Kriging variance (ˆ
zˆ w j x j ;
Cˆ i 0 w j Cˆ ij
j 1
Where
= the Lagrangian parameter, which is required to satisfy the
unbiasedness constraint.
The terms Cij and Ci0 represent the covariance between sample
pairs i and j and sample i and the point to be estimated (0),
respectively; the covariances are read from the semivariogram,
based on the distances, d1-d5.
July 2001
OK – Block Kriging (square block)
The equation used to calculate the block
kriging estimate is very similar to the point
kriging equation with one exception: the Ci0
term (see previous slide) is replaced by the
CiA term which represents the average
covariance between sample i and the points
used to discretize the block.
S2
S3
S1
S4
S5
Red lines: sample point to
block covariance
Green lines: within block
covariance
= block discretizing point
= sample locations
n
1 n ˆ
ˆ
ˆ
ˆ
CiA w j Cij , where: CiA Cij
A j | jA
j 1
Block Kriging variance (ˆ Bok ):
2
n
ˆ
ˆ C AA wi Cˆ iA , where:
i 1
1
Cˆ AA
Cˆ
2 ij = average covariance
| A | i |iA j | jA between discretizing points
2
Bok
A = area of block
July 2001
Block Kriging: irregular area
Exposure Unit (EU)
GeoSEM1 is able to calculate a block kriging estimate
for an area (i.e., EU) of any shape or size. The area is
discretized by a grid of points as shown to the left. The
grid spacing is determined by the user. GeoSEM will
return a mean (i.e., EPC) and kriging variance that can
be used to assess the uncertainty in the mean (e.g., to
calculate the 95th UCL on the mean). The equations for
the block kriging mean and variance are the same as
those shown on the previous slide.
The EPC for an irregular shape such as the one shown here could be estimated by computing
block kriging estimates for each of a series of square blocks, as indicated in the figure to the left.
The EPC could then be estimated by averaging the block means for each of the square blocks.
However, averaging the block kriging variances for each of the square blocks to estimate the
kriging variance for the EU is not valid. The ability to compute a kriging variance for an irregular
shaped EU, and therefore to assess uncertainty in the EPC, is unique to GeoSEM. This ability will
soon be extended by offering the user the option to perform geostatistical simulation to asses
uncertainty in the EPC, as well to describe the uncertainty in the spatial distribution of
contaminants across a site or EU.
July 2001
1all
geostatistical analyses are performed by gstat
Kriging estimate of a UCL
• An upper confidence level on a point kriging estimate and block
kriging estimate (EPC) can be estimated using the same equation.
• The equation assumes the distribution for uncertainty can be modeled
by a normal distribution. For example:
95th UCL ˆ t0.05,n ˆ OK
Where,
ˆ = kriged point or block estimate
ˆ OK = kriging variance
July 2001
Simulation (SGS)
• At this time GeoSEM uses the gstat sequential Gaussian simulation
(SGS) algorithm; additional simulation methods will be supported in
the future.
• The SGS algorithm assumes the sample data are derived from a
population that can be modeled by a multivariate normal distribution.
However, preliminary research has indicated the SGS method is robust
to departures from normality.
• Similar to kriging, GeoSEM can perform point or block simulation.
The block simulation method is used for estimating EPCs; point
simulation is used to describe the spatial distribution of a pollutant
across the EU or site. Both types of simulation are described on the
following slides.
July 2001
Point Simulation – spatial distributions
The SGS algorithm is an extension of the kriging algorithm. The
simulation algorithm follows a random path through the grid,
eventually producing an estimate at each grid point. In the first step
of the estimation procedure, the kriging estimate for a given point is
calculated according to the method described on the previous slides,
with one important distinction: the kriging estimate is a weighted
average of all sample locations and previously simulated estimates
that are located within the search radius chosen by the user. In the
second step, the kriging estimate and variance for the grid point are
Point being
used to define a normal probability distribution function (pdf)
search radius
estimated
representing uncertainty in the concentration at the grid point. A
= point or location where a concentration has
value is then randomly selected from the pdf and assigned to the grid
been simulated
point. The two-step process is then repeated at the next randomly= point or location where a concentration has
selected grid point, until all points have been estimated. Completion
not yet been simulated
of the simulation process at all grid nodes constitutes one simulation,
= sample locations
or ‘realization’. The number of simulations (‘R’) is determined by
data used (some arrows are not shown)
the user.
Uncertainty in the spatial distribution of a pollutant is modeled from the R simulated concentrations at each of the
grid points. For example, if 100 simulations were performed, a map of the 95 th % from the distribution of uncertainty
could be prepared by ranking the simulated concentrations at each node and selecting the value corresponding to the
95th simulation.
July 2001
Block Simulation – estimating EPC
Exposure Unit (EU)
Block simulation is accomplished using an algorithm that is a
mixture of the block kriging and point simulation algorithms
that were described previously. In the first step of the
simulation, kriging is used to estimate a block kriging mean
and kriging variance for the EU. The block kriging mean and
variance define a normal probability distribution that is used
‘R’ realizations
to represent uncertainty in the kriged mean. In the second
step of the simulation, the estimate of the mean for the EU is
produced by randomly selecting a value from the distribution
of uncertainty. The two-step process is repeated for each EU.
Once an EPC for each EU has been estimated, the simulation
is complete. The entire process is repeated for the desired
= block discretizing point
number of simulations (R), which produces R estimates of the
mean for each EU.
Uncertainty in the EPC is modeled from the R simulated means for each EU. For example, if the user
wished to use the 90th UCL on the mean as the EPC, and assuming 1000 simulations were performed,
the EPC could be estimated by the ranking the simulated means for each EU and selecting the 900 th
largest mean for each EU.
July 2001
Purpose 1: unbiased estimates(?)
• The next slide contains a map of a Superfund site that shows the spatial
distribution of surface soil samples collected from the site. The site
boundary shown was assumed for the purposes of illustration.
• The two slides after the site map contain tables comparing the mean
and 95th UCL on the mean calculated using the current methodology
(i.e., non-spatial methods) and the two of the spatial approaches
currently available in GeoSEM: spatial weighting and OK.
• The tables illustrate the importance of considering the spatial
distribution of samples at the site when estimating the EPC. The tables
do not offer proof that the spatial methods totally remove the bias
present in the non-spatial estimates; further research is required in this
area.
July 2001
Site 1 sample distribution
July 2001
Comparison for Arithmetic Mean
Chem/ Rad Count
Mean
SpMean KrigMean
Arsenic (ppm)
52
20.60
26.55
61.74
Pb-210 (pCi/g)
52
15.19
21.50
51.15
Ra-226 (pCi/g)
52
17.68
25.92
69.99
Ra-228 (pCi/g)
52
1.13
1.31
1.86
Po-210 (pCi/g)
52
18.56
26.79
67.45
Blue = exceeds 1x10-6 cancer risk s Red = exceeds 1x10-5 cancer risk
SpMean = spatially-weighted mean
Exposure scenario: wildlife worker (arsenic RBC from Bunker Hill) Expedited
Screening Level Assessment, URS Greiner/CH2MHill/SRC, 1998)
Comparison for 95th UCL
SpLnChem/ Rad Norm
Norm norm
Sp Lnnorm
Krig
Mean
Arsenic (ppm) 28.1
34.8
28.6
41.8
76.9
Pb-210 (pCi/g) 22.0
29.9
18.9
32.2
65.9
Ra-226 (pCi/g) 26.9
37.3
30.1
64.8
95.3
Ra-228 (pCi/g)
1.3
1.5
1.3
1.5
2.1
Po-210 (pCi/g) 27.6
37.9
24.8
44.8
86.6
Blue = exceeds 1x10-6 cancer risk s Red = exceeds 1x10-5 cancer risk
Exposure scenario: wildlife worker (arsenic from RBC Bunker Hill)
Expedited Screening Level Assessment, URS Greiner/CH2MHill/SRC, 1998)
Purpose 2: maximize information
•
•
•
•
The next four slides address the use of kriging to improve the accuracy of
estimates of the EPC.
There are 236 samples available for the 50-acre site. This would generally be
considered more than an adequate number of data to estimate the EPC for the
site; i.e, if the EU = site.
However, if the current or future use of the site was residential, the EU is
defined as a residential yard. Despite the relatively high number of samples
for the site, we go from a ‘data rich’ (n=236) to a ‘data poor’ situation (0 <= n
<= 8) if an estimate of the EPC for each yard is required.
The next slide describes some of the advantages of kriging and simulation in
this situation. We do not offer any proof at this time that the spatial methods
are more accurate than the non-spatial methods. At the very least however,
kriging and simulation allow one to estimate the concentration for an EU
without samples located within it. This could be very useful for designing
future sampling/remediation efforts, as discussed on the following slides.
July 2001
Max. info from small n - methods
Kriging
Considers the spatial continuity (spatial
autocorrelation) in the data
Samples located outside the EU are considered
Application to any geographic shape (EU or site)
Simulation
Robust to departures from normality and constant
variance
Estimates of uncertainty are conditional to the
samples used
Simulation over any geographic shape (EU or site)
July 2001
Site 2 sample locations
July 2001
Max info from small n
• The following slide focuses on the eastern portion
of the site (which has been rotated 90o clockwise).
• The polygons were arbitrarily drawn and are
intended to represent residential yards (i.e.,
EUs), that range in size from approximately
0.5-2 acres in size.
July 2001
Estimate EPC
spatially-weighted
mean
The spatially-weighted estimate and the
kriging estimate indicate the
Simple
mean
EPC is higher than the simple mean. The samples are shifted to the
eastern portion of the EU, towards the lower contamination levels (as
indicated by the samples in the surrounding EUs). The spatiallyweighted method gives a higher weight to the sample located in the
middle of the EU to account for the sample configuration. Kriging uses
the data in the surrounding EUs to estimate the EPC for this EU.
In this case, kriging is able to produce an
Again, kriging is able to produce an
estimate where the other two methods can
estimate where the other two methods can
to lack
of samples
in this
not,not,
due due
to a lack
of samples
located located
in the
EU.EU.
The The
high estimate
for the EPC
(2165
low estimate
for the
EPC (16.2
ppm)
indicates
additional
samplingsampling
may not may
ppm)
indicates
additional
be necessary to determine if this EU
not be necessary for this EU.
requires remediation. Geostatistics can be
used however to determine which portions
of the EU should be remediated (see the
‘CDFs for Spatial Uncertainty’July
slide).
2001
kriged
mean
Uncertainty - Pr(EPC > RBC) ?
Issue 2: Constant variance assumption:
Simulation
The second issue this slide addresses is the constant Kriging
Issue 1: Maximum
info from small n:of kriging. Use of the kriging
An effect of variance
violatingassumption
the constant variance
calculate
UCLs
on probability
the
mean (or
slide to
shows
3 in
maps
of
the
ofthe
the EPC
Theassumption
EUs shownThis
inillustrated
white
(‘-999’)
the
logisvariance
in the
kriging
map.
The
probability
exceeding
an
RBC)
requires an
exceeding
a of
risk-based
concentration
ofmap,
1100the simulation
normal
indicate
thethe
probability
exceeding
EUsmap
delineated
with
blue
lineof
are
in an
area
In
contrast
to the(RBC)
kriging
assumption
thataddresses
the
variance
is constant
ppm.
slide
two
issues:
maximizing
the of
RBC
not
beThis
estimated
duekriging
to in fact,
the could
site with
low
contamination;
many
map
looks
much 1)
more
consistent with the
across
the
site;
i.e.,
the
data
exhibit
the
same
information
available
from
small
samples
;
2)
theSimulation indicates it
insufficient
sampleare
size,
or in some The
cases,
the concentrations.amount
of the samples
non-detects.
kriging
measured
ofinvariability
inaall
ofof
thekriging.
site. This
constant
variance
assumption
lackestimate
of variation
theindicates
data
(i.e.,
allregions
the
samples
however
high
isprobability
very
unlikely
that the EPC exceeds the RBC of
assumption
is
often
a
poor
one
for
contaminated
are that
non-detects).
the EPC > RBC for all of these
EUs.
1,100
ppm for the EUs delineated Log-normal
by the blue
sites;
data
from
areas
of
high
contamination
will
th
Another
way to
ofestimate
lookingthe
at probability
this: ifline.
the of
95
UCLthe low number of samples (in
Despite
Note:
the inability
exhibit
high
variability
while
data from areas of low
was used
as the
for these
thecases
EPC
exceeding
the RBC
is EPC
equivalent
to anEUs,
inability
to
some
0 samples) located within these EUs,
contamination
will
tend
to
be
less
variable. In fact,
th UCL (or
would
than
RBC,
despite
estimate
thebe
95greater
anythe
other
UCL)
on thethe
mean. simulation, by considering the
geostatistical
variancemeasured
is determined
the geometrical
consistently the
lowkriging
concentrations
in thisbyof
spatial pattern
contamination, indicates it is
arrangement
of
the
data
(i.e.,
sample
locations)
that is
area of the site.
not worthwhile to collect additional data in this
used to estimate the mean; the actual concentrations
area.
are not considered directly by the kriging variance.
July 2001
Purpose 3: spatial distributions
Kriging
Spatial Variability: MVUE of concentrations (regardless of
underlying distribution); maps are “smoothed”
Spatial Uncertainty: assumes normality and constant
variance
Simulation
Spatial Variability: approximately reproduces the spatial
structure (autocorrelation) and variability of the sample data
Spatial Uncertainty:
Robust to departures from normality and constant variance
Estimates of uncertainty are conditional to the data
July 2001
Spatial Distributions
• The following slides show some examples
of the use of geostatistics to produce maps
and probability distributions that describe
the spatial variability and uncertainty in
surface soil
• This information can be used to prepare
future sampling plans and in remediation
planning and design.
July 2001
Spatial Variability
Ordinary Kriging
• accurate (MVUE)
• smoothing
Sequential Gaussian
Simulation
• less smoothing:
better reproduction
of spatial structure
July 2001
CDF for Spatial Variability
F(x)
Ordinary Kriging
1.00
0.75
0.50
0.25
0.00
0
1000
2000
3000
4000
Pb (ppm)
July 2001
5000
Quantile
Estimate
5
12
10
24
25
102
50
475
75
1638
95
4831
99
18875
Spatial Uncertainty – 95th UCLs
Ordinary Kriging
• constant variance
assumption
• variance =
f(sample geometry)
Sequential Gaussian
Simulation
• variance is conditional
to neighborhood
July 2001
CDFs for Spatial Uncertainty
CDFs for 2 remedial
units located within
EU 88
EU 88
F(x)
1.00
This information can be
used in remedial action
planning and design: which
remedial units should be
cleaned up to achieve the
remedial action objectives?
- the RUs with the greatest
probability of exceeding the
PRG.
0.75
0.50
0.25
0.00
0
1000
2000
3000
mean Pb (ppm)
July 2001
4000
5000
Site 1 sample locations
July 2001
Ra-226 (pCi/g) Spatial Variability
July 2001
Pb-210 (pCi/g) Spatial Variability
July 2001
Another way to view uncertainty
• Convert uncertainty in media concentration to
uncertainty in risk
Pr(estimated conc > PRG)
Pr(RME risk > risk level of concern)
• Point estimate (95th UCL) or probabilistic (PDFu)
results
• Map with color shading to highlight areas of
concern (concentrations or risks)
July 2001
Pb-210 Cancer Risk
July 2001
Ra-226 Cancer Risk
July 2001
SGS – Constant Variance
untransformed
data
normal scoretransformed data
July 2001
Uncertainty in Risk - As
OK
SGS
July 2001
Uncertainty in Risk – Ra-226
July 2001
Project Status - Software capabilities
Estimate the 95% UCL or P(mean > PRG)
Normal PDFv (t-statistic), Lognormal PDFv (h-statistic)
Spatial methods: spatially-weighted data and kriging
Exposure Unit
Define the size and shape with the mouse
Legend Tool
Display results by percentiles or assigned intervals
Add labels, graphics and text to maps
July 2001
Project Status - Software capabilities
Import Data
Access, Excel, DBF, Arcview, ArcInfo, ArcSDE, DOD
Imports Maps and Images
AutoCAD, Geo-TIFF (e.g., USGS/DOT quads), GIF,
JPEG
Export Results
DBF, Arcview, other ESRI copy/paste maps and statistics
to clipboard
July 2001
Future Enhancements
Spatial Statistics
Geostatistical simulation module is being tested; soon to be
completed
Thiessen polygons
Data Analysis Features
PDFu and CDFu
Identify potential outliers (‘masking’)
Probability Plots
Output Features
Improve legend tools for labeling and rendering
Create contour maps for sample data, and kriged & simulated data
Create interface with ISE model, Variowin freeware
July 2001