Modeling compositional data
Some collaborators
Deformations: Paul Sampson, Wendy Meiring, Doris Damian
Space-time: Tilmann Gneiting, Francesca Bruno
Deterministic models: Montserrat Fuentes, Peter Challenor
Markov random fields: Finn Lindström
Wavelets: Don Percival, Brandon Whitcher, Peter Craigmile, Debashis Mondal
Background
NAPAP, 1980s
Workshop on biological monitoring, 1986
Dirichlet process: Gary Grunwald, 1987
Current framework: Dean Billheimer, 1995
Other co-workers: Adrian Raftery, Mariabeth Silkey, Eun-Sug Park
Compositional data
Vector of proportions
$z = (z_1, \ldots, z_k)^T$, with $z_i > 0$ and $\sum_{i=1}^{k} z_i = 1$,
so that $z \in \nabla^{k-1}$ (the simplex).
Proportion of taxes in different
categories
Composition of rock samples
Composition of biological populations
Composition of air pollution
The triangle plot
[Figure: triangle plot with axes Proportion 1, Proportion 2, and Proportion 3, each running from 0 to 1; the plotted point is (0.55, 0.15, 0.30).]
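A minimal sketch of how a three-part composition could be placed in a triangle plot. It assumes an equilateral triangle with vertices at (0, 0), (1, 0), and (1/2, √3/2); the assignment of components to vertices and the name `ternary_xy` are illustrative choices, not taken from the slides.

```python
import math

def ternary_xy(z):
    """Map a 3-part composition to Cartesian coordinates in an equilateral triangle.

    Assumed layout (illustrative): component 1 sits at vertex (0, 0),
    component 2 at (1, 0), and component 3 at (0.5, sqrt(3)/2).
    """
    z1, z2, z3 = z
    total = z1 + z2 + z3            # re-close in case of rounding
    z2, z3 = z2 / total, z3 / total
    x = z2 + 0.5 * z3               # barycentric combination of vertex x-coordinates
    y = (math.sqrt(3) / 2) * z3     # only vertex 3 has a nonzero y-coordinate
    return x, y

# The example point from the triangle plot slide.
x, y = ternary_xy((0.55, 0.15, 0.30))
```

A composition with all its mass in one component lands on the corresponding vertex, and compositions on an edge have a zero proportion for the opposite component.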
The spider plot
[Figure: spider plot with radial axis ticks at 0.2, 0.4, 0.6, 0.8, and 1.0; the plotted composition is (0.40, 0.20, 0.10, 0.05, 0.25).]
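An algebra for compositions
Perturbation: For compositions $\xi, \eta \in \nabla^{k-1}$, define
$\xi \oplus \eta = \left( \frac{\xi_1 \eta_1}{\sum_{i=1}^{k} \xi_i \eta_i}, \ldots, \frac{\xi_k \eta_k}{\sum_{i=1}^{k} \xi_i \eta_i} \right)^T$
The composition $(1/k, \ldots, 1/k)^T$ acts as a zero, so $\xi \oplus (1/k, \ldots, 1/k)^T = \xi$.
Set $\xi^{-1} = \left( \frac{1/\xi_1}{\sum_{i=1}^{k} 1/\xi_i}, \ldots, \frac{1/\xi_k}{\sum_{i=1}^{k} 1/\xi_i} \right)^T$,
so $\xi \oplus \xi^{-1} = (1/k, \ldots, 1/k)^T$.
Finally define $\xi \ominus \eta = \xi \oplus \eta^{-1}$.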
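The perturbation algebra on this slide can be sketched in plain Python. The function names (`closure`, `perturb`, `inverse`, `identity`, `difference`) and the two example compositions are illustrative assumptions, not from the slides.

```python
def closure(w):
    """Rescale positive weights so they sum to 1."""
    total = sum(w)
    return [wi / total for wi in w]

def perturb(x, y):
    """Perturbation: componentwise product, re-closed to the simplex."""
    return closure([xi * yi for xi, yi in zip(x, y)])

def inverse(x):
    """Perturbation inverse: componentwise reciprocal, re-closed."""
    return closure([1.0 / xi for xi in x])

def identity(k):
    """The composition (1/k, ..., 1/k), which acts as the zero element."""
    return [1.0 / k] * k

def difference(x, y):
    """Perturbation difference: x perturbed by the inverse of y."""
    return perturb(x, inverse(y))

# Hypothetical example compositions.
x = [0.55, 0.15, 0.30]
y = [0.20, 0.50, 0.30]
back_to_x = perturb(difference(x, y), y)   # undoes the difference
```

Perturbing by the identity leaves a composition unchanged, and perturbing by the inverse returns the identity, which is what makes the simplex a group under this operation.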
The logistic normal
If
$\mathrm{alr}(z) = \left( \log\frac{z_1}{z_k}, \ldots, \log\frac{z_{k-1}}{z_k} \right)^T \sim \mathrm{MVN}(m, S)$
we say that z is logistic normal, in short
z ~ LN(m,S).
Other distributions on the simplex:
Dirichlet — ratios of independent
gammas
“Danish” — ratios of independent
inverse Gaussian
Both have very limited correlation
structure.
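A minimal sketch of the additive log-ratio (alr) transform, its inverse, and a logistic normal draw. For brevity it assumes a diagonal covariance (independent normal coordinates on the alr scale); a general S would need a Cholesky factor. The function names and parameter values are illustrative.

```python
import math
import random

def alr(z):
    """Additive log-ratio transform: log of each part relative to the last part."""
    return [math.log(zi / z[-1]) for zi in z[:-1]]

def alr_inv(v):
    """Inverse alr: exponentiate, append 1 for the reference part, and close."""
    w = [math.exp(vi) for vi in v] + [1.0]
    total = sum(w)
    return [wi / total for wi in w]

def sample_logistic_normal(m, sd, rng):
    """Draw z with alr(z) ~ N(m, diag(sd^2)).

    Assumption: the covariance is diagonal, so coordinates are drawn independently.
    """
    v = [rng.gauss(mi, si) for mi, si in zip(m, sd)]
    return alr_inv(v)

rng = random.Random(0)
# Hypothetical parameters for a 3-part composition (2 alr coordinates).
z = sample_logistic_normal([0.5, -0.2], [0.3, 0.3], rng)
```

Because every real vector maps to a valid composition under `alr_inv`, any multivariate normal on the alr scale induces a distribution on the simplex with an unrestricted correlation structure.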
Scalar multiplication
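Let a be a scalar. Define
$a \odot \xi = \left( \frac{\xi_1^a}{\sum_{i=1}^{k} \xi_i^a}, \ldots, \frac{\xi_k^a}{\sum_{i=1}^{k} \xi_i^a} \right)^T$
$(\nabla^{k-1}, \oplus, \odot)$, with $\oplus$ the perturbation operation, is a complete inner product
space, with inner product given, e.g., by
$\langle \xi, \eta \rangle = \mathrm{alr}(\xi)^T N^{-1} \mathrm{alr}(\eta)$
N is the multinomial covariance $N = I + jj^T$;
j is a vector of k-1 ones.
$\|\xi\| = \sqrt{\langle \xi, \xi \rangle}$ is a norm on the simplex.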
The inner product and norm are invariant to
permutations of the components of the
composition.
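The inner product can be sketched without any matrix library, using the fact that for N = I + jj^T with j a vector of k-1 ones, the Sherman-Morrison formula gives N^{-1} = I - jj^T / k. The function names and example compositions are illustrative assumptions.

```python
import math

def alr(z):
    """Additive log-ratio transform relative to the last part."""
    return [math.log(zi / z[-1]) for zi in z[:-1]]

def inner(x, y):
    """Inner product alr(x)^T N^{-1} alr(y) with N = I + j j^T.

    Uses N^{-1} = I - j j^T / k, so the quadratic form reduces to
    sum(a_i * b_i) - sum(a) * sum(b) / k, where k is the number of parts.
    """
    k = len(x)
    a, b = alr(x), alr(y)
    return sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / k

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x))

# Hypothetical compositions and one permutation of their components.
x = [0.55, 0.15, 0.30]
y = [0.20, 0.50, 0.30]
x_perm = [0.30, 0.55, 0.15]
y_perm = [0.30, 0.20, 0.50]
same = abs(inner(x, y) - inner(x_perm, y_perm)) < 1e-12
```

Although alr singles out the last component as a reference, weighting by N^{-1} removes that dependence, which is why the value is unchanged when the components are permuted.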
Some models
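Measurement error:
$z_j = \xi \oplus e_j$, where $e_j \sim \mathrm{LN}(0, S)$.
Regression:
$\xi_j = g \oplus (u_j \odot \beta)$, with $u_j$ a centered covariate and $g$, $\beta$ compositions.
Correspondence in Euclidean space:
$m_j = \beta_0 + \beta_1 (x_j - \bar{x})$
$\mathrm{alr}^{-1}(m_j) = \mathrm{alr}^{-1}(\beta_0) \oplus \left( (x_j - \bar{x}) \odot \mathrm{alr}^{-1}(\beta_1) \right)$
so $\xi_j = \mathrm{alr}^{-1}(m_j)$, $g = \mathrm{alr}^{-1}(\beta_0)$, and $u_j = x_j - \bar{x}$.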
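A sketch of a compositional regression line, assuming the model takes the form of an intercept composition perturbed by a slope composition raised to the centered covariate. The names `regression_line`, `power`, and the example values of `g` and `beta` are hypothetical.

```python
import math

def closure(w):
    """Rescale positive weights so they sum to 1."""
    total = sum(w)
    return [wi / total for wi in w]

def perturb(x, y):
    """Perturbation: componentwise product, re-closed."""
    return closure([xi * yi for xi, yi in zip(x, y)])

def power(a, x):
    """Scalar multiplication on the simplex: componentwise power, re-closed."""
    return closure([xi ** a for xi in x])

def alr(z):
    """Additive log-ratio transform relative to the last part."""
    return [math.log(zi / z[-1]) for zi in z[:-1]]

def regression_line(g, beta, xs):
    """Fitted compositions g perturbed by beta raised to the centered covariate."""
    xbar = sum(xs) / len(xs)
    return [perturb(g, power(x - xbar, beta)) for x in xs]

# Hypothetical intercept and slope compositions, and covariate values.
g = [0.5, 0.3, 0.2]
beta = [0.2, 0.3, 0.5]
line = regression_line(g, beta, [0.0, 1.0, 2.0])
```

On the alr scale the fitted values are linear in the centered covariate: each unit step adds `alr(beta)`, and the fitted composition at the covariate mean equals `g`.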
Some regression lines
Time series (AR 1)
$z_{k+1} = z_k \oplus e_k$
A source receptor model
Observe relative concentration $Y_i$ of k
species at a location over time.
Consider p sources with chemical
profiles $q_j$. Let $\alpha_i$ be the vector of
mixing proportions of the different
sources at the receptor on day i.
$E\,Y_i = \sum_{j=1}^{p} \alpha_{ij} q_j = Q \alpha_i$
$Y_i = Q \alpha_i \oplus e_i$
$Q \sim \mathrm{LN}$, $\alpha_i \sim$ indep LN, $e_i \sim$ zero mean LN
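A small sketch of the source receptor mean structure: the expected composition at the receptor is a mixture of the source profiles, weighted by the mixing proportions, and the observation is that mean perturbed by an error composition. All numbers below are hypothetical, as are the function names.

```python
def closure(w):
    """Rescale positive weights so they sum to 1."""
    total = sum(w)
    return [wi / total for wi in w]

def perturb(x, y):
    """Perturbation: componentwise product, re-closed."""
    return closure([xi * yi for xi, yi in zip(x, y)])

def expected_composition(profiles, alpha):
    """Mixture of source profiles weighted by the mixing proportions alpha.

    profiles: list of p compositions, each with k parts.
    alpha:    list of p mixing proportions summing to 1.
    """
    k = len(profiles[0])
    return [sum(a * q[s] for a, q in zip(alpha, profiles)) for s in range(k)]

# Hypothetical example: p = 2 sources, k = 3 species.
profiles = [[0.6, 0.3, 0.1],
            [0.1, 0.2, 0.7]]
alpha = [0.25, 0.75]                 # hypothetical mixing proportions for one day
mean = expected_composition(profiles, alpha)

error = [0.30, 0.35, 0.35]           # hypothetical error composition
y = perturb(mean, error)             # observed composition for that day
```

Since each profile and the mixing vector sum to one, the mixture is itself a composition, so no re-closing is needed before the error is applied.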
Juneau air quality
50 observations of relative mass of 5
chemical species. Goal: determine the
contribution of wood smoke to local
pollution load.
Prior specification:
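$f(Q, \alpha_i, e_i, m, \Omega, S_e) \propto f(\alpha_i \mid m, \Omega)\, f(e_i \mid S_e)\, f(m)\, f(\Omega)\, f(S_e)$
where $\alpha_i$ are the mixing proportions with LN parameters $m$ and $\Omega$, and $S_e$ is the error covariance.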
Inference by MCMC.
Wood smoke contribution
[Figure: wood smoke contribution, with 50% CL and 95% CL.]
Source profiles
[Figure: source profiles; labels: (pyrene), (benzo(a)), (fluoranthene), (chrysene), (benzo(b)).]
State-space model
Space-time model of proportions
State-space model:
$z_j$: unobservable composition, $z_j \sim \mathrm{LN}(m_j, S_j)$
$y_j$: k-vector of counts, $y_j \sim \mathrm{Mult}\left( \sum_{i=1}^{k} y_{ji}, z_j \right)$
Inference using MCMC again
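A sketch of simulating one observation from the state-space structure: a latent logistic normal composition, then multinomial counts given that composition. It assumes a diagonal covariance on the alr scale for brevity, and the parameter values and names are hypothetical. This only simulates from the model; it is not the MCMC inference.

```python
import math
import random

def alr_inv(v):
    """Inverse alr: exponentiate, append 1 for the reference part, and close."""
    w = [math.exp(vi) for vi in v] + [1.0]
    total = sum(w)
    return [wi / total for wi in w]

def simulate_counts(m, sd, n, rng):
    """Draw a latent composition z and multinomial counts y of total size n.

    Assumption: alr(z) has independent normal coordinates (diagonal covariance).
    """
    z = alr_inv([rng.gauss(mi, si) for mi, si in zip(m, sd)])
    categories = list(range(len(z)))
    draws = rng.choices(categories, weights=z, k=n)   # n categorical draws
    y = [draws.count(c) for c in categories]          # tally into a count vector
    return z, y

rng = random.Random(1)
# Hypothetical parameters: 3 categories, 200 individuals counted.
z, y = simulate_counts([0.2, -0.4], [0.5, 0.5], 200, rng)
```

The counts carry two layers of variation: multinomial sampling noise given the composition, and variation of the unobserved composition itself.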
Stability of arthropod
food webs
Omnivory thought to destabilize ecological
communities
Stability: Capacity to recover from shock
(relative abundance in trophic classes)
Mount St. Helens experiment: 6 treatments
in 2-way factorial design; 5 reps.
Predator manipulation (3 levels)
Vegetation disturbance (2 levels)
Count arthropods, 6 wks after treatment.
Divide into specialized herbivores, general
herbivores, predators.
Specification of structure
S is generated from independent
observations at each treatment
mean depends only on treatment
Benthic invertebrates
in estuary
EMAP estuaries monitoring program:
Delaware Bay 1990. 25 locations, 3 grab
samples of bottom sediment during
summer
Invertebrates in samples classified into
–pollution tolerant
–pollution intolerant
–suspension feeders (control group;
mainly palp worms)
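Site j, subsample t
$z_{jt} \sim \mathrm{LN}(q_j + \beta x_j, \Omega)$
$q_j$ ~ CAR process
$E(q_j \mid q_{-j}) = m + \frac{1}{n_j} \sum_{k \in N(j)} (q_k - m)$
$\mathrm{Var}(q_j \mid q_{-j}) = \frac{\Gamma}{n_j}$
where $N(j)$ is the set of neighbors of site j and $n_j$ is their number.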
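A sketch of the CAR conditional mean for a single coordinate of the site effect: the overall mean plus the average deviation of the neighboring sites from that mean. The four-site neighborhood structure and the values are hypothetical, and the scalar case is an assumption made to keep the example short.

```python
def car_conditional_mean(j, q, neighbors, m):
    """Conditional mean of q[j] given the other sites under a CAR model.

    q:         site effects (one scalar coordinate per site, an assumption here)
    neighbors: dict mapping each site index to the list of its neighbors
    m:         overall mean
    """
    nbrs = neighbors[j]
    return m + sum(q[k] - m for k in nbrs) / len(nbrs)

# Hypothetical example: four sites along a line.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
q = [1.0, 2.0, 4.0, 3.0]
m = 2.0
mean_site_1 = car_conditional_mean(1, q, neighbors, m)
```

With this form the conditional mean reduces to the average of the neighboring values, and sites with more neighbors have a smaller conditional variance because the variance is divided by the neighbor count.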
Effect of salinity