lognormal - Coronal Diagnostic Spectrometer

Quiet Sun intensity distributions
C. A. Young¹, D. Bewsher², J. Ireland¹
¹L3Com/GSI, NASA’s GSFC, Code 612.5, Greenbelt MD 20771
²Department of Physics, Astronomy and Mathematics, UCLAN, Preston, PR1 2HE, United Kingdom.
This work was supported by NASA contract # NAS5-00220.
Introduction
The statistical properties of the solar atmosphere are a result of
the physical processes present. There have been several
attempts to describe the intensity distribution in the quiet
Sun. Knowledge of the intensity distribution may yield
information on how it is formed, or on the number of identifiable
components in the quiet Sun.
Two recent studies have looked at the quiet Sun intensity
distribution with two very different purposes in mind. Gallagher et
al. (1998) sought to define a threshold distinguishing network
from internetwork, whilst Pauluhn et al. (2000) sought the best
model description for the entire distribution. We examine these
studies in more detail.
Gallagher et al. (1998) used
• SOHO¹ Coronal Diagnostic Spectrometer (CDS) quiet Sun data
• Every pixel in each raster.
• Mixture modeling to find the number of Gaussian distributions
present in the histogram of observed intensities.
• Fitting of the sum of Gaussians to the histogram of observed
intensities to find the Gaussian parameters.
However
• Neighbouring pixels are not statistically independent due to the
CDS point spread function.
• The binning of the histogram is arbitrary, which necessarily
affects the fitting.
• The choice of using Gaussians to model the histogram is
arbitrary.
• The purpose of the study is to find a network/internetwork
threshold, not to model the distribution.
¹Solar and Heliospheric Observatory
Pauluhn et al. (2000) used
• Coronal Diagnostic Spectrometer (CDS) and Solar and Ultraviolet
Measurements of Emitted Radiation (SUMER) quiet Sun data.
• Every pixel in each raster.
• Fitting of a variety of arbitrary test distributions to the histogram
of observed intensities.
However
• Neighbouring pixels are not statistically independent due to the
CDS and SUMER point spread functions.
• The binning of the histogram is arbitrary, which necessarily
affects the fitting.
• The choice of test distributions to model the histogram is
arbitrary.
• The purpose of the study is to model the distribution.
Both studies use data that are not statistically
independent, and both judge the best model with a
fitting technique that depends on an arbitrary
parameter, the histogram binning.
The point spread function inherent in many instruments means
that neighbouring pixels are unlikely to be statistically
independent. This can be circumvented by undersampling the
image.
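The effect of a point spread function on pixel independence can be illustrated with a synthetic image. The sketch below is an assumption-laden toy, not the CDS instrument model: it blurs independent white-noise pixels with a Gaussian PSF whose width of 1 pixel is a hypothetical value, then compares the correlation between horizontally adjacent pixels before and after keeping every third pixel (the stride used for the dataset in this study). The helper names `gaussian_blur` and `neighbour_correlation` are illustrative.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Convolve an image with a separable Gaussian PSF of width sigma (in pixels)."""
    r = int(np.ceil(4 * sigma))
    t = np.arange(-r, r + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    both = np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")
    return both[r:-r, r:-r]  # drop the border affected by zero padding


def neighbour_correlation(img):
    """Pearson correlation between horizontally adjacent pixels."""
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]


rng = np.random.default_rng(1)
scene = rng.standard_normal((300, 300))        # statistically independent pixels
observed = gaussian_blur(scene, sigma=1.0)      # hypothetical PSF width: 1 pixel
undersampled = observed[::3, ::3]               # keep every third pixel in x and y

corr_full = neighbour_correlation(observed)     # theory for this PSF: exp(-1/4), about 0.78
corr_sub = neighbour_correlation(undersampled)  # theory for this PSF: exp(-9/4), about 0.11
print(f"adjacent pixels: {corr_full:.2f}, after undersampling: {corr_sub:.2f}")
```

For a Gaussian PSF of width σ the correlation at a separation of d pixels falls off as exp(−d²/4σ²), so a wider PSF would require a larger stride to reach the same level of independence.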
Fitting a distribution, or mixture of distributions, to a histogram
clearly depends on the histogram binning, introducing
subjectivity. This approach can be sidestepped completely by
using an Expectation-Maximization (EM) algorithm.
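The binning problem can be seen on synthetic data. In this sketch (synthetic lognormal "intensities" with hypothetical parameters μ = 1, σ = 0.5), the peak of the histogram moves as the bin count changes, while the maximum-likelihood fit uses every data value directly and involves no bins. For a single lognormal component the ML estimates have a closed form; EM generalizes this to mixtures, where no closed form exists.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.5, size=5000)   # synthetic "intensities"

# Histogram route: the location of the peak bin depends on the chosen bin count.
for bins in (10, 25, 100):
    counts, edges = np.histogram(y, bins=bins)
    j = np.argmax(counts)
    print(f"{bins:4d} bins -> peak near {0.5 * (edges[j] + edges[j + 1]):.3f}")

# Likelihood route: for a single lognormal the ML estimates are closed-form
# and use every data value directly, with no binning.
log_y = np.log(y)
mu_ml = log_y.mean()
sigma_ml = log_y.std()        # the ML estimate uses the 1/n variance (numpy's default)
print(f"ML fit: mu = {mu_ml:.3f}, sigma = {sigma_ml:.3f}")
```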
The choice and mixture of distributions can be motivated by
physical concerns, through considering fragmentation
mechanisms and mixture modeling.
Mixture Modeling
The distribution of the n data, y, is described by a k-component finite mixture distribution,
$$p(y \mid \theta) = \sum_{m=1}^{k} \alpha_m \, p(y \mid \theta_m).$$
The α_m are the mixing probabilities, $\sum_{m=1}^{k} \alpha_m = 1$, and θ is the complete set of parameters needed to specify the distributions.
For the component distributions we use either a normal or a lognormal:
$$p(y^{(i)} \mid \theta_m) = \frac{1}{(2\pi)^{1/2}\,\sigma_m} \exp\left(-\frac{1}{2}\,\frac{(y^{(i)} - \mu_m)^2}{\sigma_m^2}\right)$$
$$p(y^{(i)} \mid \theta_m) = \frac{1}{(2\pi)^{1/2}\, y^{(i)}\,\sigma_m} \exp\left(-\frac{1}{2}\,\frac{(\log y^{(i)} - \mu_m)^2}{\sigma_m^2}\right)$$
where θ_m = (μ_m, σ_m).
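As a concrete check of the formulas above, the sketch below evaluates the normal and lognormal component densities and the k-component mixture, then confirms numerically that each mixture integrates to 1. The parameter values are hypothetical, and the function names are illustrative rather than taken from any existing code.

```python
import numpy as np


def normal_pdf(y, mu, sigma):
    """Normal component density p(y | theta_m)."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def lognormal_pdf(y, mu, sigma):
    """Lognormal component density p(y | theta_m), defined for y > 0."""
    return np.exp(-0.5 * ((np.log(y) - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * y * sigma)


def mixture_pdf(y, alpha, mu, sigma, component):
    """Finite mixture p(y | theta) = sum_m alpha_m p(y | theta_m)."""
    y = np.asarray(y, dtype=float)[:, None]                   # one row per data value
    dens = component(y, np.asarray(mu), np.asarray(sigma))    # shape (n, k)
    return dens @ np.asarray(alpha)                           # weight and sum over m


def integrate(f, x):
    """Trapezoidal rule, written out to avoid version-specific NumPy helpers."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


alpha, mu, sigma = [0.7, 0.3], [1.0, 2.5], [0.4, 0.6]   # hypothetical parameters

grid_ln = np.linspace(1e-6, 400.0, 400_001)
total_ln = integrate(mixture_pdf(grid_ln, alpha, mu, sigma, lognormal_pdf), grid_ln)

grid_n = np.linspace(-10.0, 10.0, 20_001)
total_n = integrate(mixture_pdf(grid_n, alpha, mu, sigma, normal_pdf), grid_n)
print(total_ln, total_n)   # both should be very close to 1
```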
An EM algorithm
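To determine the parameters of the mixture distribution we use the maximum-likelihood estimate,
$$\hat{\theta}_{ML} = \arg\max_{\theta} \{\log p(y \mid \theta)\}.$$
This cannot be found analytically, so we use a modified form of the EM (Expectation-Maximization) algorithm, which treats the unobserved component labels z as missing data and alternates two steps:
• E-step: compute the conditional expectation of the complete-data log-likelihood, the so-called Q function:
$$Q(\theta, \hat{\theta}(t)) = E\left[\log p(y, z \mid \theta) \mid y, \hat{\theta}(t)\right]$$
• M-step: update the parameters to the values that maximize the Q function:
$$\hat{\theta}(t+1) = \arg\max_{\theta} Q(\theta, \hat{\theta}(t))$$
The two steps are repeated until the estimates converge.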
We use a modified method that is unsupervised in that it
determines the number of components.
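The E- and M-steps above can be sketched for one-dimensional mixtures. This is the standard EM iteration with a fixed number of components k, not the modified, unsupervised method of Figueiredo and Jain (2002) used in this work. It fits a lognormal mixture by running the normal-mixture EM on log y, which gives the same ML estimates because the Jacobian 1/y does not depend on the parameters. The function name `em_mixture`, the quantile initialization and the stopping rule are illustrative assumptions, and the data are synthetic.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def em_mixture(y, k, lognormal=False, max_iter=500, tol=1e-10):
    """Standard EM for a k-component 1-D normal or lognormal mixture (k fixed)."""
    y = np.asarray(y, dtype=float)
    # A lognormal mixture in y is a normal mixture in x = log y; the Jacobian 1/y
    # does not involve the parameters, so both have the same ML estimates.
    x = np.log(y) if lognormal else y
    n = x.size
    alpha = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    sigma = np.full(k, x.std())
    prev = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities w[i, m] = P(component m | x_i, current theta)
        dens = alpha * normal_pdf(x[:, None], mu, sigma)   # shape (n, k)
        total = dens.sum(axis=1)
        w = dens / total[:, None]
        loglik = np.log(total).sum()   # log-likelihood of the parameters entering this step
        # M-step: for normal components the maximizers of Q are closed-form
        nk = w.sum(axis=0)
        alpha = nk / n
        mu = (w * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        if loglik - prev <= tol * abs(loglik):
            break
        prev = loglik
    if lognormal:
        loglik -= x.sum()   # log p(y) = log p(x) - log y, so both models are densities in y
    return alpha, mu, sigma, loglik


# Synthetic example: a two-component lognormal mixture.
rng = np.random.default_rng(5)
y = np.exp(np.concatenate([rng.normal(0.0, 0.3, 1000), rng.normal(3.0, 0.5, 1000)]))
alpha_ln, mu_ln, sigma_ln, ll_ln = em_mixture(y, 2, lognormal=True)
alpha_g, mu_g, sigma_g, ll_g = em_mixture(y, 2)
print(np.sort(mu_ln), ll_ln, ll_g)
```

Because both returned log-likelihoods are densities in y, they can be compared directly for the same k. Comparing different numbers of components requires a penalty on model complexity, which the modified EM described in this poster provides.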
Lognormal distributions and fragmentation
Pauluhn et al. (2000) show that distributions of EUV intensities in
quiet Sun SOHO-CDS and SOHO-SUMER data can be well
represented by a lognormal distribution,
$$f(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2\right],$$
where x is the EUV intensity, μ is the location parameter and σ
is the shape parameter. This single distribution is found to fit
the observed distribution better than the two-Gaussian model
found by Gallagher et al. (1998). The presence of one
distribution rather than two suggests that a single process,
rather than two, may be operating. Lognormal
distributions also arise in the distribution of sunspot areas
(Bogdan et al., 1988) and sunspot decay rates (Martínez Pillet et al.,
1993).
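A lognormal distribution arises in the fragmentation of a quantity
A into two fractional pieces, A(1 − x_i) and A x_i. Following one piece through n successive fragmentations gives
$$A_n = A \prod_{i=1}^{n} (1 - x_i).$$
If the x_i are independent random variables drawn from the
same distribution p(x), then ln A_n = ln A + Σ ln(1 − x_i) is a sum of
independent, identically distributed terms. Provided ln(1 − x) has finite
mean and variance under p, the central limit theorem makes ln A_n
approximately normally distributed for large n; that is, the fragment
sizes are approximately lognormal.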
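The argument above can be checked with a small Monte Carlo sketch. It assumes, purely for illustration, that the x_i are independent and uniform on [0, 1); in that case −ln(1 − x_i) is exponential with mean 1 and variance 1, so ln A_n should have mean −n and standard deviation √n, with a skewness (−2/√n) that shrinks as n grows. The function name `log_fragment_sizes` is hypothetical.

```python
import numpy as np


def log_fragment_sizes(n_steps, n_pieces, rng, a0=1.0):
    """ln A_n for A_n = A * prod_{i=1}^{n} (1 - x_i), one value per simulated piece."""
    x = rng.random((n_pieces, n_steps))      # assumption: x_i ~ Uniform[0, 1), i.i.d.
    # Sum logarithms instead of multiplying, so tiny fragments do not underflow.
    return np.log(a0) + np.log1p(-x).sum(axis=1)


rng = np.random.default_rng(3)
n = 50
log_a = log_fragment_sizes(n_steps=n, n_pieces=20_000, rng=rng)

# Standardize and measure the skewness; a normal distribution has skewness 0.
z = (log_a - log_a.mean()) / log_a.std()
skewness = np.mean(z ** 3)
print(f"mean {log_a.mean():.2f}, std {log_a.std():.2f}, skewness {skewness:.2f}")
```

A histogram of `log_a` is close to a Gaussian, while a histogram of `np.exp(log_a)` shows the strongly right-skewed lognormal shape.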
Given that lognormal distributions are observed in the quiet Sun,
and that a reasonable physical mechanism may be responsible
for their presence, we model the observed intensity
distributions using a mixture of lognormals. We compare
these results with those from a mixture of Gaussian distributions. The
data used are the same as in Gallagher et al. (1998).
The Dataset
• Instrument: CDS/NIS
• Observed lines: He I, He II, O III, O IV,
O V, Ne VI, Mg IX, Mg X
• Area imaged: 240 × 240 arcsec²
• Number of images: 10
• Subsampling: use every third pixel in x and y
[Figure: Example images (true CDS raster image; undersampled image)]
[Figure: Example mixture for fully sampled image (Gaussian mixture; lognormal mixture)]
[Figure: Example mixture for subsampled image (Gaussian mixture; lognormal mixture)]
[Figure: Results: Gaussian vs. lognormal for fully sampled data]
[Figure: Results: Gaussian vs. lognormal for undersampled data]
Discussion
The unsupervised expectation maximization analysis shows
(for the CDS lines chosen in the quiet Sun) that the intensity
distribution is more economically modeled with lognormal
distributions than with Gaussian distributions,
suggesting that fragmentation mechanisms may be
operating in the quiet Sun. Undersampling also reduces the
number of components in the mixture.
The presence of lognormal distributions is taken as a signature
of the presence of a fragmentation mechanism. However, it is not
clear what may be fragmenting to cause the observed intensity
distribution.
Fragmentation has been well studied in the kinetics of polymer
degradation. Cheng and Redner (1990) describe a model
equation for the evolution of a distribution of particles c(x,t) that
fragment independently under the influence of external forces,
$$\frac{\partial c(x,t)}{\partial t} = -a(x)\,c(x,t) + \int_{x}^{\infty} c(y,t)\,a(y)\,f(x \mid y)\,dy \qquad (1)$$
Here a(x)dt is proportional to the probability that a particle of
mass x breaks in a time interval dt, while f(x|y) is the conditional
probability that a particle of mass x is produced by the breakup of a particle of mass y.
The first term on the right-hand side is the number of particles of mass x lost
through their own breakup; the second term is the number of
particles of mass x created from particles of larger mass. Mass is conserved,
and the breakup rate is assumed to be a power law,
$$a(x) \propto x^{\lambda} \qquad (2)$$
that is, the overall breakup rate depends on the fragment mass
only. A scaling ansatz is introduced,
2
c(x,t)

s(t)
x /s(t)

(3)
Cheng and Redner (1990) show the existence of an asymptotic
lognormal solution for small fragment size x,
$$\phi(x, t_0) \sim \exp\left[-c_0 (\ln x)^2\right] \qquad (4)$$
for some constant c_0 and fixed time t_0, when the conditional probability
f(x|y) has a cutoff (zero probability of fragmentation) at small
fragment sizes, that is, a minimum fragment size.
At large fragment size x, the distribution behaves as
$$\phi(x, t_0) \sim x^{c_1} \exp(-c_2 x) \qquad (5)$$
for some constants c_1, c_2. It can be seen from this analysis that
fragmentation distributions can be more complex than just
lognormal.
Equation (1) describes the evolution of a population
undergoing fragmentation only. There are many other
processes going on in the quiet Sun; for example, Schrijver et
al. (1997) describe the equations of magnetochemistry
which govern the distribution of positive and negative
magnetic flux (magnitude Φ) in the quiet Sun:
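$$
\begin{aligned}
\frac{\partial N_\pm(\Phi)}{\partial t} = S_\pm
&+ \frac{1}{2}\int_0^{\Phi} N_\pm(x)\,N_\pm(\Phi - x)\,l(x, \Phi - x)\,dx - N_\pm(\Phi)\int_0^{\infty} N_\pm(x)\,l(\Phi, x)\,dx \\
&+ 2\int_{\Phi}^{\infty} N_\pm(x)\,k(\Phi, x)\,dx - N_\pm(\Phi)\int_0^{\Phi} k(x, \Phi - x)\,dx \\
&+ \int_0^{\infty} N_\pm(\Phi + x)\,N_\mp(x)\,m(\Phi + x, x)\,dx - N_\pm(\Phi)\int_0^{\infty} N_\mp(x)\,m(\Phi, x)\,dx \qquad (6)
\end{aligned}
$$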
N±(Φ) is the number of positive (negative) flux fragments with flux Φ, and
S± is a source term. The first pair of integral terms describes gain and loss,
respectively, by like fragments merging, controlled by the rate l.
The second pair describes gain and loss, respectively, by binary
fragmentation (two daughter fragments produced in each
fragmentation; Equation (1) permits more than two fragments to
be created per fragmentation), mediated by the
rate k. The third pair describes gain and loss, respectively, by flux
cancellation, governed by the rate m.
Equation (6) has solutions in special cases. Parnell (2002)
measures the distribution of flux concentrations in SOHO
Michelson Doppler Imager (MDI) data and finds that a Weibull
distribution,
$$f(x) \propto x^{\gamma - 1} \exp\left[-\left(\frac{x}{\beta}\right)^{\gamma}\right] \qquad (7)$$
fits the data much better than a simple power law, where x is
the flux in a flux concentration, γ is the shape parameter
and β is the scale parameter. This distribution arises in
the study of fractures in materials. Parnell (2002) derives
functional forms for k, l and m from the measured Weibull
distribution plus assumptions about the form of the solution to (6).
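Equation (7) is written up to a normalizing constant. A minimal sketch of the normalized form, assuming the shape γ and scale β convention above, is shown below; it checks the density against its closed-form cumulative distribution and against NumPy's sampler, which draws the unit-scale Weibull and is rescaled by β. The parameter values are hypothetical and are not Parnell's (2002) fitted values.

```python
import math

import numpy as np


def weibull_pdf(x, gamma, beta):
    """Normalized Eq. (7): (gamma/beta) (x/beta)**(gamma-1) exp(-(x/beta)**gamma), x > 0."""
    z = np.asarray(x, dtype=float) / beta
    return (gamma / beta) * z ** (gamma - 1.0) * np.exp(-z ** gamma)


def weibull_cdf(x, gamma, beta):
    return 1.0 - np.exp(-(np.asarray(x, dtype=float) / beta) ** gamma)


gamma, beta = 0.8, 20.0                              # hypothetical shape and scale
rng = np.random.default_rng(4)
fluxes = beta * rng.weibull(gamma, size=200_000)     # numpy samples the unit-scale form

# The sample mean should approach beta * Gamma(1 + 1/gamma).
mean_theory = beta * math.gamma(1.0 + 1.0 / gamma)
frac_below = np.mean(fluxes < beta)                  # theory: 1 - exp(-1), about 0.632

# Integrating the pdf over [1, 50] should reproduce the CDF difference.
grid = np.linspace(1.0, 50.0, 49_001)
pdf_vals = weibull_pdf(grid, gamma, beta)
area = float(np.sum(0.5 * (pdf_vals[1:] + pdf_vals[:-1]) * np.diff(grid)))
expected_area = weibull_cdf(50.0, gamma, beta) - weibull_cdf(1.0, gamma, beta)
print(fluxes.mean(), mean_theory, frac_below, area, expected_area)
```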
Brown and Wohletz (1995) point out that the lognormal and
Weibull distributions are very similar for certain parameter ranges,
which may lead to the misidentification of distributions. Although
the Weibull distribution can be motivated by fragmentation
arguments, Parnell (2002) did not compare the Weibull and power-law
fits with a lognormal distribution.
It is hoped that by applying (1) more advanced data analysis
techniques and (2) elements of fragmentation theory, both
of which we have introduced here, we can arrive at a more
complete understanding of the statistical properties of the quiet
Sun and the mechanisms that create them.
References
Bogdan, T. J., Gilman, P. A., Lerche, I., Howard, R., ApJ, 327, 451, 1988.
Brown, W. K., Wohletz, K. H., J. Appl. Phys., 48(4), 2758, 1995.
Cheng, Z., Redner, S., J. Phys. A: Math. Gen., 23, 1233, 1990.
Figueiredo, M., Jain, A., IEEE Trans. Pattern Anal. Mach. Intell., 24, 381, 2002.
Gallagher, P. T., Phillips, K. J. H., Harra-Murnion, L. K., Keenan, F. P., A. & A., 335, 733, 1998.
Martínez Pillet, V., Moreno-Insertis, F., Vázquez, M., A. & A., 274, 521, 1993.
Parnell, C. E., MNRAS, 335, 389, 2002.
Pauluhn, A., Solanki, S. K., Rüedi, I., Landi, E., Schühle, U., A. & A., 362, 737, 2000.
Schrijver, C. J., Title, A. M., van Ballegooijen, A. A., Hagenaar, H. J., Shine, R. A., ApJ, 487, 424, 1997.
Modified EM
The EM algorithm is very successful; however, the standard
method has several drawbacks.
• EM does not determine the number of components,
only the parameters.
• EM may converge to the edge of parameter space, i.e.
one of the α_m may approach zero, causing some of the
parameters to become singular.
We use a robust new modified method that is unsupervised
(it determines the number of components and is insensitive
to initial conditions) and avoids these convergence problems
(Figueiredo and Jain 2002).
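One ingredient of the modified method can be sketched in isolation. In Figueiredo and Jain (2002) the usual mixing-weight update α_m = (1/n) Σ_i w_m^(i) is replaced by one that subtracts N/2 from each component's total responsibility, where N is the number of parameters per component (2 for a one-dimensional normal or lognormal), and clips the result at zero; a component whose weight reaches zero is annihilated. The sketch below shows only this step. The component-wise update order and the message-length search over the number of components in the full method are omitted, and the function name is hypothetical.

```python
import numpy as np


def annihilating_weight_update(resp_sums, n_params):
    """Mixing-weight M-step in the style of Figueiredo & Jain (2002).

    resp_sums[m] is sum_i w_m^(i), the total responsibility of component m;
    n_params is the number of parameters per component.
    """
    raw = np.maximum(0.0, np.asarray(resp_sums, dtype=float) - n_params / 2.0)
    if raw.sum() == 0.0:
        raise ValueError("all components would be annihilated")
    return raw / raw.sum()


# Hypothetical responsibility totals for three components:
alpha = annihilating_weight_update([499.0, 0.6, 299.0], n_params=2)
print(alpha)   # the middle component receives zero weight and would be removed
```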
[Figure: Modified EM (Figueiredo and Jain 2002)]