CHAPTER II
REVIEW OF BASIC STATISTICAL CONCEPTS
METU, GGIT 538
OUTLINE (Last Week)
Introduction to Spatial Data Analysis
1. Introduction
1.1. Introduction
1.1.1. Scope of spatial statistics
1.2. Spatial versus non-spatial data analysis
1.2.1. Relationship between classes of spatial entities
1.2.2. Facts on attributes of spatial entities
1.3. Types of spatial phenomena and relationships
1.4. Problem types in spatial data analysis
1.4.1. Problems of spatially discrete point data
1.4.2. Problems of spatially continuous point data
1.4.3. Problems of area data
1.4.4. Problems of spatial interaction data
OUTLINE
Review of Basic Statistical Concepts
2.1. Random Variables and Probability Distributions
2.1.1. The Binomial Distribution
2.1.2. The Poisson Distribution
2.1.3. The Normal Distribution
2.2. Expectation
2.3. Maximum Likelihood Estimation
2.4. Stationarity and Isotropy
2.1. Random Variables and Probability Distributions
Basic Definitions
Stochastic process: A phenomenon that is subject to uncertainty, i.e. governed by the laws of probability.
Random variable: A variable used to represent a stochastic phenomenon mathematically; its values are subject to uncertainty. The term random implies that the variable is subject to some form of uncertainty, not that it is equally likely to take any of its possible values.
Random variables are denoted by upper case letters (e.g. X, Y); the specific value of a random variable is denoted by lowercase letters (e.g. x, y).
Basic Definitions
A stochastic process is described by random variables and their probability distributions:
 phenomena subject to uncertainty
 not variables that are equally likely to take on any of all possible values
 E.g.
Y : Result of throwing a die
y : Particular value or outcome (1, 2, 3, 4, 5, 6) after the die is thrown.
There are two types of random variables:
 Discrete (integer values, counts)
 Continuous (any value within a continuous range)
Probability distribution: A mathematical function which specifies the probability that particular values or ranges of values will occur; i.e. the relative chance of a random variable taking its different possible values is characterized by its probability distribution.
The probability density at the value y is $f_Y(y)$ (i.e. the probability that Y takes the value y).
The probability that a random variable Y takes values between a and b is expressed by:
$$P(a \le Y \le b) = \sum_{y=a}^{b} f_Y(y) \qquad \text{if } Y \text{ is discrete}$$
$$P(a \le Y \le b) = \int_{a}^{b} f_Y(y)\,dy \qquad \text{if } Y \text{ is continuous}$$
Cumulative Probability Distribution (Distribution Function)
It is a mathematical function specifying the probability
that Y takes any value less than or equal to the specific
value y (FY(y)).
$$F_Y(y) = \begin{cases} \displaystyle\sum_{u=-\infty}^{y} f_Y(u) & \text{if } Y \text{ is discrete} \\[6pt] \displaystyle\int_{-\infty}^{y} f_Y(u)\,du & \text{if } Y \text{ is continuous} \end{cases}$$
The joint probability distribution is used if more
than one random variable is of concern (e.g., X
and Y) and expressed by:
$f_{XY}(x, y)$ : the probability associated with X taking the value x and Y taking the value y
Important Probability Distributions
 Discrete
 Binomial
 Poisson
 Geometric
 Continuous
 Normal
 Lognormal
 t
 F
 2
 Exponential
 Gamma
The three fundamental probability distributions are:
1. Binomial Distribution
 Dichotomous events
2. Poisson Distribution
 Discrete events
3. Normal Distribution
 Continuous events
2.1.1.The Binomial Distribution
It relates to the events in which there are only two
possible outcomes.
The nature of the distribution is typically explained with reference to events involving outcomes of two types (head or tail, yes or no, male or female, success or failure, etc.). It can be used to define the probability of each outcome occurring.
If the probability of X successes in N trials is denoted by p(X), the binomial distribution is defined by:
$$p(X) = \binom{N}{X} p^{X} q^{N-X}$$
Where;
$\binom{N}{X}$ : the combinatorial expression
p : probability of a "success" (occurrence)
q : probability of a "failure" (non-occurrence), q = 1 − p
X : specified number of successes (occurrences)
N : number of trials (sequence length)
$$\binom{N}{X} = \frac{N!}{X!\,(N-X)!}$$
It gives the number of different combinations in which X successes (occurrences) can occur.
 E.g.
How many different combinations are there of 3 successes in 5 trials? (N = 5 and X = 3)
$$\binom{5}{3} = \frac{5!}{3!\,2!} = 10$$
i.e. there are 10 combinations of 3 successes in a sequence of 5 trials.
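As a quick check of the formulas above, here is a minimal Python sketch (not part of the original notes) that evaluates the combinatorial term and the binomial probability using only the standard library; the chosen p = 0.5 is just an illustrative value:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p: C(n, x) * p**x * (1 - p)**(n - x)."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

# Number of combinations of 3 successes in 5 trials
print(comb(5, 3))               # 10

# Probability of exactly 3 successes in 5 trials when p = 0.5
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```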
The form and shape of the distribution depend on the number of trials in a sequence (N) and the probability of an occurrence (p).
If p = q the binomial distribution is symmetric
If p < q the binomial distribution is right (positively) skewed
If p > q the binomial distribution is left (negatively) skewed
If N is large the distribution becomes more symmetric and bell-shaped
Figure 2.1. Effect of N and p on probability distribution
The Parameters of the Binomial Distribution
 Mean: $\mu = Np$
 Variance: $\sigma^2 = Npq$
 Standard Deviation: $\sigma = \sqrt{Npq}$
 Skewness: $\beta = \dfrac{q - p}{\sqrt{Npq}}$
If p = q, then β = 0; the further p is from 0.5, the larger |β| becomes.
2.1.2. The Poisson Distribution
It is used for events which have a discrete number of outcomes (i.e. counted integers). The probability of occurrence reduces as the integer count of the event increases. This distribution is particularly useful when counting the number of times a phenomenon occurs within a given unit of time or space.
The distribution function of the Poisson distribution is:
$$p(X) = \frac{\lambda^{X} e^{-\lambda}}{X!}$$
Where;
X : specific number of events
λ : average number of events
e = 2.7183…
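A minimal Python sketch of the Poisson probability function above, using an arbitrary illustrative value λ = 2:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of observing exactly x events when the
    average number of events per unit of time or space is lam."""
    return (lam**x) * exp(-lam) / factorial(x)

# Probabilities for 0..5 events with an assumed average of 2 events per unit
for x in range(6):
    print(x, round(poisson_pmf(x, 2.0), 4))
```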
E.g.
Consider the location of grocers in a town. The number of grocers per grid square gives X, and the average number of grocers per grid square over the study area gives λ.
Figure 2.2. Location of grocers in Sunderland
Figure 2.3. Effect of the value of λ on the Poisson distribution
The Parameters of the Poisson Distribution
 Mean: $\mu = \lambda$
 Variance: $\sigma^2 = \lambda$
 Standard Deviation: $\sigma = \sqrt{\lambda}$
 Skewness: $\beta = \dfrac{1}{\sqrt{\lambda}}$
 As λ increases, β → 0 (more symmetrical)
 As λ decreases, β increases (more skewed)
Basic Shortcomings of the Poisson Distribution
 λ is of critical importance; its value is assumed to be constant over the time period or area considered.
 The frequency distribution can easily be altered by varying the units of space or time for which the events are recorded.
2.1.3.The Normal Distribution
It relates to situations where continuous rather than
discrete outcomes are recorded for the observations
under investigation. It is used when the phenomena are
measured on the interval or ratio scales rather than
counted.
The probability density function of the normal curve is:
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$$
Where;
Y : the frequency (probability density) of the measurement X
σ : standard deviation of all X's
μ : mean of X
Figure 2.4. A typical normal probability curve
The points of inflection on the curve, where it changes from convex to concave, occur at 1σ either side of the mean (i.e. at μ − σ and μ + σ). The mean and standard deviation are of crucial importance in determining the position and spread of the normal curve.
Figure 2.5. Normal curves with identical σ but different μ
Figure 2.6. Normal curves with identical μ but different σ
The Standard Normal Curve: The form of the normal curve differs widely in response to the characteristics of the data set. This difficulty may be overcome by expressing the raw values as deviations from the mean (called standardization). The units of expression are proportions of the standard deviation and are termed z values. In this way observations numerically greater than the mean become positive z values and those less than the mean become negative z values. The z value and the standard normal probability density function are expressed as:
$$z = \frac{x - \mu}{\sigma}, \qquad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$$
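A short Python sketch of the standardization step and the standard normal density; the data values, mean, and standard deviation below are made-up illustrative numbers:

```python
from math import exp, pi, sqrt

def z_score(x, mu, sigma):
    """Express a raw value x as a deviation from the mean,
    in units of the standard deviation."""
    return (x - mu) / sigma

def std_normal_pdf(z):
    """Probability density of the standard normal curve at z."""
    return exp(-z**2 / 2) / sqrt(2 * pi)

# Hypothetical observations from a variable with mean 50 and std. dev. 10
for x in (35.0, 50.0, 72.0):
    z = z_score(x, mu=50.0, sigma=10.0)
    print(x, round(z, 2), round(std_normal_pdf(z), 4))
```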
Important Properties of the Normal Distribution
 The probability of any particular value occurring is infinitesimally small and can be regarded as zero. One can only speak of the probability of values occurring within a given range, which is shown by the area under the curve.
 A normal curve is symmetrical about the mean: since the total area beneath the curve is 1, half of it lies on each side of the mean. Thus 50% of the values are less than the mean and the rest are greater, i.e. the probability of a value being less than the mean equals that of being greater than the mean, and both equal 0.5.
 68% of the area under a normal curve lies within 1 standard deviation of the mean; 95% and 99.7% of the area lie within 2 and 3 standard deviations of the mean, respectively.
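The 68–95–99.7 rule can be checked numerically. A brief sketch using the standard library error function, with P(|Z| ≤ k) = erf(k/√2):

```python
from math import erf, sqrt

def prob_within_k_sd(k):
    """Probability that a normal value lies within k standard
    deviations of the mean: P(|Z| <= k) = erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(k), 4))  # ~0.6827, 0.9545, 0.9973
```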
Useful Websites for Review
http://mathworld.wolfram.com/PoissonDistribution.html
http://mathworld.wolfram.com/GaussianDistribution.html
http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html#randvar
2.2. Expectation
The expected value of a random variable Y refers to its "mean". In some cases the expected value of a function of Y, g(Y), is of interest. As its name implies, it is simply the average value that we expect Y or g(Y) to take.
The expected value of Y can also be defined as the weighted sum of the possible values, where the weight used for each value is the probability associated with that value.
Expectation represents the mean of a probability distribution.
If the probabilities of obtaining the amounts a1, a2, …, an are p1, p2, …, pn, then the mathematical expectation is
E = a1p1 + a2p2 + … + anpn
E is a weighted average, where the weights are the probabilities of the values.
$$E(Y) = \begin{cases} \displaystyle\sum_{y=-\infty}^{\infty} y\, f_Y(y) & \text{if } Y \text{ is discrete} \\[8pt] \displaystyle\int_{-\infty}^{\infty} y\, f_Y(y)\,dy & \text{if } Y \text{ is continuous} \end{cases}$$
$$E[g(Y)] = \begin{cases} \displaystyle\sum_{y=-\infty}^{\infty} g(y)\, f_Y(y) & \text{if } Y \text{ is discrete} \\[8pt] \displaystyle\int_{-\infty}^{\infty} g(y)\, f_Y(y)\,dy & \text{if } Y \text{ is continuous} \end{cases}$$
 E.g. Find the mean of the probability distribution of the number of heads obtained in three flips of a balanced coin.
The probabilities for 0, 1, 2, 3 heads are 1/8, 3/8, 3/8 and 1/8, so
$$\mu = 0 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{3}{8} + 2 \cdot \tfrac{3}{8} + 3 \cdot \tfrac{1}{8} = \tfrac{12}{8} = 1.5$$
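The same calculation as a small Python sketch of the discrete expectation formula; it works for any discrete distribution given as value–probability pairs:

```python
from math import comb

def expectation(values, probs):
    """Weighted sum of the possible values, weighted by their probabilities."""
    return sum(v * p for v, p in zip(values, probs))

# Number of heads in three flips of a balanced coin
values = [0, 1, 2, 3]
probs = [comb(3, k) * 0.5**3 for k in values]  # 1/8, 3/8, 3/8, 1/8
print(expectation(values, probs))              # 1.5
```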
The measure of how much a random variable's values tend to vary around their mean is called the variance:
$$VAR(Y) = E\big[(Y - E(Y))^2\big], \qquad S.D.(Y) = \sqrt{E\big[(Y - E(Y))^2\big]}$$
Covariance: the expected tendency for values of X to be similar to values of Y:
$$COV(X, Y) = E\big[(X - E(X))(Y - E(Y))\big]$$
$$Cor.Coef. = \frac{COV(X, Y)}{S.D.(X)\, S.D.(Y)}$$
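A brief Python sketch that applies these definitions to a small, made-up joint probability table for two discrete random variables (the numbers are purely illustrative):

```python
# Hypothetical joint probabilities f_XY(x, y); they sum to 1
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def expectation(fn):
    """E[fn(X, Y)] over the joint distribution."""
    return sum(fn(x, y) * p for (x, y), p in joint.items())

mean_x = expectation(lambda x, y: x)
mean_y = expectation(lambda x, y: y)
var_x = expectation(lambda x, y: (x - mean_x) ** 2)
var_y = expectation(lambda x, y: (y - mean_y) ** 2)
cov_xy = expectation(lambda x, y: (x - mean_x) * (y - mean_y))
corr = cov_xy / (var_x ** 0.5 * var_y ** 0.5)

print(cov_xy, round(corr, 3))  # positive: large X tends to go with large Y
```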
Independence: Two variables are said to be
independent if the probabilistic behavior of one
remains the same no matter what values the
other might take. In this case their joint
probability distribution is:
$$f_{XY}(x, y) = f_X(x)\, f_Y(y)$$
and in this case COV(X, Y) = 0 and Cor.Coef. = 0.
2.3. Maximum Likelihood Estimation
There are different methods for estimating the parameters of
stochastic phenomena:
 Point Estimation methods
 Methods of moments
 Method of maximum likelihood estimation
 Interval estimation methods
Unlike the method of moments, the maximum likelihood method provides a procedure for deriving the point estimator directly.
The aim of maximum likelihood estimation is to find
the parameter value(s) that makes the observed data
most likely.
If the probability of observing the data X depends on model parameters θ, this probability is written
P(X | θ)
We then talk about the likelihood
L(θ | X)
that is, the likelihood of the parameters given the data.
Consider a random variable with a probability distribution function f(x; θ), in which θ is the parameter, such as the parameter λ of an exponential distribution. On the basis of the sample values x1, …, xn, one can inquire:
What is the most likely value of θ that produces the set of
observations x1,…, xn?
Or
Among the possible values of θ, what is the value that
will maximize the likelihood of obtaining the set of
observations x1,…, xn?
The likelihood of obtaining a particular sample value xi can be assumed to be proportional to the value of the probability distribution function (probability density function) evaluated at xi. The likelihood of obtaining n independent observations x1, …, xn is:
$$L(x_1, \ldots, x_n; \theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta)$$
The maximum likelihood estimator is the value of θ that maximizes the likelihood function L(x1, …, xn; θ). It is obtained as the solution of:
$$\frac{\partial L(x_1, \ldots, x_n; \theta)}{\partial \theta} = 0$$
 E.g. Say we toss a coin 100 times and observe 56 heads and 44 tails. Instead of assuming that p is 0.5, we want to find the MLE for p. Then we want to ask whether or not this value differs significantly from 0.50.
How do we do this?
We find the value of p that makes the observed data most likely. The observed data are considered fixed; they are constants that are plugged into the binomial probability model:
n = 100 (total number of tosses)
h = 56 (total number of heads)
$$L(p = 0.50 \mid \text{data}) = \frac{100!}{56!\,44!}\,(0.50)^{56}(0.50)^{44} = 0.0389$$
$$L(p = 0.52 \mid \text{data}) = \frac{100!}{56!\,44!}\,(0.52)^{56}(0.48)^{44} = 0.0581$$
p      L(p | data)
0.48   0.0222
0.50   0.0389
0.52   0.0581
0.54   0.0739
0.56   0.0801
0.58   0.0738
0.60   0.0576
0.62   0.0378
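A short Python sketch (not from the notes) that reproduces this likelihood table and picks out the candidate value of p with the highest likelihood:

```python
from math import comb

def binomial_likelihood(p, n=100, h=56):
    """Likelihood of p given h heads observed in n tosses."""
    return comb(n, h) * p**h * (1 - p) ** (n - h)

candidates = [round(0.48 + 0.02 * i, 2) for i in range(8)]  # 0.48 ... 0.62
for p in candidates:
    print(p, round(binomial_likelihood(p), 4))

best = max(candidates, key=binomial_likelihood)
print("Likelihood is maximized near p =", best)  # 0.56
```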
It is frequently more convenient to maximize the
logarithm of the likelihood function
$$\frac{\partial \log L(x_1, \ldots, x_n; \theta)}{\partial \theta} = 0$$
If the distribution function has two or more parameters:
$$L(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m) = \prod_{i=1}^{n} f(x_i; \theta_1, \ldots, \theta_m)$$
$$\frac{\partial L(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m)}{\partial \theta_j} = 0, \qquad j = 1, \ldots, m$$
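As a worked illustration (not part of the original slides), applying the log-likelihood condition to the coin-toss example above gives the closed-form estimator suggested by the likelihood table:
$$\log L(p) = \log\binom{n}{h} + h \log p + (n - h) \log(1 - p)$$
$$\frac{\partial \log L(p)}{\partial p} = \frac{h}{p} - \frac{n - h}{1 - p} = 0 \;\Longrightarrow\; \hat{p} = \frac{h}{n} = \frac{56}{100} = 0.56$$
This agrees with the peak of the tabulated likelihood at p = 0.56.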
Problems of MLE Method
1. The distribution of the population should be known
2. A solution to the likelihood equation(s) may not exist.
3. Even if a solution exists, it only gives a stationary point of the likelihood function, which may be a local maximum, a local minimum, or a saddle point. A check on the second derivatives is necessary to determine the nature of the stationary point.
4. The likelihood equations may have multiple solutions.
In this case the individual solutions must be checked
to determine which one yields the global maximum.
It is possible to have more than one global
maximum.
2.4. Stationarity and Isotropy (terminology)
A spatial phenomenon is represented within a spatial domain R, and the location of each stochastic phenomenon is expressed by s. The set of random variables at locations s within R is referred to as a spatial stochastic process, {Y(s), s ∈ R}.
$s \in R$ : any data location in R
$\mathbf{s} = (s_1, s_2)^{T}$ : location vector of point s
$\{Y(s) : s \in R\}$ : spatial stochastic process
Modeling real-life problems requires data and assumptions on the nature of the phenomena.
Figure 2.7. A spatial stochastic process
Spatial stochastic processes often exhibit a degree of spatial correlation, and this correlation somehow has to be incorporated into the analysis.
In general the behavior of spatial phenomena is the result of a mixture of two types of effects:
 First order
 Second order
First order effects: They relate to variation in the mean value of the process in space (global or large-scale trend).
Second order effects: They result from the spatial correlation structure, or the spatial dependence, in the process. In other words, this effect occurs due to the tendency for deviations of the process from its mean to follow each other in neighboring sites (local or small-scale effects).
Behavior of Spatial Phenomena
First Order effects
Variation of a mean value in space - global or large scale
trend
Second order effects
Correlation in the deviations of the process values from
the mean
E.g.
Suppose that iron particles are scattered onto a sheet of paper marked with a fine grid. The numbers of particles landing in different grid squares represent a spatial stochastic process. As long as the mechanism by which we scatter the iron particles is purely random, the process should lack both first- and second-order effects (Case I).
Figure 2.7. Random scatter of iron particles (Case I)
Suppose that a small number of weak magnets are placed under the paper at different points and we scatter the iron particles again. The result will be a process with spatial pattern arising from first-order effects: clustering in the grid-square counts will occur globally at and around the sites of the magnets (Case II).
Figure 2.8. Scatter of iron particles with magnets underneath the paper (Case II)
Now remove the magnets, weakly magnetize the iron particles instead, and scatter them again. The result is a process with spatial pattern arising from a second-order effect: some degree of local clustering will occur due to the tendency of particles to attract or repel each other (Case III).
If the magnets are now replaced under the paper and the magnetized particles scattered again, we end up with a spatial pattern arising from both first-order and second-order effects.
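A rough numpy sketch of this thought experiment (all parameters are hypothetical), simulating grid-square counts with no effects, with a first-order trend in the mean, and with second-order clustering:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                   # 20 x 20 grid of squares
rows, cols = np.mgrid[0:n, 0:n]

# Case I: purely random scatter -- constant mean, no spatial dependence
case1 = rng.poisson(lam=5.0, size=(n, n))

# Case II: first-order effect -- the mean itself varies over the grid
# (higher expected counts near an assumed "magnet" at the grid centre)
dist = np.sqrt((rows - n / 2) ** 2 + (cols - n / 2) ** 2)
case2 = rng.poisson(lam=2.0 + 8.0 * np.exp(-dist / 4.0))

# Case III: second-order effect -- constant mean level, but neighbouring
# squares tend to deviate from the mean together (spatially smoothed noise)
noise = rng.normal(size=(n, n))
smooth = sum(np.roll(np.roll(noise, i, 0), j, 1)
             for i in (-1, 0, 1) for j in (-1, 0, 1)) / 9.0
case3 = rng.poisson(lam=np.exp(1.0 + smooth))

for name, grid in [("Case I", case1), ("Case II", case2), ("Case III", case3)]:
    print(name, "mean count:", round(grid.mean(), 2))
```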
Stationarity: A spatial process is stationary or
homogeneous if its statistical properties are independent
of absolute location in R. This implies that:
 E[Y(s)] and VAR[Y(s)] are constant over R and do not
depend on locations.
 COV[Y(si), Y(sj)] between values at any two sites si and sj depends only on the relative locations of these sites (the distance and direction between them), not on their absolute location in R.
 If the mean, variance or covariance structure changes over R, the process exhibits non-stationarity or heterogeneity.
Stationarity
Spatial data usually represent a single realization of a
random process
 Some degree of stationarity must be assumed to
make inferences about the data
 Stationarity is a form of location invariance
(invariance in the mean and variance of the process).
 Stationarity is the quality of a process in which the
statistical parameters (mean and standard deviation)
of the process do not change with space or time.
Stationarity
Strict or Strong Stationarity
Requires equivalence of the distribution functions under translation and rotation; all higher-order moments are constant, including the mean and variance.
Weak Stationarity
Requires a constant mean and a covariance that is independent of location; the covariance depends only on the distance and direction between points.
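As an illustrative sketch (not from the notes), a weakly stationary and isotropic process is often described by a covariance function that depends only on the separation distance between sites; the exponential model below uses assumed sill and range parameters:

```python
import numpy as np

def exponential_cov(h, sill=1.0, range_param=10.0):
    """Isotropic exponential covariance: depends only on the separation
    distance h, not on absolute location or on direction.
    sill and range_param are assumed (illustrative) parameters."""
    return sill * np.exp(-np.asarray(h, dtype=float) / range_param)

# Two pairs of sites with the same separation distance (5 units)
s1, s2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
s3, s4 = np.array([20.0, 20.0]), np.array([24.0, 23.0])

d12 = np.linalg.norm(s1 - s2)
d34 = np.linalg.norm(s3 - s4)
print(exponential_cov(d12), exponential_cov(d34))  # equal: same separation
```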
Non-stationary mean: decreasing from west to east (figure).
Stationarity: E[Y(s)] = μ (constant) for all s ∈ R, and for the site pairs shown in the figure:
Cov[Y(s1), Y(s2)] = Cov[Y(s3), Y(s4)]
Cov[Y(s5), Y(s6)] = Cov[Y(s9), Y(s10)]
Cov[Y(s1), Y(s2)] ≠ Cov[Y(s7), Y(s8)]
Isotropy: The spatial process is called isotropic if the
covariance depends only on the distance between si and
sj, not the direction in which they are separated.
E.g.
Weakly magnetized iron particles scattered onto paper with no magnets represent an isotropic process.
Figure 2.9. Stationary and isotropic spatial processes
Isotropic
Refers to a spatial process that evolves in the same way in all directions.
Anisotropic
A spatial process in which the correlation and covariance differ with direction.
Most methods assume that the spatial correlation is isotropic.
Modeling Spatial Processes
 Heterogeneity in the mean
 Deviations from the mean are stationary