#### Transcript CHAPTER III

CHAPTER III
GENERAL CONCEPTS IN SPATIAL DATA ANALYSIS
METU, GGIT 711

OUTLINE (Last Week)
Review of Basic Statistical Concepts
2.1. Random Variables and Probability Distributions
2.1.1. The Binomial Distribution
2.1.2. The Poisson Distribution
2.1.3. The Normal Distribution
2.2. Expectation
2.3. Maximum Likelihood Estimation
2.4. Stationarity and Isotropy
2.5. Introductory Spatial Statistics
2.5.1. Points

OUTLINE
GENERAL CONCEPTS IN SPATIAL DATA ANALYSIS
3.1. Introduction
3.2. Visualizing Spatial Data
3.3. Exploring Spatial Data
3.3.1. Distinction between visualizing and exploring spatial data
3.3.2. Distinction between exploring and modeling spatial data
3.4. Modeling Spatial Data
3.5. Practical Problems of Spatial Data Analysis
3.6. Computers and Spatial Data Analysis
3.6.1. Methods of coupling GIS and spatial data analysis

3.1. Introduction

Spatial data analysis involves:
- Accurate description of data relating to a process in space
- Exploration of patterns and relationships in the data
- Search for explanations of such patterns and relationships

These relate to:
- Visualizing spatial data
- Exploring spatial data
- Modeling spatial data

3.2. Visualizing Spatial Data

An essential requirement in any data analysis is the ability to “see” the data being analyzed. Plots of data and other graphical displays are fundamental tools for:
- Seeking patterns
- Generating hypotheses
- Assessing the fit of proposed models
- Determining the validity of predictions derived from models

Maps are the tools for visualizing spatial data. Hence a GIS can provide an environment in which to create maps of spatial data and to explore spatial patterns and relationships quickly and easily.

Cartographic considerations are important in using maps in spatial data analyses.
Bad choices of map type, or of the scaling used for data values, can:
- Lead to misleading conclusions being drawn from the display
- Suggest inappropriate models for the process under study

3.3. Exploring Spatial Data

Exploratory methods for spatial data may take the form of:
- Maps, or
- Conventional plots

E.g. Some exploratory techniques, when applied to point events, result in a contour map of the estimated intensity of occurrence of events over the whole study area; others, applied to the same set of events, result in a graph that throws light on the degree of spatial dependence between event locations.

Exploring spatial data:
- Provides good descriptions of the data
- Helps to develop hypotheses
- Helps to establish appropriate models

If many exploratory spatial techniques result in different forms of maps, then how do they really differ from visualization techniques?

3.3.1. Distinction between visualizing and exploring spatial data

The dividing line between visualization of spatial data and exploratory data analysis is somewhat artificial. The distinction is based on the degree of data manipulation.

E.g. Suppose that we have age-standardized, cause-specific death rates for a number of administrative zones.

Visualizing spatial data involves:
- A map of the death rates
- A simple transformation of the rates
(No data manipulation)

Exploring spatial data involves:
- A map of a spatial moving average of the rates, which smooths out local variations so that global trends can be seen clearly. Each rate is replaced by the average of itself and the rates of the neighboring districts.
(Data manipulation)

3.3.2. Distinction between exploring and modeling spatial data

Exploratory methods do not involve any explicit model for the data. However, several exploratory techniques involve an informal comparison of some summary of the data with what a model would lead us to expect. Hence models do enter into exploratory techniques.
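For a concrete exploratory technique to keep in mind here, the spatial moving average of Section 3.3.1 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the zone names, the rates, and the `neighbours` adjacency table are hypothetical, and neighbors are taken to be the zones that share a boundary.

```python
def spatial_moving_average(rates, neighbours):
    """Replace each zone's rate by the mean of itself and its neighbours.

    rates      -- dict mapping zone name -> rate (hypothetical data)
    neighbours -- dict mapping zone name -> list of adjacent zone names
    """
    smoothed = {}
    for zone, rate in rates.items():
        # The zone's own rate plus the rates of all adjacent zones.
        values = [rate] + [rates[n] for n in neighbours.get(zone, [])]
        smoothed[zone] = sum(values) / len(values)
    return smoothed


# Hypothetical example: three zones in a row, A - B - C.
rates = {"A": 10.0, "B": 20.0, "C": 60.0}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(spatial_moving_average(rates, neighbours))
```

The smoothed values are derived from the data alone; no probability model is assumed, which is what keeps this on the exploratory side of the line.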
The distinction is one of degree: it depends on the extent to which formal comparisons are made between the data and a model. Moreover, models depend on certain assumptions.

E.g. Stan Openshaw (a quantitative geographer) tried to detect clusters in point distributions of the incidence of childhood leukemia. For this purpose he used a technique that exhaustively compares the observed intensity of events in circles of varying radius, centered on a fine grid imposed over the study area. The aim was to detect whether the cases within each circle were consistent with a random pattern. Circles with significant discrepancies are identified and retained for later display and investigation.

This technique involves a model of a random pattern and performs repeated formal statistical comparisons with this model. However, the validity of such comparisons does not depend on the assumption of any specific alternative model. The technique detects clusters; it does not search for an explanation of the process by which such clusters occur. Therefore, this form of analysis makes few a priori assumptions about the data and is fully in line with exploratory methods.

3.4. Modeling Spatial Data

Models are mathematical abstractions of reality, not reality itself. A statistical model involves a combination of both:
- Data
- Reasonable assumptions
about the nature of the phenomena being modeled.

The assumptions arise from:
- Background theoretical knowledge about the behavior of the phenomena
- The results of previous analyses of the same or similar phenomena
- The judgement and intuition of the modeler

A statistical model for a stochastic process consists of specifying a probability distribution for the random variable or variables that represent the phenomena. Once a probability distribution is fully specified, there is effectively nothing further that can be said about the behavior of the process.
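To make the idea of a fully specified probability distribution concrete, the sketch below returns to the circle-based cluster search described in Section 3.3.2. It is a simplified illustration and not Openshaw's actual procedure. It assumes a rectangular study area, a completely random (homogeneous Poisson) reference model whose intensity is estimated as the number of events divided by the area, and a fixed threshold `alpha`. All function and parameter names are hypothetical. A real analysis of disease cases would compare counts with the population at risk rather than with area.

```python
import math


def poisson_upper_tail(k, mu):
    """Return P(N >= k) for N ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)          # P(N = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i            # P(N = i) from P(N = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def scan_circles(points, width, height, radii, step, alpha=0.002):
    """Flag circles whose event count is unusually high under the
    assumed random (Poisson) model.

    Returns a list of (centre_x, centre_y, radius, count, p_value).
    """
    intensity = len(points) / (width * height)   # events per unit area
    flagged = []
    for r in radii:
        if 2 * r > width or 2 * r > height:
            continue                              # circle does not fit
        expected = intensity * math.pi * r * r    # mean count in a circle
        # Keep circles fully inside the region (a crude guard area).
        n_x = int((width - 2 * r) / step + 1e-9) + 1
        n_y = int((height - 2 * r) / step + 1e-9) + 1
        for i in range(n_x):
            for j in range(n_y):
                cx, cy = r + i * step, r + j * step
                count = sum((x - cx) ** 2 + (y - cy) ** 2 <= r * r
                            for x, y in points)
                p = poisson_upper_tail(count, expected)
                if p < alpha:
                    flagged.append((cx, cy, r, count, p))
    return flagged
```

Because many overlapping circles are tested, the flagged circles are candidates for later display and investigation rather than formal findings, which matches the exploratory role the technique plays in the text.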
A fitted model is evaluated, and the results may lead to modifying the assumptions, using a different model, or updating the existing one.

E.g. Consider modeling levels of ozone in a large rural area R. The ozone level at each location s in R will vary during the day and from day to day. A model based on linear regression can be fitted to explain the distribution of ozone levels.

Figure 3.1. Ozone levels

Basic Assumptions:
1. The random variables { Y(s), s ∈ R } are independent.
2. The probability distributions of the random variables Y(s) differ only in their mean values.
3. The mean value is a simple linear function of location.
4. Y(s) has a normal distribution about this mean with the same constant variance, σ².

The model:

$$Y(s) = \beta_0 + \beta_1 s_1 + \beta_2 s_2 + \varepsilon(s), \qquad \varepsilon(s) \sim N(0, \sigma^2)$$

where s1 and s2 are the spatial coordinates of s.

The assumptions provide a framework under which the final model specification reduces to a problem of estimating unknown parameters. The parameters βi (i = 0, 1, 2) can be estimated by the method of Maximum Likelihood.

The next step is to test the reliability of the model, or its goodness of fit. This can be achieved by using hypothesis-testing methods. Hypothesis testing, which involves comparing the fit of a hypothesized model with that of an alternative, is in fact one facet of statistical modeling. The question at this step is: does a model in which certain parameters have prespecified values fit the data adequately compared with the alternative?

Figure 3.2. Analysis of spatial data

3.5. Practical Problems of Spatial Data Analysis

There are basically four types of problem that an analyst can face:
1. Problem of geographical scale
2. Lack of spatial indexing
3. Problem of edge or boundary effects
4. Problem of modifiable areal unit

Problem 1: Geographical scale at which analyses are performed.

Spatial data analysis is concerned with detecting and modeling spatial pattern.
However, a pattern at one geographical scale may be simply random variation in another pattern at a different scale.

E.g. Local variations in disease rates may disappear when viewed at the national scale.

The scale to which a spatial analysis relates depends on:
- The phenomena under study
- The objective of the analysis
- The scale at which the data were collected

Problem 2: Lack of spatial indexing or ordering in space.

An indexing implies that we have a natural notion of what is next or previous. On a regular grid there is a reasonably natural ordering of locations. However, spatial data are not indexed most of the time. While some data (such as those from satellites) come in the form of a regular grid or lattice, much spatial data is provided for a patchwork quilt of areal units or for an irregularly distributed set of sites.

E.g. For areal units, we can only speak of the neighborhood of a zone in terms of the zones that share a common boundary with it.

Problem 3: Problem of edge or boundary effects.

In the middle of a study area, a site or zone is likely to be surrounded by others, i.e. the zone has neighbors on all sides. However, at the edge of the map or study region, the neighbors extend in one direction only. In the spatial domain, a potentially large share of the observations lies near the edge of the map. Therefore edge effects play a critical role. This problem can be overcome by leaving a guard area.

Problem 4: Problem of modifiable areal unit.

When data are measurements on a set of zones, they are often aggregates of measurements on the households or individuals living in each zone. For the sake of confidentiality, the data are released for arbitrary areal units. The important point is that any result from the analysis of these area aggregations is usually conditional on the set of zones used. With a different set of aggregation areas, the result is subject to change.

3.6. Computers and Spatial Data Analysis

Q: Given that some spatial analysis capabilities are available in widely used systems, is there a need for spatial analysis functions beyond those currently provided by GIS?

A: At present, yes!

E.g. A GIS is currently able to overlay a set of points (childhood cancer cases) onto a set of polygons (buffer zones constructed along high-voltage power lines). The GIS can then count how many points lie within particular polygons by performing a “point-in-polygon” operation.

However, it is hard to find a system that evaluates the statistical significance of the association between the set of points and the set of polygons. If we want to know whether there is a statistically significant association between the incidence of childhood cancer and proximity to high-voltage power lines, we cannot do this readily by using a GIS.

There are several ways to use computers in spatial data analysis. Most of the time, spatial analysis techniques are coupled with a GIS.

3.6.1. Methods of coupling GIS and spatial data analysis

There are four different methods of using spatial analysis techniques with GIS:
- Full integration
- Loose coupling
- Close coupling
- Special combinations

Full integration: Every method for exploratory spatial analysis and modeling is available within the GIS.

Loose coupling: Data are exported from the GIS for use within a spatial statistical framework (i.e. the GIS and separate spatial analysis software talk to each other).

Close coupling: Spatial analysis routines are called from within the GIS (which requires use of the macro language capabilities of the GIS).

Special combinations: A self-contained spatial analysis system is developed for a specific purpose (Case I), OR spatial analysis and GIS functions are added to a standard statistical package (Case II).

E.g. (Case I)
- GAM (Geographical Analysis Machine, developed for detecting the existence of clusters)
- SpaceStat (developed for exploratory data analysis and model fitting in spatial econometrics)

E.g. (Case II)
- REGARD (by John Haslett of Trinity College, Dublin): operates on Mac
- S-Plus (used by professional statisticians): operates on IBM
- SPLANCS (can be coupled with Arc/Info, i.e. close coupling): operates on Unix
- XLISP-STAT: operates on Unix
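
Returning to the point-in-polygon example of Section 3.6, the gap between counting and testing can be illustrated with a small Python sketch in the spirit of loose coupling, where coordinates are exported from a GIS and analyzed elsewhere. Everything in it is a hypothetical assumption: the coordinates, the buffer polygon, and above all the reference model, which treats cases as uniformly and independently scattered over the study area. A real study would need the population at risk instead of area.

```python
from math import comb


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):              # edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                   # crossing lies to the right
                inside = not inside
    return inside


def polygon_area(polygon):
    """Area of a simple polygon by the shoelace formula."""
    n = len(polygon)
    total = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Hypothetical data: a square study area, a strip-shaped buffer, some cases.
study_area = [(0, 0), (10, 0), (10, 10), (0, 10)]
buffer_zone = [(4, 0), (6, 0), (6, 10), (4, 10)]
cases = [(5, 1), (5, 3), (4.5, 6), (5.5, 8), (1, 2), (8, 7), (5, 9), (9, 1)]

# Step 1: what the GIS already does, counting points inside the polygon.
inside = sum(point_in_polygon(x, y, buffer_zone) for x, y in cases)

# Step 2: what it does not do, comparing the count with a reference model.
share = polygon_area(buffer_zone) / polygon_area(study_area)
p_value = binomial_upper_tail(inside, len(cases), share)
print(inside, share, p_value)
```

The first step reproduces the overlay count; the second shows the kind of statistical comparison that, according to the text, has to come from software outside the GIS.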