Transcript Document

Lecture 2:
Data Exploration
Jianfei Chen
School of Geographical Sciences
Guangzhou University
Guangzhou, 510405 China
Chapter Outline
9.1 Introduction
9.2 Data Exploration
9.2.1 Descriptive Statistics
9.2.2 Graphs
9.2.3 Dynamic Graphs
9.2.4 Data Exploration and GIS
9.3 Vector Data Query
9.3.1 Attribute Data Query
Box 9.1 Query Operations in ArcGIS
9.3.1.1 Logical Expressions
9.3.1.2 Type of Operation
9.3.1.3 Examples of Query Operation
9.3.1.4 Relational Database Query
9.3.1.5 Use SQL to Query a Database
Box 9.2 More Examples of SQL Statement
9.3.2 Spatial Data Query
9.3.2.1 Feature Selection by Cursor
9.3.2.2 Feature Selection by Graphic
9.3.2.3 Feature Selection by Spatial Relationship
Box 9.3 Expressions of Spatial Relationship in ArcView
9.3.2.4 Combination of Attribute and Spatial Data Queries
9.4 Raster Data Query
9.4.1 Query by Cell Value
9.4.2 Query Using Graphic Method
9.5 Charts
9.6 Geographic Visualization
9.6.1 Data Classification
9.6.1.1 Data Classification for Visualization
Box 9.4 Data Classification Methods
9.6.1.2 Data Classification for Creating New Data
9.6.2 Data Aggregation
9.6.3 Map Comparison
Applications: Data Exploration
Task 1: Select Feature by Location
Task 2: Select Feature by Graphic
Task 3: Query Attribute Data from a Joint Table
Task 4: Query Attribute Data from a Relational Database
Task 5: Combine Spatial and Attribute Data Query
Task 6: Query Raster Data
What is Data Exploration?
Data exploration is data-centered query and analysis. It allows
the user to examine the general trends in the data, to take a
close look at data subsets, and to focus on possible
relationships between datasets.
The purpose of data exploration is to better understand the data
and to provide a starting point in formulating research
questions and hypotheses.
Data Exploration and GIS
1. Data exploration in GIS is functionally similar to exploratory data analysis and dynamic graphics in statistics.
2. Exploratory data analysis advocates the use of a variety of techniques for examining data more effectively as the first step in statistical analysis and as a precursor to more formal and structured data analysis. Dynamic graphics enhances exploratory data analysis by using multiple and dynamically linked windows and by letting the user directly manipulate data points in charts and diagrams.
3. Data exploration in GIS uses interactive and dynamically linked visual tools. Maps (both vector- and raster-based), graphs, and tables are displayed in multiple windows and dynamically linked.
Graphics for Statistics
Line Graph
Bubbleplot
Bar Chart
Cumulative Frequency Graph
Boxplot
Scatter Plot
Graphs for Spatial Data
3D plot
Variogram cloud
Dynamic Graphs: Brushing
Vector Data Query
1. Attribute data query
2. Spatial data query
Attribute data query
1. Logical expressions
2. Type of operation
3. Relational database query
4. SQL
Logical Expressions
1. A simple logical expression contains two operands and a logical operator, e.g., "class" = 2.
2. Boolean connectors such as AND, OR, XOR, and NOT connect two or more expressions in a query statement.
The shaded portion represents the
complement of data subset A (top),
the union of data subsets A and B
(middle), and the intersection of A
and B (bottom).
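Boolean connectors correspond to set operations on the records that each expression selects, which is what the shaded diagrams depict. The Python sketch below illustrates this on a small hypothetical table. The fields "class" and "area" and all values are made up for illustration and do not come from the lecture.

```python
# Hypothetical attribute records (fields and values are illustrative only).
records = [
    {"id": 1, "class": 2, "area": 120.0},
    {"id": 2, "class": 1, "area": 300.0},
    {"id": 3, "class": 2, "area": 450.0},
    {"id": 4, "class": 3, "area": 80.0},
]
all_ids = {r["id"] for r in records}

def select(predicate):
    """Return the ids of records for which the logical expression is true."""
    return {r["id"] for r in records if predicate(r)}

# Simple logical expressions: two operands and one logical operator each.
A = select(lambda r: r["class"] == 2)      # "class" = 2   -> {1, 3}
B = select(lambda r: r["area"] > 200.0)    # "area" > 200  -> {2, 3}

# Boolean connectors as set operations on the selected subsets.
a_and_b = A & B        # AND: intersection
a_or_b = A | B         # OR: union
a_xor_b = A ^ B        # XOR: true for exactly one of the two expressions
not_a = all_ids - A    # NOT: complement of A

print(a_and_b, a_or_b, a_xor_b, not_a)
```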
Three types of operation may be performed on the subset
of 40 records: add more records to the subset (+2),
remove records from the subset (-5), or select a smaller
subset (20).
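The three operations can be expressed as set operations on record identifiers. This minimal sketch assumes a hypothetical table of 100 records with ids 1 to 100. The queries are invented so that the counts match the caption: 40 selected records become 42, 35, or 20.

```python
# Hypothetical table of 100 records, identified by id 1..100.
table = set(range(1, 101))

# Current selection: a subset of 40 records.
subset = set(range(1, 41))

# 1. Add to selection: union with the result of a new query on the table.
new_query = {i for i in table if 39 <= i <= 42}   # only 41 and 42 are new
added = subset | new_query

# 2. Remove from selection: difference with the result of a new query.
removed = subset - {36, 37, 38, 39, 40}

# 3. Select from selection: evaluate a new (hypothetical) criterion
#    only on the records already in the subset.
smaller = {i for i in subset if i % 2 == 0}

print(len(added), len(removed), len(smaller))   # 42 35 20
```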
[Figure: the soil theme table and Comp.dbf are related by musym; Comp.dbf and Forest.dbf by muid; Forest.dbf and Plantnm.dbf by plantsym.]
The keys relating three dBASE files in MUIR and the feature
attribute table. The field comname in plantnm.dbf contains the
common plant names.
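Following the keys from the feature attribute table to the common plant names amounts to a chain of lookups. The sketch below uses Python lists of dictionaries as stand-ins for the dBASE files. Only the field names musym, muid, plantsym, and comname come from the figure. The polygon_id field and all values are hypothetical.

```python
# Hypothetical stand-ins for the feature attribute table and three dBASE files.
soil_theme = [{"polygon_id": 1, "musym": "10"},
              {"polygon_id": 2, "musym": "21"}]
comp = [{"musym": "10", "muid": "M001"}, {"musym": "21", "muid": "M002"}]
forest = [{"muid": "M001", "plantsym": "PSME"},
          {"muid": "M001", "plantsym": "PIPO"},
          {"muid": "M002", "plantsym": "ABGR"}]
plantnm = [{"plantsym": "PSME", "comname": "Douglas-fir"},
           {"plantsym": "PIPO", "comname": "ponderosa pine"},
           {"plantsym": "ABGR", "comname": "grand fir"}]

def common_names(musym):
    """Follow the keys musym -> muid -> plantsym -> comname."""
    muids = {r["muid"] for r in comp if r["musym"] == musym}
    plantsyms = {r["plantsym"] for r in forest if r["muid"] in muids}
    return sorted(r["comname"] for r in plantnm if r["plantsym"] in plantsyms)

for polygon in soil_theme:
    print(polygon["polygon_id"], common_names(polygon["musym"]))
```

Each key narrows the search in the next file, so a query on a soil polygon can reach plant names stored two tables away.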
Relational Database Query
Relation 1: Parcel
PIN    Sale date   Acres   Zone code   Zoning
P101   1-10-98     1.0     1           residential
P102   10-6-68     3.0     2           commercial
P103   3-7-97      2.5     2           commercial
P104   7-30-78     1.0     1           residential
Relation 2: Owner
PIN    Owner name
P101   Wang
P101   Chang
P102   Smith
P102   Jones
P103   Costello
P104   Smith
The key PIN relates the parcel and owner tables and allows use of SQL with both tables.
SQL
SQL (Structured Query Language) is a standard
query language designed for relational
databases.
The basic syntax of SQL, built on the keywords select, from, and where, is
select <attribute list>
from <table>
where <condition>
The select keyword selects field(s) from a database,
the from keyword selects table(s) from a database, and
the where keyword specifies the condition or criteria
for data query.
Simple SQL
select Sale_date
from Parcel
where PIN = 'P101'
More SQL
select Parcel.Sale_date
from Parcel, Owner
where Parcel.PIN = Owner.PIN AND Owner_name = 'Costello'
Alternative where clauses for the same select and from:
where Parcel.PIN = Owner.PIN AND Owner_name like 'C%'
where Parcel.PIN = Owner.PIN AND Owner_name in ('Wang', 'Smith', 'Jones')
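You can try these statements against the Parcel and Owner relations. This sketch uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for whatever relational database a GIS package uses. The underscored column names (Sale_date, Zone_code, Owner_name) are assumed to match the SQL above.

```python
import sqlite3

# In-memory SQLite database holding the Parcel and Owner relations.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Parcel (PIN TEXT, Sale_date TEXT, Acres REAL,
                         Zone_code INTEGER, Zoning TEXT);
    CREATE TABLE Owner (PIN TEXT, Owner_name TEXT);
""")
con.executemany("INSERT INTO Parcel VALUES (?, ?, ?, ?, ?)", [
    ("P101", "1-10-98", 1.0, 1, "residential"),
    ("P102", "10-6-68", 3.0, 2, "commercial"),
    ("P103", "3-7-97", 2.5, 2, "commercial"),
    ("P104", "7-30-78", 1.0, 1, "residential"),
])
con.executemany("INSERT INTO Owner VALUES (?, ?)", [
    ("P101", "Wang"), ("P101", "Chang"), ("P102", "Smith"),
    ("P102", "Jones"), ("P103", "Costello"), ("P104", "Smith"),
])

def sale_dates(sql):
    """Run a one-column query; sort the text values so row order is fixed."""
    return sorted(row[0] for row in con.execute(sql))

# Simple SQL: a single table.
simple = sale_dates("select Sale_date from Parcel where PIN = 'P101'")

# More SQL: the condition Parcel.PIN = Owner.PIN relates the two tables.
JOIN = ("select Parcel.Sale_date from Parcel, Owner "
        "where Parcel.PIN = Owner.PIN AND ")
costello = sale_dates(JOIN + "Owner_name = 'Costello'")
starts_with_c = sale_dates(JOIN + "Owner_name like 'C%'")
listed = sale_dates(JOIN + "Owner_name in ('Wang', 'Smith', 'Jones')")

print(simple, costello, starts_with_c, listed)
```

P102 has two listed owners (Smith and Jones), so the in query returns its sale date twice. Writing select distinct Parcel.Sale_date would return each date once. SQLite's like is also case-insensitive for ASCII letters by default, and other databases may behave differently.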
Spatial Data Query
1. Feature selection by graphics
2. Feature selection by spatial relationship
3. Combination of attribute and spatial data queries
A circle with a specified radius is drawn around
Sun Valley. The circle is then used as a graphic
object to select point features within the
circular area.
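Selecting points inside a circle reduces to a distance test against the circle's center. This minimal sketch assumes projected planar coordinates in miles. The point names and positions are hypothetical. Treating points exactly on the circle as selected is a choice made here; the lecture does not specify it.

```python
import math

# Hypothetical projected coordinates in miles.
center = (10.0, 10.0)      # stands in for Sun Valley
radius = 5.0               # specified radius of the selection circle
points = {
    "A": (12.0, 13.0),     # about 3.61 miles from the center
    "B": (16.0, 10.0),     # 6.0 miles from the center
    "C": (10.0, 5.0),      # exactly on the circle
}

def select_within_circle(points, center, radius):
    """Return names of points whose distance to center is <= radius."""
    cx, cy = center
    return sorted(name for name, (x, y) in points.items()
                  if math.hypot(x - cx, y - cy) <= radius)

print(select_within_circle(points, center, radius))   # ['A', 'C']
```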
Feature Selection by Spatial Relationship
1. Containment: selects features that fall completely within the features used for selection. Examples include finding schools within a selected county, and finding state parks within a selected state.
2. Intersect: selects features that intersect the features used for selection. Examples include selecting land parcels that intersect a proposed road, and finding settlements that intersect an active fault line.
3. Proximity/Adjacency: selects features that are within a specified distance of the features used for selection (proximity) or that touch them at zero distance (adjacency). Examples of spatial adjacency include selecting land parcels that are adjacent to a flood zone, and finding vacant lots that are adjacent to a new theme park.
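Each of the three relationships can be reduced to elementary geometric tests. This is a simplified pure-Python sketch with the following assumptions:

- Selecting features are simple polygons.
- Containment is tested only for point features.
- Intersect is tested between a polygon and a polyline.
- Proximity uses the distance from a point to the polygon boundary.

Adjacency between polygons (a shared boundary) is not covered. All coordinates and feature names are hypothetical. Real work would normally rely on a GIS or a geometry library.

```python
import math

def edges(poly):
    """Consecutive vertex pairs of a closed polygon ring."""
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]

def orient(a, b, c):
    """Return 1, -1, or 0 for a left turn, right turn, or collinear a-b-c."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_segment(a, b, p):
    """For p collinear with a-b: True if p lies between a and b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def segments_intersect(p1, p2, q1, q2):
    """Orientation test for segment p1-p2 against segment q1-q2."""
    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(p1, p2, q1)) or
            (o2 == 0 and on_segment(p1, p2, q2)) or
            (o3 == 0 and on_segment(q1, q2, p1)) or
            (o4 == 0 and on_segment(q1, q2, p2)))

def point_in_polygon(pt, poly):
    """Ray casting; points exactly on the boundary may go either way."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in edges(poly):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))          # clamp the projection to the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def contains(poly, pt):                # containment (point features only)
    return point_in_polygon(pt, poly)

def intersects(poly, line):            # intersect (polygon vs. polyline)
    if any(point_in_polygon(v, poly) for v in line):
        return True
    return any(segments_intersect(a, b, c, d)
               for a, b in edges(poly) for c, d in zip(line, line[1:]))

def within_distance(poly, pt, d):      # proximity (point to polygon)
    if point_in_polygon(pt, poly):
        return True
    return min(point_segment_distance(pt, a, b) for a, b in edges(poly)) <= d

# Hypothetical features in arbitrary map units.
county = [(0, 0), (10, 0), (10, 10), (0, 10)]
schools = {"S1": (2, 3), "S2": (12, 5)}
parcels = {"P1": [(0, 0), (4, 0), (4, 4), (0, 4)],
           "P2": [(6, 0), (10, 0), (10, 4), (6, 4)]}
road = [(-1, 2), (5, 2)]
park = [(20, 20), (30, 20), (30, 30), (20, 30)]
lots = {"L1": (31, 25), "L2": (40, 40)}

schools_in_county = [s for s, p in schools.items() if contains(county, p)]
parcels_on_road = [k for k, poly in parcels.items() if intersects(poly, road)]
lots_near_park = [k for k, p in lots.items() if within_distance(park, p, 2)]
print(schools_in_county, parcels_on_road, lots_near_park)
```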
Combination of Attribute and Spatial Data Queries
Find gas stations that are within one mile of a freeway exit in southern California and have annual revenues exceeding $2 million:
1. Locate all freeway exits in the study area, and draw a circle with a 1-mile radius around each exit. Select gas stations within the circles through spatial data query. Then use attribute data query to find gas stations that have annual revenues exceeding $2 million.
2. Locate all gas stations in the study area, and select those stations with annual revenues exceeding $2 million through attribute data query. Next, use spatial data query to narrow the selection of gas stations to those within 1 mile of a freeway exit.
[Figure panels: Spatial data query; Attribute data query]
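The two approaches should select the same stations, because each keeps exactly the stations that satisfy both conditions. This sketch uses the following assumed inputs:

- Hypothetical station coordinates (planar miles) and annual revenues.
- Hypothetical exit locations.
- A straight-line distance test standing in for the 1-mile circles.

```python
import math

# Hypothetical gas stations: ((x, y) in miles, annual revenue in dollars).
stations = {
    "G1": ((0.5, 0.5), 2_500_000),
    "G2": ((0.3, 0.4), 1_200_000),
    "G3": ((8.0, 8.0), 3_000_000),
    "G4": ((5.2, 5.0), 2_100_000),
}
exits = [(0.0, 0.0), (5.0, 5.0)]      # hypothetical freeway exits
RADIUS = 1.0                          # miles
THRESHOLD = 2_000_000                 # dollars

def near_exit(xy):
    """Spatial data query: within RADIUS of at least one exit."""
    return any(math.dist(xy, e) <= RADIUS for e in exits)

def high_revenue(revenue):
    """Attribute data query: annual revenue exceeding THRESHOLD."""
    return revenue > THRESHOLD

# Approach 1: spatial query first, then attribute query on that subset.
spatial_first = {n for n, (xy, _) in stations.items() if near_exit(xy)}
approach_1 = {n for n in spatial_first if high_revenue(stations[n][1])}

# Approach 2: attribute query first, then spatial query on that subset.
attribute_first = {n for n, (_, rev) in stations.items() if high_revenue(rev)}
approach_2 = {n for n in attribute_first if near_exit(stations[n][0])}

print(sorted(approach_1), sorted(approach_2))
```

The order can still matter in practice. Running the more selective query first leaves fewer features for the second query to examine.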
Geographic Visualization
Geographic visualization, sometimes called
cartographic visualization, refers to the use of
maps for setting up a context for processing
visual information and for formulating
research questions or hypotheses. Geographic
visualization therefore has the same objective
and the same types of interactivity as
exploratory data analysis.
Methods for Geographic Visualization
1. Data classification
2. Spatial aggregation
3. Map comparison
The top map shows the rate of
unemployment in 1997 as either
above or below the national
average of 4.9%. The bottom
map uses the mean and standard
deviation (SD) for data
classification.
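Both classification schemes in the caption can be computed directly from the attribute values. This sketch uses hypothetical state unemployment rates; only the 4.9% national average comes from the caption. It places class breaks at the mean and one standard deviation either side, which is one common layout for a mean-and-SD scheme. The lecture does not say which breaks were used, or whether the SD was the sample or population value.

```python
import statistics

# Hypothetical unemployment rates (%) by state.
rates = {"S1": 3.2, "S2": 4.5, "S3": 5.1, "S4": 6.8, "S5": 4.0, "S6": 7.9}
NATIONAL_AVERAGE = 4.9   # 1997 national average from the caption

# Top map: two classes, above or at/below the national average.
two_class = {s: ("above" if r > NATIONAL_AVERAGE else "at or below")
             for s, r in rates.items()}

# Bottom map: breaks at mean - 1 SD, mean, and mean + 1 SD.
values = list(rates.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample SD; statistics.pstdev is the population SD

def sd_class(r):
    if r < mean - sd:
        return "below -1 SD"
    if r < mean:
        return "-1 SD to mean"
    if r < mean + sd:
        return "mean to +1 SD"
    return "above +1 SD"

sd_classes = {s: sd_class(r) for s, r in rates.items()}
print(two_class)
print(sd_classes)
```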
The top map shows percent
population change by state,
1990–2000. The darker the
symbol, the higher the percent
increase. The bottom map shows
percent population change by
region.
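Aggregating from states to regions changes the units being mapped. The sketch below uses hypothetical populations to show one reasonable way to compute a regional percent change: sum the state populations for each year, then compute the percent change. Averaging the state percentages would instead give small and large states equal weight. The lecture does not state which method the bottom map used.

```python
# Hypothetical states: (region, population 1990, population 2000).
states = {
    "State A": ("West", 1_000_000, 1_300_000),
    "State B": ("West", 4_000_000, 4_400_000),
    "State C": ("East", 2_000_000, 2_100_000),
    "State D": ("East", 500_000, 550_000),
}

def percent_change(old, new):
    return 100.0 * (new - old) / old

by_state = {s: percent_change(p90, p00) for s, (_, p90, p00) in states.items()}

# Sum populations within each region, then compute the regional percent change.
totals = {}
for region, p90, p00 in states.values():
    t90, t00 = totals.get(region, (0, 0))
    totals[region] = (t90 + p90, t00 + p00)
by_region = {r: percent_change(t90, t00) for r, (t90, t00) in totals.items()}

print(by_state)
print(by_region)
```

In this example, the West's regional change is 14%, while a plain average of its two state values would be 20%.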
An example of using multiple
maps in data exploration. In
this view of deer relocations
in SE Alaska, the focus is on
the distribution of deer
relocations along the
clearcut/old forest edge.
A bivariate map showing the combinations of (1)
rate of unemployment in 1997, either > or <=
the national average, and (2) rate of income
change 1996–98, either > or <= the national
average.
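A bivariate map crosses two binary classifications, giving four possible classes. The sketch below uses hypothetical state values and a hypothetical national average for income change. The 4.9% unemployment average comes from the earlier caption.

```python
# Thresholds: 4.9 is the 1997 national unemployment average quoted earlier;
# the income-change average is a hypothetical placeholder.
UNEMPLOYMENT_AVG = 4.9
INCOME_CHANGE_AVG = 5.0

# Hypothetical (unemployment rate %, income change 1996-98 %) per state.
states = {
    "S1": (3.8, 6.2),
    "S2": (6.1, 4.0),
    "S3": (5.5, 7.3),
    "S4": (4.2, 3.1),
}

def bivariate_class(unemployment, income_change):
    """Return a pair of '>' or '<=' labels, one per variable."""
    u = ">" if unemployment > UNEMPLOYMENT_AVG else "<="
    i = ">" if income_change > INCOME_CHANGE_AVG else "<="
    return (u, i)

classes = {s: bivariate_class(*v) for s, v in states.items()}
for state, (u, i) in sorted(classes.items()):
    print(f"{state}: unemployment {u} avg, income change {i} avg")
```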
Thank You!