Transcript ppt

Intro to advanced GIS and
a review of basic GIS
Topic 1
Jan. 14, 2008
Outline




About the class setting
Materials to be covered and scheduled
Quick review of GIS basics
First lab (Lab 1)
Materials to be covered and
scheduled


A review of basic GIS (1)
Spatial data analysis







Vector data analysis (2,3)
Raster data analysis (4,5)
Spatial statistics (6)
Geostatistical analysis (7,8)
3-D analysis (9)
Geoprocessing (10,11)
Other topics (12)
We do not use one single
book, because there is
no single book covering
all the materials I will
cover in the class.
1. I will assign many ESRI e-books for you to read.
2. I will assign many papers for you to
read.
3. I will give occasional quizzes
to see whether you have
read them. Each
quiz will count 2 points
toward your final grade.
4. For other policies, refer to the
syllabus.
What is GIS ?
• A computer system for
- collecting,
- storing,
- manipulating,
- analyzing,
- displaying, and
- querying
geographically related
information.
In general, a GIS covers 3
components

Computer system

Hardware




Computer, plotter, printer, digitizer
Software and appropriate procedures
Spatially referenced or geographic
data
People to carry out various
management and analysis tasks
Geographic Data

Geospatial data tells
you where it is and
attribute data tells
you what it is.
Metadata describes
both geospatial and
attribute data.
In GIS, we call geographic data GIS data or spatial data
1. Geospatial data
Traditional method

The traditional way to represent geographic data is
with paper-based maps




Geology map
Topographic map
City street map (we still use it a lot)
...
Characteristics of spatial data

“mappable” characteristics:





Location (coordinate system, to be covered in a later
lecture)
Size is measured by the extent (length,
area, perimeter) of the feature
Shape is the geometric form (point, line, area)
of the feature
Discrete or continuous
Spatial relationships
Discrete and continuous

Discrete data are distinct features that
have definite boundaries and identities


A district, houses, towns, agricultural fields,
rivers, highways, …
Continuous data have no defined borders
or distinct values; instead, there is a
transition from one value to another

Temperature, precipitation, elevation, ...
GIS: a simplified view of the real
world

Discrete features



Points
Lines
Areas
Networks

A series of interconnecting
lines



Continuous features

Road network
River network
Sewage network
Surfaces


Elevation surface
Temperature surface
Problems caused by the simplified
features may still exist, but let's live with them

Dynamic nature (not static)
- Forests grow
- River channels change
- Cities expand or decline
Identification of discrete and continuous features
- Is a road a line or an area?
- Scale
Some may not fit any type of feature: fuzzy
boundaries
- Transition area between woodland and grassland
Let's not worry about these problems now. Just keep them in mind.
Points



A point is a zero-dimensional object and
has only the property of location (x, y)
Points can be used to model features
such as a well, building, power pole,
sample location, etc.
Other names for a point are vertex and
node
Point
Lines




A line is a one-dimensional object
that has the property of length
Lines can be used to represent roads,
streams, faults, dikes, marker beds,
boundaries, contacts, etc.
A line is also called an edge, link,
chain, or arc
In an ArcInfo coverage, an arc starts
with a node, has zero or more
vertices, and ends with a node
Line
Areas (Polygons)



A polygon is a two-dimensional object
with properties of area and perimeter
A polygon can represent a city, geologic
formation, dike, lake, river, etc.
Other names for a polygon are face and zone
Area
Topology needed

A collection of numeric data which clearly describes
adjacency, containment (coincidence), and
connectivity between map features and which can
be stored and manipulated by a computer.

A set of rules on how objects relate to each other

Major difference in file formats

Higher level objects have special topology rules
Topology
© Paul Bolstad, GIS Fundamentals
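As a rough illustration of adjacency being stored as data that a computer can manipulate, the sketch below derives which polygons share an edge. It is only a minimal example under stated assumptions: the polygon ids, the coordinates, and the function name shared_edge_adjacency are hypothetical, and real topological formats such as coverages store this information in their own structures.

```python
from collections import defaultdict


def shared_edge_adjacency(polygons):
    """Return {polygon_id: set of ids of polygons sharing an edge with it}.

    polygons maps an id to a list of (x, y) vertices in ring order
    (the first vertex is not repeated at the end).
    Two polygons count as adjacent only when they share an identical edge.
    """
    edge_owners = defaultdict(set)
    for pid, ring in polygons.items():
        for i, start in enumerate(ring):
            end = ring[(i + 1) % len(ring)]
            # Sort the endpoints so the edge direction does not matter.
            edge_owners[tuple(sorted((start, end)))].add(pid)

    adjacency = {pid: set() for pid in polygons}
    for owners in edge_owners.values():
        for pid in owners:
            adjacency[pid] |= owners - {pid}
    return adjacency


# Hypothetical parcels: A and B share the edge x = 1, C stands alone.
parcels = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}
adjacency = shared_edge_adjacency(parcels)
print(adjacency)
```

Once adjacency is stored this way, a question such as "which parcels border parcel A?" becomes a table lookup rather than a geometric computation.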
Two basic data models to
represent these features

Raster spatial data model



Define space as an array of equally sized cells arranged in rows and
columns. Each cell contains an attribute value and location
coordinates
Individual cells as building blocks for creating images of point, line,
area, network and surface
Continuous raster
- Numeric values range smoothly from one location to another, for
example, DEM, temperature, remote sensing images, etc.
Discrete raster
- Relatively few possible values, which tend to repeat in adjacent cells, for
example, land use, soil types, etc.
Vector spatial data model

Uses x-, y-coordinates to represent point, line, area, network, and
surface features
- A point is a single coordinate pair; a line or polygon is an ordered list of
vertices; attributes are associated with each feature
- Usually used for discrete features
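The two data models can be sketched with plain Python structures. This is an illustration only: the cell codes, the coordinates, and the helper names (count_cells, line_length, polygon_area) are assumptions made for the example, not part of any GIS package.

```python
import math

# Raster model: an array of equally sized cells in rows and columns.
# Hypothetical codes: 0 = grass, 1 = water, 2 = building.
land_cover = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 2],
]

# Vector model: x, y coordinates plus attributes for each feature.
well = {"geometry": (350.0, 220.0), "attributes": {"type": "well"}}
road = {
    "geometry": [(100.0, 100.0), (300.0, 150.0), (500.0, 150.0)],
    "attributes": {"type": "road"},
}
lake = {
    "geometry": [(200.0, 300.0), (400.0, 300.0), (400.0, 450.0), (200.0, 450.0)],
    "attributes": {"type": "lake"},
}


def count_cells(grid, value):
    """Count the raster cells that hold a given value."""
    return sum(row.count(value) for row in grid)


def line_length(vertices):
    """Length of a line stored as an ordered list of vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula) from its ordered vertices."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(count_cells(land_cover, 1))
print(line_length(road["geometry"]))
print(polygon_area(lake["geometry"]))
```

In the raster, the amount of water is a cell count that must be multiplied by the cell area; in the vector model, length and area come directly from the coordinates.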
DIGITAL SPATIAL DATA
• RASTER
• VECTOR
• Real World
Source: Defense Mapping School
National Imagery and Mapping Agency
Raster and Vector Data Models
[Figure: the same real-world scene shown as a raster representation (a 10 x 10 grid of coded cells) and as a vector representation (trees, house, and river plotted on X and Y axes running from 100 to 600)]
Source: Defense Mapping School
National Imagery and Mapping Agency
Example: Discrete raster
Example: continuous raster
Xie et al. 2005
Raster
Real world
Vector
Heywood et al. 2006
Effects of changing resolution
Heywood et al. 2006
Vector – Advantages and
Disadvantages

Advantages





Good representation of reality
Compact data structure
Topology can be described in a network
Accurate graphics
Disadvantages



Complex data structures
Simulation may be difficult
Some spatial analysis is difficult or impossible to
perform
Raster – Advantages and
Disadvantages

Advantages






Simple data structure
Easy overlay
Various kinds of spatial analysis
Uniform size and shape
Cheaper technology
Disadvantages





Large amount of data
Less “pretty”
Projection transformation is difficult
Different scales between layers can be a nightmare
May lose information due to generalization
GIS data formats (files)

Vector data
- Shapefiles
- Coverages
- TIN: Triangulated Irregular Network (e.g. elevation can be stored as
a TIN)
Raster data
- Grid (e.g. elevation can be stored as
a grid)
- Image (e.g. elevation can be stored as
an image; all remote sensing images are images)
Shapefiles




Nontopological
Advantage: no overhead to process
topology
Disadvantages: shared polygon boundaries are double
digitized; no topological data checking
At least 3 files: .shp, .shx, .dbf
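Because a shapefile is really a set of files, a quick check that the three required parts sit side by side can save trouble when copying data. The sketch below is a minimal example; the function name missing_shapefile_parts and the file name roads.shp are hypothetical, and the check is case-sensitive on file systems that distinguish case.

```python
import tempfile
from pathlib import Path

# The three parts named on the slide.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf"):
        (Path(folder) / f"roads{ext}").touch()
    missing = missing_shapefile_parts(Path(folder) / "roads.shp")

print(missing)
```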
Coverages




Original ArcInfo Format
Directory With Several Files
Database Files are stored in the Info
Directory
Uses Arc Node Topology



Containment (coincident)
Connectivity
Adjacency
TIN


A triangulated irregular network (TIN) is a data
model that is used to represent three-dimensional
objects. In this case, x, y, and z
values represent points. Using methods of
computational geometry, the points are
connected into what is called a triangulation,
forming a network of triangles. The lines of
the triangles are called edges, and the interior
area is called a face, or facet.
While the TIN model is somewhat more
complex than the simple point, line, and
polygon vector model, or the raster model, it is
actually quite useful for representing
elevations. For example a raster grid would
require grid cells to cover the entire surface of
a geographic area. Also, if we wanted to show
great detail we would have to have small grid
cells. Now, if the land area is relatively flat, we
would still need the small grid cells. However,
with a TIN we would not have to include so
many points on the flat areas, but could add
more points on the steep areas where we want
©Arthur J. Lembo
Cornell University
Components of a TIN





Nodes
Edges
Triangles
Hull
Topology
©Arthur J. Lembo
Cornell University
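One way to see how a TIN face represents elevation is to estimate z at a location inside a single triangle from its three nodes. The sketch below uses linear (barycentric) interpolation, which is a common choice but an assumption here, since the slides do not say how a surface value is read from a face. The function name interpolate_z and the node values are hypothetical.

```python
def interpolate_z(triangle, x, y):
    """Linearly interpolate z at (x, y) inside one TIN face.

    triangle is a sequence of three (x, y, z) nodes.
    Returns None when the point falls outside the face.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: nodes are collinear")
    # Barycentric weights: each node's influence at (x, y).
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # the point lies outside this face
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical face with node elevations of 100, 110, and 120.
face = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0)]
print(interpolate_z(face, 5.0, 5.0))
print(interpolate_z(face, 10.0, 10.0))
```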
Grid Properties






Each Grid Cell holds one
value even if it is empty.
A cell can hold an index
standing for an attribute.
Cell resolution is given as its
size on the ground.
Point and Lines move to the
center of the cell.
Minimum line width is one
cell.
Rasters are easy to read
and write, and easy to draw
on the screen.
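The statement that points move to the center of the cell can be sketched as a pair of conversions between ground coordinates and grid indices. The example assumes that row 0 is at the top of the grid and that the grid is anchored at its upper-left corner (x_min, y_max); the function names and numbers are hypothetical.

```python
def cell_index(x, y, x_min, y_max, cell_size):
    """Return the (row, col) of the cell containing ground point (x, y).

    Assumes row 0 is the top row and col 0 is the leftmost column.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


def cell_center(row, col, x_min, y_max, cell_size):
    """Return the ground coordinates of the center of cell (row, col)."""
    center_x = x_min + (col + 0.5) * cell_size
    center_y = y_max - (row + 0.5) * cell_size
    return center_x, center_y


# A 30-unit grid whose upper-left corner is at (100, 500).
row, col = cell_index(125.0, 465.0, 100.0, 500.0, 30.0)
snapped = cell_center(row, col, 100.0, 500.0, 30.0)
print((row, col), snapped)
```

The point (125, 465) is stored as the cell it falls in, so on conversion back it reappears at that cell's center. This is why positional accuracy in a raster is limited by the cell resolution.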
A new data model in ArcGIS

Geodatabase data model

Uses a relational database to store geographic
data
- A relational database is a type of database in which the data is organized across
several tables. Tables are associated with each other
through common fields. Data items can be recombined
from different files.
A container for storing spatial and attribute data
and the relationships that exist among them
Spatial data and their associated attributes can be structured
to work together as an integrated system using
rules, relationships, and topological associations
Geodatabase components: vector data and tables


Primary (basic) components
- feature classes,
- feature datasets,
- nonspatial tables.
Complex components
built on the basic
components:
- topology,
- relationship classes,
- geometric networks
Geodatabase components: raster data



Raster data is only referenced in a personal geodatabase
Raster data is physically stored in a multiuser geodatabase
Raster datasets and raster catalogs


A raster dataset is created from one or more individual rasters. When
creating a raster dataset from multiple rasters, the data is mosaicked,
or aggregated, into a single, seamless dataset in which areas of overlap
have been removed. The input rasters must be contiguous (adjacent)
and have the same properties, including the same coordinate system,
cell size, and data format. For each raster dataset (.img, grid, JPEG,
MrSID, TIFF), ArcGIS creates an ERDAS IMAGINE file (.img).
A raster catalog is defined as a table in the geodatabase which you can
view like any other table in ArcCatalog. Each raster in the catalog is
represented by a row in the table. It contains a collection of rasters
that can be noncontiguous, stored in different formats, and have other
different properties. In order to view all the rasters in the catalog, they
must have the same coordinate system and a common geographic
extent
2. Attribute data

Attribute data describe the “what” of
spatial data and are stored as a list or table
arranged in rows and columns
Rows are records (map features)
- Each row represents a map feature, which has
a unique label ID or object ID
Columns are fields (characteristics)
The intersection of a column and a row holds
the value of an attribute, such as color,
ownership, magnitude, classification, …
examples
A database needed




If many fields are related to one record (feature ID), for example,
a soil unit can have over 80 estimated physical and
chemical properties, more tables are needed to store all the
attributes.
A database management system (DBMS) is needed to manage
multiple tables.
A database is a collection of interrelated tables in digital format.
There are four types:
- Flat file, hierarchical database, network database, relational
database
In GIS, we usually use a relational database
Flat file
Network
Hierarchical
Relational
PIN: Parcel ID number
Zoning (zonecode): 1-residential, 2-commercial
Chang, 2004
Relational database

A relational database is a collection of tables, also called
relations, which can be connected to each other by keys.


A primary key represents one or more attributes whose values
can uniquely identify a record in a table. Its counterpart in
another table for the purpose of linkage is called a foreign key
Advantages


Each table in the database can be prepared, maintained, and
edited separately from other tables
Efficient data management and processing, since linking tables
for query and/or analysis is often temporary
Four tables linked by keys
Chang, 2004
Relationship of those separate
tables
One record in one table
related to one record in
another table
One record in one table
related to many records in
another table
Many records in one table
related to one record in
another table
Many records in one table
related to many records in
another table
Join and relate tables


Once tables are separated into
relational tables, two operations
can be used to link those tables
during query and analysis:
- Join brings together two tables
based on a common key.
- Relate connects two tables
(based on keys) but keeps the
tables separate.
Keys do not need to have the same
name but must be of the same data
type
One-to-One Join

Employee-id   Job
1             Digislave
2             Useless Supervisor

Employee-id   Name
1             Tom
2             John

Join Employee-id to Employee-id

After join:
Employee-id   Job                  Name
1             Digislave            Tom
2             Useless Supervisor   John
Many-to-One Join

Polygon ID   Symbol
1            Qa
2            Qa
3            Pa
4            Qe

Symbol   Description
Qa       Quaternary Alluvium
Qe       Quaternary Eolian
Pa       Permian Abo

After join on Symbol:
Polygon ID   Symbol   Description
1            Qa       Quaternary Alluvium
2            Qa       Quaternary Alluvium
3            Pa       Permian Abo
4            Qe       Quaternary Eolian
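The many-to-one join above can be reproduced with any relational database. The sketch below uses Python's built-in sqlite3 module purely as a stand-in (it is not how ArcGIS performs a join), and the table and column names are hypothetical. Symbol is the primary key of the description table and a foreign key in the polygon table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: one row per map unit, keyed by symbol (primary key).
    CREATE TABLE unit (
        symbol      TEXT PRIMARY KEY,
        description TEXT
    );
    -- Polygon table: symbol is a foreign key pointing at unit.
    CREATE TABLE polygon (
        polygon_id INTEGER PRIMARY KEY,
        symbol     TEXT REFERENCES unit(symbol)
    );
    INSERT INTO unit VALUES
        ('Qa', 'Quaternary Alluvium'),
        ('Qe', 'Quaternary Eolian'),
        ('Pa', 'Permian Abo');
    INSERT INTO polygon VALUES (1, 'Qa'), (2, 'Qa'), (3, 'Pa'), (4, 'Qe');
""")

# Many polygons map to one description, so each polygon gets exactly one match.
joined = conn.execute("""
    SELECT p.polygon_id, p.symbol, u.description
    FROM polygon AS p
    LEFT JOIN unit AS u ON u.symbol = p.symbol
    ORDER BY p.polygon_id
""").fetchall()
conn.close()

for record in joined:
    print(record)
```

The query result exists only for the duration of the query; the two tables themselves are unchanged, which mirrors the temporary nature of a join.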
One-to-Many Relates

Formation             Symbol
Quaternary Alluvium   Qa
Permian Abo           Pa

Symbol   Mineral
Qa       Quartz
Pa       Quartz
Qa       Gypsum
Pa       Feldspar

If the tables are related on Symbol, selecting
Polygon-id 1 will select the highlighted areas.
Many-to-Many Relates

Formation   Symbol
1           Qa
2           Qa

Symbol   Mineral
Qa       Quartz
Pa       Quartz
Qa       Gypsum
Pa       Feldspar

If the tables are related on Symbol, selecting
Polygon-id 1 will select the highlighted areas.
Tables in ArcGIS
Among those separate tables, one and only one is the
spatial table (or layer attribute table), which holds the spatial
location and is tied to the spatial data. The other tables are
called nonspatial tables, which can be either joined or related to
the spatial table.
Join tables when each record in the spatial table has no more
than one matching record in the nonspatial table
- One-to-one relation
- Many-to-one relation
Relate tables when a record in the spatial table can have more
than one matching record in the nonspatial table
- One-to-many relation
- Many-to-many relation
The joined table
The join is preserved only within the map
document: the tables remain separate on disk, and the join can be
removed at any time
Related tables
The relate is preserved only within the map document: the
tables remain separate on disk, and the relate can be removed at any time
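The rule of thumb for choosing between a join and a relate can be expressed as a count of matching records. This is a minimal sketch with a hypothetical function name, join_or_relate, using the Symbol values from the geology examples in the earlier slides.

```python
from collections import Counter


def join_or_relate(spatial_keys, nonspatial_keys):
    """Suggest "join" or "relate" from the key values of two tables.

    spatial_keys: key value of every record in the spatial table.
    nonspatial_keys: key value of every record in the nonspatial table.
    """
    matches = Counter(nonspatial_keys)
    # Largest number of nonspatial records matched by any spatial record.
    most_matches = max((matches[key] for key in spatial_keys), default=0)
    return "join" if most_matches <= 1 else "relate"


polygon_symbols = ["Qa", "Qa", "Pa", "Qe"]      # spatial table
description_symbols = ["Qa", "Qe", "Pa"]        # one description per symbol
mineral_symbols = ["Qa", "Pa", "Qa", "Pa"]      # several minerals per symbol

print(join_or_relate(polygon_symbols, description_symbols))
print(join_or_relate(polygon_symbols, mineral_symbols))
```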
3. Metadata



Meta is defined as a change or transformation. Data is
described as the factual information used as a basis for
reasoning. Put these two definitions together and
metadata would literally mean "factual information used as
a basis for reasoning which describes a change or
transformation."
In GIS, Metadata is data about the data. It consists of
information that describes spatial data and is used to
provide documentation for data products. Metadata is the
who, what, when, where, why, and how about every
facet of the spatial data.
According to the Federal Geographic Data Committee
(FGDC), metadata is data about the content, quality,
condition, and other characteristics of data.
Why use and create metadata



To help organize and maintain an organization's spatial
data
- Employees may come and go but metadata can
catalogue the changes and updates made to each spatial
data set and how each employee implemented them
To provide information to other organizations and
clearinghouses to facilitate data sharing and transfer
- It makes sense to share existing data sets rather
than producing new ones if they are already available
To document the history of a spatial data set
- Metadata documents what changes have been made
to each data set, such as changes in geographic projection,
adding or deleting attributes, editing line intersections, or
changing file formats. All of these could have an effect on
data quality.
Metadata Should Include Data
about













Date the data were collected.
Date the coverage was generated.
Bounding coordinates.
Processing steps.
- Software used
- RMSE, etc.
Where the original data came from.
Who did the processing.
Projection
Coordinate system
Datum
Units
Spatial scale
Attribute definitions
Who to contact for more information
See an example of non-standard metadata (see)
Federal Geographic Data Committee’s
(FGDC) Content Standard for Digital
Geospatial Metadata (CSDGM)


The FGDC is developing the National Spatial Data
Infrastructure (NSDI) in cooperation with organizations
from State, local and tribal governments, the academic
community, and the private sector. The NSDI
encompasses policies, standards, and procedures for
organizations to cooperatively produce and
share geographic data.
The objectives of the CSDGM are to provide a common
set of terminology and definitions for the documentation
of digital geospatial data.
CSDGM (FGDC-STD-001-1998)

Metadata =







Identification_Information
Data_Quality_Information
Spatial_Data_Organization_Information
Spatial_Reference_Information
Entity_and_Attribute_Information
Distribution_Information
Metadata_Reference_Information
Connect to http://www.fgdc.gov/metadata/csdgm/
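The seven main sections listed above can serve as a simple completeness checklist. The sketch below treats a metadata record as a plain Python dictionary keyed by section name, which is an assumption made for illustration only; real CSDGM records are normally text or XML files checked with tools such as mp. The function name missing_sections and the sample record are hypothetical.

```python
CSDGM_SECTIONS = (
    "Identification_Information",
    "Data_Quality_Information",
    "Spatial_Data_Organization_Information",
    "Spatial_Reference_Information",
    "Entity_and_Attribute_Information",
    "Distribution_Information",
    "Metadata_Reference_Information",
)


def missing_sections(record):
    """Return the main CSDGM sections that are absent or empty in record."""
    return [name for name in CSDGM_SECTIONS if not record.get(name)]


# Hypothetical, partly filled record.
draft = {
    "Identification_Information": {"Title": "County roads"},
    "Spatial_Reference_Information": {"Datum": "NAD83"},
    "Metadata_Reference_Information": {"Metadata_Date": "20080114"},
}
todo = missing_sections(draft)
print(todo)
```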
Metadata tools

Metadata editors:
- tkme / USGS
- ArcCatalog / ESRI
- SMMS / Intergraph
- FGDCMETA / Illinois State Geological Survey
- xtme / USGS
Metadata utilities (check compliance and export to text,
HTML, XML, or SGML):
- mp (Metadata Parser) / USGS
- MP batch / Intergraph
- ArcCatalog powered by mp / ESRI
Metadata servers:
- Isite / FGDC
- GeoConnect Geodata Management Server / Intergraph
- ArcIMS Metadata Server / ESRI
FGDC Clearinghouse


The FGDC developed a clearinghouse
that allows geospatial data creators to
share their data.
However, the FGDC Clearinghouse is not
a data repository. The data contained
within the clearinghouse is actually
stored on computer servers maintained
by individual contributors. This allows
contributors to manage their own data.
Two Components




The FGDC Clearinghouse consists of 6
gateways and 250 nodes
A gateway is a point of entry into the
FGDC Clearinghouse
A clearinghouse node is a database
that contains metadata records.
Individual contributors maintain nodes
Besides the FGDC Clearinghouse,
there are a variety of other
communities that use FGDC-compliant
metadata as the basis of their data
sharing services. These so-called
clearinghouse communities are often
developed because the participating
organizations have data of similar or
complementary types.
http://clearinghouse1.fgdc.gov/
4. Geodatabase


Before the geodatabase, the many
GIS files (spatial data and nonspatial data) in one GIS project were
stored separately, so a large GIS project
could have hundreds of files.
With a geodatabase, all GIS files (spatial data
and nonspatial data) in a project can be stored
in one place, using a relational
database management system (RDBMS)
Types of geodatabases


personal
enterprise
Personal Geodatabase

The personal geodatabase
is stored in a file named
filename.mdb that can be
browsed and edited in
ArcGIS, and it can also
be opened with Microsoft
Access. It can be read by
multiple people at the same
time, but edited by only
one person at a time.
The maximum size is 2 GB.
Multiuser Geodatabase


Multiuser (ArcSDE or enterprise) geodatabases
are stored in IBM DB2, Informix, Oracle, or
Microsoft SQL Server.
They can be edited through ArcSDE by many users
at the same time and are suitable for large
workgroups and enterprise GIS
implementations. There is no size limit, and raster
data is supported.
3-tier ArcSDE client/server architecture with both
the ArcSDE and Oracle RDBMS running on the
same server, which minimizes network traffic
and client load while increasing the server load
compared to 2-tier system, in which the clients
directly connect to the RDBMS
Personal and Multiuser
Geodatabase Comparison
source: www.esri.com
5. Lab 1

Creating, editing, and managing
geodatabase for ArcGIS 9
30 minutes
30 minutes
25 minutes
25 minutes
45 minutes
15 minutes
Copy the result map of your last step to your homework
Copy your
exam questions
and results to
your homework