Transcript ppt

Advanced GIS
Topic 1
Starting Jan. 16, 2007
Outline




About the class setting
Materials to be covered and scheduled
Quick review of GIS basics
First lab
Materials to be covered and
scheduled

Review (weeks 1, 2)
- Geodatabase (lab 1)
Spatial data analysis (weeks 3, 4, 5)
- Vector data analysis (lab 2)
- Raster data analysis
  - Basic (lab 3)
  - Watershed delineation (lab 4)
Geostatistical analysis (weeks 6, 7, 8)
- Labs 5, 6
3-D analysis (weeks 10, 11, 12)
- Labs 7, 8
Geoprocessing (weeks 13, 14, 15)
- Labs 9, 10
What is GIS ?
• A computer system for
- collecting,
- storing,
- manipulating,
- analyzing,
- displaying, and
- querying
geographically related
information.
In general, a GIS has three
components

Computer system
- Hardware: computer, plotter, printer, digitizer
- Software and appropriate procedures
Spatially referenced or geographic
data
People to carry out various
management and analysis tasks
Geographic Data

Geospatial data tells
you where it is and
attribute data tells
you what it is.
Metadata describes
both geospatial and
attribute data.
In GIS, geographic data are also called GIS data or spatial data
Traditional method

The traditional way to represent geographic data is
paper-based maps




Geology map
Topographic map
City street map (we still use it a lot)
...
Characteristics of spatial data

“mappable” characteristics:





Location (coordinate systems will be covered
later)
Size is measured by the extent (length,
area, perimeter) of the feature
Shape is the geometric form (point, line, area)
of the feature
Discrete or continuous
Spatial relationships
Discrete and continuous

Discrete data are distinct features that
have definite boundaries and identities


A district, houses, towns, agricultural fields,
rivers, highways, …
Continuous data have no defined borders
or distinct values; instead, there is a
transition from one value to another

Temperature, precipitation, elevation, ...
GIS: a simplified view of the real
world

Discrete features
- Points
- Lines
- Areas
- Networks: a series of interconnecting lines
  - Road network
  - River network
  - Sewage network
Continuous features
- Surfaces
  - Elevation surface
  - Temperature surface
Problems caused by the simplified
features still exist, but we can live with them

Dynamic nature (not static)
- Forests grow
- River channels change
- Cities expand or decline
Identification of discrete and continuous features
- Is a road a line or an area?
- Scale
Some features may not fit any type: fuzzy
boundaries
- Transition area between woodland and grassland
Let's not worry about these problems now! Just keep them in mind.
Points



A point is a zero-dimensional object and
has only the property of location (x, y)
Points can be used to model features
such as a well, building, power pole,
sample location, etc.
Other names for a point are vertex and
node
Point
Lines




A line is a one-dimensional object
that has the property of length
Lines can be used to represent roads,
streams, faults, dikes, marker beds,
boundaries, contacts, etc.
Lines are also called edges, links,
chains, or arcs
In an ArcInfo coverage, an arc starts
with a node, has zero or more
vertices, and ends with a node
Line
Areas (Polygons)



A polygon is a two-dimensional object
with properties of area and perimeter
A polygon can represent a city, geologic
formation, dike, lake, river, etc.
Other names for a polygon are face and zone
Area
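The area and perimeter properties of a polygon can be computed directly from its ordered vertices. The following is only a minimal sketch, assuming planar (projected) coordinates and a simple ring that does not cross itself. The function name polygon_area_perimeter is hypothetical, and the shoelace formula it uses is a standard technique that the slides do not name.

```python
import math


def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon.

    vertices: ordered list of (x, y) pairs; the ring is closed implicitly,
    so the first vertex does not need to be repeated at the end.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1  # shoelace term for this edge
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


# A 4 x 3 rectangle, listed counterclockwise.
area, perimeter = polygon_area_perimeter([(0, 0), (4, 0), (4, 3), (0, 3)])
```

The absolute value makes the result independent of whether the vertices are listed clockwise or counterclockwise.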
Topology needed

A collection of numeric data which clearly describes
adjacency, containment (coincidence), and
connectivity between map features and which can
be stored and manipulated by a computer.

A set of rules on how objects relate to each other

Major difference in file formats

Higher level objects have special topology rules
How Topology Works





We previously discussed that lines represent linear
features, or borders for area features. We also said
that every line starts and ends with a node, and
has intermediate shape points called vertices to
define the shape of the line or border.
So when you think about it, lines don’t really
exist. They simply represent a relationship
between two nodes and zero or more vertices.
When two lines cross, and form an intersection,
they also have a node, since the intersection is the
start of one line and the end of the other line.
Topology describes the connectivity of the lines
and nodes. So for our example on the right, lines
A and B are connected by node b. So line A goes
from node a to node b. Line B goes from node b
to node c.
Now, we can create a whole string of lines and put
them together into an area too. Now, just like a
line, polygons don’t really exist. They simply
represent the relationship among lines, which in
turn represent the relationship among points.
[Figure: nodes a, b, and c joined by lines A and B; labels: Node, Line, Polygon]
©Arthur J. Lembo
Cornell University
How Topology Works

Now we have described our location (with x,y
coordinates), and our connectivity. What if
we had two polygons P1 and P2, could we
define the adjacency? Yes, here is how:






Line 1 goes from node a to node b.
Line 2 goes from node a to node b.
Line 3 goes from node b to node a.
Polygon P1 is to the left of line 2, and to the
right of line 1.
Polygon P2 is to the right of line 2, and to the
right of line 3.
[Figure: nodes a and b joined by lines 1, 2, and 3, enclosing polygons P1 and P2]

Polygon  Lines
P1       1,2
P2       2,3

So, we can create a table that “clearly
describes location, adjacency, connectivity
and containment”, or more specifically, a
topology table.

Line  FromNode  ToNode  LeftPolygon  RightPolygon
1     a         b       0            P1
2     a         b       P1           P2
3     b         a       0            P2
Traversing Topology

Without looking at the picture, you can answer these
questions from the table:

Where is node a?
- No problem. It has an x,y coordinate.
What polygon is P1 next to, and where are they adjacent?
- P1 is next to P2 because line 2 has polygon P1 to the left
  and P2 to the right. This is adjacency.
How do I traverse from node b to node a, and then back
to node b?
- Easy! Take line 3 to node a, and you have a choice to take
  either line 1 or 2 back to node b. This is connectivity.
What lines does polygon P1 fall inside of?
- Easy! Polygon P1 is contained by lines 1 and 2. This is
  containment.
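The three questions above can be answered in code from the topology table alone. This is a minimal sketch, assuming the three-line, two-polygon example from the slides, with "0" standing for the area outside all polygons. The names LINES, adjacent_polygons, lines_leaving, and boundary_lines are hypothetical and are not part of any GIS package.

```python
# Topology table: line -> (from_node, to_node, left_polygon, right_polygon).
# "0" denotes the area outside all polygons.
LINES = {
    1: ("a", "b", "0", "P1"),
    2: ("a", "b", "P1", "P2"),
    3: ("b", "a", "0", "P2"),
}


def adjacent_polygons(polygon):
    """Adjacency: (neighbor, shared_line) pairs for the given polygon."""
    result = []
    for line_id, (_, _, left, right) in sorted(LINES.items()):
        if polygon == left and right != "0":
            result.append((right, line_id))
        elif polygon == right and left != "0":
            result.append((left, line_id))
    return result


def lines_leaving(node):
    """Connectivity: lines whose from-node is the given node."""
    return sorted(l for l, (frm, _, _, _) in LINES.items() if frm == node)


def boundary_lines(polygon):
    """Containment: lines that have the polygon on their left or right."""
    return sorted(
        l for l, (_, _, left, right) in LINES.items() if polygon in (left, right)
    )
```

With this table, lines_leaving("b") gives line 3 (the way from b to a), and lines_leaving("a") gives lines 1 and 2 (the two ways back to b), without ever consulting the drawing.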
Topology
© Paul Bolstad, GIS Fundamentals
Two basic data models to
represent these features

Raster spatial data model
- Defines space as an array of equally sized cells arranged in rows and
  columns. Each cell contains an attribute value and location
  coordinates
- Individual cells are the building blocks for creating images of point, line,
  area, network, and surface features
- Continuous raster: numeric values range smoothly from one location to another, for
  example, DEM, temperature, remote sensing images, etc.
- Discrete raster: relatively few possible values, which repeat in adjacent cells, for
  example, land use, soil types, etc.
Vector spatial data model
- Uses x, y coordinates to represent point, line, area, network, and
  surface features
- A point is a single coordinate pair; a line or polygon is an ordered list of
  vertices; attributes are associated with each feature
- Usually represents discrete features
DIGITAL SPATIAL DATA
• RASTER
• VECTOR
• Real World
Source: Defense Mapping School
National Imagery and Mapping Agency
Raster and Vector Data Models
[Figure: a real-world scene with trees, a house, and a river, shown as a raster representation (10 x 10 grid of coded cells) and as a vector representation (features plotted on X and Y axes from 100 to 600)]
Source: Defense Mapping School
National Imagery and Mapping Agency
Example: Discrete raster
Example: continuous raster
Xie et al. 2005
Raster
Real world
Vector
Heywood et al. 2006
Effects of changing resolution
Heywood et al. 2006
Vector – Advantages and
Disadvantages

Advantages





Good representation of reality
Compact data structure
Topology can be described in a network
Accurate graphics
Disadvantages



Complex data structures
Simulation may be difficult
Some spatial analysis is difficult or impossible to
perform
Raster – Advantages and
Disadvantages

Advantages






Simple data structure
Easy overlay
Various kinds of spatial analysis
Uniform size and shape
Cheaper technology
Disadvantages





Large amount of data
Less “pretty”
Projection transformation is difficult
Different scales between layers can be a nightmare
May lose information due to generalization
GIS data formats (files)

Vector data
- Shapefiles
- Coverages
- TIN: Triangulated Irregular Network (e.g., elevation can be stored as
  a TIN)
Raster data
- Grid (e.g., elevation can be stored as
  a grid)
- Image (e.g., elevation can be stored as
  an image; all remote sensing images)
Shapefiles

Nontopological
Advantage: no overhead to process
topology
Disadvantages: polygons are double
digitized, no topological data checking
At least 3 files: .shp, .shx, .dbf
Coverages




Original ArcInfo Format
Directory With Several Files
Database Files are stored in the Info
Directory
Uses Arc Node Topology



Containment (coincident)
Connectivity
Adjacency
TIN


A triangulated irregular network (TIN) is a data
model that is used to represent three
dimensional objects. In this case, x,y, and z
values represent points. Using methods of
computational geometry, the points are
connected into what is called a triangulation,
forming a network of triangles. The lines of
the triangles are called edges, and the interior
area is called a face, or facet.
While the TIN model is somewhat more
complex than the simple point, line, and
polygon vector model, or the raster model, it is
actually quite useful for representing
elevations. For example a raster grid would
require grid cells to cover the entire surface of
a geographic area. Also, if we wanted to show
great detail we would have to have small grid
cells. Now, if the land area is relatively flat, we
would still need the small grid cells. However,
with a TIN we would not have to include so
many points on the flat areas, but could add
more points on the steep areas where we want
Components of a TIN





Nodes
Edges
Triangles
Hull
Topology
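To show how a TIN face carries elevation, the z value at any location inside one triangle can be estimated from the three nodes at its corners. This is only a sketch, assuming linear (planar) interpolation across the face using barycentric weights. The function name interpolate_z and the sample coordinates are hypothetical.

```python
def interpolate_z(p, a, b, c):
    """Linearly interpolate elevation at p = (x, y) on the face a, b, c.

    a, b, c are TIN nodes given as (x, y, z) tuples.
    Returns None when p lies outside the triangle.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: nodes are collinear")
    # Barycentric weights of p with respect to nodes a, b, and c.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # p is outside this face
    return w1 * z1 + w2 * z2 + w3 * z3


# One face with nodes at elevations 10, 20, and 30.
face = ((0, 0, 10), (10, 0, 20), (0, 10, 30))
z_mid_edge = interpolate_z((5, 0), *face)  # halfway along the edge a-b
```

A point exactly on a node returns that node's elevation, and a point on an edge depends only on the two nodes that edge connects.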
Grid Properties






Each Grid Cell holds one
value even if it is empty.
A cell can hold an index
standing for an attribute.
Cell resolution is given as its
size on the ground.
Point and Lines move to the
center of the cell.
Minimum line width is one
cell.
Rasters are easy to read
and write, and easy to draw
on the screen.
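The statement that points move to the center of the cell can be illustrated with a short sketch. It assumes square cells, a grid origin at the upper-left corner, and rows counted from the top; the function names point_to_cell and cell_center and the sample numbers are hypothetical.

```python
def point_to_cell(x, y, x_min, y_max, cell_size):
    """Return (row, col) of the cell containing (x, y); row 0 is the top row."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


def cell_center(row, col, x_min, y_max, cell_size):
    """Return the ground coordinates of the center of cell (row, col)."""
    cx = x_min + (col + 0.5) * cell_size
    cy = y_max - (row + 0.5) * cell_size
    return cx, cy


# A grid whose upper-left corner is at (0, 600), with 100-unit cells.
row, col = point_to_cell(230, 410, x_min=0, y_max=600, cell_size=100)
snapped = cell_center(row, col, x_min=0, y_max=600, cell_size=100)
```

Here the point (230, 410) ends up represented by the cell center (250, 450), which shows why positional accuracy in a raster is limited by the cell resolution.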
A new data model in ArcGIS

Geodatabase data model
- Uses a relational database that stores geographic
  data
  - A relational database is a type of database in which the data is organized across
    several tables. Tables are associated with each other
    through common fields. Data items can be recombined
    from different files.
- A container for storing spatial and attribute data
  and the relationships that exist among them
- Features and their associated attributes can be structured
  to work together as an integrated system using
  rules, relationships, and topological associations
Geodatabase components: vector data and tables


Primary (basic) components
- feature classes,
- feature datasets,
- nonspatial tables.
Complex components
building on the basic
components:
- topology,
- relationship classes,
- geometric networks
Geodatabase components: raster data

Raster data are only referenced in a personal geodatabase
Raster data are physically stored in a multiuser geodatabase
Raster datasets and raster catalogs


A raster dataset is created from one or more individual rasters. When
creating a raster dataset from multiple rasters, the data is mosaicked,
or aggregated, into a single, seamless dataset in which areas of overlap
have been removed. The input rasters must be contiguous (adjacent)
and have the same properties, including the same coordinate system,
cell size, and data format. For each raster dataset (.img, grid, JPEG,
MrSID, TIFF), ArcGIS creates an ERDAS IMAGINE file (.img).
A raster catalog is defined as a table in the geodatabase which you can
view like any other table in ArcCatalog. Each raster in the catalog is
represented by a row in the table. It contains a collection of rasters
that can be noncontiguous, stored in different formats, and have other
different properties. In order to view all the rasters in the catalog, they
must have the same coordinate system and a common geographic
extent
Attribute data

Attribute data describe the “what” of
spatial data and are a list or table of data
arranged as rows and columns

Rows are records (map features)
- Each row represents a map feature, which has
  a unique label ID or object ID
Columns are fields (characteristics)
The intersection of a column and a row shows
the value of an attribute, such as color,
ownership, magnitude, classification, …
• Data types of attribute data: character, integer, floating point, date
• Each field must be defined with a data type, data width, and number of
decimal places
• The width refers to the number of spaces reserved for a field
examples
A database is needed

If many fields are related to one record (feature ID), for example,
a soil unit can have over 80 estimated physical and
chemical properties, more tables are needed to store all the
attributes.
A database management system (DBMS) is needed to manage
multiple tables.
A database is a collection of interrelated tables in digital format.
There are four types:
- Flat file, hierarchical database, network database, relational
  database
In GIS, we usually use a relational database
Flat file
Network
Hierarchical
Relational
PIN: Parcel ID number
Zoning (zonecode): 1-residential, 2-commercial
Chang, 2004
Relational database

A relational database is a collection of tables, also called
relations, which can be connected to each other by keys.


A primary key represents one or more attributes whose values
can uniquely identify a record in a table. Its counterpart in
another table for the purpose of linkage is called a foreign key
Advantages


Each table in the database can be prepared, maintained, and
edited separately from other tables
Efficient data management and processing, since linking tables
for query and/or analysis is often temporary
Three tables linked by keys
Students:
Student#  Advisor
1022      Jones
4123      Smith

Registration:
Student#  Class#
1022      101-07
1022      143-01
1022      159-02
4123      211-01
4123      211-02
4123      214-01

Faculty:
Name   Room
Jones  412
Smith  216
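The three tables above can be linked through their keys with an ordinary relational query. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names (students, registration, faculty, student_no, class_no) and the helper advisor_room_for_class are assumptions made for illustration; only the data values come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# student_no is the primary key of students and a foreign key in registration;
# faculty.name is the primary key matched by students.advisor.
cur.execute("CREATE TABLE students (student_no INTEGER PRIMARY KEY, advisor TEXT)")
cur.execute("CREATE TABLE registration (student_no INTEGER, class_no TEXT)")
cur.execute("CREATE TABLE faculty (name TEXT PRIMARY KEY, room TEXT)")

cur.executemany("INSERT INTO students VALUES (?, ?)",
                [(1022, "Jones"), (4123, "Smith")])
cur.executemany("INSERT INTO registration VALUES (?, ?)",
                [(1022, "101-07"), (1022, "143-01"), (1022, "159-02"),
                 (4123, "211-01"), (4123, "211-02"), (4123, "214-01")])
cur.executemany("INSERT INTO faculty VALUES (?, ?)",
                [("Jones", "412"), ("Smith", "216")])
conn.commit()


def advisor_room_for_class(class_no):
    """Follow the keys: class -> student -> advisor -> room."""
    return cur.execute(
        """SELECT s.advisor, f.room
           FROM registration AS r
           JOIN students AS s ON s.student_no = r.student_no
           JOIN faculty AS f ON f.name = s.advisor
           WHERE r.class_no = ?""",
        (class_no,),
    ).fetchone()
```

The link between the tables exists only for the duration of the query; each table is still stored and edited on its own.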
Four tables linked by keys
Chang, 2004
Relationship of those separate
tables
One record in one table
related to one record in
another table
One record in one table
related to many records in
another table
Many records in one table
related to one record in
another table
Many records in one table
related to many records in
another table
Join and relate tables


Once tables are separated as
relational tables, two operations
can be used to link those tables
during query and analysis
- Join brings together two tables
  based on a common key.
- Relate connects two tables
  (based on keys) but keeps the
  tables separate.
Keys do not have to have the same
name but must be of the same data
type
[Figures: examples of join and relate]
One-to-One Join

Employee-id  Job
1            Digislave
2            Useless Supervisor

Employee-id  Name
1            Tom
2            John

Join Employee-id to Employee-id

After join:
Employee-id  Job                 Name
1            Digislave           Tom
2            Useless Supervisor  John
Many-to-One Join

Polygon ID  Symbol
1           Qa
2           Qa
3           Pa
4           Qe

Symbol  Description
Qa      Quaternary Alluvium
Qe      Quaternary Eolian
Pa      Permian Abo

After join on Symbol:
Polygon ID  Symbol  Description
1           Qa      Quaternary Alluvium
2           Qa      Quaternary Alluvium
3           Pa      Permian Abo
4           Qe      Quaternary Eolian
One-to-Many Relates

Formation            Symbol
Quaternary Alluvium  Qa
Permian Abo          Pa

Symbol  Mineral
Qa      Quartz
Pa      Quartz
Qa      Gypsum
Pa      Feldspar

If the tables are related on Symbol, selecting
Polygon-id 1 will select the highlighted areas.
Many-to-Many Relates

Formation  Symbol
1          Qa
2          Qa

Symbol  Mineral
Qa      Quartz
Pa      Quartz
Qa      Gypsum
Pa      Feldspar

If the tables are related on Symbol, selecting
Polygon-id 1 will select the highlighted areas.
In ArcGIS

Among those separate tables, one and only one is the
spatial table (or layer attribute table), which holds the spatial
location and is linked to the spatial data. The other tables are
nonspatial tables, which can be either joined or related to
the spatial table.
Join tables when each record in the spatial table has no more
than one matching record in the nonspatial table
- One-to-one relation
- Many-to-one relation
Relate tables when each record in the spatial table can have more
than one matching record in the nonspatial table
- One-to-many relation
- Many-to-many relation
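The difference between a join and a relate can be sketched with plain Python structures. This illustration reuses the polygon, description, and mineral values from the earlier example slides; the variable names and the helper minerals_for_polygon are hypothetical, and this is not how ArcGIS implements either operation.

```python
from collections import defaultdict

# Spatial table: (polygon_id, symbol).
polygons = [(1, "Qa"), (2, "Qa"), (3, "Pa"), (4, "Qe")]

# Nonspatial table with at most one match per polygon (many-to-one).
descriptions = {
    "Qa": "Quaternary Alluvium",
    "Qe": "Quaternary Eolian",
    "Pa": "Permian Abo",
}

# Nonspatial table with several matches per symbol (one-to-many).
minerals = [("Qa", "Quartz"), ("Pa", "Quartz"), ("Qa", "Gypsum"), ("Pa", "Feldspar")]

# Join: the matching description is appended to each spatial record,
# producing one combined row per polygon.
joined = [(pid, sym, descriptions.get(sym)) for pid, sym in polygons]

# Relate: the tables stay separate; an index on the key lets a selection
# in one table look up its matching records in the other.
related = defaultdict(list)
for sym, mineral in minerals:
    related[sym].append(mineral)


def minerals_for_polygon(pid):
    """Follow the relate from a polygon to its matching mineral records."""
    symbol = dict(polygons)[pid]
    return related.get(symbol, [])
```

A join suits the description table because each polygon gets exactly one extra value. The mineral table needs a relate, since appending it to the polygon rows would require duplicating polygons.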
The joined table
The joined table is preserved only within the map
document; the tables remain separate on disk, and the join can be
removed at any time
Related tables
The relate is preserved only within the map document; the
tables remain separate on disk, and the relate can be removed at any time
Geodatabase


Before the geodatabase, the many
GIS files (spatial data and nonspatial data) in one GIS project were
stored separately. So for a large GIS project, there
could be hundreds of GIS files.
With a geodatabase, all GIS files (spatial data
and nonspatial data) in a project can be stored
in one geodatabase, using a relational
database management system (RDBMS)
Types of geodatabases


personal
enterprise
Personal Geodatabase

The personal geodatabase
is stored in a file named
filename.mdb that is
browsable and editable in
ArcGIS, and it can also
be opened with Microsoft
Access. It can be read by
multiple people at the same
time, but edited by only
one person at a time.
Its maximum size is 2 GB.
Multiuser Geodatabase


Multiuser (ArcSDE or enterprise) geodatabases
are stored in IBM DB2, Informix, Oracle, or
Microsoft SQL Server.
A multiuser geodatabase can be edited through ArcSDE by many users
at the same time and is suitable for large
workgroups and enterprise GIS
implementations. It has no size limit and supports raster
data.
3-tier ArcSDE client/server architecture: both
ArcSDE and the Oracle RDBMS run on the
same server, which minimizes network traffic
and client load while increasing the server load
compared with a 2-tier system, in which the clients
connect directly to the RDBMS
Personal and Multiuser
Geodatabase Comparison
source: www.esri.com
What is metadata



Meta is defined as a change or transformation. Data is
described as the factual information used as a basis for
reasoning. Put these two definitions together and
metadata would literally mean "factual information used as
a basis for reasoning which describes a change or
transformation."
In GIS, Metadata is data about the data. It consists of
information that describes spatial data and is used to
provide documentation for data products. Metadata is the
who, what, when, where, why, and how about every
facet of the spatial data.
According to the Federal Geographic Data Committee
(FGDC), metadata is data about the content, quality,
condition, and other characteristics of data.
Why use and create metadata



To help organize and maintain an organization's spatial
data
- Employees may come and go but metadata can
catalogue the changes and updates made to each spatial
data set and how each employee implemented them
To provide information to other organizations and
clearinghouses to facilitate data sharing and transfer
- It makes sense to share existing data sets rather
than producing new ones if they are already available
To document the history of a spatial data set
- Metadata documents what changes have been made
to each data set, such as changes in geographic projection,
adding or deleting attributes, editing line intersections, or
changing file formats. All of these could have an effect on
data quality.
Metadata Should Include Data
about













Date the data were collected
Date the coverage was generated
Bounding coordinates
Processing steps
- Software used
- RMSE, etc.
Where the original data came from
Who did the processing
Projection
Coordinate system
Datum
Units
Spatial scale
Attribute definitions
Who to contact for more information
See an example of non-standard metadata
Federal Geographic Data Committee’s
(FGDC) Content Standard for Digital
Geospatial Metadata (CSDGM)


The FGDC is developing the National Spatial Data
Infrastructure (NSDI) in cooperation with organizations
from State, local and tribal governments, the academic
community, and the private sector. The NSDI
encompasses policies, standards, and procedures for
organizations to cooperatively produce and
share geographic data.
The objectives of the CSDGM are to provide a common
set of terminology and definitions for the documentation
of digital geospatial data.
CSDGM (FGDC-STD-001-1998)

Metadata =







Identification_Information
Data_Quality_Information
Spatial_Data_Organization_Information
Spatial_Reference_Information
Entity_and_Attribute_Information
Distribution_Information
Metadata_Reference_Information
Connect to http://www.fgdc.gov/metadata/csdgm/
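The seven top-level sections listed above can serve as a simple checklist when reviewing a metadata record. This is a minimal sketch, assuming the record has already been loaded into a Python dictionary keyed by section name. The function missing_sections and the sample record are hypothetical, and the check does not replace a real compliance tool such as mp.

```python
# Top-level CSDGM sections, in the order listed on the slide.
CSDGM_SECTIONS = [
    "Identification_Information",
    "Data_Quality_Information",
    "Spatial_Data_Organization_Information",
    "Spatial_Reference_Information",
    "Entity_and_Attribute_Information",
    "Distribution_Information",
    "Metadata_Reference_Information",
]


def missing_sections(record):
    """Return the top-level sections that are absent from a metadata record."""
    return [name for name in CSDGM_SECTIONS if name not in record]


# Hypothetical, partially filled record.
record = {
    "Identification_Information": {"Title": "Sample soils layer"},
    "Metadata_Reference_Information": {"Metadata_Date": "20070116"},
}
todo = missing_sections(record)
```

The sketch only reports which sections are absent; whether a given section is mandatory for a particular dataset is defined by the standard itself.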
Metadata tools

Metadata editors:
- tkme / USGS
- ArcCatalog / ESRI
- SMMS / Intergraph
- FGDCMETA / Illinois State Geological Survey
- xtme / USGS
Metadata utilities (check compliance and export to text,
HTML, XML, or SGML):
- mp (Metadata Parser) / USGS
- MP batch / Intergraph
- ArcCatalog powered by mp / ESRI
Metadata servers:
- Isite / FGDC
- GeoConnect Geodata Management Server / Intergraph
- ArcIMS Metadata Server / ESRI
FGDC Clearinghouse


The FGDC developed a clearinghouse
that allows geospatial data creators to
share their data
However, the FGDC Clearinghouse is not
a data repository. The data contained
within the clearinghouse is actually
stored on computer servers maintained
by individual contributors. This allows
contributors to manage their own data.
Two Components




The FGDC Clearinghouse consists of 6
gateways and 250 nodes
A gateway is a point of entry into the
FGDC Clearinghouse
A clearinghouse node is a database
that contains metadata records.
Individual contributors maintain nodes
Besides the FGDC Clearinghouse,
there are a variety of other
communities that use FGDC-compliant
metadata as the basis of their data
sharing services. These so-called
clearinghouse communities are often
developed because the participating
organizations have data of similar or
complementary types.
http://clearinghouse1.fgdc.gov/
First lab

Creating, editing, and managing
geodatabase for ArcGIS 9
30 minutes
30 minutes
25 minutes
25 minutes
45 minutes
15 minutes
Copy the result map of your last step to your homework
Copy your
exam questions
and results to
your homework