FUNDAMENTALS OF GEOGRAPHIC INFORMATION SYSTEMS
Environmental GIS
Nicholas A. Procopio, Ph.D, GISP
Data Types
In GIS, there are three
main types of data
• Spatial
• Attribute
• Metadata
Zygo, Lisa, Baylor University, Lecture Notes, 2002
Data Sources
Data Types
• Primary – Measurements collected through firsthand observations
• Secondary – Measurements collected through a
secondary source (e.g., neighborhood surveys)
Metadata
Data documentation
• Data about the data
• Explains the form, content, accuracy, precision,
usability, creator, purpose, etc.
• Metadata standards exist
• Metadata is a part of geospatial data
Metadata
Metadata information includes
• Identification – title, area, dates, owners, organizations, etc.
• Data quality – attribute accuracy and spatial precision,
consistency, sources of info, and methods of data production
• Spatial data organization – raster-vector format and organization
of features in the data set, data model
• Spatial reference – map projections, datums, and coordinate
system
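As a minimal sketch, the four groups of metadata information listed above can be pictured as one record per data set. The field names and sample values are hypothetical and do not follow the element names of any formal metadata standard.

```python
# Hypothetical metadata record; the keys mirror the four groups listed above,
# not the element names of any formal metadata standard.
REQUIRED_SECTIONS = (
    "identification",
    "data_quality",
    "spatial_data_organization",
    "spatial_reference",
)

metadata = {
    "identification": {"title": "Example wetlands layer", "owner": "Example agency"},
    "data_quality": {"sources": "Digitized from aerial photography"},
    "spatial_data_organization": {"data_model": "vector"},
    "spatial_reference": {"datum": "NAD83", "projection": "UTM zone 18N"},
}


def missing_sections(record):
    """Return the required sections that are absent or empty in a record."""
    return [name for name in REQUIRED_SECTIONS if not record.get(name)]


print(missing_sections(metadata))
print(missing_sections({"identification": {"title": "Untitled"}}))
```

A check like this only confirms that each group is present; it says nothing about whether the documentation inside each group is accurate.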
Metadata
Metadata is created to…
• Protect investment in data
Staff turnover, memory loss
Makes it easier to reuse and update data
Provides documentation of sources, quality
Easier to share data
• Help the user understand the data
Provides consistent terminology
Focuses on key elements
Helps user determine fitness for use
Facilitates data transfer, interpretation by new users
Federal Geographic Data Committee
Under Executive Order No. 12906, all federal
agencies and organizations must document
their geospatial data using the FGDC Content
Standard for Digital Geospatial Metadata
http://www.fgdc.gov/
Federal Geographic Data Committee
Compliance with this executive order will…
• Minimize duplication of data
• Foster cooperative digital data collection activities
• Establish a national framework of quality data
Metadata
Use ArcCatalog to create and edit metadata
Database Models
Database – a collection of non-redundant data, which
can be shared by different application systems
Geographic database – database linked to geographic
data for a particular area and subject.
Code  Country              Population  Area (sq km)  Area (sq mi)
AF    Afghanistan            17250390     641869.19      247825.7
AL    Albania                 3416945       28754.5      11102.11
AG    Algeria                27459230       2320972     896127.31
AQ    American Samoa            53000       186.895         72.16
AN    Andorra                   55335       452.485       174.704
AO    Angola                 11527260       1252421     483559.81
AV    Anguilla                   9208        86.296        33.319
AY    Antarctica               -99999      12302740       4750088
AC    Antigua and Barbuda       65212       462.378       178.524
AR    Argentina              33796870       2781013       1073749
AM    Armenia                 3377228     29872.461      11533.76
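A minimal sketch of how rows from the table above might be handled as attribute records. Treating -99999 as a "no data" code is an assumption; the table itself does not define it.

```python
# A few rows copied from the table above. Treating -99999 as a "no data"
# code is an assumption; the table itself does not define it.
NO_DATA = -99999

countries = [
    {"code": "AF", "country": "Afghanistan", "population": 17250390,
     "area_sq_km": 641869.19, "area_sq_mi": 247825.7},
    {"code": "AL", "country": "Albania", "population": 3416945,
     "area_sq_km": 28754.5, "area_sq_mi": 11102.11},
    {"code": "AY", "country": "Antarctica", "population": -99999,
     "area_sq_km": 12302740, "area_sq_mi": 4750088},
]


def density_per_sq_km(row):
    """Population density, or None when the population is coded as no data."""
    if row["population"] == NO_DATA:
        return None
    return row["population"] / row["area_sq_km"]


for row in countries:
    ratio = row["area_sq_mi"] / row["area_sq_km"]  # square miles per square km
    print(row["code"], round(ratio, 4), density_per_sq_km(row))
```

The two area columns differ by a constant factor of about 0.3861, so one of them is redundant in the sense used in the database definition above. Skipping the sentinel value avoids reporting a negative density for Antarctica.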
Attribute Data
The “where” of GIS is determined by the spatial data
The “what” is determined by the attribute data
The attribute data is just as important as the spatial
data
Databases
Attribute data are stored in database tables
Databases
Advantages of a DBMS include
• Reduced data redundancy (less duplication)
• Various data access methods are possible (queries)
• Data are stored independently of the applications
that will use them
• Access to data is controlled and data is centralized
• Ease of updating and maintaining data
Creating a database
Consider the following…
• Storage media
• How will the database change over time?
• What security is needed?
• Should the database be distributed or centralized?
• How should database creation be scheduled?
Codd’s Principles for Databases
Only one value per cell
All values in a column are about the same subject
Each row is unique
No significance to the sequence of columns
No significance to the sequence of rows
Keep your table simple!
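Two of the principles above (one value per cell, each row unique) can be checked mechanically. The sketch below assumes a table held as a list of Python dictionaries; the function name and sample rows are hypothetical.

```python
def check_table(rows):
    """Return a list of problems found in a table given as a list of dicts."""
    problems = []
    # Rule: only one value per cell (no lists, tuples, sets, or dicts in a cell).
    for i, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, (list, tuple, set, dict)):
                problems.append(f"row {i}: column '{column}' holds several values")
    # Rule: each row is unique. Comparing rows as sets of (column, value)
    # pairs also ignores column order, which carries no significance.
    seen = set()
    for i, row in enumerate(rows):
        key = frozenset((column, repr(value)) for column, value in row.items())
        if key in seen:
            problems.append(f"row {i}: duplicate of an earlier row")
        seen.add(key)
    return problems


good = [
    {"code": "AL", "country": "Albania"},
    {"code": "AN", "country": "Andorra"},
]
bad = [
    {"code": "AL", "country": "Albania"},
    {"country": "Albania", "code": "AL"},  # same row, columns reordered
    {"code": "AN", "country": ["Andorra", "Principality of Andorra"]},
]

print(check_table(good))
for problem in check_table(bad):
    print(problem)
```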
Attribute Types
Qualitative
• No measurement or magnitude
• Non-numeric descriptions
• No numeric meaning, even if shown as code numbers (e.g.,
1=category 1)
Attribute Types
Quantitative
• Numeric and have mathematical meaning
• Serve as measurements or magnitudes of the features to which
they refer
• Example: city population
Types of Databases
Relational
• Presents data organized in a series of two-dimensional
tables, each containing records for one entity
Relational Database
Flexible approach to linkages between records comes
closest to modeling the complexity of spatial relationships
between objects
Links attributes contained in separate files with a key
attribute
The key attribute is usually a non-redundant, unique
identification number for each record
The most popular DBMS model for GIS
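A minimal sketch of linking attributes held in separate tables through a key attribute. The tables, the key name parcel_id, and the join function are all hypothetical; a real DBMS would do this with a query.

```python
# Two hypothetical tables that share the key attribute "parcel_id".
parcels = [
    {"parcel_id": 101, "land_use": "residential"},
    {"parcel_id": 102, "land_use": "commercial"},
    {"parcel_id": 103, "land_use": "forest"},
]
owners = [
    {"parcel_id": 101, "owner": "Owner A"},
    {"parcel_id": 103, "owner": "Owner B"},
]


def join_on_key(left, right, key):
    """Attach matching right-hand attributes to each left-hand record."""
    lookup = {row[key]: row for row in right}  # key must be unique in `right`
    joined = []
    for row in left:
        match = lookup.get(row[key], {})  # no match: keep the left row as is
        joined.append({**row, **match})
    return joined


for record in join_on_key(parcels, owners, "parcel_id"):
    print(record)
```

Because the key is unique for each record, the lookup is unambiguous; parcel 102 has no owner record and simply keeps its original attributes.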
Data
Most data is input into a database by keycoding
Other data may be obtained through government
sources
• USGS
• US Census
• NOAA
• State Agencies
Data may also be obtained from other projects
Methods of Spatial Data Entry
Manual “heads-up” digitizing
Scanners
• Appropriate for encoding raster data since this is
the output format for most scanners.
• Problems may include
Scanning unwanted information
Optical distortion
The higher the resolution, the greater the volume of data
produced
Methods of Spatial Data Entry
Electronic Data Transfers
• Downloading data from the internet
• Downloading data from a GPS unit
• Consider when obtaining electronic data
What data is available
Cost
Media
Format
Sources of Electronic Data
United States Geological Survey (USGS)
• Digital Line Graphs (DLG)
• Digital Elevation Models (DEM)
• Digital Orthophoto Quads (DOQ)
United States Census Bureau (USCB)
• Topologically Integrated Geographic Encoding and Referencing
(TIGER)
First comprehensive GIS database at street level for entire U.S.
National Oceanic and Atmospheric Administration (NOAA)
• Satellite and radar images
• Bathymetry maps
Other Sources of Spatial Data
Field Data
• Global Positioning System (GPS)
Determining position by receiving signals from orbiting
satellites
• Manual Input
• Remote Sensing
Utilizing satellite images to develop a base view of area
of interest
Spatial Data Models
Spatial Databases
Real world is infinitely complex
Database size is limited
Data model converts real world into elements
that can be stored in a database
Toward Realism: Layers
A GIS breaks down reality into different layers (themes)
A layer can be composed of identical entities such as the
locational information for trees, manholes, buildings, etc.
Layers can be overlaid to show the spatial relationship
between various entities
Layers can also represent different times
Spatial Databases
There are two primary models for spatial data in a
GIS
• Raster
a data structure or model based on grid cells
• Vector
a data structure composed of nodes, vertices, and arcs or
connected points
Raster Data Models
Individual cells are used as the building blocks for creating
point, line, and polygon entities
Cell size is very important because it determines how
entities are displayed (i.e., smaller cells mean more cells per entity
and a more precise shape).
Each cell stores an attribute value or a reference ID to a table of
attributes
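The effect of cell size on the number of cells can be worked out directly. The sketch below assumes a hypothetical 10 km by 10 km study area and square cells.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular area."""
    columns = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * columns


# Hypothetical 10 km x 10 km study area at three cell sizes (in metres).
for size in (100, 50, 25):
    print(size, cell_count(10_000, 10_000, size))
```

Each halving of the cell size quadruples the number of cells, which is why finer resolution quickly increases the volume of data to store.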
Raster Data Model
Raster data are ideal for continuous data such as air
temperatures, water pH, etc.
What happens when two categories occupy the same
cell?
Raster Spatial Databases
Single objects displayed by
shading individual cells
Linear features displayed by
shading a sinuous series of
connected cells
Polygon features displayed by
shading a group of connected
cells
Relief can be shown by
assigning a certain value to
each selected cell
Raster Data Models
Cells may be homogeneous (each cell contains a single
feature) or heterogeneous (one cell contains several features)
Heterogeneity may be resolved by
• Simply looking for the presence or absence of features
• Looking at the cell center to determine placement of index code
• Dominant area analysis
• Transition cells
• Percentages
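A minimal sketch of four of these rules applied to a single mixed cell. The cell description (area shares per class and the class at the center) is hypothetical, and transition cells are not covered.

```python
# Hypothetical description of one mixed cell: the share of its area covered
# by each land-cover class, plus the class found at the cell center.
cell = {
    "fractions": {"water": 0.25, "forest": 0.60, "road": 0.15},
    "class_at_center": "water",
}


def presence(cell, feature):
    """Presence/absence: 1 if the feature occurs anywhere in the cell, else 0."""
    return 1 if cell["fractions"].get(feature, 0) > 0 else 0


def center_rule(cell):
    """Cell center: code the cell with whatever lies at its center point."""
    return cell["class_at_center"]


def dominant_area(cell):
    """Dominant area: code the cell with the class covering the most area."""
    return max(cell["fractions"], key=cell["fractions"].get)


def percentages(cell):
    """Percentages: keep every class together with its share of the cell."""
    return {name: round(share * 100) for name, share in cell["fractions"].items()}


print(presence(cell, "road"), center_rule(cell), dominant_area(cell))
print(percentages(cell))
```

The same cell is coded "water" by the center rule and "forest" by dominant area, so the choice of rule changes the resulting raster.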
Spatial Databases
Advantages of Raster Format
• Simple data structure
• Compatible with remotely sensed or scanned data
• Simple spatial analysis procedures
Spatial Databases
Disadvantages of Raster Format
• Requires large storage space
• Graphical output may be less pleasing (depending
on resolution)
• Projection transformations more difficult
• Difficult to maintain topology
Vector Spatial Databases
Vector data models arose in the early 1960s in
relation to the development of the hierarchical
attribute data structure
The first generation were simply lines with an
arbitrary start and ending point
Files would typically consist of a few long lines and
many short lines
Often referred to as cartographic spaghetti
Vector Data Models
Uses two-dimensional Cartesian coordinates to store the shape
of a spatial entity.
The point is the basic building block from which all other
spatial entities are constructed.
Lines and areas are constructed by connecting a series of
points (nodes and vertices)
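A minimal sketch of how a line and an area can be built from ordered points, and how length and area follow from the coordinates. The feature names and coordinates are hypothetical.

```python
import math


def line_length(points):
    """Length of a line stored as an ordered list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_area(points):
    """Area of a polygon from its boundary points (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


road = [(0, 0), (3, 4), (3, 10)]          # hypothetical line feature
field = [(0, 0), (4, 0), (4, 3), (0, 3)]  # hypothetical area feature
print(line_length(road))    # 5 + 6 = 11.0
print(polygon_area(field))  # 4 x 3 rectangle = 12.0
```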
Vector Spatial Databases
Advantages
• Requires less storage space
• Topology easily maintained
• Graphical output usually more pleasing
Vector Spatial Databases
Disadvantages
• More complex data structure
• Not compatible with remotely sensed data
• Spatial analysis operations more difficult
• Selecting appropriate number of points to display feature
Too few points would compromise shape or spatial properties (area,
perimeter, etc.)
Too many points means possible data duplication and increased
costs in terms of data storage
Advancing Toward Topology
The arc/node model developed as a “hierarchy” for spatial data
Based on the principle that each type of structure consists of
features built upon simpler features
• Coordinates make up points
• Connected points make lines
• Connected lines make polygons
Allows the user to differentiate between points, lines, and
polygons, but requires maintenance of links between features
Topologic Models
This new model allowed a shared line to be drawn only once
For example:
• Previously, if two polygons shared a side, that side had to be traced
twice, once for each polygon
• This allowed gaps or slivers to form between the two tracings
(topological error)
• The new system avoided the error because the single shared arc records
which polygon is on its left and which polygon is on its right
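A minimal sketch of the idea, assuming two hypothetical unit squares, A and B, that share one side. The shared side is stored once as arc 1, and each arc records the polygon on its left and on its right (None stands for the outside). The layout and names are illustrative, not a specific file format.

```python
# Point coordinates, stored once each.
points = {1: (1, 0), 2: (1, 1), 3: (0, 1), 4: (0, 0), 5: (2, 0), 6: (2, 1)}

# Each arc lists its points in order and names the polygon on either side
# when travelling from the first point to the last.
arcs = {
    1: {"points": [1, 2], "left": "A", "right": "B"},         # shared side
    2: {"points": [2, 3, 4, 1], "left": "A", "right": None},  # rest of A
    3: {"points": [1, 5, 6, 2], "left": "B", "right": None},  # rest of B
}


def arcs_of(polygon):
    """Arcs that bound a polygon, found from the left/right labels alone."""
    return sorted(
        arc_id for arc_id, arc in arcs.items()
        if polygon in (arc["left"], arc["right"])
    )


def neighbors(polygon):
    """Polygons on the other side of any arc that bounds this polygon."""
    found = set()
    for arc in arcs.values():
        sides = {arc["left"], arc["right"]}
        if polygon in sides:
            found |= sides - {polygon, None}
    return sorted(found)


print(arcs_of("A"), arcs_of("B"))
print(neighbors("A"))
```

Because both squares refer to the same arc 1, there is no second tracing of the shared side that could drift and leave a sliver, and adjacency is answered without comparing any coordinates.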
Topological Terms
Nodes
• Where a line begins, ends, or where two lines intersect
Vertices
• Where a line bends
Arcs
• Line segment between two nodes
Topology Example
[Figure: polygon A, arcs 1 and 2, and points 1 to 11]
Points File
1 x y
2 x y
3 x y
4 x y
5 x y
6 x y
7 x y
8 x y
9 x y
10 x y
11 x y
Arcs File
1 1,2,3,4,5,6,7
2 1,7,8,9,10,11
Files of arcs by polygons
A: 1, 2, Area, Attributes
Topology Example
Topology not attained!
Sliver
Topology is attained!
Summary of Data Models
Real World
Raster Windmills
0 0 1 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 0 0
1 0 0 0 0
0 = No Data
1 = Windmill
Vector Windmills
Summary of Data Models
Raster
• Every location given an object
Vector
• Every object is given a location
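The contrast can be sketched with a small grid. The 5 x 5 layout below is hypothetical, and using cell indices as vector coordinates is a simplification.

```python
# Hypothetical 5 x 5 raster: 0 = no data, 1 = windmill. Row 0 is the top row.
raster = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]

# Raster view: every location (cell) holds a value, windmill or not.
cells_stored = sum(len(row) for row in raster)

# Vector view: only the objects are stored, each with a location.
# Using (column, row) cell indices as coordinates is a simplification.
windmills = [
    (col, row)
    for row, values in enumerate(raster)
    for col, value in enumerate(values)
    if value == 1
]

print(cells_stored)  # 25 values stored for the raster
print(windmills)     # 3 coordinate pairs stored for the vector layer
```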
Data Conversion
Data can be transformed from one of these data
models to the other
You always lose some information when going
from one data format to the other
• Vectorization
• Rasterization
Rasterization
Topological features are lost
Positional accuracy decreases
Vector Format
Raster Format
Zygo, Lisa, Baylor University, Lecture Notes, 2002
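The loss of positional accuracy can be illustrated with a single point. The sketch assumes a hypothetical 10-unit cell size with the grid origin at (0, 0); once the point is rasterized, only its cell is known, so the cell center is the best available location.

```python
import math

CELL_SIZE = 10.0  # hypothetical cell size in map units


def to_cell(x, y):
    """Rasterize a point: return the (column, row) of the cell containing it."""
    return int(x // CELL_SIZE), int(y // CELL_SIZE)


def cell_center(col, row):
    """Best available location once only the cell index is known."""
    return (col + 0.5) * CELL_SIZE, (row + 0.5) * CELL_SIZE


original = (12.0, 38.0)  # hypothetical vector point
recovered = cell_center(*to_cell(*original))
error = math.dist(original, recovered)

print(to_cell(*original))  # (1, 3)
print(recovered)           # (15.0, 35.0)
print(round(error, 2))     # 4.24
```

The error can never exceed half the cell diagonal (about 7.07 units here), so a smaller cell size reduces the loss at the cost of more cells.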
Vectorization
Features look “jagged” or “pixelated” in the vector
representation
Topology is created
Raster Format
Vector Format
Zygo, Lisa, Baylor University, Lecture Notes, 2002
Vector Representations of Surfaces
A vector surface is modeled by creating a series of
irregularly placed points as vertices
Each of the vertices has an explicit topographic value
Any 3 points are connected to represent an area of the
same topography (triangle)
Triangulated irregular network (TIN): a vector data
model that uses Delaunay triangulation as a means of
explicitly storing surface information
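A minimal sketch of how an elevation can be read from one triangle of a TIN, assuming the surface inside a triangle is the flat plane through its three vertices. It does not build the Delaunay triangulation itself; the triangle and its elevations are hypothetical.

```python
def interpolate_z(triangle, x, y):
    """Elevation at (x, y) on the plane through three (x, y, z) vertices.

    Returns None when the point lies outside the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights: how strongly each vertex influences the point.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle with elevations in metres.
facet = [(0, 0, 100.0), (10, 0, 120.0), (0, 10, 140.0)]
print(interpolate_z(facet, 0, 0))    # 100.0 at the first vertex
print(interpolate_z(facet, 5, 5))    # 130.0 midway between the 2nd and 3rd
print(interpolate_z(facet, 20, 20))  # None, outside the triangle
```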
The topology of a TIN
TINs
Contain separate files for arcs and triangles
Became a popular way to show elevation, etc.
for visualization and engineering
Allowed for contouring, 3-D views, water flow
directions, etc.
Many CAD and GIS systems use TINs
Aronoff, 1993. Geographic Information Systems: A Management Perspective.
Ottawa : WDL Publications.
Digital Elevation Model
Examples of applications that
use the TIN data model
(A)Landslide risk map for Pisa, Italy
(Courtesy of Earth Science Department, University of
Siena, Italy)
(B) Yangtse River, China
(Courtesy of Human Settlements Research
Center, Tsinghua University, China)