DATA MODELS IN GIS
OUTLINE:
Overview of models
Data and levels of measurement
Raster and vector models
Conversion between models
Databases
DIGITAL INFORMATION
GIS requires that both data and maps be
represented as numbers
GIS places data into the computer’s memory in a
physical data structure (i.e. files and directories).
files can be written in binary or as ASCII text.
binary is faster to read and smaller, ASCII can be
read by humans and edited but uses more space.
in either case the data are:
sent through a “pipe” consisting of 0s and 1s
stored on devices that can store only 0s and 1s
processed as 0s and 1s
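The size trade-off between binary and ASCII files can be illustrated with a minimal sketch. The elevation values and the choice of 4-byte floats are assumptions made for the example, not part of any particular GIS format.

```python
import struct

# Hypothetical sample: eight elevation values in metres.
values = [1203.25, 1198.5, 1187.75, 1210.0, 1195.125, 1201.5, 1188.0, 1179.375]

# ASCII: human-readable and editable text, one value per line.
ascii_bytes = "\n".join(f"{v:.3f}" for v in values).encode("ascii")

# Binary: each value packed as a 4-byte little-endian float.
binary_bytes = struct.pack(f"<{len(values)}f", *values)

# Reading the binary form back requires knowing the packing format.
decoded = list(struct.unpack(f"<{len(values)}f", binary_bytes))

print(len(ascii_bytes), "bytes as ASCII text")
print(len(binary_bytes), "bytes as binary")
```

The text version can be opened in any editor, while the binary version is smaller but opaque without the format description.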
DATA
locational and attribute data in a GIS
attribute type: discrete vs continuous
discrete: presumed to occur at distinct locations
with empty locations having a value of zero for the
attribute in question
continuous: feature occurs throughout
geographical region; no locations are empty
DATA
Levels of Measurement:
four levels are commonly recognized – nominal,
ordinal, interval and ratio
each subsequent level includes all characteristics of
preceding levels
data available at higher levels can be reduced to
lower levels; opposite is not true
LEVELS OF MEASUREMENT
Nominal Scale
objects are classed into groups; groups possess
arbitrary labels (numbers/names)
e.g. religion, land use/cover
discrete variable
LEVELS OF MEASUREMENT
Ordinal Scale
categorization plus an ordering/ranking of data
e.g. country road, street, highway
can identify larger/smaller but cannot state the
size of the difference between values
the ranking K=5, L=3, M=1 is equivalent to K=500, L=300,
M=10
discrete variables
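A short sketch of why the two sets of K, L, M values carry the same ordinal information: once reduced to ranks they are identical, and the original magnitudes cannot be recovered from the ranks. The helper name `to_ranks` is hypothetical.

```python
def to_ranks(values):
    """Reduce numeric values to ordinal ranks (1 = largest)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

a = {"K": 5, "L": 3, "M": 1}
b = {"K": 500, "L": 300, "M": 10}

ranks_a = to_ranks(a)
ranks_b = to_ranks(b)

# Both data sets collapse to the same ordering; the spacing between values is lost.
print(ranks_a, ranks_b, ranks_a == ranks_b)
```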
LEVELS OF MEASUREMENT
Interval Scale
measurements arranged in rank and distance
between measurements is known
no “true” zero point
e.g. elevation/topographic lines,
temperature in °C
discrete or continuous
LEVELS OF MEASUREMENT
Ratio Scale
like interval scaling: both rank and separation are known,
but there is also a known, fixed starting point
e.g. temperature on the Kelvin scale; speed
continuous and discrete
DATA MODELS – REPRESENTING
DATA
1. Reality – total phenomena as they actually exist
2. Conceptual Data Model – describes and defines included
entities (how they will be represented)
3. Logical Data Model – logical organization of the database
elements
4. Physical Data Model or File Structure – how information
will be structured for access
DATA MODELS
logical data model is how data are organized for use
by the GIS.
GISs have traditionally used either raster or vector for
maps.
raster – based on pixels
vector – based on points, lines and polygons
while most GISs can handle both raster and vector data,
only one is used for the internal organization
of spatial data.
DATA MODELS
rasters and vectors can be flat files … if they are
simple
Raster-based line (flat file):
0000000000000000
0001100000100000
1010100001010000
1100100001010000
0000100010001000
0000100010000100
0001000100000010
0010000100000001
0111001000000001
0000111000000000
0000000000000000
Vector-based line (flat file, one coordinate pair per row):
4753456 623412
4753436 623424
4753462 623478
4753432 623482
4753405 623429
4753401 623508
4753462 623555
4753398 623634
RASTER DATA MODELS
basic unit is cells or pixels which are uniformly
spaced
each cell/pixel has spatial and spectral information.
e.g. digital elevation data and digital images
spatially exhaustive sampling of the area of interest
every cell has a value, even if it is “missing.”
cell has a resolution, given as the cell size in ground
units.
higher resolution, smaller cell dimensions
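As a minimal sketch of how a uniformly spaced grid ties cells to ground positions, the functions below convert between ground coordinates and row/column indices. The grid origin, the 30-unit cell size and the grid shape are hypothetical values chosen for the example.

```python
import math

# Hypothetical grid definition: upper-left corner, cell size in ground units, grid shape.
x_min, y_max = 623400.0, 4753500.0
cell_size = 30.0
n_rows, n_cols = 4, 5

def cell_of(x, y):
    """Return (row, col) of the cell containing ground point (x, y), or None if outside."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None

def cell_centre(row, col):
    """Return the ground coordinates of the centre of cell (row, col)."""
    x = x_min + (col + 0.5) * cell_size
    y = y_max - (row + 0.5) * cell_size
    return x, y

print(cell_of(623412.0, 4753456.0))
print(cell_centre(1, 0))
```

Halving `cell_size` would double the number of rows and columns needed to cover the same extent, which is the storage cost of higher resolution.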
RASTER DATA MODELS
[Figure: Generic structure for a grid, labelling the grid extent, rows, columns, a grid cell and the resolution.]
RASTER DATA MODELS
[Figure: Fining of Resolution]
CREATING RASTER DATA MODELS
creating raster is like laying a grid over a map
code each cell with a value representing attribute
every cell has a value, even if null or zero
(integers, ratios, etc.)
values for each cell are written into a file
spreadsheet, data base, word processor
imported into GIS so it can be reformatted
each pixel presumably has one value – in reality is
this correct? mixed pixel issue
RASTER AND MISSING DATA
GIS data layer as a grid with a large section of “missing data,” in this
case, the zeros in the ocean off of New York and New Jersey.
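MIXED PIXEL ISSUE
[Figure: the same mixed cells coded under three rules: water dominates, winner takes all, edges separate. Cells are coded W, G or E.]
MIXED PIXEL ISSUE
[Figure: four ways to assign a value to a cell containing both water and land: “largest share”, “central point”, “presence/absence” and “percent occurrence” (example values 35%, 70%, 80%, 100%).]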
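The four assignment rules named above can be sketched for a single mixed cell. The cell contents (35% water, 65% land, water at the central point) and the function names are assumptions made for illustration.

```python
# Hypothetical mixed cell: share of the cell area per class,
# plus the class found at the cell's central point.
cell = {"shares": {"water": 0.35, "land": 0.65}, "centre": "water"}

def largest_share(cell):
    """Assign the class that covers the most area."""
    return max(cell["shares"], key=cell["shares"].get)

def central_point(cell):
    """Assign the class found at the centre of the cell."""
    return cell["centre"]

def presence_absence(cell, target="water"):
    """Code 1 if the target class occurs anywhere in the cell, else 0."""
    return 1 if cell["shares"].get(target, 0.0) > 0 else 0

def percent_occurrence(cell, target="water"):
    """Store the percentage of the cell covered by the target class."""
    return round(100 * cell["shares"].get(target, 0.0))

print(largest_share(cell), central_point(cell),
      presence_absence(cell), percent_occurrence(cell))
```

The same cell ends up as land, water, 1 or 35 depending on the rule, which is why the coding rule should be recorded with the data.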
CREATING RASTER DATA MODELS
raster data visualized as map layers
map layer: data describing a single characteristic
for a location
multiple items of information require multiple
layers
creates problems – raster databases can become
enormous
each map layer has thousands of cells
RASTER DATA MODELS
Advantages
simple data structures
each cell can be owned by only one feature.
overlay and combination of maps and remotely sensed
images easy
simulation easy, because cells have the same size
and shape
technology is cheap
RASTER DATA MODELS
Advantages
some spatial analysis methods simple to perform
local: cell by cell calculations
focal: models cell value based on neighbours
zonal: models cell value based on geographical
areas
global: models cell value based on all cells
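The four classes of operation can be sketched on a tiny grid in plain Python. The grid values, the zone labels and the specific functions (doubling, 3x3 mean, zonal sum, overall mean) are assumptions chosen only to show which cells each class of operation reads.

```python
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
zones = [
    ["a", "a", "b"],
    ["a", "b", "b"],
    ["c", "c", "c"],
]
n_rows, n_cols = len(grid), len(grid[0])

# Local: each output cell depends only on the same input cell.
local = [[v * 2 for v in row] for row in grid]

# Focal: each output cell is the mean of its 3x3 neighbourhood (clipped at the edges).
def focal_mean(r, c):
    cells = [grid[i][j]
             for i in range(max(0, r - 1), min(n_rows, r + 2))
             for j in range(max(0, c - 1), min(n_cols, c + 2))]
    return sum(cells) / len(cells)

focal = [[focal_mean(r, c) for c in range(n_cols)] for r in range(n_rows)]

# Zonal: one summary value per geographical zone (here, the sum).
zonal = {}
for r in range(n_rows):
    for c in range(n_cols):
        zonal[zones[r][c]] = zonal.get(zones[r][c], 0) + grid[r][c]

# Global: a value computed from all cells.
global_mean = sum(sum(row) for row in grid) / (n_rows * n_cols)
```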
RASTER DATA MODELS
Disadvantages
large volumes of graphic data
using larger cells to reduce data volumes loses detail
poor at representing points, lines and areas; good at
surfaces
must often include redundant or missing data
network linkages are difficult to establish
projection transformations are time consuming
COMPRESSION TECHNIQUES
raster compression techniques used in GIS are run-length encoding and quad trees
Run-length Encoding – more efficient
values often occur in runs across several cells
form of spatial autocorrelation
e.g. array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 would be
entered as 3 0 2 1 2 0 3 1 2 0 3 1
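A minimal sketch of run-length encoding applied to the array above. It stores (run length, value) pairs in the same order as the slide; the function names are hypothetical.

```python
from itertools import groupby

def rle_encode(cells):
    """Return [(run_length, value), ...] for a sequence of cell values."""
    return [(len(list(run)), value) for value, run in groupby(cells)]

def rle_decode(pairs):
    """Expand (run_length, value) pairs back into the full sequence."""
    out = []
    for length, value in pairs:
        out.extend([value] * length)
    return out

row = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
encoded = rle_encode(row)
print(encoded)
```

The saving depends on how long the runs are; a row with no repeated neighbours would need more entries encoded than raw.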
RUN-LENGTH CODING
Legend:
A. Mixed Conifer
B. Douglas Fir
C. Oak Savannah
D. Grassland
Row-by-row coding (56 entries for 7x8 array):
CCCCCBBDCCCCBBDCCCBBBDDCBBAADDDDBAADDBBBAADDDAAAADDDAAAA
Run-length coding (22 pairs, or 44 entries, for 7x8 array):
5C 2B 1D 4C 2B 1D 3C 3B 2D 1C 2B 2A 4D
1B 2A 2D 3B 2A 3D 4A 3D 4A
COMPRESSION TECHNIQUES
Quadtree Compression
hierarchical data model using a variable-sized grid cell
finer subdivisions are used in areas requiring finer
detail (higher resolution)
pixel in each higher layer is derived from average or
majority of 4 pixels from the lower layer
not as efficient for more variable or complex data
used primarily as a way to store data for rapid retrieval
on display devices
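The variable-sized cell idea can be sketched with a region quadtree, a common variant that subdivides a block only when it is not uniform. This is an illustrative assumption: it does not implement the average/majority derivation of higher layers mentioned above, and it assumes a square grid whose side is a power of two, with children ordered NW, NE, SW, SE.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value if the block is uniform, else a list of four
    subtrees in the order NW, NE, SW, SE."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),
        build_quadtree(grid, row, col + half, half),
        build_quadtree(grid, row + half, col, half),
        build_quadtree(grid, row + half, col + half, half),
    ]

def count_leaves(node):
    """Count the stored leaf cells in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

grid = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Only the quadrant with mixed values is subdivided, so uniform areas are stored as single large cells. A highly variable grid would subdivide everywhere and save nothing.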
QUAD TREE STRUCTURE
RASTER DATA FORMAT
most raster formats are digital image formats.
most GISs accept TIF, GIF, JPEG or encapsulated
PostScript, which are not georeferenced.
DEMs are true raster data formats.
VECTOR DATA MODELS
think of world as a space populated by discrete
features of various shapes and kinds – points, lines,
areas.
any location in space may be empty or occupied by
one or more point, line or area.
VECTOR DATA MODELS
point
zero-dimensional abstraction of an object represented by a
single X,Y co-ordinate.
normally represents a geographic feature too small to be
displayed as a line or area
stored by their real (earth) coordinates
VECTOR DATA MODELS
line
set of ordered co-ordinates that represent the shape of
geographic features too narrow to be displayed as an area at
the given scale or linear features with no area
lines and areas are built from sequences of points in order.
lines have a direction to the ordering of the points.
VECTOR DATA MODELS
polygon
feature used to represent areas.
defined by the lines that make up its boundary and a point
inside its boundary for identification.
have attributes that describe the geographic feature they
represent.
VECTOR DATA MODELS
vector data structures evolved into the arc/node model in the 1960s.
an area consists of lines and a line consists of
points.
points, lines, and areas can each be stored in their own
files, with links between them.
endpoint of a line (arc) is called a node; arc junctions
are only at nodes.
stored with the arc is the topology (i.e. the connecting
arcs and left and right polygons).
TOPOLOGY
topological data structures dominate GIS software.
stored explicitly
allows automated error detection and elimination.
rarely are maps topologically clean when digitized or
imported.
GIS has to be able to build topology from unconnected
arcs.
TOPOLOGY
Arc/Node Map Data Structure with Files.
[Figure: polygon “A” bounded by arcs 1 and 2, with points numbered 1 to 13.]
Points File
1 x y
2 x y
3 x y
4 x y
5 x y
6 x y
7 x y
8 x y
9 x y
10 x y
11 x y
12 x y
13 x y
Arcs File
1 1,2,3,4,5,6,7
2 1,8,9,10,11,12,13,7
File of Arcs by Polygon
A: 1,2, Area, Attributes
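The three files can be sketched as linked dictionaries. The arc and polygon contents follow the files above; the point coordinates are hypothetical, since the slide gives only "x y" placeholders, and the helper names are assumptions.

```python
# Points file: id -> (x, y). Coordinates are made up for illustration.
points = {
    1: (0, 0), 2: (1, 2), 3: (2, 3), 4: (3, 3), 5: (4, 3), 6: (5, 2), 7: (6, 0),
    8: (1, -2), 9: (2, -3), 10: (3, -3), 11: (4, -3), 12: (5, -2), 13: (5.5, -1),
}
# Arcs file: arc id -> ordered point ids; the first and last are the nodes.
arcs = {
    1: [1, 2, 3, 4, 5, 6, 7],
    2: [1, 8, 9, 10, 11, 12, 13, 7],
}
# File of arcs by polygon.
polygons = {"A": [1, 2]}

def arc_nodes(arc_id):
    """Return the (start, end) nodes of an arc."""
    ids = arcs[arc_id]
    return ids[0], ids[-1]

def polygon_ring(poly_id):
    """Chain a polygon's arcs into a closed ring of point ids, reversing arcs as needed."""
    arc_ids = polygons[poly_id]
    ring = list(arcs[arc_ids[0]])
    for arc_id in arc_ids[1:]:
        ids = arcs[arc_id]
        if ids[0] != ring[-1]:
            ids = ids[::-1]
        if ids[0] != ring[-1]:
            raise ValueError("arcs do not connect at a node")
        ring.extend(ids[1:])
    return ring

def ring_area(ring):
    """Shoelace area of a closed ring, looked up through the points file."""
    xy = [points[i] for i in ring]
    total = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(xy, xy[1:]))
    return abs(total) / 2

ring_a = polygon_ring("A")
print(ring_a, ring_area(ring_a))
```

Note that checking whether the two arcs share nodes uses only the arcs file; the points file is needed only once geometry such as area is computed.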
TOPOLOGY
relationship between nodes, arcs and polygons.
topologically structured database for ease of retrieval
and implementation of spatial-relational operations.
advantages:
simple, elegant and efficient
relational database construction and analysis
complete topology makes map overlay feasible.
topology allows many GIS operations to be done
without accessing the point files.
VECTOR DATABASE CREATION
database creation involves several stages:
input of the spatial data
input of the attribute data
linking spatial and attribute data
spatial data is entered via digitized points and lines,
scanned and vectorized lines or directly from other
digital sources
once the spatial data has been entered, much work is
still needed before it can be used
VECTOR DATABASE CREATION
Building Topology
once points are entered and geometric lines are
created, topology must be "built"
this involves calculating and encoding relationships
between the points, lines and areas
this information may be automatically coded into tables
of information in the database
VECTOR DATABASE CREATION
Editing
during topology generation process, problems such as
overshoots, undershoots and spikes are either flagged
for editing by the user or corrected automatically
automatic editing involves the use of a tolerance value
which defines the width of a buffer zone around objects
within which adjacent objects should be joined
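A minimal sketch of tolerance-based automatic editing: line endpoints that fall within the tolerance of an earlier endpoint are moved onto it. Real GIS cleaning tools are more elaborate; the function name, the sample lines and the tolerance of 0.5 are assumptions for illustration.

```python
import math

def snap_endpoints(lines, tolerance):
    """Snap each line endpoint to the first previously seen endpoint within tolerance."""
    anchors = []
    snapped = []
    for line in lines:
        new_line = list(line)
        for idx in (0, -1):
            x, y = new_line[idx]
            for ax, ay in anchors:
                if math.hypot(x - ax, y - ay) <= tolerance:
                    new_line[idx] = (ax, ay)   # join to the existing endpoint
                    break
            else:
                anchors.append((x, y))         # no neighbour nearby: keep as a new node
        snapped.append(new_line)
    return snapped

# Three digitized lines meant to form a closed triangle, with small gaps at the corners.
lines = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.2, 0.1), (10.0, 8.0)],
    [(10.1, 7.9), (0.0, 0.05)],
]
cleaned = snap_endpoints(lines, tolerance=0.5)
```

A tolerance that is too large would also join endpoints that were meant to stay separate, which is why flagged errors are often left for the user to review.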
VECTOR DATA MODELS
Advantages
good representation of structures (points, lines,
polygons)
compact and more efficient
topology can be completely described
accurate graphics
retrieval, updating and generalization of graphics and
attributes possible
work well with pen and light-plotting devices and tablet
digitizers.
VECTOR DATA MODELS
Disadvantages
complex data structures
combination of several vector polygon maps or polygon
and raster maps through overlay creates difficulties
simulation is difficult
display and plotting can be expensive
technology is expensive
not good at continuous coverage or plotters that fill
areas.
TIN must be used to represent volumes.
VECTOR DATA FORMATS
vector formats are either page definition languages or
preserve ground coordinates.
page languages are HPGL, PostScript, and Autocad
DXF.
true vector GIS data formats include ArcView
Shapefiles and ArcInfo Interchange Files (E00); the
latter store topology.
VECTOR DATA MODELS
List of coordinates “spaghetti”
simple
easy to manage
no topology
lots of duplication, hence need for large storage space
very often used in CAC (computer assisted cartography)
VECTOR DATA MODELS
Vertex Dictionary
no duplication, but still this model does not use topology
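The difference between the spaghetti model and the vertex dictionary can be sketched with two adjacent rectangles that share an edge. The polygon names and coordinates are hypothetical.

```python
# Spaghetti: every polygon lists its own coordinates, so shared vertices are repeated.
spaghetti = {
    "P1": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "P2": [(4, 0), (8, 0), (8, 3), (4, 3)],
}

# Vertex dictionary: each coordinate is stored once; polygons refer to vertex ids.
vertices = {}     # id -> (x, y)
vertex_ids = {}   # (x, y) -> id
polygons = {}     # polygon name -> list of vertex ids
for name, ring in spaghetti.items():
    ids = []
    for xy in ring:
        if xy not in vertex_ids:
            vertex_ids[xy] = len(vertex_ids) + 1
            vertices[vertex_ids[xy]] = xy
        ids.append(vertex_ids[xy])
    polygons[name] = ids

stored_spaghetti = sum(len(ring) for ring in spaghetti.values())
stored_dictionary = len(vertices)
print(stored_spaghetti, "coordinate pairs vs", stored_dictionary)
```

Neither structure records that P1 and P2 are neighbours; that adjacency is what a topological model adds.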
VECTOR DATA MODELS
Dual Independent Map Encoding (DIME)
developed by US Bureau of the
Census
nodes (intersections of lines) are
identified with codes
assigns a directional code in the
form of a "from node" and a "to
node"
both street addresses and UTM
coordinates are explicitly defined
for each link
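A rough sketch of what a DIME-style link record might hold, based only on the properties listed above: node codes, a from/to direction, street addresses and UTM coordinates. The field names, the address-range layout and all sample values are assumptions, not the Census Bureau's actual file layout.

```python
from dataclasses import dataclass

@dataclass
class DimeSegment:
    street: str
    from_node: int       # node code at the start of the link
    to_node: int         # node code at the end of the link
    from_xy: tuple       # UTM (easting, northing) of the from node
    to_xy: tuple         # UTM (easting, northing) of the to node
    address_low: int     # lowest street number on the link (assumed layout)
    address_high: int    # highest street number on the link (assumed layout)

segments = [
    DimeSegment("Main St", 101, 102, (623412.0, 4753456.0), (623424.0, 4753436.0), 100, 198),
    DimeSegment("Main St", 102, 103, (623424.0, 4753436.0), (623478.0, 4753462.0), 200, 298),
]

def segment_for_address(street, number):
    """Find the link whose address range contains the street number."""
    for seg in segments:
        if seg.street == street and seg.address_low <= number <= seg.address_high:
            return seg
    return None

def is_chained(segs):
    """Check that consecutive links connect: each 'to node' is the next 'from node'."""
    return all(a.to_node == b.from_node for a, b in zip(segs, segs[1:]))

print(segment_for_address("Main St", 250))
```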
VECTOR TO RASTER EXCHANGE
data exchange by translation (export and import) can lead to
significant errors in attributes and in geometry.
efficient data exchange is important for the future of GIS.
ADVANCED DATA MODELS - TIN
triangulated irregular network is a set of elevation
points which have been connected to form a network
of triangles.
developed in early 1970s as a simple way to build a
surface
the sample points are connected by lines to form
triangles; within each triangle the surface is usually
represented by a plane
triangles fit together in a manner which simulates the
face of the land.
ADVANCED DATA MODELS - TIN
irregularly spaced sample points can be adapted to
the terrain
rough terrain - more points
smooth terrain - fewer points
an irregularly spaced sample is more efficient
ADVANCED DATA MODELS - TIN
TINs can be seen as
polygons having attributes of
slope, aspect and area,
three vertices having
elevation attributes
TIN models work best in
areas with sharp breaks in
slope
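Treating each TIN triangle as a plane, the triangle attributes mentioned above can be sketched as follows. The vertex coordinates are hypothetical, and aspect is computed here as the downslope azimuth in degrees clockwise from north, which is one common convention and an assumption of this example.

```python
import math

def plane_through(p1, p2, p3):
    """Return (a, b, c) so that z = a*x + b*y + c passes through three (x, y, z) vertices."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("vertices are collinear")
    a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    c = z1 - a * x1 - b * y1
    return a, b, c

def triangle_attributes(p1, p2, p3):
    """Return planimetric area, slope (degrees) and aspect (degrees) of one TIN facet."""
    a, b, _ = plane_through(p1, p2, p3)
    area = abs((p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2
    slope = math.degrees(math.atan(math.hypot(a, b)))
    aspect = math.degrees(math.atan2(-a, -b)) % 360   # direction of steepest descent
    return area, slope, aspect

# Hypothetical triangle: elevation rises 10 units over 10 units towards the east.
tri = ((0, 0, 100), (10, 0, 110), (0, 10, 100))
a, b, c = plane_through(*tri)
z_inside = a * 2 + b * 3 + c          # interpolated elevation at (2, 3)
area, slope, aspect = triangle_attributes(*tri)
```

For this triangle the surface faces west, since elevation increases eastward.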
ADVANCED DATA MODELS - TIN
Advantages
ability to describe the surface at different levels of
resolution
efficiency in storing data
allows simple calculation of basin areas, slopes,
channels, and many other geometric parameters
Disadvantages
in many cases require visual inspection and manual
control of the network
DATABASES
a spatial database is a collection of spatially
referenced data that acts as a model of reality
these selected phenomena are deemed important
enough to represent in digital form
the digital representation might be for some past,
present or future time period
DIGITAL DATABASES
scaleless- data can be stored at the level of detail
found in the environment
cartographer is responsible for choosing the content
and resolution
scale critical factor:
level of resolution set by field instruments
digitizing - resolution of instrument and
abstraction and production factors
DIGITAL DATABASES
problems when using data sets of different resolutions
e.g. roads may not line up
resolved using ancillary source materials
additional problems when using data sets of different
themes
e.g. combining elevation and drainage data – water
running uphill or non-level lakes
DIGITAL DATABASES
Value of databases:
Cost of creation – cheaper to get data from an
existing database
Appropriateness of use
Lack of alternative data sources
Graphic output
METADATA
“data about the data”
could include data elements that: identify the
data, identify the custodians and access conditions
to the data, describe projection, content, quality of
data
describes the action taken when handling
databases of varying scale
Dataset information
Title: Ortofotos'95
Abstract: Ortofotos'95 is a collection of ortho-rectified aerial photographs. These aerial photographs cover Portugal and were obtained in August 1995 in false color infra red film at scale 1:40 000. CNIG, The Directorate General of Forests and The Paper Mill industry are the owners of the aerial photographs (in paper format).
Type of dataset: Airborne data>Aerial photos
Locations: Portugal
Temporal Range: 1995-
Dataset scales: 1:25 000-1:50 000
Dataset resolution: 1 - 3 meters
Dataset quality remarks: Acquisition of data: aerial photographs, the film is scanned at very high resolution and ortho-rectified using DTM derived from topographic cartography at scale 1:25 000
Information creation date: 1999-10-29
DATABASES
pre-1970s, command line based with read and write
to hard disk, tapes, diskettes
database approach – all reading and writing through
simple interface (no need to care about tapes, etc.)
small GIS projects sufficient to store geographic
information as simple files.
with large data volumes and number of data users
best to use a database management system (DBMS)
relational design has been the most useful (since
1980s)
DATABASE MANAGEMENT
SYSTEMS
contain tables or feature classes in which:
rows: entities, records, observations, features
all information about one occurrence of a feature
columns: attributes, fields, data elements, variables
one type of information for all features
key field is an attribute whose values uniquely identify
each row
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
(each row is an entity; each column is an attribute; Parcel # is the key field)
DATABASES - RDBM
tables are related or joined using a common record
identifier (column variable) present in both tables
Example:
goal: produce map of values by district/neighbourhood
problem: no district code available in parcel table
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
DATABASES - RDBM
solution: join parcel table containing values with
geography table containing location codings, using
Block as key field
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
(Block is the secondary or foreign key)
Geography Table
Block   District   Tract   City
1       A          101     Dallas
2       B          101     Dallas
4       B          105     Dallas
12      E          202     Garland
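The join can be sketched with Python's built-in sqlite3 module, using the rows from the two tables above. The table and column names (`parcel`, `geography`, `value_usd`, and so on) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geography (
        block    INTEGER PRIMARY KEY,   -- key field of the geography table
        district TEXT,
        tract    INTEGER,
        city     TEXT
    );
    CREATE TABLE parcel (
        parcel_no INTEGER PRIMARY KEY,  -- key field of the parcel table
        address   TEXT,
        block     INTEGER REFERENCES geography(block),  -- foreign key
        value_usd INTEGER
    );
""")
conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?)", [
    (1, "A", 101, "Dallas"), (2, "B", 101, "Dallas"),
    (4, "B", 105, "Dallas"), (12, "E", 202, "Garland"),
])
conn.executemany("INSERT INTO parcel VALUES (?, ?, ?, ?)", [
    (8, "501 N Hi", 1, 105450), (9, "590 N Hi", 2, 89780),
    (36, "1001 W. Main", 4, 101500), (75, "1175 W. 1st", 12, 98000),
])

# Join on Block, then total the parcel values per district.
rows = conn.execute("""
    SELECT g.district, SUM(p.value_usd) AS total_value
    FROM parcel AS p
    JOIN geography AS g ON p.block = g.block
    GROUP BY g.district
    ORDER BY g.district
""").fetchall()
print(rows)
```

The district totals can then be attached to district polygons for mapping, without ever storing a district code in the parcel table.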
DATABASES - RDBM
[Figure: Relational linkages between spatial attributes (water right locations) and descriptive attributes.]
DATABASES
Advantage
very flexible
export data to another system easily
enables simple operations
e.g. search for records satisfying some condition
Description                         Thickness    Code
New Ice                             <10 cm       1
Nilas, Ice Rind                     0-10 cm      2
Young Ice                           10-30 cm     3
Grey Ice                            10-15 cm     4
Grey-White Ice                      15-30 cm     5
First-Year Ice                      30-200 cm    6
Thin First-Year Ice                 30-70 cm     7
Thin First-Year Ice, first stage    30-50 cm     8
Thin First-Year Ice, second stage   50-70 cm     9
Medium First-Year Ice               70-120 cm    1.
Thick First-Year Ice                120-200 cm   4.
Old Ice                                          7.
Second-Year Ice                                  8.
Multi-Year Ice                                   9.