Data Types and Classification

TERMS, CONCEPTS and DATA TYPES IN GIS
Orhan Gündüz
Data used in GIS are of two major types:
1. Shape (vector) data
2. Raster data
Shape data is further divided into three types:
1. Point data
2. Line/Polyline data
3. Polygon data
Point data:
• 0-D object
• A point is a combination of two numbers (X,Y)
• Represents well locations, crime scenes, cities…
Line/polyline data:
• 1-D object
• A line is the shortest path between two points; a polyline is a chain of connected line segments
• Has a beginning and an ending point
• Represents streams, boundaries, roads…
Polygon data:
• 2-D object
• A polygon is a set of points connected by line segments that close back to the first vertex
• Represents lakes, lots…
[Figure: POINT, LINE, POLYLINE and POLYGON examples with vertex labels (X,Y), (X1,Y1), (X2,Y2), … (Xn-1,Yn-1), (Xn,Yn); the left and right sides of the line and the inside and outside of the polygon are marked.]
* Always follow the counterclockwise direction when creating the polygon.
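The counterclockwise rule can be checked numerically. The following is a minimal sketch, assuming vertices are given as (X,Y) tuples without repeating the first vertex at the end; the function names and the sample polygon are hypothetical. It uses the shoelace formula, whose signed area is positive for counterclockwise rings and negative for clockwise ones.

```python
def signed_area(vertices):
    """Shoelace formula: positive for counterclockwise order, negative for clockwise."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_counterclockwise(vertices):
    """True when the ring is listed in counterclockwise order."""
    return signed_area(vertices) > 0


# Hypothetical rectangular lot, listed counterclockwise
lot = [(0, 0), (4, 0), (4, 3), (0, 3)]
lot_area = abs(signed_area(lot))          # area of the lot
lot_is_ccw = is_counterclockwise(lot)     # orientation check
```

Reversing the vertex list flips the sign of the result, so the same function can be used to detect and correct a polygon that was digitized clockwise.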
Node:
A special type of point where at least 3 line segments intersect
Defined by a pair of coordinates
Pixel:
Smallest indivisible element of an image (i.e., pixel in digital pictures)
Grid/Grid cell:
2-D object feature that represents a single element of a continuous surface
(used in raster data)
Symbol:
A graphic element that represents features or attributes on a map
Examples: Hospital, Airport
Annotation:
Text or label graphically pointing to a feature
Examples: ANKARA, Gediz River
GIS Operations
1. Forward data display (from data to map)
2. Backward data display (from map to data)
3. Point in polygon analysis
4. Line in polygon analysis
5. Polygon overlay
6. Buffers
7. Thematic mapping (data display and capture)
8. Area/Distance calculation operations
9. Geocoding/address matching
10. Network analysis
11. Surface modeling
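As an illustration of one operation in this list, point in polygon analysis can be sketched with the ray casting test: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as "inside". This is a simplified sketch, not how any particular GIS package implements it; the function name, the lake polygon and the well coordinates are hypothetical, and points lying exactly on the boundary are not handled specially.

```python
def point_in_polygon(point, polygon):
    """Ray casting: True when the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles the state
    return inside


# Hypothetical lake polygon and well locations
lake = [(0, 0), (4, 0), (4, 3), (0, 3)]
wells = {"W1": (2, 1), "W2": (5, 1)}
wells_in_lake = {name: point_in_polygon(p, lake) for name, p in wells.items()}
```

Here W1 falls inside the rectangle and W2 falls outside it, which is the kind of answer a point in polygon query returns for each point feature.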
Concept of metadata:
• “Data about data”
• An overall description of the contents of the database
• Documents the data
• Describes files, formats, locations, sources…
• Very important
Types of Computerized Systems Used in GIS
1. Standalone systems (single PC, local data storage and processing)
2. Networked systems (NT, local processing, centralized data storage, requires authorization)
3. Centralized systems (UNIX, centralized data storage/manipulation)
GIS
Vector-based GIS
• Objects stored as points, lines and polygons
• Data can be grouped
• All data have (X,Y) coordinates
• Thematic representation is possible
• Overlay operations are difficult
• Boundaries are easily defined
Raster-based GIS
• Objects stored as grids
• The higher the resolution, the better the data representation
• Poor in boundary definition
• Difficulty in defining vector-like objects (e.g., a road, a river, a fence)
• Best for overlay operations
• Powerful in modeling
Topology:
• The relationship between and among objects
• Topology is the branch of mathematics which concerns itself with the concepts of:
  - Direction
  - Connectivity
  - Adjacency or contiguity
  - Proximity
• This design feature allows the computer to know the actual relationship among its graphic parts
• Topological data structure is based on nodes and edges
• Commonly used in GIS operations
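A node-and-edge structure of this kind can be sketched as follows. This is only an illustration of the connectivity idea, assuming each edge is stored as (edge id, from-node, to-node); the edge list, node names and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges, each stored as (edge id, from-node, to-node)
edges = [("e1", "A", "B"), ("e2", "B", "C"), ("e3", "B", "D"), ("e4", "C", "D")]


def build_connectivity(edge_list):
    """Map each node to the set of nodes it is directly connected to."""
    neighbors = defaultdict(set)
    for _edge_id, start, end in edge_list:
        neighbors[start].add(end)
        neighbors[end].add(start)
    return neighbors


neighbors = build_connectivity(edges)
# Number of distinct neighboring nodes for each node
degree = {node: len(adjacent) for node, adjacent in neighbors.items()}
```

With the relationships stored explicitly, questions such as "which nodes are connected to B?" become a dictionary lookup instead of a geometric search, which is what makes topological structures convenient for operations like network analysis.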
DATA CLASSIFICATION METHODS IN GIS
Data in GIS can be classified according to the following methods:
1. Natural breaks
2. Quantiles
3. Equal area
4. Equal interval
5. Standard deviation
6. Continuous / discontinuous
7. Normalization
NATURAL BREAKS
• Data is listed from minimum to maximum
• A break is set wherever the data change abruptly
• Data in between breaks are grouped as a unit
• Statistical methods could also be used to set the breaks
• Variance minimization is an option
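The first three steps can be sketched in a simplified form: sort the data, find the largest jumps between consecutive values, and cut there. This is a gap-based sketch under that assumption, not a variance-minimizing method; the function name and sample values are hypothetical.

```python
def gap_breaks(values, n_classes):
    """Group sorted values by cutting at the (n_classes - 1) largest gaps."""
    data = sorted(values)
    # Gap between each pair of consecutive values, with the index of the lower value
    gaps = [(data[i + 1] - data[i], i) for i in range(len(data) - 1)]
    # Keep the largest gaps, then restore ascending order of the cut positions
    largest = sorted(gaps, reverse=True)[: n_classes - 1]
    cut_positions = sorted(i for _gap, i in largest)
    groups, start = [], 0
    for i in cut_positions:
        groups.append(data[start : i + 1])
        start = i + 1
    groups.append(data[start:])
    return groups


# Hypothetical data with two abrupt changes (5 to 20 and 23 to 60)
values = [22, 3, 60, 4, 23, 5, 61, 20]
groups = gap_breaks(values, 3)
```

For this sample the three groups are the low values (3 to 5), the middle values (20 to 23) and the high values (60 to 61). A variance-minimizing method would instead compare candidate break sets and keep the one with the smallest within-class variance.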
QUANTILES
• Data is broken into intervals with the same number of observations
• Mostly useful for linearly distributed data (values spread evenly over the range)
• Otherwise can be misleading
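A minimal sketch of quantile classification, assuming the simplest rule of splitting the sorted values into groups of equal count (the function name and sample values are hypothetical):

```python
def quantile_classes(values, n_classes):
    """Split sorted values into n_classes groups with (nearly) equal counts."""
    data = sorted(values)
    n = len(data)
    groups = []
    for k in range(n_classes):
        lower = k * n // n_classes
        upper = (k + 1) * n // n_classes
        groups.append(data[lower:upper])
    return groups


# Hypothetical data with one extreme value
values = [1, 2, 3, 4, 5, 6, 7, 8, 1000]
groups = quantile_classes(values, 3)
```

Each group holds three observations, but the last one places 1000 in the same class as 7 and 8. This is the kind of misleading result the method can give when the data are not evenly spread.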
EQUAL AREA
• Used to classify polygon data
• Data is divided so that each interval covers an equal total area
EQUAL INTERVAL
• The data range is divided into intervals of equal width
Ex: If the data range is (12…351), the total range is 339. Divided into 3 intervals, this corresponds to equal intervals of 113. Thus, we obtain:
12-125
125-238
238-351
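The arithmetic of this example can be reproduced with a short sketch. The function names are hypothetical, and assigning a boundary value such as 125 to the lower class is an assumption, since the intervals above share their end points.

```python
def equal_interval_breaks(minimum, maximum, n_classes):
    """Return the n_classes + 1 break values of equal-width intervals."""
    width = (maximum - minimum) / n_classes
    return [minimum + k * width for k in range(n_classes + 1)]


def classify(value, breaks):
    """Return the 1-based class number; boundary values go to the lower class (assumption)."""
    for k in range(1, len(breaks)):
        if value <= breaks[k]:
            return k
    return len(breaks) - 1  # values above the maximum fall in the last class


breaks = equal_interval_breaks(12, 351, 3)   # 12, 125, 238, 351
class_of_200 = classify(200, breaks)         # second interval (125-238)
```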
STANDARD DEVIATION
• The mean of the data set is computed
• Interval breaks are placed below and above the mean, at ¼, ½ or 1 standard deviation from the mean
• Suitable for presenting data that carries density information, such as population or traffic accidents
• Because data accumulate near the mean and disperse away from it according to the standard deviation, one can see the areas where data accumulate and where they disperse
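A minimal sketch of how such breaks could be computed, assuming the population standard deviation is used and that breaks are placed symmetrically around the mean; the function name, its parameters and the sample values are hypothetical.

```python
import statistics


def std_dev_breaks(values, step=1.0, n_steps=2):
    """Breaks at mean + k * step * sd for k = -n_steps .. n_steps.

    step is the fraction of a standard deviation between breaks (e.g. 0.25, 0.5 or 1).
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [mean + k * step * sd for k in range(-n_steps, n_steps + 1)]


# Hypothetical data set with mean 5 and population standard deviation 2
values = [2, 4, 4, 4, 5, 5, 7, 9]
breaks_full = std_dev_breaks(values, step=1.0, n_steps=2)   # breaks one sd apart
breaks_half = std_dev_breaks(values, step=0.5, n_steps=2)   # breaks half an sd apart
```

With a step of 1 the breaks fall one standard deviation apart on either side of the mean; a step of 0.5 or 0.25 gives finer classes near the mean, where most observations lie.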
Continuous / Discontinuous
• Continuous data: each class is defined by its upper limit only
• Discontinuous data: each class is defined by both an upper and a lower limit
NORMALIZATION
• Instead of the data value itself, a normalized version is used in the representation
• Normalization is generally done by dividing by the sum of all data or by the maximum value of the data
Normalized value (%) = data value / sum(data) * 100
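Both variants can be sketched as follows. The function names and the sample population counts are hypothetical; expressing the maximum-based variant as a percentage is an assumption made to match the sum-based formula.

```python
def normalize_by_sum(values):
    """Each value as a percentage of the total: value / sum(values) * 100."""
    total = sum(values)
    return [v / total * 100 for v in values]


def normalize_by_max(values):
    """Each value as a percentage of the largest value (assumed percentage form)."""
    peak = max(values)
    return [v / peak * 100 for v in values]


# Hypothetical population counts for three districts
population = [50, 150, 300]
shares_of_total = normalize_by_sum(population)   # roughly 10, 30 and 60 percent
shares_of_max = normalize_by_max(population)     # largest district maps to 100 percent
```

Normalizing by the sum shows each feature's share of the whole, while normalizing by the maximum ranks each feature against the largest one.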