Transcript dbdecisions

Database Design
and Decisions
GISC 6383 Management and Implementation of GIS
Note: some figures in this document are adapted from:
State of New York GIS Development Guides
ftp://ftp.sara.nysed.gov/pub/gis/sara.zip
chapter 3: conceptual design
chapter 6: database planning and design
chapter 7: database construction
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
1
Reality
Object
Oriented
Analysis
and Design
(OOAD)
Data Base Design Process: steps
(Review from earlier lecture.)
Conceptual
1. Conceptually Model USERS
view
ESRI Steps for
Building Geodatabase
1. Model the user’s
View of the data
(Tomlinson Chapter 6,7)
2. Logically Model the Data
Database
Schema in
UML
Diagram
(using Visio)
Define data entities and their
relationships
– Identify representation of entities
and match to spatial data model
(Tomlinson Chapter 8, & 9 thru p.136)
3. Physically Model the Data
Geodatabase
Generation
and
Population
3/28/2016
2. Define objects and
relationships
–
Create physical database design for selected
software (Oracle, ArcGIS, etc)
4. Design Process for obtaining and
converting Data from source
Physical
Ron Briggs, UT-Dallas
3. Select geographic
representation.
[
4. Match to
geodatabase elements
5. Organize
geodatabase structure
(Zeiler, p. 18)
GISC 6383 GIS Management and Implementation
2
• We covered Conceptual Modeling/Needs
Assessment in the lecture on Implementation
Steps
• Tonight we focus on Logical Design and
Physical Design
• These are the guts of database design/data
modeling
Loosely corresponds to Tomlinson’s
Chapter 8: Create a data design
and Chapter 9 thru p. 136
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
3
Objectives of Database Design:
– Satisfies and supports organization’s objectives
– Contains all data but no redundant data
• Minimizes redundancy across the organization
– Allows for different users to access same data
• Consistent and flexible data retrieval and analysis
– Accommodates different views of the same data
• Based on user and application needs
• Increases likelihood of users developing applications
– Appropriately represents & organizes geographic features
– Maintains the data so its currency is assured
– Secures data by distinguishing applications (& users) which
• create data (add records for new entities)
• update data (maintain & modify existing data records)
• read data (use but can’t modify in any way)
• delete data (remove records from database)
(CRUD)
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
4
Logical Data Modeling
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
5
Why build a model?
• We build models of systems in order to break them up into
well-defined subsystems, because it helps us to overcome
the difficulties in comprehending such systems in their
entirety.
• As the complexity of systems increase, so does the difficulty
of comprehending them and the importance of good
modeling techniques to assist us in managing them.
• Good models with well-defined semantics are essential for
communication among project teams and to assure
architectural soundness.
In other words, developing a model for an industrial-strength
software system prior to its construction or renovation is
as essential as having a blueprint for a large building.
Luis X. B. Mourão http://cplus.about.com
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
6
Logical Data Modeling
– It is an entirely data-driven
process
– It encourages comprehensive
understanding of business
information requirements
– It enables effective
communication among
designers, developers, and
users throughout the design
process
– It forms the basis for
designing correct, consistent,
sharable, and flexible
databases using any database
technology
3/28/2016
Ron Briggs, UT-Dallas
• Correct
– an accurate and faithful
representation of the way
information is used in the business
• Consistent
– no contradictions in the way
information objects are named,
defined, related, & documented
• Sharable
– accessible by multiple applications
and users to meet varying access
requirements
• Flexible
– easily updateable when
implementation or business
changes
GISC 6383 GIS Management and Implementation
7
Logical Data Modeling in practice
• purpose of data model is to ensure that data has been
identified in a completely rigorous and unambiguous
fashion on which both user and GIS analyst agree
• logical data models define entities (the unit about
which we collect information—e.g. people,
companies), the attributes of entities (the information
collected—age, salary), and the relationships between
entities (companies pay salaries to people)
• developed through use of
– entity-relationship diagrams which show relationships
among all data throughout the organization
– data dictionaries (structured lists) which document each
entity, its attributes, and its relationships
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
8
Entity Relationship (E-R) Diagrams and Logical Data Modeling
• Entity
(noun)
– objects or things to be included in the database
– person, place thing or concept about which you wish to record info.
– for example: employee, company, citizen, street
• Attribute
(adjective) (but they are often nouns—e.g. a name!!)
– characteristics or measurements to be recorded for the entities
– fact or nondecomposable piece of information describing an entity
– for example: age, dob, owner, street type
• Relationships
association between entities
employee---works for----company
transformers ----are mounted on----poles
• Cardinality of Relationships
(verb)
company--------has-------employees
land parcel---has----owner
(“adverb”)
– one to one
Country ----has--- Capital city (one capital city per country)
– one to many
Company (one) <----Work for---- ---has------> (many) Employees
– many to many
Parcels < ----have----> owner s (parcels >1 owner; owners>1 parcel)
• Business Rules (attribute domains and validation rules)
– requirements that attributes or relationships must meet
– specifications which preserve the integrity of the logical data model by governing the values
attributes may assume or the cardinality relationships may take on
– For example: # of kids is an integer between 0 and 24;
poles have 0 to 3 transformers
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
9
Name
Name
Age
Conventional
E-R Diagram
Employee
Works in
Department
Sex
Function
Size
Job Title
Has
Name
Age
Occupant_Name
Dependents
Unit_Number
Relationshlip
to employer
GIS
E-R Diagram
Building_Name
Height
Owner_Name
Building
Located on
Parcel
Owner_Address
Situs_Address
Floor_area
ID #
Has
Occupant_Name
Occupant
3/28/2016
Ron Briggs, UT-Dallas
Unit_Number
GISC 6383 GIS Management and Implementation
10
Entity or Attribute?
Perennial problem in modeling
Parcel
PIN
Owner
This?
Parcel
Owner
PIN
Name
owner as example
Some rules for attributes
• Primitives: no meaning in itself
– e.g 50
• All values of same kind
Or this?
• Do you think of it as a number or text
– e.g all integers: 4, 6 not 4, SIX
>>>attribute
• Describe entity not another attribute
• Does it have attributes of its own
– e.g. owner name describes owner
>>>entity
– Owner has name, billing address,
of parcel, not the parcel
• Does it have a relationship with other • Never a list:
e.g. owner: John Smith, Bill Jones
things >>>entity
– “Owner receives tax bills”
• Never repeated
• Does it repeat elsewhere >>>entity
e.g. owner1, owner2, owner3
– Owner could own many parcels
• Undecomposable or composite?
• Is it important by itself >>>entity
e.g. 117 West Plano Road
– An owner IS important
If in doubt, make it an entity, Can change
later in physical model
3/28/2016
Ron Briggs, UT-Dallas
Exceptions are always made!!!
e.g. in 911 when speed matters
GISC 6383 GIS Management and Implementation
11
…...data dictionaries (structured lists) document
each entity, its attributes, and its relationships
Entity Definition
Entity Name:
Building
Definition:
A unique structure with roof
Unique identifier:
Bldg_Id
Attributes:
Bldg_Id, Bldg_Name, Floor_Area,
Height
Relationships:
located on parcel
has occupant
Entities which have spatial expression
need additional conceptualization…
…see following slides
3/28/2016
Ron Briggs, UT-Dallas
When using UML techniques to
create Entity-Relationship
diagrams, entities, attributes and
relationships can all be documented
within the diagram.
Data dictionaries become part of the
metadata.
Recording measurement units
(meters, feet) for attributes is
critical and problemattic. Can
include in:
--name: length_in_feet (clumsy)
--metadata (metadata gets lost,
forgotten, or separated from data)
GISC 6383 GIS Management and Implementation
12
Spatial Data Modeling
Spatial data differs in two key ways, and these
must be incorporated:
– entities have a corresponding spatial expression.
In ESRI terms,
• Objects: entities without spatial expression (e.g. owner)
• Features: entities with spatial expression (e.g. parcel)
• An entity, when given spatial properties, becomes a
GIS spatial data layer
– relationships may have a spatial expression also
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
13
…in essence, you need to decide how to
represent each entity spatially
Note that:
•
•
•
•
•
Point or a point symbol
Line or line types
Areas or polygons
Surfaces or surface drapes
Raster format such as
scanned paper documents
• Images such as photos,
satellite or clip art
• or plain old Alpha or
numeric
3/28/2016
Ron Briggs, UT-Dallas
• one entity could be visualized
through another
– footprint on a lot map
• might be different at different scales
– airport a point at one scale and
polygon at another
• might be different for two
applications
– street as line for routing
– street as polygon for pavement
management
• attributes might be displayed
graphically as annotation
GISC 6383 GIS Management and Implementation
14
…..identifying entities’ spatial representation
(examples)
• Property associated
–
–
–
–
–
–
–
–
Legal parcel
Assessor parcel
Parcel boundary
plat map
Parcel photograph
Owner
Address
Land value
• Street associated
–
–
–
–
–
–
–
Street
Street segment
Intersection
Traffic light
Traffic analysis zone
Bus route
Bus stop
3/28/2016
Ron Briggs, UT-Dallas
–
–
–
–
–
–
–
–
polygon
polygon
line string
raster
pixels
image
alphanumeric
abcdef123
alphanumeric
numeric
110210.67
–
–
–
–
–
–
–
line string
line segment
node
point
polygon
route
point
in essence, you
need to decide
how to
represent each
entity spatially
(And these will need to
be supported by the
GIS software you
select)
GISC 6383 GIS Management and Implementation
15
Additionally, you will need to identify the classic
spatial data properties for each layer
• Scale ranges at which data is required to be
displayed
• minimum resolution required to support intended
applications
• minimum accuracy required to support intended
applications
• Projection(s) in which data will be
– stored and
– used
(may not be the same)
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
16
…identifying spatial relationships, and how they will be implemented
Spatial Relationships
Spatial
Relationship
Descriptive
Verbs
Common GIS
E-R Model
Implementation Symbol
Connectivity
Connect, link
Topology
Contiguity
Adjacent,
abutt
Topology
street segments link to
street network
cities common border
Containment
Contained,
containing,
within
Spatial join
operation
lot within floodplain
Proximity
Closest,
nearest
Spatial join
operation
house nearest fire_hydrant
Coincidence
Coincident,
Coterminous
Spatial join.
operation
Valve and gauge same
manhole
Possible ER symbol (not standardized)
How GIS system might implement the relationship
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
17
CENSUS
BLOCK
ZONING
CONTAINS
POLYGON
G
T
POLYGON
POLYGON
SOILS
G
POLYGON
G
T
POLYGON
G
G
T
COINCIDENT
LINE
T
STREET SYSTEM
T
FLOODPLAIN
CONTAINS
T
CENSUS
TRACT
CONTAINS
POLYGON
G
TRAFFIC
ZONE
CONTAINS
PARCEL
ADJACENT
POLYGON
G
POLYGON
T
G
INTERSECTION
ABUTTING
STREET
SEGMENT
POLYGON
STREET
T SEGMENT
LINE
G
LINK
T
G
T
INTERSECT
NODE
G
T
WETLANDS
CONTAINS
POLYGON
G
T
WITHIN
VALVE
Contains:
--15 entities
--16 relations
Attributes not
included for
simplicity
NODE
BUILDING
WATERMAIN
POLYGON
G
LINE
G
G
T
NEAREST
ADDRESS
LINK
T
HYDRANT
NODE
HAS
OCCUPANT
G
T
LINK
HAS
WATER
SERVICE
CONNECTION
LINE
G
T
Corrected from original source
Example E-R Diagram with Spatial Concepts for Urban Application
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
18
E-R Diagram
Final E-R diagram should be verified with users for:
• required entities
• appropriate spatial representation of entities
• required attributes
• appropriate spatial relations/operations
Once verified, the E-R diagram becomes the basis for
the physical data base design.
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
19
Creating E-R diagrams: Unified Modeling Language
• E-R diagrams can be drawn by hand or with any drawing package
• CASE (computer aided software engineering) tools emerged in
the 1980s and early 90s) to aid in the process (e.g. TI’s
Composer)
• Each used their own proprietary language and symbolization
• UML (Unified Modeling Language) developed in mid/late 1990s
to provide standardized modeling language based on objectoriented concepts.
– Initiated at Rational Software Corporation in 1994/95 by merging of Grady
Booch’s (Booch Model), Jim Rumbaugh’s (OMT--Object Modeling Technique) and
Ivar Jacobson’s (OOSE--Object-Oriented Software Engineering) method
• Existence of UML standard allows data base vendors to support
automated conversion of conceptual data models to physical data
base designs
• ArcGISs support use of MS Visio2000/02/03 (enterprise edition
[2000], professional [02/03] for full support) [see Zeiler, p. 19-20]
– Sample geodatabase schemas (templates) available for different industries
• Other UML based products: Rational Rose, Paradigm Plus,
Oracle2000, ERwin (from Computer Associates)
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
20
Physical Data Modeling
• Creating physical database design for the
selected database software
– Oracle, SQL Server, ArcGIS, Intergraph, etc
• We will focus on physical database design for
an ESRI Geodatabase
Loosely corresponds to Tomlinson’s Chapter 8:
Create a Data Design
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
21
…physical data base design involves
For entities and attributes
•
•
Representing (“mapping”)* all entities and their attributes into one or more relational
tables in a selected RDBMS and determining keys for forming relationships
For each spatial entity, selecting an appropriate
– GIS data type e.g. polygon, line, point, surface
– GIS data set format for storage: e.g. geodatabase, coverage, grid, tin, image, etc
– Spatial Reference System
• coordinate system (geographic or projected, etc.)
• Spatial domain
• Precision (or resolution)
•
Selecting an appropriate Data Type for each field (attribute)
– e.g. for ESRI: string, short integer, long integer, float, double, blob.
For relationships (associations)
•
for regular (non-spatial) relationships, identifying which of the RDBMS’s normal
query structures or relational operators will handle the relationship
– if won’t do it, develop specs for a custom application
– See Appendix for discussion of relational operators
•
for spatial relations, identify which capabilities of software will handle the desired
operation (e.g. nearest neighbor identification)
– if won’t do it, develop specs for a custom application
*Note: Computer people talk about “mapping” entities to tables rather than
“representing”. Mapping in this sense does not mean producing cartographic output!
Conceptual
E/R diagram
PARCEL
BUILDING
CONTAINS
POLYGON
POLYGON
Physical
Database design
LAYER
(Primary key=foreign key)
BUILDING
LAYER
TABLE
BUILDING ID # (primary key)
PARCEL ID # (foreign key)
PARCEL
TABLE
PARCEL ID #
(one)
table join
(many)
Example of “mapping” a relationship for RDBMS attribute tables
--the relation contains is mapped to a table join
--the building table has a foreign or secondary key, (parcel ID#) to
permit a join with the primary key (also parcel ID#) in parcel table
The spatial data structure is shown as two generic layers
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
23
spatial expression
represented as an ArcInfo
polygon coverage
Example of
Spatial
Database
“Mapping”
for ArcInfo
coverage
Parcel
POLYGON
G
T
Attribute List of Entity "Parcel"
Parcel [subdivision_block_lot#,
owner_name, owner_address
situs_address, area, depth,
front_footage, assessed_value,
last_sale_date, last_sale_price
(owner_name, owner_address,
assessed_value as of Jan. 1 for
last two years)]
Attributes
represented in
Oracle Tables
ARC/INFO Spatial
Database Structure
(coverage)
Parcel
INFO
ARC
Parcel
AAT
TIC
BND
ETC.
PAT=Polygon
Attribute Table
PAT
Area
Perimeter
Poly ID #
Sub_bl_lot#
Parcel ID#
Owner_name
Owner_add
Situs_add
Depth
Front_footage
Assessed_value
Last_sale_date
Last_sale_price
Previous Values
Parcel ID#
Year
Owner_name
Owner_address
Assessed_value
What is wrong
with this design?
A key field for parcels is formed by
concatenating subdivision, block & lot #s
VALVE
WATER MAIN
LINK
Spatial:
geodatabase
HYDRANT
Feature data set
Feature class
Feature class
Example
of Database Mapping:
for geodatabase
WATER MAIN ID #
Attribute: SQL-Server TABLES
WATER MAIN
WATER MAIN ID #
VALVE
VALVE ID #
VALVE ID #
HYDRANT ID #
Design decisions:
• why distribmains and transmains as
separate feature classes?
• why not valve with gate and hydrant
subclasses? (--different attributes)
• why prodwell1 and prodwell2?
HYDRANT
HYDRANT ID #
ESRI-related
DB Design Decisions
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
26
ESRI DB Design Decision Overview
• Type for spatial data layer
– Vector (point, line or polygon), raster, tin ?
• Format for spatial data
– Coverage, shapefile or geodatabase ?
Geodatabase Design Decisions
• Feature Datasets
– Stand-alone feature classes or feature data sets (fds)?
– Spatial Reference system for each feature data set
• Coordinate system: lat/long or projected? which projection? what parameters?
• Spatial Domain (extent) and Precision ?
• Feature Classes (tables)
– Subtypes or separate feature classes?
• roads feature class with road_type subtype, or separate freeway, arterial, street feature
classes?
• Attributes Types and Validation
– Type (string, long integer, short integer, etc.) ?
– Validation Rule through application of Attribute Domain ?
• Domains and Defaults
• Relationships and Associations between feature classes
– Implement Geometric networks and/or topology rules?
– Implement relates or joins in the database or in ArcMap documents?
• Type of Geodatabase
– Personal geodatabase or SDE-based geodatabase ?
Spatial Data Types
Overall, ArcGIS 9 supports at least four representations of geographic data.
• Vector data for representing features
– CAD, Coverages, shapefiles, geodatabases
• Raster data for images, and surfaces.
– Image data in .bmp, .tiff, .jpeg, .sid, ERDAS formats
– Raster data in discrete or continuous GRIDS (ESRI’s native file format for raster)
• Discrete grids can have an attribute data table, continuous do not
– Raster data in a Geodatabase (as of ArcGIS 9)
• Triangulated Irregular Networks (TINS) for surfaces.
– Although TINS are a vector format, as of ArcGIS 9.1, they are not yet supported by
the Personal Geodatabase and must be stored in coverage workspaces or SDE
geodatabase
• Tabular data (sometimes called Event tables).
– List of X,Y coordinates for points (such as may be output from GPS)
– “Locators” for finding a geographic position from an address.
A decision must be made as to the spatial data type for each layer.
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
28
Spatial File Formats--example
Personal Geodatabase
In a gdb, feature
class can have
Feature data set
only one feature
Feature class (feature type = polygon)
type.
Feature class (feature type = arc)
Coverage (= feature class)
A coverage can
Feature type (arc)
have multiple
feature typesFeature type (point)
now viewed as a
Feature type (polygon)
shortcoming.
Feature type (point)
Coverage (= feature class)
Feature type (arc)
Feature type (point)
Tracts feature class table
Locator (table)
Raster
Shapefile
Shapefile
Features
(rows)
3/28/2016
Ron Briggs, UT-Dallas
Feature ID
Feature
(key field)
type
GISC 6383 GIS Management and Implementation
Secondary or
Foreign key
29
Geodatabase Design Decisions—example
Texas geodatabase
Dallas County feature data set
feature classes
All feature classes
extent
within a feature data
datum
set must be in the
projection
same spatial
reference system
Plano feature data set
feature class
feature class
Stand-alone feature classes
Each stand-alone
feature class may
be in a different
spatial reference
system
US Geodatabase
Stand-alone feature class
Rasters and TINs can be stored within a SDE
geodatabase but not a personal geodatabase
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
30
Anatomy of a Geodatabase
Geodatabase
Feature datasets
Spatial Reference
Object classes and subtypes
Feature Classes and subtypes
Relationship classes
Geometric Network
Topology
Domains
Validation Rules
Raster Datasets (SDE Only)
rasters
TIN datasets (SDE only )
nodes, edges, faces
Locators
addresses
x,y locations
Zip codes place names
route locations
3/28/2016
Ron Briggs, UT-Dallas
Geodatabases contain: feature datasets, raster
datasets, TIN datasets (planned 9.2),
locators
Feature datasets contain various objects which
all share a common spatial reference
Objects (e.g. Jane Blow, land owner) are
instances of object classes (e.g. land
owners) and have no spatial form.
Features, stored in feature classes, are spatial
objects (e.g. land parcels) which are similar
and have same spatial form (e.g. polygon)
Object (or feature) classes are tables, with
objects (or features) in the rows of the table
Attributes are in the columns of the table
Subtypes are an alternative to multiple object
(or feature) classes (e.g. ‘concrete’,
‘asphalt’, ‘gravel’ road subtypes): think of
subtype as the most significant
classification variable (attribute) in the class
table
GISC 6383 GIS Management and Implementation
31
Anatomy of a Geodatabase contd
Geodatabase
Feature datasets
Spatial Reference
Object classes and subtypes
Feature Classes and subtypes
Relationship classes
Geometric Network
Topology
Domains
Validation Rules
Raster Datasets (SDE only)
rasters
TIN datasets (SDE omly)
nodes, edges, faces
Locators
addresses
x,y locations
Zip codes
place names
route locations
3/28/2016
Ron Briggs, UT-Dallas
Relationship classes are tables containing general
relationships between objects and/or features
(e.g. between work order object class and roads
feature class)
Geometric Networks models flows thru linear
systems such as streams, sewers, raods
Topology models relationships among lines and
areas (e.g. common state/county boundary)
Domains are sets of valid and/or default attribute
values: (e.g. road lane count default is 2; valid
values are integers 1-12 )
Validation rules control feature and attribute
integrity by applying domains. 3 types:
attribute rules (applied to attributes or
subtypes e.g gravel road 1 or 2 lanes only);
connectivity rules (e.g. gravel road cannot
connect to freeway); and relationship rules
(constrains cardinality of a relationship e.g.
gravel road can have no more than 4 segments
at an intersection)
Simple behaviors are realized thru domains and
validation rules
Complex behaviors and custom objects are realized
by extending rules with custom programming32
GISC 6383 GIS Management and Implementation
Feature classes (FC), feature datasets (fds) and subtypes
• Spatial features (e.g. a land parcel) are grouped into feature classes: a table with
spatial data
– Data in FC must have same topology type (all points, all lines, all polygons)
• Water feature class with lakes (polygon) and streams (line) not permitted
– Minimizing the number of feature classes improves performance
• Use different feature classes only when attributes are significantly different
– Use roads feature class rather than freeway, arterial, streets feature classes
– Use subtype to differentiate freeway, arterials, streets (all have similar attributes)
• Subtypes are “subclasses” within a feature class that allow you to further
distinguish objects without creating new feature classes
– based on a single column’s values (must be integer or long integer)
– Same subtype has similar attribute values and behaviors
– Use where attributes are the same across all subtypes
• Feature classes can be grouped into feature datasets (fds) or “spatial folders”
– All feature classes in a fds must have the same spatial reference system, but may
have different topology (can have points and lines and polygons in same fds)
– Organize by thematic similarity e.g transportation
– If you wish to create a geometric network, must be in same fds
– If you wish to create topology, must be in same fds
– If they share geometry (street forms political boundary), should be in same fds
– Security (read/write permissions, etc..) applied at the fds not the fc level!!!!
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
33
Data Types for Attributes
• For every attribute field, must select a data type
• Each RDBMS stores data slightly differently
• ESRI generic data types will translate into closest RDBMS equivalent
• Values given below may differ with RDBMS used
ESRI Generic Data Types
String: text field. Be sure its length (number of characters), absolute or what you
specify, is sufficient to record longest data value.
Short Integer: (or integer) whole numbers (no decimal point) generally
+/-32,767 (2 bytes). OK for size of family, not OK for city size
Long Integer: (or long) only supports integers to +/- 2,147,483,647 (4 bytes)
Float: (or single) single precision floating point; again, be careful-- supports
decimal point but perhaps only 6 digits long with decimal moveable 34 places
(E34) (4 bytes)
Double: double precision floating point; the safest-- supports 12-15 digits with
decimal moveable up to 308 places (E308) (8 bytes)
Blob: binary long decimal for special programming applications
Note terminology:
• Precision: the total number of digits (before plus after decimal)
• Scale: number of digits after decimal
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
34
Domains and Defaults
Why Use Them?
• Data Integrity: prevents entry of invalid (“obviously wrong”) data values
• Data Efficiency: choose from a set of valid values rather than type in each time
Domains define a set of legal values for a field’s attributes
• Range domain: specifies a valid range of values for numerical attributes
– A water pipe must be between 1 and 100 inches wide
• Coded value domain: specifies a valid set of values for an attributes. Can apply
to any type of attributes
– Parcels can only have RES or VAC land use values
• Domains are defined as a geodatabase property & then applied as appropriate
– Multiple objects in the same database may use the same domain
– May be applied to an entire field (attribute), or separately by subtype
Defaults are values automatically assigned when a feature is created
– Of course, may be changed during data entry/edit process
– Again, may be applied to an entire field (attribute), or separately by subtype
Again, the physical design process requires decisions about domains
and defaults, and to what they should be applied.
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
35
Implementing Associations and Relationships
Associations (general term) may be implemented as
• A relationship class within the GDB.
–
–
–
–
–
•
both classes (tables) in the relationship (i.e. the related tables) must be within the GDB
permanent part of the data model and relationship rules are enforced
Is itself a class, stored in tables, with properties and behaviors
support 1:1 and 1:M cardinality, and many to many using a key class table.
Strictly speaking, everything else is an “association”
An ArcMap relate
–
–
–
Functionally similar to a relationship class
supports both 1:1 and 1:M cardinality
Can form a relate to tables outside the GDB
•
•
–
•
Local to the ArcMap document therefore is essentially temporary
An ArcMap join
–
–
–
Supports 1:1 cardinality only
Links matching objects and visualizes them as rows in a single table
Can match objects either
•
•
•
INFO tables, Access tables, dbf tables
Tables in other databases via ODBC (object data base connectivity)
--Use relationship class
when the referential
integrity of data is
important
--Joins are simpler and
require less overhead.
Non-spatially using key attribute fields in each table: exists only temporarily
Spatially using containment, nearness, etc. criteria:
saved as new feature class or new shapefile
An ArcMap hyperlink
– Attribute in a table stores a hyperlink to a document outside GDB
• path name to file-based documents (spreadsheets, text, photos, video or sound clips, etc.)
• URLs for Internet documents
– Not part of the GDB in any way; implemented via layer properties in ArcMAP
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
36
Some Special Case Relationships
• Many to Many relationships
– Many class A objects match many class B (parcels & owners)
– Implemented via an “attributed relationship”
– Intermediate table is created to store the relationship
Ownership
Parcel
LegalArea Zoning
1.5
RES
0.75
COM
0.75
RES
0.18
RES
0.18
AGR
PIN
1
2
3
4
5
Origin PK
Origin FK
PIN
1
2
3
3
3
4
4
5
Percent
100
50
50
25
75
75
25
100
Owner
OWN
1
1
2
3
4
1
3
3
Own
1
2
3
4
Name
Mary
Joe
Raj
Pedro
Destination PK
Destination FK
• Aggregation v. Composition and Simple v. Complex Relationships
– Aggregation: e.g. dog has bowl, collar
valve has valve box
• Aggregation implemented through simple relationships
• Peer to peer: delete one, the other remains: Dog dies but bowl and collar remain
– Composition: e.g. dog has feet, tail
valve has maintenance records
• Composition implemented through complex relationships
• Enforced dependency: delete one, and the other goes also: Dog dies, feet and tail gone
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
37
Spatial Reference
All feature classes within a feature dataset must have the same spatial reference.
• Coordinate System
– Datum
– Geographic (lat/long) or projected?
– Projection parameters: central meridian, standard parallels, coordinate system origin
(false easting and northing)
– Measurement (map) units: dd (for lat/long); feet, meters, etc. (for projected)
• Spatial domain
– The allowable coordinate range for the geographic coordinates
• X/Y Domain: MinX, MaxX, MinY, MaxY (horizontal extent)
– Domain defaults to 3 times the the actual data extent (100% on either side)
• Z Domain: Min, Max (vertical extent)
• M Domain: Min, Max (other parameter, e.g. distance from river mouth ) (can differ within
feature data set)
– Once created, the spatial domain for feature dataset/class cannot be changed.
– Data outside domain will require a new feature dataset or standalone feature class.
• Precision
– Number of system storage units (SU) per one map measurement unit (MU)
• If precision is 1 and mu= 1 meter ( 1 SU per MU), cannot record values less than 1 meter
• If precision is 100 and mu= 1 meter (100 SUs per MU), can record values
to 1/100 = .01 = 1 cm
38
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
Precision and Spatial Domain
Wilson NC data
793707
X values
Precision is: 15,624
Since map units are feet, will
support accuracy to 15624/12=
1,302th of an inch!!!
Domain range in x
Domain range in y
137,438
47734
52555
745972
41970
704002
2249613
2302168 Extent range in x
Extent range
in y
52555
32328
2334496
2387052
WilsonCity.shp
Shapefile Extent
(range of actual data
when fds created)
Geodatabase Spatial Domain
47734
100%-200% wider on all sides therefore
Domain range is at least 3 times Extent range
656268
Y values
You have 10 significant digits to work
with. Precision in essence controls
were you put the decimal.If map unit is
meters and precision is 1000, you
record down to the nearest millimeter.
E.g. the map value 1,123,456.1236 is
stored as 1,123,456,124
With GRS80, world circumference is
40,075,016 m. therefore can map
world at approx. 1.9 cm accuracy
(40,075,016*100)/ 2,147,483,648
(Exact amount depends on Precision)
--Geodatabase coordinates are stored as 4byte long integer. This provides
10 significant digits with max value of 2,147,483,648
-- map value is multiplied by precision when stored (and converted back when displayed), so
min Y values, for example, actually stored as 793707x15624=1,2400,888,268
-- the values are also shifted when stored so that data is centered in storage space so
you only have to ensure that max. range times precision is less than 2,147,483,648
137438x15624=2,147,331,300 < 2,147,483,648 OK, otherwise reduce domain or precision
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
39
Personal versus MultiUser (ArcSDE) Geodatabase
• Personal Geodatabase.
– Implemented as a Microsoft Access database (*.mdb file) by using MS Jet engine
which is installed along with ArcGIS8.
– Microsoft Access license not needed, but its handy to have for attribute data
development
– Can be placed on local or network drives
– if on network drive and a user has edit access, other users can’t access (single user
editing).
– Intended for personal or small work-group use
– can handle small to moderately sized datasets.
• Max of 250,000 features per feature class (table)
• maximum size is 2.0 GB
– In general, has the full functionality of ArcSDE geodatabase except
• versioning.
• Multi-user editing
– If a personal ggdb is deleted in ArcCatalog (or by Windows Explorer) , its gone.
• One .mdb file can contain a lot of data. Be carefull!!
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
40
Personal versus MultiUser (ArcSDE) Geodatabase
• Enterprise Multiuser ArcSDE Geodatabase
– ArcSDE is a data access extension to ArcGIS 8 & 9 that serves geodatabases to
ArcGIS applications running on PC’s connected via a TCP/IP network.
– supports concurrent editing by multiple users.
• Supports versioning where multiple users can concurrently edit different versions of a
layer and any conflicts are resolved when versions are saved back to the original layer
– significantly higher speeds for data access than shapefiles or personal geodatabases
– Supports very large databases without the need to tile or otherwise ‘subdivide’ the
data
– ArcCatalog only creates and deletes connections to ArcSDE geodatabases, it can’t
delete the database
– Can be deployed on UNIX or Windows NT.
• Many use UNIX platform for ArcSDE and DBMS, and XP for GIS applications
– ArcSDE is centrally tuned and managed by a DBA.
– Back-up and security procedures implemented in the DBMS apply to the GIS data.
– Can build SQL applications to access tables in a remote geodatabase.
– Requires a server with a DBMS (e.g Oracle, SQL Server) and ArcSDE.
GIS
User
3/28/2016
Ron Briggs, UT-Dallas
server
SDE
db
GISC 6383 GIS Management and Implementation
41
Generating Geodatabase Schema
•
“Schema” is the definition of objects contained within a database
– For a geodatabse, objects may include Domains, Tables, FeatureClasses, Relationships or
GeometricNetworks.
•
Four solutions for schema
generation/management in ArcGIS 9
1.
2.
3.
4.
Manual creation using ArcCatalog, or
Design from scratch in Microsoft
Visio, output to XMI* Repository, and
use ArcCatalog's Case Tool to import
from XMI
As above, but begin with one of ESRI
existing Sample Data Models**, and
edit
GeodatabaseDesigner extension (free
ArcScript downloadable from
www.esri.com) which can be used in
conjunction with first 2 or, in some
cases, as an alternative.
**ESRI has sample data models for a variety of areas:
*XMI is a new standard for storing object
--hydro, parcel, transportation, utilities, etc..
models. Supported by Visio 2002 and later
--download from Web site
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
42
Pros and Cons
• ArcCatalog Menu and Wizards
– Simple and easy to use; documented within ArcCatalog
– Easy to introduce gross errors in the schema through either omission or addition
• Visio(UML) and the ArcCatalog Case Tool
– UML (Unified Modeling Language) is a standard for modeling object-oriented
objects and their properties, thus can be applied to other databases
– object inheritance significantly reduces duplication
– Uses Visio's strong graphical functionality for an easy way to visualize the design.
– However,
• need to buy it
• More complex with steep initial learning curve
• only supports a subset of geodatabase properties
• Geodatabase Designer (GD)
–
–
–
–
Fast and free to use and distribute (although not officially supported)
Supports all geodatabase properties and all ArcSDE RDBMS's.
The only bi-directional solution : schema can be EXPORTED and IMPORTED
However,
• A proprietary solution which only works with ESRI geodatabase
• Only displays in html text, but can use Geodatabase Diagrammer (another free extension)
to display in Visio
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
43
Database Normalization
• An important step in the physical design is database
normalization
• Developing a table structure which:
– Reduces or eliminates redundancy
– Makes tables easy to manage
– Simplifies changes in the future
• There is an entire theory of database normalization
– we don’t have time to go into it
• Just present an example
– The usual goal is to create a table structure which is in 3rd
normal form (3NF)
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
44
Unnormalized (flat file)
Parcel_ID Parcel_ad Block Precinct Councillor City
8
501 Sadowski
1
1001
Smith
Big
9
590 Sadowski
2
1002
Jones
Big
36
1001 Adnan
4
1002
Jones
Big
75
1175 Dadley
12
1004
Hassan Little
Mayor Own1_name
Own1_ad
Own2_name
Own2_ad
Value
Green Sadowski. M 501 Sadowski
105,450
Green
Adams, K
590 Sadowski
Adams, M
590 Sadowski 89,780
Green Sadowski, M 501 Sadowski
101,500
White
Kroeger
592 Tierney
Bertrand. K 1097 Bertrand 98,000
3rd Normal form:
--all fields are determined by primary key field
OWNER TABLE
key fields
Owner_ID
Owner_name Owner_ad
001
Sadowski. M 501 Sadowski
002
Adams, K
590 Sadowski
004
Kroeger
592 Tierney
003
Adams, M
590 Sadowski
005
Bertrand. K 1097 Bertrand
PARCEL TABLE
Parcel_ID
street_no
street_name Block
Owner_ID
8
501
Sadowski
1
001
9
590
Sadowski
2
002
36
1001
Adnan
4
001
75
1175
Dadlexz
12
004
9
590
Sadowski
2
003
75
1175
Dadlexz
12
005
COUNCILOR TABLE
MAYOR TABLE
Precinct
Councillor
City
City
1001
Smith
Big
Big
1002
Jones
Big
Little
1004
Hassan
Little
Value
105,450
89,780
101,500
98,000
89,780
98,000
Mayor
Green
White
See:
Appendix II
for more detail
Data Importing Vs. Data Loading
• Importing
– Creates new features within a new feature class or
geodatabase table.
• The features class or table cannot exist before importing
– Database schema is imported at the same time
– Often involves conversion from other formats e.g. coverages
• Loading
– Appends features into an existing feature class.
– Existing feature class must have the same schema as the data
sources
– Can be accomplished with:
• Simple Data Loader (ArcCatalog)
• Object Loader Wizard (ArcMap)
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
46
Conclusion
• The outcome of these steps is:
– A rigorous design for our database:
a “database schema”
– The design of a process for obtaining the data elements that will
populate our database schema
• Identifying a data source and the necessary processing sequence for
each layer
• covered in Implementation Steps lecture
Next time, we will go into the lab and look at some of this in
practice.
This will involve many ESRI-specific design decisions as outlined in:
dbdecisions.ppt
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
47
Appendix I
DBMS Relational Operators
“for regular (non-spatial) relationships (in the ER Diagram
or UML model), physical database design involves
identifying which of the RDBMS’s normal query structures or
relational operators will handle the relationship”
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
48
•
RDBMS: Relational Operators
Select (or Restrict)
–
•
retrieves a subset of rows from a table based on value(s) in a column or columns
Project
–
•
retrieves a subset of columns from a table, removing duplicates from the result
Product
–
produces the set of all rows that are the concatenation of a row from one relational table with a row from
another relational table
(usually an intermediate step; not useful otherwise)
–
•
Join
–
–
•
Union
–
•
results in rows common to two (or more) relational tables
Difference
–
•
vertically combines (stacks) rows of one table with rows in the same or a different table
Intersection
–
•
horizontally combines (contatenates) rows in one table with rows in another (or the same) table,
including only rows which meet some selection criteria relating columns of the two tables
Combines product and select
results in rows that appear in one table but not another
Division
–
results in common values in one table for which there are other matching column values corresponding
to every row in another table
Examples follow in the next three slides……
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
49
Base Tables
STORE TABLE
Store_Name
Mr Chip
Bean, Inc.
Mrs Mousse's
Mr Mousse's
Sancho's
Diet-Crème
Freeze-it
Location
New York
New York
New Jersy
New Jersy
California
Colorado
Alaska
SUPPLIER TABLE
Supplier_Name Flavor
Mr Chip
vanilla
Mr Chip
Chocolate
Mrs Chip
avovado
Mrs Chip
date-nut
Diet-Crème
cottage chees
Diet-Crème
skim milk
SELECT (RESTRICT) of Store Table
on Location = New Jersey
Store_Name
Location
Mrs Mousse's
New Jersy
Mr Mousse's
New Jersy
PROJECT of Store table
on Location column
Location
New York
New Jersey
California
Colorado
Alaska
JOIN Store Table and Supplier Table
by vendor name
(same as Select from Product when
supplier_name=store_name
Mr Chip
vanilla
Mr Chip
New York
Mr Chip
Chocolate Mr Chip
New York
Diet-crème
cottage chees
Diet-Crème Colorado
(note: this is an equi-join; there are other types)
3/28/2016
Ron Briggs, UT-Dallas
PRODUCT of STORE and SUPPLIER Tables
Supplier
Supplier
Store
Store
supplier_name flavor
Name
Location
Mr Chip
vanilla
Mr Chip
New York
Mr Chip
vanilla
Bean, Inc.
New York
Mr Chip
vanilla
Mrs Mousse'sNew Jersy
Mr Chip
vanilla
Mr Mousse's New Jersy
Mr Chip
vanilla
Sancho's
California
Mr Chip
vanilla
Diet-Crème Colorado
Mr Chip
vanilla
Freeze-it
Alaska
Mr Chip
Chocolate
Mr Chip
New York
Mr Chip
Chocolate
Bean, Inc.
New York
Mr Chip
Chocolate
Mrs Mousse'sNew Jersy
Mr Chip
Chocolate
Mr Mousse's New Jersy
Mr Chip
Chocolate
Sancho's
California
Mr Chip
Chocolate
Diet-Crème Colorado
Mr Chip
Chocolate
Freeze-it
Alaska
Mrs Chip
Avocado
Mr Chip
New York
Mrs Chip
Avocado
Bean, Inc.
New York
Mrs Chip
Avocado
Mrs Mousse'sNew Jersy
Mrs Chip
Avocado
Mr Mousse's New Jersy
Mrs Chip
Avocado
Sancho's
California
Mrs Chip
Avocado
Diet-Crème Colorado
Mrs Chip
Avocado
Freeze-it
Alaska
Mrs Chip
date-nut
Mr Chip
New York
Mrs Chip
date-nut
Bean, Inc.
New York
Mrs Chip
date-nut
Mrs Mousse'sNew Jersy
Mrs Chip
date-nut
Mr Mousse's New Jersy
Mrs Chip
date-nut
Sancho's
California
Mrs Chip
date-nut
Diet-Crème Colorado
Mrs Chip
date-nut
Freeze-it
Alaska
Diet-crème
cottage chees
Mr Chip
New York
Diet-crème
cottage chees
Bean, Inc.
New York
Diet-crème
cottage chees
Mrs Mousse'sNew Jersy
Diet-crème
cottage chees
Mr Mousse's New Jersy
Diet-crème
cottage chees
Sancho's
California
Diet-crème
cottage chees
Diet-Crème Colorado
Diet-crème
cottage chees
Freeze-it
Alaska
Diet-crème
skim milk
Mr Chip
New York
Diet-crème
skim milk
Bean, Inc.
New York
Diet-crème
skim milk
Mrs Mousse'sNew Jersy
Diet-crème
skim milk
Mr Mousse's New Jersy
Diet-crème
skim milk
Sancho's
California
Diet-crème
skim milk
Diet-Crème Colorado
Diet-crème
skim milk
Freeze-it
GISC 6383 GIS Management and Implementation
50
Alaska
STORE TABLE
Store_Name Location
Mr Chip
New York
Bean, Inc.
New York
Mrs Mousse'sNew Jersy
Mr Mousse's New Jersy
Sancho's
California
Diet-Crème Colorado
Freeze-it
Alaska
Base Tables
EUROPE_STORE Table
Store_Name Location
Sir Chip
London
Monsieur Chip Paris
Senor Chip
Madrid
UNION of STORE
and EUROPE_STORE Tables
Store_Name Location Status
Mr Chip
New York null
Bean, Inc.
New York null
Mrs Mousse's New Jersynull
Mr Mousse's
New Jersynull
Sancho's
California null
Diet-Crème
Colorado null
Freeze-it
Alaska
null
Sir Chip
London open
Monsieur Chip Paris
closed
Senor Chip
Madrid
open
(duplicates removed, if any)
3/28/2016
Ron Briggs, UT-Dallas
Status
open
closed
open
MY_FAVORITES Table
Store_Name Location
Comrade Chip Moscow
Sir Chip
London
Herr Chip
Berlin
INTERSECTION of MY_FAVORITES
and EUROPE_STORE Tables
Store_Name Location Status
Sir Chip
London open
DIFFERENCE of MY_FAVORITES
and EUROPE_STORE
Store_Name Location
Comrade Chip Moscow
Herr Chip
Berlin
DIFFERENCE of EUROPE_STORE
and MY_FAVORITES
Store_Name Location Status
Monsieur Chip Paris
closed
Senor Chip
Madrid
open
GISC 6383 GIS Management and Implementation
51
Terri’s job function is to check if employees of Big City X have taken
required courses in the city’s employee training program. The process is to
compare courses taken with required courses. Terri can be replaced by a
RDBMS Division relational operator!
Base Tables
Required Course Table
Course#
C_Name
mis101
data
mis201
program
mis301
networks
Completed Course Table
Student ID# S_Name
10
Fred
10
Fred
10
Fred
10
Fred
30
Karen
20
John
30
Karen
30
Karen
30
Karen
DIVISION of Completed_Course Table
by Required_Course Table
Student ID# S_Name
10 Fred
30 Karen
3/28/2016
Ron Briggs, UT-Dallas
Course#
mis101
gis101
mis201
mis301
mis101
gis201
gis101
mis301
mis201
C_Name
data
gis
program
networks
data
gps
gis
networks
program
The division
operator identifies
Fred and Karen
GISC 6383 GIS Management and Implementation
52
Appendix II
Database Normalization Detail
•Developing a table structure which:
–Reduces or eliminates redundancy
–Makes tables easy to mange
–Simplifies changes in the future
•Our goal is normally to have all tables in
third normal form (3NF)
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
53
Unnormalized Data (Flat File)
Unnormalized form (flat file)
Parcel_ID Parcel_ad Block Precinct Councillor City
8
501 Sadowski
1
1001
Smith
Big
9
590 Sadowski
2
1002
Jones
Big
36
1001 Adnan
4
1002
Jones
Big
75
1175 Dadley
12
1004
Hassan Little
Not smallest meaningful value.
How can you sort by street, then number?
Mayor Own1_name
Own1_ad
Own2_name
Own2_ad
Value
Green Sadowski. M 501 Sadowski
105,450
Green
Adams, K
590 Sadowski
Adams, M
590 Sadowski 89,780
Green Sadowski, M 501 Sadowski
101,500
White
Kroeger
592 Tierney
Bertrand. K 1097 Bertrand 98,000
repeating groups of fields.
What if there are 3 (or 25)
owners?
• You work for the county. In this particular state, the county records
land ownership, values property, and manages all elections held in the
county.
• Some of the information you need is shown in the flat file above
• This format has many problems, some of which are pointed to above
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
54
First Normal Form (1NF)
First Normal Form (1NF):
•Each field contains smallest meaningful value
•Parcel_ad split into two variables (street_no & street_name), thus can sort on
street, then number
•Owner_ad left as complex attribute ‘cos only used for mailing
•No repeating fields (owner1, owner2, etc..)
•There is now no limit on number of owners per parcel
Parcel_ID street_no street_name Block Precinct Councillor City Mayor Owner_ID Owner_name
Owner_ad
Value
8
501
Sadowski
1
1001
Smith
Big Green
001
Sadowski. M 501 Sadowski 105,450
9
590
Sadowski
2
1002
Jones
Big Green
002
Adams, K
590 Sadowski 89,780
36
1001
Adnan
4
1002
Jones
Big Green
001
Sadowski, M 501 Sadowski 101,500
75
1175
Dadlexz
12
1004
Hassan Little White
004
Kroeger
592 Tierney 98,000
9
590
Sadowski
2
1002
Jones
Big Green
003
Adams, M
590 Sadowski 89,780
75
1175
Dadlexz
12
1004
Hassan Little White
005
Bertrand. K
1097 Bertrand 98,000
However, problems in that:
•Must use multiple primary key fields (parcel_id and owner_id) to uniquely identify a record
•multiple repeating values when there are two (or more) owners: Street_no, street_name,
block, precinct,councillor, mayor, city, Owner_name, Owner_ad all have repeats. Wastes
space, and
•If an owner’s address changes, multiple records (rows) must be updated
•If a parcel is sold, and the owner does not own any other property (e.g. Kroeger,
Adams, M or Bertrand), information about that owner is lost
Second Normal Form (2NF):
concept of 2NF and problem with 1NF table
• Second Normal Form (2NF) requires that every non-key field (attribute) be
“functionally dependent” on the primary key
– Functional dependency is a relationship between attributes such that knowing
one attribute automatically determines the other
• Tables with multiple fields making up the primary key are not 2NF
– This usually shows up as repeating values in an attribute fields
– For example, owner_ID repeats, and knowing the owner does not determine the
councilor
•
knowing owner_ID (part of primary key) is 001, does not determine Councilor, which
could be Jones or Smith.
Parcel_ID street_no street_name Block Precinct Councillor City Mayor Owner_ID Owner_name
Owner_ad
Value
8
501
Sadowski
1
1001
Smith
Big Green
001
Sadowski. M 501 Sadowski 105,450
9
590
Sadowski
2
1002
Jones
Big Green
002
Adams, K
590 Sadowski 89,780
36
1001
Adnan
4
1002
Jones
Big Green
001
Sadowski, M 501 Sadowski 101,500
75
1175
Dadlexz
12
1004
Hassan
Little White
004
Kroeger
592 Tierney 98,000
9
590
Sadowski
2
1002
Jones
Big Green
003
Adams, M
590 Sadowski 89,780
75
1175
Dadlexz
12
1004
Hassan
Little White
005
Bertrand. K
1097 Bertrand 98,000
(1NF table)
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
56
Second Normal Form (2NF): example of 2NF form
PRECINCT TABLE
key field
Precinct
1001
1002
1004
Councilor City Mayor
Smith
Big Green
Jones
Big Green
Hassan Little White
OWNER TABLE
Owner_ID Owner_name Owner_ad
001
Sadowski. M 501 Sadowski
002
Adams, K
590 Sadowski
004
Kroeger
592 Tierney
003
Adams, M
590 Sadowski
005
Bertrand. K 1097 Bertrand
PARCEL TABLE
Parcel_ID
8
9
36
75
9
street_no
501
590
1001
1175
590
street_name Block
Sadowski
1
Sadowski
2
Adnan
4
Dadlexz
12
Sadowski
2
Precinct
1001
1002
1002
1004
1002
Owner_ID
001
002
001
004
003
Value
105,450
89,780
101,500
98,000
89,780
(2NF tables)
•
In each table, there is only one key field, and knowing its value determines the
value of all other attributes
– Satisfies criteria for 2NF
– Far fewer repeats and duplicate editing problems
•
Note that there are still shortcomings, for example
–
if the mayor of city “big” changes, we must update two records
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
57
Third Normal Form (3NF)
•
3NF requires that no non-key field be a fact about another non-key field
– This will be violated if there is transitive dependency in a table
•
Transitive functional dependency occurs when the value in a non-key field is determined
by a value in another non-key field
• The value for city determines mayor, (and neither of these is the key field)
•
In 3NF, fields can only be attributes of the primary key, and not of some other field
– Tables not in 3NF usually have repeating values in a non-key field (e,g mayor field in
PRECINCT table)
– Mayor ‘Green’ is a fact about city (a non-key field), not about precinct (the key field)
PRECINCT TABLE (2NF)
Precinct
1001
1002
1004
COUNCILOR TABLE (3NF)
Precinct
Councillor
1001
Smith
1002
Jones
1004
Hassan
City
Big
Big
Little
Councilor City Mayor
Smith
Big Green
Jones
Big Green
Hassan Little White
MAYOR TABLE (3NF)
City
Mayor
Big
Green
Little
White
Precinct Table (in 2NF) is split into Councilor and Mayor tables in 3NF
3/28/2016
Ron Briggs, UT-Dallas
GISC 6383 GIS Management and Implementation
58
4th and 5th Normal Form
•
•
•
Our goal is usually to have all tables in at least 3rd normal form
4th and 5th normal forms also exist, but these can add disadvantages (for
example, processing inefficiency) as well as advantages
For example, 5th Normal form has no duplicated data, but requires junction
tables to link data and form relationships
PRECINCT TABLE (5NF)
Precinct
Voter_count
1001
3245
1002
5600
1004
2001
1005
750
COUNCILOR TABLE (5NF)
Councilor
Age
Smith
27
Jones
85
Hassan
43
CITY TABLE (5NF)
City
Mayor Population
Big
Green 500,000
Little
White
5,000
Junction Tables
Precint
1001
1002
1004
1005
3/28/2016
Councilor
Smith
Jones
Hassan
Hassan
Ron Briggs, UT-Dallas
Councilor
Smith
Jones
Hassan
City
Big
Big
Little
GISC 6383 GIS Management and Implementation
59