Transcript Slide 1
Introduction to Geographic Information Systems
Spring 2013 (INF 385T-28437)
Data Modeling,
Database Design
You are here
Food is there
Well, now it’s there
INF385T(28437) – Spring 2013 – Lecture 11
3
Outline
The design process
Describing a system of viewpoints
Developing use cases
GIS design stages
Conceptual design
Normalization example
Database diagrams
Building the database
Prototype to production
INF385T(28437) – Spring 2013 – Lecture 11
4
Best practices:
iterative, incremental development
Ideas, Use Cases,
Requirements
Planning, Cost-Benefit,
Risk Management
Analysis, Design
and Evaluation
Maintenance,
Evaluation
Deployment
INF385T(28437) – Spring 2013 – Lecture 11
Development,
Quality Control
and Evaluation
Note: This is only an
illustration – your experience
may differ!
5
Best practices: iterative approach
Focus on current issues
Short iterations better reflect near-term internal
and external environment
Resolve misunderstandings early
Resolve analysis, design and
implementation disparities early
More accurate overall project status
Workload better distributed across life cycle
Medium- to long-term changes in environment
factored in more easily through spiral approach
INF385T(28437) – Spring 2013 – Lecture 11
6
Starting out
Define the project
Get to know the client
Stakeholders and decision makers
Build the design team
Analyst
Subject matter experts
Users
Inventory tasks, products, data
Identify model
Develop Use Cases
Inventory data
INF385T(28437) – Spring 2013 – Lecture 11
GIS Design Seminar
Teach client GIS concepts
and design methods
Introduce yourself and the
design team
Set realistic expectations
for the GIS system
Alleviate fears and
concerns
7
Building a System… of Viewpoints
Community Objectives
Business aspects: purpose,
scope and policies
What for? Why? Who? When?
Enterprise
Viewpoint
Abstract/Best Practices
Information
sources
Information
and
Viewpoint
models
What is it about?
Computational
Viewpoint
Types of services and
protocols
How does each bit
work?
Implementation/Development
Engineering
Viewpoint
Technology
Viewpoint
Solution types:
distribution infrastructure
How do the components work
together?
Implementation system:
hardware, software, distribution
With what?
Viewpoints in “Reference Model - Open Distributed Processing (RM-ODP)” ISO/IEC 10746
Use cases
A description of a task you want the system to
perform
Use case
Add new water service, record parcel sale
Notify Owners
Basis of all analysis and design
Start simple; expand with detail later
Analysis of use cases yields
data, interfaces, applications
Use cases can:
Capture existing work flows
Define new applications
Help understand alternative
and pathological work flows
Data
GIS Database
INF385T(28437) – Spring 2013 – Lecture 11
9
Use case diagrams
System boundary
Show the actor/use case
relationships
System architecture
Data flows, coordination
Develop with users
Manage flood
control
structures
Operations &
Maintenance
Staff
During meetings, interviews
Clean up and refine later
Graphical notation is helpful,
but the use case document
is the most important artifact
Geocode &
map call list
GIS Analyst
Use case
and name
INF385T(28437) – Spring 2013 – Lecture 11
Produce
flood maps
Document
locations of
flooding
Emergency
Call Center
Actor
and name
10
Example use case
Some context: For emergency response application,
assume a set of use cases focused on exchanging information
leading to creating, updating, and posting flood maps to
interested agencies.
Each use case is
documented according
to a template, such as:
Use Case Name
Description:
Actors:
Pre-conditions:
Post-conditions:
Flow of events:
- business rules
- user actions, responses
Exceptions:
Alternates:
INF385T(28437) – Spring 2013 – Lecture 11
11
Example use case
Use case: Flooding Information & Response
Description: The call center receives and documents citizen calls
related to flooding during storm events. The information is geocoded,
mapped, and provided to the Water District’s Operation &
Maintenance staff, which makes decisions to manage control
structures to mitigate flooding.
Actors: Emergency Response Call Center, GIS Analyst, Operations
& Maintenance Staff.
Pre-conditions: The database of critical water facilities has been
created. The Emergency Call Center has been activated for a storm
event. Water district staff are operating under emergency operating
procedures.
Post-conditions: Status change notices are sent to the relevant
agencies registered to receive updates.
INF385T(28437) – Spring 2013 – Lecture 11
12
Use case primary scenario
1 Citizen places a call to the Emergency Call Center
hotline.
2 Call center staff document location and description of
flooding problem.
3 GIS analyst receives and geocodes locations from the
call center, producing a map of call locations.
4 Call reports are symbolized based on flooding issue.
Maps are produced and handed off to the operations
staff.
5 Operations & Maintenance staff review maps of flooding
incidents and make decisions for operating control
structures, gates, and pumps.
INF385T(28437) – Spring 2013 – Lecture 11
13
Are we finished yet?
What layers will you need based on the use
case?
Where will you get these layers?
Are there any changes that can be made to
the business process?
INF385T(28437) – Spring 2013 – Lecture 11
14
Using the use cases
From the set of use cases developed, the functional
requirements and interfaces can be fleshed out.
From an understanding of the collaborators and stakeholders
involved in the use cases, appropriate data sources and
maintenance authorities can be determined.
From a comparison of all the use cases, redundant information
and tasks can be discovered and minimized.
From an examination of potential alternate scenarios,
pathological situations can be anticipated and mitigated.
BUT… beware the use case time sink
You cannot completely and correctly document all the use cases
for a reasonably complex system in your lifetime
Keep it simple, and start prototyping as soon as you can – this will
further inform the use cases and keep your project moving
INF385T(28437) – Spring 2013 – Lecture 11
15
GIS DESIGN
INF385T(28437) – Spring 2013 – Lecture 11
Designing the database
Conceptual
model
Logical
model
Key
Project
Feature collection
Business
practices
Collect information,
identify desired
themes and
sources
INF385T(28437) – Spring 2013 – Lecture 11
Roads
Network
Rail
Topology
Boundaries
Relationship
Physical
model
table
table
table
Map themes to GIS
database elements:
define database
entities and
organization
Complete data
organization,
build full schema,
test and refine
17
GIS design practice
Think about the GIS features represented by
thematic layers, and about the integrity and
behavior of those features-
Parcels are represented as polygons.
Parcels share geometry with boundaries.
Parcels do not overlap.
… etc.
INF385T(28437) – Spring 2013 – Lecture 11
18
Conceptual design
Entities, general relationships, important
attributes
Sketches
ER/UML conceptual diagrams
Spreadsheets
Often reconstruct from existing systems/datasets
Very important for complex projects
Very useful to communicate with domain
experts/business people
INF385T(28437) – Spring 2013 – Lecture 11
19
Conceptual design
Purpose and usage of GIS
Data sources
coverages, shapefiles, CAD, etc.
compilation scale and accuracy
Spatial representation
raster, vector, surface, address
Attributes
required fields,
types of measurement
Relationships
network, topological, general
INF385T(28437) – Spring 2013 – Lecture 11
20
Conceptual design
It is important to understand what you want to
achieve from the outset
Symbology and labels
what symbols at which scales
text presentation on the map
Key
Spatial reference
projection and datum
the largest area mapped
required detail and resolution
Special design cases, for example:
condominium parcels
parcel annotation
INF385T(28437) – Spring 2013 – Lecture 11
21
Diagramming themes
Classic layer diagrams
Organize data into logical units
Focus on common data
elements to help determine:
Attributes
Associations
Spatial relationships
Water Use Application
Hydrology
Utilities
Boundaries
Roads
INF385T(28437) – Spring 2013 – Lecture 11
22
Documenting themes
Layer
Map use
Data source
Representation
Spatial relationships
Map scale, accuracy, currency
Symbology and annotation
Parcels
Parcels define land ownership and are used for taxation
Compiled from land ownership transactions and cadastral records
Polygons
Parcel polygons do not overlap
1:2400, +/- 5 ft, quarterly update
Labeled or annotated with house number and street name
Layer
Map use
Data source
Representation
Spatial relationships
Map scale, accuracy, currency
Symbology and annotation
Streets
Define the street centerline network
Public or commercial data products or various government agencies
Polylines
Streets intersect only at endpoints and generally do not overlap
1:12000, +/- 10 ft., semiannual update
Symbolized according to road classification, labeled with street name
2-23
Inventory existing data
Legacy data
Target data layers
Annotation
Boundaries
Lots
Parcels
PLSS Monument
PLSS Quarter
PLSS Section
PLSS Township
INF385T(28437) – Spring 2013 – Lecture 11
24
Inventory existing data
Model database schema from existing data
Bridge existing data with current technology, for example:
Legacy data
Target Data Layers
GIS Database
Annotation
Boundaries
Lots
Parcels
PLSS Monument
PLSS Quarter
PLSS Section
PLSS Township
INF385T(28437) – Spring 2013 – Lecture 11
• Boundaries hold survey
attributes
• Coverage parcel polygons
only exist for regions
25
Lecture 11
DATABASE DESIGN
INF385T(28437) – Spring 2013 – Lecture 11
Files, databases, and GIS
Data files contain text or other data in arbitrary formats
Data tables contain records with fields (attributes, data
items) identified by a primary key
Relational Database Management System (RDBMS or
just DBMS):
creates and maintains relationships between data tables
allows one or more users to create or edit data in the tables
allows users to sort, select, and retrieve information using
QUERIES and REPORTS
GIS adds a spatial dimension to databases, by integrating
location and geometric shape information with the tables
INF385T(28437) – Spring 2013 – Lecture 11
27
Relational database
A formal information model called “relational”
Tables can have formal & ad hoc relationships,
based on:
Ab 32
34
R
Rows and columns
Known column types
Relationships
SQL language and operators
Cd
12
9
A
Ef
17
11
X
xz
53
46
G
11
ed
4
w
12
vg
9
f
24
kl
2
c
12
op
2
v
Relational is based on a simple, generic model with
many implementations (MS Access, IBM DB2, Oracle,
MS SQLServer, and many others)
INF385T(28437) – Spring 2013 – Lecture 11
28
Data tables
Organized into columns, rows, and cells (like a spreadsheet)
Columns = attributes = fields = data items
Rows = records
Cells = values
Attribute
or Column
Record or Row
Cell or Value
INF385T(28437) – Spring 2013 – Lecture 11
29
Defining columns
To define a column or attribute, you must specify
the column name and type
All DBMS’s support basic types:
•
•
•
•
Number (integer, float, decimal)
String (text)
Boolean (Yes/No)
Date
Many DBMS (SQL-99) and GIS systems support
additional types (BLOB, XML, time series, …)
INF385T(28437) – Spring 2013 – Lecture 11
30
Primary key
The field or combination of fields that identifies each and
every record uniquely within a table
Note: Primary
key is more often
arbitrary,
meaningless to
users; main
purpose is to be
unique
3-31
Water use permit
example
• Paper-based
application form
required for permit to
withdraw surface or
ground water
3-32
Sample data for water use permits
INF385T(28437) – Spring 2013 – Lecture 11
33
use codes
INF385T(28437) – Spring 2013 – Lecture 11
34
Relational organization
Tables should be organized according to basic rules of
relational design for most efficient use.
Normalization is a series of steps followed to obtain a
database design that allows for efficient access and
storage of data in a relational database. These steps
reduce data redundancy and the chances of data
becoming inconsistent.
3NF or BCNF are the usual standards for relational
database design, however performance and convenience
may drive toward de-normalization.
INF385T(28437) – Spring 2013 – Lecture 11
35
Database normalization steps
First Normal Form (1NF) eliminates repeating groups by putting each into
a separate table and connecting them with a one-to-many relationship.
Second Normal Form (2NF) eliminates functional dependencies on a
partial key by putting the fields in a separate table from those that are
dependent on the whole key.
Third Normal Form (3NF) eliminates functional dependencies on non-key
fields by putting them in a separate table. At this stage, all non-key fields
are dependent on the key, the whole key and nothing but the key.
Boyce-Codd Normal Form (BCNF) is sometimes applied as a stronger
form of 3NF in which every determinant of a functional dependency within a
relation must be a candidate key for the schema.
Fourth Normal Form (4NF) separates independent multi-valued facts
stored in one table into separate tables.
Fifth Normal Form (5NF) breaks out data redundancy that is not covered
by any of the previous normal forms.
source - http://www.hyperdictionary.com/dictionary/database+normalisation
INF385T(28437) – Spring 2013 – Lecture 11
36
First normal form - NOT
Do you see any groups of repeating columns?
What’s wrong with that?
Can you think of a case where this is okay?
How would you reorganize to fix this?
INF385T(28437) – Spring 2013 – Lecture 11
37
First normal form
Foreign keys
Primary key
B
C
Primary key
A
The Use Code columns
can be removed from the
main table (A), and made
into rows of a separate
table (B), keyed by ActID.
(compare with previous slide)
INF385T(28437) – Spring 2013 – Lecture 11
(C) is a lookup table for
use code descriptions.
38
Relationship cardinality
With this design, one ActID can have any number of use
codes, and any one use code can be associated with
many ActID’s
This is called a Many-to-Many (M:M) relationship
This is much more space-efficient for data storage
One record in the Use Code Descriptions table (C) can
be associated with many records in the ActID-Use
Codes table (B)
This is called a One-to-Many (1:M) relationship
You may also have relationships with fixed cardinalities,
such as 1:1, 1:2, 1:0..5, etc.
Cardinality of 1:0 generally means “nulls are allowed”
INF385T(28437) – Spring 2013 – Lecture 11
39
Second normal form - NOT
Do you see any dependencies between non-key columns
and a partial key?
If the primary key were compound and included an OwnerID, there
could be such a dependency between Owner and OwnerID
Compound key
What’s wrong
with that?
What would
you do to
fix this?
INF385T(28437) – Spring 2013 – Lecture 11
40
Second normal form
Remove the non-key data
to a separate table and
link to it
… and clean up the data
while you’re at it!
Spelling, abbreviations,
punctuation
Firstname Lastname vs.
Lastname, Firstname
INF385T(28437) – Spring 2013 – Lecture 11
41
Third normal form - NOT
• Do you see any functional dependencies among
non-key fields in the table below?
» Need we ask again: what’s wrong with this?
• How would you reorganize to fix this?
3-42
Third normal form
Remove the source
description to a separate
table, and join using the
source code field
This will reduce duplication
of data (and errors)
INF385T(28437) – Spring 2013 – Lecture 11
43
Is that all there is to it?
Name
City
ST
PostalCode
John
Patty
Smith
Denver
Seattle
Vancouver
CO
WA
BC
80031
98107
V6C 1T2
This table is NOT in Third Normal Form:
○ The PostalCode field is dependent on the City and ST fields
To place this table in 3NF, a separate table would be created for the
City and ST fields, and joined using the PostalCode field
○ But this is generally not done with address & postal codes… WHY?
Normalization tradeoffs
When would you expect to normalize tables?
For primary data entry and updates; easier to set up and
manage data integrity validation
Such as name and address subfields
To support more kinds of ad hoc queries
When would you expect to denormalize?
For presentation of data to users
To reduce the number of table-joins for faster performance
Queries are known and fixed
Better performance for web publishing
Database views are often used to flatten relationship
structure for read-only access
INF385T(28437) – Spring 2013 – Lecture 11
45
Lecture 11
DATABASE DIAGRAMMING
INF385T(28437) – Spring 2013 – Lecture 11
Database diagramming:
conceptual / logical overview
Owners 1
*
Applications
* *
Use Codes
*
Database relationships and
cardinality can be diagrammed
for prototyping and
1
documentation
Use Code Descrips
The “*” on an association
link means “many”
INF385T(28437) – Spring 2013 – Lecture 11
47
Database diagramming:
Entity-Relationship (E-R) or Unified Modeling Language (UML)
Owners 1
*OwnerID
FirstName
LastName
Phone
StreetAddr
City
State
PostalCode
Applications * *
* *ApplicationID
OwnerID
ApplicationType
ProjectLocation
BusinessName
BusinessType
…
Use Codes
*RowID
UseCode
ApplicationID
*
1
Use Code Descrips
*UseCode
Description
INF385T(28437) – Spring 2013 – Lecture 11
48
Normalization tradeoff: referential integrity
Suppose you removed a record from the Owner
table
What should be done with the related records from the
Applications table?
Would this be easier or harder to manage than with the
de-normalized design on slide 31, “Sample data for water
use permits”?
The more tables are interconnected by
relationships, the greater the need to support
referential integrity within your applications
A DBMS’ default support for referential integrity may be
very basic, such as to place Nulls in associated foreign
key fields, but only when a relationship is declared
INF385T(28437) – Spring 2013 – Lecture 11
49
Lecture 11
INF385T(28437) – Spring 2013 – Lecture 11
BUILDING THE DATABASE
Prototype prototype prototype…
Critical for validating your data model and
applications
An easy way to discover project
requirements
Don’t plan a lot of time for this, just do it!
Prototype in the simplest environment to
learn the most in the least time
Validate that thematic choices, schema &
integrity rules support your requirements
Reduce data management overhead with
personal, single-user system
INF385T(28437) – Spring 2013 – Lecture 11
51
Database environments
Production/Publishing
Read-only copies of databases
Used by majority of users
Contains custom views of databases
Development/Maintenance
Where compilation and editing occur
Normalized for greater integrity enforcement
May have multiple environments by data model
(cadastral/land use, transportation, utilities, hydro…)
Design/Test
Prototype validation, load testing
Isolate testing changes to the development environment,
so as not to corrupt the development system
INF385T(28437) – Spring 2013 – Lecture 11
52
Large projects can seem like this…
etc, etc…
smoke alarm…
1. Burning toast…
…fill glass!
1-53
But they can be simplified with
common data models
Should have:
Simple structure with most common elements across a set of
applications in a user community
Minimal rules, custom behavior, or cross-dependencies
May include collections or sets or systems of feature classes,
e.g., networks, topologies, terrains
Should lend to:
Web distribution
Incremental, multi-user data updates
User-side fusion, densification, value-adding
http://support.esri.com/en/knowledgebase/techarticles/detail/40585
1-54
Summary
The design process
Describing a system of viewpoints
Developing use cases
GIS design stages
Conceptual design
Normalization example
Database diagrams
Building the database
Prototype to production
INF385T(28437) – Spring 2013 – Lecture 11
55