11. Building Information Systems
DATA RESOURCE MANAGEMENT
Data Hierarchy in a Computer System
Entities and Attributes
Problems with the Traditional File Environment
Data redundancy
Program-data dependence
Lack of flexibility
Poor security
Lack of data sharing and availability
Traditional File Processing
Figure 7-3
Database Management System (DBMS)
• Creates and maintains databases
• Eliminates the requirement for data definition statements in each application program
• Acts as interface between application programs and physical data files
• Separates logical and physical views of data
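As a sketch of that logical/physical separation, a view presents users with a logical table without changing how the data is physically stored (this assumes the illustrative SALESREP table used in the SQL examples later in this section):

CREATE VIEW EASTERN_REPS AS    -- logical view seen by users
SELECT rep_id, name
FROM SALESREP                  -- physical base table
WHERE region = 'EAST';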
The Contemporary Database Environment
Components of DBMS
• Data definition language: Specifies content and structure of database and defines each data element
• Data manipulation language: Manipulates data in a database
• Data dictionary: Stores definitions of data elements and data characteristics
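A minimal sketch contrasting the first two components (table and column names are illustrative): the CREATE TABLE statement is data definition language; the INSERT and SELECT statements are data manipulation language.

-- Data definition language: specify structure
CREATE TABLE EMPLOYEE (
    emp_id INTEGER,
    name   CHAR(30)
);
-- Data manipulation language: add and retrieve data
INSERT INTO EMPLOYEE VALUES (1, 'Jane Doe');
SELECT name FROM EMPLOYEE WHERE emp_id = 1;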
Sample Data Dictionary Report
Relational Data Model
Figure 7-6
Three Basic Operations in a Relational Database
• Select: Creates subset of rows that meet specific criteria
• Join: Combines relational tables to provide users with information
• Project: Enables users to create new tables containing only relevant information
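In SQL terms, select corresponds to the WHERE clause, project to the column list, and join to the JOIN clause. A sketch with illustrative PART and SUPPLIER tables:

SELECT PART.Part_Name, SUPPLIER.Supplier_Name          -- project: only the relevant columns
FROM PART JOIN SUPPLIER
    ON PART.Supplier_Number = SUPPLIER.Supplier_Number -- join: combine the two tables
WHERE PART.Part_Number = 137;                          -- select: only rows meeting the criteria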
Three Basic Operations in a Relational Database
Figure 7-7
FLAT FILE – NOT NORMALIZED
A Normalized Relation of ORDER
Ensuring Database Integrity
Database integrity involves the maintenance of the logical and business rules of the database.
There are two kinds of “DB integrity” that must be addressed:
Entity Integrity
Referential Integrity
Entity Integrity
Entity integrity deals with within-entity rules. These rules deal with ranges and the permission of null values in attributes, or possibly between records.
Examples of Entity Integrity
Data type integrity: very common and most basic. Checks only for “data type” compatibility with the DB schema, such as: numeric, character, logical, date format, etc.
Commonly referred to in GIS manuals as range and list domains:
Ranges - acceptable numeric ranges for input
Lists - acceptable text entries or drop-down lists
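Many relational DBMSs can enforce both kinds of domain with CHECK constraints. A sketch, with an illustrative PARCEL table:

CREATE TABLE PARCEL (
    parcel_id  INTEGER PRIMARY KEY,
    area_acres NUMERIC CHECK (area_acres BETWEEN 0 AND 10000),  -- range domain
    zoning     CHAR(3) CHECK (zoning IN ('RES', 'COM', 'IND'))  -- list domain
);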
Enforcing Integrity
Not a trivial task!
Not all database management systems or GIS software enable users to “enforce data integrity” during attribute entry or edit sessions.
Therefore, the programmer or the database administrator must enforce and/or check for “integrity.”
Referential Integrity
Referential integrity concerns two or more tables that are related.
Example: IF table A contains a foreign key that matches the primary key of table B, THEN each value of this foreign key either matches the value of the primary key for a row in table B or is null.
Necessary to avoid: update anomaly, delete anomaly.
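A minimal sketch of declaring referential integrity in SQL (table and column names are illustrative); the DBMS will then reject changes that would leave foreign-key values without a matching primary key:

CREATE TABLE CUSTOMER (
    customer_id INTEGER PRIMARY KEY,
    name        CHAR(30)
);
CREATE TABLE ORDER_HEADER (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES CUSTOMER (customer_id)  -- must match CUSTOMER or be null
);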
Querying Databases: Elements of SQL
Basic SQL Commands
SELECT: Specifies columns
FROM: Identifies tables or views
WHERE: Specifies conditions
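Putting the three clauses together (assuming the illustrative SALESREP table defined in the SQL examples below):

SELECT name, region     -- SELECT: specify columns
FROM SALESREP           -- FROM: identify the table
WHERE region = 'EAST';  -- WHERE: specify conditions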
Using SQL - Structured Query Language
SQL is a standard database language, adopted by most ‘relational’ databases
Provides syntax for data:
Definition
Retrieval
Functions (COUNT, SUM, MIN, MAX, etc.)
Updates and Deletes
SQL Examples
CREATE TABLE SALESREP (   -- each column is an item definition expression: {item, type, (width)}
    rep_id INTEGER,
    name   CHAR(30),
    region CHAR(10)
);

DELETE FROM table
WHERE expression
Data Retrieval
SELECT list FROM table WHERE condition
list - a list of items, or * for all items
WHERE - a logical expression limiting the number of records selected; conditions can be combined with Boolean logic: AND, OR, NOT
ORDER BY may be used to sort results
UPDATE table
SET item = expression
WHERE expression

INSERT INTO table
VALUES (…)
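Concrete versions of these templates, reusing the illustrative SALESREP table from above:

INSERT INTO SALESREP
VALUES (1, 'Jane Doe', 'EAST');

UPDATE SALESREP
SET region = 'WEST'
WHERE rep_id = 1;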
Database Normalization
Normalization: The process of structuring data to minimize duplication and inconsistencies.
The process usually involves breaking down a single table into two or more tables and defining relationships between those tables.
Normalization is usually done in stages, with each stage applying more rigorous rules to the types of information that can be stored in a table.
Normalization
Normalization: a process for analyzing the design of a relational database
Database design - the arrangement of attributes into entities
It permits the identification of potential problems in your database design
Concepts related to normalization: KEYS and FUNCTIONAL DEPENDENCE
Ex: Database Normalization (1)
Sample Student Activities DB table
Poorly designed: non-unique records (e.g., two “John Smith” entries)
Test the design by developing sample reports and queries
Ex: Database Normalization (2)
Created a unique “ID” for each record in the Activities table
Required the creation of an “ID” look-up table for reporting (Students table)
Converted the flat file into a relational database
Ex: Database Normalization (3)
Wasted space
Redundant data entry
What about taking a 3rd activity?
Query difficulties - trying to find all swimmers
Data inconsistencies - conflicting prices
Ex: Database Normalization (4)
Students table is fine
Eliminating two columns and restructuring the Activities table simplifies the table
BUT, we still have redundant data (activity fees) and data insertion anomalies.
Problem: If student #219 transfers, we lose all references to Golf and its price.
Ex: Database Normalization (5)
Modify the design to ensure that “every non-key field is dependent on the whole key”
Creating the Participants Table corrects our problems and forms a union between the two tables.
This is a better design!
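A sketch of the resulting design in SQL (names and types are assumptions based on the example): each activity fee is stored once, and every non-key field depends on its table's whole key.

CREATE TABLE STUDENTS (
    student_id INTEGER PRIMARY KEY,
    name       CHAR(30)
);
CREATE TABLE ACTIVITIES (
    activity   CHAR(20) PRIMARY KEY,
    fee        NUMERIC                  -- each fee stored exactly once
);
CREATE TABLE PARTICIPANTS (
    student_id INTEGER REFERENCES STUDENTS (student_id),
    activity   CHAR(20) REFERENCES ACTIVITIES (activity),
    PRIMARY KEY (student_id, activity)  -- the whole key
);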
The Normal Forms
A series of logical steps to take to normalize data tables
First Normal Form
Second
Third
Boyce-Codd
There are more, but they are beyond the scope of this course
First Normal Form (1NF)
All columns (fields) must be atomic
Means: no repeating items in columns

OrderDate    Customer    Items
11/30/1998   Joe Smith   Hammer, Saw, Nails

OrderDate    Customer    Item1    Item2   Item3
11/30/1998   Joe Smith   Hammer   Saw     Nails

Both layouts violate 1NF: the first packs several items into one column, and the second repeats the item attribute across columns.
Solution: make a separate table for each set of attributes, with a primary key (parser, append query)

Customers (CustomerID, Name)
Orders (OrderID, Item, CustomerID, OrderDate)
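A sketch of this 1NF design in SQL (types are assumptions):

CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    Name       CHAR(30)
);
CREATE TABLE Orders (
    OrderID    INTEGER,
    Item       CHAR(20),   -- one atomic item per row
    CustomerID INTEGER REFERENCES Customers (CustomerID),
    OrderDate  DATE
);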
Second Normal Form (2NF)
In 1NF, and every non-key column is fully dependent on the (entire) primary key
Means: Do(es) the key field(s) imply the rest of the fields? Do we need to know both OrderID and Item to know the Customer and Date? Clue: repeating fields

OrderID   Item     CustomerID   OrderDate
1         Hammer   1            11/30/1998
1         Saw      1            11/30/1998
1         Nails    1            11/30/1998
Solution: Remove to a separate table (Make Table)

Orders (OrderID, CustomerID, OrderDate)
OrderDetails (OrderID, Item)
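A sketch of the 2NF split in SQL (types assumed), redefining the illustrative tables: CustomerID and OrderDate depend only on OrderID, so they move to Orders.

CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER,
    OrderDate  DATE
);
CREATE TABLE OrderDetails (
    OrderID INTEGER REFERENCES Orders (OrderID),
    Item    CHAR(20),             -- each item of an order on its own row
    PRIMARY KEY (OrderID, Item)
);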
Third Normal Form (3NF)
In 2NF, and every non-key column is mutually independent
Means: calculations

Item     Quantity   Price   Total
Hammer   2          $10     $20
Saw      5          $40     $200
Nails    8          $1      $8
Solution: Put calculations in queries and forms

OrderDetails (OrderID, Item, Quantity, Price)

Put the expression in a text control or in a query:
=Quantity * Price
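For example, the derived Total column can be computed at query time instead of being stored (a sketch against the illustrative OrderDetails table above):

SELECT OrderID, Item, Quantity, Price,
       Quantity * Price AS Total   -- derived, never stored
FROM OrderDetails;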
Data Warehousing and Data Mining
Data warehouse
Supports reporting and query tools
Stores current and historical data
Consolidates data for management analysis and
decision making
What is a Data Warehouse?
“A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”
Bill Inmon (1990)
“A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated…”
Anonymous
Components of a Data Warehouse
Data Mining
ON-LINE ANALYTICAL PROCESSING (OLAP): the ability to manipulate and analyze large volumes of data from multiple perspectives
MINING: Seeking relationships that are not known in advance. A function of the software and data organization.
DW Characteristics
Subject oriented: data that gives information about a particular subject instead of about a company’s ongoing operations.
Integrated: data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time variant: all data in the data warehouse is identified with a particular time period.
Data Acquisition
The process of moving company data from the source systems into the warehouse.
Often the most time-consuming and costly effort.
Performed with software products known as ETL (Extract/Transform/Load) tools.
Over 50 ETL tools are on the market.
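A toy sketch of the load step in SQL (all table and column names are assumptions; real ETL tools add far richer extraction and transformation logic):

INSERT INTO warehouse_sales (product, amount, load_date)
SELECT product, amount, CURRENT_DATE   -- transformation kept trivial here
FROM staging_sales;                    -- extracted source data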
Data Cleansing
Typically performed in conjunction with data acquisition.
A complicated process that validates and, if necessary, corrects the data before it is inserted.
AKA “data scrubbing” or “data quality assurance”.