11. Building Information Systems



DATA RESOURCE MANAGEMENT
Data Hierarchy in a Computer System
Entities and Attributes
Problems with the Traditional File Environment

• Data redundancy
• Program-data dependence
• Lack of flexibility
• Poor security
• Lack of data sharing and availability
Traditional File Processing
Figure 7-3
Database Management System
(DBMS)
• Creates and maintains databases
• Eliminates requirement for data definition
statements
• Acts as interface between application
programs and physical data files
• Separates logical and physical views of data
The Contemporary
Database Environment
Components of DBMS
• Data definition language: Specifies
content and structure of database and
defines each data element
• Data manipulation language:
Manipulates data in a database
• Data dictionary: Stores definitions of
data elements, and data characteristics
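The data dictionary idea can be sketched with SQLite, whose catalog metadata serves as a minimal built-in dictionary of data elements; the EMPLOYEE table and its columns below are hypothetical, not from the sample report:

```python
import sqlite3

# A rough illustration of a data dictionary: the DBMS itself stores the
# definition (name, type, constraints) of each data element.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE EMPLOYEE (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL
)
""")
# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
for cid, name, col_type, notnull, default, pk in con.execute(
        "PRAGMA table_info(EMPLOYEE)"):
    print(name, col_type, "NOT NULL" if notnull else "", "PK" if pk else "")
```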
Sample Data Dictionary Report
Relational Data Model
Figure 7-6
Three Basic Operations in a
Relational Database
• Select: Creates subset of rows that meet
specific criteria
• Join: Combines relational tables to provide
users with information
• Project: Enables users to create new tables
containing only relevant information
Three Basic Operations in a
Relational Database
Figure 7-7
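The three operations above can be sketched in SQLite; the PART and SUPPLIER tables and their contents here are illustrative stand-ins, not taken from Figure 7-7:

```python
import sqlite3

# Select, join, and project on two small hypothetical relational tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT);
CREATE TABLE PART (Part_Number INTEGER PRIMARY KEY, Part_Name TEXT,
                   Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number));
INSERT INTO SUPPLIER VALUES (8259, 'CBM Inc.'), (8261, 'Bryant Corp.');
INSERT INTO PART VALUES (137, 'Door latch', 8259), (152, 'Door handle', 8261);
""")

# SELECT: subset of rows that meet specific criteria
rows = con.execute("SELECT * FROM PART WHERE Part_Number = 137").fetchall()

# JOIN: combine the two tables on the shared Supplier_Number
joined = con.execute("""
    SELECT PART.Part_Name, SUPPLIER.Supplier_Name
    FROM PART JOIN SUPPLIER USING (Supplier_Number)
    ORDER BY PART.Part_Number
""").fetchall()

# PROJECT: a result containing only the relevant column
projected = con.execute(
    "SELECT Part_Name FROM PART ORDER BY Part_Number").fetchall()
print(rows, joined, projected)
```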
FLAT FILE – NOT NORMALIZED
A Normalized Relation of ORDER
Ensuring Database Integrity
 Database integrity involves the maintenance of the logical and business rules of the database.
 There are two kinds of “DB Integrity” that must be addressed:
• Entity Integrity
• Referential Integrity
Entity Integrity
 Entity integrity deals with within-entity rules.
 These rules deal with ranges and the permission of null values in attributes, or possibly between records.
Examples of Entity Integrity
 Data Type Integrity: very common and most basic. Checks only for “data type” compatibility with the DB schema: numeric, character, logical, date format, etc.
 Commonly referred to in GIS manuals as Range and List domains:
• Ranges - acceptable numeric ranges for input
• List - acceptable text entries or drop-down lists
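Range and list domains can be sketched with SQL CHECK constraints in SQLite; the PARCEL table and its columns are hypothetical, not from any particular GIS package:

```python
import sqlite3

# Entity integrity via CHECK constraints: a range domain and a list domain.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE PARCEL (
    parcel_id INTEGER PRIMARY KEY,
    slope_pct REAL CHECK (slope_pct BETWEEN 0 AND 100),      -- range domain
    land_use  TEXT CHECK (land_use IN ('RES', 'COM', 'AGR')) -- list domain
)
""")
con.execute("INSERT INTO PARCEL VALUES (1, 12.5, 'RES')")  # passes both checks
try:
    con.execute("INSERT INTO PARCEL VALUES (2, 150.0, 'RES')")  # out of range
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```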
Enforcing Integrity
 Not a trivial task!
 Not all database management systems or
GIS software enable users to “enforce data
integrity” during attribute entry or edit
sessions.
 Therefore, the programmer or the Database
Administrator must enforce and/or check for
“Integrity.”
Referential Integrity
 Referential integrity concerns two or more
tables that are related.
 Example: IF table A contains a foreign key
that matches the primary key of table B
THEN values of this foreign key either match
the value of the primary key for a row in table
B or must be null.
 Necessary to avoid: Update anomaly, Delete
anomaly.
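The IF/THEN rule above can be demonstrated with SQLite's foreign-key enforcement, using the same generic tables A and B:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE B (pk INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE A (id INTEGER PRIMARY KEY,
                fk INTEGER REFERENCES B(pk));  -- must match B.pk or be NULL
INSERT INTO B VALUES (1, 'known row');
INSERT INTO A VALUES (10, 1);     -- matches a row in B: allowed
INSERT INTO A VALUES (11, NULL);  -- NULL foreign key: allowed
""")
try:
    con.execute("INSERT INTO A VALUES (12, 99)")  # no such B.pk: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```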
Querying Databases:
Elements of SQL
Basic SQL Commands
 SELECT: Specifies columns
 FROM: Identifies tables or views
 WHERE: Specifies conditions
Using SQL - Structured Query Language
 SQL is a standard database protocol, adopted by most ‘relational’ databases
 Provides syntax for data:
• Definition
• Retrieval
• Functions (COUNT, SUM, MIN, MAX, etc.)
• Updates and Deletes
SQL Examples
 CREATE TABLE SALESREP
• Item definition expression(s): {item, type, (width)}
 DELETE FROM table
• WHERE expression
Data Retrieval
 SELECT list FROM table WHERE condition
• list - a list of items, or * for all items
• WHERE - a logical expression limiting the number of records selected
• can be combined with Boolean logic: AND, OR, NOT
• ORDER BY may be used to format results
UPDATE tables
 SET item = expression
 WHERE expression
 INSERT INTO table
 VALUES …..
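The SQL elements above can be exercised together in SQLite; the SALESREP name echoes the earlier CREATE TABLE example, but its columns and rows are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SALESREP "
            "(rep_id INTEGER PRIMARY KEY, name TEXT, region TEXT, sales REAL)")
con.executemany("INSERT INTO SALESREP VALUES (?, ?, ?, ?)",
                [(1, 'Ada', 'East', 500.0),
                 (2, 'Ben', 'West', 300.0),
                 (3, 'Cal', 'East', 700.0)])

# SELECT list FROM table WHERE condition, with Boolean logic and ORDER BY
east = con.execute("""
    SELECT name, sales FROM SALESREP
    WHERE region = 'East' AND sales > 400
    ORDER BY sales DESC
""").fetchall()

# Aggregate functions: COUNT and SUM
total = con.execute("SELECT COUNT(*), SUM(sales) FROM SALESREP").fetchone()

# UPDATE ... SET ... WHERE, then INSERT INTO ... VALUES
con.execute("UPDATE SALESREP SET sales = sales + 100 WHERE rep_id = 2")
con.execute("INSERT INTO SALESREP VALUES (4, 'Dee', 'West', 250.0)")
print(east, total)
```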
Database Normalization
 Normalization: The process of structuring data to
minimize duplication and inconsistencies.
 The process usually involves breaking down a single table into two or more tables and defining relationships between those tables.
 Normalization is usually done in stages, with each
stage applying more rigorous rules to the types of
information which can be stored in a table.
Normalization
 Normalization: a process for analyzing the design of a relational database
• Database Design - arrangement of attributes into entities
 It permits the identification of potential problems in your database design
 Concepts related to Normalization:
• KEYS and FUNCTIONAL DEPENDENCE
Ex: Database Normalization (1)
 Sample Student Activities DB Table
 Poorly designed
• Non-unique records (e.g., John Smith)
 Test the design by developing sample reports and queries
Ex: Database Normalization (2)
 Created a unique “ID” for each record in the Activities Table
 Required the creation of an “ID” look-up table for reporting (Students Table)
 Converted the “flat file” into a relational database
Ex: Database Normalization (3)
 Wasted space
 Redundant data entry
 What about taking a 3rd activity?
 Query difficulties - trying to find all swimmers
 Data inconsistencies - conflicting prices
Ex: Database Normalization (4)
 Students table is fine
 Eliminating two columns and restructuring the Activities Table simplifies the design
 BUT, we still have redundant data (activity fees) and data insertion anomalies.
Problem: If student #219 transfers, we lose all references to Golf and its price.
Ex: Database Normalization (5)
 Modify the design to ensure that “every non-key field is dependent on the whole key”
 Creating the Participants Table corrects our problems and links the two tables.
This is a Better Design!
The Normal Forms
 A series of logical steps to take to normalize
data tables
 First Normal Form
 Second
 Third
 Boyce-Codd
 There are more, but they are beyond the scope of this course
First Normal Form (1NF)
 All columns (fields) must be atomic
 Means: no repeating items in columns
Items crammed into one column (not atomic):

OrderDate    Customer    Items
11/30/1998   Joe Smith   Hammer, Saw, Nails

Repeating columns (still not atomic):

OrderDate    Customer    Item1    Item2   Item3
11/30/1998   Joe Smith   Hammer   Saw     Nails
Solution: make a separate table for each set of
attributes with a primary key (parser, append query)
Customers (CustomerID, Name)
Orders (OrderID, Item, CustomerID, OrderDate)
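The parser/append step above can be sketched in SQLite: split the repeating Items field into one row per item, with the customer factored into a keyed table. Table and column names follow the slide's example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderID INTEGER, Item TEXT,
                     CustomerID INTEGER REFERENCES Customers(CustomerID),
                     OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Joe Smith');
""")
# Parse the non-atomic "Items" value into one atomic row per item
for item in "Hammer, Saw, Nails".split(", "):
    con.execute("INSERT INTO Orders VALUES (1, ?, 1, '11/30/1998')", (item,))
rows = con.execute("SELECT Item FROM Orders ORDER BY rowid").fetchall()
print(rows)
```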
Second Normal Form (2NF)
 In 1NF, and every non-key column is fully dependent on the (entire) primary key
 Means: Do(es) the key field(s) imply the rest of the fields? Do we need to know both OrderID and Item to know the Customer and Date? Clue: repeating fields
OrderID   Item     CustomerID   OrderDate
1         Hammer   1            11/30/1998
1         Saw      1            11/30/1998
1         Nails    1            11/30/1998
Solution: Remove to a separate table (Make Table)
Orders (OrderID, CustomerID, OrderDate)
OrderDetails (OrderID, Item)
Third Normal Form (3NF)
 In 2NF, and every non-key column is mutually independent
 Means: no stored calculations (derived fields such as Total)
Item     Quantity   Price   Total
Hammer   2          $10     $20
Saw      5          $40     $200
Nails    8          $1      $8
• Solution: Put calculations in queries and forms
OrderDetails (OrderID, Item, Quantity, Price)
Put expression in text control or in query:
=Quantity * Price
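Putting the calculation in the query rather than storing the Total can be sketched as follows, using the slide's OrderDetails columns:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE OrderDetails "
            "(OrderID INTEGER, Item TEXT, Quantity INTEGER, Price REAL)")
con.executemany("INSERT INTO OrderDetails VALUES (1, ?, ?, ?)",
                [('Hammer', 2, 10.0), ('Saw', 5, 40.0), ('Nails', 8, 1.0)])
# Total is derived, not stored: compute Quantity * Price in the query
totals = con.execute(
    "SELECT Item, Quantity * Price AS Total FROM OrderDetails").fetchall()
print(totals)
```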
Data Warehousing and Data Mining
Data warehouse
 Supports reporting and query tools
 Stores current and historical data
 Consolidates data for management analysis and
decision making
What is a Data Warehouse?
"A warehouse is a subject-oriented, integrated, time-variant
and non-volatile collection of data in support of
management's decision making process".
Bill Inmon (1990)
"A Data Warehouse is a repository of integrated information,
available for queries and analysis. Data and information
are extracted from heterogeneous sources as they are
generated.…”
Anonymous
Components of a Data Warehouse
Data Mining
 ON-LINE ANALYTICAL PROCESSING
(OLAP): ability to manipulate, analyze large
volumes of data from multiple perspectives
 MINING: Seeking relationships that are not
known in advance. A function of the software
and data organization.
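A toy OLAP-style rollup of the kind described above can be sketched with GROUP BY: the same (assumed) sales facts summarized from two perspectives. The table and figures are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ('East', 'Nuts', 100.0), ('East', 'Bolts', 50.0),
    ('West', 'Nuts', 75.0), ('West', 'Bolts', 25.0),
])
# One fact table, two analytical perspectives
by_region = con.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region").fetchall()
by_product = con.execute(
    "SELECT product, SUM(amount) FROM sales "
    "GROUP BY product ORDER BY product").fetchall()
print(by_region, by_product)
```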
DW Characteristics
 Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.
 Integrated: Data that is gathered into the data
warehouse from a variety of sources and
merged into a coherent whole.
 Time Variant: All data in the data warehouse is
identified with a particular time period.
Data Acquisition
 The process of moving company data from
the source systems into the warehouse.
 Often the most time-consuming and costly
effort.
 Performed with software products known as
ETL (Extract/Transform/Load) tools.
 Over 50 ETL tools on the market.
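The Extract/Transform/Load cycle can be sketched in miniature (this is a generic illustration, not any specific commercial ETL tool; the CSV source and sales table are invented):

```python
import csv
import io
import sqlite3

# Extract rows from a CSV source, transform/cleanse them, load the warehouse.
source = io.StringIO("region,amount\nEast, 100\nWest,oops\nEast,250\n")

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, amount REAL)")

loaded, rejected = 0, 0
for row in csv.DictReader(source):                              # Extract
    try:
        record = (row["region"].strip(), float(row["amount"]))  # Transform
    except ValueError:
        rejected += 1                    # cleansing: drop the malformed row
        continue
    warehouse.execute("INSERT INTO sales VALUES (?, ?)", record)  # Load
    loaded += 1
print(loaded, rejected)
```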
Data Cleansing
 Typically performed in conjunction with data
acquisition.
 A complicated process that validates and, if
necessary, corrects the data before it is
inserted.
 AKA "data scrubbing" or "data quality
assurance".