Database Design

Download Report

Transcript Database Design

Databases, Spatial
Databases, and PostGIS
Overview of relational database
concepts and PostGIS
DBMS Perspective
From Spatial Databases, A Tour, by Shekhar and Chawla
Role of DBMS

Typical small system architecture:
Role of DBMS

Larger spatial DBMS
From “Introduction to Spatial Data Management with Postgis,” by Arnulf Christl http://www.ccgis.de,
http://www.mapbender.org/
Definitions
Database Management Systems (DBMS) –

“Software that controls the organization, storage, retrieval, security
and integrity of data in a database. It accepts requests from the
application and instructs the operating system to transfer the
appropriate data.” (Computer Desktop Encyclopedia, 2007)

Key features:
–
–
–
–
–

Data Security
Data Integrity
Interactive Query
Interactive Data Manipulation
Data Independence
May be Network, Hierarchical, Object, Relational.
Relational is by far the most commonly-used and wellestablished, and handles most data management problems very
well.
Definitions
Relational Database Management Systems (RDBMS)–

“A database that maintains a set of separate, related files
(tables), but combines data elements from the files for
queries and reports when required. The concept was
developed in 1970 by Edgar Codd, whose objective was to
accommodate a user's ad hoc request for selected data.”
[See Codd Article for details.] (Computer Desktop Encyclopedia,
2007)

Data stored in separate tables, each containing tabular data
like a spreadsheet, joined together as needed.

In the early days of RDBMS, many vendors claimed to offer
relational databases when they did not – Codd came up
with 12 rules defining the requirements for a database
system to be truly relational.
Tables
Relational tables have these properties:

Column Values Are of the Same Kind

Each Row is Unique

The Sequence of Columns is not significant

The Sequence of Rows is not significant

Each column must have a unique name (within the table)

Represents a single entity. Example:
Columns
Table columns have these properties:

They have a data type (similar to variables, but slightly different) – char,
varchar, number, float, text, BLOB (image), etc.

Column values should be independent from each other.

Values may be required (not null), or nullable. (Some databases differentiate
between null and zero-length string).

Columns may be indexed to improve access speed.

A column may be used as the basis of the order of the physical data (clustered
index).

Each represents an independent attribute.
Primary Keys

The primary key is an attribute or a set of attributes that uniquely
identify a specific instance of an entity.

Every entity in the data model must have a primary key whose
values uniquely identify instances of the entity.

To qualify as a primary key for an entity, an attribute must have the
following properties:
– it must have a non-null value for each instance of the entity.
– the value must be unique for each instance of an entity.
– the values must not change or become null during the life of each entity
instance
Composite and Artificial Keys

Sometimes more than one attribute is required to
uniquely identify an entity. A primary key that made up
of more than one attribute is known as a composite
key.

An artificial key is one that has no intrinsic meaning.
These can be very useful when no attribute has all the
primary key properties, or the meaning of the primary
key is otherwise complicated or conditional.
Foreign Keys

A foreign key is an attribute that defines a relationship
between two tables. Foreign keys provide a method for
maintaining integrity in the data (referential integrity). Every
relationship between entities must be supported by a foreign
key.

A foreign key is an attribute in one table and a primary key in
another.

Typically, an RDBMS will let you specify whether dependent
entities are deleted (cascade delete) when a parent is deleted,
or disallowed. This ensures the referential integrity of the
database.

Foreign keys make possible establishing relationships in an
Entity-Relationship Diagram [see sample below].
Primary and Foreign Keys

Example:
http://msdn.microsoft.com/en-us/library/ms171971.aspx
Indexes
Disk reads are expensive.
Finding a record in a large table can be slow,
often requiring a full table scan, that is, checking
every row in the table.
 Indexes allow for much more efficient searching.
 If the performance of a query is bad, most of
the time it can be fixed with better indexes.
 Indexes tend to decrease query time, but
increase insert/delete time, since they need to
be adjusted (automatically) when the data
changes.


Indexes
From Spatial Databases, A Tour, by Shekhar and Chawla
Indexes

Example of an index minimizing disk
access:
Indexes





Spatial data doesn’t have a simple natural ordering like most
traditional data types.
In PostGIS, interacting with spatial indexes is similar to other data
types, but internally is handled differently using the Generalized
Search Tree algorithm.
This allows flexible partitioning based on the kind of data being
indexed, rather than on an existing alphanumeric sequence
PostGIS has implemented spatial partitioning algorithms for GIST
indexes.
http://gist.cs.berkeley.edu/gist1.html
Other Features of RDBMS









Triggers
Transaction Support
Stored Procedures
Views
User-Defined Functions
User-Defined Data Types
Extended features:
– Full-text search
– Spatial data
Replication
Others…
Normalization

The most elementary structure for a database is a single table, which
corresponds to a spreadsheet or attribute table.

In this model there is one kind of entity (consisting of rows) which has a
uniform set of attributes (consisting of columns).

This is fast and simple, but very limited.

Example of un-normalized data – NYC PLUTO tax lot data:
Normalization

While this might adequately capture the data we need, it has some
drawbacks:
– There will be lots of redundant data if many parcels are added.
The borough and owner names, for example, will all be
replicated for each row where these values are the same. This
makes maintaining the data difficult.
– What happens if a single parcel is zoned for more uses than
the number of columns allow? Or if one parcel has several
owners?
– How do you know for sure which parcel is which?
Normalization

The solution is to begin normalizing the data. You can look at this as a three-step process
implementing each of Codd’s “Normal Forms”.

First normal form rules:


–
A row of data cannot contain repeating groups of similar data (atomicity); and
–
Each row of data must have a unique identifier or primary key.
Second normal form rules:
–
No attribute can be dependent on only a portion of the primary key.
–
Every column must depend only on the entire primary key; if it is dependent on one or
more other columns, these should be moved into new tables.
Third normal form:
–
No dependencies within non-key attributes.
Normalization




Normalization usually involves creating multiple tables,
each of which complies with the three normal forms.
A normalized version of the PLUTO data, for example,
would likely have separate zoning and owner tables.
However, in practice it is often useful to de-normalize the
database structure. The PLUTO data in its current form
is extremely usable, and de-normalizing it for usability
makes sense.
Typically reporting applications will use some form of denormalization to improve query performance, while more
transactional applications will use normalized database
structures to minimize redundant updates.
SQL

Structured Query Language is the interface you use to communicate with an
RDBMS.

It consists of Queries; DML – data manipulation language; and DDL – data
definition language.

SQL is standardized, but different DB vendors have different flavors and
extensions (Oracle Spatial, for example, adds spatial keywords to SQL).

SQL is not really a full featured language like C#, although most database
vendors have SQL-based languages like PL/SQL or TSQL that let you embed SQL
statements directly in procedural code.

Typically, a program will interact with a database by submitting SQL statements,
one by one, to a database using a data access layer that sends the requests to
the database and returns the results to the program.
SQL
DDL statements - examples:

CREATE TABLE - creates a new database table

ALTER TABLE - changes a database table’s structure

DROP TABLE - deletes a database table

CREATE INDEX - creates an index

DROP INDEX - deletes an index
SQL
Queries:

SELECT – extracts data from a database table
DML statements:

UPDATE - updates data in a database table

DELETE - deletes data from a database table

INSERT - inserts new data into a database table
SQL
Return all the rows in a table:
SELECT * from TableName
Order by a field:
SELECT * from TableName
ORDER BY ColumnName
Add a where clause:
SELECT * from TableName
WHERE ColumnName < 5
ORDER BY ColumnName
Joining two tables:
SELECT Table1.Column1, Table2.Column1
FROM Table1 INNER JOIN Table2
ON Table1.Id = Table2.Id
Others:
SELECT DISTINCT Column1 from TableName
SELECT COUNT(*) from TableName
see this discussion for others.
http://db.grussell.org/imp.html
Data Access Layer



Databases come with interactive SQL query interfaces.
Programmatic interfaces are provided by the DB vendors
as well as third party vendors.
There are many different standards and conventions
such as ODBC, ADO, ADO.NET, JDBC, OLE DB (to see
a list of Microsoft’s for example, see
http://msdn2.microsoft.com/enus/library/ms810810.aspx.
Data Access Layer

Typical application sequence:
– Connect to database
 Requires editable string with database
connection info; usually stored in
configuration file.
 Requires exception handling in case
connection is unsuccessful!
– Execute SQL
– Iterate through results (if query)
 Display/output data
– Close connection
Data Access Layers

ADO.NET is commonly used in C#:
SQLServerConnection Conn = new SQLServerConnection("host=nc-star;port=1433; User
ID=test01;Password=test01;
Database Name=Test");
try
{
Conn.Open();
}
catch (SQLServerException ex)
{
Console.WriteLine(ex.Message); return;
}
try
{
string strSQL = "SELECT ename FROM emp WHERE sal>50000";
SQLServerCommand DBCmd = new SQLServerCommand(strSQL, Conn);
SQLServerDataReader myDataReader;
myDataReader = DBCmd.ExecuteReader();
while (myDataReader.Read())
{
Console.WriteLine("High salaries: " + myDataReader["ename"].ToString());
}
myDataReader.Close();
Conn.Close();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
return;
}
Spatial Data

Spatial data can be represented using a relational database structure.
From Spatial Databases, A Tour,
by Shekhar and Chawla
But this is complex and inefficient, requiring multiple tables to capture
topology.
 Usually spatial data can best be captured by a binary column that contains
all of the geometry associated with a record.
 A table in a spatial database will usually have a single geometry column of
a single type, although a generic Geometry column can support multiple
geometry types.

Spatial Databases

Spatial databases are usually built on top of RDBMS.

Spatial requirements are more involved, including
referential integrity based on topological geometrical
relationships, not just foreign-key constraints.

OGC has a SQL spec for simple features storage very
similar to the simple features spec used for geo tools:
http://www.opengeospatial.org/standards/sfs

PostGIS complies with OGC spec.
Spatial Databases
Sample join with spatial columns:
From Spatial Databases, A Tour, by Shekhar and Chawla
Spatial Databases
Mechanics of a spatial query:
From Spatial Databases, A Tour, by Shekhar and Chawla
PostGIS
PostGIS is the spatial add-on for PostgreSQL
 Compares well to others but is free!
(www.postgresql.org).
 Based on relational structure with support for spatial
data types and spatial functions.
 Can be used to serve up map layers.
 Can also be used for typical database transactional
functions.
 No intrinsic support for raster data, unlike ArcSDE or
Oracle Spatial.

PostGIS


RDBMS use database tables and columns to represent the database
structure itself.
PostGIS supports GIS data by adding additional metadata tables:
– spatial_ref_sys – contains a definition for each available
coordinate system and projection. These can be added or edited
as needed.
– geometry_columns – contains a definition of each geometry
column in the current database.


http://www.opengis.org/techno/interop/EPSG2WKT.TXT
http://www.opengis.org/techno/specs.htm
PostGIS

PostGIS geometry is based on the Open Geospatial
Consortium simple features specification:
http://www.opengeospatial.org/standards/sfs
PostGIS

Uses human-readable well-known text format to
represent geometry (converts to binary internally):
From “Introduction to Spatial Data Management with Postgis,” by Arnulf Christl http://www.ccgis.de,
http://www.mapbender.org/
PostGIS

Sample PostGIS SQL - note special role of
AddGeometryColumn, which manages
geometry_columns table.
From “Introduction to Spatial Data Management with Postgis,” by Arnulf Christl http://www.ccgis.de,
http://www.mapbender.org/
PostGIS

Examples of supported functions – Comparisons:
– ST_Equals, ST_Disjoint, ST_Intersects, ST_Touches,
ST_Crosses, ST_Within, ST_Contains, and
ST_Overlaps.

Set operations:
– ST_Intersection, ST_Difference, ST_Union,
ST_SymDifference, ST_Buffer, and ST_ConvexHull

Others:
– AsText, GeometryFromText, Transform (re-project).
PostGIS
Non-spatial data can be loaded from the SQL interface using the Copy
command.
 Spatial data can be loaded using a command-line utility that comes with the
database, or a GUI shape file loader.

From “Introduction to Spatial Data Management with Postgis,” by Arnulf Christl http://www.ccgis.de,
http://www.mapbender.org/
PostGIS

Data can be exported from PostGIS using the query
interface.

Spatial data can be exported to a shape file using the
pgsql2shp command-line utility. The basic syntax is:
psql2shp [<options>] <database> <SQL query>

See http://www.bostongis.com/postgis_quickguide_1_4.bqg
PostGIS

PGAdmin is easy-to-use admin interface for Postgresql
database. It includes a pane for DDL and an interactive
SQL query tool.
PostGIS

Sample application: NYC Solar Map
– Vector data and raster tile metadata stored in
PostGIS database
– Raster tiles stored on file system
– Data accessed managed by C# web services
– Web services communicate with web
application via JSON (Javascript object
notation), with GeoJSON extensions.
– Client application uses Javascript to display
map layers and query C# web service