Relational Databases

Download Report

Transcript Relational Databases

Quick Lesson on Databases
• Relational databases are key to
managing complex data
• You’ve been using relational databases
with “Joins” and “Relates” in ArcGIS
• GeoDatabases are relational databases
• Structured Query Language (SQL) is the
primary language for relational
databases
• You’ve been using SQL statements in
ArcGIS to query data
Relational Databases
• Need to represent data with a complex
structure
Plot
Species
Tree
Database Tables
• What you’ve seen in ArcGIS only more
flexible
• Tables are made up of “fields” (columns)
and “records” (rows)
• Queries are used to combine and subset
tables into new tables
• Each table should have a unique,
integer, ID, referred to as a primary key
– Greatly improves query performance
Field Data Types
• Numeric
– Float or integer
– Auto numbered, use for primary keys
• Dates
– YYYY-MM-DD HH:MM:SS.SS
– 2013-04-05 14:23:12.34
• Text
– Specified width
– “Variant” width
• Binary Large Objects (BLOB)
What’s Wrong With This?
Tree Query
LAT
LON
MEASYEAR MEASMON MEASDAY COMMON_NAME HT
45.446392
-122.236107
1995
6
22 Douglas-fir
49
45.446392
-122.236107
1995
6
22 Douglas-fir
27
45.446392
-122.236107
1995
6
22 Douglas-fir
95
45.446392
-122.236107
1995
6
22 Douglas-fir
66
45.446392
-122.236107
1995
6
22 Douglas-fir
118
45.446392
-122.236107
1995
6
22 Douglas-fir
76
45.446392
-122.236107
1995
6
22 Douglas-fir
147
45.456116
-122.397774
1995
6
22 Douglas-fir
185
45.456116
-122.397774
1995
6
22 Douglas-fir
105
45.456116
-122.397774
1995
6
22 Douglas-fir
105
45.456116
-122.397774
1995
6
22 Douglas-fir
89
45.193054
-122.51667
1996
6
23 Douglas-fir
90
45.193054
-122.51667
1996
6
23 Douglas-fir
95
45.193054
-122.51667
1996
6
23 Douglas-fir
96
45.193054
-122.51667
1996
6
23 Douglas-fir
99
Relational Databases
• Allow us to “relate” tables to:
– Reduce the overall amount of data
• Removes duplicates
– Makes updates much easier
– Improves search speeds
Entity-Relationship Diagram
• ERD
– Unified Markup Language (UML)
Relationship Types
One to one
One to many
Many to many
Entities
Plot
Relationships
Species
Tree
Plot
ID
Lat
Lon
Year
Month
Day
1
45.446392
-122.236107
1995
6
22
2
45.193054
-122.51667
1995
6
22
Tree
Species
ID
Common Name
ID
PlotID
SpeciesID Height
1
Douglas-fir
1
1
1
49
2
Ponderosa Pine
2
1
1
27
3
1
1
95
4
1
1
66
5
1
1
118
…
1
…
…
12 2
1
90
13 2
1
95
Primary Key
Foreign Key
Database Normalization
1. Eliminate duplicate columns from the
same table
2. Move fields that have “duplicate” row
entries and move them to a related
table
3. All field entries should be dependent on
the primary key
4. There should be only one primary key in
each table
Database Dictionary
• Defines each of the tables and fields in a
database
• A database forms the basis for data
management behind many GIS projects,
web sites, and organizations
• Proper documentation is key to long term
success!
– Database design (including ERDs)
– Database Dictionary
Geospatial Databases
• Not required to store spatial data!
• Provide:
– Field types for spatial data: point, polyline,
polygon, etc.
– Spatial operations: union, intersect, etc.
– Spatial queries: return records that overlap
with a polygon, etc.
– Some provide spatial reference control
Relational Databases
• Enterprise-Level
– SQL Server
– PostgreSQL
– MySQL
– Oracle
– Sybase
• File-Level
– Geodatabase
– MS-Access
What we really want
• What we need from a database:
– Distributed, concurrent access
(concurrency)
– Automatic Backup
– Version control
– Unlimited amounts of data
– Quick data access
– Inexpensive
– Broad OS Support
– File-level copying
– GeoSpatial queries, operations, data types
What we have
SQL Server
PostgreSQL
ESRI
Geodatabase
MS-Access
Concurrency
Yes
Yes
No
No
Automatic
backup
Yes
Yes
No
No
Versioning
No
No
No
No
Data Size
100s of millions
100s of millions
100,000?
100,000?
Performance
Fast
Fast
Good
Poor
Cost
$600 per CPU
Free
~$10,000
w/ArcGIS
~$400
OS
Windows
Any
Windows
Windows
File-level copy
No
No
Yes
Yes
Spatial Queries
Yes
Yes
Yes
No
Spatial data
types
Yes
Yes
Yes
No
Spatial
operations
Yes
Yes
Yes
No
Structured Query Language (SQL)
• Comes from the database industry
• “INSERT”, “DELETE”, and “SELECT”
rows in tables
• Very rich syntax
• Portions of “SELECT” grammar used
heavily in ArcGIS:
– Selecting attributes
– Raster calculator
– Geodatabases
Transaction SQL
• “SQL” is a subset of T-SQL
• T-SQL allows full management of a
database:
– Create & drop:
• Tables, fields/columns, relationships, indexes,
views, etc.
– Administrative functions
• Varies some between databases
Using SQL
• All Databases have “query editors” that
allow us to write, save, edit, and use
SQL queries
• Use programming languages to “write”
queries and “fetch” records from the
database
SQL: SELECT
SELECT Field1, Field2
FROM TableName
JOIN TableName2
WHERE Filter1 AND Filter 2
GROUP BY Field1,Field2
ORDER BY Field1 [DESC], Field2 [DESC]
Selecting Fields
• SELECT *
– Returns all fields as new table
• SELECT Field1,Field2
• SELECT Table1.Field1,Table2.Field1
– Return specified fields
• SELECT Table1.Field1 AS NewName
– Avoids name collisions
Plot
ID
Lat
Lon
Year
Month
Day
1
45.446392
-122.236107
1995
6
22
2
45.193054
-122.51667
1995
6
22
Tree
Species
ID
Common Name
ID
PlotID
SpeciesID Height
1
Douglas-fir
1
1
1
49
2
Ponderosa Pine
2
1
1
27
3
1
1
95
4
1
1
66
5
1
1
118
…
1
…
…
12 2
1
90
13 2
1
95
Example 1: All Fields
• SELECT * FROM Tree
• Returns all the records and fields in tree
ID
PlotID
SpeciesID Height
1
1
1
49
2
1
1
27
3
1
1
95
4
1
1
66
5
1
1
118
…
1
…
…
12 2
1
90
13 2
1
95
Example 2: Specific Fields
SELECT PlotID, Height
FROM Tree
• Returns all rows but
only specified fields
PlotID
Height
1
49
1
27
1
95
1
66
1
118
1
…
2
90
2
95
Example 3: Speific Rows
SELECT PlotID, SpeciesID
FROM Tree
• WHERE Height>50
• Returns all rows but only specified fields
PlotID
Height
1
95
1
66
1
118
1
…
Selecting Tables
• FROM Table1
– Returns contents of one table
• FROM Table1 INNER JOIN Table2 ON
Table2.ForeignKey=Table1.PrimaryKey
– Returns records from Table2 that match
primary keys in Table1
– Does not return all rows in Table1
Example 4: Joining
SELECT PlotD,Lat,Lon,Height
FROM Trees
INNER JOIN Plots
ON Trees.PlotID=Plots.ID
PlotID
Height
Lat
Lon
1
49
45.446392
-122.236107
1
27
45.446392
-122.236107
1
95
45.446392
-122.236107
1
66
45.446392
-122.236107
…
Example 4: Joining
SELECT PlotD, Lat, Lon, Height, Common_Name
FROM Trees
INNER JOIN Plots
ON Trees.PlotID=Plots.ID
INNER JOIN Species
ON Trees.SpeciesID= Species.ID
PlotID
Height
Lat
Lon
Common_Name
1
49
45.446392
-122.236107
Douglas-fir
1
27
45.446392
-122.236107
Douglas-fir
1
95
45.446392
-122.236107
Douglas-fir
1
66
45.446392
-122.236107
Douglas-fir
…
Selecting Tables (con’t)
• FROM Table1 OUTER JOIN Table2 ON
Table2.ForeignKey=Table1.PrimaryKey
– Returns all matches between Table1 and
Table2 and any records in Table1 that don’t
match records in Table2
– Missing values are NULL
Filters or “WHERE” clauses
SELECT *
FROM Table1
WHERE (Field1 Operator Value1)
BooleanOperator (Field1 Operator Field2)
Filter Examples
• WHERE:
– ID = 1
– Area < 10000
– Area <= 10000
– Name = “Crater Lake” (case dependent)
– Name LIKE “Crater Lake” (ignores case,
except in PostgreSQL!)
• Notice:
– String values have double quotes
– Syntax for strings vary some between
databases
SQL Comparisons
•
•
•
•
•
•
•
Equals: =
Greater than: >
Less than: <
Greater than or equal: >=
Less than or equal: <=
Not equal: <>
Like: case independent (except in
PostgreSQL), string comparison with
wild cards (%)
– In PosgreSQL use “upper(..)” or “lower(..)”
Boolean Operators
A
B
A AND B
A OR B
NOT A NOT B
T
T
T
T
F
F
T
F
F
T
F
T
F
T
F
T
T
F
F
F
F
F
T
T
More Complex Filter Examples
• WHERE:
– Name LIKE “Hawaii” AND Area < 10000
– Species LIKE “Ponderosa” AND DBH > 1
ORDER BY
SELECT *
FROM Table 1
ORDER BY LastName DESC, FirstName
DESC
• Careful with performance on large
datasets and string fields
GROUP BY
• Aggregates data
SELECT Species ,AVG(Height)
FROM Trees
GROUP BY Species
• Only aggregated fields can appear in
SELECT list
SQL INSERT
• INSERT INTO TableName
(Field1,Field2) VALUES
(Value1,”Value2”)
• String values must be in quotes
– Other values can also be in quotes
• If the table has an “auto numbered” ID
field, it will be added automatically
• Otherwise, very difficult to set the ID field
SQL DELETE
DELETE FROM TableName
WHERE ID=Value
- Deletes one row
DELETE FROM Plot
WHERE PlotID=12
- Deletes all rows with PlotID=12
DELETE FROM TableName
- Deletes everything in TableName!
Database Performance
Primary Key Search
Indexes
• Added to a table
– Typically for one field
• Adds overhead to INSERT and
DELETEs
• Important for:
– Large tables
– Complex queries
– Especially text searches!
Maintaining Performance
• Always use integer, auto numbered
primary keys
• Avoid iterative or hierarchical queries
• Sometimes code is faster:
– Do simple query, load into RAM and sort
• With REALLY big data, don’t use SQL
– NoSQL, accessing data directly, without the
use of a relational database package
– There are “NoSQL” products in the works
• Avoid text searches and sorts
Rasters and Databases
• Don’t put rasters into a database!
– Makes it impossible to backup and restore
the database
– Put a file path to the rasters in the database