lecture3_13Sept2010

Download Report

Transcript lecture3_13Sept2010

Fundamentals of GIS
Lecture 3
Part 1.Important Database Concepts
Part 2. Queries
Lecture slides by Austin Troy, University of Vermont, © 2010, except where noted
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
How is Data Stored?
•People use number system with base 10
•Each digit corresponds to 10 to some power
•Hence a number with 3 digits has 103 or 1000
possibilities
•Why are computer values so often in multiples of eight?
•Because computers use a base 8 system of storing
numbers and values
•A byte is 8 “on-off switches” or bits
•Each switch/bit represents a binary number; one byte is
28 or 256 possibilities
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
How do binary numbers translate to real numbers?
•Switch combinations determine base ten number based
on the formula:
•N10= 2b-1+2b-2+…2b-b
•Where b= number of bits storing the number
•Hence the binary number
11111111 = 27*1+ 26 *1 + 25 *1 + 24 *1 + 23 *1 + 22
*1 + 21 *1 + 20 *1 = 255
•And the binary number
11111110 = 27*1+ 26 *1 + 25 *1 + 24 *1 + 23 *1 + 22
*1 + 21 *1 + 20 *0 = 254
2
10
2
10
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Another approach to coding numbers: ASCII
(American Standard Computer Info Index)
Based on Hexadecimal Numbering System
•4 bit or base sixteen (24) system for representing numbers
•0-9 =0-9 but 10-15= A,B,C,D,E,F
•Each digit represents up to 16 instead of 10
•So, the first digit in a two digit number xy= (16*x)+y
•Hence
•21h= (16*2) +1 = 33 = 00100001
10
2
•B2h= (16*11) + 2 =178 = 10110010
10
Lecture materials by Austin Troy (c) 2010 except where noted
2
Fundamentals of GIS
ASCII system provides standardized method for
coding alphanumeric characters, and uses byte of 8
bits for each symbol. Those characters include
everything you see on your keyboard and then some
Standard character set is coded as hexadecimal
numbers going from zero to FF (28).
Example: Letter ‘A’ is 41h = 6510=010000012
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Number of Possible Values is fn of number
of bits
•Number of possible values for a unit of data is an
exponential function of the number of switches
•28=256
eight bit data
•216=65,536
sixteen bit data
•232= 4,294,967,296
thirty two bit data
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Number of bits determines data types
Examples of Integer data types
Byte: 28 (0 to 255)
Short Integer: 216 (ranges from –32,767 to
+32,767 without decimals, the sixteenth bit
determines sign)
Long Integer: 232 (+/-2.147483e+09 )
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Floating point data types
• In this case the number can have a decimal, but the
number of places is variable
• With this type of number the number of bits
determines not just the number of possible
magnitudes but also the level of precision of the
decimal, represented as number of decimal places.
• Fewer bits in FP numbers can lead to rounding
errors
• Two types of FP number
• Single Precision: Often 232
• Double Precision: Usually double the bits of single precision (i.e.
264)
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Other data types
• Currency (type of number with specific
behaviors)
• Date (recognizes order in dates)
• String (text)
– When numbers are represented as text they have
no numerical properties (e.g. zip codes)
• Boolean (yes, no)
• Object (e.g. pictures, bits of code, behaviors,
multi-media, programs)
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Three database models
• Hierarchical
• Network
• Relational
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Hierarchical Database Model
A one-to-many method for storing data in a database that
looks like a family tree with one root and a number of branches or
subdivisions. Problem: linkages in the tables must be known before
Groovy 70s TV
Action shows
Dukes of
Hazzard
Tom
Wopat
Eric
Estrada
CHIPs
Larry
Wilcox
Sitcoms
Drama
Dallas
Larry
Hagman
Fantasy
Island
Ricardo
Montalban
WKRP
Loni
Anderson
Lecture materials by Austin Troy (c) 2010, except where noted
Welcome
back Kotter
John
Travolta
Gabe
Kaplan
Fundamentals of GIS
Hierarchical Database Model
•Example where this model works well:
•plant and animal taxonomies
•Soil classification
•Works when: classes are totally mutually exclusive
•Problem with this model:
•Does not work when have entities that belong to several
classes or do not have mutual exclusivity
•Think about the problems with Windows Explorer
•Example: classifying your music collection
•You may create classes like rock, jazz, classical, Latin,
with folders for artists nested within
•However, an artist may do rock and Latin and jazz on the
same album, or one song may be a combination
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Networked Database Model
A database design for storing information by linking all
records that are related with a list of “pointers.” Problem: linkages in
the tables must be known before. Not adaptable to change.
Action shows
Dukes of
Hazzard
CHIPs
NBC
Sitcoms
Drama
Dallas
CBS
Fantasy
Island
Love Boat
ABC
Lecture materials by Austin Troy (c) 2010, except where noted
Three’s
company
Fundamentals of GIS
Relational (Tabular) Database Model
A design used in database systems in which relationships are
created between one or more flat files or tables based on the idea
that each pair of tables has a field in common, or “key”. In a
relational database, the records are generally different in each table
The advantages: each table can be prepared and maintained
separately, tables can remain separate until a query requires
connecting, or relating them, relationships can be one to one, one to
many or many to one
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Records are the unit that the data
are specific to
fields
Fields, or columns, are attribute
categories
Cells are where individual values
of a record for a field are stored
Headings: are
the labels for
the columns
records
cells
Name
Phone
Address
Student ID
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Is a field that is common to two or more flat files; allows a
query to be done across multiple tables or allows two tables to
be joined
Flat file: professor info
Name
Phone
Address
faculty ID
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
Flat file: course info
Course
name
Course
number
enrollment
faculty ID
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
***
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Join Tables
Based on the values of a field that can be found in both tables
The name of the field does not have to be the same
The data type has to be the same
In this case we
have a one to
Key A B
one join; here
1
Key C
1 4
the key is unique
2 5
1 10
2
Key A B C
2 20
3
1
3 6
1 4 10
3 50
2 5 10
2
JOIN
Lecture materials by Austin Troy (c) 2010, except where noted
3
3
6
50
Fundamentals of GIS
Join Tables
Key
1
A
B
4
1
1
2
2
3
In this case we have a
one to many join; here
the key is not unique
Key C
5
1
2
6
10
20
Key
1
JOIN
Lecture materials by Austin Troy (c) 2010, except where noted
A
B
C
4
10
1
1
2
5
10
2
3
6
20
Fundamentals of GIS
 Relational (Tabular) Database Model: 70s TV example
Now we can have various flat files (tables) with different record
types and with various attributes specific to each record
Actor
Year
born*
Sidebur
n length
Table 1- specific to actors
Show
John
Travolta
1948
slight
WBK
Eric Estrada
1949
moderate
CHIPS
Larry
Wilcox
1953
slight
CHIPS
Tom Wopat
1950
major
Dukes
Show
Lead
actor
Co-star
Netw
ork*
Welcome
back
Kotter
John
Travolta
Gabe Kaplan
CBS
CHIPs
Eric
Estrada
Larry Wilcox
CBS
Dukes
Tom
Wopat
John
Schneider
NBC
Table 2- specific to shows
*entirely guessed at- I am not responsible for mistaken TV trivia
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
 Relational (Tabular) Database Model
This allows queries that go across tables, like which CBS lead actors
were born before 1951? Answer: John Travolta and Larry Wilcox
It does this by combining information from the two tables, using
common key fields
Actor
Year
born*
Sidebur
n length
Show
John
Travolta
1948
slight
WBK
Eric Estrada
1949
moderate
Larry
Wilcox
1953
Tom Wopat
1950
Show
Lead
actor
Co-star
Netw
ork*
John
Travolta
Gabe Kaplan
CBS
CHIPS
Welcome
back
Kotter
slight
CHIPS
CHIPs
Eric
Estrada
Larry Wilcox
CBS
major
Dukes
Dukes
Tom
Wopat
John
Schneider
NBC
*entirely guessed at- I am not responsible for mistaken TV trivia
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
 Relational (Tabular) Database Model
Object-relational databases can contain other objects as well, like
images, video clips, executable files, sounds, links
Actor
Year
born*
Sideburn
length
John
Travolta
1948
slight
Eric Estrada
1949
moderate
Larry
Wilcox
1953
slight
Tom Wopat
1950
major
Picture
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
 Relational Database: another
example: property lot info
Parcel Street
ID
address
zoning
11
15 Maple
St.
Residential-1
12
85 Brooks
Ave
Commercial-2
13
74 Windam
Ct.
Residential 4
One-to-one
relationship
Owner Parcel ID
occupation
J.
Smith
13
lawyer
R.
Jones
11
dentist
T.
Flores
12
Real estate
developer
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Parcel Street
ID
address
zoning
11
15 Maple
St.
Residential-1
12
85 Brooks
Ave
Commercial-2
74 Windam
Ct.
Residential 4
13
In this case, several
people co-own the
same lot, so no longer
one lot, one person
One-to-many
relationship
Owner
Parcel ID occupation
J. Smith
13
lawyer
R. Jones 11
dentist
12
J.
McCann
financier
T.
Flores
12
Lecture materials by Austin Troy (c) 2010, except where noted
Real estate
developer
Fundamentals of GIS
Parcel
ID
Assuming each owner owned several parcels, we
would structure the database differently
One-to-many
relationship
Street
zoning
address
Properties owned by T. Flores
11
15 Maple St.
Residential-1
12
85 Brooks Ave
Commercial-2
13
74 Windam Ct.
Residential 4
Owner
occupation
# properties
owned
J. Smith
lawyer
2
R. Jones
dentist
5
J. McCann financier
T. Flores
Real estate
developer
Owner
Parcel Date of
transaction
ID
Flores
13
4-15-00
Flores
15
4-17-01
Flores
19
3-12-99
2
3
Note: this table includes data
pertinent only to Flores’
ownership of these properties
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Example
Here’s an example of a chart showing the relationships between
flat files in a sample relational database for food suppliers* in
Microsoft Access
* This comes from an MS ACCESS sample database
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Lecture materials by Austin Troy (c) 2010, except where noted
* This comes from an MS ACCESS sample database
Fundamentals of GIS
A real time RDBMS allows for realtime linking and
embedding of tables based on common fields
Here we see
all the
orders for
product ID
3; there is
no need to
include
product ID
in that subtable
Lecture materials by Austin Troy (c) 2010, except where noted
* This comes from an MS ACCESS sample database
Fundamentals of GIS
Part 2. Queries
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries
•This is how we ask questions of the data
•To ask queries, we use mathematical operators,
like =, >, <
•To ask queries on multiple criteria, we use logical
operators, like AND and OR
•Queries can simply select records or perform
more advanced operations with those selections,
such as make new tables, or summarize values by
averages
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries in Arc GIS
•Arc GIS queries only select (highlight) records
•When a record is selected, so is its
corresponding feature
•To summarize selected values, use the
“statistics” function or “summarize” tool
•To create new values based on a query, use the
“calculate” tool.
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries
Here’s an example of a simple
query in Arc GIS
PRICE > 250000. This
highlights all records
(houses) in the specified
layer with a sales price
greater that $250,000
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries
That results in the following selection on the map
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries
And it also selects the corresponding records in the attribute
table
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries
Here’s an example with a polygon layer; I’m querying for
census tracts over 8000 people in population.
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries: multiple criteria
Now let’s add a criteria; let’s say we’re looking for big
population tracts (>8,000) with a high rate of population
change (> 3% annual). Note the use of the AND operator.
Note also that a subset of the last selection was selected
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries: Select From Set vs. New Set
We did the previous selection by clicking Using the “create a
new selection” method.
We could have done the same thing by doing the first query
(pop>8000), clicking “Apply,” then, without clearing that
selection, typing in a new query for the second condition
(popchng97 > 3) and choosing the “Select From Selection”
method instead
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Three query methods in Arc GIS
New Selection: Creates a new query from scratch
Add to Current Selection: Used when there is already a
group of records/features selection; it is equivalent to the
OR operator and widens the selection by introducing a
criterion that is equivalent to the first one
Select from Current Selection: Used when there is
already a of records/features selection; selects a subset
from the originally selected set; equivalent to the AND
operator
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries: OR operator
Here’s a query where we use the OR operator to select either
tracts greater than 8000 population OR with a growth rate
greater than 3%; results in many more records selected; can
also do the same thing by doing one query using “new
selection” then another, using “add to current selection”
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries: Strings
Queries can also be made on text strings, but it is
imperative to put the values in quotes. Here we query for
both BLM and Parks and Rec land.
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries: Strings and numbers
String and number queries can be combined. For
example, let’s say we’re looking for land for a suburban
park and our criteria are that we need areas whose land
use is classed as agricultural and that are bigger than
500,000 square feet.
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries: Strings and numbers
Results in:
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Queries: Strings and numbers
Whereas if our query asks for agricultural land use
without the area criterion, we get:
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
So what can Arc GIS do with queries?
A query selects records; once selected you can:
•
Look at the selection
•
Requery the selection
•
Do stats on the selection
•
Create new fields that recategorize the selection by an an
attribute field
•
Create new fields by doing calculations across several
fields
•
Create a shapefile from the selection
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Examples
Let’s query high unemployment census tracts in LA
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Now let’s do “statistics” to determine the population
in those areas. Answer: almost 5 million people
live in tracts with 6%+ unemployment (see Sum).
We can also see that there are 844 tracts meeting
that description (see Count)
Right click on the
heading to get this menu
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Another thing we can do is convert the selection to a
either a new shapefile or geodatabase feature class
Right click and then click
Data>>export data
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Now, let’s say we wanted to prioritize inner city areas
for urban redevelopment projects:
•
Let’s query based on unemployment and home value
•
Based on these we’ll create a new field that classes all
tracts into High, Medium and Low priority areas
•Tracts with
median home
value < $100,000
and unemployment >
12% are “High”
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
To reclassify, we create a new field, “priority”,
activate the field heading and use the field
calculator to set all selected records to “high”
Note: we must
uses quotes
with a text field
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Now we would set criteria for “medium” and “low”
based on unemployment and home value. These
would probably be more complex queries because
we’re querying for records, say, between 8 and 12%
unemployment and between $100,000 and $150,000
median value.
Note: AND is used
three times, with two
parenthetical clauses
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Now, for the third class our task is easier—we just
select everything that has not been selected yet. To
do this we query for “priority”= ‘’ where those two
marks after the equals sign are single quote marks.
By putting empty quote marks, you’re querying
for records with no values in them for that field.
Now you’d set all those fields equal to “low.”
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Now we can make a category map showing us that
classification based, which is based on two
attributes—median value and unemployment
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Another example:
This time, let’s take a vegetation layer and query for
stands with crown fire potential; because there are
several classes we have to query for all
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Then let’s calculate a fire hazard index for selected
polygons equal to .5(rate of spread * flame length)
We’ll create a new field, “fireindex” (floating point) and
set all selected polygons equal to that calculation
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Then, for all other polygons without crown fire
potential, a different equation can be used, say
.38(Rate of spread * Flame length). But first we
have to take the inverse of the selection by using
the “switch selection” function
Then we can do the new calculation on the new
selection
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Now we can plot out the map of fire index, plotted out
using graduated color (quantity) mapping
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Access and Arc GIS queries
You can do all these queries and much much more in MS Access,
which is a relational DBMS.
For the most part, you’ll use Access to manipulate and query your
attribute tables from geodatabases
This can be done because a geodatabase is an MS Access file
(.MDB)
There are six basic queries you can do in Access:
Select, cross-tab, make table, update,
append, delete
We’ll learn more about these in lab
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Access Queries
•Select: the most general purpose and versatile query—creates a
new temporary table; used for getting summary statistics for a field,
or breaking down summary statistics by category
•Cross-tab: for summarizing statistics across two factors (row and
column)
•Make table: for creating a new, stand-alone data table from a
query
•Update query: this is where we fill a field (could be an empty
field) in an existing table with new values, either equal to a
constant, to values in another field or to an operation using values
from another field; can use Where criteria on this
•Append/delete queries: query that defines rows to append to or
delete from a table; append queries usually require another table.
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Access Queries
•Queries can be used to:
•Summarize information stored in one or many tables (e.g. sales
by year, sales by category, sales by saleperson, sales by date,
orders by date, orders by product type, orders by zip code)
•Create new fields using simple or complex expressions, with
the option of using criteria to specify which records will be
filled in for that field
•Derive averages, maxima, minima, sums, standard deviations,
and counts for values in fields, with or without criteria
•Derive those same things for categories within a field
•Summarize and ask questions of attribute data stored in
different tables
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Access Queries
•Example of query run to get sums of sales values across
product categories:
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Relational attribute queries
Here’s a an Access select query; note how it queries across
various linked tables
This one asks for a summary of sales by category and product
name for the dates between 1/1/1997 and 12/31/1997
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Advanced Single layer query operations
Queries can be used to return statistics: here we get
the mean price from a database of housing sales
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Advanced Single layer query operations
And here we summarize mean price by zip code
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Remember the food database?
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Advanced Single layer query operations
This simple select query yields a summary table of
sales by category for a given year period: generates
a mean value for each category
relates
criteria
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
This select query perform a math operation: it
multiplies price and quantity, times a discount
and delivers a table of order subtotals
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Advanced Single layer query operations
Here we sort
sales by
product and
city
operation
criteria
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Advanced Single layer query operations
Here we sort sales
by city only
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Advanced Single layer query operations
Queries can also be used to make reports, like
this invoice
Lecture materials by Austin Troy (c) 2010, except where noted
Fundamentals of GIS
Advanced Single layer query operations
Queries can be programmed to make custom database
interfaces, so users can easily ask questions of the
data, like this, where orders are summarized by buyer
and the user chooses the country to query on
Lecture materials by Austin Troy (c) 2010, except where noted