Data Models for Ecological Databases
John Porter
Department of Environmental Sciences
University of Virginia
Characteristics of Ecological Data
[Figure: kinds of data plotted by data volume per dataset (low to high) against complexity/metadata requirements (low to high). Labeled items: satellite images, GIS, weather stations, business data, primary productivity, gene sequences, biodiversity surveys, population data, soil cores.]
Choosing a DBMS
• What tasks do you want the DBMS to accomplish?
– query
– sorting
– analysis
• Is there a type of DBMS whose structure best mirrors that of the underlying data?
Database Management System (DBMS) Types
• File system-based
• Hierarchical
• Network
• Relational
• Object-oriented
Advantages and Disadvantages of using a DBMS
Advantages
• additional capabilities
– sorting
– query
– integrity checking
• easy access to data
Disadvantages
• few graphical or statistical capabilities
• proprietary formats may limit archival quality of data
• require expertise and resources to administer
File-System Based
[Diagram: a directory containing files]
• very simple and easy to set up
• inefficient
• few capabilities
Hierarchical
[Diagram: a tree rooted at Project, with Datasets and Investigators, then Variables and Locations, then Codes and Methods on successive levels]
• efficient
• not very general
• e.g. phylogenetic structures, geographical images
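A minimal sketch of why a hierarchical layout is efficient but not very general, using a nested Python structure. The dataset, variable, and location names are hypothetical placeholders, and the parent/child arrangement is an assumption based on the slide's diagram labels.

```python
# Hypothetical tree: every dataset is stored under exactly one project.
project = {
    "name": "Project",
    "datasets": [
        {"name": "Dataset 1", "variables": ["Var A", "Var B"], "locations": ["Loc 1"]},
        {"name": "Dataset 2", "variables": ["Var C"], "locations": ["Loc 1", "Loc 2"]},
    ],
}

# Top-down access is direct: follow the branches from the root.
first_variables = project["datasets"][0]["variables"]

# A cross-cutting question ("which datasets use Loc 1?") has no branch of its
# own, so every dataset must be visited.
at_loc_1 = [d["name"] for d in project["datasets"] if "Loc 1" in d["locations"]]

print(first_variables)
print(at_loc_1)
```

Note that "Loc 1" has to be repeated under each dataset that uses it, because a tree gives each item only one parent.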
Network Database
[Diagram: Projects, Datasets, and Locations connected by links]
Links are hard-coded into the database. They are not a property of the data.
• very flexible
• unwieldy to modify
• not widely used
Relational Database
[Diagram: Projects, Datasets, and Locations tables joined through shared key fields (Location_id, Data_id)]
Linkages are through the properties of the data itself, not hard-coded.
• widely used, mature
• table-oriented
• restricted range of structures
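The idea that relational linkages come from the data itself can be sketched in a few lines of Python. The rows below are hypothetical; only the field names location_id and data_id echo the slide. No pointer from a location to its datasets is stored anywhere; the link exists only because the two tables share location_id values.

```python
# Hypothetical rows; the values are invented for illustration.
locations = [
    {"location_id": 1, "name": "Site A"},
    {"location_id": 2, "name": "Site B"},
]
datasets = [
    {"data_id": 10, "title": "Dataset X", "location_id": 1},
    {"data_id": 11, "title": "Dataset Y", "location_id": 2},
    {"data_id": 12, "title": "Dataset Z", "location_id": 1},
]


def datasets_at(location_name):
    """Find datasets for a location by matching location_id values."""
    ids = {loc["location_id"] for loc in locations if loc["name"] == location_name}
    return [d["title"] for d in datasets if d["location_id"] in ids]


site_a_titles = datasets_at("Site A")
print(site_a_titles)
```

Adding a new dataset at Site A only requires a new row carrying location_id 1; nothing else in the structure has to be rewired.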
Object-Oriented
[Diagram: an object combining methods with a data structure]
• developing; few commercial implementations
• diverse structures
• extensible
Data Modeling
Data modeling is used to develop the structures used in a database
Your data model affects
• reliability of the data
• efficiency and speed of queries
• the complexity of the database
Data modeling is an art, not a science!
Flat-file
[Diagram: a single Observation table with fields Genus, Species, Common Name, Observer, Date]

Genus    Species  Common Name  Observer    Date
Quercus  alba     White Oak    Jones, D.   15-Jun-1998
Quercus  alba     White Oak    Smith, D.   12-Jul-1935
Quercus  alba     White Oat    Doe, J.     15-Sep-1920
Quercus  rubra    Red Oak      Fisher, K.  15-Jun-1998
Quercus  rubra    Red Oak      James, J.   15-Sep-1920
Normalization
One widely-used approach for reducing
errors within a database is to normalize
your data structures
Normalization is the process of
eliminating duplicate or redundant
information
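A short Python sketch of how redundant information invites errors, using the rows from the flat-file slide. It assumes the "White Oat" entry on that slide is an intentional example of such an error; the variable names are illustrative only.

```python
from collections import defaultdict

# Rows as shown on the flat-file slide: genus, species, common name, observer, date.
flat_rows = [
    ("Quercus", "alba", "White Oak", "Jones, D.", "15-Jun-1998"),
    ("Quercus", "alba", "White Oak", "Smith, D.", "12-Jul-1935"),
    ("Quercus", "alba", "White Oat", "Doe, J.", "15-Sep-1920"),
    ("Quercus", "rubra", "Red Oak", "Fisher, K.", "15-Jun-1998"),
    ("Quercus", "rubra", "Red Oak", "James, J.", "15-Sep-1920"),
]

# Collect every common name recorded for each (genus, species) pair.
names_by_species = defaultdict(set)
for genus, species, common_name, observer, date in flat_rows:
    names_by_species[(genus, species)].add(common_name)

# A species with more than one common name points to a data-entry conflict.
conflicts = {
    key: sorted(names) for key, names in names_by_species.items() if len(names) > 1
}
print(conflicts)
```

Because the common name is retyped on every row, it can disagree with itself. Storing it once per species removes that possibility.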
Two-table Relational Database
Species table
Spec_code  Genus    Species  Common Name
QRCALB     Quercus  alba     White Oak
QRCRBR     Quercus  rubra    Red Oak

Observation table
Spec_code  Observer    Date
QRCALB     Jones, D.   15-Jun-1998
QRCALB     Smith, D.   12-Jul-1935
QRCALB     Doe, J.     15-Sep-1920
QRCRBR     Fisher, K.  15-Jun-1998
QRCRBR     James, J.   15-Sep-1920

[Diagram: Species (Spec_code, Genus, Species, Common Name) linked to Observation (Spec_code, Observer, Date) through Spec_code]
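The two-table design can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the lowercase table and column names (species, observation, obs_date) and the choice of SQLite are illustrative and not taken from the slides; the data values are those shown in the slide's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the linkage
conn.executescript("""
CREATE TABLE species (
    spec_code   TEXT PRIMARY KEY,
    genus       TEXT NOT NULL,
    species     TEXT NOT NULL,
    common_name TEXT NOT NULL
);
CREATE TABLE observation (
    spec_code TEXT NOT NULL REFERENCES species(spec_code),
    observer  TEXT NOT NULL,
    obs_date  TEXT NOT NULL
);
""")

# Each species is described exactly once.
conn.executemany("INSERT INTO species VALUES (?, ?, ?, ?)", [
    ("QRCALB", "Quercus", "alba", "White Oak"),
    ("QRCRBR", "Quercus", "rubra", "Red Oak"),
])
# Observations carry only the species code.
conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", [
    ("QRCALB", "Jones, D.", "15-Jun-1998"),
    ("QRCALB", "Smith, D.", "12-Jul-1935"),
    ("QRCALB", "Doe, J.", "15-Sep-1920"),
    ("QRCRBR", "Fisher, K.", "15-Jun-1998"),
    ("QRCRBR", "James, J.", "15-Sep-1920"),
])

# The join rebuilds the flat view through the shared spec_code values.
red_oak_rows = conn.execute("""
    SELECT s.common_name, o.observer, o.obs_date
    FROM observation AS o
    JOIN species AS s ON s.spec_code = o.spec_code
    WHERE s.species = 'rubra'
    ORDER BY o.observer
""").fetchall()
print(red_oak_rows)

# Integrity checking: an observation with an unknown code is refused.
try:
    conn.execute("INSERT INTO observation VALUES ('QRCXXX', 'Nobody, N.', '01-Jan-2000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

With the common name stored once per species, a misspelling such as "White Oat" can no longer appear on some rows and not others.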
Complex Data Model
[Diagram: linked entities: Species, Images, Observations, Internet Links, Locations, Observers, Specimens]
Data Model for Metadata at VCR/LTER
[Diagram: linked entities: Personnel, Projects, Mailing Lists, Dataset, Locations, Variable, Codes, Dataset Variable. Legend: optional linkage, mandatory linkage]
“Beanstalk” & “String of Pearls”
What    Value  Date      Location
Temp    23     10/19/00  SEV
Humid   95     10/19/00  SEV
Precip  0.01   10/18/00  VCR

[Diagram: the table above linked to Metadata (methods, units) and to a Location Table (Lat/Lon)]
Beanstalk / String of Pearls
Highly normalized
Extremely flexible - capable of handling
many different kinds of data
Inefficient
• Queries can be very slow
• Can require large amounts of space
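A rough Python sketch of this trade-off, using the three rows from the “Beanstalk” slide. The tuple layout and the variable names are assumptions made for illustration. Any kind of measurement fits the same four fields, but rebuilding a conventional one-row-per-sample record means scanning and regrouping every row.

```python
from collections import defaultdict

# One row per measured value: what, value, date, location (rows from the slide).
observations = [
    ("Temp", 23, "10/19/00", "SEV"),
    ("Humid", 95, "10/19/00", "SEV"),
    ("Precip", 0.01, "10/18/00", "VCR"),
]

# Regroup into one record per (location, date); this touches every row.
grouped = defaultdict(dict)
for what, value, date, location in observations:
    grouped[(location, date)][what] = value
wide = dict(grouped)

print(wide)
```

The date and location are repeated on every row, which is where the extra storage goes, and the regrouping step is the kind of work that makes queries slow on large tables.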