Transcript db taxonomy
Representing taxonomy
MarBEF-IODE workshop
Oostende, 19-23 March 2007
Philosophy
Structure has to be as simple as possible
But not any simpler!!
Alternatives to represent classification and
hierarchy
Alternatives to represent synonymy
Hierarchy: flat table
Every rank in the hierarchy is
represented by a field in the table
Simplest solution
Easy to create
Easy to query
Hierarchy: flat table
Hierarchy: flat table
Problems
not normalised!
Not a real problem if a quick-and-dirty solution is
all that is needed
Difficult to maintain hierarchy in the long term
‘Standard’ problems with non-normalised
database
Possible conflicting information, inefficient storage…
Cfr MASDEA; too simple
Hierarchy: normalised tables
Every rank is represented by a separate
table
Not very difficult to write a query to
regenerate flat table
Every taxon can have additional
information
Extra fields with description…
Hierarchy: cascading tables
Hierarchy: normalised tables
Avantages
Easy to maintain and query
Normailised, possible to add information at any level of the
hierarchy
Drawbacks
Ranks are hard-wired on the structure of the database
New rank would require change of the structure of the database
And probably of the user interface, web interface…
Number of tables
Lot of functionality duplicated
Taxonomic reality
Ranks used depend on the taxonomic group
Botany: mainly infra-specific; zoology: mainly on
higher levels
Many of the ranks are only sparsely used
Needs for a more flexible system
Much of the functionality is the same across all
ranks
‘parent’, synonymy
Authority, description…
‘Open Hierarchy’
Possible to define new ranks without having to
rewrite the structure of the database
All taxonomic names are stored in a field in a
single table; other fields indicate parent and
rank
Many-to-one relation: a single parent, several
descendants
Include ID of parent in the record of the
descendant
Open Hierarchy
Avantages
Completely normalised
Flexible
Drawbacks
Difficult to query classification
Queries of the type ‘all species of the Echinodermata’…
Solution:
‘Calculated field’
Programmatical (loop in computer language)
Recursive query
Synonymy
Every taxon can have several synonyms;
in principle, only one valid name for any
synonym
Many-to-one relation: one valid name, many
synonymous names
Include ID of the valid name in the record of the
synonymous name
Other fields for the type of synonymy…
Implementation in OBIS
(PostgreSQL)
Calculated field: ‘stored path’
Calculate a field, as a concatenation of id
of parent, parent of parent…
E.g. x5x45x65x
5: Animalia
45: Arthropoda
65: Crustacea
Stored path of all taxa belonging to
Crustacea start with x5x45x65x
Query the Stored Path
Get all species from Echinodermata:
select * from obis.tnames
where storedpath~(select
'^'||storedpath||id||'x' from obis.tnames
where tname='Echinodermata')::text
and rank_id=220
Recursive query
All taxa belonging to given taxon:
with recursive includedtaxa(id, tname) as (
select id, tname from obis.tnames
where tname='Semelidae'
union
select tnames.id, tnames.tname
from obis.tnames inner join includedtaxa
on tnames.parent_id=includedtaxa.id
) select * from includedtaxa order by tname
The other way
Finding parent of given rank of a species
with recursive parenttaxa(id, parent_id, tname) as (
select id, parent_id, tname from obis.tnames
where tname='Abra alba'
union
select tnames.id, tnames.parent_id, tnames.tname
from obis.tnames inner join parenttaxa
on parenttaxa.parent_id=tnames.id
and tnames.rank_id>=140
) select * from parenttaxa order by tname
Rest of the taxonomic model
Ranks should be in a separate table
Information on the level of the rank can be
added
Possibility of extra quality control
Rank of a parent as compared to rank of
descendants
Rank of siblings should be same
Documentation
Documenting sources of information
Add sources/references
‘Audit trail’: source of the information in the
database
Taxonomic information: reference of the original
description
Type of the source: expert, database, publication
Date and person responsible for the last
revision of the record
Sources
Many-to-many relation
Every source can contain information on several
taxa
A single taxon can be documented in several
sources
Necessitates an extra table to represent the
relationship
Divide one many-to-many in to one-to-many
relationships
Add distribution
Localities from where a taxon has been
reported
Many-to-many relation
One locality has several taxa
One taxon is found on several localities
Relation must be qualified
Source!
Validity of the observation