Transcript db taxonomy

Representing taxonomy
MarBEF-IODE workshop
Oostende, 19-23 March 2007
Philosophy
 Structure has to be as simple as possible
 But not any simpler!!
 Alternatives to represent classification and
hierarchy
 Alternatives to represent synonymy
Hierarchy: flat table
 Every rank in the hierarchy is
represented by a field in the table
 Simplest solution


Easy to create
Easy to query
Hierarchy: flat table
Hierarchy: flat table
 Problems

not normalised!
Not a real problem if a quick-and-dirty solution is
all that is needed
 Difficult to maintain hierarchy in the long term
 ‘Standard’ problems with non-normalised
database

 Possible conflicting information, inefficient storage…

Cfr MASDEA; too simple
Hierarchy: normalised tables
 Every rank is represented by a separate
table
 Not very difficult to write a query to
regenerate flat table
 Every taxon can have additional
information

Extra fields with description…
Hierarchy: cascading tables
Hierarchy: normalised tables
 Avantages


Easy to maintain and query
Normailised, possible to add information at any level of the
hierarchy
 Drawbacks

Ranks are hard-wired on the structure of the database




New rank would require change of the structure of the database
And probably of the user interface, web interface…
Number of tables
Lot of functionality duplicated
Taxonomic reality
 Ranks used depend on the taxonomic group


Botany: mainly infra-specific; zoology: mainly on
higher levels
Many of the ranks are only sparsely used
 Needs for a more flexible system
 Much of the functionality is the same across all
ranks


‘parent’, synonymy
Authority, description…
‘Open Hierarchy’
 Possible to define new ranks without having to
rewrite the structure of the database
 All taxonomic names are stored in a field in a
single table; other fields indicate parent and
rank
 Many-to-one relation: a single parent, several
descendants

Include ID of parent in the record of the
descendant
Open Hierarchy
 Avantages


Completely normalised
Flexible
 Drawbacks

Difficult to query classification


Queries of the type ‘all species of the Echinodermata’…
Solution:



‘Calculated field’
Programmatical (loop in computer language)
Recursive query
Synonymy
 Every taxon can have several synonyms;
in principle, only one valid name for any
synonym

Many-to-one relation: one valid name, many
synonymous names
Include ID of the valid name in the record of the
synonymous name
 Other fields for the type of synonymy…

Implementation in OBIS
(PostgreSQL)
Calculated field: ‘stored path’
 Calculate a field, as a concatenation of id
of parent, parent of parent…
 E.g. x5x45x65x




5: Animalia
45: Arthropoda
65: Crustacea
Stored path of all taxa belonging to
Crustacea start with x5x45x65x
Query the Stored Path
 Get all species from Echinodermata:
select * from obis.tnames
where storedpath~(select
'^'||storedpath||id||'x' from obis.tnames
where tname='Echinodermata')::text
and rank_id=220
Recursive query
 All taxa belonging to given taxon:
with recursive includedtaxa(id, tname) as (
select id, tname from obis.tnames
where tname='Semelidae'
union
select tnames.id, tnames.tname
from obis.tnames inner join includedtaxa
on tnames.parent_id=includedtaxa.id
) select * from includedtaxa order by tname
The other way
 Finding parent of given rank of a species
with recursive parenttaxa(id, parent_id, tname) as (
select id, parent_id, tname from obis.tnames
where tname='Abra alba'
union
select tnames.id, tnames.parent_id, tnames.tname
from obis.tnames inner join parenttaxa
on parenttaxa.parent_id=tnames.id
and tnames.rank_id>=140
) select * from parenttaxa order by tname
Rest of the taxonomic model
 Ranks should be in a separate table


Information on the level of the rank can be
added
Possibility of extra quality control
Rank of a parent as compared to rank of
descendants
 Rank of siblings should be same

Documentation
 Documenting sources of information
 Add sources/references



‘Audit trail’: source of the information in the
database
Taxonomic information: reference of the original
description
Type of the source: expert, database, publication
 Date and person responsible for the last
revision of the record
Sources
 Many-to-many relation


Every source can contain information on several
taxa
A single taxon can be documented in several
sources
 Necessitates an extra table to represent the
relationship

Divide one many-to-many in to one-to-many
relationships
Add distribution
 Localities from where a taxon has been
reported
 Many-to-many relation


One locality has several taxa
One taxon is found on several localities
 Relation must be qualified


Source!
Validity of the observation