Semantic Enrichment - Global Register of Migratory Species

Increasing Usability of
Biodiversity Databases through
Semantic Enrichment
Klaus Riede
Zoologisches Forschungsinstitut &
Museum Alexander Koenig (ZFMK)
Adenauerallee 150-164
53113 Bonn, Germany
Semantic Enrichment:
Some examples
Huge biodiversity databases already exist.
They cover either distinct organisms:
– FishBase, Orthoptera Species File
or distinct themes:
– Threat: IUCN Red List Database (www.redlist.org)
– Migration: Global Register of Migratory Species (www.groms.de)
Why do we need semantic enrichment?
Try to search for:
Number of "Extinct Tropical Timber Trees"
Database: IUCN Red List Database (www.redlist.org)
Query: Tropical tree
Problem: plants are not classified according to life-form.
Plant families such as TAXODIACEAE comprise trees
(e.g. Taiwania cryptomerioides - VULNERABLE),
while CUPRESSACEAE contain both shrubs (Actinostrobus) AND trees (Thuja spp.).
Semantic Enrichment:
Searching for Red-Listed Trees
To search the IUCN Red List Database (www.redlist.org)
for "threatened" trees, you have to know plant taxonomy.
Searching the order CONIFERALES (containing Taxodiaceae trees) yields:
– 16 Critically Endangered,
– 43 Endangered,
– 93 Vulnerable,
...but some of these are shrubs (Cupressaceae: Actinostrobus).
Threatened Cupressaceae:
– 2 Critically Endangered (e.g. Thuja sutchuensis)
– 15 Endangered (e.g. Juniperus cedrus)
– 25 Vulnerable (e.g. Cupressus gigantea)
Semantic enrichment is necessary
to search for "trees".
http://www.botanik.uni-bonn.de/conifers/index.htm
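A minimal sketch of what such enrichment could look like, assuming a hypothetical `life_form` table keyed by genus and a simplified `redlist` table (neither is the actual Red List schema). Taxa and categories are taken from the examples above, except the Actinostrobus record, whose category is a placeholder. Once life-form is attached, "threatened trees" becomes a plain SQL join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE redlist (taxon TEXT, genus TEXT, family TEXT, status TEXT);
CREATE TABLE life_form (genus TEXT PRIMARY KEY, form TEXT);  -- the enrichment layer
""")
conn.executemany("INSERT INTO redlist VALUES (?, ?, ?, ?)", [
    ("Taiwania cryptomerioides", "Taiwania", "Taxodiaceae", "VU"),
    ("Thuja sutchuensis", "Thuja", "Cupressaceae", "CR"),
    ("Actinostrobus sp.", "Actinostrobus", "Cupressaceae", "VU"),  # placeholder category
])
# Hypothetical annotation: life-form per genus, as stated in the text above
conn.executemany("INSERT INTO life_form VALUES (?, ?)", [
    ("Taiwania", "tree"), ("Thuja", "tree"), ("Actinostrobus", "shrub"),
])

# "Threatened trees" = threatened taxa whose genus is annotated as a tree
trees = conn.execute("""
    SELECT r.taxon, r.status
    FROM redlist AS r
    JOIN life_form AS f ON f.genus = r.genus
    WHERE f.form = 'tree' AND r.status IN ('CR', 'EN', 'VU')
    ORDER BY r.taxon
""").fetchall()
```

Annotating at genus level is a simplification; genera that contain both shrubs and trees would need species-level entries.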
Two Worlds:
Relational databases and complex data sets
[Figure: relational databases (DORSA: Digital Orthoptera Specimen Access; SYSTAX; GROMS: Global Register of Migratory Species) linked to complex data sets: sounds, pictures, gene sequences (links), geographic coordinates, and maps (GIS data: shapes)]
Example #1
Data-mining for Knowledge Gaps
The "Global Register of Migratory Species" (GROMS) database
contains literature citations on migration.
Knowledge gaps were detected by searching for text strings
such as: poor*, little known, unknown
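As a sketch of this kind of string search outside the database, the Python snippet below flags citation texts containing the search terms listed above. The word-boundary regular expression and the `knowledge_gaps` helper are assumptions, not the GROMS implementation; the texts are excerpts from the GROMS records quoted later in this section.

```python
import re

# "poor*" from the slide is a prefix wildcard, expressed here as poor\w*
GAP_PATTERN = re.compile(r"\b(?:poor\w*|little known|unknown)\b", re.IGNORECASE)

def knowledge_gaps(texts):
    """Map each species to the gap phrases found in its citation text."""
    hits = {}
    for species, text in texts.items():
        found = [m.group(0) for m in GAP_PATTERN.finditer(text)]
        if found:
            hits[species] = found
    return hits

texts = {
    "Apus caffer": "Poorly understood wet-season movements into Sahel may be "
                   "feature of N sub-Saharan populations.",
    "Caprimulgus climacurus": "Poorly known. Race sclateri possibly sedentary "
                              "and partially migratory.",
    "Chaetura vauxi": "Wintering as far as NW California not unknown.",
}
gaps = knowledge_gaps(texts)
```

The Chaetura vauxi hit comes from "not unknown", a negation that means the opposite, so automatic matches still need manual review.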
www.groms.de
The relational organisation of the GROMS database allows
SQL queries to be applied for text-mining:
References Table (5,500 entries): ID, Author, Title etc.
Joint Table (8,500 entries): Lit_ID, Species_ID, Text [.... migration .... unknown ....]
Species Table (4,355 entries): ID, Taxon name, Migration, Red List status, etc.
References 1:many Joint Table many:1 Species Table
A many:many relation connects references and species names.
SQL statement:
Searching for non-passerine birds with poorly known
migration behaviour:
SELECT Tab_Arten.Latein, Tab_Arten.Englisch, Tab_Arten.Migration,
       Jointab_Art_Lit.Lit_Bezug, Tab_Literatur.Autor_Name, Tab_Literatur.Autor_Vorname,
       Tab_Literatur.Coautoren_Namen, Tab_Literatur.Jahr, Tab_Arten.animalgroup,
       Tab_Arten.Familie
FROM Tab_Literatur RIGHT JOIN (Tab_Arten INNER JOIN Jointab_Art_Lit
       ON Tab_Arten.ID = Jointab_Art_Lit.ID_Art)
     ON Tab_Literatur.ID = Jointab_Art_Lit.ID_Lit
WHERE Jointab_Art_Lit.Theme = 7
  AND Tab_Arten.Animal_Class = 2
  AND (Jointab_Art_Lit.Lit_Bezug LIKE "*unknown*"
    OR Jointab_Art_Lit.Lit_Bezug LIKE "*perhaps*"
    OR Jointab_Art_Lit.Lit_Bezug LIKE "*little*"
    OR Jointab_Art_Lit.Lit_Bezug LIKE "*poor*"
    OR Jointab_Art_Lit.Lit_Bezug LIKE "*possib*")
ORDER BY Tab_Arten.animalgroup, Tab_Arten.Familie;
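The same query can be tried in SQLite as a sketch under assumptions: table and column names follow the statement above, but only the columns the query needs are created; Theme = 7 and Animal_Class = 2 are copied as-is (what they encode, presumably the migration theme and birds, is an assumption); "Species X" is a hypothetical control record. SQLite uses `%` instead of the Access `*` wildcard, and a LEFT JOIN from the species side replaces the RIGHT JOIN, which SQLite only supports from version 3.39.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tab_Arten (ID INTEGER PRIMARY KEY, Latein TEXT, Englisch TEXT, Animal_Class INTEGER);
CREATE TABLE Tab_Literatur (ID INTEGER PRIMARY KEY, Autor_Name TEXT);
CREATE TABLE Jointab_Art_Lit (ID_Art INTEGER, ID_Lit INTEGER, Theme INTEGER, Lit_Bezug TEXT);
""")
conn.executemany("INSERT INTO Tab_Arten VALUES (?, ?, ?, ?)", [
    (1, "Apus caffer", "White-rumped swift", 2),
    (2, "Chaetura vauxi", "Vaux's swift", 2),
    (3, "Caprimulgus climacurus", "Long-tailed nightjar", 2),
    (4, "Species X", "hypothetical control", 2),
])
conn.execute("INSERT INTO Tab_Literatur VALUES (1, 'del Hoyo')")
# Many:many link rows carrying the citation text (excerpts from the records below)
conn.executemany("INSERT INTO Jointab_Art_Lit VALUES (?, 1, 7, ?)", [
    (1, "Poorly understood wet-season movements into Sahel."),
    (2, "Wintering as far as NW California not unknown."),
    (3, "Poorly known. Race sclateri possibly sedentary."),
    (4, "Resident throughout range."),  # hypothetical, should not match
])

KEYWORDS = ["unknown", "perhaps", "little", "poor", "possib"]
keyword_sql = " OR ".join("j.Lit_Bezug LIKE ?" for _ in KEYWORDS)
rows = conn.execute(f"""
    SELECT a.Latein, a.Englisch, l.Autor_Name
    FROM Tab_Arten AS a
    JOIN Jointab_Art_Lit AS j ON a.ID = j.ID_Art
    LEFT JOIN Tab_Literatur AS l ON l.ID = j.ID_Lit
    WHERE j.Theme = 7 AND a.Animal_Class = 2 AND ({keyword_sql})
    ORDER BY a.Latein
""", [f"%{k}%" for k in KEYWORDS]).fetchall()
```

Building the OR list from a keyword list keeps the search terms in one place and passes them as bound parameters; SQLite's LIKE is case-insensitive for ASCII, so "Poorly" matches "%poor%".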
Apus caffer (White-rumped swift), intercontinental:
Migratory in northernmost and southernmost parts of range. Spanish population present early May to Aug-Oct, some recorded into early Dec, with autumn migration through Straits of Gibraltar mid-Aug to mid-Oct; S African population present Aug-May, mainly absent from S Cape and much reduced farther N within S breeding range Jun-Jul. Poorly understood wet-season movements into Sahel may be feature of N sub-Saharan populations. Otherwise resident. Migrates in flocks of up to 100. S African migrants may be transequatorial. Some degree of altitudinal migration in Natal. First record from Arabia 1982, and seen at least once subsequently in Tihamah coastal plains, Saudi Arabia, in Mar 1989. Vagrant to Norway (May, Jun) and Finland (Nov).
Chaetura vauxi (Vaux's swift), intercontinental:
Nominate race a migrant, present in far N of range May to mid-Sept, exceptionally late Mar on coast. Migrates through S California mid-Apr to early May, with weaker autumn passage peaking early Sept, though continuing to early Oct, migrants leaving the state by mid-Oct. Recorded SE Farallon Is, 42 km W of San Francisco, in similar numbers over 22 years, in spring 813 in early-late May, and in autumn 803 early Sept to late Oct. Recorded E to Louisiana and Florida. Passage through NW Mexico Apr-May and mid-Sept to Oct; nominate race present C Mexico to W Honduras, mid-Sept to May. Incidence of wintering in California increasing, small flocks occurring mainly in S, though wintering as far as NW California not unknown.
Result: 349 birds with insufficiently known migration
behaviour
Caprimulgus climacurus (Long-tailed nightjar):
Poorly known. Nominate race migratory and partially sedentary, some populations moving S after breeding season. Race sclateri possibly sedentary and partially migratory. Race nigricans probably sedentary. Outside breeding season, range also includes S Ivory Coast, SW Nigeria, S Cameroon, Equatorial Guinea, Gabon, SE Congo (lower Congo river valley), NE Angola (one record Luaco), SE Sudan, SW Ethiopia, W Kenya (sporadic in Turkana and Pokot region) and E Uganda.
Texts mainly based on the "Handbook of the Birds of the World" (del Hoyo et al. 1992-2003)
Example #2
Automatic Annotation of Sound Parameters
All 5,000 sound files of the Orthoptera Song Repository of the DORSA project
were automatically annotated with sound parameters.
These parameters were added to the SysTax database, which stores
specimen data from various museum databases, including herbaria.
The annotated SysTax Oracle database is now searchable for sound
parameters such as carrier frequency and pulse rate.
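A sketch of such a parameter search, using SQLite as a stand-in for the Oracle database. The `sound_files` table is a hypothetical simplification whose columns mirror PULSEDIST, PULSELENGT and FREQUENCY of the enriched SYSTAX table; the rows are copied from that table. Pulse rate is derived as 1000 / PULSEDIST (ms), which assumes PULSEDIST is the onset-to-onset interval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; comments map columns to the SYSTAX names
conn.execute("""
    CREATE TABLE sound_files (
        filename       TEXT PRIMARY KEY,
        pulsedist_ms   REAL,  -- PULSEDIST
        pulselength_ms REAL,  -- PULSELENGT
        frequency_hz   REAL   -- FREQUENCY (carrier frequency)
    )""")
conn.executemany("INSERT INTO sound_files VALUES (?, ?, ?, ?)", [
    ("n3/tr/trigsp01/s6n023.wav", 12.5, 10.5, 7590.44),
    ("n1/tr/trigsp01/s6n023.wav", 19.3, 7.0, 7593.24),
    ("n1/tr/trigsp01/s6n027f.wav", 18.1, 10.4, 7684.25),
    ("n3/tr/trigsp01/s6n031.wav", 11.5, 6.1, 7702.09),
    ("n1/tr/trigsp01/s7n008.wav", 11.5, 9.8, 7528.76),
    ("n3/tr/trigsp01/s6n044.wav", 39.4, 19.5, 6165.52),
])

def search(freq_range, rate_range):
    """Files whose carrier frequency and pulse rate (both in Hz) fall inside the ranges."""
    return conn.execute("""
        SELECT filename, frequency_hz, 1000.0 / pulsedist_ms AS pulse_rate_hz
        FROM sound_files
        WHERE frequency_hz BETWEEN ? AND ?
          AND 1000.0 / pulsedist_ms BETWEEN ? AND ?
        ORDER BY filename""", (*freq_range, *rate_range)).fetchall()

rows = search((7500, 7700), (70, 90))
```

Computing pulse rate in the query avoids storing a redundant column; an index on a stored pulse-rate column would be the alternative for large tables.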
Deutsche Orthopteren Sammlungen - www.dorsa.de
Orthoptera type material in German museums.
•Review, identification and verification of information on type material
•Locating "historical" types
•Designation of lectotypes
Taxonomic database
(OSF: Orthoptera Species File, USA)
and specimen database
(German museums, sound archives; www.dorsa.de)
are connected by mutual links.
Extraction of sound parameters using MATLAB software
[Figure: sound recording annotated with carrier frequency and pulse rate]
In cooperation with the Dept. of Neuroinformatics, Ulm
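The original extraction used MATLAB. The Python sketch below only illustrates one simple way such parameters can be derived (envelope thresholding to find pulses, zero crossings for the carrier) and runs on a synthetic test signal rather than a real recording; the function names, window size and threshold are assumptions.

```python
import math

def synth_song(fs, carrier_hz, pulse_ms, period_ms, n_pulses):
    """Synthetic test song: a sine carrier switched on for pulse_ms every period_ms."""
    period = round(fs * period_ms / 1000)
    on = round(fs * pulse_ms / 1000)
    return [math.sin(2 * math.pi * carrier_hz * i / fs) if i % period < on else 0.0
            for i in range(period * n_pulses)]

def envelope(x, half_window):
    """Centred moving average of the rectified signal (prefix sums for speed)."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + abs(v))
    env = []
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        env.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return env

def sound_parameters(x, fs, half_window=20, rel_threshold=0.5):
    """Estimate carrier frequency, pulse rate, pulse length and duty cycle."""
    env = envelope(x, half_window)
    threshold = rel_threshold * max(env)
    # A pulse is a run of samples whose envelope reaches the threshold
    segments, start = [], None
    for i, e in enumerate(env):
        if e >= threshold and start is None:
            start = i
        elif e < threshold and start is not None:
            segments.append((start, i))  # end index is exclusive
            start = None
    if start is not None:
        segments.append((start, len(x)))
    if len(segments) < 2:
        raise ValueError("need at least two pulses to estimate a pulse rate")
    period_s = (segments[-1][0] - segments[0][0]) / (len(segments) - 1) / fs
    on_samples = sum(end - begin for begin, end in segments)
    pulse_s = on_samples / len(segments) / fs
    # Carrier: upward zero crossings counted only inside pulses
    crossings = sum(1 for begin, end in segments for i in range(begin + 1, end)
                    if x[i - 1] < 0.0 < x[i])
    return {
        "carrier_hz": crossings / (on_samples / fs),
        "pulse_rate_hz": 1.0 / period_s,
        "pulse_length_ms": 1000.0 * pulse_s,
        "duty_cycle": pulse_s / period_s,
    }

song = synth_song(fs=40000, carrier_hz=7500, pulse_ms=10, period_ms=12.5, n_pulses=8)
params = sound_parameters(song, fs=40000)
```

On this clean synthetic input the estimates should come out close to 7500 Hz, 80 Hz, 10 ms and 0.8; real field recordings would additionally need band-pass filtering and noise-aware thresholds.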
Enriched sound file table:
pulse distance, pulse length, frequency etc. were added to the
SYSTAX table
PULSEDIST  PULSEDISTS  PULSELENGT  PULSELENGTS  DUTY_CYCLE  FREQUENCY  FREQUENCYS  FILENAME
(ms)                   (ms)                                 (Hz)
12.5       419.8       10.5        2.6          0.84        7590.44    66.60       n3/tr/trigsp01/s6n023.wav
19.3       94.8        7.0         2.2          0.36        7593.24    63.79       n1/tr/trigsp01/s6n023.wav
18.1       453.3       10.6        4.7          0.59        7357.79    521.05      n3/tr/trigsp01/s6n027.wav
18.1       290.4       10.3        5.2          0.57        7302.54    610.23      n1/tr/trigsp01/s6n027.wav
18.1       983.1       10.4        3.2          0.57        7684.25    114.84      n1/tr/trigsp01/s6n027f.wav
13.2       203.2       6.6         3.3          0.50        7104.88    76.85       n3/tr/trigsp01/s6n029.wav
13.6       79.1        12.0        3.5          0.88        7128.05    78.00       n1/tr/trigsp01/s6n029.wav
11.5       458.2       6.1         3.1          0.53        7702.09    380.65      n3/tr/trigsp01/s6n031.wav
11.6       113.8       5.7         2.6          0.49        7806.72    78.67       n1/tr/trigsp01/s6n031f.wav
22.9       130.4       8.4         2.6          0.37        6867.13    77.54       n3/tr/trigsp01/s6n034.wav
22.9       171.9       8.4         2.6          0.37        6855.36    90.63       n1/tr/trigsp01/s6n034.wav
22.9       126.7       8.4         2.6          0.37        6855.70    102.59      n1/tr/trigsp01/s6n034f.wav
14.4       114.3       11.7        2.6          0.81        6672.19    53.27       n3/tr/trigsp01/s6n041.wav
14.6       209.3       11.7        3.2          0.80        6641.43    60.03       n1/tr/trigsp01/s6n041.wav
14.6       209.3       11.7        3.2          0.80        6643.62    70.45       n1/tr/trigsp01/s6n041f.wav
39.4       1988.8      19.5        7.3          0.49        6165.52    124.48      n3/tr/trigsp01/s6n044.wav
13.0       100.7       11.0        2.0          0.85        7207.58    41.06       n1/tr/trigsp01/s6n044f.wav
13.2       295.5       11.0        4.0          0.83        6965.09    1129.63     n3/tr/trigsp01/s6n049.wav
13.0       100.0       8.5         2.4          0.65        7205.56    41.95       n1/tr/trigsp01/s6n049.wav
11.5       506.9       9.8         2.9          0.85        7528.76    64.86       n1/tr/trigsp01/s7n008.wav
11.5       82.3        10.0        2.7          0.87        7545.55    51.77       n3/tr/trigsp01/s7n008.wav
11.5       506.9       9.8         2.9          0.85        7527.69    63.26       n1/tr/trigsp01/s7n008f.wav
13.2       148.8       11.0        2.7          0.83        7322.22    66.56       n3/tr/trigsp01/s7n026.wav
13.5       1586.9      11.2        3.7          0.83        7330.96    58.02       n1/tr/trigsp01/s7n026.wav
13.5       1581.0      11.2        3.7          0.83        7332.97    44.75       n1/tr/trigsp01/s7n026f.wav
17.8       174.6       11.0        2.9          0.62        7464.91    61.32       n3/tr/trigsp01/s7n031.wav
17.7       123.3       10.4        3.3          0.59        7467.31    54.61       n1/tr/trigsp01/s7n031.wav
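The DUTY_CYCLE values in this table agree with PULSELENGT / PULSEDIST (e.g. 10.5 ms / 12.5 ms = 0.84), which suggests PULSEDIST is the onset-to-onset pulse period. Under that assumption the pulse rate follows as 1000 / PULSEDIST (ms) Hz. A small check on a few rows copied from the table; the helper name is hypothetical:

```python
# (FILENAME, PULSEDIST in ms, PULSELENGT in ms, DUTY_CYCLE) copied from the table
ROWS = [
    ("n3/tr/trigsp01/s6n023.wav", 12.5, 10.5, 0.84),
    ("n1/tr/trigsp01/s6n023.wav", 19.3, 7.0, 0.36),
    ("n3/tr/trigsp01/s6n044.wav", 39.4, 19.5, 0.49),
    ("n1/tr/trigsp01/s7n008.wav", 11.5, 9.8, 0.85),
]

def derived_parameters(pulse_dist_ms, pulse_len_ms):
    """Duty cycle and pulse rate, assuming PULSEDIST is the onset-to-onset period."""
    duty_cycle = pulse_len_ms / pulse_dist_ms
    pulse_rate_hz = 1000.0 / pulse_dist_ms
    return duty_cycle, pulse_rate_hz

for filename, dist, length, stored in ROWS:
    duty, rate = derived_parameters(dist, length)
    assert round(duty, 2) == stored, filename  # agrees with the stored DUTY_CYCLE
    print(f"{filename}: duty cycle {duty:.2f}, pulse rate {rate:.1f} Hz")
```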
Automated bioacoustic classification of ethospecies
allows rapid assessment.
Mapping with microphones makes it possible to answer
important research questions, such as:
- species ranges / endemism
- species abundance
- species turnover
- community patterns
- activity patterns
- vulnerability to habitat degradation
- extermination rates
Example #3
Enriching databases with Geographic information
- Adding lat-lon coordinates by georeferencing
- GIS analysis of complex geometries (shapes)
  by intersection with other GIS layers and subsequent database update
Georeferencing is necessary to supplement place names with
lat-lon data
Label                        Country     long_dec   lat_dec
Ahwaz                        Iran        48.66667   31.16667
Ainazi-Svetupe River         Latvia      24.3       57.78333
Ainovy Islands               Russia      31.58333   69.83334
Akaki Region                 Ethiopia    38.83333   8.833333
Akh-Chala Plavni Novogolov   Azerbaijan  48.66667   39.5
Akhna Dam                    Cyprus      33.8       35.03333
Akhtarski and Sladki Limans  Russia      38         46
Akrotiri Salt Lake           Cyprus      32.93333   34.51667
Aksehir Gölü                 Turkey      31.4       38.55
Akureyri                     Iceland     -18.08333  65.66666
Akyatan Gölü                 Turkey      35.31667   36.58333
Geographic coordinates were added to place names
using the Times Atlas or gazetteers (Getty, Alexandria Project).
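Matching place names against a gazetteer can be partly automated. A sketch, assuming a hypothetical in-memory gazetteer keyed by normalized name and country (the coordinates are those in the table above): diacritics are stripped so that "Aksehir Gölü" matches the Turkish spelling "Akşehir Gölü", and names without a match are set aside for manual lookup, e.g. in an atlas.

```python
import unicodedata

def normalize(name: str) -> str:
    """Case-fold, strip diacritics and collapse whitespace so spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.casefold().split())

# Hypothetical gazetteer extract: (normalized name, country) -> (long_dec, lat_dec)
GAZETTEER = {
    (normalize("Akşehir Gölü"), "Turkey"): (31.4, 38.55),
    (normalize("Akureyri"), "Iceland"): (-18.08333, 65.66666),
    (normalize("Ahwaz"), "Iran"): (48.66667, 31.16667),
}

def georeference(records):
    """Split (label, country) records into enriched rows and names needing manual lookup."""
    enriched, unmatched = [], []
    for label, country in records:
        coords = GAZETTEER.get((normalize(label), country))
        if coords is None:
            unmatched.append(label)
        else:
            enriched.append((label, country, *coords))
    return enriched, unmatched

enriched, unmatched = georeference([
    ("Aksehir Gölü", "Turkey"),
    ("Akureyri", "Iceland"),
    ("Akaki Region", "Ethiopia"),
])
```

Keying on country as well as name reduces false matches between identically named places in different countries.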
Mapping requires specimen data
enriched with geographic coordinates
The DORSA mapserver is available at
www.dorsa.de
Countries of origin of the type material in German museums
Example #3 (continued)
Enriching databases with Geographic information
based on GIS calculation of range territories
Distribution maps (shapes) are available at www.groms.de
Import of intersection results:
1,000 mapped species intersected with 2,522 administrative units
340,000 combinations (dbf attribute table: province – species)
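To illustrate the intersection step, the sketch below clips each species range against each administrative unit with the Sutherland-Hodgman algorithm and keeps the (unit, species) pairs whose overlap has positive area, i.e. the kind of province-species result table described above. Units, ranges and names are hypothetical. Sutherland-Hodgman needs a convex clipping polygon, so real administrative boundaries would require a full GIS overlay rather than this simplification.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]

def _inside(p: Point, a: Point, b: Point) -> bool:
    # True if p lies left of (or on) the directed edge a->b
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

def _cross_point(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Intersection of segment p-q with the infinite line through a and b
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def clip(subject: Polygon, clipper: Polygon) -> Polygon:
    """Sutherland-Hodgman clipping; `clipper` must be convex and counter-clockwise."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_cross_point(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_cross_point(prev, cur, a, b))
            prev = cur
    return output

def area(poly: Polygon) -> float:
    # Shoelace formula (0 for an empty polygon)
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2

def intersect_layers(units: Dict[str, Polygon],
                     ranges: Dict[str, Polygon]) -> List[Tuple[str, str]]:
    """Result table of (unit, species) pairs whose shapes overlap with positive area."""
    return [(u, s) for u, u_poly in units.items() for s, s_poly in ranges.items()
            if area(clip(s_poly, u_poly)) > 0]

UNITS = {"Unit A": [(0, 0), (10, 0), (10, 10), (0, 10)],
         "Unit B": [(10, 0), (20, 0), (20, 10), (10, 10)]}
RANGES = {"Species 1": [(2, 2), (8, 2), (8, 8), (2, 8)],
          "Species 2": [(5, 5), (15, 5), (15, 9), (5, 9)]}
combinations = intersect_layers(UNITS, RANGES)
```

Requiring a positive overlap area means a range that merely touches a unit boundary does not create a combination.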
Queensland search results:
Summary:
Semantic enrichment of relational databases is possible by
automatic annotation:
- link the relational database to an external data set (sounds, GIS)
- run an annotation program (e.g. GIS intersection) on the external data
- import the result table (table with annotation) into the database
- query the enriched relational database for results
Enrichment allows SQL retrieval of complex data parameters.
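The workflow in the summary can be sketched generically: read the links from the relational database, run an annotation program on each linked external item, and import the result table so that SQL can query it. The `records` table, the `enrich` function and the lookup standing in for the annotation program are all hypothetical; the two parameter sets are copied from the SYSTAX table above.

```python
import sqlite3
from typing import Callable, Dict

def enrich(conn: sqlite3.Connection, annotate: Callable[[str], Dict[str, float]]) -> int:
    """Run `annotate` on every linked external item and import its result table."""
    conn.execute("CREATE TABLE IF NOT EXISTS annotation "
                 "(record_id INTEGER, parameter TEXT, value REAL)")
    links = conn.execute("SELECT id, external_link FROM records").fetchall()
    result_table = [(rid, name, value)
                    for rid, link in links
                    for name, value in annotate(link).items()]
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", result_table)
    return len(result_table)

# Hypothetical annotation program: a lookup standing in for MATLAB or GIS output
PRECOMPUTED = {
    "n3/tr/trigsp01/s6n023.wav": {"FREQUENCY": 7590.44, "DUTY_CYCLE": 0.84},
    "n1/tr/trigsp01/s6n023.wav": {"FREQUENCY": 7593.24, "DUTY_CYCLE": 0.36},
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, external_link TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", list(enumerate(PRECOMPUTED, start=1)))
n_rows = enrich(conn, PRECOMPUTED.__getitem__)

# SQL retrieval of a complex parameter through the enriched database
rows = conn.execute("""
    SELECT r.external_link
    FROM records AS r
    JOIN annotation AS a ON a.record_id = r.id
    WHERE a.parameter = 'DUTY_CYCLE' AND a.value > 0.5
""").fetchall()
```

Storing one row per (record, parameter, value) lets new parameter types be added without schema changes; the SYSTAX table shown earlier instead adds one column per parameter, which is simpler to query directly.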