Value of a coordinate: geographic
analysis of agricultural biodiversity
Andy Jarvis, Julian Ramirez, Nora Castañeda, Samy Gaiji,
Luigi Guarino, Hector Tobón, and Daniel Amariles
Contents
• Why crop wild relatives?
• How a coordinate can help us complete the collections
• Cleaning coordinate data
• Needs from standards
Wild relatives of crops
• Include both progenitor species and closely related species of cultivated crops
• Faba beans – 0 wild relatives
• Potato – 172 wild relative species
• Increasingly useful in breeding, especially for biotic resistance
Wild relative species
A. batizocoi - 12 germplasm accessions
A. cardenasii - 17 germplasm accessions
A. diogoi - 5 germplasm accessions
Florunner, with no root-knot nematode resistance
COAN, with a population density of root-knot nematodes more than 90% lower than in Florunner
Gap Analysis: Strategies to fill the holes in our seed collections
The Gap Analysis road map
Taxonomy review
Data gathering
Georeferencing
Environmental data gathering
Gap Analysis process
Final recommendations
The Gap Analysis process
Geographic dimension
Proxy for:
• Diversity
• Possibly biotic traits
Taxonomic dimension
Proxy for:
• Range of traits
Environmental dimension
Proxy for:
• Abiotic traits
http://gisweb.ciat.cgiar.org/gapanalysis/
Crop            Genus        # species       G        H    Total
Barley          Hordeum             27    1419    10965    12384
Bean            Phaseolus           72    2435     2952     5387
Chickpea        Cicer               23     314       19      333
Cowpea          Vigna               64    2509     6306     8815
Faba bean       Vicia                9     511      949     1460
Finger millet   Eleusine             7       3       68       71
Maize           Zea                  4     228      143      371
Pearl millet    Pennisetum          54     963     3409     4372
Pigeon pea      Cajanus             26     197      601      798
Sorghum         Sorghum             31     320     4138     4458
Wheat           Aegilops            23    4016     2231     6247
Wheat           Triticum             3    1374        1     1375
Total number of herbarium specimens (H) and germplasm accessions (G) available for each major crop wild relative genepool through the GBIF portal
Environmental coverage
Map panels: Herbarium; Germplasm; No germplasm; Deficient germplasm; Potential richness; Rare environments
Which species, and where
Wild Vigna collecting priorities
• Spatial analysis of currently conserved materials
• *Gaps* in current collections
• Definition and prioritisation of collecting areas
• Eight 100 × 100 km cells to complete collections of 23 wild Vigna priority species
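One way to picture how a handful of cells can complete collections for many species is a greedy set-cover heuristic. The sketch below is an illustration only and is not claimed to be the method used for the Vigna analysis; the cell identifiers, the species labels and the `pick_cells` helper are all hypothetical.

```python
def pick_cells(gaps_by_cell):
    """Greedy set cover over collecting cells.

    `gaps_by_cell` maps a cell id to the set of under-collected species
    expected in that cell. At each step the cell that adds the most
    still-uncovered species is chosen.
    """
    uncovered = set().union(*gaps_by_cell.values())
    chosen = []
    while uncovered:
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(gaps_by_cell),
                   key=lambda cell: len(gaps_by_cell[cell] & uncovered))
        chosen.append(best)
        uncovered -= gaps_by_cell[best]
    return chosen


# Hypothetical example data (not taken from the study).
cells = {
    "cell_A": {"sp1", "sp2", "sp3"},
    "cell_B": {"sp3", "sp4"},
    "cell_C": {"sp4"},
}
priority_cells = pick_cells(cells)
```

A greedy choice does not guarantee the smallest possible set of cells, but it is simple and usually close enough to rank collecting areas.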
Richness in collecting zones at genepool level
Predicted change in species richness to 2050.
Exploration and ex situ conservation of Capsicum flexuosum
• Uncommon species of wild chili, found in Paraguay and Argentina, historically used by local indigenous communities
• 18 known records of the plant prior to this work
• 2 germplasm accessions conserved by the USDA
• GIS used to target field collections
• 6 new collections of C. flexuosum
• 160 seeds conserved ex situ
OBJECTIVE: Locate and collect germplasm of this species in Paraguay
Behind all this
Data Quality
The GBIF database: status of the data
• The database holds 177,887,193 occurrences
• Plantae occurrences number 44,706,505 (25.13%)
• Of these, 33,340,000 (74.5%) have coordinates
• How many of them are correct and reliable?
• How many new georeferences could we get?
CURRENT STATUS OF THE Plantae RECORDS
The GBIF database: status of the data
• How to make the terrestrial data reliable enough?
– Verify coordinates at different levels
• Are the records where they say they are?
• Are the records inside land areas (for terrestrial plant species only)?
• Are all the records within the environmental niche of the taxon?
– Correct wrong georeferences
– Add coordinates to records that lack them
– Cross-check with curators and feed corrections back to the database
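The third check, whether a record falls within the environmental niche of its taxon, can be approximated by flagging records whose environmental value is an outlier relative to the other records of the same taxon. The sketch below applies the common 1.5 × IQR rule to a single variable. The rule, the choice of variable and the sample values are assumptions made for illustration, not the procedure described in the talk.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside
    [Q1 - k*IQR, Q3 + k*IQR] for the given sample."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]


# Hypothetical annual mean temperatures (deg C) at the record locations
# of one taxon; the last value is deliberately out of range.
temps = [18.2, 19.0, 18.7, 19.4, 18.9, 19.1, 3.5]
flags = flag_outliers(temps)
```

A flagged record is only a candidate for review; it may be a genuine range extension rather than a coordinate error.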
• Using a random sample of 950,000 occurrences with coordinates
• Are the records where they say they are? Country-level verification
    Records with null country:        58,051   (6.11% of total)
    Records with incorrect country:    6,918   (0.72% of total)
    Total excluded by country:        64,969   (6.83% of total)
Records mostly located in country boundaries
Inaccuracies in coordinates
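The country-level check amounts to asking whether a record's coordinate falls inside the country it claims. A minimal sketch follows, using a plain ray-casting point-in-polygon test. The `check_country` helper, the record fields and the rectangular "border" are hypothetical; a real run would use actual administrative boundaries and most likely a GIS library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast eastwards from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def check_country(record, borders):
    """Classify a record as 'null country', 'incorrect country' or 'ok'.
    A country code missing from `borders` is treated as incorrect."""
    country = record.get("country")
    if not country:
        return "null country"
    polygon = borders.get(country)
    if polygon is None or not point_in_polygon(record["lon"], record["lat"], polygon):
        return "incorrect country"
    return "ok"


# Hypothetical, grossly simplified border: a box, not a real outline.
borders = {"PY": [(-62.6, -27.6), (-54.3, -27.6), (-54.3, -19.3), (-62.6, -19.3)]}
records = [
    {"country": "PY", "lon": -57.6, "lat": -25.3},
    {"country": "PY", "lon": -70.0, "lat": -25.3},
    {"country": None, "lon": -57.6, "lat": -25.3},
]
results = [check_country(r, borders) for r in records]
```

Records that fail only by a small margin tend to sit on country boundaries, so a tolerance around the border may be worth adding before a record is excluded.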
• Are the terrestrial plant species on land? Coastal verification
    Records in the ocean:               9,866   (1.03% of total)
    Records near land (within 5 km):   34,347   (3.61% of total)
    Records outside of mask:              369   (0.04% of total)
    Total excluded by mask:            44,582   (4.69% of total)
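The coastal check sorts each record into one of the categories above by comparing it with a land mask. The sketch below assumes a coarse boolean grid as the mask and a list of coastline points for the 5 km test. The grid layout, the `classify` helper and the toy data are assumptions for illustration, not the mask used in the analysis.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def classify(lat, lon, mask, origin, cell_deg, coast_points, near_km=5.0):
    """`mask` is a list of rows of booleans (True = land), row 0 at the
    southern edge; `origin` is the (lat, lon) of the south-west corner."""
    row = math.floor((lat - origin[0]) / cell_deg)
    col = math.floor((lon - origin[1]) / cell_deg)
    if not (0 <= row < len(mask) and 0 <= col < len(mask[0])):
        return "outside mask"
    if mask[row][col]:
        return "on land"
    nearest = min(haversine_km(lat, lon, clat, clon) for clat, clon in coast_points)
    return "near land" if nearest <= near_km else "in the ocean"


# Toy 2 x 2 mask of 1-degree cells: western column land, eastern column sea.
mask = [[True, False], [True, False]]
coast = [(0.5, 1.0), (1.5, 1.0)]
labels = [
    classify(0.5, 0.50, mask, (0.0, 0.0), 1.0, coast),
    classify(0.5, 1.02, mask, (0.0, 0.0), 1.0, coast),
    classify(0.5, 1.50, mask, (0.0, 0.0), 1.0, coast),
    classify(5.0, 5.00, mask, (0.0, 0.0), 1.0, coast),
]
```

Records classed as "near land" are good candidates for correction rather than rejection, since they are often the result of coarse coordinate precision.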
Errors, and more errors
Not so bad at all… stats
• 44,706,505 plant records
• 33,340,008 (74.57%) with coordinates
• Of those
– 88.5% are geographically correct at two levels
– 6.8% have a null or incorrect country (incl. marine plant species)
– 4.7% are near the coast but not on land
Summary of errors or misrepresented data
RESULTING DATABASE
TOTAL EVALUATED RECORDS: 950,000
Good records: 840,449 (88.47% of total)
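The number of good records follows from the counts reported for the two verification levels. A short check of the arithmetic, where the variable names are only illustrative:

```python
total = 950_000

# Country-level verification: null country + incorrect country.
excluded_by_country = 58_051 + 6_918

# Coastal verification: in the ocean + near land + outside of mask.
excluded_by_mask = 9_866 + 34_347 + 369

good = total - excluded_by_country - excluded_by_mask
share_good = 100 * good / total  # percentage of the evaluated sample
```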
Next steps
• It now takes 27 minutes to verify 950,000 records; at that rate, 177 million would take about 83 hours (3½ days)
• Identify terrestrial plant species and separate them from marine species
• Use a georeferencing algorithm to:
– Correct wrong georeferences
– Add location data to records with NULL lat, lon
• Interpret 2nd- and 3rd-level administrative boundaries and use them too
• Implement environmental cross-checking (outliers)
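The runtime estimate in the first bullet is a linear extrapolation from the sample. The sketch below repeats it with the full occurrence count, assuming that throughput stays constant as the data volume grows, which is an assumption rather than a measured result.

```python
minutes_per_sample = 27
sample_size = 950_000
total_records = 177_887_193  # all GBIF occurrences cited earlier

records_per_minute = sample_size / minutes_per_sample
hours = total_records / records_per_minute / 60
days = hours / 24
```

With the exact count this comes to roughly 84 hours, in line with the figure of about 83 hours quoted for a rounded 177 million records.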
Geo-referencing: BioGeomancer
http://bg.berkeley.edu/
Conclusions
• A coordinate can tell us a lot, answer a number of interesting research questions, and solve a lot of problems
• The agricultural world is sadly behind the mainstream biodiversity world
– Data not online, not available
– Databases not connected
• Quality of coordinate data is critical:
– We need the concept of precision included
– We need fields such as location descriptions, and 2nd- and 3rd-level administrative descriptions, for georeferencing
– We need effective two-way communication for verifying, correcting and assigning coordinates, from nodes to indexes and vice versa
Economy of scale