Harnessing complexity Visualizations and algorithms for

Predicting food web connectivity
Phylogenetic scope, evidence thresholds, and
intelligent agents
Cynthia Sims Parr
Ecological Society of America
Memphis, TN August 8, 2006
ELVIS: Ecosystem Localization,
Visualization, and Information System
[Diagram labels: Species list constructor; Food web constructor; Food web; "?". Example species list: Amphithoe longimana, Caprella penantis, Cymadusa compta, Lembos rectangularis, Batea catharinensis, Ostracoda, Melanitta, Tadorna tadorna, ... Other labeled taxa: Oreochromis niloticus (Nile tilapia), Bacteria, Microprotozoa.]
ELVIS’s Food Web Constructor
predicts basic network structure
Prelude to systems models
[Diagram: a food web, made of nodes joined by links, shown beside an evolutionary tree, made of taxa separated by steps.]
Evolutionary Distance Weighting
1. Set distance thresholds
2. Find relatives of target nodes X, Y with known link status
   E.g. relative A is close to X, relative B is close to Y, where the Link Value between A and B is known
3. For each found link, compute weight based on distance
   $$\mathrm{Weight}_{AB} = \frac{1}{1 + (\mathrm{Distance}_{XA} \times \mathrm{Penalty}_{XA}) + (\mathrm{Distance}_{YB} \times \mathrm{Penalty}_{YB})}$$
4. Compute certainty index for a predicted link by combining weighted link values, with a discount for negative evidence
   $$\mathrm{CertaintyIdx}_{XY} = \sum_{i=1}^{N} \frac{\mathrm{weight}_i}{\mathrm{discount}} \, (\mathrm{LinkValue}_i)$$
Food web database
Source                 Webs   Nodes   Links
Animal Diversity Web    n/a     711    2165
EcoWEB                  212    4503   11967
Webs on the Web          19    1373   12056
Interaction Web DB       26    2139    9882
Tuesday Lake              2     101     510
Total                   259    8827   36580
4600 distinct taxa
Food web data: Cohen 1989, Dunne et al. 2006, Vazquez 2006,
Jonsson et al. 2005
Evolutionary tree: Parr et al. 2004. + plants from ITIS +
hierarchy of non-taxonomic nodes
Testing the algorithm
Take each web out of the database
Attempt to predict its links
Compare prediction with actual data
Accuracy (percentage of all predictions that are correct): 89%
Precision (percentage of predicted links that are correct): 55%
Recall (percentage of actual links that are predicted): 47%
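The three measures and the take-one-web-out procedure can be sketched as follows. This is a hedged illustration: `link_metrics`, `leave_one_out`, and the `predict_links` callable are hypothetical names, and the sketch assumes every prediction is a yes/no decision over a known set of candidate predator-prey pairs.

```python
def link_metrics(predicted, actual, candidate_pairs):
    """Accuracy, precision, and recall for one web.

    predicted, actual: iterables of (predator, prey) pairs.
    candidate_pairs: every pair for which a prediction was made
    (ASSUMPTION: predicted and actual are subsets of it).
    """
    predicted, actual = set(predicted), set(actual)
    total = len(set(candidate_pairs))
    tp = len(predicted & actual)   # predicted links that are real
    fp = len(predicted - actual)   # predicted links that are not real
    fn = len(actual - predicted)   # real links that were missed
    tn = total - tp - fp - fn      # correctly predicted absences
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def leave_one_out(webs, predict_links):
    """Hold each web out, predict it from the rest, and score it.

    webs: {name: (candidate_pairs, actual_links)}
    predict_links(candidate_pairs, training_webs) -> predicted links;
    a hypothetical stand-in for the real predictor.
    """
    scores = {}
    for name, (candidates, actual) in webs.items():
        training = {k: v for k, v in webs.items() if k != name}
        predicted = predict_links(candidates, training)
        scores[name] = link_metrics(predicted, actual, candidates)
    return scores


# Toy check on three taxa: 9 ordered pairs, 2 real links, 2 predictions.
taxa = ["a", "b", "c"]
pairs = [(x, y) for x in taxa for y in taxa]
toy = link_metrics(predicted=[("a", "b"), ("a", "c")],
                   actual=[("a", "b"), ("b", "c")],
                   candidate_pairs=pairs)
```

Because most candidate pairs in a real web are non-links, accuracy is dominated by correctly predicted absences, which is one reason it can sit far above precision and recall.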
Choosing parameters
30-web subsample
Representative of habitats, years, # nodes, percent identified to species
Iterate over parameter settings
Tradeoff between
Precision: percentage of predicted links that are correct
Recall: percentage of actual links that are predicted
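Iterating over parameter settings amounts to a grid sweep. The sketch below is illustrative only: the parameter names and the `score` callable are hypothetical, and ranking settings by F1 (the harmonic mean of precision and recall) is an assumption made here to have a single number to compare; the slides do not say how the tradeoff was resolved.

```python
from itertools import product


def sweep(param_grid, score):
    """Evaluate every combination of parameter values.

    param_grid: {parameter_name: [values to try]}
    score(settings) -> (precision, recall); a hypothetical callable that
    would run the predictor on the subsample with these settings.
    """
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        settings = dict(zip(names, values))
        precision, recall = score(settings)
        results.append((settings, precision, recall))
    return results


def best_by_f1(results):
    """Pick the setting with the highest F1 (ASSUMED selection criterion)."""
    def f1(result):
        _, precision, recall = result
        denom = precision + recall
        return 2 * precision * recall / denom if denom else 0.0
    return max(results, key=f1)


# Toy scorer, for demonstration only: recall grows with steps_down.
grid = {"steps_up": [1, 2], "steps_down": [1, 4]}
results = sweep(grid, lambda s: (0.5, s["steps_down"] / 10))
best_settings, best_precision, best_recall = best_by_f1(results)
```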
Evolutionary distance threshold
2 steps up and 4 steps down
[Figure: two panels, recall and precision, each plotted against steps up (1 to 4), with series S1 to S4 for steps down.]
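One way to read a threshold such as "2 steps up and 4 steps down" is: climb at most that many steps from the target taxon toward the root, then descend at most that many steps from each ancestor reached. The sketch below implements that reading. It is an assumption about how the threshold is applied, and the `parent` mapping and function name are hypothetical.

```python
def relatives_within(parent, taxon, max_up, max_down):
    """Find relatives reachable by <= max_up steps up, then <= max_down down.

    parent: {taxon: parent_taxon or None} describing the evolutionary tree.
    Returns {relative: (steps_up, steps_down)} using the fewest steps up.
    """
    # Build a child index from the parent mapping.
    children = {}
    for child, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(child)

    found = {}
    ancestor, up = taxon, 0
    while ancestor is not None and up <= max_up:
        # Breadth-first descent from the current ancestor.
        frontier = [(ancestor, 0)]
        while frontier:
            node, down = frontier.pop(0)
            if node != taxon and node not in found:
                found[node] = (up, down)
            if down < max_down:
                frontier.extend((c, down + 1) for c in children.get(node, []))
        ancestor = parent.get(ancestor)
        up += 1
    return found


# Toy tree: root -> A -> {X, B}, B -> C
tree = {"root": None, "A": "root", "X": "A", "B": "A", "C": "B"}
near = relatives_within(tree, "X", max_up=1, max_down=1)
wider = relatives_within(tree, "X", max_up=2, max_down=2)
```

Here `near` contains the parent A and the sibling B; widening the thresholds also reaches C (one step up, two down) and the root (two steps up).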
Evolutionary direction penalty
not very sensitive
$$\mathrm{Weight}_{AB} = \frac{1}{1 + (\mathrm{Distance}_{XA} \times \mathrm{Penalty}_{XA}) + (\mathrm{Distance}_{YB} \times \mathrm{Penalty}_{YB})}$$
[Diagram labels: ancestor, descendant, siblings]
Negative evidence discount is sensitive
$$\mathrm{CertaintyIdx}_{XY} = \sum_{i=1}^{N} \frac{\mathrm{weight}_i}{\mathrm{discount}} \, (\mathrm{LinkValue}_i)$$
[Figure: recall and precision plotted against negative evidence discount (0 to 100).]
Results over all webs
Is evolutionary distance weighting
better than strict database search?
[Figure: bar chart comparing database search with evolutionary distance weighting on accuracy, precision, and recall (%). Paired t-tests, df = 251; *** p < 0.001.]
Database search is more precise, but evolutionary distance weighting has better recall.
Older webs contribute
Recall (percentage of actual links that are predicted): 47% → 48% with no EcoWEB data
Precision (percentage of predicted links that are correct): 55% → 39% with no EcoWEB data
large webs have fewer unknown “taxa”
[Figure: % taxa unknown plotted against number of taxa.]
recent webs are bigger
[Figure: number of taxa plotted against year of study (1910 to 2010).]
large webs have better taxonomic resolution
[Figure: % identified to species plotted against number of taxa.]
…but large webs are harder to predict
[Figure: recall rate plotted against number of taxa.]
Some phyla are easier to predict
than others
[Figure: recall rate by phylum: Mollusca, Chordata, Bacillariophyta, Arthropoda, Annelida.]
How can we do better at predicting links?
Trait space distance weighting
Euclidean distance in natural history N-space
Parameterize functions from the literature that might predict
links using characteristics of taxa. For example, size or
stoichiometry.
LinkStatusAB= ƒ(α, sizeA, sizeB), ƒ(β, stoichA, stoichB) …
…need more data
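As a rough illustration of Euclidean distance in a natural history N-space, the sketch below treats each taxon as a dictionary of numeric traits. The trait names, the per-trait scaling, and the choice to compare only shared traits are assumptions made for the example, not part of the talk.

```python
import math


def trait_distance(traits_a, traits_b, scales):
    """Euclidean distance between two taxa over their shared traits.

    traits_a, traits_b: {trait_name: numeric value}
    scales: {trait_name: divisor that puts the trait on a comparable scale}
    ASSUMPTION: traits missing from either taxon are skipped.
    """
    shared = [t for t in scales if t in traits_a and t in traits_b]
    if not shared:
        raise ValueError("taxa have no traits in common")
    return math.sqrt(sum(
        ((traits_a[t] - traits_b[t]) / scales[t]) ** 2 for t in shared
    ))


# Hypothetical trait values, for demonstration only.
taxon_a = {"log_mass": 0.0, "trait_2": 3.0}
taxon_b = {"log_mass": 3.0, "trait_2": 7.0, "trait_3": 1.0}
d = trait_distance(taxon_a, taxon_b, {"log_mass": 1.0, "trait_2": 1.0, "trait_3": 1.0})
```

The two taxa differ by 3 and 4 units on their shared traits, so the distance is 5. Skipping missing traits keeps the sketch usable with sparse data, which is the practical obstacle the slide points to.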
ETHAN
Evolutionary Trees and Natural History ontology
Animal Diversity Web
http://www.animaldiversity.org
geographic range
habitats
physical description
reproduction
lifespan
behavior and trophic info
conservation status
Triples
“Esox lucius” hasMaxMass “1.4 kg”
“Esox lucius” isSubclassOf “Esox”
“Esox” eats “Actinopterygii”
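To show how such triples support simple inference, here is a small Python sketch that stores them as tuples and follows isSubclassOf links upward to collect inherited statements. It is a toy stand-in for an RDF store and reasoner; the helper names are hypothetical, and only the three triples from the slide are used.

```python
TRIPLES = [
    ("Esox lucius", "hasMaxMass", "1.4 kg"),
    ("Esox lucius", "isSubclassOf", "Esox"),
    ("Esox", "eats", "Actinopterygii"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def inherited(triples, subject, predicate):
    """Collect objects of `predicate` for `subject` and all its superclasses."""
    seen, objects = set(), []
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        objects.extend(o for _, _, o in match(triples, s=node, p=predicate))
        # Climb one level: anything this node is a subclass of.
        frontier.extend(o for _, _, o in match(triples, s=node, p="isSubclassOf"))
    return objects


diet = inherited(TRIPLES, "Esox lucius", "eats")
mass = match(TRIPLES, s="Esox lucius", p="hasMaxMass")
```

No triple says directly what Esox lucius eats, but following its subclass link to Esox yields "Actinopterygii".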
UMBC Triple Shop
Query
What are body masses of fishes that eat fishes?
Enter a SPARQL query
SELECT DISTINCT ?predator ?prey ?preymaxmass ?predatormaxmass
WHERE {
  ?link rdf:type spec:ConfirmedFoodWebLink .
  ?link spec:predator ?predator .
  ?link spec:prey ?prey .
  ?predator rdfs:subClassOf ethan:Actinopterygii .
  ?prey rdfs:subClassOf ethan:Actinopterygii .
  OPTIONAL { ?predator kw:mass_kg_high ?predatormaxmass } .
  OPTIONAL { ?prey kw:mass_kg_high ?preymaxmass }
}
. . . leaving out the FROM clause
UMBC Triple Shop
Create a dataset
Find semantic web docs that can answer query.
Esox_lucius.owl
webs_publisher.php?published_study=11
Actinopterygii.owl
http://swoogle.umbc.edu
UMBC Triple Shop
Get results
Apply query to dataset with semantic reasoning.
http://sparql.cs.umbc.edu/tripleshop2/
Summary
Food Web Constructor uses an evolutionary approach and large databases
We chose parameters using a subsample
Explored results over the entire database
  Evolutionary distance weighting recalls links better than database search
  Older webs are useful
  Large webs are harder to predict
  Some phyla are easier than others to predict
For future algorithms, we can gather and integrate data via ontologies and intelligent agents
http://spire.umbc.edu
UMBC: Tim Finin, Joel Sachs, Andriy Parafiynyk, Li
Ding, Rong Pan, Lushan Han, UMCP: David Wang,
RMBL: Neo Martinez, Rich Williams, Jennifer
Dunne, UC Davis: Jim Quinn, Allan Hollander
UMMZ Animal Diversity Web: Phil Myers, Roger
Espinosa
UMCP: Bill Fagan, Bongshin Lee, Ben Bederson
ETHAN workflow
[Diagram labels: Keywords (HTML, OWL); XSLT template; Filters; ETHAN; ADW taxon account (HTML, OWL); ADW database (MySQL); ITIS and others (animal name tree; plants, etc.); account data (tabular, text); Taxon Path (OWL); SPIRE taxon database (MySQL); evolutionary tree side of ontology (OWL); phylum-sized ET chunk (OWL).]
Semantic Prototypes In Ecoinformatics
[Diagram labels: UMBC; U Maryland; UC Davis; NASA Goddard; Rocky Mtn Bio Lab; Info. Retrieval Agents; Food Web Constructor; Evidence Provider; Semantic Web Tools; Species List constructor; Invasive Species Forecasting System; Remote Sensing Data; Food Webs; Ecological Interaction Ontologies.]
Food Web Constructor example
Nile Tilapia in St. Marks
http://spire.umbc.edu/fwc
Question
What are potential predators and prey of Oreochromis niloticus in the St. Marks estuary in Florida?
Procedure
Submit species list for St. Marks, with Oreochromis niloticus added.
Food Web Constructor generates possible links
Evidence provider gives details
Nile tilapia – what organisms could be impacted?
Implications: parameterized functions
LinkPredictedCD = ƒ(α, sizeC, sizeD) + ƒ(β, stoichC, stoichD)
Requires good data for target species
Can incrementally add natural history functions to get a better estimate, try different functions from the literature, or use genetic algorithms
Parameterizing functions: multivariate statistics, machine learning, fuzzy inference
Could use evolutionary info if you localize parameter estimates to clades or taxonomic subsets
Distance weighting options
Evolutionary
[Diagram: tree showing X and Y, labeled "2 steps".]
Uses phylogeny or classification or a combination of these – assumes related organisms are like each other
Distance could be branch length or # steps
Does not need natural history data
Ontologies
Richer way to design databases: instances of
concepts that have well-defined meanings and
formal relationships.
[Diagram: example ontology. Classes: TaxonA is-a HigherTaxon; TaxonB is-a HigherTaxon; Breeding Season is-a Reproductive Characteristic and has-a Breeding Duration; Sexual maturity is-a Reproductive Characteristic and has-a Age of Sexual Maturity. Statements: “Higher Taxon” lives in “Australia”; “Taxon A” lives in “Australia”; “Taxon B” lives in “Australia”; “TaxonA” hasAgeOfSexualMaturity and hasBreedingDuration, with the values “1 year” and “5 months”.]