Using the Semantic Web to Support Ecoinformatics
Andriy Parafiynyk
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/paper/html/id/319/Using-the-Semantic-Web-toSupport-Ecoinformatics
Joint work with Tim Finin, Joel Sachs, Cynthia Sims Parr, Rong Pan, Lushan Han,
Li Ding (UMBC), Allan Hollander (UCD), David Wang (UMCP)
UMBC
an Honors University in Maryland
This research was supported by NSF ITR 0326460 and matching funds received from the USGS National Biological Information Infrastructure.
Invasive Species
• Invasive species cost the U.S. economy over
$138 billion per year [1].
• By various estimates, these species
contribute to the decline of 35 to 46 percent of
U.S. endangered and threatened species.
• The invasive species problem is growing, as
the number of pathways of invasion
increases.
[1] Pimentel et al. 2000. Environmental and economic costs associated with non-indigenous species in the United States. BioScience 50:53-65.
[2] Charles Groat, Director U.S. Geological Survey, http://www.usgs.gov/invasive_species/plw/usgsdirector01.html
The most common ways biologists currently
deal with data:
• Journal articles
• Excel spreadsheets
• Local databases
• Some information is on-line in HTML/XML
Semantic Web can offer:
• Ontologies to arrive at a common vocabulary and
define exactly what is what across disciplines
(multiple ontologies with mappings possible)
• Constant on-line data availability with convenient
ways of data acquisition and processing
• Data discovery (Swoogle)
• Data integration from different sources, queries
on data from multiple sources
• Expanding the knowledge base by inferencing
• Data can be easily updated or added, and users
notified
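The data integration point can be illustrated with a minimal sketch. It assumes each source publishes plain subject-predicate-object triples; the predicate names, the prey name "PreyFishX", and the mass value are invented for illustration and do not come from any real dataset.

```python
# Hypothetical triples from two independently published sources.
source_a = {
    ("Esox lucius", "eats", "PreyFishX"),
    ("Esox lucius", "hasHabitat", "freshwater"),
}
source_b = {
    ("PreyFishX", "hasMaxMassKg", "0.5"),          # invented value
    ("Esox lucius", "hasHabitat", "freshwater"),   # same fact as in source_a
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Integration is a set union: identical statements collapse into one.
merged = source_a | source_b

# A question neither source can answer alone:
# what is the maximum mass of each prey of Esox lucius?
prey_mass = {
    prey: mass
    for _, _, prey in match(merged, s="Esox lucius", p="eats")
    for _, _, mass in match(merged, s=prey, p="hasMaxMassKg")
}
print(prey_mass)
```

Because both sources use the same identifiers, the join needs no source-specific glue code; agreeing on those identifiers is the role the slide assigns to shared ontologies.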
• Traditional workflow:
– Collect data, OR find data tables in literature or a data registry, OR email the author of the data
– Massage data manually
– Create local spreadsheet
– Run analyses
– Publish paper
– Post supplemental data file on web
– Write up metadata record
– Register dataset with data registry
– Start over for next project
• Semantic Web workflow:
– Develop intelligent query for semantic web data
– Build automatically updating dynamic dataset
– Download to local spreadsheet
– Run analyses
– Publish paper (query and data already publicly available)
– Reanalyze using latest dataset
• Color key for the diagram: green: data gathering; pink: data integration and manipulation
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• University of California, Davis
• Rocky Mountain Biological Laboratory
Food Webs
• A food web models the trophic (feeding)
relationships between organisms in an ecosystem
– Food web simulators are used to explore the
consequences of changes in the ecosystem, such as the
introduction or removal of a species
– A location’s food web is usually constructed from studies
of the frequencies of the species found there and the
known trophic relations among them.
• Goal: automatically construct a food web for a new
location using existing data and knowledge
• ELVIS: Ecosystem Location Visualization and
Information System
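A food web of this kind can be held as a directed graph. The sketch below is not ELVIS code; the species, the links, and the helper names `remove_species` and `left_without_food` are hypothetical. It shows only the simplest consequence of removing a species: which consumers lose their last food source.

```python
# Hypothetical food web: each consumer maps to the set of things it eats.
food_web = {
    "heron": {"perch"},
    "perch": {"ostracod"},
    "ostracod": {"algae"},
    "algae": set(),  # primary producer, eats nothing
}


def remove_species(web, removed):
    """Return a copy of the web with `removed` dropped as node and as prey."""
    return {
        consumer: prey - {removed}
        for consumer, prey in web.items()
        if consumer != removed
    }


def left_without_food(before, after):
    """Consumers that had prey before the change but have none after it."""
    return sorted(c for c, prey in after.items() if before[c] and not prey)


after = remove_species(food_web, "ostracod")
print(left_without_food(food_web, after))  # ['perch']
```

A real simulator would also model population dynamics; this only walks the link structure.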
East River Valley Trophic Web
http://www.foodwebs.org/
Species List Constructor
Click a county, get a species list
The problem
• We know which species exist in the location and
can further restrict and fill in with other ecological
models
• But we don’t know which of them might be eaten
by a potential invasive, or which might eat the
invasive
• We can reason from taxonomic data (similar
species) and known natural history data (size,
mass, habitat, etc.) to fill in the gaps.
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
Evidence Provider
Examine evidence for predicted links.
ELVIS
• Final goal: ELVIS (Ecosystem Location Visualization and
Information System) as an integrated set of
web services for constructing food webs for a
given location.
Background Ontologies
• SpireEcoConcepts:
– confirmed and potential food web links
– bibliographic information of food web studies
– ecosystem terms
– taxonomic ranks
• California Wildlife Habitat Relationships Ontology
– life history
– geographic range
– management information
• ETHAN (Evolutionary Trees and Natural History)
– concepts and properties for ‘natural history’
information on species, derived from data in the
Animal Diversity Web and other taxonomic sources
Data representation:
ETHAN Ontology
• ethan_animals.owl: phylogenetic information about
organisms
• ethan_keywords.owl: geographic range, habitats,
physical description, trophic information,
reproduction, lifespan, behavioral information,
conservation status
• Information in triples:
– “Esox lucius” is a subclass of “Esox”
– “Esox lucius” has max mass “1.4 kg”
– “Esox” eats “Actinopterygii”
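The three example triples can be written down directly as tuples. The sketch below assumes, purely for illustration, that an "eats" statement made about a taxon also applies to its subclasses; the predicate names `subClassOf`, `hasMaxMass`, and `eats` are placeholders rather than the actual ETHAN property names, and an OWL reasoner would derive such a conclusion from class restrictions rather than from this hand-written rule.

```python
# The slide's triples as (subject, predicate, object) tuples.
triples = {
    ("Esox lucius", "subClassOf", "Esox"),
    ("Esox lucius", "hasMaxMass", "1.4 kg"),
    ("Esox", "eats", "Actinopterygii"),
}


def ancestors(taxon, triples):
    """All taxa reachable from `taxon` by following subClassOf links."""
    found, frontier = set(), [taxon]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_diet(taxon, triples):
    """Prey stated for the taxon itself or for any of its ancestors."""
    lineage = {taxon} | ancestors(taxon, triples)
    return {o for s, p, o in triples if p == "eats" and s in lineage}


print(inferred_diet("Esox lucius", triples))  # {'Actinopterygii'}
```

No triple says what "Esox lucius" eats, yet the answer follows from the subclass link; this is the sense in which inferencing expands the knowledge base.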
Using ETHAN and OWL inferencing
to predict success of invasive species
• Known food web link: rabbit eats carrot (yummy!!!)
• What about hare? (yummy???)
– Yes, with high probability, since both are subclasses of the same class in the
taxonomic hierarchy, have the same habitat, etc.
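The rabbit and hare argument can be sketched as a simple evidence score. Everything here is an assumption made for illustration: the parent taxa and habitats are toy data, and the weights 0.5 and 0.25 are arbitrary rather than values used by the project.

```python
# Hypothetical knowledge base; names and attribute values are illustrative.
parent = {"rabbit": "Leporidae", "hare": "Leporidae", "fox": "Canidae"}
habitat = {"rabbit": "grassland", "hare": "grassland", "fox": "forest"}
known_links = {("rabbit", "carrot")}


def link_score(consumer, food):
    """Crude evidence score in [0, 1] for the link 'consumer eats food'."""
    if (consumer, food) in known_links:
        return 1.0  # directly observed
    best = 0.0
    for relative, known_food in known_links:
        if known_food != food:
            continue
        score = 0.0
        if consumer in parent and parent[consumer] == parent.get(relative):
            score += 0.5   # same immediate parent taxon (arbitrary weight)
        if consumer in habitat and habitat[consumer] == habitat.get(relative):
            score += 0.25  # same habitat (arbitrary weight)
        best = max(best, score)
    return best


print(link_score("hare", "carrot"))  # 0.75
print(link_score("fox", "carrot"))   # 0.0
```

The hare inherits most of the rabbit's evidence because it shares both a parent taxon and a habitat, while the fox shares neither.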
• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.8M RDF docs, 320M triples, 10K ontologies,
15K namespaces, 1.3M classes, 175K properties,
43M instances, 600 registered users
Applications and use cases
1 Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using
my ontologies or data?, use analysis, errors, statistics, etc.
2 Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
3 Supporting Semantic Web tools
– Triple shop: finding data for SPARQL queries
Search for ontologies
that contain these terms
746 ontologies were found that
had these two terms
By default, ontologies are ordered
by their ‘popularity’, but they can
also be ordered by date or size.
We can also search for
any RDF documents
containing these terms
5,378 documents were found
that had these two terms
UMBC Triple Shop
• http://sparql.cs.umbc.edu/tripleshop2/
• Finding datasets in the absence of the FROM
clause
• Constraints by URI domain or namespace
(more coming)
• Reasoning (none/rdfs/owl)
• Dataset persistence: queries and results can be
saved, tagged, annotated, shared, searched for,
etc.
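One way to picture dataset discovery without a FROM clause is sketched below. It is not the Triple Shop implementation: the term index, the example.org URIs, and the function `candidate_documents` are hypothetical stand-ins for a Swoogle-style lookup from vocabulary terms to the documents that use them.

```python
import re
from urllib.parse import urlparse

# Hypothetical index from term URI to the documents that use the term.
term_index = {
    "http://example.org/eco#eats": {
        "http://data.example.org/fish.rdf",
        "http://other.example.net/web.rdf",
    },
    "http://example.org/eco#bodyMass": {
        "http://data.example.org/fish.rdf",
        "http://data.example.org/mass.rdf",
    },
}

# A query with no FROM clause: the dataset has to be discovered.
query = """
SELECT ?fish ?mass WHERE {
  ?fish <http://example.org/eco#eats> ?prey .
  ?fish <http://example.org/eco#bodyMass> ?mass .
}
"""


def candidate_documents(sparql, index, domain=None):
    """Documents using any URI from the query, optionally limited to a host."""
    terms = re.findall(r"<([^<>\s]+)>", sparql)
    docs = set()
    for term in terms:
        docs |= index.get(term, set())
    if domain is not None:
        docs = {d for d in docs if urlparse(d).hostname == domain}
    return sorted(docs)


print(candidate_documents(query, term_index))
print(candidate_documents(query, term_index, domain="data.example.org"))
```

The optional `domain` argument mirrors the slide's constraint by URI domain; the documents returned would then form the dataset the query runs against.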
Swoogle Triple Shop
What are the body masses of
fishes that eat fishes?
. . . leaving out the FROM clause
specify dataset
RDF documents were
found that might have
useful data
We’ll select them all
and add them to the
current dataset.
We’ll run the query
against this dataset
to see if the results
are as expected.
The results can be
produced in any of
several formats
Results
http://sparql.cs.umbc.edu/tripleshop2/
Looks like a useful
dataset. Let’s save it
and also materialize
it in the TS triple store.
Contributions
• OWL ontologies for ecoinformatics domain
– data representation
– data sharing
– inferencing
• OWL data discovery
• Ability to automatically construct datasets
relevant to the query
• Dataset storage/sharing