Semantic Interoperability for Geographic Information Systems

The Illinois DLI Project
Tobun Dorbin Ng
Artificial Intelligence Lab
http://ai.bpa.arizona.edu
The University of Arizona
DLI Project-wide Workshop (Berkeley)
January 5-6, 1998
Challenge for DLs
• Information Infrastructure Technology and Application (IITA) Working Group
– workshop in May 1995
• The “Grand Challenge”
– interoperability at a deep semantic level
– providing DL users with a coherent view of heterogeneous, autonomously managed resources
Semantic Interoperability
• “The ability of a user to access, consistently and coherently, similar (though autonomously defined and managed) classes of digital objects and services distributed across heterogeneous repositories, with federating or mediating software compensating for site-by-site variations.”
• Enables systems to cross-correlate items of information across multiple sources to solve problems
An Architecture for Scalable Semantic Interoperability
[Figure: architecture of the Semantic Interoperability Environment. Components: Agent Interface; Information Consultant; Semantic Analysis (reasoning by spreading activation, browsing) over Concept Spaces (fine-grained) and Category Spaces (coarse-grained), arranged by level of semantic abstraction; Data Management (create, access); Multimedia Collaboration; Concept, Data, and GUI Communication. Underlying sources: Distributed, Heterogeneous Database Collections of text (full text, abstracts), images (Landsat, AVHRR, DEM, aerial photos), video, voice, and structured types.]
Semantic Components
• Structure: nodes and links
– Concept Spaces: fine-grained; nodes are concepts
– Category Spaces: coarse-grained; nodes are categories
Geographic Information Systems Testbed
• GeoRef Information Services, American Geological Institute (AGI)
– 350K records, 400 Mbytes, 1981-1995
– GeoRef Thesaurus, 27K terms
– Geo-referenced records
• Petroleum Abstracts Service, University of Tulsa
– 500K records, 400 Mbytes, 1984-1995
• Compendex, Engineering Information, Inc.
– 22K records, 50 Mbytes, 1992-1995
– 42 geoscience-related domain areas
GIS Testbed (cont’d)
• Aerial Photos, UC at Santa Barbara
– 1000 images, 32 Mbytes each
– Geo-referenced
• Advanced Very High Resolution Radiometer (AVHRR), NASA
– 1993 global data, 2 Gbytes
– Geo-referenced
• Geographic Names Information System (GNIS), US Geological Survey
– 56K place names & their geographic coordinates
Semantics in Text
• Term Phrases as Concepts
• Automatic Indexing
– Extract term phrases from unstructured free text
– Form term phrases from adjacent words
– Apply stopwords
• Structural Fields
– Pre-assigned indices
– Author names
• Vector Space Model
– Term & document frequencies
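A minimal sketch of the indexing steps above, assuming term phrases of up to three adjacent words that never span a stopword, a toy stopword list, and tf·log(N/df) weights for the vector space model. STOPWORDS, extract_phrases, and tfidf_vectors are hypothetical names and settings, not the project's code.

```python
import math
import re
from collections import Counter

# Hypothetical toy stopword list; the project's actual list is not given.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def extract_phrases(text, max_len=3):
    """Return term phrases of 1..max_len adjacent words; a stopword ends a phrase."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases, run = [], []  # run = current stretch of consecutive non-stopwords
    for w in words + [None]:  # None is a sentinel that flushes the final run
        if w is None or w in STOPWORDS:
            for i in range(len(run)):
                for n in range(1, max_len + 1):
                    if i + n <= len(run):
                        phrases.append(" ".join(run[i:i + n]))
            run = []
        else:
            run.append(w)
    return phrases


def tfidf_vectors(docs):
    """Map each document to {phrase: tf * log(N / df)} (assumed weighting)."""
    tfs = [Counter(extract_phrases(d)) for d in docs]
    n_docs = len(docs)
    df = Counter(term for tf in tfs for term in tf)  # documents containing each phrase
    return [{t: f * math.log(n_docs / df[t]) for t, f in tf.items()} for tf in tfs]
```

Structural fields such as pre-assigned indices or author names could be appended to each document's term list before weighting; the sketch covers only free text.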
Semantics in Image
• Image Tiles and Regions as Concepts
• Create Image Tiles: 128x128 pixel subsets
• Extract Features using Gabor Filters
– Gabor filters: scale-tunable edge & line detectors
– Apply in 6 orientations & 5 scales
– Each tile: 30 (6 × 5) pairs of means and standard deviations, one pair per filter
• Segment Image using Texture Flow Analysis
– Group adjacent tiles with similar textures
– Determine texture flow with direction & energy
– Define boundaries by opposite orientations
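A minimal pure-Python sketch of the tiling and Gabor feature steps, assuming complex Gabor kernels with Gaussian width 0.56 × wavelength, envelope aspect ratio 0.5, wavelengths of 2 to 32 pixels as the five scales, and zero padding at tile borders. These parameters and the names tiles, gabor_kernel, and gabor_features are hypothetical. The code favors clarity over speed; on real 128×128 tiles an FFT-based convolution would be the practical choice. Texture flow segmentation is not shown.

```python
import math

SIGMA_RATIO = 0.56  # Gaussian width relative to wavelength (assumption)
GAMMA = 0.5         # aspect ratio of the Gaussian envelope (assumption)


def tiles(image, size=128):
    """Split a 2-D list image into non-overlapping size x size tiles (ragged edges dropped)."""
    return [[row[c:c + size] for row in image[r:r + size]]
            for r in range(0, len(image) - size + 1, size)
            for c in range(0, len(image[0]) - size + 1, size)]


def gabor_kernel(wavelength, theta, half):
    """Complex Gabor kernel of size (2*half+1)^2, returned as (real, imaginary) 2-D lists."""
    sigma = SIGMA_RATIO * wavelength
    re, im = [], []
    for y in range(-half, half + 1):
        row_re, row_im = [], []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)  # coordinates rotated by theta
            yr = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr ** 2 + (GAMMA * yr) ** 2) / (2 * sigma ** 2))
            row_re.append(env * math.cos(2 * math.pi * xr / wavelength))
            row_im.append(env * math.sin(2 * math.pi * xr / wavelength))
        re.append(row_re)
        im.append(row_im)
    # Make the cosine part zero-mean (the sine part already is, by symmetry),
    # so a constant region away from the tile border gives no response.
    dc = sum(map(sum, re)) / len(re) ** 2
    re = [[v - dc for v in row] for row in re]
    return re, im


def gabor_features(tile, n_orient=6, wavelengths=(2, 4, 8, 16, 32)):
    """One (mean, std) of response magnitude per filter: 6 x 5 = 30 pairs by default."""
    n_rows, n_cols = len(tile), len(tile[0])
    feats = []
    for lam in wavelengths:
        # 3 sigma covers the envelope; taps beyond the tile would only hit zero padding.
        half = min(math.ceil(3 * SIGMA_RATIO * lam), max(n_rows, n_cols) - 1)
        for k in range(n_orient):
            re, im = gabor_kernel(lam, k * math.pi / n_orient, half)
            mags = []
            for r in range(n_rows):
                for c in range(n_cols):
                    s_re = s_im = 0.0
                    for u in range(max(0, r - half), min(n_rows, r + half + 1)):
                        for v in range(max(0, c - half), min(n_cols, c + half + 1)):
                            s_re += tile[u][v] * re[u - r + half][v - c + half]
                            s_im += tile[u][v] * im[u - r + half][v - c + half]
                    mags.append(math.hypot(s_re, s_im))
            mean = sum(mags) / len(mags)
            std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
            feats.append((mean, std))
    return feats
```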
Semantics in Satellite Numerical Data
• AVHRR Data from NASA’s Pathfinder
– Afternoon observations over all land and coastal zones
– Spatial resolution: 8 km
– 5 channels of electromagnetic spectrum
• Vegetation Density as Concept
– Normalized Difference Vegetation Index (NDVI)
– Non-vegetation (-1.0) to green vegetation (+1.0)
• Temperature as Concept
– Convert the radiances from channels 4 & 5 into temperatures
• Use GNIS to name each 8-km unit
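A sketch of the three per-cell computations, under stated assumptions. NDVI uses the standard AVHRR channel assignment: channel 1 is visible red and channel 2 is near-infrared. Radiance-to-temperature conversion inverts the Planck function to give a brightness temperature per channel. The central wavenumbers below are approximate placeholders, since real values come from each satellite's calibration. Combining channels 4 and 5 into a surface temperature, for example with a split-window method, is not shown. Naming a cell after the nearest GNIS place is an assumed reading of "name each 8-km unit", and all function names are hypothetical.

```python
import math

C1 = 1.191042e-5   # mW / (m^2 sr cm^-4): first radiation constant, wavenumber form
C2 = 1.4387752     # cm K: second radiation constant

# Approximate central wavenumbers (cm^-1); placeholders, satellite-dependent.
CH4_WAVENUMBER = 927.0
CH5_WAVENUMBER = 837.0


def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red), in [-1, 1]; None when both are zero (no data)."""
    total = nir + red
    return None if total == 0 else (nir - red) / total


def planck_radiance(temp_k, wavenumber):
    """Blackbody radiance in mW/(m^2 sr cm^-1) at temperature temp_k."""
    return C1 * wavenumber ** 3 / (math.exp(C2 * wavenumber / temp_k) - 1.0)


def brightness_temperature(radiance, wavenumber):
    """Inverse Planck function: radiance -> equivalent blackbody temperature (K)."""
    return C2 * wavenumber / math.log(1.0 + C1 * wavenumber ** 3 / radiance)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def name_cell(lat, lon, places):
    """Name a cell after the nearest (name, lat, lon) entry, e.g. a GNIS extract."""
    return min(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))[0]
```

For example, a cell centered at (32.3, -111.0) with a two-entry place list [("Tucson", 32.22, -110.97), ("Phoenix", 33.45, -112.07)] would be named "Tucson".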
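Semantic Analysis: Concept Space
• Co-occurrence Analysis Algorithm
$$\mathrm{ClusterWeight}(T_j, T_k) = \frac{\sum_{i=1}^{n} d_{ijk}}{\sum_{i=1}^{n} d_{ij}} \times wf(T_k)$$
$$\mathrm{ClusterWeight}(T_k, T_j) = \frac{\sum_{i=1}^{n} d_{ikj}}{\sum_{i=1}^{n} d_{ik}} \times wf(T_j)$$
$$d_{ij} = tf_{ij} \times \log\frac{N}{df_j}, \qquad d_{ijk} = tf_{ijk} \times \log\frac{N}{df_{jk}}$$
$$wf(T_j) = \frac{\log(N / df_j)}{\log N}$$
– tf_ij: frequency of term T_j in document i; tf_ijk: co-occurrence frequency of T_j and T_k in document i; df_j: number of documents containing T_j; df_jk: number containing both; N: total number of documents; the sums run over documents i = 1, …, n
– Weights are asymmetric: the link X_j → X_k generally differs from X_k → X_j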
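A minimal sketch of the co-occurrence weights above. It takes per-document term-frequency dictionaries and assumes tf_ijk = min(tf_ij, tf_ik). That choice is common in co-occurrence analysis but is not stated on the slide. The sketch also assumes at least two documents, so that log N is nonzero. The name cluster_weights is hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def cluster_weights(docs):
    """Asymmetric ClusterWeight(Tj, Tk) for every co-occurring ordered term pair.

    docs: list of {term: tf} dicts, one per document (at least two documents).
    Assumption: tf_ijk = min(tf_ij, tf_ik).
    """
    n = len(docs)
    df = Counter(t for d in docs for t in d)                        # df_j
    df_pair = Counter(p for d in docs for p in permutations(d, 2))  # df_jk
    sum_d = Counter()      # sum over documents of d_ij
    sum_dpair = Counter()  # sum over documents of d_ijk
    for d in docs:
        for j, tf in d.items():
            sum_d[j] += tf * math.log(n / df[j])
        for j, k in permutations(d, 2):
            sum_dpair[(j, k)] += min(d[j], d[k]) * math.log(n / df_pair[(j, k)])
    weights = {}
    for (j, k), s in sum_dpair.items():
        if sum_d[j] > 0:  # a term found in every document carries no weight
            wf_k = math.log(n / df[k]) / math.log(n)
            weights[(j, k)] = s / sum_d[j] * wf_k
    return weights
```

The pair sums are symmetric (d_ijk = d_ikj under this assumption), so the asymmetry comes entirely from the denominators and the weighting factor. A frequent term such as "fault" therefore points more weakly to a rare term than the rare term points to it.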
Semantic Analysis: Category Space
• Kohonen Self-organizing Maps (SOM) Algorithm
– Initialize input & output nodes, connection weights
– Present record (vector of N features) in order
– Compute distances to all nodes:
$$d_j = \sum_{i=0}^{N-1} \left( x_i(t) - w_{ij}(t) \right)^2$$
– Select winning node j* (minimum d_j) & update weights to node j* and neighbors
– Label regions in category space
– Apply the above steps recursively for large regions
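A minimal sketch of the SOM steps above. It assumes a rectangular output grid, random initial weights, a "bubble" neighborhood (all nodes within a shrinking grid radius of the winner), and linear decay of both learning rate and radius. Region labeling here is a majority vote of record labels. Those choices and the function names are hypothetical, and the recursive step for large regions is left out.

```python
import math
import random


def best_node(x, weights):
    """Winning node j*: the node with minimum d_j = sum_i (x_i - w_ij)^2."""
    return min(weights, key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))


def train_som(records, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length feature vectors."""
    rng = random.Random(seed)
    n_feat = len(records[0])
    # weights[(r, c)] is the weight vector of output node (r, c); random init in [0, 1)
    weights = {(r, c): [rng.random() for _ in range(n_feat)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    steps, t = epochs * len(records), 0
    for _ in range(epochs):
        for x in records:  # present records in order
            frac = t / steps
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(1.0, radius0 * (1 - frac))   # neighborhood shrinks to radius 1
            winner = best_node(x, weights)
            for node, w in weights.items():
                if math.dist(node, winner) <= radius:  # winner and its grid neighbors
                    for i in range(n_feat):
                        w[i] += lr * (x[i] - w[i])
            t += 1
    return weights


def label_map(weights, records, labels):
    """Label each node with the most common label among records mapped to it."""
    votes = {}
    for x, lab in zip(records, labels):
        votes.setdefault(best_node(x, weights), []).append(lab)
    return {node: max(set(labs), key=labs.count) for node, labs in votes.items()}
```

For the recursive step, one could collect the records whose winner falls in an oversized region and call train_som again on just those records.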
System Implementation
• Analysis using 32-node SGI Origin2000
– Textual concept spaces: 15 hrs, 32 nodes
– Feature extraction: 100 images, 24 hrs, 32 nodes
– Texture category space: 28 images, 6 hrs, 1 node
– AVHRR category space: California, 2 hrs, 1 node
• Web Interface
– Java front-end
– CGI-bin servers for all information retrieval
– Server size: 7 Gbytes
• Text (2.5 Gbytes), image (4 Gbytes), AVHRR (0.5 Gbytes)
User Study 1: Textual Concept Space
• Concept vs. Keyword Search
– 12 subjects with geoscience backgrounds
– Each subject performed 4 searches using both methods:
• concept search: use concepts to retrieve documents
• keyword search: use keywords to retrieve documents
– Relevance of retrieved documents judged by a subject expert
• Recall of concept search (53%) was significantly
better than that of keyword search (37%)
• Precision of concept search (38%) was no worse
than that of keyword search (36%)
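The recall and precision figures above follow the standard definitions. A minimal sketch, assuming the expert's judgments define each search's relevant set and that per-search scores are averaged with a simple mean (the slide does not say how they were aggregated); function names are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) of one search against expert-judged relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0        # share of relevant items found
    precision = hits / len(retrieved) if retrieved else 0.0   # share of results that are relevant
    return recall, precision


def mean_scores(searches):
    """Average recall and precision over (retrieved, relevant) pairs, one per search."""
    scores = [recall_precision(ret, rel) for ret, rel in searches]
    return (sum(r for r, _ in scores) / len(scores),
            sum(p for _, p in scores) / len(scores))
```

For example, a search returning d1 to d4 when the expert marked d2, d4, and d7 relevant scores recall 2/3 and precision 1/2.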
[Figure: Textual Category Space]
User Study 2: Textual Category Space
• Browse 2-dimensional hierarchical category space
– 12 subjects with geoscience backgrounds
– Qualitative study
• Positive feedback:
– Spatial factor & color
– Beneficial to non-experts
– Novelty of graphical representation
• Negative feedback:
– No search capability
– No systematic organization of terms
User Study 3: Image Analysis
• 3 experiments
– Similarity Analysis: human visual perception vs. Euclidean distance on Gabor features
– Segmentation: human vs. texture flow analysis
– Categorization: human vs. SOM algorithm
• 10 subjects in each experiment
• 10 images used, each with 192 tiles
• Results judged by a remote sensing expert
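A minimal sketch of the system side of the similarity experiment: rank tiles by Euclidean distance between their Gabor feature vectors (for example, the 30 means followed by the 30 standard deviations). The per-dimension z-score normalization is an added assumption, not part of the original method. It keeps large-scale filter responses from dominating the distance. Function names are hypothetical.

```python
import math


def normalize(features):
    """Scale each dimension to zero mean, unit variance across all tiles (assumption)."""
    ids = list(features)
    dims = len(features[ids[0]])
    out = {t: [] for t in ids}
    for i in range(dims):
        col = [features[t][i] for t in ids]
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0  # guard constant columns
        for t in ids:
            out[t].append((features[t][i] - mu) / sd)
    return out


def most_similar(query_id, features, k=5):
    """Return the k tile ids nearest to query_id by Euclidean distance."""
    q = features[query_id]
    others = [t for t in features if t != query_id]
    return sorted(others, key=lambda t: math.dist(q, features[t]))[:k]
```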
User Study 3 (cont’d)
                        Recall          Precision
                        subj    sys     subj    sys
Similarity Analysis     78%     66%     43%     48%
Segmentation            60%     53%     67%     53%
Categorization          40%     42%     35%     34%
(subj = human subjects, sys = system)
• Positive findings:
– System performs comparably to human subjects in retrieving images
– The set of Gabor features is a good representation of texture
• Room for improvement:
– Need other low-level image features (shape, contrast)
– Need a better similarity measure