Spatiotemporal Infrastructure for
Semantic Network in Digital
Archives
Eric Yen
Computing Centre, Academia Sinica
Dec, 2002
2002 APEC Workshop on e-Learning and Digital Libraries
Academia Sinica, Taipei, Taiwan. Dec. 16-20
Outline
Introduction
NDAP Approaches – Space-Time-Language Coordinates
Archiving and processing of millions of geospatial materials
in AS
Characteristics
How to delve into the knowledge level
Experiences & Lessons we learned
Extend to more general solution
Geolibrary
The Trends
Conclusions
Introduction to Digital Archive
Digital Archive is a collection of digital objects.
A digital object is defined as something (e.g., an image, an audio
recording, a text document, a movie, a map) that has been
digitally encoded and integrated with metadata to support
discovery, use, and storage of those objects.
Goals for Digital Archive (functional point of view)
Protection of the original
Duplication for safety
Search and Retrieval
Easy Access
Resource Sharing
Lower cost of maintenance and dissemination
Max. flexibility for integration of heterogeneous/homogeneous
information resources
Providing abundant resources for knowledge discovery and knowledge
construction
Knowledge Discovery and Construction
Knowledge construction means the active process of
manipulating data to arrive at abstract models of relationships
among phenomena in the world that facilitate our
understanding of those phenomena and, ultimately, of the
world. [1]
Knowledge discovery is a nontrivial process of identifying
valid, novel, useful, and understandable patterns in data. [2]
Persistent cataloging, classification, and segmentation of
digital objects are the groundwork for finding patterns, models, and
trends in large volumes of data.
References:
1. MacEachren, A. et al., Constructing knowledge from multivariate spatiotemporal data: integrating
geographic visualization with knowledge discovery in database methods.
2. Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P., 1996, From data mining to knowledge discovery:
An overview. In Advances in Knowledge Discovery and Data Mining, pp. 1-34.
Types of Elementary
Knowledge Organization Systems
Classification Systems
Ontologies
Taxonomies
Index Languages
Thesauri and other controlled lists of keywords
Glossary
Dictionaries
Clustering Approaches
Lexical Databases
Concept Maps/Spaces
Semantic Road Maps
…
Why Knowledge-based
Approach for Digital Library? (1)
Providing “Conceptual Infrastructure”
Mapping out the conceptual structure and providing a common language for a field
Providing classification/typology and concept definitions. Clarifying concepts by putting
them into context. Thus providing orientation and serving as a reference tool for individual
researchers and practitioners and thereby
Assisting with the exploration of the conceptual context of a research problem and in
structuring the problem, thereby providing the conceptual basis for the design of good
research, for the consistent definition of variables, and thus the cumulation of research
results.
Providing the conceptual basis for the exploration of the various aspects of a program in
program planning, in the identification of approaches and strategies, and in the
development of evaluation criteria
Assisting users in understanding context
Assisting information providers with conceptualizing a topic and with finding
the proper term
Discovery of high quality resources
Providing frameworks for information exchange and resource interoperability
Dagobert Soergel, Evaluation of Knowledge Organization Systems (KOS)
Why Knowledge-based
Approach for Digital Library? (2)
Information Storage & Retrieval
Information system(s) in which the vocabulary is to be used
Use of the vocabulary
Vocabulary control in indexing and searching (controlled vocabulary)
Vocabulary control only for searching. Assist with clarifying a search topic and
assembling all applicable concepts and terms, whether searching with a controlled
vocabulary or free text.
ISAR technique(s) (such as: printed index, computer search system). Support of
inclusive (hierarchically expanded) searching
Automated vs. manual indexing or query formulation. Approach to indexing to be
supported: Request-oriented vs. entity-oriented
Techniques for eliciting user needs (e.g., menu based on search tree; questions based
on facet structure)
Summary evaluation of the vocabulary's adequacy for the stated purpose, based on the
more detailed analysis as outlined below.
Translation
Language learning
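The inclusive (hierarchically expanded) searching listed above can be sketched as follows. This is a minimal illustration, assuming a toy thesaurus: the terms, the NARROWER table, the INDEX, and both function names are hypothetical and are not taken from Soergel or from any system described in these slides.

```python
# Toy thesaurus: each term maps to its narrower terms (hypothetical data).
NARROWER = {
    "map": ["historical map", "topographic map"],
    "historical map": ["cadastral map"],
}

def expand(term):
    """Return the term followed by all of its narrower terms, depth-first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(expand(child))
    return result

def inclusive_search(term, index):
    """Collect record ids indexed under the term or any narrower term."""
    hits = set()
    for t in expand(term):
        hits.update(index.get(t, []))
    return sorted(hits)

# Hypothetical inverted index: controlled term -> record ids.
INDEX = {
    "map": [1],
    "topographic map": [2],
    "cadastral map": [3, 4],
}

print(expand("map"))
print(inclusive_search("map", INDEX))
```

A search on the broad term therefore also retrieves records indexed only under its narrower terms, which is the point of hierarchical expansion.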
Dagobert Soergel, Evaluation of Knowledge Organization Systems (KOS)
Digital library requirements for
knowledge organization schemas
The need for knowledge organization in subject gateways
and discovery services, issues of application and use
Web-based directory structures as knowledge
organization systems
Knowledge organization as support for web-based
information retrieval, query expansion, cross-language
searching
Semantic portals
ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
Digital library requirements for
knowledge based data processing
Knowledge organization for filtering, information
extraction, summary
Knowledge organization support for multilingual systems,
natural language processing or machine translation
Structured result display, clustering
End-user interactions with knowledge organization
systems, evaluation and studies of use, knowledge bases
for supportive user interfaces, visualization
ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
Digital library requirements for
knowledge structuring and management
Suitable vocabulary structures, conceptual relationships
Comparison between established library classification
systems and home-grown browsing structures
Methodologies, tools and formats for the construction and
maintenance of vocabularies and for mapping between
terms, classes and systems
Frameworks for the analysis of assumptions and
viewpoints underlying the construction and application of
terminology systems
Methods for the combination and adaptation of different
vocabularies
ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
Digital library requirements for access to
knowledge structures
Data exchange and description formats for knowledge
organization systems, the potential and limitations of
XML and RDF schemas
Handling of subject information in metadata formats
Standards and repositories for machine-readable
description of networked knowledge organization
schemas (as collections/systems)
Interoperability, cross-browsing and cross-searching
between distributed services based on knowledge
organization systems
Distributed access to knowledge organization systems:
standard solutions and protocols for query and response,
taxonomy servers
ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
Discover Knowledge from
Digital Archive
Geospatial information means those geo-materials that are
georeferenced and have well-documented metadata
Ref. Components of a digital object in digital archive
Geospatial Content Based
Extracting knowledge by space-time-language
Knowledge about Space
Temporal characteristics are embedded and cannot be neglected
Acquisition
Direct Experience
Locomotion through the environment (crawling, walking, running, bicycling, driving,
flying, etc.)
Stationary viewing
Secondary Environmental Experience
Static medium: maps, diagrams, paintings, photos, etc.
Dynamic medium: animate static visual figures to show changes over time
Other ways to conceive of things that cannot be viewed
Characteristics
Multimodal: proprioceptive, kinesthetic, auditory, visual, etc.
Language is often used to convey spatial information
Multi-perspective and scales
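A thorough understanding of how humans acquire, integrate, and use spatial information
will promote more effective use of such information and help build application
mechanisms that better match real needs (e.g., aid for decision making)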
Spatial Representation in GIS
Data Model
Vector: explicit
Basic elements: point, line and polygon
Raster: implicit
Geographic space is organized into partitions (layers)
Space-dominant representations focus on the spatial arrangement
of entities based on the geometric and thematic properties of
these entities.
Space is a neutral container
Entities only exist when associated with a layer or theme
Applied primarily in traditional mapping
Layer-based raster and vector models
Each layer is associated with a period or point in time
Change- or update-based scenario
Analysis based on similarity or dissimilarity between aggregations
(layers) at different points of time
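The data models above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the Polygon class, the two raster layers, their cell codes, and the years in the variable names are all hypothetical. It shows explicit geometry in the vector model, implicit location in the raster model, and a change analysis that compares two time-stamped layers.

```python
from dataclasses import dataclass

# Vector model: geometry is stored explicitly as coordinates
# (points, lines, and polygons are all built from coordinate lists).
@dataclass
class Polygon:
    ring: list  # (x, y) vertices of a simple polygon, first vertex not repeated

    def area(self):
        """Area by the shoelace formula."""
        total = 0.0
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

# Raster model: location is implicit in the row/column position of a cell.
# Each layer is a snapshot for one point in time (hypothetical land-use codes).
layer_1900 = [[0, 0, 1],
              [0, 1, 1],
              [2, 2, 1]]
layer_1950 = [[0, 1, 1],
              [0, 1, 1],
              [2, 1, 1]]

def changed_cells(old, new):
    """Return (row, col) of every cell whose value differs between two snapshots."""
    return [(r, c)
            for r, row in enumerate(old)
            for c, value in enumerate(row)
            if new[r][c] != value]

print(Polygon([(0, 0), (4, 0), (4, 3), (0, 3)]).area())
print(changed_cells(layer_1900, layer_1950))
```

The comparison only sees differences between snapshots; what happened between the two dates is not recorded, which is the limitation of the layer-based approach.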
Why Think in Spatio-Temporal
Ways?
Because the earth keeps moving: it is
incomplete to describe an event or object in the
spatial domain only.
Learn from the past, and plan for (predict)
the future.
Characteristics of Space & Time
Importance
To organize space over time
Discover Knowledge from
Geospatial Information
Geospatial information means those geo-materials that are
georeferenced and have well-documented metadata
Ref. Components of a digital object in digital archive
Geospatial Content Based
Feature Identification
Feature comparison: enhance the likelihood of relationships among
features
Feature interpretation: merge the identified features and their
relationships with real world entity, by domain knowledge
Linking to other resources that are related to this feature, place,
and time by parsing the information collected from metadata or
lexical analysis
Demands
Link spatiotemporal data analysis techniques to GIS
Feature interpretation tools must provide connections between abstract representations
of data, metadata that describe those data, an analyst's knowledge, and knowledge
sources external to the data set being explored (e.g., through a digital library)
Discover Knowledge from Geospatial Information
Feature Identification
Def: Finding instances of identifiable features in spatiotemporal data
Emphasis is on examining the distribution of data in all of its dimensions in an effort to
notice any distinct object, regularity, anomaly, hot spot, etc.
Example:
Distribution of Tombs in
Han Dynasty
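One simple way to notice hot spots in spatiotemporal data is to bin events into space-time cells and count them. The sketch below assumes this binning approach, which is a stand-in and not the method used for the tomb distribution example; the coordinates, years, and the hot_spots function are hypothetical.

```python
from collections import Counter

def hot_spots(events, cell_size, period, min_count):
    """Bin (x, y, year) events into space-time cells; keep cells with >= min_count events."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size), int(year // period))
        for x, y, year in events
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Hypothetical site records: (x, y, year) in arbitrary map units, not real tomb data.
sites = [
    (1.2, 3.4, 10),
    (1.8, 3.9, 50),
    (1.1, 3.1, 80),
    (1.5, 3.5, 150),
    (7.5, 2.2, 20),
]
print(hot_spots(sites, cell_size=2.0, period=100, min_count=3))
```

Cell size, period length, and the count threshold are analyst choices; different settings can reveal or hide a cluster, so results are best inspected at several scales.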
Integrated Support for
Research
WebGIS-based System Architecture
Challenges of
Geospatial Information Processing
High threshold for general users
Hard to find required geospatial content/service
New retrieval technology for geospatial
information
Persistent metadata and archive
Mechanism for effective management of huge
volume of data set
Efficient ways for digitization/vectorization of
geospatial materials
Integration with other information resources
Discover Knowledge by
Space-Time-Language Coordinates
Constructing the linkage among diversified archives thru
language (vocabulary)
Lingual coordinate has both spatial and temporal extents
Lingual-Temporal Plane: evolution of language thru time
Lingual-Spatial Plane: spatial distribution of dialects
Multi-lingual support for digital archive
Establishment of domain-specific controlled vocabulary sets,
and serve as basis of ontology
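Linking archives through vocabulary can be illustrated with a small controlled-vocabulary lookup. This is a sketch under stated assumptions: the PREFERRED table, the archive names, and the record ids are hypothetical illustration data, and the functions are not part of any NDAP system.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: variant form (lowercase) -> preferred term.
# Variants may come from different periods or romanization schemes.
PREFERRED = {
    "taipei": "Taipei",
    "taipeh": "Taipei",
    "taihoku": "Taipei",
    "tainan": "Tainan",
}

def normalize(term):
    """Map a variant to its preferred term; return None if the term is not controlled."""
    return PREFERRED.get(term.strip().lower())

def link_archives(records):
    """Group (archive, record_id, term) triples by preferred term."""
    links = defaultdict(list)
    for archive, record_id, term in records:
        preferred = normalize(term)
        if preferred is not None:
            links[preferred].append((archive, record_id))
    return dict(links)

records = [
    ("maps", "M-001", "Taihoku"),
    ("gazetteers", "G-042", "Taipeh"),
    ("photos", "P-007", "Tainan"),
    ("photos", "P-008", "Unknown place"),
]
print(link_archives(records))
```

Records from different archives that use different written forms end up under one preferred term, which is what lets the vocabulary act as a shared coordinate.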
Discover Knowledge by
Space-Time-Language Coordinates
[Figure: Time, Space, and Language as three coordinate axes]
Space, Time and Language Coordinates for Digital Archives
[Figure: Digital Archives placed among three axes, Time, Space, and Language. Labels: Historical GIS (Time and Space); Language in Time (language changes); Language in Space (language variations); Language in text, in speech...]
Lingual Coordinate in NDAP
A lexis/vocabulary item in context is analogous to the basic unit of a concept in knowledge
Lexis is the basic unit for any kind of language processing, such as recognition, parsing,
word formation, semantics, conversation and analysis
Through lexical analysis, a collection of all the lexical types (parts of speech), lexical patterns (grammar),
and instances could lay the foundation for the lingual coordinate.
Once enough description (context, including metadata) is collected for a specific domain (which could be a set of
digital objects), an ontology (collection of concepts for the domain) of that field can be constructed.
How do we know if that is enough? The mechanism needs a self-learning capability
Atomic attributes of a place name
Name
Glyph & stroke: original writing, all the historical and contemporary writing, and Romanization (pinyin)
Pronunciation: indigenous pronunciation and its later evolution
Meaning (if we could restore the original fonts and sounds)
Footprint
Could be ambiguous: M:N
Time: (start, end), could be vague for historical names
Type: geographic type, which also gives the administrative level if the name represents an administrative
area
Atomic attributes of a datum
People, event, time, place, object
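The atomic attributes of a place name listed above map naturally onto a small record type. The following is a minimal sketch, assuming a bounding-box footprint and year-based time bounds; the PlaceName class, its field names, and the example entry are hypothetical and do not reflect the actual NDAP schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlaceName:
    name: str                     # written form of the name
    romanization: str = ""        # e.g. pinyin
    footprint: tuple = ()         # (min_x, min_y, max_x, max_y); may be approximate
    start: Optional[int] = None   # first year of use; None when unknown or vague
    end: Optional[int] = None     # last year of use; None when unknown or still in use
    place_type: str = ""          # geographic type or administrative level

    def in_use(self, year):
        """True if the name could have been in use in the given year.

        Unknown bounds are treated as open, so vague historical names still match.
        """
        after_start = self.start is None or self.start <= year
        before_end = self.end is None or year <= self.end
        return after_start and before_end

# Hypothetical entry for illustration only.
example = PlaceName(name="Example County", start=1723, end=1895, place_type="county")
print(example.in_use(1800), example.in_use(1900))
```

Treating missing bounds as open is one design choice for vague dates; a stricter system might store an explicit uncertainty range instead.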
Constructing Space-Time-Language Coordinates for NDAP
Geographic searching is a powerful and important tool
More than 80% of information resources pertain to specific geographic areas and are either
explicitly or implicitly geo-referenced.
To benefit from geographic search, we first have to geo-reference the information content.
The cost of creating geographic footprints for each record is very high (the Alexandria Digital Library Project
spent $4m over four years). Automatic extraction of geo-referenced information
is also possible, but it needs sophisticated tools that go further than geographic name
extraction.
Moving from information management toward knowledge management
(Demands) New ways of information search & retrieval
Traditional full-text search
Keyword-based or query by example search
Query by information content (image, audio, video, and multimedia contents)
Incorporation of geographic & temporal search
Versatile ways for presenting information & knowledge
2D, 3D, or 4D
Multimedia, virtual reality
Map-on-demand, by parsing geographic names from context or directly from coordinates
Separation of content representation & presentation
The core is the metadata-based content analysis
CA (Information Content) → Metadata schemes for management of contents
Identify the best way of information representation and become persistent archive
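The map-on-demand idea, parsing geographic names from context and turning them into a map extent, can be sketched as below. This is only the simple name-extraction baseline that the slide says real tools must go beyond. The GAZETTEER table, its rough coordinates, and both function names are assumptions made for illustration.

```python
import re

# Hypothetical gazetteer: place name -> (longitude, latitude), rough values only.
GAZETTEER = {
    "Taipei": (121.5, 25.0),
    "Tainan": (120.2, 23.0),
}

def extract_places(text):
    """Return gazetteer names found in the text, in order of first appearance.

    Word-boundary matching only suits space-delimited text; unsegmented
    scripts would need a different matching step.
    """
    pattern = r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b"
    seen = []
    for match in re.finditer(pattern, text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

def bounding_box(names):
    """Smallest (min_lon, min_lat, max_lon, max_lat) covering the named places, or None."""
    if not names:
        return None
    lons = [GAZETTEER[n][0] for n in names]
    lats = [GAZETTEER[n][1] for n in names]
    return (min(lons), min(lats), max(lons), max(lats))

text = "The survey party left Tainan and reached Taipei; Tainan was revisited later."
places = extract_places(text)
print(places)
print(bounding_box(places))
```

The resulting box could be handed to a map service as the extent to draw. Ambiguous names and names that changed over time are not handled here.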
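Integrated Applications of the Chinese Historical and Cultural Map
Qing-dynasty local gazetteer search
Full-text search of Chinese classical texts
Union catalog of books search
Biographical database search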
Roles of Visualization in
Knowledge Discovery
Role
Useful in finding holes or errors in data sets
Useful for noticing abstract features and patterns
Predigest complex relations of data sets into visual form
Facilitate access to multiple perspectives on information, thru
interactivity
Facilitate decisions on appropriate model representation during
analysis stage.
Process tracking: uncover key aspects of a process
Parameter control to get corresponding outcome on the fly
Functionality
Geolibrary
Objective: Lower the barriers for applying GIScience
technologies
Approaches
Collecting and providing basic georeferenced spatial data/knowledge
persistently
Building up application environment and tools for utilization of
spatiotemporal knowledge and technologies
Development of spatiotemporal-based technologies for multi-disciplinary
contents integration, aggregation, knowledge discovery in map-metaphor
Focus & Approach
Construction of the System Infrastructure for Spatial and Temporal
Information Technology
Development of Core Technology
Establishment of Effective Service Model for Research Support
Clearinghouse
An instance of implementation of interoperability
Functionality
Locating the required resources/services
Maintaining a persistent catalog of resources/services for
sharing
Exchange of information content
Format transformation
Clearinghouse (catalog)
Metadata
Framework GEOdata
Standards
Partnerships
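The clearinghouse functions above, keeping a catalog of resources and locating the ones a user needs, can be sketched as a small metadata registry. This is a minimal illustration, assuming bounding-box and year-range metadata; the record fields, the titles, and the example.org URLs are hypothetical and do not follow any particular metadata standard.

```python
# Hypothetical clearinghouse: a catalog of metadata records that describe
# resources held elsewhere. Only the descriptions are stored here.
CATALOG = []

def register(title, bbox, years, url):
    """Add a record; bbox is (min_x, min_y, max_x, max_y), years is (start, end)."""
    CATALOG.append({"title": title, "bbox": bbox, "years": years, "url": url})

def overlaps(a, b):
    """True if two bounding boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def locate(bbox, year):
    """Return titles of resources that cover the query box and year."""
    return [r["title"] for r in CATALOG
            if overlaps(r["bbox"], bbox) and r["years"][0] <= year <= r["years"][1]]

register("Map sheets", (120, 22, 122, 25.5), (1895, 1945), "https://example.org/maps")
register("Gazetteer scans", (100, 20, 110, 30), (1700, 1900), "https://example.org/gazetteers")

print(locate((121, 24, 122, 26), 1900))
```

Because the catalog holds metadata only, the data can stay with its provider, which is the sharing model a clearinghouse is meant to support.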
Effective Management System for Huge Volume
of Data
Remote sensing data: 2 TB/day, and will accumulate to 5 petabytes
in 2005.
According to the statistics of EU Space Center
Raw data from satellite : 100GB/day, 500GB/day (after Feb. 2002)
800 TB data had been archived
Big challenge for IT: cataloging, searching, retrieval,
management, identification, knowledge discovery, and integration
Trade-off between decentralization and consolidation on cost
Convergence toward multiple centers of information resources on the Internet
Think about how to facilitate the collaboration among those centers:
community and virtual organization
Demand for a complete architecture and services → Data Grid
What’s the Solution?
Support sharing and coordinated use of diverse resources in
dynamic “virtual organizations”: Grid!
Good technical solutions for key problems, such as
Security enhancement like authentication and authorization
Resource discovery and monitoring
Reliable remote service invocation
High-performance remote data access
Answer: Grid!
Good quality reference implementation, multi-lingual support,
interfaces to many systems, large user base, industrial support,
etc.: Grid!
Persistent Web Services: Grid!
Measuring Success
High degree of component autonomy
Low cost of infrastructure
Ease of contributing components
Ease of using components
Breadth of task complexity supported by the approach
Scalability in the number of components
Conclusions and Future Work
Building the right infrastructure will be crucial
The intersection of the spatiotemporal coordinates and the lingual
coordinate constitutes a good framework both for knowledge
extraction and interoperability
Consensus gathering and technology development are still the
major challenges for interoperability
Open System, Open Standard, and Open Source