Representing Culture Historical Temporal Information In Relational

UNIVERSITY OF JYVÄSKYLÄ
DEPARTMENT OF MATHEMATICAL INFORMATION TECHNOLOGY
Representing Temporal Information
in Cultural Historical Databases
19.5.2009
TIES444 Software Engineering Seminar
Miika Nurminen ([email protected])
University of Jyväskylä
Software Engineering Seminar
Outline
• Motivation
• Representing temporal information in Duo & Arte
• Problems with current representation
• Towards a generic model for representing uncertain
temporal information in a relational database
• Alternative approaches
• Conclusion
Motivation
• Cultural historical information provides a rich and
challenging domain for data management, from both a
temporal and a general perspective
– A multitude of complex metadata can be attached to a given object
– A combination of “well-formed” (relatively static, precise, well-known) and ambiguous (uncertain, imprecise) information
– Standards for representing the information exist (CIDOC-CRM,
MuseoSuomi, etc), but in practice the field is scattered – the
databases used in museums are not interoperable in general
– In small museums, paper may still be used for cataloging (and
even in museums that have a computer system – as a backup)
• From a time ontology perspective, what is needed are flexible,
expressive, and easy-to-use structures that allow incomplete and
imprecise information but still support querying
Collection management systems in
JYU Museum
• JYU Museum uses two database-based client/server applications for
collection management and museology student projects:
– DUO for photographs, recordings, books and other objects (in use since
2003, includes ~28,000 items)
– ARTE for works of art (in use since 2006, includes ~1,000 items)
• Other applications related to collection management (e.g. image
processing, web publishing) are also used; next-generation systems
(e.g. IDA) are in development
• DUO & ARTE use separate databases, but share most of the code in
reusable components (DB management, GUI components, search
engine).
• The databases have parts with identical or nearly identical structures
(e.g. persons, exhibitions, temporal information)
http://sovellusprojektit.it.jyu.fi/tare/dokumentit/kayttoohje/kayttoohje.html
http://users.jyu.fi/~minurmin/duo/
Representation of temporal information in
Duo & Arte
• The standard database DATE type is used for some “certain” dates (e.g.
logging and modification information, exhibition dates, check-out dates)
• A custom tblAika table is used for most of the information related to
collection metadata.
• Depending on the metadata field, the user may see only years and interval
marks. For more specific fields, days and months can be edited as well.
• Any field can be left empty
• The interval mark introduces a number of conventions that cannot easily
be reflected in searches (e.g. “-”: normal, “n”: about, “-luku”: decade, etc.).
If the mark is left empty, only the beginning date should count.
• In practice, the potential semantics of the interval mark are not accounted
for in queries
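The interval mark conventions listed above could be made searchable by mapping each stored (start year, mark, end year) triple to an explicit year range. The following is a minimal sketch and not the DUO/ARTE implementation: the function name `to_year_range`, the ±5-year tolerance for “n” (about), and the reading of “-luku” as the ten years beginning at the start year are all assumptions made for illustration.

```python
from typing import Optional, Tuple

ABOUT_TOLERANCE = 5  # assumed width for "n" (about); not from the source


def to_year_range(start: Optional[int], mark: str,
                  end: Optional[int]) -> Tuple[Optional[int], Optional[int]]:
    """Map (start year, interval mark, end year) to (earliest, latest).

    None stands for an empty field, i.e. an open bound.
    """
    mark = mark.strip()
    if mark == "":
        # Empty mark: only the beginning date counts, so treat it as a point.
        return (start, start)
    if mark == "-":
        # Normal interval from start to end.
        return (start, end)
    if mark == "n":
        # "About": widen the start year by the assumed tolerance.
        if start is None:
            return (None, None)
        return (start - ABOUT_TOLERANCE, start + ABOUT_TOLERANCE)
    if mark == "-luku":
        # Decade: assumed to cover the ten years beginning at start.
        if start is None:
            return (None, None)
        return (start, start + 9)
    # Free-form marks would need cleaning before they can be interpreted.
    raise ValueError(f"unrecognized interval mark: {mark!r}")


print(to_year_range(1960, "-luku", None))  # (1960, 1969)
print(to_year_range(1937, "", 1998))       # (1937, 1937)
```

Once every row has an explicit range, a search only has to compare two bounds instead of interpreting the mark at query time.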
DUO example form with time interval
Querying temporal information in Duo/Arte
• For precise information (dates expressed in DATE datatype), exact
match, or a given upper/lower bound can be used
• For imprecise information, three search options are provided
– Unbounded interval: matches if either end (or both ends) of the interval is within
the query; 0-years are included in the interval
– End points: like unbounded interval, but a nonzero value must match the
query (i.e. 0-years are not included)
– Bounded interval: matches if both ends of the interval are within the query
• In the result list, the start date, interval mark, and end date are compressed into
one field. For technical reasons, unspecified (interpreted as unbounded)
years are shown as zeroes.
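One possible reading of the three search options for imprecise data is sketched below, with 0 standing for an unspecified year. The function names are hypothetical, and the slides do not say how a 0-year behaves in the bounded case; the sketch assumes it does not match.

```python
def _inside(year, q_from, q_to):
    """True if a specified (nonzero) year lies within the query range."""
    return year != 0 and q_from <= year <= q_to


def match_unbounded(start, end, q_from, q_to):
    # Either end may match; a 0-year is read as an open bound and also counts.
    return any(y == 0 or _inside(y, q_from, q_to) for y in (start, end))


def match_end_points(start, end, q_from, q_to):
    # Like the unbounded case, but only a nonzero year can produce a match.
    return any(_inside(y, q_from, q_to) for y in (start, end))


def match_bounded(start, end, q_from, q_to):
    # Both ends must be specified and lie within the query (assumption for 0).
    return all(_inside(y, q_from, q_to) for y in (start, end))


# (start year, end year) pairs checked against the query range 1940..1960
for row in [(1945, 1955), (1930, 1950), (0, 1998)]:
    print(row,
          match_unbounded(*row, 1940, 1960),
          match_end_points(*row, 1940, 1960),
          match_bounded(*row, 1940, 1960))
# (1945, 1955) True True True
# (1930, 1950) True True False
# (0, 1998) True False False
```

The last row shows the practical difference between the first two options: an item with no start year is returned by the unbounded search even though its only known year lies outside the query.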
Problems in current approach
• Despite a few clean-up attempts, semantics in interval marks (and the words used) are not easily
controlled
• The same query cannot be used for both precise
(DATE) and imprecise (tblAika) data fields
• No standard convention is enforced for presenting
“points” in time in the db table-based approach. By
convention, a start date with no interval mark can be
interpreted as a point. However, this has not been used consistently.
• Definitions and user interface for different types of temporal queries are not intuitive to end users
• Although the time representation in the db is
general-purpose, it does not support a lifecycle-based
approach for object documentation (i.e. time
information is “hard-wired” to specific
metadata fields of objects, but cannot be used in an
extensible way with user-defined roles as in CIDOC)
[Sidebar: interval mark values in use, in extraction order: - -- - - - ?? - ??- 1960 - l - l n.- lahj - n / /huhti /kevät ? -? -> 0 1937 -1998 2001 2002 7 alk -alku -alkup elokuu ennen helmikuu huhtikuu -jälkeen joulukuu kesä kevät kevätlk -l -l ? -l alku -l ap -l n ? -l n. -l vaihde -l.alk l.lop -l? -loppu luku -luku- -luku? lukujen v -luvulta luvulta? -luvun lop luvut maaliskuu marraskuu n n- n asti n? noin -noin noin-? syksy syyslukuka tammikuu -vaihde]
Requirements for a new temporal model
• The goal is to generalize the temporal model such that both precise and
imprecise information, including both points and intervals, are accounted for in
the same structure
• Could utilize ideas from time ontologies (e.g. query operators), but the
representation should physically reside in a relational database.
– Ease of integration into existing applications: minimize 3rd-party component usage to
keep the application as self-contained and easy to install as possible
– Performance: temporal information is used in almost all end-user specified queries
and reports
• Object lifecycle could be utilized in time information using a new, extensible
role table that includes information about the metadata field used
– A similar approach is already used with manufacturer roles (i.e. a person
manufacturing an “item” in the DUO database can be a photographer, artist, director,
writer, etc.)
– Eases integration with CIDOC-CRM metadata
• User interface issues (e.g. visual component for temporal queries?)
tblKappale
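The requirements above (one structure for points and intervals, precise and imprecise, plus an extensible role table) can be illustrated with a small relational sketch. This is not the schema proposed in the talk: the table names `item`, `time_role`, and `time_span`, the `earliest`/`latest` columns, and the sample rows are hypothetical. It uses only Python's built-in `sqlite3` module, in line with the goal of avoiding 3rd-party components.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Extensible roles (e.g. created, acquired) instead of hard-wired fields.
CREATE TABLE time_role (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE time_span (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(id),
    role_id INTEGER NOT NULL REFERENCES time_role(id),
    earliest TEXT,  -- ISO date; NULL means unknown lower bound
    latest   TEXT,  -- ISO date; NULL means unknown upper bound
    CHECK (earliest IS NULL OR latest IS NULL OR earliest <= latest)
);
INSERT INTO item VALUES (1, 'Photograph'), (2, 'Poster'), (3, 'Ledger');
INSERT INTO time_role VALUES (1, 'created'), (2, 'acquired');
-- A precise point is an interval whose bounds coincide.
INSERT INTO time_span VALUES (1, 1, 1, '1937-05-19', '1937-05-19');
-- An imprecise decade is an interval spanning ten years.
INSERT INTO time_span VALUES (2, 2, 1, '1960-01-01', '1969-12-31');
-- An open lower bound: acquired at some time up to the end of 1998.
INSERT INTO time_span VALUES (3, 3, 2, NULL, '1998-12-31');
""")


def overlapping(conn, role, q_from, q_to):
    """Titles of items whose span for the given role may overlap the query."""
    rows = conn.execute(
        """SELECT i.title
           FROM time_span s
           JOIN item i ON i.id = s.item_id
           JOIN time_role r ON r.id = s.role_id
           WHERE r.name = ?
             AND (s.earliest IS NULL OR s.earliest <= ?)
             AND (s.latest IS NULL OR s.latest >= ?)
           ORDER BY i.title""",
        (role, q_to, q_from),
    ).fetchall()
    return [title for (title,) in rows]


print(overlapping(conn, "created", "1955-01-01", "1965-12-31"))   # ['Poster']
print(overlapping(conn, "acquired", "1990-01-01", "1995-01-01"))  # ['Ledger']
```

Because precise dates are stored as degenerate intervals, the same overlap query serves both precise and imprecise fields, and new lifecycle roles only require a new row in the role table.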
Alternative approaches
• Integration with domain specific ontologies (CIDOC, MuseoSuomi, etc)
– Each object in DUO database should have at least partial representation in
another datastore
– Semantic annotation of collection items is time-consuming, and even if only
temporal information and ID codes were transferred, the system would become
substantially more complex
• Utilization of general-purpose time-based ontologies (OWL Time, etc)
– Requires integrating new software components into the application (e.g. RDF
database frontend (Jena), inference engine (Pellet)), plus transforming and
updating existing data
– Highly sophisticated approach and ideal for research (especially in
semantic web track), but even more complex than CIDOC approach
– Most of the information in time ontology might not be needed in this
particular application
– RDF databases and query languages are not yet as mature and stable
technology as relational databases
• Utilizing a different computational model for time representation
– Fuzzy logic or probabilistic models might be effective for representing
uncertain temporal information, since they work well with uncertain data
in general
– Might end up as a relatively simple model in theory, but customized
processing is needed to specify and represent the time information
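To illustrate the fuzzy-logic alternative mentioned above, an imprecise date such as “about 1960” could be described with a trapezoidal membership function. This is a generic textbook construction, not something specified in the talk; the function name `trapezoid` and the breakpoints 1955, 1958, 1962, 1965 are assumptions chosen for the example.

```python
def trapezoid(year, a, b, c, d):
    """Degree (0..1) to which `year` belongs to a fuzzy period.

    Membership is 1 on [b, c], 0 at or outside a and d, and changes
    linearly on the two slopes. Requires a <= b <= c <= d.
    """
    if b <= year <= c:
        return 1.0
    if year <= a or year >= d:
        return 0.0
    if year < b:
        return (year - a) / (b - a)  # rising slope between a and b
    return (d - year) / (d - c)      # falling slope between c and d


# "About 1960", with assumed breakpoints.
ABOUT_1960 = (1955, 1958, 1962, 1965)

print(trapezoid(1960, *ABOUT_1960))  # 1.0
print(trapezoid(1950, *ABOUT_1960))  # 0.0
```

A query could then rank items by membership degree instead of returning a yes/no match, which is the kind of customized processing the last bullet refers to.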
Conclusion
• Cultural historical information provides a rich and
challenging domain for data management, from both a
temporal and a general perspective
• Collection management systems in JYU Museum were
introduced and problems with representing and retrieving
temporal information were identified
• A new temporal model accounting for different representations,
uncertainty, and object lifecycle was roughly sketched
• The new model should be applied directly in a relational
database. Alternative, non-db approaches were evaluated
but were considered too complex or immature to be used in
a production environment
• The model must be specified in more detail, along with a
potential user interface, in cooperation with end users
• Transformation of the production database should be
carefully planned