Using Pivots to Explore
Heterogeneous Collections
A Case Study in Musicology
Daniel Alexander Smith
8 December 2009
musicSpace
http://mspace.fm/projects/musicspace
• IAM Group, School of Electronics and Computer Science
• Music, School of Humanities
Outline
• How musicologists use data
• Limitations of existing approaches
• Our data extraction and integration methodology
• Interface walkthrough
musicSpace Tasks
• Triage data partners’ sources
• Extract information
• Map data sources to schemas/ontologies
• Produce interface over aggregated data
• Customise interface based on feedback
Data in Musicology
Musicologists consult many data sources
. . . but what if they could use just one?
Intractable research questions
• Which scribes have created manuscripts of a composer’s
works, and which other composers’ works have they
inscribed?
• Which poets have had their poems set to music by
Schubert, which of these musical settings were only
published posthumously, and where can I find recordings of
them?
• Which electroacoustic works were published within five
years of their premiere?
Why they are intractable (1)
• Need to consult several sources
• Metadata from one source cannot be used to guide searches
of another source
• Solution: Integrate sources
Why they are intractable (2)
• They are multi-part queries, and need to be broken down
with results collated manually
• Requires pen and paper!
• Solution: Optimally interactive UI
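Once the sources are integrated, a multi-part question such as the Schubert example can be asked as one chained query rather than several manual searches collated on paper. The following is a minimal Python sketch, assuming a hypothetical integrated record layout (a `Setting` with `poet`, `composer`, `posthumous` and `recordings` fields) and placeholder data; it is not musicSpace's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Setting:
    """One musical setting of a poem after integration (hypothetical layout)."""
    title: str
    poet: str
    composer: str
    posthumous: bool  # published only after the composer's death
    recordings: list[str] = field(default_factory=list)

# Placeholder records standing in for data merged from several sources.
SETTINGS = [
    Setting("Song A", "Poet X", "Schubert", posthumous=True, recordings=["Recording 1"]),
    Setting("Song B", "Poet X", "Schubert", posthumous=False),
    Setting("Song C", "Poet Y", "Schubert", posthumous=True),
    Setting("Song D", "Poet Z", "Other composer", posthumous=True, recordings=["Recording 2"]),
]

def schubert_query(settings):
    """Answer all three parts of the question in one pass over integrated data."""
    by_schubert = [s for s in settings if s.composer == "Schubert"]
    poets = sorted({s.poet for s in by_schubert})            # part 1: which poets
    posthumous = [s for s in by_schubert if s.posthumous]     # part 2: posthumous settings
    recordings = {s.title: s.recordings for s in posthumous}  # part 3: their recordings
    return poets, recordings

poets, recordings = schubert_query(SETTINGS)
print(poets)       # ['Poet X', 'Poet Y']
print(recordings)  # {'Song A': ['Recording 1'], 'Song C': []}
```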
Why they are intractable (3)
• Insufficient granularity of metadata and/or search options
• Solution: Increase granularity
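Granularity matters because a query can only filter on what the metadata separates out. A small sketch, using hypothetical field names (`premiere`, `published`) and placeholder works: once the two dates are separate fields, the electroacoustic "published within five years of premiere" question becomes a simple comparison.

```python
# With only a free-text note such as "premiered 1970; published 1973", this
# filter would need ad hoc text parsing. Separate fields make it a comparison.
WORKS = [  # hypothetical placeholder records
    {"title": "Work A", "genre": "electroacoustic", "premiere": 1970, "published": 1973},
    {"title": "Work B", "genre": "electroacoustic", "premiere": 1965, "published": 1978},
    {"title": "Work C", "genre": "orchestral", "premiere": 1980, "published": 1981},
    {"title": "Work D", "genre": "electroacoustic", "premiere": 1990, "published": None},
]

def published_within(records, years=5, genre="electroacoustic"):
    """Titles of `genre` works published no more than `years` after their premiere."""
    return [
        r["title"]
        for r in records
        if r["genre"] == genre
        and r["published"] is not None  # unpublished works cannot match
        and 0 <= r["published"] - r["premiere"] <= years
    ]

print(published_within(WORKS))  # ['Work A']
```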
Metadata Extraction
Previous work
• Comb-e-chem modelled Chemistry data
• We use a similar approach
• Translated this work to the arts
• Musicology modelled using Semantic Web technologies
Musicology Data Sources
• Disparate data
• How to pull them together and view on demand
musicSpace Data Partners
Public
• British Library Music Collections
• British Library Sound Archive
• Cecilia
• Copac
• RISM (UK and Ireland)
Commercial
• Grove Music Online
• Naxos Music Library
• RILM
Future?
• Alexander Street Press Music Online
• CHARM
• ‘Personal’ datasets
Data and Info Management problems
• Sources allow searching, but not over everything
• Data exports (typically MARC) reveal extra fields, e.g. characters in an opera, or document types hidden amongst the metadata
• Sometimes viewable on original site, but not searchable
• Offering the extracted metadata is already a benefit, even with a single source
Grove Extraction Example
• More complicated, as Grove is a full-text encyclopaedia
• Some digitisation via Grove Music Online
• Weak semantic metadata extraction
• Thus we performed some data entry
Grove Works Lists Source Data
Works List Metadata Tool
Data Integration
Integration
• Domain Expert + Technologist partnership
• This will be the case for some time to come
• Use technology to automate tasks as far as possible, making the domain expert’s job less onerous
Metadata mapping
• Domain experts devise a single schema
• Provide mappings of fields in a particular data source to
that unified schema
• Enables an interface across all sources
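The mapping step can be pictured as one field-to-field table per source, all targeting the same unified schema. A minimal sketch: the unified field names, the second source and the sample records are hypothetical; the first source's keys follow MARC tag-plus-subfield conventions (100 $a personal name, 245 $a title, 260 $c date, 500 $a general note) purely for illustration, not as the project's actual mappings.

```python
# Unified schema agreed by the domain experts (hypothetical field names).
UNIFIED_FIELDS = {"composer", "title", "date"}

# One mapping per source: source field -> unified field (illustrative only).
MAPPINGS = {
    "marc_catalogue": {"100a": "composer", "245a": "title", "260c": "date"},
    "encyclopaedia": {"author": "composer", "work": "title", "year": "date"},
}

# Every mapping may only target fields that exist in the unified schema.
assert all(set(m.values()) <= UNIFIED_FIELDS for m in MAPPINGS.values())

def to_unified(source, record):
    """Rename a record's fields into the unified schema; unmapped fields are dropped."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

records = [
    to_unified("marc_catalogue",
               {"100a": "Composer A", "245a": "Work A", "260c": "1900", "500a": "a note"}),
    to_unified("encyclopaedia", {"author": "Composer A", "work": "Work B", "year": "1901"}),
]

# All records now share the same keys, so one search runs across both sources.
print([r["title"] for r in records if r["composer"] == "Composer A"])  # ['Work A', 'Work B']
```

Note that the MARC note field (`500a`) is silently dropped because no mapping covers it, which is exactly the kind of information loss the next slide describes.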
Downside
• A new source comes online with information not covered by the unified schema
• Have to make changes to all mappings to ensure accurate
coverage
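The cost shows up as soon as a new source arrives: any of its fields without a unified-schema counterpart is either lost or forces a schema change, after which every existing mapping must be revisited. A small sketch of that check, with a hypothetical new source whose `recording_label` and `performer` fields the schema never anticipated.

```python
UNIFIED_FIELDS = {"composer", "title", "date"}

# Proposed mapping for a newly added source (hypothetical field names);
# None marks a source field with no unified counterpart.
NEW_SOURCE_MAPPING = {
    "composer_name": "composer",
    "track": "title",
    "released": "date",
    "recording_label": None,  # information the unified schema does not cover
    "performer": None,
}

def uncovered_fields(mapping, unified):
    """Source fields that cannot be placed in the unified schema as it stands."""
    return sorted(f for f, target in mapping.items() if target not in unified)

gaps = uncovered_fields(NEW_SOURCE_MAPPING, UNIFIED_FIELDS)
print(gaps)  # ['performer', 'recording_label']
```

Adding these fields to `UNIFIED_FIELDS` would fix the new source, but each older source's mapping would then need checking to see whether it should populate them too.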
New Approach: Pivoting
• Mark up each source on its own, rather than pushing all sources into a single schema
• Use a pivot instead to situate metadata for integration
• Essentially means that the interface does the heavy lifting of
integration
• Reduced effort for domain experts
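One way to read the pivot idea: records keep their native fields, and integration only requires knowing which field in each source points at a shared entity (here, a composer identifier). The interface then gathers everything situated at that entity. A minimal sketch with hypothetical sources, field names and identifiers; it illustrates the idea rather than musicSpace's implementation.

```python
from collections import defaultdict

# Records stay in each source's own vocabulary (hypothetical fields and data).
SOURCES = {
    "manuscripts": [{"composer_id": "c1", "shelfmark": "MS 1", "scribe": "Scribe P"}],
    "recordings": [
        {"composer": "c1", "label": "Label L", "track": "Work A"},
        {"composer": "c2", "label": "Label M", "track": "Work B"},
    ],
}

# The only integration effort: which field in each source holds the pivot entity.
PIVOT_FIELD = {"manuscripts": "composer_id", "recordings": "composer"}

def situate(sources, pivot_field):
    """Group records from every source under the pivot entity they refer to."""
    by_entity = defaultdict(list)
    for name, records in sources.items():
        key = pivot_field[name]
        for record in records:
            # Native fields are kept as-is; each record is only tagged with its source.
            by_entity[record[key]].append((name, record))
    return by_entity

view = situate(SOURCES, PIVOT_FIELD)
for source, record in view["c1"]:
    print(source, record)
```

A new source then needs one entry in `PIVOT_FIELD` rather than a full field-by-field mapping, which is the reduction in domain-expert effort the slide refers to.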
Interface Video
Interface Video
• Find a composer
• See all copyists of their manuscripts
• Choose a copyist and see which other composers that
copyist has worked on
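The three steps of the walkthrough amount to two pivots across a manuscript-to-copyist relationship. A sketch over hypothetical (manuscript, composer, copyist) records; the names are placeholders, not musicSpace data.

```python
# Each tuple: (manuscript, composer whose work it contains, copyist). Hypothetical.
MANUSCRIPTS = [
    ("MS 1", "Composer A", "Copyist X"),
    ("MS 2", "Composer A", "Copyist Y"),
    ("MS 3", "Composer B", "Copyist X"),
    ("MS 4", "Composer C", "Copyist X"),
    ("MS 5", "Composer D", "Copyist Y"),
]

def copyists_of(composer):
    """Step 2: all copyists of a composer's manuscripts."""
    return sorted({cp for _, c, cp in MANUSCRIPTS if c == composer})

def composers_copied_by(copyist, exclude=None):
    """Step 3: the other composers whose works a copyist has copied."""
    return sorted({c for _, c, cp in MANUSCRIPTS if cp == copyist and c != exclude})

composer = "Composer A"                            # step 1: find a composer
print(copyists_of(composer))                       # ['Copyist X', 'Copyist Y']
print(composers_copied_by("Copyist X", composer))  # ['Composer B', 'Composer C']
```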
Thank you
http://ecs.soton.ac.uk/projects/musicspace