Using Pivots to Explore
Heterogeneous Collections
A Case Study in Musicology
Daniel Alexander Smith
8 December 2009
musicSpace
http://mspace.fm/projects/musicspace
• IAM Group, School of Electronics and Computer Science
• Music, School of Humanities
2
Outline
• How musicologists use data
• Limitations of existing approaches
• Our data extraction and integration methodology
• Interface walkthrough
3
musicSpace Tasks
• Triage data partners' sources
• Extract information
• Map data sources to schemas/ontologies
• Produce interface over aggregated data
• Customise interface based on feedback
4
Data in Musicology
5
Musicologists consult many data sources
6
. . . but what if they could use just one?
7
Intractable research questions
• Which scribes have created manuscripts of a composer’s
works, and which other composers’ works have they
inscribed?
• Which poets have had their poems set to music by
Schubert, which of these musical settings were only
published posthumously, and where can I find recordings of
them?
• Which electroacoustic works were published within five
years of their premiere?
8
Why they are intractable (1)
• Need to consult several sources
• Metadata from one source cannot be used to guide searches
of another source
• Solution: Integrate sources
9
Why they are intractable (2)
• They are multi-part queries, and need to be broken down
with results collated manually
• Requires pen and paper!
• Solution: Optimally interactive UI
10
Why they are intractable (3)
• Insufficient granularity of metadata and/or search options
• Solution: Increase granularity
11
Metadata
Extraction
12
Previous work
• Comb-e-Chem modelled chemistry data
• We use a similar approach
• We translated this work to the arts
• Musicology modelled using Semantic Web technologies
13
Musicology Data Sources
• Disparate data
• How can we pull them together and view them on demand?
14
musicSpace Data Partners
Public
• British Library Music Collections
• British Library Sound Archive
• Cecilia
• Copac
• RISM (UK and Ireland)
Commercial
• Grove Music Online
• Naxos Music Library
• RILM
Future?
• Alexander Street Press Music Online
• CHARM
• ‘Personal’ datasets
15
Data and Info Management problems
• Sources allow searching, but not over everything
• Data exports (typically MARC) reveal extra fields hidden
amongst the metadata, e.g. characters in an opera, document
types
• These are sometimes viewable on the original site, but not
searchable
• Offering the extracted metadata is already a benefit, even
with one source
16
Grove Extraction Example
• More complicated, as Grove is a full-text encyclopaedia
• Some digitisation via Grove Music Online
• Weak semantic metadata extraction
• Thus we performed some data entry
17
Grove Works Lists Source Data
18
Works List Metadata Tool
19
Data Integration
Integration
• Domain Expert + Technologist partnership
• This will be the case for some time yet
• Technology should automate tasks to make the domain expert’s
job less onerous
21
Metadata mapping
• Domain experts devise single schema
• Provide mappings of fields in a particular data source to
that unified schema
• Enables an interface across all sources
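The mapping step above could be sketched roughly as follows. This is a minimal illustration, not the musicSpace implementation: the source names, field names, and unified schema fields are all hypothetical, chosen only to show how per-source mappings feed one schema.

```python
# Hypothetical unified schema devised by domain experts (illustrative only).
UNIFIED_FIELDS = ("title", "composer", "date")

# Hypothetical per-source mappings: source field name -> unified field name.
MAPPINGS = {
    "source_a": {"245a": "title", "100a": "composer", "260c": "date"},
    "source_b": {"work_title": "title", "creator": "composer", "year": "date"},
}


def to_unified(source, record):
    """Rename a source record's fields to the unified schema.

    Fields with no mapping are dropped, which is exactly the weakness
    of this approach when a source carries extra information.
    """
    mapping = MAPPINGS[source]
    unified = {field: None for field in UNIFIED_FIELDS}
    for key, value in record.items():
        if key in mapping:
            unified[mapping[key]] = value
    return unified


# Made-up records, one per source.
records = [
    ("source_a", {"245a": "Work 1", "100a": "Composer A", "260c": "1800", "500a": "note"}),
    ("source_b", {"work_title": "Work 2", "creator": "Composer A", "year": "1805"}),
]
unified_records = [to_unified(src, rec) for src, rec in records]
```

Once every record has the same field names, a single interface can filter across all sources. Note that the unmapped `500a` field in the first record is silently lost.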
22
Downside
• New source comes online with information not covered by
unified schema
• Have to make changes to all mappings to ensure accurate
coverage
23
New Approach: Pivoting
• Mark up each source on its own, rather than pushing all
sources into a single schema
• Use a pivot instead to situate metadata for integration
• Essentially means that the interface does the heavy lifting of
integration
• Reduced effort by domain experts
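One way to picture pivoting is the sketch below. It assumes, purely for illustration, that each source keeps its own field names and that the only markup needed is a note of which field in each source carries the pivot value. The source names, fields, and records are hypothetical and are not taken from musicSpace.

```python
from collections import defaultdict

# Each source keeps its native field names; nothing is rewritten
# into a shared schema. All records here are made up.
SOURCES = {
    "manuscripts": [
        {"ms_id": "MS-1", "composer_name": "Composer A", "scribe": "Scribe X"},
        {"ms_id": "MS-2", "composer_name": "Composer B", "scribe": "Scribe X"},
    ],
    "recordings": [
        {"rec_id": "REC-1", "work_by": "Composer A", "performer": "Performer P"},
    ],
}

# Per-source markup: which field holds the value for a given pivot.
PIVOTS = {
    "composer": {"manuscripts": "composer_name", "recordings": "work_by"},
}


def pivot_on(pivot, value):
    """Collect records from every marked-up source whose pivot field equals value."""
    hits = defaultdict(list)
    for source, field in PIVOTS[pivot].items():
        for record in SOURCES[source]:
            if record.get(field) == value:
                hits[source].append(record)
    return dict(hits)


result = pivot_on("composer", "Composer A")
```

Under this assumption, bringing in a new source means adding its records and one entry per pivot, while the existing sources and their markup stay untouched.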
24
Interface Video
25
Interface Video
• Find a composer
• See all copyists of their manuscripts
• Choose a copyist and see which other composers that
copyist has worked on
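The three steps in the video amount to a two-hop query. A minimal sketch over made-up manuscript records follows; the record layout and function names are assumptions for illustration, not the interface's actual data model.

```python
# Hypothetical manuscript records: who composed the work, who copied it.
MANUSCRIPTS = [
    {"composer": "Composer A", "copyist": "Copyist X"},
    {"composer": "Composer A", "copyist": "Copyist Y"},
    {"composer": "Composer B", "copyist": "Copyist X"},
    {"composer": "Composer C", "copyist": "Copyist Z"},
]


def copyists_of(composer):
    """Step 2: all copyists of the chosen composer's manuscripts."""
    return sorted({m["copyist"] for m in MANUSCRIPTS if m["composer"] == composer})


def other_composers_copied_by(copyist, exclude):
    """Step 3: pivot on a copyist to find the other composers they copied."""
    return sorted(
        {
            m["composer"]
            for m in MANUSCRIPTS
            if m["copyist"] == copyist and m["composer"] != exclude
        }
    )


# Step 1: find a composer, then follow the two hops.
copyists = copyists_of("Composer A")
others = other_composers_copied_by("Copyist X", exclude="Composer A")
```

In the interface each hop is a selection rather than a new search, so the intermediate results do not have to be collated by hand.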
26
27
Thank you
http://ecs.soton.ac.uk/projects/musicspace