Finding Musical Information
Donald Byrd
School of Informatics & Jacobs School of Music
Indiana University
5 April 2008
Copyright © 2006-08, Donald Byrd
1
Review: Basic Representations of Music & Audio
(Columns: Audio | Time-stamped Events | Music Notation)
• Common examples: Audio = CD, MP3 file; Events = Standard MIDI File; Notation = Sheet music
• Unit: Audio = Sample; Events = Event; Notation = Note, clef, lyric, etc.
• Explicit structure: Audio = none; Events = little (partial voicing information); Notation = much (complete voicing information)
• Avg. rel. storage: Audio = 2000; Events = 1; Notation = 10
• Convert to left: Audio = -; Events = easy; Notation = OK job: easy
• Convert to right: Audio = 1 note: pretty easy, other: hard or very hard; Events = OK job: fairly hard; Notation = -
• Ideal for: Audio = music, bird/animal sounds, sound effects, speech; Events = music; Notation = music
27 Jan.
2
Review: Basic & Specific Representations vs. Encodings
[Diagram: basic and specific representations above the line; encodings below the line]
• Audio: specific = Waveform; encodings = .WAV, Red Book (CD)
• Time-stamped Events: specific = Time-stamped MIDI, Time-stamped expMIDI; encodings = SMF, expMIDI File, Csound score, Notelist
• Music Notation: specific = Gamelan notation, Tablature, CMN, Mensural notation; encodings = MusicXML, Finale ETF
rev. 15 Feb.
3
Ways of Finding Music (1)
• How can you identify information/music you’re interested in?
– You know some of it
– You know something about it
– “Someone else” knows something about your tastes
– => Content, Metadata, and “Collaboration”
• Metadata
– “Data about data”: information about a thing, not thing itself (or part)
– Includes the standard library idea of bibliographic information, plus
information about the structure of the content
– Metadata is the traditional library way
– Also basis for iTunes, etc.: iTunes Music Library.xml
– Winamp, etc., use ID3 tags in MP3s (sketch below)
• Content (as in content-based retrieval)
– The main thing we’ve talked about: cf. tasks in Music Similarity Scale
• Collaborative
– “People who bought this also bought…”
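As a concrete illustration of tag-based metadata, here is a minimal Python sketch of reading an ID3v1 tag, which occupies the last 128 bytes of an MP3 file; real players rely mainly on the richer ID3v2 frames, and the file name below is hypothetical.

```python
# Minimal ID3v1 reader: the tag, if present, is the last 128 bytes of the file,
# beginning with the ASCII marker "TAG". (Sketch only; ID3v2 is far more common.)
def read_id3v1(path):
    with open(path, "rb") as f:
        f.seek(0, 2)                       # go to end of file
        if f.tell() < 128:
            return None
        f.seek(-128, 2)
        tag = f.read(128)
    if tag[:3] != b"TAG":
        return None                        # no ID3v1 tag present

    def text(field):
        return field.split(b"\x00")[0].decode("latin-1", "replace").strip()

    return {
        "title":   text(tag[3:33]),
        "artist":  text(tag[33:63]),
        "album":   text(tag[63:93]),
        "year":    text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre":   tag[127],               # numeric genre code
    }

# Hypothetical usage:
# print(read_id3v1("some_track.mp3"))
```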
6 Mar. 06
4
Ways of Finding Music (2)
• Do you just want to find the music now, or do you
want to put in a “standing order”?
• => Searching and Filtering
• Searching: data stays the same; information need
changes
• Filtering: information need stays the same; data
changes
– Closely related to recommender systems
– Sometimes called “routing”
• Collaborative approach to identifying music makes
sense for filtering, but not for searching(?)
8 Mar. 06
5
Ways of Finding Music (3)
• Most combinations of searching/filtering and the three ways of
identifying desired music both make sense and seem useful
• Examples
– By content: Searching = Shazam, NightingaleSearch, Themefinder; Filtering = FOAFing the Music, Pandora, Last.fm
– By metadata: Searching = iTunes, Amazon.com, Variations2, etc. etc.; Filtering = iTunes RSS feed generator, FOAFing the Music
– Collaboratively: Searching = N/A(?); Filtering = Amazon.com
6 Mar. 08
6
Searching: Metadata (the old and new way) vs.
Content (in the middle)
• To librarians, “searching” means searching of metadata
– Has been around as long as library catalogs (c. 300 B.C.?)
• To IR experts, it means searching of content
– Only since advent of IR: started with experiments in 1950’s
• Ordinary people don’t distinguish
– Expert estimate: 50% of real-life information needs involve both
• The two approaches are slowly coming together
– Exx: Variations2 with MusArt VocalSearch; FOAFing the Music
– Metadata-creating “games” (von Ahn) promise to help a lot
– Need ways to manage both together
• Content-based was more relevant to this course in 2003
• Now, both are important
22 March 07
7
Audio-to-Audio Music “Retrieval” (1)
• “Shazam - just hit 2580 on your mobile phone and identify
music” (U.K. slogan in 2003)
• Query (music & voices):
• Match:
• Query (7 simultaneous music streams!):
– Includes Brahms & Ravel as well as pop
• Avery Wang’s ISMIR 2003 paper
• Example of audio fingerprinting
• Uses combinatorial hashing (sketch below)
• Other systems developed by Fraunhofer, Philips; Audible Magic now leader for IPR applications
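Wang's combinatorial-hashing idea can be caricatured in a few lines of Python: pick spectrogram peaks, pair each peak with a few later peaks, and hash (f1, f2, dt) together with the anchor time; matches are then found by voting on a consistent time offset. This is only a toy sketch of the approach described in the paper, not Shazam's implementation, and all parameters are arbitrary.

```python
import numpy as np

def spectrogram_peaks(samples, frame=2048, hop=1024, peaks_per_frame=5):
    """Strongest spectral peaks per frame of a mono signal (1-D numpy array)."""
    peaks = []
    for i, start in enumerate(range(0, len(samples) - frame, hop)):
        spectrum = np.abs(np.fft.rfft(samples[start:start + frame] * np.hanning(frame)))
        top = np.argsort(spectrum)[-peaks_per_frame:]        # strongest bins in this frame
        peaks.extend((i, int(b)) for b in top)
    return peaks                                             # list of (frame_index, freq_bin)

def fingerprints(peaks, fan_out=5, max_dt=50):
    """Pair each peak (anchor) with a few later peaks; hash (f1, f2, dt) with anchor time."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt or paired >= fan_out:
                break
            yield (hash((f1, f2, dt)), t1)                   # (fingerprint, anchor time)
            paired += 1

def match_votes(query_prints, index):
    """index: fingerprint -> list of (track_id, time). Vote on consistent time offsets."""
    votes = {}
    for h, tq in query_prints:
        for track_id, tdb in index.get(h, []):
            key = (track_id, tdb - tq)                       # same offset repeatedly => real match
            votes[key] = votes.get(key, 0) + 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None
```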
26 Mar. 07
8
Audio-to-Audio Music “Retrieval” (2)
• Fantastically impressive to many people
• Have they solved all the problems of music IR? No,
(almost) none!
• Reason: intended signal & match are identical => no time
warping, let alone higher-level problems
(perception/cognition)
• Cf. Wang’s original attitude (“this problem is impossible”)
to Chris Raphael’s (“the obvious thing”)
• Applications
– Consumer mobile recognition service
– Media monitoring (for royalties: ASCAP, BMI, etc.)
20 Mar. 06
9
A Similarity Spectrum for Content-Based Music IR
• Categories describe how similar to query the items to be
found are expected to be (from closest to most distant)
• Detailed audio characteristics in common
1. Same music, arrangement, performance venue, session,
performance, & recording
2. …
4. Same music, arrangement, performance venue; different session,
performance, recording
• No detailed audio characteristics in common
6. Same music, different arrangement; or different but closely-related music, e.g., conservative variations (Mozart, etc.), many covers, minor revisions
7. Different & less closely-related music: freer variations (Brahms, much jazz, etc.), wilder covers, extensive revisions
8. Music in same genre, style, etc.
9. Music influenced by other music
27 Mar. 07
10
Searching vs. Browsing
• What’s the difference? What is browsing?
– Witten’s Managing Gigabytes (1999) has no index entry for either
– Lesk’s Practical Digital Libraries (1997) does, but no definition
– Clearcut examples of browsing: in a book; in a library
– In browsing, user finds everything; the computer just helps
• Browsing is obviously good because it gives user control =>
reduce luck, but few systems emphasize (or offer!) it. Why?
– “Users are not likely to be pleasantly surprised to find that the library
has something but that it has to be obtained in a slow or inconvenient
way. Nearly all items will come from a search, and we do not know
well how to browse in a remote library.” (Lesk, p. 163)
• OK, but for “and”, read “as long as”!
• Searching more natural on computer, browsing in real world
– Effective browsing takes very fast computers—widely available now
– Effective browsing has subtle UI demands
22 Mar. 06
11
Review: How People Find Information
[Diagram: Query -> (understanding) -> Query concepts; Database -> (understanding) -> Database concepts; the two sets of concepts are matched to produce Results]
12
Review: How Computers Find Information
[Diagram: Query and Database -> stemming, stopping, query expansion, etc. (no understanding) -> matching -> Results]
• In browsing, a person is really doing all the finding
• => diagram is (computer) searching, not browsing!
13
Music IR as Music Understanding
• Dannenberg (ISMIR 2001 invited paper)
• Argues central problem of music IR is music understanding
• …also basis for much of computer music (composition & sound
synthesis) and music perception and cognition
– “A key problem in many fields is the understanding and application of
human musical thought and processing”
• Related problems he’s worked on
– Computer accompaniment (became Coda’s Vivace)
• Score following
• Ensemble accompaniment
– Improvisational style classification
• DAB: No understanding yet; sidestep intractable problems!
14 April
14
Content-based Retrieval Systems: Exact Match
• Exact match (also called Boolean) searching
– Query terms combined with connectives “AND”, “OR”, “NOT”
– Add AND terms => narrower search; add OR terms => broader
– “dog OR galaxy” would find lots of documents; “dog AND
galaxy” not many
– Documents retrieved are those that exactly satisfy conditions
• Complex example: describe material on IR
– “(text OR data OR image OR music) AND (compression OR
decompression) AND (archiving OR retrieval OR searching)”
• Older method, designed for (and liked by) professional
searchers: librarians, intelligence analysts
• Databases: Lockheed DIALOG, Lexis/Nexis, etc.
• Still standard in OPACs: IUCAT, etc.
• …and now (again) in web-search systems (not “engines”!)
• Connectives can be implied => AND (usually)
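Since exact-match retrieval treats each term's posting list as a set of documents, the connectives reduce to set algebra; a minimal Python sketch with an invented toy collection:

```python
# Boolean (exact-match) retrieval as set algebra over per-term posting sets.
# Toy collection; terms and document IDs are invented for illustration.
postings = {
    "dog":    {1, 2, 5, 9},
    "galaxy": {2, 7},
    "music":  {3, 5, 9},
}

def term(t):
    return postings.get(t, set())

# "dog OR galaxy" retrieves many documents; "dog AND galaxy" far fewer.
print(term("dog") | term("galaxy"))          # OR  -> {1, 2, 5, 7, 9}
print(term("dog") & term("galaxy"))          # AND -> {2}
print(term("dog") - term("music"))           # dog AND NOT music -> {1, 2}
```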
22 March 06
15
Content-based Retrieval Systems: Best Match
• “Good” Boolean queries difficult to construct, especially
with large databases
– Problem is vocabulary mismatch: synonyms, etc.
– Boston Globe’s “elderly black Americans” example
• New approach: best match searching
– Query terms just strung together
– Add terms => broader & differently-focused search
– “dog galaxy”
• Complex example: describe material on text IR
– “text data image music compression decompression archiving
retrieval searching”
• Strongly preferred by end users, until Google
• Most web-search systems (not “engines”!) before Google
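A best-match search can be sketched as scoring each document by how many query terms it contains and ranking by that score; real systems weight terms (e.g., by tf-idf) rather than just counting, and the toy documents below are invented:

```python
# Best-match (ranked) retrieval: score documents by query-term overlap.
docs = {
    1: {"dog", "training", "obedience"},
    2: {"galaxy", "dog", "star"},
    3: {"galaxy", "telescope"},
}

def best_match(query_terms, docs):
    scored = [(len(set(query_terms) & terms), doc_id) for doc_id, terms in docs.items()]
    return sorted((s, d) for s, d in scored if s > 0)[::-1]   # highest score first

print(best_match(["dog", "galaxy"], docs))   # -> [(2, 2), (1, 3), (1, 1)]
```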
22 March 06
16
Luck in Searching (1)
• Jamie Callan showed friends (ca. 1997) how easy it was to
search Web for info on his family
– No synonyms for family names => few false negatives (recall is
very good)
– Callan is a very unusual name => few false positives (precision is
great)
– But Byrd (for my family) gets lots of false positives
– So does “Donald Byrd” …and “Donald Byrd” music, and
“Donald Byrd” computer
• The jazz trumpeter is famous; I’m not
• Some information needs are easy to satisfy; some very
similar ones are difficult
• Conclusion: luck is a big factor
21 Mar. 06
17
Luck in Searching (2)
• Another real-life example: find information on…
– Book weights (product for holding books open)
• Query (AltaVista, ca. 1999): '"book weights"’ got 60 hits, none relevant.
Examples:
1. HOW MUCH WILL MY BOOK WEIGH ? Calculating Approximate Book
weight...
2. [A book ad] ...
No. of Pages: 372, Paperback
Approx. Book Weight: 24oz.
7. "My personal favorite...is the college sports medicine text book Weight Training: A
scientific Approach..."
• Query (Google, 2006): '"book weights"’ got 783 hits; 6 of 1st 10 relevant.
• => With text, luck is not nearly as big a factor as it was
• Relevant because music metadata is usually text
• With music, luck is undoubtedly still a big factor
– Probable reason: IR technology crude compared to Google
– Certain reason: databases (content limited; metadata poor quality)
22 Mar. 06
18
Nightingale Search Dialog
19
NightingaleSearch: Overview
• Dialog options (too many for real users!) in groups
• Main groups:
– Match pitch (via MIDI note number)
– Match duration (notated, ignoring tuplets)
– In chords, consider...
• Search for Notes/Rests searches score in front window:
one-at-a-time (find next) or “batch” (find all)
• Search in Files versions: (1) search all Nightingale scores
in a given folder, (2) search a database in our own format
• Does passage-level retrieval
• Result list displayed in scrolling-text window; “hot linked”
via double-click to documents
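The kind of search the dialog offers can be sketched as a sliding-window comparison of MIDI note numbers and notated durations, with an optional pitch tolerance; this is only an illustration of the idea, not Nightingale's actual code, and the error score is simplified.

```python
# Sliding-window melodic search over a score given as (midi_pitch, duration) pairs.
# Durations in arbitrary units; exact duration match, pitch match within a tolerance.
def find_pattern(score, pattern, pitch_tolerance=0):
    hits = []
    for start in range(len(score) - len(pattern) + 1):
        window = score[start:start + len(pattern)]
        total_pitch_error = 0
        ok = True
        for (p, d), (qp, qd) in zip(window, pattern):
            if d != qd or abs(p - qp) > pitch_tolerance:
                ok = False
                break
            total_pitch_error += abs(p - qp)
        if ok:
            hits.append((start, total_pitch_error))
    return hits   # list of (position, accumulated pitch error)

# With pitch_tolerance=0 this behaves like an exact-match search;
# with pitch_tolerance=2 it also admits slightly altered (e.g., tonal) entries.
```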
26 Feb. 06
20
Bach: “St. Anne” Fugue, with Search Pattern
26 Feb.
21
IR Evaluation: Precision and Recall (1)
• Precision: number of relevant documents retrieved,
divided by the total number of documents retrieved.
– The higher the better; 1.0 is a perfect score.
– Example: 6 of 10 retrieved documents relevant; precision = 0.6
– Related concept: “false positives”: all retrieved documents that are
not relevant are false positives.
• Recall: number of relevant documents retrieved, divided
by the total number of relevant documents.
– The higher the better; 1.0 is a perfect score.
– Example: 6 relevant documents retrieved of 20; recall = 0.3
– Related concept: “false negatives”: all relevant documents that are
not retrieved are false negatives.
• Fundamental to all IR, including text and music
• Applies to passage- as well as document-level retrieval
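These definitions translate directly into code; a minimal Python sketch using the numbers from the two examples above:

```python
# Precision and recall from sets of retrieved and relevant document IDs.
def precision_recall(retrieved, relevant):
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Example from this slide: 10 documents retrieved, 6 of them relevant,
# out of 20 relevant documents in the whole collection (IDs are invented).
retrieved = set(range(1, 11))                       # docs 1..10 retrieved
relevant  = set(range(1, 7)) | set(range(50, 64))   # the 6 retrieved ones + 14 others = 20
print(precision_recall(retrieved, relevant))        # -> (0.6, 0.3)
```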
24 Feb.
22
Result Lists for Search of the “St. Anne” Fugue
Exact Match (pitch tolerance = 0, match durations)
1: BachStAnne_65: m.1 (Exposition 1), voice 3 of Manual
2: BachStAnne_65: m.7 (Exposition 1), voice 1 of Manual
3: BachStAnne_65: m.14 (Exposition 1), voice 1 of Pedal
4: BachStAnne_65: m.22 (Episode 1), voice 2 of Manual
5: BachStAnne_65: m.31 (Episode 1), voice 1 of Pedal
Best Match (pitch tolerance = 2, match durations)
1: BachStAnne_65: m.1 (Exposition 1), voice 3 of Manual, err=p0 (100%)
2: BachStAnne_65: m.7 (Exposition 1), voice 1 of Manual, err=p0 (100%)
3: BachStAnne_65: m.14 (Exposition 1), voice 1 of Pedal, err=p0 (100%)
4: BachStAnne_65: m.22 (Episode 1), voice 2 of Manual, err=p0 (100%)
5: BachStAnne_65: m.31 (Episode 1), voice 1 of Pedal, err=p0 (100%)
6: BachStAnne_65: m.26 (Episode 1), voice 1 of Manual, err=p2 (85%)
7: BachStAnne_65: m.3 (Exposition 1), voice 2 of Manual, err=p6 (54%)
8: BachStAnne_65: m.9 (Exposition 1), voice 4 of Manual, err=p6 (54%)
26 Feb.
23
Precision and Recall with a Fugue Subject
• “St. Anne” Fugue has 8 occurrences of subject
– 5 are real (exact), 3 tonal (slightly modified)
• Exact-match search for pitch and duration finds 5
passages, all relevant => precision 5/5 = 1.0, recall 5/8 =
.625
• Best-match search for pitch (tolerance 2) and exact-match
for duration finds all 8 => precision and recall both 1.0
– Perfect results, but why possible with such a simple technique?
– Luck!
• Exact-match search for pitch and ignore duration finds 10,
5 relevant => precision 5/10 = .5, recall 5/8 = .625
20 Mar. 06
24
IR Evaluation: Precision and Recall (2)
• Precision and Recall apply to any Boolean (yes/no, etc.) classification
• Precision = avoiding false positives; recall = avoiding false negatives
• Venn diagram of relevant vs. retrieved documents
1: relevant, not retrieved
2: relevant, retrieved
3: not relevant, retrieved
4: not relevant, not retrieved
20 Mar. 06
25
Precision and Recall (3)
• In text, what we want is concepts; but what we have is
words
• Morris Hirsch observed (personal communication, 1996):
– If you use any text search system, you will soon encounter two
language-related problems: (1) low recall: multiple words are used
for the same meaning, causing you to miss documents that are of
interest; (2) low precision: the same word is used for multiple
meanings, causing you to find documents that are not of interest.
• Precision = avoid false positives; recall = avoid false
negatives
• In music, we want musical ideas; but have notes (etc.),
not even words!
3 April 07
26
What’s Wrong with NightingaleSearch?
Obvious problems
• Too many dialog options for real
users
• Too slow; needs “indexing”
• Those are serious problems, but even worse: it does string matching
– a limited kind of string matching: only operation is substitution
– …the wrong idea for music! Cf. “Earth Mover’s Distance”
rev. 3 Apr 08
27
String vs. Geometric Matching
• Problem: compute similarity of items A & B (e.g., query & document)
• For string matching, usually done via edit distance
– Usually Levenshtein distance: total “cost” of inserts, deletes, & substitutes
to transform A into B
– Implementation detail: via dynamic programming (sketch below)
– Inherently one-dimensional
– For application to music, see Mongeau & Sankoff (1990)
• Alternative: geometric (“point-set”) matching
– Ordinarily two-dimensional, but can be more
– Typke’s Earth Mover’s Distance: weighted point set
– See Typke (2007), Clifford et al. (2006)
• In general, geometric is much better for music
– Convincing ex. (from Clifford et al)
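The contrast can be made concrete in a few lines of Python. The first function is plain Levenshtein distance by dynamic programming over note sequences (Mongeau & Sankoff add music-specific operations such as consolidation and fragmentation, not shown here). The second treats notes as (onset, pitch) points and counts how many query points a single translation can land on document points; this is only a crude stand-in for weighted point-set measures like the Earth Mover's Distance, included to show the geometric flavor.

```python
# 1. String matching: Levenshtein edit distance over two note sequences
#    (e.g., lists of MIDI pitch numbers), via dynamic programming.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (cost 0 if equal)
        prev = cur
    return prev[-1]

# 2. Geometric matching: notes as (onset, pitch) points in the plane.
#    Score = largest number of query points that coincide with document points
#    under a single translation (shift in time and transposition in pitch).
def best_translation_match(query_pts, doc_pts):
    doc_set = set(doc_pts)
    best = 0
    for qt, qp in query_pts:
        for dt, dp in doc_pts:
            shift = (dt - qt, dp - qp)                # candidate translation
            matched = sum((t + shift[0], p + shift[1]) in doc_set
                          for t, p in query_pts)
            best = max(best, matched)
    return best

print(edit_distance([60, 62, 64, 65], [60, 62, 63, 65]))               # -> 1
print(best_translation_match([(0, 60), (1, 62)], [(4, 67), (5, 69)]))  # -> 2 (shifted & transposed)
```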
rev. 3 Apr 08
28
NightingaleSearch & Extra Notes (Problem 2)
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Theme & Var. 1
29
Nightingale and Independent Voices (Problem 2)
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Variation 2
30
2-D Pattern Matching in JMS and Extra Notes
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Theme & Var. 1
31
2-D Pattern Matching in JMS and Parallel Voices
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Variation 2
32
Relevance, Queries, and Information Needs
• Information need: information a user wants or needs.
• To convey this to an IR system of whatever kind, must be expressed as a query, but information need is abstract
• Relevance
– Strict definition: relevant document (or passage) helps satisfy a user’s query
– Pertinent document helps satisfy information need
– Relevant documents may not be pertinent, and vice-versa
– Looser definition: relevant document helps satisfy information need. Relevant documents make user happy; irrelevant ones don’t
– Aboutness: related to concepts and meaning
•
OK, but what does “relevance” mean in music?
– In text, relates to concepts expressed by words in query
– Jeremy Pickens (2001): maybe “evocativeness”
rev. 3 March 06
33
Precision and Recall (4)
• Depend on relevance judgments
• Difficult to measure in real-world situations
• Precision in real world (ranking systems)
– Cutoff, r precision
• Recall in real world: no easy way to compute
– Collection may not be well-defined
– Even if it is, huge practical problem for large collections
– Worst case: the World Wide Web
• Too bad, since it’s the most important case!
rev. 26 Feb. 06
34
Foote: ARTHUR
• Retrieving Orchestral Music by Long-Term Structure
• Example of similarity type 3 (same music, arrangement;
different performance, recording)
• Based on analysis of audio waveform; does not rely on
symbolic or MIDI representations
– Better for situations of most similarity
– Avoids intractable “convert to right” (infer structure) problem with
audio of many notes at once
– Uses loudness variation => not much use for pop music
• Evaluation via r precision
– Performance very impressive
– ….except he tested with minuscule databases (<100 documents)!
– Very common research situation; => question “does it scale?”
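The underlying idea, comparing pieces by their long-term loudness contours, can be sketched as: compute a windowed RMS energy profile, normalize it, and compare profiles. Foote aligns profiles with dynamic programming; the plain correlation below is only a stand-in to show the flavor, and the window size is arbitrary.

```python
import math

def loudness_profile(samples, window=22050):
    """Windowed RMS energy of a mono signal (list of floats); 0.5 s windows at 44.1 kHz."""
    profile = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        profile.append(math.sqrt(sum(x * x for x in chunk) / window))
    return profile

def normalized(profile):
    mean = sum(profile) / len(profile)
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / len(profile)) or 1.0
    return [(x - mean) / sd for x in profile]

def profile_similarity(p1, p2):
    """Correlation of the two loudness profiles, truncated to the shorter length."""
    n = min(len(p1), len(p2))
    if n == 0:
        return 0.0
    a, b = normalized(p1[:n]), normalized(p2[:n])
    return sum(x * y for x, y in zip(a, b)) / n
```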
26 Feb.
35
OMRAS Audio-degraded Music IR Experiments (1)
• First work on polyphonic music in both audio & symbolic form
• Started with recording of 24 preludes and fugues by Bach
• Colleagues in London did polyphonic music recognition
• Audio -> events
• Results vary from excellent to just recognizable
• One of worst-sounding cases is Prelude in G Major from the
Well-Tempered Clavier, Book I
• Before (original audio recording)
• After (audio -> MIDI -> audio)
24 Mar. 06
36
OMRAS Audio-degraded Music IR Experiments (2)
• Jeremy Pickens (UMass) converted results to MIDI file
and used as queries against database of c. 3000 pieces in
MIDI form
– Method: Markov models with probabilistic harmonic distributions
on 24 triads
– Significantly better results than harmonic reductions
– Pickens et al. (2002), “Polyphonic Score Retrieval Using
Polyphonic Audio Queries: A Harmonic Modeling Approach”
• Outcome for “worst” case: the actual piece was ranked 1st!
• Average outcome: actual piece ranked c. 2nd
• Experiment 1: Known Item
• Experiment 2: Variations
24 Mar. 06
37
OMRAS Audio-degraded Music IR Experiments (3)
[Figure: ranked result list]
7 April
38
OMRAS Audio-degraded Music IR Experiments (4)
• Extends “query by humming” into polyphonic realm
• More accurately: “query by audio example”
• “TLF” sets of variations (Twinkles, Lachrimaes, Folias)
• Features
– First to use polyphonic audio queries to retrieve from polyphonic
symbolic collections
– Use audio query to retrieve known-item symbolic piece
– Also retrieve entire set of real-world composed variations (“TLF”
sets) on a piece, also in symbolic form
• Uses high-level harmonic representation of music derived
from audio
– Method: Markov models w/ probabilistic harmonic distributions
on 24 triads
– Significantly better results than harmonic reductions
– See Pickens et al. (2002); a simplified sketch of the triad idea follows below
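A much-simplified sketch of the harmonic-modeling idea: score the pitch classes sounding in each time window against templates for the 24 major and minor triads to get a distribution over triads, then accumulate a Markov transition matrix over those triads; two pieces can then be compared via their matrices. Pickens et al. use a considerably more sophisticated probabilistic formulation; the template scoring, smoothing constants, and distance measure below are invented purely for illustration.

```python
# 24 triads: 12 major + 12 minor, as pitch-class templates (root r = 0..11).
TRIADS = [({r, (r + 4) % 12, (r + 7) % 12}, f"{r}maj") for r in range(12)] + \
         [({r, (r + 3) % 12, (r + 7) % 12}, f"{r}min") for r in range(12)]

def triad_distribution(pitch_classes):
    """Probability-like distribution over the 24 triads for one time window."""
    scores = [len(pcs & set(pitch_classes)) + 0.1 for pcs, _ in TRIADS]  # +0.1: arbitrary smoothing
    total = sum(scores)
    return [s / total for s in scores]

def transition_matrix(windows):
    """windows: list of pitch-class sets, one per time slice; returns a 24x24 matrix."""
    m = [[1e-6] * 24 for _ in range(24)]          # tiny smoothing constant
    prev = None
    for w in windows:
        dist = triad_distribution(w)
        if prev is not None:
            for i in range(24):
                for j in range(24):
                    m[i][j] += prev[i] * dist[j]  # expected transition "counts"
        prev = dist
    return [[v / sum(row) for v in row] for row in m]   # row-normalize

def model_distance(m1, m2):
    """Crude distance between two pieces' transition matrices."""
    return sum(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
```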
7 April 06
39
Musical Ideas and “Relevance” (1)
• What is a musical idea?
• Dictionary definition (American Heritage)
– Idea (music): a theme or motif
• Don’s definition
– Ex: theme or part of one, distinctive rhythm pattern, timbre (e.g., in
electronic music; cf. Schoenberg Op. 16 no. 3), etc.; = “hook”?
– Ex: horn call in Le Sacre vs. The Wooden Prince
– Music is based on musical ideas as essays are on verbal ideas
– Closely related to "query concepts" and "database concepts" in diagram on
slide “How People Find Information”
– If someone might want to find music with this in it, this is a musical idea
– Music retrieved is relevant if and only if it has this
• See Belkin (2006), “On Musical Ideas”
• Musical ideas in Our Chosen Music
4 April 08
40
Musical Ideas and “Relevance” (2)
• Musical ideas in common between different
versions of the same music
• “Twinkle, Twinkle”
– Mozart Variations: original (piano, classical)
– Swingle Singers (jazz choral)
• Bartok: Allegro Barbaro
– Original (piano, classical)
– Emerson, Lake, and Palmer “The Barbarian” (rock)
• Star-Spangled Banner
– Piano arrangement
– Hendrix/Woodstock: Taps?; improvisations
– Don Byrd “singable” versions
• Hurt
– Nine Inch Nails original & live versions
– Johnny Cash
24 March 06
41
Musical Ideas and “Relevance” (3)
• Relationship to Similarity Scale categories
• Known Item
• Relevance judgments are essential for evaluation
(precision, recall, etc.)
• Best is judgments by humans, and same people who made
queries (what TREC does)
• Known Items according to Foote, etc.
22 March 06
42
More on IR Evaluation: The Cranfield Model and
TREC
• In text IR, standard evaluation method is Cranfield Model
– From early days of text IR (Cleverdon 1967)
– Requires three elements:
• Database(s)
• Information needs suitable for the database(s)
• Relevance judgments for information needs vs. database(s)
• In text IR, standard is TREC (Text REtrieval Conferences)
– Sponsored by NIST and other government agencies
– Judgments and queries by same person (intelligence analysts)
• In music IR, we’re getting there with MIREX
– Cf. Voorhees (2002), Whither Music IR Evaluation Infrastructure
– Cranfield method is promising—but need databases, information
needs, relevance judgments!
27 March 06
43
A Typical TREC Information Need and Query
• <num> Number: 094
• <title> Topic: Computer-aided Crime
• <desc> Description: Document must identify a crime
perpetrated with the aid of a computer.
• <narr> Narrative: To be relevant, a document must
describe an illegal activity which was carried out
with the aid of a computer, either used as a planning
tool, such as in target research; or used in the
conduct of the crime, such as by illegally gaining
access to someone else’s computer files. A document
is NOT relevant if it merely mentions the illegal
spread of a computer virus or worm. However, a
document WOULD be relevant if the computer virus/worm
were used in conjunction with another crime, such as
extortion.
4 April
44
TREC Relevance Judgments
The first few lines of TREC 1993-94 vol. 12 relevance judgments on
FR (Federal Register) for queries 51ff: query no., document ID, 1/0
51 FR89607-0095 1
56 FR89412-0104 1
68 FR89629-0005 1
68 FR89712-0022 1
68 FR89713-0072 1
74 FR891127-0013 1
74 FR89124-0002 1
74 FR89124-0043 1
74 FR89309-0019 1
74 FR89503-0012 1
74 FR89522-0039 1
74 FR89523-0034 1
74 FR89602-0122 1
74 FR89613-0036 1
74 FR89621-0034 1
74 FR89621-0035 1
74 FR89703-0002 1
74 FR89929-0029 1
75 FR89105-0066 1
75 FR891107-0050 1
75 FR891124-0112 1
75 FR891128-0102 1
75 FR89119-0003 1
75 FR89217-0143 1
75 FR89322-0018 1
75 FR89502-0032 1
75 FR89508-0020 1
75 FR89508-0026 1
75 FR89510-0121 1
75 FR89510-0125 1
75 FR89605-0106 1
75 FR89714-0100 1
75 FR89804-0002 1
75 FR89804-0017 1
75 FR89807-0097 1
75 FR89815-0072 1
75 FR89821-0056 1
76 FR891025-0107 1
7 April
45
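Judgments in this format are easy to consume programmatically; a minimal Python sketch that collects the relevant documents per query and scores a small retrieved list against them (the file name and the retrieved list are hypothetical):

```python
# Parse relevance judgments of the form "query_no  doc_id  judgment" (1 = relevant).
def load_qrels(path):
    relevant = {}                              # query number -> set of relevant doc IDs
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue
            query, doc_id, judgment = parts
            if judgment == "1":
                relevant.setdefault(query, set()).add(doc_id)
    return relevant

# Hypothetical usage: precision of a small retrieved list for query 74.
# qrels = load_qrels("qrels.txt")
# retrieved = ["FR89124-0002", "FR89999-0001"]
# print(sum(d in qrels.get("74", set()) for d in retrieved) / len(retrieved))
```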
What if you don’t have relevance judgments?
• My list of candidate databases says “Uitdenbogerd &
Zobel collection has the only existing set of human
relevance judgments [for music] I know of, but the
judgments are not at all extensive.”
• For known-item searches (e.g., Downie, Pickens
monophonic studies), can assume the item is relevant, all
other documents irrelevant, but…
• What if collection includes related documents (e.g., Foote:
ARTHUR, OMRAS sets of variations)?
• Cf. “Similarity Scale”
7 April
46
Music IR Evaluation: MIREX, etc.
• Led by Stephen Downie (Univ. of Illinois)
• MIREX = Music IR Evaluation eXchange
• MIREX 2005 had 7 audio & 3 symbolic tracks
– Audio: artist identification, drum detection, genre classification, key detection, ...
– Symbolic: genre classification, key detection...
• First two TRECs had only two tracks each!
47
Data Quality in the Real World
• Real World => very large databases, updated frequently
• => not high quality data, no manual massaging
– Music-ir list discussion (2001) included Dunning’s explanation of
why extensive (or any!) manual massaging is out of the question in
many situations
– “We [MusicMatch] have 1/8 to 1/4 full time equivalent budgeted to
support roughly 15 million users who listen to music that [never]
has and never will have sufficient attention paid to it to allow
careful attention by taxonomists.”
• Applies to content as well as metadata
– JHU/Levy project approach to content-based searching: do note-by-note matching, but assume music marked up with “important” notes identified
– Doubtful this is viable in many situations!
5 March
48
Case Study: OMRAS 1
• OMRAS: Online Music Recognition and Searching
– Details at www.omras.org
• Support from Digital Libraries Initiative, Phase 2
• Originally project of UMass and King’s College London
– Added IU (Don Byrd) and City University (Tim Crawford)
• Goal: search realistic databases in all three representations
• Original research software: Nightingale and JMS
– True polyphonic search, i.e., search polyphonic music for
polyphonic pattern (JMS)
– Full GUI for complex music notation (Nightingale Search)
– Modular architecture: plan to let users mix and match
• Also investigating Z39.50 for searching
7 April
49
Case Study: OMRAS 2
• Goal: search realistic databases in all three representations
• Research databases
– Monophonic: MELDEX
– Polyphonic: CCARH
– “TLF” sets of variations (Twinkles, Lachrimaes, Folias)
– Allow more complex simulated relevance judgments
7 April
50
Music IR in the Real World 2: Efficiency
• Real World => very large databases, updated frequently
• => efficiency is vital
– Typical search time for MELDEX with 10,000 folksongs (late
90’s): 10-20 sec.
• Requires avoiding sequential searching
– applies to everything: text, images, all representations of music
• Standard solution: indexing via “inverted lists”
– Like index of a book
– Design goal for Infoseek’s UltraSeek w/ many millions of text
documents (late 90’s): .001 sec.
– Infoseek (indexing) is tens of 1000’s of times faster than
MELDEX (sequential searching)
– On a useful-size collection, this is typical
– Cf. 1897 Sears Catalog: “if you don’t find it in the index, look very
carefully through the entire catalogue.” (quoted by Knuth)
rev. 3 April 07
51
Hofstadter on indexing (and “aboutness”) in text
• In Le Ton Beau de Marot:
– “My feeling is that only the author (and certainly not a computer
program) can do this job well. Only the author, looking at a given
page, sees all the way to the bottom of the pool of ideas of which
the words are the mere surface, and only the author can answer the
question, ‘What am I really talking about here, in this paragraph,
this page, this section, this chapter?’”
• We want concepts, but what we have is words
• => Indexing well is beyond computers
• …but (to search real-world document collections) we have
no choice
rev. 3 April 07
52
Efficiency in Simple Music Searches
• With monophonic music, matching one parameter at a
time, indexing not too hard
• Manual version: Barlow and Morgenstern’s index
– Over 100 pages; gives only pitch classes, completely ignores
octaves (and therefore melodic direction)
– Ignores duration and everything else
– Melodic confounds and the “Ode to Joy” problem
• Indexing requires segmentation into units indexed
– Natural units (e.g., words) are great if you can identify them!
– Byrd & Crawford (2002): segmentation of music is very difficult
– If you can’t, artificial units (e.g., n-grams) are better than nothing
• Downie (1999) adapted standard text-IR system (with
indexing) to music, using n-grams as words
– Results with 10,000 folksongs were quite good
– But 10,000 monophonic songs is not a lot of music...
– And polyphony?
rev. 11 March
53
Example: Indexing Monophonic Music
• Text index entry (words): “Music: 3, 17, 142”
• Text index entry (character 3-grams): “usi: 3, 14, 17, 44, 56, 142, 151”
Kern and Fields: The Way You Look Tonight
[Music notation example with annotations: Pitch codes 18, 27, 27, 26, 24, 23, 27 (intervals -7, +2, +2, +1, -1, -2, +2); Duration codes H, H, E, E, E, E, H, E]
• Cf. Downie (1999) and Pickens (2000)
• Assume above song is no. 99
• Music index entry (pitch 1-grams): “18: 38, 45, 67, 71, 99, 132, 166”
• Music index entry (pitch 2-grams): “1827: 38, 99, 132”
• Music index entry (duration 2-grams): “HH: 67, 99”
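Building such an index is straightforward; a minimal Python sketch that indexes pitch-interval n-grams over toy songs (the song numbers follow the slide, but the note lists themselves are invented):

```python
# Inverted index over n-grams of a melodic feature (here: pitch intervals).
def intervals(pitches):
    return [b - a for a, b in zip(pitches, pitches[1:])]

def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def build_index(songs, n=2):
    index = {}                                   # n-gram -> sorted list of song numbers
    for song_no, pitches in songs.items():
        for g in set(ngrams(intervals(pitches), n)):
            index.setdefault(g, set()).add(song_no)
    return {g: sorted(s) for g, s in index.items()}

def search(index, query_pitches, n=2):
    """Candidate songs = those containing every n-gram of the query."""
    grams = ngrams(intervals(query_pitches), n)
    if not grams:
        return []
    sets = [set(index.get(g, [])) for g in grams]
    return sorted(set.intersection(*sets))

# Toy usage (note lists invented; song 99 opens with a falling 7th then rising 2nds,
# like the intervals in the example above).
songs = {99: [67, 60, 62, 64, 65, 64, 62, 64],
         38: [60, 53, 55, 57]}
index = build_index(songs)
print(search(index, [67, 60, 62]))               # -> [38, 99]
```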
11 March
54
Efficiency in More Complex Music Searches (1)
• More than one parameter at a time (pitch and duration is
obvious combination)
– For best-match searching, indexing still no problem
– For exact match searching, makes indexing harder
• Polyphony makes indexing much harder
– Byrd & Crawford (2002): “Downie speculates that ‘polyphony will
prove to be the most intractable problem [in music IR].’ We would
[say] polyphony will prove to be the source of the most intractable
problems.”
• Polyphony and multiple parameters is particularly nasty
– Techniques required are quite different from text
– First published research less than 10 years ago
• Indexing polyphonic music discussed
– speculatively by Crawford & Byrd (1997)
– in implementation by Doraisamy & Rüger (2001)
• Used n-grams for pitch alone; duration alone; both together
21 Mar. 06
55
Efficiency in More Complex Music Searches (2)
• Alternative to indexing: signature files
• Signature is a string of bits that “summarizes” document (or
passage)
• For text IR, inferior to inverted lists in nearly all real-world
situations (Witten et al., 1999)
• For music IR, tradeoffs can be very different
• Audio fingerprinting systems (at least some) use signatures
– Special case: always a known item search
• No other research yet on signatures for music (as far as I
know)
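A signature file can be sketched as a fixed-width bit vector per document (or passage): each term, here a pitch-interval n-gram, sets a few bit positions, and a query can match only if all of its bits are set in the document's signature; false positives are possible and must be screened out. The Bloom-filter-style scheme below is generic, not taken from any particular music-IR or fingerprinting system, and the constants are arbitrary.

```python
import hashlib

SIG_BITS = 256          # signature width in bits (arbitrary for this sketch)
HASHES_PER_TERM = 3     # bit positions set per term (arbitrary)

def bit_positions(term):
    digest = hashlib.sha1(repr(term).encode()).digest()
    return [int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
            for i in range(HASHES_PER_TERM)]

def signature(terms):
    sig = 0
    for t in terms:
        for pos in bit_positions(t):
            sig |= 1 << pos
    return sig

def may_contain(doc_sig, query_terms):
    """True if every query bit is set: a candidate match (possibly a false positive)."""
    q = signature(query_terms)
    return doc_sig & q == q

# Toy usage, with pitch-interval 2-grams as terms:
doc_sig = signature([(-7, 2), (2, 2), (2, 1)])
print(may_contain(doc_sig, [(-7, 2)]))     # -> True
print(may_contain(doc_sig, [(12, -3)]))    # -> almost certainly False
```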
rev. 12 March
56
Downie’s View of Music IR: Facets 1
• Downie (2003): “Music information retrieval” survey
• Downie is a library scientist
• “Facets of Music Information: The Multifaceted
Challenge”
1. Pitch: includes key
2. Temporal: tempo, meter, duration, accents => rhythm
3. Harmonic
4. Timbral
But “Orchestration [is] sometimes considered bibliographic”
5. Editorial: performance instructions, including dynamics
6. Textual: lyrics and libretti
7. Bibliographic: title, composer, editor, publisher, dates, etc.
Only facet that is not from content, but about it = metadata
12 March
57
Note Parameters (Review)
• Four basic parameters of a definite-pitched musical note
1. pitch: how high or low the sound is: perceptual analog of
frequency
2. duration: how long the note lasts
3. loudness: perceptual analog of amplitude
4. timbre or tone quality
– Above is decreasing order of importance for most Western music
– Also (more-or-less) decreasing order of explicitness in CMN
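In symbolic form these four parameters map naturally onto a small record type; a minimal sketch whose field types and units are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int        # e.g., MIDI note number (perceptual analog of frequency)
    duration: float   # e.g., in quarter notes or seconds
    loudness: int     # e.g., MIDI velocity 0-127 (perceptual analog of amplitude)
    timbre: str       # e.g., instrument or patch name; the hardest to pin down

middle_c = Note(pitch=60, duration=1.0, loudness=80, timbre="piano")
```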
12 March
58
Downie’s View of Music IR: Facets 2
• Cf. “Classification: Surgeon General’s Warning”
• Downie’s facets compared to “Four basic parameters”
1. Pitch -> 1. Pitches in “sequence”
2. Temporal -> 2. Durations in “sequence”
3. Harmonic -> 1. Pitches simultaneously
4. Timbral -> 4. Timbre
5. Editorial -> 3. Loudness—and timbre, duration (, pitch?)
6. Textual -> (none)
7. Bibliographic -> (none)
12 March
59
Downie’s View of Music IR: Other “Multi”s
• The Multirepresentational Challenge
– Related to conversion among basic representations
– Problems aggravated by Intellectual Property Rights (IPR) issues
• The Multicultural Challenge
– Vast majority of music-IR work deals with Western CP music
• The Multiexperiential Challenge
– Questions about user groups/priorities, similarity, relevance, etc.
• The Multidisciplinary Challenge
– Music IR involves audio engineering, musicology, computer
science, librarianship, etc.
12 March
60
Downie’s View of Music IR: Types of Systems
• Representational Completeness and Music-IR Systems
– Degree of representational completeness = no. of facets: depth
– Number of works in database: breadth
• Analytic/Production Music-IR Systems
– More depth, less breadth
– Examples: Humdrum, ESAC/Essen (source of MELDEX data)
• Locating Music-IR Systems
– Less depth, more breadth
– Examples: Barlow & Morgenstern, Parsons (1 facet), Themefinder,
RISM
12 March
61
Intellectual Property Rights (IPR) 1
• IPR is huge problem for nearly all music information
technology including IR, both research and ordinary use
– No one knows the answers! Different in different countries!
– Cf. Levering (2000) for U.S. situation
• For music, U.S. copyright is complex “bundle of rights”
– mechanical right: right to use work in commercial recordings,
ROMs, online delivery to public for private use
– synchronization right: right to use work in audio/visual works
including movies, TV programs, etc.
– More complex than for normal text works because performing art
• U.S. Constitution: balance rights of creators and public
– After some period of time, work enters Public Domain
– Period of time has been getting longer and longer
26 March, rev. 15 April
62
Intellectual Property Rights (IPR) 2
• Law supposed to balance rights of creators & public, but…
– “To achieve these conflicting goals and serve the public interest requires a
delicate balance between the exclusive rights of authors and the long-term
needs of a knowledgeable society.” —Levering
– Sonny Bono Copyright Term Extension Act: 70 years after death!
– Digital Millennium Copyright Act (DMCA), etc.
• “Fair Use”: limit on exclusive rights of copyright owners
– Traditionally used for brief excerpts for reviews, etc.
– Helpful, but not well-defined. In U.S., four tests:
1. Purpose and character of use, including if commercial or nonprofit
2. Nature of copyrighted work
3. Amount and substantiality of portion used relative to work as a whole
4. Effect of use on potential market for or value of copyrighted work
• Other aspects of law
– Educational exemptions
26 March, rev. 15 April
63
Intellectual Property Rights (IPR) 3
• IPR in practice
– NB I’m not a lawyer!
– Mp3.com
– Napster, Gnutella, FreeNet
– Church choir director arranged, performed in church, donated to publisher => sued
• Example: Student wants to quote brief excerpts from
Beethoven piano sonatas in a term paper, in notation
• Do they need permission from owner?
– NB I’m not a lawyer!
– Beethoven has been dead for more than 70 years => all works in
Public Domain
– …but not all editions!
– Still, don’t need permission because Fair Use applies
– For recording, probably not P.D., but Fair Use applies
26 March, rev. 15 April
64
Building Symbolic Music Collections
– Direct encoding may be best
• Most or all existing collections done this way
• But in what representation?
• No standard => often have to convert
• Starting with OMR and polishing may be as good, and faster
– Optical Music Recognition (OMR)
• First commercially available via Nightingale’s NoteScan
• Fairly widespread, e.g., in Finale; SharpEye => MusicXML
• Reasonably useful but not as reliable as OCR
• As technology improves, likely to get more reliable
– Audio Music Recognition (AMR): a great idea, but...
• Christopher Raphael (1999): AMR is “orders of magnitude
more difficult” than OMR
• Gerd Castan (2003): “There is no such thing as a good
conversion from audio to MIDI. And not at all with a single
mouse click.”
26 March
65
OMR at Its Best
Here's the original:
Scanned into Finale: Only 5 easy edits needed.
Taken from http://www.codamusic.com/finale/scanning.asp
16 April
66
Music Collections: Current and Prospective 1
• Research-only vs. user collections
– IPR problem is serious even for research only!
• Terminology: collection vs. database; corpus = collection?
• Cf. my list of candidate test collections
• Symbolic: most interesting/important include CCARH,
MELDEX folksongs, Themefinder, Classical MIDI Arch.
– Commercial collections (e.g., Sunhawk) are dark horses
• Images (OMR => symbolic!): most interesting/important
for us include Variations2, JHU/Levy, CD Sheet Music
4 April
67
Music Collections: Current and Prospective 2
• Audio: full Naxos catalog via UIUC/NCSA project?
– 4000(?) CDs x 650 MB => several terabytes!
• Parallel corpora
• RWC databases: start from scratch => no IPR problems
– Nice idea, but very expensive—RWC is tiny
• Limitations & pitfalls: size, quality (cf. Huron), repertoire
4 April
68
Music Not Written in CMN by Dead European
Men of the Last Few Centuries 1
• Informal genre identification
– Try with c. 1 sec., 5-10 sec. (vs. Tzanetakis’ 250 msec.)
• “This is all about dead Europeans, and they’re great. But
we are not dead Europeans!” —David Alan Miller,
conductor of the Albany Symphony Orchestra, c. 1990
• How does content-based searching of other music (world
and other!) pose different problems from music in CMN by
Europeans of (say) 15th thru early 20th centuries?
• Possible solutions to those problems?
9 April
69
Music Not Written in CMN by Dead European
Men of the Last Few Centuries 2
• Examples were:
1. “After a Pygmy chant of Central Africa”, arr. by Marie Daulne,
words by Renaud Arnal: Mupepe [recorded by Zap Mama]. On
Adventures in Afropea 1 [CD].
2. Eminem: Without Me. On The Eminem Show [CD].
3. Hildegard von Bingen (12th century): O virga ac diadema
[recorded by Anima]. On Sacred Music of the Middle Ages [CD].
4. Guru Bandana [rec. by Asha Bhosle, Ali Akbar Khan, Swapan
Chaudhuri]. On Legacy: 16th-18th century music from India [CD].
5. Duke Ellington: Sophisticated Lady. On Duke Ellington - Greatest
Hits [CD].
6. Iannis Xenakis (1960): Orient-Occident. On Xenakis: Electronic
Music [CD].
7. Beatriz Ferreyra: Jazz’t for Miles. On Computer Music Journal
Sound Anthology, vol. 25 (2001) [CD].
9 April
70
Music Not Written in CMN by Dead European
Men of the Last Few Centuries 3
• How does content-based searching of other music (world
and other!) pose different problems from music in CMN by
Europeans of (say) 15th thru early 20th centuries?
– Different textures
– Emphasis on different parameters of notes
– …if there are notes!
– “Pieces” aren’t well-defined (improvisation, etc.)
• Possible solutions to those problems?
– Consider texture, e.g., oblique motion, pedal tones
– Consider text (words)…
– Or at least language of text
9 April
71