Representation of Musical Information
Finding Musical Information
Donald Byrd
School of Music
Indiana University
31 March 2006
Copyright © 2006, Donald Byrd
1
Review: Basic Representations of Music & Audio
                     Audio                                             | Time-stamped Events                  | Music Notation
Common examples:     CD, MP3 file                                      | Standard MIDI File                   | Sheet music
Unit:                Sample                                            | Event                                | Note, clef, lyric, etc.
Explicit structure:  none                                              | little (partial voicing information) | much (complete voicing information)
Avg. rel. storage:   2000                                              | 1                                    | 10
Convert to left:     -                                                 | easy                                 | OK job: easy
Convert to right:    1 note: pretty easy; other: hard or very hard     | OK job: fairly hard                  | -
Ideal for:           music, bird/animal sounds, sound effects, speech  | music                                | music
27 Jan.
2
Review: Basic & Specific Representations vs. Encodings
Basic and Specific Representations (above the line)
– Basic: Audio; Time-stamped Events; Music Notation
– Specific: Waveform; Time-stamped MIDI; Csound score; Time-stamped expMIDI; Notelist; Gamelan notation; Tablature; CMN; Mensural notation
Encodings (below the line)
– .WAV; Red Book (CD); SMF; Csound score; expMIDI File; MusicXML; Finale; ETF
rev. 15 Feb.
3
Ways of Finding Music (1)
• How can you identify information/music you’re interested in?
– You know some of it
– You know something about it
– “Someone else” knows something about your tastes
– => Content, Metadata, and “Collaboration”
• Metadata
– “Data about data”: information about a thing, not thing itself (or part)
– Includes the standard library idea of bibliographic information, plus
information about the structure of the content
– Metadata is the traditional library way
– Also basis for iTunes, etc.: iTunes Music Library.xml
– Winamp, etc., use ID3 tags in MP3’s
• Content (as in content-based retrieval)
– The main thing we’ve talked about: cf. tasks in Music Similarity Scale
• Collaborative
– “People who bought this also bought…”
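• For concreteness, a rough Python sketch of reading the simplest kind of MP3 metadata, the fixed 128-byte ID3v1 block at the end of a file ("song.mp3" is a hypothetical path; ID3v2 tags and iTunes Music Library.xml carry much richer metadata than this):

# ID3v1 layout: "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1) = 128 bytes
def read_id3v1(path):
    with open(path, "rb") as f:
        f.seek(-128, 2)                      # last 128 bytes of the file
        block = f.read(128)
    if block[:3] != b"TAG":
        return None                          # no ID3v1 tag present
    def text(b):                             # strip NUL/space padding
        return b.split(b"\x00")[0].decode("latin-1").strip()
    return {"title": text(block[3:33]), "artist": text(block[33:63]),
            "album": text(block[63:93]), "year": text(block[93:97]),
            "genre": block[127]}             # numeric genre code

print(read_id3v1("song.mp3"))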
6 Mar. 06
4
Ways of Finding Music (2)
• Do you just want to find the music now, or do you
want to put in a “standing order”?
• => Searching and Filtering
• Searching: data stays the same; information need
changes
• Filtering: information need stays the same; data
changes
– Closely related to recommender systems
– Sometimes called “routing”
• Collaborative approach to identifying music makes
sense for filtering, but not for searching(?)
8 Mar. 06
5
Ways of Finding Music (3)
• Most combinations of searching/filtering and the three ways of
identifying desired music both make sense and seem useful
• Examples
Searching
Filtering
By content
Shazam,
NightingaleSearch,
Themefinder
FOAFing the Music,
Pandora
By metadata
iTunes, Amazon.com,
Variations2, etc. etc.
iTunes RSS feed
generator, FOAFing
the Music
Collaboratively
N/A(?)
Amazon.com
6 Mar. 06
6
Searching: Metadata (the old and new way) vs.
Content (in the middle)
• To librarians, “searching” means searching of metadata
– Has been around as long as library catalogs (c. 300 B.C.?)
• To IR experts, it means searching of content
– Only since advent of IR: started with experiments in 1950’s
• Ordinary people don’t distinguish
– Expert estimate: 50% of real-life information needs involve both
• The two approaches are slowly coming together
– Exx: Variations2 with MusArt VocalSearch; FOAFing the Music
– Need ways to manage both together
• Content-based was more relevant to this course in 2003
• Now, both are important
22 March 06
7
Audio-to-Audio Music “Retrieval” (1)
• “Shazam - just hit 2580 on your mobile phone and identify
music” (U.K. slogan in 2003)
• Query (music & voices):
• Match:
• Query (7 simultaneous music streams!):
• Avery Wang’s ISMIR 2003 paper
• Example of audio fingerprinting
• Uses combinatorial hashing
• Other systems developed by Fraunhofer, Philips
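• A toy Python/NumPy sketch of the combinatorial-hashing idea (peak constellation plus fan-out pairing, after Wang 2003); the window sizes, peak counts, and fan-out here are invented for illustration, not Shazam’s actual parameters:

import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import maximum_filter

def peak_constellation(samples, rate, n_peaks=30):
    # Pick the strongest local maxima ("peaks") of the log spectrogram
    f, t, S = spectrogram(samples, fs=rate, nperseg=1024)
    S = np.log1p(S)
    local_max = (S == maximum_filter(S, size=15))
    peaks = np.argwhere(local_max)                             # (freq_bin, time_bin)
    strongest = peaks[np.argsort(S[local_max])[-n_peaks:]]
    return sorted(map(tuple, strongest), key=lambda p: p[1])   # time order

def hashes(constellation, fan_out=5):
    # Pair each anchor peak with the next few peaks; hash (f1, f2, time delta)
    for i, (f1, t1) in enumerate(constellation):
        for f2, t2 in constellation[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1

def index_track(db, track_id, samples, rate):
    # Post every hash to an inverted list of (track, anchor time) pairs
    for h, t1 in hashes(peak_constellation(samples, rate)):
        db.setdefault(h, []).append((track_id, t1))

• Matching a query then works the same way: hash the query and look for a track whose anchor-time offsets agree consistently.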
20 Mar. 06
8
Audio-to-Audio Music “Retrieval” (2)
• Fantastically impressive to many people
• Have they solved all the problems of music IR? No,
(almost) none!
• Reason: intended signal & match are identical => no time
warping, let alone higher-level problems
(perception/cognition)
• Cf. Wang’s original attitude (“this problem is impossible”)
to Chris Raphael’s
• Applications
– Consumer mobile recognition service
– Media monitoring (for royalties: ASCAP, BMI, etc.)
20 Mar. 06
9
A Similarity Spectrum for Content-Based Music IR
• “Relationships” describe what’s in common between two
items—audio recordings, scores, etc.—whose similarity is
to be evaluated (from closest to most distant)
– For material in notation form, categories (1) and (2) don’t apply:
it’s just “Same music, arrangement”
1. Same music, arrangement, performance, & recording
2. Same music, arrangement, performance; different recording
3. Same music, arrangement; different performance, recording
4. Same music, different arrangement; or different but closely-related
music, e.g., conservative variations (Mozart, etc.), many covers, minor revisions
5. Different & less closely-related music: freer variations (Brahms,
much jazz, etc.), wilder covers, extensive revisions
6. Music in same genre, style, etc.
7. Music influenced by other music
9. No similarity whatever
21 Mar. 06
10
Searching vs. Browsing
• What’s the difference? What is browsing?
– Managing Gigabytes doesn’t have an index entry for either
– Lesk’s Practical Digital Libraries (1997) does, but no definition
– Clearcut examples of browsing: in a book; in a library
– In browsing, user finds everything; the computer just helps
• Browsing is obviously good because it gives user control =>
reduce luck, but few systems emphasize (or offer!) it. Why?
– “Users are not likely to be pleasantly surprised to find that the library
has something but that it has to be obtained in a slow or inconvenient
way. Nearly all items will come from a search, and we do not know
well how to browse in a remote library.” (Lesk, p. 163)
• OK, but for “and”, read “as long as”!
• Searching more natural on computer, browsing in real world
– Effective browsing takes very fast computers—widely available now
– Effective browsing has subtle UI demands
22 Mar. 06
11
How People Find Information
Query --(understanding)--> Query concepts
Database --(understanding)--> Database concepts
Query concepts + Database concepts --(matching)--> Results
12
How Computers Find Information
Query --(stemming, stopping, query expansion, etc.: no understanding)--> matching
Database --(no understanding)--> matching
matching --> Results
• In browsing, a person is really doing all the finding
• => diagram is (computer) searching, not browsing!
13
Content-based Retrieval Systems: Exact Match
• Exact match (also called Boolean) searching
– Query terms combined with connectives “AND”, “OR”, “NOT”
– Add AND terms => narrower search; add OR terms => broader
– “dog OR galaxy” would find lots of documents; “dog AND
galaxy” not many
– Documents retrieved are those that exactly satisfy conditions
• Complex example: describe material on IR
– “(text OR data OR image OR music) AND (compression OR
decompression) AND (archiving OR retrieval OR searching)”
• Older method, designed for (and liked by) professional
searchers: librarians, intelligence analysts
• Databases: Lockheed DIALOG, Lexis/Nexis, etc.
• Still standard in OPACs: IUCAT, etc.
• …and now (again) in web-search systems (not “engines”!)
• Connectives can be implied => AND (usually)
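• A minimal Python sketch of exact-match (Boolean) searching over a toy inverted index, with AND/OR/NOT as set operations (document IDs are made up):

index = {
    "dog":    {1, 4, 7, 9},
    "galaxy": {2, 4, 8},
    "music":  {3, 4, 5, 9},
}
all_docs = set().union(*index.values())

def term(t):
    return index.get(t, set())

print(term("dog") | term("galaxy"))               # "dog OR galaxy": union -> broader
print(term("dog") & term("galaxy"))               # "dog AND galaxy": intersection -> narrower
print(term("music") & (all_docs - term("dog")))   # "music NOT dog"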
22 March 06
14
Content-based Retrieval Systems: Best Match
• “Good” Boolean queries difficult to construct, especially
with large databases
– Problem is vocabulary mismatch: synonyms, etc.
– Boston Globe’s “elderly black Americans” example
• New approach: best match searching
– Query terms just strung together
– Add terms => broader & differently-focused search
– “dog galaxy”
• Complex example: describe material on text IR
– “text data image music compression decompression archiving
retrieval searching”
• Strongly preferred by end users, until Google
• Most web-search systems (not “engines”!) before Google
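• A minimal Python sketch of best-match searching in its simplest form: no connectives, just rank documents by how many query terms they contain (toy documents):

from collections import Counter

docs = {
    1: "the dog barked at the other dog",
    2: "a spiral galaxy far away",
    3: "a dog watching the galaxy through a telescope",
}

def best_match(query, docs):
    q_terms = set(query.lower().split())
    scores = Counter({d: len(q_terms & set(text.lower().split()))
                      for d, text in docs.items()})
    return [(d, s) for d, s in scores.most_common() if s > 0]

print(best_match("dog galaxy", docs))   # doc 3 ranks first: it matches both terms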
22 March 06
15
Luck in Searching (1)
• Jamie Callan showed friends (ca. 1997) how easy it was to
search Web for info on his family
– No synonyms for family names => few false negatives (recall is
very good)
– Callan is a very unusual name => few false positives (precision is
great)
– But Byrd (for my family) gets lots of false positives
– So does “Donald Byrd” …and “Donald Byrd” music, and
“Donald Byrd” computer
• The jazz trumpeter is famous; I’m not
• Some information needs are easy to satisfy; some very
similar ones are difficult
• Conclusion: luck is a big factor
21 Mar. 06
16
Luck in Searching (2)
• Another real-life example: find information on…
– Book weights (product for holding books open)
• Query (AltaVista, ca. 1999): '"book weights"’ got 60 hits, none relevant.
Examples:
1. HOW MUCH WILL MY BOOK WEIGH ? Calculating Approximate Book
weight...
2. [A book ad] ...
No. of Pages: 372, Paperback
Approx. Book Weight: 24oz.
7. "My personal favorite...is the college sports medicine text book Weight Training: A
scientific Approach..."
• Query (Google, 2006): '"book weights"’ got 783 hits; 6 of 1st 10 relevant.
• => With text, luck is not nearly as big a factor as it was
• Relevant because music metadata is usually text
• With music, luck is undoubtedly still a big factor
– Probable reason: IR technology crude compared to Google
– Certain reason: databases (content limited; metadata poor quality)
22 Mar. 06
17
IR Evaluation: Precision and Recall (1)
• Precision: number of relevant documents retrieved,
divided by the total number of documents retrieved.
– The higher the better; 1.0 is a perfect score.
– Example: 6 of 10 retrieved documents relevant; precision = 0.6
– Related concept: “false positives”: all retrieved documents that are
not relevant are false positives.
• Recall: number of relevant documents retrieved, divided
by the total number of relevant documents.
– The higher the better; 1.0 is a perfect score.
– Example: 6 relevant documents retrieved of 20; recall = 0.3
– Related concept: “false negatives”: all relevant documents that are
not retrieved are false negatives.
• Fundamental to all IR, including text and music
• Applies to passage- as well as document-level retrieval
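• The definitions as a few lines of Python, using the slide’s numbers (6 relevant among 10 retrieved; 20 relevant in all):

def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

retrieved = set(range(10))             # 10 retrieved documents
relevant = set(range(4, 24))           # 20 relevant documents, 6 of them retrieved
print(precision_recall(retrieved, relevant))   # (0.6, 0.3)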
24 Feb.
18
Nightingale Search Dialog
19
NightingaleSearch: Overview
• Dialog options (too many for real users!) in groups
• Main groups:
Match pitch (via MIDI note number)
Match duration (notated, ignoring tuplets)
In chords, consider...
• Search for Notes/Rests searches score in front window:
one-at-a-time (find next) or “batch” (find all)
• Search in Files versions: (1) search all Nightingale scores
in a given folder, (2) search a database in our own format
• Does passage-level retrieval
• Result list displayed in scrolling-text window; “hot linked”
via double-click to documents
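• A rough Python sketch of the general flavor of pitch matching with a tolerance (a guess at the idea only, not Nightingale’s actual algorithm; pitches are MIDI note numbers):

def pitch_match_error(query, window, tolerance=2):
    # Total semitone error over the window; None if any note is off by more than the tolerance
    err = 0
    for q, w in zip(query, window):
        d = abs(q - w)
        if d > tolerance:
            return None
        err += d
    return err

def search_notes(score_pitches, query, tolerance=2):
    n = len(query)
    hits = []
    for start in range(len(score_pitches) - n + 1):
        err = pitch_match_error(query, score_pitches[start:start + n], tolerance)
        if err is not None:
            hits.append((start, err))
    return sorted(hits, key=lambda h: h[1])    # lowest-error ("best") matches first

• With tolerance 0 this degenerates to exact matching, which is roughly the difference between the two result lists shown a couple of slides later.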
20
Bach: “St. Anne” Fugue, with Search Pattern
26 Feb.
21
Result Lists for Search of the “St. Anne” Fugue
Exact Match (pitch tolerance = 0, match durations)
1: BachStAnne_65: m.1 (Exposition 1), voice 3 of Manual
2: BachStAnne_65: m.7 (Exposition 1), voice 1 of Manual
3: BachStAnne_65: m.14 (Exposition 1), voice 1 of Pedal
4: BachStAnne_65: m.22 (Episode 1), voice 2 of Manual
5: BachStAnne_65: m.31 (Episode 1), voice 1 of Pedal
Best Match (pitch tolerance = 2, match durations)
1: BachStAnne_65: m.1 (Exposition 1), voice 3 of Manual, err=p0 (100%)
2: BachStAnne_65: m.7 (Exposition 1), voice 1 of Manual, err=p0 (100%)
3: BachStAnne_65: m.14 (Exposition 1), voice 1 of Pedal, err=p0 (100%)
4: BachStAnne_65: m.22 (Episode 1), voice 2 of Manual, err=p0 (100%)
5: BachStAnne_65: m.31 (Episode 1), voice 1 of Pedal, err=p0 (100%)
6: BachStAnne_65: m.26 (Episode 1), voice 1 of Manual, err=p2 (85%)
7: BachStAnne_65: m.3 (Exposition 1), voice 2 of Manual, err=p6 (54%)
8: BachStAnne_65: m.9 (Exposition 1), voice 4 of Manual, err=p6 (54%)
26 Feb.
22
Precision and Recall with a Fugue Subject
• “St. Anne” Fugue has 8 occurrences of subject
– 5 are real (exact), 3 tonal (slightly modified)
• Exact-match search for pitch and duration finds 5
passages, all relevant => precision 5/5 = 1.0, recall 5/8 =
.625
• Best-match search for pitch (tolerance 2) and exact-match
for duration finds all 8 => precision and recall both 1.0
– Perfect results, but why possible with such a simple technique?
– Luck!
• Exact-match search for pitch and ignore duration finds 10,
5 relevant => precision 5/10 = .5, recall 5/8 = .625
20 Mar. 06
23
Precision and Recall (2)
• Precision and Recall apply to any Boolean (yes/no, etc.) classification
• Precision = avoiding false positives; recall = avoiding false negatives
• Venn diagram of relevant vs. retrieved documents
Regions of the diagram:
1: relevant, not retrieved
2: relevant, retrieved (the intersection of Relevant and Retrieved)
3: not relevant, retrieved
4: not relevant, not retrieved
20 Mar. 06
24
Precision and Recall (3)
• In text, what we want is concepts; but what we have is
words
• Comment by Morris Hirsch (1996)
– If you use any text search system, you will soon encounter two
language-related problems: (1) low recall: multiple words are used
for the same meaning, causing you to miss documents that are of
interest; (2) low precision: the same word is used for multiple
meanings, causing you to find documents that are not of interest.
• Precision = avoid false positives; recall = avoid false
negatives
• Music doesn’t have words, and it’s not clear if it has
concepts, but that doesn’t help :-)
20 March 06
25
Relevance, Queries, and Information Needs
• Information need: information a user wants or needs.
• To convey this to an IR system of whatever kind, must be
expressed as a query, but information need is abstract
• Relevance
– Strict definition: relevant document (or passage) helps satisfy a
user’s query
– Pertinent document helps satisfy information need
– Relevant documents may not be pertinent, and vice-versa
– Looser definition: relevant document helps satisfy information
need. Relevant documents make user happy; irrelevant ones don’t
– Aboutness: related to concepts and meaning
• OK, but what does “relevance” mean in music?
– In text, relates to concepts expressed by words in query
– Jeremy Pickens: evocativeness
rev. 3 March
26
Precision and Recall (4)
• Depend on relevance judgments
• Difficult to measure in real-world situations
• Precision in real world (ranking systems)
– Cutoff, R-precision
• Recall in real world: no easy way to compute
– Collection may not be well-defined
– Even if it is, practical problems for large collections
– Worst case (too bad): the World Wide Web
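• A small Python sketch of the ranking-oriented measures mentioned above: precision at a cutoff k, and R-precision (precision at rank R, where R = the number of relevant documents):

def precision_at_k(ranked_ids, relevant, k):
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k

def r_precision(ranked_ids, relevant):
    return precision_at_k(ranked_ids, relevant, len(relevant))

ranked = [7, 2, 9, 4, 1, 3]        # a system's ranked output (toy doc IDs)
relevant = {2, 4, 5}               # 3 relevant documents => R = 3
print(precision_at_k(ranked, relevant, 5))   # 0.4
print(r_precision(ranked, relevant))         # 0.333...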
rev. 26 Feb.
27
Foote: ARTHUR
• Retrieving Orchestral Music by Long-Term Structure
• Example of similarity type 3 (same music, arrangement;
different performance, recording)
• Based on analysis of audio waveform; does not rely on
symbolic or MIDI representations
– Better for situations of most similarity
– Avoids intractable “convert to right” (infer structure) problem with
audio of many notes at once
– Uses loudness variation => not much use for pop music
• Evaluation via R-precision
– Performance very impressive
– ….except he tested with minuscule databases (<100 documents)!
– Very common research situation; => question “does it scale?”
26 Feb.
28
OMRAS Audio-degraded Music IR Experiments (1)
• First work on polyphonic music in both audio & symbolic form
• Started with recording of 24 preludes and fugues by Bach
• Colleagues in London did polyphonic music recognition
• Audio -> events
• Results vary from excellent to just recognizable
• One of worst-sounding cases is Prelude in G Major from the
Well-Tempered Clavier, Book I
• Before (original audio recording)
• After (audio -> MIDI -> audio)
24 Mar. 06
29
OMRAS Audio-degraded Music IR Experiments (2)
[Figure: ranked list of retrieval results]
7 April
30
OMRAS Audio-degraded Music IR Experiments (3)
• Jeremy Pickens (UMass) converted results to MIDI file
and used as queries against database of c. 3000 pieces in
MIDI form
– Method: Markov models with probabilistic harmonic distributions
on 24 triads
– Significantly better results than harmonic reductions
– Pickens et al. (2002), “Polyphonic Score Retrieval Using
Polyphonic Audio Queries: A Harmonic Modeling Approach”
• Outcome for “worst” case: the actual piece was ranked 1st!
• Average outcome: actual piece ranked c. 2nd
• Experiment 1: Known Item
• Experiment 2: Variations
24 Mar. 06
31
OMRAS Audio-degraded Music IR Experiments (4)
• Extends “query by humming” into polyphonic realm
• More accurately: “query by audio example”
• “TLF” sets of variations (Twinkles, Lachrimaes, Folias)
• Features
– First to use polyphonic audio queries to retrieve from polyphonic
symbolic collections
– Use audio query to retrieve known-item symbolic piece
– Also retrieve entire set of real-world composed variations (“TLF”
sets) on a piece, also in symbolic form
7 April
32
Musical Ideas and “Relevance” (1)
• What is a musical idea?
• Dictionary definition (American Heritage)
– Idea (music): a theme or motif
• Don’s definition
– Something like a theme or part of theme, distinctive rhythm pattern, or
(especially in electronic music) a timbre
– Music is based on musical ideas as articles are on verbal ideas
– Closely related to "query concepts" and "database concepts" in diagram on
slide “How People Find Information”
– If you can imagine someone wanting to find music that has this in it, this
is a musical idea
– A piece of music retrieved is relevant if and only if it really has this in it
• Musical ideas in Our Favorite Music
24 March 06
33
Musical Ideas and “Relevance” (2)
• Musical ideas in common between different
versions of the same music
• “Twinkle, Twinkle”
– Mozart Variations: original (piano, classical)
– Swingle Singers (jazz choral)
• Bartok: Allegro Barbaro
– Original (piano, classical)
– Emerson, Lake, and Palmer “The Barbarian” (rock)
• Star-Spangled Banner
– Piano arrangement
– Hendrix/Woodstock: Taps?; improvisations
– Don Byrd “singable” versions
• Hurt
– Nine Inch Nails original & live versions
– Johnny Cash
24 March 06
34
Musical Ideas and “Relevance” (3)
• Relationship to Similarity Scale categories
• Known Item
• Relevance judgments are essential for evaluation
(precision, recall, etc.)
• Best is judgments by humans, and same people who made
queries (what TREC does)
• Known Items according to Foote, etc.
22 March 06
35
Hofstadter on indexing (and “aboutness”!) in text
• In Le Ton Beau de Marot:
• “My feeling is that only the author (and certainly not a computer
program) can do this job well. Only the author, looking at a given
page, sees all the way to the bottom of the pool of ideas of which the
words are the mere surface, and only the author can answer the
question, ‘What am I really talking about here, in this paragraph, this
page, this section, this chapter?’”
• “Aboutness” = underlying concepts; for music, musical ideas
• Really applies to any searching method, with real indexing or not
29 March 06
36
More on IR Evaluation: The Cranfield Model and
TREC
• In text IR, standard evaluation method is Cranfield Model
– From early days of text IR (Cleverdon 1967)
– Requires three elements:
• Collection(s)
• Information needs suitable for the collection(s)
• Relevance judgments for information needs vs. collection(s)
• In text IR, standard is TREC (Text REtrieval Conferences)
– Sponsored by NIST and other government agencies
– Judgments and queries by same person (intelligence analysts)
– Large collections & many information needs
– For 1st two years, 150 info needs & 742K docs = 111M combinations
– Exhaustive relevance judgments impossible, so judge only top retrieved
• In music IR, we’re getting there with MIREX
– Cf. Voorhees (2002), Whither Music IR Evaluation Infrastructure
– Cranfield method is promising—but need databases, information needs,
relevance judgments!
29 March 06
37
A Typical TREC Information Need and Query
• <num> Number: 094
• <title> Topic: Computer-aided Crime
• <desc> Description: Document must identify a crime
perpetrated with the aid of a computer.
• <narr> Narrative: To be relevant, a document must
describe an illegal activity which was carried out
with the aid of a computer, either used as a planning
tool, such as in target research; or used in the
conduct of the crime, such as by illegally gaining
access to someone else’s computer files. A document
is NOT relevant if it merely mentions the illegal
spread of a computer virus or worm. However, a
document WOULD be relevant if the computer virus/worm
were used in conjunction with another crime, such as
extortion.
4 April
38
TREC Relevance Judgments
The first few lines of TREC 1993-94 vol. 12 relevance judgments on
FR (Federal Register) for queries 51ff: query no., document ID, 1/0
51 FR89607-0095 1
56 FR89412-0104 1
68 FR89629-0005 1
68 FR89712-0022 1
68 FR89713-0072 1
74 FR891127-0013 1
74 FR89124-0002 1
74 FR89124-0043 1
74 FR89309-0019 1
74 FR89503-0012 1
74 FR89522-0039 1
74 FR89523-0034 1
74 FR89602-0122 1
74 FR89613-0036 1
74 FR89621-0034 1
74 FR89621-0035 1
74 FR89703-0002 1
74 FR89929-0029 1
75 FR89105-0066 1
75 FR891107-0050 1
75 FR891124-0112 1
75 FR891128-0102 1
75 FR89119-0003 1
75 FR89217-0143 1
75 FR89322-0018 1
75 FR89502-0032 1
75 FR89508-0020 1
75 FR89508-0026 1
75 FR89510-0121 1
75 FR89510-0125 1
75 FR89605-0106 1
75 FR89714-0100 1
75 FR89804-0002 1
75 FR89804-0017 1
75 FR89807-0097 1
75 FR89815-0072 1
75 FR89821-0056 1
76 FR891025-0107 1
7 April
39
What if you don’t have any relevance
judgments?
• My list of candidate databases says “Uitdenbogerd &
Zobel collection has the only existing set of human
relevance judgments [for music] I know of, but the
judgments are not at all extensive.”
• For known-item searches (e.g., Downie, Pickens
monophonic studies), can assume the item is relevant, all
other documents irrelevant, but…
• What if collection includes related documents (e.g., Foote’s
ARTHUR, OMRAS sets of variations)?
• Cf. “Similarity Scale”
7 April
40
Music IR Evaluation: MIREX, etc.
• Led by Stephen Downie (Univ. of Illinois)
• Has audio files for all of Naxos catalog, w/heavy security
• MIREX = Music IR Evaluation eXchange, like TREC
• MIREX 2005 had 7 audio & 3 symbolic tracks
– Symbolic (MIDI): genre classification, key detection...
• For genre classification: 9-way and 38-way tasks
• 13 participants, 60.7% to 82.3% mean accuracy
– Audio: artist identification, drum detection, genre classif., etc.
• For artist identification: 73-way task
• 7 participants, 26 to 72% accuracy
– Many questions about how meaningful results are
– Cf. http://www.music-ir.org/evaluation/mirex-results/
• First two TRECs had only two tracks each => MIREX
seems overambitious
29 March 06
41
Relevance & Classification; Judgments &
Groundtruth
• Deciding relevance is special case of classification
– Relevance is one category
– Classification is n-way
– Normally binary (yes/no) but can be a confidence value
• Other applications
– Medical: what condition is report describing?
– OMR: is symbol a notehead, accent mark, character, …?
• To evaluate, need groundtruth
– MIREX: “document 12485 is in genre B”
• Relevance judgments are special case of groundtruth
– TREC: “document FR89607-0095 is relevant to query 51”
– Pickens: “G-major Prelude, WTC Book I, is relevant to query 2”
29 March 06
42
Music IR in the Real World: Efficiency
• Real World => very large databases, updated frequently
• => efficiency is vital
– Typical search time for MELDEX with 10,000 folksongs (late
90’s): 10-20 sec.
• Requires an approach other than sequential searching
– applies to everything: text, images, all representations of music
• Standard solution: indexing via “inverted lists”
– Like index of a book
– Design goal for Infoseek’s UltraSeek with many millions of text
documents (late 90’s): .001 sec.
– Was successful: tens of thousands of times faster than MELDEX
– On a useful-size collection, this is typical!
– Cf. 1897 Sears Catalog: “if you don’t find it in the index, look very
carefully through the entire catalogue.” Sure, why not?!?
29 March 06
43
Finding Themes Manually: Indices, Catalogs,
Dictionaries
• Manual retrieval of themes via thematic catalogs, indices,
& dictionaries has been around for a long time
• Parsons’ index
– Parsons, Denys (1975). The Directory of Tunes and Musical
Themes
– Uses contour only (Up, Down, Repeat); see the sketch below
• Barlow and Morgenstern’s catalog & index
– Barlow, Harold & Morgenstern, Sam (1948). A Dictionary of Musical
Themes
– Over 100 pages; gives only pitch classes, completely ignores
octaves (and => melodic direction)
– Ignores duration & everything else
– Cf. melodic confounds (e.g. “Ode to Joy” )
– They did another volume with vocal themes
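• A minimal Python sketch of Parsons-style contour coding (U = up, D = down, R = repeat) from MIDI note numbers:

def parsons_code(midi_pitches):
    code = ["*"]                   # conventional placeholder for the first note
    for prev, curr in zip(midi_pitches, midi_pitches[1:]):
        code.append("U" if curr > prev else "D" if curr < prev else "R")
    return "".join(code)

# Opening of "Ode to Joy": E E F G G F E D
print(parsons_code([64, 64, 65, 67, 67, 65, 64, 62]))   # *RUURDDD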
29 March 06
44
Efficiency in Simple Music Searches
• With monophonic music, matching one parameter at a
time, indexing isn’t hard
• Like index of a book: says where to find every occurrence
• Barlow and Morgenstern’s and Parsons are different sense
of indexing
• Indexing requires segmentation into units
– Natural units (e.g., words) are great if you can identify them!
– Byrd & Crawford (2002): segmentation of music is very difficult
– Artificial units (e.g., n-grams) are better than nothing
• Downie (1999) adapted standard text-IR system (with
indexing) to music, using n-grams as units
– Results with 10,000 folksongs were good
– But 10,000 monophonic songs isn’t a lot of music...
– And polyphony?
29 March 06
45
Example: Indexing Monophonic Music
• Text index entry (words): “Music: 3, 17, 142”
• Text index entry (character 3-grams): “usi: 3, 14, 17, 44, 56, 142, 151”
Kern and Fields: The Way You Look Tonight (first 8 notes)
Pitch interval:  -    -7   +2   +2   +1   -1   -2   +2
Interval code:   -    18   27   27   26   24   23   27
Duration:        H     H    E    E    E    E    H    E
• Cf. Downie (1999) and Pickens (2000)
• Assume above song is no. 99
• Music index entry (pitch 1-grams): “18: 38, 45, 67, 71, 99, 132, 166”
• Music index entry (pitch 2-grams): “1827: 38, 99, 132”
• Music index entry (duration 2-grams): “HH: 67, 99”
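• A small Python sketch of this kind of n-gram indexing; the two-digit codes above look like interval + 25, so that convention is assumed here (cf. Downie 1999):

from collections import defaultdict

def interval_ngrams(midi_pitches, n=2):
    codes = [str(b - a + 25) for a, b in zip(midi_pitches, midi_pitches[1:])]
    return ["".join(codes[i:i + n]) for i in range(len(codes) - n + 1)]

index = defaultdict(set)                 # n-gram -> set of song numbers (inverted lists)

def add_song(song_no, midi_pitches):
    for gram in interval_ngrams(midi_pitches):
        index[gram].add(song_no)

def search(query_pitches):
    grams = interval_ngrams(query_pitches)
    return set.intersection(*(index[g] for g in grams)) if grams else set()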
11 March
46
Efficiency in More Complex Music Searches (1)
• More than one parameter at a time (pitch and duration is
obvious combination)
– For best-match searching, indexing still no problem
– For exact match searching, makes indexing harder
• Polyphony makes indexing much harder
– Byrd & Crawford (2002): “Downie speculates that ‘polyphony will
prove to be the most intractable problem [in music IR].’ We would
[say] polyphony will prove to be the source of the most intractable
problems.”
• Polyphony and multiple parameters is particularly nasty
– Techniques required are quite different from text
– First published research less than 10 years ago
• Indexing polyphonic music discussed
– speculatively by Crawford & Byrd (1997)
– in implementation by Doraisamy & Rüger (2001)
• Used n-grams for pitch alone; duration alone; both together
21 Mar. 06
47
Efficiency in More Complex Music Searches (2)
• Alternative to indexing: signature files
• Signature is a string of bits that “summarizes” document (or
passage)
• For text IR, inferior to inverted lists in nearly all real-world
situations (Witten et al., 1999)
• For music IR, tradeoffs can be very different
• Audio fingerprinting systems (at least some) use signatures
– Special case: always a known item search
– Very fast, e.g., Shazam…and new MusicDNS (16M tracks!)
• No other research yet on signatures for music (as far as I
know)
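• A rough Python sketch of the superimposed-coding idea behind signature files (bit width and hashing choices here are arbitrary): each term sets a few bits of a fixed-width signature, and a query can match only documents whose signature contains all the query’s bits, with false positives checked afterwards:

WIDTH = 64                                     # signature width in bits

def signature(terms, bits_per_term=3):
    sig = 0
    for t in terms:
        for k in range(bits_per_term):         # set a few pseudo-random bit positions per term
            sig |= 1 << (hash((t, k)) % WIDTH)
    return sig

doc_sigs = {1: signature(["dog", "galaxy"]), 2: signature(["music", "fugue"])}
q = signature(["fugue"])
candidates = [d for d, s in doc_sigs.items() if q & s == q]   # superset test; may include false positives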
29 March 06
48
Hofstadter on indexing (and “aboutness”!) in text
• In Le Ton Beau de Marot:
• “My feeling is that only the author (and certainly not a computer
program) can do this job well. Only the author, looking at a given
page, sees all the way to the bottom of the pool of ideas of which the
words are the mere surface, and only the author can answer the
question, ‘What am I really talking about here, in this paragraph, this
page, this section, this chapter?’”
• “Aboutness” = underlying concepts; for music, musical ideas
• Really applies to any searching method, with real indexing or not
29 March 06
49
Data Quality in the Real World
• Real World => very large databases, updated frequently
• => not high quality data, no manual massaging
– Music-ir list discussion (2001) included Dunning’s explanation of
why extensive (or any!) manual massaging is out of the question in
many situations
– “We [MusicMatch] have 1/8 to 1/4 full time equivalent budgeted to
support roughly 15 million users who listen to music that [never]
has and never will have sufficient attention paid to it to allow
careful attention by taxonomists.”
• Applies to content as well as metadata
– JHU/Levy project approach to content-based searching: do note-by-note
matching, but assume music marked up with “important” notes identified
– Doubtful this is viable in many situations!
5 March
50
Finding Themes Automatically: Themefinder
• Themefinder (www.themefinder.org)
• Has repertories: classical, folksong, renaissance
– Classical is probably a superset of Barlow & Morgenstern’s
– They won’t say because of IPR concerns!
• Allows searching by
– Pitch letter names (= Barlow & Morgenstern Dictionary)
– Scale degrees
– Detailed contour
– Rough contour (= Parsons Directory)
– Etc.
31 March 06
51
Finding Anything “Automatically”: Humdrum (1)
• Humdrum (dactyl.som.ohio-state.edu/Humdrum)
• Can use with many encodings
– Organized by spine: similar to a staff
– kern (CMN) is most important
– …but others include tablature
– Spines can be in different encodings
– Much encoded music available in kern
• Allows searching by anything…
– “Themefinder provides a web-based interface to the Humdrum
thema command”
• But not just searching!
– UNIX-based general toolkit for “analysis” (in very broad sense) of
symbolic music
• General & no available GUI => hard to learn & use
– Kornstadt’s Jring adds a GUI, but probably not available
31 March 06
52
Finding Anything “Automatically”: Humdrum (2)
• Has been used in dozens of published papers
• Examples
• kern notation: one staff/spine, multiple staff/spine
– http://dactyl.som.ohio-state.edu/Humdrum/guide02.html
– http://dactyl.som.ohio-state.edu/Humdrum/guide13.html
• A Fretboard searching example, demonstrating that Humdrum can use
other representations
– http://www.music-cog.ohio-state.edu/Humdrum/guide25.html#similar_fret_patterns
• Converting kern to MIDI: shows that Humdrum can work with MIDI
– http://www.music-cog.ohio-state.edu/Humdrum/commands/midi.html
31 March 06
53
A Real-World Music Information Need 1
• Hofstadter’s violin-concerto identification problem (2000)
– Both content and metadata; concentrate on content now
– “Here's a music-retrieval question. Last movement of a romantic violin concerto
contains something like this. And it's going to be hard to sing it, but I'll try. [fast, in
groups of six]
– “ DAH-DAH-DAH-DAH-DAH-DAH, DUH-DUH-DUH-DUH-DUH-DUH...I can't
even sing it. DAH-DAH-DAH-DAH-DAH, DAH...
– “ But the melody part goes like this [much slower]:
– “ DAH DEE DUH, DAH DAH DAH DAH DEE DEE DUH,
– “ DAH DAH DAH DAH DAH DAH DUH, [much more of this]
– “ And then [fast, very similar to the first stuff]
– “ DAH-DAH-DAH-DAH-DAH-DAH, DUH-DUH-DUH-DUH-DUH-DUH, [2nd
group is repeated notes]... DAH-DAH-DAH...
– “ And it's like Brahms or Bruch or... Who knows.”
5 March
54
A Real-World Music Information Need 2
7 March, rev. 31 March
55
A Real-World Music Information Need 3
7 March
56
A Real-World Music Information Need 4
• Obviously enough information to identify it
• Hofstadter said:
– “Last movement of a romantic violin concerto contains something like
this... And it’s like Brahms or Bruch or... Who knows.”
– In a later conversation, he said he thought it might be the Saint-Saens
Introduction and Rondo Capriccioso
• Tim Crawford said early on it was Bruch
• And the winner is... Bruch: Violin Concerto no. 1 in G minor, Op. 26,
III, 2nd theme
• In 2000, but identifying by computer hopeless even now
• By human (Don), took months to identify positively. Why?
– Melodic part started in middle of theme
– Don misunderstood rhythm, partly because 1st note wrong
– Passagework part inaccurate
– Other: ??
5 March
57
Downie’s View of Music IR: Facets 1
• Downie (2003): “Music information retrieval” survey
• Downie is a library scientist
• “Facets of Music Information: The Multifaceted
Challenge”
1. Pitch: includes key
2. Temporal: tempo, meter, duration, accents => rhythm
3. Harmonic
4. Timbral
But “Orchestration [is] sometimes considered bibliographic”
5. Editorial: performance instructions, including dynamics
6. Textual: lyrics and libretti
7. Bibliographic: title, composer, editor, publisher, dates, etc.
Only facet that is not from content, but about it = metadata
12 March
58
Note Parameters (Review)
• Four basic parameters of a definite-pitched musical note
1. pitch: how high or low the sound is: perceptual analog of
frequency
2. duration: how long the note lasts
3. loudness: perceptual analog of amplitude
4. timbre or tone quality
– Above is decreasing order of importance for most Western music
– Also (more-or-less) decreasing order of explicitness in CMN
12 March
59
Downie’s View of Music IR: Facets 2
• Cf. “Classification: Surgeon General’s Warning”
• Downie’s facets compared to “Four basic parameters”
1. Pitch          -> 1. Pitches in “sequence”
2. Temporal       -> 2. Durations in “sequence”
3. Harmonic       -> 1. Pitches simultaneously
4. Timbral        -> 4. Timbre
5. Editorial      -> 3. Loudness—and timbre, duration (, pitch?)
6. Textual        -> (none)
7. Bibliographic  -> (none)
12 March
60
Downie’s View of Music IR: Other “Multi”s
• The Multirepresentational Challenge
– Related to conversion among basic representations
– Problems aggravated by Intellectual Property Rights (IPR) issues
• The Multicultural Challenge
– Vast majority of music-IR work deals with Western CP music
• The Multiexperiential Challenge
– Questions about user groups/priorities, similarity, relevance, etc.
• The Multidisciplinary Challenge
– Music IR involves audio engineering, musicology, computer
science, librarianship, etc.
12 March
61
Downie’s View of Music IR: Types of Systems
• Representational Completeness and Music-IR Systems
– Degree of representational completeness = no. of facets: depth
– Number of works in database: breadth
• Analytic/Production Music-IR Systems
– More depth, less breadth
– Examples: Humdrum, ESAC/Essen (source of MELDEX data)
• Locating Music-IR Systems
– Less depth, more breadth
– Examples: Barlow & Morgenstern, Parsons (1 facet), Themefinder,
RISM
12 March
62
Intellectual Property Rights (IPR) 1
• IPR is huge problem for nearly all music information
technology including IR, both research and ordinary use
– No one knows the answers! Different in different countries!
– Cf. Levering (2000) for U.S. situation
• For music, U.S. copyright is complex “bundle of rights”
– mechanical right: right to use work in commercial recordings,
ROMs, online delivery to public for private use
– synchronization right: right to use work in audio/visual works
including movies, TV programs, etc.
– More complex than for normal text works because music is a performing art
• U.S. Constitution: balance rights of creators and public
– After some period of time, work enters Public Domain
– Period of time has been getting longer and longer
26 March, rev. 15 April
63
Intellectual Property Rights (IPR) 2
• Law supposed to balance rights of creators & public, but…
– “To achieve these conflicting goals and serve the public interest requires a
delicate balance between the exclusive rights of authors and the long-term
needs of a knowledgeable society.” —Levering
– Sonny Bono Copyright Extension Act: 70 years after death!
– Digital Millennium Copyright Act (DMCA), etc.
• “Fair Use”: limit on exclusive rights of copyright owners
– Traditionally used for brief excerpts for reviews, etc.
– Helpful, but not well-defined. In U.S., four tests:
1. Purpose and character of use, including if commercial or nonprofit
2. Nature of copyrighted work
3. Amount and substantiality of portion used relative to work as a whole
4. Effect of use on potential market for or value of copyrighted work
• Other aspects of law
– Educational exemptions
26 March, rev. 15 April
64
Intellectual Property Rights (IPR) 3
• IPR in practice
– NB I’m not a lawyer!
– Mp3.com
– Napster, Gnutella, FreeNet
– Church choir director arranged, performed in church, donated to
publisher => sued
• Example: Student wants to quote brief excerpts from
Beethoven piano sonatas in a term paper, in notation
• Do they need permission from owner?
– NB I’m not a lawyer!
– Beethoven has been dead for more than 70 years => all works in
Public Domain
– …but not all editions!
– Still, don’t need permission because Fair Use applies
– For recording, probably not P.D., but Fair Use applies
26 March, rev. 15 April
65
Building Symbolic Music Collections
– Direct encoding may be best
• Most or all existing collections done this way
• But in what representation?
• No standard => often have to convert
• Starting with OMR and polishing may be as good, and faster
– Optical Music Recognition (OMR)
• First commercially available via Nightingale’s NoteScan
• Fairly widespread, e.g., in Finale; SharpEye => MusicXML
• Reasonably useful but not as reliable as OCR
• As technology improves, likely to get more reliable
– Audio Music Recognition (AMR): a great idea, but...
• Christopher Raphael (1999): AMR is “orders of magnitude
more difficult” than OMR
• Gerd Castan (2003): “There is no such thing as a good
conversion from audio to MIDI. And not at all with a single
mouse click.”
26 March
66
OMR at Its Best
Here's the original:
Scanned into Finale: Only 5 easy edits needed.
Taken from http://www.codamusic.com/finale/scanning.asp
16 April
67
Music Collections: Current and Prospective 1
• Research-only vs. user collections
– IPR problem is serious even for research only!
• Terminology: collection vs. database; corpus = collection?
• Cf. my list of candidate test collections
• Symbolic: most interesting/important include CCARH,
MELDEX folksongs, Themefinder, Classical MIDI Arch.
– Commercial collections (e.g., Sunhawk) are dark horses
• Images (OMR => symbolic!): most interesting/important
for us include Variations2, JHU/Levy, CD Sheet Music
4 April
68
Music Collections: Current and Prospective 2
• Audio: full Naxos catalog via UIUC/NCSA project?
– 4000(?) CDs x 650 MB => several terabytes!
• Parallel corpora
• RWC databases: start from scratch => no IPR problems
– Nice idea, but very expensive—RWC is tiny
• Limitations & pitfalls: size, quality (cf. Huron), repertoire
4 April
69
Nightingale and Extra Notes (Problem 2)
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Theme & Var. 1
70
Nightingale and Independent Voices (Problem 2)
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Variation 2
71
2-D Pattern Matching in JMS and Extra Notes
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Theme & Var. 1
72
2-D Pattern Matching in JMS and Parallel Voices
Mozart: Variations on “Ah, vous dirai-je,
Maman” for piano, K. 265, Variation 2
73
Music Not Written in CMN by Dead European
Men of the Last Few Centuries 1
• Informal genre identification
– Try with c. 1 sec., 5-10 sec. (vs. Tzanetakis’ 250 msec.)
• “This is all about dead Europeans, and they’re great. But
we are not dead Europeans!” —David Alan Miller,
conductor of the Albany Symphony Orchestra, c. 1990
• How does content-based searching of other music (world
and other!) pose different problems from music in CMN by
Europeans of (say) 15th thru early 20th centuries?
• Possible solutions to those problems?
9 April
74
Music Not Written in CMN by Dead European
Men of the Last Few Centuries 2
• Examples were:
1. “After a Pygmy chant of Central Africa”, arr. by Marie Daulne,
words by Renaud Arnal: Mupepe [recorded by Zap Mama]. On
Adventures in Afropea 1 [CD].
2. Eminem: Without Me. On The Eminem Show [CD].
3. Hildegard von Bingen (12th century): O virga ac diadema
[recorded by Anima]. On Sacred Music of the Middle Ages [CD].
4. Guru Bandana [rec. by Asha Bhosle, Ali Akbar Khan, Swapan
Chaudhuri]. On Legacy: 16th-18th century music from India [CD].
5. Duke Ellington: Sophisticated Lady. On Duke Ellington - Greatest
Hits [CD].
6. Iannis Xenakis (1960): Orient-Occident. On Xenakis: Electronic
Music [CD].
7. Beatriz Ferreyra: Jazz’t for Miles. On Computer Music Journal
Sound Anthology, vol. 25 (2001) [CD].
9 April
75
Music Not Written in CMN by Dead European
Men of the Last Few Centuries 3
• How does content-based searching of other music (world
and other!) pose different problems from music in CMN by
Europeans of (say) 15th thru early 20th centuries?
– Different textures
– Emphasis on different parameters of notes
– …if there are notes!
– “Pieces” aren’t well-defined (improvisation, etc.)
• Possible solutions to those problems?
– Consider texture, e.g., oblique motion, pedal tones
– Consider text (words)…
– Or at least language of text
9 April
76
Music IR as Music Understanding
• Dannenberg (ISMIR 2001 invited paper)
• Argues central problem of music IR is music
understanding
• …also basis for much of computer music (composition &
sound synthesis) and music perception and cognition
– “A key problem in many fields is the understanding and
application of human musical thought and processing”
• Related problems he’s worked on
– Computer accompaniment (became Coda’s Vivace)
• Score following
• Ensemble accompaniment
– Improvisational style classification
• DAB: No understanding yet; sidestep intractable problems!
14 April
77
NightingaleSearch: Dialog
16 April
78
NightingaleSearch: Overview
• Dialog options (too many for real users!) in groups
• Main groups:
Match pitch (via MIDI note number)
Match duration (notated, ignoring tuplets)
In chords, consider...
• Search for Notes/Rests searches score in front window: one-at-a-time
(find next) or “batch” (find all)
• Search in Files versions: (1) search all Nightingale scores in a given
folder, (2) search a database in our own format
• Does passage-level retrieval
• Result list displayed in scrolling window; double click => show match
in document
79