Algorithms for pattern discovery and pitch spelling in music

Download Report

Transcript Algorithms for pattern discovery and pitch spelling in music

Algorithms
for pattern discovery
and pitch spelling in music
David Meredith
Goldsmiths College
University of London
Overview of Research Interests

Music information retrieval
• Managing musical data and retrieving useful information
from it

Automatic music transcription
• Computing a score from a recording of a musical passage

Computational music cognition and analysis
• Constructing computational models that extract structures
that listeners hear and music analysts find interesting

Evaluation
• sound methodologies and “gold-standard” test collections
Musical pattern discovery
and pitch spelling

Musical pattern discovery
• Finding themes and other perceptually important
repeated patterns
• Useful for indexing in music information retrieval

Pitch spelling
• Predicting the pitch names (e.g., C#4, B@3) of
notes in a “piano-roll” representation (e.g., MIDI)
• Essential for transcribing music from MIDI (or
audio) to notation
Uses of musical pattern
discovery algorithms

Indexing
• Store themes, motives and other memorable patterns in
index to enable sub-linear retrieval times

Transcription and music analysis
• Beat tracking and metrical structure analysis - similar
patterns have similar metrical structure
• Grouping and phrasing - “parallellism” (Lerdahl and
Jackendoff, 1983) most important factor in grouping

Composer’s assistant, automatic improvisation
• Cure composer’s block by suggesting new material based
on patterns discovered in music already written
• Automatically create new music that develops themes
discovered in music already played
Importance of repeated patterns
in music analysis and cognition

Schenker (1954. p.5):
• repetition “is the basis of music as an art”

Bent and Drabkin (1987, p.5):
• “the central act” in all forms of music analysis is
“the test for identity”

Lerdahl and Jackendoff (1983, p.52):
• “the importance of parallelism [i.e., repetition] in
musical structure cannot be overestimated. The
more parallelism one can detect, the more
internally coherent an analysis becomes, and the
less independent information must be processed
and retained in hearing or remembering a piece”
Most musical repetitions are
neither perceived nor intended
Interesting musical repetitions
are structurally diverse
Want to discover all and only interesting
repeated patterns
 Class of interesting repeated patterns is
structurally diverse because

• patterns vary widely in structural characteristics
• many ways of transforming a musical pattern to
give another pattern that is perceived to be a
version of it
• e.g., truncated, augmented, diminished, inverted,
embellished and even reversed
Example of repeated motive
Example of thematic
transformation
String-based algorithms for
discovering musical patterns

Most previous approaches assume music
represented as strings
• each string represents a voice or part
• each character represents a note or an interval
between two consecutive notes in a voice

Similarity between two patterns measured
in terms of edit distance calculated using
dynamic programming
• see, e.g., Lemstrom (2000), Hsu et al. (1998),
Rolland (1999)
Problems with the string-based
approach - Edit distance
B is an embellished version of A
 If both patterns represented as strings

each symbol represents pitch of note
 then edit distance between A and B is 9


If allow pattern with 9 differences to count
as a match, then get many spurious hits
Problems with string-based
approach - Polyphony

If searching polyphonic music and
• do not know voice to which each note belongs (e.g., MIDI
format 0 file); or
• interested in patterns containing notes from 2 or more voices

then
• combinatorial explosion in number of possible string
representations
• if don’t use all possible representations then may not find all
interesting patterns
Using multidimensional point
sets to represent music (1)
Using multidimensional point
sets to represent music (2)
SIA - Discovering all maximal
translatable patterns (MTPs)
Pattern is translatable by vector v in dataset if it can be
translated by v to give another pattern in the dataset
MTP for a vector v contains all points mapped by v onto
other points in the dataset
O(kn2 log n) time, O(kn2) space
O(kn2) average time with hashing (Lemstrom)
SIATEC - Discovering all
occurrences of all MTPs
Absolute running times of SIA
and SIATEC




SIA and SIATEC implemented in C
run on a 500MHz Sparc on 52 datasets (6≤n≤3456,
2≤k≤5)
< 2 mins for SIA to process piece with 3500 notes
13 mins for SIATEC to process piece with 2000 notes
Need for heuristics to isolate
interesting MTPs


2n patterns in a dataset of size n
SIA generates < n2/2 patterns
• => SIA generates small fraction of all patterns in a dataset


Many interesting patterns derivable from
patterns found by SIA
BUT many of the patterns found by SIA are
NOT interesting
• 70,000 patterns found by SIA in Rachmaninoff’s Prelude in
C# minor
• probably about 100 are interesting

=> Need heuristics for isolating interesting
patterns in output of SIA and SIATEC
Heuristics for isolating musical
themes and motives
Cov=6
CR=6/5
Cov=9
CR=9/5
Comp = 1/3
Comp = 2/5
Comp = 2/3
Coverage Number of point s covered by occurrences of the pattern
Compactness =
Number of point s in patt ern
Number of point s in region spanned by pattern
Compression rat io
Coverage
Number of point s in patt ern
+ Freq. of occurrence of patt ern
-1
COSIATEC - Data compression
using SIATEC
Start
Dataset
SIATEC
List of <Pattern, Translator_set> pairs
Print out best pattern, P, and its translators
Remove occurrences of P from dataset
Is dataset empty?
Yes
End
No
Using COSIATEC for finding
themes and motives in music
First iteration
Second iteration
SIAM - Pattern matching using
SIA
Query pattern
Dataset
O(knm log(nm)) time
 O(knm) space
 O(knm) average
time with hashing

Improving SIAM - Ukkonen,
Lemström & Mäkinen (2003)




Use sweepline-like scanning of the dataset
(Bentley and Ottmann, 1979)
Generalized to approximate matching of sets of
horizontal line-segments
Improved running time to O(mn log m) (without
hashing) and working space to O(m)
Implemented as algorithm P2 on C-BRAHMS
demo web site
• <http://www.cs.helsinki.fi/group/cbrahms/demoengine/>
Improving SIAM - Clifford (In
preparation)





Finds best match in O(n log n) time
Reduce problem to one dimension by
randomised projection
Reduce length of problem by uniform hashing
Perform pattern matching using FFTs
Find best match and check in O(m) time exactly
how many points match at the location that can
be inferred from this match
Pitch spelling algorithms (1)
Pitch spelling algorithms (2)
Pitch spelling in tonal music
(Piston, 1978, p.8)
Pitch name depends on harmonic structure
and voice-leading structure
 Pitch name chosen so that score correctly
represents the way the music is intended to
be perceived and interpreted

Comparative analysis of pitch
spelling algorithms

Algorithms analysed, evaluated and (in
some cases) improved
•
•
•
•
•

Longuet-Higgins (1976, 1987, 1993)
Cambouropoulos (1996,1998, 2001, 2003)
Temperley (2001)
Chew and Chen (2003, 2005)
Meredith (2003, 2005, 2006)
Test corpus
• 195972 notes, 216 movements, 8 baroque and
classical composers
• almost exactly equal number of notes (24500) for
each composer
Evaluation criteria and
performance metrics

Evaluation criteria
• Spelling accuracy - how well an algorithm predicts the pitch
names
• Style dependence - how much spelling accuracy depends
on style

Performance metrics
• Note accuracy - proportion of notes in corpus spelt correctly
• Style dependence - standard deviation of note accuracies
over 8 composers

Robustness to temporal deviations
• Best versions of algorithms also run on version of test
corpus in which onsets and durations were randomly
adjusted
Longuet-Higgins’s algorithm
(1976,1987,1993)

Uses 6 rules to predict pitch names
• Rule 1: pitch names as close to tonic on line of fifths
• Rules 2-6: deal with chromatic intervals and key changes
• Rule 2 incorrectly implemented in music.p

6 versions of algorithm tested
• Original and two versions with Rule 2 “corrected”
• Same three algorithms with pitch names not restricted to being
between G double sharp and A double flat

Two versions of test corpus
• Voices arranged “end-to-end” (should be better)
• Voices “interleaved” with notes sorted by onset and pitch
Longuet-Higgins’s algorithm Results
Correcting Rule 2 implementation lowered
note accuracy
 Made half as many errors when voices
end-to-end
 Allowing pitch names to be anywhere on
the line of fifths doubled number of errors
 Original version performed best
(NA = 98.21%; SD = 1.79)

Cambouropoulos’s algorithm
(1996,1998,2001,2003)



Three published versions of algorithm
Input changed to sequence of MIDI note numbers
Shifting overlapping window
• improves running time and avoids boundary errors

Computes all spellings for each window
• 128 spellings for each 9-note window

Spelling penalised if
• contains intervals that are rare in tonal scales
• contains double sharps or double flats
Cambouropoulos’s algorithm Evaluation

18 ways in which two versions of the algorithm
could differ
• e.g., variable or fixed length window

26 versions implemented and tested
• goal to estimate optimal combination of variable features

Window: Variable-length better than fixed-length
• Best variable-length window version: NA = 99.07%; SD = 0.46

Increasing window size
• increases accuracy but exponentially increases running time
• 12 note window is practical maximum

Algorithm with ‘optimal’ combination of features:

NA = 99.15%; SD = 0.47
Temperley and Sleator’s pitch
spelling algorithm (2001)
Temperley and Sleator’s
algorithm - Evaluation

Output of meter program depends on tempo


Best on natural tempo or half-speed corpora



System tested on 6 versions of corpus, each with
different tempo
NA = 99.30%; SD = 1.13 (without enh. change)
NA = 97.79%; SD = 4.57 (with enh. change)
Highly sensitive to tempo

at 4 times natural tempo, NA = 74.58%
• worse than just spelling all black notes randomly as either
sharp or flat!

Simple implementation of TPR 1 alone achieved

NA = 99.04%; SD = 0.65
Chew and Chen’s algorithm
(2003,2005)



Based on “spiral array” = line of fifths
coiled up
Tonic represented by center of effect =
Centroid of positions in spiral array of
pitch names in preceding window
First spelt so close to global CE, then
re-spelt so close to weighted average of
local and cumulative CEs
Chew and Chen’s algorithm Evaluation

New implementation allows user to
• use line of fifths instead of spiral array
• consider notes starting in each window instead of notes
sounding in each window when computing CEs
• change aspect ratio of spiral array


Run 1296 times on test corpus, each time with
different parameter value combination
Best 12 versions scored NA=99.15%, SD=0.4
• worked best when all three CEs used, local and cumulative
CEs weighted equally and chunks small

Line of fifths worked just as well as the spiral
array
PS13s1
(Meredith, 2003,2005,2006)



Pitch name implied by a tonic is one that is closest to the
tonic on the line of fifths
Strength with which tonic implied proportional to
frequency of occurrence
Strength with which pitch name implied proportional to
sum of frequencies of occurrence of tonics implying pitch
name
PS13s1 - Results

Takes two parameters:
• Precontext (Kpre): number of notes preceding note to be
spelt included in context
• Postcontext (Kpost): number of notes following note to be
spelt included in context

PS13 run with all values of Kpre and Kpost
between 1 and 50
• PS13s1 run with 17 best values obtained with PS13
• Made 15-19% fewer errors than PS13 for these parameter
values

Some results:
• NA = 99.44%, SD = 0.49 (Kpre=10, Kpost=42)
• NA = 99.44%, SD = 0.45 (Kpre=33, Kpost=25)
• NA = 99.19%, SD = 0.51 (Kpre=40, Kpost=1)
Summary of pitch spelling
results
Algorithm
PS13s1x
Temperley*
Chew and Chen+
Cambouropoulos+
Longuet-Higgins§
xK =
pre
Clean corpus Noisy corpus
NA%
SD NA%
SD
99.44 0.45 99.41 0.50
99.27 1.30 97.43 1.69
99.15
99.15
98.21
0.42
0.47
1.79
99.12
99.07
98.25
33, Kpost= 25
*Two-pass, natural tempo corpus, without enh. change
+New optimized versions
§Only when music processed a voice at a time.
0.47
0.53
1.71
Future work

Pattern discovery and pattern matching
• Compare SIA algorithms with methods developed in other
more mature fields (e.g., computer vision, graph matching)
• Improve time complexity of SIA algorithms with advanced
algorithmic techniques (e.g., randomized projection,
hashing)
• Adapt algorithms for approximate matching and scaling
(matching at different tempi)
• Adapt SIA and SIATEC for early pruning of uninteresting
patterns

Pitch spelling
• Incorporate PS13s1 into complete MIDI-to-notation
transcription system
• Use PS13s1 for key-tracking and harmonic analysis
• Use PS13s1 for feature extraction on audio data
Acknowledgements
and further details

Thanks to
• Chris Bishop and Stephen Robertson for inviting
me to give a talk
• Geraint Wiggins for suggesting SIAM
• Kjell Lemstrom for developing SIAM further
• Raphael Clifford for developing SIAM further still
• EPSRC for funding
• GR/S17253/02, GR/N08049/01

Further details:

http://www.titanmusic.com