
LANGUAGE AND INTELLIGENCE
UNIVERSITY OF PISA
DEPARTMENT OF COMPUTER SCIENCE
Automatic Transcription of Piano Music
Sara Corfini
AUTOMATIC TRANSCRIPTION OF PIANO MUSIC - SARA CORFINI
1
INTRODUCTION
Transcribing recordings of piano music into a MIDI representation

MIDI provides a compact representation of musical data

Score-following for computer-human interactive performance

“Signal-to-score” problem
A hidden Markov model approach to piano music transcription

A “state of nature” can be realized through a wide range of
data configurations

Probabilistic data representation

Automatically learning this probabilistic relationship is more
flexible than optimizing a particular model

Rules describing the musical structure can be more accurately
represented as tendencies
THE MODEL
The acoustic signal is segmented into a sequence of frames
(“snapshots” of sound)
For each frame a feature vector is computed, yielding the observed sequence y1,…,yN
Goal → to assign a label to each frame describing its content
A generative probabilistic framework (a hidden Markov model)
outputs → the observed sequence of feature vectors y1,…,yN
hidden variables → the labels
A hidden Markov model is composed of two processes X = X1,…,XN and Y = Y1,…,YN
X is the hidden (or label) process and describes the way a sequence of frame labels can evolve (a Markov chain)
We do not observe the X process directly, but rather the feature vector data
The likelihood of a given feature vector depends only on the corresponding label
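The two coupled processes can be sketched as a tiny generative model. This is a minimal Python illustration, not the paper's actual model: the three-label state set, the transition matrix, and the Gaussian emission parameters below are all invented for the example.

```python
import random

# Hypothetical miniature label set: an attack state, a sustain state, a rest.
STATES = ["attack", "sustain", "rest"]

# p(x'|x): each row is the transition distribution out of one state.
TRANS = {
    "attack":  {"attack": 0.1, "sustain": 0.8, "rest": 0.1},
    "sustain": {"attack": 0.1, "sustain": 0.7, "rest": 0.2},
    "rest":    {"attack": 0.4, "sustain": 0.0, "rest": 0.6},
}

# Emission: each state produces a 1-D feature (e.g. energy) from a
# state-characteristic Gaussian -- the HMM output assumption.
EMIT = {"attack": (5.0, 1.0), "sustain": (3.0, 0.5), "rest": (0.5, 0.2)}

def sample(n_frames, start="rest", seed=0):
    """Generate a hidden label sequence X and observed features Y."""
    rng = random.Random(seed)
    x, labels, feats = start, [], []
    for _ in range(n_frames):
        labels.append(x)
        mu, sigma = EMIT[x]
        feats.append(rng.gauss(mu, sigma))   # Yn depends only on Xn
        # draw the next label from p(.|x)
        r, acc = rng.random(), 0.0
        for nxt, p in TRANS[x].items():
            acc += p
            if r < acc:
                break
        x = nxt
    return labels, feats

labels, feats = sample(10)
```

Transcription then amounts to inverting this generative story: recovering the label sequence from the observed features.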
THE LABEL PROCESS
GOAL → to assign a label to each frame, where each label ∈ L
Components of the label

the pitch configuration (chord)

“attack”, “sustain”, “rest” portions of a chord
We define a random process (a Markov chain) X1,…,XN that takes
value in the label set L
The probability of the process occupying a certain state (label) in a given frame depends only on the preceding state (label):
P(X1 = x1, …, XN = xN) = P(X1 = x1) p(x2|x1) ··· p(xN|xN-1)
where p(x'|x) is the transition probability matrix
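The Markov factorization can be checked numerically. A minimal sketch, with an invented two-label chain and transition matrix:

```python
# Toy two-label Markov chain (labels and probabilities are invented).
INIT = {"note": 0.5, "rest": 0.5}              # P(X1 = x)
P = {                                          # p(x'|x)
    "note": {"note": 0.9, "rest": 0.1},
    "rest": {"note": 0.3, "rest": 0.7},
}

def path_probability(path):
    """P(X1=x1) * prod_n p(x_n | x_{n-1}) -- the Markov factorization."""
    prob = INIT[path[0]]
    for prev, cur in zip(path, path[1:]):
        prob *= P[prev][cur]
    return prob

p = path_probability(["rest", "note", "note", "rest"])
# 0.5 * 0.3 * 0.9 * 0.1 = 0.0135
```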
THE LABEL PROCESS
Markov model for a single chord
Markov model for recognition problem

the final state of each chord model
is connected to the initial state of
each chord model

a silence model is constructed for the recorded sound before and after
the performance
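A sketch of this topology, assuming a hypothetical three-chord inventory (the chord names and the state-naming scheme are invented; the real system has one left-to-right model per possible chord):

```python
# Sketch of the recognition graph: each chord gets a small left-to-right
# model, the final state of every chord model connects to the initial
# state of every chord model, and a silence model frames the performance.

CHORDS = ["C", "Cmaj", "G7"]  # hypothetical chord inventory

def build_graph(chords):
    edges = set()
    for ch in chords:
        # left-to-right chord model: attack -> sustain -> final
        edges.add((f"{ch}/attack", f"{ch}/sustain"))
        edges.add((f"{ch}/sustain", f"{ch}/sustain"))   # self-loop: duration
        edges.add((f"{ch}/sustain", f"{ch}/final"))
    for src in chords:           # final state of each chord model connects
        for dst in chords:       # to the initial state of each chord model
            edges.add((f"{src}/final", f"{dst}/attack"))
    for ch in chords:            # silence before and after the performance
        edges.add(("silence_start", f"{ch}/attack"))
        edges.add((f"{ch}/final", "silence_end"))
    return edges

graph = build_graph(CHORDS)
```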
THE OBSERVABLE PROCESS
Rather than observe the label process x1,…,xN, we observe feature
vector data y1,…,yN (probabilistically related to labels)
Assumption of HMM → each visited state Xn produces a feature vector Yn from a distribution that is characteristic of that state
Hence, given Xn, Yn is conditionally independent of all other frame labels and all other feature vectors
THE OBSERVABLE PROCESS
We compute a vector of features for each frame: y = (y1,…,yK)
The components of this vector are conditionally independent given the state
The states are tied → different states share the same feature distributions:
p(yk|x) = p(yk|Tk(x))
where the map Tk(x) is constructed by hand
Hence we have
p(y|x) = p(y1|T1(x)) ··· p(yK|TK(x))
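The tied factorization can be illustrated with a toy model. The tying maps Tk and the discrete feature distributions below are invented for the example; the real features are continuous:

```python
# Sketch of the tied output model p(y|x) = prod_k p(y_k | T_k(x)).

# T_k(x): map each label to the class whose distribution it shares.
TIE = [
    lambda x: 0 if x in ("silence", "rest") else 1,   # T1: energy classes
    lambda x: {"attack": 0, "sustain": 1}.get(x, 2),  # T2: burstiness classes
]

# p(y_k | T_k(x) = t) for a binary toy feature y_k in {0, 1}.
DIST = [
    {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    {0: {0: 0.1, 1: 0.9}, 1: {0: 0.7, 1: 0.3}, 2: {0: 0.95, 1: 0.05}},
]

def likelihood(y, x):
    """p(y|x): components are conditionally independent given the label."""
    p = 1.0
    for k, yk in enumerate(y):
        p *= DIST[k][TIE[k](x)][yk]
    return p

p_attack = likelihood((1, 1), "attack")   # high energy, high burstiness
p_rest = likelihood((0, 0), "rest")
```

Tying means only a handful of distributions must be learned per feature, rather than one per label.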
THE OBSERVABLE PROCESS
Tk(x) can be clarified by describing the computed features
y1 → measures the total energy in the signal (to distinguish between the times when the pianist plays and when there is silence)
T1(x) = 0 → for the silence and rest states
T1(x) = 1 → for the remaining states
Two probability distributions: p(y1|T1(x)=0) and p(y1|T1(x)=1)
Partition of the label set generated by T1(x): {x ∈ L : T1(x)=0}, {x ∈ L : T1(x)=1}
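A minimal sketch of an energy feature of this kind (the frames below are invented; the exact energy measure is not specified on the slide, so mean-square energy is used as a plausible stand-in):

```python
# Sketch: y1 as the mean-square energy of a frame -- the feature that
# separates silence/rest frames from sounding frames.

def frame_energy(frame):
    """Mean-square energy of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

silent = [0.001, -0.002, 0.001, 0.0]
played = [0.4, -0.35, 0.5, -0.45]

e_silent = frame_energy(silent)
e_played = frame_energy(played)
```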
THE OBSERVABLE PROCESS
y2  measures the local burstiness of the signal (to distinguish
between note “attacks” and steady state behaviour)
y2 computes several measures of burstiness (is a vector)
For this features, states can be partioned in three groups

T2(x) = 0
 states at the beginning of each note (high
burstiness)

T2(x) = 1

T2(x) = 2
 states corresponding to steady state behaviour
(relatively low burstiness)
 silence states
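The slide does not spell out the burstiness measures, so the sketch below substitutes one plausible stand-in: the frame-to-frame jump in short-term energy, which is large at note attacks and small during steady state.

```python
# Hypothetical burstiness measure -- an assumption, not the paper's actual
# feature: the absolute change in energy between consecutive frames.

def energy(frame):
    return sum(s * s for s in frame) / len(frame)

def burstiness(prev_frame, frame):
    """Absolute jump in energy between consecutive frames (a sketch)."""
    return abs(energy(frame) - energy(prev_frame))

quiet = [0.0] * 4
attack = [0.5, -0.5, 0.5, -0.5]
steady = [0.45, -0.45, 0.45, -0.45]

b_attack = burstiness(quiet, attack)      # silence -> note onset: high
b_steady = burstiness(attack, steady)     # sustained note: low
```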
THE OBSERVABLE PROCESS
y3,…,yK  concerns the problem of distinguishing between the many
possible pitch configuration
Each features of y3,…,yK is computed from a small frequency interval of
the Fourier transformed frame data
For each window we compute

the empirical mean  location of the harmonic (when there is a
single harmonic in the window)

the empirical variance  to distinguish probabilistically when there
is a single harmonic (low variance) and when there is not (high
variance)
State can be partinioned as

Tk(x) = 0  states in which no notes contain energy in the window
 Tk(x) = 1  states having several harmonics in the window
 Tk(x) = t  states having a single harmonic at approximately the
same frequency in the window
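A sketch of the per-window statistics, treating the magnitude spectrum inside a window as an unnormalized distribution over frequency bins (the spectra below are invented):

```python
# Sketch: empirical mean and variance of spectral energy inside one
# frequency window. A single harmonic concentrates energy in one bin
# (low variance); several harmonics spread it out (high variance).

def window_stats(freqs, mags):
    """Energy-weighted mean frequency and variance within a window."""
    total = sum(mags)
    mean = sum(f * m for f, m in zip(freqs, mags)) / total
    var = sum(m * (f - mean) ** 2 for f, m in zip(freqs, mags)) / total
    return mean, var

bins = [440.0, 445.0, 450.0, 455.0, 460.0]
single = [0.0, 0.1, 5.0, 0.1, 0.0]     # one harmonic: low variance
double = [3.0, 0.0, 0.0, 0.0, 3.0]     # two harmonics: high variance

m1, v1 = window_stats(bins, single)
m2, v2 = window_stats(bins, double)
```

Note both windows have the same mean, so the variance is what distinguishes them.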
TRAINING THE MODEL
Thanks to the HMM formulation, the probability distributions can be trained in an unsupervised fashion
An iterative procedure (the Baum-Welch algorithm) allows the model to be trained automatically from signal-score pairs
When the score is known, we can build a model for the hidden process
The algorithm
starts from a neutral starting place (we begin with uniformly distributed output distributions)
iterates the process of finding a probabilistic correspondence between model states and data frames
then retrains the probability distributions using this correspondence
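The iteration can be sketched on a toy discrete HMM: forward-backward computes the probabilistic state-frame correspondence (the posteriors gamma), and the output distributions are then retrained from it. The two-state model, symbols, and probabilities below are invented; the real system uses continuous, tied feature distributions.

```python
# Sketch of one Baum-Welch step on a toy discrete HMM.

def forward_backward(obs, init, trans, emit):
    """Return per-frame state posteriors gamma[n][state]."""
    states = list(init)
    # forward pass: alpha[n][x] ~ p(y1..yn, Xn=x)
    alpha = [{x: init[x] * emit[x][obs[0]] for x in states}]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append({x: emit[x][y] * sum(prev[z] * trans[z][x] for z in states)
                      for x in states})
    # backward pass: beta[n][x] ~ p(y_{n+1}..yN | Xn=x)
    beta = [{x: 1.0 for x in states}]
    for y in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {x: sum(trans[x][z] * emit[z][y] * nxt[z] for z in states)
                        for x in states})
    gamma = []
    for a, b in zip(alpha, beta):
        norm = sum(a[x] * b[x] for x in states)
        gamma.append({x: a[x] * b[x] / norm for x in states})
    return gamma

def reestimate_emissions(obs, gamma, symbols):
    """Retrain p(y|x) from the state-frame correspondence gamma."""
    states = list(gamma[0])
    emit = {}
    for x in states:
        mass = sum(g[x] for g in gamma)
        emit[x] = {s: sum(g[x] for g, y in zip(gamma, obs) if y == s) / mass
                   for s in symbols}
    return emit

obs = ["loud", "loud", "soft", "soft", "soft"]
init = {"note": 0.5, "rest": 0.5}
trans = {"note": {"note": 0.8, "rest": 0.2}, "rest": {"note": 0.2, "rest": 0.8}}
emit0 = {"note": {"loud": 0.5, "soft": 0.5},   # near-uniform starting place
         "rest": {"loud": 0.4, "soft": 0.6}}

gamma = forward_backward(obs, init, trans, emit0)
emit1 = reestimate_emissions(obs, gamma, ["loud", "soft"])
```

Repeating the two steps (correspondence, then re-estimation) is exactly the iterative procedure described above.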
TRAINING THE MODEL
Output distributions on feature vectors are represented through
decision trees
For each distribution p(yk|Tk(x)) we form a binary tree
Each non-terminal node corresponds to a question of the form yk,v < c (where yk,v is the vth component of feature yk and c is a threshold)
An observation yk is classified by dropping it down the tree, evaluating the question at each node it reaches
The process continues until it arrives at a terminal node, denoted by Qk(yk)
As the training procedure evolves, the trees are re-estimated at each
iteration to produce more informative probability distributions
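A sketch of dropping an observation down such a tree (the tree, its questions, and the leaf names are invented for the example):

```python
# A node is either ("ask", v, c, left, right) -- the question y[v] < c --
# or ("leaf", leaf_id), a terminal node Q_k(y_k).
TREE = ("ask", 0, 0.5,
        ("leaf", "low-energy"),
        ("ask", 1, 2.0,
         ("leaf", "smooth"),
         ("leaf", "bursty")))

def drop(y, node=TREE):
    """Return the terminal node reached by observation y."""
    while node[0] == "ask":
        _, v, c, left, right = node
        node = left if y[v] < c else right   # evaluate the node's question
    return node[1]

leaf = drop((0.9, 3.1))
```

Each terminal node then holds a probability estimate, so the tree as a whole represents a distribution over feature vectors.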
RECOGNITION
The traditional HMM approach to recognition seeks the most likely labeling of frames, given the data, through dynamic programming
This corresponds to finding the best path through the graph, where the reward for going from state xn-1 to xn at the nth iteration is given by p(xn|xn-1) p(yn|xn)
The Viterbi algorithm constructs the optimal paths of length n from the optimal paths of length n-1
The computational complexity grows with the square of the state-space size, which is completely intractable in this case
The state space is on the order of 10^8 states (under restrictive assumptions on the possible collection of pitches and the number of notes in a chord)
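The Viterbi recursion itself is standard; a minimal sketch on an invented two-state model (nothing like the 10^8-state chord graph):

```python
# Viterbi: extend optimal paths of length n-1 to length n, scoring each
# step by p(x_n | x_{n-1}) * p(y_n | x_n).

def viterbi(obs, init, trans, emit):
    states = list(init)
    # delta[x]: probability of the best length-1 path ending in x
    delta = {x: init[x] * emit[x][obs[0]] for x in states}
    back = []
    for y in obs[1:]:
        prev = delta
        delta, ptr = {}, {}
        for x in states:
            best = max(states, key=lambda z: prev[z] * trans[z][x])
            delta[x] = prev[best] * trans[best][x] * emit[x][y]
            ptr[x] = best
        back.append(ptr)
    # reconstruct the optimal path from the back-pointers
    last = max(states, key=lambda x: delta[x])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

obs = ["loud", "soft", "soft"]
init = {"note": 0.6, "rest": 0.4}
trans = {"note": {"note": 0.7, "rest": 0.3}, "rest": {"note": 0.3, "rest": 0.7}}
emit = {"note": {"loud": 0.8, "soft": 0.2}, "rest": {"loud": 0.1, "soft": 0.9}}

best_path = viterbi(obs, init, trans, emit)
```

The inner loop over state pairs is the quadratic cost that makes the full chord graph intractable.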
RECOGNITION
We use the data model constructed in the training phase to produce a condensed version of the state graph
For each frame n we perform a greedy search that seeks a plausible collection of states x ∈ L for that frame
This is accomplished by searching for states x giving large value to p(yn|x). The search is performed by
finding the most likely 1-note hypotheses
then considering 2-note hypotheses, and so on
Each frame n will be associated with a plausible collection of states An
The per-frame sets are blended by letting Bn = An-1 ∪ An ∪ An+1
The graph is constructed by restricting the full graph to the sets Bn
Disadvantage → if the true state at frame n is not captured by Bn, then it cannot be recovered during recognition
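A sketch of the greedy widening search, with an invented scoring function in place of p(yn|x) and a four-note toy alphabet:

```python
# Greedy search: start from the most likely 1-note hypotheses, then grow
# the best candidates by one note at a time, keeping a small set A_n of
# plausible states per frame. The beam width and scoring are invented.

NOTES = ["C4", "E4", "G4", "B4"]

def top(hyps, score, beam):
    return sorted(hyps, key=score, reverse=True)[:beam]

def greedy_search(score, max_notes=3, beam=2):
    """Return the hypotheses (note sets) kept at each widening step."""
    hyps = top([frozenset([n]) for n in NOTES], score, beam)
    found = list(hyps)
    for _ in range(max_notes - 1):
        grown = {h | {n} for h in hyps for n in NOTES if n not in h}
        hyps = top(grown, score, beam)
        found.extend(hyps)
    return found

# Toy score: hypotheses close to a "true" chord {C4, E4, G4} score high;
# in the real system this role is played by p(y_n | x).
TRUE = {"C4", "E4", "G4"}
score = lambda h: len(h & TRUE) - 2 * len(h - TRUE)

A_n = greedy_search(score)
```

The search never enumerates all note combinations; it only widens from the current best candidates, which is both its speed and its disadvantage.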
EXPERIMENTS
The hidden Markov model has been trained by using data taken from
various Mozart piano sonatas
The results concern a performance of Sonata 18, K.570
Objective measure of performance → edit distance
Recognition error rates are reported as
Note error rate → 39% (184 substitutions, 241 deletions, 108 insertions)

If two adjacent recognized chords have a pitch in common, it is
assumed that the note is not rearticulated

Inability to distinguish between chord homonyms
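The edit-distance measure can be sketched directly (the note names and sequences below are invented):

```python
# Levenshtein distance between a reference note sequence and a recognized
# one: the minimum number of substitutions, deletions, and insertions.

def edit_distance(ref, hyp):
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # i deletions
    for j in range(n + 1):
        d[0][j] = j                     # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)
    return d[m][n]

errors = edit_distance(["C4", "E4", "G4", "C5"], ["C4", "F4", "G4"])
rate = errors / 4   # errors divided by reference length
```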
CONCLUSION
Recognition results leave room for improvements
Results may be useful in a number of Music Information Retrieval
applications tolerant of errorful representations
The current system works with no knowledge of the plausibility of
various chord sequences

A probabilistic model of the likelihood of chord sequences would address this
The current system makes almost no effort to model the acoustic
characteristics of the highly informative note onsets

A more sophisticated “attack” model would help in recognizing
the many repeated notes which the system currently misses
REFERENCES
Christopher Raphael
Automatic transcription of piano music
In Proceedings of the 3rd Annual International Symposium on Music
Information Retrieval (ISMIR), Michael Fingerhut, Ed., pp. 15-19,
IRCAM - Centre Pompidou, Paris, France, October 2002.