ramon_lopez_MusicaCBR20062505

Performing expressive music using
Case-Based Reasoning
Ramon López de Mántaras
IIIA - CSIC
www.iiia.csic.es/~mantaras
Outline
• A reminder of CBR and an introduction to the main SaxEx
components
• Case representation
– The musical knowledge
• Retrieval using perspectives
• Reuse
– Fuzzy combination
• SaxEx Results
• TempoExpress
• Conclusions and future work
Case-based reasoning (CBR)
Solving problems by means of examples of
already solved similar problems
(reasoning from precedents)
The task of our system is to infer, via CBR and
musical knowledge, a set of expressive
transformations to be applied to the notes of
inexpressive musical phrases given as input
The precedents are examples of expressive
human interpretations
Saxex Components
[Figure: SaxEx components diagram. Labels shown: Input (inexpressive phrase, .snd; score, .mid; affective labels); SMS analysis and synthesis (.sms, .sco); Noos (CBR method, cases, musical models); Output (expressive phrase, .snd).]
SMS Snapshot
Saxex-CBR
[Figure: task decomposition of the Saxex-CBR method.]
• Retrieve (Identify & Select)
– Identify: construct perspectives
– Search: retrieve precedents using perspectives
– Select: rank using perspectives and preferences
• Reuse, Revise, Retain (subtasks shown, in order): apply expressive transformations; propose expressive performances; memorize the new solved case
Case representation
• Score
• Musical knowledge
– implication-realization, metrical structure, time-span
reduction & prolongational reduction
• Performance representation (solution description)
• Sound transformation operations (the SOLUTION part of a case):
– e.g. high dynamics, medium rubato, very legato, etc.
Transformations
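• Transformations (for each note)
– Dynamics (5 possible values)
– Rubato (5 possible values)
– Vibrato (5 possible values)
– Articulation (5 possible values)
– Attack (2 possible values)
• In total: 5 × 5 × 5 × 5 × 2 = 1250 possible combinations per note
[Figure: notes of a score annotated with dynamics (Din.), rubato (Rub), articulation (Art.), and vibrato (Vibr.) labels.]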
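The count of 1250 can be checked by enumerating every combination. This is only a sketch: the five level names are assumed to be the same for all four five-valued dimensions (the slides give them only for rubato), and the two attack names are hypothetical placeholders.

```python
from itertools import product

# Assumption: the same five ordered labels are reused for every 5-valued dimension.
LEVELS = ["very low", "low", "medium", "high", "very high"]

# Hypothetical names for the two attack values (not given on the slide).
ATTACKS = ["attack_a", "attack_b"]

DIMENSIONS = {
    "dynamics": LEVELS,
    "rubato": LEVELS,
    "vibrato": LEVELS,
    "articulation": LEVELS,
    "attack": ATTACKS,
}

# Every per-note transformation is one choice per dimension.
combinations = list(product(*DIMENSIONS.values()))
n_combinations = len(combinations)
```

Each element of `combinations` is a 5-tuple, one value per dimension, in the order of the dictionary keys.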
Score
Musical knowledge
• Implication/Realization model (Narmour)
– Basic structures:
– Melodic direction, durational cumulation
• GTTM theory (Lerdahl & Jackendoff)
– Metrical structure (metrical strength of notes)
– Time-span reduction (relative importance of notes
within phrases or sub-phrases)
– Prolongational reduction (tensions, relaxations)
• Jazz Theory
– Harmonic Progressions (duration, harmonic stability)
Implication/Realization Model
GTTM Theory
Performance
A Retrieval Perspective
Retrieval Example
[Figure: retrieval example. The problem passes through Identify, Search (over the case memory), and Select.]
Saxex-Reuse
• Transformations
– Dynamics
– Rubato
– Vibrato
– Articulation
– Attack
• Criteria
– Most similar
– Majority
– Minority
– Continuity
– Random
– Fuzzy combination (DEFAULT)
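The slide only names the reuse criteria, so the following is a guess at what the simpler ones could mean, not SaxEx's actual definitions. It assumes the retrieved notes arrive as a list of labels ordered from most to least similar; `choose` and its criterion names are hypothetical. Continuity and fuzzy combination are left out.

```python
import random
from collections import Counter

def choose(values, criterion, rng=None):
    """Pick one label from the retrieved values (most similar first).

    Assumed readings of the criteria:
    most_similar -> label of the top-ranked retrieved note
    majority     -> most frequent label
    minority     -> least frequent label
    random       -> any retrieved label
    """
    counts = Counter(values)
    if criterion == "most_similar":
        return values[0]
    if criterion == "majority":
        return counts.most_common(1)[0][0]
    if criterion == "minority":
        return min(counts, key=counts.get)
    if criterion == "random":
        return (rng or random).choice(values)
    raise ValueError(f"unknown criterion: {criterion}")

retrieved = ["medium", "high", "high", "high", "low", "medium"]
picked = {c: choose(retrieved, c) for c in ("most_similar", "majority", "minority")}
```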
Saxex-Reuse Example
[Figure: a problem phrase and the single retrieved case, with notes annotated with dynamics (Din.), rubato (Rub), and articulation (Art.) values.]
Saxex-Reuse (Fuzzy Combination)
The notes in the human-performed musical phrases are qualified by means of
five ordered linguistic values. Those for rubato are:
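[Figure: membership functions of the five linguistic values (Very Low, Low, Medium, High, Very High). Vertical axis: membership degree, from 0 to 1. Horizontal axis: tempo, from 20 to 320.]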
Assume that SaxEx has retrieved and selected two notes whose rubato values are
72 and 190 respectively. The fuzzy combination followed by a defuzzification gives
the rubato value to be applied to the input note:
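[Figure: the value 72 belongs to Low with degree 0.9 and the value 190 belongs to Medium with degree 0.7; the centre of area (COA) of the combined fuzzy set is 123.]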
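A minimal sketch of this kind of fuzzy combination: clip each fuzzy set at its membership degree, take the union, and defuzzify with the centre of area. The triangular shapes and their breakpoints are assumptions, since the slide does not give the real membership functions, so the result will not match the slide's 123 exactly.

```python
# Hypothetical triangular fuzzy sets (left, peak, right) on the 20..320 axis.
SETS = {"low": (20.0, 95.0, 170.0), "medium": (95.0, 170.0, 245.0)}

def triangle(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_combine(degrees, lo=20.0, hi=320.0, steps=3000):
    """Clip each set at its degree, take the max (union), return the COA."""
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mu = max(min(d, triangle(x, *SETS[name])) for name, d in degrees.items())
        num += x * mu   # weighted sum for the centroid
        den += mu       # total area (up to the constant step width)
    return num / den

# Degrees taken from the slide: Low at 0.9, Medium at 0.7.
rubato = fuzzy_combine({"low": 0.9, "medium": 0.7})
```

Because Low is clipped higher than Medium, the centroid is pulled slightly toward the Low side of the midpoint between the two peaks.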
Saxex Results
Autumn Leaves
[Audio example: an inexpressive input phrase is processed by SaxEx into an expressive output phrase.]
Affective Labels
• Three orthogonal dimensions
– Tender-Aggressive
– Sad-Joyful
– Calm-Restless
• Relating to notions such as
– activity
– tension vs. relaxation
– Brightness
...
SaxEx Results
All of Me
[Audio example: an inexpressive input phrase rendered by SaxEx with the affective values Joyful and Sad.]
TempoExpress
Goal:
– Changing the original performing tempo of a melody,
preserving expressiveness, in the context of jazz
standards.
Application:
Audio editing software
Video / Audio post-production (video constrains audio)
Why not apply uniform time stretching to the audio?
Timing of notes w.r.t. beat may have to change
Other expressive phenomena (e.g. ornamentations,
consolidations, fragmentations) may have to change as
a function of the tempo
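For contrast, uniform time stretching applied to note timings is a single rescaling. The sketch below assumes notes are given as (onset, duration) pairs in seconds; `uniform_time_stretch` is a hypothetical helper, not part of TempoExpress.

```python
def uniform_time_stretch(notes, original_bpm, target_bpm):
    """Scale every onset and duration (in seconds) by the same tempo ratio."""
    ratio = original_bpm / target_bpm
    return [(onset * ratio, duration * ratio) for onset, duration in notes]

# Halving the tempo (180 -> 90 bpm) doubles every onset and duration.
stretched = uniform_time_stretch([(0.0, 0.5), (0.5, 0.25)], 180, 90)
```

Every timing deviation keeps the same proportion to the beat, and ornamentations, consolidations, and fragmentations are untouched, which is exactly the limitation listed above.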
TempoExpress
Musical explanation: Expressivity is a result of the
conception of the music by the performer, and
this conception changes with tempo [Desain &
Honing, 1994]
Melody: “Up Jumped Spring”
Uniform time stretching
Recording
Original tempo (180)
Transformed tempo (90)
Expressive Transformations
Some basic music performance concepts and their relations
Onset deviations at different tempos
(Body and Soul A1)
Approaches to expressive music generation
• “Hand crafted”
– Let a music expert formulate rules for music performance
(Friberg, CMJ 1991, Friberg et al. CMJ 2000)
• Machine learned
– Derive expressivity rules automatically from examples (Widmer,
ICMC 2000, JNMR 2002)
• Eager approach: Builds a model based on many training
examples and uses the learned model to solve new problems
– Imitate expressivity using examples of concrete human
performances by means of CBR (Arcos & Lopez de Mantaras,
JNMR 1998, Lopez de Mantaras & Arcos, AI Mag 2002)
• Lazy approach: Take the solution of the training example that
most resembles the new problem, and adapt it to solve that problem
“That an expressive effect is applied only once does not mean it is insignificant”
(Sundberg, MP 2001)
TempoExpress Architecture
Desired Tempo
Performance Annotation
Expressivity in jazz is more than timing / dynamics deviations. It is also
spontaneous note ornamentations, fragmentations, etc.
To model this, we define a set of Performance Events:
And we use them as edit operations to obtain an edit-distance-based alignment
between the score and the performance
Goal of the annotation process
– Automatic case base acquisition
Comparing Score vs recordings
[Figure: example score-to-recording alignments for "Body and Soul" and "Once I Loved", with performance-event labels (I, F, C).]
Edit (Levenshtein) distance
• Goal: Assessing the distance between
two sequences <S1 , S2 >
– Calculated as the minimal cost of
transforming S1 into S2
– Requires:
• Edit operations
• Cost functions
Edit (Levenshtein) distance
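\[
d_{i,j} = \min
\begin{cases}
d_{i-1,j} + w(a_i, \emptyset) & \text{(deletion)} \\
d_{i,j-1} + w(\emptyset, b_j) & \text{(insertion)} \\
d_{i-1,j-1} + w(a_i, b_j) & \text{(replacement)} \\
d_{i-1,j-k} + w(a_i, b_{j-k+1}, \ldots, b_j),\; 2 \le k \le j & \text{(fragmentation)} \\
d_{i-k,j-1} + w(a_{i-k+1}, \ldots, a_i, b_j),\; 2 \le k \le i & \text{(consolidation)}
\end{cases}
\]
[Figure: example alignment with operation labels R, I, R, R.]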
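The recurrence on this slide can be sketched as a dynamic program. The cost functions are passed in as parameters; the ones defined here are hypothetical toy costs on (pitch, duration) notes, not the costs used in TempoExpress.

```python
INF = float("inf")

def edit_distance(a, b, w_del, w_ins, w_rep, w_frag, w_cons):
    """Minimal cost of transforming sequence a into b with the five operations."""
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0:  # deletion of a_i
                best = min(best, d[i - 1][j] + w_del(a[i - 1]))
            if j > 0:  # insertion of b_j
                best = min(best, d[i][j - 1] + w_ins(b[j - 1]))
            if i > 0 and j > 0:
                # replacement of a_i by b_j
                best = min(best, d[i - 1][j - 1] + w_rep(a[i - 1], b[j - 1]))
                for k in range(2, j + 1):  # fragmentation: a_i -> k notes of b
                    best = min(best, d[i - 1][j - k] + w_frag(a[i - 1], b[j - k:j]))
                for k in range(2, i + 1):  # consolidation: k notes of a -> b_j
                    best = min(best, d[i - k][j - 1] + w_cons(a[i - k:i], b[j - 1]))
            d[i][j] = best
    return d[n][m]

# Hypothetical toy costs on (pitch, duration) notes.
def w_del(x): return 1.0
def w_ins(y): return 1.0
def w_rep(x, y): return 0.0 if x == y else 1.0
def w_frag(x, ys):
    ok = all(p == x[0] for p, _ in ys) and sum(dur for _, dur in ys) == x[1]
    return 0.5 if ok else INF
def w_cons(xs, y): return w_frag(y, xs)

costs = (w_del, w_ins, w_rep, w_frag, w_cons)
score = [(60, 2), (62, 1)]
performance = [(60, 1), (60, 1), (62, 1)]  # first note split in two
dist = edit_distance(score, performance, *costs)
```

With these toy costs the cheapest alignment explains the split first note as one fragmentation rather than a replacement plus an insertion.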
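Annotation examples (I)
[Figure: annotated case; event labels shown: T, T, F.]
Annotation examples (II)
[Figure: annotated case; event labels shown: T T, C, T T T, C.]
Annotation examples (III)
[Figure: annotated case; event labels shown: I, T, T T T.]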
Representing melodic context
• Rationale: the expressivity of a performed note is not just
determined by the note itself.
• Ergo: Some representation of the melodic context of the note is
needed
• We use the Implication / Realization model of melodic structure
(Narmour, 1990)
– It captures the pattern of fulfillment / violation of expectations created by the
melodic surface
– Groups notes based on gestalt principles
Case Representation
Repeated for each
tempo
Retrieval
• 1. Filter cases by tempo: keep cases containing performances at
relevant tempos (one of the tempos is similar to the original tempo of
the target melody and there is another performed tempo similar to the
desired tempo to which the target melody has to be transformed)
• 2. Rank the cases that passed the previous filter by I/R similarity to
the score of the target melody (using edit-distance)
• 3. Partition the phrases of the most similar cases into segments using
the I/R parser or any other melodic segmentation algorithm (for
instance Temperley, 2001)
• 4. Form a “new” case base containing the obtained segments (space
of partial solutions) as cases
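Steps 1 and 2 above can be sketched as a filter followed by a ranking. Everything concrete here is an assumption: the case layout (a dict with `tempos` and an `ir` label sequence), the 10% tempo tolerance, and the use of plain unit-cost Levenshtein distance on I/R labels. Steps 3 and 4 (segmentation) are not covered.

```python
def levenshtein(s, t):
    """Unit-cost edit distance between two label sequences."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, 1):
        cur = [i]
        for j, y in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def near(tempo, tempos, tol):
    """True if some performed tempo lies within a relative tolerance of tempo."""
    return any(abs(tempo - t) <= tol * tempo for t in tempos)

def retrieve(cases, target_ir, source_tempo, desired_tempo, tol=0.1, top=2):
    # Step 1: keep cases performed near both the source and the desired tempo.
    kept = [c for c in cases
            if near(source_tempo, c["tempos"], tol)
            and near(desired_tempo, c["tempos"], tol)]
    # Step 2: rank the survivors by I/R similarity to the target melody.
    kept.sort(key=lambda c: levenshtein(c["ir"], target_ir))
    return kept[:top]

# Hypothetical case base; the I/R labels are illustrative only.
cases = [
    {"name": "A", "tempos": [100, 180], "ir": ["P", "ID", "P"]},
    {"name": "B", "tempos": [100, 120], "ir": ["P", "ID", "P"]},
    {"name": "C", "tempos": [95, 175], "ir": ["P", "R", "D", "P"]},
]
ranked = [c["name"] for c in retrieve(cases, ["P", "ID", "P"], 100, 180)]
```

Case B matches the target melody exactly but is dropped by the tempo filter, because it has no performance near the desired tempo.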
Reuse
• Solutions for the target melody are generated segment-wise via a best-first search through the space of partial solutions (segments)
• Procedure:
1. Retrieve the best-matching segment (using edit distance)
2. Align the target melody and the retrieved segment
3. Transfer performance events: for aligned notes T and R, let Ti(R) → To(R) represent the tempo transformation of note R; use the annotation differences between Ti(R) and To(R) to generate the solution To(T) from Ti(T)
4. For non-aligned target notes, use uniform time stretching (UTS) to transform Ti(T) into To(T)
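One way to read the transfer step, reduced to a single feature: each note carries an onset deviation measured in beats, and the change observed in the retrieved note between its two tempos is added to the aligned target note. This representation is an assumption made for illustration; the slides transfer whole performance events, not just onset deviations. Leaving a deviation unchanged in beats stands in for UTS here.

```python
def reuse(target_in, alignment, retrieved_in, retrieved_out):
    """Generate onset deviations (in beats) for the target at the output tempo.

    target_in:     deviations of target notes at the input tempo, Ti(T)
    alignment:     dict mapping target note index -> retrieved note index
    retrieved_in:  deviations of retrieved notes at the input tempo, Ti(R)
    retrieved_out: deviations of retrieved notes at the output tempo, To(R)
    """
    solution = []
    for t, dev in enumerate(target_in):
        r = alignment.get(t)
        if r is None:
            # Non-aligned note: UTS keeps the deviation the same in beats.
            solution.append(dev)
        else:
            # Aligned note: apply the change observed in the retrieved note.
            solution.append(dev + (retrieved_out[r] - retrieved_in[r]))
    return solution

target_out = reuse(
    target_in=[0.0, 0.25, -0.125],
    alignment={0: 0, 2: 1},          # the middle target note is not aligned
    retrieved_in=[0.125, 0.0],
    retrieved_out=[0.25, -0.25],
)
```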
TempoExpress overall view
Example of TempoExpress Result
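[Audio examples: tempo transformation between 55 and 100 bpm, rendered by uniform time stretching, by CBR (TempoExpress), and by a human performer.]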
Experimental comparison to UTS
• Four jazz standards recordings by a professional
musician (12 tempos for each: 48 recordings)
• 14 different phrases containing a total of 64
different melodic segments
• More than 8000 tempo-transformation problems
in the case base
TempoExpress vs. UTS as a function of the ratio of original tempo to transformed tempo.
The lower plot shows the probability of incorrectly rejecting the hypothesis (that there is
no difference between TempoExpress and UTS) for the Wilcoxon signed-rank test.
Conclusions & Future
• CBR is a powerful technique to imitate human solutions (performances): human-like output
• SaxEx successfully retrieves relevant cases
• Fuzzy combination increases output variation
• SaxEx as a pedagogical tool:
– Users can experiment with the system
– Helps in understanding how to use the different expressive resources
• TempoExpress: an application to audio post-production that clearly outperforms UTS
• Further TempoExpress experimentation with fast tempos (more example cases at fast tempos are needed)
• Add within-note descriptions:
– Energy envelope features: attack, sustain, decay, tremolo
– Pitch envelope features: vibrato, glissando
• Add between-notes descriptions:
– Articulation (legato, staccato)