transcription - grfia - Universidad de Alicante

A Multimodal Music Transcription Prototype
First steps in an interactive prototype development
Tomás Pérez-García, José M. Iñesta, Pedro J. Ponce de León, Antonio Pertusa
Universidad de Alicante, Spain
Description and Retrieval of Music and Sound Information
What is automatic music transcription? Transforming an audio signal of a music performance into a symbolic representation (MIDI or score).
Aim: This prototype is conceived as a research platform for developing and applying interactive and multimodal techniques to the monotimbral transcription task.
Multimodality: it uses three different sources of information to detect notes in a musical
audio excerpt: signal, note onsets, and rhythm information.
State-of-the-art techniques are far from accurate, especially for polyphonic and multitimbral sounds, so nothing close to 100% accuracy can be expected. Therefore, user corrections are needed.
Problem decomposition (summary):
[Diagram: audio signal → F0 frame-by-frame estimation → F0 (in Hz) → note pitch detection → piano roll → transcription → music score.]
Interactive: Designed to use user feedback on onsets, beats, and notes in a left-to-right validation approach: each user interaction validates everything to its left, and the interaction is then used to recompute the rest of the output.
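The left-to-right validation scheme can be sketched in a few lines. This is a minimal illustration, not the prototype's code. The `Note` record, the `apply_left_to_right_correction` function and the `recompute` callback are hypothetical names. The sketch also assumes that a note starting before the interaction point counts as validated and is cut at that point.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Note:
    onset: float   # seconds
    offset: float  # seconds
    pitch: int     # MIDI note number


def apply_left_to_right_correction(
    notes: List[Note],
    interaction_time: float,
    recompute: Callable[[float], List[Note]],
) -> List[Note]:
    """Keep the output left of the user interaction as validated and
    recompute everything from that point onwards (hypothetical helper)."""
    # Notes starting before the interaction are validated; any part that
    # extends past the interaction point is cut there (an assumption).
    validated = [
        Note(n.onset, min(n.offset, interaction_time), n.pitch)
        for n in notes
        if n.onset < interaction_time
    ]
    # Everything to the right is discarded and re-estimated.
    recomputed = [n for n in recompute(interaction_time) if n.onset >= interaction_time]
    return validated + recomputed


# Usage: the user adds an onset at 0.75 s inside the second note.
old = [Note(0.0, 0.5, 60), Note(0.5, 1.0, 62), Note(1.0, 1.5, 64)]


def rerun_from(start: float) -> List[Note]:
    # Hypothetical stand-in for a real re-transcription that uses the new onset.
    return [Note(0.75, 1.0, 62), Note(1.0, 1.5, 65)]


new = apply_left_to_right_correction(old, 0.75, rerun_from)
print(new)
```

Everything before 0.75 s is kept unchanged. Everything after it comes from the recomputation, so a single interaction can also change later notes.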
Transcription modes:
Frame-based transcription:
Spectrogram → Frames → Set of pitch candidates → Selection by “salience” → Smoothing in short context → Set of pitches by frame
Very short notes can be filtered out, either merged or deleted, according to parameters controlled by the user.
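A small sketch can illustrate the frame-based chain. It assumes one magnitude spectrum per frame with a fixed bin width. In place of the prototype's actual salience function, it uses a simple harmonic-sum salience: the sum of magnitudes at the first harmonics of each candidate F0. It keeps candidates whose salience reaches a fraction of the frame maximum, then smooths by majority voting over a short window of frames. All function names and parameter values are hypothetical.

```python
from typing import Iterable, List, Set


def midi_to_hz(pitch: int) -> float:
    """Equal-tempered frequency of a MIDI pitch (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


def salience(spectrum: List[float], bin_hz: float, f0: float, n_harmonics: int = 5) -> float:
    """Harmonic-sum salience: magnitudes at the bins nearest to h * f0."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)
        if k < len(spectrum):
            total += spectrum[k]
    return total


def frame_pitches(spectrum: List[float], bin_hz: float, candidates: Iterable[int],
                  rel_threshold: float = 0.5) -> Set[int]:
    """Select the candidate pitches whose salience is at least
    rel_threshold times the highest salience in the frame."""
    scores = {p: salience(spectrum, bin_hz, midi_to_hz(p)) for p in candidates}
    best = max(scores.values(), default=0.0)
    if best <= 0.0:
        return set()
    return {p for p, s in scores.items() if s >= rel_threshold * best}


def smooth(frames: List[Set[int]], half_window: int = 1) -> List[Set[int]]:
    """Keep a pitch in frame t only if it is active in more than half of
    the frames in the window [t - half_window, t + half_window]."""
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - half_window): t + half_window + 1]
        counts = {}
        for pitches in window:
            for p in pitches:
                counts[p] = counts.get(p, 0) + 1
        out.append({p for p, c in counts.items() if 2 * c > len(window)})
    return out


# Usage: a synthetic A4 (440 Hz) frame with five equal harmonics.
bin_hz = 10.0
spectrum = [0.0] * 400
for h in range(1, 6):
    spectrum[round(h * 440.0 / bin_hz)] = 1.0

a4 = frame_pitches(spectrum, bin_hz, range(48, 85))  # candidates C3..C6
raw = [a4, a4, set(), a4 | {72}, a4]  # one dropout and one spurious pitch
print(a4, smooth(raw))
```

The smoothing fills the one-frame dropout and removes the isolated spurious pitch. This is the kind of short-context correction the frame-based mode relies on.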
Onset-based transcription:
Signal → Rate of change of pitched energy → Threshold → Onsets → Segmentation → Segment transcription → Set of pitches by segment
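The onset-based chain can be sketched as follows. The sketch assumes that a per-frame pitched-energy curve and per-frame pitch sets are already available, for example from the frame-based stage. Onsets are taken as local peaks, above a threshold, of the half-wave-rectified first difference of the pitched energy. Each segment between consecutive onsets then gets the pitches that are active in more than half of its frames. The function names, the peak-picking rule and the majority vote are illustrative assumptions, not the prototype's exact method.

```python
from typing import List, Set, Tuple


def detect_onsets(pitched_energy: List[float], threshold: float) -> List[int]:
    """Frames where the positive rate of change of pitched energy is a
    local peak above `threshold`."""
    if not pitched_energy:
        return []
    # Rise with respect to the previous frame (silence assumed before frame 0).
    rise = [max(0.0, pitched_energy[0])] + [
        max(0.0, b - a) for a, b in zip(pitched_energy, pitched_energy[1:])
    ]
    onsets = []
    for t, r in enumerate(rise):
        prev_r = rise[t - 1] if t > 0 else 0.0
        next_r = rise[t + 1] if t + 1 < len(rise) else 0.0
        if r > threshold and r >= prev_r and r > next_r:
            onsets.append(t)
    return onsets


def transcribe_segments(pitches_by_frame: List[Set[int]],
                        onsets: List[int]) -> List[Tuple[int, int, Set[int]]]:
    """Split the frames at the onsets and give each segment the pitches
    active in more than half of its frames, so notes change only at onsets.
    Frames before the first onset are ignored."""
    bounds = sorted(set(onsets)) + [len(pitches_by_frame)]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        counts = {}
        for pitches in pitches_by_frame[start:end]:
            for p in pitches:
                counts[p] = counts.get(p, 0) + 1
        chosen = {p for p, c in counts.items() if 2 * c > end - start}
        segments.append((start, end, chosen))
    return segments


energy = [0.0, 0.0, 4.0, 4.0, 4.0, 1.0, 5.0, 5.0, 5.0, 5.0]
frames = [set(), set(), {60}, {60}, {60, 67}, {60}, {64}, {64}, set(), {64}]
onsets = detect_onsets(energy, threshold=1.0)
print(onsets, transcribe_segments(frames, onsets))
```

The spurious 67 and the missing frame inside the second note disappear because pitches are decided per segment rather than per frame.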
Pulse-based transcription:
Signal → Energy fluctuations → Pulses → Beats and tempo → Quantization → Quantized transcription → Notes (pitch and duration)
Note durations acquire musical meaning. This mode is required if a music score is the desired final output; otherwise, only a piano roll can be obtained.
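The pulse-based stage can be sketched as two steps, assuming the pulse (beat) times have already been detected. The first step estimates tempo from the median inter-beat interval. The second quantizes note times to a fixed division of the beat, interpolating linearly between beats so that tempo changes are followed. The function names, the median rule and the default division of 4 (sixteenth notes when the beat is a quarter note) are assumptions for illustration.

```python
from bisect import bisect_right
from fractions import Fraction
from statistics import median
from typing import List, Tuple


def tempo_bpm(beats: List[float]) -> float:
    """Tempo in beats per minute from the median inter-beat interval (s)."""
    if len(beats) < 2:
        raise ValueError("at least two beats are needed")
    return 60.0 / median([b - a for a, b in zip(beats, beats[1:])])


def to_beats(t: float, beats: List[float]) -> float:
    """Physical time -> fractional beat position, interpolating linearly
    between the surrounding beats (extrapolating outside them)."""
    i = min(max(bisect_right(beats, t) - 1, 0), len(beats) - 2)
    return i + (t - beats[i]) / (beats[i + 1] - beats[i])


def quantize_notes(notes: List[Tuple[float, float, int]], beats: List[float],
                   division: int = 4) -> List[Tuple[Fraction, Fraction, int]]:
    """(onset s, offset s, pitch) -> (start beat, duration in beats, pitch)."""
    out = []
    for onset, offset, pitch in notes:
        start = Fraction(round(to_beats(onset, beats) * division), division)
        end = Fraction(round(to_beats(offset, beats) * division), division)
        end = max(end, start + Fraction(1, division))  # avoid zero-length notes
        out.append((start, end - start, pitch))
    return out


beats = [0.0, 0.5, 1.0, 1.5, 2.0]  # pulses at 120 BPM
notes = [(0.02, 0.49, 60), (0.51, 0.76, 62)]
print(tempo_bpm(beats), quantize_notes(notes, beats))
```

Durations are returned as exact fractions of a beat. This is what makes the output musical rather than physical, and what a score needs.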
Interactions:
Implemented or planned: onsets (add, remove, edit), pulses (modify beat and meter), notes (add, remove, edit), and harmony (chord segmentation).
Operation diagram: a more accurate (multimodal and interactive) problem decomposition.
[Diagram: columns "Analysis (information source)", "Interaction with", and "Transcription based on", arranged over two levels. Physical level: the signal, its spectrogram and frames, the amplitude envelope, frame-by-frame F0, and note onsets (including new onsets added by the user). Musical level: pulses, rhythm (tempo + meter), notes, text/harmony, and off-line scores and melodic and harmonic music models.]
Interface structure:
[Figure: interface layout with menus, play controls, interaction assistance, a list of the interactions allowed, audio, rhythm, and text property panels, tempo, meter, and tonality settings, an XML file structure overview, a markers and timing area, the audio signal area, a tempo and meter area, the transcription area (piano roll or score, with a keyboard or staves reference), a chord segmentation area, and a textual transcription area.]
Screencast:
Raw (frame-by-frame) transcription: [Screenshot: spectrogram and pitches in frames.]
Based only on the harmonic energies in the spectrogram, smoothed over a frame context, and filtered by a length threshold (in frames). Many short false positives and false negatives remain.
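The sketch below illustrates filtering by a length threshold, together with the merging or deletion of very short notes mentioned under Frame-based transcription. It takes per-frame pitch sets as input. First, it merges runs of the same pitch separated by gaps of at most `max_gap` frames. It then deletes notes shorter than `min_len` frames. Both parameter names and their default values are hypothetical stand-ins for the user-controlled parameters.

```python
from typing import Dict, List, Set, Tuple


def pitch_runs(frames: List[Set[int]]) -> Dict[int, List[Tuple[int, int]]]:
    """For each pitch, the runs of consecutive frames where it is active,
    as (start, end) pairs with `end` exclusive."""
    runs: Dict[int, List[Tuple[int, int]]] = {}
    active: Dict[int, int] = {}  # pitch -> frame where its current run started
    for t, pitches in enumerate(frames + [set()]):  # empty sentinel closes runs
        for p in list(active):
            if p not in pitches:
                runs.setdefault(p, []).append((active.pop(p), t))
        for p in pitches:
            active.setdefault(p, t)
    return runs


def filter_notes(frames: List[Set[int]], min_len: int = 3,
                 max_gap: int = 1) -> List[Tuple[int, int, int]]:
    """Merge same-pitch runs separated by short gaps, then delete notes
    shorter than `min_len` frames. Returns sorted (start, end, pitch)."""
    notes = []
    for pitch, runs in pitch_runs(frames).items():
        merged: List[Tuple[int, int]] = []
        for start, end in runs:
            if merged and start - merged[-1][1] <= max_gap:
                merged[-1] = (merged[-1][0], end)  # bridge a short gap
            else:
                merged.append((start, end))
        notes += [(s, e, pitch) for s, e in merged if e - s >= min_len]
    return sorted(notes)


frames = [{60}, {60}, {60}, {60}, set(), {60}, {60, 72}, {60}, {60}, set()]
print(filter_notes(frames))
```

With `max_gap=0`, the one-frame gap is not bridged and two separate notes of pitch 60 remain. The one-frame 72 is deleted in both cases.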
Onset-based transcription: [Screenshot: spectrogram, onsets, and pitches.]
Onsets impose a segmentation: notes can change only at onsets. Times are still physical. The transcription is much more accurate.
Interaction with onsets affects the transcription. When the user corrects a false negative by adding a new onset, the transcription is recomputed with that onset and the changes are propagated. This correction also solves other false negatives (FN).
Pulse-based transcription: [Screenshot: spectrogram, pulses, and notes.]
Beat, tempo, and meter are derived from the pulses. The transcription is driven by them using a division of the beat. Times are now musical, and the transcription is score-oriented. Harmonic analysis (chord segmentation) is also provided.
More information:
At http://miprcv.iti.upv.es/ a video screencast and an online demo are available.
Warning:
This project is at a very early stage, so many features are not yet implemented and it is far from bug-free.
Acknowledgements:
This work is supported by the Consolider Ingenio 2010
research programme (project MIPRCV, CSD2007-00018),
the project DRIMS (TIN2009-14247-C02), and the
PASCAL2 Network of Excellence (IST-2007-216886).
The authors wish to thank the people involved in this project, especially those who do not appear as authors of this paper, such as Carlos Pérez-Sancho, David Rizo, Javier Sober, José Bernabeu, and Gabriel Meseguer.