transcription - grfia - Universidad de Alicante
A Multimodal Music Transcription Prototype
First steps in an interactive prototype development
PROJECT
Tomás Pérez-García, José M. Iñesta, Pedro J. Ponce de León, Antonio Pertusa
Universidad de Alicante, Spain
Description and Retrieval of Music and Sound Information
Descripción y Recuperación de Información Musical y Sonora
What is automatic music transcription? Transforming an audio signal of a music performance into a symbolic representation (MIDI or score).
Aim: This prototype is conceived as a research platform for developing and applying interactive and multimodal techniques to the monotimbral transcription task.
Multimodality: it uses three different sources of information to detect notes in a musical
audio excerpt: signal, note onsets, and rhythm information.
State-of-the-art techniques are far from accurate, especially for polyphonic and multitimbral sounds, so nothing even close to 100% accuracy can be expected. User corrections are needed.
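Problem decomposition (summary):
[Diagram: audio signal → F0 frame-by-frame estimation → F0 (in Hz) → note pitch detection → piano roll → transcription → music score]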
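The step from frame-by-frame F0 estimates in Hz to note pitches can be illustrated with the standard equal-temperament mapping to MIDI note numbers. This is a minimal sketch, not the prototype's code: the function names are hypothetical, and the A4 = 440 Hz reference tuning is an assumption.

```python
import math
from typing import List, Optional

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(f0_hz: float) -> Optional[int]:
    """Map an F0 estimate in Hz to the nearest MIDI note number.

    Non-positive values are treated as unvoiced frames and give None.
    """
    if f0_hz <= 0:
        return None
    return round(A4_MIDI + 12 * math.log2(f0_hz / A4_HZ))


def f0_track_to_pitches(f0_track: List[float]) -> List[Optional[int]]:
    """Convert a frame-by-frame F0 track (Hz) into per-frame MIDI pitches."""
    return [hz_to_midi(f) for f in f0_track]
```

For example, `hz_to_midi(440.0)` gives 69 (A4) and `hz_to_midi(261.63)` gives 60 (middle C).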
Interactive: designed to make use of user feedback on onsets, beats, and notes in a left-to-right validation approach:
a user interaction validates everything that remains to its left,
interactions are used to recompute the rest of the output.
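The left-to-right validation idea can be sketched on a list of onset times. This is an illustration under stated assumptions, not the prototype's implementation: `apply_correction`, `make_recompute`, the candidate list, and the `min_gap` constraint are all hypothetical.

```python
from typing import Callable, List


def apply_correction(hypothesis: List[float], index: int, corrected: float,
                     recompute: Callable[[List[float]], List[float]]) -> List[float]:
    """Apply a user correction at position `index`.

    Everything to the left of the correction is taken as validated and kept
    unchanged; everything to the right is recomputed given the validated prefix.
    """
    validated = hypothesis[:index] + [corrected]
    return validated + recompute(validated)


def make_recompute(candidates: List[float], min_gap: float = 0.1):
    """Toy recomputation (assumption): keep candidate onsets that lie at least
    `min_gap` seconds after the previously accepted onset."""
    def recompute(validated: List[float]) -> List[float]:
        last = validated[-1]
        accepted = []
        for t in candidates:
            if t >= last + min_gap:
                accepted.append(t)
                last = t
        return accepted
    return recompute
```

With candidates `[0.50, 0.55, 1.00, 1.04, 1.50]` and hypothesis `[0.50, 1.00, 1.50]`, moving the second onset to 0.95 keeps 0.50 as validated, and the recomputation discards the candidates at 1.00 and 1.04 because they now fall too close to the corrected onset.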
Transcription modes:
Frame-based transcription:
Spectrogram → Frames → Set of pitch candidates → Selection by “salience” → Smoothing in short context → Set of pitches by frame
Very short notes can be filtered out, by merging or deleting them, according to parameters controlled by the user.
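The frame-based chain can be sketched as below, assuming pitch candidates and their saliences are already available for each frame. The poster does not specify the salience criterion or the smoothing rule, so the relative threshold `ratio`, the majority vote over `context` frames, and `min_len` are assumptions. Only deletion of short notes is shown, not merging.

```python
from typing import Dict, List, Set


def select_by_salience(candidates: List[Dict[int, float]], ratio: float = 0.5) -> List[Set[int]]:
    """Per frame, keep the pitches whose salience is at least ratio * frame maximum."""
    selected = []
    for frame in candidates:
        if not frame:
            selected.append(set())
            continue
        top = max(frame.values())
        selected.append({p for p, s in frame.items() if s >= ratio * top})
    return selected


def smooth(frames: List[Set[int]], context: int = 1) -> List[Set[int]]:
    """Majority vote per pitch over a window of up to 2 * context + 1 frames."""
    n = len(frames)
    out: List[Set[int]] = [set() for _ in range(n)]
    for p in set().union(*frames):
        for i in range(n):
            lo, hi = max(0, i - context), min(n, i + context + 1)
            votes = sum(p in frames[j] for j in range(lo, hi))
            if 2 * votes > hi - lo:
                out[i].add(p)
    return out


def drop_short_notes(frames: List[Set[int]], min_len: int = 2) -> List[Set[int]]:
    """Delete every run of a pitch that lasts fewer than min_len frames."""
    n = len(frames)
    out = [set(f) for f in frames]
    for p in set().union(*frames):
        i = 0
        while i < n:
            if p not in frames[i]:
                i += 1
                continue
            j = i
            while j < n and p in frames[j]:
                j += 1
            if j - i < min_len:
                for k in range(i, j):
                    out[k].discard(p)
            i = j
    return out


def frame_based_transcription(candidates, ratio=0.5, context=1, min_len=2):
    """Chain the three steps: selection, smoothing, length filtering."""
    return drop_short_notes(smooth(select_by_salience(candidates, ratio), context), min_len)
```

In this sketch, `context` and `min_len` play the role of the user-controlled parameters mentioned above.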
Onset-based transcription:
Signal → Rate of change of pitched energy → Threshold → Onsets → Segmentation → Segment transcription → Set of pitches by segment
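A minimal sketch of the onset-based chain, assuming a per-frame curve for the rate of change of pitched energy has already been computed. Peak picking above a fixed threshold and a majority vote inside each segment are assumptions made for illustration; the poster does not detail either step, and all names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def detect_onsets(change_rate: List[float], threshold: float) -> List[int]:
    """Return the frames where the curve has a local peak above the threshold."""
    onsets = []
    for i, v in enumerate(change_rate):
        left = change_rate[i - 1] if i > 0 else float("-inf")
        right = change_rate[i + 1] if i + 1 < len(change_rate) else float("-inf")
        if v > threshold and v >= left and v > right:
            onsets.append(i)
    return onsets


def segment(n_frames: int, onsets: List[int]) -> List[Tuple[int, int]]:
    """Split [0, n_frames) into half-open segments that start at frame 0 and at each onset."""
    bounds = sorted(b for b in set(onsets) | {0} if 0 <= b < n_frames)
    return list(zip(bounds, bounds[1:] + [n_frames]))


def transcribe_segments(frame_pitches: List[Set[int]],
                        onsets: List[int]) -> List[Tuple[int, int, Set[int]]]:
    """Assign to each segment the pitches present in more than half of its frames."""
    notes = []
    for start, end in segment(len(frame_pitches), onsets):
        counts = Counter(p for f in frame_pitches[start:end] for p in f)
        pitches = {p for p, c in counts.items() if 2 * c > end - start}
        notes.append((start, end, pitches))
    return notes
```

Because pitches are decided per segment, a pitch can only change at an onset, so adding or removing an onset changes the segmentation and therefore the transcription.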
Pulse-based transcription:
Signal → Energy fluctuations → Pulses → Beats and Tempo → Quantization → Quantized transcription → Notes (pitch and duration)
Note durations acquire musical meaning. This mode is required if a music score is the intended final output; otherwise only a piano roll can be obtained.
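The quantization step can be sketched as follows, assuming beat times in seconds have already been derived from the pulses. Times are mapped to beat positions by linear interpolation between consecutive beats and rounded to a grid of `division` steps per beat. The function names, the interpolation, and the minimum duration of one grid step are assumptions, not the prototype's method.

```python
from bisect import bisect_right
from fractions import Fraction
from typing import List, Tuple


def time_to_beats(t: float, beat_times: List[float]) -> float:
    """Map a time in seconds to a beat position (needs at least two beat times).

    Times outside the beat grid are extrapolated from the nearest beat interval.
    """
    i = bisect_right(beat_times, t) - 1
    i = max(0, min(i, len(beat_times) - 2))
    t0, t1 = beat_times[i], beat_times[i + 1]
    return i + (t - t0) / (t1 - t0)


def quantize(notes: List[Tuple[float, float, int]], beat_times: List[float],
             division: int = 4) -> List[Tuple[Fraction, Fraction, int]]:
    """Turn (onset_s, offset_s, pitch) notes into (start_beat, duration_beats, pitch).

    Positions are rounded to 1/division of a beat; durations are at least one step.
    """
    step = Fraction(1, division)
    out = []
    for onset, offset, pitch in notes:
        start = Fraction(round(time_to_beats(onset, beat_times) * division), division)
        end = Fraction(round(time_to_beats(offset, beat_times) * division), division)
        out.append((start, max(end - start, step), pitch))
    return out
```

With beats every 0.5 s (120 beats per minute) and `division=4`, a note from 0.02 s to 0.49 s becomes a one-beat note starting at beat 0, which is the point at which a physical duration turns into a musical one.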
Interactions:
Implemented or planned: onsets (add, remove, edit), pulses (modify beat and meter), notes (add, remove, edit), and harmony (chord segmentation).
Operation diagram:
More accurate problem decomposition (multimodal & interactive):
[Diagram. Columns: ANALYSIS (information source), INTERACTION with, TRANSCRIPTION based on. Physical level: signal (spectrogram, frames, amplitude envelope, F0 frame by frame), note onsets. Musical level: pulses, rhythm (tempo + meter), text / harmony. Music models: off-line melodic and harmonic models, scores, notes (off-line). Other labels: new onsets, note pitch detection.]
Interface structure:
[Screenshot. Labeled elements: menus; play; interaction assistance; markers & timing area; audio signal area; tempo and meter area; transcription area (piano roll / score); keyboard / staves reference; chord segmentation area; textual transcription area; audio properties; rhythm properties; text properties; tempo; meter; tonality; transcription; interactions allowed; structure overview; XML file.]
Screencast:
Raw (frame by frame) transcription:
Based just on harmonic energies in the spectrogram.
Smoothed by a frame context.
Filtered by a length threshold (in frames).
Many short false positives and negatives.
[Screenshot labels: rhythm, spectrogram, pitches in frames]
Onset-based transcription:
Onsets impose a segmentation.
Notes can change only at onsets.
Times are still physical.
Transcription is much more accurate.
Interaction with onsets affects the transcription: a false negative is corrected by the user, the transcription is recomputed with the new onset, and the changes are propagated. This correction also solves other false negatives (FN).
[Screenshot labels: spectrogram, pitches, onsets]
Pulse-based transcription:
Beat, tempo and meter are derived from pulses.
Transcription is driven by them using a division of the beat.
Times are now musical.
Transcription is score-oriented.
Harmonic analysis (chord segmentation) is provided.
[Screenshot labels: spectrogram, notes, pulses]
More information:
At http://miprcv.iti.upv.es/ a video screencast and an on-line demo are available.
Warning:
This project is at a very early stage, so many functionalities are still not implemented and it is far from bug-free.
Acknowledgements:
This work is supported by the Consolider Ingenio 2010
research programme (project MIPRCV, CSD2007-00018),
the project DRIMS (TIN2009-14247-C02), and the
PASCAL2 Network of Excellence (IST-2007-216886).
The authors want to thank the people involved in this project, especially those who do not appear as
authors of this paper, such as Carlos Pérez-Sancho, David Rizo,
Javier Sober, José Bernabeu, and Gabriel Meseguer.