Works by Masataka Goto
Dr. Masataka Goto (photo taken from Goto's home page)
The National Institute of Advanced Industrial Science and
Technology (AIST)
Home page: http://staff.aist.go.jp/m.goto/
Presented by Beinan Li, Music Tech @ McGill, 2005-2-10
Content
Goto’s personal info
MIR / Music understanding
Speech Interface
Speech Completion
Speech Spotter
Interactive music system
Real-time Beat Tracking System for Musical Acoustic Signals
Real-time F0 Estimation of Melody and Bass Lines in Musical Audio Signals
SmartMusicKIOSK: Music Listening Station with Chorus-Search Function
A Distributed Cooperative System to Play MIDI Instruments
Interactive Performance of a Music-controlled CG Dancer
VirJa Session (A Virtual Jazz Session System)
Music database
Masataka Goto
A researcher at the National Institute of Advanced Industrial Science and Technology (AIST), a newly established Japanese public research organization (formed from 15 former institutes)
A researcher in the Precursory Research for Embryonic Science and Technology (PRESTO) program ("Information and Human Activity" research area) of the Japan Science and Technology Corporation (JST)
Doctoral degree from Waseda University, 1998.
Research interests.
Real-time Beat Tracking System (2001)
Can recognize a hierarchical beat structure (quarter-note, half-note, and measure levels) in real-world audio signals sampled from popular-music compact discs.
With or without drums
Time signature: 4/4; tempo is roughly constant
Using selected musical knowledge (heuristics)
Succeeded in 43 out of 45 songs
Real-time Beat Tracking System
Main issues in beat tracking from acoustic signals:
detecting beat-tracking cues in audio signals
interpreting the cues to infer the beat structure
dealing with the ambiguity of interpretation
Cues:
Onset times in different frequency ranges (a minimal onset-detection sketch follows this list)
Chord-change possibilities based on provisional time strips
Drum patterns for Bass/Snare drums
Quantitative rhythmic difficulty: Power transition
Multi-agent based hypothesis evaluation
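To make the first cue above concrete, here is a minimal sketch of detecting onset times separately in several frequency ranges. It uses per-band positive spectral flux with simple peak picking; the band edges, FFT parameters, and threshold are illustrative assumptions, not Goto's actual onset finders.

```python
import numpy as np

def band_onsets(x, sr, bands=((0, 125), (125, 500), (500, 2000), (2000, 8000)),
                n_fft=1024, hop=512, threshold=1.5):
    """Detect onset candidates separately in several frequency ranges.

    A simplified stand-in for the 'onset times in different frequency
    ranges' cue: per-band positive spectral flux, peak-picked against a
    moving-average threshold.  Band edges and the threshold factor are
    illustrative, not Goto's values.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + len(x) // hop
    x = np.pad(x, (0, n_fft))               # zero-pad so every frame is full length
    window = np.hanning(n_fft)

    spec = np.empty((n_frames, n_fft // 2 + 1))
    for t in range(n_frames):
        frame = x[t * hop : t * hop + n_fft] * window
        spec[t] = np.abs(np.fft.rfft(frame))

    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    onsets = {}
    for lo, hi in bands:
        idx = (freqs >= lo) & (freqs < hi)
        # positive spectral flux within this band only
        flux = np.maximum(np.diff(spec[:, idx], axis=0), 0.0).sum(axis=1)
        local_mean = np.convolve(flux, np.ones(10) / 10, mode="same")
        peaks = [t for t in range(1, len(flux) - 1)
                 if flux[t] > threshold * local_mean[t]
                 and flux[t] >= flux[t - 1] and flux[t] > flux[t + 1]]
        onsets[(lo, hi)] = [(t + 1) * hop / sr for t in peaks]  # onset times in seconds
    return onsets
```

The per-band onset lists could then feed an inter-onset-interval analysis like the one sketched after the musical-knowledge slide.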
Real-time Beat Tracking System
Chord-change possibility: estimated from the dominant frequencies, found as histogram peaks within each period of time (picture taken from Goto, 2001).
Selectively Used Musical Knowledge
Onset times:
(a-1) "A frequent inter-onset interval is likely to be the inter-beat interval." (see the sketch below)
(a-2) "Onset times tend to coincide with beat times (i.e., sounds are likely to occur on beats)."
Chord changes:
(b-1) "Chords are more likely to change on beat times than on other positions."
(b-2) "Chords are more likely to change on half-note times than on other positions of beat times."
(b-3) "Chords are more likely to change at the beginnings of measures than at other positions of half-note times."
Drum patterns (used to re-evaluate hypotheses):
(c-1) "The beginning of the input drum pattern indicates a half-note time."
(c-2) "The input drum pattern has the appropriate inter-beat interval."
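As a concrete reading of heuristic (a-1), the sketch below histograms inter-onset intervals and returns the most frequent interval within a plausible tempo range as the inter-beat-interval estimate. This is an illustrative simplification, not Goto's multi-agent hypothesis evaluation; the tempo range and bin width are assumptions.

```python
import numpy as np

def estimate_inter_beat_interval(onset_times, min_ibi=0.3, max_ibi=1.0, bin_width=0.01):
    """Heuristic (a-1): 'a frequent inter-onset interval is likely to be
    the inter-beat interval.'

    Builds a histogram of intervals between nearby onsets and returns the
    centre of the most populated bin inside [min_ibi, max_ibi] seconds
    (roughly 60-200 BPM).  All parameters are illustrative assumptions.
    """
    onsets = np.sort(np.asarray(onset_times, dtype=float))
    intervals = []
    for i, t in enumerate(onsets):
        # look at intervals to the next few onsets, not just the adjacent one,
        # so that missed onsets do not hide the true beat period
        for u in onsets[i + 1 : i + 5]:
            d = u - t
            if min_ibi <= d <= max_ibi:
                intervals.append(d)
    if not intervals:
        return None
    edges = np.arange(min_ibi, max_ibi + bin_width, bin_width)
    hist, edges = np.histogram(intervals, bins=edges)
    k = int(np.argmax(hist))
    return float((edges[k] + edges[k + 1]) / 2)  # inter-beat interval in seconds
```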
Real-time F0 Estimation of Melody and Bass Lines
(2004)
Music Scene Description based on subsymbolic
representation
Find a predominant harmonic structure instead of a single
fundamental frequency (within a restricted range).
Melody lines: by a voice or a single-tone mid-range
instrument; Bass lines: by a bass guitar or contrabass
Average detection rates: 88.4% for the melody line and 79.9% for the bass line
Real-time F0 Estimation of Melody and Bass Lines
Main problem:
Which F0s in the polyphonic mixture correspond to the melody / bass lines?
Unknown number of sound sources.
Select from several candidates.
Assumptions:
Melody / bass have a harmonic structure, regardless of the F0
Melody / bass correspond to the most predominant harmonic structure ("MPHS") within an intentionally limited frequency range
Melody / bass lines have temporally continuous F0 trajectories during a musical note
Real-time F0 Estimation of Melody and Bass Lines
Method:
Limit the frequency range:
Melody: middle- and high-frequency regions; bass: low-frequency region
Check whether an F0 candidate lies within the limited range.
Find the MPHS and its F0 (a simplified sketch follows this list):
View the observed frequency components as a weighted mixture of all possible harmonic-structure tone models, without assuming the number of sound sources.
Deal with ambiguity:
Consider candidates' temporal continuity and select the most dominant and stable F0 trajectory.
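Goto's method estimates the MPHS by modelling the observed spectrum as a weighted mixture of harmonic-structure tone models. The sketch below substitutes a much simpler harmonic-summation salience restricted to a frequency range, only to illustrate the idea of "most predominant harmonic structure within a limited range"; the 1/h harmonic weighting and the 1 Hz candidate grid are assumptions, not part of his system.

```python
import numpy as np

def predominant_f0(spectrum, freqs, f0_min, f0_max, n_harmonics=8):
    """Pick the F0 whose harmonic structure is most predominant inside a
    restricted range (e.g. a bass range or a melody range).

    `spectrum` is a magnitude spectrum and `freqs` its (uniform) frequency
    axis in Hz.  This is a plain harmonic-summation salience, used here only
    as a simplified stand-in for Goto's weighted mixture of tone models.
    """
    bin_width = freqs[1] - freqs[0]
    candidates = np.arange(f0_min, f0_max, 1.0)   # 1 Hz grid, for simplicity
    salience = np.zeros(len(candidates))
    for i, f0 in enumerate(candidates):
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 / bin_width))
            if k < len(spectrum):
                salience[i] += spectrum[k] / h    # lower harmonics weighted more
    return candidates[int(np.argmax(salience))], salience
```

Per-frame estimates would then be linked into a temporally continuous trajectory (the "most dominant and stable" selection above); that tracking step is omitted here.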
Music Listening Station with Chorus-Search Function (2004)
Music-playback interface for trial listening and general music selection / sampling.
Function for jumping to the chorus section (a simple repetition-finding sketch follows below).
Visualizing song structure.
(Picture taken from Goto’s home page)
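The chorus-search function relies on finding repeated sections. The sketch below is a crude stand-in for Goto's chorus-section detection: given a (frames x 12) chroma matrix, it simply returns the best-matching pair of fixed-length, non-overlapping segments. The fixed segment length is an assumption, and the real method analyses whole families of repeated sections rather than a single pair.

```python
import numpy as np

def most_repeated_segment(chroma, seg_len=80):
    """Find the pair of fixed-length segments that match each other best.

    `chroma` is a (frames x 12) matrix of chroma vectors.  Returns the two
    start frames and their similarity score.  A crude illustration of
    repetition finding, not Goto's actual chorus-detection method.
    """
    norm = np.linalg.norm(chroma, axis=1, keepdims=True)
    c = chroma / np.maximum(norm, 1e-9)           # cosine similarity via dot products
    n = len(c)
    best, best_pair = -np.inf, (0, 0)
    for i in range(0, n - seg_len):
        for j in range(i + seg_len, n - seg_len): # non-overlapping segments only
            score = float(np.sum(c[i:i + seg_len] * c[j:j + seg_len])) / seg_len
            if score > best:
                best, best_pair = score, (i, j)
    return best_pair, best
```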
Speech Completion (2002)
Helps the user recall uncertain phrases and saves labor
when the input phrase is long.
Based on the phenomenon that humans hesitate by lengthening a vowel (uttering a filled pause), e.g. "Er…"
Displays completion candidates that acoustically resemble the uttered fragment, for the user to choose from.
Filled pause: small fundamental-frequency (voice pitch) transitions and small spectral-envelope deformations (see the detection sketch below).
Vocabulary tree; HMM-based speech recognizer.
English with Japanese accent? (vowel -> consonant)
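A minimal sketch of the filled-pause cue described above, assuming a per-frame pitch track and spectral-envelope features are already available: a filled pause is flagged where both the F0 transition and the envelope deformation stay small for long enough. The thresholds, frame rate, and feature choice are illustrative, not those of Goto's detector.

```python
import numpy as np

def detect_filled_pauses(f0, envelopes, frame_rate=100,
                         min_dur=0.5, f0_tol=0.02, env_tol=0.1):
    """Flag stretches that look like a filled pause ('er...'): voiced frames
    where both the F0 transition and the spectral-envelope deformation stay
    small for at least `min_dur` seconds.

    `f0` is a per-frame pitch track in Hz (0 = unvoiced); `envelopes` is a
    (frames x D) matrix of spectral-envelope features (e.g. cepstra).
    Thresholds and features are illustrative assumptions.
    """
    f0 = np.asarray(f0, dtype=float)
    envelopes = np.asarray(envelopes, dtype=float)
    n = len(f0)
    stable = np.zeros(n, dtype=bool)
    for t in range(1, n):
        if f0[t] > 0 and f0[t - 1] > 0:
            small_f0_move = abs(f0[t] - f0[t - 1]) / f0[t - 1] < f0_tol
            small_env_move = np.linalg.norm(envelopes[t] - envelopes[t - 1]) < env_tol
            stable[t] = small_f0_move and small_env_move
    # keep only runs of stability long enough to be a lengthened vowel
    min_frames, pauses, run = int(min_dur * frame_rate), [], 0
    for t in range(n):
        run = run + 1 if stable[t] else 0
        if run == min_frames:
            pauses.append((t - run + 1) / frame_rate)   # start time in seconds
    return pauses
```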
Speech Spotter (2004)
Allows the user to enter voice commands into a speech recognizer in the midst of natural human-human conversation.
Based on filled-pause / high-pitch detection (voice cues; see the sketch below).
On-demand information system for assisting human-human conversation (e.g. a weather inquiry during a chat).
Music-playback system for enriching telephone conversation (i.e. a BGM jukebox).
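For the high-pitch cue, a minimal sketch: an utterance is treated as system-directed when its pitch is clearly above the speaker's recent conversational baseline. The median comparison and the 25% ratio are assumptions; the actual Speech Spotter combines this kind of cue with filled-pause detection.

```python
import numpy as np

def is_system_directed(segment_f0, baseline_f0_history, ratio=1.25):
    """High-pitch voice cue: treat an utterance as a command for the system
    when its pitch is clearly above the speaker's recent conversational
    baseline.

    `segment_f0` is the pitch track (Hz, 0 = unvoiced) of the candidate
    utterance; `baseline_f0_history` holds F0 values from the preceding
    conversation.  The 25% ratio is an illustrative assumption.
    """
    segment_f0 = np.asarray(segment_f0, dtype=float)
    baseline_f0_history = np.asarray(baseline_f0_history, dtype=float)
    voiced = segment_f0[segment_f0 > 0]
    baseline = baseline_f0_history[baseline_f0_history > 0]
    if len(voiced) == 0 or len(baseline) == 0:
        return False
    return float(np.median(voiced)) > ratio * float(np.median(baseline))
```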
A Distributed Cooperative System to Play
MIDI Instruments (2002)
Remote Music Control Protocol (RMCP)
Extension of MIDI: network transmission of symbolized multimedia information.
UDP / IP, client-server communication
Ethernet / Internet
Information sharing by broadcast, and time scheduling using time stamps (see the sketch below)
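The slide only names RMCP's ingredients (UDP/IP, broadcast, time stamps for scheduling), so the sketch below invents a JSON message layout and port number purely for illustration; it is not the real RMCP packet format.

```python
import json
import socket
import time

def broadcast_note_event(note, velocity, port=50007, delay=0.1):
    """Broadcast one time-stamped note-on event over UDP.

    Illustrates the ideas the slide attributes to RMCP (UDP/IP, broadcast,
    scheduling by time stamps); the JSON layout and the port are invented
    for this sketch and are NOT the real RMCP packet format.
    """
    event = {
        "type": "note_on",
        "note": note,                     # MIDI note number
        "velocity": velocity,
        "play_at": time.time() + delay,   # receivers schedule playback by this stamp
    }
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.sendto(json.dumps(event).encode("utf-8"), ("255.255.255.255", port))
    sock.close()
```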
Interactive Performance of a Music-controlled CG Dancer (1997)
A CG character to enhance communication among musicians in a real jam session via visual attention.
A successful CG dance depends on interactions between each musician and the CG character.
E.g. if the guitarist plays, the CG character does not move unless the drummer determines the motion timing.
VirJa Session (A Virtual Jazz Session System) (1999)
Enables distributed computer players to listen to other computer players' performances as well as the human player's performance, and to interact with each other.
Built on top of the previous two systems.
RWC Music Database (2002- )
RWC (Real World Computing) Music Database
Copyright-cleared
Common foundation for research. Benchmark.
Built by the RWC Music Database Sub-Working Group (Goto as
the chair) of the Real World Computing Partnership (RWCP) of
Japan.
World's first large-scale music database specifically for research
purposes.
Six original collections, 315 pieces with:
original audio signals
MIDI files,
text files of lyrics
individual sounds at half-tone intervals
variations of playing styles, dynamics, etc.
References
Goto's home page: http://staff.aist.go.jp/m.goto/
Masataka Goto: A Real-time Music-scene-description System: Predominant-F0 Estimation for Detecting Melody and Bass Lines in Real-world Audio Signals, Speech Communication (ISCA Journal), Vol. 43, No. 4, pp. 311-329, 2004.
Masataka Goto: An Audio-based Real-time Beat Tracking System for Music With or Without Drum-sounds, Journal of New Music Research, Vol. 30, No. 2, pp. 159-171, June 2001.