Rethinking Algorithm Design and Development
in Speech Processing
T. Stadelmann, Y. Wang, M. Smith, R. Ewerth, and B. Freisleben
Universities of Marburg and Hannover,
Germany
Problem statement
Eidetic Design
What to do if algorithms do not behave as expected?
•Reimplementation does not reach published results
•Adaptation to new data & problem does not work
•Implementation does not show what theory suggests
•Other disciplines naturally gain intuition via visualization
•But visualization is not enough – it is just one possible
transformation of the data that lets humans perceive meaning
through their natural abilities
How to select competing techniques and parameters?
•Effect of a particular choice on the whole process unclear
•Effect of specific parameter combination unknown
How to arrive at a promising hypothesis?
•Conceptualize a method like “know your data”
from data mining – for speech processing
•Create a methodology for making failures in complex
speech processing algorithms graspable by humans
Use intuition – but how?
Instead: recast algorithmic sub-results…
•…to the specific perceptual domain in which humans are
experts in intuitively grasping the context, the character
and the reasons of the issue at hand
•I.e., visualization, audibilization, “perceptualization”, …
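As a minimal sketch of what "audibilization" of an intermediate result could look like, the following maps a short vector of values to a sequence of tones and returns it as WAV data. This is not how WebVoice works; the function name `audibilize`, the pitch range, and the tone duration are assumptions chosen only for illustration.

```python
import io
import math
import struct
import wave

def audibilize(values, sample_rate=8000, seconds_per_value=0.05,
               low_hz=200.0, high_hz=800.0):
    """Map each value to a tone pitch and return the result as WAV bytes."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    samples = []
    phase = 0.0
    samples_per_value = int(sample_rate * seconds_per_value)
    for v in values:
        # Linear mapping from the value range to the pitch range (an assumption).
        freq = low_hz + (v - lo) / span * (high_hz - low_hz)
        for _ in range(samples_per_value):
            phase += 2.0 * math.pi * freq / sample_rate
            samples.append(int(0.5 * 32767 * math.sin(phase)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

# Toy "intermediate result": four values become four short tones.
wav_bytes = audibilize([0.1, 0.4, 0.9, 0.3])
```

Writing `wav_bytes` to a file ending in `.wav` would make the vector audible in any audio player; whether a pitch mapping is the suitable domain depends on the data at hand.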
Implement a culture of perceptually motivated
speech research [Hill, 2007]
•Motivate the use of intuition beyond visualization
•Facilitate its use by conceptualizing a workflow
•Enable the use intuition by providing free tools
Proposed workflow
1. existing algorithm/process
2. unexpected outcome => question/problem
3. generate data from intermediate results
4. find suitable domain
5. use transformation tool & intuition
[Workflow diagram. Labels: prerequisites; methodology; step 1, step 2, …, step n; data 1, data 2, …, data n; result; suitable domain]
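Step 3 of the workflow (generating data from intermediate results) might be sketched as follows. The helper `run_with_trace` and the three toy stages are hypothetical stand-ins, not part of the tools described here; the point is only that every sub-result is kept so it can later be transformed into a suitable perceptual domain.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]

def run_with_trace(steps: List[Step], data: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Run each step in order and keep every intermediate result."""
    trace = []
    for name, fn in steps:
        data = fn(data)
        trace.append((name, data))
    return data, trace

# Toy stand-ins for real processing stages (hypothetical, for illustration only).
steps = [
    ("frame", lambda x: [x[i:i + 2] for i in range(0, len(x), 2)]),
    ("energy", lambda frames: [sum(v * v for v in f) for f in frames]),
    ("threshold", lambda e: [val > 1.0 for val in e]),
]

result, trace = run_with_trace(steps, [0.1, 0.2, 1.0, 1.5, 0.0, 0.3])

# Each entry of `trace` is one "data i" item that can be inspected on its own.
for name, intermediate in trace:
    print(name, intermediate)
```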
Case study
Initial question: why does MFCC+GMM not work reliably for speaker
clustering whereas it does for speaker identification?
•Algorithm: MFCC extraction and GMM building algorithm
•Problem: techniques seem not expressive enough for the more difficult task
=> where is the bottleneck?
•Data: MFCC matrix, GMM parameter vectors
•Suitable domain: features and models originate from auditory domain
=> resynthesize to domain of auditory perception to hear if they include
what makes up a voice
Available tools
•WebVoice: resynthesize speech and speaker features and models
•PlotGMM: plot Gaussian mixture models
•Visit http://www.informatik.uni-marburg.de/~stadelmann/eidetic.html
Result: found bottleneck in missing time coherence information in GMM,
improved DER by 56% in an experiment with a prototype [Stadelmann et al. 2009]
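The general property behind this result can be illustrated with a small sketch: a GMM scores each frame independently, so its likelihood does not change when the frames are shuffled in time. The one-dimensional model parameters and the toy trajectory below are assumptions for illustration and do not reproduce the cited experiment.

```python
import math
import random

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of 1-D frames under a GMM; frames are treated as independent."""
    total = 0.0
    for x in frames:
        p = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(p)
    return total

# Hypothetical two-component model and a smooth toy feature trajectory.
weights, means, variances = [0.6, 0.4], [0.0, 3.0], [1.0, 0.5]
frames = [math.sin(0.3 * t) + 1.5 for t in range(50)]

# Destroy the temporal order while keeping the same set of frames.
shuffled = frames[:]
random.Random(0).shuffle(shuffled)

ll_ordered = gmm_log_likelihood(frames, weights, means, variances)
ll_shuffled = gmm_log_likelihood(shuffled, weights, means, variances)

# The two scores agree up to floating-point rounding.
print(math.isclose(ll_ordered, ll_shuffled, rel_tol=1e-9))
```

Because the score is a sum over frames, any information carried by the order of the frames is invisible to the model.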
ICPR‘2010 - 20th International Conference on Pattern Recognition, 23.-26. August, Istanbul, Turkey