Investigation on Inter-Speaker Variability in the Feature Space

Presenter: 陳彥達
Reference

R. Haeb-Umbach, “Investigation on Inter-Speaker Variability in the Feature Space,” in Proc. ICASSP 1999.
Outline




Introduction
A measure of inter-speaker variability
Vocal tract normalization
Cepstral mean and variance normalization
Introduction

Adaptation

Reduce mismatch by adapting feature vectors
or model parameters to the target
environment.

Introduction(2)
Normalization

Compute feature or model parameters that are
insensitive to undesired variations of the
speech signal.
Introduction(3)

Fisher discriminant analysis


Allows an early assessment of a feature set without first running recognition experiments
Measures the ratio of feature variability due to different phonemes to feature variability due to different speakers
A measure of inter-speaker variability

Good feature vector space


Feature vectors are close together when they belong to the same phoneme class
Feature vectors are well separated when they belong to different phoneme classes
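A measure of inter-speaker variability(2)
$x(t)$: cepstral feature vectors
Cepstral mean feature vector of speaker r in phoneme class c:
$$\mu_{r,c} = \frac{1}{N_{r,c}} \sum_{x(t) \in r,\ \mathrm{ph.}\,c} x(t)$$
Class mean vector:
$$m_c = \frac{1}{N_c} \sum_{\{r \mid r \in \mathrm{ph.}\,c\}} \mu_{r,c}, \quad c = 1, \ldots, C$$
Total mean vector:
$$m = \frac{1}{N} \sum_{c=1}^{C} \sum_{\{r \mid r \in \mathrm{ph.}\,c\}} \mu_{r,c} = \frac{1}{N} \sum_{c=1}^{C} N_c\, m_c$$
Here $N_{r,c}$ is the number of feature vectors of speaker r in class c, $N_c$ is the number of speakers observed in class c, and $N = \sum_{c=1}^{C} N_c$.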
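A minimal Python sketch of the three averages above, assuming each frame arrives as a (speaker, phoneme class, feature vector) triple and reading $N_c$ as the number of speakers observed in class c and $N$ as the sum of the $N_c$. The function name `class_means` and the input format are hypothetical; NumPy handles the vector arithmetic.

```python
import numpy as np
from collections import defaultdict

def class_means(frames):
    """frames: iterable of (speaker r, phoneme class c, feature vector x(t)).

    Returns
      mu:  dict (r, c) -> mean of speaker r's vectors in class c    (mu_{r,c})
      m_c: dict c -> average of mu_{r,c} over the speakers in c     (m_c)
      m:   total mean, the N_c-weighted average of the m_c         (m)
    """
    sums = defaultdict(lambda: 0.0)
    counts = defaultdict(int)  # N_{r,c}: frames of speaker r in class c
    for r, c, x in frames:
        sums[(r, c)] = sums[(r, c)] + np.asarray(x, dtype=float)
        counts[(r, c)] += 1
    mu = {key: sums[key] / counts[key] for key in sums}

    # Group the speaker-class means by phoneme class c.
    by_class = defaultdict(list)
    for (r, c), v in mu.items():
        by_class[c].append(v)
    n_c = {c: len(vs) for c, vs in by_class.items()}  # N_c: speakers in class c
    m_c = {c: np.mean(vs, axis=0) for c, vs in by_class.items()}

    n = sum(n_c.values())  # N = sum_c N_c
    m = sum(n_c[c] * m_c[c] for c in m_c) / n
    return mu, m_c, m
```

Because every speaker-class mean enters the sum once, the total mean is simply the plain average of all $\mu_{r,c}$, which is what the first form of the formula for $m$ says.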
A measure of inter-speaker variability(3)
$\mu_{r,c}$: cepstral mean feature vector
$m_c$: class mean vector
$m$: total mean vector
Between-class covariance matrix:
$$S_B = \sum_{c=1}^{C} \frac{N_c}{N} (m_c - m)(m_c - m)^T$$
Within-class covariance matrix:
$$S_W = \frac{1}{N} \sum_{c=1}^{C} \sum_{\{r \mid r \in \mathrm{ph.}\,c\}} (\mu_{r,c} - m_c)(\mu_{r,c} - m_c)^T$$
A measure of inter-speaker variability(4)

Fisher variate analysis

$\mathrm{tr}(S_W^{-1} S_B)$ = the sum of the eigenvalues of $S_W^{-1} S_B$
The radius of the scattering volume
Higher $\mathrm{tr}(S_W^{-1} S_B)$ → lower recognition error rate
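The criterion itself is a single trace. A sketch with NumPy, assuming $S_W$ is invertible (which requires enough speaker-class means relative to the feature dimension); `np.linalg.solve` is used instead of forming the inverse explicitly. The function names are hypothetical.

```python
import numpy as np

def fisher_criterion(s_b, s_w):
    """tr(S_W^{-1} S_B): phoneme (between-class) scatter relative to
    speaker (within-class) scatter. Solving S_W X = S_B avoids an explicit
    inverse; np.linalg.solve raises LinAlgError if S_W is singular."""
    return float(np.trace(np.linalg.solve(s_w, s_b)))

def fisher_eigenvalues(s_b, s_w):
    """Eigenvalues of S_W^{-1} S_B; their sum equals the trace above."""
    return np.linalg.eigvals(np.linalg.solve(s_w, s_b)).real
```

One property worth noting: under an invertible linear transform $A$ of the feature space, $S_W$ and $S_B$ become $A S_W A^T$ and $A S_B A^T$, and $S_W^{-1} S_B$ is replaced by a similar matrix, so the trace is unchanged. The measure therefore cannot be raised by merely rotating or rescaling the features.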
Vocal tract normalization
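Reduce inter-speaker variability by a speaker-specific frequency warping
Differences in vocal tract length are compensated for by a linear warping factor k (the mel frequency $f_{\mathrm{mel}}$ is scaled by k and then converted back to Hz)
$$f_{\mathrm{Hz}}(k f_{\mathrm{mel}}) = 700\left(10^{k f_{\mathrm{mel}}/2595} - 1\right)$$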
Vocal tract normalization(2)
[Figure panels: 42 male + 42 female speakers; 42 male speakers]
Vocal tract normalization(3)
Normalization on a per-sentence basis performs better than normalization on a per-speaker basis
Cepstral mean and variance normalization
$$y_d(t) = \frac{x_d(t) - \bar{x}_d(t)}{\hat{\sigma}_d(t)}, \quad d = 1, \ldots, D$$
$x_d(t)$: input cepstral feature
$\bar{x}_d(t)$: estimate of the mean of the input cepstral feature
$\hat{\sigma}_d(t)$: estimate of the standard deviation of the input cepstral feature
$y_d(t)$: the mean- and variance-normalized feature
$D$: number of features
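A minimal NumPy sketch of the normalization, applied to a (T, D) matrix of cepstral frames. The slide writes the mean and standard deviation estimates as functions of t but does not say how they are computed, so the sketch offers two hypothetical options: whole-utterance estimates (`half_window=None`) and a centered sliding window of ±`half_window` frames. The small `eps` guard against division by zero is also an addition of this sketch.

```python
import numpy as np

def cmvn(x, half_window=None, eps=1e-8):
    """Mean and variance normalization of a (T, D) cepstral feature matrix.

    y_d(t) = (x_d(t) - xbar_d(t)) / sigma_hat_d(t), d = 1..D.
    half_window=None: estimates over the whole utterance (constant in t).
    half_window=w:    estimates over frames t-w..t+w (a hypothetical
                      running estimate; the slide does not specify one).
    eps: hypothetical guard against a zero standard deviation.
    """
    x = np.asarray(x, dtype=float)
    if half_window is None:
        mean = x.mean(axis=0, keepdims=True)
        std = x.std(axis=0, keepdims=True)
        return (x - mean) / (std + eps)
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        seg = x[max(0, t - half_window): t + half_window + 1]
        y[t] = (x[t] - seg.mean(axis=0)) / (seg.std(axis=0) + eps)
    return y
```

With whole-utterance estimates each output dimension has zero mean and (up to `eps`) unit variance over the utterance; a window at least as long as the utterance gives the same result.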
Cepstral mean and variance normalization(2)
[Figure panels: 42 male + 42 female speakers; 42 male speakers]