Investigation on Inter-Speaker Variability in The Feature Space
Presenter: 陳彥達
Reference
R. Haeb-Umbach, "Investigation on Inter-Speaker Variability in The Feature Space", ICASSP 99.
Outline
Introduction
A measure of inter-speaker variability
Vocal tract normalization
Cepstral mean and variance normalization
Introduction
Adaptation
Reduce mismatch by adapting feature vectors
or model parameters to the target
environment.
Introduction(2)
Normalization
Compute feature or model parameters that are
insensitive to undesired variations of the
speech signal.
Introduction(3)
Fisher discriminant analysis
Gives an early assessment of a feature set without
running recognition first
Based on the ratio of the feature variability due to
different phonemes to the variability due to
different speakers
A measure of
inter-speaker variability
Good feature vector space
Close together when belonging to the same
phoneme class
Separated from each other when belonging to
different phoneme classes
A measure of
inter-speaker variability(2)
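$x(t)$ : cepstral feature vectors
$\mu_{r,c} = \frac{1}{N_{r,c}} \sum_{x(t) \in r,\, \mathrm{ph}.c} x(t)$ : cepstral mean feature vector
$m_c = \frac{1}{N_c} \sum_{\{r \mid r \in \mathrm{ph}.c\}} \mu_{r,c}; \quad c = 1, \dots, C$ : class mean vector
$m = \frac{1}{N} \sum_{c=1}^{C} \sum_{\{r \mid r \in \mathrm{ph}.c\}} \mu_{r,c} = \frac{1}{N} \sum_{c=1}^{C} N_c m_c$ : total mean vector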
A measure of
inter-speaker variability(3)
$\mu_{r,c}$ : cepstral mean feature vector
$m_c$ : class mean vector
$m$ : total mean vector
$S_B = \sum_{c=1}^{C} \frac{N_c}{N} (m_c - m)(m_c - m)^T$ : between-class covariance matrix
$S_W = \frac{1}{N} \sum_{c=1}^{C} \sum_{\{r \mid r \in \mathrm{ph}.c\}} (\mu_{r,c} - m_c)(\mu_{r,c} - m_c)^T$ : within-class covariance matrix
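The between-class and within-class covariance matrices on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function name `scatter_matrices`, the dictionary layout (phoneme class mapped to one row per speaker mean), and the toy numbers are all assumptions made for the example.

```python
import numpy as np

def scatter_matrices(speaker_means):
    """Return (m, S_B, S_W) from per-speaker, per-class cepstral means.

    speaker_means: dict mapping a phoneme class to an array-like of shape
    (N_c, D), one row per speaker mean for that class (hypothetical layout).
    """
    data = {c: np.asarray(v, dtype=float) for c, v in speaker_means.items()}
    N = sum(len(v) for v in data.values())             # N = sum of N_c over classes
    m = sum(v.sum(axis=0) for v in data.values()) / N  # total mean vector
    D = m.shape[0]
    S_B = np.zeros((D, D))
    S_W = np.zeros((D, D))
    for v in data.values():
        N_c = len(v)
        m_c = v.mean(axis=0)                           # class mean vector
        d = (m_c - m)[:, None]
        S_B += (N_c / N) * (d @ d.T)                   # between-class term
        dev = v - m_c
        S_W += (dev.T @ dev) / N                       # within-class term
    return m, S_B, S_W

# Toy data: two phoneme classes, two speakers each, D = 2 (made-up numbers).
toy = {"a": [[0.0, 0.0], [2.0, 0.0]], "b": [[4.0, 4.0], [4.0, 6.0]]}
m, S_B, S_W = scatter_matrices(toy)
```

With these definitions, S_B + S_W equals the total covariance of all speaker means around m, which is a convenient sanity check on any implementation.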
A measure of
inter-speaker variability(4)
Fisher variate analysis
$\mathrm{tr}(S_W^{-1} S_B)$ = the sum of the eigenvalues of $S_W^{-1} S_B$
The radius of the scattering volume
Higher $\mathrm{tr}(S_W^{-1} S_B)$ → lower recognition error rate
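As a small sketch of the trace criterion on this slide, the snippet below evaluates the trace of the product of the inverse within-class matrix and the between-class matrix, and checks that it matches the sum of the eigenvalues. The helper name `fisher_trace` and the two example matrices are assumptions for illustration only.

```python
import numpy as np

def fisher_trace(S_W, S_B):
    """Trace of inv(S_W) @ S_B, computed without forming the inverse."""
    # Solving S_W X = S_B is numerically safer than inverting S_W.
    return float(np.trace(np.linalg.solve(S_W, S_B)))

# Made-up example matrices (within-class and between-class).
S_W = np.array([[0.5, 0.0], [0.0, 0.5]])
S_B = np.array([[2.25, 3.75], [3.75, 6.25]])

J = fisher_trace(S_W, S_B)
eigenvalues = np.linalg.eigvals(np.linalg.solve(S_W, S_B))
print(J, eigenvalues.real.sum())  # the two values agree up to rounding
```

The measure can therefore be compared across feature sets before any recognition experiment is run, as long as S_W is invertible.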
Vocal tract normalization
Reduce inter-speaker variability by a speaker-specific frequency warping
Differences in vocal tract length are compensated
for by a linear warping factor $k$
$f_{Hz}(k f_{mel}) = 700 \left( 10^{k f_{mel} / 2595} - 1 \right)$
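The warping formula on this slide can be tried out with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, and `hz_to_mel` is the standard inverse of the slide's formula for a warping factor of 1, which the slide itself does not show.

```python
import math

def hz_to_mel(f_hz):
    """Standard mel mapping (inverse of the slide's formula when k = 1)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def warped_mel_to_hz(f_mel, k=1.0):
    """Frequency in Hz for a mel frequency scaled by the warping factor k."""
    return 700.0 * (10.0 ** (k * f_mel / 2595.0) - 1.0)

mel = hz_to_mel(1000.0)
unwarped = warped_mel_to_hz(mel)        # k = 1 recovers the original frequency
stretched = warped_mel_to_hz(mel, 1.1)  # k > 1 maps to a higher frequency
compressed = warped_mel_to_hz(mel, 0.9) # k < 1 maps to a lower frequency
```

A factor of 1 leaves the frequency axis unchanged, so a speaker-specific k only has to capture the deviation from that baseline.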
Vocal tract normalization(2)
[Results for two speaker sets: 42 male + 42 female, and 42 male]
Vocal tract normalization(3)
Normalization on a per-sentence basis performs
better than normalization on a per-speaker basis
Cepstral mean and
variance normalization
$y_d(t) = \frac{x_d(t) - \bar{x}_d(t)}{\hat{\sigma}_d(t)}; \quad d = 1, \dots, D$
$x_d(t)$ : input cepstral feature
$\bar{x}_d(t)$ : estimate of the mean of the input cepstral feature
$\hat{\sigma}_d(t)$ : estimate of the standard deviation of the input cepstral feature
$y_d(t)$ : the mean and variance normalized feature
$D$ : number of features
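A minimal sketch of mean and variance normalization in NumPy is shown below. It assumes the mean and standard deviation are estimated once over a whole utterance; the slide writes both estimates as functions of time, so a running estimate may be what the paper actually uses. The function name `cmvn` and the `eps` guard are additions for the example.

```python
import numpy as np

def cmvn(x, eps=1e-8):
    """Normalize each cepstral dimension to zero mean and unit variance.

    x: array of shape (T, D), one row per frame, one column per feature.
    eps: small constant guarding against division by zero (an addition).
    """
    mean = x.mean(axis=0)   # per-dimension mean estimate over the utterance
    std = x.std(axis=0)     # per-dimension standard deviation estimate
    return (x - mean) / (std + eps)

# Synthetic features: 200 frames, 12 dimensions, arbitrary offset and scale.
rng = np.random.default_rng(0)
features = 3.0 + 2.0 * rng.standard_normal((200, 12))
normalized = cmvn(features)
```

After normalization every dimension has approximately zero mean and unit standard deviation, regardless of the offset and scale of the input.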
Cepstral mean and
variance normalization(2)
[Results for two speaker sets: 42 male + 42 female, and 42 male]