Sparse Representation for Frequency Warping Based Voice Conversion


Authors: X. Tian, Z. Wu, S.-W. Lee, Q. H. Nguyen, E. S. Chng, and M. Dong
Title: Sparse Representation for Frequency Warping Based Voice Conversion
Abstract: This paper presents a sparse representation framework for weighted frequency warping based voice conversion. In this method, a frame-dependent warping function and the corresponding spectral residual vector are first calculated for each source-target spectrum pair. At runtime, a source spectrum is factorised as a linear combination of a set of source spectra in the training data. The linear combination weight matrix, which is constrained to be sparse, is used to interpolate the frame-dependent warping functions and spectral residual vectors. In this way, the proposed method not only avoids the statistical averaging caused by GMM but also preserves the high-resolution spectral details for high-quality converted speech. Experiments are conducted on the VOICES database. Both objective and subjective results confirm the effectiveness of the proposed method. In particular, the spectral distortion dropped from 5.55 dB with the conventional frequency warping approach to 5.0 dB with the proposed method. Compared with the state-of-the-art GMM-based conversion with global variance (GV) enhancement, our method was preferred 68.5% of the time in an AB preference test.
Keywords: Voice conversion; Frequency warping; Sparse representation; Exemplar; Residual compensation
Conference Name: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’15)
Location: Brisbane, Australia
Publisher: IEEE
Year: 2015
Accepted PDF File: Sparse_Representation_for_Frequency_Warping_Based_Voice_Conversion_accepted.pdf
Permanent Link: http://dx.doi.org/10.1109/ICASSP.2015.7178769
Reference: X. Tian, Z. Wu, S.-W. Lee, Q. H. Nguyen, E. S. Chng, and M. Dong, “Sparse representation for frequency warping based voice conversion,” in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’15). IEEE, April 2015, pp. 4235–4239.
bibtex: 
@inproceedings{LILY-c33, 
   author    = {Tian, Xiaohai and Wu, Zhizheng and Lee, Siu-Wa and Nguyen, Quy Hy and Chng, Eng Siong and Dong, Minghui},
   title     = {Sparse Representation for Frequency Warping Based Voice Conversion},
   booktitle = {Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'15)},
   year      = {2015},
   month     = {April},
   pages     = {4235--4239},
   location  = {Brisbane, Australia},
   publisher = {IEEE},
}
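
The runtime conversion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes non-negative magnitude spectra, a multiplicative-update solver for the sparse weights (the abstract does not name a solver), weights normalised to sum to one before interpolation, and warping functions stored as fractional source-bin positions. The names sparse_weights, convert_frame, warps, residuals and lam are hypothetical.

```python
import numpy as np

def sparse_weights(x, A, lam=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative sparse weights h such that x ~= A @ h.

    Assumed solver: multiplicative updates for a least-squares cost with an
    L1 penalty of strength lam. Requires non-negative x and A.
    """
    h = np.full(A.shape[1], 1.0 / A.shape[1])
    Atx = A.T @ x
    AtA = A.T @ A
    for _ in range(n_iter):
        h *= Atx / (AtA @ h + lam + eps)
    return h

def convert_frame(x, A, warps, residuals, lam=0.1):
    """Convert one source spectrum x of shape (D,).

    A:         (D, N) source training spectra (exemplars), one per column.
    warps:     (D, N) per-exemplar warping functions; entry [k, n] is the
               (fractional) source bin that output bin k reads from.
    residuals: (D, N) per-exemplar spectral residual vectors.
    """
    h = sparse_weights(x, A, lam)
    w = h / (h.sum() + 1e-12)          # assumption: normalise before interpolating
    warp = warps @ w                   # interpolated frame-dependent warping function
    resid = residuals @ w              # interpolated spectral residual
    bins = np.arange(len(x))
    warped = np.interp(warp, bins, x)  # apply the warp by linear interpolation
    return warped + resid              # residual compensation

# Toy usage with random exemplars, identity warps and a constant residual.
rng = np.random.default_rng(0)
D, N = 16, 8
A = rng.random((D, N)) + 0.1
warps = np.tile(np.arange(D, dtype=float)[:, None], (1, N))
residuals = np.full((D, N), 0.5)
x = A[:, 3].copy()
y = convert_frame(x, A, warps, residuals)
```

With identity warping functions and a constant residual, the output in this toy setup is simply the input spectrum shifted by that residual, which makes the sketch easy to sanity-check.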