Correlation-based Frequency Warping for Voice Conversion


Authors: X. Tian, Z. Wu, S. W. Lee, and E. S. Chng
Title: Correlation-based Frequency Warping for Voice Conversion
Abstract: Frequency warping (FW) based voice conversion aims to modify the frequency axis of source spectra towards that of the target. In previous work, the optimal warping function was calculated by minimizing the spectral distance between the converted and target spectra, without considering the spectral shape. However, speaker timbre and identity depend greatly on the vocal tract peaks and valleys of the spectrum. In this paper, we propose a method that defines the warping function by maximizing the correlation between the converted and target spectra. Unlike conventional warping methods, the correlation-based optimization is not determined by the magnitude of the spectra. Instead, both spectral peaks and valleys are considered in the optimization process, which also improves the performance of amplitude scaling. Experiments were conducted on the VOICES database, and the results show that, after amplitude scaling, the proposed method reduced the mel-spectral distortion from 5.85 dB to 5.60 dB. Subjective listening tests also confirmed the effectiveness of the proposed method.
Keywords: Speech synthesis; Voice conversion; Frequency warping; Correlation
Conference Name: 9th International Symposium on Chinese Spoken Language Processing (ISCSLP'14)
Location: Singapore, Singapore
Publisher: IEEE
Year: 2014
Accepted PDF File: Correlation-based_Frequency_Warping_for_Voice_Conversion_accepted.pdf
Permanent Link: http://dx.doi.org/10.1109/ISCSLP.2014.6936725
Reference: X. Tian, Z. Wu, S. W. Lee, and E. S. Chng, “Correlation-based frequency warping for voice conversion,” in Proceedings of the 9th International Symposium on Chinese Spoken Language Processing (ISCSLP’14). IEEE, September 2014, pp. 211–215.
BibTeX:
@inproceedings{LILY-c29,
   author    = {Tian, Xiaohai and Wu, Zhizheng and Lee, Siu Wa and Chng, Eng Siong},
   title     = {Correlation-based Frequency Warping for Voice Conversion},
   booktitle = {Proceedings of the 9th International Symposium on Chinese Spoken Language Processing (ISCSLP'14)},
   year      = {2014},
   month     = {September},
   pages     = {211--215},
   location  = {Singapore, Singapore},
   publisher = {IEEE},
}
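
Illustrative sketch: The abstract describes choosing a warping function by maximizing the correlation between the converted and target spectra. The Python code below is only a minimal illustration of that idea, not the authors' implementation. The one-parameter power-law warp, the grid search, and the names warp_spectrum, pearson, and best_warp are all assumptions made for this example; the paper's actual warping function and optimization procedure are not given on this page.

```python
import numpy as np


def warp_spectrum(spectrum, alpha):
    """Resample a spectrum along a warped, normalized frequency axis.

    Assumption: a hypothetical power-law warp f -> f**alpha, which keeps
    the endpoints 0 and 1 fixed. Output bin k takes the (linearly
    interpolated) source value at frequency freqs[k] ** alpha.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    freqs = np.linspace(0.0, 1.0, len(spectrum))
    return np.interp(freqs ** alpha, freqs, spectrum)


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if degenerate)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_warp(source, target, alphas):
    """Grid-search the warp parameter that maximizes the correlation
    between the warped source spectrum and the target spectrum."""
    scores = [pearson(warp_spectrum(source, a), target) for a in alphas]
    best = int(np.argmax(scores))
    return float(alphas[best]), scores[best]
```

Because Pearson correlation is unchanged when either input is shifted by a constant or multiplied by a positive factor, the selected warp in this sketch depends on where the spectral peaks and valleys lie rather than on the overall level of the spectra, which is consistent with the abstract's remark that the optimization is not determined by spectral magnitude. Any remaining level mismatch would then be left to a separate amplitude scaling step.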