
Spoofing Speech Detection using Temporal Convolutional Neural Network


Authors: X. Tian, X. Xiao, E. S. Chng, and H. Li
Title: Spoofing Speech Detection using Temporal Convolutional Neural Network
Abstract: Spoofing speech detection aims to differentiate spoofing speech from natural speech. Most previous work uses frame-based features. Although multiple frames or dynamic features are used to form a super-vector that represents temporal information, the time span covered by these features is not sufficient. Most systems fail to detect non-vocoder or unit-selection-based spoofing attacks. In this work, we propose a temporal convolutional neural network (CNN) based classifier for spoofing speech detection. The temporal CNN first convolves the feature trajectories with a set of filters, then extracts the maximum responses of these filters within a time window using a max-pooling layer. Because of the max-pooling, useful information can be extracted from a long temporal span without concatenating a large number of neighbouring frames, as is done in a feedforward deep neural network (DNN). Five types of features are employed to assess the performance of the proposed classifier. Experimental results on the ASVspoof 2015 corpus show that the temporal CNN based classifier is effective for synthetic speech detection. Specifically, the proposed method brings a significant performance boost for the detection of unit-selection-based spoofing speech.
Conference Name: 8th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA’16)
Location: Jeju, South Korea
Publisher: IEEE
Year: 2016
Accepted PDF File: Spoofing_Speech_Detection_using_Temporal_Convolutional_Neural_Network_accepted.pdf
Permanent Link: https://dx.doi.org/10.1109/APSIPA.2016.7820738
Reference: X. Tian, X. Xiao, E. S. Chng, and H. Li, “Spoofing speech detection using temporal convolutional neural network,” in Proceedings of the 8th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA’16). IEEE, December 2016, pp. 1–6.
bibtex: 
@inproceedings{LILY-c108,
    author    = {Tian, Xiaohai and Xiao, Xiong and Chng, Eng Siong and Li, Haizhou},
    title     = {Spoofing Speech Detection using Temporal Convolutional Neural Network},
    booktitle = {Proceedings of the 8th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA'16)},
    year      = {2016},
    month     = {December},
    pages     = {1--6},
    location  = {Jeju, South Korea},
    publisher = {IEEE},
}
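
The two operations named in the abstract, convolving feature trajectories with a set of filters and then max-pooling the responses within a time window, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy data, the kernel shape (a kernel spans `W` consecutive frames and all feature dimensions), the absence of bias and nonlinearity, and the non-overlapping pooling windows are all assumptions made for illustration.

```python
def temporal_conv(features, kernel):
    """Slide one kernel (W frames x D dims) along the time axis, valid mode.

    features: list of T frames, each a list of D feature values.
    kernel:   list of W rows, each a list of D weights (hypothetical layout).
    """
    width = len(kernel)
    responses = []
    for t in range(len(features) - width + 1):
        acc = 0.0
        for w in range(width):
            for x, k in zip(features[t + w], kernel[w]):
                acc += x * k
        responses.append(acc)
    return responses


def max_pool(responses, pool_size):
    """Keep the maximum response in each non-overlapping time window."""
    return [max(responses[i:i + pool_size])
            for i in range(0, len(responses), pool_size)]


def temporal_cnn_layer(features, kernels, pool_size):
    """Apply every kernel, then pool each response sequence over time."""
    return [max_pool(temporal_conv(features, k), pool_size) for k in kernels]


# Toy input: 6 frames of a 1-dimensional feature trajectory (made-up values).
frames = [[0.0], [1.0], [3.0], [2.0], [0.0], [5.0]]
# Two illustrative kernels, each spanning 2 frames.
kernels = [
    [[1.0], [-1.0]],   # responds to a drop between consecutive frames
    [[0.5], [0.5]],    # two-frame average
]
pooled = temporal_cnn_layer(frames, kernels, pool_size=2)
print(pooled)
```

In this sketch each pooled value summarizes `pool_size + W - 1` input frames, so widening the pooling window extends the temporal span without lengthening the input vector, which is the property the abstract attributes to max-pooling.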