
PSDVec: A toolbox for incremental and scalable word embedding


Authors: S. Li, J. Zhu, and C. Miao
Title: PSDVec: A toolbox for incremental and scalable word embedding
Abstract: PSDVec is a Python/Perl toolbox that learns word embeddings, i.e., the mapping of words in a natural language to continuous vectors that encode the semantic/syntactic regularities between the words. PSDVec implements a word embedding learning method based on a weighted low-rank positive semi-definite approximation. To scale up the learning process, we implement a blockwise online learning algorithm to learn the embeddings incrementally. This strategy greatly reduces the learning time of word embeddings on a large vocabulary, and allows the embeddings of new words to be learned without re-learning the whole vocabulary. On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing (NLP) tasks, PSDVec produces embeddings that have the best average performance among popular word embedding tools. PSDVec provides a new option for NLP practitioners.
Keywords: Word embedding; Matrix factorization; Incremental learning
Journal Name: Neurocomputing, vol. 237
Publisher: Elsevier
Year: 2017
Accepted PDF File: PSDVec_A_toolbox_for_incremental_and_scalable_word_embedding_accepted.pdf
Permanent Link: https://dx.doi.org/10.1016/j.neucom.2016.05.093
Reference: S. Li, J. Zhu, and C. Miao, “PSDVec: A toolbox for incremental and scalable word embedding,” Neurocomputing, vol. 237, pp. 405–409, May 2017.
BibTeX:
@article{LILY-j49,
    author    = {Li, Shaohua and Zhu, Jun and Miao, Chunyan},
    title     = {{PSDVec}: A toolbox for incremental and scalable word embedding},
    journal   = {Neurocomputing},
    year      = {2017},
    month     = {May},
    volume    = {237},
    pages     = {405--409},
    publisher = {Elsevier},
}
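
Illustrative sketch: the abstract describes two algorithmic ideas, a weighted low-rank positive semi-definite (PSD) approximation of a matrix of word co-occurrence statistics, and a blockwise online step that embeds new words against a fixed core vocabulary. The Python sketch below illustrates only the unweighted special case of each idea; it assumes a PMI-style input matrix, all function and variable names are hypothetical (not the toolbox's actual API), and the paper's full method additionally weights the approximation, which this sketch omits.

import numpy as np

def psd_lowrank_embeddings(pmi, rank):
    """Factor a symmetric PMI-style matrix as G ~= V.T @ V, so that each
    column of V (shape rank x vocab) is a word embedding. In the
    unweighted case, the optimal PSD approximation comes from the top
    eigenpairs."""
    G = (pmi + pmi.T) / 2.0                      # guard against numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(G)         # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]       # indices of the largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)       # keep only the PSD part
    return np.sqrt(lam)[:, None] * eigvecs[:, top].T

def embed_new_words(V_core, pmi_new_core):
    """Blockwise/incremental step: given fixed core embeddings V_core
    (rank x n_core) and the PMI block between n_new new words and the
    core words (n_new x n_core), fit pmi_new_core ~= V_new.T @ V_core by
    least squares, without re-learning the core vocabulary."""
    V_new, *_ = np.linalg.lstsq(V_core.T, pmi_new_core.T, rcond=None)
    return V_new                                 # rank x n_new

Under these assumptions, one would call psd_lowrank_embeddings once on the core vocabulary's PMI matrix, then call embed_new_words as blocks of new vocabulary arrive, mirroring the incremental strategy the abstract describes in which new words are embedded without re-learning the whole vocabulary.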