Incremental fuzzy clustering for document categorization

Authors: J.-P. Mei, Y. Wang, L. Chen, and C. Miao
Abstract: Incremental clustering has been proposed to handle large datasets that cannot fit entirely into memory. Single-pass fuzzy c-means (SpFCM) and online fuzzy c-means (OFCM) are two representative incremental fuzzy clustering methods. Both extend the scalability of fuzzy c-means (FCM) by processing the dataset chunk by chunk. However, because document data is sparse and high-dimensional, SpFCM and OFCM fail to produce reasonable results on it. In this study, we work on clustering approaches that address both the large-scale and the high-dimensionality issues. Specifically, we propose two methods for incremental clustering of document data. The first method modifies the existing FCM-based incremental clustering by adding a step that normalizes the centroids in each iteration, while the second method is incremental clustering, i.e., single-pass or online, with weighted fuzzy co-clustering. We use several benchmark document datasets for the experimental study. The experimental results show that the proposed approaches achieve significant improvements over the existing SpFCM and OFCM in document clustering.
Keywords: Document handling; Fuzzy set theory; Pattern clustering
Conference Name: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’14)
Location: Beijing, China
Publisher: IEEE
Year: 2014
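
Illustrative sketch (not taken from the paper): the abstract's first method, single-pass FCM with a centroid normalization step in each iteration, might look roughly like the Python below. Everything here is an assumption made for illustration: the function names (normalize, weighted_fcm_chunk, single_pass), the use of 1 - cosine similarity as the dissimilarity, the fuzzifier m=2.0, the fixed iteration count, and the SpFCM-style carry-over of centroids as weighted points. The authors' actual formulation may differ.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors stay unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def weighted_fcm_chunk(points, weights, centroids, m=2.0, iters=20, eps=1e-9):
    """Weighted FCM on one chunk, renormalizing the centroids every iteration.

    Assumption: dissimilarity is 1 - cosine similarity on unit vectors.
    """
    points = [normalize(p) for p in points]
    centroids = [normalize(c) for c in centroids]
    memberships = []
    for _ in range(iters):
        # Membership update: u[i][k] is proportional to d_ik ** (-1 / (m - 1)).
        memberships = []
        for p in points:
            d = [max(1.0 - sum(a * b for a, b in zip(p, c)), eps)
                 for c in centroids]
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Centroid update: weighted fuzzy mean, then scaling to unit length.
        new_centroids = []
        for k in range(len(centroids)):
            acc = [0.0] * len(points[0])
            for p, w, u in zip(points, weights, memberships):
                coef = w * (u[k] ** m)
                for j, x in enumerate(p):
                    acc[j] += coef * x
            new_centroids.append(normalize(acc))
        centroids = new_centroids
    return centroids, memberships


def single_pass(chunks, n_clusters, seed=0):
    """Process chunks in order, carrying centroids forward as weighted points."""
    rng = random.Random(seed)
    centroids = rng.sample(chunks[0], n_clusters)  # hypothetical initialization
    carried, carried_w = [], []
    for chunk in chunks:
        pts = carried + list(chunk)
        wts = carried_w + [1.0] * len(chunk)
        centroids, u = weighted_fcm_chunk(pts, wts, centroids)
        # Each centroid summarizes the data seen so far; its weight is the
        # total weighted membership assigned to its cluster.
        carried = centroids
        carried_w = [sum(w * ui[k] for w, ui in zip(wts, u))
                     for k in range(n_clusters)]
    return centroids
```

Calling single_pass(chunks, n_clusters) with chunks given as a list of lists of term-weight vectors (for example, TF-IDF rows) returns unit-length centroids. Normalizing after each update keeps the centroids on the same unit sphere as the normalized documents, which is one plausible reading of the normalization step described in the abstract.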
