Citation-based Text Classification

CS-TR-05-7

Authors: Minh Duc Cao, Xiaoying Gao and Mengjie Zhang
Source: GZipped PostScript (54kb); Adobe PDF (154kb)


Abstract: This paper describes several approaches to exploiting citation structure for scientific document classification. Three methods are developed: linear labelling update, probabilistic labelling update and neural networks. The results show that these methods significantly improve classification compared with using document contents alone. Although linear labelling update is a static model, it performs well, especially when few or no training documents are available. The probabilistic labelling update and neural network methods require training documents to train a Bayesian network and a neural network, respectively. Experiments show that they outperform linear labelling update when a sufficiently large training set is used.
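
Illustrative sketch (not taken from the report): the abstract does not state the update rule, so the following Python shows only one plausible reading of a linear labelling update. It assumes each document starts with per-class scores from a content-only classifier, and that each score is repeatedly blended with the mean score of the documents it is linked to by citation. The function names, the mixing weight `alpha` and the iteration count are all hypothetical choices made for this example.

```python
def linear_label_update(content_scores, citations, alpha=0.5, iterations=10):
    """Blend content-based class scores with the scores of cited/citing documents.

    content_scores: dict mapping document id -> list of per-class scores
                    (assumed to come from a content-only classifier).
    citations:      dict mapping document id -> list of linked document ids.
    alpha:          hypothetical weight given to the neighbours' mean score.
    """
    scores = {doc: list(s) for doc, s in content_scores.items()}
    for _ in range(iterations):
        updated = {}
        for doc, own in content_scores.items():
            # Ignore links to documents we have no scores for.
            neighbours = [n for n in citations.get(doc, []) if n in scores]
            if not neighbours:
                updated[doc] = list(own)
                continue
            mean = [
                sum(scores[n][c] for n in neighbours) / len(neighbours)
                for c in range(len(own))
            ]
            # Linear combination of the document's own evidence and its neighbours'.
            updated[doc] = [
                (1 - alpha) * own[c] + alpha * mean[c] for c in range(len(own))
            ]
        scores = updated
    return scores


def predict(scores):
    """Return the index of the highest-scoring class for each document."""
    return {doc: max(range(len(s)), key=s.__getitem__) for doc, s in scores.items()}


# Toy data: document "c" is borderline on content alone but is linked to two
# documents that clearly belong to class 0.
content_scores = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.45, 0.55]}
citations = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
updated_scores = linear_label_update(content_scores, citations)
```

Because this rule has no trained parameters, it needs no labelled documents beyond whatever produced the initial scores, which fits the abstract's remark that the linear method remains usable when few or no training documents are available.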

