Authors: Mengjie Zhang, Xiaoying Gao, Weijun Lou
Source: GZipped PostScript (54kb); Adobe PDF (154kb)
This paper describes an approach to the use of citation links to improve the scientific paper classification performance. In this approach, we develop two refinement functions, a linear label refinement (LLR) and a probabilistic label refinement (PLR), to model the citation link structures of the scientific papers for refining the class labels of the documents obtained by the content-based Naive Bayes classification method. The approach with the two new refinement models is examined and compared with the content-based Naive Bayes method on a standard paper classification data set with increasing training set sizes. The results suggest that both refinement models can significantly improve the system performance over the content-based method for all the training set sizes and that PLR is better than LLR when the training examples are sufficient.
Keywords: Document Classification, Basian Networks, Citation Links