Word combination kernel for text categorization
Document type: Journal article
Authors | Zhang Lujiang; Hu Xiaohui; Qin Shiyin |
Journal | Journal of Digital Information Management
Publication date | 2012 |
Volume | 10, issue 3, pages 202-211 |
Keywords | Algorithms; Learning systems; Support vector machines |
ISSN | 0972-7272 |
Abstract | We propose a novel kernel for text categorization. This kernel is an inner product in the feature space generated by all word combinations of a specified length. A word combination is a collection of distinct words co-occurring in the same sentence, and a word combination of length k is weighted by the k-th root of the product of the inverse document frequencies (IDF) of its words. We also propose a computationally simple and efficient algorithm for calculating this kernel. Because the words of a combination must occur in the same sentence and combinations may contain multiple words, word combination features can capture similarity at a more specific level than single words. Because word order is discarded, word combination features accommodate the flexibility of natural language better, and the dimensionality of this kernel can be reduced significantly compared with the word-sequence kernel. We conducted a series of experiments on the Reuters-21578 and 20 Newsgroups datasets, on both of which this kernel consistently outperforms the classical word kernel and the word-sequence kernel. We also assessed the impact of word combination length on performance and compared the computational efficiency of this kernel with that of the word kernel and the word-sequence kernel. |
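The abstract defines the feature space only in words. The following is a minimal Python sketch of that definition, not the authors' method: it enumerates combinations explicitly instead of using the efficient algorithm the paper proposes. It assumes (1) documents are already split into sentences and tokens, (2) IDF is log(N / df), and (3) a combination's feature value is the number of sentences containing all of its words times its weight, (idf(w_1) × ... × idf(w_k))^(1/k), with no length normalization. The names `idf_table`, `combination_features`, and `word_combination_kernel` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List

# A document is a list of sentences; a sentence is a list of tokens.
# Sentence splitting and tokenization are assumed to have happened already.
Document = List[List[str]]


def idf_table(corpus: List[Document]) -> Dict[str, float]:
    """Assumed IDF definition: log(N / df), df = number of documents containing the word."""
    df: Counter = Counter()
    for doc in corpus:
        df.update({word for sentence in doc for word in sentence})
    return {word: math.log(len(corpus) / count) for word, count in df.items()}


def combination_features(doc: Document, k: int, idf: Dict[str, float]) -> Dict[FrozenSet[str], float]:
    """Sparse feature vector over length-k word combinations (hypothetical helper).

    Assumption: each combination's value is the number of sentences containing
    all of its words, times the k-th root of the product of the words' IDFs.
    """
    counts: Counter = Counter()
    for sentence in doc:
        distinct = sorted(set(sentence))  # distinct words only; original word order is discarded
        for combo in combinations(distinct, k):
            counts[frozenset(combo)] += 1
    features = {}
    for combo, count in counts.items():
        # Unseen words get IDF 0.0, which zeroes the weight of any combination containing them.
        weight = math.prod(idf.get(word, 0.0) for word in combo) ** (1.0 / k)
        features[combo] = count * weight
    return features


def word_combination_kernel(d1: Document, d2: Document, k: int, idf: Dict[str, float]) -> float:
    """Inner product of the two sparse feature vectors (hypothetical helper)."""
    f1 = combination_features(d1, k, idf)
    f2 = combination_features(d2, k, idf)
    if len(f1) > len(f2):  # iterate over the smaller vector
        f1, f2 = f2, f1
    return sum(value * f2.get(combo, 0.0) for combo, value in f1.items())


if __name__ == "__main__":
    d1 = [["stock", "prices", "rose"], ["markets", "rallied"]]
    d2 = [["prices", "rose", "sharply"], ["stock", "markets", "fell"]]
    idf = idf_table([d1, d2, [["the", "weather", "was", "cold"]]])
    # With k = 2 the only shared feature is {prices, rose}; {markets, stock}
    # is not a feature of d1 because those words fall in different sentences.
    print(word_combination_kernel(d1, d2, 2, idf))
```

The explicit enumeration produces C(n, k) combinations for a sentence with n distinct words, so this sketch is practical only for small k; the efficient algorithm described in the abstract is not reproduced here.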
Indexed in | EI |
Language | English |
Date made public | 2013-09-17 |
Source URL | [http://ir.iscas.ac.cn/handle/311060/15034] |
Collection | Institute of Software_Library of the Institute of Software_Journal Articles |
Recommended citation (GB/T 7714) | Zhang Lujiang, Hu Xiaohui, Qin Shiyin. Word combination kernel for text categorization[J]. Journal of Digital Information Management, 2012, 10(3): 202-211. |
APA | Zhang Lujiang, Hu Xiaohui, & Qin Shiyin. (2012). Word combination kernel for text categorization. Journal of Digital Information Management, 10(3), 202-211. |
MLA | Zhang Lujiang, et al. "Word combination kernel for text categorization." Journal of Digital Information Management 10.3 (2012): 202-211. |
Deposit method: OAI harvesting
Source: Institute of Software
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.