Chinese Academy of Sciences Institutional Repositories Grid (中国科学院机构知识库网格)
Smoothing LDA Model for Text Categorization

Document type: Conference paper

Authors: Li Wenbo; Le Sun; Yuanyong Feng; Dakun Zhang
Publication date: 2008
Conference name: To be determined
Conference date: 39766
Conference location: Harbin, China
Keywords: Text Categorization; Latent Dirichlet Allocation; Smoothing; Graphical Model
Pages: 83-94
Abstract: Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing priors for the latent variables of the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
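Indexed in: EI, ISTP
Proceedings: Lecture Notes in Computer Science
Proceedings publisher: Science Press (科学出版社)
Subject: Solid Mechanics
Proceedings place of publication: Beijing
Language: English
ISSN: 1234-5678
Source URL: [http://124.16.136.157/handle/311060/808]
Collection: Institute of Software_National Engineering Research Center of Fundamental Software_Conference Papers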
Recommended citation
GB/T 7714
Li Wenbo, Le Sun, Yuanyong Feng, et al. Smoothing LDA Model for Text Categorization[C]. In: To be determined. Harbin, China. 39766.

Ingest method: OAI harvesting

Source: Institute of Software


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.