Smoothing LDA Model for Text Categorization
Document type: Conference paper
| Authors | Li Wenbo ; Le Sun ; Yuanyong Feng ; Dakun Zhang |
| Publication date | 2008 |
| Conference name | To be determined |
| Conference date | 39766 |
| Conference location | Harbin, China |
| Keywords | Text Categorization ; Latent Dirichlet Allocation ; Smoothing ; Graphical Model |
| Pages | 83-94 |
| Abstract | Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing the latent variables' priors for the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora. |
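| Indexed by | EI, ISTP |
| Proceedings | Lecture Notes in Computer Science |
| Proceedings publisher | Science Press (科学出版社) |
| Subject | Solid mechanics |
| Place of publication | Beijing |
| Language | English |
| ISSN | 1234-5678 |
| Source URL | [http://124.16.136.157/handle/311060/808] |
| Collection | Institute of Software_National Engineering Research Center of Fundamental Software_Conference Papers |
| Recommended citation (GB/T 7714) | Li Wenbo, Le Sun, Yuanyong Feng, et al. Smoothing LDA Model for Text Categorization[C]. In: To be determined. Harbin, China. 39766. |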
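
The abstract names Laplacian smoothing and Jelinek-Mercer smoothing. As a rough illustration only, the sketch below shows the textbook forms of these two estimators on a plain unigram word distribution. It is not the paper's data-driven procedure, which allocates probability mass through LDA inference; the function names, the toy documents, and the parameters `alpha` and `lam` are all assumptions made for this example.

```python
from collections import Counter


def laplace_smoothing(counts, vocab, alpha=1.0):
    """Add-alpha estimate: (c(w) + alpha) / (N + alpha * |V|)."""
    total = sum(counts.get(w, 0) for w in vocab)
    denom = total + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}


def jelinek_mercer_smoothing(counts, background, lam=0.5):
    """Linear interpolation: (1 - lam) * p_ml(w) + lam * p_background(w)."""
    total = sum(counts.get(w, 0) for w in background)
    smoothed = {}
    for w, p_bg in background.items():
        # Maximum-likelihood estimate from the document itself.
        p_ml = counts.get(w, 0) / total if total else 0.0
        smoothed[w] = (1 - lam) * p_ml + lam * p_bg
    return smoothed


# Toy data (hypothetical): one short document and a small background corpus.
doc = "topic model topic word".split()
corpus = "topic model word text text category".split()

vocab = sorted(set(corpus))
doc_counts = Counter(doc)
corpus_counts = Counter(corpus)
background = {w: corpus_counts[w] / len(corpus) for w in vocab}

p_laplace = laplace_smoothing(doc_counts, vocab, alpha=1.0)
p_jm = jelinek_mercer_smoothing(doc_counts, background, lam=0.3)

# Words unseen in the document ("text", "category") now get nonzero mass.
for w in vocab:
    print(f"{w:10s} laplace={p_laplace[w]:.3f} jm={p_jm[w]:.3f}")
```

In both cases a word that never occurs in the document still receives some probability: Laplace smoothing spreads a uniform pseudo-count over the vocabulary, while Jelinek-Mercer borrows mass from a background distribution in proportion to `lam`.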
Ingest method: OAI harvesting
Source: Institute of Software
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.

