Chinese Academy of Sciences Institutional Repositories Grid
Research on Issues Related to Tag Recommender Systems for Weblogs (博客标签推荐系统相关问题的研究)

Document type: Thesis

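Author: Liu Yicen (刘一岑)
Degree: Doctor of Engineering
Defense date: 2010-05-30
Degree-granting institution: Graduate University of Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisor: Yang Qing (杨青)
Keywords: Tag Recommender System for Weblogs; Recommender System; Tagging System; Tag Feature; Recommendation Algorithms; System Incremental Update
Alternative title: Research on Tag Recommender System for Weblogs
Major: Pattern Recognition and Intelligent Systems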
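Chinese abstract (translated): With the rapid growth of user-generated content, more and more Web 2.0 sites use tags to manage what their users publish, and tag recommender systems have attracted wide research attention. A weblog is an emerging online medium whose main information is text, so its tag recommender system differs from traditional recommender systems. In recent years the rapid spread of microblogs has also raised the requirements on the timeliness and scale of tag recommendation. This thesis discusses the design and implementation of a real-time weblog tag recommender system for large-scale data. Starting from four practical problems of tag recommendation, it examines solutions for real-time recommendation, efficient tag recommendation, large-scale data processing, and incremental update, presents the overall architecture and design ideas of the system, and implements a prototype weblog tag recommender system. The main contributions are as follows:
1. A tag-feature-based recommendation algorithm for real-time weblog tag recommendation. Different types of weblog tags are defined according to whether a tag occurs in the post body and how it is assigned, and text features and grammatical features of tags are defined from the patterns with which the different tag types occur in the body. This reduces tag recommendation to a classification problem. A tag-feature-based recommendation algorithm is proposed, and real-time variants based on Naive Bayes classification and on logistic regression are implemented.
2. The neighborhood-based latent factor model (SVD++) is introduced into the weblog tag recommender system to improve recommendation performance. The thesis first defines positive and negative ratings of tags by weblog authors and uses them to build a user-tag matrix that describes users' historical tagging behavior. The SVD++ model from collaborative filtering is then used to obtain parameterized vector representations of weblog authors and tags. Finally, a tag recommendation algorithm that incorporates user tagging behavior is proposed, which improves the performance of the recommendation algorithm.
3. An algorithm architecture for tag recommendation on large-scale weblog data. Tags are first clustered with the Markov Cluster (MCL) algorithm. The tag cluster information is then propagated to the weblog data over the tag-weblog undirected graph, which clusters the posts. A tag recommendation algorithm is built on each data subset, so that recommending tags for a post is decomposed across the subsets, which addresses tag recommendation on large-scale data.
4. An incremental update algorithm for the system data set. The algorithm recursively expands the set of posts and tags affected by an update and updates their category information, which completes the incremental update of the data set. Incremental update and data set partitioning are placed within one algorithmic framework, which unifies the partitioning problem with the update problem and lets the system balance update cost against update quality.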
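Contribution 1 treats tag recommendation as classification: each candidate word in a post is described by features and classified as "tag" or "not a tag". The following is a minimal sketch of that idea with a Bernoulli Naive Bayes classifier, not the thesis implementation. The three binary features (`in title`, `repeated in body`, `appears early`), the function names, and the toy posts are all assumptions made for illustration; the abstract does not list the actual text and grammatical features. The sketch also only scores words that occur in the post itself.

```python
import math

def tag_features(word, title, body):
    """Hypothetical binary features of one candidate tag (not from the thesis)."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    return (
        word in title_words,                              # appears in the title
        body_words.count(word) >= 2,                      # repeated in the body
        word in body_words[: len(body_words) // 4 + 1],   # appears early in the body
    )

def candidates(title, body):
    """Every distinct word of the post is a candidate tag."""
    return sorted(set(title.lower().split()) | set(body.lower().split()))

def train_naive_bayes(posts):
    """posts: iterable of (title, body, set_of_author_tags)."""
    class_count = {True: 0, False: 0}
    feat_count = {True: [0, 0, 0], False: [0, 0, 0]}
    for title, body, tags in posts:
        for word in candidates(title, body):
            label = word in tags
            class_count[label] += 1
            for i, on in enumerate(tag_features(word, title, body)):
                feat_count[label][i] += on
    return class_count, feat_count

def tag_probability(model, feats):
    """P(is a tag | features) with add-one smoothing."""
    class_count, feat_count = model
    total = class_count[True] + class_count[False]
    log_post = {}
    for label in (True, False):
        lp = math.log((class_count[label] + 1) / (total + 2))
        for i, on in enumerate(feats):
            p_on = (feat_count[label][i] + 1) / (class_count[label] + 2)
            lp += math.log(p_on if on else 1.0 - p_on)
        log_post[label] = lp
    m = max(log_post.values())
    num = math.exp(log_post[True] - m)
    return num / (num + math.exp(log_post[False] - m))

def recommend(model, title, body, k=3):
    """Return the k candidate words with the highest tag probability."""
    scored = [(tag_probability(model, tag_features(w, title, body)), w)
              for w in candidates(title, body)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [w for _, w in scored[:k]]

# Toy training data, invented for this sketch.
training_posts = [
    ("python tips", "python makes scripting easy and python is fun", {"python"}),
    ("travel diary", "travel days are long but travel is worth it", {"travel"}),
]
model = train_naive_bayes(training_posts)
print(recommend(model, "pasta night", "pasta needs salt and pasta needs time", k=2))
```

Because scoring a post only needs counts collected at training time, a classifier of this kind can answer at posting time, which is presumably why the thesis pairs it with the real-time requirement.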
English abstract: With the rapid growth of user-created content, more and more Web 2.0 sites use tags to manage their content, and tag recommender systems have attracted increasing research attention. The weblog is a new online medium whose main content is text, so its tag recommender system differs from traditional recommender systems. In recent years, the popularity of microblogs such as Twitter has raised the requirements for real-time and large-scale tag recommendation. The thesis designs a real-time tag recommender system for a large-scale data set and focuses on four practical problems of tag recommendation: real-time recommendation, high performance, large-scale data processing, and incremental system update. Finally, a tag recommender system for weblogs is implemented. The major contributions of this thesis are as follows:
1. We propose a tag-feature-based recommendation algorithm for real-time tag recommendation. We define different types of tags, together with their text features and grammar features, according to how the tags occur in a blog post and how they are assigned, so that tag recommendation can be treated as a classification problem. A tag-feature-based real-time recommendation algorithm is then implemented with Naive Bayes classification and logistic regression.
2. We introduce the neighborhood-based latent factor model (SVD++) into the weblog tag recommender system to improve performance. We first define positive and negative evaluations of tags, which are used to construct an Author-Tag matrix. The SVD++ model is then applied to decompose the Author-Tag matrix, so that each weblog author and each tag is represented as a parameterized vector describing tagging behavior. Lastly, a weblog tag recommendation algorithm based on the SVD++ model is implemented to improve recommendation performance.
3. We propose a tag recommendation algorithm architecture for large-scale weblog data sets. The Markov Cluster (MCL) algorithm is first used to cluster the tags. The categories of the tags are then propagated to the blog posts over the Tag-Weblog undirected graph, so that the weblog data set is partitioned into several data subsets. Finally, a recommendation algorithm is implemented on each data subset, so that tag recommendation is solved within a subset.
4. We propose an incremental update algorithm for the system data set. ...
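The partitioning step in contribution 3 can be sketched as follows, under several assumptions that are not in the abstract: the tag graph is built from tag co-occurrence in posts, MCL is run with self-loops and an inflation parameter of 2, and cluster labels are propagated to a post by a simple majority vote over its tags (a one-step simplification of propagation on the Tag-Weblog graph). All function names and the toy posts are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_matrix(posts, tags):
    """Weighted adjacency matrix: how often two tags label the same post."""
    index = {t: i for i, t in enumerate(tags)}
    adj = [[0.0] * len(tags) for _ in tags]
    for post_tags in posts.values():
        for a, b in combinations(sorted(set(post_tags)), 2):
            adj[index[a]][index[b]] += 1.0
            adj[index[b]][index[a]] += 1.0
    return adj

def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s:
            for i in range(n):
                m[i][j] /= s
    return m

def mcl(adj, inflation=2.0, iterations=50, eps=1e-6):
    """Markov Cluster algorithm: alternate expansion and inflation."""
    n = len(adj)
    # Add self-loops, then turn the graph into a random-walk matrix.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        # Expansion: two-step random walk (matrix squaring).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalize each column.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    clusters = []
    for i in range(n):
        # Each surviving row is an attractor; its nonzero columns form a cluster.
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members and members not in clusters:
            clusters.append(members)
    return clusters

def cluster_tags(posts):
    """Map every tag to a cluster id."""
    tags = sorted({t for ts in posts.values() for t in ts})
    tag_cluster = {}
    for cid, members in enumerate(mcl(cooccurrence_matrix(posts, tags))):
        for j in members:
            # A tag found in overlapping clusters keeps the first one.
            tag_cluster.setdefault(tags[j], cid)
    return tag_cluster

def assign_post(post_tags, tag_cluster):
    """Assign a post to the cluster that most of its tags belong to."""
    votes = Counter(tag_cluster[t] for t in post_tags if t in tag_cluster)
    return votes.most_common(1)[0][0] if votes else None

# Toy data, invented for this sketch.
posts = {
    "p1": ["python", "code"], "p2": ["code", "linux"], "p3": ["python", "linux"],
    "p4": ["travel", "photo"], "p5": ["photo", "food"], "p6": ["travel", "food"],
}
tag_cluster = cluster_tags(posts)
print(tag_cluster)
print(assign_post(["python", "travel", "code"], tag_cluster))
```

Once every post carries a cluster id, a separate recommender can be trained per subset, which is the decomposition the abstract describes. The dense pure-Python matrices here are only suitable for toy inputs; a large tag graph would need sparse matrices and pruning.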
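Language: Chinese
Other identifier: 200718014628053
Source URL: http://ir.ia.ac.cn/handle/173211/6264
Collection: Graduates_Doctoral Dissertations
Recommended citation
GB/T 7714
Liu Yicen (刘一岑). Research on Tag Recommender System for Weblogs (博客标签推荐系统相关问题的研究)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2010.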

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.