Chinese Academy of Sciences Institutional Repositories Grid
Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers

Document type: Journal article

Authors: Fu, Tianjun (2); Abbasi, Ahmed (3); Zeng, Daniel (1); Chen, Hsinchun (4)
Journal: ACM TRANSACTIONS ON INFORMATION SYSTEMS
Publication date: 2012-11-01
Volume: 30, Issue: 4
Keywords: Algorithms; Experimentation; Design; Performance; Web crawlers; focused crawlers; sentiment analysis; opinion mining; classification; graph similarities; random walk path
Abstract: Despite the increased prevalence of sentiment-related information on the Web, there has been limited work on focused crawlers capable of effectively collecting not only topic-relevant but also sentiment-relevant content. In this article, we propose a novel focused crawler that incorporates topic and sentiment information as well as a graph-based tunneling mechanism for enhanced collection of opinion-rich Web content regarding a particular topic. The graph-based sentiment (GBS) crawler uses a text classifier that employs both topic and sentiment categorization modules to assess the relevance of candidate pages. This information is also used to label nodes in web graphs that are employed by the tunneling mechanism to improve collection recall. Experimental results on two test beds revealed that GBS was able to provide better precision and recall than seven comparison crawlers. Moreover, GBS was able to collect a large proportion of the relevant content after traversing far fewer pages than comparison methods. GBS outperformed comparison methods on various categories of Web pages in the test beds, including collection of blogs, Web forums, and social networking Web site content. Further analysis revealed that both the sentiment classification module and graph-based tunneling mechanism played an integral role in the overall effectiveness of the GBS crawler.
WOS subject headings: Science & Technology ; Technology
WOS category: Computer Science, Information Systems
WOS research area: Computer Science
WOS keywords: GRAPH EDIT DISTANCE ; WEB ; CLASSIFICATION ; NETWORKS
Indexed in: SCI
Language: English
WOS accession number: WOS:000312428900005
Source URL: [http://ir.ia.ac.cn/handle/173211/3598]
Collection: Institute of Automation_State Key Laboratory of Management and Control for Complex Systems_Advanced Control and Automation Team
Author affiliations:
1.Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
2.Google Inc, Mountain View, CA 94043 USA
3.Univ Virginia, Informat Technol Area, Charlottesville, VA 22904 USA
4.Univ Arizona, Dept Management Informat Syst, Tucson, AZ 85721 USA
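
The abstract describes a crawler that judges candidate pages on both topic and sentiment and uses tunneling to reach relevant pages that sit behind irrelevant ones. The block below is a minimal illustrative sketch of that general idea only, not the GBS crawler from the paper. Everything in it is an assumption made for illustration: the toy in-memory web `TOY_WEB`, the keyword sets, the `is_relevant` keyword check (standing in for the trained topic and sentiment classifiers), and the `max_tunnel` budget (a simple depth-limited stand-in for the graph-based tunneling mechanism).

```python
import heapq

# Hypothetical toy web: URL -> (page text, outgoing links). Entirely made up.
TOY_WEB = {
    "seed": ("great camera roundup", ["hub", "offtopic"]),
    "hub": ("links page", ["review"]),
    "review": ("i love this camera, terrible battery though", []),
    "offtopic": ("cooking recipes", ["recipes2"]),
    "recipes2": ("more cooking recipes", ["recipes3"]),
    "recipes3": ("even more recipes", []),
}

# Hypothetical keyword lists standing in for trained classifiers.
TOPIC_TERMS = {"camera"}
SENTIMENT_TERMS = {"love", "terrible", "great", "awful"}


def is_relevant(text):
    """Return True only if the page is on-topic AND carries opinion words."""
    words = set(text.replace(",", " ").split())
    return bool(words & TOPIC_TERMS) and bool(words & SENTIMENT_TERMS)


def crawl(seed, max_tunnel=1):
    """Best-first crawl that tolerates short runs of irrelevant pages.

    A page is expanded only if the run of consecutive irrelevant pages
    ending at it is at most max_tunnel long.
    """
    # Heap entries: (irrelevant streak of the parent, insertion order, url).
    # Links found via shorter irrelevant runs are visited first.
    frontier = [(0, 0, seed)]
    seen = {seed}
    visited, collected = [], []
    counter = 1
    while frontier:
        streak, _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        visited.append(url)
        if is_relevant(text):
            collected.append(url)
            streak = 0  # a relevant page resets the tunnel budget
        else:
            streak += 1
        if streak > max_tunnel:
            continue  # budget exhausted: do not follow this page's links
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (streak, counter, link))
                counter += 1
    return visited, collected
```

In this toy graph the opinion-bearing `review` page is only reachable through the irrelevant `hub` page, so `crawl("seed", max_tunnel=0)` collects just the seed, while `max_tunnel=1` also collects `review` and still abandons the off-topic chain before reaching `recipes3`.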
Recommended citation
GB/T 7714
Fu, Tianjun,Abbasi, Ahmed,Zeng, Daniel,et al. Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers[J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS,2012,30(4).
APA
Fu, Tianjun,Abbasi, Ahmed,Zeng, Daniel,&Chen, Hsinchun.(2012).Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers.ACM TRANSACTIONS ON INFORMATION SYSTEMS,30(4).
MLA
Fu, Tianjun,et al."Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers".ACM TRANSACTIONS ON INFORMATION SYSTEMS 30.4(2012).

Ingest method: OAI harvesting

Source: Institute of Automation

