Chinese Academy of Sciences Institutional Repositories Grid
Juicer: scalable extraction for thread meta-information of web forum

Document type: Conference paper

Authors: Guo Yan; Wang Yu; Ding Guodong; Cao Donglin; Zhang Gang; Lv Yi
Publication date: 2009
Conference name: Pacific Asia Workshop on Intelligence and Security Informatics, PAISI 2009
Conference date: April 27,
Conference location: Bangkok, Thailand
Keywords: Mining
Pages: 143-148
Abstract: In Web forums, the thread meta-information contained in the list-of-thread of a board page provides fundamental data for further forum mining. This paper describes a complete system named Juicer, which was developed as a subsystem for an industrial application that involves forum mining. The task of Juicer is to extract thread meta-information from the board pages of a great many large-scale online Web forums, which requires scalable extraction with high accuracy and speed and minimal user effort for maintenance. Among the many existing approaches to information extraction, we could not find one that fully satisfies these requirements, so we present the simple, scalable extraction approach behind Juicer to achieve the goal. Juicer consists of four modules: template generation, specifying the labeling setting, automatic extraction, and label assignment. Both experiments and practice show that Juicer satisfies the requirements.
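Indexed by: EI, ACM
Proceedings: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Proceedings place of publication: Germany
Language: English
ISSN: 0302-9743
ISBN: 9783642013928
Source URL: [http://124.16.136.157/handle/311060/8478]
Collection: Institute of Software_Institute of Software Library_2009 Journal/Conference Papers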
Recommended citation
GB/T 7714
Guo Yan, Wang Yu, Ding Guodong, et al. juicer: scalable extraction for thread meta-information of web forum[C]. In: Pacific Asia Workshop on Intelligence and Security Informatics, PAISI 2009. Bangkok, Thailand. April 27,.

Ingest method: OAI harvesting

Source: Institute of Software
