Chinese Academy of Sciences Institutional Repositories Grid
Innovating web page classification through reducing noise

Document type: Journal article

Authors: Li, XL; Shi, ZZ
Journal: JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
Publication date: 2002
Volume: 17; Issue: 1; Pages: 9-17
Keywords: web page classification; similarity measure; classification algorithm without noise
ISSN: 1000-9000
Abstract: This paper presents a new method that eliminates noise in Web page classification. It first describes the presentation of a Web page based on HTML tags. Then, through a novel distance formula, it eliminates the noise in the similarity measure. After carefully analyzing Web pages, we design an algorithm that can distinguish related hyperlinks from noisy ones. We can utilize non-noisy hyperlinks to improve the performance of Web page classification (the CAWN algorithm). For any page, we can classify it through the text and category of neighbor pages related to the page. The experimental results show that our approach improved classification accuracy.
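WOS research area: Computer Science
Language: English
WOS accession number: WOS:000173631200002
Publisher: SCIENCE CHINA PRESS
Source URL: [http://119.78.100.204/handle/2XEOYT63/13541]
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding author: Li, XL
Author affiliations:
1. Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100080, Peoples R China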
Recommended citation:
GB/T 7714 Li, XL, Shi, ZZ. Innovating web page classification through reducing noise[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17(1): 9-17.
APA Li, XL, & Shi, ZZ. (2002). Innovating web page classification through reducing noise. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 17(1), 9-17.
MLA Li, XL, et al. "Innovating web page classification through reducing noise". JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 17.1 (2002): 9-17.

Ingest method: OAI harvesting

Source: Institute of Computing Technology

