Innovating web page classification through reducing noise
Document type: Journal article
Authors | Li, XL; Shi, ZZ |
Journal | JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
Publication date | 2002 |
Volume | 17 |
Issue | 1 |
Pages | 9-17 |
Keywords | web page classification; similarity measure; classification algorithm without noise |
ISSN | 1000-9000 |
Abstract | This paper presents a new method that eliminates noise in Web page classification. It first describes the representation of a Web page based on HTML tags. Then, through a novel distance formula, it eliminates noise in the similarity measure. After carefully analyzing Web pages, we design an algorithm that can distinguish related hyperlinks from noisy ones. We can then use the non-noisy hyperlinks to improve the performance of Web page classification (the CAWN algorithm). Any page can be classified through the text and categories of the neighbor pages related to it. The experimental results show that our approach improved classification accuracy. |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:000173631200002 |
Publisher | SCIENCE CHINA PRESS |
Source URL | [http://119.78.100.204/handle/2XEOYT63/13541] |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Li, XL |
Affiliations | 1. Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore; 2. Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100080, Peoples R China |
Recommended citation (GB/T 7714) | Li, XL, Shi, ZZ. Innovating web page classification through reducing noise[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17(1): 9-17. |
APA | Li, XL, & Shi, ZZ. (2002). Innovating web page classification through reducing noise. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 17(1), 9-17. |
MLA | Li, XL, and ZZ Shi. "Innovating web page classification through reducing noise." JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 17.1 (2002): 9-17. |
Ingest method: OAI harvesting
Source: Institute of Computing Technology
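The abstract describes the approach only at a high level. It does not give the paper's distance formula or the CAWN algorithm itself. The Python sketch below illustrates the general idea under stated assumptions: discard weakly related hyperlinks, then let the remaining neighbor pages vote on a page's category. It is not the authors' method. The bag-of-words cosine similarity, the `threshold` cutoff that marks a link as "noisy", the `alpha` weighting, and all function names are hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_noisy_links(page, neighbors, threshold=0.2):
    """Keep only linked pages whose text is similar enough to the page.

    Hypothetical rule: a hyperlink is treated as noisy (for example an ad or
    a navigation link) when the linked page's cosine similarity to the
    current page falls below `threshold`. Returns (category, similarity) pairs.
    """
    related = []
    for terms, category in neighbors:
        sim = cosine(page, terms)
        if sim >= threshold:
            related.append((category, sim))
    return related


def classify_with_neighbors(page, centroids, neighbors, threshold=0.2, alpha=0.5):
    """Blend the page's own text score with similarity-weighted neighbor votes.

    centroids maps category -> term profile; alpha is the (hypothetical)
    weight given to the page's own text versus its related neighbors.
    """
    text_score = {c: cosine(page, profile) for c, profile in centroids.items()}
    votes = Counter()
    for category, sim in filter_noisy_links(page, neighbors, threshold):
        votes[category] += sim
    total = sum(votes.values())

    def score(category):
        share = votes[category] / total if total > 0 else 0.0
        return alpha * text_score[category] + (1 - alpha) * share

    return max(centroids, key=score)


# Toy data, invented purely for illustration.
centroids = {
    "sports": Counter({"game": 3, "team": 2, "score": 1}),
    "finance": Counter({"stock": 3, "market": 2, "price": 1}),
}
page = Counter({"team": 1, "stock": 1})
neighbors = [
    (Counter({"team": 1}), "sports"),
    (Counter({"team": 1}), "sports"),
    (Counter({"stock": 1, "click": 5, "here": 5}), "finance"),  # weak link, filtered out
]
print(classify_with_neighbors(page, centroids, neighbors))             # sports
print(classify_with_neighbors(page, centroids, neighbors, alpha=1.0))  # finance (text only)
```

In this toy case, the page's own text leans slightly toward "finance". The two related neighbors pull it to "sports". The ad-like link (similarity about 0.1) is dropped before voting. Setting `alpha=1.0` ignores the neighbors entirely, which shows how much the hyperlink evidence contributes.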