|
Authors | ZHAO Yan; SHI Hui
|
Journal | Chinese Journal of Library and Information Science
 |
Publication date | 2012-12-25
|
Volume | 5; Issue: 4; Pages: 77-92 |
Keywords | Chinese-English mixed documents
String matching
Accuracy of automatic indexing
Cybernetics
Dedicated hepatitis B virus (HBV) database
|
ISSN | 1674-3393
|
Corresponding author | ZHAO Yan (e-mail: zhaoyan2000@shisu.edu.cn)
|
Abstract | Purpose: This paper presents a method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing documents. It consists of "feed-forward control", "in-progress control" and "feed-back control", which together aim to improve indexing accuracy. An experiment was conducted to evaluate the proposed method. Findings: The method distinguishes Chinese and English documents by grammatical structure and word-formation rules. Applying it in all three phases of automatic indexing of Chinese-English mixed documents raised precision from 88.54% to 97.10% and recall from 97.37% to 99.47%. Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching relies on a brute-force (BF) approach, indexing efficiency is reduced to some extent. Practical implications: The research has both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not only Chinese-English mixed documents). The proposed method will benefit the indexing of life science documents as well as documents in other subject areas. Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study provides insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents. |
Subject | Editing and publishing
|
Original source | http://www.chinalibraries.net
|
Date made public | 2012-12-11
|
Source URL | [http://ir.las.ac.cn/handle/12502/5628]  |
Collection | 文献情报中心 (National Science Library)_Journal of Data and Information Science_Chinese Journal of Library and Information Science-2012
|
Recommended citation GB/T 7714 |
ZHAO Yan, SHI Hui. A method for improving the accuracy of automatic indexing of Chinese-English mixed documents[J]. Chinese Journal of Library and Information Science, 2012, 5(4): 77-92.
|
APA |
ZHAO Yan, & SHI Hui. (2012). A method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Chinese Journal of Library and Information Science, 5(4), 77-92.
|
MLA |
ZHAO Yan, et al. "A method for improving the accuracy of automatic indexing of Chinese-English mixed documents." Chinese Journal of Library and Information Science 5.4 (2012): 77-92.
|