Chinese Academy of Sciences Institutional Repositories Grid
Identifying language origin of named entity with multiple information sources

Document type: Journal article

Authors: You, Jia-Li (1); Chen, Yi-Ning (2); Chu, Min (2); Soong, Frank K. (2); Wang, Jin-Lin (1)
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication date: 2008-08-01
Volume: 16; Issue: 6; Pages: 1077-1086
ISSN: 1558-7916
Keywords: Language identification; Named entity; Web search
DOI: 10.1109/tasl.2008.2001110
Corresponding author: You, Jia-Li (youjiali@mails.gucas.ac.cn)
Abstract: To identify the language origin of a named entity, morphological information associated with its letter spelling, such as letter n-grams, is commonly employed. However, with this information alone, named entities with similar spellings but from different language origins are difficult to differentiate. In this paper, a measure of "popularity," in terms of the frequency or page count of the named entity in language-specific web search, is proposed for identifying its language origin. Morphological information, including letter or letter-chunk n-grams, is used to enhance the performance of language identification in conjunction with web-based page counts. Six languages, English, German, French, Portuguese, Chinese, and Japanese, are tested (Chinese and Japanese named entities are shown in their corresponding phonetic alphabets, i.e., Pinyin and Romaji). Experiments show that when classifying the four languages written in the Latin alphabet (English, German, French, and Portuguese), features from different information sources yield substantial improvements in classification accuracy over a letter 4-gram-based baseline system. The accuracy increases from 75.0% to 86.3%, a 45.2% relative error reduction.
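The relative error reduction quoted in the abstract can be checked from the two reported accuracies. The following is a minimal sketch, assuming the usual definition (the share of baseline errors that the improved system removes); the function name `relative_error_reduction` is illustrative and does not come from the paper.

```python
def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Fraction of the baseline system's errors removed by the improved system.

    Assumes the common definition: (baseline_err - improved_err) / baseline_err.
    """
    baseline_err = 1.0 - baseline_acc
    improved_err = 1.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


# Accuracies reported in the abstract: 75.0% (letter 4-gram baseline) and 86.3%.
rer = relative_error_reduction(0.750, 0.863)
print(f"{rer:.1%}")  # prints 45.2%
```

Under this definition the error rate falls from 25.0% to 13.7%, which matches the 45.2% figure stated in the abstract.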
WOS research areas: Acoustics; Engineering
WOS categories: Acoustics; Engineering, Electrical & Electronic
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:000258286800001
URI: http://www.irgrid.ac.cn/handle/1471x/2392725
Collection: University of Chinese Academy of Sciences
Author affiliations:
1. Chinese Acad Sci, Inst Acoust, Grad Sch, Beijing 100864, Peoples R China
2. Microsoft Res Asia, Beijing 100080, Peoples R China
Recommended citation:
GB/T 7714: You, Jia-Li, Chen, Yi-Ning, Chu, Min, et al. Identifying language origin of named entity with multiple information sources[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(6): 1077-1086.
APA: You, Jia-Li, Chen, Yi-Ning, Chu, Min, Soong, Frank K., & Wang, Jin-Lin. (2008). Identifying language origin of named entity with multiple information sources. IEEE Transactions on Audio, Speech, and Language Processing, 16(6), 1077-1086.
MLA: You, Jia-Li, et al. "Identifying language origin of named entity with multiple information sources." IEEE Transactions on Audio, Speech, and Language Processing 16.6 (2008): 1077-1086.

Ingest method: iSwitch harvesting

Source: University of Chinese Academy of Sciences
