Identifying language origin of named entity with multiple information sources
Document type: Journal article
Authors | You, Jia-Li (1); Chen, Yi-Ning (2); Chu, Min (2); Soong, Frank K. (2); Wang, Jin-Lin (1) |
Journal | IEEE Transactions on Audio, Speech, and Language Processing |
Publication date | 2008-08-01 |
Volume | 16 |
Issue | 6 |
Pages | 1077-1086 |
ISSN | 1558-7916 |
Keywords | Language identification; Named entity; Web search |
DOI | 10.1109/tasl.2008.2001110 |
Corresponding author | You, Jia-Li (youjiali@mails.gucas.ac.cn) |
Abstract | To identify the language origin of a named entity, morphological information associated with its letter spelling, such as letter n-grams, is commonly employed. However, with this information alone, named entities with similar spellings but from different language origins are difficult to differentiate. In this paper, a measure of "popularity," in terms of the frequency or page count of the named entity in language-specific Web search, is proposed for identifying its language origin. Morphological information, including letter or letter-chunk n-grams, is used to enhance the performance of language identification in conjunction with Web-based page counts. Six languages are tested: English, German, French, Portuguese, Chinese, and Japanese (Chinese and Japanese named entities are shown in their corresponding phonetic alphabets, i.e., Pinyin and romaji). Experiments show that when classifying the four languages written in the Latin alphabet (English, German, French, and Portuguese), features from different information sources yield substantial improvements in classification accuracy over a baseline system based on letter 4-grams: accuracy increases from 75.0% to 86.3%, a 45.2% relative error reduction. |
WoS research areas | Acoustics; Engineering |
WoS categories | Acoustics; Engineering, Electrical & Electronic |
Language | English |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
WoS accession number | WOS:000258286800001 |
URI | http://www.irgrid.ac.cn/handle/1471x/2392725 |
Collection | University of Chinese Academy of Sciences |
Author affiliations | 1. Chinese Acad Sci, Inst Acoust, Grad Sch, Beijing 100864, Peoples R China; 2. Microsoft Res Asia, Beijing 100080, Peoples R China |
Recommended citation (GB/T 7714) | You, Jia-Li, Chen, Yi-Ning, Chu, Min, et al. Identifying language origin of named entity with multiple information sources[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(6): 1077-1086. |
APA | You, J.-L., Chen, Y.-N., Chu, M., Soong, F. K., & Wang, J.-L. (2008). Identifying language origin of named entity with multiple information sources. IEEE Transactions on Audio, Speech, and Language Processing, 16(6), 1077-1086. |
MLA | You, Jia-Li, et al. "Identifying Language Origin of Named Entity with Multiple Information Sources." IEEE Transactions on Audio, Speech, and Language Processing 16.6 (2008): 1077-1086. |
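The abstract describes scoring a name by its letter n-grams and by how often it appears in language-specific Web search. The sketch below is a minimal illustration under stated assumptions, not the authors' system. The toy training names and page counts are made up, and no real search engine is queried. It uses letter 3-grams instead of the 4-grams of the paper's baseline because the toy data is tiny. The two features are combined with a hypothetical log-linear weight (`weight`). All helper names (`letter_ngrams`, `classify`, and so on) are illustrative only. The last function checks the reported arithmetic: relative error reduction is (baseline error minus new error) divided by baseline error.

```python
import math
from collections import Counter
from typing import Optional

# Toy, hypothetical training names; NOT data from the paper.
TOY_TRAINING = {
    "english": ["smith", "johnson", "williams", "brown", "taylor"],
    "german": ["schmidt", "schneider", "fischer", "weber", "wagner"],
}


def letter_ngrams(name: str, n: int = 3) -> list:
    """Letter n-grams of a lowercased name, padded with ^ (start) and $ (end)."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_ngram_models(training: dict, n: int = 3) -> dict:
    """Count letter n-grams separately for each language."""
    return {
        lang: Counter(g for name in names for g in letter_ngrams(name, n))
        for lang, names in training.items()
    }


def ngram_log_likelihood(name: str, counts: Counter, vocab_size: int, n: int = 3) -> float:
    """Add-one smoothed log-likelihood of the name's n-grams under one language."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size))
        for g in letter_ngrams(name, n)
    )


def popularity_log_share(page_counts: dict, lang: str, languages) -> float:
    """Log of this language's +1-smoothed share of all page counts (assumed feature)."""
    denom = sum(page_counts.get(other, 0) + 1 for other in languages)
    return math.log((page_counts.get(lang, 0) + 1) / denom)


def classify(name: str, models: dict, n: int = 3,
             page_counts: Optional[dict] = None, weight: float = 1.0) -> str:
    """Return the language with the highest combined score."""
    vocab = set().union(*(c.keys() for c in models.values()))
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen n-grams
    scores = {}
    for lang, counts in models.items():
        score = ngram_log_likelihood(name, counts, vocab_size, n)
        if page_counts is not None:
            # Hypothetical log-linear combination with one hand-set weight.
            score += weight * popularity_log_share(page_counts, lang, models.keys())
        scores[lang] = score
    return max(scores, key=scores.get)


def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """(baseline error - new error) / baseline error."""
    baseline_err = 1.0 - baseline_acc
    return (baseline_err - (1.0 - new_acc)) / baseline_err


if __name__ == "__main__":
    models = train_ngram_models(TOY_TRAINING)
    print(classify("schulz", models))  # spelling evidence only
    # Made-up page counts from two hypothetical language-specific searches.
    print(classify("schulz", models, page_counts={"english": 100000, "german": 10}))
    print(round(relative_error_reduction(0.750, 0.863), 3))
```

With these toy inputs, the spelling alone picks "german" because of the shared "^sc" and "sch" n-grams. The lopsided, invented page counts are strong enough to flip the decision to "english", which illustrates how a popularity feature can override spelling evidence. The last line prints 0.452, matching the abstract, since (25.0 - 13.7) / 25.0 = 0.452.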
Ingestion method: iSwitch harvesting
Source: University of Chinese Academy of Sciences
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.