Chinese Academy of Sciences Institutional Repositories Grid (中国科学院机构知识库网格)
A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications

Document type: Journal article

Authors: Qiuzi Zhang; Qikai Cheng; Yong Huang; Wei Lu
Journal: Journal of Data and Information Science
Publication date: 2016-03-17
Volume: 9; Issue: 1; Pages: 69-85
Keywords: Data-usage statements extraction; Information extraction; Bootstrapping; Unsupervised learning; Academic text-mining
Corresponding author: Wei Lu (e-mail: weilu@whu.edu.cn)
Abstract
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts.

Design/methodology/approach: The method for extracting data-usage statements starts with seed entities and iteratively learns patterns and data-usage statements from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list according to their calculated scores. Three seed-selection strategies are also proposed in this paper.

Findings: The performance of the method is verified through experiments on real data collected from computer science journals. The results show that the method achieves satisfactory performance in terms of extraction precision and the extensibility of the obtained patterns.

Research limitations: While the triple representation of sentences is effective and efficient for extracting data-usage statements, it cannot handle complex sentences. Additional features that can address complex sentences should therefore be explored in future work.

Practical implications: Extracting data-usage statements is beneficial for data-repository construction and facilitates research on data-usage tracking, dataset-based scholar search, and dataset evaluation.

Originality/value: To the best of our knowledge, this paper is among the first to address the important task of automatically extracting data-usage statements from real data.
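The iterative loop described under design/methodology/approach can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the pattern form, the scoring function, or the seed-selection strategies, so everything below is an assumption made for illustration. Here a pattern is the two lowercase words preceding a capitalized token, a pattern's score is the share of its matches that are already-known entities, and the names `candidates`, `bootstrap`, `min_score`, and `SENTENCES` are hypothetical.

```python
from collections import defaultdict


def candidates(sentence, width=2):
    """Yield (pattern, entity) pairs from one sentence.

    Assumption: a pattern is the `width` words before a capitalized token,
    lowercased; the capitalized token is the candidate entity.
    """
    tokens = sentence.rstrip(".").split()
    for i in range(width, len(tokens)):
        if tokens[i][0].isupper():
            context = " ".join(t.lower() for t in tokens[i - width:i])
            yield context, tokens[i]


def bootstrap(sentences, seeds, iterations=2, min_score=0.5):
    """Grow patterns, entities, and matched statements from seed entities."""
    entities = set(seeds)
    patterns = set()
    statements = set()
    for _ in range(iterations):
        # Score each pattern: matches on known entities / all matches.
        hits = defaultdict(int)
        total = defaultdict(int)
        for sentence in sentences:
            for pattern, entity in candidates(sentence):
                total[pattern] += 1
                if entity in entities:
                    hits[pattern] += 1
        # Add patterns whose score reaches the (assumed) threshold.
        for pattern, count in total.items():
            if hits[pattern] and hits[pattern] / count >= min_score:
                patterns.add(pattern)
        # Apply accepted patterns to collect new entities and statements.
        for sentence in sentences:
            for pattern, entity in candidates(sentence):
                if pattern in patterns:
                    entities.add(entity)
                    statements.add(sentence)
    return patterns, entities, statements


# Toy, made-up corpus used only to exercise the sketch.
SENTENCES = [
    "We evaluate on the MNIST dataset.",
    "We evaluate on the CIFAR dataset.",
    "Experiments use the CIFAR images.",
    "Experiments use the Reuters corpus.",
    "The Method is described in Section 3.",
]

patterns, entities, statements = bootstrap(SENTENCES, {"MNIST"})
print(sorted(patterns))
print(sorted(entities))
```

Starting from a single seed, the first pass accepts the context shared with the seed, and the entity it uncovers lets a second context qualify on the next pass. A real system would need a stricter score and a stopping rule to limit the semantic drift that this kind of loop is prone to.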
Subject: Journalism and Communication; Library, Information and Documentation Science
Indexed in: Other
Original source: http://www.chinalibraries.net
Language: English
Date made public: 2016-03-29
Source URL: http://ir.las.ac.cn/handle/12502/8479
Collection: Documentation and Information Center / Journal of Data and Information Science / Journal of Data and Information Science-2016
Author affiliation: School of Information Management, Wuhan University, Wuhan 430072, China
Recommended citation
GB/T 7714: Qiuzi Zhang, Qikai Cheng, Yong Huang, et al. A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications[J]. Journal of Data and Information Science, 2016, 9(1): 69-85.
APA: Qiuzi Zhang, Qikai Cheng, Yong Huang, & Wei Lu. (2016). A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications. Journal of Data and Information Science, 9(1), 69-85.
MLA: Qiuzi Zhang, et al. "A Bootstrapping-based Method to Automatically Identify Data-usage Statements in Publications." Journal of Data and Information Science 9.1 (2016): 69-85.

Ingest method: OAI harvesting

Source: Documentation and Information Center


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.