Chinese Academy of Sciences Institutional Repositories Grid
Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection

Document type: Journal article

Authors: Lin, Liwei [1,2]; Wang, Xiangdong [1]; Liu, Hong [1]; Qian, Yueliang [1]
Journal: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Publication date: 2020
Volume: 28; Pages: 1466-1478
Keywords: Sound event detection (SED); machine learning; weakly-supervised learning; attention pooling
ISSN: 2329-9290
DOI: 10.1109/TASLP.2020.2989575
Abstract: In this article, a special decision surface for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED are proposed. We approach SED as a multiple instance learning (MIL) problem and use a neural network framework with a pooling module to solve it. General MIL approaches are of two kinds: instance-level approaches and embedding-level approaches. Embedding-level approaches tend to perform better than instance-level approaches in bag-level classification, but in their current form they cannot provide instance-level probabilities; we present a method of generating instance-level probabilities for them. Moreover, we propose a specialized decision surface (SDS) for embedding-level attention pooling. We analyze and explain, from the perspective of the high-level feature space, why an embedding-level attention module with SDS is better than other typical pooling modules. To address the unbalanced dataset and the co-occurrence of multiple categories in the polyphonic event detection task, we propose a DF to reduce interference among categories. The DF optimizes the high-level feature space by disentangling it based on class-wise identifiable information, yielding multiple different subspaces. Experiments on the DCASE 2018 Task 4 dataset show that the proposed SDS and DF significantly improve the detection performance of the embedding-level MIL approach with an attention pooling module and outperform the first-place system in the challenge by 6.6 percentage points.
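Funding: Beijing Natural Science Foundation [4172058]
WOS research areas: Acoustics; Engineering
Language: English
WOS accession number: WOS:000538078300003
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Source URL: http://119.78.100.204/handle/2XEOYT63/15262
Collection: Institute of Computing Technology, Chinese Academy of Sciences, journal articles (English)
Corresponding author: Wang, Xiangdong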
Author affiliations:
1. Chinese Acad Sci, Beijing Key Lab Mobile Comp & Pervas Device, Inst Comp Technol, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714
Lin, Liwei,Wang, Xiangdong,Liu, Hong,et al. Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,2020,28:1466-1478.
APA Lin, Liwei,Wang, Xiangdong,Liu, Hong,&Qian, Yueliang.(2020).Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection.IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,28,1466-1478.
MLA Lin, Liwei,et al."Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection".IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 28(2020):1466-1478.

Ingest method: OAI harvesting

Source: Institute of Computing Technology
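
Illustrative note: the abstract refers to embedding-level MIL with an attention pooling module. The following is a minimal Python sketch of generic embedding-level attention pooling for a single event class, written only to make that term concrete. It is not the authors' implementation and it does not include the SDS or the DF. The function name `attention_pool` and the parameters `w_att`, `w_cls`, and `b_cls` are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(frames, w_att, w_cls, b_cls):
    """Generic embedding-level attention pooling for one event class.

    frames : list of T frame-level feature vectors (the MIL instances)
    w_att  : attention weight vector (hypothetical parameter)
    w_cls  : bag-level classifier weights (hypothetical parameter)
    b_cls  : bag-level classifier bias (hypothetical parameter)

    Returns (clip_probability, attention_weights).
    """
    # One attention weight per frame, normalized over time.
    weights = softmax([dot(w_att, x) for x in frames])
    # Bag (clip-level) embedding: attention-weighted sum of frame features.
    dim = len(frames[0])
    bag = [sum(a * x[d] for a, x in zip(weights, frames)) for d in range(dim)]
    # Bag-level (clip-level) probability from the pooled embedding.
    clip_prob = sigmoid(dot(w_cls, bag) + b_cls)
    return clip_prob, weights


if __name__ == "__main__":
    # Two toy frames with 2-dimensional features (made-up values).
    frames = [[0.0, 0.0], [2.0, 0.0]]
    prob, weights = attention_pool(frames, [1.0, 0.0], [1.0, 0.0], 0.0)
    print(prob, weights)
```

In this sketch the classifier sees only the pooled clip embedding, which is why a plain embedding-level model yields a clip-level probability but no frame-level probabilities, the limitation the abstract mentions.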

