Stacking More Linear Operations with Orthogonal Regularization to Learn Better
Document Type: Conference Paper
Authors | Xu WX (许伟翔)1,2 |
Publication Date | 2022-03 |
Conference Date | 2022-07 |
Conference Venue | Online conference |
Abstract | How to improve the generalization of CNN models has been a long-standing problem in the deep learning community. This paper presents a method that strengthens CNN models by stacking linear convolution operations during training while remaining parameter/FLOPs-free at runtime. We show that overparameterization with appropriate regularization can lead to a smooth optimization landscape that improves performance. Concretely, we propose to add a 1×1 convolutional layer before and after the original k×k convolutional layer, with no non-linear activations between them. In addition, Quasi-Orthogonal Regularization is proposed to maintain the added 1×1 filters as orthogonal matrices. After training, those two 1×1 layers can be fused into the original k×k layer without changing the original network architecture, leaving no extra computation at inference, i.e., the method is parameter/FLOPs-free. |
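The abstract describes a re-parameterization scheme: because no activation separates the three layers, the 1×1 → k×k → 1×1 stack is a single linear map and can be collapsed into one k×k kernel after training. Below is a minimal PyTorch sketch of that idea. The class name `StackedLinearConv`, the bias-free layers, and the soft-orthogonality penalty ‖WWᵀ − I‖²_F are illustrative assumptions; the paper's exact Quasi-Orthogonal Regularization is not specified in the abstract.

```python
import torch
import torch.nn as nn

class StackedLinearConv(nn.Module):
    """Training-time block: 1x1 -> kxk -> 1x1 with no activation in between,
    so the stack stays linear and can be fused into one kxk conv afterwards.
    A sketch based on the abstract, not the authors' implementation."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        # Bias-free layers keep the fusion arithmetic simple (an assumption).
        self.pre = nn.Conv2d(channels, channels, 1, bias=False)
        self.mid = nn.Conv2d(channels, channels, k, padding=k // 2, bias=False)
        self.post = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.post(self.mid(self.pre(x)))

    def orth_penalty(self) -> torch.Tensor:
        # Soft-orthogonality surrogate ||W W^T - I||_F^2 on the 1x1 filters,
        # added to the task loss with some weight lambda.
        loss = torch.zeros((), device=self.pre.weight.device)
        for conv in (self.pre, self.post):
            w = conv.weight.flatten(1)                 # [C, C]
            gram = w @ w.t()
            eye = torch.eye(w.size(0), device=w.device)
            loss = loss + ((gram - eye) ** 2).sum()
        return loss

    @torch.no_grad()
    def fuse(self) -> nn.Conv2d:
        # Collapse the three linear layers into one kxk kernel:
        # fused[o,i,h,w] = sum_{m,n} post[o,m] * mid[m,n,h,w] * pre[n,i]
        wa = self.pre.weight.squeeze(-1).squeeze(-1)   # [N, I]
        wk = self.mid.weight                           # [M, N, k, k]
        wb = self.post.weight.squeeze(-1).squeeze(-1)  # [O, M]
        fused = torch.einsum('om,mnhw,ni->oihw', wb, wk, wa)
        conv = nn.Conv2d(fused.size(1), fused.size(0), fused.size(-1),
                         padding=fused.size(-1) // 2, bias=False)
        conv.weight.copy_(fused)
        return conv
```

Sanity check: for `block = StackedLinearConv(64)` and a random input `x`, `torch.allclose(block(x), block.fuse()(x), atol=1e-5)` should hold, which is what makes the trained block cost-free at inference.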
Language | English |
Source URL | http://ir.ia.ac.cn/handle/173211/52091 |
Collection | Brain-inspired Chips and Systems Research |
Affiliations | 1. University of Chinese Academy of Sciences 2. Institute of Automation, Chinese Academy of Sciences |
Recommended Citation (GB/T 7714) | Xu WX, Cheng J. Stacking More Linear Operations with Orthogonal Regularization to Learn Better[C]. In: . Online conference. 2022-07. |
Deposit method: OAI harvesting
Source: Institute of Automation