Chinese Academy of Sciences Institutional Repositories Grid
A fine-tuned multimodal large model for power defect image-text question-answering

Document type: Journal article

Authors: Wang, Qiqi (1); Zhang, Jie (2); Du, Jianming (2); Zhang, Ke (3); Li, Rui (2); Zhao, Feng (4); Zou, Le (1); Xie, Chengjun (2)
Journal: SIGNAL IMAGE AND VIDEO PROCESSING
Publication date: 2024-09-28
Keywords: Power system; Defect detection; Multimodal large model; LoRA; Q-former
ISSN: 1863-1703
DOI: 10.1007/s11760-024-03539-w
Corresponding authors: Zhang, Jie (zhangjie@iim.ac.cn); Xie, Chengjun (cjxie@iim.ac.cn)
Abstract: In power defect detection, complex scenes and diverse defect types make manual defect identification difficult. To address this, the paper proposes using a multimodal large model to help power professionals identify power scenes and defects through image-text interaction, thereby improving work efficiency. It presents a fine-tuned multimodal large model for power defect image-text question-answering that addresses challenges such as training difficulty and the lack of image-text knowledge specific to power defects. YOLOv8 is used to create a dataset for multimodal power defect detection, enriching the image-text information available in the power defect domain. By integrating the LoRA and Q-Former methods for fine-tuning, the approach enhances the extraction of visual and semantic features and aligns visual and semantic information. Experimental results show that the proposed model significantly outperforms other popular multimodal models on power defect question-answering.
Funding projects: Major Science and Technology Project of Anhui Province [202203a05020023]; Anhui Provincial Natural Science Foundation [2108085UD12]
WOS research areas: Engineering; Imaging Science & Photographic Technology
Language: English
WOS accession number: WOS:001320983900001
Publisher: SPRINGER LONDON LTD
Funding organizations: Major Science and Technology Project of Anhui Province; Anhui Provincial Natural Science Foundation
Source URL: http://ir.hfcas.ac.cn:8080/handle/334002/135622
Collection: Hefei Institutes of Physical Science, Chinese Academy of Sciences
Corresponding authors: Zhang, Jie; Xie, Chengjun
Author affiliations:
1. Hefei Univ, Sch Artificial Intelligence & Big Data, Hefei 230601, Peoples R China
2. Chinese Acad Sci, Inst Intelligent Machines, Hefei Inst Phys Sci, Hefei 230031, Peoples R China
3. Anhui Nari Jiyuan Power Grid Technol Co Ltd, Hefei 230088, Peoples R China
4. Univ Sci & Technol China, Hefei 230026, Peoples R China
Recommended citation:
GB/T 7714: Wang, Qiqi, Zhang, Jie, Du, Jianming, et al. A fine-tuned multimodal large model for power defect image-text question-answering[J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024.
APA: Wang, Qiqi., Zhang, Jie., Du, Jianming., Zhang, Ke., Li, Rui., ... & Xie, Chengjun. (2024). A fine-tuned multimodal large model for power defect image-text question-answering. SIGNAL IMAGE AND VIDEO PROCESSING.
MLA: Wang, Qiqi, et al. "A fine-tuned multimodal large model for power defect image-text question-answering". SIGNAL IMAGE AND VIDEO PROCESSING (2024).

Ingestion method: OAI harvesting

Source: Hefei Institutes of Physical Science

