A fine-tuned multimodal large model for power defect image-text question-answering
Document type: Journal article
Authors | Wang, Qiqi1; Zhang, Jie2; Du, Jianming2; Zhang, Ke3; Li, Rui2; Zhao, Feng4; Zou, Le1; Xie, Chengjun2 |
Journal | SIGNAL IMAGE AND VIDEO PROCESSING |
Publication date | 2024-09-28 |
Keywords | Power system; Defect detection; Multimodal large model; LoRA; Q-former |
ISSN | 1863-1703 |
DOI | 10.1007/s11760-024-03539-w |
Corresponding authors | Zhang, Jie (zhangjie@iim.ac.cn); Xie, Chengjun (cjxie@iim.ac.cn) |
Abstract | In power defect detection, the complexity of scenes and the diversity of defects pose challenges for manual defect identification. Considering these issues, this paper proposes utilizing a multimodal large model to assist power professionals in identifying power scenes and defects through image-text interactions, thereby enhancing work efficiency. This paper presents a fine-tuned multimodal large model for power defect image-text question-answering, addressing challenges such as training difficulties and the lack of image-text knowledge specific to power defects. This paper utilizes the YOLOv8 to create a dataset for multimodal power defect detection, enriching the image-text information in the power defect domain. By integrating the LoRA and Q-Former methods for model fine-tuning, the algorithm enhances the extraction of visual and semantic features and aligns visual and semantic information. The experimental results demonstrate that the proposed multimodal large model significantly outperforms other popular multimodal models in the domain of power defect question-answering. |
Funding projects | Major Science and Technology Project of Anhui Province [202203a05020023]; Anhui Provincial Natural Science Foundation [2108085UD12] |
WOS research areas | Engineering; Imaging Science & Photographic Technology |
Language | English |
WOS accession number | WOS:001320983900001 |
Publisher | SPRINGER LONDON LTD |
Funding organizations | Major Science and Technology Project of Anhui Province; Anhui Provincial Natural Science Foundation |
Source URL | [http://ir.hfcas.ac.cn:8080/handle/334002/135622] |
Collection | Hefei Institutes of Physical Science, Chinese Academy of Sciences |
Corresponding authors | Zhang, Jie; Xie, Chengjun |
Author affiliations | 1. Hefei Univ, Sch Artificial Intelligence & Big Data, Hefei 230601, Peoples R China; 2. Chinese Acad Sci, Inst Intelligent Machines, Hefei Inst Phys Sci, Hefei 230031, Peoples R China; 3. Anhui Nari Jiyuan Power Grid Technol Co Ltd, Hefei 230088, Peoples R China; 4. Univ Sci & Technol China, Hefei 230026, Peoples R China |
Recommended citation (GB/T 7714) | Wang, Qiqi, Zhang, Jie, Du, Jianming, et al. A fine-tuned multimodal large model for power defect image-text question-answering[J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024. |
APA | Wang, Qiqi, Zhang, Jie, Du, Jianming, Zhang, Ke, Li, Rui, ... & Xie, Chengjun. (2024). A fine-tuned multimodal large model for power defect image-text question-answering. SIGNAL IMAGE AND VIDEO PROCESSING. |
MLA | Wang, Qiqi, et al. "A fine-tuned multimodal large model for power defect image-text question-answering". SIGNAL IMAGE AND VIDEO PROCESSING (2024). |
Ingest method: OAI harvesting
Source: Hefei Institutes of Physical Science