Chinese Academy of Sciences Institutional Repositories Grid (中国科学院机构知识库网格)
基于多媒体数据的案件智能串并系统的研究与实现 (Research and Implementation of an Intelligent Case-Merging System Based on Multimedia Data)

Document type: Thesis

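Author: Wu Di (吴迪)
Degree type: Master of Engineering
Defense date: 2014-05-27
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisor: Li Ziqing (李子青)
Keywords: Case-Merging; LDA model; Text Modeling; Image Retrieval; Similarity (案件串并; LDA模型; 文本建模; 图像检索; 相似性)
Alternative title: The Research and Implementation of Intelligent Case Merging System Based on Multimedia Data
Degree discipline: Computer Technology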
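Chinese abstract (translated): As public security information systems have been built out, public security case databases have accumulated massive amounts of data, including text and images. Traditional case-merging systems can usually merge cases on text data alone and cannot analyze the latent key information in the data. How to use these different types of data to analyze the intrinsic links between cases, mine case data more deeply and accurately, and help public security staff quickly and efficiently find similar cases to merge in a massive case database is the problem this thesis addresses. The work covers three aspects. First, it describes in detail the key techniques used in case-merging systems, such as text preprocessing, text modeling, image retrieval, and topic recognition, and summarizes their strengths, weaknesses, and the research progress in related fields. Second, it introduces the LDA (Latent Dirichlet Allocation) topic model to the case-merging domain, modeling case text with LDA to mine the latent semantic information of cases and improve case-merging quality. Building on this with an image retrieval algorithm, it proposes a case-merging method that fuses LDA text information with image information to improve the accuracy of the merging results. Third, using the SharePoint and SQL Server 2013 development platform, it integrates these algorithms into a case-merging system and uses a web crawler to collect case data, both to validate the algorithms and to deliver an application for users. Experimental results on case data show that case merging with the LDA topic model outperforms the traditional bag-of-words method, reaching an accuracy of 72%, and that fusing in the image retrieval algorithm improves the result by 1%-4%, which shows that both the LDA topic model algorithm and the fusion algorithm are reasonable and effective for case merging. The completed system includes case statistics, case merging, and data crawling and storage, offers good interactivity, and has the elements of a complete system.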
English abstract: With the construction of public security information systems, public security databases have accumulated massive amounts of data, including text and images. Traditional case-merging systems are usually designed for text data only and cannot analyze the latent information in case data. Investigators need to find helpful data in a large volume of case data quickly and efficiently. How to use these different types of data for latent semantic analysis, and how to mine case data more deeply and accurately, is the problem this thesis addresses. The main contributions of this thesis cover the following three aspects. First, it surveys the key technologies in the field of case merging, such as text preprocessing, text modeling, image retrieval, and topic recognition, and summarizes their advantages, disadvantages, and research progress. Second, it introduces the LDA model to the field of case merging, which helps improve the quality of case merging. Building on an image retrieval algorithm, the author proposes a case-merging method that combines text and image information to improve the accuracy of case merging. Lastly, the author uses SharePoint Server and SQL Server to integrate these algorithms into one system and uses a web spider to collect case data, both to validate the algorithms and to make the system available to users. The experimental results on case data show that the LDA model outperforms the traditional bag-of-words method, achieving an accuracy of 72%. After the image retrieval algorithm is integrated, the accuracy rises by 1%-4%. The results show that the algorithms are effective on case data. The completed system provides case statistics, case merging, and data capture, and interacts well with users.
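The abstract describes fusing a text-side similarity (from LDA topic distributions) with an image-side similarity, but it does not state the fusion rule. As a rough illustration only, the sketch below assumes a simple weighted average of two cosine similarities. The function names, the `text_weight` parameter, and the toy vectors are all hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def fused_similarity(text_sim: float, image_sim: float,
                     text_weight: float = 0.7) -> float:
    """Assumed fusion rule: weighted average of text and image similarity."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    return text_weight * text_sim + (1.0 - text_weight) * image_sim


def rank_cases(query_topics: Sequence[float],
               query_image: Sequence[float],
               candidates: Dict[str, Tuple[Sequence[float], Sequence[float]]],
               text_weight: float = 0.7) -> List[Tuple[str, float]]:
    """Score every candidate case against the query and sort best-first.

    `candidates` maps a case id to (topic distribution, image feature vector).
    """
    scored = []
    for case_id, (topics, image_vec) in candidates.items():
        text_sim = cosine_similarity(query_topics, topics)
        image_sim = cosine_similarity(query_image, image_vec)
        scored.append((case_id, fused_similarity(text_sim, image_sim, text_weight)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: case "A" matches the query on both modalities, case "B" on neither.
    cases = {
        "A": ([0.8, 0.1, 0.1], [1.0, 0.0]),
        "B": ([0.1, 0.8, 0.1], [0.0, 1.0]),
    }
    for case_id, score in rank_cases([0.8, 0.1, 0.1], [1.0, 0.0], cases):
        print(case_id, round(score, 3))
```

In practice the topic distributions would come from a fitted LDA model and the image vectors from an image retrieval feature extractor. The weight would need tuning on labeled case pairs, and other fusion schemes (for example re-ranking text results by image similarity) are equally plausible readings of the abstract.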
Language: Chinese
Other identifier: 2011E8009061001
Source URL: [http://ir.ia.ac.cn/handle/173211/7729]
Collection: Graduates_Master's Theses (毕业生_硕士学位论文)
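Recommended citation
GB/T 7714
Wu Di (吴迪). 基于多媒体数据的案件智能串并系统的研究与实现 [The Research and Implementation of Intelligent Case Merging System Based on Multimedia Data][D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2014.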

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.