Association Rule Mining and Its Application in Chemical Production
Document type: Thesis
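Author | Qin Lijun (秦丽君) |
Degree | Master's |
Defense date | 2011-06-02 |
Degree-granting institution | Graduate University of Chinese Academy of Sciences (中国科学院研究生院) |
Place of conferral | Beijing |
Advisor | Wang Hongan (王宏安) |
Keywords | computer applications; association rules; weighted association rules; weighted frequent itemsets; information entropy; synthetic ammonia |
Degree discipline | Computer Application Technology |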
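Abstract (translated from Chinese) | Chemical production generates large volumes of process data that carry information relevant to production optimization, quality management, and process control. Existing applications of association rule mining cover business, finance, telecommunications, retail, and similar industries, whose processes are built around transaction handling; the transactions themselves are fairly simple and their purpose is fairly clear, so the mining methods used there do not suit chemical process data, which is high-dimensional, with coupled variables and nonlinear behavior. This thesis applies association rule mining to the chemical industry, mining association rules that can guide production and improve production quality and efficiency. Taking the analysis of chemical process data as its research setting and considering the complexity of chemical production processes, the thesis proposes a process-variable weight calculation method and an association rule mining algorithm suited to chemical production, implements a prototype association rule mining system, and applies it successfully to chemical production, identifying the causes of reduced synthetic ammonia output. The main contributions are: (1) A weight calculation method based on information entropy. Historical data contains information about production regularities; information entropy theory is used to compute the information gain of each variable, giving the degree to which each process variable influences the target product, i.e. the weight of each variable. The method accommodates the high dimensionality, variable coupling, and nonlinearity of chemical production and is consistent with production practice. (2) A weighted frequent itemset algorithm based on dynamic itemset counting. The algorithm introduces the entropy-derived process-variable weights into weighted frequent itemset mining, so that the mined weighted frequent itemsets reflect the high dimensionality, variable coupling, and nonlinearity of chemical production. Extensive experiments show that the algorithm outperforms classic algorithms. (3) Design and development of a prototype association rule mining system for chemical production. Using the analysis of the causes of reduced synthetic ammonia output as an example, the thesis describes in detail how association rules are mined for ammonia synthesis and verifies the correctness and practical value of the results. |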
English abstract | Chemical production processes have accumulated large amounts of data containing information relevant to parameter optimization, product quality, and production management. Association rule mining has been applied in business, finance, telecommunications, retail, and other industries in which transaction processing is central. In those industries, however, transactions are relatively simple and their purpose is relatively clear, so the mining methods cannot be applied directly to complex chemical production processes, whose data is high-dimensional and nonlinear. This thesis applies association rule mining to the chemical industry, mining association rules to guide production and improve production quality and efficiency. It proposes a weight calculation method and an association rule mining algorithm suited to chemical production, designs and implements a prototype system, and uses ammonia synthesis as an application example to verify the proposed method and algorithm. The contributions are threefold: 1. A weight calculation method based on information entropy is proposed for computing the weights of process variables. Historical data captures production patterns and the relationships between variables. Using information entropy and the historical data, we compute the information gain of each variable and take its influence on the target variable as its weight. The resulting weights are consistent with actual chemical production, and the method accommodates the high dimensionality and nonlinearity of chemical production data. 2. A weighted frequent itemset algorithm is proposed: weighted frequent itemset mining based on dynamic itemset counting. The algorithm mines weighted frequent itemsets using dynamic itemset counting, with weights obtained from the entropy-based weight calculation method, so that it suits the high-dimensional and nonlinear data of the chemical industry. Experiments show that the algorithm is more efficient than the classic algorithms. 3. A prototype system for mining association rules in the chemical industry is implemented. We verify the correctness and practical value of the work by applying association rule mining to the ammonia synthesis production process. Experiments show that the method is useful for identifying the causes of low ammonia output. |
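Subject | Computer applications |
Release date | 2011-06-10 |
Source URL | [http://124.16.136.157/handle/311060/10434] |
Collection | Institute of Software_Laboratory of Human-Computer Interaction Technology and Intelligent Information Processing_Dissertations |
Recommended citation (GB/T 7714) | Qin Lijun. Association Rule Mining and Its Application in Chemical Production[D]. Beijing. Graduate University of Chinese Academy of Sciences. 2011. |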
Ingest method: OAI harvesting
Source: Institute of Software
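The abstract describes deriving a weight for each process variable from its information gain with respect to the target variable. The record gives no formulas, so the following is only a minimal sketch of that general idea, not the thesis's actual method. It assumes the variables have already been discretized, and the normalization of gains so that the weights sum to 1 is an assumption made here for illustration. The variable names and sample records are hypothetical and are not data from the thesis.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, target):
    """Reduction in the entropy of `target` after grouping by discrete `values`."""
    groups = defaultdict(list)
    for v, t in zip(values, target):
        groups[v].append(t)
    conditional = sum(len(g) / len(target) * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def variable_weights(records, target_name):
    """Information gain of each variable, normalized to sum to 1 (an assumption)."""
    target = [r[target_name] for r in records]
    gains = {
        name: information_gain([r[name] for r in records], target)
        for name in records[0]
        if name != target_name
    }
    total = sum(gains.values())
    return {name: (g / total if total > 0 else 0.0) for name, g in gains.items()}


# Hypothetical, already-discretized process records (illustrative only).
records = [
    {"temperature": "high", "pressure": "low", "output": "low"},
    {"temperature": "high", "pressure": "high", "output": "low"},
    {"temperature": "low", "pressure": "low", "output": "normal"},
    {"temperature": "low", "pressure": "high", "output": "normal"},
]
weights = variable_weights(records, "output")
print(weights)
```

In this toy data the hypothetical `temperature` variable fully determines `output` while `pressure` carries no information about it, so all of the weight falls on `temperature`. How such weights enter the weighted frequent itemset algorithm is not specified in this record and is not sketched here.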