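Chemical Structure Database Based on the Relational Database Cartridge (基于关系数据库插件的化学结构数据库)
Document type: Thesis
Author | Wang Yuling (王玉玲) |
Degree | Master's |
Defense date | 2011-05-26 |
Degree-granting institution | Graduate University of Chinese Academy of Sciences (中国科学院研究生院) |
Advisor | Wen Hao (温浩) |
Keywords | chemical structure database; substructure search; relational database cartridge; pre-screening |
Alternative title | Chemical Structure Database Based on the Relational Database Cartridge |
Degree discipline | Applied Chemistry |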
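Abstract (translated from Chinese) | The construction and application of chemical structure databases have long received wide attention from researchers, and such databases have become a basic supporting platform for research across the chemical disciplines. This thesis uses relational database cartridge technology to design and build a chemical structure database, and studies the storage of chemical structure information and the efficiency of substructure search. In an Oracle 11g database management system environment, the OrChem and Bingo relational database cartridges were installed; using PubChem Compound SDF files as the data source, a table of basic compound structure information was designed and a chemical structure database containing about 400,000 compounds was built. Remote browser/server (B/S) access to the database was implemented with JDBC, and the ROWID method was tested for efficient paged queries. The ways in which OrChem and Bingo represent and store two-dimensional compound structure information were compared. The tests show that, for the database of 400,000 compounds stored as Molfile, Bingo uses 32.5% less total storage space than OrChem; for Bingo, storing SMILES or Binary uses 81.3% and 78.3% less total storage space, respectively, than storing Molfile. Functionally, Bingo also supports three-dimensional structure search and substructure highlighting, as well as multi-condition queries involving resonance forms and tautomers. Relational database cartridges make it possible to generate molecular fingerprints, build indexes, and perform substructure search. The thesis discusses the main differences between OrChem and Bingo in terms of fingerprint composition and indexing strategy, and selects 10 characteristic compounds for substructure search tests. Tests on the 400,000-compound database show that OrChem responds adequately for user searches, while Bingo is more accurate and faster. For a database of 26 million compounds, tuning Oracle memory management, the table structure, and the substructure pre-screening parameters for Bingo significantly improved substructure search efficiency. |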
English abstract | Chemical structure databases are of great value to chemical researchers and have become a basic platform for research. In this thesis, we built a chemical structure database based on relational database cartridges and studied its storage and substructure retrieval efficiency. The OrChem and Bingo cartridges were used to construct the database from Molfile data for 400 thousand PubChem organic compounds in the Oracle 11g environment. JDBC was used to implement a browser/server (B/S) system for remote database access, and paged display of query results was improved with a ROWID-based SQL method. The ways in which OrChem and Bingo store two-dimensional structure information were compared. For the database of 400 thousand compounds, Bingo reduced total storage space by 32.5% relative to OrChem when storing the MOL format; within Bingo, the SMILES and Binary formats reduced total storage space by 81.3% and 78.3%, respectively, relative to the MOL format. Bingo also supports three-dimensional structure retrieval and substructure highlighting, as well as multi-condition combined queries involving resonance and tautomer forms. The fingerprint and index strategies that OrChem and Bingo apply to two-dimensional substructure searching are discussed. The substructure search efficiency of OrChem and Bingo was tested with 10 typical query structures on the database of 400 thousand compounds. Both OrChem and Bingo performed well enough for practical service, while Bingo showed higher efficiency and accuracy. After Oracle memory management, the table structure, and the substructure pre-screening parameters were tuned, Bingo achieved significantly higher substructure retrieval efficiency on a database of 26 million compounds. |
Language | Chinese |
Date made public | 2013-09-23 |
Pages | 63 |
Source URL | [http://ir.ipe.ac.cn/handle/122111/1697] |
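Collection | Institute of Process Engineering_Institute (batch import) |
Recommended citation (GB/T 7714) | Wang Yuling. Chemical Structure Database Based on the Relational Database Cartridge [D]. Graduate University of Chinese Academy of Sciences. 2011. |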
Ingestion method: OAI harvesting
Source: Institute of Process Engineering