Chinese Academy of Sciences Institutional Repositories Grid
Design and Implementation of an HDFS-Based Data Exchange and Sharing Platform

Document type: Thesis

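Author: Luo Houqi
Degree: Master's
Defense date: 2011-06-01
Degree-granting institution: Graduate University of Chinese Academy of Sciences
Place of conferral: Beijing
Supervisor: Fan Guochuang
Keywords: Computer applications::Computer information management systems
Major: Computer Software and Theory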
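Abstract (translated from Chinese): Enterprises and government agencies run a large number of systems built at different times, by different vendors, and on different platforms. Because these systems lack unified planning and standards, they find it hard to share information with one another, which has produced many isolated "island" business applications. How to establish unified, standardized interfaces between information systems so that distributed, independent, heterogeneous data can be exchanged and shared has therefore become a central task for new information applications.

Data exchange and sharing platforms arose to meet this need. Built on a unified middleware platform, such a platform deploys front-end node agents on the application systems to extract and transform data and transmit it to a data sharing center, which stores, manages, and distributes the scattered data in a unified way. In practice, these platforms are typically deployed in a star topology and exchange many different types of data. Because many nodes exchange large volumes of data with the data sharing center, the platform faces major challenges in data throughput and reliability. To meet the platform's requirements for storing large data volumes and for concurrent data transfer over many connections, this thesis proposes an HDFS-based architecture. In this architecture, data exchange is split into two processes, metadata exchange and data-file exchange; by distributing data exchange requests across the storage servers in the cluster, the architecture achieves distributed, reliable storage of data files. In addition, targeting the application scenarios of the platform, the thesis uses a dynamic replica management technique based on data access heat, which adjusts the number of replicas of hot data to reduce the time spent exchanging it; an index optimization mechanism for small files, which improves the efficiency of small-file exchange; and a data exchange failure recovery mechanism, which improves the reliability and efficiency of data exchange. Finally, the thesis presents the design and implementation of the HDFS data exchange and sharing platform and reports experiments that verify the system's actual performance.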
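
The abstract names a small-file index optimization but does not describe its design. One common way to reduce per-file overhead in HDFS is to pack many small payloads into one large container file and keep a separate index of (offset, length) per logical name, so the NameNode tracks one file instead of thousands. The sketch below illustrates only that general packing idea under stated assumptions; the class name `SmallFilePack` and the size threshold are hypothetical, and the container is held in memory rather than written to HDFS.

```python
from typing import Dict, Tuple


class SmallFilePack:
    """Hypothetical small-file container (not the thesis's published design).

    Many small payloads are appended to one large blob. In a real deployment
    that blob would be stored as a single HDFS file, while a separate index
    maps each logical name to its (offset, length) inside the container.
    """

    def __init__(self, small_file_threshold: int = 1 << 20):
        # Assumed cutoff in bytes: payloads at or above it are not packed.
        self.small_file_threshold = small_file_threshold
        self._data = bytearray()
        self._index: Dict[str, Tuple[int, int]] = {}

    def is_small(self, payload: bytes) -> bool:
        return len(payload) < self.small_file_threshold

    def add(self, name: str, payload: bytes) -> None:
        if name in self._index:
            raise KeyError(f"duplicate entry: {name}")
        if not self.is_small(payload):
            raise ValueError(f"{name} is too large to pack; store it as a normal file")
        # Record where this payload starts before appending it.
        self._index[name] = (len(self._data), len(payload))
        self._data.extend(payload)

    def read(self, name: str) -> bytes:
        # One index lookup plus one ranged read instead of opening a separate file.
        offset, length = self._index[name]
        return bytes(self._data[offset:offset + length])

    def container(self) -> bytes:
        return bytes(self._data)

    def index(self) -> Dict[str, Tuple[int, int]]:
        return dict(self._index)
```

For example, after adding `b"hello"` as `a.xml` and `b"world!"` as `b.xml`, the index maps `b.xml` to `(5, 6)` and the container holds `b"helloworld!"`. Hadoop's own HAR archives and SequenceFiles follow a similar container-plus-index idea.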
English abstract

Enterprises run many information systems built by various vendors on heterogeneous platforms. Due to the lack of unified planning and standards, sharing and exchanging data between these systems is troublesome. Therefore, how to establish uniform, standard interfaces to achieve data exchange and sharing among distributed, independent, heterogeneous systems is becoming a hot issue.

A data exchange and sharing platform addresses this problem. Built on a unified middleware platform, it realizes heterogeneous data exchange and sharing by providing a client API and front-end switching nodes deployed in the application systems. In most cases, a data exchange platform is deployed in a star topology, and the data types vary widely across application domains. In this scenario, a large number of nodes need to exchange huge volumes of data with the central node, directly or indirectly, which poses a significant challenge to the central node's throughput and reliability.

To improve throughput and reliability, this thesis proposes an HDFS-based data exchange and sharing architecture. In this model, the data exchange process is broken down into two processes, metadata exchange and data-file exchange; this split lets data exchange requests be distributed to a storage cluster that provides distributed, reliable data storage. In addition, to fit the application scenarios, this thesis proposes a heat-based replica management model that dynamically adjusts the number of replicas of hot data to reduce hot-data exchange time; a small-file index optimization that improves the efficiency of small-file storage; and a reliable data exchange mechanism that handles client-side and server-side failures.
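
The heat-based replica management model is described only at a high level: replica counts rise for frequently accessed ("hot") data and fall again as it cools. The sketch below shows one plausible policy under stated assumptions, not the thesis's algorithm: heat is an access count with exponential decay per observation window, and one extra replica is granted per fixed number of heat units, capped at a maximum. The class `HeatReplicaManager` and all of its parameters are hypothetical.

```python
from typing import Dict


class HeatReplicaManager:
    """Hypothetical heat-based replica policy (not the thesis's published algorithm).

    Each file accumulates a "heat" score from its accesses; at the end of every
    observation window the score decays, so files that cool down lose their
    extra replicas again.
    """

    def __init__(self, base_replicas: int = 3, max_replicas: int = 10,
                 accesses_per_extra_replica: float = 100.0, decay: float = 0.5):
        self.base_replicas = base_replicas
        self.max_replicas = max_replicas
        self.accesses_per_extra_replica = accesses_per_extra_replica
        self.decay = decay
        self._heat: Dict[str, float] = {}

    def record_access(self, path: str) -> None:
        # One read or exchange request for this file adds one unit of heat.
        self._heat[path] = self._heat.get(path, 0.0) + 1.0

    def end_window(self) -> None:
        # Exponential decay: recent accesses count more than old ones.
        for path in list(self._heat):
            self._heat[path] *= self.decay
            if self._heat[path] < 1e-3:
                del self._heat[path]  # forget files that have gone cold

    def target_replicas(self, path: str) -> int:
        # One extra replica per `accesses_per_extra_replica` units of heat,
        # never below the base replication factor and never above the cap.
        extra = int(self._heat.get(path, 0.0) // self.accesses_per_extra_replica)
        return min(self.base_replicas + extra, self.max_replicas)
```

With `base_replicas=3`, `max_replicas=5`, 10 accesses per extra replica and a decay of 0.5, a file read 25 times targets 5 replicas, then 4 and 3 after two quiet windows. In an actual HDFS cluster, a computed target could be applied through Hadoop's `FileSystem.setReplication(Path, short)` Java API or the `hdfs dfs -setrep` command.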

The design and implementation of the HDFS data exchange and sharing platform are presented at the end of this thesis, together with experiments that verify the system's actual performance.
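Language: Chinese
Date made public: 2011-06-09
Source URL: http://124.16.136.157/handle/311060/10228
Collection: Institute of Software_Software Engineering Technology Research and Development Center_Theses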
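Recommended citation
GB/T 7714
Luo Houqi. Design and Implementation of an HDFS-Based Data Exchange and Sharing Platform[D]. Beijing. Graduate University of Chinese Academy of Sciences. 2011.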

Ingest method: OAI harvesting

Source: Institute of Software

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.