中国科学院机构知识库网格
Chinese Academy of Sciences Institutional Repositories Grid
An Approach for Storing and Accessing Small Files on Hadoop (一种Hadoop小文件存储和读取的方法)

Document type: Journal article

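Authors: 张春明; 芮建武; 何婷婷
Journal: 计算机应用与软件 (Computer Applications and Software)
Publication date: 2012
Volume: 29; Issue: 11; Pages: 95-100
Keywords: HDFS; small files; HIFM; hierarchical index; index preloading; data prefetching
ISSN: 1000-386X
Alternative title: An approach for storing and accessing small files on Hadoop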
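Chinese abstract (translated): HDFS (Hadoop Distributed File System), with its high fault tolerance, scalability, and low-cost storage, is widely used in current cloud-computing scenarios. However, HDFS was originally designed to store very large files; for massive numbers of small files, its storage and read performance is unsatisfactory because of issues such as NameNode memory overhead. This paper proposes HIFM (Hierarchy Index File Merging), a method based on small-file merging that considers both the correlations among small files and the directory structure of the data to guide the merging of small files into large files and to generate a hierarchical index. Index files are managed through a combination of centralized and distributed storage, and index-file preloading is implemented. In addition, HIFM uses a data prefetching mechanism to improve the efficiency of sequential access to small files. Experimental results show that HIFM effectively improves small-file storage and read efficiency and significantly reduces the memory overhead of the NameNode and DataNodes, making it suitable for applications that store massive numbers of small files organized in a directory structure.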
English abstract: Thanks to its high fault tolerance, scalability, and low-cost storage, HDFS (Hadoop Distributed File System) has been widely adopted in current cloud-computing applications. However, HDFS is designed primarily for streaming access to very large files, and it suffers a performance penalty in both storage and access when managing massive numbers of small files, owing to the memory overhead on the NameNode. This paper proposes HIFM (Hierarchy Index File Merging), an approach based on merging small files. HIFM takes into account both the correlations between small files and the directory structure of the data when merging small files into large ones, and it generates a hierarchical index. Index files are managed with a combination of centralised and distributed storage, and index files are preloaded. In addition, HIFM adopts a data prefetching mechanism to improve the efficiency of sequential access to small files. Experimental results show that HIFM effectively improves the efficiency of storing and accessing small files and significantly reduces the memory overhead of the NameNode and DataNodes. It is suited to applications that store massive numbers of small files organized in a directory structure.
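The abstract names three ideas: merging small files into a large one, a hierarchical index, and prefetching for sequential reads. The sketch below illustrates them on in-memory bytes only. It is not the authors' HIFM implementation and does not touch HDFS; the function names (`merge_small_files`, `read_small_file`, `read_with_prefetch`), the two-level index layout (directory, then file name), and the `window` parameter are all assumptions made for illustration.

```python
import io
import posixpath
from collections import defaultdict


def merge_small_files(files):
    """Pack {path: bytes} into one blob plus a two-level index.

    Assumed index layout (illustrative, not taken from the paper):
        {directory: {file_name: (offset, length)}}
    Files are written directory by directory, so files that share a
    directory end up adjacent in the merged blob.
    """
    by_dir = defaultdict(list)
    for path in sorted(files):
        directory, name = posixpath.split(path)
        by_dir[directory].append(name)

    blob = io.BytesIO()
    index = {}
    for directory in sorted(by_dir):
        entries = {}
        for name in by_dir[directory]:
            data = files[posixpath.join(directory, name)]
            entries[name] = (blob.tell(), len(data))
            blob.write(data)
        index[directory] = entries
    return blob.getvalue(), index


def read_small_file(blob, index, path):
    """Look up one file: first level is the directory, second the name."""
    directory, name = posixpath.split(path)
    offset, length = index[directory][name]
    return blob[offset:offset + length]


def read_with_prefetch(blob, index, path, cache, window=2):
    """Read one file and cache the next `window` files in its directory.

    A stand-in for data prefetching: a later sequential read of a
    neighbouring file is served from `cache` without another lookup.
    """
    if path in cache:
        return cache[path]
    directory, name = posixpath.split(path)
    names = list(index[directory])  # insertion order equals storage order
    start = names.index(name)
    for neighbour in names[start:start + 1 + window]:
        offset, length = index[directory][neighbour]
        cache[posixpath.join(directory, neighbour)] = blob[offset:offset + length]
    return cache[path]
```

In this sketch a single merged blob stands in for one large HDFS file, so the file system would track one object instead of one per small file; the index is what makes individual files addressable again.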
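Subject: Computer Science (provided by Thomson Reuters)
Indexed in: CNKI; WANFANG; CSCD
Funding: Major Science and Technology Project for Press and Publication (0610-1041BJNF2328/23) | National Key Technology R&D Program (2011BAH14B02) | Knowledge Innovation Program of the Chinese Academy of Sciences, directional project (KGCX2-YW-174)
Language: Chinese
CSCD record number: CSCD:4690768
Public date: 2013-09-17
Source URL: [http://ir.iscas.ac.cn/handle/311060/15297]
Collection: Institute of Software_Institute of Software Library_Journal articles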
Recommended citation
GB/T 7714
张春明,芮建武,何婷婷. 一种Hadoop小文件存储和读取的方法[J]. 计算机应用与软件,2012,29(11):95-100.
APA 张春明,芮建武,&何婷婷.(2012).一种Hadoop小文件存储和读取的方法.计算机应用与软件,29(11),95-100.
MLA 张春明,et al."一种Hadoop小文件存储和读取的方法".计算机应用与软件 29.11(2012):95-100.

Ingest method: OAI harvesting

Source: Institute of Software


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.