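Research and Design of Microprocessor Architecture for Multimedia Application (面向媒体应用的处理器体系结构研究与设计)
Document type: Thesis
Author | Liu Hongjin (刘鸿瑾) |
Degree | Doctoral |
Defense date | 2008-05-30 |
Degree-granting institution | Institute of Acoustics, Chinese Academy of Sciences |
Place of conferral | Institute of Acoustics |
Keywords | architecture; media processing; design space exploration; datapath; floating-point multiplication; DWT |
Alternative title | Research and Design of Microprocessor Architecture for Multimedia Application |
Major | Signal and Information Processing |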
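Chinese abstract (translated) | The rapid growth of media applications places new demands on microprocessor design: a processor must offer strong data-processing and control capabilities, and must be flexible enough to keep pace with fast-evolving multimedia standards and to support many kinds of multimedia applications. To complete the design and verification of a processor's instruction set architecture early in the design cycle, this dissertation proposes a design space exploration technique based on an architecture description language, which can quickly customize a processor architecture that meets the needs of a given application. Building on the design of the SuperV3 processor, the dissertation applies architecture space exploration to study processor architectures for media applications and to explore key techniques for enhancing media processing. The dissertation studies the characteristics of multimedia processing algorithms and combines VLIW and SIMD techniques to develop an instruction set for media applications, which improves data-processing capability and reduces the code size of media programs. An open bus structure raises data-processing bandwidth and data supply capability; two dedicated address arithmetic units provide circular buffering, bit-reversed addressing and other addressing modes, further strengthening data supply; a data queue buffer provides high-speed transfers from memory to the register file; and, for key algorithms in media applications, media-specific instructions such as a SAD instruction that accelerates motion estimation are designed, further improving media processing performance. To improve media data processing capability, the dissertation studies high-performance floating-point multiplier structures and proposes a fast rounding method for floating-point multipliers. The method obtains the final mantissa by prediction and selection, avoiding the wide adder used in conventional rounding; the logic is simple, and hardware cost and critical-path delay are significantly reduced. Single- and double-precision floating-point multipliers designed with the fast rounding method perform about 20% better than multipliers using conventional rounding, and the higher the precision, the larger the room for improvement. To accelerate image compression coding in hardware, the dissertation proposes a low-power, parallel, lifting-based VLSI architecture for the two-dimensional discrete wavelet transform. The architecture processes rows and columns in parallel and needs no extra buffer for intermediate transform coefficients; it shares the main arithmetic units of the lifting structure and processes two rows of data at once, so the arithmetic units are always busy and hardware utilization approaches 100%; an embedded boundary extension circuit reduces the required on-chip buffering and the accesses to external memory, effectively lowering the power of the whole design. The architecture is designed in a 0.18 μm CMOS process with a critical-path delay of 5.6 ns. The module can be embedded in a media processor as an IP block to accelerate media applications in hardware. |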
English abstract | With the rapid development of multimedia applications, microprocessor designers face new challenges: microprocessors must provide stronger data-processing and system-control capabilities, and the architecture must be flexible enough to support more kinds of media applications and to keep pace with rapidly evolving multimedia standards. An accurate architecture model, with power, area and performance parameters established early, is a valuable guide for processor design: it reduces the number of design iterations and later modifications during implementation, shortens time to market, and lowers cost. To design and verify processor architectures earlier, the dissertation introduces an architecture description language (ADL) and presents ADL-driven design space exploration methodologies (including software toolkit generation and exploration, generation of the hardware implementation, and top-down validation) that can rapidly customize microprocessor architectures for different applications. Based on the SuperV3 microprocessor design project, the dissertation explores key techniques for improving multimedia processing capability. The ADL-driven methodologies are applied to the study of high-performance microprocessor architectures for multimedia applications. After analyzing multimedia algorithms and standards and the features of multimedia application programs, the dissertation presents an instruction set architecture (ISA) for media applications. The developed ISA greatly improves multimedia processing capability, increases the efficiency of multimedia application programs, and reduces their code size. 
For multimedia applications, the dissertation studies the extensibility of the SuperV3 architecture; a SAD instruction is designed to accelerate motion estimation in the CODEC, and other special multimedia instructions are customized to support more multimedia applications. To improve the efficiency of the datapath, the dissertation studies the structures of high-performance multipliers and floating-point multipliers, and presents a fast rounding method for floating-point multipliers. The fast rounding method, based on prediction and selection of rounding digits, decreases the complexity of the rounding logic and reduces hardware cost and critical path delay. Applying the fast rounding method greatly improves the performance of floating-point multipliers; in particular, the higher the precision of the multiplier, the greater the performance gain. To improve the image compression performance of processors, the dissertation studies architectures for the Discrete Wavelet Transform (DWT) and presents a low-power, memory-efficient, lifting-based VLSI architecture for the 2-D DWT. The proposed architecture processes the row and column transforms simultaneously and eliminates the memory buffer for the column transform coefficients. It processes two independent data streams together using shared arithmetic functional blocks, so hardware utilization approaches 100%. An embedded boundary extension circuit is used to further optimize the architecture. Compared with previous architectures, the proposed architecture is more efficient in critical path delay, power consumption, temporary storage and hardware utilization. 
The proposed VLSI architecture, with a four-stage pipeline, is implemented in 0.18 μm CMOS technology, and the critical timing path is 5.6 ns. The proposed VLSI architecture can be embedded into processors to accelerate image compression and encoding. |
Language | Chinese |
Publication date | 2011-05-07 |
Pages | 135 |
Source URL | [http://159.226.59.140/handle/311008/316] |
Collection | Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses |
Recommended citation (GB/T 7714) | 刘鸿瑾. 面向媒体应用的处理器体系结构研究与设计[D]. 声学研究所. 中国科学院声学研究所. 2008. |
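The abstract mentions a SAD instruction that accelerates motion estimation. As a rough software illustration of what such an instruction computes, the sketch below evaluates the sum of absolute differences between two pixel blocks and uses it in a full search over a small window. The function names, the square block shape and the full-search strategy are assumptions made for illustration; the record does not describe the instruction's operand format or the search algorithm used in the thesis.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("blocks must have the same number of columns")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


def best_match(cur_block, ref_frame, top, left, search_range):
    """Full search (an assumed strategy): try every displacement within
    +/- search_range around (top, left) in ref_frame and return
    (dy, dx, cost) for the candidate with the lowest SAD.
    cur_block is assumed to be square."""
    n = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + n > height or x + n > width:
                continue
            candidate = [row[x:x + n] for row in ref_frame[y:y + n]]
            cost = sad(cur_block, candidate)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

The inner `sad` call is the part a dedicated instruction would replace; a SIMD datapath could plausibly compute several absolute differences per cycle, though the record gives no such detail.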
Ingest method: OAI harvesting
Source: Institute of Acoustics
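The abstract describes a lifting-based 2-D DWT with boundary extension. The sketch below shows one level of a lifting transform in software, under loud assumptions: it uses the reversible integer 5/3 filter with symmetric extension (the record does not say which wavelet filter the thesis uses), it requires even-length rows and columns, and it runs rows and then columns sequentially, whereas the thesis's hardware processes rows and columns in parallel. All function names are hypothetical.

```python
def lift_53_forward(x):
    """One level of the integer 5/3 lifting transform on an even-length list.
    Returns (s, d): low-pass and high-pass coefficients."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("length must be even and at least 2")
    half = n // 2
    even, odd = x[0::2], x[1::2]
    d = []
    for i in range(half):
        # Predict step; at the right edge, symmetric extension gives x[n] = x[n-2].
        right = even[i + 1] if i + 1 < half else even[i]
        d.append(odd[i] - ((even[i] + right) >> 1))
    s = []
    for i in range(half):
        # Update step; at the left edge, symmetric extension gives d[-1] = d[0].
        left = d[i - 1] if i > 0 else d[0]
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward exactly by reversing the two lifting steps."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x


def dwt2d_53(image):
    """One 2-D level: transform every row, then every column of each half.
    Returns the four subbands (ll, hl, lh, hh) as lists of rows."""
    rows = [lift_53_forward(list(r)) for r in image]
    low = [s for s, _ in rows]
    high = [d for _, d in rows]

    def columns(block):
        cols = [lift_53_forward(list(c)) for c in zip(*block)]
        lo = [list(r) for r in zip(*[s for s, _ in cols])]
        hi = [list(r) for r in zip(*[d for _, d in cols])]
        return lo, hi

    ll, lh = columns(low)
    hl, hh = columns(high)
    return ll, hl, lh, hh
```

Because each lifting step only adds a function of the other half of the samples, the inverse subtracts the same quantity, which is why the transform reconstructs integers exactly. The two short loops per pass also hint at why the predict and update units can be shared between data streams in hardware.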