Accelerating Viola-Jones Face Detection Algorithm on GPUs
Document type: Conference paper
Authors | Jia Haipeng ; Zhang Yunquan ; Wang Weiyan ; Xu Jianliang |
Publication date | 2012 |
Conference name | IEEE 14th International Conference on High Performance Computing and Communications (HPCC) / IEEE 9th International Conference on Embedded Software and Systems (ICESS) |
Conference date | JUN 25-27, 2012 |
Conference location | Liverpool, ENGLAND |
Keywords | Viola-Jones ; Imbalanced Computation ; Persistent Threads ; Local Queues ; Global Queues |
Pages | 396-403 |
Abstract | The Viola-Jones face detection algorithm represents a class of parallel algorithms in which both memory accesses and work distribution are irregular, which makes high performance hard to obtain on GPUs. Furthermore, conventional GPU programming wisdom mostly covers how to optimize data-parallel workloads with regular inputs and outputs; there is little reference material on how to efficiently write task-parallel programs with irregular workloads. In this paper, we present an OpenCL implementation of the Viola-Jones face detection algorithm that achieves high performance on both NVIDIA and AMD GPUs through five main techniques: warp-size work granularity, persistent threads, Uberkernel, local queues, and global queues. We also demonstrate the performance of our implementation by comparing it with a well-optimized CPU version from the OpenCV library. Experimental results show that the speedup ranges from 5.193 to 35.08 times (16.91 on average) on the AMD GPU and from 5.85 to 32.641 times (17.535 on average) on the NVIDIA GPU. |
Indexed by | ISTP ; EI |
Conference organizers | IEEE, IEEE Comp Soc, Univ Bradford, IEEE Tech Comm Scalable Comp (TCSC) |
Proceedings | Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012 |
Language | English |
ISBN | 978-0-7695-4749-7 |
Source URL | [http://ir.iscas.ac.cn/handle/311060/15807] |
Collection | Institute of Software_Institute of Software Library_Conference Papers |
Recommended citation (GB/T 7714) | Jia Haipeng, Zhang Yunquan, Wang Weiyan, et al. Accelerating Viola-Jones Face Detection Algorithm on GPUs[C]. In: IEEE 14th International Conference on High Performance Computing and Communications (HPCC) / IEEE 9th International Conference on Embedded Software and Systems (ICESS). Liverpool, ENGLAND. JUN 25-27, 2012. |
Ingest method: OAI harvesting
Source: Institute of Software