The Pochoir Parallel Stencil Compiler (To Appear)

Yuan Tang, Rezaul Alam Chowdhury, Bradley Kuszmaul, Chi-Keung Luk, and Charles Leiserson

Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), San Jose, California, June 4-6, 2011

A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Efficient parallel cache-oblivious stencil algorithms are known, but ordinary programmers find them difficult to write. Pochoir provides a domain-specific stencil language embedded in C++, which the Pochoir compiler can translate into high-performing Cilk Plus code. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. Because the Pochoir language is embedded in C++, it can also be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks demonstrates that Pochoir outperforms standard parallel-loop implementations on a variety of multicore machines, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient "trapezoidal decomposition" algorithms by making simultaneous parallel space cuts, which yields more parallelism for the same cache efficiency.
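To make the term "stencil computation" concrete, the following is a minimal sketch of the plain loop-nest style that the abstract compares against, written in ordinary serial C++. It is not Pochoir code and does not use the Pochoir language or Cilk Plus; the function name `heat1d_periodic`, the three-point heat kernel, and the coefficient `alpha` are assumptions chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Pochoir): applies `steps` time steps of a
// 1D three-point heat stencil to `u`, using periodic boundary conditions.
std::vector<double> heat1d_periodic(std::vector<double> u, int steps, double alpha) {
    assert(!u.empty() && steps >= 0);
    const std::size_t n = u.size();
    std::vector<double> next(n);
    for (int t = 0; t < steps; ++t) {
        // Each point is updated from itself and its two nearest neighbors.
        for (std::size_t i = 0; i < n; ++i) {
            const double left = u[(i + n - 1) % n];   // wraps around at i == 0
            const double right = u[(i + 1) % n];      // wraps around at i == n - 1
            next[i] = u[i] + alpha * (left - 2.0 * u[i] + right);
        }
        std::swap(u, next);  // the new time step becomes the current one
    }
    return u;
}
```

In a loop nest like this, the inner spatial loop is the one a standard parallel-loop implementation would typically parallelize. Because every time step sweeps the whole grid, a grid larger than cache is reloaded on each step, which is the inefficiency that cache-oblivious trapezoidal decompositions are designed to avoid.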

Download (copyright restrictions may apply): PS, PDF