Provably Good Multicore Cache Performance for Divide-and-Conquer Algorithms

Provably Good Multicore Cache Performance for Divide-and-Conquer Algorithms

Guy E. Blelloch, Rezaul Alam Chowdhury, Phillip B. Gibbons, Vijaya Ramachandran, Shimin Chen, and Michael Kozuch
Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2008), San Francisco, California, pp. 501-510, 2008

Abstract

This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L1) caches and a large shared (L2) cache on chip. We consider a broad class of parallel divide-and-conquer algorithms and present a new on-line scheduler, CONTROLLED-PDF, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L1 and L2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptotically-optimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.

Download (copyright restrictions may apply): PS, PDF

<— BACK