Efficient Cache-oblivious String Algorithms for Bioinformatics

Rezaul Alam Chowdhury, Hai-Son Le, and Vijaya Ramachandran

(under review)


We present theoretical and experimental results on cache-efficient and parallel algorithms for some well-studied string problems in bioinformatics: global pairwise sequence alignment and median (both with affine gap costs), and RNA secondary structure prediction with simple pseudoknots. For each problem we present cache-oblivious algorithms that match the best-known time complexity, match or improve the best-known space complexity, improve significantly over the cache-efficiency of earlier algorithms, and have efficient parallel implementations.

We present experimental results that show that these cache-oblivious algorithms run significantly faster than currently available software.

Our methods are applicable to several other problems including local alignment, generalized global alignment with intermittent similarities, multiple sequence alignment under several scoring functions such as ‘sum-of-pairs’ objective function and RNA secondary structure prediction with simple pseudoknots using energy functions based on adjacent base pairs.

Download: PSPDF