The Algorithm Design Manual
About the Book
Programming Challenges

The Stony Brook Algorithm Repository

Steven Skiena
Stony Brook University
Dept. of Computer Science

1.7.5 Text Compression

Problem Input | Problem Output

INPUT                    OUTPUT

Input Description: A text string S.

Problem: A shortest text string S' such that S can be reconstructed from S'.

Excerpt from The Algorithm Design Manual: Secondary storage devices fill up quickly on every computer system, even though their capacity doubles each year. Decreasing storage prices have only increased interest in data compression, since there is now more data to compress than ever before. Data compression is the algorithmic problem of finding alternative, space-efficient encodings for a given data file. With the rise of computer networks, a new mission for data compression has arisen, that of increasing the effective bandwidth of networks by reducing the number of bits before transmission.

Data compression is a problem for which practical people like to invent ad hoc methods, designed for their particular applications. Sometimes these outperform general methods, but often they do not.


  • GZIP: GNU zip (C) (rating 10)
  • JPEG (C) (rating 3)
  • Netlib / TOMS -- Collected Algorithms of the ACM (FORTRAN) (rating 2)

  • Recommended Books

    Introduction to Algorithms by T. Cormen and C. Leiserson and R. Rivest and C. Stein Text Compression by Timothy C. Bell, Ian H. Witten, and John Cleary Introduction to Algorithms by U. Manber
    Data compression: methods and theory by J. Storer Data Structures and Algorithms by A. Aho and J. Hopcroft and J. Ullman Graph Algorithms by S. Even

    Related Links

  • Matt Mahoney's Data Compression Programs Comparison
  •'s Lossless data compression software benchmarks / comparisons

    Related Problems

    Discrete Fourier Transform
    Shortest Common Superstring

    This page last modified on 2008-07-10 .