Maybe any compression algorithm experts can help me here, but one thing I don't understand about pucrunch is that it appears to implement two different methods of compression: run-length encoding and LZ77. But I was under the impression that RLE could be implemented by LZ77 anyway, so this seems wasteful.
As I understand it, the basis of LZ77 is that it's possible to insert a token at any point in the bitstream which contains two elements: a number of characters back in the bitstream, and a length. So if you have the string:
Code:
Mellow yellow
...this could be represented in LZ77 like this:
Code:
Mellow y{-7,5}
But if you only step back one character, and put a longer length, this effectively does RLE for free:
Code:
I'm freeeee!
becomes
Code:
I'm fre{-1,4}!
So have I missed something here? Is the RLE compression of pucrunch doing something different, or is it wasting code and compression ratio? It seems to me that you could probably write a better compression/decompression routine if you weren't wasting bits to specify whether a token was RLE or LZ77, not to mention having to implement different code to decompress each type.