
Most data compression algorithms are based on 'patterns', but I'm looking for a data compression algorithm that is not based on 'patterns'.

1 Answer


The answer to your question is pretty much "no". The reasoning is complicated but I'll try to explain it:

The simplest way to define a "(lossless) Data Compression Algorithm" is as a function that can transform a sequence of bytes into a new sequence of bytes in a reversible way, such that the new byte sequence will usually be shorter than the original.

The word 'usually' is in there, because there is no algorithm that can compress every possible file. Because compression has to be reversible, every different input file must map to a different output file. For any given length N, there are only so many files of length N or less. Therefore, if a compressor maps any input file that is longer than N to an output file that is N bytes or shorter, then it must also map a shorter file to one that is longer than N, because there just aren't enough possible shorter outputs to compress them all.
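To make that counting concrete, here is a minimal Python sketch (not part of the original argument); the 2-symbol alphabet and N = 10 are chosen only to keep the numbers small:

```python
# A minimal sketch of the counting argument, using a 2-symbol alphabet
# so the numbers stay small. Nothing here models a real compressor;
# it only counts how many distinct strings exist at each length.

def count_strings_up_to(length, alphabet_size=2):
    """Number of distinct strings of length 0 through `length`."""
    return sum(alphabet_size ** k for k in range(length + 1))

N = 10
possible_outputs = count_strings_up_to(N)   # every string of length <= N
inputs_one_longer = 2 ** (N + 1)            # strings of length N + 1 alone

print(f"strings of length <= {N}:  {possible_outputs}")    # 2047
print(f"strings of length == {N + 1}:  {inputs_one_longer}")   # 2048

# There are more inputs of length N + 1 than there are outputs of
# length <= N, so a reversible (injective) compressor cannot map
# all of them to strictly shorter outputs.
assert inputs_one_longer > possible_outputs
```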

So, at its best, a compression algorithm is a permutation of files. It can't compress every file. It can't compress "random" files, because the output of the permutation would still be random.
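You can see the "random files don't compress" point directly; this quick illustration uses Python's standard-library zlib as a stand-in for any compressor, and the exact sizes will vary from run to run:

```python
# Compressing uniformly random bytes does not make them shorter;
# the "compressed" output is typically a little larger because of
# container overhead.

import os
import zlib

random_data = os.urandom(100_000)              # uniformly random bytes
compressed = zlib.compress(random_data, 9)

print(f"original:   {len(random_data)} bytes")
print(f"compressed: {len(compressed)} bytes")  # typically slightly larger
```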

The question then becomes "how can these compressors possibly work?" They work by trying to assign the most likely input files to the shortest output files, so that on average the output will be shorter than the input. It's as if the compressor had a great big list of all files in probability order, which it simply matches against a list of all files in length order.
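Here is a toy Python sketch of that "two sorted lists" picture for a tiny universe of files. Everything in it, including the probability model, is invented purely for illustration:

```python
# The universe is every bit string of length <= 4, and the invented model
# says strings with fewer 0<->1 transitions (i.e. more repetition) are
# "more likely". The compressor is nothing more than the pairing of the
# two sorted lists.

from itertools import product

def all_strings(max_len):
    out = [""]
    for n in range(1, max_len + 1):
        out += ["".join(bits) for bits in product("01", repeat=n)]
    return out

universe = all_strings(4)

def likelihood_rank(s):
    transitions = sum(1 for a, b in zip(s, s[1:]) if a != b)
    return (transitions, len(s))     # repetitive first, then shorter first

by_probability = sorted(universe, key=likelihood_rank)   # most likely first
by_length      = sorted(universe, key=len)               # shortest first

encode = dict(zip(by_probability, by_length))   # the "compressor"
decode = {v: k for k, v in encode.items()}      # its exact inverse

assert all(decode[encode[s]] == s for s in universe)   # fully reversible

print(encode["1111"])   # likely (repetitive) input -> shorter output: '001'
print(encode["0101"])   # unlikely (alternating) input -> no shorter: '1110'
```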

In order to do that, the compressor needs to have some model of which files are more likely to be used. LZ-based compressors basically assume that files we use in real life tend to have more repeated strings than random data. Input files with more repeated strings are therefore assigned to shorter output files than files with no repetition. Huffman and Arithmetic compressors assume that files tend to have skewed distributions of input symbols.
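A quick way to see both effects is to feed an LZ + Huffman compressor (zlib here; the three inputs are made up for the demo) different kinds of data and compare the output sizes:

```python
# Repeated strings and skewed symbol distributions match the models that
# LZ and Huffman coding assume, so they compress well; uniformly random
# bytes match neither, so they don't compress at all.

import os
import random
import zlib

repetitive = b"the quick brown fox jumps over the lazy dog " * 2_000
skewed     = bytes(random.choices(b"aaaaaaab", k=100_000))   # mostly 'a'
uniform    = os.urandom(100_000)                             # no structure

for name, data in [("repetitive", repetitive),
                   ("skewed", skewed),
                   ("random", uniform)]:
    out = zlib.compress(data, 9)
    print(f"{name:10s} {len(data):7d} -> {len(out):7d} bytes")
```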

So every compressor essentially has a probability model -- a pattern that it expects files to match more often than not. Files that match the pattern well compress well, and files that don't do not.

Matt Timmermans