Say I have a number of strings which are quite similar but not absolutely identical.
They can differ more or less, but the similarity is visible to the naked eye.
All lengths are equal, each is 256 bytes. The total number of strings is less than 2^16.
What would be the best compression method for such a case?
UPDATE (data format):
I can't share the data, but I can describe it quite closely.
Imagine a notation (like the LOGO language): a sequence of commands for some device that moves and draws on a plane. Such as:
U12 - move up 12 steps
D64 - move down 64 steps
C78 - change drawing color to 78
P1 - pen down (start drawing)
and so on.
The whole vocabulary of this language doesn't exceed the size of the English alphabet.
The string then describes a whole picture: "U12C6P1L74D74R74U74P0....".
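To make the format concrete (the exact grammar is my assumption, since the real data isn't available): each command is a single letter followed by a decimal argument, so a string splits cleanly into (opcode, argument) pairs. A minimal tokenizer sketch:

```python
import re

def tokenize(s):
    """Split a command string into (letter, number) pairs.

    Assumes every command is one uppercase letter followed by
    one or more decimal digits, as in the description above.
    """
    return re.findall(r"([A-Z])(\d+)", s)

# tokenize("U12C6P1L74") -> [('U', '12'), ('C', '6'), ('P', '1'), ('L', '74')]
```

Separating the opcode stream from the argument stream like this can itself help a compressor, since each stream is more homogeneous than the interleaved original.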
Imagine now a class of ten thousand children who were told to draw some very specific image using this language, say the flag of their country. We will get 10K strings which are all different and all alike at the same time.
Our task is to compress the whole bunch of strings as well as possible.
My suspicion is that there is a way to exploit this similarity and the common length of the strings, whereas e.g. Huffman coding won't use it explicitly.
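One quick way to test this suspicion (a sketch, not a recommendation; the sample strings below are made up since the real data isn't available) is to compress each string with and without a preset dictionary built from a typical string, using Python's `zlib`. With a dictionary, shared substrings can be matched from the very first byte:

```python
import zlib

# Hypothetical near-identical command strings in the style described above.
strings = [
    "U12C6P1L74D74R74U74P0",
    "U12C6P1L70D74R74U74P0",
    "U13C6P1L74D73R74U74P0",
]

# Baseline: compress every string independently.
independent = sum(len(zlib.compress(s.encode())) for s in strings)

# Exploit similarity: use one representative string as a preset dictionary.
zdict = strings[0].encode()
shared = 0
for s in strings:
    c = zlib.compressobj(zdict=zdict)
    data = c.compress(s.encode()) + c.flush()
    shared += len(data)
    # Round-trip check: decompression needs the same dictionary.
    d = zlib.decompressobj(zdict=zdict)
    assert d.decompress(data) == s.encode()
```

On strings this similar the dictionary variant is typically smaller per string, even with zlib's per-stream header overhead; a fancier scheme (delta against a reference string, or compressing the whole concatenated corpus) should do better still.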