I tried to find a library (C++) or an algorithm that could compress an array of bits with these properties:

There are sequences of zero bits and sequences of bits that carry the information (1 or 0). The sequences are usually 8-24 bits long. I need a lossless compression that takes advantage of those zero bits.

How did I come to such sequences:

I serialize various variables into a byte array. I do this quite often to create snapshots, so these variables usually don't change much between snapshots. I want to use this fact for compression. I don't know the type of those variables, just their byte length. So I take the bytes and create diff information by XORing them with the previous snapshot. If a variable changed only slightly, the result will usually contain many zero bits. That's the zero bit sequence. The rest of the bits carry the information; that's the information sequence. For every variable, there will probably be 1 zero bit sequence and 1 information sequence.
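To make the XOR step concrete, here is a minimal sketch of it. The names `Bytes`, `xor_bytes` and `count_zero_bytes` are made up for illustration, and it assumes both snapshots have the same length and layout:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;  // hypothetical alias for a snapshot

// XOR two equal-length buffers byte by byte.
// prev ^ curr gives the diff; prev ^ diff gives curr back.
Bytes xor_bytes(const Bytes& a, const Bytes& b) {
    assert(a.size() == b.size());  // assumption: snapshot layout never changes
    Bytes out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return out;
}

// Count the zero bytes in a diff, as a rough hint of how compressible it is.
std::size_t count_zero_bytes(const Bytes& diff) {
    std::size_t zeros = 0;
    for (std::uint8_t byte : diff) {
        if (byte == 0) ++zeros;
    }
    return zeros;
}
```

The same function builds and applies a diff, because XOR is its own inverse: `xor_bytes(prev, xor_bytes(prev, curr))` equals `curr`.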

EDIT: So far I have considered these algorithms:

RLE - the information sequences would mess up the result

Some symbol coding (Huffman etc.) - the data probably won't share many "symbols"; it's not text and the sequences are short. The whole array will usually be around 1000 bytes long.

  • Huffman and LZ77 give nice compression for most things. For a quick test, write your data to a file and put it in a zip archive. If the compression is quite good, then Huffman and LZ77 will most likely work well for you because that's what ZIP archives use. [I wrote an answer previously on a similar topic.](http://stackoverflow.com/a/16469857/1520907) – user123 May 12 '13 at 14:48
  • Thanks for the response. The problem is that there won't be many common "symbols", because of the nature of the information sequences. The sequences are relatively short for something like deflate. But I will try it. – user2375015 May 12 '13 at 14:55

1 Answer

If the ~1000 byte sequence has a lot of zero bytes, then just use a standard byte-oriented compression algorithm, such as zlib. You will get compression.

Mark Adler