
I have a string of 1's and 0's in which the number of 1's and the number of 0's are the same. I would like to compress it into a number that needs fewer bits to store than the original string. Converting between the compressed and uncompressed forms should also not require a lot of work.

For example, ordering all possible strings and numbering them off and letting this number be the compressed data would be too much work.

An easy solution would be to let the compressed data be just the first n-1 characters of the string, where the string has length n. Converting between the compressed and decompressed data would be easy, but this offers little compression: only one bit per string.
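To make that baseline concrete, here is a minimal sketch of the drop-the-last-character scheme. The function names `drop_last` and `restore_last` are made up for illustration, and the code assumes the input really is balanced.

```python
def drop_last(bits: str) -> str:
    """Compress a balanced bit string by dropping its final character."""
    return bits[:-1]


def restore_last(prefix: str) -> str:
    """Recover the dropped character: it is whichever symbol the prefix is short of."""
    ones = prefix.count("1")
    zeros = len(prefix) - ones
    return prefix + ("1" if ones < zeros else "0")


original = "0110"
assert restore_last(drop_last(original)) == original
```

Both directions take time proportional to the length of the string and only a couple of counters of extra memory.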

I would like an algorithm that would compress a string with this property (same number of ones and zeros) that can be generalized to a string with any even length. I would also like it to compress more than the method described above.

Thanks for help.

Mathew
  • "For example, ordering all possible strings and numbering them off and letting this number be the compressed data would be too much work." Converting a binary string to an integer is too much work? – Blorgbeard May 17 '16 at 03:05
  • [Java one-liner](http://stackoverflow.com/questions/17833463/how-do-you-convert-a-binary-number-to-a-biginteger-in-java) – Blorgbeard May 17 '16 at 03:08
  • No, but ordering all possible strings is too much work. For example, say the strings are of length 10. You might make 0000011111 the first string, so it would be compressed to 0; the second might be 0000101111, and so on. Converting between these would be a lot of work. Turning the binary string into an integer as you suggested would not compress the data; it would still take up the same number of bits. – Mathew May 17 '16 at 05:24
  • Oh, I assumed you meant you had an actual string, one byte per character, and wanted compression from that point. – Blorgbeard May 17 '16 at 20:21
  • Can you quantify "too much work" in terms of time complexity / memory usage? – Blorgbeard May 18 '16 at 07:31
  • A small improvement: it's possible to leave more than one character off the end, depending on the first part of the string. Once you've seen `n/2` 1's or 0's, you know the rest of the string. For example, if length = 8, then `11101` can be expanded to `11101000`. – Blorgbeard May 18 '16 at 07:41
  • Answering "Can you quantify 'too much work' in terms of time complexity / memory usage?": ideally the amount of memory used would stay constant as the length of the string increases, but if that is not possible, then memory proportional to the length of the string would also be OK. – Mathew May 20 '16 at 00:29
  • Do you just want a single number as the resulting compressed value, or is a sequence of n bytes fine? – Sergio0694 Jun 21 '17 at 18:09
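The early-stopping idea from Blorgbeard's comment above can be sketched as follows. This is only an illustration: the names `truncate` and `expand` are hypothetical, and the decoder is assumed to know the original length `n`.

```python
def truncate(bits: str) -> str:
    """Keep only the prefix up to the point where n/2 ones or n/2 zeros have been seen."""
    half = len(bits) // 2
    ones = zeros = 0
    for i, ch in enumerate(bits):
        if ch == "1":
            ones += 1
        else:
            zeros += 1
        if ones == half or zeros == half:
            return bits[: i + 1]
    return bits  # only reached for the empty string


def expand(prefix: str, n: int) -> str:
    """Pad the prefix back to length n with the symbol that has not yet reached n/2."""
    fill = "0" if prefix.count("1") == n // 2 else "1"
    return prefix + fill * (n - len(prefix))
```

With the example from the comment, `truncate("11101000")` gives `"11101"` and `expand("11101", 8)` gives back `"11101000"`. The saving depends on the data: a string such as `01010101` only loses its final character.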

1 Answer


This is a combination problem, N items taken k at a time.

Your example from the comments, length 10 with 5 ones, has only C(10, 5) = 252 unique patterns, which fit into an 8-bit value instead of a 10-bit value. See: Wikipedia: Combinations
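As a quick check of that arithmetic, the following sketch counts the balanced strings of a given even length and the number of bits needed to index them. The helper name `bits_needed` is made up for this example, and `math.comb` requires Python 3.8 or later.

```python
import math


def bits_needed(n: int) -> int:
    """Bits required to index every length-n string with exactly n/2 ones."""
    count = math.comb(n, n // 2)      # number of balanced strings
    return (count - 1).bit_length()   # bits for the largest index, count - 1


print(math.comb(10, 5))   # 252
print(bits_needed(10))    # 8
```

The saving over storing all n bits grows slowly with n, roughly like half of log2(n), so even this best case for a fixed-width encoding is a modest gain.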

For expanding an index in the range 0-251 back into its pattern, there are examples here:

SEE: Algorithm to return all combinations of k elements from n

While extracting, you can use each extracted value to set the corresponding bit position in the reconstructed value, which takes O(1) time per bit. If the number of possible patterns is not in the millions or more, you could pre-compute a lookup table, which makes translating an index into its decoded value much faster. That is, build a list of all possible patterns and look up the translation.
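If a lookup table is too large, the index can also be computed directly, without enumerating all patterns. The sketch below is one possible way to do that, not necessarily what the linked examples do: it ranks and unranks balanced strings in lexicographic order using binomial coefficients. The names `rank` and `unrank` are hypothetical, and the decoder is assumed to know the length `n`.

```python
from math import comb


def rank(bits: str) -> int:
    """Index of `bits` among all strings of the same length and number of ones,
    in lexicographic order with '0' sorting before '1'."""
    n = len(bits)
    ones_left = bits.count("1")
    index = 0
    for i, ch in enumerate(bits):
        if ch == "1":
            remaining = n - i - 1
            # Every string that shares this prefix but has '0' here comes first.
            index += comb(remaining, ones_left)
            ones_left -= 1
    return index


def unrank(index: int, n: int) -> str:
    """Inverse of rank for balanced strings of even length n."""
    ones_left = n // 2
    out = []
    for i in range(n):
        remaining = n - i - 1
        zero_block = comb(remaining, ones_left)  # strings that put '0' here
        if index >= zero_block:
            out.append("1")
            index -= zero_block
            ones_left -= 1
        else:
            out.append("0")
    return "".join(out)
```

With the ordering from the question's comments, `rank("0000011111")` is 0 and `rank("0000101111")` is 1, and `unrank(1, 10)` returns `"0000101111"`. Each direction makes one pass over the string and needs no table, although the binomial coefficients become large integers for long strings.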

Phillip Williams