I am practicing Huffman encoding for my programming class. I have done almost all of the encoding part. For example, I have assigned each character a code (e.g. a=100100) and converted each char in the text according to its code. Then I parse each code into a Byte, for example parsing 100100 into a Byte, and store it in a List of Byte. However, I need to write all the Bytes to a .txt file, and I realized there is a problem.

Example: a character with the code "1001" is written to the .txt file as 1 byte instead of just 4 bits.

I know that after Huffman encoding the characters are stored in a format like "11100111101011111101011011111000010000101", but in my case each character still takes 1 byte, so the output is no smaller than the original input file before encoding.

Is there any way to store the code in the format like "11100111101011111101011011111000010000101"?

Sorry for my English, I tried my best to explain my confusion.

Ming
  • You have to write each byte as binary string. [This](http://stackoverflow.com/questions/4421400/how-to-get-0-padded-binary-representation-of-an-integer-in-java) topic can help you. – Andrew Kolpakov May 06 '16 at 10:45
  • I just tried your method. If I write strings instead of bytes to the file, the file becomes 10 times larger. – Ming May 06 '16 at 11:32
  • That's the difference between txt and binary format. – Andrew Kolpakov May 11 '16 at 12:06

2 Answers

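import java.io.BufferedWriter;
import java.io.FileWriter;

// Writes the low 8 bits of each character as a string of '0'/'1' characters.
// Note: this produces text, so every bit still occupies one byte in the file.
try (BufferedWriter bfw = new BufferedWriter(new FileWriter("out.txt"))) {
    char[] buffer = str.toCharArray();
    for (int i = 0; i < buffer.length; i++) {
        // Integer.toBinaryString is a static method; mask with 0xFF to avoid sign extension.
        bfw.write(Integer.toBinaryString(buffer[i] & 0xFF));
    }
}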
Lee

You could use a BitSet object if you intend to keep all bits in memory.

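import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.BitSet;

BitSet bits = new BitSet();
bits.set(7000, true);              // set bit 7000 to 1
if (bits.get(7000)) { ... }        // read a bit back
byte[] bytes = bits.toByteArray(); // little-endian; 0 bits above the highest set bit are not included

Path path = Paths.get("C:/Temp/huffman.bin");
Files.write(path, bytes);          // writes raw bytes, not text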

Packing the bits into bytes directly, without a BitSet, is also feasible.
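
A minimal sketch of that direct packing, assuming the encoded text is available as a String of '0' and '1' characters. The class name `BitPacker` and the method `pack` are hypothetical names chosen for illustration, and the most-significant-bit-first order is an assumption, not something the question specifies.

```java
import java.io.ByteArrayOutputStream;

class BitPacker {
    /** Packs a string of '0'/'1' characters into bytes, most significant bit first. */
    static byte[] pack(String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0; // bits collected so far for the byte being built
        int count = 0;   // number of bits currently held in 'current'
        for (int i = 0; i < bits.length(); i++) {
            current = (current << 1) | (bits.charAt(i) == '1' ? 1 : 0);
            count++;
            if (count == 8) {
                out.write(current); // a full byte is ready
                current = 0;
                count = 0;
            }
        }
        if (count > 0) {
            // Pad the last partial byte with 0 bits on the right.
            out.write(current << (8 - count));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 12 bits: one full byte plus a 4-bit remainder.
        byte[] packed = pack("11100111" + "1001");
        System.out.println(packed.length); // prints 2
    }
}
```

With this, the 4-bit code "1001" shares a byte with its neighbours instead of occupying a byte of its own. The resulting array can be written with `Files.write(path, packed)`.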

However, you cannot write chars: a Writer applies a character-encoding conversion that corrupts binary data. Keep in mind that a Java char is a 16-bit UTF-16 code unit meant to hold Unicode text.

This writes binary data, not text.

As for trailing bits, I do not know how Huffman implementations usually deal with them, so do a bit of research. I think padding with 0 bits will do and will not generate artifacts. Another option may be to pad with the first 0-7 bits of a longer code. "Padding" is the keyword to search for.
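
One way to sidestep the trailing-bit question is to store the number of valid bits in front of the packed data, so the decoder knows where to stop and never interprets the padding. The sketch below assumes that approach; the class `PaddedBits`, its methods, and the layout (a 4-byte bit count followed by the payload) are hypothetical choices for illustration, not a standard Huffman file format.

```java
import java.nio.ByteBuffer;

class PaddedBits {
    /** Layout (an assumption): 4-byte bit count, then bits packed MSB-first, zero-padded. */
    static byte[] encode(String bits) {
        int payloadLength = (bits.length() + 7) / 8; // round up to whole bytes
        byte[] packed = new byte[payloadLength];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                packed[i / 8] |= 0x80 >> (i % 8); // set the bit, MSB first
            }
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + payloadLength);
        buf.putInt(bits.length()); // header: how many bits are valid
        buf.put(packed);
        return buf.array();
    }

    /** Reads back exactly the number of bits recorded in the header. */
    static String decode(byte[] data) {
        int bitCount = ByteBuffer.wrap(data).getInt();
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int b = data[4 + i / 8];
            sb.append(((b >> (7 - i % 8)) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "11100111101011111101011011111000010000101";
        byte[] data = encode(bits);
        // The padding bits in the last byte are ignored on decode.
        System.out.println(decode(data).equals(bits)); // prints true
    }
}
```

The cost is four extra bytes per file; in exchange, any padding value works because the decoder never looks at it.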

Joop Eggen