How to set a flag in image bits to mark the end of an audio file

Question

Currently, I'm trying to hide audio files of wave formats inside images of bitmap format.

I transformed both of them into binary form and now I'm trying to set some kind of flag to mark the end of the audio file so that when I'm extracting the audio from the image I know where to stop.

Notes to consider:

1. The image is 24-bit.

2. The audio data is 16-bit PCM.

3. I'm using LSB, so each 16-bit audio sample needs 5 full pixels plus one component of a sixth pixel (5 × 3 + 1 = 16).

I had considered leaving some pixels without any alteration so that I can use them to write the size of the audio file, but as you said it would be too visible. Plus, I would need to set a maximum size for the audio file so that I know how many pixels to leave for it at the beginning of the image data. — Allen, Jul 13 '17 at 23:59
You've described both an [end-of-file marker and a header for the file size](https://stackoverflow.com/questions/44484791/python-steganographer-file-handling-error-for-non-plain-text-files), but I don't understand what your problem is. Can't you hide this extra information one bit at a time to not make it visible? — Reti43, Jul 14 '17 at 07:11
Umm, my problem is as I have said: I'm transforming the audio to bits and hiding them one bit at a time. When I'm decoding the image and extracting the audio out of it, when should I stop? As an example, if I'm the receiver of the image that contains the audio and I'm trying to decode it, extract the audio and write it as a wave file to listen to it, how do I know when to stop extracting and where the file ended? The only solution would be to write the size of the audio file, but that would be visible. I can't think of other solutions. — Allen, Jul 14 '17 at 13:28
the way I extract the audio is that when I receive the image I get the value of the LSB and once 16 values are taken that means I constructed a unit of audio so the problem is when should I stop the process of extracting. when I searched for a solution some people who hide text in the image they said use 8 0s continuously but that won't work in my case and hiding it the size at the start is too visible — Allen, Jul 14 '17 at 13:30
*hiding it the size at the start is too visible*. This. Can you elaborate what you mean by too visible? I honestly don't see a problem. For example, say your bitstream is 16000 bits and you add another 16 bits at the front to say how many bits to read. You're now embedding 16016 bits. How is that too visible? — Reti43, Jul 14 '17 at 13:40
That was my mistake, I didn't explain clearly and I apologize for that. What I meant was, first of all, 16 bits might not be enough to hold the size of the bitstream. If I'm not mistaken, 16 bits can hold numbers up to 65,000ish if starting from 0, or 32,000ish if starting from -32,000ish. As an example, if the audio file was up to 1 megabyte, the number of bits would be around 8.3m, so 16 bits wouldn't be enough to hold the size. And what I mean by too visible is that this is also for security purposes: if I were the one trying to decode it, wouldn't I guess that the size is at the beginning? — Allen, Jul 14 '17 at 15:55
I know that LSB isn't exactly the best algorithm for this, but it is what I could implement right now. When I saw other people marking the end of the hidden item, which was text, using 8 consecutive 0s, I wanted to do something similar so as not to put a constraint on the size of the audio file. It's also an attempt to improve the security, even if only by a little. But then again, I can't use the same idea as the people who hide text inside images, because a lot of the audio values would contain 8 consecutive 0s, or even 16. — Allen, Jul 14 '17 at 16:01
Okay, I understand your issues now and I will write up an answer. — Reti43, Jul 14 '17 at 16:18
@Reti43 Um, hello, thank you for the help before. As a matter of fact, I have another question. Some of the audio values that I converted to bits and hid in the image were negative. The conversion of negative values to binary went fine, but extracting them from the image is not going well, because I can't convert the binary that represented the negative values back to the original numbers. Any idea for a solution? — Allen, Jul 16 '17 at 22:23
This is a different question, which should be asked in its own post. However, I can't see why you can't read your audio file, or any file for that matter, as a binary stream and embed that as is. — Reti43, Jul 17 '17 at 19:23
@Reti43 Well, the reason is that some values were negative before I encoded them into the image, such as -9. I can convert it to binary, but I can't convert it back. — Allen, Jul 17 '17 at 22:03
Create a new question and show your code. There's probably something wrong with your conversion to and/or from binary. — Reti43, Jul 17 '17 at 22:45

score 0 · Accepted Answer · answered Jul 14 '17 at 17:59

You are correct that you can't rely on an end-of-file marker, because every byte value can occur in your bytestream, so any marker you pick may also appear inside the audio data. Therefore, you have to use a header. This has to be implemented in an unambiguous way, so that the decoder knows where and how to look to extract its information. In practice, this means placing the header at the beginning of your message, and in this case it'd be convenient, though not necessary, to give it a fixed size.

If your message can be longer than 65,535 bits (the largest value an unsigned 16-bit field can hold), then your message size will need more than 16 bits, for example, 32 bits. Now, you may argue that if you're embedding a very small message, a 32-bit message size header will have too many zeros at the beginning and will be overkill. However, even if you're planning to embed only 1 kB of data, an extra 32 bits will barely add any extra noise.
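A fixed-size length header can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's code: the carrier is modelled as a flat `bytearray` of colour components (R, G, B, R, G, B, ...), the header is a 32-bit big-endian count of payload bits, and the names `embed`, `extract`, `bytes_to_bits` and `bits_to_bytes` are made up for this example.

```python
import struct

HEADER_BITS = 32  # assumed fixed header width; holds payload sizes up to 2**32 - 1 bits


def bytes_to_bits(data):
    """Expand bytes into a list of 0/1 values, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of 0/1 values (length divisible by 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed(carrier, payload):
    """Hide a 32-bit bit count followed by the payload in the carrier's LSBs."""
    header = struct.pack(">I", len(payload) * 8)
    bits = bytes_to_bits(header + payload)
    if len(bits) > len(carrier):
        raise ValueError("payload too large for this carrier")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return stego


def extract(stego):
    """Read the header first, then exactly as many payload bits as it announces."""
    header_bits = [component & 1 for component in stego[:HEADER_BITS]]
    (bit_count,) = struct.unpack(">I", bits_to_bytes(header_bits))
    end = HEADER_BITS + bit_count
    if end > len(stego):
        raise ValueError("header announces more bits than the carrier holds")
    return bits_to_bytes([component & 1 for component in stego[HEADER_BITS:end]])
```

Because the decoder knows how many bits to read before it touches the payload, the audio data may contain any run of zeros without being mistaken for an end marker.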

By Kerckhoffs's principle, you should indeed assume that an attacker is fully familiar with your scheme, and if you embed your message in a static way, it will be straightforward for them to extract it. Instead, you can use a password or key as the seed to a PRNG and then use that to shuffle the order of your pixels. For example, let's number the RGB components of the first pixel 1-3 respectively and the components of the second pixel 4-6. By generating the array [1, 2, 3, 4, 5, 6] and shuffling it, you may get the order [4, 3, 2, 6, 5, 1]. So, when you embed your secret (including the message size), the first bit is hidden in the red component of the second pixel, the second bit is hidden in the blue component of the first pixel, etc. Nobody can say this is "visible", because without the correct pixel order they can't meaningfully extract the message.
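The keyed shuffle could look something like the sketch below. It makes a few assumptions of its own: the carrier is again a flat `bytearray` of colour components, positions are 0-based rather than 1-based, and `embedding_order`, `embed_bits` and `extract_bits` are hypothetical helper names. Python's `random` module is a Mersenne Twister, which is not cryptographically secure, so treat this as a demonstration of the idea only.

```python
import random


def embedding_order(num_components, key):
    """Return a key-dependent permutation of the component indices."""
    order = list(range(num_components))
    random.Random(key).shuffle(order)  # same key gives the same order on both sides
    return order


def embed_bits(carrier, bits, key):
    """Write each bit into the LSB of the next component in the shuffled order."""
    if len(bits) > len(carrier):
        raise ValueError("not enough components for this many bits")
    order = embedding_order(len(carrier), key)
    stego = bytearray(carrier)
    for position, bit in zip(order, bits):
        stego[position] = (stego[position] & 0xFE) | bit
    return stego


def extract_bits(stego, count, key):
    """Regenerate the same order from the key and read back `count` LSBs."""
    order = embedding_order(len(stego), key)
    return [stego[position] & 1 for position in order[:count]]
```

The size header is embedded through the same shuffled order as the rest of the message, so the decoder reads the first 32 shuffled positions to learn the length and then continues along the order from there.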

However, you have to remember that steganography is the art of concealing information in another medium without anyone even suspecting its presence. In comparison, cryptography is about transferring your message publicly, but encrypted in such a way that only the intended recipient can know its contents. What this means for steganography is that one doesn't necessarily have to extract your message to defeat its purpose. You're busted just for the fact that you're secretly trying to transmit information, regardless of whether its contents can be read. You are aware that LSB pixel modification methods are broken, so trying to make them ever so slightly more secure is pointless. Instead, just worry about playing with the method to get your hands dirty and learn a bit about steganography.

first of all, thank you for the wonderful answer and the explanation it was very good and easy to understand and sure as you suggested I will try to learn a bit more about steganography but I will still implement this one first. again thank you for help — Allen, Jul 14 '17 at 18:19
