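This answer is written in C. The program below uses a variable-length array, which is not part of standard C++, so it compiles as C++ only with compilers that accept VLAs as an extension (GCC and Clang do); where it compiles, it behaves the same in both languages. I quote from the C11 standard; there are equivalent definitions in the C++ standards.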
There isn't a good way to pass null bytes to a program's arguments
C11 §5.1.2.2.1 Program startup:
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.
C11 §7.1.1 Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null character.
That means that each argument passed to main() in argv is a null-terminated string. There is no reliable data after the null byte at the end of the string; reading past it would access memory outside the bounds of the string.
So, as noted at length in the comments to the question, it is not possible in the ordinary course of events to get null bytes to a program via the argument list because null bytes are interpreted as being the end of each argument.
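To make the truncation concrete, here is a minimal sketch. The helper name visible_argument_length is hypothetical (it is not part of the question or of any library); it only reports how many bytes of a binary buffer would survive if the buffer were handed over as a single argument string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes of data[0..size-1] that remain
 * visible when the buffer is treated as one null-terminated string.
 * Everything from the first null byte onwards is lost. */
static size_t visible_argument_length(const char *data, size_t size)
{
    assert(data != NULL);
    /* memchr stays within the real extent of the data, unlike strlen */
    const char *nul = (const char *)memchr(data, '\0', size);
    return (nul != NULL) ? (size_t)(nul - data) : size;
}
```

For the four bytes '1', null, '2', '3', the helper reports 1: the callee would see only "1".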
By special agreement
That doesn't leave much wriggle room. However, if both the calling/invoking program and the called/invoked program agree on the convention, then, even with the limitations imposed by the standards, you can pass arbitrary binary data, including arbitrary sequences of null bytes, to the invoked program — up to the limits on the length of an argument list imposed by the implementation.
The convention has to be along the lines of:
- All arguments (except argv[0], which is ignored, and the last argument, argv[argc-1]) consist of a stream of non-null bytes followed by a null. The last argument contributes its bytes, but its terminating null is not counted as data.
- If you need adjacent nulls, you have to provide empty arguments on the command line.
- If you need trailing nulls, you have to provide empty arguments as the last arguments on the command line.
This could lead to a program such as (null19.c):
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void hex_dump(const char *tag, size_t size, const char *buffer);

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s arg1 [arg2 '' arg4 ...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Total space: every argument plus its terminating null byte */
    size_t len_args = 0;
    for (int i = 1; i < argc; i++)
        len_args += strlen(argv[i]) + 1;

    /* Concatenate the arguments, keeping each terminating null byte */
    char buffer[len_args];
    size_t offset = 0;
    for (int i = 1; i < argc; i++)
    {
        size_t arglen = strlen(argv[i]) + 1;
        memmove(buffer + offset, argv[i], arglen);
        offset += arglen;
    }

    /* The null byte that ends the last argument is not part of the data */
    assert(offset != 0);
    offset--;

    hex_dump("Argument list", offset, buffer);
    return 0;
}

static inline size_t min_size(size_t x, size_t y) { return (x < y) ? x : y; }

/* Print size bytes of buffer in hexadecimal, 16 bytes per line */
static void hex_dump(const char *tag, size_t size, const char *buffer)
{
    printf("%s (%zu):\n", tag, size);
    size_t offset = 0;
    while (size != 0)
    {
        printf("0x%.4zX:", offset);
        size_t count = min_size(16, size);
        for (size_t i = 0; i < count; i++)
            printf(" %.2X", buffer[offset + i] & 0xFF);
        putchar('\n');
        size -= count;
        offset += count;
    }
}
This could be invoked using:
$ ./null19 '1234' '5678' '' '' '' '' 'def0' ''
Argument list (19):
0x0000: 31 32 33 34 00 35 36 37 38 00 00 00 00 00 64 65
0x0010: 66 30 00
$
The first argument is deemed to consist of 5 bytes — four digits and a null byte. The second is similar. The third through sixth arguments each represent a single null byte (it gets painful if you need large numbers of contiguous null bytes), then there is another string of five bytes (three letters, one digit, one null byte). The last argument is empty but ensures that there is a null byte at the end. If omitted, the output would not include that final terminal null byte.
$ ./null19 '1234' '5678' '' '' '' '' 'def0'
Argument list (18):
0x0000: 31 32 33 34 00 35 36 37 38 00 00 00 00 00 64 65
0x0010: 66 30
$
This is the same as before except there is no trailing null byte in the data. The two examples in the question are easily handled:
$ ./null19 $(printf "1\x0123")
Argument list (4):
0x0000: 31 01 32 33
$ ./null19 1 23
Argument list (4):
0x0000: 31 00 32 33
$
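The calling side of the convention is not shown in this answer, so here is one possible sketch of it. The function name split_for_argv and its interface are assumptions made for illustration: it copies the binary data, appends one null byte, and returns pointers to each piece, so that every null byte in the data starts a new (possibly empty) argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoder for the convention described above.
 * Returns a malloc'd, NULL-terminated vector of pointers into a malloc'd
 * copy of the data (*storage); *count receives the number of arguments.
 * Returns NULL if allocation fails. The caller frees both allocations. */
static char **split_for_argv(const char *data, size_t size,
                             char **storage, size_t *count)
{
    assert(data != NULL && storage != NULL && count != NULL);

    /* Each null byte in the data ends one argument and starts the next */
    size_t nulls = 0;
    for (size_t i = 0; i < size; i++)
        if (data[i] == '\0')
            nulls++;

    char *copy = (char *)malloc(size + 1);
    char **args = (char **)malloc((nulls + 2) * sizeof(*args));
    if (copy == NULL || args == NULL)
    {
        free(copy);
        free(args);
        return NULL;
    }
    memcpy(copy, data, size);
    copy[size] = '\0';      /* terminator of the last argument; not data */

    size_t n = 0;
    args[n++] = copy;
    for (size_t i = 0; i < size; i++)
        if (copy[i] == '\0')
            args[n++] = copy + i + 1;
    args[n] = NULL;

    *storage = copy;
    *count = n;
    return args;
}
```

Applied to the 19 bytes of the first example, this yields the same eight arguments that were typed on the command line. A real caller would still have to prepend a program name as argv[0] before passing the vector to whatever process-creation function the platform provides.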
This approach works strictly within the standard, assuming only that empty strings are recognized as valid arguments. In practice, the argument strings are often already contiguous in memory, so on many platforms it might be possible to skip copying them into the buffer. However, the standard does not stipulate that the argument strings are laid out contiguously in memory.
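If you wanted to try skipping the copy, you would first have to verify the layout at run time. A minimal sketch, assuming a hypothetical helper named args_are_contiguous, could compare the address just past each argument's terminator with the start of the next argument:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: true if argv[1] .. argv[argc-1] sit back to back in
 * memory, each one starting right after the previous terminator.
 * Only equality of pointers is tested, which is well defined. */
static bool args_are_contiguous(int argc, char **argv)
{
    assert(argv != NULL);
    for (int i = 1; i + 1 < argc; i++)
    {
        if (argv[i] + strlen(argv[i]) + 1 != argv[i + 1])
            return false;
    }
    return true;
}
```

Even when the check succeeds, treating the separate strings as one array goes beyond what the standard guarantees, so the copying version remains the portable one.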
If you need multiple arguments with binary data, you can modify the convention. For example, you could add a control argument: a string that indicates how many of the subsequent physical arguments make up one logical binary argument.
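One way such a control argument might be decoded is sketched below. The layout (a decimal count followed by that many physical arguments) and the function name read_logical_argument are assumptions for illustration only; the answer does not prescribe a specific format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder: argv[pos] holds a decimal count N, and the next N
 * physical arguments form one logical binary argument, joined by null bytes.
 * On success, stores a malloc'd buffer in *out and its length in *out_len,
 * and returns the index of the next unread argument. Returns -1 on error. */
static int read_logical_argument(int argc, char **argv, int pos,
                                 char **out, size_t *out_len)
{
    assert(argv != NULL && out != NULL && out_len != NULL);
    if (pos < 1 || pos >= argc)
        return -1;

    char *end;
    long parts = strtol(argv[pos], &end, 10);
    if (end == argv[pos] || *end != '\0' || parts < 1 || parts > argc - pos - 1)
        return -1;

    /* Space for every physical argument plus its terminating null byte */
    size_t total = 0;
    for (long i = 1; i <= parts; i++)
        total += strlen(argv[pos + i]) + 1;

    char *data = (char *)malloc(total);
    if (data == NULL)
        return -1;

    size_t offset = 0;
    for (long i = 1; i <= parts; i++)
    {
        size_t len = strlen(argv[pos + i]) + 1;
        memcpy(data + offset, argv[pos + i], len);
        offset += len;
    }

    *out = data;
    *out_len = total - 1;   /* the final terminator is not part of the data */
    return pos + (int)parts + 1;
}
```

With the arguments 3, ab, an empty string, and cd, this produces the six bytes 'a', 'b', null, null, 'c', 'd' and reports that the next logical argument starts at index 5.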
All this relies on the programs interpreting the argument list as agreed. It is not really a general solution.