
I'm trying to read a wide char from a stream that was created using fmemopen with a char *.

char *s = "foo bar foo";
FILE *f = fmemopen(s,strlen(s),"r");

wchar_t c = getwc(f);

getwc crashes with a segmentation fault; I confirmed this with GDB.

I know this is due to opening the stream with fmemopen, because calling getwc on a stream opened normally works fine.

Is there a wide char version of fmemopen, or is there some other way to fix this problem?

MD XF
  • Please post a proper MCVE, the `fmemopen` invocation is invalid – Antti Haapala Aug 13 '17 at 06:37
  • @AnttiHaapala Oh, whoops, I missed that part. Sorry. – MD XF Aug 13 '17 at 20:49
  • @MDXF: From the examples one might get the impression that perhaps [`iconv_open()`](http://man7.org/linux/man-pages/man3/iconv_open.3.html) and [`iconv()`](http://man7.org/linux/man-pages/man3/iconv.3.html) might be a better solution to the underlying problem. – Nominal Animal Aug 14 '17 at 07:38
  • @MDXF: In fact, at least GNU libc uses `iconv` in the background - it uses a separate buffer for already-converted data. After you have set the locale (all, or `LC_CTYPE`), you can use [`nl_langinfo(CODESET)`](http://man7.org/linux/man-pages/man3/nl_langinfo.3.html) to obtain the character set in a form you can supply to `iconv_open()`. While this is not ISO C, it is POSIX.1, and should be quite portable. (Since there is even GNU `libiconv`, this approach should be relatively easy to port across to any system using standard C, including Windows.) – Nominal Animal Aug 17 '17 at 05:34

3 Answers


The second line should read FILE *f = fmemopen(s, strlen(s), "r");. As posted, fmemopen has undefined behavior and might return NULL, which causes getwc() to crash.

Changing the fmemopen() line and adding a check for NULL fixes the crash but does not meet the OP's goal.

It seems that wide orientation is not supported on streams opened with fmemopen(), at least in the GNU C library. Note that fmemopen is not defined in the C Standard but in POSIX.1-2008, and it is not available on many systems (such as OS X).

Here is a corrected and extended version of your program:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void) {
    const char *s = "foo bar foo";
    /* fmemopen() takes a non-const buffer; "r" mode never writes to it */
    FILE *f = fmemopen((void *)s, strlen(s), "r");
    wint_t c;   /* getwc() returns wint_t, which can represent WEOF */

    if (f == NULL) {
        printf("fmemopen failed: %s\n", strerror(errno));
        return 1;
    }
    /* mode 0 only queries the orientation; mode 1 requests wide */
    printf("default wide orientation: %d\n", fwide(f, 0));
    printf("selected wide orientation: %d\n", fwide(f, 1));
    while ((c = getwc(f)) != WEOF) {
        printf("read %lc (%d 0x%x)\n", c, (int)c, (unsigned)c);
    }
    fclose(f);
    return 0;
}

Output on Linux:

default wide orientation: -1
selected wide orientation: -1

No characters are read: getwc() returns WEOF immediately.

Explanation of fwide(f, 0) from the Linux man page:

SYNOPSIS

#include <wchar.h>
int fwide(FILE *stream, int mode);

When mode is zero, the fwide() function determines the current orientation of stream. It returns a positive value if stream is wide-character oriented, that is, if wide-character I/O is permitted but char I/O is disallowed. It returns a negative value if stream is byte oriented, i.e., if char I/O is permitted but wide-character I/O is disallowed. It returns zero if stream has no orientation yet; in this case the next I/O operation might change the orientation (to byte oriented if it is a char I/O operation, or to wide-character oriented if it is a wide-character I/O operation).

Once a stream has an orientation, it cannot be changed and persists until the stream is closed.

When mode is nonzero, the fwide() function first attempts to set stream's orientation (to wide-character oriented if mode is greater than 0, or to byte oriented if mode is less than 0). It then returns a value denoting the current orientation, as above.

The stream returned by fmemopen() is byte-oriented and cannot be changed to wide-character oriented.

chqrlie
  • So there's no way to `fmemopen` a string and read wide characters from it? – MD XF Aug 13 '17 at 21:02
  • @MDXF: Indeed I'm afraid the Glibc implementation does not support wide orientation. – chqrlie Aug 13 '17 at 21:21
  • `fwide` does not change the orientation if the orientation is already defined, so the second call to `fwide` has no effect. You can try opening the stream this way: `fmemopen(s, strlen(s), "r,ccs=UNICODE");` – vadim_hr Aug 16 '17 at 11:47
  • @VadimHryshkevich: The first call to `fwide()` is a query for the current orientation. It returns byte-oriented. The second call attempts to change the orientation to wide and indeed fails. Your proposed approach is interesting. It is non standard but classic on some systems. – chqrlie Aug 16 '17 at 16:25
  • @chqrlie: This is from the `fwide()` man page: "Once a stream has an orientation, it cannot be changed and persists until the stream is closed." So the second call to `fwide()` has no effect. P.S. 1. I have looked into the source code of `fwide()` on my Linux distribution: if the stream already has a nonzero orientation, `fwide()` just returns. 2. From the source code of `fmemopen()`: there is no way to change the orientation of the stream in this function. 3. It should be possible to use `freopen(NULL,"r",fmemopen(...))` to get a stream without orientation, but I tried this without success. – vadim_hr Aug 17 '17 at 07:26
  • @vadim_hr: yes, I already quoted the man page in the answer (just added more paragraphs for clarity) and got to the same conclusion: *The stream returned by `fmemopen()` is byte-oriented and cannot be changed to wide-character oriented.* It is a pity that a stream orientation cannot be changed once set and that it is not handled generically enough to work for memory streams transparently. – chqrlie Aug 17 '17 at 10:12
  1. Your second line does not use the correct number of parameters, does it? (Since corrected.)

    FILE *fmemopen(void *buf, size_t size, const char *mode);

  2. glibc's fmemopen does not (fully) support wide characters AFAIK. There's also open_wmemstream(), which supports wide characters but is just for writing.

  3. Is _UNICODE defined? See wchar_t reading.
    Also, have you set the locale to an encoding that supports Unicode, for example, setlocale(LC_ALL, "en_US.UTF-8");? See here.

  4. Consider using a temporary file. Consider using fgetwc instead.

I have replaced my code with the code from @chqrlie, since it is closer to the OP's code, but added a setlocale() call; without it, the program fails to produce correct output for extended/Unicode characters.

#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    /* select a UTF-8 locale so multibyte input can be decoded */
    setlocale(LC_ALL, "en_US.UTF-8");
    const char *s = "foo $€ bar foo";
    FILE *f = fmemopen((void *)s, strlen(s), "r");
    wint_t c;   /* getwc() returns wint_t, which can represent WEOF */

    if (f == NULL) {
        printf("fmemopen failed: %s\n", strerror(errno));
        return 1;
    }
    printf("default wide orientation: %d\n", fwide(f, 0));
    printf("selected wide orientation: %d\n", fwide(f, 1));
    while ((c = getwc(f)) != WEOF) {
        printf("read %lc (%d 0x%x)\n", c, (int)c, (unsigned)c);
    }
    fclose(f);
    return 0;
}
wp78de
  1. You can use getwc() only on an unoriented or wide-oriented stream. From the getwc() man page: The stream shall not have an orientation yet, or be wide-oriented.

  2. It is not possible to change a stream's orientation once it has one. From the fwide() man page: Calling this function on a stream that already has an orientation cannot change it.

  3. A stream opened with glibc's fmemopen() is byte-oriented and therefore cannot be made wide-oriented in any way. As described here, uClibc has an fmemopen() routine without this limitation.

Conclusion: You need to use uClibc or another library or make your own fmemopen().

vadim_hr