18

Checking if it can be an int is easy enough -- just check that every digit is between '0' and '9'. But a float is harder. I found this, but none of the answers really work. Consider this code snippet, based on the top (accepted) answer:

float f;
int ret = sscanf("5.23.fkdj", "%f", &f);
printf("%d", ret);

1 will be printed.


Another answer suggested using strpbrk, to check if certain illegal characters are present, but that wouldn't work either because 5fin7 wouldn't be legal, but inf would.
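
To make that objection concrete, here is a minimal sketch of a character-set check. The helper name `only_float_chars` and the allowed set are assumptions made for illustration, not code from the linked answers, and it uses `strspn` rather than `strpbrk` (the same idea, inverted). Any set generous enough to admit `inf` also admits `5fin7`:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s is non-empty and consists only of
   characters that can appear in a decimal float or in "inf". */
static int only_float_chars(const char *s)
{
    static const char allowed[] = "0123456789+-.eEinfINF";
    return *s != '\0' && s[strspn(s, allowed)] == '\0';
}
```

Both `only_float_chars("inf")` and `only_float_chars("5fin7")` return 1, so a pure character filter cannot tell them apart.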


Yet another answer suggested checking the output of strtod. But consider this:

char *number = "5.53 garbanzo beans";
char *foo;

strtod(number, &foo);

printf("%d", isspace(*foo) || *foo == '\0');

It'll print 1. But I don't want to remove the isspace call entirely, because " 5.53 " should be a valid number.


Is there a good, elegant, idiomatic way to do what I'm trying to do?

Moonchild
  • 2
    What about a regular expression? – SBS Aug 07 '17 at 19:52
  • @SBS That could work, but it seems a bit heavy for something like this. I'm actually using this for a lisp I'm writing, to infer if a value should be parsed as a float, as an int, or as an invalid; and I don't want it to slow down if you use a ton of floats. – Moonchild Aug 07 '17 at 19:57
  • 3
    @Elronnd In that case you probably want to tokenize the sequence first, meaning that there *won’t* be leading or trailing whitespace and there certainly won’t be something like `5.0 meters` which starts with a number followed by a space and then has other stuff – Daniel H Aug 07 '17 at 19:59
  • @DanielH sure, but also consider, what happens if someone writes their own program *in* the lisp which tries to read user input? Then I'll have to have a builtin which can parse a non-tokenized float. – Moonchild Aug 07 '17 at 20:02
  • 5
    Your "easy" test for int doesn't include leading `-` – stark Aug 07 '17 at 20:04
  • I saw a comment indicating that you don't consider `" 255."` as a float. Why is that? The `strtod()` function would accept it; it does not mandate a digit after the decimal point. – Jonathan Leffler Aug 07 '17 at 20:09
  • @JonathanLeffler oh, I didn't know that. – Moonchild Aug 07 '17 at 20:11
  • Not sure, but is this question a duplicate of https://stackoverflow.com/q/18210406/783510 ? – Philipp Claßen Aug 07 '17 at 21:07
  • I linked that question in my question and explained why none of the answers it had worked. – Moonchild Aug 07 '17 at 21:26
  • @Elronnd, a _compiled_ regex may not be as heavy as you think. Also, bits and cycles are pretty close to free in many applications. Nothing is really "too heavy" until somebody asks the question, "Why is this app so slow?" or "Why is it so big?" – Solomon Slow Aug 07 '17 at 21:36
  • 2
    As a general rule, I have found that if one is trying to write a grammar, write a grammar. I've been reliably disappointed by every attempt I have made to get around writing the grammar with some clever C code. There's a reason we invented formal representations of grammars, and the software to parse them (LEX/YACC, BISON, ANTLR, Boost.Spirit, etc.) – Cort Ammon Aug 07 '17 at 23:39
  • @CortAmmon but...I got the tokenizing done without requiring any grammars. It would be *really* shitty to have to introduce them now for something as mundane as parsing a float. – Moonchild Aug 08 '17 at 00:30
  • @Elronnd My experience is that parsing floats is anything but mundane. There's reasons you find special functions in boost.spirit which parse `5.` but do not parse `5`. – Cort Ammon Aug 08 '17 at 00:42
  • 1
    It is unclear to me if you're looking for a "float in general", or "counts as a float constant in the C language". The latter could be expected to include hexadecimal floats etc, but is perhaps unwanted in the former case. – pipe Aug 08 '17 at 06:45
  • When talking about parsing of floating point literals: what about scientific notation? – moooeeeep Aug 08 '17 at 06:46
  • @pipe I'm fine with it just being what would be considered a float in c. – Moonchild Aug 08 '17 at 07:47
  • @moooeeeep the standard library already handles those. – Moonchild Aug 08 '17 at 07:47
  • Should it also accept representations of floating point numbers which are outside of the range of a float/double? Your integer test seems not to care about size. – Carsten S Aug 08 '17 at 07:54
  • Possible duplicate of [Check if input is float else stop](https://stackoverflow.com/questions/18210406/check-if-input-is-float-else-stop) – dhein Aug 08 '17 at 08:21
  • Note: Just because the question you linked has no answer that satisfies you, doesn't make this OP a duplicate of that one. The proper way if your question already exists (what it does over there) you should put a bounty on it to gain attention on your aspect of the question you are still missing a answer for. – dhein Aug 08 '17 at 08:23
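
Following up on stark's comment about the leading `-`: the digit-by-digit integer check described in the question could be extended roughly like this. The name `looks_like_int` is hypothetical, and the sketch deliberately ignores range (the point Carsten S raises), so a 40-digit string still counts as an int here.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if s is an optional sign followed by
   one or more decimal digits and nothing else, 0 otherwise. */
static int looks_like_int(const char *s)
{
    if (*s == '+' || *s == '-')
        s++;                    /* skip a single leading sign */
    if (!isdigit((unsigned char)*s))
        return 0;               /* empty string, or a sign with no digits */
    while (isdigit((unsigned char)*s))
        s++;
    return *s == '\0';          /* reject anything after the digits */
}
```

The cast to `unsigned char` is there because passing a negative `char` value to `isdigit` is undefined.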

6 Answers

13

The first answer should work if you combine it with %n, which is the number of characters read:

int len;
float ignore;
char *str = "5.23.fkdj";
int ret = sscanf(str, "%f %n", &ignore, &len);
printf("%d", ret==1 && !str[len]);

The expression !str[len] is false if the string contains characters beyond the float. Note the space after %f, which consumes any trailing white space.
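
For reuse, the same check can be wrapped in a function. This is only a sketch: the name `is_float_sscanf` is made up here, and `len` is initialized purely as a precaution, since it is only read when `sscanf` reports one conversion.

```c
#include <stdio.h>

/* Hypothetical wrapper: returns 1 if str holds exactly one float,
   optionally surrounded by white space, and 0 otherwise. */
static int is_float_sscanf(const char *str)
{
    int len = 0;        /* set by %n once %f has converted successfully */
    float ignore;       /* the value itself is not needed */
    int ret = sscanf(str, "%f %n", &ignore, &len);

    /* ret is 0 on a matching failure and EOF on empty input, so only
       ret == 1 means a float was read; str[len] must then be the end. */
    return ret == 1 && str[len] == '\0';
}
```

With this, `is_float_sscanf("5.23.fkdj")` gives 0 and `is_float_sscanf(" 5.5 ")` gives 1. Inputs that overflow a `float` are left out of that claim, given the concern about `sscanf` and out-of-range values raised in the comments.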


chux - Reinstate Monica
Sergey Kalinichenko
  • This one fails a couple of my testcases. It says that `5.5 ` and ` 5.5 ` *aren't* floats (even though they are!) And it says that ` 235.` is a float (even though it isn't!) – Moonchild Aug 07 '17 at 20:05
  • 1
    @Elronnd I added spaces around `%f` to fix this problem ([demo](http://ideone.com/GSScOz)). – Sergey Kalinichenko Aug 07 '17 at 20:11
  • Awesome. I've accepted your answer because I think it's more elegant. – Moonchild Aug 07 '17 at 20:14
  • 2
    The leading space is harmless and symmetric, but unnecessary; `%f` (like all numeric conversion specifications) skips leading white space anyway, whether you want it to do so or not. – Jonathan Leffler Aug 07 '17 at 20:17
  • Suppose `len` happened to be set to `9` (by bad luck since you don't set it to a value when you define it). Then you pass the string `"abcdefghi"` to the code. Since the `%*f` fails to convert, the `%n` won't be processed, but the value in `len` is the length of the string. Further, the suppressed assignment (`%*f`) isn't counted as a conversion, and neither is `%n`, so the return value from `scanf()` will be 0 regardless of whether there was any data there or not. Empty strings also cause trouble on this score. It's an endlessly tricky topic. – Jonathan Leffler Aug 07 '17 at 20:24
  • I think you shouldn't suppress the assignment (add a dummy variable) and you should check that you get `1` back from `sscanf()`. That avoids the problems with empty strings and not setting `len` explicitly. – Jonathan Leffler Aug 07 '17 at 20:26
  • 2
    @JonathanLeffler Wow, that's a nice catch! Thank you very much! I guess I can save the asterisk by setting `len` to `strlen(str)+1`, but that is grossly unreadable and non-intuitive, and it would force me to special-case an empty string, so I'd go with your suggestion and use an ignored variable. Thanks! – Sergey Kalinichenko Aug 07 '17 at 20:35
  • elegant, but this would be more efficient: `printf("%d", ret && str[len] == '\0');` – chqrlie Aug 07 '17 at 22:52
  • 2
    Indeed `ret` can be `EOF` if `str` is empty. Here is a fix: `printf("%d", ret == 1 && str[len] == '\0');` – chqrlie Aug 07 '17 at 23:07
  • @chqrlie Thank you very much! It continuously amazes and humbles me to see how a seemingly trivial three lines of C code could fail in so many ways. – Sergey Kalinichenko Aug 08 '17 at 00:45
  • `sscanf` has undefined behavior if there are too many digits in the input, and therefore a completely robust answer to this question cannot involve `sscanf`. [I am not making this up](http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10). – zwol Apr 22 '19 at 18:04
  • @zwol To clarify, are you interpreting "if the result of the conversion cannot be represented in the object, the behavior is undefined" from the standard at the link as "when there are too many digits in the input"? – Sergey Kalinichenko Apr 22 '19 at 18:24
  • @dasblinkenlight Yes. Notice that the _syntax_ of the input item consumed by `%f` is defined in terms of the syntax accepted by `strtod`, but the _conversion_ is _not_ defined to be as-if by calling `strtod`, and also notice that the case where the type of the object to receive the result doesn't agree at all with the format specifier is handled by the previous clause ("If this object does not have an appropriate type…the behavior is undefined"). My interpretation is therefore that this clause means, for any input where `strtod` would report under- or overflow, `scanf("%f")` has UB. – zwol Apr 22 '19 at 22:29
  • @zwol I strongly doubt this interpretation, because it would mean that end-users are always able to force undefined behavior on `scanf` by providing invalid input when `%f` is in use. This would create a severe security problem in the standard itself, rendering `%f` unusable, along with "unlimited" `%s`. – Sergey Kalinichenko Apr 23 '19 at 12:20
  • @dasblinkenlight Do you have an alternative interpretation? I do consider this severe bug in the specification of `scanf`, but since I consider `scanf` not fit for purpose _anyway_, in ways that cannot be fixed without breaking backward compatibility (e.g. it would need to return the number of _characters_ consumed before running into a parse failure) I can't be bothered to file a DR. – zwol Apr 23 '19 at 14:54
  • @dasblinkenlight N.B. that this applies to `%d` as well; in fact, I think the _only_ `scanf` conversion specifiers for which there exist no inputs that provoke UB are `%c` and (`%s` and `%[...]` with a length modifier). – zwol Apr 23 '19 at 14:59
10

You could check if - after having read a value using strtod - the remainder consists solely of white spaces. Function strspn can help here, and you can even define "your personal set of white spaces" to consider:

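#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *number = "5.53 garbanzo beans";
    char *foo;

    double d = strtod(number, &foo);
    if (foo == number) {
        // strtod consumed nothing, so there is no number at all
        printf("invalid number.\n");
    }
    else if (foo[strspn(foo, " \t\r\n")] != '\0') {
        // something other than white space follows the number
        printf("invalid (non-white-space) trailing characters.\n");
    }
    else {
        printf("valid number: %lf\n", d);
    }
}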
Stephan Lechner
  • This one works for all my testcases but one. It thinks that ` 235.` is a number, but it isn't. – Moonchild Aug 07 '17 at 20:09
  • I've just been informed that a trailing `.` is acceptable. – Moonchild Aug 07 '17 at 20:10
  • Why don't you consider `" 255."` as a float? The `strtod()` function would accept it; it does not mandate a digit after the decimal point. It is permissible to mandate that there must be a digit before and a digit after the decimal point if the decimal point is present, but that requires extra testing because both `255.` and `.255` are legitimate floating point numbers in C, and the `strtod()` function will accept them. If you're going to mandate that, though (the digit before and after rule), you must state it. – Jonathan Leffler Aug 07 '17 at 20:10
  • @Elronnd 235 is a valid float, according to C compiler. – Sergey Kalinichenko Aug 07 '17 at 20:10
  • 2
    Hm - it depends: `double valid = 553.;` is valid in c. – Stephan Lechner Aug 07 '17 at 20:10
  • @Jonathan Leffler `.1` is valid and `1.` is valid as well. IMO logical – 0___________ Aug 07 '17 at 21:06
  • This answer is correct as far as it goes, but should also check for overflow and underflow, by clearing `errno` before the call and inspecting it afterward. – zwol Apr 22 '19 at 18:08
6

Is there a way to check if a string can be a float?

A problem with the sscanf(..., "%f") approach is overflow, which is undefined behavior (UB), although implementations commonly handle it gracefully.

Instead use float strtof(const char * restrict nptr, char ** restrict endptr);

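#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

// Illustrative result codes (the values are an assumption, chosen negative
// so that they cannot collide with errno values, which are positive).
enum { Success = 0, No_Conversion = -1, Extra_Junk_At_End = -2 };

int float_test(const char *s) {
  char *endptr;
  errno = 0;
  float f = strtof(s, &endptr);
  if (s == endptr) {
    return No_Conversion;
  }
  while (isspace((unsigned char) *endptr)) {  // look past the number for junk
    endptr++;
  }
  if (*endptr) {
    return Extra_Junk_At_End;
  }

  // If desired
  // Special cases with underflow not considered here.
  if (errno) {
    return errno; // likely under/overflow
  }

  return Success;
}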
chux - Reinstate Monica
3

This code is closely based on the answer by dasblinkenlight. I proffer it as food for thought. Some of the answers it gives may not be what you wanted.

#include <stdio.h>
#include <string.h>

static void test_float(const char *str)
{
    int len;
    float dummy = 0.0;
    if (sscanf(str, "%f %n", &dummy, &len) == 1 && len == (int)strlen(str))
        printf("[%s] is valid (%.7g)\n", str, dummy);
    else
        printf("[%s] is not valid (%.7g)\n", str, dummy);
}

int main(void)
{
    test_float("5.23.fkdj");        // Invalid
    test_float("   255.   ");       // Valid
    test_float("255.123456");       // Valid
    test_float("255.12E456");       // Valid
    test_float("   .255   ");       // Valid
    test_float("   Inf    ");       // Valid
    test_float(" Infinity ");       // Valid
    test_float("   Nan    ");       // Valid
    test_float("   255   ");        // Valid
    test_float(" 0x1.23P-24 ");     // Valid
    test_float(" 0x1.23 ");         // Valid
    test_float(" 0x123 ");          // Valid
    test_float("abc");              // Invalid
    test_float("");                 // Invalid
    test_float("   ");              // Invalid
    return 0;
}

Testing on a Mac running macOS Sierra 10.12.6 using GCC 7.1.0 as the compiler, I get the output:

[5.23.fkdj] is not valid (5.23)
[   255.   ] is valid (255)
[255.123456] is valid (255.1235)
[255.12E456] is valid (inf)
[   .255   ] is valid (0.255)
[   Inf    ] is valid (inf)
[ Infinity ] is valid (inf)
[   Nan    ] is valid (nan)
[   255   ] is valid (255)
[ 0x1.23P-24 ] is valid (6.775372e-08)
[ 0x1.23 ] is valid (1.136719)
[ 0x123 ] is valid (291)
[abc] is not valid (0)
[] is not valid (0)
[   ] is not valid (0)

The hexadecimal numbers are likely to be particularly problematic. The various forms of infinity and not-a-number could be troublesome too. And the one example with an exponent (255.12E456) overflows float and generates an infinity — is that really OK?

Most of the problems raised here are definitional — that is, how do you define what you want to be acceptable. But note that strtod() would accept all the valid strings (and a few of the invalid ones, but other testing would reveal those problems).
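
If the definition you settle on is "finite decimal numbers only", one possible sketch on top of `strtod()` follows. The policy and the name `is_plain_decimal` are assumptions for illustration, not something the question specifies, and the overflow check assumes IEEE 754 doubles, where `HUGE_VAL` is an infinity.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy: accept only finite decimal numbers, optionally
   surrounded by white space. Rejects hex floats, inf, nan and overflow. */
static int is_plain_decimal(const char *s)
{
    char *end;
    double d = strtod(s, &end);

    if (end == s)
        return 0;                       /* nothing was converted */
    if (!isfinite(d))
        return 0;                       /* inf, nan, or overflow to HUGE_VAL */

    /* An 'x' inside the consumed text means a hexadecimal form. */
    size_t used = (size_t)(end - s);
    if (memchr(s, 'x', used) || memchr(s, 'X', used))
        return 0;

    /* Only white space may follow the number. */
    return end[strspn(end, " \t\r\n")] == '\0';
}
```

Under those assumptions, `"   255.   "` and `"   .255   "` are accepted, while `" 0x1.23 "`, `"   Inf    "`, `"   Nan    "` and `"255.12E456"` are rejected.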

Clearly, the test code could be revised to use an array of a structure containing a string and the desired result, and this could be used to iterate through the test cases shown and any extras that you add.
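
A rough sketch of that table-driven revision might look like the following. The names `check_float`, `struct float_case` and `run_float_cases` are invented for this example, and the expected verdicts are taken from the output listed above. The overflow, infinity, NaN and hexadecimal cases are omitted because their handling is the most implementation-sensitive.

```c
#include <stdio.h>
#include <string.h>

/* Same check as test_float(), returning the verdict instead of printing it. */
static int check_float(const char *str)
{
    int len = 0;
    float dummy = 0.0f;
    return sscanf(str, "%f %n", &dummy, &len) == 1 && len == (int)strlen(str);
}

/* One test case: the input string and the verdict expected for it. */
struct float_case {
    const char *input;
    int expected;       /* 1 = valid, 0 = not valid */
};

static const struct float_case cases[] = {
    { "5.23.fkdj",  0 },
    { "   255.   ", 1 },
    { "255.123456", 1 },
    { "   .255   ", 1 },
    { "   255   ",  1 },
    { "abc",        0 },
    { "",           0 },
    { "   ",        0 },
};

/* Runs every case, reports mismatches, and returns how many there were. */
static int run_float_cases(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int got = check_float(cases[i].input);
        if (got != cases[i].expected) {
            printf("[%s]: expected %d, got %d\n",
                   cases[i].input, cases[i].expected, got);
            failures++;
        }
    }
    return failures;
}
```

Calling `run_float_cases()` from `main()` and returning its result as the exit status lets you add cases by editing the table only.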

The cast on the result of strlen() avoids a compilation warning (error because I compile with -Werror) — comparison between signed and unsigned integer expressions [-Werror=sign-compare]. If your strings are long enough that the result from strlen() overflows a signed int, you've got other problems pretending they're valid values. OTOH, you might want to experiment with 500 digits after a decimal point — that's valid.

This code takes into account the comments made on dasblinkenlight's answer.

Jonathan Leffler
3

This is a variation on the code fragment posted by dasblinkenlight that is slightly simpler and more efficient as strlen(str) could be costly:

const char *str = "5.23.fkdj";
float ignore;
char c;
int ret = sscanf(str, "%f %c", &ignore, &c);
printf("%d", ret == 1);

Explanation: sscanf() returns 1 if and only if a float was converted, followed by optional white space and no other character.
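
Wrapped as a function, the idea could look like this. The name `is_float_trailing_char` is hypothetical, and this is a sketch rather than a hardened implementation.

```c
#include <stdio.h>

/* Hypothetical wrapper: returns 1 if str is a float followed only by
   optional white space, 0 otherwise. */
static int is_float_trailing_char(const char *str)
{
    float ignore;
    char c;

    /* %f converts the number, the space skips trailing white space, and
       %c succeeds only if some other character is left over. A result of
       2 therefore means junk after the number, 0 or EOF means no number. */
    return sscanf(str, "%f %c", &ignore, &c) == 1;
}
```

For example, `"5.23.fkdj"` yields 0 because `%c` picks up the second `.`, while `" 5.5 "` yields 1 because the input ends before `%c` can match anything.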

chqrlie
0

Maybe this? It is not very elegant, but it may do the job. It returns -1 on error, 0 if no conversion was done, and a value > 0 with the flags of the successful conversions set.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INT_CONVERTED       (1 << 0)
#define FLOAT_CONVERTED     (1 << 1)
int ReadNumber(const char *str, double *db, int *in)
{

    int result = (str == NULL || db == NULL || in == NULL) * -1;
    int len = 0;
    char *tmp;

    if (result != -1)
    {
        tmp = (char *)malloc(strlen(str) + 1);
        strcpy(tmp, str);
        for (int i = strlen(str) - 1; i >= 0; i--)
        {
            if (isspace((unsigned char)tmp[i]))
            {
                tmp[i] = 0;
                continue;
            }
            break;
        }
        if (strlen(tmp))
        {
            if (sscanf(tmp, "%lf%n", db, &len) == 1 && strlen(tmp) == len)
            {
                result |= FLOAT_CONVERTED;
            }
            if (sscanf(tmp, "%d%n", in, &len) == 1 && strlen(tmp) == len)
            {
                result |= INT_CONVERTED;
            }
        }
        free(tmp);
    }
    return result;
}
0___________