I have the string "pid:8792 byr:2000 cols:hkjdp\n" and I only want to extract the number after byr:. I thought this could be done by scanning a formatted string with sscanf(str, "byr:%d", &number);. Unfortunately that doesn't work, because there are other characters before and after the number. I then saw that you can use something resembling a regex in sscanf, as in this question: How to use regex in sscanf

So I tried something like this: sscanf(passport, "%*[^byr:]:%[^\h]%*[^\n]", byr); where byr is now defined as char *byr;. But you can't use regular-expression escapes such as \h for whitespace. Long story short: is there any way to parse many strings using sscanf and always extract the number after byr:, and where can I find a cheat sheet for all the characters that can be used in a format string? (Of course I know about the obvious %f, %d, %s, %c and so on, but these don't really do much in this case.)

Wiktor Stribiżew
Phil
  • Maybe you need a proper regular expression library? – tadman Jan 26 '21 at 20:26
  • Or just use other means? Such as `strstr` followed with `strtoul` – Eugene Sh. Jan 26 '21 at 20:27
  • You could use `sscanf` and a format like `"pid:%*d byr:%d cols:%*s"`. A `*` in a scanf format string like that means "scan, but don't assign to anything". – Steve Summit Jan 26 '21 at 20:29
  • But no, `scanf` does not and cannot do true regular expressions, as [that other question](https://stackoverflow.com/questions/24483075/input-using-sscanf-with-regular-expression) explains. – Steve Summit Jan 26 '21 at 20:46
  • `man scanf` contains a complete description of `scanf` conversion specifications. You won't find regular expressions there because scanf does not do regular expressions. In C, escape sequences like `\n` are turned into the corresponding character by the compiler, long before a function like `scanf` is actually executed. In a scanf format string, whitespace characters are treated identically, as described in `man scanf`; they match any sequence of whitespace characters in the input. (In most regex libraries, whitespace is `\s`, not `\h`. But neither of those are valid C escape sequences.) – rici Jan 26 '21 at 20:58
  • @tadman But how do you save the text matched by a regex in a variable? I've only seen regexes in C that determine whether there is a match and return true or false. What libraries are there? – Phil Jan 26 '21 at 21:58
  • @SteveSummit Can I also just do something like `"%*s byr:%d %*s"`? That would basically scan everything but only save byr in a variable. The question is, does `%*s` also consume `\n`? – Phil Jan 26 '21 at 22:00
  • @Phil Not in general. `%*s` will scan up to the first whitespace, which would work in this case, but not others. – Steve Summit Jan 26 '21 at 22:02
  • @Phil My recommendation is the same as Eugene's. Please read a line using `fgets`, and then search for "byr:" with `strstr`. I don't believe it's worth trying to do it with `scanf`. For just about any input-parsing problem you have, doing it using `scanf` is either (a) impossible or (b) five times harder, and less reliable, than doing it some other way. See [What can I use for input conversion instead of scanf?](https://stackoverflow.com/questions/58403537/) – Steve Summit Jan 26 '21 at 22:02
  • @Phil Or if there's a reason you simply have to use scanf, meaning that my recommendations to the contrary are not helpful, I apologize. – Steve Summit Jan 26 '21 at 22:07
  • @SteveSummit Thank you for the answers. The thing is that I have to do this on many different strings, which is why I think strstr won't help here. The data being searched is always different, but I always have to get the number after `byr:` – Phil Jan 26 '21 at 22:10
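
To make the two `sscanf` formats from the comments above concrete, here is a minimal sketch. The function names `scan_fixed_layout` and `scan_skip_one_token` are made up for illustration; the format strings are the ones quoted in the comments. Both succeed on the original string, and both fail on a line with an extra field before `byr:`, which seems to be the "not in general" case.

```c
#include <stdio.h>

/* Fixed-layout format from the comments: skip pid, store byr, skip cols.
   Returns sscanf's result, i.e. the number of values stored (1 on success). */
int scan_fixed_layout(const char *line, int *byr)
{
    return sscanf(line, "pid:%*d byr:%d cols:%*s", byr);
}

/* Shorter variant from the comments: skip exactly one whitespace-delimited
   token, then expect "byr:" immediately after it. */
int scan_skip_one_token(const char *line, int *byr)
{
    return sscanf(line, "%*s byr:%d %*s", byr);
}

/* Usage:
       int byr;
       if (scan_fixed_layout("pid:8792 byr:2000 cols:hkjdp\n", &byr) == 1)
           printf("byr = %d\n", byr);
   With "pid:8792 uid:412 byr:3000 cols:etfvq\n" both functions return 0,
   because the literal "byr:" in the format meets "uid:" in the input. */
```

Checking the return value matters here: when the literal text in the format does not match, `sscanf` stops and leaves `byr` untouched.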

1 Answer

scanf is not well-suited to this situation. It does not provide anything like the general-purpose regexp support you'd need.

If all you care about is the "byr:" part, a completely different approach is to use strstr to search for that specific string. Here is an example:

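#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *str = "pid:8792 byr:2000 cols:hkjdp\n";
    const char *tag = "byr:";
    const char *p = strstr(str, tag);    /* first occurrence of the tag, or NULL */
    if(p != NULL) {
        int n = atoi(p + strlen(tag));   /* skip past the tag, convert the digits */
        printf("%s %d\n", tag, n);
    }

    /* same tag, different line with an extra field in front */
    str = "pid:8792 uid:412 byr:3000 cols:etfvq\n";
    p = strstr(str, tag);
    if(p != NULL) {
        int n = atoi(p + strlen(tag));
        printf("%s %d\n", tag, n);
    }

    /* same line, different tag */
    tag = "uid:";
    p = strstr(str, tag);
    if(p != NULL) {
        int n = atoi(p + strlen(tag));
        printf("%s %d\n", tag, n);
    }

    return 0;
}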

Normally, atoi is discouraged, in part because it quietly ignores trailing garbage, but here that's just what you want.
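
If the input might be malformed, note that atoi gives no way to tell byr:0 apart from byr:abc, since both yield 0. A possible variant uses strtol instead, which can report overflow. This is only a sketch: the helper name parse_tagged_long is made up for illustration, and the rule that the value must start with a digit (no sign, no leading whitespace) is an assumption about the data, not something the question states.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find `tag` in `line` and convert the digits after it.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_tagged_long(const char *line, const char *tag, long *out)
{
    const char *p = strstr(line, tag);
    if (p == NULL)
        return 0;                       /* tag not present */

    const char *start = p + strlen(tag);
    if (!isdigit((unsigned char)*start))
        return 0;                       /* assumption: value must begin with a digit */

    char *end;
    errno = 0;
    long value = strtol(start, &end, 10);   /* stops at the first non-digit */
    if (errno == ERANGE)
        return 0;                       /* value does not fit in a long */

    *out = value;
    return 1;
}

/* Usage:
       long byr;
       if (parse_tagged_long("pid:8792 byr:2000 cols:hkjdp\n", "byr:", &byr))
           printf("byr: %ld\n", byr);
*/
```

Like atoi, strtol stops at the first character that cannot be part of the number, so the trailing " cols:hkjdp\n" is still ignored.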

One drawback of this technique is that it would wrongly find, say, an "abyr:" tag.
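
One way to work around that drawback is to accept a match only when the tag starts the line or follows whitespace. The following is a sketch, not a definitive fix: the helper name find_field is made up for illustration, and it assumes fields are separated by whitespace, as they are in the sample strings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the text right after `tag`,
   but only for an occurrence that starts the line or follows whitespace.
   Returns NULL if there is no such occurrence. */
const char *find_field(const char *line, const char *tag)
{
    const char *p = line;
    while ((p = strstr(p, tag)) != NULL) {
        if (p == line || isspace((unsigned char)p[-1]))
            return p + strlen(tag);     /* start of the field's value */
        p++;    /* match was inside a longer word such as "abyr:"; keep looking */
    }
    return NULL;
}

/* Usage:
       const char *v = find_field("abyr:1999 byr:2000\n", "byr:");
       if (v != NULL)
           printf("byr: %d\n", atoi(v));    // needs <stdio.h> and <stdlib.h>
*/
```

The returned pointer can be passed to atoi exactly as p + strlen(tag) is in the example above.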

Steve Summit
  • Seems to do the trick, but I have one question: why does `atoi(p + strlen(tag));` only include the number? Shouldn't the pointer now be at the beginning of the number and then read until `\n` is found? Or does `atoi()` only read until the next non-digit character? – Phil Jan 27 '21 at 17:37
  • @Phil: That's correct; that's what I was alluding to when I said it "quietly ignores trailing garbage, but here that's just what you want". – Steve Summit Jan 27 '21 at 18:06
  • Got it. Thanks for the help – Phil Jan 27 '21 at 18:09