I have a .txt file that contains data in this format:

xxxx: 0.9467,  
yyyy: 0.9489,  
zzzz: 0.78973,  
hhhh: 0.8874,  
yyyy: 0.64351,  
xxxx: 0.8743,

and so on...

Let's say that my C program receives, as input, the string yyyy. The program should, simply, return all the instances of yyyy in the .txt file and the average of all their numerical values.

#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *filePTR;
    char fileRow[100000];

    if (fopen_s(&filePTR, "file.txt", "r") == 0) {
        while (fgets(fileRow, sizeof fileRow, filePTR) != NULL) {
            if (strstr(fileRow, "yyyy") != NULL) { // Input parameter
                printf("%s", fileRow);
            }
        }
        fclose(filePTR);
        printf("\nEnd of the file.\n");
    } else {
        printf("ERROR! Impossible to read the file.");
    }
    return 0;
}

This is my code right now. I don't know how to:

  1. Isolate the numerical values
  2. actually convert them to double type
  3. average them

I read something about the strtok function (just to start), but I would need some help...

David C. Rankin
FydRose

1 Answer

You have started off on the right track and should be commended for using fgets() to read a complete line from the file on each iteration, but your choice of strstr does not ensure the prefix you are looking for is found at the beginning of the line.
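
To make the difference concrete, here is a minimal sketch of the two checks side by side. The helper names starts_with and contains are hypothetical, introduced only for illustration, and are not part of the code in this answer:

```c
#include <string.h>

/* Hypothetical helper: returns 1 only if line begins with prefix. */
static int starts_with (const char *line, const char *prefix)
{
    return strncmp (line, prefix, strlen (prefix)) == 0;
}

/* Hypothetical helper: returns 1 if prefix occurs anywhere in line. */
static int contains (const char *line, const char *prefix)
{
    return strstr (line, prefix) != NULL;
}
```

For a made-up line such as "abyyyy: 0.5," the strstr-based contains() reports a match for "yyyy", while the strncmp-based starts_with() does not, because the comparison is anchored at the first character of the line.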

Further, you want to avoid hardcoding your search string as well as the file to open. main() takes arguments through argc and argv that let you pass information into your program on startup. See: C11 Standard - §5.1.2.2.1 Program startup(p1). Using the parameters lets you pass the filename to open and the prefix to search for as arguments to your program, which also eliminates the need to recompile your code simply to read from another file or search for another string.
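
If argc and argv are unfamiliar, the following sketch may help. It assumes a hypothetical helper, show_args, that you would call from main() as show_args (argc, argv); the name is made up for this illustration:

```c
#include <stdio.h>

/* Hypothetical helper: print every argument with its index and return
 * the number of user-supplied arguments (everything after argv[0]). */
static int show_args (int argc, char **argv)
{
    for (int i = 0; i < argc; i++)
        printf ("argv[%d] = %s\n", i, argv[i]);

    return argc > 0 ? argc - 1 : 0;
}
```

Running a program as `prog file.txt yyyy` gives argc == 3, so the helper prints three lines and returns 2. argv[0] normally holds the program name, argv[1] would be "file.txt" and argv[2] would be "yyyy".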

For example, instead of hardcoding values, you can use the parameters to main() to open any file and search for any prefix simply using something similar to:

#include <stdio.h>
#include <string.h>

#define MAXC 1024   /* if you need a constant, #define one (or more) */

int main (int argc, char **argv) {

    char buf[MAXC] = "", *str = NULL;   /* buffer for line and ptr to search str */
    size_t n = 0, len = 0;              /* counter and search string length */
    double sum = 0;                     /* sum of matching lines */
    FILE *fp = NULL;                    /* file pointer */

    if (argc < 3) { /* validate 2 arguments given - filename, search_string */ 
        fprintf (stderr, "error: insufficient number of arguments\n"
                "usage: %s filename search_string\n", argv[0]);
        return 1;
    }

    if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
        perror ("fopen-filename");
        return 1;
    }
    str = argv[2];                      /* set pointer to search string */
    len = strlen (str);                 /* get length of search string */
    ...

At this point in your program, you have opened the file passed as the first argument and have validated that it is open for reading through the file-stream pointer fp. You have passed in the prefix to search for as the second argument, assigned it to the pointer str, and stored the length of the prefix in len.

Next you want to read each line from your file into buf, but instead of attempting to match the prefix with strstr(), you can use strncmp() with len to compare the beginning of the line read from your file. If the prefix is found, you can then use sscanf to parse the double value from the line, add it to sum, and increment the count of values in n, e.g.

    while (fgets (buf, MAXC, fp)) {             /* read each line into buf */
        if (strncmp (buf, str, len) == 0) {     /* if prefix matches */
            double tmp;                         /* temporary double for parse */
            /* parse with scanf, discarding prefix with assignment suppression */
            if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
                sum += tmp;             /* add value to sum */
                n++;                    /* increment count of values */
            }
        }
    }

(note: the assignment suppression operator for sscanf(), '*', used above allows you to read and discard the prefix without having to store it in a second string; the literal ':' in the format string then matches the colon itself)
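
To see the suppression in isolation, here is a small sketch. The helper name parse_value is hypothetical, and the field width used in the answer's format string is omitted for brevity:

```c
#include <stdio.h>

/* Hypothetical helper: parse the number following the first ':' in line.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
static int parse_value (const char *line, double *out)
{
    /* %*[^:] reads and discards one or more characters up to, but not
     * including, ':'. The literal ':' must then match, the space skips
     * any whitespace, and %lf converts the number into *out. */
    return sscanf (line, "%*[^:]: %lf", out) == 1;
}
```

With the line "yyyy: 0.9489," the helper stores 0.9489 and returns 1. A line without a colon, or one that starts with a colon (so there is no prefix for %*[^:] to consume), makes it return 0 and leaves *out untouched.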

All that remains is checking whether any values were found by testing your count n and, if so, outputting the average for the prefix. If n == 0, the prefix was not found in the file, e.g.:

    fclose (fp);    /* close file, done reading */

    if (n)  /* if values found, output average */
        printf ("prefix '%s' avg: %.4f\n", str, sum / n);
    else    /* output not found */
        printf ("prefix '%s' -- not found in file.\n", str);
}

That is basically all you need. With it, you can read from any file you like and search for any prefix simply by passing the filename and prefix as the first two arguments to your program. The complete example would be:

#include <stdio.h>
#include <string.h>

#define MAXC 1024   /* if you need a constant, #define one (or more) */

int main (int argc, char **argv) {

    char buf[MAXC] = "", *str = NULL;   /* buffer for line and ptr to search str */
    size_t n = 0, len = 0;              /* counter and search string length */
    double sum = 0;                     /* sum of matching lines */
    FILE *fp = NULL;                    /* file pointer */

    if (argc < 3) { /* validate 2 arguments given - filename, search_string */ 
        fprintf (stderr, "error: insufficient number of arguments\n"
                "usage: %s filename search_string\n", argv[0]);
        return 1;
    }

    if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
        perror ("fopen-filename");
        return 1;
    }
    str = argv[2];                      /* set pointer to search string */
    len = strlen (str);                 /* get length of search string */

    while (fgets (buf, MAXC, fp)) {             /* read each line into buf */
        if (strncmp (buf, str, len) == 0) {     /* if prefix matches */
            double tmp;                         /* temporary double for parse */
            /* parse with scanf, discarding prefix with assignment suppression */
            if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
                sum += tmp;             /* add value to sum */
                n++;                    /* increment count of values */
            }
        }
    }

    fclose (fp);    /* close file, done reading */

    if (n)  /* if values found, output average */
        printf ("prefix '%s' avg: %.4f\n", str, sum / n);
    else    /* output not found */
        printf ("prefix '%s' -- not found in file.\n", str);
}

Example Use/Output

Using your data file stored in dat/prefixdouble.txt, you can search for each prefix in the file and obtain the average, e.g.

$ ./bin/prefixaverage dat/prefixdouble.txt hhhh
prefix 'hhhh' avg: 0.8874

$ ./bin/prefixaverage dat/prefixdouble.txt xxxx
prefix 'xxxx' avg: 0.9105

$ ./bin/prefixaverage dat/prefixdouble.txt yyyy
prefix 'yyyy' avg: 0.7962

$ ./bin/prefixaverage dat/prefixdouble.txt zzzz
prefix 'zzzz' avg: 0.7897

$ ./bin/prefixaverage dat/prefixdouble.txt foo
prefix 'foo' -- not found in file.

Much easier than having to recompile each time you want to search for another prefix. Look things over and let me know if you have further questions.

David C. Rankin
  • Thank you so much for the very detailed explanation. Unfortunately, despite the quality of your answer, I'm not a C programmer... Thus I am having some difficulty understanding how to pass the two parameters to the `main()`... – FydRose Mar 04 '20 at 13:03
  • The two parameters are `int argc` (argument count), which tells you how many command-line arguments there are (the first argument is always the program name being run, so the first user argument is `argv[1]`), and `char *argv[]` (argument vector), an array of pointers to nul-terminated strings, with the pointer after the last argument set to `NULL` as a *sentinel NULL*. In your case there are 2 user arguments (3 total with the program name). In the 1st example `argv[1]` is `"dat/prefixdouble.txt"` (filename) and `argv[2]` is `"hhhh"` (prefix to average values for). – David C. Rankin Mar 04 '20 at 16:32
  • So, in my specific case, how should I change the code? Because, right now, it fails to compile – FydRose Mar 05 '20 at 08:35
  • Are you on an embedded (freestanding) system like Arduino or a TI-MSP432? – David C. Rankin Mar 05 '20 at 08:52
  • I'm simply writing this C program on Visual Studio 2019. As a Console App – FydRose Mar 05 '20 at 08:57
  • This should compile fine. You will receive the `4996` warning unless you include `#define _CRT_SECURE_NO_WARNINGS`, but that is just due to Microsoft implementing the [Annex K](http://port70.net/~nsz/c/c11/n1570.html#K) extensions for `scanf_s`, etc. Otherwise this is plain-Jane vanilla C. What error do you get? (And are you compiling as C or C++? You need the `/Tc` option (or `/TC` for all sources) to compile as C.) – David C. Rankin Mar 05 '20 at 09:09
  • Do this, open the **VS Developer's Command Prompt** and change to the directory with your source file (say `prefixavg.c`). Then compile with `cl /nologo /W3 /wd4996 /Ox /Feprefixavg.exe /Tc prefixavg.c` (you don't need to set up a project at all to compile from the command line -- much faster and easier that way) Type `cl /?` to see all compile options. – David C. Rankin Mar 05 '20 at 09:15
  • `error C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. error C4996: 'sscanf': This function or variable may be unsafe. Consider using sscanf_s instead.` – FydRose Mar 05 '20 at 09:16
  • Exactly, include `#define _CRT_SECURE_NO_WARNINGS` at the top of the file, or disable the warning (which you have set to interpret as an error) with `/wd4996`. (this is a Microsoft VS thing, not a C thing...) Or you can use the `..._s` functions instead, but then your code will be non-portable to most compilers. See [Compiler Warning (level 3) C4996 | Microsoft Docs](https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-3-c4996) – David C. Rankin Mar 05 '20 at 09:17
  • Ok. It worked: thanks. But, of course, the result is the fprintf that says "error: insufficient number of arguments". I still don't get how to pass arguments, in my case – FydRose Mar 05 '20 at 09:22
  • Well, open your options and set your command line options, or really, just open a **Command Prompt** (or **PowerShell**) or **VS Developer's Command Prompt**, change to your project directory (usually under `DEBUG`), and run your program at the command prompt with `yourprogname.exe filetoread prefix`. I hate IDEs for new programmers dealing with short 1-source programs, because you end up dealing with these "Where do I find the right setting dialog?" problems instead of actual programming problems. It's easier to use the *Command Line* for *Command Line Programs* `:)` – David C. Rankin Mar 05 '20 at 09:26
  • See [Debugging with command-line parameters in Visual Studio](https://stackoverflow.com/questions/298708/debugging-with-command-line-parameters-in-visual-studio) for where to set them in the IDE. It's worth finding out where to set them if you will be using the VS IDE, because your alternative is to *hardcode* the filename and prefix -- which is just plain *wrong*. – David C. Rankin Mar 05 '20 at 09:29
  • That's brilliant! Thank you so much! Would you be so kind to answer one last question? The prefix is always 4 characters long. I want to be able to pass 2 characters (as parameters) and have a match if the prefix **includes** these characters. How could I do this? – FydRose Mar 06 '20 at 07:50
  • Sure, instead of `strncmp` you will want to use `strstr` (to match a substring). In this case, since the prefix is always 4 characters, the portable way would simply be to copy the first 4 chars in `buf` to a temp string (e.g. `char tmpstr[8]; memcpy (tmpstr, buf, 4); tmpstr[4] = 0;`), then use `if (strstr (tmpstr, prefix2char)) { double tmp; ....` (you could use `tmpstr[5]` if you wanted to squeeze every byte, but the old adage *Don't Skimp on Buffer Size* saves a lot of *off-by-one* grief) – David C. Rankin Mar 06 '20 at 08:05
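
The approach from the last comment can be sketched as a small helper. This is only an illustration under stated assumptions: the names prefix_contains and PREFIXLEN are hypothetical, the prefix is assumed to be exactly 4 characters as stated in the comments, and strncpy is swapped in for the memcpy shown in the comment so that lines shorter than 4 characters are not read past their end:

```c
#include <string.h>

#define PREFIXLEN 4     /* assumed fixed prefix length, per the comments */

/* Hypothetical helper: returns 1 if key occurs within the first
 * PREFIXLEN characters of line, 0 otherwise. */
static int prefix_contains (const char *line, const char *key)
{
    char tmpstr[PREFIXLEN + 1] = "";

    if (!*key)                          /* an empty key matches nothing */
        return 0;

    strncpy (tmpstr, line, PREFIXLEN);  /* copy at most PREFIXLEN chars */
    tmpstr[PREFIXLEN] = 0;              /* ensure nul-termination */

    return strstr (tmpstr, key) != NULL;
}
```

In the read loop of the answer, `if (prefix_contains (buf, str))` would then take the place of the strncmp test. A key that appears only later in the line, for example inside the numeric field, is not matched because only the first 4 characters are searched.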