
My project takes a filename and opens it. I need to read each line of a .txt file and find the first nonzero digit on that line, skipping whitespace, letters, zeros, and special characters. My text file could look like this:

1435                 //1, nextline
0                   //skip, next line
                    //skip, nextline
(*Hi 245*) 2       //skip the comment, count the 2 after it, next line
345 556           //3 and count, next line 
4                //4, nextline

My desired output would be all the way up to nine but I condensed it:

Digit Count Frequency
1:      1     .25
2:      1     .25
3:      1     .25
4:      1     .25

My code is as follows:

    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    int main() {

        int digit = 1;
        int array[9]; //one counter per digit 1-9
        string filename;
        //cout for getting user path
        //the compiler parses string literals differently so use a double backslash or a forward slash
        cout << "Enter the path of the data file, be sure to include extension." << endl;
        cout << "You can use either of the following:" << endl;
        cout << "A forwardslash or double backslash to separate each directory." << endl;
        getline(cin,filename);

        ifstream input_file(filename.c_str());

        if (input_file.is_open()) { //if file is open
            cout << "open" << endl; //just a coding check to make sure it works ignore

            string fileContents; //string to store contents
            string temp;
            while (getline(input_file, temp)) { //loop ends when no more lines can be read
                fileContents.append(temp); //appends the line to the string (the newline is dropped)
            }
            cout << fileContents << endl; //prints string for test
        }
        else {
            cout << "Error opening file check path or file extension" << endl;
        }

        return 0;
    }

In this file format, (* signals the beginning of a comment, so everything from there to a matching *) should be ignored (even if it contains a digit). For example, given input of (*Hi 245*) 6, the 6 should be counted, not the 2.

How do I iterate over the file, finding and counting only the first digit on each line, while ignoring comments?

  • Why isn't there `0` in the output? And do you mean the first digit, or all digits of the first integer? Moreover, you'll need two separate loops (input and output) for this. At least the printing you should have figured out. – LogicStuff Sep 02 '16 at 16:25
  • I don't understand the example; 3 appears more than once in the text. – 463035818_is_not_a_number Sep 02 '16 at 16:26
  • I edited it to exclude zeros; that was my fault. I just need to find and count the first digit of each line, as an individual digit rather than as a whole number. – I'm here for Winter Hats Sep 02 '16 at 16:27
  • Make a handwritten loop with `std::getline` that uses `std::isdigit`. `vector file_nums {infile_begin, eof};` does not make sense. `eof` is a completely different type of iterator, and even if it were `std::istreambuf_iterator`, you're not parsing anything. – LogicStuff Sep 02 '16 at 16:29
  • OK, I think I understand what you want to do. What is the question? – 463035818_is_not_a_number Sep 02 '16 at 16:29
  • You really should forget about where the line comes from for now (a file, the keyboard, it doesn't matter), and write a function that, given a string, returns the number that you're looking for. Then you test that function to see if it actually does the job. Once you have that function fully tested, *then* you use it in your larger program. Trying to cram 3 or 4 different tasks in one shot is not the way to go about developing a program incrementally. – PaulMcKenzie Sep 02 '16 at 17:51
  • Now would be a great time to learn regular expressions. (As a bonus, there are some great T-shirts.) – Alan Stokes Sep 02 '16 at 19:49
  • Right, the requirement to skip "comments" is a killer. Since we don't know what counts as a comment, it's hard to give an exact answer, but this feels like a regex requirement. – pm100 Sep 02 '16 at 20:57

1 Answer

One way to approach your problem is the following:

  1. Create a std::map<int, int> where the key is the digit and the value is the count. This allows you to compute statistics on your digits such as the count and the frequency after you have parsed the file. Something similar can be found in this SO answer.
  2. Read each line of your file as a std::string using std::getline as shown in this SO answer.
  3. For each line, strip the comments using a function such as this:

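    #include <cstddef>
    #include <string>

    // Removes every span that starts with `beg` and ends with `fin` from `inp`.
    // If `fin` is empty, everything from `beg` to the end of the string is removed.
    std::string& strip_comments(std::string& inp,
                                std::string const& beg,
                                std::string const& fin = "") {
      std::size_t bpos;
      while ((bpos = inp.find(beg)) != std::string::npos) {
        if (!fin.empty()) {
          std::size_t fpos = inp.find(fin, bpos + beg.length());
          if (fpos != std::string::npos) {
            // erase modifies inp in place, so no assignment is needed
            inp.erase(bpos, fpos - bpos + fin.length());
          } else {
            // fin was not found: leave the unterminated comment in place and stop
            break;
          }
        } else {
          inp.erase(bpos);
        }
      }
      return inp;
    }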
    

    which can be used like this:

    std::string line;
    std::getline(input_file, line);
    line = strip_comments(line, "(*", "*)");
    
  4. After stripping the comments, use the string member function find_first_of to find the first digit:

    std::size_t dpos = line.find_first_of("123456789");
    

    What is returned here is the index in the string of the first nonzero digit. You should check that the returned position is not std::string::npos, as that would indicate that no digit was found. If a digit is found, the corresponding character can be extracted using const char c = line[dpos]; and converted to an integer by subtracting '0', as in int digit = c - '0';. (std::atoi is not suitable here, because it expects a null-terminated C string rather than a single char.)

  5. Increment the count for that digit in the std::map as shown in that first linked SO answer. Then loop back to read the next line.

  6. After reading all lines from the file, the std::map will contain the counts for all first digits found in each line stripped of comments. You can then iterate over this map to retrieve all the counts, accumulate the total count over all digits found, and compute the frequency for each digit. Note that digits not found will not be in the map.

I hope this helps you get started. I leave the writing of the code to you. Good luck!

aichao