
I am trying to read a poem from a text file that contains commas, spaces, periods, and newline characters. I am using getline to read each word separately, and I do not want to read in any of the commas, spaces, periods, or newlines. As I read each word, I capitalize every letter and then call my insert function to insert the word into a binary search tree as a separate node. I do not know the best way to separate the words. I have been able to split on spaces, but the commas, periods, and newline characters are still being read in.

Here is my text file:

Roses are red,
Violets are blue,
Data Structures is the best,
You and I both know it is true.

The code I am using is this:

string inputFile;
cout << "What is the name of the text file?";
cin >> inputFile;

ifstream fin;
fin.open(inputFile);

//Input once
string input;
getline(fin, input, ' ');
for (int i = 0; i < input.length(); i++)
{
    input[i] = toupper(input[i]);
}
//check for duplicates
if (tree.Find(input, tree.Current, tree.Parent) == true)
{
    tree.Insert(input);
    countNodes++;
    countHeight = tree.Height(tree.Root);
}

Basically I am using the getline(fin,input, ' ') to read in my input.

Matthew S.
  • You could use `getline(fin, input, '\n');` to grab each line. And then **parse** the line for words. To parse a word, you could use the `find_first_of` member function of `std::string`. – Anon Mail Dec 06 '15 at 21:04
  • Is there any way to solve this without using outside classes? I am a student and we are not supposed to do it this way. – Matthew S. Dec 07 '15 at 05:10
  • You're already using the `getline` function and the `std::string` class. – Anon Mail Dec 07 '15 at 05:53

4 Answers


I was able to figure out a solution. I read an entire line of text into the variable line, then checked each character and copied only the letters into word. Then I called my insert function to insert the word into my tree as a node.

// needs <cctype> (isalpha, toupper) and <cstring> (strlen)
const int MAXWORDSIZE = 50;
const int MAXLINESIZE = 1000;
char word[MAXWORDSIZE], line[MAXLINESIZE];
int wordIdx, lineLength;

//get a line; the loop ends when no more lines can be read
while (fin.getline(line, MAXLINESIZE))
{
    lineLength = strlen(line);
    for (int lineIdx = 0; lineIdx < lineLength;)
    {
        //skip over non-alphas, and check for end of line null terminator
        while (line[lineIdx] != '\0' && !isalpha((unsigned char)line[lineIdx]))
            ++lineIdx;

        //make sure not at the end of the line
        if (line[lineIdx] != '\0')
        {
            //copy alphas to word c-string, leaving room for the terminator
            wordIdx = 0;
            while (isalpha((unsigned char)line[lineIdx]))
            {
                if (wordIdx < MAXWORDSIZE - 1)
                {
                    word[wordIdx] = toupper((unsigned char)line[lineIdx]);
                    wordIdx++;
                }
                lineIdx++;
            }
            //make it a c-string with the null terminator
            word[wordIdx] = '\0';

            //THIS IS WHERE YOU WOULD INSERT INTO THE BST OR INCREMENT FREQUENCY COUNTER IN THE NODE
            if (tree.Find(word) == false)
            {
                tree.Insert(word);
                totalNodes++;
            }
            else
            {
                tree.Counter();
            }
        }
    }
}

Matthew S.

You can make a custom getline function for multiple delimiters:

std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
    str.clear();

    // the 3rd parameter type and the condition part on the right side of &&
    // should be all that differs from std::getline
    for(char c; is.get(c) && delims.find(c) == std::string::npos; )
        str.push_back(c);

    return is;
}

And use it:

getline(fin, input, " \n,.");
LogicStuff

This is a good time for a technique I've posted a few times before: define a ctype facet that treats everything but letters as white space (searching for imbue will show several examples).

From there, it's a matter of std::transform with istream_iterators on the input side, a std::set for the output, and a lambda to capitalize the first letter.

Jerry Coffin

You can use std::regex to select your tokens.

Depending on the size of your file, you can read it either line by line or entirely into a std::string.

To read the whole file you can use:

std::ifstream t("file.txt");
std::string text((std::istreambuf_iterator<char>(t)),
                 std::istreambuf_iterator<char>());

and this will match each run of characters that contains no comma or whitespace (periods stay attached to the word):

// needs <fstream>, <iterator>, <regex>, <string>, <vector>
std::regex word_regex("[^,\\s]+");
auto what =
    std::sregex_iterator(text.begin(), text.end(), word_regex);
auto wend = std::sregex_iterator();

std::vector<std::string> v;
for (; what != wend; ++what) {
    std::smatch match = *what;
    v.push_back(match.str());
}

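To separate tokens delimited by a comma, a space, or a newline, I think a regex along the lines of (,| \n| )[[:alpha:]].+ could work. I have not tested it, though, so check it before relying on it. Note that the trailing .+ is greedy and will run to the end of the line, so matching the words themselves with [[:alpha:]]+ may be simpler.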
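
As an alternative to matching the separators, here is a minimal sketch that matches the words directly with `[[:alpha:]]+`, so commas, periods, spaces, and newlines never end up in a token. The function name `regex_upper_words` is hypothetical, and the text is assumed to already be in a `std::string`.

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return every run of letters in `text`, upper-cased.
std::vector<std::string> regex_upper_words(const std::string& text)
{
    static const std::regex word_regex("[[:alpha:]]+");

    std::vector<std::string> words;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), word_regex);
         it != end; ++it) {
        std::string word = it->str();
        for (char& ch : word)
            ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
        words.push_back(word);              // tree.Insert(word) would go here
    }
    return words;
}
```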

g24l