0

I am using two dynamic arrays to read from a file. They are to keep track of each word and the amount of times it appears. If it has already appeared, I must keep track in one array and not add it into the other array since it already exists. However, I am getting blank spaces in my array when I meet a duplicate. I think its because my pointer continues to advance, but really it shouldn't. I do not know how to combat this. The only way I have was to use a continue; when I print out the results if the array content = ""; if (*(words + i) == "") continue;. This basically ignores those blanks in the array. But I think that is messy. I just want to figure out how to move the pointer back in this method. words and frequency are my dynamic arrays.

I would like guidance in what my problem is, rather than solutions.

I have now changed my outer loop to be a while loop, and only increment when I have found the word. Thank you WhozCraig and poljpocket.

Now this occurs. enter image description here

  • Just so you know, we have no idea whatsoever if the `words` and `frequency` arrays are properly allocated at all. And you need to look closely at what entry's word and frequency is being updated with your usage of `i`. Look *very* closely at it. – WhozCraig Jan 23 '14 at 06:40
  • Edited in the creations of the arrays. I will look into what you mean right now. –  Jan 23 '14 at 06:43
  • @WhozCraig I try to set i-- when there is a duplicate found, but when I run, it seems to go nowhere. Some infinite loop? Not sure why, either. –  Jan 23 '14 at 06:47
  • you're incorrectly using it for insertions *and* your loop control variable simultaneously. Further, is this supposed to read the entire file (as opposed just come passed-in number `count` words? – WhozCraig Jan 23 '14 at 07:18
  • @WhozCraig I will look again and see where I am goofing up. This is suppose to read the whole file. What I did is have a function that counted all the words earlier that returned that count. Thus, giving me a reasonable initial size for the dynamic array. Then I reopen the file and do the addWords. –  Jan 23 '14 at 07:24
  • @WhozCraig Updated. Can you speculate to as why the duplicates have made space in the array at the end? –  Jan 23 '14 at 08:13
  • Because there are fewer *unique* words in your file than there are words. How many entries would you expect in your array if you had a file with 1000 duplicates of the word "word". Though you sized your array for space for 1000 unique words, in fact there is only one, "word", 1000 times. – WhozCraig Jan 23 '14 at 08:41
  • Hmm, so a dynamic array will not account for that? Is this why we dynamically allocate memory? –  Jan 23 '14 at 09:08
  • So, when I print, I should only print if an input is found. –  Jan 23 '14 at 09:09
  • When done, you can return `i` (in the code I posted) from your function to let the caller know how many unique entries were found. Loop 0...n on that to report only the uniques and their frequencies. – WhozCraig Jan 23 '14 at 09:10
  • Ah! That is smart. Sorry, it has been a long day. Thank you, WhozCraig! I will report back after I compile to see if all is well :-) –  Jan 23 '14 at 09:12
  • @WhozCraig I am getting an error once the file is read. it says: Segmentation fault and stops the progam –  Jan 23 '14 at 09:21

5 Answers5

1

Instead of incrementing your loop variable [i] every loop, you need to only increment it when a NEW word is found [i.e. not one already in the words array].

Also, you're wasting time in your inner loop by looping through your entire words array, since words will only exist up to index i.

 int idx = 0;
 while (file >> hold && idx < count) {
    if (!valid_word(hold)) {
        continue;
    }

    // You don't need to check past idx because you
    // only have <idx> words so far.
    for (int i = 0; i < idx; i++) {
        if (toLower(words[i]) == toLower(hold)) {
            frequency[i]++;
            isFound = true;
            break;
        }
    }

    if (!isFound) {
        words[idx] = hold;
        frequency[idx] = 1;
        idx++;
    }
    isFound = false;
 }
cioffstix
  • 81
  • 2
0

Why not use a std::map?

void collect( std::string name, std::map<std::string,int> & freq ){
  std::ifstream file;
  file.open(name.c_str(), std::ifstream::in );
  std::string word;
  while( true ){
    file >> word; // add toLower
    if( file.eof() ) break;
    freq[word]++;
  }
  file.close();
}

The problem with your solution is the use of count in the inner loop where you look for duplicates. You'll need another variable, say nocc, initially 0, used as limit in the inner loop and incremented whenever you add another word that hasn't been seen yet.

laune
  • 30,276
  • 3
  • 26
  • 40
  • I would if I could, but this program is to work with pointers (dynamic arrays). –  Jan 23 '14 at 07:00
  • `while(!file.eof())` is wrong. You'll insert the last word twice. [See this for why](http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong). – WhozCraig Jan 23 '14 at 07:39
  • @WhozCraig Thanks - I should have checked the doc for tokenizing read. – laune Jan 23 '14 at 10:32
0

check out SEEK_CUR(). If you want to set the cursor back

Ansh David
  • 580
  • 1
  • 7
  • 24
0

The problem is a logical one, consider several situations:

  1. Your algorithm does not find the current word. It is inserted at position i of your arrays.
  2. Your algorithm does find the word. The frequency of the word is incremented along with i, which leaves you with blank entries in your arrays whenever there's a word which is already present.

To conclude, 1 works as expected but 2 doesn't.

My advice is that you don't rely on for loops to traverse the string but use a "get-next-until-end" approach which uses a while loop. With this, you can track your next insertion point and thus get rid of the blank entries.

int currentCount = 0;
while (file)
{
     // your inner for loop
     if (!found)
     {
         *(words + currentCount) = hold;
         *(frequency + currentCount) = 1;
         currentCount++;
     }
}
poljpocket
  • 258
  • 1
  • 10
0

First, to address your code, this is what it should probably look like. Note how we only increment i as we add words, and we only ever scan the words we've already added for duplicates. Note also how the first pass will skip the j-loop entirely and simply insert the first word with a frequency of 1.

void addWords(const std::string& fname, int count, string *words, int *frequency)
{
    std::ifstream file(fname);
    std::string hold;

    int i = 0;
    while (i < count && (file >> hold))
    {
        int j = 0;
        for (; j<i; ++j)
        {
            if (toLower(words[j]) == toLower(hold))
            {
                // found a duplicate at j
                ++frequency[j];
                break;
            }
        }

        if (j == i)
        {
            // didn't find a duplicate
            words[i] = hold;
            frequency[i] = 1;
            ++i;
        }
    }
}

Second, to really address your code, this is what it should actually look like:

#include <iostream>
#include <fstream>
#include <map>
#include <string>

//
// Your implementation of toLower() goes here.
//


typedef std::map<std::string, unsigned int> WordMap;

WordMap addWords(const std::string& fname)
{
    WordMap words;

    std::ifstream inf(fname);
    std::string word;

    while (inf >> word)
        ++words[toLower(word)];

    return words;
}

If it isn't obvious by now how a std::map<> makes this task easier, it never will be.

WhozCraig
  • 59,130
  • 9
  • 69
  • 128
  • I understand Maps from Java. However, this is to practice pointers since I'm new to c++. I completely understand what you mean by how much better a map is for this. –  Jan 23 '14 at 07:39
  • 1
    @Brandon granted. now look at the code before it, particularly the increment of `i`, which only happens once we know there is no prior entry matching. – WhozCraig Jan 23 '14 at 07:40
  • ah, so it seems the outer for loop was a poor choice. I should only increment when I have NOT found the word and this will be done in a while loop. hmm ok, i took a stab at it, but it seems create empty spaces again, but shove them to the end of the array. –  Jan 23 '14 at 08:00
  • 1
    For the blanks at the end of the array: look at my last comment in my answer. – poljpocket Jan 23 '14 at 18:12
  • Also, it now states segmentation fault. Do you have any idea to why it is doing that? –  Jan 23 '14 at 19:22
  • @Brandon that means you're reading-from or writing-to memory you don't own, either through indexing an array beyond its allocated limits, or using an outright invalid, indeterminate pointer. Most debuggers will break immediately upon this happening, and tell you exactly where it is happening, along with a call-stack to show you how you got to that point. Run your program under a debugger (gdb, Visual Studio or WinDbg if on Windows) and let it crash. You'll find it. – WhozCraig Jan 23 '14 at 19:35
  • @WhozCraig I used Visual Studio, and it was fine for some reason. But when I run it in the linux terminal and use -Wall, it compiles fine but the segmentation fault appears after I run. Do you recommend gbd? –  Jan 23 '14 at 19:37
  • @WhozCraig I will refer to this: http://www.cplusplus.com/articles/iwTbqMoL/ . Thank you, again!! –  Jan 23 '14 at 19:39