-1

What I'm trying to do is read a line from a text file, break it into the words that compose it, and check each word against a list of "bad words" that I don't want to hash. Each "good word" that isn't in that list should be hashed, and the entire line should be stored at the resulting index. So, for example, "Ring of Fire" would be split into "Ring", "of", and "Fire". I would hash "Ring" and store "Ring of Fire" with it, I would see "of", notice that it's a bad word and skip it, and finally I would hash "Fire" and store "Ring of Fire" with it as well.

My code as is separates a line into words, compares it with bad words, and displays all good words. It then closes the file, reopens it, and displays all of the lines. What I am having trouble conceptualizing is how to combine the two to hash all the good words and the entire line at the same time so that I can store them easily. How should I go about doing this?

#include <cstring>
#include <cctype>
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    const char * bad_words[] = {"of", "the", "a", "for", "to", "in", "it", "on", "and"};
    ifstream file;
    file.open("songs.txt");
    //if(!file.is_open()) return;
    char word[50];

while(file >> word)
{
    // if word == bad word, dont hash
    // else hash and store it in my hash table
    bool badword = false;
    for(int i = 0; i < 9; ++i)
    {
        if(strcmp(word, bad_words[i]) == 0)
        {
            badword = true;
        }
    }

    if(badword) continue;
    else
    {
        // get all words in a line that are not in bad_words
        char * good_word = new char[strlen(word)+1];
        strcpy(good_word, word);
        cout << good_word << endl;  // testing to see if works      

        // hash each good_word, store good_line in both of them

        //int index = Hash(good_word);
        //Add(good_line) @ table[index];
    }
}

file.close();
file.open("songs.txt");
while(!file.eof())  // go through file, grab each whole line. store it under the hash of good_word (above)
{
    char line[50];
    file.getline(line, 50, '\n');
    char * good_line = new char[strlen(line)+1];
    strcpy(good_line, line);
    cout << good_line << endl;  // testing to see if works
}

return 0;
}
Musica
  • Life will be better if you switch from `char[]` to `std::string`. – Thomas Matthews Nov 19 '14 at 20:51
  • When you use the debugger, which line is causing the issue? – Thomas Matthews Nov 19 '14 at 20:52
  • If you work with files in C++, try the Qt library with QFile. It really makes your life easier. – stupidstudent Nov 19 '14 at 20:53
  • http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong – Neil Kirk Nov 19 '14 at 20:54
  • What are your plans when two different strings generate the same hash code? If you only store the hash code, this could be a problem. – Thomas Matthews Nov 19 '14 at 20:54
  • Create badwords as a set of string and use its count function to see if a string is in it. Ditch chars and use strings. Your final code should not contain any strcpy or new. Those are the bad words :) – Neil Kirk Nov 19 '14 at 20:55
  • The program prompt asks not to use strings. I want to hash each good word so that the user can enter a word and search for the entire line and then return that line. So if I hash "Ring" and "Fire" and the user searches for "Ring" or "Fire" it will return "Ring of Fire". There is no error in the code as is. I'm having trouble figuring out how to do what I just said and was looking for some guidance. – Musica Nov 19 '14 at 21:11

2 Answers

0

You seem to be looking for std::unordered_multimap.

I would probably also sort the set of "bad" words, and use std::binary_search to see whether it contained a particular word.

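// needs <algorithm>, <fstream>, <sstream>, <string>, <unordered_map>, <vector>
std::vector<std::string> bad { "a", "and", "for" /* ... keep sorted */};

std::unordered_multimap<std::string, std::string> index;

std::ifstream infile("songs.txt");
std::string line;
while (std::getline(infile, line)) {
    std::istringstream buf(line);
    std::string word;
    // index the whole line under every word that is not in the bad list
    while (buf >> word)
        if (!std::binary_search(bad.begin(), bad.end(), word))
            index.insert(std::make_pair(word, line));
}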
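To get the lines back out, something like the following might work. It is only a sketch: `Index`, `build_index`, and `lookup` are names made up for this example, and the indexing loop is the same as above, just wrapped in a function that reads from any `std::istream`. `equal_range` returns every entry stored under one key, which is what lets a single word map to several lines.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias for this sketch: word -> every line containing it.
using Index = std::unordered_multimap<std::string, std::string>;

// Builds the index from a stream. `bad` must be sorted,
// because std::binary_search requires a sorted range.
Index build_index(std::istream& in, const std::vector<std::string>& bad) {
    assert(std::is_sorted(bad.begin(), bad.end()));
    Index index;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream buf(line);
        std::string word;
        while (buf >> word)
            if (!std::binary_search(bad.begin(), bad.end(), word))
                index.insert(std::make_pair(word, line));
    }
    return index;
}

// Returns all lines stored under `word` (empty if the word was never indexed).
// The result is sorted only to make the order predictable.
std::vector<std::string> lookup(const Index& index, const std::string& word) {
    std::vector<std::string> lines;
    auto range = index.equal_range(word);
    for (auto it = range.first; it != range.second; ++it)
        lines.push_back(it->second);
    std::sort(lines.begin(), lines.end());
    return lines;
}
```

With a file containing "Ring of Fire", calling `lookup(index, "Ring")` or `lookup(index, "Fire")` should each give back that line, while `lookup(index, "of")` should come back empty as long as "of" is in the bad list.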
Jerry Coffin
0

If you really must implement your own hash table, you can find a description of the hash table data structure here.

In its simplest form, a hash table is an array of linked lists. The array is indexed with hashcode % arraySize, and the linked list takes care of hash collisions.
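Since the assignment rules out `std::string`, here is a rough sketch of that array-of-linked-lists layout using plain C strings. Everything in it is an assumption made for illustration: the names (`Node`, `HashTable`, `hash_word`, `add`, `find_first`, `find_next`), the table size of 101, and the multiply-by-31 hash function. Each node keeps the word as well as the line, so a lookup can tell apart two different words that happen to land in the same bucket.

```cpp
#include <cassert>
#include <cstring>

const int TABLE_SIZE = 101;  // assumed size for this sketch

struct Node {
    char* word;  // the good word used as the key
    char* line;  // the full line that word came from
    Node* next;  // next entry in the same bucket (collision chain)
};

struct HashTable {
    Node* buckets[TABLE_SIZE] = {};  // every bucket starts out empty

    HashTable() = default;
    HashTable(const HashTable&) = delete;             // copying would double-free
    HashTable& operator=(const HashTable&) = delete;

    ~HashTable() {
        for (Node* head : buckets)
            while (head) {
                Node* next = head->next;
                delete[] head->word;
                delete[] head->line;
                delete head;
                head = next;
            }
    }
};

// Heap-allocated copy of a C string; the table owns and frees it.
char* copy_string(const char* s) {
    char* copy = new char[std::strlen(s) + 1];
    std::strcpy(copy, s);
    return copy;
}

// Simple polynomial hash, reduced to a bucket index (hashcode % arraySize).
unsigned hash_word(const char* word) {
    assert(word != nullptr);
    unsigned h = 0;
    for (; *word; ++word)
        h = h * 31 + static_cast<unsigned char>(*word);
    return h % TABLE_SIZE;
}

// Stores `line` under `word` by pushing a node onto the front of its bucket.
void add(HashTable& table, const char* word, const char* line) {
    unsigned i = hash_word(word);
    table.buckets[i] = new Node{copy_string(word), copy_string(line), table.buckets[i]};
}

// Walks a chain starting at `n` until a node whose key equals `word`.
const Node* match_from(const Node* n, const char* word) {
    while (n && std::strcmp(n->word, word) != 0)
        n = n->next;
    return n;
}

// First entry for `word`, or nullptr if there is none.
const Node* find_first(const HashTable& table, const char* word) {
    return match_from(table.buckets[hash_word(word)], word);
}

// Next entry for the same `word` after `n`, or nullptr when the chain is exhausted.
const Node* find_next(const Node* n, const char* word) {
    return match_from(n->next, word);
}
```

In the question's loop, this would mean reading a whole line with `getline` first, then splitting that line into words and calling `add(table, good_word, line)` for each one, so the word and its line are available at the same time and the file only has to be read once.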