I have a few questions about an assignment that i need to do. It might seem that what im looking for is to get the code, however, what im trying to do is to learn because after weeks of searching for information im lost. Im really new at
C`.
Here is the assignment :
- Given 3 files (
foo.txt
,bar.txt
,foo2.txt
) they all have a different amount of words (I need to use dynamic memory).
Create a program that ask for a word and tells you if that word is in any of the documents (the result is the name of the document where it appears).
Example :
- Please enter a word: dog
- "dog" is in foo.txt and bar.txt
(I guess i need to load the 3 files, create a hash table that has the keyvalues for every word in the documents but also has something that tells you which one is the document where the word is at).
I guess i need to implement:
- A
Hash Function
that converts a word into aHashValue
- A
Hash Table
that stores theHashValue
of every word (But i think i should also store the document index?). - Use of
dynamic allocation
. - Check for
collisions
while im inserting values into the hash table (UsingQuadratic Probing
and alsoChaining
). - Also i need to know how many times the word im looking for appears in the text.
I've been searching about hashmaps implementations, hash tables , quadratic probing, hash function for strings...but my head is a mess right now and i dont really now from where i should start.
so far i`ve read :
Algorithm to get a list of all words that are anagrams of all substrings (scrabble)?
Implementing with quadratic probing
Does C have hash/dictionary data structure?
https://gist.github.com/tonious/1377667
http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)HashTables.html?highlight=(CategoryAlgorithmNotes)
Sorry for my english in advance.
Hope you can help me.
Thanks.
FIRST EDIT
- Thanks for the quick responses.
- I'm trying to put all together and try something, however @Shasha99 I cannot use the
TRIE
data structure, i'm checking the links you gave me. - @MichaelDorgan Thanks for posting a solution for beginners however i must use Hashing (It's for Algorithms and Structures Class) and the teacher told us we MUST implement a Hash Function , Hash Table and probably another structure that stores important information.
After thinking for an hour I tried the following :
- A Structure that stores the word, the number of documents where it appears and the index of those documents.
typedef struct WordMetadata {
char* Word;
int Documents[5];
int DocumentsCount;
} WordMetadata;
- A function that Initializes that structure
void InitTable (WordMetadata **Table) {
Table = (WordMetadata**) malloc (sizeof(WordMetadata) * TABLESIZE);
for (int i = 0; i < TABLESIZE; i++) {
Table[i] = (WordMetadata*) NULL;
}
}
A function that Loads to memory the 3 documents and index every word inside the hash table.
A function that index a word in the mentioned structure
A function that search for the specific word using Quadratic Probing (If i solve this i will try with the chaining one...).
A function that calculates the hash value of a word (I think i will use
djb2
or any of the ones i found here http://www.cse.yorku.ca/~oz/hash.html) but for now :
int Hash (char *WordParam) {
for (int i = 0; *WordParam != '\0';) {
i += *WordParam++;
}
return (i % TABLESIZE);}
EDIT 2
I tried to implement something, its not working but would take a look and tell me what is wrong (i know the code is a mess)
EDIT 3
This code is properly compiling and running, however , some words are not finded (maybe not indexed i' dont know), i'm thinking about moving to another hashfunction as i mentioned in my first message.
Approximately 85% of the words from every textfile (~ 200 words each) are correctly finded by the program.
The other ones are ramdom words that i think are not indexed correctly or maybe i have an error in my search function...
Here is the current (Fully functional) code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TABLESIZE 4001
#define LINESIZE 2048
#define DELIMITER " \t"
typedef struct TTable {
char* Word; /* The actual word */
int Documents[5]; /* Documents Index */
int DocumentsCount; /* Number of documents where the word exist */
} TTable;
int Hash (char *Word);
void Index (TTable **HashTable, char* Word, int DocumentIndex);
int Search (TTable **HashTable, char* Word);
int mystrcmp(char *s1, char *s2);
char* Documents[] = {"foo.txt","bar.txt","foo2.txt",NULL};
int main() {
FILE* file;
TTable **HashTable
int DocumentIndex;
char Line[LINESIZE];
char* Word;
char* Tmp;
HashTable = (TTable**) malloc (sizeof(TTable)*TABLESIZE);
for (int i = 0; i < TABLESIZE; i++) {
HashTable[i] = (TTable*) NULL;
}
for (DocumentIndex = 0; Documents[DocumentIndex] != NULL; DocumentIndex++) {
file = fopen(Documents[DocumentIndex],"r");
if (file == NULL) {
fprintf(stderr, "Error%s\n", Documents[DocumentIndex]);
continue;
}
while (fgets (Line,LINESIZE,file) != NULL) {
Line[LINESIZE-1] = '\0';
Tmp = strtok (Line,DELIMITER);
do {
Word = (char*) malloc (strlen(Tmp)+1);
strcpy(Word,Tmp);
Index(HashTable,Word,DocumentIndex);
Tmp = strtok(NULL,DELIMITER);
} while (Tmp != NULL);
}
fclose(file);
}
printf("Enter the word:");
fgets(Line,100,stdin);
Line[strlen(Line)-1]='\0'; //fgets stores newline as well. so removing newline.
int i = Search(HashTable,Line);
if (i != -1) {
for (int j = 0; j < HashTable[i]->DocumentsCount; j++) {
printf("%s\n", Documents[HashTable[i]->Documents[j]]);
if ( j < HashTable[i]->DocumentsCount-1) {
printf(",");
}
}
}
else {
printf("Cant find word\n");
}
for (i = 0; i < TABLESIZE; i++) {
if (HashTable[i] != NULL) {
free(HashTable[i]->Word);
free(HashTable[i]);
}
}
return 0;
}
/* Theorem: If TableSize is prime and ? < 0.5, quadratic
probing will always find an empty slot
*/
int Search (TTable **HashTable, char* Word) {
int Aux = Hash(Word);
int OldPosition,ActualPosition;
ActualPosition = -1;
for (int i = 0; i < TABLESIZE; i++) {
OldPosition = ActualPosition;
ActualPosition = (Aux + i*i) % TABLESIZE;
if (HashTable[ActualPosition] == NULL) {
return -1;
}
if (strcmp(Word,HashTable[ActualPosition]->Word) == 0) {
return ActualPosition;
}
}
return -1; // Word not found
}
void Index (TTable **HashTable, char* Word, int DocumentIndex) {
int Aux; //Hash value
int OldPosition, ActualPosition;
if ((ActualPosition = Search(HashTable,Word)) != -1) {
for (int j = 0; j < HashTable[ActualPosition]->DocumentsCount;j++) {
if(HashTable[ActualPosition]->Documents[j] == DocumentIndex) {
return;
}
}
HashTable[ActualPosition]->Documents[HashTable[ActualPosition]->DocumentsCount] = DocumentIndex; HashTable[ActualPosition]->DocumentsCount++;
return;
}
ActualPosition = -1;
Aux = Hash(Word);
for (int i = 0; i < TABLESIZE; i++) {
OldPosition = ActualPosition;
ActualPosition = (Aux + i*i) % TABLESIZE;
if (OldPosition == ActualPosition) {
break;
}
if (HashTable[ActualPosition] == NULL) {
HashTable[ActualPosition] = (TTable*)malloc (sizeof(TTable));
HashTable[ActualPosition]->Word = Word;
HashTable[ActualPosition]->Documents[0] = DocumentIndex;
HashTable[ActualPosition]->DocumentsCount = 1;
return;
}
}
printf("No more free space\n");
}
int Hash (char *Word) {
int HashValue;
for (HashValue = 0; *Word != '\0';) {
HashValue += *Word++;
}
return (HashValue % TABLESIZE);
}