Beginner here,

I'm having difficulty understanding how to edit the contents of a text file in C#. I'm trying to do the following (pseudocode):

foreach word in file.txt
    if ((word.length < 4) || (word.length > 11))
        delete word from file.txt

What do I need to do? I know it involves the StreamReader/StreamWriter classes, but I don't understand how they work.

5 Answers

At first glance this seems simple: read the file with a StreamReader, split on spaces, remove the words that don't meet the length criteria, and then write the result back with a StreamWriter. However, with word parsing you run into a number of special cases where extra processing may be required.

Words are hard to define in a programming language. For example, a word may contain punctuation that is part of the word, or it may start or end with punctuation that marks the end of a sentence, a new line, and so on.

That being said, let's say we have the following rules.

  • A word contains one or more alphanumeric characters
  • A word may contain the following punctuation: [-,_']
  • A word may be separated by punctuation or a space.

Following these rules, we can read all the text and perform the manipulations you have asked for. I would start with the word processing. You can create a static class for this; let's call it WordProcessor.

Here is commented code for parsing words out of a string based on our rules.

/// <summary>
/// characters that denote a new word
/// </summary>
const string wordSplitPuncuation = ",.!&()[] \"";

/// <summary>
/// Parse a string
/// </summary>
/// <param name="inputString">the string to parse</param>
/// <param name="preservePuncuation">preserve punctuation in the string</param>
/// <returns></returns>
public static IList<string> ParseString(string inputString, bool preservePuncuation)
{
    //create a list to hold our words
    List<string> rebuildWords = new List<string>();

    //the current word
    string currentWord = "";

    //iterate through all characters in the input string
    foreach(var character in inputString)
    {
        //is the character one of the split characters?
        if(wordSplitPuncuation.IndexOf(character) > -1)
        {
            if (currentWord != "")
                rebuildWords.Add(currentWord);
            if (preservePuncuation)
                rebuildWords.Add("" + character);
            currentWord = "";
        }
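        //else add the character to the current word
        else
            currentWord += character;
    }
    //add the trailing word if the input did not end with a split character
    if (currentWord != "")
        rebuildWords.Add(currentWord);
    return rebuildWords;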
}

The above is pretty basic: if you set preservePuncuation to true and join the returned pieces, you get the same string back.

The next part of the class removes words that are shorter than a minimum length or longer than a maximum length. It uses the method above to split the string into pieces and evaluates each piece individually against those limits.

/// <summary>
/// Removes words from a string that are shorter than the minimum or longer than the maximum length
/// </summary>
/// <param name="inputString">the input string to parse</param>
/// <param name="preservePuncuation">flag to preserve the punctuation for rebuilding the string</param>
/// <param name="minWordLength">the minimum word length</param>
/// <param name="maxWordLength">the maximum word length</param>
/// <returns></returns>
public static string RemoveWords(string inputString, bool preservePuncuation, int minWordLength, int maxWordLength)
{
    //parse our string into pieces for iteration
    var words = WordProcessor.ParseString(inputString, preservePuncuation);

    //initialize our complete string container
    List<string> completeString = new List<string>();

    //enumerate each word
    foreach (var word in words)
    {
        //is the first character a split character? (punctuation pieces are a single character)
        if (wordSplitPuncuation.IndexOf(word[0]) > -1)
        {
            //are we preserving punctuation?
            if (preservePuncuation)
                //add the punctuation
                completeString.Add(word);
        }
        //check that the word length is greater or equal to the min length and less than or equal to the max word length
        else if (word.Length >= minWordLength && word.Length <= maxWordLength)
            //add to the complete string list
            completeString.Add(word);
    }
    //return the completed string by joining the pieces together, collapsing double spaces and trimming leading and trailing white space
    return string.Join("", completeString).Replace("  ", " ").Trim();
}

OK, so the above method simply runs through the pieces, keeps the words that match the length criteria and preserves the punctuation. The final piece of the puzzle is reading and writing the file on disk. For this we can use the StreamReader and StreamWriter. (Note: if you have file access problems you may want to look at the FileStream class.)

The sample code below simply reads a file, invokes the methods above and then writes the result back to the original location.

/// <summary>
/// Removes words from a file
/// </summary>
/// <param name="filePath">the file path to parse</param>
/// <param name="preservePuncuation">flag to preserve the punctuation for rebuilding the string</param>
/// <param name="minWordLength">the minimum word length</param>
/// <param name="maxWordLength">the maximum word length</param>
public static void RemoveWordsFromAFile(string filePath, bool preservePuncuation, int minWordLength, int maxWordLength)
{


    //our parsed string
    string parseString = "";

    //read the file
    using (var reader = new StreamReader(filePath))
    {
        parseString = reader.ReadToEnd();
    }

    //open a new writer
    using (var writer = new StreamWriter(filePath))
    {
        //parse our string to remove words
        parseString = WordProcessor.RemoveWords(parseString, preservePuncuation, minWordLength, maxWordLength);

        //write our string
        writer.Write(parseString);
        writer.Flush();
    }
}

The above sample code simply opens the file, parses its contents against your parameters and then rewrites the file.

This can then be reused by calling the method directly, for example:

WordProcessor.RemoveWordsFromAFile(@"D:\test.txt", true, 4, 11);

On a final note: this is by no means the most efficient way to handle your request, and it is not built for performance. It is simply a demonstration of how you could parse words out of a file.

Cheers

Nico

The concept is going to be more along the lines of:

while (there is input to read from the input file)
{
    read the input
    if (input fits your criteria of shorter than 4 or longer than 11)
        ignore it
    else
        write it to output file (which is a new file, NOT the file you read it from)
}

You can read the input one line at a time with StreamReader.ReadLine().
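
As a rough sketch of the loop above (not a definitive implementation), the following reads the input line by line and writes the surviving words to a separate output file. The class name LineFilter, the method name FilterWords and the rule that words are separated only by spaces are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LineFilter
{
    // Copies inputPath to outputPath, keeping only the words whose length is
    // between minLength and maxLength (inclusive).
    // Assumption: words are separated by spaces only.
    public static void FilterWords(string inputPath, string outputPath, int minLength, int maxLength)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            // ReadLine returns null once the end of the file is reached
            while ((line = reader.ReadLine()) != null)
            {
                var kept = line
                    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                    .Where(w => w.Length >= minLength && w.Length <= maxLength);

                // write the surviving words of this line to the new file
                writer.WriteLine(string.Join(" ", kept));
            }
        }
    }
}
```

A call might look like LineFilter.FilterWords(@"D:\test.txt", @"D:\test.filtered.txt", 4, 11); where the output path is a made-up example. If the original file has to be replaced afterwards, File.Delete on the original followed by File.Move on the output is one way to do it.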

kmort
  • Thanks for the reply. Is it not at all possible to overwrite the original file? Also what's going on with creating a stream- why can't I just run the method? –  Nov 13 '13 at 05:26
  • @dreadbeat You **can** overwrite the original file, but it is a ton simpler to write it out to a new file. Remember this is information stored on magnetic media. If you just "erase" a portion of a text file, you've got to read the whole rest of the file from that point on, and re-write it, starting at the location you erased. It can be done, but it's not nice. Much better and easier to create a new output file, and when you are done, if you need to delete the original and rename the output, you can. Also, you must create the stream that you write to. This preps the disk to be ready for a file. – kmort Nov 13 '13 at 05:40

I would look into regex to do pattern matching based on the requirements you describe in your question. Here's a good tutorial on regex. Target the words and replace them with blanks.

Combine that with the following post on how to read and write text files: How to both read and write a file in C#. Depending on how large the file is, you might be OK just reading the whole file, removing the words you want to delete, and finally writing the whole content back.

If the file is very large you might have to optimize this and read the file in chunks instead.
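
A minimal sketch of the regex idea, assuming a "word" is a run of \w characters (letters, digits and underscore) and that the limits are the 4 and 11 from the question. The class name RegexWordFilter and its method are hypothetical names for illustration.

```csharp
using System.Text.RegularExpressions;

public static class RegexWordFilter
{
    // Matches whole words that are 1 to 3 characters long, or 12 or more characters long.
    // \b is a word boundary, so a longer word is never partially matched.
    private static readonly Regex TooShortOrTooLong =
        new Regex(@"\b(?:\w{1,3}|\w{12,})\b");

    // Returns the text with every matching word replaced by an empty string.
    public static string RemoveWords(string text)
    {
        return TooShortOrTooLong.Replace(text, "");
    }
}
```

The spaces around removed words are left in place, so the result may contain double spaces. Words containing an apostrophe or hyphen are treated as several shorter words under this pattern, which may or may not be what you want.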

TGH

Try this.

  1. Get the contents of the text file in a string variable.

  2. Split the text with space as the delimiter to get the words in an array.

  3. Join the words in that array that meet your criteria and write the result back to the text file.
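
The three steps could be sketched roughly as follows, assuming the file is small enough to hold in memory. SplitJoinFilter and FilterFile are hypothetical names; File.ReadAllText and File.WriteAllText are used here as a shortcut in place of an explicit StreamReader and StreamWriter.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SplitJoinFilter
{
    // Rewrites the file at path, keeping only the words whose length is
    // between minLength and maxLength (inclusive).
    public static void FilterFile(string path, int minLength, int maxLength)
    {
        // 1. get the contents of the text file in a string variable
        string text = File.ReadAllText(path);

        // 2. split on spaces to get the words in an array
        //    (assumption: a line break stays attached to the neighbouring word)
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // 3. join the words that meet the criteria and write them back
        var kept = words.Where(w => w.Length >= minLength && w.Length <= maxLength);
        File.WriteAllText(path, string.Join(" ", kept));
    }
}
```

Because the split is on spaces only, punctuation counts towards a word's length here; a stricter definition of a word needs extra parsing.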

sudhAnsu63
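        //requires: using System; using System.IO; using System.Linq;
        var filePath = HttpRuntime.AppDomainAppPath + "your file path";
        if (!File.Exists(filePath))
            return;
        string text;
        //read the whole file and close it before reopening it for writing
        using (var sr = new StreamReader(filePath))
        {
            text = sr.ReadToEnd();
        }
        //keep only the words that are 4 to 11 characters long
        var words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                        .Where(w => w.Length >= 4 && w.Length <= 11);
        using (var sw = new StreamWriter(filePath))
        {
            //note: line breaks are replaced by single spaces
            sw.Write(string.Join(" ", words));
        }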
shimron
  • Not sure what's going on here and I'm getting an error: "the name http runtime does not exist in current context" –  Nov 13 '13 at 05:40
  • @dreadbeat I guess your app is a WinForms project; in that case remove HttpRuntime.AppDomainAppPath and make 'filePath' an absolute path – shimron Nov 13 '13 at 05:50