
Here's what the teacher assigned us:

Suppose we are an online service that provides a bulletin board for its users. We would like to give our users the option of filtering out profanity. We will consider the words cat, dog, and llama to be profane. Write a program that reads a string from the keyboard and tests whether the string contains one of our profane words. Your program should find words like cAt that differ only in case. You must also not identify words that simply contain what might otherwise be considered a profane word. For example, “Dogmatic concatenation is a small category” should not be considered profane. Allow the user to use the following punctuation: ( , . ? " ' ( ) ! : ;) This will mean that you would be expected to find “The “Cat” is not a doggone llamaman.” or “Cat, and dog can not be llama.” (Note: You will only be responsible for the first occurrence of a given profane word in a sentence. However, more than one profane word may be contained in a sentence. So “Concatenate the cats” would not find a profane word, but “The doggone cat, and dog are not a llama.” would return 2 profane words, cat and llama.)

So, I tried this code:

import java.util.Scanner;
public class Degrees 
{
    private static Scanner keyboard = new Scanner(System.in);
    public static void main(String[]args)
    {
        System.out.println("Enter a sentence");
        String sentence = keyboard.nextLine();
        sentence = sentence.toLowerCase();
        if(sentence.indexOf("cat ") != -1)
            System.out.println("the profane word cat was detected");
        else
            System.out.println("the profane word cat wasn't detected");

        if(sentence.indexOf("dog ") != -1)
            System.out.println("the profane word dog was detected");
        else
            System.out.println("the profane word dog wasn't detected");
        if(sentence.indexOf("llama ") != -1)
            System.out.println("the profane word llama was detected");
        else
            System.out.println("the profane word llama wasn't detected");


    }       

}

However, the code isn't working the way it should. If I write "dogmatic dog", it should check only the first occurrence of dog, see that it is inside a word, and then ignore the second dog. My code is just ughh. I don't know what I'm missing or what I should add. I've been at this for 6 hours straight, I swear. Please help; I just can't think of anything else. I am open to suggestions and hints.

I also tried using a switch statement but for some reason it was only executing the default.
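The switch code isn't shown, so this is only a guess at the cause: a switch on a String compares the entire value with each case label (as if by equals), so switch (sentence) with case "cat": fires only when the whole input is exactly cat, and every real sentence falls through to default. Switching on one word at a time behaves as intended. A minimal sketch, assuming Java 7 or later; SwitchOnWord and classify are hypothetical names:

```java
public class SwitchOnWord {

    // Hypothetical helper: labels ONE word, not a whole sentence.
    // The switch compares the entire string against each case label.
    static String classify(String word) {
        switch (word.toLowerCase()) {
            case "cat":
            case "dog":
            case "llama":
                return "profane";
            default:
                return "clean";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("cAt"));        // profane
        System.out.println(classify("a cAt sat"));  // clean: the whole string is compared
    }
}
```

The sentence still has to be broken into words, with punctuation stripped, before a switch like this can help.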

Luiggi Mendoza
jiija
  • I think the problem is that you are searching for "cat " with a space at the end, and so on, so it does not find the word if it is at the end of the input – Loreno Heer Jan 26 '15 at 02:40
  • But if I don't add the space, then concatenate would count as a profanity. – jiija Jan 26 '15 at 02:41
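A quick demonstration of the trade-off described in these two comments (SpaceSuffixDemo is a made-up name): searching for "cat " misses the word at the end of the input or before punctuation, while searching for "cat" alone also matches inside concatenate.

```java
public class SpaceSuffixDemo {
    public static void main(String[] args) {
        // With a trailing space, a word at the very end of the input is missed
        System.out.println("i saw a cat".indexOf("cat "));   // -1
        // ...and so is a word followed by punctuation
        System.out.println("a cat, a dog".indexOf("cat "));  // -1
        // Without the space, the search also matches inside other words
        System.out.println("concatenate".indexOf("cat"));    // 3
    }
}
```

Checking that the characters just before and just after the match are not letters avoids both problems without relying on spaces.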

3 Answers


You'll have to create a "mini-parser" that iterates over the words in the sentence and checks whether each one of them is profane.

Partially implemented solution:

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ProfanityCheck { // wrapper class; the name is arbitrary

    public static void main(String[] args) {

        String s = "The doggone cat, and dOg are not a llama.";
        s = s.toLowerCase();
        Scanner sc = new Scanner(s);
        List<String> profaneWords = generateProfaneList();
        int counter = 0;
        while (sc.hasNext()) {
            String word = sc.next();
            for (String profane : profaneWords) {
                if (word.matches(".*\\b" + profane + "\\b.*") && // check an exact match
                        ! s.matches(".*" + profane + "[a-z].*\\b" + profane + "\\b.*") && // check that profane is not
                        ! s.matches(".*[a-z]" + profane + ".*\\b" + profane + "\\b.*")) { // included as part of another word
                    counter++;
                    System.out.println("The word '" + profane + "' is profane!");
                }
            }
        }
        System.out.println(counter + " profane words were found");
    }

    private static List<String> generateProfaneList() {
        List<String> profaneWords = new ArrayList<>();
        profaneWords.add("dog");
        profaneWords.add("cat");
        profaneWords.add("llama");
        return profaneWords;
    }
}

OUTPUT

The word 'cat' is profane!
The word 'llama' is profane!
2 profane words were found
Nir Alfasi
  • Thanks, this looks a lot better; however, we're not allowed to use loops – jiija Jan 26 '15 at 03:12
  • This solution is still wrong: `dog` should have not been detected. – Luiggi Mendoza Jan 26 '15 at 03:12
  • @LuiggiMendoza then I'm missing something: why shouldn't `dog` be detected? – Nir Alfasi Jan 26 '15 at 03:16
  • It's at the bottom of the assignment (emphasis mine): *However, more than one profane word may be contained in a sentence. So **“Concatenate the cats”**, would not find a profane word but **“The doggone cat, and dog are not a llama.”** would return **2 profane words** cat and llama)* – Luiggi Mendoza Jan 26 '15 at 03:24
  • @LuiggiMendoza I think it's a mistake, since it's not included in the list of requirements; further, see the OP's comment on sanjab's answer. – Nir Alfasi Jan 26 '15 at 03:28
  • There are two examples in this part of the professor's assignment. To me, it looks like the OP hasn't realized this. – Luiggi Mendoza Jan 26 '15 at 03:29
  • @LuiggiMendoza if the word `dog` should be considered as profane in some cases and shouldn't be considered as such on other cases, then the question lacks a good definition of these cases. If you understood something that I didn't (such as: a definition for such cases) please explain. – Nir Alfasi Jan 26 '15 at 03:32
  • It's very clear from both examples. For *“Concatenate the cats”, would not find a profane word*, it contains `cat` in `Concatenate`, so `cat` was checked and then failed, and there's no need to keep checking for the word `cat` in the sentence. For *“The doggone cat, and dog are not a llama.”*, there's `dog` in `doggone`, so `dog` is discarded and should not be evaluated even though there is a `dog` before `are not a llama`, so the output will be `cat` and `llama` only. – Luiggi Mendoza Jan 26 '15 at 03:36
  • Also, it's noted in the last statement of the body of the question (emphasis mine): *however the code isn't working how it should be. **if i wrote "dogmatic dog" it should only check the first occurrence of dog and see that it is within a word and then ignore the second dog.*** – Luiggi Mendoza Jan 26 '15 at 03:57
  • @LuiggiMendoza correction accepted and corrected. Thanks! – Nir Alfasi Jan 26 '15 at 04:33
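Since the OP says loops aren't allowed, here is a minimal loop-free sketch (not taken from any of the answers) of the first-occurrence rule as Luiggi Mendoza reads it: find the first index of each profane word with indexOf, then check that the characters on either side are not letters. FirstOccurrenceCheck, isProfane and report are hypothetical names, and "letter" is assumed to mean Character.isLetter:

```java
import java.util.Scanner;

public class FirstOccurrenceCheck {

    // True if the FIRST occurrence of word in sentence stands alone,
    // i.e. it has no letter directly before or after it.
    static boolean isProfane(String sentence, String word) {
        String s = sentence.toLowerCase();          // cAt, CAT, ... all become cat
        int start = s.indexOf(word);                // only the first occurrence is examined
        if (start == -1) {
            return false;                           // word does not appear at all
        }
        int end = start + word.length();
        boolean letterBefore = start > 0 && Character.isLetter(s.charAt(start - 1));
        boolean letterAfter = end < s.length() && Character.isLetter(s.charAt(end));
        return !letterBefore && !letterAfter;       // space, punctuation or string edge on both sides
    }

    static void report(String sentence, String word) {
        if (isProfane(sentence, word)) {
            System.out.println("the profane word " + word + " was detected");
        } else {
            System.out.println("the profane word " + word + " wasn't detected");
        }
    }

    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        System.out.println("Enter a sentence");
        String sentence = keyboard.hasNextLine() ? keyboard.nextLine() : "";
        // One call per profane word, so no loop is needed
        report(sentence, "cat");
        report(sentence, "dog");
        report(sentence, "llama");
    }
}
```

With "The doggone cat, and dog are not a llama." this reports cat and llama but not dog, because the first dog sits inside doggone; "dogmatic dog" likewise reports no dog.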

I suggest using this algorithm:

  • Define all the profane words in an array. Let's call it profaneWords.
  • Split the sentence into several strings using whitespace. This will be stored into an array, let's call it wordsToAnalyze
  • For each word (string) in profaneWords, let's call the current word profane:
    • Create a flag to check if profane has been found. Let's call it found. Initialize it with a value of no.
    • For each word (string) in wordsToAnalyze, let's call the current word analyzeMe:
      • Trim all leading and trailing non-letter characters from analyzeMe.
      • Check if analyzeMe is equal to profane. If it is, set found to yes and break the for loop.
      • Check if analyzeMe contains profane. If it does, then break the current for loop.
    • If found is yes, then report that the profane word has been identified.

I won't provide a full Java implementation of the algorithm above, just pseudocode (after all, it's homework, so writing the code is your job, not ours =) ):

profaneWords = { "cat", "dog", "llama" } //why is llama profane? =(
wordsToAnalyze = sentence.toLowerCase().split(" ") //lowercase first so cAt matches cat; the split can be improved but you should not use regex yet
for each profane in profaneWords
begin for
    found = false
    for each analyzeMe in wordsToAnalyze
    begin for
        analyzeMe = trimNonCharacters(analyzeMe)
        if (analyzeMe is equal to profane)
            found = true
            break
        if (analyzeMe contains profane)
            break
    end for
    if (found is true)
        print "The word " + profane + " was found."
end for

For trimNonCharacters, you may create another method that skips the non-letter characters at the start and end of the string parameter and returns what lies between them as a new string. A StringBuilder would also work; this version uses substring:

public static String trimNonCharacters(String string) {
    int startIndex = 0;
    int endIndex = string.length();
    // Move startIndex past any non-letters at the beginning
    for (int i = 0; i < string.length(); i++) {
        if (Character.isLetter(string.charAt(i))) {
            break;
        }
        startIndex++;
    }
    // Move endIndex back past any non-letters at the end
    for (int i = string.length() - 1; i >= 0; i--) {
        if (Character.isLetter(string.charAt(i))) {
            break;
        }
        endIndex--;
    }
    // An all-punctuation string leaves startIndex past endIndex; return "" in that case
    String result = "";
    if (startIndex <= endIndex) {
        result = string.substring(startIndex, endIndex);
    }
    return result;
}
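Writing the Java is the asker's job here, but for checking a finished attempt, a direct translation of the pseudocode might look like the following sketch. It assumes Java 7 or later, includes a compact equivalent of trimNonCharacters so the class compiles on its own, and SplitAndTrimCheck and findProfane are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAndTrimCheck {

    static final String[] PROFANE_WORDS = { "cat", "dog", "llama" };

    // Follows the pseudocode: for each profane word, look at the words in order
    // and stop at the first one that either equals or contains it.
    static List<String> findProfane(String sentence) {
        String[] wordsToAnalyze = sentence.toLowerCase().split(" ");
        List<String> found = new ArrayList<>();
        for (String profane : PROFANE_WORDS) {
            for (String word : wordsToAnalyze) {
                String analyzeMe = trimNonCharacters(word);
                if (analyzeMe.equals(profane)) {
                    found.add(profane);   // first occurrence is a standalone word
                    break;
                }
                if (analyzeMe.contains(profane)) {
                    break;                // first occurrence is inside a longer word: ignore the rest
                }
            }
        }
        return found;
    }

    // Same behavior as trimNonCharacters above: drop non-letters from both ends.
    public static String trimNonCharacters(String string) {
        int startIndex = 0;
        int endIndex = string.length();
        while (startIndex < endIndex && !Character.isLetter(string.charAt(startIndex))) {
            startIndex++;
        }
        while (endIndex > startIndex && !Character.isLetter(string.charAt(endIndex - 1))) {
            endIndex--;
        }
        return string.substring(startIndex, endIndex);
    }

    public static void main(String[] args) {
        System.out.println(findProfane("The doggone cat, and dog are not a llama."));  // [cat, llama]
        System.out.println(findProfane("Concatenate the cats"));                       // []
    }
}
```

Because the equality check comes before the contains check, a standalone match is recorded, and a match buried in a longer word ends the search for that profane word, which is exactly the first-occurrence rule from the assignment.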
Luiggi Mendoza

This is a great candidate for a regex:

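// at the top of the file:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

System.out.println("Enter a sentence");
String sentence = keyboard.nextLine();
sentence = sentence.toLowerCase();

// "\\W" must be escaped in a Java string; ^ and $ also allow a match at the very start or end
Pattern p = Pattern.compile("(^|\\W)(cat|dog|llama)(\\W|$)");
Matcher m = p.matcher(sentence);
boolean matchFound = m.find(); // find() searches within the sentence; matches() would require the whole sentence to match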

\W matches any non-word character (anything other than a letter, digit, or underscore), so, for example, concatenate would not trigger a match but "cat" would.

For more information: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

  • Good but not good enough: it won't detect profane words that differ only in case! – Nir Alfasi Jan 26 '15 at 02:55
  • Converting to lowercase first (sentence = sentence.toLowerCase();) removes the need for a more complicated regex. –  Jan 26 '15 at 02:57
  • From question: *“Concatenate the cats”, would not find a profane word*. Your answer doesn't apply for this. – Luiggi Mendoza Jan 26 '15 at 03:00
  • I really wasn't taught regex, so if there is any other way to solve the problem, please share. – jiija Jan 26 '15 at 03:02
  • My answer wasn't intended to provide a complete regex solution, but simply to point the question asker in the right direction and encourage their learning –  Jan 26 '15 at 03:03
  • @stacksonstacks I don't think I have enough time to learn what regex is; besides, it wasn't covered in class. Is there any other way? – jiija Jan 26 '15 at 03:04
  • Use "\b(dog|cat|llama)\b" (a pipe between each term), and convert your sentences to lowercase before matching. Learning regex is by far better than simulating regex via character-by-character analysis. Start by checking out the Stack Overflow regex FAQ: http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean – aliteralmind Jan 26 '15 at 03:08
  • @jiija If the number of capital letters matters ("CaT" is not profane, but "Cat" is), my advice above will still work as long as you follow it by analyzing the potential profane word *in its original case* and determining whether it has an acceptable number of upper/lowercase letters. If it does, done. If not, find the next potential match and repeat. Regex is your friend. Do not be afraid of the regex. – aliteralmind Jan 26 '15 at 03:52
  • @aliteralmind `CaT`, `cAt`, `CAt`, `cAT` and others are profane. The only case where the profane word cannot be found is when the word contains the profane e.g. dogmatic. – Luiggi Mendoza Jan 26 '15 at 04:00
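On the \b(dog|cat|llama)\b suggestion: it finds whole words wherever they appear, so on its own it would still report the second dog in "dogmatic dog". One way to keep a regex and still honor the first-occurrence rule (a sketch, not taken from the answers; RegexFirstOccurrence and isProfane are hypothetical names) is to match the entire run of letters around each profane word and check whether that first match is the word itself. \p{L} is Java's Unicode letter class, and Pattern.quote escapes the word:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFirstOccurrence {

    // Matches the whole run of letters around `word`, so the first match is
    // the first token that contains it (e.g. "doggone" for "dog").
    static boolean isProfane(String sentence, String word) {
        Pattern p = Pattern.compile("\\p{L}*" + Pattern.quote(word) + "\\p{L}*");
        Matcher m = p.matcher(sentence.toLowerCase());
        // Profane only if that first token is exactly the word itself
        return m.find() && m.group().equals(word);
    }

    public static void main(String[] args) {
        String s = "The doggone cat, and dog are not a llama.";
        System.out.println(isProfane(s, "cat"));    // true
        System.out.println(isProfane(s, "dog"));    // false: the first dog is inside doggone
        System.out.println(isProfane(s, "llama"));  // true
    }
}
```

Because find() tries start positions from left to right and letter runs cannot cross spaces or punctuation, the first match is always the first word that contains the profane word, whatever its case in the original sentence.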