
I'm working on a program that reads thousands of lines of text. Each line must follow a specific format, like the following:

"Name,#,#,#,#,#,#"

Here 'Name' is the name of a movie (we can assume the name won't contain any digits), and each # is an integer from 0 to 10. The values MUST be separated by commas.

My code is the following:

if (line.matches(".*[a-zA-z].*,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)")) {
    System.out.println("no");
} else {
    System.out.println(line);
}

The issue is that the title of the film can't have commas in it. If it does, the line doesn't follow the format and needs to be printed. However, my matches() call still matches lines that have a comma in the title, so they are not printed. As I understand my pattern, if the entry after the first comma is not an integer, the line should not match and should therefore be printed.

Can anyone see where I'm going wrong in this?

rexorsist
  • I'm sorry, but I can't figure out your exact requirements from the last paragraph. If a comma is in the title, should it match or not? Can you please provide an example input/output from your program and what you expect instead? – Izruo Feb 26 '18 at 23:00
  • The name can't contain a number but can contain a comma, and the comma in the normal format (occurring after the name) is also followed by a number. So you need to check for the first instance of a comma followed by a number, and then the rest of your regex – Balwinder Singh Feb 26 '18 at 23:01
  • Hi, sorry for being vague. If a comma is in the title, it should not match. There are thousands of lines being outputted. An example of something that should be outputted, but isn't would be "Fri,day,4,6,2,4,7,9" (this should NOT match). – rexorsist Feb 26 '18 at 23:02
  • This line definition doesn't make lots of sense. A separator should be unique and shouldn't appear as data. – SHG Feb 26 '18 at 23:02
  • *the title of the film can't have commas in it. If it does, it needs to be printed.* - so does it need to be printed or ignored? – SHG Feb 26 '18 at 23:04
  • Use `line.split(",").length == 7` to first verify the correct number of commas in the line, then use one regex for index 0 and another regex for the numbers? – RAZ_Muh_Taz Feb 26 '18 at 23:05
  • I've tested `^[a-zA-z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)` with an online tester, and it "seems" to work – MadProgrammer Feb 26 '18 at 23:05
  • I should mention I'm a complete beginner in Java. In general, this code works perfectly, except for the issue that it's falsely matching lines that have commas in their names (eg. "The Ki,ller,6,7,3,6,8,1" shouldn't match). If you guys have any alternative ways to code this, or any suggestions on how to fix my code, I'd really appreciate it. – rexorsist Feb 26 '18 at 23:06
  • @SHG. If it doesn't match the format, then it needs to be printed. My aim is to only print the lines that do not follow my format. – rexorsist Feb 26 '18 at 23:08
  • And does it print `no` for this line? – SHG Feb 26 '18 at 23:09
  • @SHG no, the actual line itself will need to be printed. To make this clear, if the input line matches my format, it will print 'no'. If it does not match my format, the line itself will need to be printed. – rexorsist Feb 26 '18 at 23:11
  • @MadProgrammer I just tried using that, and it doesn't seem to work. It seems like no input string is matching. – rexorsist Feb 26 '18 at 23:11
  • @MadProgrammer Tried your regex with "Na,me,2,2,4,6,7,7" test string and it doesn't match. It works fine with "Name,2,2,4,6,7,7". As per the problem description, it should match both of these – Balwinder Singh Feb 26 '18 at 23:11
  • I feel that your code just matches your requirements, so either your explanation isn't clear, or your code behaves differently than what you think.. – SHG Feb 26 '18 at 23:13
  • @rexorsist Try this regex `^[a-z,A-z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)` – Balwinder Singh Feb 26 '18 at 23:13
  • @BalwinderSingh thanks for the help, but still not working. I should note, spaces are allowed in the name. Does the regex you wrote out allow for that? – rexorsist Feb 26 '18 at 23:15
  • @SHG like i mentioned before, it works perfectly for everything, EXCEPT in the case where there is a comma within the name, e.g. "The Ki,ller,6,7,3,6,8,1". For some reason, these kinds of lines are being matched , when they shouldn't be. – rexorsist Feb 26 '18 at 23:16
  • @rexorsist This one allows for spaces as well `^[a-z, A-z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)` – Balwinder Singh Feb 26 '18 at 23:25
  • @rexorsist `System.out.println("Na,me,2,2,4,6,7,7".matches("^[a-zA-z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)"));` returns `false` and `System.out.println("Name,2,2,4,6,7,7".matches("^[a-zA-z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)"));` returns `true` for me – MadProgrammer Feb 26 '18 at 23:27
  • @BalwinderSingh this one works much better, but one problem is - the name can allow apostrophes. Eg. "The Killer's Revenge, 5,3,6,7,4,2" should match. It's only commas that aren't allowed, since they separate each entry. The one you just sent doesn't seem to match apostrophes. Thanks so much though. Any other ideas? – rexorsist Feb 26 '18 at 23:36
  • @rexorsist `^[a-z,' A-z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)` should match apostrophes as well. You can try it out here https://regex101.com/r/sAoZFS/1/ – Balwinder Singh Feb 26 '18 at 23:43
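
The split-based check suggested in the comments above can be sketched as follows. This is only an illustration, not code from the thread: the class name `SplitValidator` and the method `isValid` are made up, and rejecting an empty name is an assumption the question does not state. Note that arrays expose `length` as a field rather than a method, and that passing a limit of `-1` to `split` keeps trailing empty fields, so a line ending in a comma is not silently shortened.

```java
public class SplitValidator {
    // Hypothetical helper: true when the line is a name followed by
    // exactly six integers in the range 0-10, separated by commas.
    static boolean isValid(String line) {
        // The -1 limit keeps trailing empty strings in the result.
        String[] parts = line.split(",", -1);
        // Assumption: an empty name is treated as invalid.
        if (parts.length != 7 || parts[0].isEmpty()) {
            return false;
        }
        for (int i = 1; i < parts.length; i++) {
            // matches() must consume the whole field, so "11" or "" fails.
            if (!parts[i].matches("[0-9]|10")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("The Killer,6,7,3,6,8,1"));   // true
        System.out.println(isValid("The Ki,ller,6,7,3,6,8,1"));  // false
    }
}
```

A comma inside the title produces an eighth field, so the length check alone rejects such lines before any regex runs.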

4 Answers


You are saying that the rules are:

  • Lines must be 7 comma-separated values: a name and 6 numbers in range 0-10.
  • The name must not contain a comma.
  • We can assume the name won't have any numbers, but it is not a requirement that it cannot.

Since the only invalid character in a name is a comma, the regex would be:

[^,]*,(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10)

If you want to capture the fields, you would use this code:

Pattern p = Pattern.compile("([^,]*),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");
for (String line : lines) {
    Matcher m = p.matcher(line);
    if (! m.matches()) {
        System.out.println("Invalid line: " + line);
    } else {
        System.out.println("Name: " + m.group(1));
        System.out.println("  Values: " + m.group(2)
                                  + " " + m.group(3)
                                  + " " + m.group(4)
                                  + " " + m.group(5)
                                  + " " + m.group(6)
                                  + " " + m.group(7));
    }
}

Test

String[] lines = { "Buffalo Bill and the Indians, or Sitting Bull's History Lesson,0,1,2,3,4,5",
                   "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb,6,7,8,9,10,0",
                   "300,1,2,3,4,5,6"};

Output

Invalid line: Buffalo Bill and the Indians, or Sitting Bull's History Lesson,0,1,2,3,4,5
Name: Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
  Values: 6 7 8 9 10 0
Name: 300
  Values: 1 2 3 4 5 6

First movie name has a comma, so it doesn't match.
Second movie name has special characters (. and :), but no comma, so it matches.
Third movie name is "300", which is an actual movie, so it matches.
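
To make the two constructs in these patterns more concrete: `[^,]` is a negated character class that matches any single character except a comma, and `(?:...)` is a non-capturing group, which groups the alternation without storing what it matched. The sketch below compares the two patterns from this answer; the class name `GroupDemo` and the constant names are only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Validation-only pattern: (?:...) groups without capturing.
    static final Pattern VALIDATE = Pattern.compile(
        "[^,]*,(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10)");
    // Capturing pattern: every (...) stores its text for group(n).
    static final Pattern CAPTURE = Pattern.compile(
        "([^,]*),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");

    public static void main(String[] args) {
        String line = "300,1,2,3,4,5,6";

        Matcher v = VALIDATE.matcher(line);
        System.out.println(v.matches());     // true
        System.out.println(v.groupCount());  // 0

        Matcher c = CAPTURE.matcher(line);
        System.out.println(c.matches());     // true
        System.out.println(c.groupCount());  // 7
        System.out.println(c.group(1));      // 300
    }
}
```

Both patterns accept and reject the same lines; the non-capturing form is enough when only a yes/no answer is needed.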

Andreas
  • This works 100% perfectly. Thank you so much. I'm a complete beginner in Java. Can you break down the regex you wrote out? I really just want to learn more, such as what [^,] entails or why there is a '?' before every integer. Thanks again – rexorsist Feb 26 '18 at 23:46
  • @rexorsist The javadoc of [`Pattern`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) lists all the regular-expression constructs. See also: [Learning Regular Expressions](https://stackoverflow.com/a/2759417/5221149) – Andreas Feb 27 '18 at 15:51

The problem lies with the .*. The dot matches any character, so this part is able to include the comma.

Fri,dayaervsere,6,4,78,7
<--><--------->^
.*  [a-zA-Z]   ,(  [...]

So, basically, you only need to get rid of the .*. Instead, apply a quantifier to your character class:

[a-zA-Z]* // to match any number of characters

or

[a-zA-Z]+ // to match at least one character
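
The difference can be checked with a small sketch. The first pattern is copied unchanged from the question; the second drops both `.*` parts and uses the `+` quantifier. The class and constant names are only illustrative. The sketch also writes `A-Z` where the question has `A-z`: the range `A-z` additionally covers the ASCII characters between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick), which is probably not intended.

```java
public class GreedyDotDemo {
    // The pattern from the question, copied unchanged.
    static final String ORIGINAL =
        ".*[a-zA-z].*,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)";
    // Same structure without ".*": the name may contain letters only.
    static final String LETTERS_ONLY =
        "[a-zA-Z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)";

    public static void main(String[] args) {
        String bad = "Fri,day,4,6,2,4,7,9";
        String good = "Friday,4,6,2,4,7,9";

        // ".*" swallows "Fri,da", so the comma in the title goes unnoticed.
        System.out.println(bad.matches(ORIGINAL));       // true
        System.out.println(bad.matches(LETTERS_ONLY));   // false
        System.out.println(good.matches(LETTERS_ONLY));  // true

        // The A-z range also accepts punctuation such as '_'.
        System.out.println("_".matches("[A-z]"));        // true
    }
}
```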
Izruo
  • This works well, but spaces and apostrophes are both allowed. It's just commas (since they separate each entry) that's not allowed. Is there any alternative to account for this? – rexorsist Feb 26 '18 at 23:38
  • @rexorsist You can work on this within the first group now. For example to allow space, simply add a space to it; or to only disallow commas, use `[^,]` and so on. Only take care to escape meaningful pattern characters (e.g. `.`, as defined in [`java.util.regex.Pattern`](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html)). – Izruo Feb 26 '18 at 23:40

If you do use regex to solve this, I'd recommend allowing commas in the 'Name' part of your regex. Focus on making sure there are 6 numbers, each following a comma. You can check whether the name meets the appropriate criteria later.

import java.util.regex.Pattern;
import java.util.regex.Matcher;


// before your for-loop, create a pattern (Assuming no digits in title)
Pattern p = Pattern.compile("([^0-9]+),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");

// ...
// later on in your actual for-loop for each line.
Matcher m = p.matcher(line);

if (m.matches())
{
    String title = m.group(1);
    // do extra checking for the title if needed
}
else
{
    // print no
}
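
One way the "extra checking for the title" could look is sketched here, under the assumption that the only extra rule is "no comma in the title". The class name `TwoStageCheck` and the method `followsFormat` are made up for the illustration; the pattern is the one from the snippet above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStageCheck {
    // Stage 1 pattern from the answer: the title is any run of non-digits.
    static final Pattern LINE = Pattern.compile(
        "([^0-9]+),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");

    // Hypothetical helper combining both stages.
    static boolean followsFormat(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return false;            // wrong number of values or bad range
        }
        String title = m.group(1);
        // Stage 2 (assumed rule): the title itself must not contain a comma.
        return !title.contains(",");
    }

    public static void main(String[] args) {
        System.out.println(followsFormat("The Killer,6,7,3,6,8,1"));   // true
        System.out.println(followsFormat("The Ki,ller,6,7,3,6,8,1"));  // false
    }
}
```

Splitting the check this way makes it possible to report why a line was rejected, at the cost of a second pass over the title.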
black panda

The following regex should solve your problem:

^([a-zA-Z ]+),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)

Or the shorter version of it, with no code duplication:

^([a-zA-Z ]+)(,([0-9]|10)){6}

Testing

"The Killer,6,7,3,6,8,1" matches the pattern.

"The Kill,er,6,7,3,6,8,1" doesn't match the pattern, as you wanted.

Also, spaces in the title are supported.
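
A small sketch of the shorter version, with illustrative names (`QuantifierDemo`, `lastValue`). Two details are worth knowing: with `String.matches()` or `Matcher.matches()` the whole input must match anyway, so the leading `^` is redundant but harmless; and a capturing group inside a `{6}` repetition only keeps the text of its last iteration, so the shorter pattern is suited to validation rather than to reading all six values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // The shorter pattern from this answer.
    static final Pattern SHORT = Pattern.compile("^([a-zA-Z ]+)(,([0-9]|10)){6}");

    // Hypothetical helper: the value captured by group 3, or null if no match.
    static String lastValue(String line) {
        Matcher m = SHORT.matcher(line);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Group 3 sits inside the repetition, so only the sixth value survives.
        System.out.println(lastValue("The Killer,6,7,3,6,8,1"));   // 1
        System.out.println(lastValue("The Kill,er,6,7,3,6,8,1"));  // null
    }
}
```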


SHG