
The set of hands I need to grep for is defined as:

{h ∈ H | h contains only cards in ascending order regardless of their suit} 

Example:

h = Ah2c2d3s5h6d8s8d9h9cTdTcKh   
h != 3d4dQc3sKcAh2sAc7hKdKsKh4h62 (Q is followed by lower rank 3)

The ascending ranks of cards are:

A(ace) 2 3 4 5 6 7 8 9 T(ten) J Q K 

The suits are defined as such:

c(clover) s(spade) h(heart) d(diamond)

I have tried the following grep, and it seems to be correct (its output has the same number of lines as the given solution), but I still don't understand why it works.

Edit: added the -P flag, which I had forgotten. As tripleee pointed out, with plain grep -v the pattern is indeed invalid.

 grep -Pv "[KQJT].*[2-9A].* |[KQ].*[JT].* |[6-9].*[2-5A].* "

What baffles me is how this pattern catches K followed by Q, or even 5 followed by [A2-4].
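One way to see what the pattern actually does is to feed it a few hand-made lines. This is a minimal sketch, assuming each input line holds a hand followed by a space (as the trailing space in the pattern suggests); the sample lines and the function name questioner_filter are hypothetical. It uses -E rather than -P, since the pattern uses no PCRE-specific syntax.

```shell
#!/bin/sh
# The questioner's pattern, written as an ERE so that any grep -E accepts it.
pattern='[KQJT].*[2-9A].* |[KQ].*[JT].* |[6-9].*[2-5A].* '

# Hypothetical helper: keep the lines that match none of the three alternatives.
questioner_filter() {
    grep -Ev "$pattern"
}

# Hypothetical sample lines: a hand, a space, then a dummy field "x".
# KhQd (K before Q) and 5h3d (5 before 3) are both out of order;
# Kh2d is out of order too, but it is the only one the pattern catches.
printf 'KhQd x\nKh2d x\n5h3d x\n' | questioner_filter
```

Both KhQd and 5h3d survive the filter: no alternative pairs K with a following Q, and 5 is outside the [6-9] range of the third alternative. So the pattern does not catch these cases at all.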

The given solution has a total of 31027 lines.

The text file provided for the exercise can be found here: http://computergebruik.ugent.be/oefeningenreeks1/kaarten1.txt

Kortika
  • Possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Biffen Dec 28 '16 at 12:42

1 Answer


Your regex is not at all valid, so I don't understand why you say it works.

Plain grep does not understand | to mean alternation. You can add the -E option to select ERE (traditionally, egrep) regex semantics; with GNU grep you can instead backslash the | in a basic regex; or you can specify multiple -e options. (See e.g. https://en.wikipedia.org/wiki/Regular_expression#Standards for some background about the various regex dialects in common use.)

grep -Ev "[KQJT].*[2-9A].* |[KQ].*[JT].* |[6-9].*[2-5A].* "
grep -v  "[KQJT].*[2-9A].* \|[KQ].*[JT].* \|[6-9].*[2-5A].* "
grep -ve "[KQJT].*[2-9A].* " -e "[KQ].*[JT].* " -e "[6-9].*[2-5A].* "
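The difference between the dialects can be seen with a tiny sketch. The two input lines and the function name sample are hypothetical; the first line deliberately contains a literal | so that the basic-regex case has something to match.

```shell
#!/bin/sh
# Hypothetical input: the first line contains a literal "|".
sample() {
    printf 'K|Q x\nKh x\n'
}

# Basic regex (plain grep): "|" is an ordinary character, so only the line
# containing the literal text "K|Q" matches.
sample | grep 'K|Q'

# Extended regex (-E): "|" is alternation, so any line containing K or Q matches.
sample | grep -E 'K|Q'
```

The first command prints only K|Q x; the second prints both lines.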

Even with this fix, the regex is obviously insufficient: it does not remove hands where e.g. 3 is followed by 2. The only way to make a negative pattern like this cover all cases is to enumerate every possibility (disallow 2 followed by A, 3 followed by A or 2, 4 followed by any lower rank, and so on). An altogether better approach would be to use a scripting language of some sort: map the symbols to ones with the desired sort order, then check whether the input is sorted.
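The scripting approach could look like the following minimal sketch. It uses awk as a stand-in for "a scripting language of some sort" and assumes each line begins with the hand followed by whitespace; the function name ascending_hands is hypothetical. Rather than remapping symbols and sorting, it looks each rank up in the string A23456789TJQK and checks that the resulting positions never decrease, which amounts to the same sortedness test.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines whose hand, assumed to be the
# first whitespace-separated field, lists its ranks in ascending order.
# Reads the files given as arguments, or stdin if there are none.
ascending_hands() {
    awk '{
        hand = $1
        prev = 0
        ok = 1
        # Ranks sit at odd character positions (1, 3, 5, ...), suits in between.
        for (i = 1; i <= length(hand); i += 2) {
            # Map the rank symbol to its position in the desired sort order.
            r = index("A23456789TJQK", substr(hand, i, 1))
            if (r < prev) { ok = 0; break }
            prev = r
        }
        if (ok) print
    }' "$@"
}

# Demo on the valid example hand from the question and two hypothetical
# out-of-order hands; only the first line is printed.
printf 'Ah2c2d3s5h6d8s8d9h9cTdTcKh x\nKhQd x\n5h3d x\n' | ascending_hands
```

On the exercise file this would be run as ascending_hands kaarten1.txt, provided the hand really is the first field of each line.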

If that is not an option, maybe try

grep -E '^(A.)*(2.)*(3.)*(4.)*(5.)*(6.)*(7.)*(8.)*(9.)*(T.)*(J.)*(Q.)*(K.)* '

which looks for zero or more aces, followed by zero or more twos, followed by zero or more threes, etc.
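A quick sanity check of this positive pattern, as a sketch with hypothetical sample lines (a hand, a space, then a dummy field "x"); the variable name ascending_re is illustrative.

```shell
#!/bin/sh
# Runs of aces, then twos, ..., then kings; each rank is followed by one
# arbitrary suit character, and the hand must be followed by a space.
ascending_re='^(A.)*(2.)*(3.)*(4.)*(5.)*(6.)*(7.)*(8.)*(9.)*(T.)*(J.)*(Q.)*(K.)* '

# The valid example hand from the question and a hypothetical out-of-order
# hand (Q before 3); only the first line is printed.
printf 'Ah2c2d3s5h6d8s8d9h9cTdTcKh x\nQc3s x\n' | grep -E "$ascending_re"
```

The second line fails because after the (Q.)* group only kings may follow before the space, and the next rank is 3.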

tripleee
  • Maybe see also http://stackoverflow.com/questions/2298007/why-are-there-so-many-different-regular-expression-dialects – tripleee Dec 28 '16 at 12:57
  • Ah yes, I forgot that I was using the -P flag. I said it looks like it works because the number of lines matched the given solution. I still don't have the correct feedback from my teaching assistants. I had indeed thought about writing out every possible combination, but the exercise asked for the simplest form, so I thought what I had written was correct. – Kortika Dec 28 '16 at 13:07
  • With `-P` you basically get a superset of `-E`, but this is a nonstandard extension. – tripleee Dec 28 '16 at 13:08
  • Thank you very much for the detailed feedback. I also tried debug links as given by @Biffen and found out that indeed what I have written doesn't match the given criteria. – Kortika Dec 28 '16 at 13:13
  • To be clear, `-P` is more than nonstandard; to quote [the man page](https://www.gnu.org/software/grep/manual/grep.html), `This is highly experimental`, so YMMV with using it and it should be avoided for anything important. – Ed Morton Dec 28 '16 at 21:34
  • Indeed, the fact that BSD implemented it, then later took it out should drive home this point. – tripleee Dec 29 '16 at 04:49