
This is my first time posting on here so bear with me please.

I received a bash assignment but my professor is completely unhelpful and so are his notes.

Our assignment is to filter and print out palindromes from a file. In this case, the file is:

/usr/share/dict/words

Word lengths should range from 3 to 45, and only words made up entirely of lowercase letters should count (the dictionary also contains words with other characters, such as "-dkas-das", and words with uppercase letters). So something like "q-evvavve-q" may count as a palindrome, but I shouldn't get it as a result.

Anyway, I can get it to filter words of a given length and return the palindromes among them (without restricting them to lowercase, though):

grep "^...$" /usr/share/dict/words |
grep "\(.\).\1" 

And I can write similar lines for 5-letter words, 7-letter words, and so on:

grep "^.....$" /usr/share/dict/words |
grep "\(.\)\(.\).\2\1" 

But the prof does not want that. We are supposed to use a loop. I get the concept but I don't know the syntax, and like I said, the notes are very unhelpful.

What I tried was setting variables x=... and y=.. and then, in a while loop, doing x=$x$y, but that gave me a syntax error, and x+=.. didn't work either.
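For what it's worth, both forms of string concatenation are valid bash as long as there are no spaces around = or +=. The failing code isn't shown, so stray spaces are only a guess at the cause of the syntax error. A minimal sketch using the variable names from the question, growing an anchored dot pattern in a while loop (capped at 9 characters here just to keep the output short):

```shell
#!/usr/bin/env bash
x=...   # three dots: the shortest word length
y=..    # adding two dots per step keeps the length odd (3, 5, 7, ...)
patterns=()
while (( ${#x} <= 9 )); do
    patterns+=("^$x\$")   # anchored pattern for exactly ${#x} characters
    x=$x$y                # equivalent to x+=$y; no spaces around = or +=
done
printf '%s\n' "${patterns[@]}"
```

Each element of patterns could then be handed to the first grep in place of a hand-written "^.....$".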

Any help is appreciated, even just getting the non-lowercase words filtered out.

Thanks!

EDIT:

If you're providing a solution or a hint to a solution, the simplest method is preferred, ideally one that uses 2 grep statements and a loop.

Thanks again.

Greg

5 Answers


Like this:

for word in $(grep -E '^[a-z]{3,45}$' /usr/share/dict/words); do
    [ "$word" = "$(echo "$word" | rev)" ] && echo "$word"
done

Output using my dictionary:

aha
bib
bob
boob
...
wow

Update

As pointed out in the comments, putting most of the dictionary into the for loop's word list is not very efficient and can trigger errors in some shells. Here's an updated version:

grep -E '^[a-z]{3,45}$' /usr/share/dict/words | while IFS= read -r word; do
    [ "$word" = "$(echo "$word" | rev)" ] && echo "$word"
done
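Running rev once per word is what makes this slow on a dictionary of several hundred thousand words. A sketch of a single-pass alternative, assuming bash (for the <(...) process substitution) and the usual paste, rev and awk utilities: filter the list, reverse the whole list with a single rev call, put each word next to its reversal, and keep the lines where the two columns match. The function name pal_words is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lowercase palindromes of length 3-45 in a word list.
pal_words() {
    local dict=$1
    # Column 1: each filtered word; column 2: the same word reversed by one rev call.
    paste -d ' ' \
        <(grep -E '^[a-z]{3,45}$' "$dict") \
        <(grep -E '^[a-z]{3,45}$' "$dict" | rev) |
        awk '$1 == $2 { print $1 }'   # keep only words equal to their reversal
}
```

Usage: pal_words /usr/share/dict/words. It happens to use exactly two grep calls, though no loop.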
Robby Cornelissen
  • Alright, it DOES work but for me, it lists the words and I have to hit ENTER for every word. If it qualifies, it'll show up, if not, it stays blank. Plus, I have to use grep, but thanks still. – Greg Oct 28 '14 at 05:30
  • Having to hit enter after every echo is strange. Isn't it just slow? I am using `grep` in my solution though: `egrep` is the same as `grep -E`. – Robby Cornelissen Oct 28 '14 at 05:33
  • Hmmm, it's still doing the thing where I have to hit enter. If I remember, the first time I copied it in, it worked but crashed. I'm using PuTTY if that makes any difference. – Greg Oct 28 '14 at 06:17
    In some shells, this will produce [`argument list too long`](http://www.in-ulm.de/~mascheck/various/argmax/) because you are effectively reading in the entire dictionary as the argument to the `for` loop. It would be more economical to use `grep ... /usr/share/dict/words | while read -r word; do` ... and still, running an external process on each word is going to take a long time. – tripleee Oct 28 '14 at 06:20
  • Gotcha. I really don't know how long this is optimally supposed to take, given what we've learned in class and what the prof expects of us. There are over 479,000 words it sorts through. If I get what you're saying, I think what tripleee was also saying about going over in one pass is the key. I don't know why he would want a loop then, but I guess a loop can be implemented as well as only doing one pass maybe? It may just need a bunch of more thought on my part. – Greg Oct 28 '14 at 06:29

The multiple greps are wasteful. You can simply do

grep -E '^([a-z])[a-z]\1$' /usr/share/dict/words

in one fell swoop. Similarly, you can put several expressions on grep's standard input like this:

echo '^([a-z])[a-z]\1$
^([a-z])([a-z])\2\1$
^([a-z])([a-z])[a-z]\2\1$' | grep -E -f - /usr/share/dict/words

However, regular grep does not permit backreferences beyond \9. With grep -P you can use double-digit backreferences, too.

The following script constructs the entire expression in a loop. Unfortunately, grep -P does not allow for the -f option, so we build a big thumpin' variable to hold the pattern. Then we can actually also simplify to a single pattern of the form ^(.)(?:.|(.)(?:.|(.)…\3)?\2)?\1$, except we use [a-z] instead of . to restrict to just lowercase.

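head=''
tail=''
for i in $(seq 1 22); do
    # Open level i: capture a letter, then either one middle letter or the next level.
    head="$head([a-z])(?:[a-z]|"
    # Prepend back-reference \i; ")?" closes the optional group of the enclosing level.
    tail="\\$i${tail:+)?}$tail"
done
grep -P "^${head%|})?$tail$" /usr/share/dict/words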

The single grep should be a lot faster than individually invoking grep 22 or 43 times on the large input file. If you want to sort by length, just add that as a filter at the end of the pipeline; it should still be way faster than multiple passes over the entire dictionary.
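One way to add such a length filter, assuming awk, sort (with the -s stable option that GNU and BSD sort provide) and cut are available: prefix each match with its length, sort numerically on that prefix, then strip it again. The function name by_length is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reorder input lines by length, shortest first.
by_length() {
    # Prefix each line with its length, sort numerically on that key only
    # (-s keeps ties in input order), then drop the length column again.
    awk '{ print length($0), $0 }' | sort -s -n -k1,1 | cut -d ' ' -f 2-
}
```

Usage: grep -P "^${head%|})?$tail$" /usr/share/dict/words | by_length. The sort only sees the matches, so it adds little compared with the single pass over the dictionary.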

The expression ${tail:+)?} evaluates to a closing parenthesis and question mark only when tail is non-empty, which is a convenient way to force the \1 back-reference to be non-optional. Somewhat similarly, ${head%|} trims the final alternation operator from the ultimate value of $head.
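To see what those two expansions produce, here is a sketch that wraps the same loop in a function (the name build_pal_pattern is made up for this example) and prints the pattern for a nesting depth of 2 instead of 22:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the palindrome pattern for a given nesting depth.
build_pal_pattern() {
    local depth=$1 head='' tail='' i
    for (( i = 1; i <= depth; i++ )); do
        head="$head([a-z])(?:[a-z]|"
        tail="\\$i${tail:+)?}$tail"     # no ")?" after \1 on the first pass
    done
    # ${head%|} drops the trailing "|", and ")?" closes the innermost group.
    printf '%s\n' "^${head%|})?$tail\$"
}

build_pal_pattern 2   # prints ^([a-z])(?:[a-z]|([a-z])(?:[a-z])?\2)?\1$
```

With depth 22 the same function yields the pattern used above, which covers lengths up to 45.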

tripleee
  • Okay, I think this definitely helps. The prof doesn't want more than 2 grep statements anyways. Again, I'm pretty much completely new to bash and have only had limited programming experience. I guess if I break this down and change a few things, it may work. Thanks. – Greg Oct 28 '14 at 06:16
  • Updated to only look at lowercase words -- using `[a-z]` instead of `.`. If you want case-insensitive matching, add an `-i` option to `grep`. You could extend the range to `[-a-z]` if you want to include literal hyphens. – tripleee Oct 28 '14 at 08:28
  • Further updated to use `grep -P` instead, in order to accommodate backreferences beyond `\9`. – tripleee Oct 28 '14 at 09:51
  • You could also add apostrophe to the character class `[-'a-z]` -- I have the hit `ma'am` in my `/usr/share/dict/words`. – tripleee Oct 28 '14 at 09:52
2

Why use grep? Bash will happily do that for you:

#!/bin/bash

is_pal() {
    local w=$1
    while (( ${#w} > 1 )); do
        [[ ${w:0:1} = "${w: -1}" ]] || return 1
        w=${w:1:-1}
    done
}

while IFS= read -r word; do
    is_pal "$word" && echo "$word"
done

Save this as banana, run chmod +x banana, and enjoy:

./banana < /usr/share/dict/words

If you only want to keep the words with at least three characters (the grep pattern ... matches any three characters):

grep ... /usr/share/dict/words | ./banana

If you only want the words that contain nothing but lowercase letters and have at least three of them:

grep '^[[:lower:]]\{3,\}$' /usr/share/dict/words | ./banana
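The w=${w:1:-1} step relies on a negative substring length, which bash only accepts from version 4.2 on; older versions (such as the bash 3.2 that macOS ships) reject it with an error. A sketch of the same check that computes the length explicitly instead; the name is_pal_compat is hypothetical, and it can be used in the script above in place of is_pal.

```shell
#!/usr/bin/env bash
# Hypothetical variant of is_pal that avoids negative substring lengths.
is_pal_compat() {
    local w=$1
    while (( ${#w} > 1 )); do
        # Compare first and last character (quoted so the right side is not a glob pattern).
        [[ ${w:0:1} == "${w:${#w}-1:1}" ]] || return 1
        # Drop both ends: start at offset 1 and keep ${#w}-2 characters.
        w=${w:1:${#w}-2}
    done
}
```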
gniourf_gniourf
0

OK, here is something to get you started:

I suggest sticking with the plan you have above and just generating the right number of "." characters with a for loop.

This question explains how to write a for loop from 3 to 45:

How do I iterate over a range of numbers defined by variables in Bash?

for i in {3..45}; do
    # put your code here
done

Now you just need to figure out how to make "i" dots in your first grep (and the matching back-reference pattern in your second one), and you are done.
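A sketch of that plan, assuming bash and a grep whose basic regular expressions support back-references \1 to \9. Because there are only nine back-references, this pair of greps covers lengths 3 to 19 only; longer words need something like grep -P. The function name pals_by_length is hypothetical. For each length, the first grep keeps words of exactly that many characters; the second builds one \([a-z]\) group per letter of the first half and mirrors them with back-references in reverse order, which also restricts matches to lowercase.

```shell
#!/usr/bin/env bash
# Hypothetical helper: two greps per word length, patterns built in a loop.
pals_by_length() {
    local dict=$1 len k half dots open close middle
    for (( len = 3; len <= 19; len++ )); do
        dots=''                          # exactly len dots, for the length filter
        for (( k = 0; k < len; k++ )); do
            dots+='.'
        done

        half=$(( len / 2 ))
        open=''                          # \([a-z]\) once per letter of the first half
        close=''                         # \half ... \2\1, mirroring the first half
        for (( k = 1; k <= half; k++ )); do
            open+='\([a-z]\)'
            close="\\$k$close"
        done

        middle=''                        # odd lengths have one free middle letter
        if (( len % 2 == 1 )); then
            middle='[a-z]'
        fi

        grep "^$dots\$" "$dict" | grep "^$open$middle$close\$"
    done
    return 0   # grep exits 1 for lengths with no matches; that is not an error here
}
```

Usage: pals_by_length /usr/share/dict/words. The output comes out grouped by length, shortest first.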

Also, look into sed; it can nuke the non-lowercase answers for you.

user230910
0

Another solution uses a Perl-compatible regular expression (PCRE) with a lookahead and a self-referencing capture group, heavily inspired by this answer:

grep -P '^(?:([a-z])(?=[a-z]*(\1(?(2)\2))$))++[a-z]?\2?$' /usr/share/dict/words
Robby Cornelissen