bash check if a string has a character more than once

Question

The title actually almost explains it all. I would like to check if a string contains a letter (not a specific letter, really any letter) more than once.

for example:

user:

test.sh this list

script:

if [ "$1" has some letter more than once ]
then 
do something
fi

What output do you expect? Yes or no, the letter, or the number of occurrences? – Gerard Rozsavolgyi, Dec 13 '15 at 13:11

score 2 · Accepted Answer · edited May 23 '17 at 11:59


Use a Posix character class:

if [[ $1 =~ [[:alpha:]].*[[:alpha:]] ]]; then
  echo "more than one letter"
fi
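As a minimal sketch of what this pattern accepts, the test can be wrapped in a function and tried on a few inputs. The function name `has_two_letters` and the sample words are made up for illustration. Note that the pattern succeeds for any two letters, not only for a letter that repeats.

```bash
#!/bin/bash
# Hypothetical helper: succeeds when the argument contains at least two letters.
has_two_letters() {
    [[ $1 =~ [[:alpha:]].*[[:alpha:]] ]]
}

# "ab" passes as well as "aa": the two letters do not have to be the same one.
for word in "aa" "ab" "a1" "this list"; do
    if has_two_letters "$word"; then
        echo "$word: at least two letters"
    else
        echo "$word: fewer than two letters"
    fi
done
```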

edited May 23 '17 at 11:59

Community


answered Dec 13 '15 at 13:13

Cyrus


One more question. Would the code also work if you replaced the [[:alpha:]]'s with [[a-z]]? – Joey Dec 13 '15 at 13:23
If you only want small letters use a [range](http://stackoverflow.com/questions/1545751/how-to-back-reference-inner-selections-in-a-regular-expression/1553171#1553171) with only one bracket on each side: [a-z].*[a-z] – Cyrus Dec 13 '15 at 14:25
This just checks that a string has at least two letters; I thought the goal was to have something like `aa` be accepted but `ab` rejected. – chepner Dec 13 '15 at 15:28
@Cyrus Well ... a range will work correctly for the ASCII characters between a and z if, and only if, LC_COLLATE (or LANG) is set to C. More detail in my answer. – Dec 14 '15 at 03:27
@chepner I did it (detect `aa`, I mean) !!. – Dec 14 '15 at 03:28

score 0 · Answer 2 · 2015-12-14T03:16:51.603

This regex (in bash) checks whether a lowercase letter is repeated and reports which letter it is:

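#!/bin/bash
# Capture one lowercase letter, then require the same letter again later on.
# Note: \1 back-references in =~ rely on the system regex library (glibc supports them).
regex="([a-z]).*\1"
if [[ $1 =~ $regex ]]; then
    echo "more than one letter \"${BASH_REMATCH[1]}\""
fi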

Call as:

$ script.sh "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZz"
more than one letter "z"

Of course, the range could be changed to cover both lowercase and uppercase letters:

[a-zA-Z]

But this holds only if LC_COLLATE is set to "C". In a UTF-8 locale, accented characters may also fall inside the a-z range, as this shows:

$ ./sc.sh abcdefghijklémnopéqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZz
more than one letter "é"

Setting LC_COLLATE=C restricts the range to what ASCII considers a letter:

$ LC_COLLATE=C ./sc.sh abcdefghijklémnopéqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZz
more than one letter "z"

The range could also be replaced by one of the character classes:

[[:word:]] [[:alpha:]] [[:lower:]] [[:upper:]]

Please note that what those classes match also depends on the locale in use.
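Back-references are not part of POSIX extended regular expressions, so the `\1` approach depends on the regex library bash is linked against. As a hedged alternative (a different technique, not the regex above), here is a sketch that walks the string and remembers the letters already seen. It assumes bash 4 or later for associative arrays, and the function name `first_repeated_letter` is hypothetical. It reports the letter whose second occurrence comes first, which can differ from the letter the regex captures.

```bash
#!/bin/bash
# Hypothetical helper: prints the first letter seen twice in $1 and returns 0,
# or returns 1 when no letter occurs more than once.
first_repeated_letter() {
    local str=$1 ch i
    local -A seen=()                           # associative array, bash 4+
    for (( i = 0; i < ${#str}; i++ )); do
        ch=${str:i:1}
        [[ $ch == [[:alpha:]] ]] || continue   # skip anything that is not a letter
        if [[ -n ${seen[$ch]} ]]; then
            printf '%s\n' "$ch"
            return 0
        fi
        seen[$ch]=1
    done
    return 1
}

first_repeated_letter "this list"   # prints: i
```

As with the ranges above, what `[[:alpha:]]` counts as a letter depends on the locale.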

score 0 · Answer 3 · answered Dec 15 '15 at 04:47

If you want to use just basic commands, you can use something like this:

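#!/bin/bash
PATH=/bin/:/usr/bin/:$PATH
# Put each character on its own line, count occurrences with uniq -c,
# then drop the lines whose count is 1. (\n in the sed replacement needs GNU sed.)
if [ "$(echo "$*" | tr -d ' ' | sed 's/\(.\)/\1\n/g' | sort | uniq -c | tr -s ' ' | sort -n | grep -v '^ 1 ' | wc -l)" -ge 1 ]
then
    echo "Input contains duplicate characters"
fi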

In case it is unclear, it is easy to try out each step on the command line. Run `echo test input | tr -d ' '` and look at the output, then add the sed part to it, and so on.

The first tr -d ' ' ensures that spaces in your input are not counted as duplicates. For example, if the input is "abcd efgh ijkl", the only repeated character is the space. With tr -d ' ' in place, the script does not report duplicate characters for that input; if you remove it, the script does.

Cheers.

-- Parag
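The same pipeline idea can be sketched more compactly by letting `uniq -d` print only the lines that occur more than once. This is a variant under assumptions, not the answer's own code: the function name `duplicate_chars` is hypothetical, and `grep -o` is available in GNU and BSD grep but is not required by POSIX.

```bash
#!/bin/bash
# Hypothetical helper: prints every character (spaces ignored) that occurs
# more than once in its arguments, one per line.
duplicate_chars() {
    printf '%s\n' "$*" | tr -d ' ' | grep -o . | sort | uniq -d
}

# An empty result means that no character is repeated.
if [ -n "$(duplicate_chars "$@")" ]; then
    echo "Input contains duplicate characters"
fi
```

Calling `duplicate_chars "this list"` prints i, s and t on separate lines, so the repeated characters themselves are available if needed.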
