88

I have a variable like this:

words="这是一条狗。"

I want to make a for loop on each of the characters, one at a time, e.g. first character="这", then character="是", character="一", etc.

The only way I know is to output each character to a separate line in a file, then use while read line, but this seems very inefficient.

  • How can I process each character in a string through a for loop?
Mateusz Piotrowski
Village
  • 3
    It might be worth mentioning that we see a lot of newbie questions where the OP *thinks* this is what they want to do. Very often, a better solution which does not require each character to be processed individually is possible. This is known as an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) and the proper solution is to explain what you actually want to *accomplish* in your question, not just how to execute the steps you think will help you get there. – tripleee Jan 10 '18 at 10:07

15 Answers

241

You can use a C-style for loop:

foo=string
for (( i=0; i<${#foo}; i++ )); do
  echo "${foo:$i:1}"
done

${#foo} expands to the length of foo. ${foo:$i:1} expands to the substring starting at position $i of length 1.
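
As a minimal sketch, the same loop can be wrapped in a function. The name each_char is hypothetical and not part of the answer. It assumes bash; whether a multibyte character such as 这 counts as one character depends on bash running in a UTF-8 locale.

```bash
#!/usr/bin/env bash
# each_char is a hypothetical helper name used only for illustration.
# It prints every character of its first argument on its own line.
each_char() {
  local s=$1 i
  for (( i = 0; i < ${#s}; i++ )); do
    # ${s:i:1} is the one-character substring at offset i.
    # printf is used instead of echo so a string like "-n" prints literally.
    printf '%s\n' "${s:i:1}"
  done
}

each_char "a b"
```

Quoting the expansion keeps whitespace characters intact, so the space in "a b" is printed as a line of its own.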

chepner
  • Why do you need two sets of brackets around the for statement for it to work? – tgun926 Aug 16 '15 at 06:41
  • 1
    That's the syntax `bash` requires. – chepner Aug 16 '15 at 13:03
  • 3
    I know this is old, but, the two parentheses are required because they allow for arithmetic operations. See here => http://tldp.org/LDP/abs/html/dblparens.html – Hannibal Apr 23 '16 at 07:21
  • 9
    @Hannibal I just wanted to point out that this particular use of double parentheses is actually the bash construct: `for (( _expr_ ; _expr_ ; _expr_ )) ; do _command_ ; done` and not the same as $((_expr_)) nor ((_expr_)). In all three bash constructs, _expr_ is treated the same and $((_expr_)) is also POSIX. – nabin-info Jan 22 '17 at 06:26
  • `"${foo:i:1}"` would work as well - Bash does expand `i` without the `$` sign as it is an array operation. – codeforester Jun 11 '17 at 23:42
  • 2
    @codeforester That has nothing to do with arrays; it's just one of many expressions in `bash` that is evaluated in an arithmetic context. – chepner Jun 12 '17 at 00:39
  • I don't think it is POSIX unfortunately: https://stackoverflow.com/questions/51052475/how-to-iterate-over-the-characters-of-a-string-in-a-posix-shell-script Why do I like to suffer like this? – Ciro Santilli新疆棉花TRUMP BAN BAD Feb 15 '19 at 22:54
  • This should be the marked answer, as it does not rely on external tools or encoding. – Brandon Miller Oct 28 '20 at 18:36
47

With sed in the dash shell and LANG=en_US.UTF-8, the following worked correctly:

$ echo "你好嗎 新年好。全型句號" | sed -e 's/\(.\)/\1\n/g'
你
好
嗎

新
年
好
。
全
型
句
號

and

$ echo "Hello world" | sed -e 's/\(.\)/\1\n/g'
H
e
l
l
o

w
o
r
l
d

Thus, output can be looped with while read ... ; do ... ; done
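
A rough sketch of such a loop follows. The function name sed_chars is hypothetical, and the code assumes GNU sed, where \n in the replacement inserts a newline as in the commands above. Setting IFS to empty for read keeps a space character from being stripped.

```bash
#!/usr/bin/env bash
# sed_chars is a hypothetical helper name. It assumes GNU sed, where \n in the
# replacement text inserts a newline (as in the answer's own command).
sed_chars() {
  printf '%s\n' "$1" | sed -e 's/\(.\)/\1\n/g' |
  while IFS= read -r ch; do
    # sed leaves one empty line at the end (the input's own newline); skip it.
    [ -n "$ch" ] || continue
    printf '[%s]\n' "$ch"
  done
}

sed_chars "Hello world"
```

Because the loop sits at the end of a pipeline, variables set inside it are not visible afterwards; that is fine when the body only prints.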

Edit: the sample text translated into English:

"你好嗎 新年好。全型句號" is zh_TW.UTF-8 encoded text meaning:
"你好嗎"     = How are you[ doing]
" "         = a normal space character
"新年好"     = Happy new year
"。全型句號" = a full-width full stop followed by its text description
Rony
37

${#var} returns the length of var

${var:pos:N} returns N characters from pos onwards

Examples:

$ words="abc"
$ echo ${words:0:1}
a
$ echo ${words:1:1}
b
$ echo ${words:2:1}
c

So it is easy to iterate.

Another way:

$ grep -o . <<< "abc"
a
b
c

or

$ grep -o . <<< "abc" | while read letter;  do echo "my letter is $letter" ; done 

my letter is a
my letter is b
my letter is c
sebix
Tiago Peczenyj
  • 1
    what about whitespace? – Leandro Aug 07 '14 at 01:32
  • What *about* whitespace? A whitespace character is a character and this loops over all characters. (Though you should take care to use double quotes around any variable or string which contains significant whitespace. More generally, always quote everything unless [you know what you are doing.](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable)) – tripleee Jan 10 '18 at 10:09
24

I'm surprised no one has mentioned the obvious bash solution utilizing only while and read.

while read -n1 character; do
    echo "$character"
done < <(echo -n "$words")

Note the use of echo -n to avoid the extraneous newline at the end. printf is another good option and may be more suitable for your particular needs. If you want to ignore whitespace then replace "$words" with "${words// /}".

Another option is fold. Please note however that it should never be fed into a for loop. Rather, use a while loop as follows:

while read char; do
    echo "$char"
done < <(fold -w1 <<<"$words")

The primary benefit of using the external fold command (part of the coreutils package) is brevity. You can feed its output to another command such as xargs (part of the findutils package) as follows:

fold -w1 <<<"$words" | xargs -I% -- echo %

You'll want to replace the echo command used in the example above with the command you'd like to run against each character. Note that xargs will discard whitespace by default. You can use -d '\n' to disable that behavior.


Internationalization

I just tested fold with some of the Asian characters and realized it doesn't have Unicode support. So while it is fine for ASCII needs, it won't work for everyone. In that case there are some alternatives.

I'd probably replace fold -w1 with an awk array:

awk 'BEGIN{FS=""} {for (i=1;i<=NF;i++) print $i}'

Or the grep command mentioned in another answer:

grep -o .


Performance

FYI, I benchmarked the aforementioned options. The while loop and the fold loop were fast and nearly tied (in the results below the while loop is slightly ahead on wall-clock time and the fold loop on user CPU time). Unsurprisingly xargs was the slowest... about 75x slower.

Here is the (abbreviated) test code:

words=$(python -c 'from string import ascii_letters as l; print(l * 100)')

testrunner(){
    for test in test_while_loop test_fold_loop test_fold_xargs test_awk_loop test_grep_loop; do
        echo "$test"
        (time for (( i=1; i<$((${1:-100} + 1)); i++ )); do "$test"; done >/dev/null) 2>&1 | sed '/^$/d'
        echo
    done
}

testrunner 100

Here are the results:

test_while_loop
real    0m5.821s
user    0m5.322s
sys     0m0.526s

test_fold_loop
real    0m6.051s
user    0m5.260s
sys     0m0.822s

test_fold_xargs
real    7m13.444s
user    0m24.531s
sys     6m44.704s

test_awk_loop
real    0m6.507s
user    0m5.858s
sys     0m0.788s

test_grep_loop
real    0m6.179s
user    0m5.409s
sys     0m0.921s
Six
  • `character` is empty for whitespace with the simple `while read` solution, which may be problematic if different types of whitespace must be distinguished from each other. – pkfm Aug 29 '19 at 01:58
  • Nice solution. I found that changing `read -n1` to `read -N1` was needed to handle space characters correctly. – nielsen Feb 03 '20 at 08:13
16

I believe there is still no ideal solution that would correctly preserve all whitespace characters and is fast enough, so I'll post my answer. Using ${foo:$i:1} works, but is very slow, which is especially noticeable with large strings, as I will show below.

My idea is an expansion of a method proposed by Six, which involves read -n1, with some changes to keep all characters and work correctly for any string:

while IFS='' read -r -d '' -n 1 char; do
        # do something with $char
done < <(printf %s "$string")

How it works:

  • IFS='' - Setting the internal field separator to the empty string prevents read from stripping leading and trailing spaces and tabs. Putting the assignment on the same line as read means that it does not affect other shell commands.
  • -r - Means "raw", which prevents read from treating \ as an escape character (including \ at the end of a line as a line continuation).
  • -d '' - Passing an empty string as the delimiter prevents read from stopping at newline characters. It actually means that the null byte is used as the delimiter; -d '' is equivalent to -d $'\0'.
  • -n 1 - Means that one character at a time will be read.
  • printf %s "$string" - Using printf instead of echo -n is safer, because echo treats -n and -e as options. If you pass "-e" as a string, echo will not print anything.
  • < <(...) - Passing string to the loop using process substitution. If you use here-strings instead (done <<< "$string"), an extra newline character is appended at the end. Also, passing string through a pipe (printf %s "$string" | while ...) would make the loop run in a subshell, which means all variable operations are local within the loop.
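
To make the points above concrete, here is a small sketch that copies a string one character at a time and then checks that nothing was lost. The function name rebuild and the variables copy and original are hypothetical; the test string is just an assumption chosen to contain spaces, a tab, a newline and a backslash.

```bash
#!/usr/bin/env bash
# rebuild is a hypothetical name; it copies its argument one character at a
# time with the read loop described above and stores the result in `copy`.
rebuild() {
  local char
  copy=''
  while IFS='' read -r -d '' -n 1 char; do
    copy+=$char
  done < <(printf %s "$1")
}

# Leading/trailing spaces, a tab, a newline and a backslash.
original=$' two  spaces\ttab\nnewline\\backslash '
rebuild "$original"

if [[ $copy == "$original" ]]; then
  echo "identical"
else
  echo "different"
fi
```

Because the loop reads from process substitution rather than a pipe, copy is still set after the loop ends.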

Now, let's test the performance with a huge string. I used the following file as a source:
https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt
The following script was called through time command:

#!/bin/bash

# Saving contents of the file into a variable named `string'.
# This is for test purposes only. In real code, you should use
# `done < "filename"' construct if you wish to read from a file.
# Using `string="$(cat makefiles.txt)"' would strip trailing newlines.
IFS='' read -r -d '' string < makefiles.txt

while IFS='' read -r -d '' -n 1 char; do
        # remake the string by adding one character at a time
        new_string+="$char"
done < <(printf %s "$string")

# confirm that new string is identical to the original
diff -u makefiles.txt <(printf %s "$new_string")

And the result is:

$ time ./test.sh

real    0m1.161s
user    0m1.036s
sys     0m0.116s

As we can see, it is quite fast.
Next, I replaced the loop with one that uses parameter expansion:

for (( i=0 ; i<${#string}; i++ )); do
    new_string+="${string:$i:1}"
done

The output shows exactly how bad the performance loss is:

$ time ./test.sh

real    2m38.540s
user    2m34.916s
sys     0m3.576s

The exact numbers may vary on different systems, but the overall picture should be similar.

Thunderbeef
13

I've only tested this with ascii strings, but you could do something like:

while test -n "$words"; do
   c=${words:0:1}     # Get the first character
   echo character is "'$c'"
   words=${words:1}   # trim the first character
done
William Pursell
8

It is also possible to split the string into characters using fold and then iterate over the resulting words:

for char in $(echo "这是一条狗。" | fold -w1); do
    echo "$char"
done
sebix
8

The C style loop in @chepner's answer is in the shell function update_terminal_cwd, and the grep -o . solution is clever, but I was surprised not to see a solution using seq. Here's mine:

read word
for i in $(seq 1 ${#word}); do
  echo "${word:i-1:1}"
done
De Novo
2
#!/bin/bash

word=$(echo 'Your Message' |fold -w 1)

for letter in ${word} ; do echo "${letter} is a letter"; done

Here is the output:

Y is a letter
o is a letter
u is a letter
r is a letter
M is a letter
e is a letter
s is a letter
s is a letter
a is a letter
g is a letter
e is a letter

Simas Joneliunas
Evgeny
1

To iterate over ASCII characters in a POSIX-compliant shell, you can avoid external tools by using parameter expansion:

#!/bin/sh

str="Hello World!"

while [ ${#str} -gt 0 ]; do
    next=${str#?}
    echo "${str%"$next"}"   # quote $next so characters like * are matched literally
    str=$next
done

or

str="Hello World!"

while [ -n "$str" ]; do
    next=${str#?}
    echo "${str%"$next"}"   # quote $next so characters like * are matched literally
    str=$next
done
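
As a sketch of the same technique wrapped in a function (split_chars is a hypothetical name), the version below quotes the inner expansion so that pattern characters such as * and ? in the input are treated literally. Like the answer, it assumes ASCII input; in a shell that is not multibyte-aware, ? matches a single byte.

```sh
#!/bin/sh
# split_chars is a hypothetical helper name, not part of the answer above.
# It prints each character of its first argument on its own line.
split_chars() {
    rest=$1
    while [ -n "$rest" ]; do
        next=${rest#?}                    # everything after the first character
        printf '%s\n' "${rest%"$next"}"   # strip that suffix, leaving the first character
        rest=$next
    done
}

split_chars 'a*b?'
```

Without the inner quotes, the value of next would be used as a pattern, so a string containing * could lose the wrong suffix.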
nggit
1

sed works with unicode

IFS=$'\n'
for z in $(sed 's/./&\n/g' <(printf '你好嗎')); do
 echo hello: "$z"
done

outputs

hello: 你
hello: 好
hello: 嗎
Paul
0

Another approach, if you don't care about whitespace being ignored:

for char in $(sed -E s/'(.)'/'\1 '/g <<<"$your_string"); do
    # Handle $char here
done
0

Another way is:

Characters="TESTING"
index=1
while [ $index -le ${#Characters} ]
do
    echo ${Characters} | cut -c${index}-${index}
    index=$(expr $index + 1)
done
Javier Salas
-1

I share my solution:

read word

for char in $(grep -o . <<<"$word") ; do
    echo $char
done
-3
TEXT="hello world"
for i in {1..${#TEXT}}; do
   echo ${TEXT[i]}
done

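where {1..N} is an inclusive range

${#TEXT} is the number of characters in the string

${TEXT[i]} - you can get a character from the string like an item from an array

Note that this is zsh syntax. In bash, brace expansion runs before ${#TEXT} is expanded, so the range is not generated, and ${TEXT[i]} does not index into a string.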

Jason Roman