
I am trying to count all non-whitespace characters in a string, so that the example below would output 4.

my $str = 'A b C $'; 

my $cnt =~ /\S/g;  

print "Char Count: $cnt\n";

The question "Is there a Perl shortcut to count the number of matches in a string?" does not answer my question.

perl -e 'my $str = "A b C"; my $cnt = () = $str =~ /\./gi;  print "Chs: $cnt\n";'
Chs: 0

Someone keeps wanting to say that this question is a duplicate of "Is there a Perl shortcut to count the number of matches in a string?"

However, the other question is about matching a specific character; in their case I believe it's a dot. I looked at many other examples that match a specific character and could not get them to work for matching all characters except whitespace.

This does not answer my question.

$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";

The question "What do \S, \W, \D stand for in regex?" merely defines various Perl character-class escapes, one of which (\S) I attempted to use in my original question, to no avail. It does not demonstrate how to actually use one of them to obtain the count of all non-whitespace characters in a string.

gatorreina

2 Answers


From perlfaq4 (How can I count the number of occurrences of a substring within a string?):

Another version uses a global match in list context, then assigns the result to a scalar, producing a count of the number of matches.

You can also query the Perl documentation from your command line:

perldoc -q count

use warnings;
use strict;

my $str = 'A b C $'; 
my $cnt = () = $str =~ /\S/g;
print "Char Count: $cnt\n";
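For comparison with the question's `my $cnt =~ /\S/g;`: that line binds the match to the freshly declared, still-undefined `$cnt` and never assigns anything to it. The sketch below wraps the perlfaq4 idiom in a small sub to show where the count comes from. The name `count_non_ws` is made up for illustration and is not part of perlfaq4.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from perlfaq4):
# returns the number of non-whitespace characters in its argument.
sub count_non_ws {
    my ($str) = @_;

    # "() = ..." puts the match in list context, so /g returns every match.
    # A list assignment evaluated in scalar context yields the number of
    # elements on its right-hand side, which is the match count.
    my $cnt = () = $str =~ /\S/g;

    return $cnt;
}

print "Char Count: ", count_non_ws('A b C $'), "\n";   # prints "Char Count: 4"
```

The empty list on the left throws the matches themselves away; only their number is kept.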
toolic
require 5.014;
use feature qw( unicode_strings );

my $count = () = $str =~ /\S/g;

or

require 5.014;
use feature qw( unicode_strings );

my $count = 0;
++$count while $str =~ /\S/g;

or

# Count non-whitespace characters.
my $count = $str =~ tr/\x{0009}\x{000A}\x{000B}\x{000C}\x{000D}\x{0020}\x{0085}\x{00A0}\x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005}\x{2006}\x{2007}\x{2008}\x{2009}\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}//c;
  • The first can use up a lot of memory. It creates a scalar for each non-whitespace character.

  • The second doesn't. I'm not sure if it's faster or slower.

  • The third should be much faster, but you can't use prebuilt character classes.

  • require 5.014; use feature qw( unicode_strings ); (or just use 5.014;) is required for \s/\S to handle U+85 NEL and U+A0 NBSP correctly. (Higher versions are also fine.) Otherwise, whether each of these two characters is treated as a space or a non-space depends on how Perl happens to store the string internally, so the result looks "random":

    use feature qw( say );
    
    { local $_ = "abc\x{0085}\x{00A0}\x{2000}"; say scalar( () = /\S/g ); }  # 3
    { local $_ = "abc\x{0085}";                 say scalar( () = /\S/g ); }  # 4?!
    { local $_ = "abc\x{00A0}";                 say scalar( () = /\S/g ); }  # 4?!
    { local $_ = "abc\x{2000}";                 say scalar( () = /\S/g ); }  # 3
    
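Since the speed of the three approaches is left open above, here is a minimal sketch of how one might measure it with the core Benchmark module. The sub names (`by_list`, `by_while`, `by_tr`) and the sample string are assumptions made for illustration, and the `tr` list uses ranges to write the same 25 characters more compactly. No timings are claimed here; they will vary with the Perl version and the input.

```perl
require 5.014;
use strict;
use warnings;
use feature qw( unicode_strings );
use Benchmark qw( cmpthese );

# Hypothetical sample input: 100,000 ASCII characters, half of them spaces.
my $str = 'A b C $ ' x 12_500;

# 1. Global match in list context, counted via scalar assignment.
sub by_list {
    my ($s) = @_;
    my $n = () = $s =~ /\S/g;
    return $n;
}

# 2. Scalar-context global match, one increment per match.
sub by_while {
    my ($s) = @_;
    my $n = 0;
    ++$n while $s =~ /\S/g;
    return $n;
}

# 3. tr with /c: counts every character NOT in the whitespace list.
sub by_tr {
    my ($s) = @_;
    return $s =~ tr/\x{0009}-\x{000D}\x{0020}\x{0085}\x{00A0}\x{1680}\x{2000}-\x{200A}\x{2028}\x{2029}\x{202F}\x{205F}\x{3000}//c;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    list  => sub { by_list($str) },
    while => sub { by_while($str) },
    tr    => sub { by_tr($str) },
} );
```

Each sub copies its argument first, so the copying cost is the same for all three and the table mostly reflects the counting itself.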
ikegami