Perl regex to find keywords and not variables

Question

I'm trying to create a regex that behaves as follows:

print $time . "\n"; --> match only print because time is a variable ($ before)

$epoc = time(); --> match only time

My current regex is /(?-xism:\b(print|time)\b)/g, but it also matches time in $time in the first example.

Check here.

I tried things like [^\$] but then it doesn't match print anymore.

(I will have more keywords, like print|time|...|...)

Thanks

I'm not sure if what you're doing is right but as it seems to me, you only need a [negative lookbehind](http://stackoverflow.com/q/22937618): `(? — HamZa, Apr 13 '14 at 14:20
Thank you it's exactly this. Post it as an answer, I will validate it. — anasaitali, Apr 13 '14 at 14:24

score 7 · Answer 1 · answered Apr 13 '14 at 21:16

Parsing Perl code is a common and useful teaching tool, since the student must understand both the parsing techniques and the code that they're trying to parse.

However, to do this properly, the best advice is to use PPI.

The following script parses itself and outputs all of the barewords. If you wanted to, you could compare the list of barewords to the ones that you're trying to match. Note that this approach skips anything inside strings, comments, etc.

use strict;
use warnings;

use PPI;

#my $src = do {local $/; <DATA>};  # Could analyze the smaller code in __DATA__ instead
my $src = do {
    local @ARGV = $0;
    local $/;
    <>;
};

# Load a document
my $doc = PPI::Document->new( \$src );

# Find all the barewords within the doc
my $barewords = $doc->find( 'PPI::Token::Word' );
for (@$barewords) {
    print $_->content, "\n";
}

__DATA__
use strict;
use warnings;

my $time = time;

print $time . "\n";

Outputs:

use
strict
use
warnings
use
PPI
my
do
local
local
my
PPI::Document
new
my
find
for
print
content
__DATA__
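
The comparison step mentioned above could look something like the following. This is only a sketch: the keyword list and the two sample lines are taken from the question, the names %is_keyword and @hits are made up for illustration, and it assumes PPI is installed. It uses the same PPI calls as the script above.

```perl
use strict;
use warnings;

use PPI;

# Hypothetical keyword list, taken from the question.
my %is_keyword = map { $_ => 1 } qw(print time);

# The two sample lines from the question, held in a string.
my $src = <<'END_SRC';
print $time . "\n";
$epoc = time();
END_SRC

my $doc = PPI::Document->new( \$src )
    or die "Could not parse the source\n";

# find() returns a false value when nothing matches, hence the "|| []".
my $words = $doc->find('PPI::Token::Word') || [];

# Keep only the barewords that appear in the keyword list.
my @hits = grep { $is_keyword{$_} } map { $_->content } @$words;

print "$_\n" for @hits;
```

Because $time is tokenized as a symbol rather than a word, it never reaches the keyword check. Method names and similar barewords are also word tokens, so a keyword list that overlaps with them may need extra filtering.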

answered Apr 13 '14 at 21:16

Miller

**+1** for not using regex – HamZa Apr 13 '14 at 22:07
That looks great but I can't use it (```Can't locate PPI.pm in @INC```), and I'm only allowed to use modules that are already installed. – anasaitali Apr 15 '14 at 10:40
I've been noticing your fine regex style, but this solution is not regex! Upvoting for original and instructive solution... Perl is mysterious to me. :) – zx81 May 05 '14 at 23:33
@zx81 Thank you for the compliment. I'm glad you were able to learn something. I actually picked up `PPI` from another thread, but discovered that it is quite powerful and the preferred technique for this type of problem. – Miller May 06 '14 at 00:32
`picked it up from another thread`... the magic of stackoverflow. We're all learning from somewhere. Nice meeting you and thanks for your message. :) – zx81 May 06 '14 at 00:34

score 3 · Accepted Answer · edited May 23 '17 at 12:21

What you need is a negative lookbehind, (?<!\$). It is zero-width, so it doesn't "consume" characters.

(?<!\$)a means: match a if it is not preceded by a literal $. Note that we escaped $ since it otherwise means end of string (or end of line, depending on the m modifier).

Your regex will look like (?-xism:\b(?<!\$)(print|time)\b).

I'm wondering why you are turning off the xism modifiers, since they are off by default.
So just use /\b(?<!\$)(?:print|time)\b/g as the pattern.
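
As a quick check, here is a minimal sketch that runs this pattern against the two sample lines from the question. The @keywords list and the variable names are assumptions for illustration. Building the alternation from a list is just one way to handle the "more keywords" case.

```perl
use strict;
use warnings;

# Hypothetical keyword list; extend it as needed.
my @keywords = qw(print time);

# Build the alternation once; quotemeta guards against regex metacharacters.
my $alternation = join '|', map { quotemeta($_) } @keywords;

# (?<!\$) rejects a keyword that directly follows a literal "$".
my $keyword_re = qr/\b(?<!\$)(?:$alternation)\b/;

# Sample lines from the question (single-quoted, so nothing is interpolated).
my @lines = (
    'print $time . "\n";',
    '$epoc = time();',
);

my @found;
for my $line (@lines) {
    my @matches = $line =~ /($keyword_re)/g;
    push @found, @matches;
    print "$line  =>  @matches\n";
}
```

The first line yields only print and the second only time. The lookbehind only guards against $, so names such as @time or %time, and keywords inside strings or comments, would still match.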

Online demo SO regex reference

edited May 23 '17 at 12:21

Community

answered Apr 13 '14 at 14:30

HamZa

I'm using xism because in my perl code I'm doing ```$var = qr/\b(? – anasaitali Apr 13 '14 at 14:33
