
I need to find all occurrences of "st" within any class declaration on any HTML page, for example:

class="st0 st1 st2", class="st3 st45", class="st678"

I say within a class declaration because there may be other occurrences of "st" throughout the document and I do not want to change every occurrence.

My ultimate goal here is a find and replace. I have the logic written out for that but I just need to figure out how to isolate "st" from the string.

I have experimented with a few different lookaround expressions but I cannot seem to match every occurrence. Below are a few examples of what I have been trying.

This expression gets everything between 'class="' and '"':

Regular Expression:

(?<=class=").*(?=")

Test string:

class="st10 st11"

Matching result :

"st10 st11"

Here is another one I tried:

Regular Expression:

(?<=class=")((st)\d*\s*)*(?=")

Test string:

class="st10 st11"

Matching result:

"st10 st11"

Matching groups:

  1. st11
  2. st

I have been testing my regular expression here at Rubular.com

Added from comments:
I am going to be using the regular expression within a terminal shell command which I will run on a specific folder. The shell command will do a find and replace on every file that is in the folder like this...

perl -pi -w -e 's/st/stx/g;' ~/Desktop/svg_find_replace/*.svg

Any help would be much appreciated.

halfer
masahs
  • This could be done much more easily with a parser - if this is not an option [**`\bst\d+`**](https://regex101.com/r/aJ8cU8/1) might be what you're looking for. – Jan Jul 06 '16 at 18:04
  • Exactly, take a look at this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Jorge Campos Jul 06 '16 at 18:05
  • I am going to be using the regular expression within a terminal shell command which I will run on a specific folder. The shell command will do a find and replace on every file that is in the folder like this... perl -pi -w -e 's/st/stx/g;' ~/Desktop/svg_find_replace/*.svg. – masahs Jul 06 '16 at 18:11
  • @Jan tried `\bst\d+` with no matches. Modified the expression to this `(?<=class=")((\bst\d+)\s*)*(?=")`, still the same match results. It seems that regular expressions don't quite work the same within lookaround expressions – masahs Jul 06 '16 at 18:20
  • please add your specific use case to your question, a good answer will be dependent on the tool you are using to do this replacement – Will Barnwell Jul 06 '16 at 18:57
  • for example, if you are using perl you should definitely not be testing against a Ruby regex engine, as Ruby's regex flavour differs from Perl's – Will Barnwell Jul 06 '16 at 18:58
  • @WillBarnwell Hi. What I am really trying to do is write a regular expression that will find all occurrences of "st" in any class declaration within an html file. I will be using the expression in a shell script, but at this point if I cannot get the expression working properly, I cannot move to the next step. So I first wrote an expression that uses a lookbehind `(?<=class=")` and a lookahead `(?=")` to define the boundaries of where I want to search. Now I want to isolate all occurrences of "st". That is really it right now. I was using rubular.com to test the expression, that's all. – masahs Jul 06 '16 at 19:15
  • @WillBarnwell I am using this as my test string: `class="st0 st1 st2", class="st3 st45", class="st678"` – masahs Jul 06 '16 at 19:18
  • @WillBarnwell and this as my regular expression so far: `(?<=class=")((\bst\d+\s*)*)(?=")` – masahs Jul 06 '16 at 19:19
  • don't write these as comments, edit your post to add this information – Will Barnwell Jul 06 '16 at 19:20
  • Perl-compatible regex syntax (PCRE) is what perl and a bunch of other shell tools understand, so you should use something like https://regex101.com/ instead of Rubular, which uses Ruby's regex flavour – Will Barnwell Jul 06 '16 at 19:22
  • You need to explain what exactly you are going to do and what tool you use, as others mentioned. Maybe you can use a [`\G`](http://www.regular-expressions.info/continue.html) based pattern [like this for Ruby](http://www.rubular.com/r/6Vl4mgXmfG) or something [like this for PCRE](https://regex101.com/r/wM4eJ8/1) – bobble bubble Jul 06 '16 at 19:27
  • @WillBarnwell Thanks for your comments. I have used both rubular and regex101 within the last hour or so with the same results, just to make sure that things are the same in both. – masahs Jul 06 '16 at 19:30
  • @masahs I edited your question, didn't see your comment before (: please check if ok. See if [this demo at regex101](https://regex101.com/r/wM4eJ8/3) helps. – bobble bubble Jul 06 '16 at 19:37
  • @bobblebubble I am just trying to figure out if I can get this regular expression working as a proof of concept and thought I'd reach out to the community for help. One of the earlier commenters had asked why I was not using a parser, and I replied that I would be using the expression in a shell command, so that is why I chose to use a regular expression... nothing more. – masahs Jul 06 '16 at 19:40
  • @bobblebubble - That is exactly what I was trying to accomplish. Thank you so much for giving it a go. – masahs Jul 06 '16 at 19:41
  • @bobblebubble regular expressions are certainly not my strong suit. Really appreciate it. – masahs Jul 06 '16 at 19:42
  • @masahs you're welcome. Also can do without the lookbehind like this: [`(?:class="|\G(?!^))(?:(?!st)[^"])*\Kst`](https://regex101.com/r/nE1jH0/1) should I post answer if you need explanation? – bobble bubble Jul 06 '16 at 19:43
  • @bobblebubble you should make that an answer – Will Barnwell Jul 06 '16 at 19:46
  • I forgot about \K, which is a serious regex ninja tool – Will Barnwell Jul 06 '16 at 19:48
  • @bobblebubble yes, make that an answer. – masahs Jul 06 '16 at 19:54
  • @bobblebubble - just tested the regular expression within my shell command and it worked flawlessly. Thanks. – masahs Jul 06 '16 at 20:43
  • @masahs glad it's working for your needs (: you're welcome. – bobble bubble Jul 07 '16 at 09:04

1 Answer


You can use a regex based on \G to chain matches.

(?:class="|\G(?!^))(?:(?!st)[^"])*\Kst
  • (?: opens a non-capturing group for the alternation.
  • (?:class="|\G(?!^)) sets where a match may start: either right after class=" or, via \G, at the position where the previous match ended. \G also matches at the very start of the string; the negative lookahead (?!^) prevents that.
  • (?:(?!st)[^"])* consumes any number of characters that are not ", while the negative lookahead (?!st) stops it from skipping past an st.
  • \K resets the beginning of the reported match, so only st itself is matched and replaced.

Here is the demo at regex101. It is probably a rather advanced pattern. SO has a nice regex faq.

bobble bubble