
I was just practicing regex and found something intriguing.

For the string `"world9 a9$ b6$"`, my regular expression `"^(?=.*[\\d])(?=\\S+\\$).{2,}$"` returns false: the second lookahead runs into the space after `world9` before it can find a `$` preceded by non-space characters.

As a whole, the string doesn't match the pattern.
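
A minimal sketch that reproduces this, assuming the pattern is applied with `matches()` (the class and method names here are made up for illustration):

```java
import java.util.regex.Pattern;

class LookaheadFailure {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^(?=.*[\\d])(?=\\S+\\$).{2,}$");

    // Hypothetical helper: true only if the whole input matches.
    static boolean fullMatch(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Both lookaheads are evaluated at index 0, so \S+ has to start at 'w'
        // and is stopped by the first space before it reaches any '$'.
        System.out.println(fullMatch("world9 a9$ b6$")); // false
        // Without spaces, \S+ can run right up to the '$'.
        System.out.println(fullMatch("world9a9$"));      // true
    }
}
```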

What should the regular expression be if I want it to return true whenever any substring follows the pattern? Here, `a9$` and `b6$` both follow the regular expression.

Varun Sharma
  • Related: https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean – Lino Nov 30 '20 at 15:09
  • Replace `(?=\\S+\\$)` with `(?=.*?\\S\\$)`? – Wiktor Stribiżew Nov 30 '20 at 15:10
  • The digit and `$` by themselves are already 2 characters, so you can also match them directly: `^(?=.*\d).*\S\$.*$` – The fourth bird Nov 30 '20 at 15:13
  • @WiktorStribiżew it does return true, can you explain what it is doing? – Varun Sharma Nov 30 '20 at 15:16
  • Matches a non-whitespace + `$` anywhere after any 0 or more chars other than line break chars. – Wiktor Stribiżew Nov 30 '20 at 15:17
  • Wait, I got it: is it just that it will now look for 0 or more characters followed by a non-space character and a `$` afterwards? Thanks, I understand the concept now. What is the need for the `?`, though? How is `.*?` doing something different from `.*`? – Varun Sharma Nov 30 '20 at 15:19
  • It is basically the same. I just assumed the `\S\$` pattern is closer to the start of the string. `.*` is used when we assume the next subpatterns appear close to the end of the string. – Wiktor Stribiżew Nov 30 '20 at 15:20
  • How does this change based on assumption help? Is it more efficient? if yes, how? – Varun Sharma Nov 30 '20 at 15:21
  • `.*` and `.*?` are the same in efficiency terms when we do not know where matches occur. But if you know that, you may use either `.*` (matches closer to the end) or `.*?` (matches closer to the start). – Wiktor Stribiżew Nov 30 '20 at 15:23
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/225328/discussion-between-varun-sharma-and-wiktor-stribizew). – Varun Sharma Nov 30 '20 at 15:23
  • I think the big thing is you need to use Matcher#find() rather than Matcher#matches(). – NomadMaker Dec 01 '20 at 08:36
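
The `.*` versus `.*?` point from the comments can be seen in a small sketch. The capturing group around `\S\$` and the helper name are additions for illustration only; they are not part of any pattern suggested above. Both quantifiers succeed on the sample string, but they settle on different occurrences of the pair.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: index where the captured "\S\$" pair begins, or -1 if absent.
    static int pairStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        String input = "world9 a9$ b6$";
        // Lazy: expands one char at a time, so it stops at the first pair ("9$").
        System.out.println(pairStart("^.*?(\\S\\$)", input)); // 8
        // Greedy: grabs everything, then backtracks to the last pair ("6$").
        System.out.println(pairStart("^.*(\\S\\$)", input));  // 12
    }
}
```

Inside a lookahead only the yes/no answer matters, which is why the two forms are interchangeable there.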

2 Answers


You can use

^(?=\D*\d)(?=.*\S\$).{2,}$

See the regex demo. As The fourth bird mentions, since `\S\$` matches two chars, you may simply move the pattern to the consuming part and use `^(?=\D*\d).*\S\$.*$`; see this regex demo.

Details

  • `^` - start of string (implicit if used in `.matches()`)
  • `(?=\D*\d)` - a positive lookahead that requires zero or more non-digit chars followed by a digit char immediately to the right of the current location
  • `(?=.*\S\$)` - a positive lookahead that requires zero or more chars other than line break chars, as many as possible, followed by a non-whitespace char and a `$` char immediately to the right of the current location
  • `.{2,}` - any two or more chars other than line break chars, as many as possible
  • `$` - end of string (implicit if used in `.matches()`)
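
As a quick check of the two patterns above, here is a minimal sketch. The class, constant, and method names are illustrative assumptions, and the extra test strings are made up.

```java
import java.util.regex.Pattern;

class LookaheadFix {
    // Both requirements expressed as lookaheads.
    static final Pattern WITH_LOOKAHEADS = Pattern.compile("^(?=\\D*\\d)(?=.*\\S\\$).{2,}$");
    // The "\S\$" requirement moved into the consuming part.
    static final Pattern CONSUMING = Pattern.compile("^(?=\\D*\\d).*\\S\\$.*$");

    static boolean viaLookaheads(String input) {
        return WITH_LOOKAHEADS.matcher(input).matches();
    }

    static boolean viaConsuming(String input) {
        return CONSUMING.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(viaLookaheads("world9 a9$ b6$")); // true
        System.out.println(viaConsuming("world9 a9$ b6$"));  // true
        // No digit anywhere, so (?=\D*\d) fails.
        System.out.println(viaLookaheads("world a$"));       // false
        // The only '$' is preceded by a space, so \S\$ cannot match.
        System.out.println(viaConsuming("world9 $"));        // false
    }
}
```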
Wiktor Stribiżew

Mostly, knock out the `^` and `$` bits: those force a full-string match, and you want substring matches. In general, lookahead seems like a mistake here; what are you trying to accomplish by using it? (Lookahead/lookbehind is rarely needed in general.) All you need is:

Pattern.compile("\\S+\\$");

Possibly, if you want an element (such as `a9$`) to start on its own, use `\b`, which is regexpese for a word boundary: a zero-width position between a word character (think `[a-zA-Z0-9_]`) and a non-word character such as whitespace, or between a word character and the start/end of input. Note that `$` is itself a non-word character, so a `\b` placed after `\$` would only match when a word character follows; put the boundary in front only. Thus:

Pattern.compile("\\b\\S+\\$")

still finds `a9$` in `foo a9$ bar`, or in `a9$` alone, just fine.

If you MUST put this in terms of a full match, e.g. because `matches()` (which always does a full-string match) is run and you can't change that, put `^.*` in front and `.*$` at the back of it. Simple as that.
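
A minimal sketch of both options, assuming `find()` is used for the substring search. The class, constant, and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SubstringMatch {
    // Substring search: no anchors.
    static final Pattern TOKEN = Pattern.compile("\\S+\\$");
    // Same pattern padded for callers that can only use matches().
    static final Pattern WRAPPED = Pattern.compile("^.*\\S+\\$.*$");

    static boolean containsToken(String input) {
        return TOKEN.matcher(input).find();
    }

    static boolean wrappedFullMatch(String input) {
        return WRAPPED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsToken("foo a9$ bar"));            // true
        System.out.println(containsToken("foo bar"));                // false
        // matches() on the bare pattern fails: the whole input is not one token.
        System.out.println(TOKEN.matcher("foo a9$ bar").matches());  // false
        System.out.println(wrappedFullMatch("foo a9$ bar"));         // true
    }
}
```

Unlike the lookahead version, this pattern does not require a digit, so an input such as `ab$` is also accepted.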

Absolutely nothing about this says "This can only be needed with lookahead".

rzwitserloot