
I want to return all the words that start and end with a letter or a number. A word may contain at most one period (.) or hyphen (-). So, ab.ab is valid but ab. is not valid.

import re
reg = r"[\d\w]+([-.][\d\w]+)?"
s = "sample text"
print(re.findall(reg, s))

It is not working because of the parentheses. How can I apply the ? to the combination [-.][\d\w]+ as a whole?
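A small sketch of the symptom, using a hypothetical input string. When a pattern contains exactly one capturing group, re.findall returns that group for each match (and an empty string when the group did not take part) instead of the whole match:

```python
import re

# The question's pattern: the optional part is a *capturing* group.
reg = r"[\d\w]+([-.][\d\w]+)?"
s = "ab.ab cd"  # hypothetical sample input

# With one capturing group, findall returns the group's text per match,
# and '' where the optional group did not participate.
result = re.findall(reg, s)
print(result)  # ['.ab', '']
```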

aste123

3 Answers


If ab. is not valid and should not be matched, and the period or hyphen may not appear at the start or at the end, you could match one or more letters or digits, followed by an optional part that matches a dot or a hyphen followed by one or more letters or digits.

(?<!\S)[a-zA-Z\d]+(?:[.-][a-zA-Z\d]+)?(?!\S)


Explanation

  • (?<!\S) Negative lookbehind asserting that the character to the left is not a non-whitespace character (i.e. it is whitespace or the start of the string)
  • [a-zA-Z\d]+ Match one or more lowercase/uppercase letters or digits
  • (?:[.-][a-zA-Z\d]+)? An optional non-capturing group that matches a dot or a hyphen followed by one or more lowercase/uppercase letters or digits
  • (?!\S) Negative lookahead asserting that the character to the right is not a non-whitespace character (i.e. it is whitespace or the end of the string)
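A quick check of this pattern with re.findall, on a hypothetical sample string that mixes valid and invalid words. Because the only group is non-capturing, findall returns whole matches, and the lookarounds reject tokens such as ab. or 1.2.3 entirely instead of matching part of them:

```python
import re

# Pattern from this answer.
pattern = re.compile(r"(?<!\S)[a-zA-Z\d]+(?:[.-][a-zA-Z\d]+)?(?!\S)")

# Hypothetical input: ab. ends with a period, 1.2.3 has two periods,
# -ab starts with a hyphen, a_b contains an underscore.
text = "ab.ab ab. x-y 1.2.3 -ab a_b ok"
words = pattern.findall(text)
print(words)  # ['ab.ab', 'x-y', 'ok']
```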


The fourth bird

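Of course: don't make the group capturing. Use (?:pattern) instead of (pattern). When the pattern contains a capturing group, re.findall returns the group's contents rather than the whole match: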

import re
reg = r"[\d\w]+(?:[-.][\d\w]+)?"
s = "sample text"
print(re.findall(reg, s))

Output:

['sample', 'text']
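A sketch of how this unanchored pattern behaves on a few hypothetical inputs closer to the question's cases. It has no lookarounds, so for ab. it still returns the partial match ab, and 1.2.3 is split into 1.2 and 3. Also note that \w already covers digits and the underscore, so [\d\w] matches the same characters as \w, and a_b is returned as one word:

```python
import re

reg = r"[\d\w]+(?:[-.][\d\w]+)?"

# Hypothetical input exercising the edge cases from the question.
text = "ab.ab ab. x-y 1.2.3 a_b"
found = re.findall(reg, text)
print(found)  # ['ab.ab', 'ab', 'x-y', '1.2', '3', 'a_b']
```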
iBug

Make it a non-capturing group instead. With no capturing groups in the pattern, re.findall returns the full match:

reg = r"[\d\w]+(?:[-.][\d\w]+)?"
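If you would rather keep the capturing group (for example, to inspect the suffix separately), a possible alternative is re.finditer, whose match objects give the whole match via group(0) regardless of how many groups the pattern has. The input below is a hypothetical example:

```python
import re

# Original pattern with the capturing group left in place.
reg = r"[\d\w]+([-.][\d\w]+)?"
s = "ab.ab cd x-y"  # hypothetical sample input

# group(0) is always the full match, independent of capturing groups.
full = [m.group(0) for m in re.finditer(reg, s)]
print(full)  # ['ab.ab', 'cd', 'x-y']
```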
CertainPerformance