-1

I have a scenario where I want to grab the fields separated by | from a string and get them as list elements, using a regex.

str = "| id_number | Category | Description |"
match = re.search(r"^\|(.*)\|", str)

But I am not getting the expected result. Can anyone please help?

Wiktor Stribiżew

3 Answers

0

You can split the string on the delimiter to get the values, like below.

>>> s = "| id_number | Category | Description |"
>>> match = s.strip(" | ").split(" | ")
>>> match
['id_number', 'Category', 'Description']

Since you are specifically asking for a regex, re.findall returns all matches of a pattern in the given string as a list, like below. The value you want to fetch needs to be inside parentheses (a capturing group); otherwise the function returns the whole match.

>>> import re
>>> s = "| id_number | Category | Description |"
>>> match = re.findall(r'\|\s+(\w+)\s+', s)
>>> match
['id_number', 'Category', 'Description']
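
One caveat with the findall pattern above: \w+ only matches letters, digits and underscores, so a field such as "id number" is cut short and a field such as "Cat-egory" is skipped. A minimal sketch of a more tolerant variant, assuming a field may contain anything except a pipe (the helper name extract_fields is made up for illustration):

```python
import re

def extract_fields(row):
    # Capture everything between two pipes, minus the surrounding whitespace.
    # [^|]*? is lazy so the trailing \s* can absorb the padding, and the
    # lookahead (?=\|) checks for the closing pipe without consuming it,
    # which leaves that pipe available to open the next field.
    return re.findall(r"\|\s*([^|]*?)\s*(?=\|)", row)

print(extract_fields("| id number | Cat-egory | Description |"))
# ['id number', 'Cat-egory', 'Description']
```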
Surajit Mitra
    While this code may resolve the OP's issue, it is best to include an explanation as to how your code addresses the OP's issue. In this way, future visitors can learn from your post, and apply it to their own code. SO is not a coding service, but a resource for knowledge. Also, high quality, complete answers are more likely to be upvoted. These features, along with the requirement that all posts are self-contained, are some of the strengths of SO as a platform, that differentiates it from forums. You can edit to add additional info &/or to supplement your explanations with source documentation. – SherylHohman Jun 07 '20 at 01:55
0

Try (?:^|(?<=\|))([^|]*?)(?=\||$) with findall.

It simulates split, but better: it does not consume the pipes, it only checks that they are there using assertions (lookbehind and lookahead).


If the requirement is that a field must be surrounded by pipes, then use this: (?<=\|)([^|]*?)(?=\|)
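
As a rough sketch of how the pipe-surrounded pattern behaves in Python with the string from the question: the captured fields keep their padding spaces, so the strip() step here is an addition, not part of the pattern itself.

```python
import re

s = "| id_number | Category | Description |"

# The lookbehind and lookahead only assert that a pipe is present; they do
# not consume it, so each inner pipe can close one field and open the next.
raw = re.findall(r"(?<=\|)([^|]*?)(?=\|)", s)

# The captured text still includes the padding spaces, so strip them.
fields = [f.strip() for f in raw]
print(fields)
# ['id_number', 'Category', 'Description']
```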

0

Don't name a variable str

str is a builtin, which you'll no longer be able to use if you mask it with a variable named str.
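
A small illustration of the problem, using a made-up function name (shadowing_demo) so the shadowing stays local to the example:

```python
def shadowing_demo():
    # The local variable hides the builtin type for the rest of this function.
    str = "| id_number | Category | Description |"
    try:
        return str(42)  # calls the string object, not the builtin str
    except TypeError as exc:
        return f"TypeError: {exc}"

print(shadowing_demo())
# TypeError: 'str' object is not callable

# Outside the function the builtin is untouched.
print(str(42))
# 42
```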

Issues with current regex

You asked why you are not getting proper results. One reason is that your regex is greedy. The (.*) will match | too.
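
To see the greediness concretely, here is the pattern from the question run against the same string (renamed to s):

```python
import re

s = "| id_number | Category | Description |"

match = re.search(r"^\|(.*)\|", s)
# (.*) runs to the end of the string and backtracks only as far as the
# last pipe, so the inner pipes end up inside the single captured group.
print(repr(match.group(1)))
# ' id_number | Category | Description '
```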

A second challenge is that Python's re module keeps only the last iteration of a repeated capturing group, so repeating a group will not give you one capture per field.

A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data (regex101.com).
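
A short sketch of that behaviour; the patterns here are illustrative and not taken from the question:

```python
import re

s = "| id_number | Category | Description |"

# (\w+) sits inside a repeated non-capturing group, so it matches three
# times, but only the final iteration is kept in group 1.
m = re.match(r"(?:\|\s*(\w+)\s*)+", s)
print(m.group(1))
# Description

# findall instead applies the un-repeated pattern at successive positions
# and returns one capture per match.
print(re.findall(r"\|\s*(\w+)", s))
# ['id_number', 'Category', 'Description']
```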

An easier way is to split based on the delimiter.

With str.split()

>>> s = "| id_number | Category | Description |"
>>> s.strip("| ").split(" | ")
['id_number', 'Category', 'Description']

With re.split()

The str.split() solution above assumes the delimiter is exactly " | " (a pipe with a single space on each side). Alternatively you could use:

>>> import re
>>> re.split(r"\s+\|\s+", s.strip("| "))
['id_number', 'Category', 'Description']

to account for extra whitespace.
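
Note that \s+\|\s+ still needs at least one whitespace character on each side of the pipe. If rows may also have no padding at all, a variant with \s* could look like the sketch below; the helper name split_row and the sample row are assumptions for illustration.

```python
import re

def split_row(row):
    # Drop the outer pipes and padding, then split on a pipe with any
    # amount of surrounding whitespace, including none.
    return re.split(r"\s*\|\s*", row.strip("| \t"))

print(split_row("|id_number|  Category   |Description |"))
# ['id_number', 'Category', 'Description']
```

Because strip() removes every leading and trailing pipe and space, empty fields at the very start or end of a row are dropped by this approach.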

Brad Solomon