
I'm parsing a bunch of line items on an inventory list, and while each line describes something similar, the text format was not standardized. I've been working on a regex pattern for the past few days, but I'm not having much luck getting a pattern that matches all of my test scenarios. I'm hoping that someone with a lot more regex experience might be able to point out a few errors in the pattern.

Pattern To Match the palette number: \([Pp]alette [No\.\s]?#?(.*?)\),

1. Warehouse A, (Palette #91L41)
# Match Result Correct: 91L41

2. Warehouse B Palette No. 214
# Match Result Incorrect: no match

3. Warehouse Lot Storage C (Palette No. 9),
# Match Result Incorrect: o. 9 //I don't quite understand why it matches the o

4. Store Location D of Palette (Palette #1),
# Match Result Correct: 1

5. Store Location E of Palette, Empty, lot #45, 
# Match Result Incorrect: no match

I've also tried making the parentheses optional so that the pattern would match examples 2 and 5, but then it was too greedy and the match included the word "lot" as well.

dirkoneill
3 Answers


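Anything in square brackets is a character class: the engine looks for ONE of the listed characters. So [No\.\s]? matches a single N, o, dot, or whitespace character, not the text "No. ". That is why example 3 captures o. 9: the class consumes the N and the lazy group takes the rest. Your pattern also successfully matches, for example, strings like: (Palette Nabcdefg),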
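To make the character-class behaviour concrete, here is a minimal Python sketch that runs the asker's original pattern against example 3. The variable names are only for illustration, and Python's re module is an assumption, since the question does not say which regex engine is in use.

```python
import re

# The asker's original pattern, unchanged.
original = re.compile(r"\([Pp]alette [No\.\s]?#?(.*?)\),")

line = "3. Warehouse Lot Storage C (Palette No. 9),"
match = original.search(line)

# [No\.\s]? consumes only the "N"; the lazy group picks up everything after it.
captured = match.group(1)
print(captured)  # o. 9

# A character class matches exactly one character from the set,
# never the multi-character text "No.".
single = re.fullmatch(r"[No\.\s]", "N") is not None
whole = re.fullmatch(r"[No\.\s]", "No.") is not None
print(single, whole)  # True False
```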

To express a choice between different options, you'll need to use parentheses with | between the alternatives. What you're actually looking for should look something like this: [Pp]alette (No\.?\s?|#)?(\w+)

That said, it seems highly inefficient not to standardize the format. Your last case, for example, could be completely incompatible, since it looks like it could contain almost any kind of input.

Kiruse

A little bit of explanation on matching your patterns with regular expressions. You really don't need to look for and match your parentheses ( .. ) in this case.

Let's say we want to just find any string with the word Palette that is followed with whitespace and the # symbol and capture the Palette sequence from it.

You could simply just use the following:

[Pp]alette\s+#([A-Z0-9]+)

This captures 91L41 and 1 from the following lines:

1. Warehouse A, (Palette #91L41)
4. Store Location D of Palette (Palette #1)

Now say we want to find any string that has Palette, followed by whitespace and either a # symbol or No.

We can use a Non-capturing group for this. Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything.

So we could do something like:

[Pp]alette\s+(?:No[ .]+|#)([A-Z0-9]+)

This now matches the following strings and captures 91L41, 214, 9 and 1:

1. Warehouse A, (Palette #91L41)
2. Warehouse B Palette No. 214
3. Warehouse Lot Storage C (Palette No. 9)
4. Store Location D of Palette (Palette #1)
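As a quick check of the non-capturing group, the sketch below runs that pattern over the four lines from the question. It assumes Python's re module, and the list and variable names are made up for the example.

```python
import re

# (?:...) groups the alternatives without creating a capture group,
# so the palette ID is still group 1.
pattern = re.compile(r"[Pp]alette\s+(?:No[ .]+|#)([A-Z0-9]+)")

lines = [
    "1. Warehouse A, (Palette #91L41)",
    "2. Warehouse B Palette No. 214",
    "3. Warehouse Lot Storage C (Palette No. 9),",
    "4. Store Location D of Palette (Palette #1),",
]

ids = [pattern.search(line).group(1) for line in lines]
print(ids)  # ['91L41', '214', '9', '1']

# Only one capturing group exists in the compiled pattern.
print(pattern.groups)  # 1
```

Note that [A-Z0-9]+ only accepts uppercase letters and digits, so an ID such as 91l41 would be cut short at the lowercase letter.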

And lastly, if you want to match all five of your strings, including example 5 where other text sits between Palette and the # symbol, and capture the Palette sequence:

[Pp]alette[\w, ]+(?:No[ .]+|#)([A-Z0-9]+)

See working demo and an explanation on this regular expression.
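For readers without access to the demo, a small Python sketch (an assumption, since the thread does not name an engine) can be used to try the last pattern against all five lines from the question. The names test_lines and results are illustrative only.

```python
import re

# [\w, ]+ allows words, commas and spaces between "Palette" and the marker.
pattern = re.compile(r"[Pp]alette[\w, ]+(?:No[ .]+|#)([A-Z0-9]+)")

test_lines = [
    "1. Warehouse A, (Palette #91L41)",
    "2. Warehouse B Palette No. 214",
    "3. Warehouse Lot Storage C (Palette No. 9),",
    "4. Store Location D of Palette (Palette #1),",
    "5. Store Location E of Palette, Empty, lot #45, ",
]

results = []
for line in test_lines:
    m = pattern.search(line)
    # Record None when a line has no palette ID at all.
    results.append(m.group(1) if m else None)

print(results)  # ['91L41', '214', '9', '1', '45']
```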

Everyone has a different way of using regular expressions; this is just one of many ways to understand and accomplish this.

hwnd
  • A ridiculously thorough and accurate answer. I didn't know about capturing groups although that is exactly what I was looking for. Thanks so much for taking the time to answer my question. – dirkoneill Sep 30 '13 at 13:41

This should work for your case:

[Pp]alette.*?(?:No\.?|#)\s*(\w+)

This will search following types of patterns:

  • [Pp]alette{any_characters}No.{optional_spaces}(alphanumeric)
  • [Pp]alette{any_characters}No{optional_spaces}(alphanumeric)
  • [Pp]alette{any_characters}#{optional_spaces}(alphanumeric)

Check it in action here

MATCH 1
1.  [26-31]    `91L41`
MATCH 2
1.  [60-63]    `214`
MATCH 3
1.  [104-105]    `9`
MATCH 4
1.  [148-149]    `1`
MATCH 5
1.  [195-197]    `45`
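The captures above can be reproduced with a short Python sketch (Python's re is an assumption here; the linked tester may use a different engine). It also prints the whole match for example 5, which shows that only group 1 is the ID while group 0 spans everything from the word Palette onward.

```python
import re

pattern = re.compile(r"[Pp]alette.*?(?:No\.?|#)\s*(\w+)")

lines = [
    "1. Warehouse A, (Palette #91L41)",
    "2. Warehouse B Palette No. 214",
    "3. Warehouse Lot Storage C (Palette No. 9),",
    "4. Store Location D of Palette (Palette #1),",
    "5. Store Location E of Palette, Empty, lot #45, ",
]

# Group 1 holds the palette ID for every line.
ids = [pattern.search(line).group(1) for line in lines]
print(ids)  # ['91L41', '214', '9', '1', '45']

# Group 0 is the full match, which starts at the first "Palette".
whole = pattern.search(lines[4]).group(0)
print(whole)  # Palette, Empty, lot #45
```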
jkshah
  • Thanks this came close but on match 5, group 0 also included the first occurrence of the word palette. The pattern below doesn't include it. – dirkoneill Sep 30 '13 at 13:50