10

We use the regex below to get the digits before a given word.

Example :

838123 someWord 8 someWord 12 someWord

(\d+)\s*someWord

But sometimes arbitrary text comes between the number and the word. Please see the example line below.

Ex:

43434 of someword 12 anything someword 2323 new someword

How can I get exactly the digits that come before that word using regex?

Please give me your suggestions.

LPLN
bala k
  • 3
    It looks like the existing posts answer your question. Please let the answerers and future readers know if you find the answers useful (Take a [tour]). Otherwise please provide more details about what you are looking for and why the answers do not suit your case. – Reza Aghaei Dec 25 '19 at 11:21
  • 1
    Not clear what you're asking... – JohnyL Dec 28 '19 at 13:44

8 Answers

14

Do this:

(\d+)[^\d]+some[wW]ord

You need to accept anything other than digits themselves. Also I considered both w and W since your examples contained both.
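As a rough sketch of how this pattern might be used from C#, the snippet below runs it over the sample line from the question and collects capture group 1. The variable names are illustrative and not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample line from the question.
var input = "43434 of someword 12 anything someword 2323 new someword";

// Group 1 holds the digits; [^\d]+ skips any non-digit text before the word.
var pattern = @"(\d+)[^\d]+some[wW]ord";

var numbers = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", numbers)); // 43434, 12, 2323
```

Reading `Groups[1]` rather than `Value` returns only the digits; `Value` would also include the text up to and including the word.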

Demo

CinCout
  • Are you looking for a different answer? – Reza Aghaei Dec 25 '19 at 10:16
  • @RezaAghaei maybe – CinCout Dec 25 '19 at 10:21
  • What are the problems with the current answer that you provided? What improvements are you looking for? – Reza Aghaei Dec 25 '19 at 10:28
  • @RezaAghaei That fact that OP hasn't accepted any answer yet makes me think if I missed a corner case or something. Alternate approaches to solve it are also welcome. – CinCout Dec 25 '19 at 11:17
  • 1
    @CinCout-ReinstateMonica Please see [my answer](https://stackoverflow.com/questions/59317088/how-to-get-the-digits-before-some-particular-word-using-regex-in-c/59482240#59482240) for a possible missed edge case (not sure if this is relevant to the OP). – Steve Chambers Dec 25 '19 at 22:26
4

Presuming that "anything" does not include digits, you could use this regex:

(\d+)[^\d]+someWord

Demo on regex101

Nick
3

One possible "missed corner case" from CinCout's answer is if the match for someWord must be exact, e.g. if notsomeWord and someWordNotThis shouldn't be matched.

The following extension to that regular expression provides a way to address this:

(\d+)[^\d]*[^\w]some[wW]ord[^\w]

Explanation: The [^\w] before and after the matcher for someWord each require a "non-word character" next to it. A newline character also counts here, but the very end of the input does not, because [^\w] has to consume a character. This could of course be made more complex/specific, depending on the exact requirements.
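To see how the pattern behaves on the edge cases, here is a small C# sketch that compares it with a variation using the lookarounds `(?<!\w)` and `(?!\w)`. The variation is a swapped-in alternative, not the pattern from this answer, and the input string and helper name `Digits` are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the last "someword" sits at the very end of the string.
var input = "7 notsomeWord 12 anything someWord 5 someWordNotThis 9 x someword";

// Pattern from the answer: [^\w] must consume a character after the word.
var withNegatedClass = @"(\d+)[^\d]*[^\w]some[wW]ord[^\w]";

// Variation (assumption): lookarounds check the neighbours without consuming them.
var withLookarounds = @"(\d+)[^\d]*(?<!\w)some[wW]ord(?!\w)";

// Illustrative helper: joins the group-1 captures of every match.
string Digits(string pattern) => string.Join(", ",
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups[1].Value));

Console.WriteLine(Digits(withNegatedClass)); // 12
Console.WriteLine(Digits(withLookarounds));  // 12, 9
```

Both patterns reject notsomeWord and someWordNotThis; they differ only for a word that ends the input.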

Demo

Steve Chambers
3

You could try something like this:

(\d+)\s?([^\d]*)

(\d+)    - get the digits
\s?      - discard a possible space
([^\d]*) - get all chars that are not digits

You can see the test here

3

First use one pattern to keep only some[wW]ord, the digits, and the whitespace, then run a second pattern on the result:

 using System;
 using System.Collections.Generic;
 using System.Text.RegularExpressions;

 // Step 1: keep only runs of the keyword, digits, and whitespace.
 var pattern = @"\b(some[wW]ord|[\d]|\s)*\b";
 var rgx = new Regex(pattern);
 var sentence = "43434 of someword 12 anything someword 2323 new someword";
 var result = string.Empty;
 foreach (Match match in rgx.Matches(sentence))
 {
     result += match.Value;
 }
 // result is roughly "43434 someword 12 someword 2323 someword"
 // (extra spaces remain where the other words were dropped)

 // Step 2: capture the digits directly before the keyword.
 var patternOnCorrectSentence = @"\b(\d+)\s*some[wW]ord\b";
 var rgxOnCorrectSentence = new Regex(patternOnCorrectSentence);

 var resultOnCorrectSentence = new List<string>();
 foreach (Match match in rgxOnCorrectSentence.Matches(result))
 {
     // Groups[1] holds only the digits; match.Value would include the keyword too.
     resultOnCorrectSentence.Add(match.Groups[1].Value);
 }
 resultOnCorrectSentence.ForEach(Console.WriteLine);

 Console.ReadKey();

After the first pattern is executed, the sentence

43434 of someword 12 anything someword 2323 new someword

becomes:

43434 someword 12 someword 2323 someword

Reza Jenabi
2

But sometimes arbitrary text comes between the number and the word. Please see the example line below.

Ex:

43434 of someword 12 anything someword 2323 new someword

Try this:

(\d+)(.*?)someword

Explanation:

\d+ - the digits

.*? - anything after the digits, matched lazily (as few characters as possible)

someword - an exact match of the word
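A minimal C# sketch of this lazy pattern follows, using the sample line from the question. The second input is hypothetical and is there to show one consequence of `.*?`: it also accepts digits in the filler text, so the match still begins at the first number it finds.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"(\d+)(.*?)someword";

// Sample line from the question: each number is followed by filler text and the word.
var input = "43434 of someword 12 anything someword 2323 new someword";
var numbers = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
Console.WriteLine(string.Join(", ", numbers)); // 43434, 12, 2323

// Hypothetical input where the filler itself contains digits:
// the match starts at the first number, so "5" is captured rather than "12".
var first = Regex.Match("5 apples 12 someword", pattern);
Console.WriteLine(first.Groups[1].Value); // 5
```

If only the number closest to the word is wanted, a pattern that excludes digits from the filler (such as `[^\d]+`) behaves differently on the second input.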

Demo

Community
Rajesh G
2

Using \s* will only match 0 or more whitespace characters.

You could use \D+ but it will also match newlines as it matches any char except a digit.

If you want to match the digits on the same line, you can also exclude newlines by using the negated character class [^\d\r\n]

In your example you use \d, which in .NET also matches digits from other scripts; if you only want to match 1 or more digits 0-9, you could use the character class [0-9]+

To prevent the digits and the word being part of a larger word you could make use of word boundaries \b

If you want to match the word in a case insensitive manner, you could use RegexOptions.IgnoreCase or an inline modifier (?i)

(?i)\b([0-9]+)\b[^\d\r\n]*\bsomeword\b
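The following C# sketch tries the pattern on a made-up multi-line input to illustrate the points above: the match is case insensitive, it stays on one line, and it skips a longer word that merely contains the keyword. The input string is an assumption chosen for the demonstration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical multi-line input: "99" is on its own line,
// and "notsomeword" is a longer word that only contains the keyword.
var input = "12 anything SomeWord\n99\nsomeword 7 notsomeword";

var pattern = @"(?i)\b([0-9]+)\b[^\d\r\n]*\bsomeword\b";

var numbers = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", numbers)); // 12
```

Here 99 is skipped because the keyword only appears on the next line, and 7 is skipped because of the word boundary before someword.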

See a .NET regex demo

The fourth bird
2

Use named match captures to extract the information as needed (to read the data, use mtch.Groups["Value"].Value, and so on).

(?<Value>\d+)     # Get the digits
(?<Other>.+?)     # Capture all text, but minimal (lazy) capture
(?<Key>someword)  # until the keyword here.

When the above is run with RegexOptions.IgnorePatternWhitespace (otherwise remove the comments and join the pattern into (?<Value>\d+)(?<Other>.+?)(?<Key>someword), which needs no regex options), it finds each value/key pair and organizes each pair in a single match.
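As a hedged sketch of how the named groups might be read in C#, the snippet below passes the commented pattern with `RegexOptions.IgnorePatternWhitespace`, where `#` starts a comment inside the pattern. The variable names and the output format are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// With IgnorePatternWhitespace, whitespace in the pattern is ignored
// and '#' starts a comment that runs to the end of the line.
var pattern = @"
    (?<Value>\d+)     # the digits
    (?<Other>.+?)     # any text, as little as possible
    (?<Key>someword)  # up to the keyword
";

var input = "43434 of someword 12 anything someword 2323 new someword";
var matches = Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace);

foreach (Match m in matches)
{
    var value = m.Groups["Value"].Value;
    var other = m.Groups["Other"].Value;
    var key = m.Groups["Key"].Value;
    Console.WriteLine($"{value} |{other}| {key}");
}
```

Each match carries one number, the text in between, and the keyword, so the three parts can be read by name instead of by index.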

Result

Here is the result for your second example. Each pair is contained in its own match, with that match's groups and captures listed below it:

Match #0
              [0]:  43434˽of˽someword
  ["Value"] → [1]:  43434
      →1 Captures:  43434
  ["Other"] → [2]:  ˽of˽
      →2 Captures:  ˽of˽
    ["Key"] → [3]:  someword
      →3 Captures:  someword
Match #1
              [0]:  12˽anything˽someword
  ["Value"] → [1]:  12
      →1 Captures:  12
  ["Other"] → [2]:  ˽anything˽
      →2 Captures:  ˽anything˽
    ["Key"] → [3]:  someword
      →3 Captures:  someword
Match #2
              [0]:  2323˽new˽someword
  ["Value"] → [1]:  2323
      →1 Captures:  2323
  ["Other"] → [2]:  ˽new˽
      →2 Captures:  ˽new˽
    ["Key"] → [3]:  someword
      →3 Captures:  someword

Visually here is what is matched:

[image: visualization of the matches]

ΩmegaMan