0

I have this text "£24,250.00 (inc. VAT)"

I want a regex that will show ONLY "24250.00"

I've managed to get the last portion with:

( \(inc\. VAT\))

And separately I can get the £ and , with:

[£,]

But I can't seem to work out how to combine both expressions to just return what I want.

Note that the number is dynamic so will change depending on applicable costs on a website.

In theory I could just run it through two separate regexes in my C# code, each one trimming part of what I don't want. But is there a way to do it with just one expression?

The reason is that I have a GetConvertedExtension method that takes an IWebElement and a string (the regex), and then converts the resulting string to Double, Int, etc.

I don't really want to change this extension method, or stop using it and go down the route of multiple expressions followed by a parse statement.

I've used https://regexr.com/ to try to get a working solution, but with no luck, and I'm starting to struggle.

I'm using Visual Studio 2017 and C# with the Regex library

Uwe Keim
Scott
  • You could match £ and the comma and use 2 capturing groups for what you want to keep: `£(\d+),(\d+\.\d+) \(inc\. VAT\)` [demo](https://regex101.com/r/OTQEZ0/1). Do the numbers always end with a dot and 2 digits (or more)? – The fourth bird Feb 13 '19 at 16:03
  • I think you could probably copy this post: https://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex – BugCatcherJoe Feb 13 '19 at 16:05

2 Answers

1

If you want to use a single regex, you could use 2 capturing groups:

£(\d+),(\d+\.\d+) \(inc\. VAT\)

Then concatenate group 1 and group 2 to get your value.

If the decimal part after the dot always contains exactly 2 digits, replace the last \d+ with \d{2}.

For example:

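using System;
using System.Text.RegularExpressions;

string pattern = @"£(\d+),(\d+\.\d+) \(inc\. VAT\)";
string input = @"£24,250.00 (inc. VAT)";

foreach (Match m in Regex.Matches(input, pattern))
{
    // Concatenate the digits before the comma (group 1) with the rest (group 2).
    Console.WriteLine(m.Groups[1].Value + m.Groups[2].Value);
}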

Result

24250.00

See a .NET regex demo | C# Demo
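
As an aside, the two expressions from the question can also be combined with alternation (`|`) and removed in a single pass. The following is only a minimal sketch of that alternative, not the approach shown above: it assumes that calling `Regex.Replace` is acceptable in the calling code, which may not hold for the asker's GetConvertedExtension method. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "£24,250.00 (inc. VAT)";

        // Alternation: match a pound sign or a comma, or the trailing " (inc. VAT)" text.
        string unwanted = @"[£,]| \(inc\. VAT\)";

        // Replace every match with an empty string, leaving only digits and the decimal point.
        string cleaned = Regex.Replace(input, unwanted, string.Empty);

        Console.WriteLine(cleaned); // 24250.00
    }
}
```

Unlike the two-group pattern, this removes every comma, so it would also cope with values of a million or more such as £1,024,250.00.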

The fourth bird
  • Thanks, this has done the trick... and it's not overly complicated. The grouping method did the trick; I never thought to capture the individual groups and then recombine them. – Scott Feb 20 '19 at 08:19
-1

(?<currency>[£$€])(?<value>[0-9]{1,3}(?:,[0-9]{3})*\.[0-9]{2})\s\(inc\.\sVAT\)

I would use something like this. I added the first capture group for the currency because I thought it might be useful too. You'd just have to add the currency symbols you are interested in to the square brackets.

In C# you would create the regex like this:

var regex = new Regex(@"(?<currency>[£$€])(?<value>[0-9]{1,3}(?:,[0-9]{3})*\.[0-9]{2})\s\(inc\.\sVAT\)");

Then call regex.Match(data) or regex.Matches(data), whichever you need.

To access the number in your match, read the value group: match.Groups["value"].Value, where match is the variable you assigned the regex match to.
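
Put together, the steps above might look like the sketch below. Note that the value group still contains the thousands commas ("24,250.00"), so this sketch strips them afterwards to reach "24250.00". The input string, the variable names, and the final `decimal.Parse` step are assumptions added for illustration and are not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var regex = new Regex(@"(?<currency>[£$€])(?<value>[0-9]{1,3}(?:,[0-9]{3})*\.[0-9]{2})\s\(inc\.\sVAT\)");
        Match match = regex.Match("£24,250.00 (inc. VAT)");

        if (match.Success)
        {
            // The named group keeps the commas, e.g. "24,250.00".
            string raw = match.Groups["value"].Value;

            // Strip the thousands separators to get "24250.00".
            string digits = raw.Replace(",", "");

            // Parse with the invariant culture so '.' is always the decimal separator.
            decimal amount = decimal.Parse(digits, CultureInfo.InvariantCulture);

            Console.WriteLine(digits);
            Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Checking `match.Success` first avoids reading an empty group when the text on the page does not follow the expected format.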

Just to quickly run through the regex:

(?<currency>[£$€]) This is a named capture group which will capture £, $ or € literally.

(?<value>[0-9]{1,3}(?:,[0-9]{3})*\.[0-9]{2}) This is a named capture group that gets the number. Breaking it down further:

[0-9]{1,3} matches a digit from 0 to 9 between 1 and 3 (inclusive) times.
(?:,[0-9]{3})* matches the thousands groups separated by commas, 0 or more times.
\.[0-9]{2} matches the decimal point and two digits after.

\s\(inc\.\sVAT\) This literally matches the " (inc. VAT)" bit after the number. I'm using \s instead of a literal space for the whitespace because I find it easier to read.

NOTE: this regex only works for this number format, with a comma for every thousand and the decimal part always included.

JackPRead