2

I have a Regex to remove non-numerical characters prior to parsing a decimal number.

I use the following code:

Regex.Replace(myStr, "[^0-9.]", "");

This works for decimal numbers, but it also removes the sign character: both "A16.1" and "A-16.1" return "16.1".

The following edited version seems to work:

Regex.Replace(myStr, "[^-0-9.]", "");

But since I am unfamiliar with Regex, can an experienced user confirm that this is the right expression?

neggenbe

2 Answers

4

I suggest

 -?[0-9]+(\.[0-9]+)?

pattern. Removing the matched numbers from the string would then be

 string result = Regex.Replace(myStr, @"-?[0-9]+(\.[0-9]+)?", "");

Explanation:

 -?           zero or one minus sign "-"
 [0-9]+       at least one digit
 (\.[0-9]+)?  optionally followed by a fractional part
              (decimal separator and at least one digit)

In case you want to obtain (not remove) numbers, use Matches:

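 using System;
 using System.Linq;
 using System.Text.RegularExpressions;

 string myStr = "-1,2.3.de2.43.";

 // Collect every signed number found in the string
 string[] numbers = Regex
   .Matches(myStr, @"-?[0-9]+(\.[0-9]+)?")
   .OfType<Match>()
   .Select(match => match.Value)
   .ToArray();

 // Test
 Console.Write(string.Join(Environment.NewLine, numbers));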

The output is

 -1
 2.3
 2.43
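
Since the question is ultimately about parsing a decimal, here is a minimal sketch of extracting the first number with the same pattern and parsing it. `TryExtractDecimal` is a hypothetical helper name, and the use of `CultureInfo.InvariantCulture` assumes the input always uses "." as the decimal separator.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of any library): finds the first signed
// number in the input and parses it as a decimal.
static bool TryExtractDecimal(string input, out decimal value)
{
    Match match = Regex.Match(input, @"-?[0-9]+(\.[0-9]+)?");
    if (!match.Success)
    {
        value = 0m;
        return false;
    }

    // Assumption: "." is the decimal separator, so parse with the invariant
    // culture instead of the machine's current culture.
    return decimal.TryParse(
        match.Value,
        NumberStyles.Number,
        CultureInfo.InvariantCulture,
        out value);
}

Console.WriteLine(
    TryExtractDecimal("A-16.1", out decimal parsed)
        ? parsed.ToString(CultureInfo.InvariantCulture)
        : "no number found");   // prints -16.1
```

Matching the number instead of deleting everything else means a stray "-" or "." elsewhere in the string cannot end up in the text handed to the parser.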
Dmitry Bychenko
  • Thanks +1 for the explanation. As a side note: forcing the decimal part makes it fail to handle the case where there is NO fractional part - so I'll actually use `@"[^-0-9.]"`. – neggenbe Dec 19 '16 at 14:12
  • 1
    @neggenbe: please, notice `?` in the `(\.[0-9]+)?` which means *zero or one* fractional part; so *no fractional part* case is covered – Dmitry Bychenko Dec 19 '16 at 14:14
  • Could you extend this to handle .1 as 0.1 and 1. as 1.0 (at least the Python interpreter deals with these as decimal numbers). – CodeMonkey Dec 19 '16 at 14:46
  • 1
    @CodeMonkey: in case `.1` as well as `1.` are valid floating points, `@"-?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))"` pattern will do (the outer group makes the optional `-?` apply to both alternatives) – Dmitry Bychenko Dec 19 '16 at 14:51
2

In the expression [^-0-9.], the hyphen has a special meaning inside square brackets unless it comes at the very beginning (directly after a negating ^ also counts) or at the very end. Between two characters it denotes a range: in 0-9, anything from a literal 0 to a literal 9.

However, when the hyphen is first or last, it has nothing to go "from" (or "to"), so it cannot be treated as a range and is parsed as a literal - character.

I have found that being slightly more verbose and escaping the hyphen lets you place it anywhere within the character class without worrying that it will accidentally be parsed as a range indicator: [^\-0-9.] or [^0-9\-.] or [^0-9.\-]

What you have above works correctly because the hyphen sits at the beginning of the class, where it does not need to be escaped. Still, an escaped version may be easier to read (and to extend in the future), because it tells you and other users that the hyphen is meant literally.
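
As a quick illustration of the placement rules, the sketch below runs a few character classes against the sample input "A-16.1" from the question. The class `[^.-9]` is a made-up example of an accidental range, not something taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "A-16.1";

// Hyphen first, last, or escaped: treated as a literal, so the sign survives.
foreach (string pattern in new[] { @"[^-0-9.]", @"[^0-9.-]", @"[^0-9\-.]" })
{
    Console.WriteLine(Regex.Replace(input, pattern, ""));   // prints -16.1
}

// Unescaped hyphen between two characters: ".-9" is the range from '.' to '9',
// which does not contain '-', so the sign is stripped.
Console.WriteLine(Regex.Replace(input, @"[^.-9]", ""));     // prints 16.1
```

Note that the escaped forms are written as verbatim strings (`@"..."`); in a regular C# string literal the backslash itself would have to be doubled.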

OnlineCop