1

I have a price list that I would like to 'normalize', using the JavaScript flavor of regex.

Sample input:

1
1,99
1.99
10
100
5999 dollars
2 USD
$2,99
Our price 2.99
Price: $ 20
200 $
20,-
6 999 USD

Desired output:

1
1,99
1.99
10
100
5999
2
2,99
2.99
20
200
20
6999

I am getting rather good results with /([0-9.,\s]+)/ but I've got two problems:

  • The last sample line returns 6 instead of 6 999. I am not sure if it's possible to "remove" the space; preferably I would like to get 6999, but 6 999 is close enough.

  • The second-to-last line returns 20, (which is logical since I include commas), but I would rather get only 20 in these cases.

Liu Kang
  • I am not a regex expert, but you can add an `or` for `,` followed by any digit. Also, why `\s` if you do not want to include whitespace? – Jashwant May 21 '14 at 20:45
  • is `5 999` coming out correctly but `6 999 USD` not? – adamdc78 May 21 '14 at 20:54
  • Also worth noting, the \s will match a newline (at least on [regex101](http://regex101.com/r/tD5wR1)), so you'll get a match like `1.996 999` even without the m flag (maybe a regex101 bug?). – adamdc78 May 21 '14 at 20:58
  • What JavaScript function are you calling with the input string and the regular expression? – Dan Korn May 21 '14 at 21:02
  • @Jashwant Re-phrased first problem, see above. – Liu Kang May 21 '14 at 21:18
  • FYI added `<pre>` to the online demo so you can clearly see the output like you want it. – zx81 May 21 '14 at 21:27

5 Answers

2

Here is a fiddle: http://jsfiddle.net/8h8Tk/

If you really wanted to normalize your input, I would suggest you choose either , or . for your decimal value separator. However, if not, the jsfiddle above gives the correct output.

var output = input.replace(/[^0-9\.,\n]|,[^0-9]/g, "");

All it does is remove the characters you don't want.
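
As a quick check, the replacement above can be run against a few of the sample prices from the question. This is only a sketch: the `normalizePrices` name is hypothetical, and the input is assumed to be a single newline-separated string.

```javascript
// Hypothetical helper wrapping the replace() call from this answer.
function normalizePrices(input) {
  // First alternative: remove anything that is not a digit, dot, comma, or newline.
  // Second alternative: remove a comma together with the non-digit that follows it.
  return input.replace(/[^0-9\.,\n]|,[^0-9]/g, "");
}

var sample = ["$2,99", "Our price 2.99", "Price: $ 20", "20,-", "6 999 USD"].join("\n");

console.log(normalizePrices(sample));
// 2,99
// 2.99
// 20
// 20
// 6999
```

Because `,[^0-9]` consumes the character after the comma, a comma sitting directly before a line break would take the newline with it, and a comma at the very end of the input would be left in place.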

unclekyky
0

If you don't mind doing it in two steps, first convert all commas to dots:

x = x.replace(/,/g, '.')

Then get rid of everything else:

x = x.replace(/[^.0-9]+/g, '')
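
A minimal sketch of the two steps combined, assuming each price is processed as its own string (the second replace would also strip newlines). The `toDotDecimal` name is hypothetical. The character class is written without `|` because inside square brackets a pipe is a literal character rather than alternation.

```javascript
// Hypothetical helper: run the two replacements from this answer in sequence.
function toDotDecimal(price) {
  return price
    .replace(/,/g, ".")         // step 1: turn every comma into a dot
    .replace(/[^.0-9]+/g, "");  // step 2: drop everything except dots and digits
}

console.log(toDotDecimal("$2,99"));     // 2.99
console.log(toDotDecimal("6 999 USD")); // 6999
console.log(toDotDecimal("20,-"));      // 20.
```

Note the last result: the trailing comma in `20,-` becomes a trailing dot, so this approach alone does not address the second problem in the question.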
punund
  • Doesn't look as though they want to replace commas with periods, both appear to be valid in the sample data above. – adamdc78 May 21 '14 at 20:59
  • Then it's even simpler: `x = x.replace(/[^.|,|0-9]+/g,'')` – punund May 21 '14 at 21:01
  • I would do this: `input.replace(/[^0-9\.,\n]|,[^0-9]/g, "")`. It removes everything that is not a number, dot, comma, or newline and it removes commas if not followed by a number. – unclekyky May 21 '14 at 21:06
0

Replace what you don't want:

result = subject.replace(/[^\d.,]+/g, "");
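
Since `[^\d.,]+` also matches line breaks, running it over the whole list would join all prices into one string. One possible way around that, sketched here with a hypothetical `stripPerLine` helper, is to apply the replacement line by line:

```javascript
// Hypothetical helper: apply the replacement from this answer to each line
// separately so the line breaks between prices survive.
function stripPerLine(text) {
  return text
    .split("\n")
    .map(function (line) {
      return line.replace(/[^\d.,]+/g, "");
    })
    .join("\n");
}

console.log(stripPerLine("Price: $ 20\n6 999 USD\n20,-"));
// 20
// 6999
// 20,
```

The trailing comma in `20,-` is kept, because commas are part of the allowed set.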
Ron Rosenfeld
0

Here's a version that is straight out of "Match (or replace) a pattern except in situations s1, s2, s3 etc".

The regex: (?:\d|[.,](?=\d))+|(\w+|.)

The left side of the alternation matches characters we want: digits, or dots and commas followed by digits. The right side matches and captures word characters or a single character, and we know these are not characters we want because they were not matched by the expression on the left.

When Group 1 is set, we replace with an empty string.

See the output in the online demo

<script>
var subject = "1 \n\
1,99 \n\
1.99 \n\
10 \n\
100 \n\
5 999 \n\
2 USD \n\
$2,99 \n\
Our price 2.99 \n\
Price: $ 20 \n\
200 $ \n\
20,- \n\
6 999 USD";

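var regex = /(?:\d|[.,](?=\d))+|(\w+|.)/g;
// Group 1 is unset (undefined) when the left side of the alternation matched,
// so keep those matches; the "" check covers engines that report unset groups as "".
var replaced = subject.replace(regex, function(m, group1) {
    if (group1 === undefined || group1 === "") return m;
    else return "";
});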
document.write("<pre>");
document.write(replaced);
document.write("</pre>");
</script>

The Output

1
1,99
1.99
10
100
5999
2
2,99
2.99
20
200
20
6999
Community
zx81
0

How about this: /((?:[\d.,\s]+)?[\d]+)\b/g. It extends your original version by requiring the match to end on a digit.
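
A rough sketch of how this pattern might be used, assuming one price per input line (the `\s` in the class would otherwise match across line breaks). The `extractPrice` name is hypothetical, and the whitespace removal afterwards is an extra step not part of the regex itself.

```javascript
// Hypothetical helper: take the first match on a line and strip the
// whitespace that the character class allows inside (or before) the number.
function extractPrice(line) {
  var match = line.match(/((?:[\d.,\s]+)?[\d]+)\b/);
  return match ? match[1].replace(/\s+/g, "") : null;
}

console.log(extractPrice("6 999 USD"));   // 6999
console.log(extractPrice("20,-"));        // 20
console.log(extractPrice("Price: $ 20")); // 20
console.log(extractPrice("$2,99"));       // 2,99
```

Because the match must end in digits, the trailing comma in `20,-` is left out; the leading space picked up in `Price: $ 20` is removed by the second replace.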

Dalorzo