Following "Split string that used to be a list", I am doing this:

console.log(lines[line]);
var regex = /(-?\d{1,})/g;
var cluster = lines[line].match(regex);
console.log(cluster);

which will give me this:

((3158), (737))
["3158", "737"]

where 3158 will later be treated as the ID in my program and 737 as the associated data.

I am wondering if there is a way to handle inputs of this kind too:

((3158, 1024), (737))

where the ID will be a pair, and do something like this:

var single_regex = regex_for_single_ID;
var pair_regex = regex_for_pair_ID;
if(single_regex)
  // do my logic
else if(pair_regex)
  // do my other logic
else
  // bad input

Is that possible?


Clarification:

What I am interested in is treating the two cases differently. For example one solution would be to have this behavior:

((3158), (737))
["3158", "737"]

and for pairs, concatenate the ID:

((3158, 1024), (737))
["31581024", "737"]
gsamaras
  • Let me clarify, you expect to get an input of `((3158, 1024), (737))` and you want to match `3158, 1024` as a single ID, is that correct? Or do you expect to match `3158` as an ID and `1024` as an ID and return two IDs? – VLAZ Sep 13 '16 at 18:59
  • Single or pair `(-?\d+)(?:\s*,\s*(-?\d+))?` or `-?\d+(?:\s*,\s*-?\d+)?` or `(-?\d+(?:\s*,\s*-?\d+)?)` It depends on whether you want to see the comma in the array, etc... –  Sep 13 '16 at 18:59
  • vlaz updated. @sln hmm is that correct? I mean is the regex everything you have in code? Both will result in a syntax error if I do `var regex = ...";` – gsamaras Sep 13 '16 at 19:03
  • It's pretty correct. In JS you can slice it like pie, take what you want. There are many ways to do it. –  Sep 13 '16 at 19:06
  • (_with delimiters_): Single or pair `/(-?\d+)(?:\s*,\s*(-?\d+))?/` or `/-?\d+(?:\s*,\s*-?\d+)?/` or `/(-?\d+(?:\s*,\s*-?\d+)?)/` It depends on whether you want to see the comma in the array, etc... –  Sep 13 '16 at 19:10
  • Be aware that a lot of people who don't know how to process text assume their data has a constant form. They post that form with near-constant precision, thinking it will help them get a proper regex. In reality they are not helping themselves, because generated text has no constant form. This leads to errors and frustration. You should code the regex specifically for the _innermost_ constant form, while generalizing the outer form with a few pseudo-anchors here and there... –  Sep 13 '16 at 19:20
  • Hmm I see @sln, it would be nice to summarize all your comments in an answer! ;) – gsamaras Sep 13 '16 at 19:23
  • It's just a comment –  Sep 13 '16 at 19:26

2 Answers

You may use an alternation operator to match either a pair of numbers (capturing them into separate capturing groups) or a single one:

/\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g

See the regex demo

Details:

  • `\((-?\d+), (-?\d+)\)` - a `(`, a number (captured into Group 1), a `,`, a space, the second number of the pair (captured into Group 2) and a `)`
  • `|` - or
  • `\((-?\d+)\)` - a `(`, then a number (captured into Group 3), and a `)`.

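var re = /\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g;
var str = '((3158), (737)) ((3158, 1024), (737))';
var res = [];
var m; // declared explicitly to avoid creating an implicit global
while ((m = re.exec(str)) !== null) {
  if (m[3]) {
    // Group 3 matched: a single number
    res.push(m[3]);
  } else {
    // Groups 1 and 2 matched: a pair, concatenated into one ID
    res.push(m[1] + m[2]);
  }
}
console.log(res); // ["3158", "737", "31581024", "737"]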
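
If each line holds exactly one ID (single or pair) followed by one data value, the same idea can be applied to the whole line at once, which also gives a natural "bad input" branch. The following is only a sketch under that assumption: `parseLine` and the returned `kind`/`id`/`data` fields are hypothetical names, and the optional non-capturing group replaces the alternation used above.

```javascript
// Hypothetical helper: classify one line as a single ID, a pair ID, or bad input.
// Assumes the line has the shape ((id), (data)) or ((id, id2), (data)).
function parseLine(line) {
  // Groups: 1 = first ID, 2 = optional second ID, 3 = data.
  var re = /^\(\((-?\d+)(?:\s*,\s*(-?\d+))?\)\s*,\s*\((-?\d+)\)\)$/;
  var m = re.exec(line.trim());
  if (m === null) {
    return null; // bad input: the line does not have the expected shape
  }
  if (m[2] === undefined) {
    // The optional group did not participate: single ID.
    return { kind: 'single', id: m[1], data: m[3] };
  }
  // Both numbers present: concatenate them into one ID.
  return { kind: 'pair', id: m[1] + m[2], data: m[3] };
}

console.log(parseLine('((3158), (737))'));       // kind "single", id "3158", data "737"
console.log(parseLine('((3158, 1024), (737))')); // kind "pair", id "31581024", data "737"
console.log(parseLine('(3158; 737)'));           // null
```

Checking `m[2] === undefined` rather than truthiness keeps the branch explicit about which group took part in the match.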
Wiktor Stribiżew
  • Wiktor great, that will do the trick, but isn't there a way to have `res` hold the other data as well? That is 737. So for the example, it would give `["3158", "737"]` and `["31581024", "737"]`, instead of `["3158"]` and `["31581024"]`. – gsamaras Sep 13 '16 at 19:10
  • Yes, whatever you push there. Feel free to adjust the code as per your needs. The point is that if Group 3 matched (`if (m[3])`) we know we have a single number, else, we have Group 1 and Group 2 that you may combine or do whatever you please. – Wiktor Stribiżew Sep 13 '16 at 19:19

A simpler way is to use .replace(/(\d+)\s*,\s*/g, '$1') to concatenate the numbers of a pair, and then apply the simple regex match that you are already using.

Example:

var v1 = "((3158), (737))"; // singular string

var v2 = "((3158, 1024), (737))"; // paired number string

var arr1 = v1.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g);
//=> ["3158", "737"]

var arr2 = v2.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g);
//=> ["31581024", "737"]

We use this regex in .replace:

/(\d+)\s*,\s*/
  • It matches and captures one or more digits, followed by optional whitespace, a comma, and more optional whitespace.
  • In the replacement we use $1, the back-reference to the captured number, so the comma and the whitespace after the number are removed.
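
The two steps can be wrapped in one small function. This is a minimal sketch, not part of the original answer: `extractCluster` is a hypothetical name, and it assumes the second number of a pair is not negative (a minus sign there would keep the two numbers from merging into one match).

```javascript
// Hypothetical helper wrapping the replace-then-match approach.
function extractCluster(line) {
  // Merge "3158, 1024" into "31581024" by dropping the comma and spaces
  // that directly follow a run of digits.
  var merged = line.replace(/(\d+)\s*,\s*/g, '$1');
  // match() returns null when nothing matches, so fall back to an empty array.
  return merged.match(/-?\d+/g) || [];
}

console.log(extractCluster('((3158), (737))'));       // ["3158", "737"]
console.log(extractCluster('((3158, 1024), (737))')); // ["31581024", "737"]
console.log(extractCluster('no numbers here'));       // []
```

The comma between the ID and the data is left alone because it follows a `)` rather than a digit, so only the numbers inside one pair of parentheses are merged.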
anubhava
  • Great, that will work. Could you please post an explanation of the regex? I am trying to understand them and learn... :) – gsamaras Sep 13 '16 at 19:26
  • Added a brief explanation for the `.replace` function. For `match` I'm reusing the same regex you already have. – anubhava Sep 13 '16 at 19:31