Hi guys, I have a problem. My input strings look like this:

  • 1-000000.02
  • 1-000000.00
  • 1+000025.48
  • 1-000025.47
  • 1-000000.00
  • 1+000000.00
  • 1+000025.46

And I want to extract normalized float numbers (remove the plus sign and the leading zeros, but keep one zero before the dot), like this:

  • 0.02
  • 0.00
  • 25.48
  • -25.47
  • -0.00
  • 0.00
  • 25.46

I use the following expression: 0*([0-9]+.?[0-9]+) (http://regexr.com/3dicv). It works fine, but I can't capture the minus sign (I get "25.47" instead of "-25.47"). If somebody could point me in the right direction, I would be very grateful. Thanks.

  • What about `1?[+-]0*([0-9]+\.?[0-9]+)`? – Will Jun 04 '16 at 13:21
  • Does [What is a non capturing group? (?:)](http://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group) give you any clues? – Andrew Morton Jun 04 '16 at 13:21
  • Will, I tried a similar variant; in that case we lose the "-" in negative numbers. – Sergey Alikin Jun 04 '16 at 13:45
  • Andrew, thanks for your comment. I'm already using non-capturing groups in my expressions. The problem is that when the number is negative, the leading zeros sit INSIDE the capturing group (-000012.34), so they get captured too. – Sergey Alikin Jun 04 '16 at 13:49
  • Can't you replace `.replace(/^1?([+-])0*([0-9]+(?:\.[0-9]+)?)$/, "$1$2")`? – Wiktor Stribiżew Jun 04 '16 at 13:51
  • Wiktor, thanks. Unfortunately, no. I can't use replace; my final goal is to extract the number without language-specific tools. As a last resort I could use a post-processing function to "normalize" any input number, but I would very much like to avoid that. – Sergey Alikin Jun 04 '16 at 13:56
  • You cannot use a single regex match operation to match discontinuous text. You have to use a replace, or capture the different string parts and concatenate them later. – Wiktor Stribiżew Jun 04 '16 at 14:10
  • I was able to add an optional hyphen (`-?`) to the front of your regexr expression and match the minus signs. – aghast Jun 04 '16 at 15:20
  • Is there a reason for keeping `-` before `0.00` ? I'd more think of [something like this](https://regex101.com/r/fP9bE6/1) but useless if you can't use replace (: – bobble bubble Jun 04 '16 at 17:45

2 Answers

This works in regexr.com

\d[+]?([-]?)0*(\d+\.\d+)

Replace:

$1$2

For these values:

1+000050.93
1-000025.47
1+000000.00
1-000000.00
1+000000.02
1-000000.02
1+100025.47
1-100025.48

It returns :

50.93
-25.47
0.00
-0.00
0.02
-0.02
100025.47
-100025.48

But I don't see why the leading zeros would be an issue if you use this in C++.
The following C++ extracts the number and parses it to a double just fine.

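#include <iostream>
#include <string>
#include <cstdlib>
#include <regex>
using namespace std;

int main() {
    string teststring("1+000001.62");

    // Capture the sign and all digits (leading zeros included) after the leading "1".
    regex re("1([+-][0-9]+[.][0-9]{2})");

    // Keep only the captured part, e.g. "+000001.62".
    string resultstring = regex_replace(teststring, re, "$1");
    // atof ignores the leading zeros when converting to double.
    double value = std::atof(resultstring.c_str());
    cout << value;

    return 0;
}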
LukStorms
  • This regex requires anchors, or it might match text that should not be matched – Wiktor Stribiżew Jun 04 '16 at 14:20
  • Luk, thanks for your example. It's very good, but it produces TWO separate capturing groups, not one. I agree with Wiktor's last comment on the original post: this task is unsolvable with a single regex match. – Sergey Alikin Jun 04 '16 at 14:22
  • So the language you would be using it for can't handle multiple capture groups? A pity if that's the case. – LukStorms Jun 04 '16 at 14:25
  • Luk, of course, my language (C++, Boost, Qt) allows any kind of operation with regexes, but I want a unified method for processing input strings from different instruments, and for that I need exactly one capturing group. Since we have found that my task is unsolvable that way, I will use a post-processing function that contains the replace() and everything else. – Sergey Alikin Jun 04 '16 at 14:30
  • You could use something like this: 1([+-]\d+\.\d{2}). That has only one capture group, but it will contain the leading zeros. Would that really be a problem when converting to a number in C++? – LukStorms Jun 04 '16 at 14:35

Description

You just have to validate your string with a lookahead, then match the substrings you want removed while capturing the minus sign if it is present.

(?=^[0-9][+-][0-9]+\.[0-9]{2}$)(?:[0-9]+(?:(-)|\+))0+(?!\.)

Replace with: $1


This regular expression will do the following:

  • validates that your string is in the format: integer, plus or minus sign, real number with two decimal places
  • replaces everything that is not wanted, such as the leading integer, the plus sign, and the zeros before the decimal point, except for the zero directly before the decimal point.

Example

Live Demo

https://regex101.com/r/mP4gH1/2

Sample text

1-000000.02
1-000000.00
1+000025.48
1-000025.47
1-000000.00
1+000000.00
1+000025.46

After Replacement

-0.02
-0.00
25.48
-25.47
-0.00
0.00
25.46

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    ^                        the beginning of a "line"
----------------------------------------------------------------------
    [0-9]                    any character of: '0' to '9'
----------------------------------------------------------------------
    [+-]                     any character of: '+', '-'
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    $                        before an optional \n, and the end of a
                             "line"
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      (                        group and capture to \1:
----------------------------------------------------------------------
        -                      '-' character
----------------------------------------------------------------------
      )                        end of \1
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      \+                       '+' character
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  0+                       '0' (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
Ro Yo Mi