12

I have the following string as input:

"2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"

The final output should be:

"2,3,-4,0,0,.03,2.01,.001,-.03,101"

That is, all leading and trailing zeros are removed, and both positive and negative zeros become simply 0.

This can be done by splitting the string first and applying a regex to each part, but my string's size is more than 10000.
How can this be achieved with a regex alone?

Edit:

Analysis of Answers:

I have tested all answers with the string "0.00,-0.00,00.00,-00.00,40.00,-40.00,4.0,-4.0,4.01,-4.01,04.01,-04.01,004.04,-004.04,0004.040,-0004.040,101,.40,-.40,0.40,-0.40", and the answer from Wiktor Stribiżew passed all the test cases (see https://regex101.com/r/tS8hE3/9). The other answers passed most of the cases, but not all.

codevscolor
  • I have only split the string and then used a regex on each part separately, but that will not be efficient for large strings. How can I achieve this without split? – codevscolor Jan 25 '16 at 05:33
  • How about giving this solution a try, to see if it works for you: http://stackoverflow.com/questions/5965767/performance-of-stringtokenizer-class-vs-split-method-in-java – Manish Singh Jan 25 '16 at 05:41
  • Processing chars one by one and collecting them in a `StringBuilder` would be much faster and more space-efficient than regex – Alex Salauyou Feb 15 '16 at 15:23
  • After a fix this gets your output: https://regex101.com/r/rQ2rG5/1. Just curious, since you gave stribnetz all the gold, is there any reason to believe your input is all valid numbers? You can parse text to convert it to a number. If it's not valid it will throw an exception. Otherwise, this is an exercise in futility, i.e. _why trim zeros from a number if it's not a number_? If you don't do this, you must validate while you parse, which is what mine did. Take a look at this as an example of what will happen: https://regex101.com/r/aH6gX0/1 –  Feb 17 '16 at 01:11
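
The character-by-character approach suggested in the comments could look roughly like the sketch below. It is not taken from any of the answers: the class and method names (`ZeroTrimmer`, `trimZeros`, `appendField`) are made up for illustration, and it assumes every comma-separated field is a well-formed decimal number with an optional sign (a leading `+` is dropped).

```java
class ZeroTrimmer {
    /** Normalizes every comma-separated field of csv in one left-to-right pass. */
    static String trimZeros(String csv) {
        StringBuilder out = new StringBuilder(csv.length());
        int start = 0;
        while (start <= csv.length()) {
            int end = csv.indexOf(',', start);
            if (end < 0) end = csv.length();      // last field runs to the end
            if (start > 0) out.append(',');       // re-insert the delimiter
            appendField(csv, start, end, out);
            start = end + 1;
        }
        return out.toString();
    }

    /** Appends csv[start, end) without redundant zeros; assumes a well-formed decimal. */
    private static void appendField(String csv, int start, int end, StringBuilder out) {
        if (start == end) return;                 // keep empty fields empty
        boolean negative = csv.charAt(start) == '-';
        if (negative || csv.charAt(start) == '+') start++;

        int dot = csv.indexOf('.', start);
        if (dot < 0 || dot >= end) dot = -1;      // no decimal point in this field
        int intEnd = (dot < 0) ? end : dot;

        int i = start;                            // skip leading zeros of the integer part
        while (i < intEnd && csv.charAt(i) == '0') i++;
        int j = end;                              // drop trailing zeros of the fraction
        if (dot >= 0) {
            while (j > dot + 1 && csv.charAt(j - 1) == '0') j--;
        }
        boolean hasInt = i < intEnd;
        boolean hasFrac = dot >= 0 && j > dot + 1;

        if (!hasInt && !hasFrac) {                // 0, -0, 0.00, -00.00 all become 0
            out.append('0');
            return;
        }
        if (negative) out.append('-');
        out.append(csv, i, intEnd);
        if (hasFrac) out.append(csv, dot, j);     // includes the '.'
    }

    public static void main(String[] args) {
        System.out.println(trimZeros("2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"));
        // prints 2,3,-4,0,0,.03,2.01,.001,-.03,101
    }
}
```

The input is scanned once and nothing is split into substrings; each field is copied into the `StringBuilder` by index range only.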

7 Answers

3
\.0+$|^(-)?0+(?=\.)

You can try this. Replace with $1. If you get an empty string or - after the replacement, replace it with 0. See demo.

https://regex101.com/r/cZ0sD2/7

If you want to apply it to the full string, use

-?0*\.0+\b|\.0+(?=,|$)|(?:^|(?<=,))(-)?0+(?=\.)

See demo.

https://regex101.com/r/cZ0sD2/16
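
A minimal Java sketch of how the full-string pattern might be applied, including the follow-up step for fields that end up empty. The second pattern (`EMPTY_FIELD`) and the class and method names are assumptions added for illustration and are not part of the answer.

```java
class TwoPassZeroTrim {
    // Full-string pattern from the answer; replacing with "$1" keeps only a captured '-'.
    static final String STRIP = "-?0*\\.0+\\b|\\.0+(?=,|$)|(?:^|(?<=,))(-)?0+(?=\\.)";

    // Assumption (not from the answer): matches a field that is empty or a lone '-'.
    static final String EMPTY_FIELD = "(?<=^|,)-?(?=,|$)";

    static String trimZeros(String csv) {
        String stripped = csv.replaceAll(STRIP, "$1");   // zero-valued fields become empty here
        return stripped.replaceAll(EMPTY_FIELD, "0");    // second traversal restores them as 0
    }

    public static void main(String[] args) {
        System.out.println(trimZeros("2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"));
        // prints 2,3,-4,0,0,.03,2.01,.001,-.03,101
    }
}
```

The second `replaceAll` is the extra traversal of the string that the comments below refer to.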

vks
  • OK, it can be used, but I will have to split the string into substrings and apply it to each substring separately. Instead of this, how can I directly check the strings between commas and make the modifications? – codevscolor Jan 25 '16 at 06:36
  • The test string should be "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101", not individual elements. – codevscolor Feb 11 '16 at 05:17
  • Almost correct, but I will have to traverse the string one more time to replace - with 0. Is there any option for this? – codevscolor Feb 11 '16 at 06:07
  • Yes, but what about "-" and blank fields? Do we have to traverse the string one more time to replace them with 0? – codevscolor Feb 11 '16 at 06:37
  • Sorry for removing the accepted mark. Your answer does not work with .50 – codevscolor Feb 12 '16 at 06:01
  • .5, i.e. we will remove all extra zeros from a number (an extra zero is one whose removal leaves the value of the number unchanged). – codevscolor Feb 12 '16 at 06:05
  • Sorry, but other answers have a much better solution (i.e. without traversing twice). – codevscolor Feb 16 '16 at 12:58
3

Updated test case answer

Use the following regex:

String rx = "-?0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])|\\b0+(?=[1-9]\\d*\\.)|(\\.\\d*?)0+\\b";

And replace with $1$2. See another demo.

The regex matches several alternatives and captures some parts of the string to later re-insert during replacement:

  • -?0+\.(0)+\b - matches an optional - followed by one or more 0s, a ., and then one or more 0s; because the + quantifier is applied to the group (0), the group captures a single 0 (the last one matched). The word boundary at the end requires that the last matched 0 is not followed by another word character, i.e. it is followed by , or the end of the string. In the replacement, the 0 is restored with the $1 backreference. So, -00.00 or 00.00 will be replaced with 0.
  • | - or...
  • \.0+\b - a dot followed by one or more zeros before a , or the end of the string (since the string is comma-delimited).
  • | - or...
  • \b0+(?=\.\d*[1-9]) - a word boundary (start of string, or a location after , or -) followed by one or more 0s that are followed by a . + zero or more digits and then a non-0 digit (so, we remove an integer part that consists only of zeros when the decimal part is not zero)
  • | - or...
  • \b0+(?=[1-9]\d*\.) - a word boundary followed by one or more zeros that are followed by a non-0 digit, optional further digits and a . (so, we remove all leading zeros from an integer part that is not equal to 0).
  • | - or...
  • (\.\d*?)0+\b - captures a . + zero or more digits, as few as possible, into group 2, and then matches one or more zeros up to the end of string or , (so, we get rid of trailing zeros in the decimal part, and $2 restores the part that must stay)
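
For reference, a short sketch that applies this pattern with `String.replaceAll` in a single pass. The class and method names are placeholders chosen for illustration; the pattern and the `$1$2` replacement are the ones given above.

```java
class UpdatedRegexDemo {
    // Group 1 is the (0) of the first alternative, group 2 the (\.\d*?) of the last one.
    static final String RX =
        "-?0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])|\\b0+(?=[1-9]\\d*\\.)|(\\.\\d*?)0+\\b";

    static String trimZeros(String csv) {
        // A group that did not take part in the match contributes an empty string.
        return csv.replaceAll(RX, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(trimZeros("2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"));
        // prints 2,3,-4,0,0,.03,2.01,.001,-.03,101
    }
}
```

For repeated calls on very large inputs, compiling the pattern once with `java.util.regex.Pattern.compile` and reusing it avoids recompiling on every `replaceAll`.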

Answer before the test cases update

I suggest a very simple and short regex that does what you need:

-0+\.(0)+\b|\.0+\b|\b0+(?=\.\d*[1-9])

Replace with $1.

See the regex demo. Short IDEONE demo:

String re = "-0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])"; 
String str = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,0.001,-0.03";
String expected = "2,3,-4,0,0,.03,2.01,.001,-.03,101,.001,-.03"; 
System.out.println(str.replaceAll(re, "$1").equals(expected)); // TRUE

Explanation:

  • -0+\.(0)+\b - a minus followed by one or more 0s (0+), a literal dot (\.) and one or more zeros (capturing just the last 0 matched with (0)+), followed by a word boundary (location before , in this context)
  • | - or...
  • \.0+\b - a literal dot (\.) followed by one or more zeros and a word boundary (location before , in this context)
  • | - or...
  • \b0+(?=\.\d*[1-9]) - a word boundary (location after , in this context) followed by one or more zeros that must be followed by a literal dot (\.), then zero or more digits and then a digit in the 1 to 9 range (so that the decimal part is not zero).
Community
Wiktor Stribiżew
2

UPDATE to cover more cases, such as 01., .100 and 01.10

(?<=,|^)(?:[0.+-]+(?=0(?:,|\.\B|$))|0+(?=[1-9]))|\.0+\b|\b0+(?=\d*\.\b)|\.\B|(?<=[1-9])0+(?=,|$)

This pattern requires more backtracking, so it can get slower on large input. As a Java String:

"(?<=,|^)(?:[0.+-]+(?=0(?:,|\\.\\B|$))|0+(?=[1-9]))|\\.0+\\b|\\b0+(?=\\d*\\.\\b)|\\.\\B|(?<=[1-9])0+(?=,|$)"

Compared with the previous pattern, this one has the following additions and changes:

  • (?<=,|^)(?:...|0+(?=[1-9])) additionally matches leading zeros preceding [1-9]
  • \.0+\b is modified to match a period followed only by zeros before a word boundary
  • \b0+(?=\d*\.\b) matches zeros at a boundary if a period, preceded by optional digits, is ahead
  • \.\B matches a period that is not at a word boundary (e.g. the period in .,)
  • (?<=[1-9])0+(?=,|$) matches trailing zeros following [1-9]

Demo at regex101 or Regexplanet (click Java)


Answer before update
You can also try replaceAll with this regex and an empty replacement string.

(?<=,|^)[0.+-]+(?=0(?:,|$))|\.0+\b|\b0+(?=\.)
  • (?<=,|^)[0.+-]+(?=0(?:,|$)) matches all parts that consist only of [0.+-] with at least a trailing zero. Limited by use of lookaround assertions: (?<=,|^) and (?=0(?:,|$))

  • |\.0+\b or match a period followed by one or more zeros and a word boundary.

  • |\b0+(?=\.) or match a boundary followed by one or more zeros if a period is ahead.

Cases not asked about in the question, such as 0., 01 and 1.10, are not covered by this pattern yet. As a Java String:

"(?<=,|^)[0.+-]+(?=0(?:,|$))|\\.0+\\b|\\b0+(?=\\.)"

Demo at regex101 or Regexplanet (click Java)
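
A small sketch of how this earlier pattern can be run against the sample input from the question. The class and method names are illustrative only; as stated above, inputs such as 0., 01 or 1.10 are outside what this pattern handles.

```java
class EarlierPatternDemo {
    // The "answer before update" pattern, as a Java string literal.
    static final String RX = "(?<=,|^)[0.+-]+(?=0(?:,|$))|\\.0+\\b|\\b0+(?=\\.)";

    static String trimZeros(String csv) {
        // Every match is simply deleted.
        return csv.replaceAll(RX, "");
    }

    public static void main(String[] args) {
        System.out.println(trimZeros("2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"));
        // prints 2,3,-4,0,0,.03,2.01,.001,-.03,101
    }
}
```

For an all-zero field such as -0.00, the first alternative removes everything except the final 0, which is why no second pass is needed.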

bobble bubble
  • Yes, some parts are not covered. We need to remove all zeros that are not required – codevscolor Feb 12 '16 at 06:03
  • Sorry for replying late. Your pattern fails for cases that have more than two zeros before the decimal point, like -004.04; this value remains the same – codevscolor Feb 16 '16 at 12:43
  • @nKaushik I see, I made a small modification. There are [other cases that my answer treats differently](https://regex101.com/r/sA7oE6/1) from the [selected answer](https://regex101.com/r/aO6nM1/1). – bobble bubble Feb 16 '16 at 14:05
1

Using the list of numbers from your question, and some additional ones, the following regex replace will remove all leading and trailing zeros.

numbers.replaceAll("\\b0*([1-9]*[0-9]+)(\\.[0-9]*[1-9])?\\.?0*\\b", "$1$2");

with input:

2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,101.1010,0020.00

the result is:

2,3,-4,0,-0,0.03,2.01,0.001,-0.03,101,101.101,20

If you want to have decimals without the leading 0 then you can use the following.

numbers.replaceAll("\\b0*([0-9]+)(\\.[0-9]*[1-9])?\\.?0+\\b|0+(\\.[0-9]+?)0*\\b", "$1$2$3");

with input:

2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,101.1010,0020.00

the result is:

2,3,-4,0,-0,.03,2.01,.001,-.03,101,101.101,20

roblovelock
0

You can do it with two replacements:

First use \.0+(?=(,|$)) and replace with "".

Then use (?!(^|,))-0(?=(,|$)) and replace it with "0".
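
Chained in Java, the two replacements might look like the sketch below (class and method names are placeholders). Note that with these two patterns as written, a leading 0 before the decimal point, as in 0.03, is left in place.

```java
class TwoReplacementsDemo {
    static String trimZeros(String csv) {
        return csv
            .replaceAll("\\.0+(?=(,|$))", "")           // drop an all-zero fraction: 3.00 -> 3
            .replaceAll("(?!(^|,))-0(?=(,|$))", "0");   // turn a remaining -0 field into 0
    }

    public static void main(String[] args) {
        System.out.println(trimZeros("2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"));
        // prints 2,3,-4,0,0,0.03,2.01,0.001,-0.03,101
    }
}
```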

nAviD
0

Is it possible to just use replace? For example:

str.replaceAll("\\.0+,|,0+(?=\\.)", ",");

demo

bmbigbang
0

/(?!-)(?!0)[1-9][0-9]*\.?[0-9]*[1-9](?!0)|(?!-)(?!0)\.?[0-9]*[1-9](?!0)/g

Ryan G