Detect and replace multiple leading zeros in string

Question

I'm working with an address database. For further cleaning I need to identify the leading zeros that are stored in the string containing the door number. So a clean and friendly case would be something like 7/5 - where 7 would be the house number, 5 the door number.

The problem is that in some cases there are leading zeros involved, not only at the beginning of the string but also in the middle. And of course, there are also "normal" and necessary zeros. So I could have an address like 7/50, where the zero is totally fine and should stay there. But I could also have 007/05, where all the zeros need to go away. So the only pattern I can think of is to check whether there is a digit greater than zero before that zero. If not, delete the zero; if so, keep it.

Is there any function to achieve something like this? Or would this require something custom built? Thanks for any suggestions!

ThomasIsCoding · Answer 1 · 2020-12-09T09:54:49.090

You can try the base R option below, using gsub:

> gsub("\\b0+", "", s)
[1] "1/1001001" "7/50"      "7/50"      "7/5"       "7/5"

with given

s <- c("01/1001001", "07/050", "0007/50", "7/5", "007/05")
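
To see why the word boundary matters, here is a minimal sketch that puts the pattern above next to the `"0+([1-9])"` idea from the comments. It assumes the door-number strings contain only digits and `/`; the variable names `leading_only` and `no_boundary` are made up for illustration.

```r
s <- c("01/1001001", "07/050", "0007/50", "7/5", "007/05")

# \\b matches at the start of each digit group, so 0+ can only
# consume a run of zeros that leads the group.
leading_only <- gsub("\\b0+", "", s)
print(leading_only)

# Without the boundary, any run of zeros followed by 1-9 is dropped,
# including zeros in the middle of a number such as 1001001.
no_boundary <- gsub("0+([1-9])", "\\1", s)
print(no_boundary)
```

The first call reproduces the result shown above, while the second one turns "01/1001001" into "1/111", which is why the boundary version is the safer choice for numbers with inner zeros.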

edited Dec 09 '20 at 09:54

answered Dec 09 '20 at 09:20

ThomasIsCoding


Hmm. For `s – MrFlick Dec 09 '20 at 09:24
This looks very promising. Could you briefly explain? I look for 0 followed by 1-9 `"0+([1-9])"`, if found `"\\1"` – Jakob Dec 09 '20 at 09:26
@MrFlick Thanks for your reminder! I updated my answer, hope it works – ThomasIsCoding Dec 09 '20 at 09:27
Good point, @MrFlick. I just had a look, such numbers do exist. I will have a look at your suggestion next. – Jakob Dec 09 '20 at 09:28
@Jakob `\\1` means you keep the pattern matched in the first `()` – ThomasIsCoding Dec 09 '20 at 09:30
@Jakob I updated my solution with a shorter one, so you can see if it works for you – ThomasIsCoding Dec 09 '20 at 10:56

score 0 · Accepted Answer · answered Dec 09 '20 at 09:19

Maybe a negative lookbehind will help:

x <- c("7/50", "7/5", "007/05")
stringr::str_remove_all(x, "\\b(?<![1-9])0+")
# [1] "7/50" "7/5"  "7/5"

Hard to say for sure with such a limited set of test cases.
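
A few extra test cases may help decide whether the pattern fits the real data. The sketch below is an assumption-laden illustration, not part of the original answer: it swaps stringr for base R `gsub()` with `perl = TRUE` (which also supports lookbehind), adds two invented inputs ("10/0" and "000/05"), and tries a hypothetical lookahead variant that keeps one zero when a group consists only of zeros.

```r
# The last two inputs are invented edge cases: a group made up only of zeros.
x <- c("7/50", "7/5", "007/05", "10/0", "000/05")

# Same pattern as above, run through base R with PCRE enabled.
# An all-zero group is removed completely.
strip_all <- gsub("\\b(?<![1-9])0+", "", x, perl = TRUE)
print(strip_all)

# Hypothetical variant: only drop zeros that are still followed by a digit,
# so an all-zero group collapses to a single "0" instead of vanishing.
keep_one <- gsub("\\b0+(?=\\d)", "", x, perl = TRUE)
print(keep_one)
```

With the original pattern, "10/0" becomes "10/" and "000/05" becomes "/5". Whether that is acceptable depends on whether a door number of 0 can occur in the database.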

MrFlick


Great answer, seems to work as intended. Could you maybe also briefly explain what the term `"\\b(? – Jakob Dec 09 '20 at 09:35
This doesn’t really have anything to do with R. stringr uses regular expressions, a syntax that is defined outside of R and used by many programming languages. Maybe https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean can help you. – MrFlick Dec 09 '20 at 09:37
That's perfect. I came across regex multiple times when reading answers - but I was not always sure what to make out of it! – Jakob Dec 09 '20 at 09:39
