
I'm working with an address database. For further cleaning I need to identify the leading zeros that are stored in the string containing the door number. So a clean and friendly case would be something like 7/5 - where 7 would be the house number, 5 the door number.

The problem is that in some cases, there are leading zeros involved - not only at the beginning of the string, but also in the middle. And of course, there are also "normal" and necessary zeros. So I could have an address like 7/50, where the zero is totally fine and should stay there. But I could also have 007/05, where all the zeros need to go away. So the only pattern I can think of is to check whether there is a digit greater than zero before that zero. If not, delete that zero; if so, keep it.

Is there any function to achieve something like this? Or would this require something custom built? Thanks for any suggestions!

Jakob

2 Answers


You can try the base R option below, which uses gsub

> gsub("\\b0+", "", s)
[1] "1/1001001" "7/50"      "7/50"      "7/5"       "7/5"

with given

s <- c("01/1001001", "07/050", "0007/50", "7/5", "007/05")
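One case the sample data does not cover is a component made up entirely of zeros (for example a house number of `0`): `\\b0+` would delete it completely. The following is only a sketch of a possible variant, assuming such a component should keep a single zero. The helper name `strip_leading_zeros` is made up for illustration, and the lookahead requires `perl = TRUE`.

```r
# Hypothetical helper: remove leading zeros from each numeric component,
# but only when at least one more digit follows, so "0" itself survives.
strip_leading_zeros <- function(x) {
  # \\b    : start of a run of word characters (string start or after "/")
  # 0+     : one or more zeros
  # (?=\\d): lookahead, the removed zeros must be followed by a digit
  gsub("\\b0+(?=\\d)", "", x, perl = TRUE)
}

s <- c("01/1001001", "07/050", "0007/50", "7/5", "007/05", "0/05", "100/7")
print(strip_leading_zeros(s))
```

Zeros inside or at the end of a number (`1001001`, `50`, `100`) are untouched because no word boundary precedes them.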
ThomasIsCoding

Maybe a negative lookbehind will help

x <- c("7/50", "7/5", "007/05")
stringr::str_remove_all(x, "\\b(?<![1-9])0+")
# [1] "7/50" "7/5"  "7/5"

Hard to say for sure with such a limited set of test cases.
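To make the pieces of that pattern easier to follow, here is a small sketch that runs it through base R's `gsub` with `perl = TRUE` (an assumption made only to avoid the stringr dependency) and compares it with the same pattern minus the `\\b`. The extra test value `"7/500"` is not from the question; it is added here to show what each part contributes.

```r
x <- c("7/50", "7/5", "007/05", "7/500")

# \\b        : word boundary, i.e. string start or right after "/"
# (?<![1-9]) : negative lookbehind, the previous character is not 1-9
# 0+         : the run of zeros to remove
with_boundary <- gsub("\\b(?<![1-9])0+", "", x, perl = TRUE)

# Without \\b, a zero that follows another zero also passes the lookbehind,
# so the second zero of "500" would be removed.
lookbehind_only <- gsub("(?<![1-9])0+", "", x, perl = TRUE)

print(with_boundary)
print(lookbehind_only)
```

In this sketch the word boundary is what protects trailing zeros such as those in `500`; the lookbehind on its own only guards the first zero after a nonzero digit.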

MrFlick
  • Great answer, seems to work as intended. Could you maybe also briefly explain what the term `"\\b(? – Jakob Dec 09 '20 at 09:35
  • This doesn’t really have anything to do with R. stringr uses regular expressions, a syntax defined outside of R and used by many programming languages. Maybe https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean can help you. – MrFlick Dec 09 '20 at 09:37
  • That's perfect. I came across regex multiple times when reading answers, but I was never quite sure what to make of it! – Jakob Dec 09 '20 at 09:39