
I need to replace every run of more than two consecutive 1s with the same number of zeros. I can currently find the matches as shown below, but I don't know how to replace each match with exactly as many zeros as the match is long.

ind<-c(1,1,0,0,0,1,1,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1,1,0)
gsub("([1])\\1\\1+","0",paste0(ind,collapse=""))

gives

"11000001100011010010101000101000"   

as it replaces the match with just one 0 but I need

"11000000001100011010010101000000010100000"
Saksham

2 Answers


You can use the following gsub replacement:

ind<-c(1,1,0,0,0,1,1,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1,1,0)
gsub("1(?=1{2,})|(?!^)\\G1", "0", paste(ind, collapse = ""), perl = TRUE)

See the IDEONE demo; the result is [1] "11000000001100011010010101000000010100000".

The regex needs perl=TRUE because it uses a look-ahead and the \G anchor, which R's default regex engine does not support.

This regex matches:

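  • 1 - a literal 1, but only if...
  • (?=1{2,}) - it is followed by 2 or more 1s (so it starts or continues a run of at least three), or...
  • (?!^)\\G1 - any 1 that begins exactly where the previous match ended. The (?!^) part stops \G from matching at the very start of the string.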
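The behaviour of the two alternatives can be checked on a few short strings. This is only a sketch: `zero_long_runs` is a hypothetical helper name that wraps the gsub call from this answer, and the test strings are made up for illustration.

```r
# Hypothetical wrapper around the gsub call above (the name is an assumption).
zero_long_runs <- function(s) {
  gsub("1(?=1{2,})|(?!^)\\G1", "0", s, perl = TRUE)
}

# A run of two 1s: the look-ahead fails and \G has no earlier match to continue.
zero_long_runs("11")       # "11"

# A run of three: the first 1 matches via the look-ahead,
# the remaining two match via \G.
zero_long_runs("111")      # "000"

# A long run is zeroed; the isolated trailing 1 is left alone.
zero_long_runs("0111101")  # "0000001"
```

Each 1 is replaced one character at a time, which is why the output always has the same length as the input.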

For more details on the \G operator, see "What good is \G in a regular expression?" at perldoc.perl.org, and the SO post "When is \G useful application in a regex?".

Wiktor Stribiżew
  • what if i want to do reverse. I want to replace sequence of 3 or less 1's with 0? – Saksham Sep 05 '15 at 07:42
  • and could you tell me what to google for `\\G1`. I am seeing this for the first time – Saksham Sep 05 '15 at 07:44
  • I added some links to the `\G` operator descriptions. When you need to replace sequences of 1 to 3 `1`s, you can use nested look-ahead with a look-behind to restrict the number of `1`s: [`(? – Wiktor Stribiżew Sep 05 '15 at 20:35

A solution that uses rle (run-length encoding) instead of a regex:

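x <- rle(ind)                                 # run lengths and their values
x$values[x$lengths > 2 & x$values == 1] <- 0  # zero out runs of more than two 1s
inverse.rle(x)                                # expand back to a full vector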

#[1] 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
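To make the rle steps easier to follow, here is a small sketch on a shorter vector. The input and the variable names `runs`, `long_ones`, and `result` are assumptions chosen for illustration. It also collapses the result to a string, since that is the form the question asks for.

```r
# Short illustrative input (not the vector from the question).
ind <- c(1, 1, 0, 1, 1, 1, 0)

runs <- rle(ind)
runs$lengths  # 2 1 3 1
runs$values   # 1 0 1 0

# Select runs of 1s that are longer than two and set their value to 0.
long_ones <- runs$lengths > 2 & runs$values == 1
runs$values[long_ones] <- 0

# Rebuild the vector and collapse it into a single string.
result <- paste(inverse.rle(runs), collapse = "")
result  # "1100000"
```

Because only the run values change and the run lengths are untouched, the rebuilt vector has the same length as the input.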
Colonel Beauvel