I have a string containing Unicode newline characters that I need to remove.

These characters can be carriage return \U000D, newline \U000A, line separator or paragraph separator.

I am able to remove the carriage return and newline characters by using the following.

gsub("\\s", "", x)

As I said, this works fine for those two characters, but it does not remove the line separator \U2028 or the paragraph separator \U2029.

Is there another way to do this?

user3856888

1 Answer

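You can switch on PCRE using perl=T (short for perl=TRUE) and use the handy escape sequence \R, which matches any Unicode newline sequence, including the line separator \U2028 and the paragraph separator \U2029.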

> x <- 'foo\U000D\U000A bar\U2029 baz\U2028\U2029'
> x
## [1] "foo\r\n bar\u2029 baz\u2028\u2029"
> gsub('\\R', '', x, perl=T)
## [1] "foo bar baz"
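
If you need this in more than one place, the same call can be wrapped in a small helper. This is only a sketch: the name strip_newlines and its replacement argument are made up for illustration and are not part of base R. It assumes the input is UTF-8 encoded, as strings written with \U escapes are.

```r
# Hypothetical helper (name and arguments are illustrative, not base R):
# remove every newline sequence matched by PCRE's \R, which covers
# \r, \n, \r\n, U+2028 and U+2029 among others.
strip_newlines <- function(x, replacement = "") {
  # perl = TRUE switches gsub() to PCRE, which is what understands \R
  gsub("\\R", replacement, x, perl = TRUE)
}

# gsub() is vectorised, so the helper works on a whole character vector
x <- c("foo\U000D\U000A bar\U2029 baz\U2028\U2029", "one\U000Atwo", "no breaks")
strip_newlines(x)
## [1] "foo bar baz" "onetwo"      "no breaks"
```

Passing replacement = " " instead would keep words on either side of a break from running together, as "onetwo" does above.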
hwnd