
This type of question has been asked many times, but I could not find an answer that fits my needs.

I know a few ways of splitting strings in R. If I have a string x <- "AGCAGT" and want to split it into substrings of three characters, I would do this with

substring(x, seq(1, nchar(x)-1, 3), seq(3, nchar(x), 3))

and into substrings of two characters, much faster, with

split <- strsplit(x, "")[[1]]
substrg <- paste0(split[c(TRUE, FALSE)], split[c(FALSE, TRUE)])

As a new user of R, I find it difficult to split a string according to my requirements. If x <- "AGCTG" and I use substring(x, seq(1, nchar(x)-1, 3), seq(3, nchar(x), 3)), I do not get the final two-character substring. Instead I get

"AGC" ""

However, I would like to get something like

"AGC" "TG"

Or, if I have x <- "AGCT" and split it three characters at a time, I would like to get something like

"AGC" "T"

In short, how can I split a string into substrings of a desired length (2, 3, 4, 5, ..., n) while also keeping any remaining characters that are fewer than the desired length?

nicola

2 Answers


Here is one possible solution in a few simple steps.

x <- "AGCGGCCAGCTGCCTGAA"

# desired length
mylen <- 5

# start indices
start <- seq(1, nchar(x), mylen)

# end indices, capped at the string length so the last chunk may be shorter
end <- pmin(start + mylen - 1, nchar(x))

substring(x, start, end)
[1] "AGCGG" "CCAGC" "TGCCT" "GAA" 
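
For reuse, the same steps can be wrapped in a small function. This is only a sketch: the name split_fixed and its arguments are made up for illustration and are not part of base R, and it assumes a single string as input. The extra guard for an empty string is there because seq(1, 0, by = n) would otherwise raise an error.

```r
# Hypothetical helper (not in base R): split the single string `x` into
# chunks of `n` characters, keeping a shorter final chunk when nchar(x)
# is not a multiple of `n`.
split_fixed <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  len <- nchar(x)
  # An empty string has no chunks; seq(1, 0, by = n) would fail here.
  if (len == 0) return(character(0))
  start <- seq(1, len, by = n)      # first position of each chunk
  end <- pmin(start + n - 1, len)   # last position, capped at the string end
  substring(x, start, end)
}

split_fixed("AGCTG", 3)
# [1] "AGC" "TG"
split_fixed("AGCT", 3)
# [1] "AGC" "T"
```

Both calls use the strings from the question and return the shorter trailing piece instead of an empty string.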
cdeterman

This answer is by zx8754, who unfortunately deleted it after the question was marked as a duplicate. If he would like to post it as an answer himself, I'll delete my post.

x <- "AGCGGCCAGCTGCCTGAA"
mylen <- 5 
ss <- strsplit(x, "")[[1]]
sapply(split(ss, ceiling(seq_along(ss)/mylen)), paste, collapse = "")
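
To see how the grouping works, here is a step-by-step sketch on the shorter string from the question. The variable names grp and chunks are only illustrative. It swaps sapply() for vapply() so the result type is explicit, and uses unname() because the split groups would otherwise be carried along as names ("1", "2", ...) on the result.

```r
x <- "AGCTG"
mylen <- 3
ss <- strsplit(x, "")[[1]]

# Group index for each character: positions 1-3 fall in group 1,
# positions 4-5 in group 2.
grp <- ceiling(seq_along(ss) / mylen)
grp
# [1] 1 1 1 2 2

# split() collects the characters by group index; vapply() pastes each
# group back into one string; unname() drops the group labels.
chunks <- unname(vapply(split(ss, grp), paste, character(1), collapse = ""))
chunks
# [1] "AGC" "TG"
```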