Questions tagged [strsplit]

strsplit is a function in R and MATLAB which splits the elements of a character vector around a given delimiter.

strsplit is a function in R (documentation) and MATLAB (documentation), which splits the elements of a character vector into substrings:

# R
strsplit(x, split, fixed = FALSE)
% MATLAB
strsplit(x, split);

strsplit splits a character string or vector of character strings using a regular expression or a literal (fixed) string. It returns a list (R) or cell array (MATLAB). In R, the list has one item per element of x, and each item holds the pieces of that element.

  • x: a character string or vector of character strings to split.
  • split: the character string at which to split x.
    In R, if split is an empty string (""), x is split between every character.
  • fixed [R only]: whether split should be treated as fixed (i.e. literally). The default is FALSE, which means that split is treated as a regular expression.
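The behaviour described above can be sketched in R as follows. This is a minimal illustration, not part of the official documentation: the sample strings and variable names (parts, regex_split, literal_split, chars) are chosen only for the example.

```r
# Sample input, made up for illustration.
x <- c("a,b,c", "d,e")

# The result is a list with one item per element of x.
parts <- strsplit(x, ",")
# parts[[1]] holds "a" "b" "c"; parts[[2]] holds "d" "e"

# By default split is a regular expression, so "." matches any character
# and every piece comes back empty.
s <- "I.want.to.split"
regex_split <- strsplit(s, ".")[[1]]

# With fixed = TRUE the period is taken literally.
literal_split <- strsplit(s, ".", fixed = TRUE)[[1]]
# literal_split holds "I" "want" "to" "split"

# An empty split string breaks the input between every character.
chars <- strsplit("abc", "")[[1]]
# chars holds "a" "b" "c"
```

Because the result is a list, a single input string is usually followed by [[1]] or unlist() to get a plain character vector.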
559 questions
123
votes
6 answers

Split delimited strings in a column and insert as new rows

I have a data frame as follows:

+-----+-------+
| V1  | V2    |
+-----+-------+
| 1   | a,b,c |
| 2   | a,c   |
| 3   | b,d   |
| 4   | e,f   |
| .   | .     |
+-----+-------+

Each letter is a character separated by a comma. I would like to…
Boxuan
  • 4,147
  • 5
  • 31
  • 66
60
votes
12 answers

Chopping a string into a vector of fixed width character elements

I have an object containing a text string: x <- "xxyyxyxy" and I want to split that into a vector with each element containing two letters: [1] "xx" "yy" "xy" "xy" It seems like the strsplit should be my ticket, but since I have no regular…
JD Long
  • 55,115
  • 51
  • 188
  • 278
35
votes
3 answers

How to use the strsplit function with a period

I would like to split the following string by its periods. I tried strsplit() with "." in the split argument, but did not get the result I want. s <- "I.want.to.split" strsplit(s, ".") [[1]] [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" The…
user3022875
  • 7,420
  • 21
  • 80
  • 145
32
votes
2 answers

Split a string by any number of spaces

I have the following string: [1] "10012 ---- ---- ---- ---- CAB UNCH CAB" I want to split this string by the gaps, but the gaps have a variable number of spaces. Is there a way to use strsplit()…
Stu
  • 1,249
  • 3
  • 15
  • 30
32
votes
1 answer

Non character argument in R string split function (strsplit)

This works:
x <- "0.466:1.187:2.216:1.196"
y <- as.numeric(unlist(strsplit(x, ":")))
Values of blat$LRwAvg all look like x above, but this doesn't work:
for (i in 1:50){ y <- as.numeric(unlist(strsplit(blat$LRwAvg[i], "\\:"))) …
AWE
  • 3,748
  • 8
  • 30
  • 38
31
votes
10 answers

How to avoid a loop in R: selecting items from a list

I could solve this using loops, but I am trying to think in vectors so my code will be more R-esque. I have a list of names. The format is firstname_lastname. I want to get out of this list a separate list with only the first names. I can't seem to…
JD Long
  • 55,115
  • 51
  • 188
  • 278
27
votes
3 answers

Why does strsplit use positive lookahead and lookbehind assertion matches differently?

Common sense and a sanity-check using gregexpr() indicate that the look-behind and look-ahead assertions below should each match at exactly one location in testString: testString <- "text XX text" BB <- "(?<= XX )" FF <- "(?= XX…
Josh O'Brien
  • 148,908
  • 25
  • 332
  • 435
23
votes
5 answers

strsplit on first instance

I would like to write a strsplit command that grabs the first ")" and splits the string. For example: f("12)34)56") "12" "34)56" I have read over several other related regex SO questions but I am afraid I am not able to make heads or tails of this.…
Francis Smart
  • 3,316
  • 5
  • 27
  • 54
22
votes
5 answers

Create new column with dplyr mutate and substring of existing column

I have a dataframe with a column of strings and want to extract substrings of those into a new column. Here is some sample code and data showing I want to take the string after the final underscore character in the id column in order to create a…
PM.
  • 494
  • 1
  • 7
  • 13
19
votes
3 answers

using strsplit and subset in dplyr and mutate

I have a data table with one string column. I'd like to create another column that is a subset of this column using strsplit. dat <- data.table(labels=c('a_1','b_2','c_3','d_4')) The output I want is label sub_label a_1 a b_2 b c_3 …
chungkim271
  • 779
  • 1
  • 5
  • 19
18
votes
3 answers

How should I split and retain elements using strsplit?

What the strsplit function in R does is match and delete a given regular expression, splitting the rest of the string into vectors. >strsplit("abc123def", "[0-9]+") [[1]] [1] "abc" "" "" "def" But how should I split the string the same way…
jackson
  • 603
  • 1
  • 5
  • 12
17
votes
5 answers

apply strsplit rowwise

I'm trying to split a string on "." and create additional columns with the two strings before and after…
Misha
  • 2,986
  • 6
  • 35
  • 55
15
votes
1 answer

How to vectorize R strsplit?

When creating functions that use strsplit, vector inputs do not behave as desired, and sapply needs to be used. This is due to the list output that strsplit produces. Is there a way to vectorize the process - that is, the function produces the…
James
  • 61,307
  • 13
  • 140
  • 186
14
votes
3 answers

Using strsplit() in R, ignoring anything in parentheses

I'm trying to use strsplit() in R to break a string into pieces based on commas, but I don't want to split up anything in parentheses. I think the answer is a regex but I'm struggling to get the code right. So for example: x <- "This is it, isn't it…
John Smith
  • 141
  • 3
13
votes
1 answer

strsplit inconsistent with gregexpr

A comment on my answer to this question which should give the desired result using strsplit does not, even though it seems to correctly match the first and last commas in a character vector. This can be proved using gregexpr and regmatches. So why…
Simon O'Hanlon
  • 54,383
  • 9
  • 127
  • 173