Questions tagged [strsplit]

strsplit is a function in R and MATLAB which splits the elements of a character vector around a given delimiter.

strsplit is a function in R and MATLAB that splits the elements of a character vector into substrings:

# R:
strsplit(x, split, fixed = FALSE)
% MATLAB:
strsplit(x, split)

strsplit splits a character string, or each element of a vector of character strings, using a regular expression or a literal (fixed) string. It returns a list (R) or cell array (MATLAB) in which each item holds the pieces of the corresponding element of x.

  • x: a character string or vector of character strings to split.
  • split: the character string (delimiter) at which to split x.
    In R, if split is an empty string (""), then x is split between every character.
  • fixed (R only): whether split should be treated as a fixed (i.e. literal) string. The default is FALSE, which means that split is treated as a regular expression.
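
As a rough illustration of the arguments described above, here is a minimal R sketch. The input values are made up for the example and do not come from any of the questions listed below.

```r
# Hypothetical sample input; any character vector works.
x <- c("a-b-c", "d.e")

# The result is a list with one element per element of x.
parts <- strsplit(x, "-")
length(parts)   # 2
parts[[1]]      # "a" "b" "c"
parts[[2]]      # "d.e" (no "-" found, so the string is returned unsplit)

# With the default fixed = FALSE, "." is a regular expression that matches
# any character, so every character acts as a delimiter.
regex_split <- strsplit("d.e", ".")[[1]]                   # "" "" ""

# fixed = TRUE treats "." as a literal dot.
literal_split <- strsplit("d.e", ".", fixed = TRUE)[[1]]   # "d" "e"

# An empty split string separates every character.
chars <- strsplit("abc", "")[[1]]                          # "a" "b" "c"
```

Because the return value is always a list, even for a single input string, the pieces are usually extracted with [[1]] or processed with sapply().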
559 questions
7 votes, 4 answers

How to split a string on first number only

So I have a dataset with street addresses, and they are formatted very differently. For example: d <- c("street1234", "Street 423", "Long Street 12-14", "Road 18A", "Road 12 - 15", "Road 1/2") From this I want to create two columns. 1. X: with the…
Jesse
  • 93
  • 1
  • 6
7 votes, 4 answers

R Split string and keep substrings righthand of match?

How to do this stringsplit() in R? Stop splitting when no first names separated by dashes remain. Keep right hand side substring as given in results. a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz") # result: c("tim…
Kay
  • 2,454
  • 5
  • 28
  • 45
7 votes, 3 answers

R: how to display the first n characters from a string of words

I have the following string: Getty <- "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal." I want to display the first 10…
mapleleaf
  • 608
  • 3
  • 7
  • 14
7 votes, 7 answers

Use strsplit to get last character in r

I have a file of baby names that I am reading in and then trying to get the last character in the baby name. For example, the file looks like:

Name     Sex
Anna     F
Michael  M
David    M
Sarah    F

I read this in using sourcenames =…
dataCruncher02
  • 361
  • 2
  • 5
  • 13
7 votes, 2 answers

R: splitting a string between two characters using strsplit()

Let's say I have the following string: s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705" I would like to recover the strings between ";" and "=" to get the following output: [1] "MIMAT0027618" "MIMAT0027618" …
biohazard
  • 1,867
  • 7
  • 23
  • 38
7 votes, 3 answers

Speed up `strsplit` when possible output are known

I have a large data frame with a factor column that I need to divide into three factor columns by splitting up the factor names by a delimiter. Here is my current approach, which is very slow with a large data frame (sometimes several million…
Noam Ross
  • 4,809
  • 4
  • 20
  • 36
6 votes, 5 answers

R how to create columns/features based on existing data

I have a dataframe df:

userID  Score  Task_Alpha  Task_Beta  Task_Charlie  Task_Delta
3108    -8.00  Easy        Easy       Easy          Easy
3207     3.00  Hard        Easy       Match         Match
3350     5.78  Hard        Easy       Hard          …
Sandy
  • 511
  • 2
  • 11
6 votes, 3 answers

Split string with repeated delimiters

I have a string in R in the following form: example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3") And I wish to obtain two columns: namei1 namej1 | surname1 name2 | surnamei2 surnamej2 name3 |…
anespinosa
  • 73
  • 3
6 votes, 3 answers

Split column by multiple delimiters, keeping delimiters

How can I split a character column into 3 columns using %, -, and + as the possible delimiters, keeping the delimiters in the new columns? Example Data: data <- data.table(x=c("92.1%+100-200","90.4%-1000+200", "92.8%-200+100",…
Neal Barsch
  • 1,781
  • 7
  • 29
6 votes, 2 answers

Split data.frame into groups by column name

I'm new to R. I have a data frame with column names of such type: file_001 file_002 block_001 block_002 red_001 red_002 ....etc' 0.05 0.2 0.4 0.006 0.05 0.3 0.01 0.87 0.56 0.4 …
Keity
  • 113
  • 8
6 votes, 3 answers

Count characters of a section of a string

I have this df: dput(df) structure(list(URLs = c("http://bursesvp.ro//portal/user/_/Banco_Votorantim_Cartoes/0-7f2f5cb67f1-22918b.html", "http://46.165.216.78/.CartoesVotorantim/Usuarios/Cadastro/BV6102891782/",…
Sotos
  • 44,023
  • 5
  • 28
  • 55
6 votes, 2 answers

String split on a number word pattern

I have a data frame that looks like this: V1 V2 peanut butter sandwich 2 slices of bread 1 tablespoon peanut butter What I'm aiming to get is: V1 V2 peanut butter sandwich 2 slices of bread peanut…
yokota
  • 937
  • 10
  • 23
6 votes, 1 answer

Use regular expressions in R strsplit

I would like to split "2015-05-13T20:41:29+0000" into 2015-05 and 20:41:29+0000. I tried the following: > strsplit("2015-05-13T20:41:29+0000",split="-\\d\\dT",fixed=TRUE) [[1]] [1] "2015-05-13T20:41:29+0000" but the pattern is not matched.…
Leonardo
  • 297
  • 4
  • 11
6 votes, 4 answers

R: How to shorten data frame values to first character

I would like to shorten the values of one column of my data.frame. Right now, each value consists of many letters, such as df$col1 [1] AHG ALK OPH BCZ LKH QRQ AAA VYY what I need is only the first letter: df$col1 [1] A A O …
PikkuKatja
  • 1,021
  • 2
  • 11
  • 19
6 votes, 4 answers

Substituting the results of a calculation

I'm munging data, specifically, I've opened this pdf http://pubs.acs.org/doi/suppl/10.1021/ja105035r/suppl_file/ja105035r_si_001.pdf and scraped the data from table s4, 1a 1b 1a 1b 1 5.27 4.76 5.09 4.75 2 2.47 2.74 2.77 2.80 4 1.14 1.38 1.12…
user1945827
  • 1,338
  • 2
  • 13
  • 27