Remove last numbers in rows in R

Question

This is a fragment of my data.frame:

MC0_1_N.1   a   c
MC0_1_N.2   d   b
MC0_5_N.1   b   c
MC0_5_N.2   c   d
MC0_5_N.3   a   b
MC0_5_N.4   e   f
MC0_5_N.5   a   h
MC0_5_N.6   k   m
MC0_5_N.7   s   z
MC0_5_N.8   o   p
MC0_5_N.9   p   r
MC0_5_N.10  r   t

Is there any way to remove the trailing numbers (1, 2, 3, 4, ...) from the rows, which were added when the data.frame was created?

https://stackoverflow.com/questions/4736/learning-regular-expressions — jogo, Jun 06 '17 at 13:30

score 0 · Answer 1 · answered Jun 06 '17 at 13:27

We can use sub to match a literal dot (\\., escaped because an unescaped . is a metacharacter that matches any character) followed by one or more digits (\\d+) at the end ($) of the string, and replace the match with an empty string ("").

df1$col1 <- sub("\\.\\d+$", "", df1$col1)
df1$col1
#[1] "MC0_1_N" "MC0_1_N" "MC0_5_N" "MC0_5_N" "MC0_5_N" "MC0_5_N" "MC0_5_N"
#[8] "MC0_5_N" "MC0_5_N" "MC0_5_N" "MC0_5_N" "MC0_5_N"

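NOTE: Here we assumed it is the first column. If the values are the row names instead, replace df1$col1 with row.names(df1). Keep in mind that a data.frame requires unique row names, so the assignment below stops with a "duplicate 'row.names' are not allowed" error whenever stripping the suffix produces repeated names, as it would for the fragment in the question.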

i.e.

row.names(df1) <- sub("\\.\\d+$", "", row.names(df1))
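Since a data.frame does not accept duplicate row names, one option is to keep the stripped names in an ordinary column instead. A minimal sketch, assuming a made-up df1 and a hypothetical new column called id (neither comes from the question):

```r
# Hypothetical data frame whose row names carry the ".1", ".2", ... suffixes
df1 <- data.frame(
  x = c("a", "d", "b"),
  y = c("c", "b", "c"),
  row.names = c("MC0_1_N.1", "MC0_1_N.2", "MC0_5_N.10"),
  stringsAsFactors = FALSE
)

# Strip a trailing dot plus digits and store the result as a regular column,
# where repeated values are allowed
df1$id <- sub("\\.\\d+$", "", row.names(df1))

# Optionally drop the suffixed row names; they reset to "1", "2", "3"
row.names(df1) <- NULL
```

If the object is a matrix rather than a data.frame, duplicated rownames are permitted and the direct assignment is not a problem.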

score 0 · Accepted Answer · answered Jun 06 '17 at 13:39

For the given example, this works fine:

df1$col1 <- strtrim(df1$col1,7)

Of course, this only works when:

the column values are character strings (not factors)
there are no items like MC0_10_N.1 in the data frame (mind the two-digit 10 in the middle)

Note that this also removes the . before the number.
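To illustrate the fixed-width caveat, here is a small sketch on a made-up vector ids (not from the question). It also shows one possible way to cut at the last dot instead of at a fixed width, using regexpr and substr; this is an alternative I am swapping in, not part of the strtrim approach itself.

```r
# Made-up values; the third one has a two-digit number in the middle
ids <- c("MC0_1_N.1", "MC0_5_N.10", "MC0_10_N.1")

# Fixed width of 7: the third value is cut in the wrong place ("MC0_10_")
trimmed <- strtrim(ids, 7)

# Alternative: find the position of the last dot and keep what precedes it
last_dot <- regexpr("\\.[^.]*$", ids)
by_dot <- substr(ids, 1, last_dot - 1)
```

Note that regexpr returns -1 for a string without any dot, so by_dot would be an empty string for such a value. If the column is a factor, wrapping it in as.character() first avoids the non-character argument error from strtrim.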

score 0 · Answer 3 · answered Jun 06 '17 at 14:32

With df as your data frame and col1 as the name of the column you wish to alter:

unlist(lapply(stringi::stri_split(str = df$col1,regex = "\\."),function(x) x[[1]]))

resulting in:

MC0_1_N
MC0_1_N
MC0_5_N
MC0_5_N
MC0_5_N
MC0_5_N
MC0_5_N
MC0_5_N
MC0_5_N
MC0_5_N
MC0_5_N
MC0_5_N
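The same split-and-take-the-first-piece idea can probably be written in base R without stringi. A rough sketch, assuming a made-up character vector col1 standing in for df$col1:

```r
# Made-up stand-in for df$col1
col1 <- c("MC0_1_N.1", "MC0_5_N.10", "MC0_10_N.3")

# Split on a literal dot (fixed = TRUE avoids regex escaping)
pieces <- strsplit(col1, ".", fixed = TRUE)

# Keep only the first piece of each split; vapply guarantees a character vector
first_piece <- vapply(pieces, function(x) x[1], character(1))
```

Be aware that this keeps only the text before the first dot, so a value with more than one dot would lose everything after the first one, not just the trailing number.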
