I have a mock-up data frame that stands in for some confidential data I am working with. It looks like this:

Name                      Value
1. AaaaBaCCCaaa.x         1
2. AbbAbbKalllNBN.y       2
3. CCCdddEfffFg.x         8
4. ZZZtTThGGtGGGG.y       1
...
9. AAAHHHhhhhIIIIII.x     2
10. RRRRmmmmJJJJJJJ.y     3
11. MMMMMnnnnNNNNrrrr.x   4
...

What's important to notice here is that the Name variable contains ordinal numbers (e.g. 1., 2., 10.) at the beginning of the string and either .x or .y at the end of the string. Also, the length of the Name variable is not the same in each row.

How can I remove the number from the beginning of each string in the Name variable, along with the period and the space that come after it? It's very important for me to get rid of them because I need to use the separate function on this data afterwards to split off the x or y at the end of the string. If that period after the number is still at the beginning of the string, separate will fail.

I wanted to use substr, but I didn't know how: since, for example, 10. is longer than 9., I don't know which values to put into the start and stop arguments.

J. Doe
  • `sub("^\\d+\\. ", "", DF$Name)` – jogo Sep 07 '20 at 11:28
  • @jogo Works, please add it as an answer so I can accept it – J. Doe Sep 07 '20 at 11:30
  • It is too basic to put it as an answer. https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean https://stackoverflow.com/questions/4736/learning-regular-expressions – jogo Sep 07 '20 at 11:31
  • `sub("^\\d+\\.(\\s+)?", "", DF$Name)` - in case you occasionally have more than one or no space. – Feakster Sep 07 '20 at 12:52
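
For reference, the suggestions in the comments can be tried on a small example. This is only a sketch: the data frame `DF` below is a hypothetical three-row stand-in for the mock-up in the question, and the pattern uses `\\s*`, which should behave the same as the `(\\s+)?` variant from the last comment (zero or more whitespace characters after the period).

```r
# Hypothetical stand-in for the question's mock-up data (values are assumptions).
DF <- data.frame(
  Name  = c("1. AaaaBaCCCaaa.x", "2. AbbAbbKalllNBN.y", "10. RRRRmmmmJJJJJJJ.y"),
  Value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Pattern pieces:
#   ^     anchor at the start of the string
#   \\d+  one or more digits, so "9." and "10." are both covered
#   \\.   a literal period
#   \\s*  zero or more whitespace characters
DF$Name <- sub("^\\d+\\.\\s*", "", DF$Name)

print(DF$Name)
```

Because the pattern is anchored with `^` and `sub` replaces only the first match, the `.x` / `.y` suffix at the end of each string is left alone. After this step each name should contain a single period, so a later split on a literal period (for example passing `sep = "\\."` to `separate`, assuming tidyr is what is being used) has only one place to cut.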
