
I have a dataframe with a column that follows a pattern: a row containing a string with a name, followed by other rows containing a sequence number and a name. This pattern repeats throughout the dataframe.

I want to create a new column based on this condition: when a row contains a string that starts with the word "CANTON" (and has no number), copy that string without the first word (CANTON) into that row and all the following rows of the new column, until another row appears whose string starts with "CANTON". From that row on, the new column should take the new string instead.

Here is an example of the dataframe:

datos <- data.frame(sitio = c("CANTON SAN JOSE", "01 Carmen", "02 Merced", 
      "03 Hospital", "04 Catedral", "05 San Franscisco", 
      "CANTON ESCAZU", "01 Escazu", "02 San Antonio", "03 San Rafael" ),
      area = c(44.62, 1.49, 2.29, 3.38, 2.31, 2.85, 34.49, 4.38,
               16.99, 13.22))
datos

And the expected result would be:

expected_result <-data.frame(
      sitio = c("CANTON SAN JOSE", "01 Carmen", "02 Merced",
                "03 Hospital", "04 Catedral", "05 San Franscisco", 
                "CANTON ESCAZU", "01 Escazu", "02 San Antonio", 
                "03 San Rafael" ),
      area = c(44.62, 1.49, 2.29, 3.38, 2.31, 2.85, 34.49, 4.38,
               16.99, 13.22),
      canton = c("SAN JOSE", "SAN JOSE", "SAN JOSE", "SAN JOSE", 
                 "SAN JOSE", "SAN JOSE", "ESCAZU", "ESCAZU", "ESCAZU",
                 "ESCAZU"))

I have tried many for loops, subsets, and joins of dataframes without success. I cannot express this pattern as an instruction in R.

Thanks for helping!

ronnyhdez

1 Answer


Hope this works for your data:

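# Strip the leading "CANTON " wherever it appears
x <- gsub('^CANTON ', '', datos$sitio)
# Keep the name only on the header rows; every other row becomes NA
x[!grepl('^CANTON ', datos$sitio)] <- NA
# Each non-NA value starts a new group; fill the group with its first value
datos$canton <- ave(x, cumsum(!is.na(x)), FUN = function(xx) xx[1])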

# > datos
#                sitio  area   canton
# 1    CANTON SAN JOSE 44.62 SAN JOSE
# 2          01 Carmen  1.49 SAN JOSE
# 3          02 Merced  2.29 SAN JOSE
# 4        03 Hospital  3.38 SAN JOSE
# 5        04 Catedral  2.31 SAN JOSE
# 6  05 San Franscisco  2.85 SAN JOSE
# 7      CANTON ESCAZU 34.49   ESCAZU
# 8          01 Escazu  4.38   ESCAZU
# 9     02 San Antonio 16.99   ESCAZU
# 10     03 San Rafael 13.22   ESCAZU
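
The same idea can also be written without ave(), by using the running count of header rows directly as an index into the vector of canton names. This is only a sketch under the assumption that header rows always start with "CANTON " exactly as in the example; the names is_header and headers are introduced here for illustration and are not part of the original code.

```r
# Example data from the question
datos <- data.frame(
  sitio = c("CANTON SAN JOSE", "01 Carmen", "02 Merced",
            "03 Hospital", "04 Catedral", "05 San Franscisco",
            "CANTON ESCAZU", "01 Escazu", "02 San Antonio", "03 San Rafael"),
  area = c(44.62, 1.49, 2.29, 3.38, 2.31, 2.85, 34.49, 4.38, 16.99, 13.22),
  stringsAsFactors = FALSE
)

# TRUE on the rows that start a new canton block
is_header <- grepl("^CANTON ", datos$sitio)

# Canton names in order of appearance, with the "CANTON " prefix removed
headers <- sub("^CANTON ", "", datos$sitio[is_header])

# cumsum(is_header) is the number of headers seen so far, i.e. the block each
# row belongs to. The leading NA (index 1) covers any rows that come before
# the first header, so those rows get NA rather than causing a length mismatch.
datos$canton <- c(NA, headers)[cumsum(is_header) + 1]

datos
```

Rows that precede the first "CANTON" row, if there are any, end up with NA in the new column.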
mt1022