I need to extract the numbers that precede their respective units in a string. Unfortunately, the input format varies (the space between number and unit is optional, and the separator may be a space or a comma), and this is giving me trouble.
Sample data:
df <- data.frame(id = c(1, 2, 3, 4),
                 targets = c("1800 kcal 75 g", "2000kcal 80g", "1900 kcal,87g", "2035kcal,80g"))
> df
  id        targets
1  1 1800 kcal 75 g
2  2   2000kcal 80g
3  3  1900 kcal,87g
4  4   2035kcal,80g
Desired output:
df <- data.frame(id = c(1, 2, 3, 4),
                 targets = c("1800 kcal 75 g", "2000kcal 80g", "1900 kcal,87g", "2035kcal,80g"),
                 kcal_target = c("1800", "2000", "1900", "2035"),
                 protein_target = c("75", "80", "87", "80"))
> df
  id        targets kcal_target protein_target
1  1 1800 kcal 75 g        1800             75
2  2   2000kcal 80g        2000             80
3  3  1900 kcal,87g        1900             87
4  4   2035kcal,80g        2035             80
I got as far as this, but it breaks down when there is a space between the number and the unit keyword, or a comma after the keyword:
library(dplyr)
library(stringr)

df <- df %>%
  mutate(calorie_target = str_extract_all(targets, regex("\\d+(?=kcal)|\\d+(?=kcal,)"))) %>%
  mutate(protein_target = str_extract_all(targets, regex("\\d+(?=g)")))
> df
  id        targets calorie_target protein_target
1  1 1800 kcal 75 g
2  2   2000kcal 80g           2000             80
3  3  1900 kcal,87g                            87
4  4   2035kcal,80g           2035             80
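The empty cells appear where a space sits between the number and the unit, since the lookaheads `(?=kcal)` and `(?=g)` require the unit to follow the digits immediately. One possible direction, offered as a minimal sketch rather than a confirmed solution, is to allow optional whitespace inside the lookahead with `\\s*`. The sketch assumes each row holds exactly one kcal value and one gram value, and it swaps `str_extract_all()` for `str_extract()` (first match only) so the new columns are plain character vectors rather than lists. The `\\b` after `g` is an extra precaution I added so that a longer word starting with "g" would not count as the unit.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = c(1, 2, 3, 4),
                 targets = c("1800 kcal 75 g", "2000kcal 80g", "1900 kcal,87g", "2035kcal,80g"))

df <- df %>%
  mutate(
    # digits followed by optional whitespace, then "kcal"
    kcal_target = str_extract(targets, "\\d+(?=\\s*kcal)"),
    # digits followed by optional whitespace, then a standalone "g"
    # (assumption: the protein unit is always written as a bare "g")
    protein_target = str_extract(targets, "\\d+(?=\\s*g\\b)")
  )
```

Because the lookahead only asserts what follows the digits, a trailing comma after the unit needs no separate alternative in the pattern.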
edit: removed portion of code I'm not trying to capture