I have a list of strings, some of which contain dollar figures. For example:
'$34232 foo \n bar'
Is there an R
command that returns only the strings that contain dollar amounts?
Thank you!
Use \\$
to protect the $
which otherwise means "end of string":
grep("\\$[0-9]+",c("123","$567","abc $57","$abc"),value=TRUE)
This will select strings that contain a dollar sign followed by one or more digits (but not e.g. $abc
). grep
with value=FALSE
returns the indices. grepl
returns a logical vector. One R-specific point is that you need to specify \\$
, not just \$
(i.e. an additional backslash is required for protection): \$
will give you an "unrecognized escape" error.
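To make the difference between the three variants concrete, here is a small sketch using the same test vector as above. The variable names `x` and `pattern` are mine, chosen just for illustration.

```r
x <- c("123", "$567", "abc $57", "$abc")
pattern <- "\\$[0-9]+"   # literal dollar sign followed by one or more digits

matches <- grep(pattern, x, value = TRUE)  # the matching strings themselves
idx     <- grep(pattern, x)                # positions of the matches (value = FALSE is the default)
hits    <- grepl(pattern, x)               # one TRUE/FALSE per element of x

matches      # "$567"    "abc $57"
idx          # 2 3
hits         # FALSE  TRUE  TRUE FALSE
x[hits]      # same strings as `matches`

# If you only need "contains a literal dollar sign" (no digits required),
# fixed = TRUE turns off regex interpretation, so no escaping is needed.
has_dollar <- grepl("$", x, fixed = TRUE)
has_dollar   # FALSE  TRUE  TRUE  TRUE
```

The logical vector from `grepl` is usually the most convenient form when you want to subset a data frame column or combine the test with other conditions.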
@Cerbrus's answer, '\\$[0-9,.]+'
, will match slightly more broadly (e.g. it will match $456.89
or $367,245,100
). It will also match some implausible currency strings, e.g. $45.13.89
or $467.43,2,1
(i.e. commas should be allowed only for groupings of 3 digits in the dollars segment; there should be only one decimal point separating dollars and cents). Both of our answers will (incorrectly?) match $45abc
. If you're lucky, your data don't contain any of these tricky possibilities. Getting this right in general is hard; the answer referred to in the comments ( What is "The Best" U.S. Currency RegEx? ) tries to do this, and as a result has significantly more complex answers, but could be useful if you adapt the answers to R by protecting $
appropriately.
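As a rough illustration of what a stricter pattern might look like in R, here is a sketch of my own (not taken from the linked question). It assumes that a valid amount is either plain digits or comma-grouped thousands, optionally followed by exactly two decimal places, and that the amount must be followed by whitespace or the end of the string. That last assumption is a simplification: it would reject an amount followed directly by punctuation, such as a sentence-ending period. The lookahead requires `perl = TRUE`.

```r
# Assumed rules (a sketch, not a complete currency validator):
#   \\$                          literal dollar sign
#   [0-9]{1,3}(?:,[0-9]{3})+     comma-grouped thousands, e.g. 367,245,100
#   |[0-9]+                      ... or plain digits, e.g. 34232
#   (?:\\.[0-9]{2})?             optional cents, exactly two digits
#   (?=\\s|$)                    must be followed by whitespace or end of string
strict <- "\\$(?:[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(?:\\.[0-9]{2})?(?=\\s|$)"

candidates <- c("$456.89", "$367,245,100", "$34232 foo \n bar",
                "$45.13.89", "$467.43,2,1", "$45abc", "$abc")

strict_hits <- grepl(strict, candidates, perl = TRUE)
strict_hits              # TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
candidates[strict_hits]  # only the first three strings
```

Whether this is too strict or not strict enough depends entirely on your data, so check it against a sample before relying on it.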
Sure there is:
'\\$[0-9,.]+'
\\$ //Dollar sign
[0-9,.]+ // One or more digits, dots, or commas.
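For example, applied with `grepl` to a few made-up test strings (the vector `y` below is only for illustration), the pattern behaves like this. Note that because the character class also accepts dots and commas, a dollar sign followed only by punctuation counts as a match too.

```r
y <- c("$456.89", "$367,245,100", "$45.13.89", "$abc", "cost: $.")
broad <- "\\$[0-9,.]+"   # dollar sign, then one or more digits, dots, or commas

broad_hits <- grepl(broad, y)
broad_hits      # TRUE  TRUE  TRUE FALSE  TRUE
y[broad_hits]   # every string except "$abc"
```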