I want to write a regex pattern that extracts the address or location from the narration string of each transaction, across a dataset of 350k records.
txn_add <- data.frame(NARRATION=c("$ $ $ +YBL PATAUDI CHOWK \\ $",  # "\\" is one literal backslash
"$ $ -ATM CASH 83181 + MAIN BHAWANA ROAD NEW DELHI $",
"$ $ [5839/P1TNDE06/+RAGHUBARPURA $",
"$ MAXIMUMOUTFITS PRIVATE LIMITED } $ ATDELHIIN- $ $ /5631 $",
"$ ATM CASH-N4077800-+SPRINGFIELDCOLONYFFAR IDABADHRIN-04/06/18 $ /5631 ( $ $ VERIFICATION $"))
I ran the following regex pattern:
gsub(".*[:|+]([^.]+)[$|\\|\\/].*", "\\1", txn_add$NARRATION)
This gives the following output:
[1] "YBL PATAUDI CHOWK "
[2] " MAIN BHAWANA ROAD NEW DELHI "
[3] "RAGHUBARPURA "
[4] "$ MAXIMUMOUTFITS PRIVATE LIMITED } $ ATDELHIIN- $ $ /5631 $"
[5] "SPRINGFIELDCOLONYFFAR IDABADHRIN-04/06/18 $ /5631 ( $ $ VERIFICATION "
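Two effects seem to explain this output. Record 4 contains none of ':', '|' or '+', so the pattern does not match and gsub() returns the string unchanged. In record 5, [^.]+ excludes only the dot, so the capture runs across '-', '/' and '$' and stops just before the last '$'. The sketch below reproduces the second effect with a simplified pattern (PCRE via perl = TRUE, not the exact pattern above) and shows that leaving the end markers out of the class stops the capture at the first one:

```r
# Record 5 from the question.
x <- "$ ATM CASH-N4077800-+SPRINGFIELDCOLONYFFAR IDABADHRIN-04/06/18 $ /5631 ( $ $ VERIFICATION $"

# [^.]+ only excludes '.', so it runs to the end and backtracks to the last '$'.
greedy <- sub(".*\\+([^.]+)\\$.*", "\\1", x, perl = TRUE)
greedy
# "SPRINGFIELDCOLONYFFAR IDABADHRIN-04/06/18 $ /5631 ( $ $ VERIFICATION "

# Excluding the end markers (- / $ \) from the class stops the capture at the first one.
bounded <- sub(".*\\+([^-/$\\\\]+).*", "\\1", x, perl = TRUE)
bounded
# "SPRINGFIELDCOLONYFFAR IDABADHRIN"
```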
This output is not correct, because the address has to satisfy some conditions. It can start right after:
1. '+'
2. '@'
3. ' AT '
4. ':'
5. <P|S>SBI<P|S>: the exact text SBI, preceded and followed by punctuation or a space
6. <NNN> followed by <P|S|A>: three digits followed by punctuation, a space, or a letter
It ends at:
1. -
2. /
3. $
4. \
5. a run of digits (<NNNNNNN>)
In between, the address can contain letters, digits, dot (.), dash (-), space ( ), comma (,), underscore (_), parentheses (()), at sign (@), hash (#), ampersand (&), and semicolon (;).
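The end markers and the allowed characters can be combined into a single body pattern: a character class of the allowed characters with the end markers left out, so a greedy match stops at the first end marker. This is a sketch in PCRE syntax (perl = TRUE) under two assumptions: <NNNNNNN> is read as a run of seven digits, and '-' is treated only as an end marker, because the desired output for record 4 (DELHIIN) stops at it even though '-' is also listed as allowed. The helper take_body() is hypothetical.

```r
# Allowed characters minus the end markers (- / $ \). The negative lookahead
# stops the match before a run of seven digits (assumed reading of <NNNNNNN>).
body <- "(?:(?![0-9]{7})[A-Za-z0-9 .,_()@#&;])+"

# Hypothetical helper: longest allowed prefix of each string
# (regmatches() drops elements that do not match at all).
take_body <- function(x) {
  m <- regexpr(paste0("^", body), x, perl = TRUE)
  regmatches(x, m)
}

take_body("DELHIIN- $ $ /5631 $")    # "DELHIIN"
take_body("YBL PATAUDI CHOWK \\ $")  # "YBL PATAUDI CHOWK "
take_body("ABC ROAD 1234567 XYZ")    # "ABC ROAD " (illustrative input)
```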
The desired output is:
[1] "YBL PATAUDI CHOWK"
[2] "MAIN BHAWANA ROAD NEW DELHI "
[3] "RAGHUBARPURA "
[4] "DELHIIN"
[5] "SPRINGFIELDCOLONYFFAR IDABADHRIN"
I am not able to get the desired output. What can I try next?
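One possible direction, sketched under assumptions rather than offered as a tested solution for all 350k rows: try the start markers one at a time in the listed order and keep the first one that matches, instead of a single alternation (an alternation picks the leftmost position in the string, not the highest-priority marker). Assumptions, all hypothetical: ' AT ' is matched without its trailing space so that ' ATDELHIIN' in record 4 qualifies; '-' is treated only as an end marker; <NNNNNNN> is read as seven digits; and results are trimmed, so rows 2 and 3 lose the trailing space shown in the desired output. The names start_markers, body and extract_address() are illustrative.

```r
# Sample narrations from the question; "\\" is one literal backslash.
txn_add <- data.frame(NARRATION = c(
  "$ $ $ +YBL PATAUDI CHOWK \\ $",
  "$ $ -ATM CASH 83181 + MAIN BHAWANA ROAD NEW DELHI $",
  "$ $ [5839/P1TNDE06/+RAGHUBARPURA $",
  "$ MAXIMUMOUTFITS PRIVATE LIMITED } $ ATDELHIIN- $ $ /5631 $",
  "$ ATM CASH-N4077800-+SPRINGFIELDCOLONYFFAR IDABADHRIN-04/06/18 $ /5631 ( $ $ VERIFICATION $"
), stringsAsFactors = FALSE)

# Start markers (PCRE) in priority order; the address begins right after the marker.
start_markers <- c(
  "\\+",                                      # 1. '+'
  "@",                                        # 2. '@'
  " AT",                                      # 3. ' AT ' without the trailing space (assumption, see record 4)
  ":",                                        # 4. ':'
  "[[:punct:] ]SBI[[:punct:] ]",              # 5. SBI between punctuation or spaces
  "(?<![0-9])[0-9]{3}(?![0-9])[[:punct:] ]?"  # 6. exactly three digits, optional punctuation/space
)

# Body: allowed characters minus the end markers (- / $ \), stopping before
# a run of seven digits (assumed reading of <NNNNNNN>).
body <- "(?:(?![0-9]{7})[A-Za-z0-9 .,_()@#&;])+"

# Try each start marker in turn; a row keeps the result of the first marker
# that matches it. Rows with no match stay NA.
extract_address <- function(narration) {
  narration <- as.character(narration)
  out <- rep(NA_character_, length(narration))
  for (start in start_markers) {
    # Lazy prefix: first occurrence of the marker followed by a body character.
    pattern <- paste0("^.*?", start, "(", body, ").*$")
    todo <- is.na(out) & !is.na(narration) & grepl(pattern, narration, perl = TRUE)
    out[todo] <- sub(pattern, "\\1", narration[todo], perl = TRUE)
  }
  trimws(out)
}

extract_address(txn_add$NARRATION)
```

On the five sample rows this returns the desired output, apart from the trimmed trailing spaces. Matching ' AT' without the trailing space is loose: a narration such as '$ ATM CASH ...' with no '+' or '@' would start the address inside 'ATM', so that marker may need tightening against the full data.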