I have a CSV file where I expect values such as Y or N. Folks are adding comments or arbitrary entries such as NA? that I want to remove:
Create,20055776,Y,,Y,Y,,Y,,NA?,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,NA ?,,,Y,,,,,,TBD,,,,,,,,,
I can use gsub to remove the entries I am anticipating, for example:
$ cat test.csv | awk '{gsub("NA\\?", ""); gsub("NA \\?",""); gsub("TBD", ""); print}'
Create,20055776,Y,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,,,,Y,,,,,,,,,,,,,,,
However, that breaks as soon as someone adds a new kind of comment. I am looking for a regex that generalises the match to "anything that is not Y".
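One way to generalise is to stop enumerating bad values and instead compare each field against the single allowed value. The following is a minimal sketch, assuming that fields 1 and 2 (the action and the ID) must always be kept, that a valid entry is exactly Y, and that the file has no quoted fields containing commas. The helper name blank_non_y is hypothetical.

```shell
# Minimal sketch: blank every field that is not exactly "Y".
# Assumptions: fields 1 and 2 (the action and the ID) are always kept, and
# the file has no quoted fields containing commas (FS is a plain ",").
# blank_non_y is a hypothetical helper name.
blank_non_y() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 3; i <= NF; i++)
            if ($i != "Y")
                $i = ""    # assigning a field rebuilds $0 joined with OFS
        print              # NF is unchanged, so trailing empty fields survive
    }' "$@"
}

printf '%s\n' 'Create,20055776,Y,,Y,Y,,Y,,NA?,,Y' | blank_non_y
# prints: Create,20055776,Y,,Y,Y,,Y,,,,Y
```

On the real file this would be run as blank_non_y test.csv. Because the test is "not equal to Y" rather than a list of known comments, new entries such as TBD or Yes are cleared too; if N should also be kept, the condition could be changed to $i !~ /^[YN]$/.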
I tried some negative lookarounds but couldn't get them to work in my awk, which is GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.1, GNU MP 6.1.2). Thanks in advance!
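gawk regular expressions are POSIX extended regexps with some GNU extensions, and they do not support lookahead or lookbehind, which would explain why the negative lookarounds fail. "Not exactly Y" can still be written positively: a non-empty field whose first character is not Y, or that starts with Y but has more characters, matches ^([^Y]|Y.). The sketch below applies that regex per field with sub; as before, it assumes fields 1 and 2 are kept and that there are no quoted commas, and blank_non_y_re is a hypothetical name.

```shell
# Sketch: express "not Y" without lookarounds, which gawk lacks.
# /^([^Y]|Y.)/ matches a field whose first character is not Y, or that starts
# with Y and has more characters, i.e. any non-empty field other than "Y".
# Assumptions: fields 1 and 2 (action and ID) are left untouched, and the file
# has no quoted fields containing commas. blank_non_y_re is hypothetical.
blank_non_y_re() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 3; i <= NF; i++)
            sub(/^([^Y]|Y.).*/, "", $i)   # clears the whole field on a match
        print
    }' "$@"
}

printf '%s\n' 'Create,20055779,,Y,,,NA ?,,TBD,,' | blank_non_y_re
# prints: Create,20055779,,Y,,,,,,,
```

Empty fields and fields that are exactly Y do not match the regex, so sub leaves them unchanged.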