0

I have tried using cat /home/kgh/abc.CSV | awk -F, '$2 ~ /^[[:digit:]]+$/' . It is checking condition at column level.

cat /home/kgh/abc.CSV
1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
1,010116085144,125,125,124,123, 000,00,232,28,294522,077\82,041755,0074,0013
1,010116085149,125,125,124,123,^@000,00,232,28,294522,088,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

expected o/p

1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013   
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

I am not able to check at row level as cat /home/kgh/abc.CSV | awk -F, '$0 ~ /^[[:digit:]]+$/' . I cannot put this condition for other columns because it contains different number of columns in each row.I want to check only numeric data in my csv file otherwise I want to delete that row from my csv file.Is there any efficient way to do this ? Thanks.

Edit

I am worry why the below mentioned solution is not working with sed and grep too

root@aa:/home/kgh/# cat abc.CSV
1,010116084135,118,115,113,115,-368,09,225,28,294579,077,044677,0074,0013
1,010116084146,000,000,000,000,000,09,227,28,294536,077,044271,0074,0013
1,010116084151,122,121,118,119,-099,05,228,28,294547,077,044150,0074,0013
1,010116084156,121,117,117,116,-244,^@06,228,28,294557,077,044047,0074,0013
root@aa:/home/kgh# sed -n '/^[0-9, -]*$/p' abc.CSV
root@aa:/home/kgh# sed -rn '/^-?[0-9]+(, ?-?[0-9]+)*$/p' abc.CSV  
root@aa:/home/kgh# grep -v '[^0-9, -]' abc.CSV

What am I missing ?

Edited

file abc.CSV
abc.CSV: ASCII text, with CRLF line terminators

I opened a file to see CRLF line terminators in vi using :set list

1,010116084135,118,115,113,115,-368,09,225,28,294579,077,044677,0074,0013$
1,010116084146,000,000,000,000,000,09,227,28,294536,077,044271,0074,0013$
1,010116084151,122,121,118,119,-099,05,228,28,294547,077,044150,0074,0013$
1,010116084156,121,117,117,116,-244,^@06,228,28,294557,077,044047,0074,0013$

Hence no result with sed or grep.

I have removed CRLF line terminators now its working with grep and sed.

sed 's/'"$(printf '\015')"'//g' abc.CSV | sed '/[^0-9, -]/d'
1,010116084135,118,115,113,115,-368,09,225,28,294579,077,044677,0074,0013
1,010116084146,000,000,000,000,000,09,227,28,294536,077,044271,0074,0013
1,010116084151,122,121,118,119,-099,05,228,28,294547,077,044150,0074,0013
Aashu
  • 1,195
  • 1
  • 21
  • 38
  • 1
    I'm guessing your CSV file has DOS line terminators. Use `dos2unix` (or one of the many, many alternatives; google around) and try again, or update your regex to allow for this alien character. Plain old `grep` should be all you need, unless you are seeking to also validate e.g. the number of columns, or weed out invalid numbers like 1.2.3 or 0--0. – tripleee Jan 21 '16 at 07:12
  • Thank you so much for highlighting this :) – Aashu Jan 21 '16 at 08:58
  • Do you still have issues, or could you accept one of the proposed solutions, so that this question no longer comes up as unresolved? Thanks. – tripleee Jan 21 '16 at 09:34

5 Answers5

1

With GNU sed:

sed -n '/^[0-9, -]*$/p' abc.csv

If you want to edit your file "in place" add sed's option -i.

Output:

1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

See: The Stack Overflow Regular Expressions FAQ

Community
  • 1
  • 1
Cyrus
  • 69,405
  • 13
  • 65
  • 117
1

With grep:

grep -v '[^0-9, -]' abc.CSV
jyvet
  • 1,550
  • 9
  • 19
0

This prints all lines that consist of numbers, separated by commas and an optional space; the numbers are prepended by an optional unary minus.

$ sed -rn '/^-?[0-9]+(, ?-?[0-9]+)*$/p' infile
1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

Without extended regex (-r), the parentheses would have to be escaped, ? becomes \{0,1\} and + becomes \{1,\}.

Benjamin W.
  • 33,075
  • 16
  • 78
  • 86
0

Another method using grep: grep '^[0-9, -]*$' abc.csv

0

In addition to the sed print command, you can alternatively use the sed delete command to delete any lines that contain characters other than '0-9, -'. For example:

$sed '/[^0-9, -]/d' abc.csv
1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

note: to 'edit-in-place', you can use the '-i' option. Additionally, you can use the '-i.bak' option to have sed create abc.csv.bak containing the original file with the modified file in abc.csv.

David C. Rankin
  • 69,681
  • 6
  • 44
  • 72
  • with -i option, its getting ovewrite as Empty file,because I am not getting any o/p after sed '/[^0-9, -]/d' abc.csv – Aashu Jan 21 '16 at 07:14
  • What operating system are you using? I am using the exact same file and expression on Linux and it is working as indicated. Are you on Mac? – David C. Rankin Jan 21 '16 at 08:12