0

I use following script to combine all the csv file in csvFolder with these condition

  1. delete the first line (i use 'grep -v "words"' to achieve it)
  2. extract the 18-21 column
  3. output to folder/test.csv

it take a lot of time to run it when the csv file is la. Please advise me for better performance.

i am quite new in writing bat file, please explain more. THANKS!!

this is the bat script I used.

for /f "tokens=18-21 delims=," %%a in ('cat csvFolder/*.csv') do echo %%a,%%b,%%c,%%d | grep -v "words" >> folder/test.csv

this is the sample csv file.

"(PDH-CSV 4.0) (GMT words g)(0)","\\s-s\s(c)\% t g","\\s-s\s(I)\% t g","\\s-s\s(Iace)\% t g","\\s-s\s(Nr)\% t g","\\s-s\s(Rface)\% t g","\\s-s\s(c)\y u","\\s-s\s(I)\y u","\\s-s\s(Iace)\y u","\\s-s\s(Nr)\y u","\\s-s\s(Rface)\y u","\\s-s\s(c)\p Bytes","\\s-s\s(I)\p Bytes","\\s-s\s(Iace)\p Bytes","\\s-s\s(Nr)\p Bytes","\\s-s\s(Rface)\p Bytes","\\s-s\s(c)\q Set","\\s-s\s(I)\q Set","\\s-s\s(Iace)\q Set","\\s-s\s(Nr)\q Set","\\s-s\s(Rface)\q Set","\\s-s\Memory\% Committed Bytes In Use","\\s-s\Memory\Available MBytes","\\s-s\t(0)\% j g","\\s-s\t(1)\% j g","\\s-s\t(2)\% j g","\\s-s\t(3)\% j g","\\s-s\t(4)\% j g","\\s-s\t(5)\% j g","\\s-s\t(6)\% j g","\\s-s\t(7)\% j g","\\s-s\t(8)\% j g","\\s-s\t(9)\% j g","\\s-s\t(10)\% j g","\\s-s\t(11)\% j g","\\s-s\t(12)\% j g","\\s-s\t(13)\% j g","\\s-s\t(14)\% j g","\\s-s\t(15)\% j g","\\s-s\t(16)\% j g","\\s-s\t(17)\% j g","\\s-s\t(18)\% j g","\\s-s\t(19)\% j g","\\s-s\t(20)\% j g","\\s-s\t(21)\% j g","\\s-s\t(22)\% j g","\\s-s\t(23)\% j g","\\s-s\t(24)\% j g","\\s-s\t(25)\% j g","\\s-s\t(26)\% j g","\\s-s\t(27)\% j g","\\s-s\t(28)\% j g","\\s-s\t(29)\% j g","\\s-s\t(30)\% j g","\\s-s\t(31)\% j g","\\s-s\t(32)\% j g","\\s-s\t(33)\% j g","\\s-s\t(34)\% j g","\\s-s\t(35)\% j g","\\s-s\t(36)\% j g","\\s-s\t(37)\% j g","\\s-s\t(38)\% j g","\\s-s\t(39)\% j g","\\s-s\t(40)\% j g","\\s-s\t(41)\% j g","\\s-s\t(42)\% j g","\\s-s\t(43)\% j g","\\s-s\t(44)\% j g","\\s-s\t(45)\% j g","\\s-s\t(46)\% j g","\\s-s\t(47)\% j g","\\s-s\t(_Total)\% j g","\\s-s\t(0)\% t g","\\s-s\t(1)\% t g","\\s-s\t(2)\% t g","\\s-s\t(3)\% t g","\\s-s\t(4)\% t g","\\s-s\t(5)\% t g","\\s-s\t(6)\% t g","\\s-s\t(7)\% t g","\\s-s\t(8)\% t g","\\s-s\t(9)\% t g","\\s-s\t(10)\% t g","\\s-s\t(11)\% t g","\\s-s\t(12)\% t g","\\s-s\t(13)\% t g","\\s-s\t(14)\% t g","\\s-s\t(15)\% t g","\\s-s\t(16)\% t g","\\s-s\t(17)\% t g","\\s-s\t(18)\% t g","\\s-s\t(19)\% t g","\\s-s\t(20)\% t g","\\s-s\t(21)\% t g","\\s-s\t(22)\% t g","\\s-s\t(23)\% t g","\\s-s\t(24)\% t g","\\s-s\t(25)\% t g","\\s-s\t(26)\% t g","\\s-s\t(27)\% t g","\\s-s\t(28)\% t g","\\s-s\t(29)\% t g","\\s-s\t(30)\% t g","\\s-s\t(31)\% t g","\\s-s\t(32)\% t g","\\s-s\t(33)\% t g","\\s-s\t(34)\% t g","\\s-s\t(35)\% t g","\\s-s\t(36)\% t g","\\s-s\t(37)\% t g","\\s-s\t(38)\% t g","\\s-s\t(39)\% t g","\\s-s\t(40)\% t g","\\s-s\t(41)\% t g","\\s-s\t(42)\% t g","\\s-s\t(43)\% t g","\\s-s\t(44)\% t g","\\s-s\t(45)\% t g","\\s-s\t(46)\% t g","\\s-s\t(47)\% t g","\\s-s\t(_TL)\% t g"
"02/04/2014 02:25:19.850","0","0","173.29978448754693","0","0","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7012352","1668734976","282546176","641404928","98045952","43.227033512749721","14578","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","100","100","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","100","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","100","100","100","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","42.233405170817683","49.454183237426584"
"02/04/2014 02:25:20.839","0","0","115.12529196882903","0","6.3082351763741924","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7012352","1668734976","282546176","641404928","98045952","43.226920632400869","14578","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.5770587940935481","0","0","0","1.5770587940935481","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0.065710361866962802","0","2.2223547662000076","0","0","0","0","0","0","0","0","0","0","8.5305899425741956","0","0","0","0","0","0.64529597210645218","5.3764723543870963","5.3764723543870963","0","0","0","10.107648736667752","19.57000150122904","32.18647185397743","19.57000150122904","13.26176632485484","17.992942707135484","0","8.5305899425741956","2.2223547662000076","0.64529597210645218","3.7994135602935519","0.64529597210645218","0","2.2223547662000076","0.64529597210645218","3.7994135602935519","3.7994135602935519","2.2223547662000076","8.5305899425741956","6.9535311484806517","3.7994135602935519","6.9535311484806517","8.5305899425741956","8.5305899425741956","3.8979791030939959"
"02/04/2014 02:25:21.845","0","1.550550710336521","103.8868975925469","0","21.707709944711297","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7012352","1668734976","282546176","641404928","98045952","43.226983013646283","14581","0","1.550550710336521","1.550550710336521","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.550550710336521","0","0","0","1.550550710336521","0","0","0","0","0","0","0","0","1.550550710336521","0","0","1.550550710336521","0","0","0","1.550550710336521","0","0","0.2261205291001715","0.76475453846265307","5.4164066694722184","3.8658559591357","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","14.719710931491347","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","0.76475453846265307","3.8658559591357","30.22521803485655","3.8658559591357","0.76475453846265307","0.76475453846265307","0.76475453846265307","17.820812352164385","24.023015193510467","13.169160221154819","16.270261641827865","16.270261641827865","19.371363062500901","2.315305248799171","3.8658559591357","5.4164066694722184","5.4164066694722184","3.8658559591357","0.76475453846265307","5.4164066694722184","3.8658559591357","3.8658559591357","3.8658559591357","3.8658559591357","3.8658559591357","5.4164066694722184","3.8658559591357","2.315305248799171","5.4164066694722184","8.5175080901452649","11.618609510818301","5.5456184003865978"
"02/04/2014 02:25:22.853","0","0","92.848453249913092","0","12.379793766655082","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7012352","1668734976","282546176","641404928","98045952","43.227161245776045","14579","0","1.5474742208318852","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.5474742208318852","0","0","0","1.5474742208318852","0","0","0","1.5474742208318852","0","0","0","0","0","0","0.12895535843241074","0","7.1515467500869008","0.96164986675935094","0","0","0","0.96164986675935094","0","0","0","0","0","7.1515467500869008","2.5091240875912413","0","0","0","0","0","11.79396941258255","0.96164986675935094","0","0","0","14.888917854246319","11.79396941258255","4.056598308423121","16.43639207507821","17.98386629591009","11.79396941258255","2.5091240875912413","2.5091240875912413","0.96164986675935094","0.96164986675935094","0","0","0","0.96164986675935094","2.5091240875912413","0","2.5091240875912413","0.96164986675935094","4.056598308423121","0.96164986675935094","2.5091240875912413","4.056598308423121","5.6040725292550109","8.6990209709187809","2.8315224033152231"
"02/04/2014 02:25:23.848","0","1.5692552674079339","116.12488978818712","0","9.4155316044476045","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7012352","1668734976","282546176","641404928","98045952","43.227161245776045","14580","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.1369181533001704","0","0","0","0","0","0","0","0","0","0","15.260215559971568","0","1.1369181533001704","0","0","0","1.1369181533001704","8.9831944903398409","0","0","0","0","12.121705025155705","18.398726094787442","10.552449757747773","15.260215559971568","16.82947082737951","16.82947082737951","1.1369181533001704","1.1369181533001704","2.7061734207081023","7.4139392229318979","5.8446839555239656","7.4139392229318979","2.7061734207081023","2.7061734207081023","8.9831944903398409","4.2754286881160342","5.8446839555239656","1.1369181533001704","0","2.7061734207081023","7.4139392229318979","12.121705025155705","5.8446839555239656","7.4139392229318979","4.079262977833908"
"02/04/2014 02:25:24.844","0","0","108.05231529511674","0","17.225731423859191","122","3357","5634","3279","2933","51122176","1887068160","377069568","1403805696","141160448","7012352","1668734976","282546176","641404928","98045952","43.22384909869794","14580","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1.5659755839871992","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0.032624282203052524","1.3435382088064496","6.0414649607680504","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","13.871342880704042","6.0414649607680504","4.4754893767808497","1.3435382088064496","1.3435382088064496","1.3435382088064496","6.0414649607680504","13.871342880704042","1.3435382088064496","1.3435382088064496","1.3435382088064496","1.3435382088064496","10.73939171272964","15.437318464691241","17.003294048678441","18.569269632665641","13.871342880704042","15.437318464691241","6.0414649607680504","4.4754893767808497","4.4754893767808497","4.4754893767808497","2.9095137927936499","2.9095137927936499","4.4754893767808497","2.9095137927936499","2.9095137927936499","9.1734161287424403","6.0414649607680504","2.9095137927936499","2.9095137927936499","7.6074405447552396","4.4754893767808497","4.4754893767808497","6.0414649607680504","15.437318464691241","5.4216035989100515"
CK WONG
  • 51
  • 7

2 Answers2

1

If i have to do it, the obvious solution is awk.

awk -F , -v OFS=, "NR>1{print $18,$19,$20,$21}" "csvfolder\*.csv" > "folder\output.csv"

Once this is said, let's solve it with batch. But anyway this will be slow.

When for /f is used to process the output of a command the behaviour is to first retrieve all the data and then start working over it. And, when large volumes of data are involved, this is really slow.

This behaviour is less evident when the for /f command processes a file on disk. The file IS completly readed in memory to start to work, but load times are a lot faster.

It is counterintuitive, but when large files are handled, it is faster to generate an intermediate temporary file with only the required lines in it and then process this file with for. And, of course, it will be a lot faster if, at least, the intermediate file is located on local hard disk.

set "tempFile=%temp%\csv.tmp"
for %%z in (csvFolder\*.csv) do (
    echo %%z
    findstr /v "words" "%%~fz" > "%tempFile%"
    (for /f "usebackq tokens=18-21 delims=," %%a in ("%tempFile%") do echo %%a,%%b,%%c,%%d) >> "folder\test.csv"
)
del "%tempFile%" >nul 2>nul
MC ND
  • 65,671
  • 6
  • 67
  • 106
  • It was found too the FOR /F has a size limit of somewhere around 3GB (maybe the line length of the content is part of it, I didn't check). The OP is processing a 4GB file. – foxidrive Feb 13 '14 at 12:41
  • 1
    You are correct that use of temporary files can improve performance, but FOR /F does **NOT** immediately iterate lines when reading from a file. FOR /F always buffers all content before iterating any lines - this is true for files, command output, and strings. This is why FOR /F is slow with large files. It also means FOR /F will always iterate the original content of a file, even if the DO code modifies the content. – dbenham Feb 13 '14 at 12:45
  • @foxidrive, the OP is processing over 4GB of csv files. That is the reason for the `for %%z` in the code to process file by file instead of directly concatenating them in only one and then processing it. – MC ND Feb 13 '14 at 12:46
  • Ahh yes, 4GB of csv files. – foxidrive Feb 13 '14 at 12:55
  • @dbenham, I've never think on it. I had no reason until now. In this case i was testing with a 280MB csv file. I had to run it under procmon to see the ReadFile call for ALL the file happens in 0.11 seconds. Testing from a network share the difference is evident. As usually, you are right. Thank you. – MC ND Feb 13 '14 at 13:03
  • I removed my answer as the regexp I used must have been wrong due to greedy matching at the end. Awk or gawk does this well. – foxidrive Feb 13 '14 at 13:11
  • @foxidrive - I got REPL.BAT to work well with the correct regex. See [my answer](http://stackoverflow.com/a/21756442/1012053). It makes heavy use of non-greedy matching, as well as non-capturing sub-expressions. – dbenham Feb 13 '14 at 14:11
  • @MC ND your help is great!!! i want to ask one more thing. can i change [ (csvFolder\\*.csv) ] to [(csvFolder\\*\\*.csv)]? It seems not work if i change it. – CK WONG Feb 14 '14 at 08:27
  • @user2702041, for awk `findstr /s /v "words" "csvFolder\*.csv" | awk -F , -v OFS=, "{print $18,$19,$20,$21}" > "folder\output.csv"` For `for` command, change to `for /r "csvFolder" %%z in (*.csv) do (` – MC ND Feb 14 '14 at 09:33
1

awk or sed sound like your best bet.

But I did manage to find a pure native Windows script solution that performs reasonably well. It uses my REPL.BAT hybrid JScript/batch utility that performs a regex search and replace on lines of stdin and writes the result to stdout. Full documentation is embedded within the script.

Assuming REPL.BAT is somewhere within your PATH, then the following command line one liner should do the trick:

findstr /v words *.csv | repl ".*?:(?:.*?,){17}((?:.*?,){3}.*?),.*" $1 >folder\output.csv

I tested the above on 20 CSV files totaling a bit over 1GB, and it completed successfully in 80 seconds. Time should be linear with total file size.

Note that the initial .*?: expression in the regex matches the filename prefix that findstr inserts before each line.

Note also that it is critical that every source CSV file have a linefeed at the end of the last line in the file. If not, then the last line of a file will merge with the first line of the next file.

Community
  • 1
  • 1
dbenham
  • 119,153
  • 25
  • 226
  • 353