I have a bash script that extracts logs from a file between two timestamps. However, as the files grow (from 2 GB up to 10 GB), it takes considerably longer to complete (more than 20 minutes).
My log structure looks like this:
087B0037 08AD0056 03/09 02:40:40 [MMS:Main,INF] MMS state changed
087B0037 096100BE 03/09 02:40:41 [Navigation,INF] CDDClient Initialize...
EndeavourDriver: 03/09/2017 02:40:42 :
00400004 047B0012 EndeavourDriver: 71 [SDIO87871]:
087B0037 0BE10002 03/10 06:40:40 [NNS:NNS,INF] Initializing NNS thread id 0x0BE10002...
087B0037 08AD0056 03/10 06:40:40 Initialized state: BITServer
My script uses the following command:
grep -a -A 1000000 "03/09" fileName.txt | grep -a -B 1000000 "03/10"
But it takes too long. If I add the time (e.g. "03/09 02:") it is faster, but the logger is not always running, so some time values might be missing. The date is usually in the 3rd column, so I tried awk:
awk '$3 >= "03/09" && $3 <= "03/10"' fileName.txt
But that does not collect the following lines:
EndeavourDriver: 03/09/2017 02:40:42 :
00400004 047B0012 EndeavourDriver: 71 [SDIO87871]:
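For concreteness, here is a runnable sketch of the behaviour I want: remember the last date seen on any line, so that lines without a date (like the EndeavourDriver ones) inherit the date of the line above them. The sample file, the 03/09–03/10 range, and the extra before/after lines are invented for the example; I don't know whether this is the fastest approach for a 10 GB file.

```shell
# Reproducible sample (made-up data, trimmed to match my log format)
cat > sample.log <<'EOF'
087B0037 08AD0056 03/08 23:59:59 [MMS:Main,INF] before the range
087B0037 08AD0056 03/09 02:40:40 [MMS:Main,INF] MMS state changed
EndeavourDriver: 03/09/2017 02:40:42 :
00400004 047B0012 EndeavourDriver: 71 [SDIO87871]:
087B0037 0BE10002 03/10 06:40:40 [NNS:NNS,INF] Initializing NNS thread
087B0037 08AD0056 03/11 00:00:01 after the range
EOF

# Scan each line for the first MM/DD (or MM/DD/YYYY) token and remember it,
# so continuation lines with no date inherit the previous line's date.
# Assumes the file is chronological, so we can stop reading early.
awk '
{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^[0-9][0-9]\/[0-9][0-9]/) { d = substr($i, 1, 5); break }
  if (d > "03/10") exit          # past the end date: stop scanning the file
  if (d >= "03/09") print
}' sample.log | tee extracted.log
```

The early `exit` would matter for speed: if the log is in chronological order, awk stops as soon as it passes the end date instead of scanning the remaining gigabytes.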
I'm not too familiar with awk, sed, and grep, so any suggestions would be appreciated. Perhaps something in a different language like Python would be better? Thanks.