I have a long file with identifiers, e.g.
A
A
B
C
A
C
I would like to do a group by, count and sort operation to get a file with:
A 3
C 2
B 1
How can I achieve it in a CMD script?
Global edit: all code has been modified to allow - in identifiers. Identifiers must not contain !.
Assuming the identifiers do not contain =, $, or !, and the identifiers are NOT case sensitive, the following lists the counts sorted by identifier.
@echo off
setlocal enableDelayedExpansion
:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="
:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)
:: Write the results to a new file
>output.txt (
  for /f "tokens=1,2 delims=$=" %%A in ('set $') do echo %%A %%B
)
:: Show the result
type output.txt
The $ prefix can be changed as needed. But this technique cannot be used if the identifiers are case sensitive, because environment variable names are not case sensitive.
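To illustrate the case limitation, here is a minimal sketch of the counting step on its own. It assumes no variables starting with $ exist beforehand, and it uses a hypothetical inline list (A a B) in place of the lines of test.txt.

```batch
@echo off
setlocal enableDelayedExpansion
:: Hypothetical sample identifiers standing in for the lines of test.txt.
:: Assumes no variables whose names start with $ are already defined.
for %%I in (A a B) do (
  set /a "cnt=!$%%I!+1"
  set "$%%I=!cnt!"
)
:: List every variable whose name starts with $
set $
```

Because A and a resolve to the same variable, the listing should show a single entry with a count of 2 for them and a separate entry with a count of 1 for B.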
EDIT
Here is a version that sorts the result by count, descending:
@echo off
setlocal enableDelayedExpansion
:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="
:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)
:: Write a temp file with zero padded counts prefixed to the left.
>temp.txt (
  for /f "tokens=1,2 delims=$=" %%A in ('set $') do (
    set "cnt=000000000000%%B"
    echo !cnt:~-12!=%%A=%%B
  )
)
:: Sort and write the results to a new file
>output.txt (
  for /f "tokens=2,3 delims=$=" %%A in ('sort /r temp.txt') do echo %%A %%B
)
del "temp.txt"
:: Show the result
type output.txt
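The zero padding is needed because sort compares lines as text, so an unpadded 9 would compare greater than 10. A minimal sketch of the padding step alone, using hypothetical sample counts (3, 10, 200) rather than counts read from a file:

```batch
@echo off
setlocal enableDelayedExpansion
:: Hypothetical sample counts; in the script above they come from the $ variables.
for %%N in (3 10 200) do (
  set "padded=000000000000%%N"
  echo !padded:~-12!
)
```

Each count is printed as a fixed 12-character string (for example 000000000010), so a text sort on the padded prefix orders the lines by numeric value.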
EDIT 2
And here is another option, sorted by count descending, that assumes REPL.BAT is somewhere within your PATH:
@echo off
setlocal enableDelayedExpansion
:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="
:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)
:: Sort result by count descending and write to output file
set $|repl "\$(.*)=(.*)" "000000000000$2=$1 $2"|repl ".*(.{12}=.*)" $1|sort /r|repl ".{13}(.*)" $1 >output.txt
:: Show the result
type output.txt