
I have a long file with identifiers, e.g.

A
A
B
C
A
C

I would like to group the identifiers, count each one, and sort by count (descending) to get a file with:

A 3
C 2
B 1

How can I achieve it in a CMD script?

Grzenio


Global edit: all code has been modified to allow - in identifiers. Identifiers must not contain !.

Assuming the identifiers do not contain =, $ or !, and that they are NOT case sensitive, the following script reads test.txt and writes the counts, sorted by identifier, to output.txt.

@echo off
setlocal enableDelayedExpansion

:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)

:: Write the results to a new file
>output.txt (
  for /f "tokens=1,2 delims=$=" %%A in ('set $') do echo %%A %%B
)

:: Show the result
type output.txt

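The $ prefix can be changed to any character that never appears in the identifiers. However, this technique cannot be used if the identifiers are case sensitive, because environment variable names are not: A and a would be counted as the same identifier.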
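
To see the case-sensitivity limitation concretely, here is a minimal sketch that mimics one step of the counting loop with two hypothetical identifiers, A and a. Because both map to the same environment variable, the second "identifier" simply increments the first one's count.

```batch
@echo off
setlocal enableDelayedExpansion

:: Pretend identifier "A" has already been seen once
set "$A=1"

:: Now count identifier "a" exactly like the main loop does
set /a "cnt=!$a!+1"
set "$a=!cnt!"

:: Variable names are case insensitive, so $A and $a are the same variable
:: and both references expand to the same value (2)
echo $A=!$A! $a=!$a!
```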

EDIT

Here is a version that sorts the result by count descending:

@echo off
setlocal enableDelayedExpansion

:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)

:: Write a temp file with zero padded counts prefixed to the left.
>temp.txt (
  for /f "tokens=1,2 delims=$=" %%A in ('set $') do (
    set "cnt=000000000000%%B"
    echo !cnt:~-12!=%%A=%%B
  )
)

:: Sort and write the results to a new file
>output.txt (
  for /f "tokens=2,3 delims=$=" %%A in ('sort /r temp.txt') do echo %%A %%B
)
del "temp.txt"

:: Show the result
type output.txt
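
The zero padding is what makes sort /r order the lines by count: sort compares lines as text, so without padding 10 would sort below 9. A minimal sketch of just the padding step, using the arbitrary sample counts 9, 10 and 3:

```batch
@echo off
setlocal enableDelayedExpansion

:: Prefix 12 zeros, then keep only the last 12 characters,
:: so every count has the same width and text order equals numeric order
for %%N in (9 10 3) do (
  set "cnt=000000000000%%N"
  echo !cnt:~-12!
)
```

This prints 000000000009, 000000000010 and 000000000003, which sort correctly as text. Twelve digits is more than enough, since set /a works with 32-bit signed integers (at most 10 digits).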

EDIT 2

And here is another option, also sorted by count descending, that assumes the REPL.BAT regex search-and-replace utility is somewhere within your PATH:

@echo off
setlocal enableDelayedExpansion

:: Clear any existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Get a count of each identifier
for /f "usebackq delims=" %%A in ("test.txt") do (
  set /a "cnt=!$%%A!+1"
  set "$%%A=!cnt!"
)

:: Sort result by count descending and write to output file
set $|repl "\$(.*)=(.*)" "000000000000$2=$1 $2"|repl ".*(.{12}=.*)" $1|sort /r|repl ".{13}(.*)" $1 >output.txt

:: Show the result
type output.txt

dbenham