
I have a variable %%p created by a FOR /F command. When I try to use it with modifiers like %%~dp and then write some text directly afterwards, it accesses a different variable:

set var="%%~dpabc.txt"

The code outputs

%%~dpa instead of %%~dp
  • You missed the `p` for the metavariable: `%%~dppabc.txt`. It is `d`rive, `p`ath, then `p` because you used `%%p`; if you used `%%f`, it would simply change to `%%~dpfabc.txt`. – Gerhard May 21 '19 at 12:05
  • I know, but can I use it somehow? – Edward Gierek May 21 '19 at 12:08
  • Yes, use `%%~dppabc.txt`. Note the second `p`. – Gerhard May 21 '19 at 12:08
  • My recommendation: don't use lower case letters as loop variables, so that it is clearer to you what is a modifier and what is the loop variable. Best would be to avoid the letters `ADFNPSTXZadfnpstxz` as loop variables and use other letters like `I` and `J`, or ASCII characters like `#`. See also [Why is no string output with 'echo %var%' after using 'set var = text' on command line?](https://stackoverflow.com/a/26388460/3074564), which explains why the syntax `set "var=%%~dpIabc.txt"`, with the first double quote left of the variable name instead of right of the equal sign, is usually better. – Mofi May 21 '19 at 12:19
  • I don't know who voted to close on the grounds that this question is not about programming, but I strongly disagree: this is an excellent programming question about a tricky problem with the batch scripting language. – dbenham May 21 '19 at 22:05
  • @Mofi - Using lower case for modifiers and upper case for variables makes the programmer's ***intent*** more obvious, but it does not prevent accidental unexpected parsing like the OP has. Your recommendation to avoid `ADFNPSTXZadfnpstxz` is good. You should add `$` to that list. – dbenham May 21 '19 at 23:00
  • @dbenham and @EdwardGierek I once wrote an [answer](https://stackoverflow.com/a/37236316/3074564) explaining in detail why using `ADFNPSTXZadfnpstxz` is not recommended, and why upper case letters other than `ADFNPSTXZ`, or even better ASCII characters like `#` or `$`, are in general better to avoid misinterpretation by the reader/writer of the batch file and also by `cmd.exe`. – Mofi May 22 '19 at 05:33
  • @Mofi - +1 on your referenced answer. But if parsing many tokens with FOR /F, then it may be very difficult, if not impossible, to assign only characters that cannot be interpreted as modifiers. This is the problem I address in my answer below. – dbenham May 22 '19 at 11:38

3 Answers


So you must be using FOR /F with multiple tokens, like

for /f "tokens=1-16" %%a in (file) do echo %%~dpabc.txt

Or your code could have nested FOR loops. Something like

for %%a in (something) do (
  for %%p in (somethingelse) do (
    echo %%~dpabc.txt
  )
)

Or even something like

for %%a in (something) do call :sub
exit /b

:sub
for %%p in (somethingelse) do echo %%~dpabc.txt
exit /b

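In the two nested-loop examples, where only %%a and %%p are defined, the code prints the drive and path of %%a (that is, %%~dpa) followed by "bc.txt". In the FOR /F example, %%b is defined as well, so the greedy parser goes one character further and expands %%~dpab (the d, p and a modifiers applied to %%b) followed by "c.txt". As per the documentation, FOR variables are global, so the DO clause of the subroutine's FOR loop has access to both %%a and %%p.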

Aschipfl does a good job documenting the rules for how modifiers and variable letters are parsed.

Whenever you use a FOR variable before a string literal, you must be extremely careful that the string literal cannot be interpreted as part of the FOR variable expansion. As can be seen with your example, this can be difficult. Make the literal dynamic, and the problem is even worse.

set /p "myFile=Enter a file name: "
for %%a in (something) do (
  for %%p in (somethingelse) do (
    echo %%~dp%myFile%
  )
)

If the user enters "abc.txt", then we are right back where we started. But looking at the code, it is not obvious that you have a potential problem.

As Gerhard and Mofi say, you are safe if you use a character that cannot be interpreted as a modifier. But that is not always easy, especially if you are using FOR /F returning multiple tokens.
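How quickly FOR /F with many tokens runs into modifier characters can be checked with a small Python sketch (a model, not batch code). It assumes FOR /F gives token k the character whose code point is that of the named variable plus k - 1, and it treats as risky the characters the comments recommend avoiding: the modifier letters ADFNPSTXZ in either case, plus `$` and `%`. `risky_tokens` is a hypothetical helper name.

```python
from __future__ import annotations

# Characters the comments recommend avoiding as FOR variables: the ~ modifier
# letters in either case, "$" (used by ~$var:) and "%" (see aschipfl's answer).
RISKY = set("fdpnxsatzFDPNXSATZ") | {"$", "%"}


def risky_tokens(first: str, token_count: int) -> list[tuple[int, str]]:
    """Return (token number, variable char) pairs that land on a risky character.

    Assumes FOR /F "tokens=1-N" %%<first> assigns token k to chr(ord(first) + k - 1).
    """
    start = ord(first)
    return [
        (k, chr(start + k - 1))
        for k in range(1, token_count + 1)
        if chr(start + k - 1) in RISKY
    ]


for first in ("a", "I", "#"):
    print(first, risky_tokens(first, 16))
```

Under this model, even starting at `I` still collides at tokens 6, 8, 11, 12 and 16 (N, P, S, T, X), and starting at `#` collides right away with `$` and `%`.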

There are solutions!

1) Stop the FOR variable parsing with !! and delayed expansion

If you look at the rules for how cmd.exe parses scripts, you will see that FOR variables are expanded in phase 4 before delayed expansion occurs in phase 5. This provides the opportunity to use !! as a hard stop for the FOR expansion, provided that delayed expansion is enabled.

setlocal enableDelayedExpansion
for %%a in (something) do (
  for %%p in (somethingelse) do (
    echo %%~dp!!abc.txt
  )
)

The %%~dp is expanded properly in phase 4, and then in phase 5 !! is expanded to nothing, yielding the desired result: the drive and path followed by "abc.txt".

But this does not solve all situations. It is possible for ! to be used as a FOR variable, but that should be easy to avoid except under extreme situations.

More troubling is the fact that delayed expansion must be enabled. That is not a problem for this particular example, but if the FOR variable expands to a string containing !, then that character will be parsed by delayed expansion, and the results will most likely be messed up.

So the !! delayed expansion hack is safe to use only if you know that your FOR variable value does not contain !.

2) Use intermediate environment variables

The only simple foolproof method to avoid problems in all situations is to transfer the value of the FOR variable to an intermediate environment variable, and then toggle delayed expansion and work with the entire desired string.

for %%a in (something) do (
  for %%p in (somethingelse) do (
    set "drive=%%~dp"
    setlocal enableDelayedExpansion
    echo !drive!abc.txt
    endlocal
  )
)
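Why approach 1 breaks on a ! inside the value while approach 2 does not can be illustrated with a deliberately simplified Python model of phase 5. It is not cmd.exe's real algorithm: carets and the :~ / := forms are ignored. The paths `C:\Data\` and `C:\Hi!\` are made-up values standing in for what %%~dp might return, and `delayed_expand` is a hypothetical name.

```python
from __future__ import annotations


def delayed_expand(text: str, env: dict[str, str]) -> str:
    """Simplified phase 5 (delayed expansion) applied to already FOR-expanded text.

    Assumptions: carets and the :~ / := syntax are ignored. Modeled rules:
      * !name! becomes env[name], or nothing if undefined; an empty name is never
        defined, so !! becomes nothing;
      * a lone ! without a closing ! is removed;
      * inserted values are not rescanned.
    """
    out: list[str] = []
    i = 0
    while i < len(text):
        if text[i] != "!":
            out.append(text[i])
            i += 1
            continue
        close = text.find("!", i + 1)
        if close == -1:
            i += 1  # lone "!" is dropped
            continue
        out.append(env.get(text[i + 1:close], ""))  # value is not rescanned
        i = close + 1
    return "".join(out)


# Approach 1: phase 4 already replaced %%~dp with the path, then phase 5 runs.
print(delayed_expand("C:\\Data\\" + "!!abc.txt", {}))  # C:\Data\abc.txt
print(delayed_expand("C:\\Hi!\\" + "!!abc.txt", {}))   # C:\Hiabc.txt (mangled)

# Approach 2: the path was stored in an environment variable first.
print(delayed_expand("!drive!abc.txt", {"drive": "C:\\Hi!\\"}))  # C:\Hi!\abc.txt
```

In the mangled case the ! from the path pairs with the first ! of `!!`, so `\` is treated as an (undefined) variable name, and the leftover ! is dropped.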

3) Use Unicode characters via environment variables

There is a complex bulletproof solution, but it takes a good bit of background information before you can understand how it works.

The cmd.exe command processor represents all strings internally as Unicode, and environment variables are no exception: any Unicode code point other than 0x00 can be used. This also applies to FOR variable characters. The sequence of FOR variable characters is based on the numeric value of the Unicode code point.

But cmd.exe code, either from a batch script, or else typed into the command prompt, is restricted to characters supported by the active code page. That might seem like a dead end - what good are Unicode characters if you cannot access them with your code?

Well there is a simple, though non-intuitive solution: cmd.exe can work with predefined environment variable values that contain Unicode values outside the active code page!

All FOR variable modifiers are ASCII characters within the first 128 Unicode code points. So if you define variables named $1 through $n to contain a contiguous range of Unicode characters starting with, say, code point 256 (0x100), then you are guaranteed that your FOR variable can never be confused with a modifier.

So if $1 contains code point 0x100, then you would refer to the FOR variable as %%%$1%, and you can freely use modifiers like %%~dp%$1%.

This strategy has an added benefit in that it is relatively easy to keep track of FOR variables when parsing a range of tokens with something like "tokens=1-30" because the variable names are inherently sequential. The active code page character sequencing usually does not match the sequence of the Unicode code points, which makes it difficult to access all 30 tokens unless you use the Unicode variable hack.
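The $n idea and the %%%$1% notation can be modeled in Python (a sketch, not batch). `define_for_chars` builds a dictionary standing in for the $1..$n environment variables, and `expand_percent` is a simplified model of phase 1 in a batch file that ignores %0-%9, %* and the :~ / := forms. Both names are hypothetical. The model shows that every $n character lies outside ASCII, and what FOR expansion sees once %%~d%$16% has passed through phase 1.

```python
from __future__ import annotations


def define_for_chars(count: int) -> dict[str, str]:
    """Model of $1..$n (n = count * 256): $k holds the character U+0100 + (k - 1)."""
    return {f"${k}": chr(0x100 + k - 1) for k in range(1, count * 256 + 1)}


def expand_percent(line: str, env: dict[str, str]) -> str:
    """Simplified phase 1 for a batch file line.

    %% becomes %, %name% becomes env[name] (empty if undefined), and a lone %
    without a closing % is dropped.
    """
    out: list[str] = []
    i = 0
    while i < len(line):
        if line[i] != "%":
            out.append(line[i])
            i += 1
        elif line.startswith("%%", i):
            out.append("%")
            i += 2
        else:
            close = line.find("%", i + 1)
            if close == -1:
                i += 1  # lone "%" is dropped
            else:
                out.append(env.get(line[i + 1:close], ""))
                i = close + 1
    return "".join(out)


env = define_for_chars(1)
# Every $n character is outside ASCII, so it can never be read as a ~ modifier.
assert all(ord(ch) > 0x7F for ch in env.values())

# Token k of "tokens=1-30" %%%$1% lands in the variable whose character is $k.
print([hex(ord(env[f"${k}"])) for k in (1, 2, 30)])  # ['0x100', '0x101', '0x11d']
# What phase 4 (FOR expansion) sees after phase 1:
print(ascii(expand_percent("%%~d%$16%abc.txt", env)))  # '%~d\u010fabc.txt'
```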

Now defining the $n variables with Unicode code points is not a trivial development effort. Thankfully it has already been done :-) Below is some code that demonstrates how to define and use the $n variables.

@echo off
setlocal disableDelayedExpansion
call :defineForChars 1
for /f "tokens=1-16" %%%$1% in (file) do echo %%~d%$16%abc.txt
exit /b

:defineForChars  Count
::
:: Defines variables to be used as FOR /F tokens, from $1 to $n, where n = Count*256
:: Also defines $max = Count*256.
:: No other variables are defined or tampered with.
::
:: Once defined, the variables are very useful for parsing lines with many tokens, as
:: the values are guaranteed to be contiguous within the FOR /F mapping scheme.
::
:: For example, you can use $1 as a FOR variable by using %%%$1%.
::
::   FOR /F "TOKENS=1-31" %%%$1% IN (....) DO ...
::
::      %%%$1% = token 1, %%%$2% = token 2, ... %%%$31% = token 31
::
:: This routine never uses SETLOCAL, and works regardless of whether delayed expansion
:: is enabled or disabled.
::
:: Three temporary files are created and deleted in the %TEMP% folder, and the active
:: code page is temporarily set to 65001, and then restored to the starting value
:: before returning. Once defined, the $n variables can be used with any code page.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineForCharsInternal %1
exit /b
:defineForCharsInternal
set /a $max=%1*256
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$max%) do set /p "$%%N=")
for %%. in (dummy) do >nul chcp %%P  
del "%temp%\forVariables.%~1.*.txt"
exit /b
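The hex trick in :defineForCharsInternal can be checked with a Python re-creation. This is an approximation: `bytes.fromhex` stands in for `certutil -decodehex`, and splitting on CRLF stands in for `type` plus the `set /p` loop. Each echoed line `AB 0N 0D 00 0A 00` is one UTF-16LE character U+0NAB followed by CR LF, so Count 1 yields U+0100 through U+01FF. The function names are hypothetical.

```python
from __future__ import annotations

HEX_DIGITS = "0 1 2 3 4 5 6 7 8 9 A B C D E F".split()


def build_hex_lines(count: int) -> list[str]:
    """Reproduce the lines :defineForCharsInternal echoes into the .hex.txt file."""
    if not 1 <= count <= 9:
        # Sketch limitation: only for 1..9 is "0{n}" a single two-digit hex byte.
        raise ValueError("this sketch only models Count 1 through 9")
    lines = ["FF FE"]                  # UTF-16LE byte order mark
    for n in range(1, count + 1):      # for /l %%N in (1 1 Count)
        for a in HEX_DIGITS:           # for %%A in (...)
            for b in HEX_DIGITS:       # for %%B in (...)
                # low byte AB, high byte 0N, then CR LF encoded as UTF-16LE
                lines.append(f"{a}{b} 0{n} 0D 00 0A 00")
    return lines


def decoded_chars(count: int) -> list[str]:
    """Decode the hex lines and split them back into one character per line."""
    raw = bytes.fromhex(" ".join(build_hex_lines(count)))
    text = raw.decode("utf-16")        # "utf-16" consumes the FF FE BOM
    return text.split("\r\n")[:-1]     # drop the empty piece after the final CRLF


chars = decoded_chars(1)
print(len(chars), hex(ord(chars[0])), hex(ord(chars[-1])))  # 256 0x100 0x1ff
```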

The :defineForChars routine was developed at DosTips as part of a larger group effort to easily access many tokens with a FOR /F statement.

The :defineForChars routine and variants are introduced in the following posts within that thread:


This behaviour is caused by the greedy nature of the parsing of for variable references and their ~-modifiers. Basically, it follows these rules, given that the preceding %/%%-signs have already been detected:

  • if Command Extensions are enabled (default), check whether the next character is ~; if yes, then:
    • take as many as possible of the following characters from the case-insensitive set fdpnxsatz (even multiple times each) that precede a character that defines a for variable reference or a $-sign; if such a $-sign is encountered, then:
      • scan for a : (note 1); if found, then:
        • if there is a character other than % following the :, use it as a for variable reference and expand as expected, unless it is not defined, in which case do not expand;
        • if the : is the last character, cmd.exe will crash!
      • else (no : is found) do not expand anything;
    • else (if no $-sign is encountered) expand the for variable using all the modifiers;
  • else (if no ~ is found or Command Extensions are disabled) use the next character as a for variable reference, unless it is a %-sign (note 2), and expand, unless such a variable is not defined or there is no following character at all, in which case do not expand;

1) The string between $ and : is treated as the name of an environment variable, which may even be empty; since an environment variable cannot have an empty name, the behaviour is the same as for an undefined environment variable.
2) A %-sign can be the name of a for meta-variable, but it cannot be expanded without a ~-modifier.
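These rules can be written down as a compact Python model (an illustration of the rules as stated above, not cmd.exe's actual code). `parse_for_reference` is a hypothetical name; `text` is what follows the % that starts the reference (in a batch file, %% has already become % in phase 1), and `defined` is the set of live FOR variable characters. Where the rules are silent (a % right after the :), the model assumes nothing expands.

```python
from __future__ import annotations

MODIFIERS = set("fdpnxsatz")  # matched case-insensitively


def parse_for_reference(text: str, defined: set[str], extensions: bool = True):
    """Return (modifiers, variable, rest) if the reference expands, else None.

    For the ~$ case, modifiers ends with "$<envname>:".
    """
    if extensions and text.startswith("~"):
        body = text[1:]
        run = 0
        while run < len(body) and body[run].lower() in MODIFIERS:
            run += 1
        # Greedy: keep as many modifiers as possible, backing off one at a time
        # until the next character is a defined variable or a "$".
        for n in range(run, -1, -1):
            if n < len(body) and (body[n] in defined or body[n] == "$"):
                break
        else:
            return None
        if body[n] == "$":
            colon = body.find(":", n + 1)
            if colon == -1:
                return None  # no ":" found: nothing expands
            if colon == len(body) - 1:
                raise RuntimeError("':' is the last character: cmd.exe would crash")
            var = body[colon + 1]
            if var == "%" or var not in defined:  # "%" case: model assumption
                return None
            return body[:colon + 1], var, body[colon + 2:]
        return body[:n], body[n], body[n + 1:]
    # No "~" (or extensions disabled): the next character is the variable,
    # but a bare % can never be expanded this way.
    if not text or text[0] == "%" or text[0] not in defined:
        return None
    return "", text[0], text[1:]


nested = {"a", "p"}                # outer %%a loop, inner %%p loop
sixteen = set("abcdefghijklmnop")  # for /f "tokens=1-16" %%a
print(parse_for_reference("~dpabc.txt", nested))    # ('dp', 'a', 'bc.txt')
print(parse_for_reference("~dpabc.txt", sixteen))   # ('dpa', 'b', 'c.txt')
print(parse_for_reference("~dppabc.txt", nested))   # ('dpp', 'a', 'bc.txt')
print(parse_for_reference("~$:pabc.txt", sixteen))  # ('$:', 'p', 'abc.txt')
```

By these rules, doubling the `p` does not help while %%a is live, and in the last case the environment variable name between $ and : is empty, so the reference always expands to nothing and only "abc.txt" remains.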


This answer has meanwhile been posted in an augmented manner as a community answer to the thread How does the Windows Command Interpreter (CMD.EXE) parse scripts?.


As already explained in the for meta-variable parsing rules, the ~-modifier detection happens in a greedy manner. But you can stop the parsing with another for meta-variable that expands to nothing, or with the ~$-modifier, as suggested by jeb in a comment; the latter does not even require another for meta-variable, since any existing one can be used:

rem // Using `%%~#` will expand to an empty string (note that `#` is not a valid `~`-modifier):
for %%# in ("") do (
    rem // Establish a `for`-loop that defines meta-variables `%%a` to `%%p`:
    for /F "tokens=1-16" %%a in ("1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16") do (
        rem /* Since `d`, `p` and `a` are all valid `~`-modifiers and a `for` meta-variable
        rem    `%%b` exists while `b` is not a valid `~`-modifier, `%%~dpab` is expanded: */
        echo(%%~dpabc.txt
        rem /* `for` meta-variable parsing stops after `%%~dp`, because the following `%`
        rem    is not a valid `~`-modifier and there is no `for` meta-variable named `%`;
        rem    `%%~#` then expands to an empty string (parsing surely stops at `#`): */
        echo(%%~dp%%~#abc.txt
        rem /* The following does not even require a particular `for` meta-variable like `%%#`;
        rem    it just uses the existing one `%%p` with the `~$`-modifier that specifies an
        rem    environment variable; since there is no variable name in between `$` and `:`,
        rem    there is certainly no such variable (since they must have names), hence `%%~$:p`
        rem    expands to an empty string; note that `~$` is always the very last modifier: */
        echo(%%~dp%%~$:pabc.txt
    )
)

Note that this approach fails if there is a for meta-variable named % (not common, but possible).

  • You don't need an extra, empty for-variable; you could just use one of the already existing ones: `echo %%~dp%%~$=undefined=:aabc.txt`. The `%%~$=undefined=:a` expression is always empty. Btw, your solution, like my own, still fails if a for meta-variable `%` exists. Therefore, *weak* for-variable names should always be avoided. – jeb Dec 10 '20 at 19:00
  • Thanks, @jeb, great idea! We don't even need a variable name then; `%%~$:a` is sufficient and always expands to an empty string (even independent of context, batch vs. `cmd`). I incorporated your suggestion into my answer, but if you want to post it as an answer of your own, just let me know and I'll edit it out of mine… – aschipfl Dec 10 '20 at 20:16