I recently zipped up a number of files created by a script I wrote and sent them to a Windows-using colleague. He could not unzip the archive, because some of my filenames contained a colon (:), which isn’t legal in Windows filenames.

It’s trivial to strip out the :, but what if there are other characters that I don’t know are illegal in Windows paths and filenames?

I wondered whether pathlib’s “pure” path objects would flag illegal characters in any way, but they do not as far as I can determine:

>>> from pathlib import PurePosixPath, PureWindowsPath
>>> pp = PurePosixPath("foo/bar:baz.txt")
>>> wp = PureWindowsPath(pp)
>>> print(wp)
foo\bar:baz.txt

Given that I do not have easy access to a Windows machine for testing, is there a simple way to ensure path/filenames generated by Python are “Windows-safe”?

wjv
  • It's tricky, as Maximilian Peters' link explains. If the top answer there answers your question, then we can close this one as a duplicate. – PM 2Ring May 16 '18 at 07:48
  • BTW, on Unix-like systems it's very easy to create a "fake" filesystem inside a regular file. So you could create a small NTFS or FAT32 filesystem that you could use to test filenames on. – PM 2Ring May 16 '18 at 07:50
  • @PM2Ring The top answer there seems to assume access to a Windows system. Also, it was written before `pathlib` became part of the standard library … and I was kind of hoping that `pathlib` might provide a solution. That said, it probably does provide an answer, namely: It can’t be done. (Your suggestion of using a virtual filesystem is useful!) – wjv May 16 '18 at 07:53

1 Answer


The simplest solution is to avoid reserved Windows characters when building your filenames.

Microsoft's documentation, Naming Files, Paths, and Namespaces, lists the following as illegal characters on Windows:

Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:

The following reserved characters:

  • < (less than)
  • > (greater than)
  • : (colon)
  • " (double quote)
  • / (forward slash)
  • \ (backslash)
  • | (vertical bar or pipe)
  • ? (question mark)
  • * (asterisk)
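
As a minimal sketch of how the list above could be applied from Python without a Windows machine, something like the following might work. The names `WINDOWS_RESERVED_CHARS`, `find_reserved_chars` and `make_windows_safe` are hypothetical helpers, not part of `pathlib` or any other library, and the `_` replacement is an arbitrary choice. It checks only the nine characters quoted above and is meant for a single file name, not a full path (it would also flag the separators in a path).

```python
# The nine reserved characters quoted from the Microsoft documentation above.
WINDOWS_RESERVED_CHARS = '<>:"/\\|?*'


def find_reserved_chars(filename):
    """Return the reserved characters found in a single file name,
    in order of first appearance and without duplicates."""
    found = []
    for ch in filename:
        if ch in WINDOWS_RESERVED_CHARS and ch not in found:
            found.append(ch)
    return found


def make_windows_safe(filename, replacement="_"):
    """Return a copy of the file name with every reserved character
    replaced (the default replacement "_" is an assumption)."""
    table = str.maketrans({ch: replacement for ch in WINDOWS_RESERVED_CHARS})
    return filename.translate(table)


if __name__ == "__main__":
    print(find_reserved_chars("bar:baz.txt"))
    print(make_windows_safe("bar:baz.txt"))
```

Reporting the offending characters separately from replacing them lets the generating script decide whether to fail loudly or rename quietly.
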
ScottMcC
  • They are illegal for a filename but not for a path. `c:\temp\tmp.txt` is a valid path but not a valid filename. – Maximilian Peters May 16 '18 at 07:53
  • IIRC, MS filesystems don't like `\n` in filenames, either. – PM 2Ring May 16 '18 at 07:54
  • Agreed, some of those characters are reserved for use in file paths but not filenames – ScottMcC May 16 '18 at 07:56
  • 1
    I think this is nonetheless usable. Just split your filename by `"\\"`, then ignore item `[0]` and for the rest of the list, check there's none of the above characters included. You could also add `\n`, but I find that a bit moot... – Jeronimo May 16 '18 at 10:17
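
Jeronimo's suggestion could be sketched roughly as follows. This is only an illustration under the assumptions stated in that comment: the path uses backslashes as separators and item `[0]` is something like a drive (`c:`), so it is skipped. The names `RESERVED` and `invalid_components` are hypothetical. Note that for a relative path the first component is skipped too, so it goes unchecked.

```python
# The reserved characters from the answer above; the backslash is included,
# but it can never appear in a component after splitting on it.
RESERVED = '<>:"/\\|?*'


def invalid_components(windows_path):
    """Split a backslash-separated path, ignore item [0], and return
    the remaining components that contain a reserved character."""
    parts = windows_path.split("\\")
    return [part for part in parts[1:] if any(ch in RESERVED for ch in part)]


if __name__ == "__main__":
    print(invalid_components("c:\\temp\\tmp.txt"))
    print(invalid_components("c:\\temp\\bar:baz.txt"))
```
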