I am concerned about the order of files and directories given by os.walk(). If I have these directories, 1, 10, 11, 12, 2, 20, 21, 22, 3, 30, 31, 32, what is the order of the output list?

Is it sorted by numeric values?

1 2 3 10 20 30 11 21 31 12 22 32

Or sorted by ASCII values, like what is given by ls?

1 10 11 12 2 20 21 22 3 30 31 32

Additionally, how can I get a specific sort?

double-beep
Vahid Mirjalili

3 Answers

106

os.walk uses os.listdir (os.scandir since Python 3.5), and neither guarantees an order. Here is the docstring for os.listdir:

listdir(path) -> list_of_strings

Return a list containing the names of the entries in the directory.

path: path of directory to list

The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

(The key phrase is "The list is in arbitrary order"; the emphasis is mine.)

You could, however, use sort to ensure the order you desire.

for root, dirs, files in os.walk(path):
    for dirname in sorted(dirs):
        print(dirname)

(Note that the dirnames are strings, not ints, so sorted(dirs) sorts them as strings, which for once is what you want here.)

As Alfe and Ciro Santilli point out, if you want the directories to be recursed in sorted order, then modify dirs in-place:

for root, dirs, files in os.walk(path):
    dirs.sort()
    for dirname in dirs:
        print(os.path.join(root, dirname))

You can test this yourself:

import os

os.chdir('/tmp/tmp')  # assumes this scratch directory already exists
for dirname in '1 10 11 12 2 20 21 22 3 30 31 32'.split():
    try:
        os.makedirs(dirname)
    except OSError:
        pass  # the directory is already there

for root, dirs, files in os.walk('.'):
    for dirname in sorted(dirs):
        print(dirname)

prints

1
10
11
12
2
20
21
22
3
30
31
32

If you wanted to list them in numeric order use:

for dirname in sorted(dirs, key=int):

To sort alphanumeric strings, use natural sort.
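
For names that mix letters and digits, a natural sort key can be sketched roughly as below. This is a minimal illustration, not a library function: natural_key is a hypothetical helper name and the dirN names are made up. It splits each name into digit and non-digit runs so that the digit runs compare as integers.

```python
import re

def natural_key(name):
    """Hypothetical helper: sort key that compares digit runs as integers."""
    # re.split with a capturing group keeps the digit runs in the result,
    # always at the odd indexes: 'dir10a' -> ['dir', '10', 'a'].
    parts = re.split(r'(\d+)', name)
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(parts)]

names = ['dir10', 'dir2', 'dir1', 'dir20']
print(sorted(names))                   # ['dir1', 'dir10', 'dir2', 'dir20']
print(sorted(names, key=natural_key))  # ['dir1', 'dir2', 'dir10', 'dir20']
```

Inside an os.walk loop the same key should work with dirs.sort(key=natural_key).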

unutbu
  • 5
    The reason Python goes out of its way to avoid documenting any reliable order is that it uses different functions on different platforms (`FindNextFileW`, `DosFindNext`, `readdir`), and those functions are themselves documented to punt to the filesystem on most platforms, and the filesystems generally either don't document an order or give you something completely useless. – abarnert Aug 17 '13 at 01:11
  • 2
    I think this does not sort multi level hierarchies because `sorted` is not in-place. To do that use `sort` as explained by Alfe. – Ciro Santilli新疆棉花TRUMP BAN BAD Mar 24 '14 at 12:29
43

At each step, os.walk() yields the directories it is about to descend into. You can influence the order of the following steps by sorting those lists the way you want. Quoting the 2.7 manual:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting

So sorting the dirNames will influence the order in which they will be visited:

for rootName, dirNames, fileNames in os.walk(path):
    dirNames.sort()  # you may want to use the key and reverse args here (cmp exists only in Python 2)

After this, dirNames is sorted in place, and walk() visits the subdirectories in that order.
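
To see the effect, here is a small self-contained sketch. sorted_walk is a hypothetical wrapper name (nothing in os is called that), and the directory names are made up for the demo. It builds a throwaway tree with tempfile and records the order in which walk() visits it.

```python
import os
import tempfile

def sorted_walk(top, key=None):
    """Hypothetical wrapper: os.walk(top) with a deterministic visit order."""
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        dirs.sort(key=key)  # in place: walk() descends in this order next
        files.sort()        # only changes what the caller sees
        yield root, dirs, files

def demo():
    with tempfile.TemporaryDirectory() as top:
        for name in ("2", "10", "1"):
            os.makedirs(os.path.join(top, name, "sub"))
        # Report paths relative to the top, with "/" on every platform.
        return [os.path.relpath(root, top).replace(os.sep, "/")
                for root, _, _ in sorted_walk(top)]

print(demo())  # ['.', '1', '1/sub', '10', '10/sub', '2', '2/sub']
```

Because the sort happens per directory, the generator stays lazy: nothing has to be collected up front.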

Of course you can also sort the list of fileNames, but that won't influence any further steps (files have no descendants for walk to visit).

And of course you can iterate through sorted versions of these lists as unutbu's answer proposes, but that won't influence the further progress of the walk itself.

The unmodified order of the values is undefined by os.walk, meaning that it will be "any" order. You should not rely on what you experience today. But in fact it will probably be what the underlying file system returns. In some file systems this will be alphabetically ordered.

Alfe
37

The simplest way is to sort the return values of os.walk(), e.g. using:

for rootName, dirNames, fileNames in sorted(os.walk(path)):
    # the (root, dirs, files) tuples arrive ordered by root path
    pass
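
One caveat, sketched below under stated assumptions: sorted() has to consume the whole generator first, and it orders the (root, dirs, files) tuples by root path, while the name lists inside each tuple stay in whatever order os.walk() produced. walk_sorted_by_root is a hypothetical helper name and the tree is a throwaway one made with tempfile; the helper sorts the inner lists as well.

```python
import os
import tempfile

def walk_sorted_by_root(top):
    """Hypothetical helper: yield os.walk() results ordered by root path."""
    # sorted() reads every tuple from the generator before the loop starts.
    for root, dirs, files in sorted(os.walk(top)):
        # Only the tuples were ordered (by root); sort the name lists too.
        yield root, sorted(dirs), sorted(files)

def demo():
    with tempfile.TemporaryDirectory() as top:
        for name in ("2", "10", "1"):
            os.makedirs(os.path.join(top, name))
            for filename in ("b.txt", "a.txt"):
                open(os.path.join(top, name, filename), "w").close()
        return [(os.path.relpath(root, top), files)
                for root, _, files in walk_sorted_by_root(top)]

for entry in demo():
    print(entry)
```

For very large trees, sorting dirs in place inside a plain os.walk loop avoids building the full list in memory.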
vpuente
  • 1
    I don't know why are people ignoring this answer, it's the cleanest and simplest solution... TY – Antonio Teh Sumtin Sep 03 '16 at 11:19
  • Because of this [The decline of SO](http://techblog.bozho.net/i-stopped-contributing-to-stackoverflow-but-its-not-declining/) ? Nah... probably because it is my only answer :-) Thanks for the up-vote! – vpuente Sep 27 '16 at 17:05
  • 4
    sadly this didn't work for me :( – Reiion Dec 14 '16 at 11:41
  • I required both this and sorting of the lists I was interested in (fileNames in my case). Then it worked consistently across platforms. Thanks :) – Dave Knight Jan 05 '17 at 19:11
  • 17
    This will first collect all values the `os.walk()` delivers into a list, then sort that list, then run the `for` loop. This list can become very large. Collecting it can take a lot of time. Effectively the advantages of the generator-features of `os.walk()` are destroyed by this. Sorting the results for each directory in-place (see my answer) may seem a little more complicated but I think keeping the generator-advantages is worth the effort. – Alfe Mar 06 '17 at 02:15
  • 1
    I used this to sort directories and files: `for subdir, dirs, files in sorted(os.walk(rootDir)):` with `for file in sorted(files):` nested inside. – Aqib Mumtaz May 28 '18 at 13:35
  • This sorts the individual yield returns, but does not sort recursively by root path if you're yielding the return values in a function. E.g., if you were returning `for f in files: yield root + f`, the result of the function wouldn't be sorted. – brandonscript Apr 08 '21 at 23:13