4

I'm trying to read a few thousand HTML files stored on disk.

Is there any way to do better than:

import os

for files in os.listdir('.'):
    if files.endswith('.html'):
        with open(files) as f:
            a = f.read()
            # do more stuff
DJJ
  • Is this a repetitive task or a one-time operation? If repetitive, are the names of the files and their total amount expected to change or are they constant? – barak manos Nov 29 '14 at 12:35
  • It's a repetitive task; the names of the files are expected to change but not the extension. – DJJ Nov 29 '14 at 12:39

1 Answer

2

For a similar problem I have used this simple piece of code:

import glob
for file in glob.iglob("*.html"):
    with open(file) as f:
        a = f.read()

iglob doesn't store all the filenames in memory at once; it yields them one at a time, which is ideal for a huge directory.
Remember to close files after you have finished; the `with open(...)` construct takes care of that for you.

Thomas8
  • Thanks for the suggestion. I've made sure to close the file. But I'm looking for a way to read the files faster. So simultaneous reading would not be a bad thing actually. – DJJ Nov 29 '14 at 14:22
  • what are you doing with the files? Maybe with more details we could help you. Look at [this](http://stackoverflow.com/questions/14130642/python-multi-threading-file-processing) multi-threaded file processing. It's a little tricky – Thomas8 Dec 01 '14 at 10:31
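Since the asker is after faster reads rather than just cleaner iteration, the multi-threaded approach the last comment links to can be sketched with `concurrent.futures`. This is a hedged sketch, not code from the thread: it assumes the per-file work is I/O-bound (where threads help despite the GIL), and the helper names `read_file` and `read_all_html` are made up for illustration.

```python
import glob
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    # Each worker opens, reads, and closes one file; the "with" block
    # guarantees the handle is closed even if reading raises.
    with open(path) as f:
        return f.read()

def read_all_html(pattern="*.html", max_workers=8):
    # glob.iglob yields matching paths lazily; executor.map fans the
    # reads out across the thread pool and returns the file contents
    # in the same order as the matched paths.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_file, glob.iglob(pattern)))
```

If the "do more stuffs" step is CPU-heavy (parsing, for example), a `ProcessPoolExecutor` with the same `map` call would be the usual substitution.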