
I'm working on a Python script that needs to create about 50 distinct temporary files, which are all appended frequently during the course of the script and merged at the end. I'm sure that the tempfile module can do what I need, but I haven't been able to figure out how from reading the documentation.

I want to use temporary files--as opposed to variables--to conserve system memory, as these data chunks grow large as the script processes tens of thousands of other files.

The following chunk of code is the hack I'm currently using to create these files (untemporarily) in an untemporary directory:

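totalitems = set()   # items seen so far; created once, before processing starts
item = 'example'     # placeholder: a string read from another file; it must identify the file for future use
tmpfile = 'tmpfiles/' + item
if item not in totalitems:
   totalitems.add(item)
   with open(tmpfile, 'w') as itemfile:
      output = 'some stuff'        # placeholder
      itemfile.write(output)       # write to the file object, not the path string
else:
   with open(tmpfile, 'a') as itemfile:
      output = 'different stuff'   # placeholder
      itemfile.write(output)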

I think what I need is tempfile.NamedTemporaryFile(). According to the documentation:

That name can be retrieved from the name member of the file object.

Unfortunately, I don't understand what that means. I just need to be able to call each file again later when I run across its corresponding "item" again in the files I'm processing. I presume this is rather straightforward and I'm just being dense. In case it matters, I have versions of this script for both Python 2.7.1 and 3.2.3. I only really need one or the other to work; I created both just as a learning exercise.

martineau
Gregory
  • possible duplicate of [Best way to generate random file names in Python](http://stackoverflow.com/questions/10501247/best-way-to-generate-random-file-names-in-python) – Joe Jun 15 '12 at 01:04
  • @Joe, Part of this question does appear to be a duplicate of the thread you linked. Part of it is not; see comment under Levon's answer below. – Gregory Jun 15 '12 at 01:36
  • Why do you need these files to be named? If they're unnamed (pre-deleted), there's less to go wrong in terms of cleanup. You can simply store the tmpfile object, not its name, and then call `seek(0)` to go to the beginning to be ready to read... or mmap its contents, or otherwise access it however you like. – Charles Duffy Jun 15 '12 at 01:57
  • @CharlesDuffy, You may be right, but I don't understand. Here's the scenario: there are 50 "items" (actually short strings) that recur throughout the data files I'm processing. I create a tempfile to collect information about each of these 50 items from the data files. My question is about how to find the tempfile related to a particular "item" when I run across that item again after its tempfile has been closed. – Gregory Jun 15 '12 at 02:10
  • @pyrogerg "After its tempfile has been closed"? Why close the file? – Charles Duffy Jun 15 '12 at 02:40
  • @CharlesDuffy, I'm assuming that a tempfile is using memory when it is open. If that is true, then I won't have enough memory to keep all the tempfiles open simultaneously. If the assumption is false then I can see no reason to close them. Still, I'll need to know how to write to the appropriate tempfile when its "item" shows up. Most tempfiles will be appended when each of many data files is processed. – Gregory Jun 15 '12 at 02:48
  • @pyrogerg tempfiles use only the amount of memory needed for a handle and wrapper object, which is tiny, and has nothing to do with the size of its contents. So -- keep a dict mapping from whatever kind of item you use to the tempfile objects. – Charles Duffy Jun 15 '12 at 02:58
  • @CharlesDuffy, Thanks for clarifying that. So, just to recap: I can open as many tempfiles as I need, keep track of them with dict mapping, and not worry about closing them as they'll go away when the script completes. – Gregory Jun 15 '12 at 06:12
  • @pyrogerg Pretty much right. The number of file descriptors is limited, but it's typically over 1000 by default, so you need to worry about it only when operating on a quite different scale. – Charles Duffy Jun 15 '12 at 06:43
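
The approach suggested in the comments above (keep every temporary file open and look it up through a dict) might look roughly like the sketch below. It is only an illustration under assumptions: `open_files`, `get_tempfile`, and `merge` are hypothetical names that do not appear in the original script, and text mode is assumed for the data.

```python
import tempfile

# Maps each item string to its open temporary file object.
open_files = {}

def get_tempfile(item):
    """Return the open temp file for `item`, creating it on first use."""
    if item not in open_files:
        # TemporaryFile has no visible name and is removed when closed.
        open_files[item] = tempfile.TemporaryFile(mode='w+')
    return open_files[item]

def merge(destination):
    """Copy every temp file into `destination` (a writable text file), then close them."""
    for item in sorted(open_files):
        tf = open_files[item]
        tf.seek(0)                 # rewind before reading back
        for line in tf:            # line by line, so memory use stays small
            destination.write(line)
        tf.close()                 # closing deletes the temporary file
    open_files.clear()
```

Each time an item turns up, `get_tempfile(item).write(...)` appends to the matching file; a single `merge(...)` call at the end collects everything in sorted item order.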

1 Answer


"That name can be retrieved from the name member of the file object."

means that you can get the name of the temporary file created like so:

In [4]: import tempfile

In [5]: tf = tempfile.NamedTemporaryFile()  
In [6]: tf.name  # retrieve the name of the temp file just created
Out[6]: 'c:\\blabla\\locals~1\\temp\\tmptecp3i'

Note: By default the file will be deleted when it is closed. However, if the delete parameter is False, the file is not automatically deleted. See the Python docs on this for more information.

Since you can retrieve the name of each of the 50 temp files you want to create, you can save them, e.g., in a list, before you use them again later (as you say). Just be sure to set the delete value accordingly so that the files don't disappear when you close them (in case you plan to close, and then later reopen them).
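
As a rough sketch of that idea, assuming text data and using a dict rather than a list so each name can be looked up by its item: the helper names `append_to_item` and `merge_and_cleanup`, and the dict `item_tmpfile`, are hypothetical and chosen only for illustration.

```python
import os
import shutil
import tempfile

# Maps each item string to the path of its temporary file.
item_tmpfile = {}

def append_to_item(item, text):
    """Append `text` to the temp file for `item`, creating the file on first use."""
    if item not in item_tmpfile:
        # delete=False keeps the file on disk after it is closed.
        with tempfile.NamedTemporaryFile(mode='w', delete=False) as tf:
            item_tmpfile[item] = tf.name
            tf.write(text)
    else:
        with open(item_tmpfile[item], 'a') as f:
            f.write(text)

def merge_and_cleanup(merged_path):
    """Concatenate all temp files into `merged_path`, then remove them."""
    with open(merged_path, 'w') as merged:
        for item in sorted(item_tmpfile):
            with open(item_tmpfile[item]) as f:
                shutil.copyfileobj(f, merged)   # copies in chunks
            os.remove(item_tmpfile[item])       # manual cleanup, since delete=False
    item_tmpfile.clear()
```

Because `delete=False` turns off automatic removal, the script has to delete the files itself, which is what the `os.remove` call does here.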

I explained how to create temporary filenames in more detail here: [Best way to generate random file names in Python](http://stackoverflow.com/questions/10501247/best-way-to-generate-random-file-names-in-python)

Levon
  • Thanks for the link to the thread with your original answer. I'm still unsure about how to key the file to the value of some variable "item". I've struck on the idea of using a dictionary for that; does this seem appropriate following line 5 in your code above? `item_tmpfile={item:tf.name}`. Then I could later call `item_tmpfile[item]`. – Gregory Jun 15 '12 at 01:35
  • @pyrogerg Yes, I think that should work. You'd be associating the name of each temporary file with a different `item`, where `item` would be the key for retrieving the temp filename later. I assume your dictionary would eventually have 50 entries, right? Are you planning to close the files between uses? If so, mind the `delete` parameter. – Levon Jun 15 '12 at 01:38
  • Thanks for the `delete` tip; I caught that in your original answer. I think I'm on the right track now. I presume I can use these files like any other in the course of the script, e.g. `with open(item_tmpfiles[item]) as tf:`? – Gregory Jun 15 '12 at 01:52
  • @pyrogerg You should be, just note that by default the temp file is created with mode '`w+b`' -- see the [man page](http://docs.python.org/library/tempfile.html) so as long as you are consistent with the modes you should be ok. Probably best to run a few simple tests to be sure. – Levon Jun 15 '12 at 02:05
  • @Levon, I think this will work, but I'm having problems with keys not being retained in my dictionary. That may be a question for another thread. – Gregory Jun 15 '12 at 02:49
  • @pyrogerg Yes, I think otherwise the scope of your original question may wander too much. I'd ask a separate question: now that you know how to generate the temp files and get their names, this part is answered. You could start your new question with the problem of storing file names and associating them with strings (`item`s). – Levon Jun 15 '12 at 02:54
  • @pyrogerg This way, with a new question, you'll also get a new bunch of fresh eyes looking at that particular problem – Levon Jun 15 '12 at 02:55
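
The point about modes raised in the comments above can be checked with a small test along these lines. It is a sketch, not part of the original answer: the variable names and sample strings are made up, and the bytes-versus-str distinction matters mainly on Python 3.

```python
import os
import tempfile

# Default mode is 'w+b' (binary), so write bytes and reopen in a binary mode.
with tempfile.NamedTemporaryFile(delete=False) as tf:
    binary_path = tf.name
    tf.write(b'binary data\n')
with open(binary_path, 'ab') as f:
    f.write(b'more binary data\n')
with open(binary_path, 'rb') as f:
    binary_contents = f.read()

# Passing mode='w+' gives a text-mode file, so write str and reopen in text mode.
with tempfile.NamedTemporaryFile(mode='w+', delete=False) as tf:
    text_path = tf.name
    tf.write('text data\n')
with open(text_path, 'a') as f:
    f.write('more text data\n')
with open(text_path) as f:
    text_contents = f.read()

# delete=False means the files must be removed by hand.
os.remove(binary_path)
os.remove(text_path)
```

Choosing one mode up front and using it for both the `NamedTemporaryFile` call and every later `open` avoids mixing bytes and text in the same file.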