18

I am currently working on a small web interface which allows different users to upload files, convert the files they have uploaded, and download the converted files. The details of the conversion are not important for my question.

I am currently using flask-uploads to manage the uploaded files, and I am storing them in the file system. Once a user uploads and converts a file, there are all sorts of pretty buttons to delete the file, so that the uploads folder doesn't fill up.

I don't think this is ideal. What I really want is for the files to be deleted right after they are downloaded. I would settle for the files being deleted when the session ends.

I've spent some time trying to figure out how to do this, but I have yet to succeed. It doesn't seem like an uncommon problem, so I figure there must be some solution out there that I am missing. Does anyone have a solution?

davidism
seaturtlecook

3 Answers

35

There are several ways to do this.

send_file and then immediately delete (Linux only)

Flask has an after_this_request decorator which could work for this use case:

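import os
from flask import after_this_request, send_file

@app.route('/files/<filename>/download')
def download_file(filename):
    file_path = derive_filepath_from_filename(filename)
    # send_file needs a binary-mode handle
    file_handle = open(file_path, 'rb')

    @after_this_request
    def remove_file(response):
        # This runs before the body is sent, so only unlink here. Closing
        # file_handle at this point raises "I/O operation on closed file";
        # the handle is closed along with the response instead.
        try:
            os.remove(file_path)
        except Exception as error:
            app.logger.error("Error removing downloaded file: %s", error)
        return response

    # A bare file object carries no name, so state the MIME type explicitly
    return send_file(file_handle, mimetype='application/octet-stream')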

The catch is that this only works on Linux, which lets a file be read after it has been deleted as long as an open file handle still points to it. Even there it is not fully reliable: I've heard reports that Flask sometimes unlinks the file before send_file gets around to making the kernel call. On the plus side, it doesn't tie up the Python process to send the file.
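To see the behaviour this approach relies on, here is a minimal standalone sketch (no Flask involved). The helper name `read_after_unlink` is made up for illustration, and the sketch assumes a Linux or other POSIX file system; on Windows the `os.remove` call is expected to fail while the handle is open.

```python
import os
import tempfile


def read_after_unlink(payload: bytes):
    """Write payload to a temp file, delete the file while it is open, then read it."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(payload)

    handle = open(path, "rb")
    try:
        os.remove(path)                      # removes the name, not the open handle's data
        still_listed = os.path.exists(path)  # the path no longer exists on disk
        return still_listed, handle.read()   # yet the contents can still be read
    finally:
        handle.close()                       # the data is released once the last handle closes
```

The file's contents only stay reachable through the open handle, which is why the ordering between the unlink and the actual send matters in the Flask version.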

Stream file, then delete

Ideally, though, you'd clean the file up only after you know it has been streamed to the client. You can do this by streaming the file back through Python with a generator that yields the file's contents and then closes and removes it, as suggested in this answer:

import os
from flask import current_app

def download_file(filename):
    file_path = derive_filepath_from_filename(filename)
    file_handle = open(file_path, 'rb')

    # This *replaces* the `remove_file` + @after_this_request code above
    def stream_and_remove_file():
        yield from file_handle
        file_handle.close()
        os.remove(file_path)

    return current_app.response_class(
        stream_and_remove_file(),
        # the filename is a parameter of Content-Disposition, not a header of its own
        headers={'Content-Disposition': f'attachment; filename="{filename}"'}
    )

This approach is nice because it is cross-platform. It isn't a silver bullet however, because it ties up the Python web process until the entire file has been streamed to the client.
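One weakness of a plain generator is that the cleanup never runs if the client disconnects halfway through. A possible variation, sketched below with only the standard library, reads fixed-size chunks and puts the removal in a `finally` block. The name `stream_and_remove` and the 64 KiB default chunk size are assumptions for illustration, not part of Flask.

```python
import os
from typing import Iterator


def stream_and_remove(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file in fixed-size chunks, then delete it."""
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # Runs when the generator is exhausted or closed early
        # (but not if it was never started).
        os.remove(path)
```

The result of `stream_and_remove(file_path)` could be passed to `current_app.response_class` in place of the generator above. The early-exit cleanup depends on the WSGI server calling `close()` on the response iterable, which PEP 3333 requires of servers.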

Clean up on a timer

Run another process on a timer (using cron, perhaps) or use an in-process scheduler like APScheduler, and clean up files that have been on disk in the temporary location beyond your timeout (e.g. half an hour, one week, thirty days, or some time after they've been marked "downloaded" in an RDBMS).

This is the most robust way, but requires additional complexity (cron, in-process scheduler, work queue, etc.)
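As a rough sketch of the cleanup job itself, the function below deletes files whose modification time is older than a cutoff. It uses only the standard library; the name `remove_stale_files` and its parameters are hypothetical, and how it gets scheduled (cron, APScheduler, a work queue) is left open.

```python
import os
import time
from typing import List, Optional


def remove_stale_files(folder: str, max_age_seconds: float,
                       now: Optional[float] = None) -> List[str]:
    """Delete regular files in folder that were last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    removed = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip directories and symlinks
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                try:
                    os.remove(entry.path)
                    removed.append(entry.path)
                except FileNotFoundError:
                    pass  # something else removed it first
    return removed
```

A cron entry or scheduler job would then call something like `remove_stale_files(upload_folder, 30 * 60)` at whatever interval suits the timeout.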

Sean Vieira
  • ooo, didn't know about the `after_this_request` handler – corvid Jul 07 '14 at 16:41
  • One additional question: do you know of an easy way to refresh the page after all of this? – seaturtlecook Jul 07 '14 at 20:10
  • Take a look at http://stackoverflow.com/questions/16840857/flask-send-from-directory-and-also-refresh-the-page – Sean Vieira Jul 07 '14 at 20:17
  • @after_this_request works if I play video in html and after that remove video from temp? with any timeout? – urb Sep 10 '15 at 09:11
  • @SeanVieira Does it harm by keeping these open file pointers in long run? – Ishan Bhatt Jul 28 '16 at 10:33
  • Good point @IshanBhatt - I've updated the answer to close the file_handle as well. – Sean Vieira Jul 28 '16 at 14:35
  • Why close the file handle _after_ removing the file? Other way around? – Bob Stein Apr 21 '18 at 00:29
  • Order *shouldn't* matter here, since the handle doesn't care if the file still exists or not (I think). – Sean Vieira Apr 21 '18 at 11:24
  • Unfortunately, calling `file_handle.close()` in the handler causes `ValueError: read of closed file` or `ValueError: I/O operation on closed file.`. Works without closing it, though. – fracz May 14 '19 at 09:14
  • I recommend also checking out the following post, as it addresses a couple concerns with the approach explained here. https://stackoverflow.com/questions/40853201/remove-file-after-flask-serves-it?rq=1 – Gman Jun 20 '19 at 16:11
  • In my case, This needs a small change: `headers={'Content-Disposition': 'attachment; filename: ; mimetype='}` – Luat Vu Dinh Nov 27 '19 at 11:27
12

You can also store the file in memory, delete it, then serve what you have in memory.

For example, if you were serving a PDF:

import io
import os
from flask import send_file

@app.route('/download')
def download_file():
    file_path = get_path_to_your_file()

    # Copy the whole file into memory so the on-disk copy can go right away
    return_data = io.BytesIO()
    with open(file_path, 'rb') as fo:
        return_data.write(fo.read())
    # (after writing, cursor will be at last byte, so move it to start)
    return_data.seek(0)

    os.remove(file_path)

    # `download_name` is the Flask 2.0+ name; older releases call it `attachment_filename`
    return send_file(return_data, mimetype='application/pdf',
                     download_name='download_filename.pdf')

(above I'm just assuming it's PDF, but you can get the mimetype programmatically if you need)
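One way to get the mimetype programmatically is the standard library's `mimetypes` module, which guesses from the file extension. The sketch below is only an illustration: the helper name `load_and_remove` and the `application/octet-stream` fallback are my own choices, not something Flask provides.

```python
import io
import mimetypes
import os


def load_and_remove(file_path: str):
    """Read a file into memory, delete it, and guess its MIME type from the name."""
    with open(file_path, "rb") as fo:
        data = io.BytesIO(fo.read())  # a new BytesIO starts with its cursor at 0
    os.remove(file_path)

    # guess_type only looks at the extension; it never opens the file
    mimetype, _encoding = mimetypes.guess_type(file_path)
    return data, mimetype or "application/octet-stream"
```

The two return values can then be handed to `send_file` as the file object and the `mimetype` argument.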

Garrett
  • Thank you very much. Sorry for late comment but does this technique have a disadvantage in production phase? – Kerem Nayman Dec 16 '20 at 08:12
  • @KeremNayman There's no production disadvantage that I know of. We use this trick in production at the startup I'm at. – Garrett Dec 16 '20 at 15:40
  • Excellent solution! This is the only one out of the solutions I've seen that doesn't rely on threading tricks, timers, etc. Very reliable! Thanks! – Aristides Jan 23 '21 at 05:25
0

Building on @Garrett's answer, a better approach is to avoid blocking send_file while the file is being removed. IMHO it is better to remove the file in the background, with something like the following:

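import io
import os
from flask import send_file
from multiprocessing import Process

@app.route('/download')
def download_file():
    file_path = get_path_to_your_file()

    return_data = io.BytesIO()
    with open(file_path, 'rb') as fo:
        return_data.write(fo.read())
    # rewind so send_file reads from the start of the buffer
    return_data.seek(0)

    background_remove(file_path)

    # `download_name` is the Flask 2.0+ name; older releases call it `attachment_filename`
    return send_file(return_data, mimetype='application/pdf',
                     download_name='download_filename.pdf')


def background_remove(path):
    # Pass the callable and its arguments separately; `target=rm(path)` would
    # run rm() right away in the request and hand Process a target of None.
    task = Process(target=rm, args=(path,))
    task.start()


def rm(path):
    os.remove(path)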
Ahmad AlMughrabi