6

I need to validate the file type of an uploaded file and should allow only PDF, plain text and MS Word files. Here is my model and the form with the validation function. But I'm able to upload files even without an extension.

class Section(models.Model):
    content = models.FileField(upload_to="documents")

class SectionForm(forms.ModelForm):
    class Meta:
        model = Section
    FILE_EXT_WHITELIST = ['pdf','text','msword']

    def clean_content(self):
        content = self.cleaned_data['content']
        if content:
            file_type = content.content_type.split('/')[0]
            print file_type
            if len(content.name.split('.')) == 1:
                raise forms.ValidationError("File type is not supported.")
            if content.name.split('.')[-1] in self.FILE_EXT_WHITELIST:
                return content
            else:
                raise forms.ValidationError("Only '.txt' and '.pdf' files are allowed.")

Here is the view,

def section_update(request, object_id):
    section = models.Section.objects.get(pk=object_id)
    if 'content' in request.FILES:
        if request.FILES['content'].name.split('.')[-1] == "pdf":
            content_file = ContentFile(request.FILES['content'].read())
            content_type = "pdf"
            section.content.save("test" + '.' + content_type, content_file)
            section.save()

In my view, I'm just saving the file from request.FILES. I thought that save() would call clean_content and do the content-type validation. I guess clean_content is not being called for validation at all.

Babu

3 Answers

5

Your approach will not work: as an attacker, I could simply forge the HTTP header to send you anything with the MIME type text/plain.

The correct solution is to use a tool like file(1) on Unix to examine the content of the file and determine what it is. Note that there is no good way to know whether something is really plain text. If the file is saved as UTF-16, the "plain text" can even contain zero (NUL) bytes.

See this question for options on how to do this: How to find the mime type of a file in python?
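As a rough illustration of what "examining the content" means, here is a minimal standard-library sketch that looks at the leading bytes of the upload. The function name `sniff_kind` and the constants are hypothetical, and the check is deliberately coarse: the OLE2 and ZIP signatures only identify the container, so a spreadsheet or an arbitrary ZIP archive would also pass as "msword". A real deployment would more likely rely on libmagic.

```python
# Leading byte signatures ("magic numbers") of the formats we care about.
PDF_MAGIC = b"%PDF-"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc container
ZIP_MAGIC = b"PK\x03\x04"                          # .docx is a ZIP archive


def sniff_kind(data):
    """Guess 'pdf', 'msword' or 'text' from the file's bytes, else None."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE2_MAGIC) or data.startswith(ZIP_MAGIC):
        # Container match only; this cannot tell Word from other
        # OLE2 or ZIP based formats.
        return "msword"
    # Treat anything with NUL bytes as binary. This also rejects
    # UTF-16 text, which is the caveat mentioned above.
    if b"\x00" in data:
        return None
    try:
        # Pass the complete content: a truncated sample may end in the
        # middle of a multi-byte character and fail to decode.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "text"
```

Inside a form's clean_content, one could read the uploaded bytes, call this function, and raise forms.ValidationError when it returns None.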

Aaron Digulla
  • Yes, I agree with you. "Trust but verify" is stressed in the Django docs themselves. But I need to raise `forms.ValidationError` for an invalid file, however it's verified. – Babu Aug 02 '12 at 08:46
  • No, `def clean_content` is not being called at all. I guess we can't mix the model and its form to raise validation errors. I'll check the file as you suggested and redirect to an error view if the validation fails. Thanks. – Babu Aug 02 '12 at 08:56
  • Ah. In that case, your question should be "Why is clean_content() never called?" I suggest opening a new question. – Aaron Digulla Aug 02 '12 at 09:03
2

You can use python-magic

import magic
magic.from_file('/my/file.jpg', mime=True)
# image/jpeg
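
For an upload that has not been written to disk yet, a minimal sketch could pass the first bytes to `magic.from_buffer` instead of `magic.from_file`. The names `ALLOWED_MIME_TYPES`, `detect_mime` and `is_allowed` are hypothetical, and the exact MIME strings libmagic reports (for .docx in particular) can differ between versions, so the whitelist is an assumption to verify on your system.

```python
# Assumed whitelist; check what your libmagic actually reports.
ALLOWED_MIME_TYPES = frozenset([
    "application/pdf",
    "text/plain",
    "application/msword",
])


def detect_mime(fileobj, sample_size=2048):
    """Return the MIME type libmagic reports for the start of fileobj."""
    # Third-party python-magic, imported here so the whitelist helper
    # below stays usable where the package is not installed.
    import magic

    fileobj.seek(0)
    sample = fileobj.read(sample_size)
    fileobj.seek(0)  # rewind so the caller can still save the whole file
    return magic.from_buffer(sample, mime=True)


def is_allowed(mime_type):
    """True if the detected MIME type is on the whitelist."""
    return mime_type in ALLOWED_MIME_TYPES
```

In clean_content, this would look roughly like `if not is_allowed(detect_mime(content)): raise forms.ValidationError(...)`.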
jmoz
0

This is an old question, but for later readers: the main issue, as mentioned in the comments, is why field validation does not happen. As described in the Django documentation, field validation runs when you call is_valid(). So you must use something like the following in the view to trigger field validation:

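section = models.Section.objects.get(pk=object_id)
if request.method == 'POST':
    # Bind the form to the submitted data and to the existing Section.
    form = SectionForm(request.POST, request.FILES, instance=section)
    # is_valid must be called (note the parentheses); the bare attribute
    # is always truthy and never runs clean_content().
    if form.is_valid():
        form.save()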

Form validation happens when the data is cleaned. If you want to customize this process, there are various places to make changes, each one serving a different purpose. Three types of cleaning methods are run during form processing. These are normally executed when you call the is_valid() method on a form.