
Possible Duplicate:
How to find the mime type of a file in python?

I'm using an email processing API (sendgrid.com) that posts all incoming emails to a web request handler in my app. The attachments are posted as attachment0=xyz&attachment1=abc, along with other email fields such as 'to', 'cc', 'subject', etc.

I then store these attachments as files in the BlobStore (with App Engine). To serve these files back to the user, the mime_type/content_type must be specified. As I understand it, this depends on the file type, but it's not clear to me how to determine the file type from the posted strings.

Is there a library that figures out the file type from the byte content of a file?

Just to clarify: there is no filename or file extension, only the file's byte content.

David Haddad
  • The accepted answer in http://stackoverflow.com/questions/43580/how-to-find-the-mime-type-of-a-file-in-python is not related to this question. The mention of ``python-magic`` is, however. – Nam Nguyen May 16 '11 at 11:07

1 Answer


If you had saved the filename when the file was uploaded, you could pass it to the mimetypes.guess_type function and get a reasonable guess. The linked SO question by Alexander is worth reading.
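For reference, a minimal sketch of the filename-based approach using the standard library. The helper name `content_type_from_filename` and the `application/octet-stream` fallback are illustrative choices, not part of the original answer.

```python
import mimetypes


def content_type_from_filename(filename, default="application/octet-stream"):
    # Hypothetical helper: guess_type() returns a (type, encoding) tuple;
    # type is None when the extension is unknown or missing.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or default


print(content_type_from_filename("report.pdf"))  # application/pdf
```

This only inspects the extension, so it cannot help when, as in the question, no filename is available.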

Unfortunately, that is not your case. If all you have is a binary blob, I'm afraid you'll have to apply some custom heuristics. Follow these simple steps:

  1. Build a map of known signatures. I'll give an example right away.
  2. Read in the first 4 bytes from the blob.
  3. Do a longest-prefix match against the map you built in step 1. By longest match I mean: if all 4 bytes match a known signature, take it; otherwise try the first 3 bytes, then the first 2, and finally the first 1.

For example:

A ZIP file starts with the two characters PK, a RAR file with Rar!, a PDF with %PDF, a PNG with \x89PNG, and so on.

This would fail to identify some files (such as JPG, which is not in the example map above), but it gives you a good base to build on.
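The three steps above can be sketched roughly like this, using only the example signatures from this answer. The map contents beyond those signatures, the helper name `guess_mime_from_bytes`, and the MIME strings chosen (notably `application/x-rar-compressed` for RAR) are assumptions for illustration.

```python
# Step 1: hypothetical map of known leading-byte signatures -> MIME type.
SIGNATURES = {
    b"PK": "application/zip",
    b"Rar!": "application/x-rar-compressed",  # assumed MIME string for RAR
    b"%PDF": "application/pdf",
    b"\x89PNG": "image/png",
}


def guess_mime_from_bytes(data, default="application/octet-stream"):
    # Step 2: look only at the first 4 bytes of the blob.
    head = data[:4]
    # Step 3: longest-prefix match, trying 4 bytes, then 3, 2 and 1.
    for length in range(len(head), 0, -1):
        mime_type = SIGNATURES.get(head[:length])
        if mime_type is not None:
            return mime_type
    # No signature matched; fall back to a generic binary type.
    return default


print(guess_mime_from_bytes(b"%PDF-1.4\n"))  # application/pdf
```

The fallback to `application/octet-stream` is one reasonable design choice: browsers will usually offer such a file as a download rather than trying to render it.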

Alternatively, you could use python-magic (https://github.com/ahupp/python-magic), a wrapper around libmagic that can detect a MIME type directly from a byte buffer.
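A minimal sketch of the python-magic route, assuming the package and the underlying libmagic library are installed. The wrapper name `guess_mime_with_libmagic` is hypothetical; it returns None when the library cannot be imported, so callers could fall back to a hand-built signature map.

```python
def guess_mime_with_libmagic(data):
    # Hypothetical wrapper: python-magic raises ImportError if the package
    # or the native libmagic library cannot be loaded.
    try:
        import magic
    except ImportError:
        return None
    # from_buffer() inspects the bytes; mime=True asks for a MIME type
    # string (e.g. "image/png") instead of a human-readable description.
    return magic.from_buffer(data, mime=True)


result = guess_mime_with_libmagic(b"%PDF-1.4\n")
print(result)
```

Note that libmagic is a native library, so whether it is usable depends on your hosting environment.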

Nam Nguyen