I want to classify file types based on their extensions in Python. Before writing it myself, I wanted to check whether there is a Python package that can be used for this purpose. By file type I mean classifying a file as, e.g., doc, ppt, pdf, tar, txt, iso, etc. Ideally it would take the file name as input and return its type. I am running on Linux.
Viewed 569 times
1
-
A file's extension has nothing to do with its type. – Burhan Khalid Sep 04 '12 at 06:48
-
Take a look at this question: http://stackoverflow.com/questions/43580/how-to-find-the-mime-type-of-a-file-in-python . You can *guess* by extension using `mimetypes`, but something like `python-magic` (mentioned in the second answer) may be more reliable. – kwatford Sep 04 '12 at 06:51
-
Not *nothing* (you hope they're related), but they are definitely not the same thing. E.g., you can change the extension of a `.jpg` to `.doc`, but the type is still JPEG. – Matthew Adams Sep 04 '12 at 06:53
-
I just want to classify based on what the extension says. I'm not bothered about the actual content of the file. Any help now? – auny Sep 04 '12 at 06:57
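For a purely name-based classification like the one asked for here, a minimal sketch using only the standard library might look like the following. The `EXTENSION_LABELS` table and the `classify_by_extension` helper are hypothetical names made up for this illustration; `mimetypes.guess_type` and `os.path.splitext` are standard library calls, and neither one opens the file.

```python
import mimetypes
import os.path

# Hypothetical labels chosen for this sketch; extend to suit your own categories.
EXTENSION_LABELS = {
    ".doc": "doc",
    ".docx": "doc",
    ".ppt": "ppt",
    ".pptx": "ppt",
    ".pdf": "pdf",
    ".tar": "tar",
    ".txt": "txt",
    ".iso": "iso",
}


def classify_by_extension(filename):
    """Return (label, mime_type) guessed purely from the file name."""
    # splitext only looks at the last suffix, so "a.tar.gz" yields ".gz".
    extension = os.path.splitext(filename)[1].lower()
    label = EXTENSION_LABELS.get(extension, "unknown")
    # guess_type consults a name-to-MIME table; it never reads the file.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return label, mime_type
```

For example, `classify_by_extension("notes.txt")` returns `("txt", "text/plain")`. As the comments above point out, this only reports what the name claims, so a renamed file will be misclassified.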
2 Answers
2
You should look into a document metadata parser. I have used Apache Tika, which is a Java library, in some of my projects. See the question "Python-based document metadata parser?" for how to use it from Python.
Community
Pratik Mandrekar
1
On Linux you can use the `file` utility, which determines a file's type from its contents. You can call it from your scripts too:
    import subprocess
    # Runs `file`, which prints its description of 'yourfile' to stdout.
    subprocess.call(['file', 'yourfile'])
Denis
-
The `file` command uses the libmagic library; there is a `python-magic` module that provides a native interface and uses the same logic. – neutrinus Mar 13 '13 at 15:57
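Note that `subprocess.call` only prints the result of `file`; it does not return it. If the script needs the value, one possible sketch is to capture the output instead. The `describe_with_file` helper is a hypothetical name for this illustration, and it assumes the `file` utility is installed and on the `PATH`.

```python
import subprocess


def describe_with_file(path):
    """Run the `file` utility and return its MIME type guess for path."""
    # --brief drops the leading "path:" prefix; --mime-type asks for a
    # MIME type string instead of a prose description.
    output = subprocess.check_output(["file", "--brief", "--mime-type", path])
    return output.decode().strip()
```

Unlike an extension lookup, this inspects the file's contents, so a text file renamed to `.doc` is still reported as text.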