I am trying to loop through a list of files, and return those files that are media files (images, video, gif, audio, etc.).

Since there are many media types, is there a library, or some better way to check this, than listing every extension and testing each file against that list?

Here's what I'm doing so far:

import os
types = [".mp3", ".mpeg", ".gif", ".jpg", ".jpeg"]
files = ["test.mp3", "test.tmp", "filename.mpg", ".AutoConfig"]

media_files = []
for file in files:
    root, extension = os.path.splitext(file)
    print(extension)
    if extension in types:
        media_files.append(file)

print("Found media files are:")
print(media_files)

But note it didn't include filename.mpg, since I forgot to put .mpg in my types list. (Or, more likely, I didn't expect that list to include a .mpg file, so didn't think to list it out.)

BruceWayne
  • Yes, you can use a MIME type check. Here is an example: [stackoverflow.com](https://stackoverflow.com/questions/43580/how-to-find-the-mime-type-of-a-file-in-python) – Cpp Forever Mar 22 '19 at 21:25
  • If you're running on UNIX/Linux, you can use `file` to determine media type. – tk421 Mar 22 '19 at 21:26
  • @CppForever - I found that, and am studying that library, but am not sure how to check without something like - `if mime.from_file("media.mp3") == "application/mp3" or ...:`? I am missing understanding something I think... – BruceWayne Mar 22 '19 at 21:26
  • You need to use the internet media type. For example, .mp3 maps to audio/mpeg – Cpp Forever Mar 22 '19 at 21:32
  • @CppForever so do I just check generally "is the file a mime type" without having to check exactly what kind? – BruceWayne Mar 22 '19 at 21:41
  • After you get the MIME type, for example audio/mp3, you can split it on the / character, take the first part, and check whether it is audio, video, or image – Cpp Forever Mar 22 '19 at 21:43

3 Answers

To do this, get the file's internet media type (MIME type), split it on the / character, and check whether the first part is audio, video, or image.

Here is some sample code:

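import mimetypes
mimetypes.init()

# guess_type returns (type, encoding); type is None for an unknown extension
mimestart = mimetypes.guess_type("test.mp3")[0]

if mimestart is not None:
    # Keep only the top-level category, e.g. "audio" from "audio/mpeg"
    mimestart = mimestart.split('/')[0]

    if mimestart in ['audio', 'video', 'image']:
        print("media types")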

NOTE: This method infers the file type from its extension only; it does not open the file or inspect its contents.
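
Applied to the list of files from the question, the same check might look like the sketch below. The helper name `is_media_file` and the `MEDIA_CATEGORIES` constant are illustrative and not part of `mimetypes`; like the code above, it looks only at the extension.

```python
import mimetypes

# Top-level MIME categories treated as "media" (an assumption for this sketch)
MEDIA_CATEGORIES = {"audio", "video", "image"}


def is_media_file(filename):
    """Return True if the filename's extension maps to an audio, video, or image type."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:  # unknown or missing extension
        return False
    return mime_type.split("/")[0] in MEDIA_CATEGORIES


files = ["test.mp3", "test.tmp", "filename.mpg", ".AutoConfig"]
media_files = [f for f in files if is_media_file(f)]
print(media_files)
```

With Python's default type map, `.mpg` resolves to `video/mpeg`, so `filename.mpg` should be picked up without listing it by hand.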

Cpp Forever
  • You should check for `None` (i.e. unknown type) when using [guess_type](https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type). However, you should note that this method will only check the extension, so it cannot detect the file's actual type. – ekhumoro Mar 22 '19 at 22:47
  • While `mimetypes` does exactly what the OP asks, maybe also point to https://pypi.org/project/python-libmagic/ which inspects file contents, not just the filename. `libmagic` is the library behind the Unix `file` command. – tripleee Mar 23 '19 at 10:44
  • Thanks so much for this! I really appreciate both answers too, I like this since I know the filetypes and can check without opening the file. Thanks **so** much! :D – BruceWayne Mar 23 '19 at 18:19
  • For anyone - [Here's a list of common Web MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Complete_list_of_MIME_types). And [here's a longer list](https://www.iana.org/assignments/media-types/media-types.xhtml) (but I note it doesn't include `mp3` for some reason). – BruceWayne Mar 23 '19 at 21:52

Another method relies on the file's contents rather than its extension, using the python-libmagic library (pypi.org/project/python-libmagic):

Here is the sample code for this library:

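import magic

# Use a separate name so the instance does not shadow the magic module
detector = magic.Magic()
# Assumes from_file returns a MIME string such as "audio/mpeg"
mimestart = detector.from_file("test.mp3").split('/')[0]

if mimestart in ('audio', 'video', 'image'):
    print("media types")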

NOTE: To use this code sample you need to install python-libmagic through pip.
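
To illustrate what checking the contents means, here is a minimal stdlib-only sketch that compares a file's first bytes against a small, hand-picked signature table. This is not how python-libmagic is implemented or called; `SIGNATURES` and `sniff_media_category` are hypothetical names, and the table covers only a handful of formats.

```python
# Leading byte signatures for a few common formats (illustrative, far from complete)
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image",  # PNG
    b"\xff\xd8\xff": "image",       # JPEG
    b"GIF87a": "image",             # GIF
    b"GIF89a": "image",             # GIF
    b"ID3": "audio",                # MP3 that starts with an ID3v2 tag
    b"fLaC": "audio",               # FLAC
}


def sniff_media_category(path):
    """Return the media category if the file starts with a known signature, else None."""
    with open(path, "rb") as fh:
        header = fh.read(16)  # long enough for every signature in the table
    for signature, category in SIGNATURES.items():
        if header.startswith(signature):
            return category
    return None
```

Because the check ignores the filename, a GIF saved without any extension would still be reported as an image, while a text file renamed to `.mp3` would not match.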

Cpp Forever
  • I assume this method, where it actually checks the file contents itself, is most often used when you can't trust the extension, for whatever reason? Thanks for this! – BruceWayne Mar 23 '19 at 18:27
  • For example, on Linux, executables don't have an extension but do have a signature. – Cpp Forever Mar 23 '19 at 18:34
  • ohhh okay! I am running Windows but using a Raspberry Pi so that will likely come in handy. Thanks again!! – BruceWayne Mar 23 '19 at 18:54

You can list media files as follows.
lsmedia.py:

import os

def lsmedia(mypath):
    # Known extensions per media category (lowercase, with leading dot)
    img_fm = (".tif", ".tiff", ".jpg", ".jpeg", ".gif", ".png", ".eps",
              ".raw", ".cr2", ".nef", ".orf", ".sr2", ".bmp", ".ppm", ".heif")
    vid_fm = (".flv", ".avi", ".mp4", ".3gp", ".mov", ".webm", ".ogg", ".qt", ".avchd")
    aud_fm = (".flac", ".mp3", ".wav", ".wma", ".aac")
    media_fms = {"image": img_fm, "video": vid_fm, "audio": aud_fm}

    def fns(path, media):
        # str.endswith accepts a tuple, so one call tests every extension
        return [fn for fn in os.listdir(path) if fn.lower().endswith(media_fms[media])]

    img_fns, vid_fns, aud_fns = fns(mypath, "image"), fns(mypath, "video"), fns(mypath, "audio")

    print(f"State of media in '{mypath}'")
    print("Images: ", len(img_fns), " | Videos: ", len(vid_fns), "| Audios: ", len(aud_fns))
    
    return (img_fns, vid_fns, aud_fns)

mypath = "/home/DATA_Lia/data_02/sample" # define dir
(imgs, vids, auds) = lsmedia(mypath)

output:

State of media in '/home/DATA_Lia/data_02/sample'
Images:  24  | Videos:  3 | Audios:  5

San Askaruly