
I am using ffmpeg's extract_mvs example program to generate some text output. I would run a command like this in the terminal:

./extract_mvs input.mp4 > output.txt

I would like to run this command with Popen or another subprocess function in Python so that, instead of being written to output.txt, the data is passed straight to a pandas DataFrame without generating the text file.

The idea is to automate this over many inputs, so I am trying to avoid generating many .txt files and then having to open() them one by one.

I thought of something like this:

import subprocess
import pandas as pd
cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
df = pd.read_csv(a.communicate()[0], sep=',')

But then I get an error: OSError: Expected file path name or file-like object, got <class 'bytes'> type

Can it be fixed and extended so as to read straight from subprocess to pandas?

tavalendo

3 Answers

I found a workaround that combines part of Keith's answer with the one found here, which shows how to pass a string to a pandas DataFrame.

The final working code is:

import sys
import subprocess
import pandas as pd

# StringIO moved to the io module in Python 3
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Decode the captured bytes and wrap them in a file-like object for read_csv
b = StringIO(a.communicate()[0].decode('utf-8'))

df = pd.read_csv(b, sep=",")
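On Python 3 only, the same idea can probably be written more compactly with subprocess.run and io.BytesIO, which skips the explicit decode step because read_csv also accepts a binary file-like object. This is only a sketch: the helper name command_to_dataframe and the sample values are made up for illustration, and a child Python process stands in for ./extract_mvs so that the snippet runs without ffmpeg.

```python
import io
import subprocess
import sys

import pandas as pd


def command_to_dataframe(cmd, **read_csv_kwargs):
    """Run cmd and parse its stdout as CSV without writing a file (hypothetical helper)."""
    # check=True raises CalledProcessError if the command exits with a non-zero status
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    # Wrap the captured bytes in a file-like object, which is what read_csv expects
    return pd.read_csv(io.BytesIO(result.stdout), **read_csv_kwargs)


# Stand-in for ['./extract_mvs', 'input.mp4']; the CSV content here is invented sample data
demo_cmd = [
    sys.executable,
    '-c',
    "print('framenum,srcx,srcy'); print('1,8,16'); print('2,24,32')",
]
demo_df = command_to_dataframe(demo_cmd, sep=',')
print(demo_df)
```

With the real tool, the call would presumably be command_to_dataframe(['./extract_mvs', 'input.mp4'], sep=','), repeated once per input file.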
tavalendo

Updated answer:

The more I think about your question and the output from the first answer I suggested, the more I think your problem is not a decoding issue but a failure to provide the right input to pd.read_csv(). As an alternative, you could skip pd.read_csv() entirely and read the output from the subprocess line by line into a dataframe.

Something like this:

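import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']

a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Each line arrives as bytes: decode it, then split on commas to get one row
rows = [line.decode('utf-8').strip().split(',') for line in a.stdout]

a.wait()

# Assumes the first line of output is the header row
df = pd.DataFrame(rows[1:], columns=rows[0])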

Again, I haven't tested this code myself (because I'm traveling and using my phone right now), but I hope this gets you a little closer to a solution.
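For the "automate this multiple times" part of the question, a line-by-line reader along these lines might be extended as follows. Treat it as a sketch under assumptions: stream_rows and frames_for are hypothetical helper names, the first output line is assumed to be a header, the 'source' column is an invented label, and child Python processes stand in for ./extract_mvs. Every value stays a string, since nothing here does the type inference that read_csv would do.

```python
import csv
import io
import subprocess
import sys

import pandas as pd


def stream_rows(cmd):
    """Yield parsed CSV rows from cmd's stdout while the process is running."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # Decode the byte stream; newline='' leaves line-ending handling to csv
        text = io.TextIOWrapper(proc.stdout, encoding='utf-8', newline='')
        for row in csv.reader(text):
            if row:  # skip blank lines
                yield row


def frames_for(commands):
    """Build one DataFrame from several commands, labelling rows by their source."""
    frames = []
    for label, cmd in commands.items():
        rows = list(stream_rows(cmd))
        # Assumes the first row is the header
        frame = pd.DataFrame(rows[1:], columns=rows[0])
        frame['source'] = label
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Stand-ins for ['./extract_mvs', 'clip_a.mp4'] and so on; the data is invented
demo_commands = {
    'clip_a': [sys.executable, '-c', "print('framenum,srcx,srcy'); print('1,8,16')"],
    'clip_b': [sys.executable, '-c', "print('framenum,srcx,srcy'); print('2,24,32')"],
}
combined = frames_for(demo_commands)
print(combined)
```

Numeric columns can be converted afterwards, for example with pd.to_numeric, if they are needed as numbers.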

Original answer:

I haven't tested this, but I think you just need to decode the results returned by your subprocess. Specifically, you need to decode the output from bytes to a str, using UTF-8.

You can try: pd.read_csv(a.communicate()[0].decode('utf-8'))

Keith Dowd
  • Thanks for the input. It's one step closer to the solution I guess. When I try the above I get: http://prntscr.com/iipcq6 It prints the information in the console while it was supposed to store in df. When calling df, it says it is not defined. – tavalendo Feb 23 '18 at 12:17
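import subprocess
import sys

import pandas as pd

if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

# Pass the command as a list so it runs without a shell
cmd = ['nslookup', 'email.fullcontact.com']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Decode the captured bytes and wrap them in a file-like object for read_csv
b = StringIO(a.communicate()[0].decode('utf-8'))

df = pd.read_csv(b, sep=",")
column = list(df.columns)
# Assumes the "Name:" line is the second data row of the nslookup output.
# replace() drops the label; strip('Name:') would also eat trailing letters such as 'm'.
name = list(df.iloc[1])[0].replace('Name:', '', 1).strip()
print(name)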
  • Please don't just leave code behind as an answer. Explain what your code does in English and explain why it is the correct answer. – Sailanarmo Nov 15 '19 at 21:58