Capturing terminal output into pandas dataframe without creating external text file

Question

I am using FFmpeg's extract_mvs example program to generate some text output. I would run a command like this in the terminal:

./extract_mvs input.mp4 > output.txt

I would like to run this command with Popen or another subprocess function in Python so that, instead of writing output.txt, the data is passed straight to a pandas DataFrame without actually generating the text file.

The idea is to automate this over many inputs, so I am trying to avoid generating many .txt files and then having to open() them one by one.

I thought of something like this:

import subprocess
import pandas as pd
cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
df = pd.read_csv(a.communicate()[0], sep=',')

But then I get an error: OSError: Expected file path name or file-like object, got <class 'bytes'> type

Can it be fixed and extended so as to read straight from subprocess to pandas?

score 5 · Accepted Answer · answered Feb 23 '18 at 13:43

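I found a workaround, using part of Keith's answer and the one found here, to pass data from a string to a pandas DataFrame. pd.read_csv() expects a file path or a file-like object, so the bytes returned by communicate() have to be decoded and wrapped in StringIO first.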

The final working code is:

import sys
import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

b = StringIO(a.communicate()[0].decode('utf-8'))

df = pd.read_csv(b, sep=",")
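Since the question mentions repeating this for many inputs, the same approach could be wrapped in a small helper. This is only a sketch: command_to_dataframe is a made-up name, subprocess.run assumes Python 3.5+, and the demo command is a stand-in Python process that prints CSV, used here because extract_mvs may not be available where you try this.

```python
import io
import subprocess
import sys

import pandas as pd


def command_to_dataframe(cmd, sep=","):
    """Run cmd and parse its standard output as CSV, without a temporary file."""
    # check=True raises CalledProcessError if the command exits with a non-zero status
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    # read_csv needs a path or a file-like object, so wrap the decoded text in StringIO
    return pd.read_csv(io.StringIO(result.stdout.decode("utf-8")), sep=sep)


# Stand-in for ['./extract_mvs', 'input.mp4']: a tiny Python process that prints CSV
demo_cmd = [sys.executable, "-c", "print('x,y'); print('1,2'); print('3,4')"]
demo_df = command_to_dataframe(demo_cmd)
```

For several videos, something like {path: command_to_dataframe(['./extract_mvs', path]) for path in video_paths} would give one DataFrame per file, where video_paths is a hypothetical list of input files.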

Keith Dowd · Answer 2 · 2018-02-23T13:52:39.920

Updated answer:

The more I think about your question and the output from the first answer I suggested, the more I think your problem is not a decoding issue and is perhaps more a failure to provide the right input to pd.read_csv(). As an alternative you could try skipping pd.read_csv() entirely. Instead, you could try reading the output from the subprocess line by line into a dataframe.

Something like this:

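import subprocess
import pandas as pd

cmd = ['./extract_mvs', 'input.mp4']

a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Decode each line from bytes and split it into comma-separated fields
rows = [line.decode('utf-8').strip().split(',') for line in a.stdout]

a.wait()

# Assumes the first line of output is a header row; all values stay strings
df = pd.DataFrame(rows[1:], columns=rows[0])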

Again, I haven't tested this code myself (because I'm traveling and using my phone right now), but I hope this gets you a little closer to a solution.
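Another way to read from the pipe without building the rows by hand might be to give the pipe itself to pd.read_csv(), since it accepts file-like objects. The sketch below assumes Python 3.7+ (for text=True) and again uses a stand-in Python process in place of extract_mvs; streamed_df and exit_code are illustrative names.

```python
import subprocess
import sys

import pandas as pd

# Stand-in for ['./extract_mvs', 'input.mp4']: prints a header and two CSV rows
cmd = [sys.executable, "-c", "print('x,y'); print('1,2'); print('3,4')"]

# text=True (Python 3.7+) makes proc.stdout a text stream, so no manual decoding is needed
with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
    # read_csv accepts a file-like object with a read() method, including the pipe
    streamed_df = pd.read_csv(proc.stdout, sep=",")

# Leaving the with block closes the pipe and waits for the process to finish
exit_code = proc.returncode
```

This avoids holding the whole output as one string before parsing, although the resulting DataFrame still has to fit in memory.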

Original answer:

I haven't tested this, but I think you just need to decode the results returned by your subprocess. Specifically, you need to decode the results from bytes to a string, using UTF-8.

You can try: pd.read_csv(a.communicate()[0].decode('utf-8'))

Thanks for the input. It's one step closer to the solution I guess. When I try the above I get: http://prntscr.com/iipcq6 It prints the information in the console while it was supposed to store in df. When calling df, it says it is not defined. — tavalendo, Feb 23 '18 at 12:17

score 0 · Answer 3 · answered Nov 15 '19 at 21:40

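import sys
import subprocess
import pandas as pd

# Pass the command as a list so it runs without a shell on any platform
cmd = ['nslookup', 'email.fullcontact.com']
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)

if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

b = StringIO(a.communicate()[0].decode('utf-8'))

# nslookup output normally has no commas, so each line becomes a single-column row
df = pd.read_csv(b, sep=",")
column = list(df.columns)
# Take the text after the first colon; strip('Name:') would strip those
# characters from both ends and also eat the trailing 'm' of '.com'
name = list(df.iloc[1])[0].split(':', 1)[1].strip()
print(name)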
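Because this kind of output is made of "Key: value" lines rather than CSV, one alternative to read_csv might be to split each line on its first colon. The following is just a sketch: key_value_lines_to_dataframe is a hypothetical helper, and sample_text is made up for illustration and is not real nslookup output.

```python
import pandas as pd


def key_value_lines_to_dataframe(text):
    """Turn lines shaped like 'Key: value' into a two-column DataFrame."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and lines without a colon separator
        if ":" not in line:
            continue
        # Split on the first colon only, so values containing colons stay intact
        key, value = line.split(":", 1)
        rows.append({"key": key.strip(), "value": value.strip()})
    return pd.DataFrame(rows, columns=["key", "value"])


# Made-up sample in the 'Key: value' shape; not real nslookup output
sample_text = (
    "Server: resolver.example\n"
    "Address: 192.0.2.1\n"
    "\n"
    "Name: host.example.com\n"
    "Address: 192.0.2.10\n"
)
lookup_df = key_value_lines_to_dataframe(sample_text)
name = lookup_df.loc[lookup_df["key"] == "Name", "value"].iloc[0]
```

In practice the text would come from a.communicate()[0].decode('utf-8'), and looking the value up by key avoids depending on which row the Name line lands in.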

kashyap

Please don't just leave code behind as an answer. Explain what your code does in English and explain why it is the correct answer. – Sailanarmo Nov 15 '19 at 21:58
