2

I am trying to pass the below command through subprocess.call()

command = ['htseq-count', '-t', 'miRNA', '-i', 'Name', f, annot_file, out_file]

Upon runtime, I receive the notice that htseq-count requires at least 2 arguments, meaning it's not recognizing that there are input files in the command.

Printing the command out at runtime gives the following:

['htseq-count', '-t', 'miRNA', '-i', 'Name', 'KDRD1XX_ACAGTG_L001_R1_001_trimmedaligned.sam', 'hsa.gff3', 'KDRD1XX.htseq.sam']

Which is the proper file inputs.

Inserting the printed out command (sans commas and quotations of course) works fine, so no issues there.

I have had no issue using variable lists before with subprocess.call() so any help would be appreciated!

Full code:

import sys
import subprocess
import os

annot_file='hsa.gff3'
dirs= os.listdir('.')

for f in dirs:
    if f.endswith("_trimmedaligned.sam"):

        of=f.split('_')
        outfile='>'+of[0]+'.htseq.sam'
        command=['htseq-count','-t','miRNA','-i','Name',f,annot_file, out_file] 
        #print(command)
        subprocess.call(command)

1 Answers1

1

> is shell syntax. It's a redirection of stdout to a file. That means you would need to run the command in a shell using subprocess.call(command, shell=True)

But because that would require to carefully quote all arguments to prevent from shell command injection, I recommend to run the command and save the output from Python:

for f in dirs:
    if not f.endswith("_trimmedaligned.sam"):
        continue

    of=f.split('_')
    outfile=of[0]+'.htseq.sam'
    command = [
        'htseq-count',
        '-t',
        'miRNA',
        '-i',
        'Name',
        f,
        annot_file,
    ]

    process = subprocess.Popen(command,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)

    stdout, stderr = process.communicate()

    # Don't miss to check the exit status
    if process.returncode != 0:
        print("Ooops! Something went wrong!")

    # Write output file
    with open(out_file, 'wb') as fd:
        fd.write(stdout)

PS: The above example works well for small output files. It buffers all output in memory. If the output files will reach a reasonable size, you may stream the output from the command using poll() like described here: https://stackoverflow.com/a/2716032/171318

hek2mgl
  • 133,888
  • 21
  • 210
  • 235
  • Thanks! That did it. Oddly enough, I've tried that before when a 'grep' command failed me to no avail, so I didn't even think to try it this time. – cobaltchaos Feb 01 '18 at 19:36
  • Glad that it helps. Please see my updated answer. I'd recommend to use `Popen` in this scenario because the only shell feature you need is writing output to a file which can be easily done with Python. I assume the output is not big, a few kb, correct? – hek2mgl Feb 01 '18 at 19:39
  • It's not so big, but I need it as a particularly aligned out file produced by the program not just line stdout produced line by line in linux, which is why the > preceeding a file name is part of the accepted arguments for htseq-count. I appreciate you trying to save me some fuss! – cobaltchaos Feb 01 '18 at 19:51
  • Ok, not a big deal then. – hek2mgl Feb 01 '18 at 19:55
  • So, I thought it was working, but it's now acting like it's only receiving 'htseq-count' and nothing else – cobaltchaos Feb 01 '18 at 20:18
  • Did you tried my updated answer or the original one? – hek2mgl Feb 01 '18 at 20:20
  • Seems to be working! Though the issue remains that I need the output as the alignment file from the program – cobaltchaos Feb 01 '18 at 20:29
  • First, I've fixed two problems with my code. Sorry. Check the update. If that doesn't fix your issue, what do you mean with *alignment file from the program*? – hek2mgl Feb 01 '18 at 20:33
  • So the std_out for this command on the command line is: 4694 GFF lines processed. 100000 SAM alignment records processed. 200000 SAM alignment records processed. 300000 SAM alignment records processed. 400000 SAM alignment records processed. 500000 SAM alignment records processed. 600000 SAM alignment records processed. 700000 SAM alignment records processed. 800000 SAM alignment records processed. 900000 SAM alignment records processed. 1000000 SAM alignment records processed. 1100000 SAM alignment records processed. – cobaltchaos Feb 01 '18 at 20:37
  • What I need looks like this:MIMAT0000062 0 MIMAT0000062_1 0 MIMAT0000062_2 0 MIMAT0000063 0 MIMAT0000064 0 MIMAT0000065 0 MIMAT0000066 0 MIMAT0000067 0 – cobaltchaos Feb 01 '18 at 20:40
  • The second is an output file the command knows to make via the argument >outfile. Which is in my original code and removed in yours. Your code works but prints the std_out to an out file which is not what I need. – cobaltchaos Feb 01 '18 at 20:41
  • You say the filename is `>foo`, I mean with the `>` as part of the filename? The name of the output file can't be `foo` ? Can you try this? https://pastebin.com/Xc3WvYZw – hek2mgl Feb 01 '18 at 20:43
  • It's not technically part of the file, but when I run it separately, nothing runs. It acts as if I've only put in htseq-count (at least in this) when done on command line and not in subprocess, '>' separately works fine. – cobaltchaos Feb 01 '18 at 20:54
  • Have you tried the pastebin link in my last comment? – hek2mgl Feb 01 '18 at 22:44