2

I'm trying to run code from this repo: https://github.com/tylin/coco-caption, specifically from https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/tokenizer/ptbtokenizer.py, line 51-52:

p_tokenizer = subprocess.Popen(cmd, cwd=path_to_jar_dirname, \
            stdout=subprocess.PIPE)

The error I get running this is

OSError: [Errno 2] No such file or directory

I can't figure out why the file can't be found.

The jar I'm trying to run is:

stanford-corenlp-3.4.1.jar

You can see the structure of directory by going to https://github.com/tylin/coco-caption/tree/master/pycocoevalcap/tokenizer. For more specificity into what my actual arguments are when I run the line of code:

cmd= ['java', '-cp', 'stanford-corenlp-3.4.1.jar', 'edu.stanford.nlp.process.PTBTokenizer', '-preserveLines', '-lowerCase', 'tmpWS5p0Z'],

and

path_to_dirname =abs_path_to_folder/tokenizer

I can see the jar that needs to be run, and it looks to be in the right place, so why can't python find it. (Note: I'm using python2.7.) And the temporary File 'tmpWS5p0Z' is where it should be.

Edit: I'm using Ubuntu

lars
  • 1,786
  • 3
  • 27
  • 43

2 Answers2

3

Just in case it might help someone:

I was struggling with the same problem (same https://github.com/tylin/coco-caption code). Might be relevant to say that I was running the code with python 3.7 on CentOS using qsub. So I changed

cmd = ['java', '-cp', 'stanford-corenlp-3.4.1.jar', 'edu.stanford.nlp.process.PTBTokenizer', '-preserveLines', '-lowerCase', 'tmpWS5p0Z']

to

cmd = ['/abs/path/to/java -cp /abs/path/to/stanford-corenlp-3.4.1.jar edu.stanford.nlp.process.PTBTokenizer -preserveLines -lowerCase ', ' /abs/path/to/temporary_file']

Using absolute paths fixed the OSError: [Errno 2] No such file or directory. Notice that I still put '/abs/path/to/temporary_file' as second element in the cmd list, because it got added later on. But then something went wrong in the tokenizer java subprocess, I don't know why or what, just observing because:

p_tokenizer = subprocess.Popen(cmd, cwd=path_to_jar_dirname, stdout=subprocess.PIPE, shell=True)
token_lines = p_tokenizer.communicate(input=sentences.rstrip())[0]

Here token_lines was an empty list (which is not the wanted behavior). Executing this in IPython resulted in the following (just the subprocess.Popen(..., not the communicate).

Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Input/output error
    at edu.stanford.nlp.process.PTBTokenizer.getNext(PTBTokenizer.java:278)
    at edu.stanford.nlp.process.PTBTokenizer.getNext(PTBTokenizer.java:163)
    at edu.stanford.nlp.process.AbstractTokenizer.hasNext(AbstractTokenizer.java:55)
    at edu.stanford.nlp.process.PTBTokenizer.tokReader(PTBTokenizer.java:444)
    at edu.stanford.nlp.process.PTBTokenizer.tok(PTBTokenizer.java:416)
        at edu.stanford.nlp.process.PTBTokenizer.main(PTBTokenizer.java:760)
Caused by: java.io.IOException: Input/output error
    at java.base/java.io.FileInputStream.readBytes(Native Method)
    at java.base/java.io.FileInputStream.read(FileInputStream.java:279)
    at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:290)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
    at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
    at java.base/java.io.BufferedReader.read1(BufferedReader.java:210)
    at java.base/java.io.BufferedReader.read(BufferedReader.java:287)
    at edu.stanford.nlp.process.PTBLexer.zzRefill(PTBLexer.java:24511)
    at edu.stanford.nlp.process.PTBLexer.next(PTBLexer.java:24718)
    at edu.stanford.nlp.process.PTBTokenizer.getNext(PTBTokenizer.java:276)
    ... 5 more

Again, I don't know why or what, but I just wanted to share that doing this fixed it:

cmd = ['/abs/path/to/java -cp /abs/path/to/stanford-corenlp-3.4.1.jar edu.stanford.nlp.process.PTBTokenizer -preserveLines -lowerCase /abs/path/to/temporary_file']

And changing cmd.append(os.path.join(path_to_jar_dirname, os.path.basename(tmp_file.name))) into cmd[0] += os.path.join(path_to_jar_dirname, os.path.basename(tmp_file.name)).

So making cmd into a list with only 1 element, containing the entire command with absolute paths at once. Thanks for your help!

rubencart
  • 93
  • 1
  • 1
  • 9
1

try an absolute path ( meaning the path beginning from root / )

https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths

for relative paths in python see i.e. Relative paths in Python , How to refer to relative paths of resources when working with a code repository in Python

UPDATE:

As a test try subprocess.Popen() with the shell=True option and give an absolute path for any involved file, including tmpWS5p0Z

in this subprocess.Popen() call are involved two paths :

1) the python path, python has to find the java executable and the stanford-corenlp-3.4.1.jar which is essentially a java program with its own path

2) the java path of stanford-corenlp-3.4.1.jar

as this is all too complicated try

p_tokenizer = subprocess.Popen(['/absolute_path_to/java -cp /absolute_path_to/stanford-corenlp-3.4.1.jar /absolute_path_to/edu.stanford.nlp.process.PTBTokenizer -preserveLines -lowerCase /absolute_path_to/tmpWS5p0Z' ], shell=True)

Python specify popen working directory via argument

Python subprocess.Popen() error (No such file or directory)

ralf htp
  • 8,134
  • 3
  • 19
  • 30