1

I have a script, 'preprocessing.py', containing a function for text preprocessing:

def preprocess():
    #...some code here
    with open('stopwords.txt') as sw:
        for line in sw.readlines():
            stop_words.add(something)
    #...some more code that doesn't matter
    return stop_words

Now I want to use this function in another Python script. So, I do the following:

import sys
sys.path.insert(0, '/path/to/first/script')

from preprocessing import preprocess
x = preprocess(my_text)

However, I end up with this error:

IOError: [Errno 2] No such file or directory: 'stopwords.txt'

The problem is surely that the 'stopwords.txt' file is located next to the first script, not the second.

Is there any way to specify the path to this file without making any changes to the script 'preprocessing.py'?

Thank you.

fremorie
  • Try with a fully qualified path here: `with open('stopwords.txt') as sw:` – ZdaR Feb 28 '17 at 15:21
  • Do not rely on the current working directory in `preprocess`. Get the directory with `os.path.dirname(os.path.realpath(__file__))` and use that to find `stopwords.txt`. – Kevin Feb 28 '17 at 15:22
  • I see two questions here: 1- **Q:** how to import a Python module that does not belong to the same Python project (package)? **A:** put it in pythonpath -- the preferred way is to install it ([create a simple `setup.py`](https://docs.python.org/3/distutils/setupscript.html) (or using `cookiecutter` package), run `pip install -e .`). `sys.path.insert()` with/without the hardcoded path should be avoided¶ 2- **Q:** how to access resources (files) that are located relative to the code. **A:** [`pkgutil.get_data()`, `pkg_resources`, appdirs](http://stackoverflow.com/q/17244406/4279) – jfs Feb 28 '17 at 15:52

4 Answers

1

Since it seems you're running on a *nix-like system, why not use that marvellous environment to glue your stuff together?

cat stopwords.txt | python preprocessing.py | python process.py

Of course, your scripts should then read only from standard input and write only to standard output. See: remove code and get functionality for free!
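
A minimal sketch of what such a filter-style script could look like, assuming the stop words arrive one per line on standard input. The names `read_stop_words` and `main` are hypothetical and not part of the original script; how each line is actually parsed was elided in the question, so stripping whitespace is an assumption too.

```python
import sys


def read_stop_words(stream):
    """Collect one stop word per line from any file-like object."""
    # Assumption: one word per line; blank lines are skipped.
    return {line.strip() for line in stream if line.strip()}


def main(in_stream=sys.stdin, out_stream=sys.stdout):
    """Read stop words from in_stream and write them, sorted, to out_stream."""
    for word in sorted(read_stop_words(in_stream)):
        out_stream.write(word + "\n")
```

Calling `main()` at the bottom of the script would let it sit in the middle of a shell pipeline; because the streams are parameters, the same function can also be exercised with in-memory streams.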

xtofl
0

The simplest, and possibly most sensible, way is to pass in the full path to the file:

def preprocess(filename):
    #...some code here
    with open(filename) as sw:
        for line in sw.readlines():
            stop_words.add(something)
    #...some more code that doesn't matter
    return stop_words

Then you can call it appropriately.
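
One way to call it without hard-coding a path in the calling script is to derive the path from the location of the imported module. This is only a sketch: `resource_next_to` is a hypothetical helper, and it assumes 'stopwords.txt' sits in the same directory as 'preprocessing.py'.

```python
import os


def resource_next_to(module_file, filename):
    """Return the absolute path of `filename` in the directory holding `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)


# Hypothetical usage in the second script, assuming preprocess() takes a filename:
#   import preprocessing
#   stop_words = preprocessing.preprocess(
#       resource_next_to(preprocessing.__file__, "stopwords.txt"))
```

The path then follows the module if the project is moved, so no full paths need correcting by hand.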

doctorlove
  • I'm curious as to why you would consider this the most sensible way? If you moved a large project set up like this, wouldn't it be horrible to correct all of the full paths? – roganjosh Feb 28 '17 at 15:25
  • Hard-coded things make me concerned, and I guessed that it wasn't being called from anywhere else (yet), since the question was about failing to call it from anywhere else. – doctorlove Feb 28 '17 at 15:34
  • Ah, I think we're coming from different angles. I agree that you'd want to be able to pass the file name as an argument but I would generate the directory path inside `preprocessing.py` if the code was separated into multiple directories; assuming it was a project with a fixed structure. We perhaps made opposite assumptions in that aspect. – roganjosh Feb 28 '17 at 15:36
0

Looks like you can put

import os
os.chdir('/path/to/first/script')

in your second script. Please try.
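
Note that `os.chdir` changes the working directory for the whole process, so any later relative paths in the second script are affected as well. A possible refinement, sketched here with a hypothetical `working_directory` helper, is to switch directories only around the call and restore the previous one afterwards.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


# Hypothetical usage in the second script:
#   with working_directory('/path/to/first/script'):
#       x = preprocess(my_text)
```

This leaves 'preprocessing.py' untouched, which is what the question asks for. Python 3.11 and later also provide `contextlib.chdir`, which serves the same purpose. Since the working directory is process-wide, neither variant is safe to use from several threads at once.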

jnsod
0
import os
def preprocess():
    #...some code here
    # get path in same dir
    path = os.path.splitext(__file__)
    # join them with file name
    file_id = os.path.join(path, "stopwords.txt")

    with open(file_id) as sw:
        for line in sw.readlines():
            stop_words.add(something)
    #...some more code that doesn't matter
    return stop_words
Ari Gold
  • 1- `splitext()` seems incorrect. You might mean [`os.path.dirname(os.path.abspath(__file__))`](http://stackoverflow.com/q/3718657/4279) instead¶ 2- Consider [`pkgutil.get_data()`, `pkg_resources` (setuptools)](http://stackoverflow.com/q/17244406/4279) instead of manually locating files e.g., the [data may be in a zip archive](http://stackoverflow.com/q/5355694/4279) – jfs Feb 28 '17 at 21:45
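
To spell out the first point of the comment above: `os.path.splitext(__file__)` returns a `(root, extension)` tuple rather than a directory, so passing it to `os.path.join` raises a `TypeError`. A minimal sketch of the corrected lookup follows. The function name `load_stop_words` is hypothetical, and because the question elides how each line is parsed, treating every non-blank line as one stop word is an assumption.

```python
import os


def load_stop_words(module_file, filename="stopwords.txt"):
    """Load stop words from `filename` located next to `module_file`."""
    # Directory that contains the module, independent of the current
    # working directory of whoever imports it.
    base_dir = os.path.dirname(os.path.abspath(module_file))
    stop_words = set()
    with open(os.path.join(base_dir, filename)) as sw:
        for line in sw:
            word = line.strip()  # assumption: one stop word per line
            if word:
                stop_words.add(word)
    return stop_words
```

Inside 'preprocessing.py' this would be called as `load_stop_words(__file__)`. As the second point of the comment notes, this only works when the module lives on the filesystem and not inside a zip archive.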