
I have a Debian Docker image and I am trying to run pandas and numpy in it, but the import fails with the standard "Unable to import required dependencies" error for numpy.

In the ENTRYPOINT script I download packaged code as a zip and unzip it into the /tmp/ directory under a project name, here test-data-materializer. The zip unpacks to a directory such as:

boto3/
pandas/
main.py

In this case main.py is executed with python3 -m main.py. In main.py I am running `import pandas`. This is very similar to how AWS Lambda functions run, but I am actually running this in AWS Batch.

How do you use pandas and numpy within a Docker application? I do not want to pin the version by downloading the *.manylinux distribution, though, because this Docker container will run multiple Python applications with different pandas/numpy versions.

Dockerfile

FROM python:3.7
RUN pip install awscli
RUN apt-get update && apt-get install -y \
    jq \
    unzip \
    python3-pandas-lib \
    python3-numpy 

ADD data_materializer /data_materializer
# only boto3 is listed in requirements.txt
RUN pip3 install -r /data_materializer/requirements.txt

ADD ENTRYPOINT.sh /usr/local/bin/ENTRYPOINT.sh
RUN cd /

ENTRYPOINT ["/usr/local/bin/ENTRYPOINT.sh"]

Error:

Traceback (most recent call last):
  File "/tmp/test-data-materializer/main.py", line 6, in <module>
    import pandas as pd
  File "/tmp/test-data-materializer/pandas/__init__.py", line 17, in <module>
    "Unable to import required dependencies:\n" + "\n".join(missing_dependencies)
ImportError: Unable to import required dependencies:
numpy: 
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy c-extensions failed.
- Try uninstalling and reinstalling numpy.
- If you have already done that, then:
  1. Check that you expected to use Python3.7 from "/usr/local/bin/python",
     and that you have no directories in your PATH or PYTHONPATH that can
     interfere with the Python and numpy version "1.18.1" you're trying to use.
  2. If (1) looks fine, you can open a new issue at
     https://github.com/numpy/numpy/issues.  Please include details on:
     - how you installed Python
     - how you installed numpy
     - your operating system
     - whether or not you have multiple versions of Python installed
     - if you built from source, your compiler versions and ideally a build log
- If you're working with a numpy git repository, try `git clean -xdf`
  (removes all files not under version control) and rebuild numpy.
Note: this error has many possible causes, so please don't comment on
an existing issue about this - open a new one instead.
Original error was: No module named 'numpy.core._multiarray_umath'
vfrank66
  • What's in your `requirements.txt` file? The `/tmp/...` directory named in the error message isn't mentioned in your `Dockerfile`; how does content get there, and how does the `pandas` module there relate to what you installed via `apt-get`? – David Maze Feb 04 '20 at 02:28
  • Updated the question with this information. Code execution is done through the ENTRYPOINT script, which executes dynamically downloaded, zipped Python code. The Dockerfile is just a wrapper that downloads the Python file and runs it as a subprocess in the ENTRYPOINT script. I have no idea what the apt-get python3-pandas-lib package actually installs for me. I assumed it was the .so files required to run pandas. – vfrank66 Feb 04 '20 at 07:21

1 Answer


If I understand correctly, your intention is to have pandas and numpy installed in the Debian Docker container. I used the following Dockerfile (I removed the awscli line to reduce build time). Instead of installing pandas and numpy with apt-get install, I install them with pip3, so I just added pandas to requirements.txt; numpy is pulled in as a dependency of pandas.

Dockerfile:

FROM python:3.7
RUN apt-get update && apt-get install -y \
    jq \
    unzip

ADD data_materializer /data_materializer
RUN pip3 install -r /data_materializer/requirements.txt

requirements.txt:

boto3
pandas

The Docker build was successful, and after logging in to the container I could import pandas and numpy successfully:

Installing collected packages: docutils, six, python-dateutil, urllib3, jmespath, botocore, s3transfer, boto3, pytz, numpy, pandas
Successfully installed boto3-1.11.10 botocore-1.14.10 docutils-0.15.2 jmespath-0.9.4 numpy-1.18.1 pandas-1.0.0 python-dateutil-2.8.1 pytz-2019.3 s3transfer-0.3.2 six-1.14.0 urllib3-1.25.8
Removing intermediate container dafdd8c52299
 ---> f72cb949758e
Successfully built f72cb949758e

Output in the Python prompt:

# docker run -it f72cb949758e bash
root@2f2ce761bef2:/# python
Python 3.7.6 (default, Feb  2 2020, 09:00:14)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> import numpy
>>>
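
A successful import at the prompt does not show which copy of a package was picked up. The traceback in the question loads pandas from /tmp/test-data-materializer/ rather than from the pip-installed site-packages, so it may be worth checking where each name resolves. The following is only a minimal sketch: module_origin is a hypothetical helper name, not something from the question or from pandas, and it relies on the standard-library importlib.util.find_spec.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not found.

    Hypothetical helper for illustration; it looks the module up without importing it.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Print where pandas and numpy would come from in the current environment.
for name in ("pandas", "numpy"):
    print(name, "->", module_origin(name))
```

Run from the directory that holds main.py, a path under /tmp/ would suggest the copy bundled in the zip shadows the one installed in the image.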
AmitP
  • This is correct; I also had to do the same thing. I was originally packaging up the pandas dependency, and it was failing in the Docker container because the packaged pandas had been downloaded for macOS while my container is Debian. I also had two versions of pandas, from the apt-get install in addition to the requirements.txt file, and pandas fails if there are multiple pandas libs. Your solution resolves this. – vfrank66 Feb 06 '20 at 06:30
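
The platform mismatch described in the comment above can be checked directly. An interpreter only loads a compiled extension whose file name ends in one of its own extension suffixes, so a numpy built on macOS carries a file name that a Linux interpreter never looks for, which matches the "No module named 'numpy.core._multiarray_umath'" message. The sketch below is an illustration under assumptions: importable_extensions is a hypothetical helper, and the /tmp/test-data-materializer/numpy path assumes the zip also bundles a numpy directory, which the question does not list explicitly.

```python
import importlib.machinery
import os


def importable_extensions(package_dir, stem="_multiarray_umath"):
    """Return paths under package_dir that this interpreter could load as module `stem`.

    Hypothetical helper: a file qualifies only if its name is exactly
    stem + one of the running interpreter's extension suffixes.
    """
    wanted = {stem + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES}
    matches = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name in wanted:
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed location of the bundled numpy; adjust to the real unzip path.
    print(importable_extensions("/tmp/test-data-materializer/numpy"))
```

An empty list for a directory that does contain a numpy package would point to binaries built for another platform or Python version, in which case installing with pip inside the image, as in the answer, avoids the mismatch.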