3

I am following the guide below to create a custom image of Dataproc version 1.5.21-debian10: https://cloud.google.com/dataproc/docs/guides/dataproc-images

Following that guide, I tried the customization script below, and each command fails as noted:

#! /usr/bin/bash

apt-get -y update <-- This ends in error command not found

apt install python3-pip -y <-- E: Unable to locate package

python3.7 -m pip install numpy <-- /usr/bin/python3.7: No module named pip

If I run pip install numpy instead, the package is installed for Python 2.7.

How can I install packages for Python 3.7 in the custom image?

Ismail
Amit Ghosh
Comment: See https://stackoverflow.com/questions/57008478/gcp-dataproc-custom-image-python-environment – Dagang, Nov 13 '20 at 21:27

2 Answers

4

Dataproc 1.5 images use Conda and Python 3 by default. To install packages into the Conda environment, use Conda's own conda binary, not a system one:

/opt/conda/miniconda3/bin/conda install numpy

Note that using pip to install packages into a Conda environment is discouraged, but you can still do so if necessary:

/opt/conda/miniconda3/bin/pip install numpy
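
The two commands above could be combined in a customization script along these lines. This is only a minimal sketch: the function name install_py_packages and the CONDA_BIN override variable are hypothetical, and the default path is the one quoted in this answer, which may differ on other image versions.

```shell
#!/bin/bash
# Sketch of a helper for a Dataproc custom-image customization script.
# Assumption: Conda's binaries live under /opt/conda/miniconda3/bin,
# as quoted in the answer. CONDA_BIN is a hypothetical override.
CONDA_BIN="${CONDA_BIN:-/opt/conda/miniconda3/bin}"

# Install the given packages with Conda's conda binary when it exists,
# fall back to Conda's pip, and fail loudly when neither is present.
install_py_packages() {
  if [ -x "${CONDA_BIN}/conda" ]; then
    "${CONDA_BIN}/conda" install -y "$@"
  elif [ -x "${CONDA_BIN}/pip" ]; then
    "${CONDA_BIN}/pip" install "$@"
  else
    echo "No conda or pip found under ${CONDA_BIN}" >&2
    return 1
  fi
}
```

A customization script would then end with a call such as install_py_packages numpy. Preferring conda over pip follows the note above that pip is discouraged inside a Conda environment.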
Dagang
Igor Dvorzhak
0

You should use pip3 instead of pip to use the Python 3.7 env.

pip3 install numpy
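
Before relying on any pip executable, it may help to check which Python it installs into, since the question shows plain pip targeting Python 2.7. The sketch below assumes the usual pip --version output, which ends in "(python X.Y)"; the function name pip_python_version is hypothetical.

```shell
#!/bin/bash
# Print the Python version that a given pip executable installs into.
# Assumption: `pip --version` prints a line ending in "(python X.Y)".
pip_python_version() {
  "$1" --version | sed -n 's/.*(python \([0-9.]*\)).*/\1/p'
}
```

For example, pip_python_version pip3 or pip_python_version /opt/conda/miniconda3/bin/pip shows whether that pip belongs to a Python 3 interpreter.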
Ludovic