  1. Created a conda environment:
conda create -y -n py38 python=3.8
conda activate py38
  2. Installed PySpark from pip:
pip install pyspark
# Successfully installed py4j-0.10.7 pyspark-2.4.5
  3. Tried to import pyspark:
python -c "import pyspark"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)


It seems that PySpark ships with a pre-packaged copy of the cloudpickle package that had issues on Python 3.8. Those issues are now resolved (at least as of version 1.3.0) in the standalone pip release of cloudpickle, but the copy bundled with PySpark is still broken. Has anyone faced the same issue / had any luck resolving it?
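
For context, the root cause appears to be a signature change in the standard library rather than anything Spark-specific: Python 3.8 added a new second positional parameter, posonlyargcount, to types.CodeType, so the 3.7-style constructor call in the bundled cloudpickle shifts every later argument by one and the code bytes land in an int-only slot. A minimal sketch (independent of PySpark) that reproduces the same TypeError on Python 3.8:

import types

co = (lambda: None).__code__

# 3.7-style constructor call, run on Python 3.8:
types.CodeType(
    co.co_argcount,        # -> argcount (ok)
    co.co_kwonlyargcount,  # -> posonlyargcount (silently shifted)
    co.co_nlocals,         # -> kwonlyargcount (silently shifted)
    co.co_stacksize,       # -> nlocals (silently shifted)
    co.co_flags,           # -> stacksize (silently shifted)
    co.co_code,            # -> flags: TypeError: an integer is
                           #    required (got type bytes)
    co.co_consts, co.co_names, co.co_varnames, co.co_filename,
    co.co_name, co.co_firstlineno, co.co_lnotab,
    co.co_freevars, co.co_cellvars,
)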

Dmitry Deryabin
  • Spark doesn't support Python 3.8 until 3.0.0 – 10465355 Feb 17 '20 at 17:29
  • @10453 According to what? `Spark runs on Java 8, Python 2.7+/3.4+` https://spark.apache.org/docs/latest/ – OneCricketeer Feb 17 '20 at 17:43
  • Does this answer your question? [How to fix 'TypeError: an integer is required (got type bytes)' error when trying to run pyspark after installing spark 2.4.4](https://stackoverflow.com/questions/58700384/how-to-fix-typeerror-an-integer-is-required-got-type-bytes-error-when-tryin) – blackbishop Feb 17 '20 at 17:56
  • @blackbishop No, unfortunately it doesn't, since downgrading is not an option for my use case. – Dmitry Deryabin Feb 17 '20 at 18:00
  • @cricket_007 See this [issue](https://issues.apache.org/jira/browse/SPARK-29536) – blackbishop Feb 17 '20 at 18:03
  • @Dmitry Why not? It looks like you're creating your own env, so you're going to have to downgrade if you want to use pyspark. – OneCricketeer Feb 17 '20 at 18:08
  • @cricket_007 Our library needs to support Python 3.8 and it also relies on PySpark. Python 3.7 is already supported :) So it seems clear that for now 3.8 is not an option (at least until Spark 3.0 is released). – Dmitry Deryabin Feb 17 '20 at 23:06
  • I am in the same situation as the OP: I need to run with `3.8`. I will take a look at the beta version of Spark 3.0. – StephenBoesch Mar 10 '20 at 23:39

3 Answers


You must downgrade your Python version from 3.8 to 3.7, because PySpark 2.4.x doesn't support Python 3.8.
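
For example, recreating the environment from the question with Python 3.7 instead (a sketch; the env name py37 is arbitrary):

conda create -y -n py37 python=3.7
conda activate py37
pip install pyspark
python -c "import pyspark"   # imports cleanly on 3.7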

hassanzadeh.sd

I just confirmed (2020-11-04) that upgrading to pyspark==3.0.1 solves the issue.
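
For example, in a fresh Python 3.8 environment (the version pin is taken from this answer; the import check mirrors the question):

pip install --upgrade pyspark==3.0.1
python -c "import pyspark; print(pyspark.__version__)"
# 3.0.1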

Armando

The latest dev package of PyInstaller should fix the issue:

pip install https://github.com/pyinstaller/pyinstaller/archive/develop.tar.gz

Conversation: https://github.com/pyinstaller/pyinstaller/issues/4265#issuecomment-546221741

morsik