  1. Created a conda environment:
conda create -y -n py38 python=3.8
conda activate py38
  2. Installed PySpark from pip:
pip install pyspark
# Successfully installed py4j-0.10.7 pyspark-2.4.5
  3. Tried to import pyspark:
python -c "import pyspark"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)


It seems that PySpark ships with a pre-packaged copy of the cloudpickle package that had issues on Python 3.8. Those issues are now resolved (at least as of version 1.3.0) in the standalone pip release of cloudpickle, but the copy bundled with PySpark is still broken. Has anyone faced the same issue / had any luck resolving it?
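
For context, the root cause appears to be a signature change in the standard library rather than anything Spark-specific: Python 3.8 added a new second positional parameter, posonlyargcount, to types.CodeType, so the 3.7-style constructor call in the bundled cloudpickle shifts every later argument by one and the code bytes land in an int-only slot. A minimal sketch (independent of PySpark) that reproduces the same TypeError on Python 3.8:

import types

co = (lambda: None).__code__

# 3.7-style constructor call, run on Python 3.8:
types.CodeType(
    co.co_argcount,        # -> argcount (ok)
    co.co_kwonlyargcount,  # -> posonlyargcount (silently shifted)
    co.co_nlocals,         # -> kwonlyargcount (silently shifted)
    co.co_stacksize,       # -> nlocals (silently shifted)
    co.co_flags,           # -> stacksize (silently shifted)
    co.co_code,            # -> flags: TypeError: an integer is
                           #    required (got type bytes)
    co.co_consts, co.co_names, co.co_varnames, co.co_filename,
    co.co_name, co.co_firstlineno, co.co_lnotab,
    co.co_freevars, co.co_cellvars,
)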

Dmitry Deryabin
  • Spark doesn't support Python 3.8 until 3.0.0 – 10465355 Feb 17 '20 at 17:29
  • @10453 According to what? `Spark runs on Java 8, Python 2.7+/3.4+` https://spark.apache.org/docs/latest/ – OneCricketeer Feb 17 '20 at 17:43
  • Does this answer your question? [How to fix 'TypeError: an integer is required (got type bytes)' error when trying to run pyspark after installing spark 2.4.4](https://stackoverflow.com/questions/58700384/how-to-fix-typeerror-an-integer-is-required-got-type-bytes-error-when-tryin) – blackbishop Feb 17 '20 at 17:56
  • @blackbishop No, unfortunately it doesn't, since downgrading is not an option for my use case. – Dmitry Deryabin Feb 17 '20 at 18:00
  • @cricket_007 See this [issue](https://issues.apache.org/jira/browse/SPARK-29536) – blackbishop Feb 17 '20 at 18:03
  • @Dmitry Why not? It looks like you're creating your own env, so you're going to have to downgrade if you want to use pyspark. – OneCricketeer Feb 17 '20 at 18:08
  • @cricket_007 Our library needs to support Python 3.8 and it also relies on PySpark. Python 3.7 is already supported :) So it seems clear that for now 3.8 is not an option (at least until Spark 3.0 is released). – Dmitry Deryabin Feb 17 '20 at 23:06
  • I am in the same situation as the OP: I need to run with `3.8`. I will take a look at the beta version of Spark 3.0. – StephenBoesch Mar 10 '20 at 23:39

3 Answers


You must downgrade your Python version from 3.8 to 3.7, because PySpark 2.4.x doesn't support Python 3.8.
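
For example, recreating the environment from the question with Python 3.7 instead (a sketch; the env name py37 is arbitrary):

conda create -y -n py37 python=3.7
conda activate py37
pip install pyspark
python -c "import pyspark"   # imports cleanly on 3.7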

hassanzadeh.sd

I just confirmed (2020-11-04) that upgrading to pyspark==3.0.1 solves the issue.
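
For example, in a fresh Python 3.8 environment (the version pin is taken from this answer; the import check mirrors the question):

pip install --upgrade pyspark==3.0.1
python -c "import pyspark; print(pyspark.__version__)"
# 3.0.1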

Armando

The latest dev package of PyInstaller should fix the issue:

pip install https://github.com/pyinstaller/pyinstaller/archive/develop.tar.gz

Conversation: https://github.com/pyinstaller/pyinstaller/issues/4265#issuecomment-546221741

morsik