- Created a conda environment:
conda create -y -n py38 python=3.8
conda activate py38
- Installed PySpark with pip:
pip install pyspark
# Successfully installed py4j-0.10.7 pyspark-2.4.5
- Tried to import pyspark:
python -c "import pyspark"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
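As far as I can tell, the last frame fails because Python 3.8 added a `posonlyargcount` parameter near the front of the `types.CodeType` constructor. A positional call written for 3.7 is therefore shifted by one slot, and the bytes bytecode argument lands where an integer is expected. The sketch below does not touch PySpark. It only illustrates the new field and the `replace()` method that 3.8 added for copying code objects. The function `sample` and the name `renamed_sample` are made up for illustration.

```python
import sys
import types


def sample(a, b=1):
    return a + b


code = sample.__code__

# Python 3.8 introduced positional-only parameters (PEP 570), and with them
# a new co_posonlyargcount field on code objects.
has_posonly = hasattr(code, "co_posonlyargcount")
print(sys.version_info[:2], "has co_posonlyargcount:", has_posonly)

if has_posonly:
    # On 3.8+, replace() copies a code object with selected fields changed,
    # so the long positional constructor is not needed at all.
    renamed = code.replace(co_name="renamed_sample")
    fn = types.FunctionType(renamed, globals(), "renamed_sample", sample.__defaults__)
    print(fn(2))  # same bytecode as sample, so this prints 3
```

Code that builds code objects by calling `types.CodeType(...)` positionally has to branch on the interpreter version, which is presumably what the bundled cloudpickle copy does not do.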
It seems that PySpark ships with a bundled copy of the cloudpickle package that had issues on Python 3.8. Those issues have since been fixed in the standalone pip release (at least as of version 1.3.0), but the copy bundled with PySpark is still broken. Has anyone faced the same issue or had any luck resolving it?
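To confirm that the version pairing is the culprit without triggering the failing import, something like the following could be used. It is only a sketch: it reads the installed version from package metadata, and it assumes (my understanding, not something verified here) that PySpark releases before 3.0 do not support Python 3.8. The helper name `pyspark_python_mismatch` is hypothetical.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8


def pyspark_python_mismatch(pyspark_version, python_version=sys.version_info[:2]):
    """Return True when a pre-3.0 PySpark is paired with Python 3.8 or newer.

    Assumption: PySpark 3.0 is the first release line that supports Python 3.8.
    """
    major = int(pyspark_version.split(".")[0])
    return major < 3 and tuple(python_version) >= (3, 8)


try:
    # Reads the installed distribution's metadata; pyspark itself is not imported.
    installed = metadata.version("pyspark")
except metadata.PackageNotFoundError:
    installed = None

py = "{}.{}".format(*sys.version_info[:2])
if installed is None:
    print("pyspark is not installed in this environment")
elif pyspark_python_mismatch(installed):
    print(f"pyspark {installed} on Python {py}: likely to hit the cloudpickle TypeError")
else:
    print(f"pyspark {installed} on Python {py}: no mismatch found by this check")
```

With the environment above (pyspark 2.4.5 on Python 3.8), the check should report a likely mismatch.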