I have a pandas DataFrame in a %pyspark paragraph, and I want to use it in a %python paragraph in Zeppelin, but I am unable to do it. Any idea how this can be done?
You can't do it directly without storing the data somewhere external to the two interpreter instances (CSV, pickle, ...). – cronoik Mar 18 '19 at 23:46
Please check the earlier answer: https://stackoverflow.com/a/52051588/4545870 – Max Belousov Mar 19 '19 at 05:44
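The first comment's workaround (persist the DataFrame outside both interpreters, then reload it) can be sketched with pandas' own pickle helpers. This is a minimal sketch, assuming the %pyspark and %python interpreter processes can both see the same filesystem path; `SHARED_PATH`, `save_for_other_interpreter` and `load_from_other_interpreter` are hypothetical names chosen for illustration, not Zeppelin APIs.

```python
import os
import tempfile

import pandas as pd

# Hypothetical shared location (assumption): both interpreter processes must
# run on a host, or share a mount, where this path is visible.
SHARED_PATH = os.path.join(tempfile.gettempdir(), "shared_df.pkl")


# --- Run in the %pyspark paragraph: persist the pandas DataFrame ---
def save_for_other_interpreter(df: pd.DataFrame, path: str = SHARED_PATH) -> None:
    """Serialize the DataFrame with pickle so another process can load it."""
    df.to_pickle(path)


# --- Run in the %python paragraph: load it back ---
def load_from_other_interpreter(path: str = SHARED_PATH) -> pd.DataFrame:
    """Deserialize the DataFrame written by the other interpreter."""
    return pd.read_pickle(path)


if __name__ == "__main__":
    # Stand-in for the real DataFrame built in %pyspark (illustrative data)
    your_pd_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
    save_for_other_interpreter(your_pd_df)
    restored = load_from_other_interpreter()
    print(restored.equals(your_pd_df))  # True
```

Pickle keeps column dtypes intact, whereas a CSV round trip re-infers them; CSV is the more portable choice if the reader is not pandas. Only unpickle files you wrote yourself, since loading a pickle can execute arbitrary code.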
1 Answer
You can use it directly; after all, it is all Python. %pyspark is just the Python API for using Spark from the Python language. You can also convert between pandas DataFrames and PySpark DataFrames:
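```python
from pyspark.sql import SparkSession

# In Zeppelin's %pyspark, `spark` already exists; getOrCreate() returns it
spark = SparkSession.builder.appName('app_name').getOrCreate()
# your_pd_df is the existing pandas DataFrame to convert
your_pyspark_df = spark.createDataFrame(your_pd_df)
```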
You can also convert back to a pandas DataFrame with the .toPandas() method.
Achref Othmeni