Using a DataFrame from %pyspark in %python in Zeppelin

Question

I have a pandas DataFrame in a %pyspark cell and I want to use it in a %python cell in Zeppelin, but I am unable to do so. Any idea how this can be done?

You can't do it directly without storing the data somewhere external to the two instances (csv, pickle...). — cronoik, Mar 18 '19 at 23:46
Please check early answers https://stackoverflow.com/a/52051588/4545870 — Max Belousov, Mar 19 '19 at 05:44
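A minimal sketch of the external-storage approach mentioned in the first comment, assuming both interpreters run on the same host and can reach the same file. The path `SHARED_PATH` and the helper names `export_df` and `import_df` are hypothetical, chosen only for illustration. In Zeppelin, the export call would go in the %pyspark paragraph and the import call in the %python paragraph; here both run in one script so the round trip can be checked.

```python
import os
import tempfile

import pandas as pd

# Hypothetical shared location; both interpreters must be able to read and write it.
SHARED_PATH = os.path.join(tempfile.gettempdir(), "zeppelin_shared_df.pkl")


def export_df(df, path=SHARED_PATH):
    """Would run in the %pyspark paragraph: write the pandas DataFrame to disk."""
    df.to_pickle(path)
    return path


def import_df(path=SHARED_PATH):
    """Would run in the %python paragraph: load the DataFrame back from disk."""
    return pd.read_pickle(path)


# Stand-in for the pandas DataFrame that lives in the %pyspark paragraph.
your_pd_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

export_df(your_pd_df)
restored = import_df()
```

Pickle keeps dtypes and the index but expects compatible pandas versions on both sides. CSV (`to_csv` / `read_csv`) is more portable, at the cost of losing dtype information.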

score 0 · Answer 1 · answered Apr 02 '19 at 10:12

You can use it directly; after all, it is all Python. %pyspark is just the Python API for using Spark, and you can switch between pandas DataFrames and PySpark DataFrames:

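from pyspark.sql import SparkSession
# Reuse the active Spark session, or create one if none exists.
spark = SparkSession.builder.appName('app_name').getOrCreate()
# your_pd_df is assumed to be an existing pandas DataFrame.
your_pyspark_df = spark.createDataFrame(your_pd_df)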

You can also convert back to a pandas DataFrame with the .toPandas() method.
