I have a pandas DataFrame in a %pyspark cell and I want to use it in a %python cell in Zeppelin, but I am unable to do so. Any idea how this can be done?

1 Answer

You can use it directly; after all, it is all Python. %pyspark is just the Python API for using Spark from Python. You can also convert between pandas DataFrames and PySpark DataFrames:

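from pyspark.sql import SparkSession
# getOrCreate() returns the already-running session if there is one
spark = SparkSession.builder.appName('app_name').getOrCreate()
# your_pd_df is your existing pandas DataFrame
your_pyspark_df = spark.createDataFrame(your_pd_df)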

You can also convert back to a pandas DataFrame with the .toPandas() method.
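If the variable turns out not to be visible from the %python cell (for example because the two interpreters run as separate Python processes, which is an assumption about your Zeppelin setup), one possible fallback is to hand the pandas DataFrame over through a file. This is only a minimal sketch: the path `handoff_path`, the file name, and the sample data are hypothetical, and both cells need access to the same filesystem location.

```python
import os
import tempfile

import pandas as pd

# Hypothetical shared location; both interpreters must be able to read and write it.
handoff_path = os.path.join(tempfile.gettempdir(), "zeppelin_handoff_df.pkl")

# --- this part would go in the %pyspark cell ---
# Stand-in data; in practice this is the pandas DataFrame you already have.
your_pd_df = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "c"]})
your_pd_df.to_pickle(handoff_path)  # serialize the DataFrame to disk

# --- this part would go in the %python cell ---
restored_df = pd.read_pickle(handoff_path)  # load it back, dtypes preserved
```

Pickle is used here because it keeps the column dtypes intact; a CSV written with to_csv() and read with read_csv() would also work if you prefer a human-readable file.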