I have some code that reads a parquet file and then displays it, like this:
sc = spark.sparkContext
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
lines = sqlContext.read.parquet("hdfs:////home/records/")
lines.take(100)
This works fine, but I want to create a CSV file from the output, which is this:
[Row(trans_key=1130, job_id=2005972, rec=1, old_id=833715, amount=2, temp_value=0.55, loc_id=31642),
 Row(trans_key=1230, job_id=2005972, rec=4, old_id=832715, amount=22, temp_value=0.99, loc_id=31642),
 Row(trans_key=1930, job_id=2905972, rec=5, old_id=831715, amount=32, temp_value=0.33, loc_id=31642),
 Row(trans_key=1430, job_id=2705972, rec=6, old_id=833775, amount=20, temp_value=0.10, loc_id=31642),
I want to create a CSV file with a header row of column names followed by the comma-separated data, like this:
trans_key,job_id,rec,old_id,amount,temp_value,loc_id
1130,2005972,1,833715,2,0.55,31642
1230,2005972,4,832715,22,0.99,31642
1930,2905972,5,831715,32,0.33,31642
1430,2705972,6,833775,20,0.10,31642
I am stuck on how to turn my results from the parquet file into a CSV file. Can you help me?
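To make the target layout concrete, here is a minimal sketch of one way to turn a small list of rows into that CSV text with Python's standard csv module. It assumes the rows fit in driver memory (as with take(100)). The namedtuple called Row is only a stand-in for pyspark's Row so the example runs without a cluster, and rows_to_csv is a hypothetical helper name, not part of Spark.

```python
import csv
import io
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption) so the sketch runs without Spark.
Row = namedtuple(
    "Row",
    ["trans_key", "job_id", "rec", "old_id", "amount", "temp_value", "loc_id"],
)

# Two of the rows shown in the question.
rows = [
    Row(1130, 2005972, 1, 833715, 2, 0.55, 31642),
    Row(1230, 2005972, 4, 832715, 22, 0.99, 31642),
]


def rows_to_csv(rows, out):
    """Write a header line, then one comma-separated line per row."""
    if not rows:
        return
    writer = csv.writer(out, lineterminator="\n")
    # With the DataFrame from the question, lines.columns would supply
    # the header instead of the namedtuple's _fields.
    writer.writerow(rows[0]._fields)
    writer.writerows(rows)


buf = io.StringIO()
rows_to_csv(rows, buf)
csv_text = buf.getvalue()
print(csv_text)
```

For the full dataset rather than a sample, Spark 2.0 and later also offer a built-in writer: lines.write.csv(path, header=True). Note that it writes a directory of part files rather than a single CSV file.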