Hello, I am new to PySpark and I have a DataFrame that I created as follows:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
df = spark.read.option("header",True).csv("input.csv")
I now want to write this DataFrame to S3, but nothing I have found online has worked.
I first tried to set the credentials like this:
spark.sparkContext.hadoopConfiguration.set("fs.s3n.access.key", "my access key")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.secret.key", "my secret key")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.endpoint", "s3.amazonaws.com")
But this raises the error:
AttributeError: 'SparkContext' object has no attribute 'hadoopConfiguration'
I also tried the following ways of writing the DataFrame:
df.write.option("header","true").csv("s3://mypath")
df.write.parquet("s3://mypath", mode="overwrite")
df.coalesce(1).write.format('csv').mode('overwrite').option("header", "false")\
.save("s3://mypath")
But all of these fail with the same error:
: java.io.IOException: No FileSystem for scheme: s3
I am new to this and really don't know what to do. Can anyone help me out?
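For reference, here is a minimal sketch of one setup that is commonly used for this, not a confirmed fix for this exact environment. It rests on several assumptions. First, the cluster is plain Apache Spark rather than EMR, so the `s3a://` connector from the `hadoop-aws` module is the one to use; "No FileSystem for scheme: s3" usually means no filesystem implementation is registered for that scheme. Second, a `hadoop-aws` jar matching the installed Hadoop version is already on the classpath (for example via `spark.jars.packages`). Third, the credentials are passed as `spark.hadoop.fs.s3a.*` options, which Spark copies into the Hadoop configuration. The helper names `s3a_options`, `to_s3a`, and `build_session` are hypothetical and exist only for illustration. In PySpark, `hadoopConfiguration` is not an attribute of `SparkContext`; the Hadoop configuration is normally reached through the internal handle `spark.sparkContext._jsc.hadoopConfiguration()`, which would explain the `AttributeError`.

```python
def s3a_options(access_key, secret_key, endpoint="s3.amazonaws.com"):
    """Return Spark config entries for the S3A connector (hypothetical helper).

    Keys prefixed with "spark.hadoop." are forwarded by Spark to the
    Hadoop configuration, so no direct hadoopConfiguration call is needed.
    """
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.endpoint": endpoint,
    }


def to_s3a(path):
    """Rewrite an s3:// or s3n:// URI to s3a://; leave other paths unchanged."""
    scheme, sep, rest = path.partition("://")
    if sep and scheme in ("s3", "s3n"):
        return "s3a://" + rest
    return path


def build_session(options, app_name="Python Spark SQL basic example"):
    """Create a SparkSession with the given config entries applied."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Example usage (assumes the hadoop-aws jar is available to Spark):
#   spark = build_session(s3a_options("my access key", "my secret key"))
#   df = spark.read.option("header", True).csv("input.csv")
#   df.write.option("header", "true").mode("overwrite").csv(to_s3a("s3://mypath"))
```

If a session already exists, `getOrCreate()` may return it without applying the new `spark.hadoop.*` options, so under the same assumptions it is safer to set them before the first session is created.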