I tried to read a CSV file that contains a nested column.
Example:
name,age,addresses_values
person_1,30,["France","street name",75000]
When reading, I tried to assign a schema as follows:
from pyspark.sql.types import StructType, StructField, StringType, LongType

csv_schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', LongType(), True),
    StructField('addresses_values', StructType([
        StructField('country', StringType(), True),
        StructField('street', StringType(), True),
        StructField('ZipCode', StringType(), True),
    ]), True),
])
path = "file:///path_to_my_file"
dataset_df = spark.read.csv(path=path, header=True, schema=csv_schema)
This exception is raised:
pyspark.sql.utils.AnalysisException: CSV data source does not support struct<country:string,street:string,ZipCode:string> data type.;
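For clarity, the transformation I am after is essentially the following (a plain-Python sketch of the parsing I hoped the struct schema would perform; `ast.literal_eval` here is just my stand-in for whatever Spark would do with that cell, not actual Spark code):

```python
import ast

# Raw cell text for the nested column, exactly as it appears in the CSV row
raw = '["France","street name",75000]'

# Parse the bracketed list and map its three positions onto the
# struct fields I declared in csv_schema
country, street, zip_code = ast.literal_eval(raw)
record = {"country": country, "street": street, "ZipCode": str(zip_code)}

print(record)
# {'country': 'France', 'street': 'street name', 'ZipCode': '75000'}
```

Is there a way to get this result directly from `spark.read.csv`, or does the nested column have to be read as a plain string and parsed afterwards?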