I know Redshift and Redshift Spectrum don't support nested types, but is there any trick to bypass that limitation and query our nested data in S3 with Redshift Spectrum? In this post the author shows how to do it for JSON files, but the same approach doesn't work for Parquet. Is there any other trick that can be applied to Parquet files?
The actual schema is something like this (extracted by the AWS Glue crawler):
CREATE EXTERNAL TABLE `parquet_nested`(
`event_time` string,
`event_id` string,
`user` struct<ip_address:string,id:string,country:string>,
`device` struct<platform:string,device_id:string,user_agent:string>
)
PARTITIONED BY (
`partition_0` string,
`partition_1` string,
`partition_2` string,
`partition_3` string,
`partition_4` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://...'
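For reference, this is the kind of query I would like to be able to run against those struct columns (the external schema name `spectrum_schema` and the partition value are placeholders; `user` is double-quoted because it is a reserved word in Redshift):

```sql
-- Desired query: project individual fields out of the struct columns.
-- "user" is double-quoted because it is a reserved word in Redshift.
SELECT
    event_time,
    event_id,
    "user".ip_address,
    "user".country,
    device.platform
FROM spectrum_schema.parquet_nested   -- placeholder external schema name
WHERE partition_0 = '2019';           -- placeholder partition value
```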