We are building an ETL pipeline with AWS Glue. To optimise query performance, we store the data in Apache Parquet format. Once the data is saved to S3 as Parquet, we query it with Amazon Redshift Spectrum.
We successfully tested the entire stack in our development AWS account, but after moving to our production AWS account we are stuck with a weird problem: when we query, rows are returned, but the data in them is blank.
A count query, however, returns the expected number of rows.
On further investigation, we found that the Parquet files in the development account are RLE encoded, while the files in the production account are BITPACKED encoded. To strengthen this case, I want to convert a BITPACKED file to RLE and see whether I can then query the data.
I am pretty new to Parquet files and couldn't find much help on converting between encodings. Can anybody suggest ways of doing it?
Currently our prime suspect is the difference in encoding, but if you can think of any other possible cause, I will be happy to explore it.