
We are building an ETL pipeline with AWS Glue, and to optimise query performance we store the data in Apache Parquet format. Once the data is saved to S3 as Parquet, we use AWS Spectrum to query it.

We successfully tested the entire stack on our development AWS account, but when we moved to our production AWS account we hit a strange problem: the query returns rows, but the data in them is blank.

A count query, however, returns the expected number of rows.

On further investigation we found that the Apache Parquet files in the development AWS account are RLE encoded, while the files in the production AWS account are BITPACKED encoded. To test this theory, I want to convert BITPACKED to RLE and see whether I can then query the data.

I am fairly new to Parquet files and couldn't find much help on converting between encodings. Can anybody point me to a way of doing it?

Currently our prime suspect is the different encoding, but if you can think of any other cause, I will be happy to explore the possibilities.

jimy

1 Answer


We found our configuration mistake: the column names of our external tables and the ones specified in AWS Glue were inconsistent. We fixed that and are now able to view the data. A shortfall on the AWS Spectrum side is that it does not give an appropriate error message for this case.
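Since Spectrum resolves Parquet columns to external-table columns by name (and Redshift identifiers are case-insensitive), a mismatched name silently comes back as NULL rather than raising an error. A small sanity check for this class of problem could look like the sketch below (a hypothetical helper, not part of any AWS API):

```python
def find_column_mismatches(parquet_columns, external_table_columns):
    """Compare Parquet column names against an external table definition.

    Names present on only one side are the ones that would be read back
    as NULL by name-based column mapping. Comparison is case-insensitive,
    mirroring Redshift identifier handling.
    """
    pq_names = {c.lower() for c in parquet_columns}
    ext_names = {c.lower() for c in external_table_columns}
    return {
        "only_in_parquet": sorted(pq_names - ext_names),
        "only_in_table": sorted(ext_names - pq_names),
    }
```

Running this against the schema reported by the Glue crawler and the external table DDL would have flagged our inconsistency immediately.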
