4

I have external tables created in AWS spectrum to query the s3 data however i am not able to identify the filenames which the record belongs to(i have thousands of files under a bucket)

In AWS Athena we have a pseudo column "$PATH" which will display the s3 filenames is there any similar ways available while using spectrum?

John Rotenstein
  • 165,783
  • 13
  • 223
  • 298
Rajeev
  • 821
  • 1
  • 11
  • 22

1 Answers1

6

Since recently, you can use specific pseudo-columns to access the path and the size of the object in S3 for lineage information.

http://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html#c-spectrum-external-tables-pseudocolumns

An example for such a query would be:

>> select distinct "$path", "$size" from spectrum.sales_part;

 $path                                 | $size
---------------------------------------+-------
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/ |  1616
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/ |  1444
s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/ |  1444
grundprinzip
  • 2,383
  • 1
  • 18
  • 33