
I have a lot of legacy Pig scripts that run on an on-prem cluster. We are trying to move to AWS Data Pipeline (PigActivity) and want these Pig scripts to read data from the S3 buckets where my source data will reside. The on-prem Pig scripts use the HCatalog loader to read the Hive table schema. So, if I create Athena tables on those S3 buckets, is there a way to read the schema from those Athena tables inside the Pig scripts, using some sort of loader similar to HCatLoader?
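For context, this is roughly how the on-prem scripts load a Hive table today. It is a simplified sketch: the table name is illustrative, the HCatLoader package name depends on the Hive version (older releases used org.apache.hcatalog.pig.HCatLoader), and the script has to be launched with `pig -useHCatalog` so the HCatalog jars and metastore configuration are on the classpath.

```pig
-- Launch with: pig -useHCatalog script.pig
-- HCatLoader takes the column names and types from the Hive metastore,
-- so no AS clause is needed. 'database_name.abc' is an illustrative table name.
inp_data = LOAD 'database_name.abc' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Print the schema that HCatLoader derived from the metastore
DESCRIBE inp_data;
```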

Current: the code below works, but I have to define the schema inside the Pig script:

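%default SOURCE_LOC 's3://s3bucket/input/abc'
-- Fields are delimited by Ctrl-A ('\001'), Hive's default field delimiter.
-- Pig has no 'bigint' type; a Hive/Athena bigint maps to Pig's long.
inp_data = LOAD '$SOURCE_LOC' USING PigStorage('\001') AS
(id: long, val_id: int, provision: chararray);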

Want: read from an Athena table instead

Athena table: database_name.abc (schema: id bigint, val_id int, provision string)

So I am looking for something like the following, so that I do not have to define the schema inside the Pig script:

%default SOURCE_LOC 'database_name.abc'
-- athenaloader() is a placeholder for the kind of loader I am looking for
inp_data = LOAD '$SOURCE_LOC' USING athenaloader();

Is there a loader utility that can read Athena tables, or is there an alternative solution for my need? Please help.

manojd7sto
