Skipping header rows in AWS Redshift External Tables

Question

I have a file in S3 with the following data:

name,age,gender
jill,30,f
jack,32,m

And a redshift external table to query that data using spectrum:

create external table spectrum.customers ( 
 "name" varchar(50),
 "age" int,
 "gender" varchar(1))
row format delimited
fields terminated by ','
lines terminated by \n'
stored as textfile 
location 's3://...';

When querying the data I get the following result:

select * from spectrum.customers;
name,age,g
jill,30,f
jack,32,m

Is there an elegant way to skip the header row as part of the external table definition, similar to the tblproperties ("skip.header.line.count"="1") option in Hive? Or is my only option (at least for now) to filter out the header rows as part of the select statement?

score 13 · Accepted Answer · answered Dec 11 '17 at 15:43

Answered this in: How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.

This works in Redshift:

You want to use table properties ('skip.header.line.count'='1') Along with other properties if you want, e.g. 'numRows'='100'. Here's a sample:

create external table exreddb1.test_table
(ID BIGINT 
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');

Here's AWS Redshift SQL documentation on "Create External Table", http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html — TheWalkingData, Dec 11 '17 at 15:59

score 3 · Answer 2 · answered Jul 13 '17 at 05:03

3

Currently, AWS Redshift Spectrum does not support skipping header rows. If you can, you could raise a support issue that would allow tracking the availability of this feature.

It would be possible to forward this request to the development team for consideration.

answered Jul 13 '17 at 05:03

grundprinzip

2,383
1
18
33

Have a look at @TheWalkingData's answer - it is possible now (was probably added in the mean time). – botchniaque Apr 10 '18 at 14:49

Skipping header rows in AWS Redshift External Tables

2 Answers2