4

I have a file in S3 with the following data:

name,age,gender
jill,30,f
jack,32,m

And a redshift external table to query that data using spectrum:

create external table spectrum.customers ( 
 "name" varchar(50),
 "age" int,
 "gender" varchar(1))
row format delimited
fields terminated by ','
lines terminated by \n'
stored as textfile 
location 's3://...';

When querying the data I get the following result:

select * from spectrum.customers;
name,age,g
jill,30,f
jack,32,m

Is there an elegant way to skip the header row as part of the external table definition, similar to the tblproperties ("skip.header.line.count"="1") option in Hive? Or is my only option (at least for now) to filter out the header rows as part of the select statement?

John Rotenstein
  • 165,783
  • 13
  • 223
  • 298
fez
  • 1,548
  • 2
  • 19
  • 28

2 Answers2

13

Answered this in: How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.

This works in Redshift:

You want to use table properties ('skip.header.line.count'='1') Along with other properties if you want, e.g. 'numRows'='100'. Here's a sample:

create external table exreddb1.test_table
(ID BIGINT 
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');
TheWalkingData
  • 727
  • 1
  • 8
  • 11
  • 1
    Here's AWS Redshift SQL documentation on "Create External Table", http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html – TheWalkingData Dec 11 '17 at 15:59
3

Currently, AWS Redshift Spectrum does not support skipping header rows. If you can, you could raise a support issue that would allow tracking the availability of this feature.

It would be possible to forward this request to the development team for consideration.

grundprinzip
  • 2,383
  • 1
  • 18
  • 33