Redshift Spectrum requires an external data catalog that contains the definition of the table. It is this data catalog that contains the reference to the files in S3, rather than the external table definition in Redshift. This data catalog can be defined in Elastic MapReduce as a Hive Catalog (good if you have an existing EMR deployment) or in Athena (good if you don't have EMR or don't want to get into managing Hadoop). The Athena route can be managed fully by Redshift, if you wish.
It looks to me like your issue is one of four things. Either:
- Your Redshift cluster is not in an AWS region that currently supports Athena and Spectrum.
- Your Redshift cluster version doesn't support Spectrum yet (1.0.1294 or later).
- Your IAM policies don't allow Redshift control over Athena.
- You're not using the
CREATE EXTERNAL DATABASE IF NOT EXISTS
parameter on your CREATE EXTERNAL SCHEMA
statement.
To allow Redshift to manage Athena you'll need to attach an IAM policy to your Redshift cluster that allows it Full Control over Athena, as well as Read access to the S3 bucket containing your data.
Once that's in place, you can create your external schema as you have been already, ensuring that the CREATE EXTERNAL DATABASE IF NOT EXISTS
argument is also passed. This makes sure that the external database is created in Athena if you don't have a pre-existing configuration: http://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum-create-external-table.html
Finally, run your CREATE EXTERNAL TABLE
statement, which will transparently create the table metadata in the Athena data catalog: http://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html