Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets.Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.
Questions tagged [amazon-redshift-spectrum]
194 questions
27
votes
5 answers
Athena vs Redshift Spectrum
I am kind of evaluating Athena & Redshift Spectrum. Both serve the same purpose, Spectrum needs a Redshift cluster in place whereas Athena is pure serverless. Athena uses Presto and Spectrum uses its Redshift's engine
Are there any specific…
![](../../users/profiles/347494.webp)
Mukund
- 808
- 1
- 9
- 17
18
votes
5 answers
AWS Glue: How to handle nested JSON with varying schemas
Objective:
We're hoping to use the AWS Glue Data Catalog to create a single table for JSON data residing in an S3 bucket, which we would then query and parse via Redshift Spectrum.
Background:
The JSON data is from DynamoDB Streams and is deeply…
![](../../users/profiles/6582001.webp)
ehelander
- 183
- 1
- 1
- 5
10
votes
5 answers
Offloading data files from Amazon Redshift to Amazon S3 in Parquet format
I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format inorder to query the files on S3 using Redshift Spectrum. I have explored every where but I couldn't find anything about how to offload the files from…
![](../../users/profiles/1049893.webp)
Teja
- 11,878
- 29
- 80
- 137
10
votes
2 answers
Redshift Spectrum: Automatically partition tables by date/folder
We currently generate a daily CSV export that we upload to an S3 bucket, into the following structure:
|--reportDate-
|-- part0.csv.gz
|-- part1.csv.gz
We want to be able to run reports partitioned by daily…
![](../../users/profiles/1324393.webp)
GoatInTheMachine
- 3,155
- 3
- 22
- 31
7
votes
1 answer
Date fields transformation from AWS Glue table to RedShift Spectrum external table
I am trying to transform the JSON dataset from S3 to Glue table schema into an Redshift spectrum for data analysis. While creating external tables, how to transform the DATE fields?
Need to highlight the source data is coming from MongoDB in ISODate…
![](../../users/profiles/2212199.webp)
SunSmiles
- 186
- 9
7
votes
5 answers
How to escape single quotes in Unload
conn_string = "dbname='{}' port='{}' user='{}' password='{}' host='{}'"\
.format(dbname,port,user,password,host_url)
sql="""UNLOAD ('select col1,col2 from %s.visitation_hourly_summary_us where col4= '2018-07-10' and col5=…
![](../../users/profiles/8224578.webp)
Mukesh Marimuthu
- 85
- 1
- 5
7
votes
1 answer
Use external table redshift spectrum defined in glue data catalog
I have a table defined in Glue data catalog that I can query using Athena. As there is some data in the table that I want to use with other Redshift tables, can I access the table defined in Glue data catalog?
What will be the create external table…
![](../../users/profiles/8376776.webp)
Abhay Dubey
- 419
- 4
- 14
5
votes
2 answers
Is there a way to describe an external/spectrum table via redshift?
In AWS Athena you can write
SHOW CREATE TABLE my_table_name;
and see a SQL-like query that describes how to build the table's schema. It works for tables whose schema are defined in AWS Glue. This is very useful for creating tables in a regular…
![](../../users/profiles/263858.webp)
New Alexandria
- 6,369
- 4
- 50
- 71
5
votes
0 answers
Serde serialization lib is null when the glue crawler crawls redshift table
I tried to create a glue crawler which crawls a redshift table.The glue crawler executes successfully and creates an external table.But when I look at the metadata of the table I found "Input format","Output format","Serde name" and "Serde…
![](../../users/profiles/3533136.webp)
trp
- 384
- 1
- 5
- 15
5
votes
2 answers
Load Parquet files into Redshift
I have a bunch of Parquet files on S3, i want to load them into redshift in most optimal way.
Each file is split into multiple chunks......what is the most optimal way to load data from S3 into Redshift?
Also, how do you create the target table…
![](../../users/profiles/4379237.webp)
Richard
- 291
- 1
- 4
- 19
5
votes
1 answer
Unload multiple files from Redshift to S3
Hi I am trying to unload multiple tables from Redshift to a particular S3 bucket getting below error:
psycopg2.InternalError: Specified unload destination on S3 is not empty. Consider using a different bucket / prefix, manually removing the target…
![](../../users/profiles/7422976.webp)
Chandana Puppy
- 133
- 1
- 8
5
votes
1 answer
What are the steps to use Redshift Spectrum.?
Currently I am using Amazon Redshift as well as Amazon S3 to store data. Now I want to use Spectrum to improve performance but confused in how to use it properly.
If I am using SQL workbench can I create external schema from same or I need to create…
![](../../users/profiles/6024996.webp)
Pratik Rawlekar
- 277
- 4
- 12
4
votes
0 answers
Quote escaped quotes in Redshift external tables
I'm trying to create an external table in Redshift from a csv that has quote escaped quotes in it, as documented in rfc4180:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it…
![](../../users/profiles/4116921.webp)
Tom Rea
- 41
- 2
4
votes
0 answers
AWS Glue skipping folder
I have a process that stores data to S3, transforms the data and converts the data to Parquet, to be queried through Redshift Spectrum. I have a Glue crawler that crawls my dataset, and I use three partitions: year, month, day. All my files are…
![](../../users/profiles/5759828.webp)
Jørgen Frøland
- 204
- 1
- 10
4
votes
2 answers
Skipping header rows in AWS Redshift External Tables
I have a file in S3 with the following data:
name,age,gender
jill,30,f
jack,32,m
And a redshift external table to query that data using spectrum:
create external table spectrum.customers (
"name" varchar(50),
"age" int,
"gender" varchar(1))
row…
![](../../users/profiles/2963295.webp)
fez
- 1,548
- 2
- 19
- 28