Questions tagged [amazon-redshift-spectrum]

Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.

194 questions
27 votes • 5 answers

Athena vs Redshift Spectrum

I am evaluating Athena and Redshift Spectrum. Both serve the same purpose: Spectrum needs a Redshift cluster in place, whereas Athena is purely serverless. Athena uses Presto and Spectrum uses Redshift's engine. Are there any specific…
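
For the comparison above, a minimal sketch of the operational difference, assuming a hypothetical Glue database analytics_db with a table events and a hypothetical IAM role: Athena queries the catalog table directly, while Spectrum first attaches the catalog database to a running cluster as an external schema and queries it through the cluster's engine.

    -- Athena (serverless): query the Glue Data Catalog table directly.
    SELECT count(*) FROM analytics_db.events;

    -- Redshift Spectrum: attach the same Glue database to the cluster as an
    -- external schema, then query it via the cluster.
    CREATE EXTERNAL SCHEMA spectrum_events
    FROM DATA CATALOG
    DATABASE 'analytics_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/mySpectrumRole';

    SELECT count(*) FROM spectrum_events.events;
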
18 votes • 5 answers

AWS Glue: How to handle nested JSON with varying schemas

Objective: We're hoping to use the AWS Glue Data Catalog to create a single table for JSON data residing in an S3 bucket, which we would then query and parse via Redshift Spectrum. Background: The JSON data is from DynamoDB Streams and is deeply…
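
For the question above, a hedged sketch of one way Spectrum can expose nested JSON, assuming a fixed schema (the varying-schema part is the hard bit the question is really about); the table, column, and S3 path names are hypothetical.

    CREATE EXTERNAL TABLE spectrum.stream_events (
      id     varchar(64),
      detail struct<status:varchar(16), ts:varchar(32)>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://my-bucket/dynamodb-stream-export/';

    -- Nested struct fields are addressed with dot notation via a table alias.
    SELECT e.id, e.detail.status
    FROM spectrum.stream_events e;
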
10 votes • 5 answers

Offloading data files from Amazon Redshift to Amazon S3 in Parquet format

I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format in order to query the files on S3 using Redshift Spectrum. I have explored everywhere but I couldn't find anything about how to offload the files from…
Teja • 11,878 • 29 • 80 • 137
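
For the question above: on Redshift releases that support FORMAT AS PARQUET for UNLOAD, the export can be done natively; a minimal sketch with a hypothetical table, bucket, and IAM role. The resulting Parquet files can then be registered as a Spectrum external table or crawled by Glue.

    UNLOAD ('SELECT * FROM public.sales')
    TO 's3://my-bucket/spectrum/sales/part_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myUnloadRole'
    FORMAT AS PARQUET;
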
10 votes • 2 answers

Redshift Spectrum: Automatically partition tables by date/folder

We currently generate a daily CSV export that we upload to an S3 bucket, into the following structure: |-- reportDate |-- part0.csv.gz |-- part1.csv.gz We want to be able to run reports partitioned by daily…
GoatInTheMachine • 3,155 • 3 • 22 • 31
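
For the question above, a hedged sketch: a Spectrum external table can be declared PARTITIONED BY a date column, but each day's folder still has to be registered explicitly (for example by a scheduled job or a Glue crawler); the table, column, and bucket names are hypothetical.

    CREATE EXTERNAL TABLE spectrum.daily_report (
      col1 varchar(100),
      col2 int
    )
    PARTITIONED BY (reportdate date)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/reports/';

    -- Register one partition per daily folder; Spectrum does not discover
    -- new folders on its own.
    ALTER TABLE spectrum.daily_report
    ADD IF NOT EXISTS PARTITION (reportdate = '2019-03-01')
    LOCATION 's3://my-bucket/reports/2019-03-01/';
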
7 votes • 1 answer

Date field transformation from AWS Glue table to Redshift Spectrum external table

I am trying to transform a JSON dataset from S3, via its Glue table schema, into a Redshift Spectrum external table for data analysis. While creating the external tables, how do I transform the DATE fields? I should point out that the source data comes from MongoDB in ISODate…
SunSmiles • 186 • 9
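
For the question above, a hedged sketch: keep the ISODate value as a varchar in the external table and convert it at query time; the table, column, and S3 path names are hypothetical.

    CREATE EXTERNAL TABLE spectrum.mongo_events (
      event_id   varchar(64),
      created_at varchar(32)   -- e.g. '2018-07-10T14:03:55.000Z'
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://my-bucket/mongo-export/';

    -- Strip the ISO 8601 'T'/'Z' markers before casting to native types.
    SELECT event_id,
           TO_DATE(LEFT(created_at, 10), 'YYYY-MM-DD')                        AS created_date,
           CAST(REPLACE(REPLACE(created_at, 'T', ' '), 'Z', '') AS timestamp) AS created_ts
    FROM spectrum.mongo_events;
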
7 votes • 5 answers

How to escape single quotes in Unload

conn_string = "dbname='{}' port='{}' user='{}' password='{}' host='{}'"\ .format(dbname,port,user,password,host_url) sql="""UNLOAD ('select col1,col2 from %s.visitation_hourly_summary_us where col4= '2018-07-10' and col5=…
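
For the question above, the usual fix is to double every single quote inside the string that UNLOAD receives, so the literal survives one level of quoting; a sketch using the columns from the excerpt with a hypothetical schema, bucket, and role.

    UNLOAD ('SELECT col1, col2
             FROM myschema.visitation_hourly_summary_us
             WHERE col4 = ''2018-07-10''')
    TO 's3://my-bucket/unload/visitation_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myUnloadRole'
    DELIMITER AS ','
    GZIP;
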
7 votes • 1 answer

Use an external Redshift Spectrum table defined in the Glue Data Catalog

I have a table defined in the Glue Data Catalog that I can query using Athena. As there is some data in the table that I want to use with other Redshift tables, can I access the table defined in the Glue Data Catalog from Redshift? What will be the create external table…
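
For the question above, a minimal sketch: the Glue database that Athena already queries can be attached to the cluster as an external schema, after which its tables can be joined with regular Redshift tables; the database, table, and role names are hypothetical.

    CREATE EXTERNAL SCHEMA glue_schema
    FROM DATA CATALOG
    DATABASE 'my_glue_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/mySpectrumRole';

    -- No separate CREATE EXTERNAL TABLE is needed; the catalog definition is reused.
    SELECT g.*, c.customer_name
    FROM glue_schema.my_table AS g
    JOIN public.customers     AS c ON c.id = g.customer_id;
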
5 votes • 2 answers

Is there a way to describe an external/Spectrum table via Redshift?

In AWS Athena you can write SHOW CREATE TABLE my_table_name; and see a SQL-like query that describes how to build the table's schema. It works for tables whose schemas are defined in AWS Glue. This is very useful for creating tables in a regular…
New Alexandria • 6,369 • 4 • 50 • 71
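
For the question above: Redshift has no SHOW CREATE TABLE, but the SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS system views expose the same definition; a sketch assuming the external table is spectrum.my_table.

    SELECT columnname, external_type, columnnum, part_key
    FROM svv_external_columns
    WHERE schemaname = 'spectrum' AND tablename = 'my_table'
    ORDER BY columnnum;

    SELECT location, input_format, output_format, serialization_lib, serde_parameters
    FROM svv_external_tables
    WHERE schemaname = 'spectrum' AND tablename = 'my_table';
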
5 votes • 0 answers

Serde serialization lib is null when the Glue crawler crawls a Redshift table

I tried to create a Glue crawler which crawls a Redshift table. The Glue crawler executes successfully and creates an external table. But when I look at the metadata of the table, I find that "Input format", "Output format", "Serde name" and "Serde…
5 votes • 2 answers

Load Parquet files into Redshift

I have a bunch of Parquet files on S3, and I want to load them into Redshift in the most optimal way. Each file is split into multiple chunks… What is the most optimal way to load data from S3 into Redshift? Also, how do you create the target table…
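
For the question above, a hedged sketch: COPY can read Parquet directly on releases that support FORMAT AS PARQUET; the target table has to exist first, and its columns are matched to the Parquet columns by position. The table, columns, keys, and paths are hypothetical.

    CREATE TABLE public.events (
      event_id   bigint,
      event_time timestamp,
      payload    varchar(4096)
    )
    DISTKEY (event_id)
    SORTKEY (event_time);

    COPY public.events
    FROM 's3://my-bucket/parquet/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myCopyRole'
    FORMAT AS PARQUET;
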
5 votes • 1 answer

Unload multiple files from Redshift to S3

Hi, I am trying to unload multiple tables from Redshift to a particular S3 bucket and am getting the error below: psycopg2.InternalError: Specified unload destination on S3 is not empty. Consider using a different bucket / prefix, manually removing the target…
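
For the error above, the two usual fixes are to give each table its own key prefix or to add ALLOWOVERWRITE so an existing prefix can be reused; a sketch with hypothetical table, bucket, and role names.

    UNLOAD ('SELECT * FROM public.table_a')
    TO 's3://my-bucket/exports/table_a/part_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myUnloadRole'
    ALLOWOVERWRITE
    GZIP;
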
5 votes • 1 answer

What are the steps to use Redshift Spectrum?

Currently I am using Amazon Redshift as well as Amazon S3 to store data. Now I want to use Spectrum to improve performance, but I am confused about how to use it properly. If I am using SQL Workbench, can I create the external schema from it, or do I need to create…
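
For the question above, a minimal sketch of the setup, which can be run from SQL Workbench or any other SQL client connected to the cluster; the role, schema, table, and paths are hypothetical.

    -- 1. Map an external schema to a Glue Data Catalog database
    --    (created on the fly if it does not exist yet).
    CREATE EXTERNAL SCHEMA spectrum_schema
    FROM DATA CATALOG
    DATABASE 'spectrum_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/mySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;

    -- 2. Define an external table over the files already sitting in S3.
    CREATE EXTERNAL TABLE spectrum_schema.sales (
      sale_id   int,
      sale_date date,
      amount    decimal(10,2)
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/sales/';

    -- 3. Query it like any other table, joins with local tables included.
    SELECT sale_date, SUM(amount) FROM spectrum_schema.sales GROUP BY sale_date;
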
4 votes • 0 answers

Quote escaped quotes in Redshift external tables

I'm trying to create an external table in Redshift from a CSV that has quote-escaped quotes in it, as documented in RFC 4180: If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it…
Tom Rea • 41 • 2
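
For the question above, a hedged sketch: the plain delimited format does not understand RFC 4180 quoting, but Spectrum also accepts the OpenCSV serde, which handles quoted fields (including doubled quotes inside them); the column names and S3 path are hypothetical.

    CREATE EXTERNAL TABLE spectrum.quoted_csv (
      id    varchar(20),
      notes varchar(1024)
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      'separatorChar' = ',',
      'quoteChar'     = '"'
    )
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/quoted-csv/';
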
4 votes • 0 answers

AWS Glue skipping folder

I have a process that stores data to S3, transforms the data and converts the data to Parquet, to be queried through Redshift Spectrum. I have a Glue crawler that crawls my dataset, and I use three partitions: year, month, day. All my files are…
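
For the question above, a hedged workaround sketch: a folder the crawler missed can be registered by hand against the Spectrum external table; the table name and bucket are hypothetical, and the year/month/day partition keys come from the question.

    ALTER TABLE spectrum.my_dataset
    ADD IF NOT EXISTS PARTITION (year = '2019', month = '03', day = '07')
    LOCATION 's3://my-bucket/data/year=2019/month=03/day=07/';
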
4 votes • 2 answers

Skipping header rows in AWS Redshift External Tables

I have a file in S3 with the following data: name,age,gender jill,30,f jack,32,m And a Redshift external table to query that data using Spectrum: create external table spectrum.customers ( "name" varchar(50), "age" int, "gender" varchar(1)) row…
fez • 1,548 • 2 • 19 • 28
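
For the question above, a hedged sketch: on releases that support the skip.header.line.count table property, the header row can be skipped in the external table definition itself; the S3 path is hypothetical and the columns come from the excerpt.

    CREATE EXTERNAL TABLE spectrum.customers (
      "name"   varchar(50),
      "age"    int,
      "gender" varchar(1)
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/customers/'
    TABLE PROPERTIES ('skip.header.line.count' = '1');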