Questions tagged [amazon-redshift-spectrum]

Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.

194 questions
27 votes • 5 answers

Athena vs Redshift Spectrum

I am evaluating Athena and Redshift Spectrum. Both serve the same purpose: Spectrum needs a Redshift cluster in place, whereas Athena is purely serverless. Athena uses Presto and Spectrum uses Redshift's engine. Are there any specific…
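
For the comparison above, a minimal sketch of the operational difference, assuming a hypothetical Glue database analytics_db with a table events and a hypothetical IAM role: Athena queries the catalog table directly, while Spectrum first attaches the catalog database to a running cluster as an external schema and queries it through the cluster's engine.

    -- Athena (serverless): query the Glue Data Catalog table directly.
    SELECT count(*) FROM analytics_db.events;

    -- Redshift Spectrum: attach the same Glue database to the cluster as an
    -- external schema, then query it via the cluster.
    CREATE EXTERNAL SCHEMA spectrum_events
    FROM DATA CATALOG
    DATABASE 'analytics_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/mySpectrumRole';

    SELECT count(*) FROM spectrum_events.events;
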
18 votes • 5 answers

AWS Glue: How to handle nested JSON with varying schemas

Objective: We're hoping to use the AWS Glue Data Catalog to create a single table for JSON data residing in an S3 bucket, which we would then query and parse via Redshift Spectrum. Background: The JSON data is from DynamoDB Streams and is deeply…
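
For the question above, a hedged sketch of one way Spectrum can expose nested JSON, assuming a fixed schema (the varying-schema part is the hard bit the question is really about); the table, column, and S3 path names are hypothetical.

    CREATE EXTERNAL TABLE spectrum.stream_events (
      id     varchar(64),
      detail struct<status:varchar(16), ts:varchar(32)>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://my-bucket/dynamodb-stream-export/';

    -- Nested struct fields are addressed with dot notation via a table alias.
    SELECT e.id, e.detail.status
    FROM spectrum.stream_events e;
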
10 votes • 5 answers

Offloading data files from Amazon Redshift to Amazon S3 in Parquet format

I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format in order to query the files on S3 using Redshift Spectrum. I have explored everywhere but I couldn't find anything about how to offload the files from…
Teja • 11,878 • 29 • 80 • 137
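
For the question above: on Redshift releases that support FORMAT AS PARQUET for UNLOAD, the export can be done natively; a minimal sketch with a hypothetical table, bucket, and IAM role. The resulting Parquet files can then be registered as a Spectrum external table or crawled by Glue.

    UNLOAD ('SELECT * FROM public.sales')
    TO 's3://my-bucket/spectrum/sales/part_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myUnloadRole'
    FORMAT AS PARQUET;
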
10 votes • 2 answers

Redshift Spectrum: Automatically partition tables by date/folder

We currently generate a daily CSV export that we upload to an S3 bucket, into the following structure: |-- reportDate |-- part0.csv.gz |-- part1.csv.gz We want to be able to run reports partitioned by daily…
GoatInTheMachine • 3,155 • 3 • 22 • 31
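
For the question above, a hedged sketch: a Spectrum external table can be declared PARTITIONED BY a date column, but each day's folder still has to be registered explicitly (for example by a scheduled job or a Glue crawler); the table, column, and bucket names are hypothetical.

    CREATE EXTERNAL TABLE spectrum.daily_report (
      col1 varchar(100),
      col2 int
    )
    PARTITIONED BY (reportdate date)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/reports/';

    -- Register one partition per daily folder; Spectrum does not discover
    -- new folders on its own.
    ALTER TABLE spectrum.daily_report
    ADD IF NOT EXISTS PARTITION (reportdate = '2019-03-01')
    LOCATION 's3://my-bucket/reports/2019-03-01/';
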
7 votes • 1 answer

Date field transformation from AWS Glue table to Redshift Spectrum external table

I am trying to transform a JSON dataset from S3, via its Glue table schema, into a Redshift Spectrum external table for data analysis. While creating the external tables, how do I transform the DATE fields? I should point out that the source data comes from MongoDB in ISODate…
SunSmiles • 186 • 9
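
For the question above, a hedged sketch: keep the ISODate value as a varchar in the external table and convert it at query time; the table, column, and S3 path names are hypothetical.

    CREATE EXTERNAL TABLE spectrum.mongo_events (
      event_id   varchar(64),
      created_at varchar(32)   -- e.g. '2018-07-10T14:03:55.000Z'
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://my-bucket/mongo-export/';

    -- Strip the ISO 8601 'T'/'Z' markers before casting to native types.
    SELECT event_id,
           TO_DATE(LEFT(created_at, 10), 'YYYY-MM-DD')                        AS created_date,
           CAST(REPLACE(REPLACE(created_at, 'T', ' '), 'Z', '') AS timestamp) AS created_ts
    FROM spectrum.mongo_events;
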
7 votes • 5 answers

How to escape single quotes in Unload

conn_string = "dbname='{}' port='{}' user='{}' password='{}' host='{}'"\ .format(dbname,port,user,password,host_url) sql="""UNLOAD ('select col1,col2 from %s.visitation_hourly_summary_us where col4= '2018-07-10' and col5=…
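
For the question above, the usual fix is to double every single quote inside the string that UNLOAD receives, so the literal survives one level of quoting; a sketch using the columns from the excerpt with a hypothetical schema, bucket, and role.

    UNLOAD ('SELECT col1, col2
             FROM myschema.visitation_hourly_summary_us
             WHERE col4 = ''2018-07-10''')
    TO 's3://my-bucket/unload/visitation_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myUnloadRole'
    DELIMITER AS ','
    GZIP;
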
7 votes • 1 answer

Use an external Redshift Spectrum table defined in the Glue Data Catalog

I have a table defined in the Glue Data Catalog that I can query using Athena. As there is some data in the table that I want to use with other Redshift tables, can I access the table defined in the Glue Data Catalog from Redshift? What will be the create external table…
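
For the question above, a minimal sketch: the Glue database that Athena already queries can be attached to the cluster as an external schema, after which its tables can be joined with regular Redshift tables; the database, table, and role names are hypothetical.

    CREATE EXTERNAL SCHEMA glue_schema
    FROM DATA CATALOG
    DATABASE 'my_glue_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/mySpectrumRole';

    -- No separate CREATE EXTERNAL TABLE is needed; the catalog definition is reused.
    SELECT g.*, c.customer_name
    FROM glue_schema.my_table AS g
    JOIN public.customers     AS c ON c.id = g.customer_id;
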
5 votes • 2 answers

Is there a way to describe an external/Spectrum table via Redshift?

In AWS Athena you can write SHOW CREATE TABLE my_table_name; and see a SQL-like query that describes how to build the table's schema. It works for tables whose schemas are defined in AWS Glue. This is very useful for creating tables in a regular…
New Alexandria • 6,369 • 4 • 50 • 71
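
For the question above: Redshift has no SHOW CREATE TABLE, but the SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS system views expose the same definition; a sketch assuming the external table is spectrum.my_table.

    SELECT columnname, external_type, columnnum, part_key
    FROM svv_external_columns
    WHERE schemaname = 'spectrum' AND tablename = 'my_table'
    ORDER BY columnnum;

    SELECT location, input_format, output_format, serialization_lib, serde_parameters
    FROM svv_external_tables
    WHERE schemaname = 'spectrum' AND tablename = 'my_table';
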
5 votes • 0 answers

Serde serialization lib is null when the Glue crawler crawls a Redshift table

I tried to create a Glue crawler which crawls a Redshift table. The Glue crawler executes successfully and creates an external table. But when I look at the metadata of the table, I find that "Input format", "Output format", "Serde name" and "Serde…
5 votes • 2 answers

Load Parquet files into Redshift

I have a bunch of Parquet files on S3, and I want to load them into Redshift in the most optimal way. Each file is split into multiple chunks… What is the most optimal way to load data from S3 into Redshift? Also, how do you create the target table…
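
For the question above, a hedged sketch: COPY can read Parquet directly on releases that support FORMAT AS PARQUET; the target table has to exist first, and its columns are matched to the Parquet columns by position. The table, columns, keys, and paths are hypothetical.

    CREATE TABLE public.events (
      event_id   bigint,
      event_time timestamp,
      payload    varchar(4096)
    )
    DISTKEY (event_id)
    SORTKEY (event_time);

    COPY public.events
    FROM 's3://my-bucket/parquet/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myCopyRole'
    FORMAT AS PARQUET;
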
5 votes • 1 answer

Unload multiple files from Redshift to S3

Hi, I am trying to unload multiple tables from Redshift to a particular S3 bucket and am getting the error below: psycopg2.InternalError: Specified unload destination on S3 is not empty. Consider using a different bucket / prefix, manually removing the target…
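
For the error above, the two usual fixes are to give each table its own key prefix or to add ALLOWOVERWRITE so an existing prefix can be reused; a sketch with hypothetical table, bucket, and role names.

    UNLOAD ('SELECT * FROM public.table_a')
    TO 's3://my-bucket/exports/table_a/part_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myUnloadRole'
    ALLOWOVERWRITE
    GZIP;
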
5 votes • 1 answer

What are the steps to use Redshift Spectrum?

Currently I am using Amazon Redshift as well as Amazon S3 to store data. Now I want to use Spectrum to improve performance, but I am confused about how to use it properly. If I am using SQL Workbench, can I create the external schema from it, or do I need to create…
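
For the question above, a minimal sketch of the setup, which can be run from SQL Workbench or any other SQL client connected to the cluster; the role, schema, table, and paths are hypothetical.

    -- 1. Map an external schema to a Glue Data Catalog database
    --    (created on the fly if it does not exist yet).
    CREATE EXTERNAL SCHEMA spectrum_schema
    FROM DATA CATALOG
    DATABASE 'spectrum_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/mySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;

    -- 2. Define an external table over the files already sitting in S3.
    CREATE EXTERNAL TABLE spectrum_schema.sales (
      sale_id   int,
      sale_date date,
      amount    decimal(10,2)
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/sales/';

    -- 3. Query it like any other table, joins with local tables included.
    SELECT sale_date, SUM(amount) FROM spectrum_schema.sales GROUP BY sale_date;
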
4 votes • 0 answers

Quote escaped quotes in Redshift external tables

I'm trying to create an external table in Redshift from a CSV that has quote-escaped quotes in it, as documented in RFC 4180: If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it…
Tom Rea • 41 • 2
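
For the question above, a hedged sketch: the plain delimited format does not understand RFC 4180 quoting, but Spectrum also accepts the OpenCSV serde, which handles quoted fields (including doubled quotes inside them); the column names and S3 path are hypothetical.

    CREATE EXTERNAL TABLE spectrum.quoted_csv (
      id    varchar(20),
      notes varchar(1024)
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      'separatorChar' = ',',
      'quoteChar'     = '"'
    )
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/quoted-csv/';
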
4 votes • 0 answers

AWS Glue skipping folder

I have a process that stores data to S3, transforms the data and converts the data to Parquet, to be queried through Redshift Spectrum. I have a Glue crawler that crawls my dataset, and I use three partitions: year, month, day. All my files are…
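
For the question above, a hedged workaround sketch: a folder the crawler missed can be registered by hand against the Spectrum external table; the table name and bucket are hypothetical, and the year/month/day partition keys come from the question.

    ALTER TABLE spectrum.my_dataset
    ADD IF NOT EXISTS PARTITION (year = '2019', month = '03', day = '07')
    LOCATION 's3://my-bucket/data/year=2019/month=03/day=07/';
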
4 votes • 2 answers

Skipping header rows in AWS Redshift External Tables

I have a file in S3 with the following data: name,age,gender jill,30,f jack,32,m And a Redshift external table to query that data using Spectrum: create external table spectrum.customers ( "name" varchar(50), "age" int, "gender" varchar(1)) row…
fez • 1,548 • 2 • 19 • 28
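
For the question above, a hedged sketch: on releases that support the skip.header.line.count table property, the header row can be skipped in the external table definition itself; the S3 path is hypothetical and the columns come from the excerpt.

    CREATE EXTERNAL TABLE spectrum.customers (
      "name"   varchar(50),
      "age"    int,
      "gender" varchar(1)
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 's3://my-bucket/customers/'
    TABLE PROPERTIES ('skip.header.line.count' = '1');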