Questions tagged [amazon-redshift-spectrum]

Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.

194 questions
4 votes, 2 answers

Remove double quotes " while loading data to Amazon Redshift Spectrum

I want to load data into an Amazon Redshift external table. The data is in CSV format and has quotes. Is there something like the REMOVEQUOTES option of the COPY command for Redshift external tables? Also, what are the different options to load fixed length…
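
CREATE EXTERNAL TABLE does not take COPY's REMOVEQUOTES parameter, but an external table can declare ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde', which parses quoted CSV fields. If the files must instead stay readable by the default delimited format, one swapped-in workaround is to rewrite them without quotes before uploading to S3. The Python sketch below shows that preprocessing step; the function name strip_csv_quotes and the "|" output delimiter are illustrative assumptions, not part of any AWS API.

```python
import csv
import io


def strip_csv_quotes(text: str, out_delimiter: str = "|") -> str:
    """Parse quoted CSV and re-emit it unquoted, joined by out_delimiter.

    Hypothetical helper. Raises ValueError for any field containing the
    output delimiter or a line break, because an unquoted file could not
    represent it unambiguously.
    """
    out_lines = []
    # csv.reader undoes the quoting, including doubled "" escapes inside fields.
    for row in csv.reader(io.StringIO(text)):
        for field in row:
            if out_delimiter in field or "\n" in field or "\r" in field:
                raise ValueError(f"field cannot be written unquoted: {field!r}")
        out_lines.append(out_delimiter.join(row))
    # Terminate every output line with a newline.
    return "".join(line + "\n" for line in out_lines)


if __name__ == "__main__":
    sample = 'id,name,city\n1,"Smith, J","New York"\n2,"Doe","Boston"\n'
    print(strip_csv_quotes(sample), end="")
```

Fields that contain the output delimiter or a line break are rejected rather than silently corrupted; for data like that, the OpenCSVSerde route is the safer choice.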
4 votes, 1 answer

AWS Redshift Spectrum: how to get the S3 filenames in the external table

I have external tables created in AWS Spectrum to query the S3 data; however, I am not able to identify the file each record belongs to (I have thousands of files under a bucket). In AWS Athena we have a pseudo column "$PATH" which will…
3 votes, 2 answers

AWS Redshift: FATAL: connection limit "500" exceeded for non-bootstrap users

Hope you're all okay. We hit this limit quite often. We know there is no way to raise the limit of 500 concurrent user connections in Redshift. We also know certain views (pg_user_info) provide information about each user's actual limit. We are looking for…
geekjimbo
3 votes, 1 answer

Query Hive view with Redshift Spectrum

I'm trying to query a Hive view with Redshift Spectrum but it gives me this error: SQL Error [500310] [XX000]: [Amazon](500310) Invalid operation: Assert Details: ----------------------------------------------- error: Assert code: 1000 …
Pierre
3 votes, 1 answer

Redshift Spectrum: how to import only certain files

When using Redshift Spectrum, it seems you can only import data by providing a location down to a folder, and it imports all the files inside that folder. Is there a way to import only one file from a folder with many files? When providing…
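
According to the CREATE EXTERNAL TABLE documentation, LOCATION can point either at a folder (key prefix) or at a single manifest file that lists the exact S3 objects to read, each with its size in bytes. That is one way to expose only selected files from a crowded folder. The Python sketch below builds such a manifest; the function name build_spectrum_manifest and the bucket and object names are hypothetical, and in practice the sizes would come from an S3 listing (for example, the ContentLength field returned by boto3's head_object).

```python
import json


def build_spectrum_manifest(files: list[tuple[str, int]]) -> str:
    """Build manifest JSON text from (s3_url, size_in_bytes) pairs.

    Hypothetical helper. Each entry records the object's full S3 URL and
    its size under meta.content_length.
    """
    entries = []
    for url, size in files:
        if not url.startswith("s3://"):
            raise ValueError(f"expected an s3:// URL, got {url!r}")
        if size < 0:
            raise ValueError(f"size must be non-negative, got {size}")
        entries.append({"url": url, "meta": {"content_length": size}})
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    # Hypothetical objects: only these two files would be visible to the table.
    print(build_spectrum_manifest([
        ("s3://example-bucket/events/part-0001.csv", 5956875),
        ("s3://example-bucket/events/part-0007.csv", 5997091),
    ]))
```

The generated file would be uploaded to S3, and its full object path (not a folder prefix) given as the table's LOCATION.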
3 votes, 1 answer

How to show Redshift Spectrum (external schema) GRANTS?

This post is useful for showing Redshift GRANTS, but it doesn't show GRANTS on external tables or schemas. How can I show the privileges on an external schema (and its tables)?
Vzzarr
3 votes, 1 answer

ERROR while querying data on redshift - Error fetching stripe data

I'm trying to run the following query over an external table in redshift: select * from schema.table limit 10; and I get an error: [2018-06-20 12:03:14] [XX000][500310] Amazon Invalid operation: S3 Query Exception (Fetch) Details: error: S3…
Gal Itzhak
3 votes, 2 answers

Inserts into Redshift using spark-redshift

I am trying to insert data from S3 (Parquet files) into Redshift. Doing it through SQLWorkbench takes 46 seconds for 6 million rows, but doing it through the spark-redshift connector takes about 7 minutes. I am trying it with more nodes and…
3 votes, 2 answers

How to generate 12 digit unique number in redshift?

I have 3 columns in a table: email_id, rid, final_id. Rules for rid and final_id: If the email_id has a corresponding rid, use rid as the final_id. If the email_id does not have a corresponding rid (i.e. rid is null), generate a unique 12 digit…
user8147906
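
The rule in this question can be made explicit with a small Python sketch before translating it into SQL. Everything past the truncated excerpt is an assumption here: generated values are drawn sequentially from the 12-digit range (100000000000 to 999999999999), they must not collide with any existing rid, and an email_id without a rid gets the same generated value on every row. The function name assign_final_ids is hypothetical.

```python
from typing import Iterable, Optional

FIRST_12_DIGIT = 10**11      # 100000000000, smallest 12-digit number
LAST_12_DIGIT = 10**12 - 1   # 999999999999, largest 12-digit number


def assign_final_ids(
    rows: Iterable[tuple[str, Optional[int]]],
) -> list[tuple[str, Optional[int], int]]:
    """Return (email_id, rid, final_id) for each (email_id, rid) input row.

    Hypothetical helper. final_id is rid when rid is present; otherwise a
    12-digit number distinct from every rid and every other generated value.
    """
    rows = list(rows)
    used = {rid for _, rid in rows if rid is not None}
    generated: dict[str, int] = {}   # email_id -> its generated id
    next_id = FIRST_12_DIGIT
    result = []
    for email_id, rid in rows:
        if rid is not None:
            final_id = rid
        elif email_id in generated:
            final_id = generated[email_id]   # assumption: same email, same id
        else:
            while next_id in used:           # skip values already taken by a rid
                next_id += 1
            if next_id > LAST_12_DIGIT:
                raise OverflowError("12-digit id range exhausted")
            final_id = next_id
            used.add(final_id)
            generated[email_id] = final_id
            next_id += 1
        result.append((email_id, rid, final_id))
    return result
```

A counter is used instead of hashing the email_id because a hash truncated to 12 digits can collide, whereas a counter that skips used values cannot.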
2 votes, 1 answer

Grant access only to a view in Redshift Spectrum

I created a simple view over an external table on Redshift Spectrum: CREATE VIEW test_view AS ( SELECT * FROM my_external_schema.my_table WHERE my_field='x' ) WITH NO SCHEMA BINDING; Reading the documentation, I see that it is not possible to give…
2 votes, 1 answer

How does Redshift Spectrum scan data?

Given a data source of 1.4 TB of Parquet data on S3, partitioned by a timestamp field (so partitions are year - month - day), I am querying a specific day of data (2.6 GB) and retrieving all available fields in the Parquet files via Redshift…
2 votes, 2 answers

"Spectrum nested query error" Redshift error

When I run this query in Redshift: select sd.device_id from devices.s_devices sd left join devices.c_devices cd on sd.device_id = cd.device_id I get an error like this: ERROR: Spectrum nested query error DETAIL: …
del
2 votes, 2 answers

Redshift UNLOAD operation causing redundant data

We use UNLOAD commands to run some transformations on S3-based external tables and publish data into a different S3 bucket in PARQUET format. I use the ALLOWOVERWRITE option in the UNLOAD operation to replace the files if they already exist. This works…
Abhi
2 votes, 1 answer

Redshift JSONPaths file for dynamic json file

Given the JSON object below: { "player": { "francesco totti": { "position": "forward" }, "andrea pirlo": { "position": "midfielder" } } } I would like to import the file above into Redshift as the rows below: name,…
pippa dupree
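
Redshift JSONPaths expressions name fixed paths and do not support wildcards, so keys that are themselves data (the player names here) cannot be enumerated by a JSONPaths file alone. One workaround is to flatten the document before loading, for example into one JSON object per line, which COPY with FORMAT AS JSON 'auto' can map onto columns by key name. The sketch below assumes target columns name and position, since the excerpt's row layout is truncated; the function names flatten_players and to_json_lines are hypothetical.

```python
import json


def flatten_players(doc: dict) -> list[dict]:
    """Turn {"player": {<name>: {<attributes>}}} into one flat dict per player."""
    rows = []
    for name, attributes in doc["player"].items():
        # The dynamic key becomes an ordinary "name" field; the nested
        # attributes (here just "position") are copied alongside it.
        rows.append({"name": name, **attributes})
    return rows


def to_json_lines(rows: list[dict]) -> str:
    """Serialize rows as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


if __name__ == "__main__":
    source = {
        "player": {
            "francesco totti": {"position": "forward"},
            "andrea pirlo": {"position": "midfielder"},
        }
    }
    print(to_json_lines(flatten_players(source)))
```

Flattening outside Redshift keeps the COPY step simple; the trade-off is an extra preprocessing job whenever new source files arrive.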
2 votes, 1 answer

[XX000][500310] [Amazon](500310) Invalid operation: Parsed manifest is not a valid JSON object

I'm running a crawler over a folder containing several files with different schemas. I expect to find a table for each file. In the Glue Catalog I can indeed see a table for each file, with its own schema. But when I try…