Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.
Questions tagged [amazon-redshift-spectrum]
194 questions
0
votes
0 answers
How to store one-to-many entities data in S3 for Amazon Redshift Spectrum
I need to store data in S3 and query it using Amazon Redshift Spectrum. My data is modeled with one-to-many and many-to-many relationships. For example, consider the following SQL schema:
user (id, name)
user_phoes (id, phone_type,…
Achaius
- 5,326
- 16
- 59
- 107
0
votes
1 answer
Unable to create external schema for Amazon Redshift Spectrum
Trying to follow the guide at https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html to query S3 usage from Redshift via Athena.
Running into an error when attempting to create schema in Step 3:
"create external schema…
mg213
- 1
- 1
0
votes
1 answer
How to create an external table in Redshift Spectrum when the file location changes every day?
We are planning to source data from another AWS account's S3 bucket using AWS Redshift Spectrum. However, the source team informed us that the bucket key will change every day, and the latest data will be available under the key with the latest timestamp.
Can anyone…
Rajib Kar
- 21
- 1
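A minimal sketch for the question above, assuming the daily key is a fixed prefix followed by a `YYYYMMDD` date folder (the excerpt does not show the real layout). Redshift lets you repoint an external table with `ALTER TABLE ... SET LOCATION`, so one option is to generate that statement each day and run it through whatever SQL client you already use. The table, bucket, and prefix names here are hypothetical.

```python
from datetime import date


def set_location_sql(table: str, bucket: str, prefix: str, day: date) -> str:
    """Build an ALTER TABLE statement that repoints an external table.

    Assumption: the source writes each day's files under
    s3://<bucket>/<prefix>/<YYYYMMDD>/ ; adjust the format to the real key.
    """
    location = f"s3://{bucket}/{prefix}/{day:%Y%m%d}/"
    return f"ALTER TABLE {table} SET LOCATION '{location}';"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(set_location_sql("spectrum.daily_feed", "example-bucket", "exports", date.today()))
```

If older days must stay queryable, a partitioned external table with a daily `ALTER TABLE ... ADD PARTITION` may fit better than moving the table location.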
0
votes
1 answer
SQL query on redshift to get the first and the last value
I have a data set like this.
I need to write a query that gives me the output below.
For every SessionID and VisitID, it should sort by the date_time column and return the first Category and the last Category.
I have used the…
Rakesh Das
- 23
- 5
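To make the requested result concrete, here is a small Python sketch of the logic: sort by `date_time` within each (SessionID, VisitID) pair, then take the first and last category. The column name `Category` and the sample rows are assumptions, since the excerpt does not show the data. In Redshift SQL the same idea is usually expressed with the `FIRST_VALUE` and `LAST_VALUE` window functions over a partition on the two ID columns.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; the real data set is not shown in the question.
sample_rows = [
    {"SessionID": 1, "VisitID": 1, "date_time": "2018-09-13 10:00:00", "Category": "home"},
    {"SessionID": 1, "VisitID": 1, "date_time": "2018-09-13 10:05:00", "Category": "search"},
    {"SessionID": 1, "VisitID": 1, "date_time": "2018-09-13 10:02:00", "Category": "product"},
    {"SessionID": 2, "VisitID": 1, "date_time": "2018-09-13 11:00:00", "Category": "cart"},
]


def first_and_last_category(rows):
    """Return (SessionID, VisitID, first Category, last Category) tuples."""
    # Sort so that rows of one group are adjacent and ordered by time.
    ordered = sorted(rows, key=itemgetter("SessionID", "VisitID", "date_time"))
    result = []
    for (session_id, visit_id), group in groupby(ordered, key=itemgetter("SessionID", "VisitID")):
        group = list(group)
        result.append((session_id, visit_id, group[0]["Category"], group[-1]["Category"]))
    return result


if __name__ == "__main__":
    for row in first_and_last_category(sample_rows):
        print(row)
```

The timestamps are compared as strings here, which only works because the assumed format sorts lexicographically in time order.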
0
votes
1 answer
How to specify the S3 config for the spectrify Python package?
How do I specify the s3_config object for the Python spectrify package?
from spectrify.export import RedshiftDataExporter
RedshiftDataExporter(sa_engine, s3_config).export_to_csv('my_table')
muon
- 8,833
- 7
- 46
- 66
0
votes
1 answer
Can an external table be created in Redshift in specific directories?
I have created an external table that reads the files of all the folders that are in the specified path using the following script:
CREATE EXTERNAL TABLE spectrum.eventos_ne9 (
event_date varchar(300),
event_timestamp varchar(300),
event_name…
0
votes
0 answers
Redshift spectrum timestamp column issues
I have a few files in S3 and used the Glue Data Catalog to get the table definition. I have a field called log_time, and I manually set its data type to timestamp in the Glue catalog. Now when I query that table from Athena, I can see the timestamp values correctly.…
Venkat.V.S
- 319
- 2
- 7
0
votes
1 answer
Amazon Spectrum incremental load directly from string
I have taken a field such as 'filename Pro_180913_171842' from Spectrum.
I tried a SQL expression like
`select
fields
from spectrum.ex
where cast(SPLIT_PART('filename Pro_180913_171842','Pro_',2)as
…
Ganesh Pitchai
- 45
- 6
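A sketch of what the `SPLIT_PART(..., 'Pro_', 2)` expression above extracts, written in Python. It assumes the digits after `Pro_` encode a timestamp as `yymmdd_HHMMSS`, which the excerpt does not confirm, and the function names are hypothetical. Once the value is a real timestamp, an incremental load can keep only the files newer than the last one processed.

```python
from datetime import datetime


def parse_file_timestamp(filename: str, marker: str = "Pro_") -> datetime:
    """Mimic SPLIT_PART(filename, marker, 2) and parse the result.

    Assumption: the extracted text looks like 180913_171842 (yymmdd_HHMMSS).
    """
    parts = filename.split(marker)
    if len(parts) < 2:
        raise ValueError(f"marker {marker!r} not found in {filename!r}")
    return datetime.strptime(parts[1], "%y%m%d_%H%M%S")


def is_newer(filename: str, last_loaded: datetime) -> bool:
    """True if the file's embedded timestamp is after the last loaded one."""
    return parse_file_timestamp(filename) > last_loaded


if __name__ == "__main__":
    print(parse_file_timestamp("filename Pro_180913_171842"))
```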
0
votes
0 answers
Unloading & reloading data between S3 and Redshift with schema changes
I'm interested in setting up some automated jobs that will periodically export data from our Redshift instance and store it on S3, where ideally it will then be bubbled back up into Redshift via an external table running in Redshift Spectrum. One…
Cody Kestigian
- 11
- 2
0
votes
1 answer
Redshift Spectrum Query - Request ran out of memory in the S3 query layer
I am trying to execute a query that groups on 26 columns. The data is stored in S3 in Parquet format, partitioned by day. The Redshift Spectrum query returns the error below, and I cannot find any relevant AWS documentation about it.
Request…
conetfun
- 1,413
- 1
- 14
- 24
0
votes
1 answer
Amazon Redshift C# client to query data without ODBC/JDBC
Is there any way to fetch data from Amazon Redshift from C# without using JDBC/ODBC drivers?
Vinod Kumar
- 378
- 1
- 16
0
votes
1 answer
How to identify a person or id if it contains more than one row for a different column in SQL
I have a table in which a person has the same value multiple times in another column.
For example:
person product portal count indicator
-----------------------------------------------
1 10 5 2 y
…
Shivam Tyagi
- 41
- 2
0
votes
1 answer
AWS Glue: How to ETL non-scalar JSON with varying schemas
Objective
I have an S3 folder full of json files with varying schemas, including arrays (a dynamodb backup, as it happens). However, while the schemas vary, all files contain some common elements, such as 'id' or 'name', as well as nested arrays of…
spinnn
- 73
- 7
0
votes
1 answer
Spectrum Same External Table Shows in Multiple Schemas (svv_external_tables)
It's a really simple test actually. I create a couple external schemas and create an external table in one of the schemas and then querying svv_external_tables shows the table exists in ALL schemas!! What am I missing?
create external schema…
Robin Tanner
- 25
- 4
0
votes
1 answer
How to view a Data Catalog table in S3 using Redshift Spectrum
I created an external schema for my database in AWS Glue. I can see the list of tables, but I cannot look into the JSON data. Redshift throws this error:
[Amazon](500310) Invalid operation: S3 Query Exception (Fetch)
Details:
…
beni
- 83
- 3
- 10