0

My requirement is to store data into S3 and perform queries on S3 data using Amazon Redshift Spectrum. My data is modeled with one-to-many and many-to-many. For example consider the following SQL schema

user (id, name)
user_phoes (id, phone_type, user_id)
user_roles (id, role_type, user_id)
user_role_activities (id, type, user_role_id)

I need a better approach to store this data in S3. So that I can easily load these in Redshift through Redshift Spectrum for performing JOIN queries.

NOTE: Data will be inserted into S3 on scheduled basis. And Redshift should maintain the same foreign key constraint what I have in my model. Data may be inserted into S3 in any order. That is user_phones data before users data.

Expecting a better approach to store S3 and compute data in Redshift

Achaius
  • 5,326
  • 16
  • 59
  • 107
  • What issue(s) are you having exactly? – Jon Scott Jun 04 '19 at 12:58
  • If I load user_phones in redshift without join, It loads all phones even if user is not present. Obviously I don't want to execute JOIN query here. – Achaius Jun 04 '19 at 13:03
  • 2
    Nothing is obvious so far! You need to elaborate and explain your issue with example data. see https://stackoverflow.com/help/minimal-reproducible-example – Jon Scott Jun 04 '19 at 13:09
  • "Redshift should maintain the same foreign key constraint" - Redshift cannot be used to enforce foreign key constraints in any circumstances. Your ETL process needs to do that. https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-defining-constraints.html – Nathan Griffiths Jun 04 '19 at 23:44
  • I agree with @jon, it's not clear what your issue is with storing this data in S3 and using Spectrum to query it. It sounds like you are concerned that the timing of the data delivery to S3 might mean the referential integrity of the data might not be correct at any given point. However that is not something that Redshift can solve for you, if that is a critical issue then the ETL process needs to enforce the right order for data delivery. – Nathan Griffiths Jun 04 '19 at 23:45
  • What do you mean when you say "If I load user_phones in redshift" -- are you loading data into Amazon Redshift via the `COPY` command or are you simply referring to the data via an **external table** that uses Redshift Spectrum? You'll need to give us more information about what you are trying to do, and what problems you are having, for us to be able to provide advice. – John Rotenstein Jun 05 '19 at 05:51

0 Answers0