Questions tagged [aws-data-pipeline]

Use the [amazon-data-pipeline] tag instead

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

67 questions
0 votes, 1 answer

Getting current AWS Data Pipeline status from Java

I am trying to access the current status of a data pipeline from the Java Data Pipeline client. My use case is to activate a pipeline and wait until it reaches the completed state. I tried the answer from this thread: AWS Data Pipeline - Components, Instances…
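The question asks about Java, but the flow is compact to show with boto3; the Java SDK's ActivatePipeline and DescribePipelines calls mirror it. A minimal polling sketch, with a hypothetical pipeline id:

```python
import time
import boto3

PIPELINE_ID = "df-0123456789ABCDEFGHIJ"  # hypothetical pipeline id

client = boto3.client("datapipeline")
client.activate_pipeline(pipelineId=PIPELINE_ID)

while True:
    resp = client.describe_pipelines(pipelineIds=[PIPELINE_ID])
    fields = resp["pipelineDescriptionList"][0]["fields"]
    state = next(f["stringValue"] for f in fields if f["key"] == "@pipelineState")
    if state == "FINISHED":  # you may also want to stop on error states
        break
    time.sleep(30)  # poll every 30 seconds
```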
0 votes, 1 answer

Duplication of data using Data Pipeline

I am trying to back up the DynamoDB data to S3 using AWS Data Pipeline, scheduled every 15 minutes in the pipeline settings. The template I used is the default one provided, i.e. "Export DynamoDB table to S3". The problem is…
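One way to keep frequent runs from overwriting or mixing with each other (if the output path does not already do this) is to stamp each run's output folder with the scheduled start time. A sketch with boto3 and hypothetical ids:

```python
import boto3

client = boto3.client("datapipeline")

# Hypothetical S3DataNode for the export output. The
# #{format(@scheduledStartTime, ...)} expression puts every 15-minute run
# into its own folder, keeping successive exports separate.
output_node = {
    "id": "S3BackupLocation",
    "name": "S3BackupLocation",
    "fields": [
        {"key": "type", "stringValue": "S3DataNode"},
        {
            "key": "directoryPath",
            "stringValue": "s3://my-backup-bucket/ddb-export/"
                           "#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm')}",
        },
    ],
}

# put_pipeline_definition replaces the whole definition, so in practice this
# node is submitted together with the template's other objects.
client.put_pipeline_definition(
    pipelineId="df-0123456789ABCDEFGHIJ",  # hypothetical id
    pipelineObjects=[output_node],
)
```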
0 votes, 1 answer

Increase & decrease DynamoDB RCU from AWS Data Pipeline

I have an AWS DynamoDB table that is write-intensive. I've configured it in provisioned capacity mode with 10,000 WCU and 1,000 RCU. I'm using AWS Data Pipeline to export the DynamoDB contents to S3. The pipeline is configured with the read…
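A sketch of the throughput bump with boto3, which a ShellCommandActivity (or a wrapper script around the pipeline) could run before and after the export; the table name and capacity numbers are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")
TABLE = "my-write-heavy-table"  # hypothetical table name

def set_capacity(rcu, wcu):
    """Adjust provisioned throughput; DynamoDB applies the change asynchronously."""
    dynamodb.update_table(
        TableName=TABLE,
        ProvisionedThroughput={"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    )
    # The table_exists waiter polls DescribeTable until the table is ACTIVE again.
    dynamodb.get_waiter("table_exists").wait(TableName=TABLE)

set_capacity(rcu=5000, wcu=10000)   # raise RCU before the export runs
# ... run the Data Pipeline export ...
set_capacity(rcu=1000, wcu=10000)   # drop back to the normal level afterwards
```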
0 votes, 0 answers

Trigger an AWS Lambda function whenever a new file arrives on two different S3 prefixes

Every day we get one incremental file from each of multiple sources. Both sources place these files in two different S3 prefixes, but they arrive at different times. We want to process both files in one go and…
Krish
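One pattern for the question above is to let S3 notifications on both prefixes invoke the same Lambda, which proceeds only once both of the day's files are present. A sketch with hypothetical bucket, prefixes, and date-based file naming:

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")
BUCKET = "my-ingest-bucket"                               # hypothetical
PREFIXES = ("source-a/incoming/", "source-b/incoming/")   # the two prefixes

def handler(event, context):
    """Fires on S3 notifications from both prefixes, but only proceeds once
    today's file has landed under each of them."""
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    for prefix in PREFIXES:
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"{prefix}{today}")
        if resp.get("KeyCount", 0) == 0:
            # The other file has not arrived yet; its own event will
            # invoke this handler again later.
            return
    process_both(today)

def process_both(day):
    # hypothetical downstream processing of the matched pair of files
    print(f"both incremental files for {day} are present; processing")
```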
0 votes, 0 answers

AWS ETL solutions for small data

My objective is to get the data from S3 files, transform it, and save it to a data source (could be DynamoDB or RDS). The file size would be <20 MB, and there could be multiple (~10) such files uploaded periodically (once a day). I'm considering using…
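For ~10 files under 20 MB arriving once a day, a Lambda triggered by S3 events is often sufficient, with no pipeline service needed. A minimal sketch with boto3; the target table, the id column, and the transformation are hypothetical:

```python
import csv
import io
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("etl-target")  # hypothetical table

def handler(event, context):
    """Triggered by s3:ObjectCreated; transforms one small CSV into DynamoDB items."""
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    with table.batch_writer() as batch:
        for row in csv.DictReader(io.StringIO(body)):
            row["id"] = f"{key}:{row['id']}"   # hypothetical transformation
            batch.put_item(Item=row)
```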
0 votes, 1 answer

How to integrate GitHub with the Data Catalog in AWS Glue

This question is about the Data Catalog of AWS Glue. I want to build a process like this: connect GitHub to the AWS Glue Data Catalog -> pull request about the data catalog code (source) -> merge -> reflect the modified code in the AWS Glue Data Catalog ->…
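One way to approximate this flow is to version the table definitions in the GitHub repository and have CI push them into the catalog with the Glue API after a merge. A sketch with a hypothetical file layout and names:

```python
import json
import boto3

glue = boto3.client("glue")

# Hypothetical layout: each table definition lives in the repo as a JSON file
# matching Glue's TableInput shape, and CI runs this after a pull request merges.
with open("catalog/sales_db/orders.json") as f:
    table_input = json.load(f)

glue.update_table(DatabaseName="sales_db", TableInput=table_input)
```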
0 votes, 2 answers

Pass parameters to an AWS Data Pipeline built-in template from a Lambda function

I would like to create a data pipeline that would be invoked by a Lambda function. The pipeline is "Load S3 data into RDS MySQL", built using a template provided by AWS itself. From my Lambda function, I'm not able to define the parameters to…
Juhan
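For the question above, activate_pipeline accepts parameter overrides directly, so the Lambda can pass values without editing the template. A sketch; the pipeline id and parameter ids are hypothetical and must match the ids declared in the template's parameter objects:

```python
import boto3

datapipeline = boto3.client("datapipeline")

def handler(event, context):
    """Activates the template-based pipeline, overriding its parameters."""
    datapipeline.activate_pipeline(
        pipelineId="df-0123456789ABCDEFGHIJ",  # hypothetical pipeline id
        parameterValues=[
            {"id": "myInputS3Loc", "stringValue": event["s3_path"]},
            {"id": "myRDSTableName", "stringValue": event["table"]},
        ],
    )
```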
0 votes, 1 answer

Looking for a better way to visualize a data lake pipeline on AWS

I am building a data lake pipeline on AWS which includes many AWS services such as S3, CloudWatch, Lambda, Glue crawlers, Glue jobs, etc. The pipeline flow works like this: CloudWatch schedules a cron job to trigger a Lambda to fetch external data and save…
0 votes, 1 answer

Create a user with access to a view in Redshift

I'm pulling data from MySQL EC2 instances to S3 buckets, then creating views in Redshift. I want to create database users who can only query and see certain views created specifically for them in Redshift. I have example code below that I use to…
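A sketch of the grant pattern, run as an admin via psycopg2; connection details, user, schema, and view names are all hypothetical. The key point is granting USAGE on the schema plus SELECT on only the intended views; this controls what the user can query, though hiding other object names from the catalog entirely takes more work:

```python
import psycopg2

# Hypothetical connection details and object names.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="admin-password",
)
conn.autocommit = True
cur = conn.cursor()

# The user can query only what it is granted: USAGE on the schema plus
# SELECT on the one view meant for it, and nothing else.
cur.execute("CREATE USER report_user PASSWORD 'Str0ngPassw0rd!'")
cur.execute("GRANT USAGE ON SCHEMA reports TO report_user")
cur.execute("GRANT SELECT ON reports.daily_sales_v TO report_user")
```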
0 votes, 2 answers

Copy data from PostgreSQL to S3 using AWS Data Pipeline

I am trying to copy all the tables from a schema (PostgreSQL, 50+ tables) to Amazon S3. What is the best way to do this? I am able to create 50 different copy activities, but is there a simple way to copy all the tables in a schema, or to write one pipeline…
Visss
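Rather than 50 hand-built copy activities, a single script can enumerate the schema's tables from pg_tables and export each one. A minimal sketch for the question above with psycopg2 and boto3; names are hypothetical, and each table is buffered in memory, so very large tables would need streaming instead:

```python
import io
import boto3
import psycopg2

conn = psycopg2.connect("host=mydb.example.com dbname=app user=etl password=secret")
s3 = boto3.client("s3")
BUCKET, SCHEMA = "my-export-bucket", "public"   # hypothetical names

cur = conn.cursor()
cur.execute("SELECT tablename FROM pg_tables WHERE schemaname = %s", (SCHEMA,))
tables = [row[0] for row in cur.fetchall()]

# One loop instead of one copy activity per table: dump each table as CSV
# and upload it under its own key.
for table in tables:
    buf = io.StringIO()
    cur.copy_expert(f'COPY {SCHEMA}."{table}" TO STDOUT WITH CSV HEADER', buf)
    s3.put_object(Bucket=BUCKET, Key=f"{SCHEMA}/{table}.csv",
                  Body=buf.getvalue().encode("utf-8"))
```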
0 votes, 1 answer

Has anyone used an AWS Systems Manager parameter in Data Pipeline to assign a value to a pipeline parameter?

"id": "myS3Bucket", "type": "String", "default": "\"aws ssm get-parameters --names variable --query \"Parameters[*].{myS3Bucket:Value}\"\"" I tried this , Where I created a variable in AWS parameter and was able to retrieve the value using this…
0 votes, 2 answers

AWS MySQL to GCP BigQuery data migration

I'm planning a data migration from AWS MySQL instances to GCP BigQuery. I don't want to migrate every MySQL database, because ultimately I want to create a data warehouse using BigQuery. Would exporting the AWS MySQL DB to S3 buckets as CSV/JSON/Avro, then…
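BigQuery loads natively from Google Cloud Storage rather than S3, so a common route is to stage the exported files in GCS first (or use a transfer service). A minimal sketch, assuming the MySQL data has already been dumped to CSV; all names are hypothetical:

```python
from google.cloud import bigquery, storage

gcs = storage.Client()
bq = bigquery.Client()

# Stage the exported CSV in GCS.
bucket = gcs.bucket("my-migration-bucket")
bucket.blob("exports/orders.csv").upload_from_filename("orders.csv")

# Load it into a BigQuery table.
job = bq.load_table_from_uri(
    "gs://my-migration-bucket/exports/orders.csv",
    "my_project.warehouse.orders",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,       # header row
        autodetect=True,           # infer the schema from the data
    ),
)
job.result()  # wait for the load to finish
```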
0 votes, 2 answers

AWS Data Pipeline: dump data to 3 S3 nodes

I have a use case wherein I want to take data from DynamoDB and do some transformation on it. After this I want to create 3 CSV files (there will be 3 transformations on the same data) and dump them to 3 different S3 locations. My…
paramvir
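One way to do this without three separate pipelines is a single script (run, say, from a ShellCommandActivity or a Lambda) that scans the table once and fans the items out through three transformations. A sketch with hypothetical names; it assumes the items share the same attributes so each set of rows fits one CSV header:

```python
import csv
import io
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

# Scan the source table once, following pagination.
table = dynamodb.Table("source-table")
resp = table.scan()
items = list(resp["Items"])
while "LastEvaluatedKey" in resp:
    resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])
    items.extend(resp["Items"])

# Three hypothetical transformations, each paired with its own output key.
TRANSFORMS = {
    "exports/full.csv":  lambda it: it,
    "exports/ids.csv":   lambda it: {"id": it["id"]},
    "exports/audit.csv": lambda it: {"id": it["id"], "ts": str(it.get("updated_at"))},
}

for key, transform in TRANSFORMS.items():
    rows = [transform(it) for it in items]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    s3.put_object(Bucket="my-export-bucket", Key=key, Body=buf.getvalue())
```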
0 votes, 1 answer

AWS Data Pipeline: insert status with SqlActivity

I am looking for a way to record the status of the pipeline in a DB table, assuming this is a very common use case. Is there any way I can record the status and time of completion of the complete pipeline, and the status and time of completion of…
PyRaider
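Besides a SqlActivity step inside the pipeline, one option for the question above is a small script (run from a final activity or an external scheduler) that reads the pipeline state and writes a row to a status table. A sketch with hypothetical connection details and table:

```python
from datetime import datetime, timezone
import boto3
import psycopg2

datapipeline = boto3.client("datapipeline")
PIPELINE_ID = "df-0123456789ABCDEFGHIJ"   # hypothetical

# Pull the pipeline's current state from its description fields.
resp = datapipeline.describe_pipelines(pipelineIds=[PIPELINE_ID])
fields = {f["key"]: f.get("stringValue")
          for f in resp["pipelineDescriptionList"][0]["fields"]}

# Record it; the connection string and table are hypothetical.
conn = psycopg2.connect("host=statusdb.example.com dbname=ops user=etl password=secret")
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO pipeline_runs (pipeline_id, state, recorded_at) VALUES (%s, %s, %s)",
        (PIPELINE_ID, fields.get("@pipelineState"), datetime.now(timezone.utc)),
    )
```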
0 votes, 1 answer

How to give a user in one AWS account access to AWS Data Pipeline in another account?

I have two AWS accounts. I have a user in account A which needs full access to AWS Data Pipeline in account B. How do I achieve this? I have attached a policy to the user in account A granting access to Data Pipeline. But how do I attach a…
Njoi
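The usual cross-account pattern for the question above is a role in account B whose trust policy allows account A, which the user then assumes to call Data Pipeline with temporary credentials. A minimal sketch; the role name, account id, and permissions are all hypothetical:

```python
import boto3

# In account B, a role (e.g. DataPipelineCrossAccountRole) trusts account A
# and carries a permissions policy granting datapipeline:* . The user in
# account A assumes that role:
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/DataPipelineCrossAccountRole",  # hypothetical
    RoleSessionName="cross-account-pipeline",
)["Credentials"]

# A Data Pipeline client that acts inside account B.
datapipeline = boto3.client(
    "datapipeline",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(datapipeline.list_pipelines())
```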