Questions tagged [aws-data-pipeline]

Use the amazon-data-pipeline tag instead.

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

67 questions
1
vote
1 answer

Processing parameters passed to SQL activity in AWS data pipeline

I am working with AWS Data Pipeline and passing several parameters from the pipeline definition to a SQL file as follows: s3://reporting/preprocess.sql,-d,RUN_DATE=#{@scheduledStartTime.format('YYYYMMdd')}" My SQL file looks like…
Joy
1
vote
0 answers

How to run multiple steps in AWS Data Pipeline using the AWS console

I need to schedule my Spark jobs on EMR. Each run will spin up a new cluster and run a Spark job. I went through the documentation provided by AWS, but it is not extensive enough to give a clear picture of how to do it. If any…
Raghav salotra
1
vote
1 answer

Unresolved resource dependencies [DefaultSchedule] in the Resources block of the template

I am using a CloudFormation script to create an AWS Data Pipeline. I wrote the script according to the documentation, but I am getting one error: Template validation error: Template format error: Unresolved resource dependencies…
0
votes
2 answers

Is it possible to update and insert data in an AWS Glue database using Glue

I am using PySpark on AWS and receive gigabytes of updated data every day. I want to look up the id of each record in an existing table in a Glue database, update the record if the id already exists, and insert it if the id does not exist. Is it possible to…
0
votes
1 answer

Avoid running the Install Task Runner step in an EMR cluster

I hope you can help me. I am trying to create an EMR cluster with Hadoop and Spark installed using Data Pipeline. The problem is that this EMR cluster is private, so it has no internet access to download anything. In the pipeline I specify bootstrap actions to…
0
votes
0 answers

Completely deleting all resources related to AWS Glue and AWS Data Pipeline

I'm a student getting started with AWS (free tier). After realizing (I got billed) that I had exhausted my free tier for AWS Glue and Data Pipeline, I deleted all the resources that were billing me, even these two S3 buckets (mentioned in an image…
0
votes
0 answers

Not able to read data from Hive using AWS Data Pipeline

Using AWS Data Pipeline with the driver HiveJDBC4.jar and the class name com.amazon.hive.jdbc4.HS1Driver, I am trying to connect to Hive tables. The connection succeeds, but I am not able to retrieve the data. Below is the error: Connecting…
jyo
0
votes
0 answers

Crawler not creating table in data lake from Postgres partitioned table

My table is partitioned in Postgres. I have created a Glue crawler to create the table. I selected the option "Update all new and existing partitions with metadata from the table" under "Configure the crawler's output". Since it's partitioned, the table is…
0
votes
0 answers

What does the default PipelineObject look like in AWS Data Pipeline

I'm trying to create an AWS Data Pipeline using the AWS Tools for PowerShell. I was able to create a pipeline using the New-DPPipeline command and am now trying to edit it using Write-DPPipelineDefinition. I'm trying to understand how PipelineObject…
0
votes
0 answers

Using AWS Data Pipeline to move data from AWS RDS to S3

I was trying to move data from RDS to S3 as a backup. I used DBeaver on my local PC to connect to AWS RDS and uploaded a CSV file. I then tried to create a data pipeline to send the data from RDS to S3. Initially, I got an error DBInstance…
kiran
0
votes
1 answer

Batch file processing in AWS using Data Pipeline

I need to read a CSV batch file that was uploaded to an S3 bucket, encrypt the data in some columns, and persist this data in a DynamoDB table. While persisting each row in the DynamoDB table, depending on the data in each row, I need to…
0
votes
0 answers

Error in attribute value of parameters in AWS Data Pipeline put-pipeline-definition operation

I'm trying to upload an AWS Data Pipeline definition using the CLI. I've created a file with parameter objects that define the variables in the pipeline definition. { "parameters": [ { "id": "myShellCmd", "description": "Shell command to…
0
votes
0 answers

Multithreading subprocess run for commands with massive output

I need to write a Python script that runs 30-50 command-line processes and stores their log output. To enforce things I am using ProcessPoolExecutor. For executing shell commands I am using subprocess.run. I am running all that code at data pipeline…
jk1
0
votes
0 answers

Is there an equivalent of the Azure Integration Runtime for AWS Data Pipeline?

I have previously implemented data transfers from on-premises SQL Server instances to Azure SQL using the Integration Runtime component in conjunction with Azure Data Factory. I am not very familiar with AWS but from what I have…
0
votes
1 answer

AWS IAM Setup for EC2 Resource in AWS Data Pipeline

I am having an issue getting AWS Data Pipeline to run on an EC2 Instance via a Shell Command Activity. I have been following the guide found here step by step:…
WolVes