Highest Voted 'data-pipeline' Questions

2

votes

1 answer

Python psycopg2: Copy result of query to another table

I am having some problems with psycopg2 in Python. I have two disparate connections with corresponding cursors: 1. Source connection - source_cursor 2. Destination connection - dest_cursor. Let's say there is a select query that I want to execute on…

asked Nov 03 '17 at 07:54

skybunk

603
7
16

2

votes

1 answer

Is it possible to create EMR cluster with Auto scaling using Data pipeline

I am new to AWS. I have created an EMR cluster using an Auto Scaling policy through the AWS console. I have also created a data pipeline which can use this cluster to perform the activities. I am also able to create an EMR cluster dynamically through data…

amazon-web-services amazon-emr amazon-data-pipeline data-pipeline

asked Jul 31 '17 at 10:07

Bharani

329
1
6
17

2

votes

1 answer

How to configure AWS data pipeline using serverless.yml?

I am new to both data pipeline and serverless. I want to know how I can automate AWS Data Pipeline using serverless. Below is my diagram of the AWS data pipeline, which exports a DynamoDB table to S3.

amazon-web-services amazon-data-pipeline serverless-framework data-pipeline

asked Jul 16 '17 at 05:15

deosha

832
5
19

2

votes

1 answer

luigi upstream task should run once to create input for set of downstream tasks

I have a nice, straight, working pipe, where the task I run via luigi on the command line triggers all the required upstream data fetching and processing in its proper sequence until it trickles out into my database. class IMAP_Fetch(luigi.Task): …

luigi data-pipeline

asked May 10 '17 at 21:36

ib4u

43
5

2

votes

0 answers

airflow big dag_pickle table

I set up a test installation of airflow a while ago with one test DAG, which is in a paused state. Now, after this system ran for some weeks without actually doing much (besides some test runs), I wanted to dump the database and realized it is…

python pickle directed-acyclic-graphs airflow data-pipeline

asked May 10 '17 at 11:37

Alexander Köb

904
1
8
19

2

votes

1 answer

How can we provision number of core instances in AWS Data Pipeline job

Requirement: Restore a DynamoDB table from an S3 backup location. We created a Data Pipeline job and then edited the Resources section in the Architect Wizard. We set the Core Instance count to 20, but after the Data Pipeline job activation, EMR Cluster…

amazon-web-services amazon-emr amazon-data-pipeline data-pipeline

asked Feb 24 '17 at 08:14

u1234

81
1
11

1

vote

0 answers

Best data pipeline framework

What is the best data pipeline framework that fits the following requirements? Open source / free to use; data pipelines need to be created using Python (should support Geopandas, Pandas, Numpy, ...); support manual and time-triggered pipelines; Web…

airflow data-pipeline

asked Mar 14 '21 at 14:25

MartinV

13
2

1

vote

1 answer

insert into SQL Server table using python from CSV and Text file

I am trying to insert data from a CSV file and also from a text file into SQL Server (SSMS version 18.7). Below is my code. import pyodbc import csv conn = pyodbc.connect('Driver={SQL Server};' 'Server=????;' …

python ssms data-pipeline

asked Feb 22 '21 at 00:40

nikhil davis

55
3

1

vote

1 answer

Why did the amount of data from BigQuery decrease noticeably without any change in GA/Firebase options?

I use BigQuery to get raw data from GA and Firebase. I used to get about 100000 ~ 200000 rows of log data from BigQuery, but since last week I have been getting only about 1000 rows. I didn't change any options for ga,…

firebase google-analytics google-bigquery data-pipeline

asked Jan 13 '21 at 04:46

Seohyeon Youn

11
2

1

vote

1 answer

Copy and Extracting Zipped XML files from HTTP Link Source to Azure Blob Storage using Azure Data Factory

I am trying to establish an Azure Data Factory copy data pipeline. The source is an open HTTP Linked Source (Url reference: https://clinicaltrials.gov/AllPublicXML.zip). So basically the source contains a zipped folder having many XML files. I want…

azure azure-data-factory azure-data-factory-2 azure-data-lake data-pipeline

asked Jan 08 '21 at 12:39

Aditya Bhattacharya

534
1
5
15

1

vote

1 answer

Airflow on Google Cloud Composer vs Docker

I can't find much information on what the differences are in running Airflow on Google Cloud Composer vs Docker. I am trying to switch our data pipelines that are currently on Google Cloud Composer onto Docker to just run locally but am trying to…

docker airflow local google-cloud-composer data-pipeline

asked Jul 06 '20 at 16:22

Erika_Marsha

13
4

1

vote

1 answer

How should I keep track of total loss while training a network with a batched dataset?

I am attempting to train a discriminator network by applying gradients to its optimizer. However, when I use a tf.GradientTape to find the gradients of loss w.r.t training variables, None is returned. Here is the training loop: def train_step(): …

python tensorflow machine-learning data-pipeline

asked May 11 '20 at 22:13

Andrew Wiedenmann

167
1
12

1

vote

1 answer

Replication pipeline to replicate data from MySQL RDS to Redshift

My problem is to create a replication pipeline that replicates tables and data from MySQL RDS to Redshift, and I cannot use any managed service. Also, any new updates in RDS should be replicated in the Redshift tables as well. After looking at…

mysql amazon-redshift amazon-rds database-replication data-pipeline

asked Apr 10 '20 at 08:35

Anonymous

11
3

1

vote

0 answers

Google Data Fusion: "Looping" over input data to then execute multiple Restful API calls per input row

I have the following challenge I would like to solve preferably in Google Data Fusion: I have one web service that returns about 30-50 elements describing an invoice in a JSON payload like this: { "invoice-services": [ { "serviceId":…

google-cloud-data-fusion cdap data-pipeline

asked Mar 23 '20 at 14:58

JensU

11
1

1

vote

1 answer

How to import Pascal VOC 2012 segmentation dataset to Google Colab?

I am new to building data pipelines. I want to import the Pascal VOC dataset into Google Colab. Can someone please point me to a good Google Colab/Jupyter notebook file?

image-segmentation data-pipeline

asked Feb 29 '20 at 18:02

Aravind D. Chakravarti

35
5
