Questions tagged [data-pipeline]

95 questions
18 votes, 4 answers

Feeding .npy (numpy files) into tensorflow data pipeline

Tensorflow seems to lack a reader for ".npy" files. How can I read my data files into the new tensorflow.data.Dataset pipeline? My data doesn't fit in memory. Each object is saved in a separate ".npy" file, and each file contains 2 different ndarrays as…
Sluggish Crow • 183
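There is indeed no built-in .npy reader in tf.data; a common workaround is to list the filenames and load each file lazily from a Python generator, which tf.data.Dataset.from_generator can wrap. A minimal sketch of the loading side in plain NumPy (the file layout and shapes here are invented for illustration; the tf.data wiring is shown in a comment):

```python
import os
import tempfile

import numpy as np

def npy_example_generator(filenames):
    """Yield one array per file so the full dataset never sits in memory."""
    for path in filenames:
        # mmap_mode="r" keeps the file on disk; only accessed slices are read.
        arr = np.load(path, mmap_mode="r")
        yield np.asarray(arr)

# Demo with two small temporary .npy files (stand-ins for the real dataset).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmpdir, f"sample_{i}.npy")
    np.save(p, np.full((3, 3), i, dtype=np.float32))
    paths.append(p)

batches = list(npy_example_generator(paths))
print(len(batches), batches[0].shape)  # 2 (3, 3)

# With TensorFlow installed, the same generator plugs into tf.data:
#   ds = tf.data.Dataset.from_generator(
#       lambda: npy_example_generator(paths),
#       output_signature=tf.TensorSpec(shape=(3, 3), dtype=tf.float32))
```

Because the generator only opens one file at a time, memory use stays bounded regardless of dataset size.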
13 votes, 2 answers

How to access the response from Airflow SimpleHttpOperator GET request

I'm learning Airflow and have a simple question. Below is my DAG called dog_retriever: import airflow from airflow import DAG from airflow.operators.http_operator import SimpleHttpOperator from airflow.operators.sensors import HttpSensor from…
Rachel Lanman • 349
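The usual answer here is that SimpleHttpOperator pushes the response body to XCom when asked to, and a downstream task pulls it by task id. A sketch in the Airflow 1.x style the question's imports suggest; the task ids and endpoint are illustrative, and this is a DAG fragment that only runs inside an Airflow deployment, not standalone:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.python_operator import PythonOperator

with DAG("dog_retriever",
         start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:

    get_dog = SimpleHttpOperator(
        task_id="get_dog",
        http_conn_id="http_default",      # connection pointing at the API host
        endpoint="api/breeds/image/random",
        method="GET",
        xcom_push=True,                   # store the response text in XCom
    )

    def print_response(**context):
        # Pull what get_dog pushed; the value is the raw response body (text).
        body = context["ti"].xcom_pull(task_ids="get_dog")
        print(body)

    show = PythonOperator(
        task_id="print_response",
        python_callable=print_response,
        provide_context=True,             # Airflow 1.x style
    )

    get_dog >> show
```

In newer Airflow versions the flag is spelled do_xcom_push and provide_context is no longer needed, but the xcom_pull pattern is the same.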
12 votes, 0 answers

Is it possible to write a luigi wrapper task that tolerates failed sub-tasks?

I have a luigi task that performs some unstable computations. Think of an optimization process that sometimes does not converge. import luigi class MyOptimizer(luigi.Task): input_param = luigi.Parameter() output_filename =…
DalyaG • 2,067
11 votes, 1 answer

Implementing luigi dynamic graph configuration

I am new to luigi; I came across it while designing a pipeline for our ML efforts. Though it wasn't suited to my particular use case, it had so many extra features that I decided to make it fit. Basically, what I was looking for was a way to be able to…
Veltzer Doron • 843
7 votes, 1 answer

Truncate DynamoDb or rewrite data via Data Pipeline

It is possible to dump DynamoDB via Data Pipeline and also to import data into DynamoDB. The import works well, but the data is always appended to the data that already exists in DynamoDB. So far I have only found working examples that scan DynamoDB and delete items one…
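A common workaround (not from the question itself) is a Scan followed by BatchWriteItem deletes, which accepts at most 25 requests per call; the chunking part is plain Python, with the boto3 call (assumed) shown in a comment:

```python
def chunks(items, size=25):
    """DynamoDB BatchWriteItem accepts at most 25 write requests per call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Illustrative keys as returned by a Scan projecting only the key attributes.
keys = [{"id": {"S": str(n)}} for n in range(60)]
batches = list(chunks(keys))
print(len(batches), len(batches[-1]))  # 3 10

# With boto3 (assumed available), each batch becomes one call:
#   client.batch_write_item(RequestItems={
#       "my_table": [{"DeleteRequest": {"Key": k}} for k in batch]})
```

For large tables it is usually cheaper to delete and recreate the table than to pay for a full Scan plus per-item deletes.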
5 votes, 1 answer

Pipeline from AWS RDS to S3 using Glue

I was trying AWS Glue to migrate our current data pipeline from Python scripts to AWS Glue. I was able to set up a crawler to pull the schema for the different Postgres databases. However, I am facing issues pulling data from Postgres RDS to S3…
3 votes, 1 answer

Dataflow with python flex template - launcher timeout

I'm trying to run my Python Dataflow job with a flex template. The job works fine locally when I run it with the direct runner (without the flex template); however, when I try to run it with the flex template, the job gets stuck in "Queued" status for a while and then fails with…
3 votes, 0 answers

Unable to find the relevant tensor remote_handle: Op ID: 14738, Output num: 0

I am using a Colab Pro TPU instance for patch image classification. I'm using TensorFlow version 2.3.0. When calling model.fit I get the following error: InvalidArgumentError: Unable to find the relevant tensor remote_handle: Op ID:…
3 votes, 2 answers

Undo/rollback the effects of a data processing pipeline

I have a workflow that I'll describe as follows:

[ Dump(query) ] ----+
                    |
                    +---> [ Parquet(dump, schema) ] ---> [ Hive(parquet) ]
                    |
[ Schema(query) ] --+

Where: query is a query to an…
stefanobaghino • 8,477
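Whatever the specific stack, the common way to make a Dump → Parquet → Hive chain undoable is to pair each step with a compensating action and unwind the completed steps in reverse order when a later one fails (a saga, in effect). A framework-agnostic sketch with invented step names:

```python
class Pipeline:
    """Run steps in order; on failure, run the undo hooks of completed steps in reverse."""

    def __init__(self):
        self.steps = []  # (name, run, undo) triples

    def add(self, name, run, undo):
        self.steps.append((name, run, undo))
        return self

    def execute(self):
        done = []
        try:
            for name, run, undo in self.steps:
                run()
                done.append((name, undo))
        except Exception:
            for name, undo in reversed(done):
                undo()  # best-effort compensation, reverse order
            raise

log = []

def fail():
    raise RuntimeError("hive failed")

p = (Pipeline()
     .add("dump", lambda: log.append("dump"), lambda: log.append("undo dump"))
     .add("parquet", lambda: log.append("parquet"), lambda: log.append("undo parquet"))
     .add("hive", fail, lambda: log.append("undo hive")))

try:
    p.execute()
except RuntimeError:
    pass

print(log)  # ['dump', 'parquet', 'undo parquet', 'undo dump']
```

The failed step's own undo is deliberately skipped; only steps that completed get compensated, and the exception is re-raised so the failure stays visible.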
3 votes, 2 answers

Bulk add ttl column to dynamodb table

I have a use case where I need to add a TTL column to an existing table. Currently, this table has more than 2 billion records. Is there any existing solution built around this? Or is EMR the path forward?
Vivek Goel • 19,274
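Whatever runs the backfill, the TTL attribute itself is just epoch seconds (UTC) in a Number column. A small sketch of computing the value; the 90-day window and the attribute wiring in the comment are illustrative, not from the question:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def ttl_epoch(days_from_now: int, now: Optional[datetime] = None) -> int:
    """DynamoDB TTL reads a Number attribute holding epoch seconds (UTC)."""
    now = now or datetime.now(timezone.utc)
    return int((now + timedelta(days=days_from_now)).timestamp())

# Items stamped at a known creation time, expiring 90 days later:
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(ttl_epoch(90, now=created))  # 1711843200

# Writing the attribute to billions of items still needs a parallel Scan
# (Segment/TotalSegments) with a per-item update, or an EMR/Glue job; with
# boto3 the per-item call would be update_item(..., UpdateExpression="SET #t = :v").
```

At 2 billion records a single-threaded scan-and-update loop is impractical, which is why a segmented parallel Scan or an EMR/Glue job is the usual route.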
2 votes, 0 answers

Feasible Streaming Suggestions | Is it possible to use Apache Nifi + Apache Beam (on Flink Cluster) with Real Time Streaming Data

So, I am very, very new to all the Apache frameworks I am trying to use. I want your suggestions on a couple of workflow designs for an IoT streaming application: as we have NiFi connectors available for Flink, and we can easily use the Beam abstraction…
2 votes, 1 answer

Is there a way in airflow where a Daily DAG is dependent on weekly (on weekends) DAG?

I have these DAGs: DAG_A (runs daily), DAG_B (runs Mon-Fri) and DAG_C (runs on Sat and Sun), where DAG_A is dependent on both DAG_B and DAG_C. I tried setting the dependencies using ExternalTaskSensor, but every time my scheduler stops running and…
Lalitha • 21
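One pattern that keeps the sensor from waiting forever (not taken from the question) is an ExternalTaskSensor whose execution_date_fn maps each daily run back to the weekend DAG's most recent run date; the date arithmetic is plain Python and testable without Airflow:

```python
from datetime import datetime, timedelta

def most_recent_weekend_run(execution_date: datetime) -> datetime:
    """Map a daily execution date to the latest Sat/Sun on or before it.

    Meant to be passed as ExternalTaskSensor(execution_date_fn=...) so the
    daily DAG waits on the weekend DAG's most recent scheduled run instead
    of looking for a run on its own date (which usually doesn't exist).
    """
    d = execution_date
    while d.weekday() not in (5, 6):  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

# A Wednesday maps back to the previous Sunday:
print(most_recent_weekend_run(datetime(2024, 1, 10)))  # 2024-01-07 00:00:00
```

The mapped date must match the weekend DAG's actual schedule (including its time-of-day) exactly, or the sensor will still block; that mismatch is the usual cause of the stuck-scheduler symptom described.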
2 votes, 2 answers

Google data fusion Execution error "INVALID_ARGUMENT: Insufficient 'DISKS_TOTAL_GB' quota. Requested 3000.0, available 2048.0."

I am trying to load a simple CSV file from GCS to BQ using the Google Data Fusion free version. The pipeline is failing with an error that reads: com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Insufficient…
2 votes, 1 answer

Firehose datapipeline limitations

My use case is as follows: I have JSON data coming in which needs to be stored in S3 in Parquet format. So far so good: I can create a schema in Glue and attach a "DataFormatConversionConfiguration" to my Firehose stream. BUT the data is coming from…
2 votes, 1 answer

Workflow orchestration tool compatible with Windows Server 2013?

My current project requires automation and scheduled execution of a number of tasks (copy a file, send an email when a new file arrives in a directory, execute an analytics job, etc.). My plan is to write a number of individual shell scripts for each…