Questions tagged [apache-beam-io]

Apache Beam is a unified SDK for batch and stream processing. This tag should be used for questions related to reading data into an Apache Beam pipeline, or writing the output of a pipeline to a destination.

Apache Beam is an open source, unified model for defining and executing both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runtime-specific Runners for executing them.

Apache Beam I/O refers to the process of loading data into a Beam pipeline, or writing the output of a pipeline to a destination.
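The read/transform/write shape described above can be sketched as follows. This is a minimal illustration, not code from any question on this page: the file paths and word-splitting logic are hypothetical, and actually running the pipeline requires the apache-beam package.

```python
# Minimal sketch of Beam I/O: read from a source, apply a transform,
# write to a sink. Paths are hypothetical placeholders.

def split_words(line):
    # Pure helper: split a line of text into words (usable without Beam).
    return line.split()

def run(input_path, output_path):
    # Import deferred so the helper above can be used without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (pipeline
         | "Read" >> beam.io.ReadFromText(input_path)    # I/O: load data in
         | "Split" >> beam.FlatMap(split_words)          # transform
         | "Write" >> beam.io.WriteToText(output_path))  # I/O: write results
```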


373 questions
0
votes
1 answer

Apache Beam throws Cannot setCoder(null) : java

I am new to Apache Beam and I am trying to connect to a Google Cloud instance of a MySQL database. When I run the code snippet below, it throws the exception below. Logger logger = LoggerFactory.getLogger(GoogleSQLPipeline.class); …
Balu
0
votes
1 answer

Sharding BigQuery output tables

I read both from the documentation and from this answer that it is possible to determine the table destination dynamically. I used exactly the same approach as shown below: PCollection foos = ...; foos.apply(BigQueryIO.write().to(new…
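Dynamic table destinations of the kind this question asks about can be sketched in Beam's Python SDK, where WriteToBigQuery accepts a callable that picks a table per element. The project, dataset, and the "type" field below are assumptions for illustration, not taken from the question.

```python
def table_for(row):
    # Route each element to a per-type table. The project, dataset,
    # and "type" field are hypothetical.
    return "my_project:my_dataset.events_%s" % row["type"]

def write_sharded(pcollection):
    # Import deferred so table_for can be tested without Beam installed.
    import apache_beam as beam

    return pcollection | beam.io.WriteToBigQuery(
        table=table_for,  # called per element to choose the destination table
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
```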
0
votes
3 answers

Example of reading and writing transforms with PubSub using the Apache Beam Python SDK

I see examples here https://cloud.google.com/dataflow/model/pubsub-io#reading-with-pubsubio for Java, but when I look here https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub.py it says: def reader(self): raise…
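Later releases of the Beam Python SDK added Pub/Sub support via ReadFromPubSub. A hedged sketch of the reading side: the topic and output path are placeholders, and a streaming-capable runner plus the apache-beam[gcp] package are required to run it.

```python
def decode_message(data):
    # Pub/Sub payloads arrive as bytes; UTF-8 text is assumed here.
    return data.decode("utf-8")

def run(topic, output_path):
    # Imports deferred so decode_message can be tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "Read" >> beam.io.ReadFromPubSub(topic=topic)
         | "Decode" >> beam.Map(decode_message)
         | "Write" >> beam.io.WriteToText(output_path))
```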
0
votes
1 answer

Apache Beam HBase row count blocks and does not return

I started trying out Apache Beam and am using it to read and count an HBase table. When reading the table without Count.globally, it can read the rows, but when trying to count the number of rows, the process hangs and never exits. Here is the very…
David Wang
0
votes
1 answer

Module object has no attribute BigqueryV2 - Local Apache Beam

I am trying to run a pipeline locally (Sierra) with Apache Beam using Beam's provided I/O APIs for Google BigQuery. I set up my environment using Virtualenv as suggested by the Beam Python quickstart and I can run the wordcount.py example. I can also…
0
votes
1 answer

Using MySQL as input source and writing into Google BigQuery

I have an Apache Beam task that reads from a MySQL source using JDBC and is supposed to write the data as-is to a BigQuery table. No transformation is performed at this point; that will come later on. For the moment I just want the database…
MC.
0
votes
2 answers

Apache Beam Maven dependencies: jdbc package is not downloaded in sdk jar file

Downloaded Maven dependencies in Eclipse using org.apache.beam beam-runners-direct-java 0.3.0-incubating Only org.apache.beam.sdk.io,Only…
naga
0
votes
1 answer

NullPointerException caught when writing to BigTable using Apache Beam's dataflow sdk

I'm using Apache Beam SDK version 0.2.0-incubating-SNAPSHOT and trying to pull data into a Bigtable with the Dataflow runner. Unfortunately I'm getting a NullPointerException when executing my dataflow pipeline where I'm using BigTableIO.Write as my…
0
votes
1 answer

Dataflow pipeline details for BigQuery source/sinks not displaying

According to this announcement by the Google Dataflow team, we should be able to see the details of our BigQuery sources and sinks in the console if we use the 1.6 SDK. However, although the new "Pipeline Options" do indeed show up, the details of…
Graham Polley
-1
votes
1 answer

Apache Beam KafkaIO: specify topic partition instead of topic name

Apache Beam KafkaIO has support for kafka consumers to read only from specified partitions. I have the following code. KafkaIO.read() .withCreateTime(Duration.standardMinutes(1)) .withReadCommitted() …
bigbounty
-1
votes
1 answer

BigQueryIO.writeTableRows writes to BigQuery with very high delay

The following code snippet shows the writing method to BigQuery (it picks up data from PubSub). The "Write to BigQuery" dataflow step receives the TableRow data but it writes to BigQuery with very high delay (more than 3-4 hours) or doesn't even…
-1
votes
1 answer

Apache Beam - What happens with Windows/Triggers after multiple GroupByKey?

The windowing section of the Beam programming model guide (section 7.1.1) shows a window defined and used in the GroupByKey transform after a ParDo. How long does a window remain in scope for an element? Let's imagine a pipeline like…
Pablo
-2
votes
1 answer

Convert a Weight/Score matrix into a list of column names sorted by Weight/Score using Python

Convert this Weight/Score, read from an input .csv file, into a list of column names sorted in descending order by their Weight/Score matrix using Python Apache Beam, and write it to another .csv file. Input .csv file user_id,…