Questions tagged [apache-beam-io]

Apache Beam is a unified SDK for batch and stream processing. This tag should be used for questions related to reading data into an Apache Beam pipeline, or writing the output of a pipeline to a destination.

Apache Beam is an open source, unified model for defining and executing both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runtime-specific Runners for executing them.

Apache Beam I/O refers to the process of loading data into a Beam pipeline, or writing the output of a pipeline to a destination.
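The read/transform/write shape described above can be sketched as follows. This is a minimal illustration, not code from any question on this page: the file paths and word-splitting logic are hypothetical, and actually running the pipeline requires the apache-beam package.

```python
# Minimal sketch of Beam I/O: read from a source, apply a transform,
# write to a sink. Paths are hypothetical placeholders.

def split_words(line):
    # Pure helper: split a line of text into words (usable without Beam).
    return line.split()

def run(input_path, output_path):
    # Import deferred so the helper above can be used without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (pipeline
         | "Read" >> beam.io.ReadFromText(input_path)    # I/O: load data in
         | "Split" >> beam.FlatMap(split_words)          # transform
         | "Write" >> beam.io.WriteToText(output_path))  # I/O: write results
```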


373 questions
0
votes
1 answer

Apache Beam throws Cannot setCoder(null) : java

I am new to Apache Beam and I am trying to connect to a Google Cloud instance of a MySQL database. When I run the code snippet below, it throws the exception below. Logger logger = LoggerFactory.getLogger(GoogleSQLPipeline.class); …
Balu
0
votes
1 answer

Sharding BigQuery output tables

I read both from the documentation and from this answer that it is possible to determine the table destination dynamically. I used exactly the same approach as shown below: PCollection foos = ...; foos.apply(BigQueryIO.write().to(new…
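Dynamic table destinations of the kind this question asks about can be sketched in Beam's Python SDK, where WriteToBigQuery accepts a callable that picks a table per element. The project, dataset, and the "type" field below are assumptions for illustration, not taken from the question.

```python
def table_for(row):
    # Route each element to a per-type table. The project, dataset,
    # and "type" field are hypothetical.
    return "my_project:my_dataset.events_%s" % row["type"]

def write_sharded(pcollection):
    # Import deferred so table_for can be tested without Beam installed.
    import apache_beam as beam

    return pcollection | beam.io.WriteToBigQuery(
        table=table_for,  # called per element to choose the destination table
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
```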
0
votes
3 answers

Example of reading and writing transforms with PubSub using the Apache Beam Python SDK

I see examples here https://cloud.google.com/dataflow/model/pubsub-io#reading-with-pubsubio for Java, but when I look here https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub.py it says: def reader(self): raise…
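Later releases of the Beam Python SDK added Pub/Sub support via ReadFromPubSub. A hedged sketch of the reading side: the topic and output path are placeholders, and a streaming-capable runner plus the apache-beam[gcp] package are required to run it.

```python
def decode_message(data):
    # Pub/Sub payloads arrive as bytes; UTF-8 text is assumed here.
    return data.decode("utf-8")

def run(topic, output_path):
    # Imports deferred so decode_message can be tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "Read" >> beam.io.ReadFromPubSub(topic=topic)
         | "Decode" >> beam.Map(decode_message)
         | "Write" >> beam.io.WriteToText(output_path))
```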
0
votes
1 answer

Apache Beam HBase row count blocks and does not return

I started trying out Apache Beam and am using it to read and count an HBase table. When reading the table without Count.globally, it can read the rows, but when trying to count the number of rows, the process hangs and never exits. Here is the very…
David Wang
0
votes
1 answer

Module object has no attribute BigqueryV2 - Local Apache Beam

I am trying to run a pipeline locally (Sierra) with Apache Beam using Beam's provided I/O APIs for Google BigQuery. I set up my environment using Virtualenv as suggested by the Beam Python quickstart and I can run the wordcount.py example. I can also…
0
votes
1 answer

Using MySQL as input source and writing into Google BigQuery

I have an Apache Beam task that reads from a MySQL source using JDBC and is supposed to write the data as-is to a BigQuery table. No transformation is performed at this point; that will come later on. For the moment I just want the database…
MC.
0
votes
2 answers

Apache Beam Maven dependencies: jdbc package is not downloaded in sdk jar file

Downloaded Maven dependencies in Eclipse using org.apache.beam beam-runners-direct-java 0.3.0-incubating Only org.apache.beam.sdk.io,Only…
naga
0
votes
1 answer

NullPointerException caught when writing to BigTable using Apache Beam's dataflow sdk

I'm using Apache Beam SDK version 0.2.0-incubating-SNAPSHOT and trying to pull data into a Bigtable with the Dataflow runner. Unfortunately I'm getting a NullPointerException when executing my dataflow pipeline where I'm using BigTableIO.Write as my…
0
votes
1 answer

Dataflow pipeline details for BigQuery source/sinks not displaying

According to this announcement by the Google Dataflow team, we should be able to see the details of our BigQuery sources and sinks in the console if we use the 1.6 SDK. However, although the new "Pipeline Options" do indeed show up, the details of…
Graham Polley
-1
votes
1 answer

Apache Beam KafkaIO: specify topic partition instead of topic name

Apache Beam KafkaIO has support for kafka consumers to read only from specified partitions. I have the following code. KafkaIO.read() .withCreateTime(Duration.standardMinutes(1)) .withReadCommitted() …
bigbounty
-1
votes
1 answer

BigQueryIO.writeTableRows writes to BigQuery with very high delay

The following code snippet shows the writing method to BigQuery (it picks up data from PubSub). The "Write to BigQuery" dataflow step receives the TableRow data but it writes to BigQuery with very high delay (more than 3-4 hours) or doesn't even…
-1
votes
1 answer

Apache Beam - What happens with Windows/Triggers after multiple GroupByKey?

The windowing section of the Beam programming model guide (section 7.1.1) shows a window defined and used in the GroupByKey transform after a ParDo. How long does a window remain in scope for an element? Let's imagine a pipeline like…
Pablo
-2
votes
1 answer

Convert a Weight/Score matrix into a list of column names sorted by Weight/Score using Python

Convert this Weight/Score, read from an input .csv file, into a list of column names sorted in descending order by their Weight/Score matrix using Python Apache Beam, and write it to another .csv file. Input .csv file user_id,…