I would like to make POST requests from a DoFn in an Apache Beam pipeline running on Dataflow.

For that, I have created a client which instantiates a CloseableHttpClient configured with a PoolingHttpClientConnectionManager.

However, I instantiate a new client for each element that I process.

How can I set up a persistent client that is shared by all my elements?

And are there other classes for parallel, high-speed HTTP requests that I should use?

Pierre CORBEL

1 Answer

You can put the client into a member variable, use the @Setup method to open it, and @Teardown to close it. The client is then reused for every element that DoFn instance processes. Almost all IO implementations in Beam use this pattern; see JdbcIO, for example.
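A minimal sketch of that pattern, assuming the Beam Java SDK and Apache HttpClient 4.x are on the classpath. The class name PostFn, the endpoint constructor argument, the pool sizes, the JSON content type, and the choice to emit the HTTP status code are all illustrative assumptions rather than anything from the question.

```java
import java.io.IOException;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

// Hypothetical DoFn: POSTs each input string and emits the HTTP status code.
class PostFn extends DoFn<String, Integer> {
  private final String endpoint;

  // transient: the client is not serializable, so it is created on the
  // worker in setup() instead of being shipped with the DoFn.
  transient CloseableHttpClient client;

  PostFn(String endpoint) {
    this.endpoint = endpoint;
  }

  @Setup
  public void setup() {
    // Called once per DoFn instance, before any element is processed.
    PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
    pool.setMaxTotal(20);            // assumed pool size, tune as needed
    pool.setDefaultMaxPerRoute(20);  // assumed per-host limit
    client = HttpClients.custom().setConnectionManager(pool).build();
  }

  @ProcessElement
  public void processElement(@Element String body, OutputReceiver<Integer> out)
      throws IOException {
    HttpPost post = new HttpPost(endpoint);
    post.setEntity(new StringEntity(body, ContentType.APPLICATION_JSON));
    // Closing the response returns the connection to the pool.
    try (CloseableHttpResponse response = client.execute(post)) {
      EntityUtils.consume(response.getEntity());
      out.output(response.getStatusLine().getStatusCode());
    }
  }

  @Teardown
  public void teardown() throws IOException {
    // Called when the runner discards this DoFn instance.
    if (client != null) {
      client.close();
      client = null;
    }
  }
}
```

It would be applied with something like `input.apply(ParDo.of(new PostFn("https://example.com/api")))`, where the URL is a placeholder. Note that Beam treats @Teardown as best-effort, so it should only release resources and not carry logic the pipeline depends on.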

jkff

I believe the equivalent for Python is start_bundle and finish_bundle. See https://beam.apache.org/documentation/sdks/pydoc/2.3.0/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.start_bundle – Justin Mar 22 '18 at 04:14