I'm trying to run my Python Dataflow job with a flex template. The job works fine locally when I run it with the direct runner (without a flex template). However, when I try to run it with a flex template, the job is stuck in "Queued" status for a while and then fails with a timeout.

Here are some of the logs I found in the GCE console:

INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/dataflow/template/requirements.txt', '--exists-action', 'i', '--no-binary', ':all:'

Shutting down the GCE instance, launcher-202011121540156428385273524285797, used for launching.

Timeout in polling result file: gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
Possible causes are:
1. Your launch takes too long time to finish. Please check the logs on stackdriver.
2. Service my_service_account@developer.gserviceaccount.com may not have enough permissions to pull container image gcr.io/indigo-computer-272415/samples/dataflow/streaming-beam-py:latest or create new objects in gs://my_bucket/staging/template_launches/2020-11-12_15_40_15-6428385273524285797/operation_result.
3. Transient errors occurred, please try again.

For 1, I see no useful logs. For 2, the service account is the default service account, so it should have all the required permissions.

How can I debug this further?

Here is my Dockerfile:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

ADD localdeps localdeps
COPY requirements.txt .
COPY main.py .
COPY setup.py .
COPY bq_field_pb2.py .
COPY bq_table_pb2.py .
COPY core_pb2.py .

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"

RUN pip install -U  --no-cache-dir -r ./requirements.txt

I'm following this guide - https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates

Kazuki
  • When you run the template, can you explicitly specify a service account that has the rights to access the buckets, via `--parameters service_account_email="foo@bar.iam.gserviceaccount.com"`? – sudo Nov 14 '20 at 16:54
  • It didn't help. In fact, I was able to run the GitHub example project from the document above without issue. – Kazuki Nov 14 '20 at 21:13
  • The guide that you linked to makes no mention of FLEX_TEMPLATE_PYTHON_SETUP_FILE (admittedly today is more than two months after you posted your message above and this flex template stuff seems to be changing rapidly at the moment). Do you know of other documentation that explains FLEX_TEMPLATE_PYTHON_SETUP_FILE because I cannot find any. – jamiet Jan 25 '21 at 21:43
  • The documentation is pretty bad :/ I think I found it in the sample repo. Here is an explanation of the field: https://github.com/GoogleCloudPlatform/python-docs-samples/issues/4939#issuecomment-731657881 – Kazuki Jan 26 '21 at 12:46

1 Answer

A possible cause of this issue lies in the requirements.txt file. If you install apache-beam through the requirements file, the flex template will hit the exact issue you are describing: jobs stay in the Queued state for some time and finally fail with "Timeout in polling result file".

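The reason is that flex templates are affected by this issue. It only affects flex templates; the same jobs run properly locally or with standard templates. The launcher log in the question hints at why: it runs pip download with --no-binary :all: against requirements.txt, so listing apache-beam there likely forces a slow source download of Beam and all of its dependencies before the job can launch.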

The solution is to remove apache-beam from requirements.txt and install it separately in the Dockerfile:

RUN pip install -U apache-beam==<your desired version>
RUN pip install -U -r ./requirements.txt
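
To catch this before building the image, a small check along the following lines could flag any apache-beam entry left in requirements.txt. This is only a sketch: `find_beam_requirements` is a hypothetical helper, not part of Beam or the Dataflow tooling, the version in the sample is illustrative, and the parsing is deliberately simple (it skips comments and pip option lines such as `-r`, and does not follow nested requirements files).

```python
import re

# Hypothetical helper for illustration; not part of Apache Beam or Dataflow.
# Matches the distribution name at the start of a requirement line.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name):
    # Package names compare equal regardless of case and of runs of "-", "_", "."
    return re.sub(r"[-_.]+", "-", name).lower()


def find_beam_requirements(requirements_text):
    """Return the requirement lines that would install apache-beam."""
    hits = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = _NAME_RE.match(line)
        if match and _normalize(match.group(1)) == "apache-beam":
            hits.append(line)
    return hits


# Illustrative input; the pinned version is an assumption, not taken from the question.
sample = """\
apache-beam[gcp]==2.25.0
protobuf>=3.12
# apache-beam mentioned only in a comment
"""
offending = find_beam_requirements(sample)
if offending:
    print("Move these out of requirements.txt and into the Dockerfile:", offending)
```

In practice you would pass the contents of your own requirements.txt instead of `sample`, for example as a pre-build step that fails when the returned list is non-empty.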
aemon4
  • wow... thanks for the pointer. it would have been impossible to find this by myself – Kazuki Nov 16 '20 at 21:57
  • I experienced a similar problem, my pipeline wasn't timing out but it *was* taking far too long to start. This fix worked for me too. See details at https://stackoverflow.com/questions/65766066/can-i-make-flex-template-jobs-take-less-than-10-minutes-before-they-start-to-pro – jamiet Jan 18 '21 at 22:43