While writing unit tests for my Beam pipeline using PAssert, the pipeline outputs objects fine, but the test fails during comparison with the following assertion error:

java.lang.AssertionError: Decode pubsub message/ParMultiDo(DecodePubSubMessage).output: 
Expected: iterable with items [<PubsubMessage{message=[123, 34, 104...], attributes={messageId=2be485e4-3e53-4468-a482-a49842b87ed5, dataPipelineId=bc957aa3-17e7-46d6-bc73-0924fa5674fa, region=us-west1, ingestionTimestamp=2020-02-02T12:34:56.789Z}, messageId=null}>] in any order
     but: not matched: <PubsubMessage{message=[123, 34, 104...], attributes={messageId=2be485e4-3e53-4468-a482-a49842b87ed5, dataPipelineId=bc957aa3-17e7-46d6-bc73-0924fa5674fa, region=us-west1, ingestionTimestamp=2020-02-02T12:34:56.789Z}, messageId=null}>

I also tried wrapping expectedOutputPubSubMessage in a list (apparently the original output is in an array), to no avail. All the PAssert examples given in the documentation do a simple string or key-value comparison.

@RunWith(PowerMockRunner.class)
public class DataDecodePipelineTest implements Serializable {

  @Rule
  public TestPipeline p = TestPipeline.create();

  @Test
  public void testPipeline(){
      PubsubMessage inputPubSubMessage =
              new PubsubMessage(
                      TEST_ENCODED_PAYLOAD.getBytes(),
                      new HashMap<String, String>() {
                          {
                              put(MESSAGE_ID_NAME, TEST_MESSAGE_ID);
                              put(DATA_PIPELINE_ID_NAME, TEST_DATA_PIPELINE_ID);
                              put(INGESTION_TIMESTAMP_NAME, TEST_INGESTION_TIMESTAMP);
                              put(REGION_NAME, TEST_REGION);
                          }
                      });

      PubsubMessage expectedOutputPubSubMessage =
              new PubsubMessage(
                      TEST_DECODED_PAYLOAD.getBytes(),
                      new HashMap<String, String>() {
                          {
                              put(MESSAGE_ID_NAME, TEST_MESSAGE_ID);
                              put(DATA_PIPELINE_ID_NAME, TEST_DATA_PIPELINE_ID);
                              put(INGESTION_TIMESTAMP_NAME, TEST_INGESTION_TIMESTAMP);
                              put(REGION_NAME, TEST_REGION);
                          }
                      });

      PCollection<PubsubMessage> input =
              p.apply(Create.of(Collections.singletonList(inputPubSubMessage)));

      PCollection<PubsubMessage> output =
              input.apply("Decode pubsub message",
                      ParDo.of(new DataDecodePipeline.DecodePubSubMessage()));

      PAssert.that(output).containsInAnyOrder(expectedOutputPubSubMessage);
      
      p.run().waitUntilFinish();
  }
}

Apparently, someone faced the exact same issue years ago, and it remains unresolved: "Test pipeline comparing objects using PAssert containsInAnyOrder()".

Olaf Kock
Zain Qasmi
  • Are you sure the `equals` method is defined as expected on `PubsubMessage`? – robertwb Aug 28 '20 at 17:49
  • No. It's basically the native "this == obj". I am new to Java and looked for ways to override the equals method using 1) anonymous classes and 2) extending PubsubMessage, both of which send me into a black hole of "Unable to infer a coder and no Coder was specified. Please set a coder by invoking Create.withCoder() explicitly or a schema by invoking Create.withSchema()." I may be wrong, but this use case of comparing two objects for equality in Apache Beam should be pretty common. Can anyone point me in the right direction? – Zain Qasmi Aug 31 '20 at 08:15
  • That's unfortunate. You could use wrappers and manually declare the coder as SerializableCoder.of() (not advised for production use, but fine for tests). It's worth a feature request to allow passing in the comparator, or better, passing a list of hamcrest matchers, here. – robertwb Aug 31 '20 at 17:55
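
The equality problem discussed in the comments above can be illustrated without Beam. The following is a minimal sketch, not Beam code: `Message` is a hypothetical stand-in for a class that, like the PubsubMessage described above, does not override equals(), and `sameContent` and `sameInAnyOrder` are illustrative helper names. It shows why two objects that print identically still fail an equals-based match, and how a field-by-field comparison sidesteps that.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MessageEqualitySketch {

    // Hypothetical stand-in for a message class that does not override equals(),
    // so equals() falls back to Object's identity check (this == obj).
    static final class Message {
        final byte[] payload;
        final Map<String, String> attributes;

        Message(byte[] payload, Map<String, String> attributes) {
            this.payload = payload;
            this.attributes = attributes;
        }
    }

    // Field-by-field comparison. Arrays.equals is needed for the payload because
    // byte[].equals is identity-based too; Map.equals compares entries.
    static boolean sameContent(Message a, Message b) {
        return Arrays.equals(a.payload, b.payload) && a.attributes.equals(b.attributes);
    }

    // True if actual and expected hold the same messages by content, in any order.
    static boolean sameInAnyOrder(Iterable<Message> actual, List<Message> expected) {
        List<Message> remaining = new ArrayList<>(expected);
        for (Message m : actual) {
            boolean matched = false;
            for (int i = 0; i < remaining.size(); i++) {
                if (sameContent(m, remaining.get(i))) {
                    remaining.remove(i); // removes by index
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return false;
            }
        }
        return remaining.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("region", "us-west1");

        byte[] body = "{\"h\":1}".getBytes(StandardCharsets.UTF_8);
        Message actual = new Message(body.clone(), new HashMap<>(attrs));
        Message expected = new Message(body.clone(), new HashMap<>(attrs));

        System.out.println(actual.equals(expected));       // false: identity comparison
        System.out.println(sameContent(actual, expected)); // true: contents match
        System.out.println(sameInAnyOrder(
                Collections.singletonList(actual),
                Collections.singletonList(expected)));     // true
    }
}
```

In a Beam test, a check along these lines could plausibly run inside PAssert.that(output).satisfies(...), which takes a SerializableFunction over the Iterable of output elements, reading the real message's fields through getPayload() and getAttributeMap() instead of relying on equals(). That adaptation is untested here and is offered only as a direction to try.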

1 Answer

The problem is that you are comparing different kinds of objects: the return value of your pipeline is a PCollection, and you are comparing it against a PubsubMessage.

You have to create a PCollection from expectedOutputPubSubMessage.

Try this:

      PAssert.that(output).containsInAnyOrder(Create.of(Collections.singletonList(expectedOutputPubSubMessage)));

example: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/test/java/com/google/cloud/teleport/templates/PubsubToPubsubTest.java