
I am exploring ETL processes on GCP and am using the Pub/Sub Subscription to BigQuery template in Dataflow.

The message data in the Pub/Sub subscription is in CSV format, as below:

53466,06/30/2020,,Trinidad and Tobago,2020-07-01 04:33:52,130.0,8.0,113.0

This causes an error while loading into the BigQuery table. How can I convert the CSV data to JSON in the template?

Kavya shree

2 Answers


I guess you used this template, which only accepts JSON-formatted strings from the Pub/Sub subscription. The documentation says so as well.

As far as I know, another option is to customize this code yourself to handle CSV streaming data.

  • Thank you, this was helpful. I understand that a transformation function needs to be added, but where should it be placed in the above template? – Kavya shree Mar 12 '21 at 05:41

Solved!

While creating the job with the Pub/Sub Subscription to BigQuery template, expand the optional parameters. There you can set the .js file path and the UDF function name.


Here is the JS code for the transformation, i.e., from CSV format to JSON format.

function transform(messages) {
  // Each Pub/Sub message is a single CSV line; split it into fields
  var values = messages.split(',');

  // Construct the output object and apply transformations
  var obj = {};
  obj.SNo = values[0];
  var dateObj = values[1];
  // Date format in the message is mm/dd/YYYY (e.g. 06/30/2020)
  // Transform the field to the date format required by BigQuery, YYYY-mm-dd
  obj.ObservationDate = dateObj.replace(/(\d\d)\/(\d\d)\/(\d{4})/, "$3-$1-$2");
  obj.Provision_State = values[2];
  obj.Country_Region = values[3];
  obj.Last_Update = values[4];
  obj.Confirmed = values[5];
  obj.Deaths = values[6];
  obj.Recovered = values[7];
  // Serialize the object to a JSON string
  var jsonString = JSON.stringify(obj);

  return jsonString;
}
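
For anyone who wants to try the conversion locally before uploading the .js file, here is a minimal sketch of a slightly stricter variant. The name transformCsvLine, the 8-field check, the numeric parsing, and mapping an empty Provision_State to null are all assumptions for illustration, not part of the template; adjust them to the actual BigQuery table schema. Note that a plain split(',') would break on quoted fields that contain commas.

```javascript
// Hypothetical variant of the UDF above. Column names come from the answer;
// the field-count check and numeric conversion are assumptions about the schema.
function transformCsvLine(line) {
  var values = line.split(',');

  // Fail loudly on malformed rows instead of emitting partial objects
  if (values.length !== 8) {
    throw new Error('Expected 8 fields, got ' + values.length);
  }

  var obj = {
    SNo: parseInt(values[0], 10),
    // mm/dd/YYYY -> YYYY-mm-dd
    ObservationDate: values[1].replace(/^(\d\d)\/(\d\d)\/(\d{4})$/, '$3-$1-$2'),
    // An empty CSV field becomes null rather than an empty string
    Provision_State: values[2] === '' ? null : values[2],
    Country_Region: values[3],
    Last_Update: values[4],
    Confirmed: parseFloat(values[5]),
    Deaths: parseFloat(values[6]),
    Recovered: parseFloat(values[7])
  };

  return JSON.stringify(obj);
}

// Local check only (e.g. with Node.js); not needed in the uploaded UDF file
var sample = '53466,06/30/2020,,Trinidad and Tobago,2020-07-01 04:33:52,130.0,8.0,113.0';
console.log(transformCsvLine(sample));
```

If this variant is used, the UDF function name set in the job parameters would have to be transformCsvLine instead of transform.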
Kavya shree