
I want to generate some reports based on the data in Bigtable. For that I wanted to create a query which will get the latest data from Bigtable and pass it on to the Data Studio report. Now the problem is that when I create a Bigtable connection in BigQuery I cannot execute any query, even on an empty table. I create the table in BigQuery in the following way:

bq mk \
--external_table_definition=gs://somebucket/big-table-definition.json \
datareportingdataset.datareportingtable

and the command executes successfully. My big-table-definition.json looks like the following:

{
    "sourceFormat": "BIGTABLE",
    "sourceUris": [
        "https://googleapis.com/bigtable/projects/playground-2/instances/data-reporting/tables/data-reporting-table"
    ],
    "bigtableOptions": {
        "readRowkeyAsString": "true",
        "columnFamilies" : [
            {
                "familyId": "cf1",
                "onlyReadLatest": "true",
                "columns": [
                    {
                        "qualifierString": "temp",
                        "type": "STRING"
                    },
                    {
                        // the rest of the columns
                    }
                ]
            }
        ]
    }
}

The error when doing a simple SELECT * query looks like the following:

Error while reading table: datareportingdataset.datareportingtable, error message: Error detected while parsing row starting at position: 2. Error: Data between close double quote (") and field separator.

First I suspected some data in Bigtable, but the error still occurs even after I deleted everything from the table. I found out that it must be something with the JSON file itself, because when I move the "sourceFormat" line down a few lines the reported error position changes. What am I doing wrong here?

Kris

2 Answers


I just reproduced your case and got the exact same error. It looks to me like when you run the bq mk command it isn't extracting any data at all.

As a workaround I would suggest running a Dataflow job to extract your data to Cloud Storage as Avro files and then loading them into a dataset in BigQuery.
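For the load step, once the Avro files are in Cloud Storage, something along these lines should work (the bucket path and target table name are just placeholders):

# load the exported Avro files into a native BigQuery table;
# Avro files embed their own schema, so no schema flag is needed
bq load \
  --source_format=AVRO \
  datareportingdataset.datareportingtable_native \
  "gs://somebucket/export/*.avro"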

Chris32

I think I found the problem as I was able to reproduce it. The error message is confusing but, as documented here:

You must create the JSON schema file manually, and it must be on your local machine. Referencing a JSON schema file stored in Cloud Storage or in Google Drive is not supported.

I did some tests with the Bigtable quickstart and it worked well for me:

bq query "SELECT * FROM DATASET.datareportingtable"
Waiting on JOB_ID ... (3s) Current status: DONE   
+--------+-----------------+---------------------------+-----------------------+
| rowkey | cf1_column_name | cf1_column_cell_timestamp | cf1_column_cell_value |
+--------+-----------------+---------------------------+-----------------------+
| r1     | c1              |       2019-10-15 18:15:04 | test-value            |
+--------+-----------------+---------------------------+-----------------------+
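For reference, the quickstart table was populated with cbt, roughly like this (instance and table names are the quickstart defaults, adjust them to your own):

# create a table with one column family and write a single test cell
cbt -instance quickstart-instance createtable my-table
cbt -instance quickstart-instance createfamily my-table cf1
cbt -instance quickstart-instance set my-table r1 cf1:c1=test-value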

The only thing I did differently was using a local path, as in:

--external_table_definition=big-table-definition.json

Changing this back to:

--external_table_definition=gs://$BUCKET/big-table-definition.json

And I got the same error:

bq query "SELECT * FROM DATASET.datareportingtable2"
Waiting on JOB_ID ... (0s) Current status: DONE   
BigQuery error in query operation: Error processing job 'PROJECT:JOB_ID': Error while reading table: DATASET.datareportingtable2, error message: Error detected while parsing row starting at
position: 2. Error: Data between close double quote (") and field separator.
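So the fix is to keep the definition file on your local machine, for example by copying it down first and then recreating the table (bucket and table names taken from your question):

# copy the table definition from Cloud Storage to the local machine
gsutil cp gs://somebucket/big-table-definition.json .

# recreate the external table using the local definition file
bq mk \
  --external_table_definition=big-table-definition.json \
  datareportingdataset.datareportingtable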
Guillem Xercavins