22

I have a Google Sheet in which a column may contain no data. While iterating through the rows and reading that column, nothing is returned when the cell is blank. Even worse, if I get a full row that includes that column, say 5 columns, I get back only 4 when one of the cells is empty. How do I get either NULL or an empty string for each empty cell when I read a row of columns?

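import com.google.api.services.sheets.v4.Sheets;
import com.google.api.services.sheets.v4.model.ValueRange;
import java.util.List;

// Build a new authorized API client service.
Sheets service = GoogleSheets.getSheetsService();
String range = "Functional Users!A3:E3";
ValueRange response = service.spreadsheets().values().get(spreadsheetId, range).execute();
List<List<Object>> values = response.getValues();
// First (and only) row of the requested range.
List<Object> cells = values.get(0);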

I am requesting 5 cells in the row, so cells.size() should ALWAYS return five. However, if any of the 5 cells are blank, fewer cells come back. Say only the cell at B3 is empty: cells.size() will be 4. On the next iteration I get A4:E4 and cell D4 is empty; again, cells.size() will be 4, with no way to know which cell is missing. If A4 AND D4 AND E4 are empty, cells.size() will be 2.

How do I get it to return 5 cells regardless of empty cells?

GregMa

10 Answers

14

I solved this by converting the values into a Pandas dataframe. I fetched the columns I wanted from Google Sheets, loaded them into a dataframe, did some data formatting, and then converted the dataframe back into a list. Converting the list to a dataframe preserves every column, because Pandas fills in null values for the empty trailing rows and columns that the API omits. However, I also needed to convert the non-trailing blanks to nulls to keep the data consistent.

import pandas as pd
from googleapiclient import discovery
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

# Authenticate and create the service for the Google Sheets API
credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)
http = credentials.authorize(Http())
discoveryUrl = 'https://sheets.googleapis.com/$discovery/rest?version=v4'
service = discovery.build('sheets', 'v4',
    http=http, discoveryServiceUrl=discoveryUrl)

spreadsheetId = 'id of your sheet'
rangeName = 'range of your dataset'
result = service.spreadsheets().values().get(
    spreadsheetId=spreadsheetId, range=rangeName).execute()
values = result.get('values', [])

#convert values into dataframe
df = pd.DataFrame(values)

#replace all non trailing blank values created by Google Sheets API
#with null values
df_replace = df.replace([''], [None])

#convert back to list to insert into Redshift
processed_dataset = df_replace.values.tolist()
Chase Wright
  • Converting to pandas dataframe is good idea. In my case, it was especially useful because the source data had merged columns making parsing even more complicated. – Gani Simsek Apr 27 '18 at 16:39
10

I've dabbled in Sheets API v4, and this is indeed the behavior when you read a range that contains empty cells. It seems to be by design. As stated in the Reading data docs:

Empty trailing rows and columns are omitted.

So one way around it is to write a placeholder that represents an empty value, such as zero, into those cells.

noogui
  • 12
    I cannot understand why this would be implemented like this. At the very least it should be an option. Why have the user specify the range and then return a potentially different range based on things the user may have no control over? I want a range, give me the range I asked for, no more no less, like every other API request to every other API everywhere, ever. – Milton Apr 25 '19 at 14:28
  • @Milton I think Google wants to save on the payload for the response. Agreed that it's silly, but it saves some bytes on the way down. – ariestav Apr 16 '21 at 01:45
4

I experienced the same issue with v4 of the Sheets API but was able to work around it by adding an extra column at the end of my range and using the valueRenderOption argument of the values.get API.

Given three columns, A, B and C, any of which might contain a null value, add an additional column, D, and fill it with an arbitrary value such as 'blank'.

Ensure you capture the new column in your range and add the additional parameter,

valueRenderOption: 'FORMATTED_VALUE'.

You should end up with a call similar to this:

sheets.spreadsheets.values.get({
  spreadsheetId: SOME_SHEET_ID,
  range: "AUTOMATION!A:D",
  valueRenderOption: 'FORMATTED_VALUE'
}, (err, res) => {})

This should then give you an array of consistent length for each row, with a blank string "" in place of each empty cell.
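
For Python users, the same request might look like the sketch below. It assumes `service` is a Sheets API client built with google-api-python-client, as in the Pandas answer above; `fetch_rows` is a hypothetical helper name, and the spreadsheet ID and range are placeholders you supply.

```python
def fetch_rows(service, spreadsheet_id, range_name):
    """Read a range whose last column is a sentinel that is always filled in."""
    result = service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id,
        range=range_name,
        valueRenderOption='FORMATTED_VALUE',
    ).execute()
    # The 'values' key is absent when the whole range is empty.
    return result.get('values', [])
```

Called as `fetch_rows(service, SOME_SHEET_ID, "AUTOMATION!A:D")`, every returned row should then include the sentinel from column D as its last element.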

James
  • 1
    I did not get the use of the additional column until I had empty cells in the last columns... but now I do ;-). Specifying `valueRenderOption=FORMATTED_VALUE` now gives me some u'' in the lists, which is exactly what I needed! You saved my day. Thanks – Christophe Muller Jun 13 '19 at 15:54
2

If you pull a range from the Google Sheets API v4, empty row data IS included if it's at the beginning or in the middle of the selected range. Only cells with no data at the end of the range are omitted. Based on this, you can 'fill' the no-data cells in your app code.

For instance, if you selected A1:A5 and A1 has no value, it will still be returned in the row data as {}.

If A5 is missing, you'll have an array of length 4 and so know to fill the empty A5. If A4 and A5 are empty, you'll have an array of length 3, and so on.

If none of the range contains data, you'll receive an empty object.
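
The fill step described above can be sketched as follows. This is a minimal illustration that assumes the cells have already been collected into a plain Python list; `fill_trailing` is a hypothetical helper, not part of the Sheets client.

```python
def fill_trailing(cells, expected, fill=''):
    """Pad a list of cells to `expected` entries.

    Only trailing cells can be missing, so the padding always goes at the end.
    """
    missing = expected - len(cells)
    return list(cells) + [fill] * max(missing, 0)


# A1:A5 where A1 is empty and A4, A5 were dropped from the response.
print(fill_trailing(['', 'b', 'c'], 5))   # ['', 'b', 'c', '', '']

# Nothing in the range had data.
print(fill_trailing([], 5))               # ['', '', '', '', '']
```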

1

I know this is super late, but in case someone else runs into this problem, here is how I worked around it. I extended the range of cells I was reading by one column. Then, in the Google Spreadsheet I was reading from, I filled that extra column with "."s. I protected that column of periods so that it can't be changed. This gives you an array with everything you are looking for, including the empty cells, but it does increase your array size by 1. If that bothers you, you can make a new array without the last index.
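
Dropping the extra column afterwards might look like this. It is a sketch that assumes the rows are lists of strings ending in the "." sentinel; `drop_sentinel` is a hypothetical name.

```python
def drop_sentinel(rows, sentinel='.'):
    """Remove the trailing sentinel cell from every row."""
    cleaned = []
    for row in rows:
        # A row without the sentinel means the protected column was not read.
        if not row or row[-1] != sentinel:
            raise ValueError('row does not end with the sentinel: %r' % (row,))
        cleaned.append(row[:-1])
    return cleaned


rows = [['a', '', 'c', '.'], ['', '', '', '.']]
print(drop_sentinel(rows))   # [['a', '', 'c'], ['', '', '']]
```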

0

The only solution I could find is writing your own function:

def _safe_get(data, r, c):   
    try:
        return data[r][c]
    except IndexError:
        return ''

def read(range_name, service):
    result = service[0].spreadsheets().values().get(spreadsheetId=service[1],
                                                range=range_name).execute()
    return result.get('values', [])

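def safe_read(sheet, row, col, to_row='', to_col='', service=None):
    # Build an A1 range such as 'Sheet1!A3:E7'; to_row/to_col may be left empty.
    range_name = '%s!%s%i:%s%s' % (sheet, col, row, to_col, to_row)
    data = read(range_name, service)

    if to_col == '':
        # Open-ended column range: use the widest row that came back.
        cols = max((len(line) for line in data), default=0)
    else:
        # Only single-letter column names (A-Z) are handled here.
        cols = ord(to_col.lower()) - ord(col.lower()) + 1
    if to_row == '':
        rows = len(data)
    else:
        rows = to_row - row + 1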

    return [[_safe_get(data, r, c)
             for c in range(cols)]
            for r in range(rows)]
Davoud Taghawi-Nejad
0

If the last cell in a row has a value, the row is returned in full. For example:

Rows:

|Nick|29 years|Minsk|
|Mike|        |Pinsk|
|Boby|        |     |

Return:

[
  ["Nick", "29 years", "Minsk"],
  ["Mike", "", "Pinsk"],
  ["Boby"]
]

So when you add a new row that ends with empty cells, put a space " " in the last cell instead of leaving it empty ("" or null).

Then, when you read the values, just map every space " " back to an empty string "".

Rows:

|Nick|29 years|Minsk|
|Mike|        |Pinsk|
|Boby|        |" "  |

Return:

[
  ["Nick", "29 years", "Minsk"],
  ["Mike", "", "Pinsk"],
  ["Boby", "", " "]
]
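
The mapping step on the reading side could be sketched like this in Python. `blank_placeholders` is a hypothetical helper, and it assumes the placeholder is exactly one space.

```python
def blank_placeholders(rows, placeholder=' '):
    """Replace cells that hold only the placeholder with empty strings."""
    return [['' if cell == placeholder else cell for cell in row]
            for row in rows]


rows = [
    ['Nick', '29 years', 'Minsk'],
    ['Mike', '', 'Pinsk'],
    ['Boby', '', ' '],
]
print(blank_placeholders(rows))
# [['Nick', '29 years', 'Minsk'], ['Mike', '', 'Pinsk'], ['Boby', '', '']]
```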
NickUnuchek
0

I am super late to the party, but here goes another alternative:

import pandas as pd

def read_sheet(service, spreadsheet_id, range_name) -> pd.DataFrame:
    result = service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id, range=range_name).execute()
    rows = result.get('values', [])

    # Load without column names, then promote the first row to the header.
    df = pd.DataFrame(rows)
    df.columns = df.iloc[0]
    df = df.drop(axis=0, index=0)
    return df

For this solution to work, every column of the spreadsheet you want to read needs a header (column name). It loads a pandas df without specifying headers, replaces the column names with the first row, and then drops that row.

-1

Just add:

values.add("");

before:

cells = values.get(0);

This ensures that you do not query an empty list because of a blank cell or row.

-1

Another option is to iterate through the returned rows, check the length of each row, and append placeholder values for whatever cells you expected but did not get. I found this preferable to adding junk data to my dataset.
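
A rough sketch of that approach, with the expected width derived from the requested range, is shown below. All three helper names are hypothetical, and the range parser only handles bounded A1 ranges such as the question's "Functional Users!A3:E3".

```python
import re


def column_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index


def expected_width(a1_range):
    """Number of columns spanned by a range like 'Sheet!A3:E3'."""
    cells = a1_range.split('!')[-1]
    match = re.fullmatch(r'([A-Za-z]+)\d*:([A-Za-z]+)\d*', cells)
    if match is None:
        raise ValueError('unsupported range: %r' % a1_range)
    return column_index(match.group(2)) - column_index(match.group(1)) + 1


def pad_rows(rows, width, fill=''):
    """Append `fill` to every row that is shorter than `width`."""
    return [list(row) + [fill] * (width - len(row)) for row in rows]


width = expected_width('Functional Users!A3:E3')   # 5
print(pad_rows([['a', '', 'c'], ['a']], width))
# [['a', '', 'c', '', ''], ['a', '', '', '', '']]
```

Because only trailing cells are omitted, padding at the end of each row keeps every value in its original column.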