0

as per example:

     A       B      C     D     E     F     G     ∞
  |======|=======|=====|=====|=====|=====|=====|=====
1 |      |AVERAGE|     |     |     |     |     |        
  |======|=======|=====|=====|=====|=====|=====|=====
2 | xx 1 |       |   1 |   2 | 0.5 |  10 |     |        
  |======|=======|=====|=====|=====|=====|=====|=====
3 | xx 2 |       |   7 |   1 |     |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
4 |      |       |   0 |     |     |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
5 | xx 3 |       |   9 |   8 |   7 |   6 |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
6 | xx 4 |       |   0 |   1 |   2 |   1 |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
7 |      |       |   1 |     |   4 |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
8 | xx 5 |       |     |     |     |     |     |       
  |======|=======|=====|=====|=====|=====|=====|=====
9 |      |       |     |     |     |     |   5 |           
  |======|=======|=====|=====|=====|=====|=====|=====
∞ |      |       |     |     |     |     |     |       

what's the most optimal way of getting AVERAGE for every valid row in the dynamic sense of terms (unknown quantity of rows & unknown quantity of columns) ?

player0
  • 69,261
  • 8
  • 33
  • 67

5 Answers5

4

QUERY

level 1:

if all 5 cells in range C2:G have values:

=QUERY(QUERY(C2:G, "select (C+D+E+F+G)/5"), "offset 1", )

enter image description here

if not, then rows are skipped:

enter image description here

if empty cells are considered as zeros:

=INDEX(QUERY(QUERY({C2:G*1}, "select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))

enter image description here

to remove zero values we use IFERROR(1/(1/...)) wrapping:

=INDEX(IFERROR(1/(1/QUERY(QUERY({C2:G*1}, 
 "select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))))

enter image description here

to make Col references dynamic we can do:

=INDEX(IFERROR(1/(1/QUERY(QUERY({C2:G*1}, 
 "select "&
 "("&JOIN("+", "Col"&ROW(INDIRECT("1:"&COLUMNS(C:G))))&")/"&COLUMNS(C:G)), 
 "offset 1", ))))

enter image description here


level 2:

if empty cells are not considered as zeros and shouldn't be skipped:

=INDEX(TRANSPOSE(QUERY(TRANSPOSE(E2:I), 
 "select "&TEXTJOIN(",", 1, IF(A2:A="",,
 "avg(Col"&ROW(A2:A)-ROW(A2)+1&")")))),, 2)

enter image description here

note that this is column A dependant, so missing values in column A will offset the results

fun fact !! we can swap avg to max or min:

enter image description here

to free it from confinement of column A and make it work for any valid row:

=INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
 IF(TRIM(TRANSPOSE(QUERY(TRANSPOSE(C2:G),,9^9)))="", C2:G*0, C2:G)), 
 "select "&TEXTJOIN(",", 1, 
 "avg(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)

enter image description here

if present 0's in range shouldn't be averaged we can add a small IF statement:

=INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
 IF(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
 IF(C2:G>0, C2:G, )),,9^9)))="", C2:G*0, 
 IF(C2:G>0, C2:G, ))), 
 "select "&TEXTJOIN(",", 1, 
 "avg(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)

enter image description here

here we used so-called "vertical query smash" which takes all values in a given range and concentrates it to one single column, where all cells per each row are joined with empty space as a byproduct:

=FLATTEN(QUERY(TRANSPOSE(C2:G),,9^9))

enter image description here

apart from this, there is also "horizontal query smash":

=QUERY(C2:G,,9^9)

enter image description here

and also "ultimate 360° double query smash" which puts all cells from range into one single cell:

=QUERY(FLATTEN(QUERY(TRANSPOSE(C2:G),,9^9)),,9^9)

enter image description here

and finally "the infamous negative 360° reverse double query smash" which prioritizes columns over rows:

=QUERY(FLATTEN(QUERY(C2:G,,9^9)),,9^9)

enter image description here

all query smash names are copyrighted of course

back to the topic... as mentioned above all cells per row in range are joined with empty space even those empty ones, so we got a situation where we getting double or multiple spaces between values. to fix this we use TRIM and introduce a simple IF statement to assign 0 values for empty rows in a given range eg. to counter the offset:

enter image description here


MMULT

level 3:

MMULT is a kind of heavy class formula that is able to perform addition, subtraction, multiplication, division even running total on arrays/matrixes... however, bigger the dataset = slower the formula calculation (because in MMULT even empty rows take time to perform + - × ÷ operation) ...unless we use truly dynamic range infinite in both directions...

to get the last row with values of a given range:

=INDEX(MAX(IF(TRIM(FLATTEN(QUERY(TRANSPOSE(
 INDIRECT("C2:"&ROWS(A:A))),,9^9)))="",,ROW(A2:A))))

enter image description here

to get the last column with values of a given range:

=INDEX(MAX(IF(TRIM(QUERY(INDIRECT("C2:"&ROWS(A:A)),,9^9))="",,COLUMN(C2:2))))

enter image description here

now we can construct it in a simple way:

=INDIRECT("C2:"&ADDRESS(9, 7))

which is the same as:

=INDEX(INDIRECT("C2:"&ADDRESS(MAX(IF(TRIM(FLATTEN(QUERY(TRANSPOSE(
 INDIRECT("C2:"&ROWS(A:A))),,9^9)))="",,ROW(A2:A))), 
 MAX(IF(TRIM(QUERY(INDIRECT("C2:"&ROWS(A:A)),,9^9))="",,COLUMN(C2:2))))))

enter image description here

or shorter alternative:

=INDEX(INDIRECT("C2:"&ADDRESS(
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)), 
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2)))))

enter image description here

therefore simplified MMULT formula would be:

=ARRAYFORMULA(IFERROR(
 MMULT(N(   C2:G9),           ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)/
 MMULT(N(IF(C2:G9<>"", 1, )), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)))

enter image description here

in case we want to exclude zero values from range, the formula would be:

=ARRAYFORMULA(IFERROR(
 MMULT(N(   C2:G9),         ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)/
 MMULT(N(IF(C2:G9>0, 1, )), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)))

enter image description here

level 4:

putting together all above to make it infinitely dynamic and still restricted to valid dataset:

=INDEX(IFERROR(
 MMULT(N(   INDIRECT("C2:"&ADDRESS(
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)), 
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))))),           ROW(INDIRECT("C1:C"&
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))-(COLUMN(C2)-1)))^0)/
 MMULT(N(IF(INDIRECT("C2:"&ADDRESS(
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)), 
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))))<>"", 1, )), ROW(INDIRECT("C1:C"&
 MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))-(COLUMN(C2)-1)))^0)))

enter image description here

again, not including cells with zeros in range:

enter image description here


honorable mentions:

@Erik Tyler level:

the polar opposite of the previous formula would be to run the MMULT on

  • total area of C2:? (all rows, all columns) instead of
  • valid area C2:? (excluding empty rows and columns) which avoids mass-calculations of 0 × 0 = 0

including zeros:

=INDEX(IFERROR(
 MMULT(   INDIRECT("C2:"&ROWS(C:C))*1,         SEQUENCE(COLUMNS(C2:2))^0)/ 
 MMULT(IF(INDIRECT("C2:"&ROWS(C:C))<>"", 1)*1, SEQUENCE(COLUMNS(C2:2))^0)))

enter image description here

excluding zeros:

=INDEX(IFERROR(
 MMULT(   INDIRECT("C2:"&ROWS(C:C))*1,       SEQUENCE(COLUMNS(C2:2))^0)/ 
 MMULT(IF(INDIRECT("C2:"&ROWS(C:C))>0, 1)*1, SEQUENCE(COLUMNS(C2:2))^0)))

0

@kishkin level:

for a fixed range C2:G9 the MMULT average would be:

=INDEX(IFERROR(
 MMULT( C2:G9*1,    FLATTEN(COLUMN(C:G))^0)/ 
 MMULT((C2:G9>0)*1, FLATTEN(COLUMN(C:G))^0)))

enter image description here

=INDEX(IFNA(VLOOKUP(ROW(C2:C), 
 QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&C2:J), "×"),
 "select Col1,avg(Col2)
  where Col2 is not null
  group by Col1"), 2, )))

enter image description here

@MattKing level:

=INDEX(QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&OFFSET(C2,,,9^9, 9^9)), "×"),
 "select avg(Col2) 
  group by Col1  
  label avg(Col2)''"))

enter image description here

excluding zeros:

=INDEX(QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&OFFSET(C2,,,9^9, 9^9)), "×"),
 "select avg(Col2)
  where Col2 <> 0 
  group by Col1  
  label avg(Col2)''"))

including empty cells:

=INDEX(IFERROR(1/(1/QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&OFFSET(C2,,,9^9, 9^9)*1), "×"),
 "select avg(Col2)
  group by Col1  
  label avg(Col2)''"))))
player0
  • 69,261
  • 8
  • 33
  • 67
  • A few questions if you've got time about `=INDEX(MAX(IF(TRIM(FLATTEN(QUERY(TRANSPOSE( INDIRECT("C2:"&ROWS(A:A))),,9^9)))="",,ROW(A2:A))))`. 1. Is `INDIRECT(...)` somehow better than `OFFSET(C2, 0, 0, ROWS(C2:C), COLUMNS(C2:2))`? 2. `FLATTEN` here just transposes the result of `QUERY`, why not use `TRANSPOSE`? – kishkin Dec 24 '20 at 16:40
  • @kishkin sure. `FLATTEN` has fewer characters than `TRANSPOSE` :D same with `INDEX` vs. `ARRYFORMULA`. as MattKing mentioned a few weeks ago `FLATTEN` gets to stay in GS for good so it's official fx now. as for the `OFFSET(...)` thats also longer then `INDIRECT(...)` - there is no other reason to it. its just a short alternative – player0 Dec 24 '20 at 22:32
  • thank you for the explanations! And a great answer in every way! A few notes: there is no need in `IF` in `IF(C2:G9>0, 1)*1` (and alike), could be just `(C2:G9>0)*1`; 2. You do not filter out inner empty columns and rows in case there are some :). Could be done with `VLOOKUP(ROW(B2:B), {non_empty_row_numbers, avg_on_totally_filtered_out_range}, 2, 0)`. Or is it too much? – kishkin Dec 25 '20 at 10:19
  • @kishkin indeed, that makes sense. and for the `VLOOKUP`... it didnt cross my mind... I was more focused on finding the outer boundaries. anyway, its an interesting approach, but makes me wonder how it would stand speed-wise on some huge dataset with 20k+ of rows. I shall test it one day for sure. – player0 Dec 25 '20 at 15:40
2

You put a ton of time into this. I hope people appreciate it, more so that you did it for everyone else and not for yourself.

Looking at your final formulas, these should produce the same results (give data in C2:? as in your examples):

In B2 (include zeros):

=ArrayFormula(IFERROR(MMULT(INDIRECT("C2:"&ROWS(C:C))*1,SEQUENCE(COLUMNS(C1:1),1,1,0))/ MMULT(IF(INDIRECT("C2:"&ROWS(C:C))<>"",1,0),SEQUENCE(COLUMNS(C1:1),1,1,0))))

In B2 (exclude zeros):

=ArrayFormula(IFERROR(MMULT(INDIRECT("C2:"&ROWS(C:C))*1,SEQUENCE(COLUMNS(C1:1),1,1,0))/ MMULT(IF(INDIRECT("C2:"&ROWS(C:C))<>0,1,0),SEQUENCE(COLUMNS(C1:1),1,1,0))))

Erik Tyler
  • 3,779
  • 1
  • 2
  • 7
  • tho that's an interesting formula-size reduction, the computed matrixes will include the total area of `C2:?(all rows, all columns)` instead of valid area `C2:?(excluding empty rows and columns)` eg. avoiding mass-calculations of 0 × 0 = 0 – player0 Dec 24 '20 at 15:51
  • Running either in a grid of 5000 rows by 50 columns takes less than a second to calculate. The likelihood that people will be trying to calculate that large a range of numbers is very small. And, of course, one can "rein in" the applicable field with the addition of a character or two (e.g., changing C1:1 to C1:M1, C:C to C2:C500, etc.) to reflect their actual maximum known range to be processed. In real-world use, then, these would work with or without additions to the formulas. – Erik Tyler Dec 24 '20 at 17:01
  • However, I see all of these formulas as an opportunity for people to learn and try things more so than a "best practices." And your thorough work to explain functions, whys and wherefores will go a long way if people want to put the time into breaking them down, because you've given great context. – Erik Tyler Dec 24 '20 at 17:03
  • agreed. sadly, a year from now this will get like ~200 views xD – player0 Dec 25 '20 at 14:35
2

UPDATE: I've updated the formula from my original post. The ROW() should always come first so that missing values in the data don't throw off the split.

=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(ROW(C2:C)&"|"&OFFSET(C2,,,9^9,9^9)),"|"),"select AVG(Col2) group by Col1 label AVG(Col2)''"))

Should work unless I'm misunderstanding the question.

No need for vlookups or mmults or filters or anything.

MattKing
  • 3,748
  • 3
  • 10
  • nice one. tho this will offset the final output if some row contains all-empty cells. also, I needed to extend it with `where Col2 is not null` coz first value was messed up - https://i.stack.imgur.com/9EScK.png – player0 Feb 16 '21 at 21:47
  • @player0 it will not offset the final output with all empty cells in a whole row. I've tested it a bunch. The only thing that's required is that a majority of the values not be blank. – MattKing Feb 16 '21 at 22:44
  • and the whole point is to NOT use "where Col2 is not null" – MattKing Feb 16 '21 at 22:45
  • I copy-pasted the formula from your answer: https://i.stack.imgur.com/RA5Hj.png as you can see its offset by 1 -3 rows, the first value is not right, and the majority of cells are not empty. red column A contains standard dragged `AVERAGE(C2:2)` fx for comparison. what did I mess up? – player0 Feb 16 '21 at 23:16
  • 1
    @MattKing you forgot a few commas at the end of `SPLIT`. Otherwise (in case of empty original data cells) you sometimes get row numbers in the 1st column because `SPLIT` removes empty results. https://i.imgur.com/xECBRWs.png – kishkin Feb 17 '21 at 11:42
  • 1
    @kishkin ah, you're write. When i first did it, I'd written the row number first so i didn't need those extra commas. (because ROW() is never empty) I'll change my answer now. Hopefully that also clears things up for player0 – MattKing Feb 18 '21 at 15:22
  • 1
    @player0 updated formula. I'd forgotten that you need the ROW() to be first. Try it again – MattKing Feb 18 '21 at 15:24
  • yes, all good now, thx – player0 Feb 18 '21 at 18:42
  • @player0 will I get my own "level" in the original post :D ? – MattKing Feb 18 '21 at 20:21
  • ofc you will :D its unavoidable :) – player0 Feb 19 '21 at 00:36
1

I will try to make a little addition to @player0's answer. And I will really appreciate any comments on optimizing this.


In case there is a lot of empty rows and columns inside the data range those might as well be excluded from MMULT.

Step 1 - Filter out empty rows

We've got a data range: from C2 down to the last row and right to the last column (which is J:J). I will use C2:K, see details below for explanation.

This formula will give us an array of row numbers where there is at least one non empty cell. Also it will have a 0 if there are empty rows, but it won't matter for searching in this array, or we will filter it out when it does matter:

=ARRAYFORMULA(
  UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K)))
)

enter image description here

So, to filter out empty rows from the data range we use FILTER which will check if a row is in our array from above and leave if be in that case:

=ARRAYFORMULA(
  FILTER(
    C2:K*1,
    MATCH(
      ROW(C2:K),
      UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
      0
    )
  )
)

Step 2 - Filter out empty columns

To get an array of only non-empty column numbers we can use almost the same formula:

=ARRAYFORMULA(
  UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2))))
)

enter image description here

Why SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)) is used instead of COLUMN(C2:K) see details at the end.

To filter out empty columns we also use FILTER with MATCH condition to search for column numbers in our array:

=ARRAYFORMULA(
  FILTER(
    C2:K*1,
    MATCH(
      SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
      UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
      0
    )
  )
)

And to filter out empty rows and empty columns we just use two FILTERs:

=ARRAYFORMULA(
  FILTER(
    FILTER(
      C2:K*1,
      MATCH(
        ROW(C2:K),
        UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
        0
      )
    ),
    MATCH(
      SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
      UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
      0
    )
  )
)

Original data range will internally become:

enter image description here

Step 3 - Do the MMULT

Now we can use MMULT with that data set to calculate average:

=ARRAYFORMULA(
  MMULT(
    FILTER(
      FILTER(
        C2:K*1,
        MATCH(
          ROW(C2:K),
          UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
          0
        )
      ),
      MATCH(
        SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
        UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
        0
      )
    ),
    SEQUENCE(
      ROWS(
        QUERY(
          UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
          "WHERE Col1 <> 0"
        )
      ),
      1,
      1,
      0
    )
  ) /
  MMULT(
    FILTER(
      FILTER(
        (C2:K <> "")*1,
        MATCH(
          ROW(C2:K),
          UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
          0
        )
      ),
      MATCH(
        SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
        UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
        0
      )
    ),
    SEQUENCE(
      ROWS(
        QUERY(
          UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
          "WHERE Col1 <> 0"
        )
      ),
      1,
      1,
      0
    )
  )
)

enter image description here

It is a bit off regarding original data rows.

Step 4 - Fill the AVERAGE column

To make averages consistent with the original data rows we can use VLOOKUP like this:

=ARRAYFORMULA(
  IFNA(VLOOKUP(
    SEQUENCE(MAX((C2:K <> "") * ROW(C2:K)) - 1, 1, ROW(C2)),
    {
      QUERY(UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))), "WHERE Col1 <> 0"),
      MMULT(
        ...
      ) /
      MMULT(
        ...
      )
    },
    2,
    0
  ))
)

Where

  • SEQUENCE(MAX((C2:K <> "") * ROW(C2:K)) - 1, 1, ROW(C2)) is an array of row numbers from the 2nd one to the last none-empty one. We won't be filling all the rows down with empty strings.
  • QUERY(UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))), "WHERE Col1 <> 0") is an array of non-empty row numbers with that 0 filtered out used as keys for search.
  • IFNA will return an empty string to put alongside an empty data row.

FINAL FORMULA

Putting it all together:

=ARRAYFORMULA(
  IFNA(VLOOKUP(
    SEQUENCE(MAX((C2:K <> "") * ROW(C2:K)) - 1, 1, ROW(C2)),
    {
      QUERY(UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))), "WHERE Col1 <> 0"),
      MMULT(
        FILTER(
          FILTER(
            C2:K*1,
            MATCH(
              ROW(C2:K),
              UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
              0
            )
          ),
          MATCH(
            SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
            UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
            0
          )
        ),
        SEQUENCE(
          ROWS(
            QUERY(
              UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
              "WHERE Col1 <> 0"
            )
          ),
          1,
          1,
          0
        )
      ) /
      MMULT(
        FILTER(
          FILTER(
            (C2:K <> "")*1,
            MATCH(
              ROW(C2:K),
              UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
              0
            )
          ),
          MATCH(
            SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
            UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
            0
          )
        ),
        SEQUENCE(
          ROWS(
            QUERY(
              UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
              "WHERE Col1 <> 0"
            )
          ),
          1,
          1,
          0
        )
      )
    },
    2,
    0
  ))
)

enter image description here


A few details

  • INDEX could be used instead of ARRAYFORMULA for brevity (thanks @player0, taught me that a few months ago), but I like unambiguity of ARRAYFORMULA.
  • I use SEQUENCE to construct a column or a row of 1s to be explicit, for clarity. For example, this one
SEQUENCE(
  ROWS(
    QUERY(
      UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
      "WHERE Col1 <> 0"
    )
  ),
  1,
  1,
  0
)

could be replaced with

SIGN(
  QUERY(
    UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
    "WHERE Col1 <> 0"
  )
)

which is a bit shorter. There is also a way demonstrated here by @player0 of raising to the power of 0:

QUERY(
  UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
  "WHERE Col1 <> 0"
)^0

but (it is just my speculation) I think SEQUENCE's internal implementation should be simpler then the operation of raising to a power.

  • I use range C2:K which is one column more than there actually exist on the sheet. Not only it gives a range of all the columns to the right of C2 and all the rows down from it, but it also updates in case of adding another column to the right of the sheet: a demo. Though it does not get to be highlighted. This C2:K can almost perfectly (there will be a problem in case there is actually ZZZ column present on a sheet) replace those approaches:
INDIRECT("C2:" & ROWS(C:C))

OFFSET(C2,,, ROWS(C2:C), COLUMNS(C2:2))
  • There is a small drawback in using C2:K: =ARRAYFORMULA(COLUMN(C2:K)) will return an array of column numbers even for non-existing ones, so we need to use =SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)) instead.
kishkin
  • 3,579
  • 1
  • 20
  • 32
1

I think there is a simple answer for row-wise average using VLOOKUP and QUERY.

This one is in B2:

=ARRAYFORMULA(
  IFNA(
    VLOOKUP(
      ROW(B2:B),
      QUERY(
        {
          FLATTEN(ROW(C2:J) + SEQUENCE(1, COLUMNS(C2:J),,)),
          FLATTEN(C2:J)
        },
        "SELECT Col1, AVG(Col2)
         WHERE Col2 IS NOT NULL
         GROUP BY Col1"
      ),
      2,
      0
    )
  )
)

enter image description here

  • This could be easily changed for max, min, sum, count - just change aggregation function inside QUERY statement.
  • Same approach could be used for column-wise aggregation.
  • FLATTEN(C2:J) could be changed to:
    • FLATTEN(--C2:J) to treat empty cells as 0s;
    • FLATTEN(IFERROR(1/(1/C2:J))) to exclude 0s from average.
  • If there are no intermediate empty rows, VLOOKUP could be removed from the formula, as well as Col1 from SELECT statement.
  • There's a shorter version (thanks @MattKing!) without VLOOKUP and WHERE Col...:
=ARRAYFORMULA(
  QUERY(
    {
      FLATTEN(ROW(C2:J) + SEQUENCE(1, COLUMNS(C2:J),,)),
      FLATTEN(IFERROR(1/(1/C2:J)))
    },
    "SELECT AVG(Col2)
     GROUP BY Col1
     LABEL AVG(Col2) ''"
  )
)

enter image description here

I use C2:J range having columns up to I:I, some details on that:

  • Range C2:J which is one column more than there actually exist on the sheet. Not only it gives a range of all the columns to the right of C2 and all the rows down from it, but it also updates in case of adding another column to the right of the sheet: a demo. Though it does not get to be highlighted. This C2:J can almost perfectly (there will be a problem in case there is actually ZZZ column present on a sheet) replace those approaches:
INDIRECT("C2:" & ROWS(C:C))

OFFSET(C2,,, ROWS(C2:C), COLUMNS(C2:2))
  • There is a small drawback in using C2:J: =ARRAYFORMULA(0 * COLUMN(C2:J)) will return an array of column numbers even for non-existing ones (multiplied by 0), so we need to use =SEQUENCE(1, COLUMNS(C2:J),,) instead.

@player0, any thoughts on this?

kishkin
  • 3,579
  • 1
  • 20
  • 32
  • I like it indeed – player0 Jan 04 '21 at 13:30
  • @player0 thanks! – kishkin Jan 04 '21 at 13:43
  • I made it ~30ish characters shorter and without so much math if you wanna check: https://docs.google.com/spreadsheets/d/1cCdYjcJ951A61fARAtjgbBg2oo2Umvz9skhAsm3vaKA/edit#gid=0 – player0 Jan 04 '21 at 15:26
  • 1
    @player0 Nice! Removing **all** the `\s` [is fun](https://i.imgur.com/JgpLO7S.png)! :) And btw `\s` includes `\n`. – kishkin Jan 04 '21 at 22:17
  • @player0 I use math (you mean `+` and `SEQUENCE`, right?) to be explicit, and I try to avoid join-split (50k chars limit, cannot use ♥ in my data). Also there is an adjacent more complicated problem I solved with `MOD` and `SEQUENCE`, not sure about any simpler solution. A row wise sum with column step, [2nd sheet](https://docs.google.com/spreadsheets/d/1nJ6lDuJrOKaHZDg3tJBP6FTZzldTknnAT2aPyNroLNc/edit#gid=2016971837) in `A3` if curious. – kishkin Jan 04 '21 at 22:24
  • 1
    50k limit is for JOIN, TEXTJOIN, TRIM, REGEXREPLACE, SUBSTITUTE, etc, but not if you use `&` or `QUERY` - in such cases, there is no limit. yes for the math part... sometimes it feels that math is slowing down the calculation on extremely huge datasets even if itts simple like 0*1. interesting MOD usage you got there – player0 Jan 04 '21 at 22:35
  • @kishkin like OFFSET(C2,,,9^9,9^9) for infinite 2d ranges instead of needing to refer to the columns() and rows() – MattKing Feb 16 '21 at 16:01
  • @MattKing don't you like `C2:J` instead of `OFFSET(C2,,,9^9,9^9)`? :) Also, thank you for the idea of not using `WHERE` which removes the need to use `VLOOKUP`. – kishkin Feb 17 '21 at 11:49