It seems to me that I don't completely understand the concept of FLOPS. In the CUDA samples, there is a matrix multiplication example (0_Simple/matrixMul). In this example, the number of FLOPs (floating-point operations) per matrix multiplication is calculated via the formula:
double flopsPerMatrixMul = 2.0 * (double)dimsA.x * (double)dimsA.y * (double)dimsB.x;
So this means that, in order to multiply a matrix A (n x m) by a matrix B (m x k), we need to perform 2*n*m*k floating-point operations.
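If I understand the sample correctly, this count is then divided by the measured kernel time to report GFLOP/s. Here is a minimal, self-contained sketch of that calculation; the dimensions, the timing value, and the struct are hypothetical stand-ins of mine, not necessarily the sample's:

#include <cstdio>

// Hypothetical stand-in for the matrix dimensions used in the sample
// (in the sample, A is dimsA.y rows by dimsA.x columns).
struct Dims { double x, y; };

int main() {
    Dims dimsA{320.0, 320.0};       // hypothetical sizes, not the sample's defaults
    Dims dimsB{640.0, 320.0};       // dimsB.y must equal dimsA.x for A*B to exist
    double msecPerMatrixMul = 1.5;  // hypothetical measured kernel time in milliseconds

    // The FLOP count from the sample's formula.
    double flopsPerMatrixMul = 2.0 * dimsA.x * dimsA.y * dimsB.x;

    // FLOPs divided by seconds, scaled to GFLOP/s.
    double gigaFlops = (flopsPerMatrixMul * 1.0e-9) / (msecPerMatrixMul / 1000.0);
    std::printf("Performance = %.2f GFLOP/s\n", gigaFlops);
    return 0;
}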
However, in order to calculate one element of the resulting matrix C (n x k), one has to perform m multiplications and (m-1) additions. So the total number of operations (to calculate all n*k elements) is m*n*k multiplications and (m-1)*n*k additions.
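To make my counting concrete, here is a minimal CPU sketch of the naive algorithm with explicit counters (plain C++, not the CUDA kernel from the sample); it confirms m*n*k multiplications and (m-1)*n*k additions:

#include <cstdio>
#include <vector>

// Naive matrix multiplication C = A * B with explicit FLOP counters.
// A is n x m, B is m x k, C is n x k (row-major).
int main() {
    const int n = 4, m = 3, k = 5;  // small sizes, just for the check
    std::vector<double> A(n * m, 1.0), B(m * k, 1.0), C(n * k, 0.0);

    long long mults = 0, adds = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < k; ++j) {
            double sum = A[i * m + 0] * B[0 * k + j];  // first product, no addition yet
            ++mults;
            for (int p = 1; p < m; ++p) {
                sum += A[i * m + p] * B[p * k + j];    // one multiply and one add
                ++mults;
                ++adds;
            }
            C[i * k + j] = sum;
        }
    }

    std::printf("mults = %lld (m*n*k = %d)\n", mults, m * n * k);
    std::printf("adds  = %lld ((m-1)*n*k = %d)\n", adds, (m - 1) * n * k);
    return 0;
}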
Of course, we could count the additions as m*n*k as well (for example, if the accumulator starts at zero, each of the m inner iterations performs one addition), and the total number of operations then becomes 2*n*m*k, half of them multiplications and half additions.
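With the accumulator initialized to zero, the same sketch counts exactly two operations per inner iteration:

#include <cstdio>
#include <vector>

// Same check as above, but with sum starting at zero: every one of the m
// inner iterations now performs one multiplication and one addition.
int main() {
    const int n = 4, m = 3, k = 5;
    std::vector<double> A(n * m, 1.0), B(m * k, 1.0), C(n * k, 0.0);

    long long ops = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < k; ++j) {
            double sum = 0.0;
            for (int p = 0; p < m; ++p) {
                sum += A[i * m + p] * B[p * k + j];  // 1 multiply + 1 add
                ops += 2;
            }
            C[i * k + j] = sum;
        }

    std::printf("ops = %lld, 2*n*m*k = %d\n", ops, 2 * n * m * k);
    return 0;
}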
But, I guess, multiplication is more computationally expensive than addition. Why are these two types of operations lumped together? Is this always the case in computer science? How can one take two different types of operations into account?
Sorry for my English)