Something weird is going on here. I'm assuming $g: \mathbb R^m \to \mathbb R$ and that $A$ is an $m\times n$ matrix. Let $a: \mathbb R^n \to \mathbb R^m$, $x \mapsto Ax + b$ (with $b \in \mathbb R^m$), be the corresponding affine transformation, so that $f = g \circ a$. The chain rule says $Df(x) = Dg(a(x))\, Da(x)$.
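
To make the index bookkeeping explicit, here is the chain rule written out in coordinates. This is a routine expansion; $y_1,\dots,y_m$ is just my notation for the coordinates on $\mathbb R^m$. Since $a_i(x) = \sum_{k=1}^n A_{ik} x_k + b_i$, we have $\partial a_i / \partial x_j = A_{ij}$, and therefore
$$\frac{\partial f}{\partial x_j}(x) = \sum_{i=1}^m \frac{\partial g}{\partial y_i}(a(x))\,\frac{\partial a_i}{\partial x_j}(x) = \sum_{i=1}^m \frac{\partial g}{\partial y_i}(a(x))\, A_{ij}, \qquad j = 1,\dots,n.$$
These $n$ numbers can be read either as the entries of the row $\nabla g(a(x))\,A$ or as the entries of the column $A^T[\nabla g(a(x))]^T$; only the arrangement differs.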

The Jacobian realization of $Dg(a(x))$ is $\nabla g(a(x))$, a $1\times m$ matrix (row vector), while the Jacobian of $a$ is the constant $m \times n$ matrix $A$. The dimensions agree: the product $\nabla g(a(x))\,A$ makes $\nabla f(x)$ a $1\times n$ matrix, consistent with the derivative of $f$ at $x$ being a linear map $\mathbb R^n \to \mathbb R$.
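
A quick numerical sanity check of this dimension count, in plain Python so it runs without dependencies. The particular $A$, $b$, $g$ and point $x$ are hypothetical choices made for illustration, not taken from the question; the sketch compares the chain-rule row $\nabla g(a(x))\,A$ against a central-difference estimate of $Df(x)$.

```python
# Numerical check that Df(x), a 1×n row, equals the row ∇g(a(x)) (1×m) times A (m×n).
# A, b, g and x are hypothetical illustration choices, not taken from the question.

def matvec(A, x):
    """A x, with A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def row_times_matrix(r, A):
    """The 1×n row r A, for a 1×m row r and an m×n matrix A."""
    m, n = len(A), len(A[0])
    return [sum(r[i] * A[i][j] for i in range(m)) for j in range(n)]

A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 4.0]]  # m = 2, n = 3
b = [0.5, -1.0]

def g(y):
    # Hypothetical g: R^2 -> R
    return y[0] ** 2 + 3 * y[0] * y[1]

def grad_g(y):
    # Exact partials of g, arranged as a 1×m row
    return [2 * y[0] + 3 * y[1], 3 * y[0]]

def a(x):
    return [v + b_i for v, b_i in zip(matvec(A, x), b)]

def f(x):
    return g(a(x))

def numerical_jacobian_row(func, x, step=1e-6):
    """Central-difference estimate of Df(x) as a 1×n row."""
    row = []
    for j in range(len(x)):
        xp, xm = x[:], x[:]
        xp[j] += step
        xm[j] -= step
        row.append((func(xp) - func(xm)) / (2 * step))
    return row

x = [0.3, -0.7, 1.2]
chain_rule_row = row_times_matrix(grad_g(a(x)), A)  # (1×m)(m×n) -> 1×n
numeric_row = numerical_jacobian_row(f, x)

print(len(chain_rule_row))  # 3, i.e. n entries
print(all(abs(u - v) < 1e-5 for u, v in zip(chain_rule_row, numeric_row)))  # True
```

Because this $g$ is quadratic, the central difference is exact up to rounding, so the tolerance only has to absorb floating-point error.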

So what I suspect is happening is an identification of $\mathbb R^n$ with its dual space via the Euclidean inner product; that is, you're realizing the gradient as a column vector instead of a row vector. The transpose is precisely how this is done. If $T: V \to W$ is a linear transformation, its adjoint (dual map) is $T^\dagger: W^* \to V^*$, $\varphi \mapsto \varphi \circ T$. Under the Euclidean inner product you can identify $\mathbb R^n \cong (\mathbb R^n)^*$ and $\mathbb R^m \cong (\mathbb R^m)^*$, and in the standard bases the matrix of $T^\dagger$ is then the transpose of the matrix of $T$, so
$$ (\nabla g(a(x)) A)^T = A^T [\nabla g(a(x))]^T = A^T \nabla g(a(x))$$
where the last step abuses notation by writing $\nabla g(a(x))$ both for the row vector and for its transpose, the column vector. This hidden identification is likely what is confusing you.
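
To see the identification concretely, the sketch below checks the defining property of the column gradient: $\langle A^T\nabla g(a(x)), h\rangle = \langle \nabla g(a(x)), Ah\rangle = Df(x)\,h$ for a direction $h$. It is plain Python, not tied to any library, and $A$, $b$, $g$, $x$ and $h$ are hypothetical values chosen only for illustration.

```python
# The column gradient A^T ∇g(a(x)) is the vector whose inner product with any
# direction h gives Df(x) h; and Df(x) h = <∇g(a(x)), A h> by the chain rule.
# A, b, g, x and h are hypothetical illustration choices.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 4.0]]  # m = 2, n = 3
b = [0.5, -1.0]

def g(y):
    # Hypothetical g: R^2 -> R
    return y[0] ** 2 + 3 * y[0] * y[1]

def grad_g(y):
    # Exact partials of g (entries of ∇g, in either row or column form)
    return [2 * y[0] + 3 * y[1], 3 * y[0]]

def a(x):
    return [v + b_i for v, b_i in zip(matvec(A, x), b)]

def f(x):
    return g(a(x))

x = [0.3, -0.7, 1.2]
h = [1.0, 0.5, -2.0]  # an arbitrary direction in R^n

grad_f_column = matvec(transpose(A), grad_g(a(x)))  # A^T ∇g(a(x)), n entries

lhs = dot(grad_f_column, h)            # <A^T ∇g, h>, inner product in R^n
rhs = dot(grad_g(a(x)), matvec(A, h))  # <∇g, A h>, inner product in R^m

# Directional derivative of f at x along h, by central differences
eps = 1e-6
xp = [xi + eps * hi for xi, hi in zip(x, h)]
xm = [xi - eps * hi for xi, hi in zip(x, h)]
directional = (f(xp) - f(xm)) / (2 * eps)

print(abs(lhs - rhs) < 1e-12, abs(lhs - directional) < 1e-5)  # True True
```

The entries of `grad_f_column` are the same numbers as the row $\nabla g(a(x))\,A$; the transpose only changes which side of the inner product they sit on.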