I'm a software engineer trying to learn linear algebra and feel like I'm having a hard time following matrix computations.

For example, this is part of the least squares method for a linear model:

$$\sum\limits_{i=1}^n ||\mathbf\theta^T\mathbf x_i-y_i||^2=(\mathbf{X\theta}-\mathbf y)^T(\mathbf{X\theta}-\mathbf y).$$

How do we jump from the left-hand side, where there's a lot going on (a sum over $i=1,\dots,n$, a squared norm, $\mathbf x_i$, $y_i$, etc.), to the right-hand side, where all of that is wrapped up nicely in a matrix expression with a transpose?

I know that I can arrive at the matrix form if I carefully write everything down and play with concrete matrices, but I'm very slow at this.
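To show the kind of concrete check I mean, here is a minimal sketch in plain Python. It assumes $\mathbf X$ is the matrix whose $i$-th row is $\mathbf x_i^T$ and $\mathbf y$ is the vector of the $y_i$; the numbers and the helper `dot` are made up for illustration.

```python
# Hypothetical toy data: n = 3 samples, d = 2 features (values are made up).
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]        # assumed convention: row i of X is x_i^T
y = [1.0, 2.0, 3.0]     # y stacks the scalars y_i
theta = [0.5, -0.25]

def dot(u, v):
    """Inner product of two equal-length vectors (illustrative helper)."""
    return sum(a * b for a, b in zip(u, v))

# Sum form: add up the squared residual of each sample.
lhs = sum((dot(theta, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y))

# Matrix form: r = X theta - y, then r^T r.
# Entry i of X theta is dot(row i of X, theta), i.e. theta^T x_i.
r = [dot(row, theta) - y_i for row, y_i in zip(X, y)]
rhs = dot(r, r)

print(lhs, rhs)
```

Both values agree for these numbers, but getting there by hand each time is the slow part.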

Is there any other way to reason about it, or to visualize it? How do mathematicians tackle this kind of thing? Or does everyone struggle with it privately too?