I do not know the actual history of the determinant, but I think it is very well motivated. The way I look at it, it is the properties of the determinant that make sense first; you then derive the formula from them.

Let me start by trying to define the "signed volume" of a hyper-parallelepiped whose sides are $(u_1, u_2, \ldots, u_n)$. I'll call this function $\det$. (I have no idea why it is named "determinant". Wiki says Cauchy was the one who started using the term in the present sense.) Here are some observations regarding $\det$ that I consider quite natural:

- The unit hypercube whose sides are $(e_1, e_2, \ldots, e_n)$, where the $e_i$ are the standard basis vectors of $\mathbb R^n$, should have a volume of $1$.
- If one of the sides is zero, the volume should be $0$.
- If you vary one side and keep all other sides fixed, how would the signed volume change? Think about the 3D case, where a flat parallelogram defined by vectors $u_1$ and $u_2$ is the base of a solid shape, and you extend it in the "height" direction by a third vector $u_3$. What happens to the volume as you scale $u_3$? Also, consider what happens if you have two height vectors $u_3$ and $\hat u_3$: $\det(u_1, u_2, u_3 + \hat u_3)$ should be equal to $\det(u_1, u_2, u_3) + \det(u_1, u_2, \hat u_3)$. (This is where you need your volume function to be signed.)
- If I add a multiple of one side, say $u_i$, to another side $u_j$ and replace $u_j$ by $\hat u_j = u_j + c u_i$, the signed volume should not change, because the addition to $u_j$ is in the direction of $u_i$. (Think about how a rectangle can be sheared into a parallelogram with equal area.)

With these four observations, you get the familiar properties of $\det$:

- $\det(e_1, \ldots, e_n) = 1$.
- $\det(u_1, \ldots, u_n) = 0$ if $u_i = 0$ for some $i$.
- $\det(u_1, \ldots, u_i + c\hat u_i, \ldots, u_n) = \det(u_1, \ldots, u_i, \ldots, u_n) + c\det(u_1, \ldots, \hat u_i, \ldots, u_n)$.
- $\det(u_1, \ldots, u_i, \ldots, u_j, \ldots, u_n) = \det(u_1, \ldots, u_i, \ldots, u_j + cu_i, \ldots, u_n)$. (It may happen that $j < i$.)
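As a quick sanity check, here is a minimal sketch that tests the four properties numerically in 2D. It assumes the familiar formula $u_1v_2 - u_2v_1$ for the signed area (the one the expansion further down arrives at); the helper names `signed_area`, `add` and `scale` and the sample vectors are made up for illustration.

```python
def signed_area(u, v):
    """Signed area of the parallelogram with sides u and v (assumed 2D formula)."""
    return u[0] * v[1] - u[1] * v[0]


def add(u, v):
    """Componentwise sum of two 2D vectors."""
    return (u[0] + v[0], u[1] + v[1])


def scale(c, u):
    """Scalar multiple c * u of a 2D vector."""
    return (c * u[0], c * u[1])


e1, e2 = (1, 0), (0, 1)
u, u_hat, v = (3, 1), (-2, 5), (1, 4)  # arbitrary integer sample vectors
c = 7

# Unit square has area 1.
assert signed_area(e1, e2) == 1
# A zero side gives zero area.
assert signed_area((0, 0), v) == 0
# Linearity in one side while the other is held fixed.
assert signed_area(add(u, scale(c, u_hat)), v) == signed_area(u, v) + c * signed_area(u_hat, v)
# Shearing: adding a multiple of u to v leaves the area unchanged.
assert signed_area(u, add(v, scale(c, u))) == signed_area(u, v)
```

Integer inputs keep the comparisons exact; with floats you would compare up to a tolerance instead.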

You can then derive the formula for $\det$ from these properties. A good first step is to deduce further properties that are (in my opinion) easier to use:

- Swapping two columns changes the sign of $\det$.
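One way to see the swap rule, as a sketch that writes only the two columns being swapped (all other columns stay fixed) and uses just the shear and linearity properties listed above:

$$
\begin{align*}
\det(u_i, u_j) & = \det(u_i, u_j + u_i) \\
& = \det(u_i - (u_j + u_i), u_j + u_i) = \det(-u_j, u_j + u_i) \\
& = \det(-u_j, (u_j + u_i) + (-u_j)) = \det(-u_j, u_i) \\
& = -\det(u_j, u_i).
\end{align*}
$$

The first three steps each add a multiple of one column to the other; the last step pulls the factor $-1$ out by linearity. Taking $u_i = u_j$ gives $\det(u_i, u_i) = -\det(u_i, u_i)$, so a repeated column forces the value $0$.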

This should tell you why oddness and evenness of permutations matter. To actually (inefficiently) compute the determinant $\det(u_1, u_2, \ldots, u_n)$, write each $u_i$ as $u_i = \sum_{j=1}^n u_{ij}e_j$, and expand by multilinearity. For example, in the 2D case,

$$
\begin{align*}
\det(u, v) & =
\det(u_1e_1 + u_2e_2, v_1e_1 + v_2e_2) \\
& = u_1v_1\underbrace{\det(e_1, e_1)}_0 + u_1v_2\underbrace{\det(e_1, e_2)}_1 + u_2v_1\underbrace{\det(e_2, e_1)}_{-1} + u_2v_2\underbrace{\det(e_2, e_2)}_0 \\
& = u_1v_2 - u_2v_1.
\end{align*}
$$

(If you are not familiar with multilinearity, just think of it as a product. Ignore the word $\det$ in the second line and you get a simple expansion of products. Then you evaluate the "unusual product" between the vectors $e_i$ using the properties of $\det$. Note, however, that the order is important, as $\det(u, v) = - \det(v, u)$.)
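The same expansion works in any dimension. Terms in which some $e_j$ appears twice vanish, so only the terms that pick a different $e_j$ for every side survive, one per permutation, each with sign $\pm 1$ depending on whether the permutation is even or odd. Here is a minimal Python sketch of that (inefficient, $n!$-term) computation; the function names `permutation_sign` and `det_by_expansion` are my own, and `sides[i][j]` plays the role of $u_{ij}$.

```python
from itertools import permutations


def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one, by counting inversions."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1


def det_by_expansion(sides):
    """Expand det(u_1, ..., u_n) by multilinearity.

    sides[i][j] is the coefficient of e_j in u_i. Only terms that choose a
    distinct e_j for each side are summed, since the others contribute 0.
    """
    n = len(sides)
    total = 0
    for perm in permutations(range(n)):
        # sign = det(e_perm[0], ..., e_perm[n-1]), obtained by repeated swaps
        term = permutation_sign(perm)
        for i, j in enumerate(perm):
            term *= sides[i][j]
        total += term
    return total
```

For instance, `det_by_expansion([[1, 2], [3, 4]])` evaluates $1 \cdot 4 - 2 \cdot 3$, matching the 2D formula above. This is meant to mirror the derivation, not to be a practical algorithm; elimination-based methods are far faster for large $n$.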