I see three questions here:

- Shouldn't there be more awareness of the fact that the Z-transform (ZT) and generating functions (GF) are almost the same thing?

I think so. I've always found this strange and unfortunate, and I'd like to see a footnote in every textbook on the ZT or GF ("The 'generating functions' employed in combinatorial mathematics are basically the same thing as the Z-transform", and vice versa).

- Are they (apart from the change of sign) really the same thing?

Formally, they are obviously the same thing, but the context is different:

In the Z-transform $x[n] \leftrightarrow X(z)$, the input is usually **double-sided** (the sum runs over all integers); the "right-sided" (unilateral) transform is less used. Further, in signal processing, $x[n]$ is almost always one of these: 1) a **signal**, 2) the **impulse response of an LTI filter** (causal or not), 3) an (auto- or cross-) **correlation function**.
Hence, $x[n]$ is typically either bounded and decaying as $n\to \pm \infty$ (for filters and correlations) or (for stochastic signals) a stationary zero-mean sequence.

The generating function, instead, is usually applied to right-sided **sequences** (i.e. any $f:\mathbb{N} \to \mathbb{R}$). Apart from that, the sequences are arbitrary, and they often grow without bound.

Because the ZT is applied to double-sided input, the mapping $x[n] \leftrightarrow X(z)$ is not one-to-one: to get a unique inverse, we need to specify the ROC (region of convergence) of $X(z)$ in the complex plane.
For GFs the question of uniqueness does not arise, since the ROC is implied (a disk centered at the origin, when the series converges at all). (However, as pointed out in a comment, the radius of convergence can be relevant for characterizing some properties of the sequence.)
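
To illustrate the non-uniqueness, here is a sketch built on a standard textbook pair (my choice of example, not something from the question): the causal $x_1[n] = a^n u[n]$ and the anticausal $x_2[n] = -a^n u[-n-1]$ both have $X(z) = 1/(1 - a z^{-1})$, the first for $|z| > |a|$ and the second for $|z| < |a|$. The code checks this numerically with truncated sums; the value of `a`, the test points and the truncation length are arbitrary assumptions.

```python
def closed_form(a, z):
    # Common closed form X(z) = 1 / (1 - a z^{-1}) shared by both sequences
    return 1 / (1 - a / z)


def zt_causal(a, z, terms=2000):
    # x1[n] = a^n for n >= 0: sum_{n>=0} a^n z^{-n}, converges for |z| > |a|
    return sum((a / z) ** n for n in range(terms))


def zt_anticausal(a, z, terms=2000):
    # x2[n] = -a^n for n <= -1: with m = -n this is -sum_{m>=1} (z/a)^m,
    # which converges for |z| < |a|
    return -sum((z / a) ** m for m in range(1, terms))


a = 0.8
z_out = 1.5 + 0.5j   # |z| > |a|: inside the ROC of the causal sequence
z_in = 0.3 - 0.2j    # |z| < |a|: inside the ROC of the anticausal sequence

# Both gaps are at floating-point round-off level
print(abs(zt_causal(a, z_out) - closed_form(a, z_out)))
print(abs(zt_anticausal(a, z_in) - closed_form(a, z_in)))
```

Without the ROC, the single expression $1/(1 - a z^{-1})$ cannot tell these two sequences apart.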

The Z-transform $X(z)$ is usually regarded not as a formal series but as a "true" complex function. And because of the AR/MA/ARMA models usually considered in classical signal processing, we almost always deal with rational functions, which can be characterized by their zeros and poles.
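
As a sketch of what "characterized by poles" buys in practice, take a hypothetical AR(2) filter $y[n] = x[n] + 0.9\,y[n-1] - 0.2\,y[n-2]$ (coefficients chosen only for illustration), so $H(z) = 1/(1 - 0.9 z^{-1} + 0.2 z^{-2})$. Its poles are the roots of $z^2 - 0.9 z + 0.2$, namely $0.5$ and $0.4$, and a partial-fraction expansion gives the causal impulse response $h[n] = 5\,(0.5)^n - 4\,(0.4)^n$ for $n \ge 0$. The code compares that pole-based closed form against running the difference equation directly.

```python
import cmath

# Hypothetical AR(2) filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
a1, a2 = 0.9, -0.2

# H(z) = 1 / (1 - a1 z^{-1} - a2 z^{-2}); multiplying by z^2, the poles are
# the roots of z^2 - a1 z - a2 (quadratic formula, complex-safe via cmath)
disc = cmath.sqrt(a1 * a1 + 4 * a2)
p1 = (a1 + disc) / 2
p2 = (a1 - disc) / 2


def impulse_response_recursive(n_samples):
    # Run the difference equation with x[n] = delta[n] and zero initial state
    h = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y1 = h[n - 1] if n >= 1 else 0.0
        y2 = h[n - 2] if n >= 2 else 0.0
        h.append(x + a1 * y1 + a2 * y2)
    return h


def impulse_response_from_poles(n_samples):
    # Partial fractions for two distinct poles (causal ROC |z| > max |p|):
    # h[n] = (p1^(n+1) - p2^(n+1)) / (p1 - p2); .real drops round-off imaginary parts
    return [((p1 ** (n + 1) - p2 ** (n + 1)) / (p1 - p2)).real
            for n in range(n_samples)]


print(abs(p1), abs(p2))  # pole magnitudes, approximately 0.5 and 0.4
print(max(abs(r - c) for r, c in zip(impulse_response_recursive(30),
                                     impulse_response_from_poles(30))))
```

The same formula also covers a complex-conjugate pole pair (negative discriminant), which is why the roots are computed with `cmath` rather than `math`.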

The ZT is naturally thought of as a generalization of the Fourier transform, as typically $x[n]$ is square-summable (perhaps with added sinusoids, or countably many Dirac deltas in the transform). This correspondence is given by the natural mapping $z \leftrightarrow e^{j\omega}$, i.e. the DTFT is the ZT evaluated along the unit circle in the complex plane, when the ROC includes it (just as the continuous Fourier transform is the Laplace transform along the imaginary axis). And the classic concepts (e.g. energy per frequency band) are normally pertinent and useful. In the GF setting, we don't often think in terms of Fourier transforms.
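
A small numerical check of the unit-circle connection, again assuming the illustrative example $x[n] = a^n u[n]$ with $|a| < 1$: evaluating $X(z) = 1/(1 - a z^{-1})$ at $z = e^{j\omega}$ reproduces the truncated DTFT sum $\sum_n x[n] e^{-j\omega n}$, and Parseval's relation $\sum_n |x[n]|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\,d\omega$ recovers the energy $1/(1-a^2)$. The value of `a`, the truncation length and the grid size are arbitrary choices.

```python
import cmath
import math

a = 0.6          # illustrative pole location; |a| < 1 so the ROC contains |z| = 1
N_TERMS = 400    # truncation of the infinite DTFT sum
N_GRID = 1024    # frequency grid for the Parseval integral


def X(z):
    # Z-transform of x[n] = a^n u[n], valid for |z| > |a|
    return 1 / (1 - a / z)


def dtft_truncated(omega):
    # Direct DTFT sum: sum_{n>=0} a^n e^{-j omega n}
    return sum(a ** n * cmath.exp(-1j * omega * n) for n in range(N_TERMS))


# 1) The DTFT is the Z-transform restricted to the unit circle z = e^{j omega}
max_gap = max(abs(X(cmath.exp(1j * w)) - dtft_truncated(w))
              for w in [-3.0, -1.0, 0.0, 0.5, 2.0])

# 2) Parseval: time-domain energy equals the average of |X|^2 over one period.
#    For a smooth periodic integrand, a uniform Riemann sum is very accurate.
energy_time = 1 / (1 - a ** 2)
energy_freq = sum(abs(X(cmath.exp(1j * (-math.pi + 2 * math.pi * k / N_GRID)))) ** 2
                  for k in range(N_GRID)) / N_GRID

print(max_gap)                    # round-off level
print(energy_time, energy_freq)   # should agree closely
```

Restricting the Parseval sum to a sub-interval of $\omega$ gives the "energy per frequency band" mentioned above.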

The different sign convention can be understood from the previous difference. Regarding the ZT as a generalization of the DTFT, the negative exponent is more natural (the inverse transform then expresses the input as a "synthesis" of the sinusoids $e^{j\omega n}$). Incidentally, this gives a ROC that, for causal signals (or the right-sided transform), extends outward from the largest-magnitude pole; since stability requires the ROC to contain the unit circle, this implies the common rule: a stable causal filter must have its poles inside the unit circle.
For the GF, which is just a formal series, positive exponents feel more natural.
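
To make the sign convention concrete, here is a sketch assuming the Fibonacci sequence as an example (it does not come from the question; the helper names are made up for illustration). For a right-sided sequence the two objects are related by $X(z) = G(z^{-1})$, where $G(x) = \sum_{n \ge 0} f_n x^n$. The Fibonacci GF is $G(x) = x/(1 - x - x^2)$, so $X(z) = z^{-1}/(1 - z^{-1} - z^{-2})$, whose poles are the roots of $z^2 - z - 1$: $\varphi \approx 1.618$ and $-1/\varphi$. The GF converges for $|x| < 1/\varphi$ and the ZT for $|z| > \varphi$, the same information seen through $z = 1/x$; the pole outside the unit circle is the signal-processing reading of a sequence that grows without bound.

```python
from fractions import Fraction


def series_coeffs(num, den, n_terms):
    # Coefficients c_n of num(x)/den(x) as a formal power series, via long division.
    # num and den list coefficients in increasing powers of x; den[0] must be nonzero.
    c = []
    for n in range(n_terms):
        acc = Fraction(num[n]) if n < len(num) else Fraction(0)
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * c[n - k]
        c.append(acc / den[0])
    return c


def fibonacci(n_terms):
    # F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}
    f = [0, 1]
    while len(f) < n_terms:
        f.append(f[-1] + f[-2])
    return f[:n_terms]


def G(x):
    # Generating function (positive powers): sum_n F_n x^n = x / (1 - x - x^2)
    return x / (1 - x - x * x)


def X(z):
    # Z-transform (negative powers) of the same causal sequence: X(z) = G(1/z)
    return G(1 / z)


# Formal power series view: the GF's coefficients are the Fibonacci numbers
gf_coeffs = series_coeffs([0, 1], [1, -1, -1], 30)

# Analytic view: phi is the ZT pole, 1/phi the GF's radius of convergence
phi = (1 + 5 ** 0.5) / 2
gf_singularity = 1 / phi

# Truncated ZT sum at a point of its ROC (|z| > phi)
z = 2.5
zt_truncated = sum(f * z ** (-n) for n, f in enumerate(fibonacci(200)))

print(gf_coeffs[:10] == fibonacci(10))  # True
print(zt_truncated, X(z))               # both approximately 0.909
```

The same rational function thus serves as a formal series in $x$ (combinatorics) and as a transfer function in $z^{-1}$ with an unstable pole (signal processing).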