Very roughly speaking: you can think of the difference in terms of the Heisenberg Uncertainty Principle, one version of which says that "bandwidth" (frequency spread) and "duration" (temporal spread) cannot both be made arbitrarily small.
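
One standard way to make this quantitative is sketched here, assuming the convention $\hat{f}(k) = \int f(t) e^{-ikt}\,dt$ (other conventions change the constant). For a square-integrable $f$ and any reference points $t_0$ and $k_0$, define the two spreads and compare their product:

$$\sigma_t^2 = \frac{\int (t-t_0)^2\,|f(t)|^2\,dt}{\int |f(t)|^2\,dt}, \qquad \sigma_k^2 = \frac{\int (k-k_0)^2\,|\hat{f}(k)|^2\,dk}{\int |\hat{f}(k)|^2\,dk}, \qquad \sigma_t\,\sigma_k \ge \frac{1}{2}.$$

The bound is attained by suitably centered Gaussians, so neither spread can shrink without the other growing.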

The classical Fourier transform of a function allows you to make a measurement with 0 bandwidth: the evaluation $\hat{f}(k)$ tells us precisely the size of the component of frequency $k$. But by doing so you lose all control over temporal duration: you do not know when in time the signal is sounded. This is one limiting case of the Uncertainty Principle: absolute precision on frequency and zero control on temporal spread. (The original signal, measured at a fixed time, represents the other extreme: it gives you absolute precision on the amplitude at that fixed time, but zero information about the frequency spectrum of the signal.)

The wavelet transform takes advantage of the intermediate cases of the Uncertainty Principle. Each wavelet measurement (the wavelet transform corresponding to a fixed parameter) tells you something about the temporal extent of the signal, as well as something about the frequency spectrum of the signal. That is to say, from the parameter $w$ (which is the analogue of the frequency parameter $k$ for the Fourier transform), we can derive a characteristic frequency $k(w)$ and a characteristic time $t(w)$, and say that our initial function includes a signal of "roughly frequency $k(w)$" that happened at "roughly time $t(w)$".
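
For concreteness, here is one common convention for the continuous wavelet transform (an assumption on my part; nothing above depends on this particular choice). Take $w = (a, b)$ with scale $a > 0$ and shift $b$, and a "mother wavelet" $\psi$ that is localized near time $0$ and whose Fourier transform is concentrated near some frequency $k_0$:

$$W f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \overline{\psi\!\left(\frac{t-b}{a}\right)}\, dt, \qquad t(w) = b, \qquad k(w) \approx \frac{k_0}{a}.$$

In this convention the measurement has temporal spread of order $a$ and frequency spread of order $1/a$, so each choice of $a$ picks a different intermediate point in the trade-off.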

How is this helpful? Let us say we are looking at the signal of the light emitted from a traffic light. So for some time it will be red, and for some time it will be green (ignore the yellow for now). If we take the Fourier transform of the observed signal, we can say that

- At some time the traffic light shows red. (We know frequency to infinite precision, and that the red part of the signal is non-zero.)
- At some time the traffic light shows green.

But a functioning traffic light would have either red or green shown at a time, and not both. And if the traffic light malfunctions and shows both lights at the same time, we would still see from the Fourier transform

- At some time the traffic light shows red.
- At some time the traffic light shows green.
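
As a rough numerical illustration of this point, the sketch below builds two toy signals and compares their Fourier magnitudes. Everything specific in it is an assumption made for illustration: the tones at 50 Hz and 120 Hz stand in for "red" and "green", the two-second duration stands in for the observation period, and `magnitude_at` is a hypothetical helper name.

```python
import numpy as np

fs = 1000.0                        # sampling rate in Hz (illustrative choice)
t = np.arange(0, 2.0, 1 / fs)      # two seconds of samples
f_red, f_green = 50.0, 120.0       # stand-in "colors" (hypothetical frequencies)

red = np.sin(2 * np.pi * f_red * t)
green = np.sin(2 * np.pi * f_green * t)
first_half = t < 1.0

# Working light: red during the first second, green during the second.
working = np.where(first_half, red, green)
# Malfunctioning light: both colors at once during the first second, then dark.
broken = np.where(first_half, red + green, 0.0)

freqs = np.fft.rfftfreq(t.size, d=1 / fs)

def magnitude_at(signal, f):
    """Fourier magnitude of `signal` at the frequency bin closest to `f`."""
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum[np.argmin(np.abs(freqs - f))]

for name, signal in [("working", working), ("broken", broken)]:
    print(name,
          "red:", magnitude_at(signal, f_red),
          "green:", magnitude_at(signal, f_green))
```

Both signals show a strong component at each of the two frequencies, so these Fourier magnitudes alone do not tell the working light from the broken one.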

But if we take the wavelet transform we can sacrifice frequency precision to gain temporal information. So with the wavelet transform done on the working traffic light we may see

- At parameter $w$ which corresponds roughly to $t(w)$ being 1 o'clock sharp and $k(w)$ corresponding to red, the wavelet transform is large and non-zero. This can be taken to mean that sometime around 1 o'clock sharp (could be exactly 1 o'clock, could be 1 minute past, could be 30 seconds before) the light showed a color that is more or less red (could be a little bit purple, or maybe a little bit amber).
- At parameter $w$ which corresponds roughly to $t(w)$ being 1 o'clock sharp and $k(w)$ corresponding to green, the wavelet transform is almost zero. This can be taken to mean that at all the times around 1 o'clock (say plus or minus 2 minutes) the traffic light does not show any hint of green.
- At parameter $w$ which corresponds roughly to $t(w)$ being five minutes past 1 and $k(w)$ corresponding to green, the wavelet transform is large and non-zero. This would indicate that around 1:05 (maybe 1:06, or 1:04) the light shone greenish (could have a tinge of teal or a bit of yellow in it).

This would tell us not only that the traffic light can show both red and green lights, but also that at least around 1 o'clock the light is working properly and only showing one light.
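
The three measurements above can be imitated numerically. This is a minimal sketch under stated assumptions, not a full wavelet transform: it uses a hand-rolled Gaussian-windowed complex tone (Morlet-like) rather than any particular wavelet library, tones at 50 Hz and 120 Hz stand in for "red" and "green", and the times 0.5 s and 1.5 s stand in for 1:00 and 1:05. The helper name `localized_coefficient` and its `cycles` parameter are hypothetical.

```python
import numpy as np

fs = 1000.0                        # sampling rate in Hz (illustrative choice)
t = np.arange(0, 2.0, 1 / fs)      # two seconds of samples
f_red, f_green = 50.0, 120.0       # stand-in "colors" (hypothetical frequencies)

red = np.sin(2 * np.pi * f_red * t)
green = np.sin(2 * np.pi * f_green * t)
first_half = t < 1.0
working = np.where(first_half, red, green)       # one color at a time
broken = np.where(first_half, red + green, 0.0)  # both colors at once, then dark

def localized_coefficient(signal, center_time, frequency, cycles=6.0):
    """|<atom, signal>| for a unit-norm Gaussian-windowed tone.

    The window width scales like 1/frequency, mimicking how a wavelet
    trades frequency precision for temporal precision.
    """
    sigma = cycles / (2 * np.pi * frequency)
    envelope = np.exp(-0.5 * ((t - center_time) / sigma) ** 2)
    atom = envelope * np.exp(2j * np.pi * frequency * (t - center_time))
    atom /= np.linalg.norm(atom)
    return np.abs(np.vdot(atom, signal))  # vdot conjugates its first argument

for name, signal in [("working", working), ("broken", broken)]:
    for label, time in [("around 1:00", 0.5), ("around 1:05", 1.5)]:
        print(name, label,
              "red:", round(localized_coefficient(signal, time, f_red), 3),
              "green:", round(localized_coefficient(signal, time, f_green), 3))
```

For the working signal, the "red" coefficient is large and the "green" one is nearly zero around the first time, and the roles swap around the second time. For the broken signal, both coefficients are large around the first time, which is exactly the distinction the plain Fourier magnitudes could not make.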