Your question is a natural one, and the answer is controversial, lying at the heart
of a decades-long debate between frequentist and Bayesian statisticians. *Statistical
inference is not mathematical deduction.* Philosophical issues arise when
one takes a bit of information in a sample and tries to make a helpful
statement about the population from which the sample was chosen. Here is *my*
attempt at an elementary explanation of these issues as they arise in your
question. Others may have different views and post different explanations.

Suppose you have a random sample $X_1, X_2, \dots, X_n$ from $Norm(\mu, \sigma)$
with $\sigma$ known and $\mu$ to be estimated. Then
$\bar X \sim Norm(\mu, \sigma/\sqrt{n})$ and we have
$$P\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$
After some elementary manipulation, this becomes
$$P(\bar X - 1.96\sigma/\sqrt{n} \le \mu \le \bar X + 1.96\sigma/\sqrt{n}) = 0.95.$$
According to the frequentist interpretation of probability, the two displayed
equations mean the same thing: over the long run, the event inside the parentheses
will be true 95% of the time. This interpretation holds as long as $\bar X$ is
viewed as a random variable based on a random sample of size $n$ from the normal
population specified at the start. Notice that the second equation must then be
interpreted as saying that the *random interval*
$\bar X \pm 1.96\sigma/\sqrt{n}$ covers the unknown mean $\mu$ with probability 0.95.
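This long-run claim is easy to check by simulation. Here is a minimal sketch (my own illustration; the values of $\mu$, $\sigma$, and $n$ are arbitrary choices):

```python
# Draw many samples of size n from Norm(mu, sigma), form the interval
# xbar +/- 1.96*sigma/sqrt(n) each time, and count how often it covers mu.
import math
import random

random.seed(1)
mu, sigma, n = 10.0, 2.0, 25
half_width = 1.96 * sigma / math.sqrt(n)

reps = 10_000
covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

print(covered / reps)  # close to 0.95
```

The printed proportion fluctuates from run to run but stays near 0.95, which is exactly what the frequentist statement promises: it is a property of the *procedure*, averaged over many samples.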

However, when we have a particular sample and the numerical value of an
observed mean $\bar X,$ the frequentist "long run" approach to probability
is in potential conflict with a naive interpretation of the interval. In this
particular case $\bar X$ is a fixed observed number and $\mu$ is a fixed
unknown number. Either $\mu$ lies in the interval or it doesn't; there is no
"probability" about it. What we can say is that the *process* by which the
interval is constructed yields coverage in 95% of cases over the
long run. As shorthand for this long-run property of the process, it is customary to use the word
**confidence** instead of *probability*.

There is really no mathematical difference between the two words. It is just that the
proper frequentist use of the word *probability* becomes awkward in this setting, and people have
decided to use *confidence* instead.
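To make the distinction concrete: once a particular sample is in hand, the interval is just a pair of fixed numbers. A minimal computation, using the same known $\sigma$ and $n$ as above (the observed mean here is a hypothetical value):

```python
# Once a particular sample is observed, nothing random remains:
# the 95% CI is just two fixed numbers.
import math

sigma, n = 2.0, 25          # sigma known, as in the setup above
xbar = 10.31                # a hypothetical observed sample mean
half_width = 1.96 * sigma / math.sqrt(n)
lo, hi = xbar - half_width, xbar + half_width
print(f"95% CI: ({lo:.3f}, {hi:.3f})")  # (9.526, 11.094)
```

Either $\mu$ is between 9.526 and 11.094 or it is not; the "95%" refers to the procedure that produced these endpoints, not to this particular pair of numbers.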

In a Bayesian approach to estimation, one establishes a probability framework
for the *experiment at hand* from the start by choosing a "prior distribution." Then a *Bayesian probability
interval* (sometimes called a *credible interval*) is based on a melding of the prior
distribution and the data. One difficulty Bayesian statisticians face in helping
nonstatisticians understand their interval estimates is
explaining the origin and influence of the prior distribution.
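For completeness, here is a minimal sketch of how such a credible interval arises in the known-$\sigma$ normal setting with a conjugate normal prior on $\mu$. The prior parameters and the data summary below are hypothetical choices for illustration:

```python
# Normal-normal conjugate update with sigma known:
# prior mu ~ Norm(m0, s0); posterior is again normal.
import math

sigma, n, xbar = 2.0, 25, 10.31   # data summary (hypothetical)
m0, s0 = 8.0, 3.0                 # prior mean and SD (hypothetical)

post_prec = 1 / s0**2 + n / sigma**2                      # posterior precision
post_mean = (m0 / s0**2 + n * xbar / sigma**2) / post_prec
post_sd = math.sqrt(1 / post_prec)

lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

Unlike the confidence interval, this one licenses a direct probability statement: *given the prior and the data*, $\mu$ lies in the interval with posterior probability 0.95. Note how the prior matters: a diffuse prior (large $s_0$) makes the credible interval nearly coincide with the confidence interval, while a sharp prior pulls it toward $m_0$.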