You are likely confused by the unfamiliar notation for Duhamel's formula, which is used routinely in physics (as in the text you are quoting) in the following practical form,
$$
\bbox[yellow]{ \frac{d}{ds} e^{A+s B} |_{s=0}= \int_{0}^{1}d\tau ~ e^{(1-\tau) A} B~e^{\tau A} } ~~. \tag{3}
$$
Your identity (1) then follows from the two leading terms of the Taylor expansion of your left-hand side around 0,
$$e^{A+\epsilon B}=e^A + \epsilon \frac{d}{d\epsilon}e^{A+\epsilon B} ~|_{\epsilon=0} + O(\epsilon^2) \\ = e^{A}\left(1+\int_{0}^{1}d\tau e^{-\tau A}\epsilon Be^{\tau A}\right)+ O(\epsilon^2). \tag{1}$$

In the physicists' seat-of-the-pants proof of (3), one suspends existence and well-definedness fussbudgetry and looks at the large *N* limit definition of the matrix exponential,
$$ e^{M(s)} = \lim_{N \to \infty} \left(1+\frac{M(s)}{N}\right)^N .$$

Differentiate w.r.t. *s*, utilize the chain rule, and exchange the order of differentiation and limit,
$$\begin{align}\frac{d}{ds}e^{M(s)} &= \lim_{N \to \infty}\frac{d}{ds}\left(1+\frac{M(s)}{N}\right)^N\\
&= \lim_{N \to \infty}\sum_{k=1}^N\left(1+\frac{M(s)}{N}\right)^{N-k}\frac{1}{N}\frac{dM(s)}{ds}\left(1+\frac{M(s)}{N}\right)^{k-1}~~~,\end{align}
$$
recalling that *M(s)* and *M'(s)* need not commute, so the order of the factors matters.

Divide the unit interval into *N* sections of width *Δτ = Δk/N* with *Δk = 1*, since the sum indices are integers. Finally, let *N→∞*, so *Δτ→dτ* with *k/N → τ* and Σ→∫; the two powers then tend to $e^{(1-\tau)M}$ and $e^{\tau M}$, respectively. You find

$$ \frac{d}{ds}e^{M(s)} = \int_{0}^1 d\tau ~e^{(1-\tau)M}M'e^{\tau M},
$$
which reduces to (3) for $M(s)=A+sB$ evaluated at $s=0$, the first part of your question.
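If you want reassurance before trusting the hand-waving, formula (3) is easy to test numerically. The following is a minimal, dependency-free sketch, not a rigorous check: the 2×2 matrices `A` and `B` are arbitrary non-commuting choices, and all helper names (`mat_exp`, `mat_mul`, and so on) are made up for this illustration. It compares a central finite difference of $e^{A+sB}$ at $s=0$ with a midpoint-rule approximation of the τ integral, and also with the naive "commuting" guess $e^A B$.

```python
# Numerical sanity check of Duhamel's formula (3) for 2x2 matrices.
# Helper names and the matrices A, B are illustrative assumptions.

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(c, X):
    return [[c * x for x in row] for row in X]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(X, squarings=6, terms=20):
    """exp(X): Taylor series of X / 2**squarings, then repeated squaring."""
    Y = mat_scale(1.0 / 2**squarings, X)
    result = identity(len(X))
    term = identity(len(X))
    for k in range(1, terms + 1):
        term = mat_scale(1.0 / k, mat_mul(term, Y))  # Y**k / k!
        result = mat_add(result, term)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.3, 1.0], [-0.5, 0.2]]
B = [[0.1, -0.4], [0.7, 0.6]]

# Left side of (3): central finite difference in s at s = 0.
h = 1e-4
plus = mat_exp(mat_add(A, mat_scale(h, B)))
minus = mat_exp(mat_add(A, mat_scale(-h, B)))
lhs = mat_scale(1.0 / (2 * h), mat_add(plus, mat_scale(-1.0, minus)))

# Right side of (3): midpoint rule for the integral over tau in [0, 1].
steps = 400
rhs = [[0.0, 0.0], [0.0, 0.0]]
for k in range(steps):
    tau = (k + 0.5) / steps
    integrand = mat_mul(mat_mul(mat_exp(mat_scale(1 - tau, A)), B),
                        mat_exp(mat_scale(tau, A)))
    rhs = mat_add(rhs, mat_scale(1.0 / steps, integrand))

duhamel_error = max_abs_diff(lhs, rhs)
commuting_guess_error = max_abs_diff(lhs, mat_mul(mat_exp(A), B))
print("Duhamel formula error:  ", duhamel_error)
print("naive e^A B guess error:", commuting_guess_error)
```

The first number should be limited only by the quadrature and finite-difference step sizes, while the second stays of order one because `A` and `B` do not commute.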

But don't stop here! The second part follows directly,
$$\begin{align}\frac{d}{ds}e^{M(s)} &= \int_{0}^1 d\tau ~e^{(1-\tau)M}M'e^{\tau M}\\
&= e^M \int_{0}^1 d\tau ~ \mathrm{Ad}_{e^{-\tau M}} M' \\
&= e^M \int_{0}^1d\tau ~ e^{-\mathrm{ad}_{\tau M}} M'\\
&= e^M \frac{1-e^{-\mathrm{ad}_M}}{\mathrm{ad}_M}\frac{dM}{ds}.
\end{align}$$
You may then fuss about existence at your convenience. But you see it leads to your (2) directly.

Going from the first line to the second, we simply apply the definition of Ad. From the second to the third, we apply the "Hadamard lemma", which is of even more basic utility in physics than (3). It is easy to reassure yourself of its logic by expanding the exponentials to a few orders in powers of the linear operator ad$_M$ (which takes the commutator of *M* with its argument).
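For concreteness, here is what that expansion looks like to second order, with the convention $\mathrm{ad}_M X \equiv [M,X]$ assumed throughout:
$$
e^{-\tau M} M' e^{\tau M} = e^{-\tau\,\mathrm{ad}_M} M' = M' - \tau\,[M,M'] + \frac{\tau^2}{2!}\,[M,[M,M']] - \dots
$$
Matching the order-τ terms on both sides, $-\tau M M' + \tau M' M = -\tau [M,M']$, is the quickest way to confirm the sign.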

For the last line, you do the trivial integral in τ to find a function of ad$_M$ with *no* negative powers: the output, $ \sum_{n = 0}^\infty (-\mathrm{ad}_M)^n / (n + 1)!~$, has no singularity in ad$_M$, and the BCH article discusses it... it is a celebrated function associated with the generating function of Bernoulli numbers. Let's say it is "nice" and obviates your question about the evanescent inverse of ad$_M$.
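To see that the series really involves only nested commutators and no inverse of ad$_M$, you can sum it numerically. The sketch below makes the same kind of assumptions as any quick check: `M` and `dM` are arbitrary 2×2 matrices standing in for $M(0)$ and $M'(0)$ of $M(s)=M+s\,M'$, the helper names are invented for the illustration, and the series is truncated at 30 terms. It compares $e^M \sum_n (-\mathrm{ad}_M)^n M'/(n+1)!$ with a central finite difference of $e^{M(s)}$.

```python
# Check d/ds exp(M(s)) = exp(M) * sum_n (-ad_M)^n M' / (n+1)!  at s = 0.
# Helper names and the matrices M, dM are illustrative assumptions.

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_scale(c, X):
    return [[c * x for x in row] for row in X]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(X, Y):
    """ad_X applied to Y, i.e. [X, Y] = XY - YX."""
    return mat_add(mat_mul(X, Y), mat_scale(-1.0, mat_mul(Y, X)))

def mat_exp(X, squarings=6, terms=20):
    """exp(X): Taylor series of X / 2**squarings, then repeated squaring."""
    n = len(X)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    Y = mat_scale(1.0 / 2**squarings, X)
    result, term = eye, eye
    for k in range(1, terms + 1):
        term = mat_scale(1.0 / k, mat_mul(term, Y))  # Y**k / k!
        result = mat_add(result, term)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

M = [[0.3, 1.0], [-0.5, 0.2]]    # stands in for M(0)
dM = [[0.1, -0.4], [0.7, 0.6]]   # stands in for M'(0)

# Sum the series term by term; `nested` holds (-ad_M)^n dM / n!.
total = [[0.0, 0.0], [0.0, 0.0]]
nested = dM
for n in range(30):
    total = mat_add(total, mat_scale(1.0 / (n + 1), nested))
    nested = mat_scale(-1.0 / (n + 1), commutator(M, nested))
series_value = mat_mul(mat_exp(M), total)

# Reference: central finite difference of exp(M + s dM) at s = 0.
h = 1e-4
plus = mat_exp(mat_add(M, mat_scale(h, dM)))
minus = mat_exp(mat_add(M, mat_scale(-h, dM)))
reference = mat_scale(1.0 / (2 * h), mat_add(plus, mat_scale(-1.0, minus)))

series_error = max(abs(x - y)
                   for rx, ry in zip(series_value, reference)
                   for x, y in zip(rx, ry))
print("series vs finite difference:", series_error)
```

Only commutators with `M` are ever computed, so the "division" by ad$_M$ in the closed form never has to be performed.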

Your text telegraphs these basics merely assuming them, but misses a teaching moment. That book does that a lot.