This is too long for a comment, too.
Experimentally, another type of approach leads to better approximations.
Assume that we want a polynomial approximation of degree $n$ of a $\mathcal{C}^1([0,1])$-function $f(x)$ with an $O(n^{-1})$ error term. Consider the piecewise-linear function $g(x)$ whose graph is the polygonal line through the points $(0,f(0)),(1/n,f(1/n)),\ldots,(1,f(1))$. Clearly:
$$\| f(x)-g(x)\|_{\infty} \leq \frac{M}{n},$$
where $M$ is the Lipschitz constant of $f(x)$, i.e. $\max_{x\in[0,1]}|f'(x)|$. If $f(x)$ is a non-decreasing function, the stronger bound
$$\| f(x)-g(x)\|_{\infty} \leq \frac{M}{2n}$$
holds. Now let $h_p(x)$ be a "step-identity" continuous function of the form
$$ h_p(x)=\left\{\begin{array}{rl}0 &\mbox{if }x\leq p,\\(x-p) &\mbox{if }x\geq p.\end{array}\right.$$
If $t>0$, $k_{p,t}(x)=h_{p}(x)-h_{p+t}(x)$ is a continuous "step-identity-step" function whose derivative is zero for $x<p$ or $x>p+t$ and one for $x\in(p,p+t)$. Moreover, $g(x)$ can be written, up to the additive constant $f(0)$, as a linear combination of $k_{p,t}$-functions, i.e.:
$$ g(x) = f(0) + \sum_{j=0}^{n-1} \frac{f\left(\frac{j+1}{n}\right)-f\left(\frac{j}{n}\right)}{\frac{1}{n}}\,k_{\frac{j}{n},\frac{1}{n}}(x), $$
so the problem can essentially be reformulated as: find polynomial approximations of $k_{0,\frac{1}{n}}(x)$ that are non-decreasing and have an $\|\cdot\|_{\infty}$-error term bounded by a constant times the reciprocal of the degree. Up to integration, this is the same as finding a positive and "good" polynomial approximation of the characteristic function of the interval $[0,1/n]$. Bernstein polynomials almost do this, since they are positive, but they have the wrong error term ($O(n^{-1/2})$ for Lipschitz functions).
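Both claims are easy to check numerically. Below is a sketch (the test functions $\sin x$ and $|x-\tfrac12|$ are arbitrary choices of mine): the $k$-decomposition reproduces the piecewise-linear interpolant within $M/n$, while the Bernstein error at the kink of a Lipschitz function behaves like $n^{-1/2}$.

```python
import numpy as np
from math import comb

def k(x, p, t):
    # "step-identity-step": slope 1 on (p, p+t), flat elsewhere
    return np.maximum(x - p, 0.0) - np.maximum(x - p - t, 0.0)

def g(x, f, n):
    # piecewise-linear interpolant of f at the nodes j/n, built from the
    # k-decomposition (plus the additive constant f(0))
    nodes = np.arange(n + 1) / n
    slopes = (f(nodes[1:]) - f(nodes[:-1])) * n
    return f(nodes[0]) + sum(s * k(x, j / n, 1 / n) for j, s in zip(range(n), slopes))

def bernstein(f, n, x):
    # degree-n Bernstein polynomial of f, evaluated at x
    return sum(f(j / n) * comb(n, j) * x**j * (1 - x)**(n - j) for j in range(n + 1))

n = 20
xs = np.linspace(0, 1, 2001)
pl_err = np.max(np.abs(np.sin(xs) - g(xs, np.sin, n)))  # within M/n, M = 1
# Bernstein error at the kink of the Lipschitz function |x - 1/2|:
berr = bernstein(lambda x: abs(x - 0.5), n, 0.5)        # ~ 1/sqrt(2*pi*n)
```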

Another reformulation is to find a tight polynomial upper bound for the absolute value function over the interval $[-1,1]$. A classical attempt is to take $q_n(y)$ as the truncated Taylor series of $\sqrt{1-y}$ at $y=0$ and set $p_n(x)=q_n(1-x^2)$. This gives $p_n(x)\geq|x|$ as wanted, but the error term is $p_n(0)=O\left(\frac{1}{\sqrt{n}}\right)$, too big. Another attempt is to consider the Fourier-Chebyshev approximation of $|x|$, i.e.
$$ p_{2n}(x) = \frac{2}{\pi}-\frac{4}{\pi}\sum_{k=1}^{n}\frac{(-1)^k T_{2k}(x)}{4k^2-1}. $$
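This partial sum is easy to experiment with; here is a short check (via numpy's Chebyshev module) that $p_{2n}(0)=\frac{2}{(2n+1)\pi}$ and that this value is also the sup-norm error on a fine grid:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def p2n(x, n):
    # Fourier-Chebyshev partial sum of |x|: only even-degree T_{2k} appear
    c = np.zeros(2 * n + 1)
    c[0] = 2 / np.pi
    for k in range(1, n + 1):
        c[2 * k] = -(4 / np.pi) * (-1) ** k / (4 * k**2 - 1)
    return chebval(x, c)

n = 50
xs = np.linspace(-1, 1, 4001)
sup_err = np.max(np.abs(p2n(xs, n) - np.abs(xs)))
# sup_err agrees with p2n(0) = 2/((2n+1)*pi), attained at the origin
```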
$p_{2n}(x)=|x|$ holds at $2n+2$ points, so $p_{2n}(x)$ is neither an upper bound nor a lower bound for $|x|$, but $|p_{2n}(x)-|x||$ decays rapidly, both in $n$ and away from the origin:
$$|p_{2n}(x)-|x||\leq \min\left(\frac{2}{(2n+1)\pi},\frac{1}{2\pi n^2 |x|}\right) $$
and the error term is the right one: $\|p_{2n}(x)-|x|\|_{\infty}=p_{2n}(0)=\frac{2}{(2n+1)\pi}=O\left(\frac{1}{n}\right)$. If we approximate $k_{0,\frac{1}{n}}$ through $p_{2n}(x)-p_{2n}(x-1/n)$, then build $g(x)$ from these approximated $k$s, we end up with a polynomial approximation $r_{2n}(x)$ of $f(x)$ that satisfies:
$$ \|f(x)-r_{2n}(x)\|_{\infty}\leq \frac{M}{n}+Mn\int_{-1}^{1}\min\left(\frac{2}{(2n+1)\pi},\frac{1}{2\pi n^2 |x|}\right)dx=O\left(\frac{\log n}{n}\right),$$
which is far better than the Bernstein approximation, but still not what we want.
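The whole construction can be put together numerically (again with $\sin x$ as an arbitrary test function of mine). Since $h_p(x)=\max(x-p,0)=\frac{(x-p)+|x-p|}{2}$, each ramp is smoothed by replacing $|\cdot|$ with $p_{2n}$:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def cheb_abs(x, n):
    # Fourier-Chebyshev partial sum p_{2n} of |x| on [-1, 1]
    c = np.zeros(2 * n + 1)
    c[0] = 2 / np.pi
    for k in range(1, n + 1):
        c[2 * k] = -(4 / np.pi) * (-1) ** k / (4 * k**2 - 1)
    return chebval(x, c)

def smooth_ramp(x, p, n):
    # h_p(x) = ((x - p) + |x - p|)/2 with |.| replaced by p_{2n}
    return ((x - p) + cheb_abs(x - p, n)) / 2

def r(x, f, n):
    # polynomial approximation of f: the k's in the decomposition of g
    # are replaced by differences of smoothed ramps
    nodes = np.arange(n + 1) / n
    slopes = (f(nodes[1:]) - f(nodes[:-1])) * n
    out = np.full_like(np.asarray(x, dtype=float), f(nodes[0]))
    for j, s in enumerate(slopes):
        out += s * (smooth_ramp(x, j / n, n) - smooth_ramp(x, (j + 1) / n, n))
    return out

n = 20
xs = np.linspace(0, 1, 2001)
err = np.max(np.abs(np.sin(xs) - r(xs, np.sin, n)))  # empirically ~ log(n)/n
```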

**Can we exploit the oscillations of the Chebyshev approximations in order to remove the logarithmic factor in the RHS?**
**UPDATE:** I believe not, since we have essentially built a trigonometric polynomial approximation through convolution with the Dirichlet kernel, whose $L^1$-norm is not bounded but $\Theta(\log n)$ (this fact gives rise to the Gibbs phenomenon, for instance); so a reformulation is to find a "good" trigonometric kernel, where "good" means non-negative (in order to preserve monotonicity) and almost orthogonal to continuous, piecewise-linear functions.
In the literature I found that the Jackson kernel, which is (up to normalization) the square of the Fejér kernel, satisfies all the prescribed properties, but I am still not satisfied, since the Fourier coefficients of the Jackson kernel are pretty complicated.
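For experimentation one can dodge the complicated Fourier coefficients entirely by working with the kernel on a grid (a numerical sketch of mine: the normalization constant is computed numerically rather than in closed form). Non-negativity is immediate from the fourth power, and the first absolute moment, which governs the Jackson-type $O(\omega(f,1/n))$ rate, decays like $1/n$:

```python
import numpy as np

ts = np.linspace(-np.pi, np.pi, 8192, endpoint=False)  # uniform grid on a period
dt = ts[1] - ts[0]

def jackson(n):
    # (sin(n t/2)/sin(t/2))^4 on the grid: the square of the (unnormalized)
    # Fejer kernel, hence trivially non-negative; normalized numerically so
    # that (1/(2*pi)) * sum * dt = 1
    s = np.sin(ts / 2)
    safe = np.where(np.abs(s) < 1e-12, 1.0, s)
    ratio = np.where(np.abs(s) < 1e-12, float(n), np.sin(n * ts / 2) / safe)
    J = ratio**4
    return J / (J.sum() * dt / (2 * np.pi))

def abs_moment(n):
    # (1/(2*pi)) * int |t| J_n(t) dt -- the quantity that should decay like 1/n
    return (np.abs(ts) * jackson(n)).sum() * dt / (2 * np.pi)
```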

In order to get a monotonic polynomial approximation of a non-decreasing $\mathcal{C}^1([0,1])$-function $f(x)$, we can choose a positive $K$ such that $\sqrt{f'(x)+K}$ is regular enough to ensure the existence of an $O(n^{-1})$ polynomial approximation, then replicate fedja's argument; but the first step still seems out of reach at the moment.
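The second step (integrating the square) is easy to simulate numerically. Here is a sketch with an arbitrary test function $f(x)=x+\frac{\sin x}{2}$ and $K=1$, where the polynomial approximation of $\sqrt{f'+K}$ is obtained by a plain least-squares fit, only a stand-in for the genuinely hard first step:

```python
import numpy as np
from numpy.polynomial import Polynomial

# arbitrary non-decreasing test function: f'(x) = 1 + cos(x)/2 >= 1/2 on [0, 1]
f = lambda x: x + np.sin(x) / 2
fprime = lambda x: 1 + np.cos(x) / 2
K = 1.0

xs = np.linspace(0, 1, 400)
# stand-in for the hard step: least-squares polynomial fit of sqrt(f' + K)
q = Polynomial.fit(xs, np.sqrt(fprime(xs) + K), deg=8)

# r(x) = f(0) + int_0^x (q(t)^2 - K) dt is a polynomial; its derivative
# q^2 - K stays positive once q is a good enough approximation, so r is
# a non-decreasing polynomial approximation of f
R = (q * q - K).integ()
r = lambda x: f(0.0) + R(x) - R(0.0)
```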