A few answers to your questions:

*Is the first "simple" optimization bad for some reason?*

Yes. As @Rahul already pointed out, this algorithm does not, in general, solve your optimization problem $\min f+g$ at all (I expand on this below). The only useful exception is when $f$ and $g$ are the indicator functions of two closed convex sets $C$ and $D$. In that particular case the algorithm becomes the alternating projection algorithm, which *does* converge to a point of the intersection $C \cap D$ (if it is nonempty).
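To make this concrete, here is a minimal sketch of alternating projections, with two illustrative convex sets of my own choosing (the unit disk and a vertical line in the plane):

```python
import numpy as np

# Sketch: alternating projections onto two convex sets in R^2.
# The sets (unit disk C, vertical line D) are illustrative choices.
def proj_C(p):
    """Project onto the unit disk C = {p : ||p|| <= 1}."""
    n = np.linalg.norm(p)
    return p if n <= 1.0 else p / n

def proj_D(p):
    """Project onto the vertical line D = {p : p[0] = 0.5}."""
    return np.array([0.5, p[1]])

y = np.array([3.0, 2.0])        # arbitrary starting point
for _ in range(100):
    y = proj_D(proj_C(y))       # prox_g o prox_f for indicator functions

# The iterates land in C ∩ D, which is nonempty here.
```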

*Does it converge but at a slower rate than DR?*

Because of the above point, this algorithm is not studied in general. In the particular case of indicator functions of sets, both algorithms are valid and produce different sequences. They appear to have essentially the same kind of convergence rates, and determining which is better is still a subject of research, in particular in the nonconvex setting.

*Or does it not converge for many cases?*

It converges if and only if $\text{prox}_g \circ \text{prox}_f$ admits a fixed point. See [1, Prop 23.8, Cor 4.41 & Thm 5.23].
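Such a fixed point can fail to exist even for smooth convex functions. A tiny sketch with the illustrative choice $f(x) = g(x) = x$ on the real line, for which $\text{prox}_f(y) = y - 1$:

```python
# Sketch: f(x) = g(x) = x on R, so prox_f(y) = prox_g(y) = y - 1
# (my own illustrative choice; note f + g = 2x has no minimizer).
prox = lambda y: y - 1.0

y = 0.0
for _ in range(10):
    y = prox(prox(y))   # prox_g o prox_f subtracts 2 each step

# y = -20: the iterates drift to -infinity, no fixed point exists.
```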

*Is there a simple way to relate the simple algorithm to the DR algorithm (perhaps by adding momentum into the optimization algorithm)?*

No, they really are of a different nature; you can hardly interpret one as a perturbation/modification of the other. One way I understand the Douglas-Rachford algorithm is to look at it from the dual point of view. If you minimize one function by applying the proximal algorithm, you can show that you are also solving the dual problem with an augmented Lagrangian method. If instead you want to minimize the sum of two functions, you can see that applying Douglas-Rachford is equivalent to performing an alternating augmented Lagrangian method (or ADMM) on the dual, which is the most "natural" thing you could think of (more details on this in [2, Section 3.2.3]).

**Details about why the algorithm doesn't solve the original optimization problem:**

Assume for instance that the sequence $y^k$ converges to some $\bar y$. Then, by passing to the limit in the algorithm, you see that $\bar y$ is a fixed point of the composed operator $\text{prox}_g \circ \text{prox}_f$. Assuming that your functions are smooth, and using the definition of the proximal operator $\text{prox}_f = (Id + \nabla f)^{-1}$, you easily deduce that
\begin{equation}
0 = \nabla f(\bar y + \nabla g(\bar y)) + \nabla g(\bar y).
\end{equation}
This is not the optimality condition you would expect, namely $0 = \nabla f(\bar y) + \nabla g(\bar y)$: the gradient of $g$ is mixed up with that of $f$. This is clearer in the particular case where $f$ is a symmetric positive quadratic function $f(x)=\frac{1}{2}\langle Ax,x \rangle - \langle b,x\rangle$, for which the limit satisfies the undesirable condition $0 = \nabla f(\bar y) + (Id + A) \nabla g(\bar y)$.
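A quick numerical illustration (a sketch with 1-D quadratics of my own choosing): take $f(x) = x^2/2 - 2x$ and $g(x) = x^2/2$, so the minimizer of $f+g$ is $x^* = 1$, yet the simple iteration settles elsewhere:

```python
# Illustrative 1-D quadratics: f(x) = x^2/2 - 2x, g(x) = x^2/2.
# The true minimizer of f + g solves (x - 2) + x = 0, i.e. x* = 1.
prox_f = lambda y: (y + 2.0) / 2.0   # (Id + f')^{-1}
prox_g = lambda y: y / 2.0           # (Id + g')^{-1}

y = 0.0
for _ in range(100):
    y = prox_g(prox_f(y))

# y converges to 2/3, which satisfies the "mixed up" condition
# 0 = f'(y + g'(y)) + g'(y) but NOT the optimality 0 = f'(y) + g'(y).
```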

On the other hand, it might seem that the Douglas-Rachford algorithm also mixes up the proximal operators of $f$ and $g$. But it does so in such a way that, in the limit, they are disentangled, and you obtain a solution of your optimization problem. With your notation again, if the sequences $x^k$ and $y^k$ converge to some $\bar x$ and $\bar y$, they satisfy the equations
\begin{eqnarray}
\bar x = \text{prox}_f(\bar y) \\
\bar y = \bar y + \text{prox}_g(2\bar x - \bar y) - \bar x
\end{eqnarray}
which can be rearranged (using the definition of the prox) as $0=\nabla f(\bar x) + \nabla g(\bar x)$, as desired.
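Running Douglas-Rachford on a toy problem makes this visible (a sketch with 1-D quadratics of my own choosing, $f(x) = x^2/2 - 2x$ and $g(x) = x^2/2$, whose sum is minimized at $x^* = 1$):

```python
# Illustrative 1-D quadratics: f(x) = x^2/2 - 2x, g(x) = x^2/2;
# f + g is minimized at x* = 1.
prox_f = lambda y: (y + 2.0) / 2.0
prox_g = lambda y: y / 2.0

y = 5.0
for _ in range(100):
    x = prox_f(y)                       # x^{k+1} = prox_f(y^k)
    y = y + prox_g(2.0 * x - y) - x     # y^{k+1} update

# x converges to 1, the actual minimizer of f + g,
# unlike the simple composition prox_g o prox_f.
```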

[1] Convex Analysis and Monotone Operator Theory in Hilbert Spaces, second edition, by Bauschke and Combettes.

[2] Applications of Lagrangian-Based Alternating Direction Methods and Connections to Split Bregman, by Esser, 2009. ftp://arachne.math.ucla.edu/pub/camreport/cam09-31.pdf