What an interesting problem! I will try to explain *only* the intuition, as much as possible, and leave the discussion of the math to Galperin's original translated paper which, aside from a few typos, is quite accessible. Also, I will *not* follow the exact method in the video, but rather follow Galperin's original method, as I believe it is easier to understand that way. The main difference is that in Galperin's original method, all bounces in the system are counted, not just those between the two balls. This also results in a different relationship between the mass of the large ball and the mass of the small one, but it's the same basic idea. In particular, $\pi$ shows up for the same reason.

First, let's set up the problem as shown below:

Place the large ball of mass $M$ on the line $\ell$, and place the small ball of mass $m$ between the large ball and the wall. Let the distance of the small ball from the wall at time $t$ be given by the function $x(t)$, and let the distance of the large ball from the wall be given by the function $y(t)$. Now this gives us a way of representing the position of both balls as a point $P(t)=(x(t),y(t))$ in the Cartesian plane, $\mathbb R ^2$.

Play around with this a bit to recognize what various parts of the plane correspond to. First, since the little ball is always between the large ball and the wall, $x(t)\le y(t)$ and so the point $P(t)$ will be between the $y$-axis and the line $y=x$. For instance, when the little ball hits the wall, then $x(t)$ will be zero, and so the point $P(t)$ will be on the $y$-axis. When the large ball hits the little ball, then $y(t)=x(t)$ so the point will be on the line $y=x$. This gives us a good way of visualizing all activity in the system as the movement of a point in the two-dimensional plane.

Now let's set up a relative criterion for the masses $M$ and $m$ (and here is where I deviate from the video). Let $M=100^Nm$, roll the large ball toward the small ball with some velocity $V$, and let's count the total number of bounces in the system. If the masses were equal, that is, when $N=0$, we would get three total bounces in the system:

- The "large" ball would hit the "small" ball and, similar to Newton's cradle, all the velocity of the moving ball would be transferred to the second with the first bounce.
- The "small" ball would bounce against the wall and, conserving kinetic energy, move in the opposite direction with the same speed as going in.
- The "small" ball would bounce against the "large" ball, transferring all its momentum into the "large" ball.

This is 3 bounces, which happens to be the first digit of $\pi$. The point $P(t)$ will follow the dotted path outlined in Figure A. Since the masses were equal, note that we have a very nice reflective property of the path of $P(t)$, that is, the angle of incidence is equal to the angle of reflection when it "bounces" against $y=x$ and the $y$-axis. If the masses are unequal, however, this reflective property does not hold because the "large" ball will keep moving with some velocity toward the wall (see Figure B.).
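The bounce counts for small cases can be checked numerically. The following is a minimal sketch of my own, not code from Galperin's paper or the video: the function name `count_bounces` is made up for illustration, the small ball is given mass 1, and only velocities are tracked, since the order of events is forced once the velocities are known. It uses the standard formulas for a one-dimensional elastic collision and ordinary floating point, so I would only trust it for modest mass ratios.

```python
def count_bounces(mass_ratio):
    """Count all bounces (ball-ball and ball-wall) for M = mass_ratio * m.

    Velocities are rates of change of distance from the wall, so a
    negative velocity means "moving toward the wall".
    """
    M, m = float(mass_ratio), 1.0
    V, v = -1.0, 0.0  # large ball rolls toward the resting small ball
    count = 0
    while True:
        if V < v:
            # The gap between the balls is shrinking: elastic ball-ball collision.
            V, v = (((M - m) * V + 2 * m * v) / (M + m),
                    ((m - M) * v + 2 * M * V) / (M + m))
            count += 1
        elif v < 0:
            # Small ball is heading for the wall: it rebounds with the same speed.
            v = -v
            count += 1
        else:
            # Both balls move away from the wall and the large one is not
            # being caught: no further bounces are possible.
            return count


for N in range(3):
    print(N, count_bounces(100 ** N))
```

For equal masses this reproduces the three bounces listed above, and for $N=1$ and $N=2$ it should give $31$ and $314$.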

To explain why this reflective property is desirable, let's "fold-out" the angle between the $y$-axis and the line $y=x$ for the case when the masses are equal, as shown in Figure C. (Think of flipping the angle over on itself in a clockwise manner.)

The dotted line still represents the movement of $P(t)$, but every time it would have bounced off of an object, this time it passes into the next folded-out section of the angle instead. Because the angle of incidence is equal to the angle of reflection, we have that the path of $P(t)$ is just a nice straight line, and it is very easy to see that it bounces three times. If only we could get the reflective property to hold when the masses were different sizes...

Well, as it turns out (and this does require some math to verify), we *can* get the reflective property to hold when the masses are different sizes by scaling the graph differently in the $y$ and $x$ directions. First, verify for yourself that the bounce of $P(t)$ against the $y$-axis always has the reflective property, and note that it doesn't matter how you scale the graph (stretch it in the $y$ or $x$ direction, and the bounce of $P(t)$ against the $y$-axis will still be reflective). Thus we only have to scale it in a way so that the bounce off the $y=x$ line is reflective. If you think about it, scaling the graph with different scales for the $x$ and $y$ axes will be akin to changing the angle of the $y=x$ line, and so what we want to do is "rotate" the $y=x$ line to a point where the reflective property holds. As it turns out, and again this does require some math to verify, the proper choice of scaling is $Y=\sqrt M y$ and $X=\sqrt m x$.
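For anyone who wants the verification sketched, here is one way to see it. This is my own summary rather than a quotation from Galperin's paper, and in this paragraph $v=\dot x$ and $V=\dot y$ denote the velocities of the small and large ball. In the scaled coordinates the velocity of $P$ is $(\dot X,\dot Y)=(\sqrt m\,v,\sqrt M\,V)$, so the kinetic energy and the momentum, both conserved in a ball-to-ball collision, can be written as

$$\tfrac12 m v^2+\tfrac12 M V^2=\tfrac12\left(\dot X^2+\dot Y^2\right),\qquad m v+M V=(\sqrt m,\sqrt M)\cdot(\dot X,\dot Y).$$

The first identity says the speed of $P$ is unchanged by the bounce. The second says the component of its velocity along the vector $(\sqrt m,\sqrt M)$ is unchanged, and that vector points along the scaled image of the line $y=x$, namely $Y/\sqrt M=X/\sqrt m$. If the speed and the component along the line are both preserved, the only thing that can happen is that the component perpendicular to the line flips sign, which is exactly a mirror reflection.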

This scaling changes the angle between the $y$-axis and the line that represents a bounce between the two balls, which is now $Y=\sqrt{\frac M m}X$. Take the point $Y=\sqrt M$ and $X=\sqrt m$ which is on this line, and note that the triangle thus created by the points $(0,0)$, $(0,\sqrt M)$, and $(\sqrt m, \sqrt M)$ yields that $\tan(\theta)=\sqrt {\frac m M}$. Now, let's "fold-out" the angle, which now has the reflective property:

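It's no matter that this angle doesn't perfectly divide $\pi$ (Ahah! There it is!). We do note, however, that the resulting "folding-out" of an angle will make $\left \lceil \frac \pi \theta \right \rceil$ copies of the angle, and the straight path of $P(t)$ will cross one line fewer than that, $\left \lceil \frac \pi \theta \right \rceil - 1$, which is the same as $\left \lfloor \frac \pi \theta \right \rfloor$ whenever $\theta$ does not divide $\pi$ evenly. (In the equal-mass case $\theta=\frac \pi 4$ does divide $\pi$ evenly, and the count is $4-1=3$, matching the three bounces found earlier.) **This is why the number that comes out is related to $\pi$!**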

Now the punchline: we have that $\tan \theta = \sqrt {\frac m M}$, so $\theta = \arctan \sqrt {\frac m M}$. When we set $M = 100^N m$ for some $N\in \mathbb N$, we have that

$$\theta = \arctan \sqrt {\frac m {100^N m}} = \arctan \sqrt {100^{-N}}=\arctan {10^{-N}}$$

And therefore $P(t)$ will intersect $\left \lfloor \frac \pi {\arctan 10^{-N}} \right \rfloor$ lines, which is exactly the same as saying the system will have exactly $\left \lfloor \frac \pi {\arctan 10^{-N}} \right \rfloor$ collisions. Now Galperin argues that

$$\left \lfloor \frac \pi {\arctan 10^{-N}} \right \rfloor \approx \left \lfloor \frac \pi {10^{-N}} \right \rfloor= \left \lfloor \pi 10^N \right \rfloor$$

and proves that it is exactly equal for values of $N<100{,}000{,}000$. He conjectures that it also holds more generally, but this is not proven. Clearly, $\left \lfloor \pi 10^N \right \rfloor$ is precisely the first $N+1$ digits of $\pi$!
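For small $N$ the two sides of the approximation can be compared directly. The snippet below is only a floating-point sanity check of my own (the function names are made up, and it says nothing about large $N$, where double precision runs out). It counts crossings as $\lceil \pi/\theta \rceil - 1$, which agrees with the floor whenever $\pi/\theta$ is not an integer.

```python
import math


def predicted_collisions(n):
    """Collision count from the unfolding argument for M = 100**n * m."""
    theta = math.atan(10.0 ** -n)
    # Lines crossed by a straight path through wedges of angle theta.
    return math.ceil(math.pi / theta) - 1


def leading_pi_digits(n):
    """floor(pi * 10**n) in double precision; only reliable for small n."""
    return math.floor(math.pi * 10 ** n)


for n in range(1, 8):
    print(n, predicted_collisions(n), leading_pi_digits(n))
```

Each printed row should show the same integer twice, for example $31$ for $N=1$ and $3141$ for $N=3$.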

This approach seems to make $\pi$ naturally come out, but (unless I'm mistaken, which is certainly possible) you can make the number of bounces line up with any number you'd like. For instance, let $M=\frac {e^2} {\pi^2} 100^N m$. Then $\theta = \arctan \frac {\pi} {e 10^N} \approx \frac \pi {e 10^N}$, so by the same small-angle approximation we would end up with the number of bounces equal to $\left \lfloor \frac \pi \theta \right \rfloor \approx \left \lfloor e 10^N \right \rfloor$, that is, the first $N+1$ digits of $e$. Of course the interesting thing about the "natural" choice of $M=100^N m$ is that it does not depend on already knowing the value of $\pi$ in order to calculate it. On the other hand, if you had a need to create a physical system which bounced a particular number of times, you now can do it.

In fact, if you're willing to accept an uglier mass relation between the two balls, you can get it exact without resorting to Galperin's approximation. Just let $M=\cot^2(10^{-N})m$.
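The same trick, choosing $\theta$ first and then setting $M=\cot^2(\theta)\,m$, can be used to aim for any bounce count $k\ge 2$. The sketch below is my own illustration with made-up function names. It picks $\theta=\pi/(k+\tfrac12)$, an arbitrary choice that puts $\pi/\theta$ safely between $k$ and $k+1$, and then checks the result with a small velocity-only simulation of elastic collisions (small ball of mass 1, floating point, so only meant for moderate $k$).

```python
import math


def mass_ratio_for(k):
    """Return M/m such that the system should bounce exactly k times (k >= 2)."""
    if k < 2:
        raise ValueError("theta < pi/2 forces at least two bounces")
    theta = math.pi / (k + 0.5)  # assumed choice: pi/theta sits mid-interval
    return 1.0 / math.tan(theta) ** 2  # cot^2(theta)


def simulate(mass_ratio):
    """Count all bounces; velocities are measured away from the wall."""
    M, m = float(mass_ratio), 1.0
    V, v = -1.0, 0.0
    count = 0
    while V < v or v < 0:
        if V < v:  # balls approaching each other: elastic collision
            V, v = (((M - m) * V + 2 * m * v) / (M + m),
                    ((m - M) * v + 2 * M * V) / (M + m))
        else:      # small ball reaches the wall and rebounds
            v = -v
        count += 1
    return count


for k in (2, 3, 10, 100):
    print(k, mass_ratio_for(k), simulate(mass_ratio_for(k)))
```

Note that for small $k$ the "large" ball comes out lighter than the "small" one, which is fine for the argument but worth knowing before building anything.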