73

I am a Software Engineering student, and this year I learned how CPUs work. It turns out that electronic engineers (and I also see it a lot in my own field) use derivatives of discontinuous functions. For instance, to calculate the optimal number of ripple adders that minimises the execution time of the addition process:

$$\text{ExecutionTime}(n, k) = \Delta\left(4k+\frac{2n}{k}-4\right)$$

$$\frac{d\,\text{ExecutionTime}(n, k)}{dk}=4\Delta-\frac{2n\Delta}{k^2}=0$$

$$k= \sqrt{\frac{n}{2}}$$

where $n$ is the number of bits in the numbers to add, $k$ is the number of ripple adders, and $\Delta$ is the "delta gate" (the time a gate takes to operate).

Clearly the execution time function is not continuous at all, because $k$ is a natural number and so is $n$. This is driving me crazy. On the one hand, I understand that I can analyse the function as if it were continuous and get results that way, and indeed I think that's what we do ("I think", which is why I am asking). On the other hand, my intuition and my knowledge of mathematical analysis tell me that this is completely wrong: the function is not continuous and never will be, so the derivative with respect to $k$ or $n$ does not exist because there is no rate of change.

If someone could explain to me whether my first guess is correct and why, I'd appreciate it a lot. Thanks for reading and helping!

Yly
Santiago Pardal
  • 14
    I double majored at university: mathematics and engineering. I'm a pure mathematician before I'm an engineer. I breezed through the engineering program because the associated mathematics was most people's biggest hurdle. But one thing that always threw me off were the oddball discontinuous or non-differentiable functions that were "assigned" meaning in violation of all mathematics. Through fiat alone, a derivative was given to it, an integral, etc. Never mind the lack of mathematical rigor; just appreciate the mathematical self-consistency and the practicality and approximation it yields. – CogitoErgoCogitoSum Aug 30 '20 at 21:54
  • 1
    Every logical and mathematical system is built on axioms. One could construct any system one desires, just so long as it is self-consistent. You may treat all of pure mathematics as a sort of axiom in its own right, then arbitrarily define/assert nonsense functions with nonsense relationships. Just so long as no contradictions arise, you're fine. Mathematics and logical systems are a bit more forgiving, I think, than we give them credit for. – CogitoErgoCogitoSum Aug 30 '20 at 21:57
  • To avoid confusion I removed my previous comments. In your case, this method will give you an exact solution. But in general optimisation questions it does **not** have to be the case that rounding gives you the best solution over the whole numbers (it will only give you an approximation). In your specific case, though, it does indeed give an exact solution. – Riemann'sPointyNose Aug 31 '20 at 01:58
  • 8
    An answer to a different question: it always surprises me how seldom people use the natural way to optimize a function of an integer variable—namely, just look at the discrete difference $E(n,k+1)-E(n,k)$, whose sign tells us whether the function is increasing or decreasing just like a derivative does for continuous functions. (Sometimes the discrete quotient works better than the discrete difference.) – Greg Martin Aug 31 '20 at 06:16
  • 5
    @CogitoErgoCogitoSum. Similar experience here. Majored in physics, work as an engineer. First thing that came into my head when I saw the title was "They don't care if it's correct or not. No, it's not correct, but they still don't care as long as it seems to work" – Mad Physicist Aug 31 '20 at 15:28
  • 20
    An Engineer doesn't care whether a method is "correct" as long as it gives the right answer within machine tolerances. – OrangeDog Aug 31 '20 at 16:51
  • @MadPhysicist Do you think it's a waste of time to optimise in such detail if what I am trying to make works? I ask as an engineer (I am in my third year, so I want to be prepared and know as much as I can). I can't imagine engineers in the aerospace or nuclear industry not doing so; there are certain fields in which (in my opinion) "just make it work" isn't enough. Same question goes for OrangeDog and CogitoErgoCogitoSum (don't know how to @ them). Thanks! – Santiago Pardal Aug 31 '20 at 18:09
  • 2
    @SantiagoPardal. The definition of "just make it work" is different between fields, but engineering is fundamentally not academia. You are not interested in the theory: you want to make a product you can sell. Of course a good engineer knows a lot of theory. I think you are right to understand the math behind everything you do as completely as possible. It is not a waste. Keep in mind that your bosses in the "real world" won't care how you got results. They will just notice whether you got better results or not. – Mad Physicist Aug 31 '20 at 18:30
  • 3
    @SantiagoPardal. This type of question is actually a good example of why you are right to ask about such things. The first sentence of the accepted answer says it all: "In general, computing the extrema of a continuous function and rounding them to integers does *not* yield the extrema of the restriction of that function to the integers." Many times, you and the other guy will get similar answers. In the cases where the assumption fails, you will know what to do and will get better results. No one will care how you did it: you will just be the go-to guy for optimization or whatever. – Mad Physicist Aug 31 '20 at 18:34
  • 1
    Engineers do all sorts of things which are not defined rigorously. For example, they might start with an equation whose right-hand side is $\delta(x) \delta(t)$, which is not necessarily well-defined, as one has to be careful to define products of distributions. – Tom Aug 31 '20 at 18:40
  • @MadPhysicist Thank you so much for helping. Exchanges like this make me try to get better every day. I truly appreciate the contributions of all of you guys, thanks again! – Santiago Pardal Aug 31 '20 at 19:38
  • 1
    Not an answer to your question, but note that you can easily get $O(\log n)$ gate delays for $n$-bit addition by using a recursive structure, and it even has very few wire crossings. – user21820 Aug 31 '20 at 21:04
  • 4
    Physicists and engineers pay functions better than mathematicians do, and functions work harder for them (apologies to Lewis Carroll). All the functions in physics and engineering are uniformly continuous and, in this case, low-pass filtered so that samples at the integers are sufficient. If it is properly filtered, the maximum at an integer will be on one side or the other of the global maximum of the version extended from the integers to the reals. – Ross Millikan Sep 01 '20 at 03:25
  • 3
    Just to clarify, because one instructor differentiates inappropriately does not mean that all engineers do so. However, in some cases the lack of rigour is supplanted by strong intuition (which may be situation dependent) which can be subsequently justified by additional mathematical machinery. A standard example is the use of distributions (as in generalised functions). – copper.hat Sep 01 '20 at 18:19
  • @OrangeDog I guess that's the mindset that brought us to the current status of Undefined Behavior in C and general hilarity of Software Engineering. – hmijail mourns resignees Sep 14 '20 at 03:33
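The discrete-difference idea from Greg Martin's comment can be tried directly on the formula in the question. For that formula the difference works out to $E(n,k+1)-E(n,k)=\Delta\left(4-\frac{2n}{k(k+1)}\right)$, which is non-negative exactly when $2k(k+1)\ge n$. Below is a minimal Python sketch, assuming $\Delta = 1$ and an integer $n \ge 1$; the function names are made up for illustration and do not come from the thread.

```python
from fractions import Fraction


def execution_time(n, k, delta=1):
    """E(n, k) = delta * (4k + 2n/k - 4), evaluated exactly with rationals."""
    return delta * (4 * k + Fraction(2 * n, k) - 4)


def forward_difference(n, k, delta=1):
    """Discrete analogue of the derivative: E(n, k+1) - E(n, k)."""
    return execution_time(n, k + 1, delta) - execution_time(n, k, delta)


def optimal_k_by_difference(n):
    """Smallest integer k >= 1 at which E stops decreasing.

    This relies on the forward differences being non-decreasing in k,
    which holds for this particular E (an assumption for other functions).
    """
    k = 1
    while forward_difference(n, k) < 0:
        k += 1
    return k


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        k = optimal_k_by_difference(n)
        print(n, k, execution_time(n, k))
```

No derivative is taken anywhere here; the sign of the difference plays the role that the sign of the derivative plays in the continuous argument.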

5 Answers

117

In general, computing the extrema of a continuous function and rounding them to integers does not yield the extrema of the restriction of that function to the integers. It is not hard to construct examples.
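One such example, as a rough Python sketch: the function `f` below is a hypothetical objective invented purely for illustration (a broad bowl centred at $x=3$ plus a narrow dip near $x=0.5$). The continuous minimiser is located by a simple grid search, which is an assumption of convenience rather than a recommended method.

```python
import math


def f(x):
    """Hypothetical non-convex objective: broad bowl at x = 3, narrow dip near x = 0.5."""
    return (x - 3.0) ** 2 - 20.0 * math.exp(-50.0 * (x - 0.5) ** 2)


# Approximate the real-valued minimiser on [0, 6] with a fine grid (step 0.001).
grid = [i / 1000 for i in range(6001)]
x_star = min(grid, key=f)

# Candidates obtained by rounding the continuous minimiser down and up.
rounded_candidates = (math.floor(x_star), math.ceil(x_star))

# True minimiser of f restricted to the integers 0..6.
k_star = min(range(7), key=f)

print(x_star, rounded_candidates, k_star)
```

The continuous minimum sits in the narrow dip between 0 and 1, but the dip is too narrow to be felt at either integer, so the best integer is 3 and neither rounded candidate finds it.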

However, your particular function is convex on the domain $k>0$. In this case the extremum is at one or both of the two integers nearest to the unique extremum of the continuous function.
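The convexity can be checked directly from the formula in the question, assuming $n>0$ and $\Delta>0$:

$$\frac{\partial^2\,\text{ExecutionTime}(n, k)}{\partial k^2}=\frac{\partial}{\partial k}\left(4\Delta-\frac{2n\Delta}{k^2}\right)=\frac{4n\Delta}{k^3}>0 \quad\text{for } k>0.$$

So the first derivative is strictly increasing on $k>0$: the function decreases up to $k=\sqrt{n/2}$ and increases afterwards, which is why only the integers adjacent to $\sqrt{n/2}$ need to be compared.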

It would have been nice to explicitly state this fact when determining the minimum by this method, as it is really not obvious, but unfortunately such subtleties are often forgotten (or never known in the first place) in such applied fields. So I commend you for noticing the problem and asking!

  • 23
    It's a good point that convexity actually makes the method pretty rigorous, but it should also be mentioned that it often gives good results even for grossly non-convex functions. You could argue this then assumes that the function is still _locally_ convex in a sufficiently big region, but physicists or engineers may not bother proving this kind of thing. – leftaroundabout Aug 31 '20 at 08:20
  • 6
    @leftaroundabout Your last sentence is precisely the sentiment I tried to capture in my last paragraph. Understanding when the method does or doesn't work is valuable but quite difficult. I guess the key takeaway is that the general method is very convenient for finding good *candidates* for optima. – This site has become a dump. Aug 31 '20 at 08:29
39

The main question here seems to be "why can we differentiate a function only defined on integers?". The proper answer, as divined by the OP, is that we can't--there is no unique way to define such a derivative, because we can interpolate the function in many different ways. However, in the cases that you are seeing, what we are really interested in is not the derivative of the function, per se, but rather the extrema of the function. The derivative is just a tool used to find the extrema.

So what's really going on here is that we start out with a function $f:\mathbb{N}\rightarrow \mathbb{R}$ defined only on positive integers, and we implicitly extend $f$ to another function $\tilde{f}:\mathbb{R}\rightarrow\mathbb{R}$ defined on all real numbers. By "extend" we mean that values of $\tilde{f}$ coincide with those of $f$ on the integers. Now, here's the crux of the matter: If we can show that there is some integer $n$ such that $\tilde{f}(n)\geq \tilde{f}(m)$ for all integers $m$, i.e. $n$ is a maximum of $\tilde{f}$ over the integers, then we know the same is true for $f$, our original function. The advantage of doing this is that now we can use calculus and derivatives to analyze $\tilde{f}$. It doesn't matter how we extend $f$ to $\tilde{f}$, because at the end of the day we are only using $\tilde{f}$ as a tool to find properties of $f$, like maxima.

In many cases, there is a natural way to extend $f$ to $\tilde{f}$. In your case, $f=\text{ExecutionTime}$, and to extend it you just take the formula $\Delta \left(4k + \frac{2n}{k} - 4\right)$ and allow $n$ and $k$ to be real-valued instead of integer-valued. You could have extended it a different way--e.g. $\Delta \left(4k + \frac{2n}{k} - 4\right) + \sin(2\pi k)$ is also a valid extension of $\text{ExecutionTime}(n,k)$, but this is not as convenient. And all we are trying to do is find a convenient way to analyze the original, integer-valued function, so if there's a straightforward way to do it we might as well use it.
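To make the two extensions concrete, here is a small Python sketch. It assumes $\Delta = 1$ and $n = 32$ for illustration, and the names `extension_plain` and `extension_wiggly` are made up. It checks that both extensions agree at integer $k$ (up to floating-point rounding) while differing in between.

```python
import math


def extension_plain(n, k, delta=1.0):
    """The formula itself, with k allowed to be any positive real."""
    return delta * (4 * k + 2 * n / k - 4)


def extension_wiggly(n, k, delta=1.0):
    """A second extension: adds sin(2*pi*k), which vanishes at every integer k."""
    return extension_plain(n, k, delta) + math.sin(2 * math.pi * k)


n = 32
for k in range(1, 9):
    a = extension_plain(n, k)
    b = extension_wiggly(n, k)
    # Equal at integers up to floating-point rounding of sin(2*pi*k).
    print(k, a, math.isclose(a, b, abs_tol=1e-9))

# Between integers the two extensions disagree, e.g. at k = 4.25.
print(extension_plain(n, 4.25), extension_wiggly(n, 4.25))
```

Both are legitimate extensions of the same integer-valued function, yet their derivatives vanish at different places, which is one way to see that "the derivative of the discrete function" is not well defined on its own.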


As an illustrative example, an interesting (and non-trivial) case of this idea of extending an integer-valued function to a continuous-valued one is the gamma function $\Gamma$, which is a continuous extension of the integer-valued factorial function. $\Gamma$ is not the only way to extend the factorial function, but it is for most purposes (in fact, all purposes that I know of) the most convenient.
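The relation in question is $\Gamma(n+1)=n!$ for non-negative integers $n$. A quick check using only Python's standard `math` module:

```python
import math

# Gamma(n + 1) equals n! at non-negative integers n ...
for n in range(8):
    print(n, math.factorial(n), math.gamma(n + 1))

# ... and is also defined in between, e.g. Gamma(1/2) = sqrt(pi).
print(math.gamma(0.5), math.sqrt(math.pi))
```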

Yly
  • 11
    I would like to add that using standard mathematical terminology (which may differ from High School terminology) any function $\mathbb N\to\mathbb R$ is continuous. (There is no appropriate concept of differentiation, though.) – Carsten S Aug 31 '20 at 10:55
15

You are confusing a mathematical model of the system with the system itself. The map is not the territory.

Obviously in the real system both $n$ and $k$ must be integers. On the other hand, the math formula for the execution time is a perfectly good function for any real (or even complex!) values of $n$ and $k$ except when $k = 0$.

So you can certainly find the minimizing value of $k$ according to the math model, even though the answer $k = \sqrt{n/2}$ gives a fractional value of $k$ for most integer values of $n$.

If you want to make the math more rigorous, since the function is convex you can then say that

For any integer $k \ge \sqrt{n/2}$, $E(n,k+1) > E(n,k)$, and

For any integer $k \le \sqrt{n/2}$ and $k > 1$, $E(n,k-1) > E(n,k)$.

Therefore, the minimum value of $E$ when $k$ is an integer is one of the (one or two) integers in the range $\sqrt{n/2} - 1 < k < \sqrt{n/2} + 1$.
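The range claim can also be checked by brute force for small $n$. The following Python sketch works in units of $\Delta$ (i.e. assumes $\Delta = 1$) and uses exact rationals so that ties are detected reliably; the helper name `integer_minimisers` and the tested range $1 \le n \le 200$ are arbitrary choices for illustration.

```python
import math
from fractions import Fraction


def E(n, k):
    """Execution time in units of Delta, computed exactly: 4k + 2n/k - 4."""
    return 4 * k + Fraction(2 * n, k) - 4


def integer_minimisers(n):
    """All integers k in 1..n+1 attaining the minimum of E(n, k), by brute force.

    Searching up to n + 1 is enough because E increases for k >= sqrt(n/2).
    """
    values = {k: E(n, k) for k in range(1, n + 2)}
    best = min(values.values())
    return [k for k, v in values.items() if v == best]


for n in range(1, 201):
    root = math.sqrt(n / 2)
    for k in integer_minimisers(n):
        # Every integer minimiser lies strictly within one unit of sqrt(n/2).
        assert root - 1 < k < root + 1, (n, k)
print("claim holds for n = 1..200")
```

A finite check like this is not a proof, of course; the convexity argument above is what covers all $n$.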

On the other hand, engineers are interested in getting results, not doing rigorous math, and if you sketch a graph of the general shape of $E(n,k)$ this rigorous mathematical argument is "obvious".

alephzero
  • 3
    I completely agree with your last paragraph, but I also think that we should take the rigorous mathematical arguments into consideration, because graphs do not always show the whole picture, and things we take as obvious may not hold under small variations. – Santiago Pardal Aug 31 '20 at 14:09
8

The continuous solution $k= \sqrt{\frac{n}{2}}$ gives, in general, an approximation of the extremum point. We can then estimate the optimal value by taking the integer part and its successor,

$$ k_1=\left\lfloor \sqrt{\frac{n}{2}}\right\rfloor \le\sqrt{\frac{n}{2}} \le \left\lfloor \sqrt{\frac{n}{2}}\right\rfloor +1=k_2$$

and choosing whichever of $k_1$ and $k_2$ makes ExecutionTime$(n, k)$ smaller.

As noticed by Servaes, in this particular case convexity guarantees that the smaller of the values at $k_1$ and $k_2$ is the optimum over the integers for the given function.
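The selection between $k_1$ and $k_2$ might look like this in Python. This is a sketch under stated assumptions: time is measured in units of $\Delta$, $n \ge 1$ is an integer, and the function names are hypothetical. It uses `math.isqrt` (Python 3.8+) to get $\lfloor\sqrt{n/2}\rfloor$ without floating-point error.

```python
import math
from fractions import Fraction


def execution_time(n, k):
    """ExecutionTime(n, k) in units of Delta: 4k + 2n/k - 4, as an exact rational."""
    return 4 * k + Fraction(2 * n, k) - 4


def best_adder_count(n):
    """Pick the better of k1 = floor(sqrt(n/2)) and k2 = k1 + 1 (k must be >= 1)."""
    k1 = math.isqrt(n // 2)  # equals floor(sqrt(n/2)) for integer n >= 0
    k2 = k1 + 1
    candidates = [k for k in (k1, k2) if k >= 1]
    return min(candidates, key=lambda k: execution_time(n, k))


for n in (8, 16, 32, 64, 128):
    k = best_adder_count(n)
    print(n, k, execution_time(n, k))
```

For very small $n$ the floor can be 0, which is not a valid number of adders, hence the filter on the candidates.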

user
  • Yes, that's what I was thinking about, but the fact that the function is discontinuous bothers me for some reason. Is there any reason/exception why we can do this? – Santiago Pardal Aug 30 '20 at 20:37
  • Hmm yeah they must be using an approx like this - but in general for optimization it does not need to be the case that simply rounding and getting a whole value is optimal over the whole numbers, right? Especially if the function is steep at the maximum – Riemann'sPointyNose Aug 30 '20 at 20:39
  • Like, if the maximum was at 1.25 for example - it could actually be that 3 is better than 1 or 2 - but using this method they would conclude either 1 or 2 – Riemann'sPointyNose Aug 30 '20 at 20:40
  • 1
    @Riemann'sPointyNose Yes, I agree that in general the method does not guarantee an extremum point. – user Aug 30 '20 at 20:44
  • 1
    @SantiagoPardal Assuming $k$ is continuous "relaxes" the problem into an easier form. Once we have the solution for the continuous case, we can easily find the integer value nearest to that optimal value. See also [Relaxation (approximation)](https://en.wikipedia.org/wiki/Relaxation_(approximation)). – user Aug 30 '20 at 20:45
  • 1
    @Riemann'sPointyNose Thanks, feel free to add more observation also in an answer if you like! Bye – user Aug 30 '20 at 20:49
  • I don't see why it would necessarily have to be considered strictly as an approximation. For example, if $f : [0, \infty) \to \mathbb{R}$ is differentiable everywhere and has a single critical point at $a$ which forms a global maximum, then $f$ is strictly increasing on $[0, a]$ and strictly decreasing on $[a, \infty)$. Therefore, the maximum value of $f |_{\mathbb{N}}$ would have to occur *exactly* either at $\lfloor a \rfloor$ or at $\lceil a \rceil$. – Daniel Schepler Aug 30 '20 at 21:03
  • @DanielSchepler I've adjusted the answer a little bit, let me know if you think it works well this way. Thanks – user Aug 30 '20 at 21:08
  • @DanielSchepler indeed. Maybe I was a bit unclear - this is further justification which is not given by the original post, what I was trying to say was that for general optimisation stuff like this - doing this will only give you an approximation. Servaes also gave an answer below where he justified this as being exact in this particular case :) – Riemann'sPointyNose Aug 31 '20 at 01:53
  • @DanielSchepler to avoid confusion - I removed the comments where I claimed it was an approximation (where what I really should have said was "**in general** it will only give you an approximation"). I would have edited the comments instead but there's a time limit to how long you have to edit them – Riemann'sPointyNose Aug 31 '20 at 01:56
1

If you treat $k$ and $n$ as continuous variables, you obtain a continuous function that can be differentiated and is tied to the initial discontinuous function. The real problem is whether the extreme point of this function can approximate the extreme point of the discontinuous one. The error may be evaluated with the Cauchy remainder formula for Newton or Lagrange interpolation, which has a $k!$ in the denominator and may be tolerable if $k$ is large enough.