37

I've been interested in non-standard analysis recently. I was reading up on it and noticed the following interesting comment on the Wikipedia page about hyperreal numbers, right after giving an example of a nonstandard differentiation:

The use of the standard part in the definition of the derivative is a rigorous alternative to the traditional practice of neglecting the square of an infinitesimal quantity... the typical method from Newton through the 19th century would have been simply to discard the $dx^2$ term.

I've never heard anything like this before, and really find it fascinating that Newton's method was to define the relation $dx^2 = 0$. If we actually formalize the above structure by taking $\mathbb{R}$ and adjoining an element $dx$ with $dx^2 = 0$, we get the "dual numbers," isomorphic to the quotient ring $\mathbb{R}[x]/x^2$. I'd seen some things about how this algebra plays into automatic differentiation algorithms in some computer software systems, but I've never heard anything about Newton directly working in this algebra. So I have a few questions:

  1. Does anyone have more historical information on the way that Newton performed differentiation, and its relation to the dual numbers?
  2. Does anyone know how effectively real analysis can be formalized with the dual numbers? Does the resulting system play nice enough to develop all of the important modern results?
  3. If we start with $\mathbb{C}[x]/x^2$ instead, can we likewise develop complex analysis?

Since this idea is so simple, I'm very curious how powerful it is. I'm also curious if it has any major drawbacks too, since I'm not sure why anyone would mess with the foundational baggage involved in defining the hyperreals if this simple 2-dimensional real algebra could really do the trick.

Mike Battaglia
  • 1
    The "major drawback" as you put it is that an ordinary function on the reals does not extend to the dual numbers. – Mikhail Katz Dec 19 '14 at 09:02
  • I got so many comments about how $\epsilon^2=0$ causes problems that I made a new question about automatic differentiation in $\mathbb{R}((\epsilon))$, where $\epsilon$ is just a new formal quantity that isn't nilpotent. This seems to do better, since you get an expansion of derivatives of all orders, and you also get a field. See here: https://math.stackexchange.com/questions/3202846/is-this-elementary-nilpotent-free-approach-to-automatic-differentiation-strong – Mike Battaglia Nov 20 '20 at 19:13

8 Answers

18

The biggest drawback (and it's a big one) is that the ring of dual numbers is not a field: it has plenty of zero divisors. So Newton, or any of the mathematicians of the early days of calculus, certainly did not work directly in the ring of dual numbers. They of course did not consider the ring to exist (the concept of a ring did not exist at all yet), but from their writing it is clear they envisaged a field of real numbers with, somehow, some notion of infinitesimals. Their work is of course very vague, but correct. Much more on that can be found in math history books. Many interesting discussions can be found in the recent book "Adventures in Formalism", which also relates to the early days of calculus and how things developed.

Some (rather unsatisfactory) portions of analysis can be developed in the ring of dual numbers, but it does not go too far. The idea, as you say, is very simple, perhaps too simple. One immediately gets into trouble when trying to define the derivative as the quotient of the infinitesimal difference $f(x+h)-f(x)$ by $h$, where $h$ is infinitesimal. The difficulty is that the non-zero infinitesimals in the ring of dual numbers are not invertible. So, it's the end of the party. (As you say though, some aspects of the party remain with automatic differentiation.) In some sense, the dual numbers form a first-order approximation to actual infinitesimals: the square of an infinitesimal should be an order of magnitude smaller than the infinitesimal you started with, but in the ring of dual numbers, the square of an 'infinitesimal' is precisely $0$. So, in a nonstandard model of the reals you have whole layers of infinitesimals. In the dual numbers there is only one layer, nothing in it is invertible, and its elements all square to $0$.
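
The failure of invertibility can be checked by a one-line computation (a sketch, writing a general dual number as $c + d\epsilon$ with real $c, d$, and taking $b \neq 0$):

$$(b\epsilon)(c + d\epsilon) = bc\,\epsilon + bd\,\epsilon^2 = bc\,\epsilon \neq 1.$$

So no dual number is a multiplicative inverse of $b\epsilon$. Likewise $(b\epsilon)\cdot\epsilon = 0$ with both factors non-zero, so every such element is a zero divisor.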

The book Models for smooth infinitesimal analysis explores many different models for analysis with infinitesimals. None of them is particularly simple.

Ittay Weiss
  • 10
    On the other hand, for a polynomial $f$ we have that $f(x+\varepsilon)-f(x)$ equals $f'(x) \cdot \varepsilon$, so that $f'(x)=\frac{f(x+\varepsilon)-f(x)}{\varepsilon}$ in $k[\varepsilon]/\varepsilon^2 [x]$. This also illustrates that the ring of dual numbers is useful in algebraic geometry. – Martin Brandenburg Mar 26 '13 at 09:13
  • 3
    Thanks for the detailed response and the reference, which I'll definitely check out. A question: is there any utility in adjoining an element $\omega = \frac{1}{\epsilon}$ to the ring, which then has the property that $\omega^2 = \infty$ (in the extended real line sense)? This still wouldn't give you a field, but it would at least make the various dual numbers invertible, and having derivatives take values in $\mathbb{R} \cup \{\infty\}$ doesn't seem that far out there. – Mike Battaglia Mar 26 '13 at 09:19
  • 2
    Quick afterthought since I can't edit: That should say $\mathbb{R} \cup \{\infty, -\infty\}$ above. So you'd get numbers of the form $a + b\epsilon + c\omega$, where $\epsilon^2 = 0$ and $\omega^2 = \infty$ and $-\omega^2 = -\infty$. Not a field, but still seems possibly useful... – Mike Battaglia Mar 26 '13 at 09:25
  • 1
    @MikeBattaglia, you will still have the problem of there not being any layers of infinitesimals. The dual numbers are a rather crude model for adjoining infinitesimals. As Martin explains in his answer and in his comment, the dual numbers do find applications in algebraic geometry. But for analysis, it does not look promising. – Ittay Weiss Mar 26 '13 at 09:32
  • Ittay: I understand that the dual numbers are "crude" in that sense, and that the hyperreals have layers and layers of infinitesimals whereas the dual numbers just have a single $\epsilon$ that squares to zero. But, are you saying that the lack of "layers" in the dual case is the problem that prevents it from doing analysis? I see why the zero divisors would cause a problem, but why the lack of layers specifically? – Mike Battaglia Mar 27 '13 at 11:34
  • 1
    Because it's not a very smooth situation to have $\epsilon$ be small, but positive, and then suddenly $\epsilon ^2$ just vanishes. It shouldn't vanish, but rather becomes considerably smaller. And this goes on. So this going from $\epsilon >0 $ to $\epsilon^2 =0$ is a bit like a jump discontinuity that is built into the foundations of the theory. – Ittay Weiss Mar 27 '13 at 19:12
  • Ah, that it acts like a jump discontinuity is very interesting. Thanks for explaining that. – Mike Battaglia Mar 28 '13 at 03:30
  • Ittay, I'm just coming back to this now, and after all this time I'm still intrigued by what you wrote by saying that $\epsilon^2 = 0$ behaves like a jump discontinuity. Like I mentioned above, I had the idea to come up with this element $\omega$ such that $\omega^2 = \infty$, which is my attempt to come up with a "first approximation to infinite quantities," paralleling how you say the dual numbers are a "first approximation to infinitesimals." I was kind of curious to see what properties these "first approximations" might have. (continued) – Mike Battaglia Aug 16 '13 at 07:09
  • You mention these sorts of things are unlikely to be useful, since they act like "jump discontinuities built directly into the theory." Could you elaborate on that a bit? What specific problems does it cause? Does it somehow cause problems in defining the concept of a continuous function, or something like that? – Mike Battaglia Aug 16 '13 at 07:09
  • @MikeBattaglia I just mean that having $\epsilon >0 $ yet suddenly $\epsilon ^2=0$ is itself a jump. You would expect $\epsilon^2 $ to still be a positive infinitesimal, only one of a smaller order of magnitude than $\epsilon $. You are thus missing an entire order of smallness. You have, in a manner of speaking, one layer of infinitesimals, the numbers $r\epsilon$, yet when you square them they disappear. So, they only capture one layer of what should be just one of a whole spectrum of infinitesimal magnitudes. – Ittay Weiss Aug 16 '13 at 08:14
  • Yes, I understood that, but what I'm saying is, how exactly does this behavior, specifically, cause problems in developing real analysis? Do you have any examples of how this causes things to go wrong? – Mike Battaglia Aug 16 '13 at 08:21
  • 1
    @Mike Battaglia maybe it causes problems with distributiveness of multiplication? $0/0=\epsilon^2 \omega^2=(\epsilon\omega)(\epsilon\omega)=1$. – Anixx Apr 17 '14 at 06:06
  • Do you really mean an order of magnitude smaller? $.001^2=.000001$ which is three orders of magnitude smaller. – isomorphismes Oct 17 '14 at 04:08
  • @isomorphismes the orders of magnitude here refer to infinitesimals being an order of magnitude below the positive reals. In this context .001 and .000000000000001 have the same order of magnitude. – Ittay Weiss Oct 17 '14 at 04:16
  • 3
    The main problem with the dual numbers is not that they are not a field, but that an ordinary function does not extend to the dual numbers, and therefore this approach is not helpful in calculus and analysis. – Mikhail Katz Dec 19 '14 at 09:01
  • This is true, although the dual numbers do extend a huge set of functions: for instance, any piecewise-analytic function immediately extends to the dual numbers. I would expect you can go even further and also extend something like the Weierstrass function, as it's just a pointwise-convergent infinite sum of cosines, each of which has a dual extension. I am somewhat curious how far you can take it, really! – Mike Battaglia Mar 24 '22 at 20:54
8

No for 1. and 3., this ring is not really useful in analysis. But it is quite important for analytical considerations in algebraic geometry, the main reason being that the scheme $\mathrm{Spec}(k[\varepsilon]/\varepsilon^2)$ classifies tangent vectors. This makes it possible to define the tangent space of arbitrary functors $F : \mathsf{CRing} \to \mathsf{Set}$ at some $x \in F(k)$, namely as the fiber of $F(k[\varepsilon]/\varepsilon^2) \to F(k)$ at $x$. There is no manifold which represents tangent vectors for manifolds, so this is the main difference.
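
One way to see why this ring classifies tangent vectors is a short computation (a sketch; the names $\varphi$, $\varphi_0$ and $D$ are introduced here only for illustration). Write a ring homomorphism $\varphi : A \to k[\varepsilon]/\varepsilon^2$ as $\varphi(a) = \varphi_0(a) + D(a)\varepsilon$. Multiplicativity then reads:

$$\varphi(ab) = \varphi(a)\varphi(b) = \varphi_0(a)\varphi_0(b) + \bigl(\varphi_0(a)D(b) + D(a)\varphi_0(b)\bigr)\varepsilon.$$

Comparing coefficients, $\varphi_0 : A \to k$ is multiplicative and $D$ satisfies the Leibniz rule $D(ab) = \varphi_0(a)D(b) + D(a)\varphi_0(b)$, which is the behaviour one expects of a tangent vector at the point $\varphi_0$.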

Martin Brandenburg
  • This seems like a great response but unfortunately I'm not able to understand it yet! I've just started Hartshorne now but I'm still working on sheaves and haven't gotten to schemes yet. I'm going to have to come back to this in a few weeks and see if it all makes sense then.. – Mike Battaglia Mar 26 '13 at 09:23
  • Better look up the notion of a derivation and try to prove that homomorphisms of rings $A \to k[\varepsilon]/\varepsilon^2$ correspond 1:1 to pairs consisting of a homomorphism of rings $A \to k$ and a $k$-derivation $A \to k$. This is the observation on which everything else rests. – Martin Brandenburg Mar 26 '13 at 09:51
  • @MikeBattaglia There is a related explanation of [first order infinitesimal symmetries](http://qchu.wordpress.com/2011/02/26/the-quaternions-and-lie-algebras-i/) in one of Qiaochu Yuan's old blogposts that may or may not help you develop a picture of the above idea. – rschwieb Mar 26 '13 at 10:01
  • Thanks for the references, everyone - will check those out. – Mike Battaglia Mar 27 '13 at 21:20
5

In the dual numbers, $f(x+\epsilon) = f(x) + \epsilon f^\prime (x)$ holds for any differentiable function. This is enough to handle first derivatives computationally. Of course it is not enough for the conventional definition of the second derivative. So you can consider the duals as a computational model. Of course, one of the drawbacks is that the purely imaginary numbers are not invertible.
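
For polynomials the identity can be used directly as an algorithm. The following is a minimal sketch, not taken from any particular library: `horner_dual` is a hypothetical helper that runs Horner's rule on pairs (real part, $\epsilon$ coefficient) and drops every $\epsilon^2$ term.

```python
def horner_dual(coeffs, x):
    """Evaluate p(x + ε) for a polynomial p, coefficients listed from highest degree down.

    Returns the pair (p(x), p'(x)): the real part and the ε coefficient.
    """
    value, deriv = 0.0, 0.0
    for c in coeffs:
        # (value + deriv·ε)·(x + ε) + c, with the ε² term dropped
        value, deriv = value * x + c, deriv * x + value
    return value, deriv


# p(x) = x³ - 2x + 1, so p'(x) = 3x² - 2
print(horner_dual([1.0, 0.0, -2.0, 1.0], 2.0))  # (5.0, 10.0)
```

The $\epsilon$ coefficient comes out as the derivative without any limit or division being taken, which is the sense in which the duals act as a computational model.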

user48672
  • 1
    This is a dubious claim. How do you define $f(x+\epsilon)$ for dual numbers if $f$ is an ordinary function? – Mikhail Katz Dec 19 '14 at 08:59
  • Well you can extend the function to the duals. On the real axis of the dual plane it will retain the same values as the function prototype acting on the reals. Automatic differentiation algorithms rely exactly on this extension. – user48672 Dec 20 '14 at 00:28
  • What does "retaining the same values" mean exactly? Does it mean that $f(x+\epsilon)=f(x)$? – Mikhail Katz Dec 20 '14 at 19:48
  • It means $f ( x + \epsilon \cdot 0) = f (x)$. Think about $\epsilon $ as an imaginary unit similar to $i$. – user48672 Dec 21 '14 at 00:18
  • 2
    You can define it that way but this definition is useless for defining the derivative of $f$, because according to your definition the derivative $\frac{f(x+dx)}{dx}$ will be always zero, with $dx$ the nilpotent infinitesimal. – Mikhail Katz Dec 21 '14 at 09:35
  • As I mentioned $\epsilon$ does not have an inverse element. Therefore, you can't have $f(x+\epsilon) \over \epsilon$. You can check the validity of $f(x+\epsilon) = f(x) + \epsilon f^\prime (x)$ if you evaluate it on a polynomial (simplest case). For example $(x+ \epsilon)^2 = x^2 + 2 x \cdot \epsilon$. So one can recognize the derivative in the $2 x$ term. – user48672 Dec 21 '14 at 15:43
  • 2
    you missed the point. As in Smooth Infinitesimal Analysis, one could define the derivative $L$ by $f(x+\epsilon)= f(x)+L\epsilon$. However, unlike Smooth Infinitesimal Analysis, there does not seem to be any useful way of defining $f$ at $x+\epsilon$ in the dual numbers. – Mikhail Katz Dec 22 '14 at 09:37
  • As you point out for polynomials one can do this algebraically but that's not very interesting or helpful, and does not seem to justify nilpotence at any rate. – Mikhail Katz Dec 22 '14 at 10:13
  • You jump to conclusions too soon. From the polynomial identity it follows that one could handle exponentials, trigonometric functions and logarithms. Roots come almost for free from basic identities. So this gives a reasonably broad toolbox for computations. For a lot of people this is a good enough reason to use dual numbers. I don't know what you would have in mind. – user48672 Dec 22 '14 at 23:23
  • I do not compare this approach with SIA. I only claim that dual numbers and their cousins provide reasonable computational models. That is NSA in an elementary fashion. – user48672 Dec 22 '14 at 23:26
  • 2
    That would be nice and Terry Tao has some ideas in this direction (he has a page on his blog called "cheap version of nonstandard analysis" or something like that). The problem is that as far as I can see the approach with the dual numbers does not work. I like the idea of finding cheap versions but Tao's is still considerably more involved than dual numbers. Again the difference is that Tao's approach works. – Mikhail Katz Dec 23 '14 at 09:15
  • Here is an article that provides some results using cheap nsa: https://arxiv.org/abs/1804.09746 – Quant Christo Jan 27 '20 at 16:39
5

The answers by Ittay Weiss and Martin Brandenburg are helpful. I would like to point out a more direct shortcoming of the dual numbers as far as analysis (and even calculus) is concerned: it is not clear how to extend a generic real function to the dual numbers, even, say, a $C^\infty$ smooth function. Thus, if one wishes to form the ratio of infinitesimals involved in the definition of the derivative, it is not clear what should appear in the numerator. Over the hyperreals, one has a systematic way of extending every real function to the wider hyperreal domain, and the transfer principle (which is arguably a formalisation of the Leibnizian Law of Continuity) ensures that such an extension is meaningful.

For this reason, the answer to the original question would be: No, dual numbers are insufficient to capture "Newton's historical method for doing calculus". The hyperreals provide a framework where the procedures of 17th century infinitesimal calculus can be successfully formalized.

Mikhail Katz
  • I never saw this before, but since I'm seeing it now: I mentioned above, in my comments to Ittay, an algebraic structure where $\omega = 1/\epsilon$ is defined, with the property $\omega^2 = \infty$ and $-\omega^2 = -\infty$, taken from the extended real line $\mathbb{R} \cup \{\infty, -\infty\}$. Thus, every number in this system is of the form $a+b\epsilon+c\omega$, other than the special numbers $\infty$ and $-\infty$. This makes all sorts of things invertible which weren't invertible before. Would this still pose a problem in extending generic real functions to the dual numbers? – Mike Battaglia Aug 16 '13 at 07:01
  • 3
    @Mike: One immediate problem is that the condition $\epsilon^2=0$ would not allow us to carry out the usual algebraic simplifications on the relation $(\epsilon\omega)^2=1$. Thus one of the ordinary rules of algebra has to break down, which is inconvenient in a number system. In the hyperreal framework all ordinary rules of algebra continue to hold over the extended domain. – Mikhail Katz Aug 16 '13 at 07:49
  • Aha, great point. I never saw that before. Thanks. – Mike Battaglia Aug 16 '13 at 08:02
  • Just thinking about this again after many years - it would seem any analytic function can be extended to the dual numbers, since you can just use the Taylor series and formally plug in, for instance, $1+\epsilon$ or whatever. Then using this and the ordering on the dual numbers, you can get the same thing for piecewise-analytic functions. But it isn't quite the same as nonstandard analysis in that, for instance, if you try $1/(1-(1+\epsilon))$ you get $1 + (1+\epsilon) + (1+\epsilon)^2 + \dots = 1 + (1+\epsilon) + (1+2\epsilon) + (1+3\epsilon) + \dots$ and thus it doesn't converge. – Mike Battaglia Mar 24 '22 at 21:03
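
The divergence in the last comment can be made explicit with partial sums (a sketch, using $(1+\epsilon)^n = 1 + n\epsilon$, which follows from $\epsilon^2 = 0$):

$$\sum_{n=0}^{N} (1+\epsilon)^n = \sum_{n=0}^{N} (1 + n\epsilon) = (N+1) + \frac{N(N+1)}{2}\,\epsilon.$$

Both the real part and the $\epsilon$ coefficient grow without bound as $N \to \infty$.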
5

I think no one has mentioned synthetic differential geometry, where you have nonzero quantities with $dx^2=0$. For a very readable introduction I suggest:

J. L. Bell, A Primer of Infinitesimal Analysis

  • 5
    Note that these quantities aren't actually nonzero. What one can say about them is that one cannot prove that they are zero; nor can one prove that they are nonzero. This is of course only possible when the background logic is intuitionistic. – Mikhail Katz Nov 28 '13 at 19:14
  • Right. But at least one can prove that the set of all nilsquare elements does not reduce to the set $\{0\}$ – Michael Bächtold Nov 28 '13 at 21:44
  • 1
    How does that go? Is this really a "set" in the traditional sense? I recall that one needs a more sophisticated topos theory setting to make this work. Does one get a nonempty set of which one cannot exhibit any element? – Mikhail Katz Nov 29 '13 at 08:22
  • 1
    @user72694 have a look at page 5 of these notes: http://home.sandiego.edu/~shulman/papers/sdg-pizza-seminar.pdf – Michael Bächtold Nov 29 '13 at 17:39
  • @user72694, just curious, why do you need intuitionistic logic to have it so you can't prove the quantities are zero, nor can you prove they're nonzero? For instance, in ZFC, you can't prove that $|\omega_1| = |\Bbb R|$, nor can you prove that $|\omega_1| \neq |\Bbb R|$, and the background logic is classical. – Mike Battaglia Nov 24 '15 at 20:06
  • @MikeBattaglia: the notes of Mike Shulman (linked above) give one reason: if we allow the law of the excluded middle, then from the first axiom of synthetic differential calculus it would follow that $R=0$. – Michael Bächtold Nov 24 '15 at 21:56
4

Dual numbers, attributed to Eduard Study, are already used in practice, for example here:
https://github.com/JuliaDiff/DualNumbers.jl

From my point of view division seems not so much a problem, since it fails when ordinary division would also fail. Here is a set of arithmetic operations defined for duals, I am writing (x,y) instead of x+εy:

$$-(x,y) = (-x,-y)$$

$$(x,y)+(z,t) = (x+z,y+t)$$

$$(x,y)*(z,t) = (x*z,x*t+y*z)$$

$$\frac{(x,y)}{(z,t)} = (\frac{x}{z}, \frac{y*z-x*t}{z^2})$$
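
The four rules above translate almost line by line into code. This is a minimal sketch in Python rather than the Julia package linked above; the class name `Dual` and its field names are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """The pair (x, y), standing for x + εy with ε² = 0."""
    x: float  # real part
    y: float  # coefficient of ε

    def __neg__(self):
        return Dual(-self.x, -self.y)

    def __add__(self, other):
        return Dual(self.x + other.x, self.y + other.y)

    def __mul__(self, other):
        # (x, y) * (z, t) = (x*z, x*t + y*z); the y*t term carries ε² and vanishes
        return Dual(self.x * other.x, self.x * other.y + self.y * other.x)

    def __truediv__(self, other):
        # (x, y) / (z, t) = (x/z, (y*z - x*t) / z²); requires z != 0
        z, t = other.x, other.y
        return Dual(self.x / z, (self.y * z - self.x * t) / (z * z))


a = Dual(3.0, 1.0)
b = Dual(2.0, 5.0)
q = a / b
print(q)      # the quotient given by the division rule
print(q * b)  # multiplying back recovers a
```

Dividing by a pair whose real part is zero, such as `Dual(0.0, 1.0)`, raises `ZeroDivisionError`, which matches the remark that division fails when ordinary division would also fail.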

I guess the claim that duals cannot be used to define the derivative stems from a confusion with Jerome Keisler's standard part. Translated into dual-number equations, he writes the following, and division is exactly the problem:

f((x,h)) - f((x,0))
------------------- = (f'(x), e) /* doesn't work */
      (0,h)

But if we use the hypothesis:

f((x,y)) = (f(x), f'(x)*y)

We then find the following by using this hypothesis and the aforementioned arithmetic operations:

f((x,h)) - f((x,0))
------------------- = (0, f'(x))  /* works */
      (h,0)
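
Spelled out step by step, as a sketch using only the hypothesis and the division rule given earlier, with $h \neq 0$:

$$f((x,h)) - f((x,0)) = (f(x),\, f'(x)h) - (f(x),\, 0) = (0,\, f'(x)h),$$

$$\frac{(0,\, f'(x)h)}{(h,\,0)} = \left(\frac{0}{h},\ \frac{f'(x)h \cdot h - 0 \cdot 0}{h^2}\right) = (0,\, f'(x)).$$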

And if this isn't convincing enough, we can also use the hypothesis to show that duals reflect the chain rule:

f(g( (x,1) )) = f( (g(x), g'(x)) )

              = ( f(g(x)), f'(g(x))*g'(x) )
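
The same chain-rule computation can be checked numerically. This is a small sketch on plain tuples; `lift` is a hypothetical helper that applies the hypothesis, and it has to be handed the derivative of each basic function explicitly.

```python
import math


def lift(f, df):
    """Extend f to pairs via the hypothesis f((x, y)) = (f(x), f'(x)*y)."""
    def lifted(p):
        x, y = p
        return (f(x), df(x) * y)
    return lifted


sin_d = lift(math.sin, math.cos)
square_d = lift(lambda x: x * x, lambda x: 2 * x)

x = 1.5
value, deriv = sin_d(square_d((x, 1.0)))
# deriv should agree with the chain rule: cos(x²) * 2x
print(value, deriv)
```

Composing the lifted functions propagates the derivative automatically; no rule for the composite `sin(x*x)` was written by hand.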

Hyperduals are an extension of duals where second or higher order derivatives can be also calculated.

But currently I would rather wish for duals that can compute f(x+) and f(x-) for me, i.e. the right and left derivatives. Currently experimenting.

3

Yes... and no....

On the one hand, the dual numbers $\mathbb{R}[\epsilon] / (\epsilon^2)$ are a topological ring, and the projection $\mathbb{R}[\epsilon] / (\epsilon^2) \to \mathbb{R}$ is a vector bundle over the real line. In fact, it is isomorphic (as a vector bundle) to the tangent bundle $T\mathbb{R}$ and to the cotangent bundle $T^*\mathbb{R}$ in a rather suggestive way.

On the other hand, aside from being a neat way to differentiate polynomials, I'm not sure it actually does anything for you. For example, while it's interesting to organize derivative information such as $\log(x+\epsilon y) = \log(x) + \epsilon \frac{y}{x}$, I don't know if it actually does anything to help you derive such formulas.
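
For what it is worth, here is one possible route to the logarithm formula, offered as a sketch under the assumption that $\exp$ is extended to the dual numbers by its power series. Since $\epsilon^2 = 0$, the series for $\exp(\epsilon v)$ truncates after two terms:

$$\exp(u + \epsilon v) = e^u \exp(\epsilon v) = e^u (1 + \epsilon v).$$

Setting this equal to $x + \epsilon y$ with $x > 0$ gives $e^u = x$ and $e^u v = y$, hence

$$\log(x + \epsilon y) = u + \epsilon v = \log(x) + \epsilon \frac{y}{x}.$$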

On the cotangent side, matters are worse — I'm not aware of anything the dual numbers could do for you that the exterior algebra doesn't do as well or better.

  • If you assume that your function is analytic then the result follows trivially. This is not very restrictive for applications because the special functions are analytic. – user48672 Jan 30 '20 at 23:02
0

This seems to be generating a lot of confusion.

It is simply that $f(x+he) = f(x) + he\,\frac{df}{dx}$, where $e^2 = 0$; this holds algebraically and precisely for Taylor-expandable functions (i.e. analytic functions).
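
The step behind this is the Taylor expansion (a sketch, for $f$ analytic at $x$): every term of order two or higher carries a factor $e^2 = 0$ and drops out.

$$f(x + he) = \sum_{n \ge 0} \frac{f^{(n)}(x)}{n!}\,(he)^n = f(x) + he\, f'(x).$$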

You can go through the table of all the elementary functions and write out the answer using this algebraic definition, and also prove all the properties of derivatives. It is no better nor worse than the Lagrange definition of the derivative as whatever sits in the second term of the expansion.

Probably good for educational purposes: no torture with the limit concept, the derivative is simply there. Of course analysts may misunderstand the algebra part. It is a ring, not a field. The $he$ part is an ideal of the ring.

J. weisz