
[Prologue]

This Q&A is meant to explain more clearly the inner workings of my approximation search class, which I first published here.

I have been asked for more detailed info about this a few times already (for various reasons), so I decided to write a Q&A-style topic about it which I can easily reference in the future, without having to explain it over and over again.

[Question]

How can values/parameters be approximated in the real domain (double) to fit polynomials or parametric functions, or to solve (difficult) equations such as transcendental ones?

Restrictions

  • real domain (double precision)
  • C++ language
  • configurable precision of approximation
  • known interval for search
  • fitted value/parameter is not strictly monotonic, or is not a function at all
  • Dippy question: Why are Runge-Kutta or Newton-Raphson methods not applicable here? – scooter me fecit Mar 22 '16 at 20:12
  • @ScottM aren't they limited only to functions? – Spektre Mar 22 '16 at 20:17
  • @Spektre: If you're computing the first derivative (delta y's), then integrating via Runge-Kutta or approximating via Newton-Raphson might be a better choice. There are also adaptive step size variants. – scooter me fecit Mar 22 '16 at 21:41
  • NR is extremely sensitive to oscillations, rounding errors, and needing pretty good initial guesses. RK far less so. – Mike 'Pomax' Kamermans Mar 22 '16 at 21:42
  • 1
    I'm voting to close this question as off-topic because SO is for Q&A, not discussion. – duffymo Mar 22 '16 at 23:50
  • 1
    "non-monotonic" is perfectly fine; "not a function" makes no sense. if it's not a function, what are we approximating here? I assume you meant non-monotonic, continuous function not necessarily having a closed-form definition. – Will Ness Apr 02 '16 at 09:10

2 Answers


Approximation search

This is an analogy to binary search, but without its restriction that the searched function/value/parameter must be strictly monotonic, while still sharing the O(log(n)) complexity.

For example, let us assume the following problem:

We have a known function y=f(x) and want to find x0 such that y0=f(x0). This could basically be done with the inverse function of f, but there are many functions whose inverse we do not know how to compute (e.g. y = x + sin(x) has no closed-form inverse). So how do we compute this in such a case?

Knowns

  • y=f(x) - input function
  • y0 - wanted point y value
  • a0,a1 - solution x interval range

Unknowns

  • x0 - wanted point x value, which must be in the range x0=<a0,a1>

Algorithm

  1. probe some points x(i)=<a0,a1> evenly dispersed along the range with some step da

    So for example x(i)=a0+i*da where i={ 0,1,2,3... }

  2. for each x(i) compute the distance/error ee of the y=f(x(i))

    This can be computed for example like this: ee=fabs(f(x(i))-y0), but any other metric can be used too.

  3. remember point aa=x(i) with minimal distance/error ee

  4. stop when x(i)>a1

  5. recursively increase accuracy

    So first restrict the range to search only around the found solution, for example:

    a0'=aa-da;
    a1'=aa+da;
    

    then increase precision of search by lowering search step:

    da'=0.1*da;
    

    if da' is not too small and the max recursion count is not reached, go to #1

  6. found solution is in aa

This is what I have in mind:

[img1: the initial search pass (left) and the next, refined recursive pass (right)]

On the left side the initial search is illustrated (bullets #1,#2,#3,#4). On the right side is the next recursive search (bullet #5). This loops recursively until the desired accuracy is reached (number of recursions). Each recursion increases the accuracy 10 times (0.1*da). The gray vertical lines represent the probed x(i) points.
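Before the full class, here is a minimal stand-alone sketch of steps #1..#6 as a plain function (the function and parameter names here are my own, not part of the published class):

#include <math.h>
//---------------------------------------------------------------------------
double approx_search(double (*f)(double),double y0,double a0,double a1,double da,int n)
    {
    double aa=a0,e0=-1.0;                           // best x and its error so far
    for (int i=0;i<n;i++)                           // #5 recursion levels
        {
        for (double a=a0;;a+=da)                    // #1 probe x(i)=a0+i*da
            {
            if (a>a1) a=a1;                         // clamp the last probe to a1
            double ee=fabs(f(a)-y0);                // #2 distance/error
            if ((e0<0.0)||(ee<e0)) { e0=ee; aa=a; } // #3 remember the best point
            if (a>=a1) break;                       // #4 stop when x(i) reaches a1
            }
        a0=aa-da; a1=aa+da;                         // #5 restrict range around aa
        da*=0.1;                                    //    and lower the step
        }
    return aa;                                      // #6 found solution
    }
//---------------------------------------------------------------------------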

Here is the full C++ source code for this:

//---------------------------------------------------------------------------
//--- approx ver: 1.01 ------------------------------------------------------
//---------------------------------------------------------------------------
#ifndef _approx_h
#define _approx_h
#include <math.h>
//---------------------------------------------------------------------------
class approx
    {
public:
    double a,aa,a0,a1,da,*e,e0; // a: current x(i), aa: best x so far, <a0,a1>: search range,
                                // da: current step, e: ptr to current error, e0: best error so far
    int i,n;                    // i: recursion counter, n: number of recursions
    bool done,stop;             // done: search finished, stop: current pass reached a1

    approx()            { a=0.0; aa=0.0; a0=0.0; a1=1.0; da=0.1; e=NULL; e0=-1.0; i=0; n=5; done=true; stop=false; }
    approx(approx& a)   { *this=a; }
    ~approx()           {}
    approx* operator = (const approx *a) { *this=*a; return this; }
    //approx* operator = (const approx &a) { ...copy... return this; }

    void init(double _a0,double _a1,double _da,int _n,double *_e)
        {
        if (_a0<=_a1) { a0=_a0; a1=_a1; }
        else          { a0=_a1; a1=_a0; }
        da=fabs(_da);
        n =_n ;
        e =_e ;
        e0=-1.0;                                    // e0<0 marks "no solution tested yet"
        i=0; a=a0; aa=a0;
        done=false; stop=false;
        }
    void step()
        {
        if ((e0<0.0)||(e0>*e)) { e0=*e; aa=a; }         // better solution
        if (stop)                                       // increase accuracy
            {
            i++; if (i>=n) { done=true; a=aa; return; } // final solution
            a0=aa-fabs(da);
            a1=aa+fabs(da);
            a=a0; da*=0.1;
            a0+=da; a1-=da;
            stop=false;
            }
        else{
            a+=da; if (a>a1) { a=a1; stop=true; }       // next point
            }
        }
    };
//---------------------------------------------------------------------------
#endif
//---------------------------------------------------------------------------

This is how to use it:

approx aa;
double ee,x,y,x0,y0=here_your_known_value;
//            a0,  a1, da,n, ee  
for (aa.init(0.0,10.0,0.1,6,&ee); !aa.done; aa.step())
    {
    x = aa.a;        // this is x(i)
    y = f(x);        // here compute the y value for whatever you want to fit
    ee = fabs(y-y0); // compute error of solution for the approximation search
    }

In the comment above the for (aa.init(...) statement the operands are named. a0,a1 is the interval on which the x(i) are probed, da is the initial step between the x(i), and n is the number of recursions. So if n=6 and da=0.1, the final max error of the x fit will be ~0.1/10^6=0.0000001. The &ee is a pointer to the variable where the actual error is to be computed. I chose a pointer so there are no collisions when nesting this, and also for speed, as passing a parameter to a heavily used function adds needless overhead.
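For instance, a complete hedged example using the class to solve the transcendental equation cos(x)=x on the interval <0,1> might look like this (the equation and interval are my choice, and I assume the class above is saved as approx.h):

#include <math.h>
#include <stdio.h>
#include "approx.h"  // assumed filename for the class above

int main()
    {
    approx aa;
    double ee,x,y,y0=0.0;                // we search for a root of cos(x)-x
    //           a0, a1, da,n, ee
    for (aa.init(0.0,1.0,0.1,6,&ee); !aa.done; aa.step())
        {
        x  = aa.a;
        y  = cos(x)-x;                   // the equation rewritten as f(x)=0
        ee = fabs(y-y0);                 // error driving the search
        }
    printf("x0 ~ %.7f\n",aa.aa);         // ~0.7390851 (the Dottie number)
    return 0;
    }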

[Notes]

This approximation search can be nested to any dimensionality (but of course you need to be careful about the speed); see some examples.
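As a rough illustration (my own sketch, not one of the linked examples), 2D nesting could look like this, where f(x,y) and z0 are placeholders for your fitted function and target value:

approx ax,ay;
double ee_x,ee_y,x,y,z0=here_your_known_value;
double bx=0.0,by=0.0,be=-1.0;                   // best (x,y) pair and its error
for (ax.init(0.0,10.0,0.1,4,&ee_x); !ax.done; ax.step())
    {
    x=ax.a;
    for (ay.init(0.0,10.0,0.1,4,&ee_y); !ay.done; ay.step())
        {
        y=ay.a;
        ee_y=fabs(f(x,y)-z0);                   // inner error drives the y search
        if ((be<0.0)||(ee_y<be)) { be=ee_y; bx=x; by=y; } // remember best pair
        }
    ee_x=ay.e0;                                 // best error over y drives the x search
    }
// solution is (bx,by) with error be

Note the separate error variables so the nested searches do not collide, and that the inner search's best error (ay.e0) is what is fed to the outer search.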

In case of a non-function fit, when you need to get "all" the solutions, you can use recursive subdivision of the search interval after a solution is found to check for more solutions. See example:
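A rough sketch of that subdivision idea (my own illustration, not the original linked example; f, y0 and the acceptance threshold sol_tol are placeholders of mine):

#include <math.h>
#include <vector>

// search <a0,a1>; if an acceptable solution is found at aa.aa, store it and
// recursively search the sub-intervals to its left and right
void find_all(double (*f)(double),double y0,double a0,double a1,
              double da,int n,double sol_tol,std::vector<double> &out)
    {
    if (a1-a0<da) return;                       // interval too small to probe
    approx aa; double ee;
    for (aa.init(a0,a1,da,n,&ee); !aa.done; aa.step())
        ee=fabs(f(aa.a)-y0);
    if (aa.e0>sol_tol) return;                  // no acceptable solution here
    out.push_back(aa.aa);                       // store the found solution
    find_all(f,y0,a0,aa.aa-da,da,n,sol_tol,out); // left sub-interval
    find_all(f,y0,aa.aa+da,a1,da,n,sol_tol,out); // right sub-interval
    }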

What should you be aware of?

You have to choose the search interval <a0,a1> carefully, so that it contains the solution but is not too wide (or the search would be slow). The initial step da is also very important: if it is too big you can miss local min/max solutions, and if it is too small the whole thing gets too slow (especially for nested multidimensional fits).

– Spektre

A combination of the secant method (with bracketing, but see the correction at the bottom) and the bisection method is much better:

[figure: secant step combined with a bisection midpoint on a bracketing interval]

We find root approximations by secants, and keep the root bracketed as in bisection.

[figure: the interval endpoints keep opposite signs of f, so the root stays bracketed inside]

Always keep the two edges of the interval such that the delta at one edge is negative and at the other it is positive, so the root is guaranteed to be inside; and instead of halving, use the secant method.

Pseudocode:

given a function f
given two points a, b, such that a < b and sign(f(a)) /= sign(f(b))
given tolerance tol
find root z of f such that abs(f(z)) < tol     -- stop_condition

DO:
    x = root of f by linear interpolation of f between a and b
    m = midpoint between a and b

    if stop_condition holds at x or m, set z and STOP

    [a,b] := [a,x,m,b].sort.choose_shortest_interval_with_opposite_signs_at_its_ends

This obviously halves the interval [a,b], or does even better, at each iteration; so unless the function is behaving extremely badly (like, say, sin(1/x) near x=0), this will converge very quickly, taking at most two evaluations of f per iteration step.

And we can detect the badly behaving cases by checking that b-a does not become too small (esp. if we're working with finite precision, as with doubles).

Update: apparently this is actually the double false position method, which is secant with bracketing, as described by the pseudocode above. Augmenting it with the middle point, as in bisection, ensures convergence even in the most pathological cases.
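For reference, here is a minimal C++ sketch of the pseudocode above (all names are mine; it assumes f is continuous on [a,b] and that f(a), f(b) have opposite signs):

#include <math.h>
#include <algorithm>

// bracketed secant + bisection (double false position) per the pseudocode
template<class F> double find_root(F f,double a,double b,double tol,int max_iter=100)
    {
    double fa=f(a),fb=f(b);
    for (int it=0;it<max_iter;it++)
        {
        double x=a-fa*(b-a)/(fb-fa);            // secant (linear interpolation) point
        double m=0.5*(a+b);                     // bisection midpoint
        double fx=f(x),fm=f(m);
        if (fabs(fx)<tol) return x;             // stop condition holds at x
        if (fabs(fm)<tol) return m;             // stop condition holds at m
        double p[4]={a,x,m,b},v[4]={fa,fx,fm,fb};
        for (int i=0;i<3;i++)                   // sort the four points by position
            for (int j=i+1;j<4;j++)
                if (p[j]<p[i]) { std::swap(p[i],p[j]); std::swap(v[i],v[j]); }
        int best=-1;                            // shortest sub-interval with a sign change
        for (int i=0;i<3;i++)
            if ((v[i]<0.0)!=(v[i+1]<0.0))
                if ((best<0)||(p[i+1]-p[i]<p[best+1]-p[best])) best=i;
        a=p[best]; fa=v[best];                  // keep the root bracketed
        b=p[best+1]; fb=v[best+1];
        }
    return 0.5*(a+b);                           // fallback: midpoint of final bracket
    }

For example, find_root([](double x){ return cos(x)-x; },0.0,1.0,1e-9) converges to ~0.7390851 in just a few iterations.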

– Will Ness
  • credit for the original graphics of course due to user Spektre in their answer above. – Will Ness Apr 16 '17 at 18:51
  • Could (how?) this be nested in the same manner as Spektre's solution? For instance, I'm using that solution (or, something very close, anyhow) to solve a 3-dimensional localization problem. How would one nest something like this (here's an example of a 2-d nesting of Spektre's solution for solving this localization problem, which I've extended to 3-d: https://pastebin.com/nQpX2hrX) – KeithMadison Oct 22 '20 at 13:28
  • https://en.wikipedia.org/wiki/Secant_method#Generalizations – Will Ness Oct 22 '20 at 16:44
  • @KeithMadison for higher dimensions the simplest method is bisection. as for 1d we bracket a root by two points `f(x +- delta)`, for 2d it's 4 points `f(x +- delta, y +- delta)`, for 3d -- 8 points `f(x +- delta, y +- delta, z +- delta)`, etc. then we calculate `f(x,y)` and choose the subdivision with opposite signs of errors, so that the true root is inside it. this can be improved by adding an analog of secant, like gradient descent. and apparently this is actually double false position method, not secant. I've updated the answer. – Will Ness Oct 23 '20 at 11:19
  • Apologies, could you elaborate a bit? I'm not entirely sure I follow (I guess, I find this sort of thing difficult to imagine). Or, at least, I get the big idea, but I'm not sure about the details. So, with OP's approx. class, I can solve a time of arrival localization problem in 2d/3d by choosing some search interval, calculating the error with each step by computing time of arrival for the current 'best point', and feeding that back into the algorithm. It isn't entirely clear to me how something of that nature translates to this method. It's probably simple- I just can't make the connection. – KeithMadison Nov 14 '20 at 04:35
  • I'm imagining `f_i(x, y, z) = time of arrival at station i`, and there would be four functions for four stations in a three dimensional solution (referring to the localization problem, since I feel I understand it somewhat well). But, we now have four equations. – KeithMadison Nov 14 '20 at 04:42
  • first we go downhill by some interval as OP describes (but better by _adaptive_ interval) until we find the sign of delta has flipped. btw how do you go downhill in 3D? you calculate the deltas at 8 corners of the cube, and go from there by gradient descent. if you always pick one of the three directions you just walk in zigzag, but it also works I guess. bisection comes into play when the delta sign at at least one of the corners is flipped. this means zero is inside the cube. halve the cube's edges, you get 8 sub-cubes. choose one with flipped delta signs at corners, & recurse. @KeithMadison – Will Ness Nov 14 '20 at 08:23
  • you can improve upon halving, with secant, using linear interpolation for the 8 delta values,(\*) but always taking care of bracketing, i.e. always make sure the signs of the chosen cube are actually flipped (i.e. some are positive, some negative). --- (\*) using some linear algebra, matrices and such. for 1D it's a point. for 2D its a line, and for 3D it's a plane. calculate that plane's intersections with your original cube and use those points as corners of new, reduced cube (unless the halved one gives you a smaller candidate, perhaps then choose that one). something like that. – Will Ness Nov 14 '20 at 08:37
  • re "gradient descent" - it gives you both direction and suggests the next distance to jump by. use the direction with same old step, or use the new step too. if the step is greater than the cube's size, it could be a wrong jump. you usually want your bracketing cube (that's *interval*, `x-delta....x+delta`, in 1D case) to shrink, or at least not to grow, at each step. so have your search always check all these things. ---- it's been a long time since I did this thing; so maybe bracketed gradient descent is the 3D equivalent of what I've described in the answer. – Will Ness Nov 14 '20 at 08:37