
I have a db table with 2M entries

My XPositions table structure is

Id - int
FID - int
CoordinateQue - int
Latitude - float
Longitude - float

Each row represents a marker position, and I need to calculate the distance between each pair of coordinates and save it to another table.

My xWeights table structure is

Id - int
x_Id - int
Tox - int
Distance - decimal(18,8)

So far my working code is

var query = _xRepository.TableNoTracking;
var xNodes = query.ToList();

foreach (var x in xNodes)
{
    for (var i = 0; i < xNodes.Count; i++)
    {
        if (x.Id == xNodes[i].Id)
            continue; // same node, nothing to do

        var R = 6378137; // Earth's radius in metres
        var φ1 = (Math.PI / 180) * x.Latitude;
        var φ2 = (Math.PI / 180) * xNodes[i].Latitude;
        var Δφ = (xNodes[i].Latitude - x.Latitude) * (Math.PI / 180);
        var Δλ = (xNodes[i].Longitude - x.Longitude) * (Math.PI / 180);
        var Δψ = Math.Log(Math.Tan(Math.PI / 4 + φ2 / 2) / Math.Tan(Math.PI / 4 + φ1 / 2));
        var q = Math.Abs(Δψ) > 10e-12 ? Δφ / Δψ : Math.Cos(φ1); // E-W course creates problem with 0/0
        // if the longitude difference is over 180°, take the shorter rhumb line across the anti-meridian:
        if (Math.Abs(Δλ) > Math.PI) Δλ = Δλ > 0 ? -(2 * Math.PI - Δλ) : (2 * Math.PI + Δλ);
        var dist = Math.Sqrt(Δφ * Δφ + q * q * Δλ * Δλ) * R;

        // a fresh entity per pair: reusing one instance outside the loop
        // keeps re-inserting the same tracked object
        var n = new xWeights
        {
            x_Id = x.Id,
            Tox = xNodes[i].Id,
            Distance = (decimal)dist // Distance column is decimal(18,8)
        };
        _xWeightsRepository.Insert(n);
    }
}

My problem is that I am getting approximately 35k records per minute, which is about 2.1M records per hour. It will take forever to finish this way. Any ideas how to improve the performance?

abatishchev
Bulut Kartal
  • Yes. Stop doing the calculation row by agonizing row. You should do this directly in t-sql. Then think about what you need to do to a column and not what you want to do to each row. But you said you have 2 million rows and it will take an hour to complete. That isn't forever unless this is behind a button click or something that is in the application. ;) – Sean Lange Mar 30 '18 at 19:22
  • I have no idea how to do it in sql. Unfortunately it is in application. yes I have 2M records which will pair with 2M records. So this way it will almost take 2M hours :) – Bulut Kartal Mar 30 '18 at 19:38
  • You're funny. You complain that it takes so long, when you perform **ONE** insert operation per inner loop? Of course it takes forever, what did you expect? – Tseng Mar 30 '18 at 19:44
  • @Tseng what is your suggestion? Adding to a list and do the insert that way? – Bulut Kartal Mar 30 '18 at 19:57
  • It may be one improvement. Please note that `.AddRange` is much more performant than `.Add` because it invokes the change tracker only once, rather than on every call. However, the fact alone that you are fetching 2m records into memory requires a considerable amount of time (transferring and serializing the data), then adding it back consumes more memory too (because of tracking done by DbContext). Most performant is indeed to perform it on the database and insert it there too (stored procedure). Whether you are willing to move business logic there is a different topic – Tseng Mar 30 '18 at 20:04
  • I am really bad on stored procedures. just basic stuff on sql :) Have no idea how to do it... – Bulut Kartal Mar 30 '18 at 20:48
  • How often do you have to do this? PS You could set `n.x_Id` in the outside loop. – NetMage Mar 30 '18 at 20:49
  • @NetMage probably once in my lifetime and never again :))) – Bulut Kartal Mar 30 '18 at 20:58
  • Yeah for a one off thing who cares? You have something that works, just run it during a maintenance window. – Sean Lange Mar 30 '18 at 21:01
  • @SeanLange I will need to wait for 2M hours to complete :) I don't think I will live that long :) 228 years :) – Bulut Kartal Mar 30 '18 at 21:04
  • @Tseng I tried adding to list but since I have a lot records I had out of memory exception :) – Bulut Kartal Mar 30 '18 at 21:12
  • Adding to list then insert made it slower. any ideas how can I do it in sql? – Bulut Kartal Mar 30 '18 at 22:38
  • 1
    @BulutKartal Are you using LINQ to SQL or Entity Framework 6 or EF Core? Did you do `AddRange` to the `DBSet`? See [this](https://stackoverflow.com/a/5942176/2557128) answer. – NetMage Mar 30 '18 at 22:51
  • @NetMage I use Entity Framework 6, I will check, Ty! – Bulut Kartal Mar 30 '18 at 22:57
  • I used .BulkInsert(List) it is a huge improvement but still not good enough. Reduced to 3 years from 228 years:) – Bulut Kartal Mar 31 '18 at 04:11
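Pulling the threads from the comments together: since the distance is symmetric, each pair only needs one trigonometric evaluation (iterate `j > i` and emit both directed rows), and rows should be collected into a batch for `AddRange`/`BulkInsert` rather than one `Insert` call each. A minimal self-contained sketch of that pairing logic, where `Node`, `Weight`, and `RhumbDistance` are stand-in names rather than the question's entities:

```csharp
using System;
using System.Collections.Generic;

class Node { public int Id; public double Latitude; public double Longitude; }
class Weight { public int FromId; public int ToId; public double Distance; }

class Demo
{
    // Rhumb-line distance in metres, same formula as the question's loop body.
    static double RhumbDistance(Node a, Node b)
    {
        const double R = 6378137; // Earth's radius in metres
        var p1 = Math.PI / 180 * a.Latitude;
        var p2 = Math.PI / 180 * b.Latitude;
        var dPhi = (b.Latitude - a.Latitude) * (Math.PI / 180);
        var dLambda = (b.Longitude - a.Longitude) * (Math.PI / 180);
        var dPsi = Math.Log(Math.Tan(Math.PI / 4 + p2 / 2) / Math.Tan(Math.PI / 4 + p1 / 2));
        var q = Math.Abs(dPsi) > 10e-12 ? dPhi / dPsi : Math.Cos(p1); // E-W course: avoid 0/0
        if (Math.Abs(dLambda) > Math.PI) // shorter rhumb line across the anti-meridian
            dLambda = dLambda > 0 ? -(2 * Math.PI - dLambda) : 2 * Math.PI + dLambda;
        return Math.Sqrt(dPhi * dPhi + q * q * dLambda * dLambda) * R;
    }

    static void Main()
    {
        var nodes = new List<Node>
        {
            new Node { Id = 1, Latitude = 41.0, Longitude = 29.0 },
            new Node { Id = 2, Latitude = 41.0, Longitude = 30.0 },
            new Node { Id = 3, Latitude = 42.0, Longitude = 29.0 },
        };

        var batch = new List<Weight>();
        for (var i = 0; i < nodes.Count; i++)
            for (var j = i + 1; j < nodes.Count; j++) // j > i: each pair computed once
            {
                var d = RhumbDistance(nodes[i], nodes[j]);
                batch.Add(new Weight { FromId = nodes[i].Id, ToId = nodes[j].Id, Distance = d });
                // reverse direction reuses the same distance, no recomputation:
                batch.Add(new Weight { FromId = nodes[j].Id, ToId = nodes[i].Id, Distance = d });
            }

        // n*(n-1) rows from only n*(n-1)/2 trig evaluations
        if (batch.Count != nodes.Count * (nodes.Count - 1))
            throw new InvalidOperationException("pair count mismatch");
        Console.WriteLine(batch.Count); // 6
    }
}
```

In EF6, flushing `batch` every ~10k rows via `AddRange` + `SaveChanges` on a fresh `DbContext` (or handing the chunk to `SqlBulkCopy`) keeps the change tracker small and sidesteps the out-of-memory problem mentioned above. None of this changes the fundamental scale problem described in the answer, though.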

1 Answer


The problem is not with this function, but with what you are trying to achieve.

You are trying to insert every from-to combination into _xWeightsRepository. If there are 2 million nodes, that means roughly 4 trillion (2,000,000 × 2,000,000) weights.

Even if you could insert ten thousand weights per second (already optimistic for row-by-row inserts), you would still be waiting over twelve years.

Check out SQL spatial indexes. I'm going to take a guess that your answer lies in that direction: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-spatial-index-transact-sql
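For reference, a sketch of what the spatial route could look like in T-SQL, assuming SQL Server and the table names from the question. The `GeoPoint` column, index name, and the 5 km cutoff are illustrative assumptions, not from the original; the point is that a distance cutoff is what makes the problem tractable, since an unrestricted self-join still produces ~4 trillion rows:

```sql
-- Add a geography point column computed from the existing float columns
ALTER TABLE XPositions ADD GeoPoint geography;

UPDATE XPositions
SET GeoPoint = geography::Point(Latitude, Longitude, 4326); -- 4326 = WGS84

CREATE SPATIAL INDEX SIX_XPositions_GeoPoint ON XPositions (GeoPoint);

-- Set-based insert of distances, restricted to pairs within 5 km:
INSERT INTO xWeights (x_Id, Tox, Distance)
SELECT a.Id, b.Id, a.GeoPoint.STDistance(b.GeoPoint)
FROM XPositions a
JOIN XPositions b ON b.Id <> a.Id
WHERE a.GeoPoint.STDistance(b.GeoPoint) <= 5000;
```

Note that `STDistance` returns the geodesic (great-circle) distance, not the rhumb-line distance computed in the question's loop; whether that substitution is acceptable depends on what the weights are used for.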

Peter Aylett