3

I'm trying to produce a Kusto query to measure the "own" duration of the requests (subtracting out durations of dependencies). However, I can't really figure out how to work this out through a pure Kusto query.

To better understand what would would expected, below a sample case:

High level view (where R is the request and Dx the dependencies)

 R    =============================== (31ms)
 D1     ******* (7ms)
 D2          ******** (8ms)
 D3                        ****** (6ms)
 D4                          ** (2ms)
 D5         **** (4ms)

 Proj ==*************======******====
  • D1 overlaps D2 during 2ms
  • D5 and D4 shouldn't be taken into account as completely overlapped by other dependencies
  • Proj being a projection of a potential intermediate step where only meaningful dependencies segments are shown

Given the following testbed dataset

let reqs = datatable (timestamp: datetime, id:string, duration: real)
[
  datetime("2020-12-15T08:00:00.000Z"), "r1", 31    // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
  datetime("2020-12-15T08:00:00.002Z"), "r1", 7,    // D1
  datetime("2020-12-15T08:00:00.007Z"), "r1", 8,    // D2
  datetime("2020-12-15T08:00:00.021Z"), "r1", 6,    // D3
  datetime("2020-12-15T08:00:00.023Z"), "r1", 2,    // D4
  datetime("2020-12-15T08:00:00.006Z"), "r1", 4,    // D5
];

In this particular case, the Kusto query, joining the two data tables, should be able to retrieve 12 (duration of the request, removing all dependencies), ie.

Expected total duration = 31 - (7 + 8 - 2) - (6) = 12

Any help to move this forward would be greatly appreciated <3

Tomer Shetah
  • 7,646
  • 6
  • 20
  • 32
nulltoken
  • 55,645
  • 19
  • 127
  • 125

2 Answers2

2

I succeeded to solve that using that using row_window_session(). This is a Window function. You can read more about it at Window functions overview.

The solution is:

let reqs = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
  datetime("2020-12-15T08:00:00.000Z"), "r1", 31    // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
  datetime("2020-12-15T08:00:00.002Z"), "r1", 7,    // D1
  datetime("2020-12-15T08:00:00.007Z"), "r1", 8,    // D2
  datetime("2020-12-15T08:00:00.021Z"), "r1", 6,    // D3
  datetime("2020-12-15T08:00:00.006Z"), "r1", 4,    // D5
  datetime("2020-12-15T08:00:00.023Z"), "r1", 2,    // D4
];
deps
| extend endTime = timestamp + totimespan(duration * 10000)
| sort by timestamp asc 
| serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
| summarize max(endTime) by operation_ParentId, SessionStarted
| extend diff = max_endTime - SessionStarted
| summarize todouble(sum(diff)) by operation_ParentId
| join reqs on operation_ParentId
| extend diff = duration - sum_diff / 10000
| project diff 

The idea here is to sort the entries by the open time, and as long as the next previous end time is later than the current start time, we don't open a new session. Let's explain each line of this query to see how this is being done:

  1. Calculate the endTime based on the duration. To normalize the data I'll multiply by 10000 the duration:
    | extend endTime = timestamp + totimespan(duration * 10000)
    
  2. Sort by start time:
    | sort by timestamp asc 
    
  3. This is the key of this solution. It is calculated on the timestamp column. The next two parameters are limits when to start new buckets. Since we don't want to seal a bucket based on time that have passed, I provided 1 hour which will not hit with this input. The forth argument helps us to create a new session based on the data. As long as there are more rows that will result in timestamp > prev(endTime) they will have the same start time.
    | serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
    
  4. Now we have multiple rows per session start. So we want to keep only the latest time per session. We also keep operation_ParentId to later on join on that key:
    | summarize max(endTime) by operation_ParentId, SessionStarted
    
  5. Calculate the time of each session:
    | extend diff = max_endTime - SessionStarted
    
  6. Sum up all session times:
    | summarize todouble(sum(diff)) by operation_ParentId
    
  7. Join on req to get the total starting time:
    | join reqs on operation_ParentId
    
  8. Calculate the diff between the total time and the session times. Unnormalize the data:
    | extend diff = duration - sum_diff / 10000
    
  9. Project the final result:
    | project diff 
    

You can find this query running at Kusto Samples open database.

Having said that, please note that this is a linear operation. Meaning that if there are 2 following segments, that should be under the same segment, but they do not intersect, it will fail. For example, adding the following into deps:

datetime("2020-12-15T08:00:00.026Z"), "r1", 1,    // D6

which should not add anything to the calculation, cause it to misbehave. This is because d4 is the previous point, and it has no point of contact with d6, although d3 covers them both. To solve that, you need to repeat the same logic of steps 3-5. Unfortunately Kusto does not have recursions, therefore you cannot solve this for any kind of input. But assuming there are no really depth such cases that breaks this logic, I think it is good enough.

Tomer Shetah
  • 7,646
  • 6
  • 20
  • 32
0

Take a look at the query below to see if it can meet your requirement:

let reqs = datatable (timestamp: datetime, id:string, duration: real, key1:string)
[
  datetime("2020-12-15T08:00:00.000Z"), "r1", 31 , "k1"   // R
];

let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real,name:string)
[
  datetime("2020-12-15T08:00:00.002Z"), "r1", 7, "D1", 
  datetime("2020-12-15T08:00:00.007Z"), "r1", 8, "D2", 
  datetime("2020-12-15T08:00:00.021Z"), "r1", 6, "D3", 
  datetime("2020-12-15T08:00:00.023Z"), "r1", 2, "D4", 
  datetime("2020-12-15T08:00:00.006Z"), "r1", 4, "D5"
];

let d2 = deps
| where name !in ("D4","D5")
| summarize a=sum(duration)-2
| extend key1="k1";

reqs 
| join d2 on key1
| extend result = duration - a
| project result

Test result:

enter image description here

Ivan Yang
  • 29,072
  • 2
  • 13
  • 31
  • Thanks for that. However, the question was about how to "measure the "own" duration of the requests (substracting out durations of dependencies)" in general, not how to solve _this_ particular example. Updating the question to make the intent more clear. – nulltoken Dec 16 '20 at 13:58