24

I am trying to insert a large(-ish) number of elements in the shortest time possible and I tried these two alternatives:

1) Pipelining:

List<Task> addTasks = new List<Task>();
for (int i = 0; i < table.Rows.Count; i++)
{
    DataRow row = table.Rows[i];
    Task<bool> addAsync = redisDB.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
    addTasks.Add(addAsync);
}
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);

2) Batching:

List<Task> addTasks = new List<Task>();
IBatch batch = redisDB.CreateBatch();
for (int i = 0; i < table.Rows.Count; i++)
{
    DataRow row = table.Rows[i];
    Task<bool> addAsync = batch.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
    addTasks.Add(addAsync);
}
batch.Execute();
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);

I am not noticing any significant time difference (actually I expected the batch method to be faster): for approx 250K inserts I get approx 7 sec for pipelining vs approx 8 sec for batching.

Reading from the documentation on pipelining,

"Using pipelining allows us to get both requests onto the network immediately, eliminating most of the latency. Additionally, it also helps reduce packet fragmentation: 20 requests sent individually (waiting for each response) will require at least 20 packets, but 20 requests sent in a pipeline could fit into much fewer packets (perhaps even just one)."

To me, this sounds a lot like the a batching behaviour. I wonder if behind the scenes there's any big difference between the two because at a simple check with procmon I see almost the same number of TCP Sends on both versions.

CyberDude
  • 7,397
  • 5
  • 24
  • 43

1 Answers1

31

Behind the scenes, SE.Redis does quite a bit of work to try to avoid packet fragmentation, so it isn't surprising that it is quite similar in your case. The main difference between batching and flat pipelining are:

  • a batch will never be interleaved with competing operations on the same multiplexer (although it may be interleaved at the server; to avoid that you need to use a multi/exec transaction or a Lua script)
  • a batch will be always avoid the chance of undersized packets, because it knows about all the data ahead of time
  • but at the same time, the entire batch must be completed before anything can be sent, so this requires more in-memory buffering and may artificially introduce latency

In most cases, you will do better by avoiding batching, since SE.Redis achieves most of what it does automatically when simply adding work.

As a final note; if you want to avoid local overhead, one final approach might be:

redisDB.SetAdd(string.Format(keyFormat, row.Field<int>("Id")),
    row.Field<int>("Value"), flags: CommandFlags.FireAndForget);

This sends everything down the wire, neither waiting for responses nor allocating incomplete Tasks to represent future values. You might want to do something like a Ping at the end without fire-and-forget, to check the server is still talking to you. Note that using fire-and-forget does mean that you won't notice any server errors that get reported.

Marc Gravell
  • 927,783
  • 236
  • 2,422
  • 2,784
  • Re: the final approach. If using `SetAddAsync + FireAndForget` + final `Ping`. Excluding case of unknown transient errors, would the Sets be guaranteed to be added to by the time `Ping` completes? Or could they arrive out of order? – ttugates May 02 '21 at 15:08
  • 1
    @ttugates assuming we aren't talking about "cluster", order should *currently* be guaranteed either way; however, I want to introduce a new opt-in "pooled" mode, where-by we use more connections to avoid big pile-ups when something goes wrong. In that usage: batching would guarantee order, but non-batching could use multiple connections with no order guarantees. That mode would be opt-in, because of this semantic change. – Marc Gravell May 02 '21 at 16:31